Controllable Mind Visual Diffusion Model

Authors

  • Bohan Zeng Beihang University
  • Shanglin Li Beihang University
  • Xuhui Liu Beihang University
  • Sicheng Gao Beihang University
  • Xiaolong Jiang Xiaohongshu Inc.
  • Xu Tang Xiaohongshu Inc.
  • Yao Hu Xiaohongshu Inc.
  • Jianzhuang Liu Shenzhen Institute of Advanced Technology, Shenzhen, China
  • Baochang Zhang Beihang University; Zhongguancun Laboratory, Beijing, China; Nanchang Institute of Technology, Nanchang, China

DOI:

https://doi.org/10.1609/aaai.v38i7.28519

Keywords:

CV: Computational Photography, Image & Video Synthesis

Abstract

Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models. Diffusion-based methods have recently shown promise in analyzing functional magnetic resonance imaging (fMRI) data, including the reconstruction of high-quality images consistent with original visual stimuli. Nonetheless, it remains a critical challenge to effectively harness the semantic and silhouette information extracted from brain signals. In this paper, we propose a novel approach, termed the Controllable Mind Visual Diffusion Model (CMVDM). Specifically, CMVDM first extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks. Then, a control model is introduced in conjunction with a residual block to fully exploit the extracted information for image synthesis, generating high-quality images that closely resemble the original visual stimuli in both semantic content and silhouette characteristics. Through extensive experimentation, we demonstrate that CMVDM outperforms existing state-of-the-art methods both qualitatively and quantitatively. Our code is available at https://github.com/zengbohan0217/CMVDM.

Published

2024-03-24

How to Cite

Zeng, B., Li, S., Liu, X., Gao, S., Jiang, X., Tang, X., Hu, Y., Liu, J., & Zhang, B. (2024). Controllable Mind Visual Diffusion Model. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6935-6943. https://doi.org/10.1609/aaai.v38i7.28519

Issue

Section

AAAI Technical Track on Computer Vision VI