Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Liu, Qihao; Zeng, Zhanpeng; He, Ju; Yu, Qihang; Shen, Xiaohui; Chen, Liang-Chieh

arXiv:2406.09416 (cs)

[Submitted on 13 Jun 2024]

Abstract: This paper enhances diffusion models with a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and scalability. However, Transformer architectures, which tokenize input data (via "patchification"), face a trade-off between visual fidelity and computational complexity because the cost of self-attention grows quadratically with the number of tokens. Larger patch sizes make attention cheaper to compute, but they struggle to capture fine-grained visual details, leading to image distortions. To address this challenge, we propose augmenting the Diffusion model with the Multi-Resolution network (DiMR), a framework that refines features across multiple resolutions, progressively enhancing detail from low to high resolution. Additionally, we introduce Time-Dependent Layer Normalization (TD-LN), a parameter-efficient approach that incorporates time-dependent parameters into layer normalization to inject time information and achieve superior performance. We demonstrate the method on the class-conditional ImageNet generation benchmark, where DiMR-XL variants outperform prior diffusion models, setting new state-of-the-art FID scores of 1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512. Project page: this https URL
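
The patch-size trade-off described in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function names and the 32 x 32 grid size are hypothetical and chosen only for illustration. It counts the tokens produced by non-overlapping patches and the size of the resulting self-attention score matrix, which grows with the square of the token count.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Tokens produced by splitting an H x W grid into non-overlapping patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("grid size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def attention_entries(tokens: int) -> int:
    """Entries in one self-attention score matrix (tokens x tokens)."""
    return tokens * tokens


if __name__ == "__main__":
    side = 32  # hypothetical grid side length, not a value from the paper
    for patch in (1, 2, 4, 8):
        n = token_count(side, side, patch)
        print(f"patch={patch}: tokens={n}, attention entries={attention_entries(n)}")
```

Under these assumptions, halving the patch size quadruples the token count and multiplies the attention matrix size by 16, which is why small patches preserve detail at a steep computational cost.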

Comments:	Introducing DiMR, a new diffusion backbone that surpasses all existing image generation models of various sizes on ImageNet 256 with only 505M parameters. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.09416 [cs.CV]
	(or arXiv:2406.09416v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.09416

Submission history

From: Qihao Liu
[v1] Thu, 13 Jun 2024 17:59:58 UTC (12,974 KB)
