EscherNet: A Generative Model for Scalable View Synthesis

Kong, Xin; Liu, Shikun; Lyu, Xiaoyang; Taher, Marwan; Qi, Xiaojuan; Davison, Andrew J.

arXiv:2402.03908 (cs)

[Submitted on 6 Feb 2024 (v1), last revised 19 Mar 2024 (this version, v2)]


Abstract: We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scalability in view synthesis -- it can generate more than 100 consistent target views simultaneously on a single consumer-grade GPU, despite being trained with a fixed number of 3 reference views to 3 target views. As a result, EscherNet not only addresses zero-shot novel view synthesis, but also naturally unifies single- and multi-image 3D reconstruction, combining these diverse tasks into a single, cohesive framework. Our extensive experiments demonstrate that EscherNet achieves state-of-the-art performance in multiple benchmarks, even when compared to methods specifically tailored for each individual problem. This remarkable versatility opens up new directions for designing scalable neural architectures for 3D vision. Project page: this https URL.
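
The abstract describes camera control as relative between reference and target views, but does not spell out the encoding itself. The sketch below is therefore not EscherNet's method. It only illustrates, under stated assumptions, why a relative camera transformation is a convenient quantity to condition on: it stays the same when the world coordinate frame changes. The poses, the helper names (`rotation_z`, `translation`, `relative_pose`) and the camera-to-world convention are all assumptions made for illustration.

```python
import numpy as np


def rotation_z(angle_rad):
    """Return a 4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


def translation(x, y, z):
    """Return a 4x4 homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def relative_pose(pose_ref, pose_tgt):
    """Express the target camera pose in the reference camera's frame.

    Both inputs are assumed to be 4x4 camera-to-world matrices.
    """
    return np.linalg.inv(pose_ref) @ pose_tgt


# Hypothetical camera-to-world poses for one reference and one target view.
pose_ref = translation(1.0, 0.0, 0.5) @ rotation_z(0.3)
pose_tgt = translation(0.0, 2.0, 0.5) @ rotation_z(1.2)

# An arbitrary change of world frame, applied to both cameras.
world_shift = translation(-3.0, 4.0, 1.0) @ rotation_z(0.7)

rel_before = relative_pose(pose_ref, pose_tgt)
rel_after = relative_pose(world_shift @ pose_ref, world_shift @ pose_tgt)

# The relative transform does not depend on the choice of world frame.
invariant = bool(np.allclose(rel_before, rel_after))
print(invariant)
```

Because the world-frame change cancels inside `relative_pose`, a model conditioned only on such relative transforms needs no canonical global coordinate system, and the same computation applies to any number of reference and target pairs.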

Comments:	CVPR 2024. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.03908 [cs.CV]
	(or arXiv:2402.03908v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.03908

Submission history

From: Xin Kong
[v1] Tue, 6 Feb 2024 11:21:58 UTC (46,386 KB)
[v2] Tue, 19 Mar 2024 17:41:04 UTC (46,407 KB)
