Text-to-Image Cross-Modal Generation: A Systematic Review

Żelaszczyk, Maciej; Mańdziuk, Jacek

arXiv:2401.11631 (cs)

[Submitted on 21 Jan 2024]

Title: Text-to-Image Cross-Modal Generation: A Systematic Review

Authors: Maciej Żelaszczyk, Jacek Mańdziuk

Abstract: We review research on generating visual data from text from the angle of "cross-modal generation." This point of view allows us to draw parallels between various methods geared towards working on input text and producing visual output, without limiting the analysis to narrow sub-areas. It also results in the identification of common templates in the field, which are then compared and contrasted both within pools of similar methods and across lines of research. We provide a breakdown of text-to-image generation into various flavors of image-from-text methods, video-from-text methods, image editing, self-supervised and graph-based approaches. In this discussion, we focus on research papers published at 8 leading machine learning conferences in the years 2016-2022, also incorporating a number of relevant papers not matching the outlined search criteria. The conducted review suggests a significant increase in the number of papers published in the area and highlights research gaps and potential lines of investigation. To our knowledge, this is the first review to systematically look at text-to-image generation from the perspective of "cross-modal generation."

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2401.11631 [cs.CV]
	(or arXiv:2401.11631v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.11631

Submission history

From: Maciej Żelaszczyk
[v1] Sun, 21 Jan 2024 23:54:05 UTC (14,187 KB)
