POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation

Ma, Shijie; Xu, Huayi; Li, Mengjian; Geng, Weidong; Wang, Yaxiong; Wang, Meng

Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.00949 (cs)

[Submitted on 2 Nov 2023 (v1), last revised 10 Jun 2024 (this version, v3)]

Abstract: This paper aims to enhance diffusion-based text-to-video generation by improving its two input prompts: the noise and the text. To this end, we propose POS, a training-free Prompt Optimization Suite to boost text-to-video models. POS is motivated by two observations. (1) Video generation is unstable with respect to noise. Given the same text, different noises lead to videos that differ significantly in both frame quality and temporal consistency. This observation implies that there exists an optimal noise matched to each textual input. To capture it, we propose an optimal noise approximator. Specifically, the approximator first searches for a video that closely relates to the text prompt and then inverts it into the noise space to serve as an improved noise prompt for the textual input. (2) Improving the text prompt via LLMs often causes semantic deviation. Many existing text-to-vision works have used LLMs to improve text prompts for better generation. However, existing methods often neglect the semantic alignment between the original text and the rewritten one. In response, we design a semantic-preserving rewriter that imposes constraints in both the rewriting and denoising phases to preserve semantic consistency. Extensive experiments on popular benchmarks show that POS improves text-to-video models by a clear margin. The code will be open-sourced.
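The two-step idea behind the optimal noise approximator (retrieve a video related to the text, then invert it into the noise space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the retrieval metric or the inversion procedure, so cosine similarity and deterministic DDIM-style inversion are assumptions here, and the names `retrieve_nearest`, `ddim_invert`, and `eps_model` are hypothetical.

```python
import numpy as np


def retrieve_nearest(text_emb, video_embs):
    """Index of the candidate video most similar to the text.

    Assumption: text and videos live in a shared embedding space and
    similarity is measured by cosine similarity.
    """
    t = text_emb / np.linalg.norm(text_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    return int(np.argmax(v @ t))


def ddim_invert(x0, eps_model, alphas_cumprod):
    """Deterministic DDIM-style inversion of a clean latent into noise.

    x0             : clean latent of the retrieved video (any array shape)
    eps_model      : callable (x, t) -> predicted noise, same shape as x
                     (stand-in for a real denoising network)
    alphas_cumprod : decreasing cumulative alphas, all in (0, 1]
    """
    x = x0.copy()
    a_prev = 1.0  # the clean latent corresponds to cumulative alpha = 1
    for t, a_t in enumerate(alphas_cumprod):
        eps = eps_model(x, t)
        # Estimate the clean latent implied by the current noise prediction.
        x0_pred = (x - np.sqrt(1.0 - a_prev) * eps) / np.sqrt(a_prev)
        # Re-noise that estimate to the next (noisier) level.
        x = np.sqrt(a_t) * x0_pred + np.sqrt(1.0 - a_t) * eps
        a_prev = a_t
    return x


# Toy usage with random stand-ins for embeddings and video latents.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=8)
video_embs = rng.normal(size=(5, 8))         # 5 candidate videos
video_latents = rng.normal(size=(5, 4, 16))  # (video, frame, feature)

best = retrieve_nearest(text_emb, video_embs)
alphas = np.linspace(0.99, 0.01, 50)
toy_eps = lambda x, t: 0.1 * x               # placeholder noise predictor
noise_prompt = ddim_invert(video_latents[best], toy_eps, alphas)
```

In a real pipeline the resulting `noise_prompt` would replace the random initial noise passed to the text-to-video sampler, and `eps_model` would be the pretrained denoiser conditioned on the text.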

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.00949 [cs.CV]
	(or arXiv:2311.00949v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.00949

Submission history

From: Shijie Ma
[v1] Thu, 2 Nov 2023 02:33:09 UTC (4,732 KB)
[v2] Tue, 12 Mar 2024 02:19:00 UTC (26,846 KB)
[v3] Mon, 10 Jun 2024 03:16:09 UTC (26,846 KB)
