PIXAR: Auto-Regressive Language Modeling in Pixel Space

Tai, Yintao; Liao, Xiyang; Suglia, Alessandro; Vergari, Antonio

Computer Science > Computation and Language

arXiv:2401.03321 (cs)

[Submitted on 6 Jan 2024 (v1), last revised 23 Feb 2024 (this version, v2)]

Abstract: Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they cannot be used for generative tasks such as free-form question answering. In this work, we introduce PIXAR, the first pixel-based autoregressive LLM that performs text generation. Consisting of only a decoder, PIXAR can perform free-form generative tasks while keeping the number of parameters on par with previous encoder-decoder models. Furthermore, we highlight the challenges of generating text as non-noisy images and show this is due to using a maximum likelihood objective. To overcome this problem, we propose an adversarial pretraining stage that improves the readability and accuracy of PIXAR by 8.1 on LAMBADA and 8.5 on bAbI, making it comparable to GPT-2 on text generation tasks. This paves the way to build open-vocabulary LLMs that operate on perceptual input only and calls into question the necessity of the usual symbolic input representation, i.e., text as (sub)tokens.
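To make "operating on pixel representations" and "autoregressive" concrete, here is a minimal sketch of how a strip of rendered text might be cut into a left-to-right sequence of patches and paired up for next-patch prediction. It is an illustration under stated assumptions, not PIXAR's actual code: it assumes the text has already been rendered to a grayscale NumPy array with values in [0, 1] and white (1.0) background, and that patches are fixed-width vertical slices. The names `to_patches` and `next_patch_pairs` are hypothetical. It does not render text (that would need a font rasterizer) and does not model the maximum likelihood or adversarial objectives.

```python
import numpy as np

def to_patches(image: np.ndarray, patch_width: int) -> np.ndarray:
    """Split a (height, width) grayscale text strip into a left-to-right
    sequence of (height, patch_width) patches, each flattened to a vector.

    Hypothetical helper: assumes white background = 1.0.
    """
    height, width = image.shape
    if width % patch_width != 0:
        # Pad on the right with white so the width divides evenly.
        pad = patch_width - width % patch_width
        image = np.pad(image, ((0, 0), (0, pad)), constant_values=1.0)
    n = image.shape[1] // patch_width
    # (height, n, patch_width) -> (n, height, patch_width): patch axis first.
    patches = image.reshape(height, n, patch_width).transpose(1, 0, 2)
    return patches.reshape(n, height * patch_width)

def next_patch_pairs(patches: np.ndarray):
    """Autoregressive setup: predict patch t+1 from patches 0..t.

    Inputs are every patch but the last; targets are shifted by one.
    """
    return patches[:-1], patches[1:]

# Toy "rendered" strip: 16 px tall, 40 px wide, one dark vertical stroke.
image = np.ones((16, 40))
image[4:12, 3:6] = 0.0
patches = to_patches(image, patch_width=16)  # width padded 40 -> 48, 3 patches
inputs, targets = next_patch_pairs(patches)
print(patches.shape, inputs.shape, targets.shape)
```

A decoder-only model in this setting would read `inputs` under a causal mask and be trained to emit `targets`; how the pixel outputs are scored is exactly where the abstract contrasts a maximum likelihood objective with the added adversarial stage.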

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.03321 [cs.CL]
	(or arXiv:2401.03321v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.03321

Submission history

From: Yintao Tai
[v1] Sat, 6 Jan 2024 22:49:38 UTC (262 KB)
[v2] Fri, 23 Feb 2024 19:06:35 UTC (2,020 KB)
