LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents

Masry, Ahmed; Hajian, Amir

Abstract: Document AI is a growing research field that focuses on the comprehension and extraction of information from scanned and digital documents to make everyday business operations more efficient. Numerous downstream tasks and datasets have been introduced to facilitate the training of AI models capable of parsing and extracting information from various document types such as receipts and scanned forms. Despite these advancements, both existing datasets and models fail to address critical challenges that arise in industrial contexts. Existing datasets primarily comprise short documents consisting of a single page, while existing models are constrained by a limited maximum length, often set at 512 tokens. Consequently, the practical application of these methods in financial services, where documents can span multiple pages, is severely impeded. To overcome these challenges, we introduce LongFin, a multimodal document AI model capable of encoding up to 4K tokens. We also propose the LongForms dataset, a comprehensive financial dataset that encapsulates several industrial challenges in financial documents. Through an extensive evaluation, we demonstrate the effectiveness of the LongFin model on the LongForms dataset, surpassing the performance of existing public models while maintaining comparable results on existing single-page benchmarks.

Comments:	Accepted at AAAI 2024 Workshop on AI in Finance for Social Impact
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.15050 [cs.CL]
	(or arXiv:2401.15050v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.15050
