FLAME: A small language model for spreadsheet formulas

Joshi, Harshit; Ebenezer, Abishai; Cambronero, José; Gulwani, Sumit; Kanade, Aditya; Le, Vu; Radiček, Ivan; Verbruggen, Gust

arXiv:2301.13779 (cs)

[Submitted on 31 Jan 2023 (v1), last revised 19 Dec 2023 (this version, v2)]

Abstract: Spreadsheets are a vital tool for end-user data management. Using large language models for formula authoring assistance in these environments can be difficult, as these models are expensive to train and challenging to deploy due to their size (up to billions of parameters). We present FLAME, a transformer-based model trained exclusively on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and training on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pre-training objectives. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. FLAME can outperform much larger models, such as the Davinci (175B) and Cushman (12B) variants of Codex and CodeT5 (220M), in 10 of 14 evaluation settings for the repair and completion tasks. For formula retrieval, FLAME outperforms CodeT5, CodeBERT, and GraphCodeBERT.
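
The abstract names sketch deduplication, an Excel-specific tokenizer, and masked span prediction without defining them. The Python sketch below illustrates one plausible reading of each under stated assumptions. The token classes, the placeholder names (`<CELL>`, `<RANGE>`, `<NUM>`, ...), the `max_per_sketch` cap, and the single `<mask_0>` sentinel are all hypothetical; this is not FLAME's actual tokenizer or training code. Here a formula's "sketch" is assumed to be its token sequence with cell references, ranges, and constants abstracted away, so that formulas differing only in concrete references collapse together.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical lexer for a small, uppercase-only subset of Excel formula syntax.
# Alternatives are tried in order: function names come before cell references
# (so LOG10( is not lexed as a cell), and ranges come before single cells.
_TOKEN_RE = re.compile(
    r"""
    (?P<STRING>"(?:[^"]|"")*")                          # "text", with "" as an escaped quote
  | (?P<FUNC>[A-Z][A-Z0-9.]*(?=\())                     # SUM( , IFERROR( , LOG10(
  | (?P<BOOL>(?:TRUE|FALSE)\b)
  | (?P<RANGE>\$?[A-Z]{1,3}\$?\d+:\$?[A-Z]{1,3}\$?\d+)   # A1:B10, $A$1:B2
  | (?P<CELL>\$?[A-Z]{1,3}\$?\d+)                       # A1, $B$2
  | (?P<NUMBER>\d+(?:\.\d+)?)                           # 3, 0.5
  | (?P<OP><>|<=|>=|[-+*/^&=<>%])
  | (?P<PUNCT>[(),;])
  | (?P<WS>\s+)
    """,
    re.VERBOSE,
)

# Hypothetical placeholders used when abstracting a formula into its sketch.
_ABSTRACT = {"CELL": "<CELL>", "RANGE": "<RANGE>", "NUMBER": "<NUM>",
             "STRING": "<STR>", "BOOL": "<BOOL>"}


def tokenize(formula: str) -> List[Tuple[str, str]]:
    """Split a formula into (kind, text) tokens, dropping whitespace."""
    if formula.startswith("="):
        formula = formula[1:]
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(formula):
        m = _TOKEN_RE.match(formula, pos)
        if m is None:
            raise ValueError(f"cannot tokenize {formula[pos:]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def sketch(formula: str) -> str:
    """Keep functions and operators; replace references and constants."""
    return " ".join(_ABSTRACT.get(kind, text) for kind, text in tokenize(formula))


def sketch_dedup(formulas: Iterable[str], max_per_sketch: int = 1) -> List[str]:
    """Keep at most max_per_sketch formulas for each distinct sketch."""
    counts: Dict[str, int] = defaultdict(int)
    kept: List[str] = []
    for f in formulas:
        s = sketch(f)
        if counts[s] < max_per_sketch:
            counts[s] += 1
            kept.append(f)
    return kept


def mask_span(formula: str, start: int, length: int) -> Tuple[str, str]:
    """Generic single-span masking over formula tokens (T5-style, simplified)."""
    toks = [text for _, text in tokenize(formula)]
    source = toks[:start] + ["<mask_0>"] + toks[start + length:]
    target = ["<mask_0>"] + toks[start:start + length]
    return " ".join(source), " ".join(target)
```

With the default cap, `=SUM(A1:A10)+$B$2*0.5` and `=SUM(B1:B5)+C3*2` share the sketch `SUM ( <RANGE> ) + <CELL> * <NUM>`, so only the first survives deduplication, while `=IF(A2>0,"yes","no")` has a different sketch and is kept. The masking function shows a generic objective over formula tokens (the model reconstructs the hidden span from the surrounding context); the domain-specific variants used to pre-train FLAME are not described in the abstract.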

Comments:	Accepted to AAAI 2024
Subjects:	Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as:	arXiv:2301.13779 [cs.PL]
	(or arXiv:2301.13779v2 [cs.PL] for this version)
	https://doi.org/10.48550/arXiv.2301.13779

Submission history

From: Harshit Joshi
[v1] Tue, 31 Jan 2023 17:29:43 UTC (230 KB)
[v2] Tue, 19 Dec 2023 22:56:39 UTC (422 KB)