Type Prediction With Program Decomposition and Fill-in-the-Type Training

Cassano, Federico; Yee, Ming-Ho; Shinn, Noah; Guha, Arjun; Holtzen, Steven

Computer Science > Software Engineering

arXiv:2305.17145 (cs)

[Submitted on 25 May 2023]

Abstract: TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file. All code, data, and models are available at: this https URL.
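As a rough illustration of why measuring "how well-typed" an output is can be hard, the sketch below shows two candidate annotations for the same function. Both type check, but only one is informative. The function names and the example itself are hypothetical and are not taken from the paper, its dataset, or OpenTau; the sketch only assumes standard TypeScript semantics for `any`.

```typescript
// Hypothetical example: these names do not come from the paper or its dataset.

// Candidate 1: every annotation is `any`.
// This type checks, but the annotations carry no information,
// so the checker cannot catch misuse at call sites.
function totalLengthAny(words: any): any {
  let total = 0;
  for (const w of words) {
    total += w.length;
  }
  return total;
}

// Candidate 2: precise annotations.
// This also type checks, and the checker can now reject
// a call such as totalLengthTyped(42) at compile time.
function totalLengthTyped(words: string[]): number {
  let total = 0;
  for (const w of words) {
    total += w.length;
  }
  return total;
}

// Both candidates behave identically at runtime.
console.log(totalLengthAny(["ab", "cde"]));
console.log(totalLengthTyped(["ab", "cde"]));
```

A check that only asks "does the file compile?" would score both candidates the same, which suggests why a finer-grained quality metric is useful for this task.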

Subjects:	Software Engineering (cs.SE); Machine Learning (cs.LG); Programming Languages (cs.PL)
Cite as:	arXiv:2305.17145 [cs.SE]
	(or arXiv:2305.17145v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2305.17145

Submission history

From: Ming-Ho Yee
[v1] Thu, 25 May 2023 21:16:09 UTC (40 KB)
