MS-Mentions: Consistently Annotating Entity Mentions in Materials Science Procedural Text

Tim O’Gorman, Zach Jensen, Sheshera Mysore, Kevin Huang, Rubayyat Mahbub, Elsa Olivetti, Andrew McCallum


Abstract
Materials science synthesis procedures are a promising domain for scientific NLP, as proper modeling of these recipes could provide insight into new ways of creating materials. However, a fundamental challenge in building information extraction models for materials science synthesis procedures is getting accurate labels for the materials, operations, and other entities of those procedures. We present a new corpus of entity mention annotations over 595 materials science synthesis procedural texts (157,488 tokens), which greatly expands the training data available for the Named Entity Recognition task. We outline a new label inventory designed to provide consistent annotations and a new annotation approach intended to maximize the consistency and annotation speed of domain experts. Inter-annotator agreement studies and baseline models trained upon the data suggest that the corpus provides high-quality annotations of these mention types. This corpus helps lay a foundation for future high-quality modeling of synthesis procedures.
Anthology ID:
2021.emnlp-main.101
Original:
2021.emnlp-main.101v1
Version 2:
2021.emnlp-main.101v2
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1337–1352
URL:
https://aclanthology.org/2021.emnlp-main.101
DOI:
10.18653/v1/2021.emnlp-main.101
Cite (ACL):
Tim O’Gorman, Zach Jensen, Sheshera Mysore, Kevin Huang, Rubayyat Mahbub, Elsa Olivetti, and Andrew McCallum. 2021. MS-Mentions: Consistently Annotating Entity Mentions in Materials Science Procedural Text. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1337–1352, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
MS-Mentions: Consistently Annotating Entity Mentions in Materials Science Procedural Text (O’Gorman et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.101.pdf
Video:
https://aclanthology.org/2021.emnlp-main.101.mp4
Data
SOFC-Exp