Extractive Summarisation for German-language Data: A Text-level Approach with Discourse Features

Freya Hewett, Manfred Stede


Abstract
We examine the link between facets of Rhetorical Structure Theory (RST) and the selection of content for extractive summarisation of German-language texts. For this purpose, we produce a set of extractive summaries for a dataset of German-language newspaper commentaries, a corpus which already has several layers of annotation. We provide an in-depth analysis of the connection between summary sentences and several RST-based features and transfer these insights to various automated summarisation models. Our results show that RST features are informative for the task of extractive summarisation, particularly nuclearity and relations at sentence level.
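The abstract singles out nuclearity and sentence-level relations as the most informative RST features. As a rough illustration only, and not the authors' feature extraction (the linked fhewett/rst-features repository holds that), the sketch below assumes a toy RST tree whose leaves are whole sentences. It scores each sentence by how many satellite edges lie between it and the root, a common nuclearity heuristic, and also records the relation label on the sentence's own edge. The names `RSTNode`, `sentence_features` and `select_sentences` are hypothetical, and the example tree is invented rather than drawn from the paper's corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RSTNode:
    """Toy RST tree node; leaves stand for whole sentences (an assumption)."""
    nuclearity: str = "N"           # "N" (nucleus) or "S" (satellite) relative to the parent
    relation: Optional[str] = None  # relation label on the edge to the parent, if any
    sentence_id: Optional[int] = None
    children: List["RSTNode"] = field(default_factory=list)


def sentence_features(root: RSTNode) -> Dict[int, Tuple[int, Optional[str]]]:
    """Map each leaf sentence to (satellite edges above it, relation on its own edge)."""
    features: Dict[int, Tuple[int, Optional[str]]] = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.nuclearity == "S":
            depth += 1  # each satellite edge moves a span further from the text's core
        if not node.children and node.sentence_id is not None:
            features[node.sentence_id] = (depth, node.relation)
        for child in node.children:
            stack.append((child, depth))
    return features


def select_sentences(root: RSTNode, k: int) -> List[int]:
    """Pick the k sentences with the fewest satellite edges, ties broken by text order."""
    feats = sentence_features(root)
    return sorted(feats, key=lambda sid: (feats[sid][0], sid))[:k]


# Hypothetical four-sentence text, not taken from the paper's corpus.
tree = RSTNode(children=[
    RSTNode(nuclearity="N", children=[
        RSTNode(nuclearity="N", sentence_id=0),
        RSTNode(nuclearity="S", relation="elaboration", sentence_id=1),
    ]),
    RSTNode(nuclearity="S", relation="evidence", children=[
        RSTNode(nuclearity="N", sentence_id=2),
        RSTNode(nuclearity="S", relation="background", sentence_id=3),
    ]),
])

print(select_sentences(tree, k=2))  # → [0, 1]
```

A learned summariser would more plausibly feed such depth and relation values into a classifier alongside other features than rank on depth alone; the hard ranking here just keeps the nuclearity signal visible.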
Anthology ID:
2022.coling-1.63
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
756–765
URL:
https://aclanthology.org/2022.coling-1.63
Cite (ACL):
Freya Hewett and Manfred Stede. 2022. Extractive Summarisation for German-language Data: A Text-level Approach with Discourse Features. In Proceedings of the 29th International Conference on Computational Linguistics, pages 756–765, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Extractive Summarisation for German-language Data: A Text-level Approach with Discourse Features (Hewett & Stede, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.63.pdf
Code:
fhewett/rst-features