Thu. May 6th, 2021


Historical research necessarily involves reading and summarizing texts. However, interpreting an old document can be difficult even for specialists. A recent study addresses the need for automatic summarization of historical texts in modern language.

Example of Ugaritic language. Image credit: Rama via Wikimedia, CC-BY-SA-2.0-FR

A transfer-learning-based approach is used to summarize German and Chinese news from the 16th to the 19th century. It exploits similarities between languages and therefore requires limited or no cross-lingual supervision. On this task, the method outperforms state-of-the-art baselines for standard cross-lingual summarization.

Experts also rated the automatically generated summaries positively for informativeness, conciseness, and fluency, as well as for their affinity to current linguistic styles. The work also points to directions for improvement in historical text processing, such as handling texts on themes not seen in modern documents.

We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine to historians and digital humanities researchers but has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model which can be trained even with no cross-lingual (historical to modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historic to modern language summarisation task from standard cross-lingual summarisation (i.e., modern to modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.

Research paper: Peng, X., Zheng, Y., Lin, C., and Siddharthan, A., "Summarising Historical Text in Modern Languages", arXiv:2101.10759. Link: https://arxiv.org/abs/2101.10759

 





