University of Regensburg @ SwissText 2021 SEPP-NLG: Adding Sentence Structure to Unpunctuated Text

Donabauer, Gregor and Kruschwitz, Udo

(2021) University of Regensburg @ SwissText 2021 SEPP-NLG: Adding Sentence Structure to Unpunctuated Text. In: Swiss Text Analytics Conference, 2021, Online.

Publication date of this full text: 08 Oct 2021 05:20
Conference or workshop contribution

Published version (PDF, 202 kB)

License: Creative Commons Attribution 4.0 International

Abstract

This paper describes our approach (UR-mSBD) to the shared task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG), organised as part of SwissText 2021. We participated in Subtask 1 (fully unpunctuated sentences – full stop detection) and submitted a run for every featured language (English, German, French, Italian). Our submissions are based on pre-trained BERT models that have been fine-tuned to the task at hand. We had recently demonstrated that such an approach achieves state-of-the-art performance when identifying end-of-sentence markers in automatically transcribed texts. The difference from that work is that here we use a language-specific BERT model for each featured language. By framing the problem as a binary tagging task using the outlined architecture, we achieve competitive results on the official test set across all languages, with Recall, Precision and F1 ranging between 0.91 and 0.96, which makes us joint winners for Recall in two of the languages. The official baselines are beaten by large margins.
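
The binary tagging framing mentioned in the abstract can be illustrated with a small sketch. It is not the authors' code and does not include the fine-tuned BERT tagger; it only shows, under simplifying assumptions, how punctuated text could be turned into tokens with 0/1 sentence-end labels and how Precision, Recall and F1 could be computed for the sentence-end class. The function names `to_binary_tags` and `precision_recall_f1` are hypothetical, and treating every trailing `.`, `!` or `?` as a sentence end is an assumption that ignores abbreviations.

```python
from typing import List, Sequence, Tuple


def to_binary_tags(text: str) -> Tuple[List[str], List[int]]:
    """Turn punctuated text into lowercase tokens and 0/1 labels.

    Label 1 means "a sentence ends after this token".
    Assumption: any trailing '.', '!' or '?' marks a sentence end.
    """
    tokens: List[str] = []
    labels: List[int] = []
    for raw in text.split():
        word = raw.rstrip(".!?")
        is_end = raw != word
        # Drop remaining punctuation so the input looks unpunctuated.
        word = word.strip(",;:\"'()")
        if not word:
            continue  # the chunk was punctuation only
        tokens.append(word.lower())
        labels.append(1 if is_end else 0)
    return tokens, labels


def precision_recall_f1(
    gold: Sequence[int], pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, Recall and F1 for the positive (sentence-end) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    tokens, gold = to_binary_tags("We went home. It was late.")
    # A made-up prediction standing in for a tagger's output.
    pred = [0, 1, 1, 0, 0, 0]
    print(tokens, gold)
    print(precision_recall_f1(gold, pred))
```

In such a setup, a token classifier would receive only the token sequence and predict one label per token; the metric function then compares those predictions with the gold labels.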


Participating institutions

Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz)

Details

Document type	Conference or workshop contribution (Paper)
Book title:	SwissText 2021. Proceedings of the Swiss Text Analytics Conference 2021 Winterthur, Switzerland, June 14-16, 2021 (held online due to COVID19 pandemic)
Publisher:	RWTH Aachen
Place of publication:	Aachen
Other series:	CEUR workshop proceedings
Volume:	2957
Date	June 2021
Keywords	language-specific BERT models, text processing, language
Dewey Decimal Classification	000 Computer science, information and general works > 020 Library and information sciences
Status	Published
Peer-reviewed	Yes, this version has been peer-reviewed
Created at the University of Regensburg	Yes
URN of the University Library of Regensburg	urn:nbn:de:bvb:355-epub-493520
Document ID	49352
