
Donabauer, Gregor; Kruschwitz, Udo

University of Regensburg @ SwissText 2021 SEPP-NLG: Adding Sentence Structure to Unpunctuated Text

Donabauer, Gregor and Kruschwitz, Udo (2021) University of Regensburg @ SwissText 2021 SEPP-NLG: Adding Sentence Structure to Unpunctuated Text. In: Swiss Text Analytics Conference, 2021, Online.

Full text published: 08 Oct 2021 05:20
Conference or workshop contribution


Abstract

This paper describes our approach (UR-mSBD) to the shared task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG), organised as part of SwissText 2021. We participated in Subtask 1 (fully unpunctuated sentences: full stop detection) and submitted a run for every featured language (English, German, French, Italian). Our submissions are based on pre-trained BERT models that have been fine-tuned to the task at hand. We recently demonstrated that such an approach achieves state-of-the-art performance when identifying end-of-sentence markers in automatically transcribed texts. In contrast to that work, here we use a language-specific BERT model for each featured language. By framing the problem as a binary tagging task using the outlined architecture, we achieve competitive results on the official test set across all languages, with recall, precision and F1 ranging between 0.91 and 0.96, which makes us joint winners for recall in two of the languages. The official baselines are beaten by large margins.
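The binary tagging framing can be illustrated with a minimal sketch. It assumes that every word of the unpunctuated text receives label 1 if a sentence end ('.', '?' or '!') followed it in the original text and 0 otherwise, and that recall, precision and F1 are computed on the sentence-end class. The helper names `to_binary_tags` and `prf1` are hypothetical; the sketch covers only data preparation and scoring, not the fine-tuned BERT models themselves.

```python
import re
from typing import List, Tuple

# Assumption: '.', '?' or '!' at the end of a token marks a sentence end.
SENT_END = re.compile(r"[.?!]+$")
# Assumption: other punctuation is simply dropped (it is not predicted in this sketch).
OTHER_PUNCT = re.compile(r"[,;:\"()]")


def to_binary_tags(text: str) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: turn punctuated text into unpunctuated words plus
    binary labels (1 = a sentence ends after this word, 0 = it does not)."""
    words: List[str] = []
    labels: List[int] = []
    for token in text.split():
        cleaned = OTHER_PUNCT.sub("", token)
        is_end = bool(SENT_END.search(cleaned))
        word = SENT_END.sub("", cleaned)
        if not word:
            # A stand-alone '.' closes the sentence of the preceding word.
            if is_end and labels:
                labels[-1] = 1
            continue
        words.append(word)
        labels.append(1 if is_end else 0)
    return words, labels


def prf1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall and F1 for the positive (sentence-end) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    words, gold = to_binary_tags("We participated in Subtask 1. The baselines are beaten by large margins!")
    print(words)
    print(gold)
    # A tagger that finds only the first sentence end: precision 1.0, recall 0.5.
    pred = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(prf1(gold, pred))
```

A real tagger would replace `pred` with per-word outputs of a token classifier; because BERT splits words into subword pieces, some word-level alignment step would also be needed. Note that this naive labelling treats abbreviations such as "Dr." as sentence ends.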



Details

Document type: Conference or workshop contribution (paper)
Book title: SwissText 2021. Proceedings of the Swiss Text Analytics Conference 2021, Winterthur, Switzerland, June 14-16, 2021 (held online due to the COVID-19 pandemic)
Publisher: RWTH Aachen
Place of publication: Aachen
Other series: CEUR workshop proceedings
Volume: 2957
Date: June 2021
Institutions: Languages, Literatures and Cultures > Institute of Information and Media, Language and Culture (I:IMSK) > Chair of Information Science (Prof. Dr. Udo Kruschwitz)
Computer Science and Data Science > Department of Human-Centred Computing > Chair of Information Science (Prof. Dr. Udo Kruschwitz)
Keywords: language-specific BERT models, text processing, language
Dewey Decimal Classification: 000 Computer science, information and general works > 020 Library and information sciences
Status: Published
Peer reviewed: Yes, this version has been peer reviewed
Created at the University of Regensburg: Yes
URN of the University Library Regensburg: urn:nbn:de:bvb:355-epub-493520
Document ID: 49352
