| Published version Download (PDF | 202kB) | License: Creative Commons Attribution 4.0 International |
University of Regensburg @ SwissText 2021 SEPP-NLG: Adding Sentence Structure to Unpunctuated Text
Donabauer, Gregor and Kruschwitz, Udo
(2021)
University of Regensburg @ SwissText 2021 SEPP-NLG: Adding Sentence Structure to Unpunctuated Text.
In: Swiss Text Analytics Conference, 2021, Online.
Publication date of this full text: 08 Oct 2021 05:20
Conference or workshop contribution
Abstract
This paper describes our approach (UR-mSBD) to address the shared task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG) organised as part of SwissText 2021. We participated in Subtask 1 (fully unpunctuated sentences – full stop detection) and submitted a run for every featured language (English, German, French, Italian). Our submissions are based on pre-trained BERT models that have been fine-tuned to the task at hand. We had recently demonstrated that such an approach achieves state-of-the-art performance when identifying end-of-sentence markers on automatically transcribed texts. The difference to that work is that here we use language-specific BERT models for each featured language. By framing the problem as a binary tagging task using the outlined architecture, we are able to achieve competitive results on the official test set across all languages, with Recall, Precision, F1 ranging between 0.91 and 0.96, which makes us joint winners for Recall in two of the languages. The official baselines are beaten by large margins.
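
The binary tagging framing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `to_binary_tags` and `precision_recall_f1` are hypothetical, the rule that a token ending in `.`, `?` or `!` closes a sentence is a simplifying assumption, and the fine-tuned BERT tagger itself is not shown. The sketch only shows how punctuated text could be turned into unpunctuated tokens with 0/1 labels, and how Precision, Recall and F1 for the positive class could be computed.

```python
import re


def to_binary_tags(text: str) -> list[tuple[str, int]]:
    """Turn punctuated text into (lowercased token, label) pairs.

    Label 1 marks a token that ends a sentence, label 0 any other token.
    Assumption: a token ending in '.', '?' or '!' ends a sentence, so
    abbreviations such as 'e.g.' would be mislabelled by this sketch.
    """
    pairs = []
    for raw in text.split():
        # Ignore trailing quotes or brackets when looking for the end mark.
        label = 1 if raw.rstrip("\"')").endswith((".", "?", "!")) else 0
        # Strip punctuation so the model input is unpunctuated text.
        token = re.sub(r"[^\w'-]", "", raw).lower()
        if token:
            pairs.append((token, label))
    return pairs


def precision_recall_f1(gold: list[int], pred: list[int]) -> tuple[float, float, float]:
    """Compute Precision, Recall and F1 for the sentence-end class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    tagged = to_binary_tags("Hello world. How are you?")
    print(tagged)
    gold = [label for _, label in tagged]
    print(precision_recall_f1(gold, [0, 1, 1, 0, 0]))
```

In a setup like the one the abstract describes, the 0/1 labels would serve as training targets for a token classifier, and the metric function would be applied to its predictions on held-out text.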
Details
| Document type | Conference or workshop contribution (Paper) |
|---|---|
| Book title: | SwissText 2021. Proceedings of the Swiss Text Analytics Conference 2021 Winterthur, Switzerland, June 14-16, 2021 (held online due to COVID19 pandemic) |
| Publisher: | RWTH Aachen |
| Place of publication: | Aachen |
| Other series: | CEUR workshop proceedings |
| Volume: | 2957 |
| Date | June 2021 |
| Institutions | Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz); Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz) |
| Keywords | language-specific BERT models, text processing, language |
| Dewey Decimal Classification | 000 Computer science, information science, general works > 020 Library and information science |
| Status | Published |
| Refereed | Yes, this version has been refereed |
| Created at the University of Regensburg | Yes |
| URN of UB Regensburg | urn:nbn:de:bvb:355-epub-493520 |
| Document ID | 49352 |