Madge, Chris ; Yu, Juntao ; Chamberlain, Jon ; Kruschwitz, Udo ; Paun, Silviu ; Poesio, Massimo

Crowdsourcing and Aggregating Nested Markable Annotations

Madge, Chris, Yu, Juntao, Chamberlain, Jon, Kruschwitz, Udo, Paun, Silviu and Poesio, Massimo (2019) Crowdsourcing and Aggregating Nested Markable Annotations. In: 57th Annual Meeting of the Association for Computational Linguistics, July 2019, Florence, Italy.

Publication date of this full text: 29 Jun 2020 13:02
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.43402

Published version (PDF, 570 kB)

License: Creative Commons Attribution 4.0 International

Abstract

One of the key steps in language resource creation is the identification of the text segments to be annotated, or markables, which, depending on the task, may vary from nominal chunks for named entity resolution to (potentially nested) noun phrases, or mentions, in coreference resolution to larger text segments in text segmentation. Markable identification is typically carried out semi-automatically, by running a markable identifier and correcting its output by hand, a step increasingly performed by annotators recruited through crowdsourcing, whose responses are then aggregated. In this paper, we present a method for identifying markables for coreference annotation that combines high-performance automatic markable detectors with checking via a Game-With-A-Purpose (GWAP) and aggregation using a Bayesian annotation model. The method was evaluated both on news data and on data from a variety of other genres. It improves the F1 of mention boundaries by over seven percentage points compared with a state-of-the-art, domain-independent automatic mention detector, and by almost three points over an in-domain mention detector. One of the key contributions of our proposal is its applicability to the case in which markables are nested, as is the case with coreference markables; but the GWAP and several of the proposed markable detectors are task- and language-independent and are thus applicable to a variety of other annotation scenarios.
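The abstract reports results as F1 over mention boundaries. As a rough illustration of what such a score measures when markables can be nested, the sketch below computes precision, recall and F1 under exact boundary match. It is not the paper's evaluation code: the function name `boundary_f1`, the `(start, end)` token-offset convention and the example spans are all assumptions made for this illustration, and the paper's actual matching criterion is not stated in this record.

```python
from typing import Iterable, Set, Tuple

# Assumed convention: a markable is a (start, end) pair of token offsets,
# with end exclusive. Nested markables are simply spans contained in others.
Span = Tuple[int, int]


def boundary_f1(predicted: Iterable[Span], gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) counting only exact boundary matches."""
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (0, 5) contains the nested markable (3, 5).
gold_spans = [(0, 5), (3, 5), (7, 8)]
predicted_spans = [(0, 5), (3, 5), (6, 8), (10, 11)]

precision, recall, f1 = boundary_f1(predicted_spans, gold_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because spans are compared as whole `(start, end)` pairs, a nested markable is scored independently of the markable that contains it, so a detector that finds only outermost noun phrases loses recall on the inner ones.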

Participating institutions

Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz)

Details

Document type

Conference or workshop contribution (not selected)

Book title:

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy

Publisher:

Association for Computational Linguistics

Page range:

pp. 797-807

Date

July 2019

Identification number

Value	Type
10.18653/v1/P19-1077	DOI

Dewey Decimal Classification

000 Computer science, information and general works > 020 Library and information sciences

Status

Published

Peer reviewed

Yes, this version has been peer reviewed

Created at the University of Regensburg

URN of the Regensburg University Library

urn:nbn:de:bvb:355-epub-434024

Document ID

43402
