Poesio, Massimo ; Chamberlain, Jon ; Paun, Silviu ; Yu, Juntao ; Uma, Alexandra ; Kruschwitz, Udo

A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation

Poesio, Massimo, Chamberlain, Jon, Paun, Silviu, Yu, Juntao, Uma, Alexandra and Kruschwitz, Udo

(2019) A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation. In: NAACL 2019 - Conference of the North American Chapter of the Association for Computational Linguistics, June, 2019, Minneapolis, Minnesota.

Publication date of this full text: 29 Jun 2020 13:34
Conference or workshop contribution
DOI for citing this document: 10.5283/epub.43420

License: Creative Commons Attribution 4.0 International

Abstract

We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. The corpus, containing annotations for about 108,000 markables, is one of the largest corpora for coreference for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total. This characteristic makes the corpus a unique resource for the study of disagreements on anaphoric interpretation. A second distinctive feature is its rich annotation scheme, covering singletons, expletives, and split-antecedent plurals. Finally, the corpus also comes with labels inferred using a recently proposed probabilistic model of annotation for coreference. The labels are of high quality and make it possible to successfully train a state-of-the-art coreference resolver, including training on singletons and non-referring expressions. The annotation model can also result in more than one label, or no label, being proposed for a markable, thus serving as a baseline method for automatically identifying ambiguous markables. A preliminary analysis of the results is presented.

Participating institutions

Sprach- und Literatur- und Kulturwissenschaften > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz)
Informatik und Data Science > Fachbereich Menschzentrierte Informatik > Lehrstuhl für Informationswissenschaft (Prof. Dr. Udo Kruschwitz)

Details

Document type:	Conference or workshop contribution (not selected)
Book title:	Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Publisher:	Association for Computational Linguistics
Place of publication:	Minneapolis, Minnesota
Page range:	pp. 1778-1789
Date:	June 2019
Dewey Decimal Classification:	000 Computer science, information science, general works > 020 Library and information science
Status:	Published
Peer-reviewed:	Yes, this version has been peer-reviewed
Created at the University of Regensburg:	Yes
URN of the UB Regensburg:	urn:nbn:de:bvb:355-epub-434200
Document ID:	43420
