Concept Extractor – Ein flexibler und domänenspezifischer Web Service zur Beschlagwortung von Texten
Faulstich, Lukas C., Quasthoff, Uwe, Schmidt, Fabian and Wolff, Christian (2002) Concept Extractor – Ein flexibler und domänenspezifischer Web Service zur Beschlagwortung von Texten. In: Hammwöhner, Rainer, Wolff, Christian and Womser-Hacker, Christa (eds.) Information und Mobilität. Proc. 8. Internationales Symposium für Informationswissenschaft. Schriften zur Informationswissenschaft, 40. UVK, Konstanz, pp. 165-180. ISBN 978-3896697592.
Date of publication of this fulltext: 19 Oct 2009 12:26
Book section
Abstract
We describe a flexible and modular system for keyword extraction and attribution which operates on top of a text mining engine. Texts are analysed in comparison with a large reference corpus, and keywords are selected using a frequency-based measure of relative term significance. Additionally, selected terms may be expanded using large knowledge bases of inflected forms, orthographic variants, synonyms and multi-word terms. The solution is realised as a web-based service which can easily be integrated into existing content management systems.
This contribution describes a flexible and modular system for the automatic keyword indexing of texts, built on top of a text mining engine. It is based on a method of differential corpus analysis: the text to be processed is analysed in comparison with a large reference corpus, and differences in relative frequency classes are used to select suitable keywords. In addition, databases are used that allow terms to be expanded with respect to base form, spelling variants, synonyms and multi-word terms. The system is implemented as a web service and can be integrated into content management systems without difficulty.
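The comparison against a reference corpus described in the abstract can be illustrated with a small sketch. This is not the Concept Extractor implementation: the function names, the `min_count` and `smoothing` parameters, and the plain frequency-ratio score are all assumptions made for illustration, and the paper's frequency-class measure and term expansion steps are not reproduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(tokens):
    """Map each term to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def extract_keywords(text, reference_text, top_n=5, min_count=2, smoothing=1e-6):
    """Rank terms by how much more frequent they are in `text`
    than in `reference_text` (hypothetical scoring, for illustration only)."""
    doc_tokens = tokenize(text)
    doc_counts = Counter(doc_tokens)
    doc_freq = relative_frequencies(doc_tokens)
    ref_freq = relative_frequencies(tokenize(reference_text))

    # Ratio of relative frequencies; smoothing avoids division by zero
    # for terms that never occur in the reference corpus.
    scores = {
        term: doc_freq[term] / (ref_freq.get(term, 0.0) + smoothing)
        for term in doc_freq
        if doc_counts[term] >= min_count
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


if __name__ == "__main__":
    reference = "the cat sat on the mat and the dog sat on the rug " * 10
    document = "the corpus analysis compares the corpus with the reference corpus"
    print(extract_keywords(document, reference, top_n=2))
```

In this toy input, "corpus" and "the" are equally frequent in the document, but "corpus" never occurs in the reference text and therefore ranks first, while "the" is common in both and scores low. A real reference corpus would be far larger than the one used here.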
Details
| Item type | Book section |
|---|---|
| ISBN | 978-3896697592 |
| Title of Book | Information und Mobilität. Proc. 8. Internationales Symposium für Informationswissenschaft |
| Publisher | UVK |
| Place of Publication | Konstanz |
| Other Series | Schriften zur Informationswissenschaft |
| Volume | 40 |
| Page Range | pp. 165-180 |
| Date | 2002 |
| Institutions | Languages and Literatures > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff); Informatics and Data Science > Department Human-Centered Computing > Lehrstuhl für Medieninformatik (Prof. Dr. Christian Wolff) |
| Identification Number | |
| Classification | |
| Keywords | Corpus Linguistics, Indexing, terminology Management, term extraction, Information Retrieval, Web Services, Service oriented computing |
| Dewey Decimal Classification | 000 Computer science, information & general works > 020 Library & information sciences; 400 Language > 400 Language, Linguistics; 000 Computer science, information & general works > 004 Computer science |
| Status | Published |
| Refereed | Yes, this version has been refereed |
| Created at the University of Regensburg | Yes |
| URN of the UB Regensburg | urn:nbn:de:bvb:355-epub-67572 |
| Item ID | 6757 |