| Published version: Download (PDF, 427 kB) | License: Creative Commons Attribution-NonCommercial 4.0 International |
Crimp: An efficient tool for summarizing multiple clusterings in population structure analysis and beyond
Lautenschlager, Ulrich
(2022)
Crimp: An efficient tool for summarizing multiple clusterings in population structure analysis and beyond.
Molecular Ecology Resources 23 (3), pp. 705-711.
Publication date of this full text: 08 Mar 2023 14:35
Article
DOI for citing this document: 10.5283/epub.53909
Abstract
When a data set is repeatedly clustered using unsupervised techniques, the resulting clusterings, even if highly similar, may list their clusters in different orders. This so-called 'label-switching' phenomenon obscures meaningful differences between clusterings, complicating their comparison and summary. The problem often arises in the context of population structure analysis based on multilocus genotype data. In this field, a variety of popular tools apply model-based clustering, assigning individuals to a prespecified number of ancestral populations. Since such methods often involve stochastic components, it is common practice to perform multiple replicate analyses based on the same input data and parameter settings. Available postprocessing tools can mitigate label switching, but leave room for improvement, in particular regarding large input data sets. In this work, I present Crimp, a lightweight command-line tool that offers a relatively fast and scalable heuristic to align clusters across replicate clusterings consisting of the same number of clusters. For small problem sizes, an exact algorithm can be used as an alternative. Additional features include row-specific weights, input and output files similar to those of CLUMPP (Jakobsson & Rosenberg, 2007), and the evaluation of a given solution in terms of CLUMPP's as well as Crimp's own objective functions. Benchmark analyses show that Crimp, especially when applied to larger data sets, tends to outperform alternative tools with respect to runtime requirements and various quality measures. While primarily targeting population structure analysis, Crimp can be used as a generic tool to correct multiple clusterings for label switching. This facilitates their comparison and makes it possible to generate an averaged clustering. Crimp's computational efficiency makes it applicable even to relatively large data sets while offering competitive solution quality.
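To make the label-switching problem described in the abstract concrete, the following is a minimal Python sketch. It is not Crimp's heuristic, nor its exact algorithm, neither of which is specified on this page. It assumes each clustering is a matrix of membership coefficients (one row per individual, one column per cluster) and simply tries every column permutation, which is only feasible for a small number of clusters. The function names `align_to_reference` and `average_clusterings` and the squared-difference criterion are illustrative assumptions.

```python
from itertools import permutations


def align_to_reference(reference, replicate):
    """Permute the columns of `replicate` to best match `reference`.

    Brute force over all K! column permutations (illustrative only);
    the criterion is the sum of squared differences, an assumption
    made for this sketch.
    """
    k = len(reference[0])

    def squared_distance(perm):
        return sum(
            (ref_row[j] - rep_row[perm[j]]) ** 2
            for ref_row, rep_row in zip(reference, replicate)
            for j in range(k)
        )

    best = min(permutations(range(k)), key=squared_distance)
    return [[row[best[j]] for j in range(k)] for row in replicate]


def average_clusterings(clusterings):
    """Element-wise mean of already aligned membership matrices."""
    n = len(clusterings)
    rows = len(clusterings[0])
    k = len(clusterings[0][0])
    return [
        [sum(c[i][j] for c in clusterings) / n for j in range(k)]
        for i in range(rows)
    ]


# Two hypothetical replicate runs with the same structure but swapped labels.
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
run_b = [[0.2, 0.8], [0.1, 0.9], [0.9, 0.1]]

aligned_b = align_to_reference(run_a, run_b)
mean_q = average_clusterings([run_a, aligned_b])
```

Averaging `run_a` and `run_b` without alignment would blur the two clusters into near-uniform memberships, whereas averaging after alignment preserves the shared structure. The factorial growth of the permutation search is what makes heuristics attractive for larger problems.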
Details
| Field | Value |
|---|---|
| Document type | Article |
| Journal or periodical title | Molecular Ecology Resources |
| Publisher | WILEY |
| Place of publication | HOBOKEN |
| Volume | 23 |
| Issue or chapter number | 3 |
| Page range | pp. 705-711 |
| Date | 9 November 2022 |
| Institutions | Biologie und Vorklinische Medizin > Institut für Pflanzenwissenschaften > Arbeitsgruppe Evolution und Systematik der Pflanzen (Prof. Dr. Christoph Oberprieler) |
| Identification number | |
| Keywords | LABEL SWITCHING PROBLEM; DIVERSITY; INFERENCE; cluster correspondence; cluster matching; cluster relabelling; label switching; population structure |
| Dewey Decimal Classification | 500 Natural sciences and mathematics > 580 Plants (Botany) |
| Status | Published |
| Peer-reviewed | Yes, this version has been peer-reviewed |
| Created at the Universität Regensburg | Yes |
| URN of UB Regensburg | urn:nbn:de:bvb:355-epub-539096 |
| Document ID | 53909 |