Perez-Rubio, Paula ; Lottaz, Claudio ; Engelmann, Julia C.

FastqPuri: high-performance preprocessing of RNA-seq data

Perez-Rubio, Paula, Lottaz, Claudio and Engelmann, Julia C. (2018) FastqPuri: high-performance preprocessing of RNA-seq data. bioRxiv. (Submitted)

Publication date of this full text: 26 Apr 2019 07:08
Article
DOI for citing this document: 10.5283/epub.40096

License: Creative Commons Attribution 4.0 International

Abstract

Background: RNA sequencing (RNA-seq) has become the standard means of analyzing gene and transcript expression at high throughput. Sequence alignment used to be a time-consuming step, but fast alignment methods, and even more so transcript counting methods that avoid mapping and quantify expression by evaluating whether a read is compatible with a transcript, have led to significant speed-ups in data analysis. The most time-consuming step in RNA-seq analysis is now preprocessing the raw sequence data: quality control, adapter removal, contamination filtering and quality filtering before transcript or gene quantification. Many researchers chain different tools for this purpose, but comprehensive, flexible and fast software that covers all preprocessing steps is currently missing.

Results: We present FastqPuri, a lightweight and highly efficient preprocessing tool for fastq data. FastqPuri provides sequence quality reports at the sample and dataset level, with new plots that facilitate decisions on subsequent quality filtering. Moreover, FastqPuri efficiently removes adapter sequences and sequences from biological contamination. It accepts both single- and paired-end data in uncompressed or compressed fastq files. FastqPuri can be run stand-alone and is also suitable for use within pipelines. We benchmarked FastqPuri against existing tools and found it to be superior in terms of speed, memory usage, versatility and comprehensiveness.

Conclusions: FastqPuri is a new tool that covers all aspects of short-read sequence data preprocessing. It was designed for RNA-seq data, to meet the need for fast preprocessing of fastq data ahead of transcript and gene counting, but it is suitable for any short-read sequencing data for which high sequence quality is needed, such as genome assembly or single nucleotide variant (SNV) detection. FastqPuri is particularly flexible in filtering undesired biological sequences: it offers two approaches that optimize speed and memory usage depending on the total size of the potential contaminating sequences. FastqPuri is available at https://github.com/jengelmann/FastqPuri. It is implemented in C and R and licensed under GPL v3.
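To make the preprocessing steps named in the abstract concrete, here is a minimal Python sketch of reading fastq records, filtering them by base quality and trimming an adapter. It is not FastqPuri's implementation (which is written in C and R) and does not reproduce its algorithms or command-line options. All function names, the Phred+33 encoding, the thresholds (Phred 20, at most 10% low-quality bases) and the exact-match adapter search are assumptions chosen for illustration.

```python
import gzip
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def open_fastq(path: str) -> TextIO:
    """Open an uncompressed or gzip-compressed fastq file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a fastq stream with four lines per read.

    Stops at end of file or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield FastqRecord(header, sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(record: FastqRecord,
                   min_quality: int = 20,
                   max_low_fraction: float = 0.1) -> bool:
    """Keep a read if at most max_low_fraction of its bases fall below min_quality."""
    scores = phred_scores(record.quality)
    if not scores:
        return False
    low = sum(1 for s in scores if s < min_quality)
    return low / len(scores) <= max_low_fraction


def trim_adapter(record: FastqRecord, adapter: str) -> FastqRecord:
    """Cut the read at the first exact occurrence of the adapter (illustrative only)."""
    pos = record.sequence.find(adapter)
    if pos == -1:
        return record
    return record._replace(sequence=record.sequence[:pos],
                           quality=record.quality[:pos])


if __name__ == "__main__":
    # Two made-up reads: the second has four bases with Phred score 0.
    demo = io.StringIO(
        "@read1\nACGTACGT\n+\nIIIIIIII\n"
        "@read2\nACGTACGT\n+\n!!!!IIII\n"
    )
    kept = [r.header for r in read_fastq(demo) if passes_quality(r)]
    print(kept)
```

Real tools typically allow mismatches when searching for adapters and handle paired-end reads jointly; this sketch leaves both out to stay short.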

Participating institutions

Medicine > Institute of Functional Genomics > Chair of Statistical Bioinformatics (Prof. Spang)
Informatics and Data Science > Department of Bioinformatics > Chair of Statistical Bioinformatics (Prof. Spang)

Details

Document type

Article

Journal or periodical title

bioRxiv

Publisher:

www.biorxiv.org

Date

December 2018

Identification number

Value	Type
10.1101/480707	DOI

Dewey Decimal Classification

600 Technology, medicine, applied sciences > 610 Medicine

Status

Submitted

Peer reviewed

No, this version has not yet been peer reviewed (preprint)

Created at the University of Regensburg

URN of the University Library of Regensburg

urn:nbn:de:bvb:355-epub-400963

Document ID

40096
