Bibliography

The following articles and software packages were used in the development of MetaGenePipe.

kra

Kraken taxonomic sequence classification system: operating manual. URL: https://ccb.jhu.edu/software/kraken/MANUAL.html, doi:10.1186/gb-2014-15-3-r46.

AGM+90

S F Altschul, W Gish, W Miller, E W Myers, and D J Lipman. Basic local alignment search tool. J. Mol. Biol., 215(3):403–410, October 1990. doi:10.1016/S0022-2836(05)80360-2.

AMSchaffer+97

S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, and D J Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389–3402, September 1997. doi:10.1093/nar/25.17.3389.

And10

S. Andrews. FastQC: a quality control tool for high throughput sequence data. 2010. URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

ABME+20

T. Aramaki, R. Blanc-Mathieu, H. Endo, K. Ohkubo, M. Kanehisa, S. Goto, and H. Ogata. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics, 36(7):2251–2252, April 2020. doi:10.1093/bioinformatics/btz859.

ABME+19

Takuya Aramaki, Romain Blanc-Mathieu, Hisashi Endo, Koichi Ohkubo, Minoru Kanehisa, Susumu Goto, and Hiroyuki Ogata. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics, 36(7):2251–2252, 2019. doi:10.1093/bioinformatics/btz859.

AHS+17

Katherine E Arden, Claire Heney, Babak Shaban, Graeme R Nimmo, Michael D Nissen, Theo P Sloots, and Ian M Mackay. Detection of Toscana virus from an adult traveler returning to Australia with encephalitis. J. Med. Virol., 89(10):1861–1864, October 2017. doi:10.1002/jmv.24839.

BGQ+11

Derek W. Barnett, Erik K. Garrison, Aaron R. Quinlan, Michael P. Strömberg, and Gabor T. Marth. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 27(12):1691–1692, April 2011. doi:10.1093/bioinformatics/btr174.

BLU14

A. M. Bolger, M. Lohse, and B. Usadel. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15):2114–2120, August 2014. doi:10.1093/bioinformatics/btu170.

BLT+07

E. Boutet, D. Lieberherr, M. Tognolli, M. Schneider, and A. Bairoch. UniProtKB/Swiss-Prot. Methods Mol Biol, 406:89–112, 2007. doi:10.1007/978-1-59745-535-0_4.

BLB20

Tomáš Brůna, Alexandre Lomsadze, and Mark Borodovsky. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genomics and Bioinformatics, 2(2):lqaa026, May 2020. doi:10.1093/nargab/lqaa026.

BRD21

Benjamin Buchfink, Klaus Reuter, and Hajk-Georg Drost. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods, 18(4):366–368, 2021. doi:10.1038/s41592-021-01101-x.

BXH15

Benjamin Buchfink, Chao Xie, and Daniel H Huson. Fast and sensitive protein alignment using DIAMOND. Nat. Methods, 12(1):59–60, January 2015. doi:10.1038/nmeth.3176.

CCA+09

Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L Madden. BLAST+: architecture and applications. BMC Bioinformatics, 10(1):421, December 2009. doi:10.1186/1471-2105-10-421.

CAK12

Corinne Gaudart, Alain Garrigou, and Karine Chassaing. Analysis of organizational conditions for risk management: the case study of a petrochemical site. Work, 41 Suppl 1:2661–2667, 2012. doi:10.3233/wor-2012-1032-2661.

DTCF+17

Paolo Di Tommaso, Maria Chatzou, Evan W Floden, Pablo Prieto Barja, Emilio Palumbo, and Cedric Notredame. Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4):316–319, 2017. doi:10.1038/nbt.3820.

EMLK16

Philip Ewels, Måns Magnusson, Sverker Lundin, and Max Käller. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19):3047–3048, June 2016. doi:10.1093/bioinformatics/btw354.

GonzalezTKA+21

Enrique González-Tortuero, Revathy Krishnamurthi, Heather E. Allison, Ian B. Goodhead, and Chloë E. James. Comparative analysis of gene prediction tools for viral genome annotation. bioRxiv, 2021. URL: https://www.biorxiv.org/content/early/2021/12/13/2021.12.11.472104, doi:10.1101/2021.12.11.472104.

HEG+20a

Steven Hofmeyr, Rob Egan, Evangelos Georganas, Alex C. Copeland, Robert Riley, Alicia Clum, Emiley Eloe-Fadrosh, Simon Roux, Eugene Goltsman, Aydın Buluç, Daniel Rokhsar, Leonid Oliker, and Katherine Yelick. Terabase-scale metagenome coassembly with MetaHipMer. Scientific Reports, 10(1):10689, 2020. doi:10.1038/s41598-020-67416-5.

HEG+20b

Steven Hofmeyr, Rob Egan, Evangelos Georganas, Alex C. Copeland, Robert Riley, Alicia Clum, Emiley Eloe-Fadrosh, Simon Roux, Eugene Goltsman, Aydın Buluç, et al. Terabase-scale metagenome coassembly with MetaHipMer. Scientific Reports, 2020. doi:10.1038/s41598-020-67416-5.

HCL+10a

Doug Hyatt, Gwo-Liang Chen, Philip F LoCascio, Miriam L Land, Frank W Larimer, and Loren J Hauser. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11(1):119, March 2010. doi:10.1186/1471-2105-11-119.

HCL+10b

Doug Hyatt, Gwo-Liang Chen, Philip F LoCascio, Miriam L Land, Frank W Larimer, and Loren J Hauser. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 2010. doi:10.1186/1471-2105-11-119.

Kan19

M. Kanehisa. Toward understanding the origin and evolution of cellular organisms. Protein Sci, 28(11):1947–1951, November 2019. doi:10.1002/pro.3715.

KFS+21

M. Kanehisa, M. Furumichi, Y. Sato, M. Ishiguro-Watanabe, and M. Tanabe. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res, 49(D1):D545–D551, January 2021. doi:10.1093/nar/gkaa970.

KG00

M. Kanehisa and S. Goto. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 28(1):27–30, January 2000. doi:10.1093/nar/28.1.27.

KLK+19

Dongwan D Kang, Feng Li, Edward Kirton, Ashleigh Thomas, Rob Egan, Hong An, and Zhong Wang. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ, 7:e7359, 2019. doi:10.7717/peerj.7359.

KGM16

Kevin P. Keegan, Elizabeth M. Glass, and Folker Meyer. MG-RAST, a metagenomics service for analysis of microbial community structure and function. Microbial Environmental Genomics (MEG), pages 207–233, 2016. doi:10.1007/978-1-4939-3369-3_13.

KBZ+20

Silas Kieser, Joseph Brown, Evgeny M. Zdobnov, Mirko Trajkovski, and Lee Ann McCue. ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data. BMC Bioinformatics, 2020. doi:10.1186/s12859-020-03585-4.

KSG+22

Sabrina Krakau, Daniel Straub, Hadrien Gourlé, Gisela Gabernet, and Sven Nahnsen. nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning. NAR Genomics and Bioinformatics, 4(1):lqac007, February 2022. doi:10.1093/nargab/lqac007.

KJE+21

Felix Krueger, Frankie James, Phil Ewels, Ebrahim Afyounian, and Benjamin Schuster-Boeckler. Trim Galore. July 2021. doi:10.5281/zenodo.5127899.

KSB17

Gregory M. Kurtzer, Vanessa Sochat, and Michael W. Bauer. Singularity: scientific containers for mobility of compute. PLOS ONE, 12(5):e0177459, 2017. doi:10.1371/journal.pone.0177459.

LS12

Ben Langmead and Steven L Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4):357–359, 2012. doi:10.1038/nmeth.1923.

LLL+15

Dinghua Li, Chi-Man Liu, Ruibang Luo, Kunihiko Sadakane, and Tak-Wah Lam. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10):1674–1676, January 2015. doi:10.1093/bioinformatics/btv033.

LD10

Heng Li and Richard Durbin. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26(5):589–595, March 2010. doi:10.1093/bioinformatics/btp698.

LHW+09

Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin, and 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078–2079, June 2009. doi:10.1093/bioinformatics/btp352.

MagovcS11

Tanja Magoč and Steven L Salzberg. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics, 27(21):2957–2963, November 2011. doi:10.1093/bioinformatics/btr507.

ML22

Vijini Mallawaarachchi and Yu Lin. MetaCoAG: binning metagenomic contigs via composition, coverage and assembly graphs. In Itsik Pe'er, editor, Research in Computational Molecular Biology, pages 70–85. Springer International Publishing, Cham, 2022.

MSB+21a

M Mirdita, M Steinegger, F Breitwieser, J Söding, and E Levy Karin. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics, 37(18):3029–3031, 2021. doi:10.1093/bioinformatics/btab184.

MSB+21b

M Mirdita, M Steinegger, F Breitwieser, J Söding, and E Levy Karin. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics, 37(18):3029–3031, March 2021. doi:10.1093/bioinformatics/btab184.

MJL+21

Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, Soohyun Lee, Sven O. Twardziok, Alexander Kanitz, et al. Sustainable data analysis with Snakemake. F1000Research, 10:33, 2021. doi:10.12688/f1000research.29032.1.

PLYC12a

Y. Peng, H. C. M. Leung, S. M. Yiu, and F. Y. L. Chin. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11):1420–1428, 2012. doi:10.1093/bioinformatics/bts174.

PLYC12b

Yu Peng, Henry C. M. Leung, S. M. Yiu, and Francis Y. L. Chin. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11):1420–1428, April 2012. doi:10.1093/bioinformatics/bts174.

RLT+18

Ben Roediger, Quintin Lee, Shweta Tikoo, Joanna C A Cobbin, James M Henderson, Mika Jormakka, Matthew B O'Rourke, Matthew P Padula, Natalia Pinello, Marisa Henry, Maria Wynne, Sara F Santagostino, Cory F Brayton, Lorna Rasmussen, Leszek Lisowski, Szun S Tay, David C Harris, John F Bertram, John P Dowling, Patrick Bertolino, Jack H Lai, Wengen Wu, William W Bachovchin, Justin J-L Wong, Mark D Gorrell, Babak Shaban, Edward C Holmes, Christopher J Jolly, Sébastien Monette, and Wolfgang Weninger. An atypical Parvovirus drives chronic tubulointerstitial nephropathy and kidney fibrosis. Cell, 175(2):530–543.e24, October 2018. doi:10.1016/j.cell.2018.08.013.

SGS19

Erika Sallet, Jérôme Gouzy, and Thomas Schiex. EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes, pages 97–120. Springer New York, New York, NY, 2019. doi:10.1007/978-1-4939-9173-0_6.

SE11

Robert Schmieder and Robert Edwards. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS One, 6(3):e17288, March 2011. doi:10.1371/journal.pone.0017288.

SeS12

Marcello Silva e Santos. The PhOCoe model–ergonomic pattern mapping in participatory design processes. Work, 41 Suppl 1:2643–2650, 2012. doi:10.3233/WOR-2012-0507-2643.

TLevequeD+12

Michèle Tosello, Françoise Lévêque, Stéphanie Dutillieu, Guillaume Hernandez, and Jean-François Vautier. Conditions for the successful integration of human and organizational factors (HOF) in the nuclear safety analysis. Work, 41 Suppl 1:2656–2660, 2012. doi:10.3233/wor-2012-0508-2656.

UC18

The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research, 46(5):2699–2699, 2018. doi:10.1093/nar/gky092.

VDHV+21

Renaud Van Damme, Martin Hölzer, Adrian Viehweger, Bettina Müller, Erik Bongcam-Rudloff, and Christian Brandt. Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN). PLOS Computational Biology, 17(2):e1008716, 2021. doi:10.1371/journal.pcbi.1008716.

VVdAG22

Kate Voss, Geraldine Van der Auwera, and Jeff Gentry. Full-stack genomics pipelining with GATK4 + WDL + Cromwell. 2022. URL: https://f1000research.com/slides/6-1381, doi:10.7490/f1000research.1114634.1.

WSZH15

Ryan R. Wick, Mark B. Schultz, Justin Zobel, and Kathryn E. Holt. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics, 31(20):3350–3352, June 2015. doi:10.1093/bioinformatics/btv383.