Usage
Setup input file
Open input_file.txt
and update with your samples. The file format is as follows
SampleID Read1FQ Read2FQ
For example, the first two lines of input_file.txt
could be
mockpos_S50 /data/cephfs/punim0256/gitlab/metaGenePipe/metaGenePipe/fastqFiles/mockpos_S50_100k_R1.fasta /data/cephfs/punim0256/gitlab/metaGenePipe/metaGenePipe/fastqFiles/mockpos_S50_100k_R2.fasta
mockpos_S52 /data/cephfs/punim0256/gitlab/metaGenePipe/metaGenePipe/fastqFiles/mockpos_S52_100k_R1.fastq /data/cephfs/punim0256/gitlab/metaGenePipe/metaGenePipe/fastqFiles/mockpos_S52_100k_R2.fastq
Note
The paths need to be the full paths on your file system. The spaces between the sampleID and reads are tabs. There can be no whitespaces at the end of each line or else the pipeline will fail. Use the complete path to the files to avoid any missed files.**
Output Directory
By default the workflow with write output to the ./outputs
directory. To change this, edit line 2 in metaGenPipe.options.json
:
"final_workflow_outputs_dir": "/path/to/output/",
Blast (Optional)
To use blast, download your preferred database from here: https://ftp.ncbi.nlm.nih.gov/blast/db/
Tell the worklow to use Blast by changing the metaGenPipe.blastBoolean
variable to true
on line 5 of metaGenePipe.json
"metaGenPipe.blastBoolean": true,
Add the path to the Blast database on line 25 of metaGenePipe.json
"metaGenPipe.database": "/path/to/BLAST/db/"
Additionally, set the database_directory
in metaGenPipe.config
on lines 34 and 104:
String database_directory = "/path/to/BLAST/db/"
High Performance Computing (HPC) instructions
To run in a High Performance Computing (HPC) environment, change the metaGenPipe.config
file
and change the default provider on line 17 from local
to Slurm
. e.g.
default = "Slurm"
Change the account string to the appropriate account on your HPC system on line 45 of metaGenPipe.config
String account = "--account=ACCOUNT_NAME"
Change the rt_queue
string on line 39 of metaGenPipe.config
to the partition name(s) in your job scheduler
String rt_queue = "PARTITION_NAME_1,PARTITION_NAME_2"
Run Pipeline
To run the pipeline use the command:
java -Dconfig.file=./metaGenePipe.config -jar cromwell-latest.jar run metaGenePipe.wdl -i metaGenePipe.json -o metaGenePipe.options.json