
nf-core/stitch @ 0.0.0-6c4ed3a

STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

Latest version: 0.0.0-6c4ed3a
Total downloads: 11
Source: nf-core/modules
Authors: @saulpierotti
Maintainers: @saulpierotti

Get started

Add the following snippet to your workflow script to include this module.

include { STITCH } from 'nf-core/stitch'

License

MIT License

Process
Name STITCH
Input 3 channels
#1 tuple
meta map

Groovy Map containing information about the set of samples e.g. [ id:'test' ]

collected_crams file

List of sorted BAM/CRAM/SAM files

*.{bam,cram,sam}
collected_crais file

List of BAM/CRAM/SAM index files

*.{bai,crai,sai}
cramlist file

Text file listing the CRAM files to use in imputation, one per line. Because the CRAM files are staged into the working directory of the process, this file should contain only the file names, without any leading path.

*.txt
samplename file

(Optional) File listing the names of the samples to impute, in the same order as in cramlist. One sample name per line.

*.{txt}
posfile file

Tab-separated file describing the variable positions to be used for imputation. Refer to the documentation for the --posfile argument of STITCH for more information.

*.tsv
input directory

Folder of pre-generated input RData objects used when STITCH is called with the --regenerateInput FALSE flag. It is generated by running STITCH with the --generateInputOnly TRUE flag.

input
genetic_map file

(Optional) Genetic map file with 3 whitespace-delimited columns giving the position (1-based), the genetic rate map in cM/Mbp, and the genetic map in cM. If no file is provided, the rate is based on physical distance and the expected rate (expRate).

*.{txt,map}{,gz}
rdata directory

Folder of pre-generated input RData objects used when STITCH is called with the --regenerateInput FALSE flag. It is generated by running STITCH with the --generateInputOnly TRUE flag.

RData
chromosome_name string

Name of the chromosome to impute. Should match a chromosome name in the reference genome.

start integer

Start position of the region to impute.

end integer

End position of the region to impute.

K integer

Number of ancestral haplotypes to use for imputation. Refer to the documentation for the --K argument of STITCH for more information.

nGen integer

Number of generations since founding of the population to use for imputation. Refer to the documentation for the --nGen argument of STITCH for more information.
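
The cramlist requirement above (bare file names, one per line) can be sketched in Python as follows. This is a minimal illustration, not part of the module: the function name write_cramlist and the default output name cramlist.txt are assumptions made for the example.

```python
from pathlib import Path


def write_cramlist(alignment_paths, out_path="cramlist.txt"):
    """Write one file name per line, dropping any leading directories.

    write_cramlist is a hypothetical helper, not part of nf-core/stitch.
    """
    names = [Path(p).name for p in alignment_paths]
    # Files are staged side by side, so two inputs with the same
    # file name would be indistinguishable in the list.
    if len(set(names)) != len(names):
        raise ValueError("alignment file names must be unique after staging")
    Path(out_path).write_text("".join(f"{name}\n" for name in names))
    return names
```

The resulting text file could then be supplied as the cramlist element of the first input tuple, alongside the alignment files it names.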

#2 tuple
meta2 map

Groovy Map containing information about the reference genome used e.g. [ id:'test' ]

fasta file

FASTA reference genome file

*.{fa,fasta}
fasta_fai file

FASTA index file

*.{fai}
seed integer

Seed for random number generation
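
Before launching the process, it can be useful to check that the posfile actually contains positions inside the region given by chromosome_name, start and end. The Python sketch below does this under an assumption that is not stated on this page: that each posfile row has four tab-separated fields (chromosome, position, reference base, alternate base) and no header. Check the STITCH documentation for --posfile before relying on that layout. The function name positions_in_region is hypothetical.

```python
import csv
import io


def positions_in_region(posfile_text, chromosome_name, start, end):
    """Return the positions on chromosome_name that lie in [start, end]."""
    selected = []
    reader = csv.reader(io.StringIO(posfile_text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # tolerate blank lines
        # Assumed layout: chromosome, position, reference base, alternate base.
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 tab-separated fields")
        chrom, pos = row[0], int(row[1])
        if chrom == chromosome_name and start <= pos <= end:
            selected.append(pos)
    return selected
```

An empty result suggests that the region and the posfile do not overlap, which is worth resolving before running imputation.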

Output 8 channels
#1 vcf tuple
meta map

Groovy Map containing sample information e.g. [ id:'test' ]

*.vcf.gz file

Imputed genotype calls for the positions in posfile, in VCF format. This is the default output.

.vcf.gz
#2 bgen tuple
meta map

Groovy Map containing sample information e.g. [ id:'test' ]

*.bgen file

Imputed genotype calls for the positions in posfile, in BGEN format. This is produced if --output_format bgen is specified.

.bgen
#3 input tuple
meta map

Groovy Map containing sample information e.g. [ id:'test' ]

input directory

Folder of pre-generated input RData objects used when STITCH is called with the --regenerateInput FALSE flag. It is generated by running STITCH with the --generateInputOnly TRUE flag.

input
#4 plots tuple
meta map