nf-core/glimpse2/concordance @ 0.0.0-6c4ed3a

Program to compute the genotyping error rate at the sample or marker level.

Latest version: 0.0.0-6c4ed3a
Total downloads: 3
Source: nf-core/modules
Authors: @louislenezet
Maintainers: @louislenezet

Get started

Add the following snippet to your workflow script to include this module.

include { GLIMPSE2_CONCORDANCE } from 'nf-core/glimpse2/concordance'

License

MIT License

Process
Name GLIMPSE2_CONCORDANCE
Input 2 channels
#1 tuple
meta map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

estimate file

Imputed dataset file obtained after phasing.

*.{vcf,bcf,vcf.gz,bcf.gz}
estimate_index file

Index file for the imputed dataset file.

truth file

Validation dataset called at the same positions as the imputed file.

*.{vcf,bcf,vcf.gz,bcf.gz}
truth_index file

Index file for the truth file.

freq file

File containing allele frequencies at each site.

*.{vcf,bcf,vcf.gz,bcf.gz}
freq_index file

Index file for the allele frequencies file.

samples file

List of samples to process, one sample ID per line.

*.{txt,tsv}
region string

Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). Can also be a list of such regions.

chrXX:leftBufferPosition-rightBufferPosition
#2 tuple
meta2 map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

groups file

Alternative to frequency bins: user-defined group bins, provided in a file.

*.{txt,tsv}
bins string

Allele frequency bins used for rsquared computations. By default they should be MAF bins [0-0.5]; they should cover the full range [0-1] if --use-ref-alt is used.

0 0.01 0.05 ... 0.5
ac_bins string

User-defined allele count bins used for rsquared computations.

1 2 5 10 20 ... 100000
allele_counts string

Default allele count bins used for rsquared computations. AN field must be defined in the frequency file.

min_val_gl float

Minimum genotype likelihood probability P(G|R) in validation data. Set to zero to disable the filter, or if using --gt-validation.

min_val_dp integer

Minimum coverage in validation data. If FORMAT/DP is missing and --min_val_dp > 0, the program exits with an error. Set to zero to disable the filter, or if using --gt-validation.
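
The two input tuples listed above can be assembled roughly as follows. This is a minimal sketch, not a reference implementation: all file names, the meta ids, the bin values and the filter thresholds are hypothetical placeholders, and passing an empty list `[]` for unused optional inputs is an assumption based on common nf-core convention rather than something this page states.

```nextflow
include { GLIMPSE2_CONCORDANCE } from 'nf-core/glimpse2/concordance'

workflow {
    // Input #1: files and region, in the order listed above.
    // All paths below are hypothetical placeholders.
    ch_inputs = Channel.of([
        [ id:'test' ],               // meta
        file('imputed.bcf'),         // estimate
        file('imputed.bcf.csi'),     // estimate_index
        file('truth.bcf'),           // truth
        file('truth.bcf.csi'),       // truth_index
        file('freq.bcf'),            // freq
        file('freq.bcf.csi'),        // freq_index
        [],                          // samples (assumed optional, left empty)
        'chr20:1000000-2000000'      // region
    ])

    // Input #2: binning and filtering options.
    // A value channel is used so the same options apply to every item of ch_inputs.
    ch_options = Channel.value([
        [ id:'params' ],             // meta2
        [],                          // groups (assumed optional, left empty)
        '0 0.01 0.05 0.1 0.2 0.5',   // bins (illustrative MAF bins)
        [],                          // ac_bins (left empty)
        [],                          // allele_counts (left empty)
        0.9,                         // min_val_gl (illustrative threshold)
        5                            // min_val_dp (illustrative threshold)
    ])

    GLIMPSE2_CONCORDANCE(ch_inputs, ch_options)

    // Inspect one of the named output channels.
    GLIMPSE2_CONCORDANCE.out.rsquare_grp.view()
}
```

Only one binning scheme is set here (bins); the descriptions above present groups, ac_bins and allele_counts as alternatives, so the sketch leaves them empty.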

Output 7 channels
#1 errors_cal tuple
meta map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.error.cal.txt.gz file

Calibration correlation errors between imputed dosages (in MAF bins) and highly confident genotypes.

*.error.cal.txt.gz
#2 errors_grp tuple
meta map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.error.grp.txt.gz file

Per-group correlation errors between imputed dosages (in MAF bins) and highly confident genotypes.

*.error.grp.txt.gz
#3 errors_spl tuple
meta map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.error.spl.txt.gz file

Per-sample correlation errors between imputed dosages (in MAF bins) and highly confident genotypes.

*.error.spl.txt.gz
#4 rsquare_grp tuple
meta map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.rsquare.grp.txt.gz file

Per-group r-squared correlation between imputed dosages (in MAF bins) and highly confident genotypes.

*.rsquare.grp.txt.gz
#5 rsquare_spl tuple
meta map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.rsquare.spl.txt.gz file

Samples r-squared correlation between imputed dosages (in MAF bins) and highly-confid