nf-core/fgbio/collectduplexseqmetrics @ 0.0.0-6c4ed3a

Collects a suite of metrics to QC duplex sequencing data.

Latest version: 0.0.0-6c4ed3a
Total downloads: 6
Source: nf-core/modules
Authors: @georgiakes
Maintainers: @georgiakes

Get started

Add the following snippet to your workflow script to include this module.

include { FGBIO_COLLECTDUPLEXSEQMETRICS } from 'nf-core/fgbio/collectduplexseqmetrics'

License

MIT License

Process
Name FGBIO_COLLECTDUPLEXSEQMETRICS
Input 1 channel
#1 tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

grouped_bam file

Must be one of: (1) the exact BAM output by the GroupReadsByUmi tool, in the sort order it was produced in; or (2) a BAM file that has MI tags present on all reads (usually set by GroupReadsByUmi) and has been sorted with SortBam into TemplateCoordinate order.

*.bam
interval_list file

Calculation of metrics may be restricted to a set of regions using the --intervals parameter. The file format is described here: https://samtools.github.io/htsjdk/javadoc/htsjdk/index.html?htsjdk/samtools/util/Interval.html

*.{tsv,txt,interval_list}
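
As a minimal sketch of how the single input tuple might be assembled, the snippet below builds a one-element channel of `[ meta, grouped_bam, interval_list ]` and passes it to the process. The file names are hypothetical placeholders, not files shipped with the module.

```nextflow
include { FGBIO_COLLECTDUPLEXSEQMETRICS } from 'nf-core/fgbio/collectduplexseqmetrics'

workflow {
    // Hypothetical file names, used only for illustration
    ch_input = Channel.of(
        [
            [ id: 'sample1' ],                // meta map
            file('sample1.grouped.bam'),      // BAM produced by GroupReadsByUmi
            file('targets.interval_list')     // regions to restrict the metrics to
        ]
    )

    // The process takes one channel whose items are three-element tuples
    FGBIO_COLLECTDUPLEXSEQMETRICS(ch_input)
}
```

Passing the list to `Channel.of` emits it as a single item, so the three elements travel together and the metrics stay associated with the right sample.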
Output 8 channels
#1 duplex_qc tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

**.duplex_qc.pdf file

A series of plots generated from the preceding metrics files for visualization

*.pdf
#2 umi_counts tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

**.umi_counts.txt file

Metrics on the frequency of observations of UMIs within reads and tag families

*.txt
#3 family_sizes tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

**.family_sizes.txt file

Metrics on the frequency of different types of families of different sizes

*.txt
#4 versions_fgbio tuple
${task.process} string

The process the versions were collected from

fgbio string

The tool name

fgbio --version 2>&1 | tr -d "[:cntrl:]" | sed -e "s/^.*Version: //;s/\[.*$//" eval

The command used to generate the version of the tool

#5 versions_ggplot2 tuple
${task.process} string

The process the versions were collected from

ggplot2 string

The tool name

Rscript -e "cat(as.character(packageVersion('ggplot2')))" eval

The command used to generate the version of the tool

#6 duplex_umi_counts tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

**.duplex_umi_counts.txt file

Metrics on the frequency of observations of duplex UMIs within reads and tag families.

*.txt
#7 duplex_family_sizes tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

**.duplex_family_sizes.txt file

Metrics on the frequency of duplex tag families by the number of observations from each strand

*.txt
#8 duplex_yield_metrics tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1' ]

**.duplex_yield_metrics.txt file

Summary QC metrics produced using 5%, 10%, 15%...100% of the data

*.txt
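
The outputs listed above can be read from the process's `out` property. The sketch below assumes that the channel names shown here (`duplex_qc`, `family_sizes`, `duplex_yield_metrics`, and so on) are the process's emit labels; the wrapper workflow `DUPLEX_QC` and its own emit names are hypothetical and not part of the module.

```nextflow
include { FGBIO_COLLECTDUPLEXSEQMETRICS } from 'nf-core/fgbio/collectduplexseqmetrics'

// Hypothetical wrapper workflow, shown only to illustrate output access
workflow DUPLEX_QC {
    take:
    ch_grouped_bam   // channel: [ meta, grouped_bam, interval_list ]

    main:
    FGBIO_COLLECTDUPLEXSEQMETRICS(ch_grouped_bam)

    // Each item on these channels is a [ meta, file ] tuple
    FGBIO_COLLECTDUPLEXSEQMETRICS.out.duplex_yield_metrics
        .view { meta, metrics -> "${meta.id}: ${metrics}" }

    emit:
    yield_metrics = FGBIO_COLLECTDUPLEXSEQMETRICS.out.duplex_yield_metrics
    family_sizes  = FGBIO_COLLECTDUPLEXSEQMETRICS.out.family_sizes
    plots         = FGBIO_COLLECTDUPLEXSEQMETRICS.out.duplex_qc
}
```

Because every file output carries the same `meta` map as its input, downstream steps can join these channels back together by sample.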
Tool | Description | Homepage
fgbio | A set of tools for working with genomic and high throughput sequencing data, including UMIs | http://fulcrumgenomics.github.io/fgbio/
r-ggplot2 | ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. | https://ggplot2.tidyverse.org/
Version 0.0.0-6c4ed3a
Commit ID 6c4ed3a220310b905a1fc9d04f05be2e0837142b