Nextflow Registry | nf-core/rundbcan/easysubstrate@0.0.0-6c4ed3a

nf-core/rundbcan/easysubstrate @ 0.0.0-6c4ed3a

Substrate annotation module for the dbCAN pipeline. It annotates carbohydrate-active enzymes (CAZymes) in genomic data using the dbCAN annotation tool.

Keywords: dbCAN, download, CAZyme, CAZyme gene Cluster, genomes

Latest version: 0.0.0-6c4ed3a

Total downloads: 11

Source: nf-core/modules

Authors: @Xinpeng021001

Maintainers: @Xinpeng021001

Get started

Add the following snippet to your workflow script to include this module.

include { RUNDBCAN_EASYSUBSTRATE } from 'nf-core/rundbcan/easysubstrate'

License

MIT License

Process

`Name`	`RUNDBCAN_EASYSUBSTRATE`

Input 3 channels

#1 tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`input_raw_data` file	FASTA file of protein sequences. `*.{fasta,fa,faa}`

#2 tuple

`meta2` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`input_gff` file	GFF file with the gene annotations for the protein sequences.
`gff_type` string	Type of GFF file. Options are `NCBI_prok`, `JGI`, `NCBI_euk`, and `prodigal`. This is used to parse the GFF file correctly.

#3

`dbcan_db` directory	Path to the dbCAN database directory.
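
The three inputs above can be wired together as in the following minimal sketch. It is illustrative only: the file paths, the sample id, and the `dbcan_db` location are hypothetical placeholders, and the `cgc_gff` output name is taken from the output listing on this page rather than verified against the module source.

```nextflow
include { RUNDBCAN_EASYSUBSTRATE } from 'nf-core/rundbcan/easysubstrate'

workflow {
    // Input #1: [ meta, protein FASTA ] (hypothetical path)
    ch_proteins = Channel.of(
        [ [ id:'sample1' ], file('data/sample1.faa') ]
    )

    // Input #2: [ meta2, GFF, gff_type ] (hypothetical path).
    // gff_type must be one of: NCBI_prok, JGI, NCBI_euk, prodigal
    ch_gff = Channel.of(
        [ [ id:'sample1' ], file('data/sample1.gff'), 'prodigal' ]
    )

    // Input #3: dbCAN database directory (hypothetical location),
    // wrapped in a value channel so it can be reused for every sample
    ch_dbcan_db = Channel.value(file('dbcan_db'))

    RUNDBCAN_EASYSUBSTRATE(ch_proteins, ch_gff, ch_dbcan_db)

    // Inspect one of the named outputs, assuming it is emitted as `cgc_gff`
    RUNDBCAN_EASYSUBSTRATE.out.cgc_gff.view()
}
```

Choose the `gff_type` value that matches how the GFF was produced, since the module uses it to parse the file.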

Output 13 channels

#1 cgc_gff tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_cgc.gff` file	GFF file containing the CAZyme gene clusters (CGC) identified by dbCAN. This file is generated from the dbCAN annotation and contains the locations of CAZyme gene clusters in the genome.

#2 versions

`versions.yml` file	File containing software versions `versions.yml`

#3 synteny_pdf tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_synteny_pdf/` directory	Directory containing the synteny plots in PDF format for the CAZyme gene clusters (CGC) identified by dbCAN. This directory will contain one or more PDF files showing the syntenic regions of the CGC in the genome.

#4 diamond_out_tc tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_diamond.out.tc` file	TSV file containing the DIAMOND output for transporter annotation.

#5 tf_hmm_results tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_TF_hmm_results.tsv` file	TSV file containing the results of transcription factor (TF) annotation.

#6 total_cgc_info tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_total_cgc_info.tsv` file	TSV file summarizing the total additional genes in the genome.

#7 stp_hmm_results tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_STP_hmm_results.tsv` file	TSV file containing the results of signal transduction protein (STP) annotation.

#8 cgc_standard_out tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_cgc_standard_out.tsv` file	Standard output file from dbCAN for CAZyme gene clusters (CGC) in a tabular format. This file summarizes the CAZyme gene clusters identified in the genome.

#9 dbcanhmm_results tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`
`${prefix}_dbCAN_hmm_results.tsv` file	TSV file containing the detailed dbCAN HMM results for CAZyme annotation.

#10 dbcansub_results tuple

`meta` map	Groovy Map containing sample information e.g. `[ id:'sample1']`