
nf-core/rundbcan/easysubstrate @ 0.0.0-6c4ed3a

Substrate annotation module for the run_dbcan (dbCAN) pipeline. It annotates carbohydrate-active enzymes (CAZymes) in genomic data with the dbCAN annotation tool, identifies CAZyme gene clusters (CGCs), and predicts their substrates.

Latest version: 0.0.0-6c4ed3a
Total downloads: 11
Source: nf-core/modules
Authors: @Xinpeng021001
Maintainers: @Xinpeng021001

Get started

Add the following snippet to your workflow script to include this module.

include { RUNDBCAN_EASYSUBSTRATE } from 'nf-core/rundbcan/easysubstrate'
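The sketch below calls the process from a workflow for a single sample. It makes several assumptions:

- The module was installed with `nf-core modules install rundbcan/easysubstrate`, so the include path follows the default `modules/nf-core/` layout rather than the shorthand above.
- `params.fasta`, `params.gff` and `params.dbcan_db` are hypothetical pipeline parameters.
- `'prodigal'` is an example `gff_type`.

The three arguments follow the input order documented below.

```groovy
// Sketch only: hypothetical params, default nf-core tools install layout assumed
include { RUNDBCAN_EASYSUBSTRATE } from './modules/nf-core/rundbcan/easysubstrate/main'

workflow {
    // Input 1: [ meta, protein FASTA ]
    ch_fasta = Channel.fromPath(params.fasta)
        .map { fasta -> [ [ id: fasta.baseName ], fasta ] }

    // Input 2: [ meta2, GFF, gff_type ]; gff_type is one of
    // NCBI_prok, JGI, NCBI_euk or prodigal ('prodigal' is just an example)
    ch_gff = Channel.fromPath(params.gff)
        .map { gff -> [ [ id: gff.baseName ], gff, 'prodigal' ] }

    // Input 3: the dbCAN database directory, shared by every task
    ch_db = Channel.value(file(params.dbcan_db))

    RUNDBCAN_EASYSUBSTRATE(ch_fasta, ch_gff, ch_db)

    // Outputs are available as named emits
    RUNDBCAN_EASYSUBSTRATE.out.cgc_gff.view()
}
```

With several samples, the two tuple channels are consumed in emission order. They should therefore be derived from a single source so that each FASTA stays paired with its GFF.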

License

MIT License

Process
Name RUNDBCAN_EASYSUBSTRATE
Input 3 channels
#1 tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

input_raw_data file

FASTA file of protein sequences.

*.{fasta,fa,faa}
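The accepted extensions can be used directly as a glob when building the first input channel. This is a sketch inside a workflow block. It assumes a hypothetical `proteins/` directory holding one protein FASTA per sample, and that the file's simple name is a suitable sample ID.

```groovy
// Hypothetical layout: proteins/sample1.faa, proteins/sample2.fasta, ...
ch_fasta = Channel
    .fromPath('proteins/*.{fasta,fa,faa}', checkIfExists: true)
    // simpleName drops everything after the first dot: 'sample1.faa' -> 'sample1'
    .map { fasta -> [ [ id: fasta.simpleName ], fasta ] }
```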
#2 tuple
meta2 map

Groovy Map containing sample information e.g. [ id:'sample1']

input_gff file

GFF annotation file giving the genomic coordinates of the proteins in the FASTA file.

gff_type string

Source format of the GFF file, used to parse it correctly. One of NCBI_prok, JGI, NCBI_euk, or prodigal.
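When several samples are processed, both tuple inputs can be built from one samplesheet, with `gff_type` checked against the four supported values. This is a sketch for use inside a workflow block. The CSV layout (`id,fasta,gff,gff_type`) and `params.input` are hypothetical.

```groovy
// Hypothetical samplesheet.csv:
// id,fasta,gff,gff_type
// sample1,sample1.faa,sample1.gff,prodigal
def validGffTypes = [ 'NCBI_prok', 'JGI', 'NCBI_euk', 'prodigal' ]

ch_samples = Channel
    .fromPath(params.input)
    .splitCsv(header: true)
    .map { row ->
        if ( !(row.gff_type in validGffTypes) ) {
            error "Unsupported gff_type '${row.gff_type}' for ${row.id}; expected one of ${validGffTypes}"
        }
        [ [ id: row.id ], file(row.fasta), file(row.gff), row.gff_type ]
    }

// multiMap forks each row into the two tuple inputs in the same order,
// so the process receives matching FASTA and GFF for every sample
ch_inputs = ch_samples.multiMap { meta, fasta, gff, gff_type ->
    fasta: [ meta, fasta ]
    gff:   [ meta, gff, gff_type ]
}
// e.g. RUNDBCAN_EASYSUBSTRATE(ch_inputs.fasta, ch_inputs.gff, ch_db)
```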

#3 dbcan_db directory

Path to the dbCAN database directory.
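`dbcan_db` is a plain path input rather than a tuple, so it is usually supplied as a value channel. A value channel can be read by every task, which lets all samples reuse the same database. The sketch below also fails early if the directory is missing. It assumes a hypothetical `params.dbcan_db` and belongs inside a workflow block.

```groovy
def dbDir = file(params.dbcan_db)
if ( !dbDir.isDirectory() ) {
    error "dbCAN database directory not found: ${params.dbcan_db}"
}
// Value channel: every task reads the same database path
ch_db = Channel.value(dbDir)
```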

Output 13 channels
#1 cgc_gff tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_cgc.gff file

GFF file with the CAZyme gene clusters (CGCs) identified by dbCAN, recording the location of each cluster in the genome.
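Each output is a named emit, so a downstream step can select it by name once the process has been called. As a sketch, all samples' CGC GFF files can be gathered into a single list, for example for a hypothetical aggregation step.

```groovy
ch_all_cgc_gff = RUNDBCAN_EASYSUBSTRATE.out.cgc_gff
    .map { meta, gff -> gff }   // drop the meta map
    .collect()                  // one list holding every sample's file
```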

#2 versions
versions.yml file

File containing software versions

versions.yml
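nf-core pipelines conventionally gather each module's `versions.yml` into one channel for reporting. Below is a sketch of that pattern, assuming the process has already been called in the same workflow.

```groovy
ch_versions = Channel.empty()
// ... RUNDBCAN_EASYSUBSTRATE(ch_fasta, ch_gff, ch_db) ...
// first() keeps a single versions.yml, since every task reports the same tool versions
ch_versions = ch_versions.mix(RUNDBCAN_EASYSUBSTRATE.out.versions.first())
```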
#3 synteny_pdf tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_synteny_pdf/ directory

Directory of synteny plots in PDF format for the CAZyme gene clusters (CGCs) identified by dbCAN. It contains one or more PDF files showing the syntenic regions of the CGCs in the genome.
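Because this output is a directory, downstream steps that expect individual files may need it expanded. The sketch below emits one `[ meta, pdf ]` item per plot. It assumes local work directories, where building a glob from the directory's string path works as expected.

```groovy
ch_synteny_pdfs = RUNDBCAN_EASYSUBSTRATE.out.synteny_pdf
    .flatMap { meta, dir ->
        // files() with a glob returns a (possibly empty) list of matching paths
        files("${dir}/*.pdf").collect { pdf -> [ meta, pdf ] }
    }
```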

#4 diamond_out_tc tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_diamond.out.tc file

TSV file containing the DIAMOND alignment output for transporter classification (TC) annotation.
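To work with individual hits in the workflow, the table can be split into rows with `splitCsv`. This page does not document the file's column layout, so the sketch reads it without a header, and each row arrives as a list of strings.

```groovy
// elem: 1 splits the file (second element) of each [ meta, tsv ] tuple,
// emitting one [ meta, row ] item per line
ch_tc_rows = RUNDBCAN_EASYSUBSTRATE.out.diamond_out_tc
    .splitCsv(elem: 1, sep: '\t', header: false)
```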

#5 tf_hmm_results tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_TF_hmm_results.tsv file

TSV file containing the HMM search results for transcription factor (TF) annotation.

#6 total_cgc_info tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_total_cgc_info.tsv file

TSV file summarizing all of the additional genes annotated in the genome.

#7 stp_hmm_results tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_STP_hmm_results.tsv file

TSV file containing the results of signal transduction protein (STP) annotation.

#8 cgc_standard_out tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_cgc_standard_out.tsv file

Standard tabular CGC output from dbCAN, summarizing the CAZyme gene clusters identified in the genome.
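Every output tuple carries the same meta map, so related outputs can be paired per sample with `join`. This sketch combines the CGC GFF with this summary table.

```groovy
// join() pairs items whose first element (the meta map) is equal,
// giving [ meta, cgc_gff, cgc_standard_out ] per sample
ch_cgc = RUNDBCAN_EASYSUBSTRATE.out.cgc_gff
    .join(RUNDBCAN_EASYSUBSTRATE.out.cgc_standard_out)
```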

#9 dbcanhmm_results tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']

${prefix}_dbCAN_hmm_results.tsv file

TSV file containing the detailed dbCAN HMM results for CAZyme annotation.
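For a quick per-sample overview, the number of rows in this table can be counted in the workflow. The sketch assumes the file has exactly one header line. It is a rough check only, reading each file on the head node.

```groovy
// countLines() reads the whole file; subtract the assumed header line
ch_hmm_counts = RUNDBCAN_EASYSUBSTRATE.out.dbcanhmm_results
    .map { meta, tsv -> [ meta, tsv.countLines() - 1 ] }

ch_hmm_counts.view { meta, n -> "${meta.id}: ${n} dbCAN HMM rows" }
```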

#10 dbcansub_results tuple
meta map

Groovy Map containing sample information e.g. [ id:'sample1']