Nextflow Modules

Module: nf-core/custom/orfnormalise
Keywords: orf, ribo-seq, normalisation, bed12, translation

Description:

Convert one ORF caller's per-sample output table into a unified BED12 plus a sidecar metadata TSV, ready for cross-caller merging.

An "ORF caller" is a tool that scans ribosome-profiling (Ribo-seq) data and predicts which open reading frames are being translated. Each caller writes its own table format and uses its own location encoding, classification vocabulary, and confidence score. This module reconciles five callers into one harmonised schema.

The `caller` val input selects the parser. Supported values:
- ribocode (RiboCode predicted ORF table; transcript-coordinate input, lifted to genomic blocks against the GTF)
- ribotish (Ribo-TISH predict output; GenomePos plus optional Blocks)
- ribotricer (Ribotricer detect-orfs translating ORFs TSV; ORF span parsed from ORF_ID, multi-exon blocks recovered by intersecting with the host transcript's exon structure from the GTF)
- rpbp (Rp-Bp predicted-orfs BED12 with extra columns)
- price (PRICE orfs.tsv; Gedi-style Location field, already genomic)

Output BED12 column order:
chrom start end name score strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts

The BED `name` column carries `<caller>|<caller-native-id>`. The BED `score` column is the caller's native score rescaled to 0-1000, where higher always means more confident regardless of the native score's direction.

Output sidecar TSV columns:
orf_id caller sample_id chrom start end strand gene_id transcript_id orf_class aa_length score

Harmonised `orf_class` vocabulary written into the sidecar TSV:
- canonical_cds: ORF maps to an annotated CDS (including truncated / extended variants of one).
- uORF: upstream ORF (5' UTR-resident).
- dORF: downstream ORF (3' UTR-resident).
- novel_u: novel / intergenic ORF not assigned to an annotated CDS.
- smORF: small ORF (aa_length <= 100); promoted regardless of the location-based class so downstream tools can treat smORFs uniformly.
- other: internal / overlap / frame variants and anything else.

Per-caller mapping notes (lossy collapses):
- PRICE `iORF` (internal ORF), `intronic`, and `orphan` map to `other`. Cross-caller catalogue tracking still flags these via `called_by_price`, but the specific PRICE sub-type is not preserved.
- Rp-Bp's predicted-orfs BED carries no ORF-type column, so this module defaults every Rp-Bp call to `canonical_cds` (the curated set left after Rp-Bp's final prediction-set selection step is dominated by canonical CDSs). uORF/dORF/novel calls present in Rp-Bp's separate `.tab.gz` / `extracted-orfs.bed.gz` files are not propagated here.

Each caller's native confidence score has a "direction": some are lower-is-better (p-values), others are higher-is-better (Bayes factors, phase scores):
- ribocode: min (combined p-value)
- ribotish: min (combined p-value)
- ribotricer: max (phase_score)
- rpbp: max (Bayes factor mean)
- price: min (p-value)

Downstream merging uses this direction to pick the best call per ORF.
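The ribocode parser lifts transcript coordinates onto genomic exon blocks before writing BED12. The sketch below shows one way to do that lift and then fill the BED12 columns in the order listed above. It assumes 0-based half-open coordinates for both the ORF's transcript interval and the host transcript's exons (already extracted from the GTF), and it sets thickStart/thickEnd to the full ORF span and itemRgb to "0". The helper names `lift_to_genome` and `to_bed12` and the ID `ORF_1` are hypothetical, not the module's actual code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def lift_to_genome(exons: List[Interval], strand: str,
                   tx_start: int, tx_end: int) -> List[Interval]:
    """Map transcript interval [tx_start, tx_end) onto genomic blocks.

    `exons` are the host transcript's genomic exons (any order, non-overlapping).
    Returns blocks sorted by ascending genomic start, as BED12 expects.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unsupported strand: {strand!r}")
    # Walk exons 5'->3' in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    total = sum(end - start for start, end in ordered)
    if not 0 <= tx_start < tx_end <= total:
        raise ValueError("transcript interval outside transcript bounds")

    blocks: List[Interval] = []
    offset = 0  # transcript coordinate of the current exon's 5' end
    for g_start, g_end in ordered:
        length = g_end - g_start
        # Portion of the ORF that falls inside this exon, in transcript coordinates.
        lo = max(tx_start, offset)
        hi = min(tx_end, offset + length)
        if lo < hi:
            if strand == "+":
                blocks.append((g_start + (lo - offset), g_start + (hi - offset)))
            else:
                # On '-', transcript position p maps to genomic base g_end - 1 - (p - offset).
                blocks.append((g_end - (hi - offset), g_end - (lo - offset)))
        offset += length
    return sorted(blocks)


def to_bed12(chrom: str, strand: str, blocks: List[Interval],
             name: str, score: int) -> List[str]:
    """Build one BED12 row; thickStart/thickEnd cover the whole ORF."""
    start, end = blocks[0][0], blocks[-1][1]
    sizes = ",".join(str(e - s) for s, e in blocks)
    starts = ",".join(str(s - start) for s, _ in blocks)  # relative to chromStart
    return [chrom, str(start), str(end), name, str(score), strand,
            str(start), str(end), "0", str(len(blocks)), sizes, starts]


if __name__ == "__main__":
    # Two-exon '+' transcript; the ORF spans transcript positions 50-150.
    blocks = lift_to_genome([(100, 200), (300, 400)], "+", 50, 150)
    print("\t".join(to_bed12("chr1", "+", blocks, "ribocode|ORF_1", 500)))
    # prints: chr1 150 350 ribocode|ORF_1 500 + 150 350 0 2 50,50 0,150 (tab-separated)
```

Sorting the lifted blocks by genomic start before computing blockStarts keeps minus-strand ORFs valid BED12, since BED12 always lists blocks left to right regardless of strand.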
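The `orf_class` rules can be sketched as a lookup with the smORF override applied first. Only three pieces come from the description: the PRICE `iORF`/`intronic`/`orphan` collapse to `other`, the Rp-Bp `canonical_cds` default, and the aa_length <= 100 smORF threshold. Everything else here is an assumption: applying the smORF override to Rp-Bp calls too, passing native labels that already match the harmonised vocabulary through unchanged, and sending unknown labels to `other`. `harmonise_class` is a hypothetical name.

```python
from typing import Dict, Optional

HARMONISED = {"canonical_cds", "uORF", "dORF", "novel_u", "smORF", "other"}
SMORF_MAX_AA = 100  # aa_length <= 100 counts as a small ORF

# Lossy collapses stated in the module description. Other callers' native
# vocabularies would need their own entries (not shown; assumed pass-through).
NATIVE_TO_HARMONISED: Dict[str, Dict[str, str]] = {
    "price": {"iORF": "other", "intronic": "other", "orphan": "other"},
}


def harmonise_class(caller: str, native_label: Optional[str], aa_length: int) -> str:
    """Return the harmonised orf_class for one call (illustrative sketch)."""
    # smORF is promoted regardless of the location-based class.
    if aa_length <= SMORF_MAX_AA:
        return "smORF"
    # Rp-Bp's predicted-orfs BED has no ORF-type column: default to canonical_cds.
    if caller == "rpbp":
        return "canonical_cds"
    mapped = NATIVE_TO_HARMONISED.get(caller, {}).get(native_label or "")
    if mapped is not None:
        return mapped
    # Assumption: native labels already in the harmonised vocabulary pass through,
    # except smORF, which only the length rule above may assign.
    if native_label in HARMONISED and native_label != "smORF":
        return native_label
    return "other"
```

Checking the length rule first is what makes the promotion unconditional: a 60-aa PRICE `orphan` call becomes `smORF` rather than `other`.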
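The per-caller score directions drive two operations: rescaling each native score to the 0-1000 BED `score` (1000 = most confident) and choosing the best call for an ORF. The description does not give the rescaling formula. This sketch assumes a linear min-max rescale within one per-sample table, inverted for lower-is-better callers. A log transform of p-values before rescaling would be a reasonable alternative. `rescale_scores` and `best_call` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Native score direction per caller, as listed in the module description.
SCORE_DIRECTION: Dict[str, str] = {
    "ribocode": "min",    # combined p-value
    "ribotish": "min",    # combined p-value
    "ribotricer": "max",  # phase_score
    "rpbp": "max",        # Bayes factor mean
    "price": "min",       # p-value
}


def rescale_scores(caller: str, scores: List[float]) -> List[int]:
    """Rescale native scores to 0-1000 so that higher always means more confident.

    Assumption: linear min-max within one table; not the module's documented formula.
    """
    direction = SCORE_DIRECTION[caller]  # fail fast on an unsupported caller
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # No spread to rank on; treat every call as equally (maximally) confident.
        return [1000] * len(scores)
    out = []
    for s in scores:
        frac = (s - lo) / (hi - lo)  # 0 at the smallest native score, 1 at the largest
        if direction == "min":
            frac = 1.0 - frac        # lower-is-better: smallest native score -> 1000
        out.append(round(frac * 1000))
    return out


def best_call(caller: str, calls: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Pick the most confident (call_id, native_score) pair using the caller's direction."""
    pick = min if SCORE_DIRECTION[caller] == "min" else max
    return pick(calls, key=lambda call: call[1])


if __name__ == "__main__":
    print(rescale_scores("price", [0.01, 0.5, 1.0]))  # [1000, 505, 0]
    print(rescale_scores("rpbp", [2.0, 4.0, 6.0]))    # [0, 500, 1000]
    print(best_call("ribotish", [("a", 0.05), ("b", 0.001)]))  # ('b', 0.001)
```

As one possible use, when the same ORF is called in several samples by one caller, `best_call` keeps the smallest p-value for ribotish and the largest phase_score for ribotricer. Comparing native scores across callers is not meaningful, which is why the rescaled value is the one written to the BED.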