Nextflow Registry | Nextflow Modules

Nextflow Modules

Showing module(s) with keyword "orf"

Module	Keywords	Description
nf-core/custom/orfcollapse	orf ribo-seq catalogue smorf deduplication	Collapse small ORFs that share an amino-acid sequence cluster into a single catalogue entry. Pair with `custom/orfmerge` (coordinate-based catalogue), `bedtools/getfasta` + `seqkit/translate` (AA FASTA keyed by orf_id), and `mmseqs/easycluster` (AA clusters) upstream. The coordinate-based merge in `custom/orfmerge` only groups ORFs that overlap on the genome, so the same micropeptide encoded at several distinct, non-overlapping loci (typically repetitive regions) survives as separate rows. This module adopts the peptide-level deduplication and 0.9 amino-acid-similarity threshold of the GENCODE Ribo-seq ORF consolidation (Mudge et al. 2022, Nat Biotechnol, doi:10.1038/s41587-022-01369-0; gencode-riboseqORFs collapse_cutoff 0.9), but implements it with MMseqs2 sequence-identity clustering rather than that tool's longest-shared-string / P-site-overlap metric. Small ORFs (orf_class "smORF", i.e. aa_length <= 100) are clustered by amino-acid identity upstream, and this module folds each multi-member cluster down to one representative. Only smORF rows are collapsed; larger ORFs and transcript-anchored classes are passed through untouched. Among the smORF members of a cluster, the representative is the one with the longest aa_length (ties broken by orf_id), so the result does not depend on which sequence MMseqs2 labelled the cluster representative. Catalogue row order is preserved; dropped members fold their `called_by_<caller>` / `score_<caller>` evidence, `n_samples` / `samples` recurrence and gene mappings into the survivor.
nf-core/custom/orfmerge	orf ribo-seq catalogue merge clustering	Cluster normalised per-sample, per-caller ORF predictions into a single cohort-level catalogue. Pair with `custom/orfnormalise` upstream and (typically) `bedtools/getfasta` + `seqkit/translate` downstream to obtain the AA FASTA. Strategy is class-aware (operating on the harmonised `orf_class` written by `custom/orfnormalise`): - canonical_cds: collapse by (transcript_id, strand). One canonical CDS per transcript by definition. - uORF, dORF, other: collapse by (transcript_id, strand, start, end). A single transcript can host multiple distinct uORFs / dORFs / internal ORFs, so keying on the outer span keeps them in separate clusters while still merging cross-caller calls that agree on coordinates. - novel_u, smORF: greedy reciprocal-overlap clustering on the outer genomic span at `--reciprocal-overlap` (default 0.8). Catches fuzzy cross-caller matches and exact-coordinate collapses in one pass. Order-dependent at the boundary: a chain A-B-C where A-B and B-C overlap at ~0.85 but A-C only at ~0.75 may cluster as {A,B,C} or {A,B}+{C} depending on iteration order. Rare in practice at 0.8. Cross-caller consensus is recorded in two column families on the catalogue TSV: - `called_by_<caller>`: 0/1 indicator per supported caller (ribotish, ribocode, ribotricer, rpbp, price). - `score_<caller>`: best score from that caller within the cluster. Score direction is per-caller (p-values are minimised; Bayes factors / phase scores are maximised). Cross-sample recurrence is recorded in two further columns: - `n_samples`: number of distinct samples contributing to the cluster (a cohort recurrence metric). - `samples`: sorted, comma-separated list of those sample ids. Emits a small MultiQC custom-content TSV (per-class counts) for inclusion in downstream MultiQC reports. Alongside the full catalogue, emits a consensus view (`.consensus.`) filtered to ORFs supported by at least `--min-callers` distinct callers and recurring in at least `--min-samples` samples (both default 1, i.e. no filtering, so the consensus view equals the full catalogue). Raising either threshold yields a higher-confidence catalogue without altering the full one.
nf-core/custom/orfnormalise	orf ribo-seq normalisation bed12 translation	Convert one ORF caller's per-sample output table into a unified BED12 plus a sidecar metadata TSV, ready for cross-caller merging. An "ORF caller" is a tool that scans ribosome-profiling (Ribo-seq) data and predicts which open reading frames are being translated. Each caller writes its own table format and uses its own location encoding, classification vocabulary, and confidence score. This module reconciles five callers into one harmonised schema. The `caller` val input selects the parser; supported values: - ribocode (RiboCode predicted ORF table; transcript-coord input, lifted to genomic blocks against the GTF) - ribotish (Ribo-TISH predict output; GenomePos + optional Blocks) - ribotricer (Ribotricer detect-orfs translating ORFs TSV; ORF span parsed from ORF_ID, multi-exon blocks recovered by intersecting with host-transcript exon structure from the GTF) - rpbp (Rp-Bp predicted-orfs BED12 with extra columns) - price (PRICE orfs.tsv; Gedi-style Location field, already genomic). Output BED12 column order: chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts. The BED `name` column carries `<caller>|<caller-native-id>`. The BED `score` column is the caller's native score rescaled to 0-1000 (higher == more confident regardless of native direction). Output sidecar TSV columns: orf_id, caller, sample_id, chrom, start, end, strand, gene_id, transcript_id, orf_class, aa_length, score. Harmonised `orf_class` vocabulary written into the sidecar TSV: - canonical_cds: ORF maps to an annotated CDS (including truncated / extended variants of one). - uORF: upstream ORF (5'UTR-resident). - dORF: downstream ORF (3'UTR-resident). - novel_u: novel / intergenic ORF not assigned to an annotated CDS. - smORF: small ORF (aa_length <= 100); promoted regardless of location-based class so downstream tools can treat smORFs uniformly. - other: internal / overlap / frame variants and anything else. Per-caller mapping notes (lossy collapses): - PRICE `iORF` (internal ORF), `intronic`, and `orphan` map to `other`. Cross-caller catalogue tracking still flags these via `called_by_price`, but the specific PRICE sub-type is not preserved. - Rp-Bp's predicted-orfs BED carries no ORF-type column; this module defaults every Rp-Bp call to `canonical_cds` (the post-selectfinalpredictionset curated set is dominated by canonical CDSs). uORF/dORF/novel calls present in Rp-Bp's separate `.tab.gz` / `extracted-orfs.bed.gz` files are not propagated here. Each caller's native confidence score has a "direction" - some are lower-is-better (p-values), some are higher-is-better (Bayes factors, phase scores): ribocode: min (combined p-value); ribotish: min (combined p-value); ribotricer: max (phase_score); rpbp: max (Bayes factor mean); price: min (p-value). Downstream merging uses this to pick the best per-ORF call.
nf-core/dotseq/dotseq	riboseq rnaseq translation differential orf	Detect differential ORF usage (DOU) and ORF-level differential translation efficiency (DTE) from Ribo-seq with matched RNA-seq using DOTSeq. Wraps DOTSeqDataSetsFromSummarizeOverlaps() + DOTSeq() + getContrasts() and emits the package's native contrast tables plus plotDOT() visualisations.
nf-core/gedi/indexgenome	riboseq index genome gedi price orf	Build a GEDI genome index from a FASTA and GTF for downstream PRICE ORF prediction
nf-core/gedi/price	riboseq orf price gedi translation	Identify translated ORFs from Ribo-seq BAMs using the PRICE algorithm
nf-core/ribotricer/detectorfs	riboseq orf genomics	Accurate detection of short and long active ORFs using Ribo-seq data
nf-core/ribotricer/prepareorfs	riboseq orf genomics	Accurate detection of short and long active ORFs using Ribo-seq data
nf-core/rpbp/estimatemetagenebayesfactors	rpbp metagene bayes orf riboseq	Score how strongly each per-read-length metagene profile shows the 3-nucleotide periodicity expected of actively translating ribosomes. For each candidate (read length, P-site offset) pair, Rp-Bp fits two competing Bayesian models to the count window around annotated start codons: a "periodic" model whose signal repeats every three nucleotides, and a "non-periodic" background model. The Bayes factor (ratio of the two marginal likelihoods) quantifies how much the data prefer the periodic explanation. Returns one row per (length, offset) pair with the mean and variance of the log Bayes factor across MCMC samples. Downstream, `rpbp/selectperiodicoffsets` picks the best offset per length from this table, and `rpbp/getperiodiclengthsoffsets` filters to the high-confidence pairs that drive ORF-level scoring. Uses the Stan models bundled inside the rpbp Python package.
nf-core/rpbp/estimateorfbayesfactors	rpbp orf bayes translation riboseq	Score every candidate ORF for evidence of active translation. For each ORF, Rp-Bp fits two competing Bayesian models to its per-codon P-site count vector: a "translated" model that expects P-site density to concentrate at codon-start positions (the in-frame signal a translating ribosome produces), and an "untranslated" / noise model for the same data. The Bayes factor (ratio of marginal likelihoods) quantifies how much the data favour the translated hypothesis. Emits a BED-style table with one row per ORF carrying genomic coordinates plus the mean and variance of the log Bayes factor across MCMC samples. Downstream, `rpbp/selectfinalpredictionset` applies Bayes-factor, length and overlap rules to this table to produce the final filtered prediction set. Uses the Stan models bundled inside the rpbp Python package.
nf-core/rpbp/extractmetageneprofiles	rpbp metagene orf riboseq	Build per-read-length pileups of Ribo-seq read 5'-ends around annotated start codons - the "metagene profile". For each read length, the profile counts how many reads of that length have their 5' end at each position in a window around every annotated start codon, summed across all transcripts. Looking at the