Nextflow Modules
Showing module(s) with keyword "rpbp"
| Module | Keywords | Description |
|---|---|---|
| nf-core/rpbp/estimatemetagenebayesfactors | rpbp metagene bayes orf riboseq | Score how strongly each per-read-length metagene profile shows the 3-nucleotide periodicity expected of actively translating ribosomes. For each candidate (read length, P-site offset) pair, Rp-Bp fits two competing Bayesian models to the count window around annotated start codons: a "periodic" model whose signal repeats every three nucleotides, and a "non-periodic" background model. The Bayes factor (ratio of the two marginal likelihoods) quantifies how much the data prefer the periodic explanation. Returns one row per (length, offset) pair with the mean and variance of the log Bayes factor across MCMC samples. Downstream, `rpbp/selectperiodicoffsets` picks the best offset per length from this table, and `rpbp/getperiodiclengthsoffsets` filters to the high-confidence pairs that drive ORF-level scoring. Uses the Stan models bundled inside the rpbp Python package. |
| nf-core/rpbp/estimateorfbayesfactors | rpbp orf bayes translation riboseq | Score every candidate ORF for evidence of active translation. For each ORF, Rp-Bp fits two competing Bayesian models to its per-codon P-site count vector: a "translated" model that expects P-site density to concentrate at codon-start positions (the in-frame signal a translating ribosome produces), and an "untranslated" (noise) model for the same data. The Bayes factor (ratio of marginal likelihoods) quantifies how much the data favour the translated hypothesis. Emits a BED-style table with one row per ORF carrying genomic coordinates plus the mean and variance of the log Bayes factor across MCMC samples. Downstream, `rpbp/selectfinalpredictionset` applies Bayes-factor, length and overlap rules to this table to produce the final filtered prediction set. Uses the Stan models bundled inside the rpbp Python package. |
| nf-core/rpbp/extractmetageneprofiles | rpbp metagene orf riboseq | Build per-read-length pileups of Ribo-seq read 5'-ends around annotated start codons (the "metagene profile"). For each read length, the profile counts how many reads of that length have their 5' end at each position in a window around every annotated start codon, summed across all transcripts. Inspecting the profile across the window reveals whether reads of that length show the 3-nucleotide periodicity characteristic of translating ribosomes. This per-length view matters because different ribosome footprint lengths place the ribosomal P-site (the codon being decoded) at different offsets from the read's 5' end, so each length needs its own offset calibration. Output is consumed by `rpbp/estimatemetagenebayesfactors`, which scores each (length, offset) combination for periodicity. |
| nf-core/rpbp/extractorfprofiles | rpbp orf psite profile riboseq | Build a per-ORF P-site count vector for every candidate open reading frame (ORF) in the catalogue. For each ORF, walks the spliced exons in 3-nucleotide codon steps and counts the P-site positions (read 5'-end coordinate plus the length-specific offset selected upstream) that fall in each codon. Counts are summed across all read lengths that passed the periodicity filter from `rpbp/getperiodiclengthsoffsets`. The resulting per-ORF vectors are the input to Bayesian translation scoring in `rpbp/estimateorfbayesfactors`: a translated ORF should show P-site density concentrated at codon-start positions, while a non-translated region should look flat or noisy. Emitted as a sparse matrix (one row per ORF, columns indexed by codon position). |
| nf-core/rpbp/getperiodiclengthsoffsets | rpbp psite offset filter riboseq | Filter the per-read-length P-site offset table down to the (length, offset) pairs that will actually drive ORF-level scoring. Drops read lengths whose metagene profile is too sparsely populated, or whose periodicity Bayes factor is too low or too uncertain, so that downstream P-site counting only uses read lengths with a clean 3-nucleotide signal. Wraps Rp-Bp's `get_periodic_lengths_and_offsets` Python helper directly. Thresholds are configured via named flags in `ext.args`: `--min-count` (default: 1000), `--min-bf-mean` (default: 5), `--max-bf-var` (default: no limit), `--min-bf-likelihood` (default: 0.5). Defaults mirror `rpbp.defaults.metagene_options`. |
| nf-core/rpbp/preparegenome | rpbp orf prepare genome bed riboseq | Build the per-ORF reference files that Rp-Bp's downstream scoring needs, starting from a genome FASTA and an annotation GTF. Enumerates every candidate open reading frame (ORF) in the annotation (annotated CDSs plus alternative start codons within transcript exons), records their genomic and per-exon coordinates, and labels them with the transcript and gene they belong to. Invokes Rp-Bp's `get_orfs` Python function directly, chaining the upstream helpers `gtf-to-bed12`, `extract-bed-sequences`, `extract-orf-coordinates`, `split-bed12-blocks` and `label-orfs`. Bypasses Rp-Bp's `prepare-rpbp-genome` umbrella script, which would also build `bowtie2` (rRNA filtering) and `STAR` (alignment) indices; neither is consumed by the Rp-Bp tools wrapped here, since alignment is supplied externally as a BAM. A minimal `chrName.txt` (one contig name per line) is seeded from the FASTA headers because `gtf-to-bed12` reads it via `--chr-name-file` to control output sort order. Note: emits the `*.annotated.bed.gz` filenames produced by `get_orfs` directly, rather than the `*.bed.gz`-renamed forms that the upstream umbrella `prepare-rpbp-genome` script produces. Downstream modules in this set reference the `*.annotated.bed.gz` names explicitly, so the two naming schemes are functionally equivalent. |
| nf-core/rpbp/selectfinalpredictionset | rpbp orf bayes prediction riboseq | Produce the final filtered set of predicted translated ORFs from the per-ORF Bayes factor table. Applies the standard Rp-Bp prediction rules: a minimum Bayes-factor cutoff (favouring translated over untranslated), a minimum ORF length, and overlap resolution so that among overlapping candidates only the highest-scoring representative is kept. Emits three files describing the same prediction set: a BED of ORF genomic coordinates plus score, a FASTA of ORF DNA sequences (extracted from the genome FASTA), and a FASTA of the corresponding translated protein sequences. This is the terminal step of the Rp-Bp per-sample chain. |
| nf-core/rpbp/selectperiodicoffsets | rpbp psite offset orf riboseq | Pick the single best P-site offset for each read length from the per-(length, offset) Bayes factor table produced upstream. For each read length, the offset with the highest periodicity Bayes-factor mean is selected; this is the offset that, when added to a read's 5' end, is estimated to land closest to the ribosomal P-site (the codon being decoded). Downstream, these offsets are used to convert raw read 5'-end coordinates into P-site positions when counting reads against candidate ORFs. Emits one row per read length (length, best offset, supporting Bayes factor statistics). The next step (`rpbp/getperiodiclengthsoffsets`) filters this table to the high-quality pairs that pass user-specified count and signal thresholds before P-site counting in `rpbp/extractorfprofiles`. |
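
The metagene profile described for `rpbp/extractmetageneprofiles` can be illustrated with a minimal Python sketch. This is not Rp-Bp's code: the function name `metagene_profiles`, the window sizes, and the toy coordinates are all assumptions made for illustration, and the sketch places reads and start codons on a single coordinate axis, ignoring strand and splicing.

```python
from collections import defaultdict


def metagene_profiles(reads, start_positions, upstream=15, downstream=5):
    """Count read 5' ends per window position, separately for each read length.

    reads: iterable of (five_prime_pos, read_length) tuples (hypothetical input).
    start_positions: coordinates of annotated start codons on the same axis.
    Window index 0 corresponds to `start - upstream`.
    """
    width = upstream + downstream + 1
    profiles = defaultdict(lambda: [0] * width)
    for five_prime, length in reads:
        for start in start_positions:
            rel = five_prime - start  # position relative to the start codon
            if -upstream <= rel <= downstream:
                # Sum over all start codons: one pooled profile per read length.
                profiles[length][rel + upstream] += 1
    return dict(profiles)


# Toy data: two start codons and a handful of reads.
toy_starts = [100, 200]
toy_reads = [(88, 28), (188, 28), (91, 28), (87, 29), (150, 28)]
toy_profiles = metagene_profiles(toy_reads, toy_starts)
```

In this toy input, the two 28-nt reads whose 5' ends sit 12 nt upstream of a start codon pile up at the same window position, which is the kind of peak that later suggests a P-site offset for that read length.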
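
The two-step offset logic (`rpbp/selectperiodicoffsets` followed by `rpbp/getperiodiclengthsoffsets`) can be sketched as below. This is a simplified illustration, not the Rp-Bp implementation: the `OffsetScore` record, its field names, and the exact comparison operators are assumptions. The `--min-bf-likelihood` threshold is deliberately left out because its precise semantics are not described in the table.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class OffsetScore(NamedTuple):
    """One (length, offset) row of a Bayes factor table (hypothetical layout)."""
    length: int
    offset: int
    count: int      # reads supporting the metagene profile
    bf_mean: float  # mean of the log Bayes factor
    bf_var: float   # variance of the log Bayes factor


def select_best_offsets(rows: Iterable[OffsetScore]) -> Dict[int, OffsetScore]:
    """Keep, for each read length, the row with the highest Bayes-factor mean."""
    best: Dict[int, OffsetScore] = {}
    for row in rows:
        current = best.get(row.length)
        if current is None or row.bf_mean > current.bf_mean:
            best[row.length] = row
    return best


def filter_periodic(
    best: Dict[int, OffsetScore],
    min_count: int = 1000,
    min_bf_mean: float = 5.0,
    max_bf_var: Optional[float] = None,
) -> Dict[int, int]:
    """Return {read_length: offset} for lengths that pass the thresholds."""
    kept: Dict[int, int] = {}
    for length, row in sorted(best.items()):
        if row.count < min_count:
            continue  # profile too sparsely populated
        if row.bf_mean < min_bf_mean:
            continue  # periodicity signal too weak
        if max_bf_var is not None and row.bf_var > max_bf_var:
            continue  # Bayes factor too uncertain
        kept[length] = row.offset
    return kept


# Toy table: invented values, for illustration only.
toy_rows = [
    OffsetScore(28, 12, 5000, 40.0, 2.0),
    OffsetScore(28, 13, 5000, 10.0, 1.0),
    OffsetScore(29, 12, 8000, 55.0, 3.0),
    OffsetScore(30, 13, 400, 60.0, 1.0),   # too few reads
    OffsetScore(31, 12, 3000, 2.0, 0.5),   # Bayes-factor mean too low
]
toy_offsets = filter_periodic(select_best_offsets(toy_rows))
```

With the default thresholds quoted in the table (`--min-count` 1000, `--min-bf-mean` 5, no variance limit), only read lengths 28 and 29 survive in this toy table, each with offset 12.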
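
The P-site conversion used by `rpbp/extractorfprofiles` (read 5'-end coordinate plus the length-specific offset) can also be sketched in a few lines. The helper names `psite_profile` and `frame_counts` are hypothetical, coordinates are assumed to be ORF-relative on the spliced, forward-strand sequence, and the sketch keeps per-nucleotide counts so that the in-frame signal is visible; the layout of the real module's sparse matrix may differ.

```python
def psite_profile(reads, offsets, orf_length):
    """Build a per-position P-site count vector for one ORF.

    reads: iterable of (five_prime_pos, read_length), ORF-relative (assumed).
    offsets: {read_length: p_site_offset} for lengths that passed filtering.
    """
    profile = [0] * orf_length
    for five_prime, length in reads:
        offset = offsets.get(length)
        if offset is None:
            continue  # read length did not pass the periodicity filter
        psite = five_prime + offset  # shift the 5' end onto the P-site
        if 0 <= psite < orf_length:
            profile[psite] += 1
    return profile


def frame_counts(profile):
    """Sum P-site counts by reading frame (0 = codon-start positions)."""
    return [sum(profile[frame::3]) for frame in range(3)]


# Toy data: a 9-nt ORF (3 codons) and two retained read lengths.
toy_offsets = {28: 12, 29: 12}
toy_reads = [(-12, 28), (-9, 28), (-9, 29), (-6, 28), (-5, 29), (-12, 30), (0, 28)]
toy_profile = psite_profile(toy_reads, toy_offsets, orf_length=9)
toy_frames = frame_counts(toy_profile)
```

Here most P-sites fall on codon-start positions (frame 0), the pattern the "translated" model in `rpbp/estimateorfbayesfactors` is described as expecting. The 30-nt read is skipped because no offset was retained for that length, and the last read is skipped because its P-site falls outside the ORF.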