Nextflow Modules
Showing module(s) with keyword "clustering"
| Module | Keywords | Description |
|---|---|---|
| nf-core/ampcombi2/cluster | antimicrobial peptides amps parsing reporting align clustering mmseqs2 | A submodule that clusters the merged AMP hits generated from ampcombi2/parsetables and ampcombi2/complete using MMseqs2 cluster. |
| nf-core/autocycler/cluster | autocycler genome-assembly clustering long-read | Cluster replicons in compressed assemblies with Autocycler. |
| nf-core/comebin/runcomebin | metagenomics binning clustering | Effective binning of metagenomic contigs using COntrastive Multi-viEw representation learning |
| nf-core/custom/clustermetrics | clustering metrics silhouette calinski-harabasz davies-bouldin evaluation | Computes clustering quality metrics (silhouette, Calinski-Harabasz, Davies-Bouldin) and performs k-sweep analysis |
| nf-core/custom/clustervisualization | clustering visualization pca umap tsne dimension-reduction | Generates UMAP and t-SNE visualizations colored by cluster |
| nf-core/custom/orfmerge | orf ribo-seq catalogue merge clustering | Cluster normalised per-sample, per-caller ORF predictions into a single cohort-level catalogue. Pair with `custom/orfnormalise` upstream and (typically) `bedtools/getfasta` + `seqkit/translate` downstream to obtain the AA FASTA.<br><br>Strategy is class-aware (operating on the harmonised `orf_class` written by `custom/orfnormalise`):<br>- `canonical_cds`: collapse by (transcript_id, strand). One canonical CDS per transcript by definition.<br>- `uORF`, `dORF`, `other`: collapse by (transcript_id, strand, start, end). A single transcript can host multiple distinct uORFs / dORFs / internal ORFs, so keying on the outer span keeps them in separate clusters while still merging cross-caller calls that agree on coordinates.<br>- `novel_u`, `smORF`: greedy reciprocal-overlap clustering on the outer genomic span at `--reciprocal-overlap` (default 0.8). Catches fuzzy cross-caller matches and exact-coordinate collapses in one pass. Order-dependent at the boundary: a chain A-B-C where A-B and B-C overlap at ~0.85 but A-C only at ~0.75 may cluster as {A,B,C} or {A,B}+{C} depending on iteration order. Rare in practice at 0.8.<br><br>Cross-caller consensus is recorded in two column families on the catalogue TSV:<br>- `called_by_<caller>`: 0/1 indicator per supported caller (ribotish, ribocode, ribotricer, rpbp, price).<br>- `score_<caller>`: best score from that caller within the cluster. Score direction is per-caller (p-values are minimised; Bayes factors / phase scores are maximised).<br><br>Cross-sample recurrence is recorded in two further columns:<br>- `n_samples`: number of distinct samples contributing to the cluster (a cohort recurrence metric).<br>- `samples`: sorted, comma-separated list of those sample ids.<br><br>Emits a small MultiQC custom-content TSV (per-class counts) for inclusion in downstream MultiQC reports. |
| nf-core/custom/pcaclustering | clustering kmeans dbscan pca embeddings | Performs KMeans or DBSCAN clustering on a sample-by-feature numeric matrix (e.g. principal components, embeddings) |
| nf-core/diamond/cluster | clustering alignment genomics proteomics | Calculates clusters of highly similar sequences. |
| nf-core/diamond/deepclust | clustering protein diamond deepclust proteomics | Fast graph-based protein sequence clustering using DIAMOND deepclust |
| nf-core/diamond/linclust | clustering protein diamond linclust proteomics | Fast protein sequence clustering using a greedy incremental approach |
| nf-core/humid | umi fastq deduplication hamming-distance clustering | HUMID is a tool to quickly and easily remove duplicate reads from FASTQ files, with or without UMIs. |
| nf-core/lsa/cosine | similarity cosine clustering rnaseq heatmap | Calculates the cosine similarity matrix between samples based on a gene expression matrix. |
| nf-core/mmseqs/cluster | protein sequence databases clustering searching indexing mmseqs2 | Cluster sequences using MMseqs2 cluster. |
| nf-core/mmseqs/createdb | protein sequence databases clustering searching indexing mmseqs2 | Create an MMseqs database from an existing FASTA/Q file |
| nf-core/mmseqs/createindex | protein sequence databases clustering searching indexing | Creates a sequence index for an MMseqs2 database. |
| nf-core/mmseqs/createtaxdb | protein sequence databases clustering searching taxonomy | Adds taxonomy |
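
The order dependence noted in the `custom/orfmerge` description can be illustrated with a minimal Python sketch. This is not the module's implementation: it assumes reciprocal overlap means the overlap length divided by the longer of the two spans, and that each new interval is compared only against the first (seed) member of every existing cluster. The names `reciprocal_overlap`, `greedy_cluster`, and `cluster_names`, and the coordinates of A, B, and C, are hypothetical and chosen so that A-B and B-C overlap at about 0.87 to 0.88 while A-C overlaps at 0.75.

```python
from typing import List, Tuple

# (name, start, end) with half-open coordinates; purely illustrative.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer span (assumed definition)."""
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def greedy_cluster(intervals: List[Interval], threshold: float = 0.8) -> List[List[Interval]]:
    """Assign each interval to the first cluster whose seed it matches.

    Assumption: only the seed (first member) of a cluster is compared,
    which is one simple way a greedy pass can become order-dependent.
    """
    clusters: List[List[Interval]] = []
    for interval in intervals:
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], interval) >= threshold:
                cluster.append(interval)
                break
        else:
            # No existing seed matched, so this interval seeds a new cluster.
            clusters.append([interval])
    return clusters


def cluster_names(clusters: List[List[Interval]]) -> List[List[str]]:
    """Reduce clusters to sorted member names for easy comparison."""
    return [sorted(member[0] for member in cluster) for cluster in clusters]


A: Interval = ("A", 0, 100)
B: Interval = ("B", 12, 112)   # A-B reciprocal overlap: 0.88
C: Interval = ("C", 25, 125)   # B-C: 0.87, A-C: 0.75

print(cluster_names(greedy_cluster([A, B, C])))  # [['A', 'B'], ['C']]
print(cluster_names(greedy_cluster([B, A, C])))  # [['A', 'B', 'C']]
```

With A seeding the cluster, C is compared against A (0.75) and is split off; with B seeding it, C is compared against B (0.87) and joins. A pass that compared against every cluster member, or that sorted the input first, would behave differently, so this should be read only as an illustration of the boundary case.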