cellgeni/fetch10xmeta @ 0.0.4
Summary
Fetches and parses metadata for public 10x datasets from GEO (GSE*), ArrayExpress (E-MTAB*), or BioProject/ENA (PRJ*). For each dataset it:
- Downloads raw metadata from NCBI SRA, EBI ENA, or BioStudies depending on accession type.
- Resolves sample accessions to experiment and run IDs, building an accessions map.
- Classifies each run by download type (paired-end FASTQs, 10x BAM, or SRA) via parse_ena_metadata.sh/parse_sra_metadata.sh.
- Merges the per-run classification with sample IDs into links.tsv via add_samples.awk.
For GEO datasets the module falls back through project IDs → sub-series project IDs → BioSample IDs if earlier ENA/SRA metadata downloads fail.
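The GEO fallback order described above can be pictured as a "first success wins" loop. The following is only a minimal Python sketch of that idea, not the module's code (the module does this in collect_metadata.sh). The function `fetch_with_fallback` and the three stub fetchers are hypothetical names introduced for illustration, and the stubs simulate failures instead of contacting ENA or SRA.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher type: takes an accession, returns metadata text or None.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    accession: str, strategies: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (label, fetcher) in order and return the first non-empty result."""
    for label, fetch in strategies:
        try:
            result = fetch(accession)
        except Exception:
            result = None  # treat any error as a failed download and move on
        if result:
            return label, result
    raise RuntimeError(f"all metadata lookups failed for {accession}")


# Stub fetchers standing in for real ENA/SRA requests (assumptions, not real APIs).
def by_project(accession: str) -> Optional[str]:
    return None  # simulate an empty response


def by_subseries(accession: str) -> Optional[str]:
    raise OSError("simulated download failure")


def by_biosample(accession: str) -> Optional[str]:
    return "placeholder metadata for " + accession


label, text = fetch_with_fallback(
    "GSE230685",
    [
        ("project", by_project),
        ("sub-series project", by_subseries),
        ("biosample", by_biosample),
    ],
)
print(label)
```

Keeping the strategies in an ordered list makes the priority explicit and lets a caller see which level finally produced the metadata.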
Inputs
| Name | Type | Description |
|---|---|---|
| meta.id | string | Dataset accession. Supported prefixes: GSE* (GEO), E-MTAB* (ArrayExpress), PRJ* (BioProject). |
| sample_ids | string | Comma-separated sample accessions to restrict processing to (e.g. GSM7232572,GSM7232573 or ERS4689152,ERS4689153). Pass empty/null to process all samples in the dataset. |
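To make the two inputs concrete, here is a small Python sketch of how the accession prefix could be mapped to its source and how the sample_ids string could be split. It is an illustration under stated assumptions, not the module's implementation: `source_for` and `split_sample_ids` are hypothetical helpers, and trimming whitespace around each ID is an assumption rather than documented behaviour.

```python
from typing import List, Optional

# Prefix-to-source pairs taken from the Inputs table above.
PREFIXES = (("GSE", "GEO"), ("E-MTAB", "ArrayExpress"), ("PRJ", "BioProject"))


def source_for(accession: str) -> str:
    """Return the source database implied by the accession prefix."""
    for prefix, source in PREFIXES:
        if accession.startswith(prefix):
            return source
    raise ValueError(f"unsupported accession prefix: {accession}")


def split_sample_ids(sample_ids: Optional[str]) -> List[str]:
    """Split a comma-separated string; an empty list means 'all samples'."""
    if not sample_ids:
        return []
    # Assumption: surrounding whitespace and empty items are ignored.
    return [item.strip() for item in sample_ids.split(",") if item.strip()]


print(source_for("GSE230685"))
print(split_sample_ids("GSM7232572,GSM7232573"))
print(split_sample_ids(None))
```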
Outputs
| Name | File(s) | Description |
|---|---|---|
| links | links.tsv | Per-run metadata with an appended sample_id column mapping each run back to its source sample. |
| list | *.list | Accession list files: run list (*.run.list), sample list (*.sample.list), project list (*.project.list), etc. |
| tsv | *.tsv | TSV files from the collection pipeline: raw SRA/ENA metadata, accession mapping (*.accessions.tsv), sample-run mapping (*.sample_x_run.tsv), and parsed run classification (*.parsed.tsv). |
| txt | *.txt | Optional SDRF/IDF plain-text files, present for ArrayExpress (E-MTAB*) datasets. |
| soft | *_family.soft | Optional GEO SOFT family file, present for GEO (GSE*) datasets. |
| versions | versions.yml | Pipeline version record. |
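As a rough illustration of consuming links.tsv downstream, the Python sketch below groups rows by the appended sample_id column. It assumes only that the file is tab-separated and that sample_id is the last field, since the description says the column is appended. The other column contents are not documented here, so the demo rows are placeholders and `rows_by_sample` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def rows_by_sample(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Group tab-separated rows by their last field (assumed to be sample_id)."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        groups[row[-1]].append(row[:-1])
    return dict(groups)


# Placeholder content; real files will carry the module's own columns.
demo = io.StringIO("RUN_A\tfastq\tSAMPLE_1\nRUN_B\tbam\tSAMPLE_1\nRUN_C\tsra\tSAMPLE_2\n")
groups = rows_by_sample(demo)
print(groups["SAMPLE_1"])
```

If the real file starts with a header row, that row would need to be skipped before grouping, for example by calling `next()` on the file handle first.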
Usage

    include { FETCH10XMETA } from 'cellgeni/fetch10xmeta'

    // GEO dataset with comma-separated sample IDs
    FETCH10XMETA(
        channel.of([[id: 'GSE230685'], 'GSM7232572,GSM7232573'])
    )

    // All samples in an ArrayExpress dataset (no sample ID filter)
    FETCH10XMETA(
        channel.of([[id: 'E-MTAB-9221'], null])
    )

License
MIT
Input channel
| Name | Type | Description |
|---|---|---|
| meta | map | Map with dataset-level metadata. Must contain the key 'id' with the dataset accession, e.g. [ id:'GSE230685' ]. |
| sample_ids | string | Comma-separated sample accession IDs to restrict processing to (e.g. 'GSM7232572,GSM7232573' or 'DRS408305,DRS408306'). If empty or null, all samples in the dataset are used. |
Output channels
The tsv, txt, list, soft, and links outputs are each a tuple of meta (map) and the files listed below. meta is the map with dataset-level metadata, passed through from input, e.g. [ id:'GSE230685' ].
| Name | Pattern | Type | Description |
|---|---|---|---|
| tsv | *.tsv | file | TSV files from the collection pipeline: raw SRA/ENA metadata, accession mapping (*.accessions.tsv), sample-run mapping (*.sample_x_run.tsv), and parsed run classification (*.parsed.tsv). |
| txt | *.txt | file | Optional SDRF/IDF plain-text files, present for ArrayExpress (E-MTAB*) datasets. |
| list | *.list | file | Accession list files produced during metadata collection (*.run.list, *.sample.list, *.project.list, etc.). |
| soft | *_family.soft | file | Optional GEO SOFT family file, present for GEO (GSE*) datasets. |
| links | links.tsv | file | TSV file with per-run metadata and an appended sample_id column mapping each run back to its source sample accession. |
| versions | versions.yml | file | Pipeline version record. |
Tools
| Tool | Description | Homepage |
|---|---|---|
| collect_metadata.sh | Downloads and parses GEO/ArrayExpress/BioProject metadata, resolving sample-to-run mappings. | https://github.com/cellgeni/nf-reprocessing-public-10x |
Release details
| Field | Value |
|---|---|
| Version | 0.0.4 |
| Release Date | 15 May 2026 15:53:27 (UTC) |
| Download URL | https://registry.nextflow.io/api/v1/modules/cellgeni%2Ffetch10xmeta/0.0.4/download |
| OCI Store URL | https://public.cr.seqera.io/v2/nextflow/plugin/modules/cellgeni/fetch10xmeta/blobs/sha256:a46741ae6e4627d91349eced7b13d9c0e917e9f5240b435d54bd35a0b14b6d3a |
| Size | 8.1 KB |
| Checksum | sha256:a46741ae6e4627d91349eced7b13d9c0e917e9f5240b435d54bd35a0b14b6d3a |
| Downloads | 0 |