Input Files¶

quantms takes an experimental design file, either an SDRF metadata file (recommended) or a CSV samplesheet, together with a protein FASTA database. The design file points to the spectra files to process. For isobaric experiments, an isotope correction matrix can optionally be supplied.

SDRF File (Recommended)¶

The Sample and Data Relationship Format (SDRF) is a tab-delimited metadata format that encodes both the experimental design and the file references in one file. It is the recommended input for quantms because it can also carry the search parameters (enzyme, modifications, mass tolerances), so they do not have to be passed individually on the command line.

--input /path/to/experiment.sdrf.tsv

Required SDRF Columns¶

Column	Description	Example
`source name`	Unique sample identifier	`sample1`
`comment[file uri]`	Path or URI to the spectra file	`file:///data/run1.mzML`
`comment[label]`	Labeling type	`label free sample` or `TMT10plex-126`
`comment[fraction identifier]`	Fraction number	`1`
`comment[data file]`	Raw file name	`run1.raw`
`characteristics[organism]`	Species	`Homo sapiens`
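
A quick pre-flight check on the header row can catch a missing column before a run is launched. The sketch below is not part of quantms: the function name `missing_sdrf_columns` and the case-insensitive comparison are assumptions made for illustration, and the column list is simply copied from the table above.

```python
import csv
import io

# Required column names, copied from the table above.
REQUIRED_COLUMNS = [
    "source name",
    "comment[file uri]",
    "comment[label]",
    "comment[fraction identifier]",
    "comment[data file]",
    "characteristics[organism]",
]


def missing_sdrf_columns(sdrf_text):
    """Return the required columns that are absent from the SDRF header row."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader, [])
    # Assumption: compare case-insensitively and ignore surrounding spaces.
    present = {name.strip().lower() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Hypothetical two-line SDRF that lacks the comment[data file] column.
example = (
    "source name\tcomment[file uri]\tcomment[label]\t"
    "comment[fraction identifier]\tcharacteristics[organism]\n"
    "sample1\tfile:///data/run1.mzML\tlabel free sample\t1\tHomo sapiens\n"
)
print(missing_sdrf_columns(example))
```

An empty list means every required column is present; anything else names the columns still to be added.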

Common Optional SDRF Columns¶

Column	Description
`comment[cleavage agent details]`	Enzyme (e.g. `MS:1001251` for Trypsin)
`comment[modification parameters]`	Fixed and variable modifications
`comment[precursor mass tolerance]`	Precursor tolerance (e.g. `10 ppm`)
`comment[fragment mass tolerance]`	Fragment tolerance (e.g. `0.02 Da`)
`comment[dissociation method]`	`HCD`, `CID`, `ETD`
`factor value[disease]`	Experimental condition used for differential expression (DE) analysis

SDRF File Extensions¶

The SDRF file may use any of these extensions: .sdrf.tsv, .sdrf, .tsv, .csv

Example SDRF Fragment¶

source name	comment[file uri]	comment[label]	comment[fraction identifier]	characteristics[organism]	factor value[disease]
ctrl_1	file:///data/ctrl_1.mzML	label free sample	1	Homo sapiens	healthy
ctrl_2	file:///data/ctrl_2.mzML	label free sample	1	Homo sapiens	healthy
case_1	file:///data/case_1.mzML	label free sample	1	Homo sapiens	NASH
case_2	file:///data/case_2.mzML	label free sample	1	Homo sapiens	NASH
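
To confirm that the design encoded in a fragment like this is what you intended, the runs can be tallied per condition. This is a minimal sketch using only the Python standard library; the helper name `runs_per_condition` is hypothetical, and the data is rebuilt from the example rows with explicit tab delimiters.

```python
import csv
import io
from collections import Counter

HEADER = [
    "source name",
    "comment[file uri]",
    "comment[label]",
    "comment[fraction identifier]",
    "characteristics[organism]",
    "factor value[disease]",
]

# Rebuild the example fragment, joining fields with tabs as SDRF requires.
lines = ["\t".join(HEADER)]
for name, disease in [("ctrl_1", "healthy"), ("ctrl_2", "healthy"),
                      ("case_1", "NASH"), ("case_2", "NASH")]:
    lines.append("\t".join([
        name, f"file:///data/{name}.mzML", "label free sample",
        "1", "Homo sapiens", disease,
    ]))
SDRF_TEXT = "\n".join(lines) + "\n"


def runs_per_condition(sdrf_text, factor_column="factor value[disease]"):
    """Count how many rows (runs) fall under each value of the factor column."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return Counter(row[factor_column] for row in reader)


print(dict(runs_per_condition(SDRF_TEXT)))
```

A condition with a single run, or an unexpected spelling variant of a condition name, shows up immediately in the tally.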

CSV Samplesheet¶

A CSV samplesheet is a simpler alternative when you do not need full SDRF metadata. Pass it with --input.

Minimum Required Columns¶

Column	Description
`spectra_file`	Absolute path or URI to the spectra file

Full Example¶

spectra_file,sample_name,condition,replicate
/data/sample1.mzML,ctrl_1,control,1
/data/sample2.mzML,ctrl_2,control,2
/data/sample3.mzML,treat_1,treatment,1
/data/sample4.mzML,treat_2,treatment,2

When using a CSV samplesheet, specify search parameters (enzyme, modifications, tolerances) directly on the command line or in a params file. See Usage for details.
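
For more than a handful of runs it is easier to generate the samplesheet than to type it. The sketch below assumes the four column names from the full example above; `build_samplesheet` is a hypothetical helper, not a quantms utility.

```python
import csv
import io

# Column names taken from the full example above.
FIELDNAMES = ["spectra_file", "sample_name", "condition", "replicate"]


def build_samplesheet(rows):
    """Serialise row dicts to CSV text; every row must name a spectra file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if not row.get("spectra_file"):
            raise ValueError("every row needs a spectra_file value")
        writer.writerow(row)
    return buffer.getvalue()


rows = [
    {"spectra_file": "/data/sample1.mzML", "sample_name": "ctrl_1",
     "condition": "control", "replicate": 1},
    {"spectra_file": "/data/sample3.mzML", "sample_name": "treat_1",
     "condition": "treatment", "replicate": 1},
]
print(build_samplesheet(rows), end="")
```

Writing through `csv.DictWriter` rather than string concatenation keeps paths that contain commas correctly quoted.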

Spectra Files¶

Supported Formats¶

Format	Extension	Notes
mzML	`.mzML`	Preferred open standard; used directly
Thermo RAW	`.raw`	Auto-converted to mzML via ThermoRawFileParser
Bruker timsTOF	`.d`	Converted when `--convert_dotd true`

Compressed Variants¶

Compressed spectra files are accepted for .raw, .mzML, and .d:

.gz — gzip
.tar — tar archive
.tar.gz / .tgz — tar + gzip
.zip — zip

Files are automatically decompressed before processing. Decompressed intermediates can be cached locally.
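
The format and compression rules above can be expressed as a small classifier, which is handy for checking a file list before submission. This is an illustrative sketch only: the function name and the case-insensitive matching are assumptions, and it does not reproduce how the pipeline itself detects formats.

```python
# Suffixes taken from the table and list above. Longer suffixes come first
# so that ".tar.gz" is matched before ".gz".
COMPRESSION_SUFFIXES = [".tar.gz", ".tgz", ".tar", ".gz", ".zip"]
SPECTRA_FORMATS = {".mzml": "mzML", ".raw": "Thermo RAW", ".d": "Bruker timsTOF"}


def classify_spectra_file(name):
    """Return (format, compression suffix or None); format is None if unrecognised."""
    lowered = name.lower()  # Assumption: extensions are matched case-insensitively.
    compression = None
    for suffix in COMPRESSION_SUFFIXES:
        if lowered.endswith(suffix):
            compression = suffix
            lowered = lowered[: -len(suffix)]  # strip it to expose the inner extension
            break
    for extension, label in SPECTRA_FORMATS.items():
        if lowered.endswith(extension):
            return label, compression
    return None, compression


for candidate in ["run1.mzML", "run1.raw.gz", "run1.d.tar.gz", "notes.txt"]:
    print(candidate, classify_spectra_file(candidate))
```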

Remote Files¶

Files can be referenced by URI in the SDRF comment[file uri] column or the CSV spectra_file column:

# Local
file:///data/run1.mzML

# AWS S3
s3://my-bucket/data/run1.mzML

# HTTP(S)
https://ftp.pride.ebi.ac.uk/pride/data/archive/2023/01/PXD012345/run1.raw
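
When a design file mixes local and remote references, it can help to sort them by location first, for example to check that credentials are in place for S3. The sketch below uses `urllib.parse.urlparse` from the standard library; treating a bare path as local is an assumption based on the CSV example, and `file_location` is a hypothetical name.

```python
from urllib.parse import urlparse


def file_location(reference):
    """Classify a file reference as 'local', 's3', 'http', or 'unknown'."""
    scheme = urlparse(reference).scheme.lower()
    if scheme in ("", "file"):
        # Assumption: a bare path such as /data/run1.mzML is a local file.
        return "local"
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    return "unknown"


references = [
    "file:///data/run1.mzML",
    "/data/sample1.mzML",
    "s3://my-bucket/data/run1.mzML",
    "https://ftp.pride.ebi.ac.uk/pride/data/archive/2023/01/PXD012345/run1.raw",
]
for ref in references:
    print(file_location(ref), ref)
```

Windows drive paths such as `C:\data\run1.raw` parse with a one-letter scheme and would come back as `unknown` here, so they would need separate handling.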

Protein FASTA Database¶

A standard FASTA file containing the protein sequences to search against.

--database /path/to/proteome.fasta

Recommendations¶

Use reviewed UniProt sequences for standard proteomics
Remove stop-codon characters (*) — different search engines handle them differently
Decoy sequences are added automatically by the pipeline (--add_decoys true by default)
Contaminants (e.g. cRAP) can be appended before running
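
Two of these recommendations, removing `*` characters and appending contaminants, are simple text operations. The following is a minimal sketch under the assumption that the FASTA fits in memory; the function names and the example entries are hypothetical.

```python
def strip_stop_codons(fasta_text):
    """Remove '*' from sequence lines; header lines (starting with '>') are kept as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(line.replace("*", ""))
    return "\n".join(cleaned) + "\n"


def append_contaminants(proteome_fasta, contaminant_fasta):
    """Concatenate two FASTA texts with exactly one newline between them."""
    return proteome_fasta.rstrip("\n") + "\n" + contaminant_fasta.rstrip("\n") + "\n"


# Hypothetical entries, for illustration only.
proteome = ">sp|P00001|TEST_HUMAN hypothetical entry\nMKTAYIAKQR*\n"
contaminants = ">CONT_1 hypothetical contaminant\nGSHMK\n"

database = append_contaminants(strip_stop_codons(proteome), contaminants)
print(database, end="")
```

The contaminants are appended before the pipeline runs, so that decoy generation sees the combined database.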

Example (downloading from UniProt)¶

# Human reviewed proteome
wget "https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=organism_id:9606+AND+reviewed:true" \
    -O uniprot_human_reviewed.fasta

TMT / iTRAQ Correction Matrix (Optional)¶

For isobaric labeling experiments, you can supply an isotope correction matrix to account for isotopic impurities in the reagent lots.

--isotope_correction_file /path/to/correction_matrix.tsv

The correction matrix is a tab-delimited file with one row per TMT/iTRAQ channel. The exact format follows the OpenMS IsobaricAnalyzer convention. Correction matrices are typically provided by reagent manufacturers (e.g., Thermo Fisher TMT lot-specific correction factors).

If no correction matrix is provided, the pipeline falls back to default values that apply no isotope correction.
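
To see what such a matrix does, consider a toy model with two channels in which each observed intensity is a mixture of the true intensities, weighted by the impurity fractions. The sketch below is a conceptual illustration only: it does not reproduce the IsobaricAnalyzer file format or its solver, and the matrix values are invented.

```python
def correct_two_channels(observed, matrix):
    """Solve matrix * true = observed for a 2x2 matrix using its closed-form inverse."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("correction matrix is singular")
    o1, o2 = observed
    return ((d * o1 - b * o2) / det, (a * o2 - c * o1) / det)


# Hypothetical impurities: 5% of channel 1 signal leaks into channel 2,
# and 2% of channel 2 signal leaks into channel 1.
impurity = [[0.95, 0.02],
            [0.05, 0.98]]
observed = (99.0, 201.0)

print(correct_two_channels(observed, impurity))

# An identity matrix leaves the intensities unchanged (the "no correction" case).
identity = [[1.0, 0.0],
            [0.0, 1.0]]
print(correct_two_channels(observed, identity))
```

Real correction matrices cover every channel of the reagent set, so the same idea applies to a larger linear system.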

Input Summary Table¶

Input	Flag	Required	Format
SDRF or samplesheet	`--input`	Yes	`.sdrf.tsv`, `.sdrf`, `.tsv`, or `.csv`
Protein database	`--database`	Yes	`.fasta`
Spectra files	(referenced from input)	Yes	`.mzML`, `.raw`, `.d`
Isotope correction	`--isotope_correction_file`	No	`.tsv`
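
The table maps directly onto a command line. The sketch below assembles the arguments from the three flags listed above; the launch prefix `nextflow run bigbio/quantms` is an assumption about how the pipeline is invoked and is not taken from this page, and `quantms_args` is a hypothetical helper.

```python
import shlex


def quantms_args(input_file, database, isotope_correction_file=None):
    """Build the input-related arguments; the correction matrix is optional."""
    args = ["--input", input_file, "--database", database]
    if isotope_correction_file is not None:
        args += ["--isotope_correction_file", isotope_correction_file]
    return args


# Assumption: the pipeline is launched through Nextflow under this name.
command = ["nextflow", "run", "bigbio/quantms"] + quantms_args(
    "/path/to/experiment.sdrf.tsv",
    "/path/to/proteome.fasta",
)
# shlex.join quotes any argument that contains spaces or shell metacharacters.
print(shlex.join(command))
```

Building the command as a list and joining it with `shlex.join` avoids quoting mistakes when paths contain spaces.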