Targeted proteomics

(but not that kind!)

Ani

Dec 28, 2025

If shotgun proteomics (DDA, DIA) is listening to the whole song, then what I’m doing here is hovering over a few notes and asking:

“Were you even played? And roughly how loud?”

Not re-identifying peptides.
Not doing PRM.
Not replacing Skyline.

Just opening the RAW file and counting stuff.

Where this came from

In my earlier post Chopping Proteins to Peptides I started with a very naive question:

What peptides even exist, theoretically, in our proteome?

So I:

chopped protein sequences into overlapping peptides (10–30 aa)
computed monoisotopic masses
deduplicated sequences
verified mass correctness with known isobaric peptides

That exercise made one thing painfully obvious:

wildly different peptides can have exactly the same mass

For example:

LSLAQEDLISNR (12 aa)
GSLLLGGLDAEASR (14 aa)

Same monoisotopic mass. Completely unrelated proteins.

From an MS¹ point of view, they collapse onto the same ion.
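To see why the two sequences collapse, here is a minimal Python sketch (not the code from the earlier post) that sums standard monoisotopic residue masses plus one water. The reason is compositional: Q has the same elemental formula as G+A, and N the same as G+G, so once you cancel the shared residues, Q+N on one side equals A+3G on the other.

```python
# Monoisotopic residue masses (Da), only for the amino acids used below.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "E": 129.042593, "R": 156.101111,
}
WATER = 18.010565  # one H2O per peptide (free N- and C-terminus)


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


a = monoisotopic_mass("LSLAQEDLISNR")
b = monoisotopic_mass("GSLLLGGLDAEASR")
print(f"{a:.4f} {b:.4f} diff={abs(a - b):.6f}")
```

Both come out at about 1357.7201 Da, so no amount of MS¹ mass accuracy can separate them.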

This post is a continuation of that line of thinking.

The next naive question

Given a list of theoretical peptide masses:

Do I actually see these ions in my RAW file?
And if yes, how much signal do they carry?

That’s it.

No peak picking.
No fragment scoring.
No chromatogram integration.

Just counting intensities already recorded by the instrument.

About “targeted” (important caveat)

Let me be very clear:

This is not proper targeted proteomics in the sense of:

PRM
SRM
Skyline workflows
interference-aware quantification

If you want that, Skyline exists — and it’s excellent.

This is much more primitive:

no deconvolution
no fragment analysis
no sequence confirmation
no confidence model

Think of it as:

“What did the instrument actually collect for these m/z values?”

Nothing more.

How the RAW data is read

Unlike the timsTOF-pro data, which I converted to mzML first (see Convert timsTOF-pro data to mzML), here the Thermo RAW files are read directly using the vendor-provided RawFileReader DLLs, which come from: http://planetorbitrap.com/rawfilereader

My repo: https://github.com/animesh/RawRead is just an example implementation showing how to:

open RAW files
iterate scans
extract scan metadata
parse scan titles

Support for Bruker / timsTOF is a different problem: the API calls are just too slow at the moment 🤪, but I’m working on it 🤞

Minimal targets: just m/z

The minimal input for this workflow is a CSV (targets.csv) with m/z values.

For the two isobaric peptides mentioned earlier (charge 2):

Compound                  Mass [m/z]
GSLLLGGLDAEASR            679.867348
GSLLLGGLDAEASR (heavy)    684.871482
LSLAQEDLISNR              679.867348
LSLAQEDLISNR (heavy)      684.871482

Yes, I added (heavy) because that’s what a good PRM person does 😉
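For reference, the m/z values in the table follow from the neutral mass and the charge. The sketch below is my own arithmetic check, not part of countIons.cs. The heavy label is an assumption: the 5.004134 m/z shift at charge 2 is consistent with a +10.008 Da label such as 13C6 15N4 arginine, but the label is not stated anywhere above.

```python
PROTON = 1.007276467        # mass of a proton, Da
NEUTRAL_MASS = 1357.720143  # monoisotopic mass shared by both peptides, Da
HEAVY_ARG = 10.008269       # ASSUMPTION: 13C6 15N4 arginine label, Da


def mz(neutral_mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


light = mz(NEUTRAL_MASS, 2)
heavy = mz(NEUTRAL_MASS + HEAVY_ARG, 2)
print(f"{light:.6f} {heavy:.6f}")
```

Both values land within rounding of the table entries, for either peptide, since the two share the same neutral mass.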

Optional columns are the charge number of the ion, the collision energy (NCE) used in the instrument, … and of course the Retention Time (RT) window, i.e. when we expect the ion to fly into the instrument:

Start [min]
End [min]

More details here:👉 https://github.com/animesh/RawRead/tree/count-ions#:~:text=Minimal%20CSV%20requirements
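As a rough illustration of what such a file looks like to a parser, here is a small Python sketch. It is not how countIons.cs reads the file; the column names are the ones shown above, the RT values in the example are made up, and treating only the m/z column as required is my assumption.

```python
import csv
import io


def _optional_float(row, name):
    """Return the column as a float, or None if it is missing or blank."""
    value = (row.get(name) or "").strip()
    return float(value) if value else None


def read_targets(handle):
    """Parse a targets CSV; only 'Mass [m/z]' is treated as required."""
    targets = []
    for row in csv.DictReader(handle):
        targets.append({
            "name": (row.get("Compound") or "").strip(),
            "mz": float(row["Mass [m/z]"]),
            "rt_start": _optional_float(row, "Start [min]"),
            "rt_end": _optional_float(row, "End [min]"),
        })
    return targets


# Hypothetical example: the RT window for the first target is invented.
example = io.StringIO(
    "Compound,Mass [m/z],Start [min],End [min]\n"
    "GSLLLGGLDAEASR,679.867348,20.0,25.0\n"
    "LSLAQEDLISNR,679.867348,,\n"
)
targets = read_targets(example)
print(targets)
```

A target without an RT window simply carries None for both bounds, which a matcher can read as "any retention time".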

Finally, here is the code, countIons.cs: https://raw.githubusercontent.com/animesh/RawRead/refs/heads/count-ions/countIons.cs, a tiny helper that does three things:

Reads a Thermo RAW file
Writes a compact per-scan TSV
Accumulates signal for target m/z values

Build & run

mcs countIons.cs /reference:ThermoFisher.CommonCore.RawFileReader.dll \
    /reference:ThermoFisher.CommonCore.Data.dll \
    /reference:ThermoFisher.CommonCore.MassPrecisionEstimator.dll \
    /reference:MathNet.Numerics.dll \
    /reference:System.Numerics.dll \
    -out:countIons.exe

mono countIons.exe file.raw targets.csv

Mass tolerance (0.0001, absolute, in m/z units) and RT slack (0.01 min) are hard-coded for now.
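To get a feel for how tight that is, an absolute tolerance can be converted to parts per million at a given m/z. This is just arithmetic on the numbers above, not something the tool reports:

```python
def abs_to_ppm(tolerance, mz):
    """Express an absolute m/z tolerance as ppm at a given m/z."""
    return tolerance / mz * 1e6


ppm = abs_to_ppm(0.0001, 679.867348)
print(f"{ppm:.3f} ppm")  # prints "0.147 ppm"
```

So at the example m/z the window is roughly 0.15 ppm wide on each side, and the 0.01 min RT slack is 0.6 seconds.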

What it actually matches

Observed m/z is parsed from the scan title
BasePeakMass is not used
Absolute difference between observed and target m/z ≤ 0.0001
Optional RT windows are applied if provided

Because of this:

two peptides with the same monoisotopic mass cannot be disambiguated unless non-overlapping RT windows are provided!

If a scan matches multiple targets, its intensity is counted for each, and this is explicitly reported, not hidden.
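The matching rules above can be sketched in a few lines of Python. This is a simplified illustration, not a port of countIons.cs. The scan dictionaries, the "intensity" field, and the way RT slack widens the window on both sides are my assumptions, and the example scans are invented.

```python
MZ_TOL = 0.0001   # absolute m/z tolerance, from the post
RT_SLACK = 0.01   # minutes, from the post


def matches(scan, target):
    """True if the scan's observed m/z (and RT, if windowed) fits the target."""
    if abs(scan["mz"] - target["mz"]) > MZ_TOL:
        return False
    start, end = target.get("rt_start"), target.get("rt_end")
    if start is not None and scan["rt"] < start - RT_SLACK:
        return False
    if end is not None and scan["rt"] > end + RT_SLACK:
        return False
    return True


def accumulate(scans, targets):
    """Sum intensity per target; track multi-matched and unmatched scans."""
    totals = {t["name"]: 0.0 for t in targets}
    duplicated, unmatched = [], []
    for scan in scans:
        hits = [t["name"] for t in targets if matches(scan, t)]
        for name in hits:
            totals[name] += scan["intensity"]  # counted once per matching target
        if len(hits) > 1:
            duplicated.append((scan["scan"], hits))
        elif not hits:
            unmatched.append(scan["scan"])
    return totals, duplicated, unmatched


# Invented example: two isobaric targets, only one with an RT window.
targets = [
    {"name": "GSLLLGGLDAEASR", "mz": 679.867348, "rt_start": 20.0, "rt_end": 25.0},
    {"name": "LSLAQEDLISNR", "mz": 679.867348},
]
scans = [
    {"scan": 1, "rt": 21.0, "mz": 679.867348, "intensity": 100.0},
    {"scan": 2, "rt": 30.0, "mz": 679.867348, "intensity": 50.0},
    {"scan": 3, "rt": 21.0, "mz": 500.0, "intensity": 10.0},
]
totals, duplicated, unmatched = accumulate(scans, targets)
print(totals, duplicated, unmatched)
```

In this toy run scan 1 is credited to both targets and lands in the duplicated list, which is exactly the ambiguity that an RT window on only one of two isobaric targets cannot resolve.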

Outputs (boring but honest)

<raw>.cI.tsv: compact per-scan table (scan, RT, TIC, title, etc.)
<raw>.<csv>.accumulation.csv: accumulated signal per target
<raw>.<csv>.duplicated_scans.tsv: scans matched to multiple targets
<raw>.<csv>.unmatched_scans.tsv: scans matched to none

Bottom line

This is not about being correct.

It’s about being explicit.

Before running fancy tools, I want to know:

what ions exist
where they show up
how crowded mass space really is

Sometimes the most useful thing is to just open the RAW file and look.

That’s all this does — and that’s exactly the point.

A final note on acquisition methods

How the RAW file was acquired matters.

Targeting on Orbitraps has subtle but important pitfalls, nicely explained here:
👉 https://proteomicsnews.blogspot.com/2015/03/targeting-on-q-exactive-which-method-to.html

Missing signal does not automatically mean missing peptide.

fuzzyLife
