Skip to content

EMSL-Computing/CoreMS-Orchestrator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CoreMS Orchestrator

Version Python DOI

A PySide6 desktop application for processing direct infusion FT-ICR mass spectrometry data using the CoreMS framework and AI agent.


Overview

CoreMS Orchestrator is an AI-augmented desktop application for automated processing, molecular formula assignment, and intelligent interpretation of Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometry data. The inventive features are:

(a) Embedded autonomous AI agent with dual-scope tool execution. A large language model (LLM) agent operates within the desktop application and possesses programmatic control over both the local four-stage processing workflow (import, calibrate, search, export) and remote computational infrastructure via the Model Context Protocol (MCP). The agent can autonomously execute any combination of local GUI operations and server-side actions (database queries, batch workflow submission, parameter management, object storage) within a single conversational interaction, without requiring the user to manually switch between interfaces.

(b) Structured concurrency bridge between synchronous GUI and asynchronous protocol layers. The application implements a signal-mediated architecture that bridges PySide6/Qt's synchronous event loop with asynchronous MCP transport (anyio/httpx) and OpenAI-compatible LLM inference, enabling real-time agentic tool-use loops within a responsive desktop GUI without blocking the user interface.

(c) Context-aware scientific reasoning with live spectral state injection. The agent receives a continuously updated representation of the current spectral state—including all assigned peaks with mass accuracy, molecular formula, double-bond equivalence, compound class, and confidence scores, enabling domain-specific analytical reasoning (van Krevelen analysis, compound class distributions, aromaticity assessment) grounded in the actual experimental data rather than generic knowledge.

(d) Instrument-agnostic, endpoint-agnostic design. The application supports multiple instrument vendors (Bruker SolarIX, Thermo Scientific) and can connect to any OpenAI-compatible LLM endpoint and any MCP-compliant server, permitting deployment within diverse institutional computing environments without vendor lock-in.

(e) Planned: Automated molecular networking from tandem MS data. Future extensions incorporate MS/MS fragmentation analysis for organometallic compound characterization and metabolomics, with AI-driven construction and annotation of molecular similarity networks.


Features

Feature Details
Non-blocking processing Each step runs in a QThread; the UI stays responsive
Four-step workflow Import → Calibrate → Search → Export
Live Matplotlib plots Mass spectrum, assigned bar chart, DBE vs C#, PPM error histogram
Sortable / filterable results table Click any column header to sort; type to filter
Multiple export formats CSV, HDF5, Excel (.xlsx)
Agent chatbot Embedded OpenAI-compatible assistant — reads live spectrum context, triggers local workflow steps, and calls remote CoreMS MCP tools

Installation

Prerequisites

  • Python 3.10 or newer
  • CoreMS installed in the same environment

1. Clone this repository

git clone https://github.com/EMSL-Computing/CoreMS-GUI.git
cd CoreMS-GUI

2. Install CoreMS

Follow the CoreMS installation guide or install from source:

git clone https://github.com/EMSL-Computing/CoreMS.git
cd CoreMS
pip install -e .
cd ..

3. Install the GUI package

Install corems-gui as an editable package (recommended for development):

pip install -e .

Or install only the runtime dependencies directly:

pip install -r requirements.txt

macOS note: PySide6 ships its own Qt libraries — no separate Qt installation is required.


Usage

Run via the installed entry point

corems-gui

Run as a module

python -m corems_gui

Import programmatically

from corems_gui import main
main()

# or launch just the window inside an existing QApplication:
from corems_gui import FTICRMainWindow
window = FTICRMainWindow()
window.show()

Check the version

import corems_gui
print(corems_gui.__version__)  # 0.1.0

Version management

This project uses bump-my-version to keep version numbers consistent across pyproject.toml and corems_gui/__init__.py.

Install the tool:

pip install bump-my-version

Bump the version:

# patch: 0.1.0 → 0.1.1
bump-my-version bump patch

# minor: 0.1.0 → 0.2.0
bump-my-version bump minor

# major: 0.1.0 → 1.0.0
bump-my-version bump major

Each bump updates both files, creates a commit, and tags the release (e.g. v0.1.1).


Workflow

┌──────────────────────────────────────────────────────────────┐
│  MenuBar  (File · Help)                                      │
├──────────────────┬───────────────────────────────────────────┤
│ Step tabs        │  Spectrum plot  (Matplotlib + nav toolbar) │
│  1. Import       │  ─────────────────────────────────────────│
│  2. Calibrate    │  Results table  (sortable / filterable)    │
│  3. Search       │   + Processing log                        │
│  4. Export       │                                           │
├──────────────────┴───────────────────────────────────────────┤
│  Status bar                          [ progress indicator ]  │
└──────────────────────────────────────────────────────────────┘

Step 1 — Import

Select a Bruker Solarix .d directory. Configure:

  • Apodization method and zero-fill / truncation counts
  • Noise threshold method and sensitivity parameters
  • Peak-picking m/z range and minimum prominence

Click Import & Process to load the transient and build the mass spectrum.

Step 2 — Calibrate

Select a reference mass list (.ref format, two-column: Formula m/z). Configure:

  • PPM error window for calibrant matching
  • Polynomial order (1 = linear, 2 = quadratic)
  • Minimisation method and signal-to-noise threshold for calibrant peaks

Click Run Calibration to apply m/z domain calibration.

Step 3 — Search

Configure the molecular formula search:

  • PPM error tolerance and DBE (double bond equivalent) range
  • Ion types: protonated [M±H], radical [M]•, or adduct
  • Per-atom min/max ranges for C, H, O, N, S, P
  • Scoring method and first-hit mode

Click Search Molecular Formulas to assign formulas to every peak.

Step 4 — Export

Choose one or more output formats and a filename stem, then click Export Results:

Format Description
.csv Flat tabular results, Excel-compatible
.hdf5 Full CoreMS HDF5 archive (metadata + spectra)
.xlsx Excel workbook via pandas

A live Spectrum Summary shows peak count, assignment rate, m/z range, and baseline noise.


Plot Types

View Description
Mass Spectrum All peaks (blue) with assigned peaks highlighted (red)
Assigned vs. Unassigned Bar chart of peak assignment counts
DBE vs. C# Scatter plot of double-bond equivalent vs carbon number
PPM Error Distribution Histogram of formula assignment errors

Agent Chatbot

The Agent tab (right panel) embeds an OpenAI-compatible LLM assistant (default: PNNL AI Incubator Depot) that can:

  • Answer questions about the currently loaded spectrum using a live peak table passed as context.
  • Execute local workflow steps (Import, Calibrate, Search, Export) directly from the chat, with optional parameter overrides.
  • Call remote tools via the CoreMS MCP server to query the EMSL database or submit server-side processing jobs.

Setup

  1. Install dependencies (included in pip install -e .):

    pip install openai 'mcp[cli]>=1.8' PyJWT
  2. Start the CoreMS MCP server (from mcp/):

    cd /path/to/corems-app/mcp
    python server.py
  3. In the Agent tab — Agent Settings:

    • Set LLM API Key (or set the LLM_API_KEY environment variable)
    • Set LLM Base URL (default: https://ai-incubator-api.pnnl.gov)
    • Set Model — any model available on the endpoint, e.g. gpt-4o-birthright
    • Set MCP URL to http://localhost:8811/mcp (default)
  4. To authenticate against protected MCP tools, expand Generate Auth Token:

    • Enter the server's Secret Key (matches SECRET_KEY in the CoreMS API config)
    • Optionally adjust User ID, First/Last Name, Email
    • Click Generate Token — a local HS256 JWT is created and filled into the Auth Token field automatically (same algorithm as ftms_monet_etl)

GUI Action Tools

The agent has five built-in tools that operate directly on the local GUI — no MCP server required:

Tool Description
gui_get_state() Return a JSON snapshot of the current panel settings and loaded spectrum. The agent calls this first to confirm parameters before running any step.
gui_run_import(...) Trigger Step 1 — Import with optional parameter overrides (file path, apodization, noise method, m/z range, etc.).
gui_run_calibrate(...) Trigger Step 2 — Calibrate with optional overrides (reference file, PPM window, polynomial order, etc.).
gui_run_search(...) Trigger Step 3 — Search with optional overrides (PPM tolerance, DBE range, ion types, atom ranges, etc.).
gui_run_export(...) Trigger Step 4 — Export with optional overrides (output path, formats).

All parameters are optional — omitted ones use the current panel values. The agent always asks for confirmation before executing a step unless explicitly instructed to proceed.

MCP tool categories

Category Auth required Description
MonetResult queries No Query processed results from the EMSL database
FTMS data & parameters Yes (JWT) Register files, manage parameter sets
FTMS workflows Yes (JWT) Submit QC and DI molecular formula jobs
GCMS data & parameters Yes (JWT) Register GC-MS files, manage parameter sets
GCMS workflows Yes (JWT) Submit low-resolution GC-MS peak-picking jobs
MinIO storage Yes (MinIO creds) Generate presigned upload/download URLs

Example prompts

Local processing:

  • "Run the import step with the file I have selected."
  • "Search for molecular formulas using a ±2 ppm window and CHO only."
  • "Run the full workflow — import, calibrate, search, and export."

Data analysis:

  • "I have 3500 peaks and 60% assigned — what should I check?"
  • "What is the compound class distribution for the current spectrum?"
  • "Plot DBE vs C# for the assigned peaks."

Server / database (requires MCP server):

  • "What proposals are available in the database?"
  • "List all FTMS results for proposal P12345."
  • "Submit a DI workflow for data IDs 42 and 43 using parameter set 7."

Module Structure

corems_gui/
├── __init__.py          ← main() entry point and public API
├── __main__.py          ← enables  python -m corems_gui
├── _constants.py        ← shared enumerations (method lists, column names…)
├── _helpers.py          ← Qt widget factory functions (int_spin, combo…)
├── app.py               ← FTICRMainWindow  (QMainWindow)
├── canvas.py            ← SpectrumCanvas   (Matplotlib + Qt toolbar)
├── models.py            ← PeaksModel / SortFilterPeaksModel
├── workers.py           ← ImportWorker, CalibrationWorker, SearchWorker, ExportWorker, ChatWorker
└── panels/
    ├── __init__.py
    ├── import_panel.py        ← Step 1 form widget
    ├── calibration_panel.py   ← Step 2 form widget
    ├── search_panel.py        ← Step 3 form widget (atom ranges table)
    ├── export_panel.py        ← Step 4 form widget + spectrum summary
    └── chat_panel.py          ← Agent chatbot (Chat / Settings / Auth Token tabs)

Each *Panel is a self-contained QWidget that emits a run_requested(dict) signal — it has no direct dependency on CoreMS. All CoreMS calls are isolated inside workers.py. ChatWorker runs the agentic tool-use loop (LLM + MCP + GUI actions) in a QThread.


Supported Data Formats

Input Extension Notes
Bruker Solarix transient .d directory Reads fid/ser + apexAcquisition.method
Thermo Fisher RAW .raw Reads via CoreMS Thermo reader
Reference mass list .ref Two columns: Formula and m/z

Requirements

See requirements.txt for runtime dependencies. Key dependencies:

Package Version Purpose
PySide6 ≥ 6.5.0 Qt 6 bindings for the GUI
matplotlib ≥ 3.7.0 Embedded spectrum plots
pandas ≥ 1.5.0 Results table and Excel export
openpyxl ≥ 3.1.0 .xlsx file writing
openai ≥ 1.30.0 OpenAI-compatible LLM client for the agent chatbot
mcp[cli] ≥ 1.8 MCP client SDK (Streamable-HTTP transport)
httpx ≥ 0.27 Async HTTP (transitive dep of mcp[cli])
PyJWT ≥ 2.8 Local HS256 JWT generation for CoreMS API auth
Development extras:
pip install bump-my-version

Release Notes

v0.1.0 — Initial Release (2026-06-23)

First public release of CoreMS Orchestrator.

Direct Infusion FT-ICR MS Workflow

  • Four-step processing pipeline: Import → Calibrate → Search → Export
  • Bruker SolarIX (.d) and Thermo Scientific (.raw) file support
  • Configurable noise thresholding (log, minima, S/N, relative, absolute)
  • Reference-mass m/z domain calibration with polynomial regression (1st–3rd order)
  • Bayesian-scored molecular formula assignment with full periodic table element support
  • Multi-format export: CSV, HDF5, Excel (.xlsx)

LC-MS DDA Workflow with Molecular Networking

  • Persistent-homology peak picking for mass feature detection
  • Automated MS1/MS2 spectral association (centroid and profile mode auto-detection)
  • Molecular formula search on LC-MS mass features (SearchMolecularFormulasLC)
  • Element-based mass feature filtering (e.g. Fe for siderophore discovery)
  • FlashEntropy spectral library construction from MSP files
  • Molecular networking: open search and neutral loss search types
  • Entropy and cosine similarity matrices with greedy modularity clustering
  • Interactive HTML network visualizations and edge list/matrix export

Embedded AI Agent

  • Conversational LLM agent with real-time access to live spectral state
  • Six local GUI tools: gui_get_state, gui_run_import, gui_run_calibrate, gui_run_search, gui_run_export, gui_run_lcms
  • Remote MCP server integration for database queries, workflow submission, and object storage
  • Compatible with any OpenAI-compatible LLM endpoint (GPT, Claude, Grok, o-series)
  • Automatic retry with exponential backoff on transient API failures
  • Structured concurrency bridge (Qt signals ↔ asyncio ↔ anyio/MCP)
  • Built-in JWT token generator for authenticated MCP server operations

Desktop GUI

  • PySide6 (Qt 6) responsive UI with non-blocking QThread workers
  • Live Matplotlib spectrum visualization with multiple plot types
  • Sortable/filterable results table with column-click sorting
  • Persistent per-panel settings via QSettings
  • Dark-themed high-contrast agent chat interface

License

This material is free to use, and attribution is always appreciated.  Attribution may read as follows:

Authored by Yuri E. Corilo at the Pacific Northwest National Laboratory, operated by Battelle for the U.S. Department of Energy.

Please cite the following in your work: Yuri Corilo. (2026). EMSL-Computing/CoreMS-Orchestrator: CoreMS Orchestrator version 0.1.0 (0.1.0). Zenodo. https://doi.org/10.5281/zenodo.20821747

About

CoreMS Orchestrator is an AI-augmented desktop application for automated processing, molecular formula assignment, and intelligent interpretation of Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometry data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors