Batch Alignment and Scaling for LC-MS Data • massSight

Installation
Input Data Format
Usage
Key Parameters
Output Format
Examples and Documentation

massSight is an R package for combining and scaling LC-MS metabolomics data. It enables alignment and integration of metabolomics data from multiple experiments by correcting systematic differences in retention time and mass-to-charge ratios.

Citation: if you use massSight, please cite our manuscript: Chiraag Gohel and Ali Rahnavard. (2023). massSight: Metabolomics meta-analysis through multi-study data scaling, integration, and harmonization. https://github.com/omicsEye/massSight

Installation

# install.packages("pak")  # run first if pak is not already installed
pak::pak("omicsEye/massSight")

You can then load the library using:

library(massSight)

Input Data Format

massSight works with LC-MS data frames. Each data frame must contain the first three columns below; the columns marked optional can be omitted:

Compound ID - Unique identifier for each feature
Retention Time (RT) - The retention time in minutes
Mass to Charge Ratio (MZ) - The mass-to-charge ratio
Intensity (Optional) - Average intensity across samples
Metabolite Name (Optional) - Known metabolite annotations

Example input data format:

Compound_ID	MZ	RT	Intensity	Metabolite
1.69_121.1014m/z	121.1014	1.69	40329.32	1,2,4-trimethylbenzene
3.57_197.0669m/z	197.0669	3.57	117400.93	1,7-dimethyluric acid
7.74_282.1194m/z	282.1194	7.74	16491.00	1-methyladenosine
5.27_166.0723m/z	166.0723	5.27	22801.91	1-methylguanine
5.12_298.1143m/z	298.1143	5.12	41602.96	1-methylguanosine
9.58_126.1028m/z	126.1028	9.58	3004.32	1-methylhistamine
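
As a rough illustration of this layout, the sketch below parses two rows of the example table into a data frame with base R and checks that the three required columns are present. The helper `check_required_columns()` is hypothetical and not part of massSight; the column names simply follow the example above.

```r
# Two rows of the example table as tab-separated text
raw <- paste(
  "Compound_ID\tMZ\tRT\tIntensity\tMetabolite",
  "3.57_197.0669m/z\t197.0669\t3.57\t117400.93\t1,7-dimethyluric acid",
  "7.74_282.1194m/z\t282.1194\t7.74\t16491.00\t1-methyladenosine",
  sep = "\n"
)

# quote = "" and comment.char = "" keep metabolite names containing
# quotes or '#' characters intact
hp_example <- read.delim(text = raw, sep = "\t", quote = "",
                         comment.char = "", stringsAsFactors = FALSE)

# Hypothetical helper (not part of massSight): stop if a required column is missing
check_required_columns <- function(df, required = c("Compound_ID", "RT", "MZ")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_required_columns(hp_example)
str(hp_example)
```

If your own columns are named differently, there is no need to rename them: the `id_name`, `rt_name`, and `mz_name` arguments of `create_ms_obj` (see Usage) tell massSight which column is which.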

Usage

1. Create massSight Objects

First, convert your LC-MS data frames (here hp1 and hp2, each in the format shown above) into MSObjects using create_ms_obj:

ms1 <- create_ms_obj(
    df = hp1,
    name = "hp1",
    id_name = "Compound_ID",  # Column name for compound IDs
    rt_name = "RT",           # Column name for retention time
    mz_name = "MZ",           # Column name for mass-to-charge ratio
    int_name = "Intensity",   # Column name for intensity (optional)
    metab_name = "Metabolite" # Column name for metabolite names (optional)
)

ms2 <- create_ms_obj(
    df = hp2,
    name = "hp2",
    id_name = "Compound_ID",
    rt_name = "RT", 
    mz_name = "MZ",
    int_name = "Intensity",
    metab_name = "Metabolite"
)

2. Align Datasets

Use mass_combine() to align the datasets. The function offers two main approaches:

A. Automatic Parameter Optimization (Recommended)

aligned <- mass_combine(
    ms1,                    # Reference dataset
    ms2,                    # Dataset to align
    optimize = TRUE,        # Enable automatic parameter optimization
    smooth_method = "gam",  # Method for drift correction
    n_iter = 50             # Number of optimization iterations
)
#> Optimizing parameters using Bayesian optimization...
#> Initializing optimization...
#> 
#> Target score achieved! Stopping optimization.
#> Optimization complete. Final score: 1.000
#> 
#> Optimal parameters:
#>   RT delta: 0.690
#>   MZ delta: 7.781
#>   RT isolation threshold: 0.066
#>   MZ isolation threshold: 4.336
#>   Alpha rank: -1.173
#>   Alpha RT: -0.568
#>   Alpha MZ: -1.835

B. Manual Parameter Setting

aligned <- mass_combine(
    ms1,
    ms2,
    optimize = FALSE,
    rt_delta = 0.5,        # RT window (±minutes)
    mz_delta = 15,         # MZ window (±ppm)
    minimum_intensity = 10, # Minimum intensity threshold
    smooth_method = "gam"  # Drift correction method
)
#> GAM smoothing for RT drift
#> Starting mass error correction
#> GAM smoothing for mass error
#> Creating potential final matches
#> Calculating match scores
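
Because `mz_delta` is given in ppm while `rt_delta` is in minutes, the absolute m/z tolerance grows with the mass. The arithmetic below is a minimal sketch of that conversion using the standard ppm definition. `ppm_to_mz()` is a hypothetical helper, not a massSight function, and how massSight applies the window internally is not specified here.

```r
# Hypothetical helper: absolute m/z tolerance for a given ppm window
ppm_to_mz <- function(mz, ppm) mz * ppm / 1e6

# m/z values taken from the example input table
mz_values <- c(121.1014, 197.0669, 298.1143)
tolerance <- ppm_to_mz(mz_values, ppm = 15)

# Lower and upper m/z bounds implied by a ±15 ppm window
data.frame(
  MZ = mz_values,
  tol = tolerance,
  lower = mz_values - tolerance,
  upper = mz_values + tolerance
)
```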

3. Access Results

The alignment results can be accessed in several ways:

# Get all matched features
matches <- all_matched(aligned)
# Get unique 1:1 matches
unique_matches <- get_unique_matches(aligned)
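
To make the distinction between all matches and 1:1 matches concrete, the following base R sketch filters a toy match table down to pairs in which each feature appears exactly once on both sides. The column names `id_ms1` and `id_ms2` and the filtering rule are assumptions for illustration; the actual columns returned by `all_matched()` and the logic of `get_unique_matches()` may differ.

```r
# Toy match table (hypothetical column names, not massSight output)
toy_matches <- data.frame(
  id_ms1 = c("A", "B", "B", "C", "D"),
  id_ms2 = c("a", "b", "c", "d", "d"),
  stringsAsFactors = FALSE
)

# Count how often each feature ID occurs on each side
n_ms1 <- as.vector(table(toy_matches$id_ms1)[toy_matches$id_ms1])
n_ms2 <- as.vector(table(toy_matches$id_ms2)[toy_matches$id_ms2])

# Keep only rows whose IDs are unique on both sides (1:1 matches)
one_to_one <- toy_matches[n_ms1 == 1 & n_ms2 == 1, ]
one_to_one
```

In this toy table, "B" matches two features and "d" is matched by two features, so those rows are ambiguous and are dropped.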

4. Visualize Results

Generate diagnostic plots to assess alignment quality:

final_plots(aligned)

Images can be saved using ggplot2::ggsave().

ggplot2::ggsave("alignment_diagnostics.png", plot = final_plots(aligned), width = 10, height = 10)

Key Parameters

optimize: When TRUE, uses Bayesian optimization to find optimal alignment parameters
rt_delta: Retention time window for matching (in minutes)
mz_delta: Mass-to-charge ratio window for matching (in ppm)
smooth_method: Method for drift correction ("gam", "bayesian_gam", "gp", or "lm")
match_method: Strategy for initial matching ("unsupervised" or "supervised")
minimum_intensity: Minimum intensity threshold for features
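
To illustrate how `rt_delta` and `mz_delta` bound a match, the sketch below lists candidate feature pairs between two toy tables whose RT difference is within ±`rt_delta` minutes and whose m/z difference is within ±`mz_delta` ppm. This is a conceptual base R illustration only, not massSight's matching algorithm, which also applies drift correction and match scoring. `candidate_pairs()` and the toy values are hypothetical.

```r
# Toy reference and query feature tables (hypothetical values)
ref <- data.frame(id = c("r1", "r2"),
                  RT = c(3.57, 7.74),
                  MZ = c(197.0669, 282.1194))
qry <- data.frame(id = c("q1", "q2", "q3"),
                  RT = c(3.80, 7.70, 5.27),
                  MZ = c(197.0680, 282.1300, 166.0723))

candidate_pairs <- function(ref, qry, rt_delta = 0.5, mz_delta = 15) {
  # All pairwise combinations of reference and query rows
  idx <- expand.grid(i = seq_len(nrow(ref)), j = seq_len(nrow(qry)))
  # Absolute RT difference in minutes
  rt_diff <- abs(ref$RT[idx$i] - qry$RT[idx$j])
  # Absolute m/z difference in ppm, relative to the reference mass
  ppm_diff <- abs(ref$MZ[idx$i] - qry$MZ[idx$j]) / ref$MZ[idx$i] * 1e6
  keep <- rt_diff <= rt_delta & ppm_diff <= mz_delta
  data.frame(ref_id = ref$id[idx$i[keep]],
             qry_id = qry$id[idx$j[keep]],
             rt_diff = rt_diff[keep],
             ppm_diff = ppm_diff[keep])
}

candidate_pairs(ref, qry)                 # default windows
candidate_pairs(ref, qry, mz_delta = 50)  # wider m/z window admits more pairs
```

Tighter windows reduce false candidates but can miss true matches when drift between studies is large, which is one reason to consider the automatic optimization described under Usage.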

Output Format

The aligned results contain:

Matched Features: All corresponding features between datasets
Drift Corrections: Systematic differences in RT and MZ
Quality Metrics: Alignment evaluation scores
Diagnostic Plots: Visualization of RT and MZ drift

Examples and Documentation

For more detailed examples and extensive documentation, visit our documentation site.