massSight is an R package for combining and scaling LC-MS metabolomics data. It enables alignment and integration of metabolomics data from multiple experiments by correcting systematic differences in retention time and mass-to-charge ratios.
- Citation: If you use massSight, please cite our manuscript: Chiraag Gohel and Ali Rahnavard. (2023). massSight: Metabolomics meta-analysis through multi-study data scaling, integration, and harmonization. https://github.com/omicsEye/massSight
Input Data Format
massSight works with LC-MS data frames. The compound ID, retention time, and mass-to-charge ratio columns are required; the columns marked optional may be omitted:
- Compound ID - Unique identifier for each feature
- Retention Time (RT) - The retention time in minutes
- Mass to Charge Ratio (MZ) - The mass-to-charge ratio
- Intensity (Optional) - Average intensity across samples
- Metabolite Name (Optional) - Known metabolite annotations
Example input data format:
Compound_ID | MZ | RT | Intensity | Metabolite |
---|---|---|---|---|
1.69_121.1014m/z | 121.1014 | 1.69 | 40329.32 | 1.2.4-trimethylbenzene |
3.57_197.0669m/z | 197.0669 | 3.57 | 117400.93 | 1,7-dimethyluric acid |
7.74_282.1194m/z | 282.1194 | 7.74 | 16491.00 | 1-methyladenosine |
5.27_166.0723m/z | 166.0723 | 5.27 | 22801.91 | 1-methylguanine |
5.12_298.1143m/z | 298.1143 | 5.12 | 41602.96 | 1-methylguanosine |
9.58_126.1028m/z | 126.1028 | 9.58 | 3004.32 | 1-methylhistamine |
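Before creating massSight objects, it can help to confirm that a data frame has the required columns. The following is a minimal base-R sketch, not part of the massSight API: `check_ms_columns` is a hypothetical helper, and the column names are assumed to match the example table above.

```r
# Hypothetical helper (not part of massSight): return the names of any
# required columns that are missing from an LC-MS data frame.
check_ms_columns <- function(df,
                             required = c("Compound_ID", "MZ", "RT")) {
  setdiff(required, names(df))
}

# Two rows copied from the example table above.
hp_example <- data.frame(
  Compound_ID = c("1.69_121.1014m/z", "3.57_197.0669m/z"),
  MZ = c(121.1014, 197.0669),
  RT = c(1.69, 3.57),
  Intensity = c(40329.32, 117400.93),
  Metabolite = c("1.2.4-trimethylbenzene", "1,7-dimethyluric acid"),
  stringsAsFactors = FALSE
)

missing_cols <- check_ms_columns(hp_example)
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# Basic sanity checks: IDs are unique, RT and MZ are numeric.
stopifnot(
  !anyDuplicated(hp_example$Compound_ID),
  is.numeric(hp_example$RT),
  is.numeric(hp_example$MZ)
)
```

If your columns use different names, there is no need to rename them: the `id_name`, `rt_name`, and `mz_name` arguments of `create_ms_obj` take the column names as they appear in your data.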
Usage
1. Create massSight Objects
First, convert your LC-MS data frames into MSObjects using create_ms_obj():
ms1 <- create_ms_obj(
  df = hp1,
  name = "hp1",
  id_name = "Compound_ID",    # Column name for compound IDs
  rt_name = "RT",             # Column name for retention time
  mz_name = "MZ",             # Column name for mass-to-charge ratio
  int_name = "Intensity",     # Column name for intensity (optional)
  metab_name = "Metabolite"   # Column name for metabolite names (optional)
)

ms2 <- create_ms_obj(
  df = hp2,
  name = "hp2",
  id_name = "Compound_ID",
  rt_name = "RT",
  mz_name = "MZ",
  int_name = "Intensity",
  metab_name = "Metabolite"
)
2. Align Datasets
Use mass_combine() to align the datasets. The function offers two main approaches:
A. Automatic Parameter Optimization (Recommended)
aligned <- mass_combine(
  ms1,                    # Reference dataset
  ms2,                    # Dataset to align
  optimize = TRUE,        # Enable automatic parameter optimization
  smooth_method = "gam",  # Method for drift correction
  n_iter = 50             # Number of optimization iterations
)
#> Optimizing parameters using Bayesian optimization...
#> Initializing optimization...
#>
#> Target score achieved! Stopping optimization.
#> Optimization complete. Final score: 1.000
#>
#> Optimal parameters:
#> RT delta: 0.690
#> MZ delta: 7.781
#> RT isolation threshold: 0.066
#> MZ isolation threshold: 4.336
#> Alpha rank: -1.173
#> Alpha RT: -0.568
#> Alpha MZ: -1.835
B. Manual Parameter Setting
aligned <- mass_combine(
  ms1,
  ms2,
  optimize = FALSE,
  rt_delta = 0.5,          # RT window (±minutes)
  mz_delta = 15,           # MZ window (±ppm)
  minimum_intensity = 10,  # Minimum intensity threshold
  smooth_method = "gam"    # Drift correction method
)
#> GAM smoothing for RT drift
#> Starting mass error correction
#> GAM smoothing for mass error
#> Creating potential final matches
#> Calculating match scores
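Because mz_delta is given in ppm, the absolute m/z window grows with mass. The base-R sketch below works through the arithmetic for a 15 ppm tolerance using m/z values from the example input table. `ppm_to_da` is a hypothetical helper written for illustration; it is not a massSight function and does not necessarily reflect how massSight computes its windows internally.

```r
# Hypothetical helper: convert a ppm tolerance into an absolute m/z
# tolerance (in Da) at a given m/z.
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

# m/z values taken from the example input table.
mz_values <- c(121.1014, 197.0669, 298.1143)
tol <- ppm_to_da(mz_values, ppm = 15)

# Lower and upper bounds of the +/- 15 ppm window around each m/z.
data.frame(
  MZ = mz_values,
  tol_da = tol,
  lower = mz_values - tol,
  upper = mz_values + tol
)
```

At 15 ppm the tolerance is about 0.0018 Da at m/z 121.1014 and about 0.0045 Da at m/z 298.1143, so a fixed ppm window is tighter in absolute terms for lighter ions.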
3. Access Results
The alignment results can be accessed in several ways:
# Get all matched features
matches <- all_matched(aligned)
# Get unique 1:1 matches
unique_matches <- get_unique_matches(aligned)
4. Visualize Results
Generate diagnostic plots to assess alignment quality:
final_plots(aligned)
Images can be saved using ggplot2::ggsave():
ggplot2::ggsave("alignment_diagnostics.png", plot = final_plots(aligned), width = 10, height = 10)
Key Parameters
- optimize: When TRUE, uses Bayesian optimization to find optimal alignment parameters
- rt_delta: Retention time window for matching (in minutes)
- mz_delta: Mass-to-charge ratio window for matching (in ppm)
- smooth_method: Method for drift correction ("gam", "bayesian_gam", "gp", or "lm")
- match_method: Strategy for initial matching ("unsupervised" or "supervised")
- minimum_intensity: Minimum intensity threshold for features
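To give a sense of what drift correction means, here is a toy base-R sketch of the simplest case, a linear fit. It reuses the RT values from the example input table as the reference and applies an assumed, synthetic linear drift to produce the second set. It only illustrates the idea; it is not massSight's implementation, and all variable names are made up.

```r
# Reference retention times (minutes), taken from the example input table.
rt_ref <- c(1.69, 3.57, 5.12, 5.27, 7.74, 9.58)

# Synthetic second experiment: the same features with an assumed
# linear drift (offset 0.2 min, 5% stretch).
rt_obs <- 0.2 + 1.05 * rt_ref

# Fit the reference RT as a linear function of the observed RT.
fit <- lm(rt_ref ~ rt_obs)

# Map the observed RTs onto the reference scale.
rt_corrected <- unname(predict(fit, newdata = data.frame(rt_obs = rt_obs)))

# Largest RT difference before and after correction.
max(abs(rt_obs - rt_ref))
max(abs(rt_corrected - rt_ref))
```

Real drift is rarely exactly linear, which is presumably why more flexible smoothers such as "gam" are offered alongside "lm".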
Output Format
The aligned results contain:
- Matched Features: All corresponding features between datasets
- Drift Corrections: Systematic differences in RT and MZ
- Quality Metrics: Alignment evaluation scores
- Diagnostic Plots: Visualization of RT and MZ drift
Examples and Documentation
For more detailed examples and extensive documentation, visit our documentation site.