massSight is an R package for combining and scaling LC-MS metabolomics data.
- Citation: if you use massSight, please cite our manuscript: Chiraag Gohel and Ali Rahnavard. (2023). massSight: Metabolomics meta-analysis through multi-study data scaling, integration, and harmonization. https://github.com/omicsEye/massSight
Examples
Examples and extensive documentation can be found here
Installation
First, if you don’t have it installed, install devtools using:
install.packages("devtools")
Then, in an R console, run:
devtools::install_github("omicsEye/massSight")
You can then load the library using:
library(massSight)
Data Preparation
massSight works with the output of LC-MS experiments, which should contain columns corresponding to:
- Compound ID
- Retention Time
- Mass to Charge Ratio
- (Optional) Average Intensity across all samples
- (Optional) Metabolite Name
Compound_ID | MZ | RT | Intensity | Metabolite |
---|---|---|---|---|
1.69_121.1014m/z | 121.1014 | 1.69 | 40329.32 | 1.2.4-trimethylbenzene |
3.57_197.0669m/z | 197.0669 | 3.57 | 117400.93 | 1,7-dimethyluric acid |
7.74_282.1194m/z | 282.1194 | 7.74 | 16491.00 | 1-methyladenosine |
5.27_166.0723m/z | 166.0723 | 5.27 | 22801.91 | 1-methylguanine |
5.12_298.1143m/z | 298.1143 | 5.12 | 41602.96 | 1-methylguanosine |
9.58_126.1028m/z | 126.1028 | 9.58 | 3004.32 | 1-methylhistamine |
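As a minimal sketch of the layout described above, the following base R snippet builds a small input table from two of the example rows and checks that the required columns are present and numeric where expected. The helper `check_lcms_columns()` is hypothetical and not part of massSight; it only illustrates a sanity check you might run before creating an object.

```r
# Two example rows copied from the table above.
lcms <- data.frame(
  Compound_ID = c("1.69_121.1014m/z", "3.57_197.0669m/z"),
  MZ = c(121.1014, 197.0669),
  RT = c(1.69, 3.57),
  Intensity = c(40329.32, 117400.93),
  Metabolite = c("1.2.4-trimethylbenzene", "1,7-dimethyluric acid"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a massSight function): stops with an error if a
# required column is missing or if RT / MZ are not numeric.
check_lcms_columns <- function(df) {
  required <- c("Compound_ID", "RT", "MZ")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  numeric_ok <- vapply(df[c("RT", "MZ")], is.numeric, logical(1))
  if (!all(numeric_ok)) {
    stop("RT and MZ must be numeric")
  }
  invisible(TRUE)
}

check_lcms_columns(lcms)
```

The column names here follow the example table; your own data may use different names, which are passed to massSight explicitly when the object is created.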
The massSight Object
massSight creates and uses the MSObject class to store data and results pertaining to individual LC-MS experiments. Prior to alignment, LC-MS data frames or tibbles should be converted into an MSObject using create_ms_obj:
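# Load the two example datasets
data(hp1)
data(hp2)
# Wrap each data frame in an MSObject, naming the columns that hold
# the compound ID, retention time, m/z, and intensity
ms1 <- create_ms_obj(
  df = hp1,
  name = "hp1",
  id_name = "Compound_ID",
  rt_name = "RT",
  mz_name = "MZ",
  int_name = "Intensity"
)
ms2 <- create_ms_obj(
  df = hp2,
  name = "hp2",
  id_name = "Compound_ID",
  rt_name = "RT",
  mz_name = "MZ",
  int_name = "Intensity"
)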
An MSObject provides the following functions:
- raw_df() to access the experiment’s raw LC-MS data
- isolated() to access the experiment’s isolated metabolites, which is important for downstream alignment tasks
- scaled_df() to access the experiment’s scaled LC-MS data
- consolidated() to access the experiment’s consolidated data
- metadata() to access the experiment’s metadata
Compound_ID | Metabolite | RT | MZ | Intensity |
---|---|---|---|---|
cmp.3837 | C10 carnitine | 7.261300 | 316.2479 | 638168.92 |
cmp.3903 | C10:2 carnitine | 7.395033 | 312.2165 | 50418.96 |
cmp.3749 | C12 carnitine | 7.074067 | 344.2792 | 203210.69 |
cmp.3756 | C12:1 carnitine | 7.105283 | 342.2635 | 363021.48 |
cmp.3682 | C14 carnitine | 6.926967 | 372.3107 | 93491.07 |
cmp.3705 | C14:2 carnitine | 6.993833 | 368.2792 | 235545.00 |
Alignment
auto_combine()
Alignment is performed using auto_combine():
aligned <- auto_combine(
  ms1,
  ms2,
  rt_lower = -0.5,
  rt_upper = 0.5,
  mz_lower = -15,
  mz_upper = 15,
  smooth_method = "gam",
  log = NULL
)
More information on the auto_combine() function can be found in the package documentation.
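To give a feel for what the window arguments might mean, here is a small base R sketch of a retention time and m/z window check. It is not the auto_combine() implementation. It assumes that the RT bounds are in the same units as the RT column and that the m/z bounds are in parts per million (ppm), which the README does not state; `ppm_diff()` and `in_window()` are hypothetical helpers.

```r
# Hypothetical helper: relative m/z difference in ppm (assumed unit).
ppm_diff <- function(mz_ref, mz_query) {
  (mz_query - mz_ref) / mz_ref * 1e6
}

# Hypothetical helper: TRUE when a query feature falls inside both the
# RT window and the m/z window around a reference feature.
in_window <- function(rt_ref, mz_ref, rt_query, mz_query,
                      rt_lower = -0.5, rt_upper = 0.5,
                      mz_lower = -15, mz_upper = 15) {
  d_rt <- rt_query - rt_ref
  d_mz <- ppm_diff(mz_ref, mz_query)
  d_rt >= rt_lower & d_rt <= rt_upper &
    d_mz >= mz_lower & d_mz <= mz_upper
}

# A feature pair with a small RT shift and a tiny m/z difference
in_window(7.98, 76.0402, 8.193933, 76.04010)

# The same pair with the query RT moved well outside the window
in_window(7.98, 76.0402, 9.0, 76.04010)
```

Under these assumptions the first pair passes both checks and the second fails on retention time. Consult the package documentation for the actual units and semantics of the arguments.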
ml_match()
The ml_match() function is an alternative method for merging LC-MS experiments with semi-annotated data sets.
ml_match_aligned <- ml_match(
  ms1,
  ms2,
  mz_thresh = 15,
  rt_thresh = 0.5,
  prob_thresh = 0.5,
  seed = 72
)
Results
Results from an alignment function are stored as a MergedMSObject. This object provides the following accessor functions:
- all_matched(): All of the final matched metabolites between the two datasets. This is the main result of the various matching functions.
rep_Compound_ID | rep_RT | rep_MZ | rep_Intensity | rep_Metabolite | Compound_ID_hp1 | Compound_ID_hp2 | Metabolite_hp1 | Metabolite_hp2 | RT_hp1 | RT_hp2 | MZ_hp1 | MZ_hp2 | Intensity_hp1 | Intensity_hp2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.63_74.0995m/z | 0.630000 | 74.0995 | 0.40 | | 0.63_74.0995m/z | cmp.1 | | | 0.63 | 1.003550 | 74.0995 | 74.09722 | 0.40 | 5683858.19 |
cmp.10 | 1.003550 | 429.4048 | 14695.70 | | NA | cmp.10 | NA | | NA | 1.003550 | NA | 429.40477 | NA | 14695.70 |
1.14_262.0378m/z | 1.140000 | 262.0378 | 17383.20 | | 1.14_262.0378m/z | cmp.100 | | | 1.14 | 1.199700 | 262.0378 | 262.03783 | 17383.20 | 20343.60 |
cmp.1000 | 1.918017 | 540.5345 | 38459.42 | | NA | cmp.1000 | NA | | NA | 1.918017 | NA | 540.53450 | NA | 38459.42 |
1.74_348.0645m/z | 1.740000 | 348.0645 | 59393.19 | | 1.74_348.0645m/z | cmp.1001 | | | 1.74 | 1.918017 | 348.0645 | 348.06456 | 59393.19 | 41235.47 |
1.86_478.1970m/z | 1.860000 | 478.1970 | 132193.24 | | 1.86_478.1970m/z | cmp.1002 | | | 1.86 | 1.918017 | 478.1970 | 478.19718 | 132193.24 | 35052.10 |
- iso_matched(): The matched isolated metabolites between the two datasets.
df1 | RT | MZ | Intensity | df2 | RT_2 | MZ_2 | Intensity_2 | delta_RT | smooth_rt | srt | delta_MZ | smooth_mz | smz | sintensity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7.98_76.0402m/z | 7.98 | 76.0402 | 13067.16 | cmp.4356 | 8.193933 | 76.04010 | 27158.674 | 0.2139333 | 0.1462600 | 7.848138 | -0.0001027 | -9.2e-06 | 76.04021 | 7832.970 |
8.14_77.0799m/z | 8.14 | 77.0799 | 9291.18 | cmp.4388 | 8.283100 | 77.07982 | 11295.184 | 0.1431000 | 0.1617242 | 7.992453 | -0.0000786 | -8.9e-06 | 77.07991 | 3625.042 |
4.79_79.0220m/z | 4.79 | 79.0220 | 2356.37 | cmp.2649 | 4.910117 | 79.02198 | 6818.932 | 0.1201167 | 0.1440289 | 4.627987 | -0.0000210 | -8.1e-06 | 79.02201 | 2327.172 |
7.50_80.1318m/z | 7.50 | 80.1318 | 84942.05 | cmp.4025 | 7.635767 | 80.13173 | 11765.610 | 0.1357667 | 0.0897390 | 7.419991 | -0.0000662 | -7.7e-06 | 80.13181 | 3757.301 |
7.86_84.9120m/z | 7.86 | 84.9120 | 10189.64 | cmp.4247 | 8.015617 | 84.91193 | 9442.532 | 0.1556167 | 0.1332109 | 7.740726 | -0.0000657 | -6.0e-06 | 84.91201 | 3097.303 |
8.75_84.9605m/z | 8.75 | 84.9605 | 240071.83 | cmp.4584 | 8.978550 | 84.96037 | 287055.188 | 0.2285500 | 0.1999628 | 8.559125 | -0.0001285 | -5.9e-06 | 84.96051 | 62125.356 |
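The delta_RT and delta_MZ columns in this table appear to be simple differences between the second and first dataset. The sketch below recomputes them in base R from the first three displayed rows. This reading is inferred from the printed values, not taken from the package source, and the remaining columns (smooth_rt, srt, smooth_mz, smz, sintensity) are not reconstructed here.

```r
# First three rows of the table above (displayed, rounded values)
iso <- data.frame(
  RT   = c(7.98, 8.14, 4.79),
  RT_2 = c(8.193933, 8.283100, 4.910117),
  MZ   = c(76.0402, 77.0799, 79.0220),
  MZ_2 = c(76.04010, 77.07982, 79.02198)
)

# Assumed definitions: second dataset minus first dataset
iso$delta_RT <- iso$RT_2 - iso$RT
iso$delta_MZ <- iso$MZ_2 - iso$MZ

iso[, c("delta_RT", "delta_MZ")]
```

The recomputed delta_RT values agree with the table to the displayed precision. The delta_MZ values agree only approximately, because MZ_2 is printed with fewer digits than were used to compute the original column.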
Plotting results from alignment
The final_plots() function returns plots showing RT and MZ drift for the pre-isolation, isolation, and final matching results. These plots can be used for diagnostic purposes.
plots <- final_plots(
  aligned,
  rt_lim = c(-0.5, 0.5),
  mz_lim = c(-15, 15)
)
plots
This plot can be saved locally using ggsave() from the ggplot2 package:
ggplot2::ggsave(
  filename = "plot.png",
  plot = plots
)
Using massSight to annotate unknown metabolites
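# Extract the table of matched features
merged_df <- all_matched(aligned)
# Attach the representative metabolite name to each hp2 feature
hp2_annotated <- merged_df |>
  dplyr::select(Compound_ID_hp2, rep_Metabolite) |>
  dplyr::inner_join(hp2, by = c("Compound_ID_hp2" = "Compound_ID"))
# Show the first ten features that received a name
hp2_annotated |>
  dplyr::filter(rep_Metabolite != "") |>
  dplyr::arrange(rep_Metabolite) |>
  head(10) |>
  knitr::kable()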
Compound_ID_hp2 | rep_Metabolite | Metabolite | RT | MZ | Intensity |
---|---|---|---|---|---|
cmp.2157 | 1,7-dimethyluric acid | | 3.817950 | 197.0669 | 140175.90 |
cmp.4168 | 1-methyladenosine | | 7.864050 | 282.1194 | 43167.05 |
cmp.2810 | 1-methylguanine | | 5.361300 | 166.0723 | 25898.35 |
cmp.2782 | 1-methylguanosine | | 5.267700 | 298.1145 | 70138.72 |
cmp.785 | 1.2.4-trimethylbenzene | | 1.805967 | 121.1014 | 13453.95 |
cmp.2750 | 3-(N-acetyl-L-cystein-S-yl) acetaminophen | | 5.124100 | 313.0850 | 38069.50 |
cmp.4740 | 3-methylhistidine | | 10.289133 | 170.0923 | 490315.34 |
cmp.2091 | 4-acetamidobutanoate | | 3.595050 | 146.0811 | 78624.40 |
cmp.2828 | 5-acetylamino-6-amino-3-methyluracil | | 5.424667 | 199.0824 | 29391.98 |
cmp.2446 | 6.8-dihydroxypurine | | 4.544583 | 153.0407 | 10692.24 |
Here, rep_Metabolite is the metabolite name from the reference dataset.
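The annotation step is essentially an inner join followed by a filter. As a rough, self-contained illustration, the sketch below reproduces that logic in base R on toy data, using merge() in place of dplyr::inner_join(). The data frames `matched` and `study2` are hypothetical stand-ins for the matched results and the second dataset, and the row `cmp.9999` is invented to show an unmatched feature being dropped.

```r
# Toy stand-in for the matched results: hp2 IDs plus representative names
matched <- data.frame(
  Compound_ID_hp2 = c("cmp.2157", "cmp.4168", "cmp.10"),
  rep_Metabolite  = c("1,7-dimethyluric acid", "1-methyladenosine", ""),
  stringsAsFactors = FALSE
)

# Toy stand-in for the second dataset; cmp.9999 is an invented feature
# with no match
study2 <- data.frame(
  Compound_ID = c("cmp.2157", "cmp.4168", "cmp.10", "cmp.9999"),
  RT = c(3.817950, 7.864050, 1.003550, 2.5),
  MZ = c(197.0669, 282.1194, 429.4048, 300.0),
  stringsAsFactors = FALSE
)

# Inner join on the compound ID (merge() keeps only matching rows by default)
annotated <- merge(matched, study2,
                   by.x = "Compound_ID_hp2", by.y = "Compound_ID")

# Keep only features that actually received a name
has_name <- !is.na(annotated$rep_Metabolite) & annotated$rep_Metabolite != ""
annotated <- annotated[has_name, ]

annotated
```

Only the two features with a non-empty representative name remain. The unmatched feature is removed by the join and the unnamed one by the filter.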
Dev Instructions
Installation
- Clone/pull massSight
- Open the R project massSight.Rproj
- Build package using devtools::build()
- Install package using devtools::install()