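1. School of Physics and Optoelectronic Engineering, Yangtze University, Jingzhou, Hubei 434023
2. Shenzhen Aiwan Medical Laboratory (深圳爱湾医学检验实验室), Shenzhen Rare Disease Metabolomics Precision Medicine Engineering Research Center, Shenzhen, Guangdong 518000
3. Shenzhen Hospital, University of Chinese Academy of Sciences, Shenzhen, Guangdong 518000
YANG Qin, PhD, Associate Professor. Research interests: chemometric methods and their application to disease detection and mechanism analysis. E-mail: yangqin00055@hnu.edu.cn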
XIAO Wen, NIU Qian-qian, SUN Zhi-yong, et al. Early Detection of Glutaric Acidemia Type I by Urinary Metabolomics Analysis Based on Gas Chromatography-Mass Spectrometry Coupled with Chemometrics[J]. Journal of Instrumental Analysis (分析测试学报), 2022, 41(11): 1577-1583. DOI: 10.19969/j.fxcsxb.22042101.
An early detection framework for glutaric acidemia type Ⅰ (GA-Ⅰ) was developed from urinary metabolomic profiles acquired by gas chromatography-mass spectrometry (GC-MS) and analyzed with chemometrics, with the aim of building a high-performance model from high-dimensional, small-sample data. The framework exploits the ability of partial least squares discriminant analysis (PLS-DA) to handle collinearity and to support data interpretation. Bootstrap resampling (BS) perturbs the data to induce multiple base models and integrates their variable selection results, so that only variables selected consistently across the base models are retained as stable features; the resulting algorithm is denoted BS-PLSDA. Three informative vectors were examined: loading weights (LW), variable importance in the projection (VIP) and significance multivariate correlation (sMC). For the GC-MS urinary metabolomic profiles of GA-Ⅰ, all three BS-PLSDA variants selected features more stably than the corresponding single-model PLS-DA under both sample partitioning ratios, 7∶3 and 6∶4, the latter increasing the sample differences among training sets. At a partitioning ratio of 7∶3, the Kuncheva index of BS-VIP-PLSDA reached 0.807 5. The stable features agreed with diagnostic markers reported in the literature, including several organic acids, and were closely related to the metabolic mechanism of GA-Ⅰ. They also gave good predictive performance: the mean areas under the receiver operating characteristic curve (AUC) were 0.773 9, 0.854 8 and 0.847 1, and the mean Matthews correlation coefficients (MCC) were 0.671 9, 0.783 8 and 0.801 3, for BS-LW-PLSDA, BS-VIP-PLSDA and BS-sMC-PLSDA, respectively. Finally, PLS-DA was compared with support vector machine recursive feature elimination (SVM-RFE) under the same ensemble feature selection strategy. BS-RBF-SVMRFE, which uses a nonlinear radial basis function (RBF) kernel, achieved higher classification performance than the BS-PLSDA variants but offered poor model interpretability. Overall, the proposed BS-PLSDA balances classification performance and model interpretability, meets practical clinical needs, and can guide early detection, aided diagnosis and mechanistic study of GA-Ⅰ.
Keywords: glutaric acidemia type Ⅰ; early detection; gas chromatography-mass spectrometry; partial least squares discriminant analysis; bootstrap; stable feature selection
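The abstract reports selection stability as a Kuncheva index and describes keeping variables that are selected consistently across bootstrap resamples. The Python sketch below illustrates how these two ideas might be computed. It is not the authors' implementation: the function names, the top-k voting rule and the stand-in score `mean_difference_score` are assumptions made for illustration, and the PLS-DA scores used in the study (LW, VIP, sMC) are not implemented here.

```python
import random
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Kuncheva consistency index of two equal-size feature subsets."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("need equal-size subsets with 0 < k < n_features")
    r = len(a & b)                 # features shared by both subsets
    expected = k * k / n_features  # overlap expected by chance
    return (r - expected) / (k - expected)


def selection_stability(subsets, n_features):
    """Mean Kuncheva index over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


def bootstrap_select(X, y, score_fn, k, n_boot=100, seed=0):
    """Keep the k features most often ranked in the top k over bootstrap resamples.

    score_fn(X, y) returns one importance score per feature. The voting rule
    used here (count top-k appearances) is an assumption, not the paper's rule.
    """
    if len(set(y)) < 2:
        raise ValueError("y must contain at least two classes")
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    counts = [0] * n_features
    for _ in range(n_boot):
        while True:
            # Sample rows with replacement; redraw if a class is missing.
            idx = [rng.randrange(n_samples) for _ in range(n_samples)]
            if len({y[i] for i in idx}) > 1:
                break
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        scores = score_fn(Xb, yb)
        top = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    ranked = sorted(range(n_features), key=lambda j: counts[j], reverse=True)
    return sorted(ranked[:k]), counts


def mean_difference_score(X, y):
    """Hypothetical stand-in score: absolute difference of the two class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(len(X[0]))
    ]
```

To estimate stability in the spirit of the abstract, one could call `bootstrap_select` on several different training splits and pass the resulting subsets to `selection_stability`; a value near 1 means the splits agree on almost the same features, while a value near 0 means the agreement is no better than chance.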