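1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, Shaanxi 710021
2. School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240
3. Institute of Traditional Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510006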
WANG Ya-ni, DU Li-jing, GUO Tuo, et al. Application of Classification Prior Feature Selection Algorithm in Screening Metabonomic Data Variables of Hyperlipidemia[J]. Journal of Instrumental Analysis, 2023, 42(04): 423-431. DOI: 10.19969/j.fxcsxb.22113002.
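This paper proposes a support vector machine method based on unsupervised discriminative projection feature selection (UDPFS-SVM) for biomarker screening. UDPFS-SVM first uses the unsupervised discriminative projection algorithm (UDPFS) to introduce classification prior information and to add constraints such as regularization and penalty functions, adaptively obtaining a sparse discriminant projection matrix. It then derives the corresponding low-dimensional metabolic matrix from that projection matrix, and finally builds a support vector machine (SVM) classification model to find biomarkers. The proposed method can perform fuzzy learning and sparse learning simultaneously and makes reasonable use of the dependencies between variables. UDPFS-SVM and partial least squares discriminant analysis (PLS-DA) were each used to screen variables in plasma metabolomics data from hyperlipidemic rats, and the resulting biomarkers were evaluated by analysis of variance, ROC curves, and linear discriminant analysis (LDA). Both methods identified eight biomarkers. Analysis of variance showed that all biomarkers obtained by UDPFS-SVM differed significantly, and their significant difference values were all greater than those from PLS-DA. The ROC result for UDPFS-SVM was 1.00, 0.05 higher than that for PLS-DA. LDA showed that the biomarkers obtained by UDPFS-SVM better eliminated intra-group metabolic differences and better separated inter-group metabolic differences in the hyperlipidemia samples. These results indicate that UDPFS-SVM outperforms PLS-DA for discovering hyperlipidemia biomarkers and offers a new approach to biomarker discovery.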
Partial least squares discriminant analysis (PLS-DA) is currently a common method for biomarker screening in metabolomics research. However, because it is a typical linear algorithm, it is often not ideal for finding biomarkers in biomedicine, where the research objects are complex and non-linear. A support vector machine approach based on unsupervised discriminative projection feature selection (UDPFS-SVM) is therefore proposed in this paper. The method has two steps. The first step obtains the low-dimensional discriminant projection matrix: UDPFS-SVM first introduces category prior information, then adds regularization and constraints such as penalty functions to obtain a discriminant projection matrix, which is subsequently filtered by weights to become a low-dimensional discriminant projection matrix. The second step builds a support vector machine classification model based on the projection matrix to find biomarkers. Notably, the method adaptively adjusts the low-dimensional sparse projection matrix. UDPFS-SVM can also perform both fuzzy and sparse learning and make reasonable use of the dependency relationships between variables, so it is well suited to non-linear research objects. In this paper, metabolomic data from hyperlipidemic rats were screened for variables using UDPFS-SVM and PLS-DA, and the biomarkers obtained were evaluated by variance analysis, ROC curves, and linear discriminant analysis (LDA). Eight biomarkers were identified by each of the two methods. Variance analysis showed that UDPFS-SVM yielded more significant biomarkers than PLS-DA, and the significant difference values obtained by UDPFS-SVM were all larger than those obtained by PLS-DA. The ROC value of UDPFS-SVM was 1.00, which is 0.05 higher than that of PLS-DA. LDA showed that the biomarkers obtained by UDPFS-SVM better eliminated intra-group metabolic differences and more clearly differentiated inter-group metabolic differences in the hyperlipidemic samples. In summary, UDPFS-SVM is superior to PLS-DA for discovering biomarkers of hyperlipidemia. It is a relatively ideal marker screening method for biomedicine, a complex non-linear research subject, where it improves the accuracy of marker screening, and it offers a new way to discover biomarkers in the era of precision medicine.
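The weight-filtering and ROC evaluation steps described in the abstract can be illustrated with a small sketch. The abstract does not give the UDPFS objective or the exact weighting rule, so the example makes two assumptions: that a variable's weight is the L2 norm of its row in the projection matrix (a common choice for row-sparse projections), and that the "ROC value" is the area under the ROC curve. The matrix `W`, the sample intensities, and both function names are hypothetical, and the SVM step is omitted.

```python
import math


def rank_features_by_row_norm(W, k):
    """Return the indices of the k rows of W with the largest L2 norm.

    Assumption: each row of W holds the projection weights of one variable,
    so a larger row norm means the variable contributes more to the projection.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in W]
    order = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return order[:k]


def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels are 1 (model group) or 0 (control group); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical projection matrix: 4 variables projected to 2 dimensions.
W = [
    [0.9, -0.4],
    [0.0, 0.0],   # a row driven to zero by a sparsity penalty
    [0.1, 0.05],
    [-0.6, 0.7],
]
selected = rank_features_by_row_norm(W, k=2)

# Made-up intensities of one selected variable: 3 model rats, 3 control rats.
scores = [5.1, 4.8, 6.0, 2.2, 3.0, 2.9]
labels = [1, 1, 1, 0, 0, 0]

print(selected, auc(scores, labels))  # [0, 3] 1.0
```

With these made-up numbers the two rows with the largest norms are kept, and the chosen variable separates the two groups perfectly, which is what an ROC value of 1.00 would correspond to under the AUC assumption.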
variable screening; unsupervised discriminative projection; classified prior information; non-linear; high-dimensional and small samples; metabonomics