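1. School of Materials and Chemistry, University of Shanghai for Science and Technology, Shanghai 200093
2. Technical Center for Industrial Product and Raw Material Inspection and Testing, Shanghai Customs, Shanghai 201210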
Received: 05 February 2025
Revised: 28 March 2025
Published Online: 28 May 2025
Published: 15 June 2025
Tang Lei, Mao Ye-hui, Cai Jing, Liu Hengqin, Min Hong, An Ya-rui, Liu Shu. Spectral Data Augmentation Methods and Their Application Progress[J]. Journal of Instrumental Analysis, 2025, 44(06): 1-10. DOI: 10.12452/j.fxcsxb.25020569.
As machine learning is applied more widely in spectral analysis, model training faces challenges such as scarce data samples and imbalanced class distributions, which limit generalization and raise the risk of overfitting. This paper reviews the domestic and international literature published since 2017 and groups spectral data augmentation methods into two categories: non-deep-learning methods and deep learning methods. The review reveals a trend from shallow data expansion toward deep generative modeling. Non-deep-learning methods expand data through spectral transformation and spectral synthesis. Owing to their computational efficiency, they are well suited to small-sample scenarios such as industrial process monitoring, traceability of traditional Chinese medicinal materials, and drug and food quality testing. Deep generative models are mainly generative adversarial networks (GANs) and their derivatives, together with improved autoencoders (AEs). GANs use an adversarial game to generate augmented samples that are structurally similar to, and distributionally consistent with, the original data, and they are widely applied in high-precision modeling scenarios such as medical imaging diagnosis, precision agriculture, and material classification. Improved AEs capture the essential features of the data through latent-space representation learning; the data they generate preserve the original distribution while remaining robust at the feature level, which gives them significant advantages in high-dimensional data processing tasks such as chemical substance identification and soil composition detection. The review also points out the limitations of existing data augmentation methods and discusses future directions.
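The two non-deep-learning routes named in the abstract, spectral transformation and spectral synthesis, can be illustrated with a minimal NumPy sketch. It is not the procedure of any particular study covered by the review. The function names (`transform_spectrum`, `synthesize_spectrum`, `augment`), the choice of perturbations (random offset, baseline slope, multiplicative scaling, additive noise), the linear interpolation between two same-class spectra, and all default standard deviations are assumptions made for illustration.

```python
import numpy as np


def transform_spectrum(spectrum, rng, offset_std=0.01, slope_std=0.01,
                       scale_std=0.01, noise_std=0.001):
    """Spectral transformation: return one randomly perturbed copy of a 1-D spectrum.

    The perturbation magnitudes are illustrative defaults, not recommended values.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n = spectrum.size
    # Centered axis so the random baseline slope pivots around mid-spectrum.
    axis = np.linspace(-0.5, 0.5, n)
    offset = rng.normal(0.0, offset_std)        # constant baseline shift
    slope = rng.normal(0.0, slope_std)          # linear baseline tilt
    scale = 1.0 + rng.normal(0.0, scale_std)    # multiplicative intensity change
    noise = rng.normal(0.0, noise_std, size=n)  # additive detector-like noise
    return scale * spectrum + offset + slope * axis + noise


def synthesize_spectrum(spec_a, spec_b, rng):
    """Spectral synthesis: interpolate between two spectra of the same class."""
    w = rng.uniform(0.0, 1.0)  # random mixing weight in [0, 1)
    return w * np.asarray(spec_a, dtype=float) + (1.0 - w) * np.asarray(spec_b, dtype=float)


def augment(spectra, n_new, seed=0):
    """Generate n_new spectra, alternating between transformation and synthesis.

    `spectra` is an (n_samples, n_points) array from a single class and must
    contain at least two samples.
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    generated = []
    for i in range(n_new):
        if i % 2 == 0:
            idx = rng.integers(len(spectra))
            generated.append(transform_spectrum(spectra[idx], rng))
        else:
            a, b = rng.choice(len(spectra), size=2, replace=False)
            generated.append(synthesize_spectrum(spectra[a], spectra[b], rng))
    return np.stack(generated)


if __name__ == "__main__":
    # Two toy "spectra" sampled on 50 points stand in for real measurements.
    x = np.linspace(0.0, 3.0, 50)
    originals = np.stack([np.sin(x), np.cos(x)])
    augmented = augment(originals, n_new=6, seed=1)
    print(augmented.shape)
```

In practice the perturbation magnitudes would need to be tuned to the instrument and sample type, and interpolation is only meaningful between spectra that share a class label or a comparable composition.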