A novel variable selection method, SEPA-LAR, which combines the sampling error profile analysis (SEPA) framework with least angle regression (LAR), was proposed in order to build robust NIR models. In SEPA-LAR, multiple sub-models were obtained by Monte Carlo sampling (MCS), and the LAR regression coefficients at each wavelength were statistically analyzed: the wavelengths were sorted by the sum of the absolute values of their regression coefficients over all sub-models. The wavelengths with the larger sums were selected, and a model was built with these wavelengths. Samples in an independent validation dataset were used to evaluate the model. NIR datasets of corn moisture, diesel density and cheese fat were used to evaluate the performance of SEPA-LAR. The root mean squared errors of prediction (RMSEP) estimated with the validation datasets were 0001 44% (moisture), 0001 58 g/mL (density) and 113 g/100 g (fat content), respectively. The results showed that, compared with Monte Carlo uninformative variable elimination (MCUVE), moving window partial least squares regression (MWPLS) and competitive adaptive reweighted sampling (CARS), SEPA-LAR selected fewer wavelengths and gave smaller prediction errors. The calibration models built by SEPA-LAR have good predictive ability, stability and interpretability.
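The selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sepa_lar_rank`, `rmsep`, `fit_ols`), the number of Monte Carlo runs, the sampling ratio, the number of non-zero LAR coefficients, the use of scikit-learn's `Lars` estimator, and the ordinary least squares model fitted on the selected wavelengths are all choices made here for illustration. The data are synthetic, not the corn, diesel or cheese datasets.

```python
import numpy as np
from sklearn.linear_model import Lars


def sepa_lar_rank(X, y, n_runs=200, sample_ratio=0.8, n_nonzero=10, seed=0):
    """Rank wavelengths by the sum of |LAR coefficient| over Monte Carlo sub-models.

    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_wavelengths = X.shape
    n_sub = max(2, int(sample_ratio * n_samples))
    coef_abs_sum = np.zeros(n_wavelengths)
    for _ in range(n_runs):
        # Monte Carlo sampling: draw a random subset of calibration samples
        idx = rng.choice(n_samples, size=n_sub, replace=False)
        model = Lars(n_nonzero_coefs=n_nonzero).fit(X[idx], y[idx])
        # Accumulate the absolute regression coefficient at each wavelength
        coef_abs_sum += np.abs(model.coef_)
    # Wavelength indices sorted from largest to smallest accumulated sum
    order = np.argsort(coef_abs_sum)[::-1]
    return order, coef_abs_sum


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_ols(X, y):
    """Least squares fit with intercept (assumed final model; the abstract does not specify one)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


# Synthetic demonstration: only "wavelengths" 5 and 20 carry information.
demo_rng = np.random.default_rng(42)
X_all = demo_rng.normal(size=(80, 50))
y_all = 2.0 * X_all[:, 5] - 3.0 * X_all[:, 20] + 0.01 * demo_rng.normal(size=80)

X_cal, y_cal = X_all[:60], y_all[:60]   # calibration set
X_val, y_val = X_all[60:], y_all[60:]   # independent validation set

order, sums = sepa_lar_rank(X_cal, y_cal)
selected = order[:2]                    # keep the wavelengths with the largest sums

beta = fit_ols(X_cal[:, selected], y_cal)
y_hat = beta[0] + X_val[:, selected] @ beta[1:]
demo_rmsep = rmsep(y_val, y_hat)
print("selected wavelengths:", sorted(int(i) for i in selected))
print("validation RMSEP:", demo_rmsep)
```

In practice the number of retained wavelengths would be tuned rather than fixed at two, for example by comparing the validation error over a range of cut-offs.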
Keywords
least angle regression; regression coefficient; Monte Carlo sampling; sampling error profile analysis; variable selection; near infrared spectroscopy