Prediction of Tobacco Leaf Yellowing Degree and Coupling Analysis of Curing Barn Environmental Factors Based on GMM-PSO-RF

    • Abstract:
      Objective To improve the prediction accuracy of the yellowing degree of tobacco leaves during curing, and to investigate the influence of curing barn environmental factors on the yellowing degree of upper- and lower-layer tobacco leaves.
      Method A stage segmentation algorithm combining color threshold segmentation with a Gaussian mixture model (GMM) was proposed to extract the yellowing degree from images of upper- and lower-layer tobacco leaves during curing. Particle swarm optimization (PSO) was used to tune the hyperparameters of three machine learning algorithms: random forest (RF), support vector regression (SVR) and back propagation neural network (BPNN). Prediction models of tobacco leaf yellowing degree were then built from the curing barn environmental factors (temperature, humidity and curing time). The SHAP (SHapley Additive exPlanations) method was used to interpret the optimal prediction model and to reveal the relationship between the curing barn environmental factors and the yellowing degree of tobacco leaves.
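
The PSO tuning step described above can be illustrated with a minimal global-best PSO written in plain Python. This is a sketch under stated assumptions, not the authors' implementation: the function names, the swarm settings (`inertia`, `c_cognitive`, `c_social`), the hyperparameter ranges, and the toy objective are all hypothetical. In a real study the objective would be the cross-validated error of an RF, SVR or BPNN model trained on temperature, humidity and curing time.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` inside box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the bounds, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for a cross-validated MSE; minimum 0.05 at (300, 12)."""
    n_trees, max_depth = params
    return ((n_trees - 300.0) / 100.0) ** 2 + ((max_depth - 12.0) / 4.0) ** 2 + 0.05


# Assumed search ranges for two RF hyperparameters (number of trees, maximum depth).
bounds = [(50.0, 500.0), (2.0, 30.0)]
best_params, best_error = pso_minimize(toy_validation_error, bounds)
print(best_params, best_error)
```

Because PSO searches a continuous space, integer hyperparameters such as the number of trees would need to be rounded before a model is trained on them.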
      Results The mean values of the mean absolute error (MAE) and mean square error (MSE) of the stage segmentation algorithm were 0.02407 and 0.00058, respectively, lower than those of color threshold segmentation alone (0.07657, 0.00588) and GMM alone (0.06541, 0.00429), indicating high accuracy in extracting the yellowing degree of tobacco leaves. In five-fold cross-validation, the PSO-RF model had the best prediction accuracy for the yellowing degree of both upper- and lower-layer tobacco leaves. For the upper-layer model, the standard deviations of MAE, MSE and r² were the smallest (0.0073, 0.0058 and 0.0066, respectively), as were the coefficients of variation (0.0440, 0.1246 and 0.0069). For the lower-layer model, the standard deviations of MAE, MSE and r² were also the smallest (0.0062, 0.0051 and 0.0052), as were the coefficients of variation (0.0403, 0.1181 and 0.0053). In the analysis of prediction results, PSO-BPNN achieved r² < 0.90 and MSE > 0.15, PSO-SVR achieved r² > 0.90 and MSE < 0.15, and PSO-RF was the most accurate (r² > 0.95, MSE < 0.06). SHAP analysis of the optimal PSO-RF model showed that upper-layer temperature was the key environmental factor influencing the yellowing degree of upper-layer leaves, while curing time was the key factor for lower-layer leaves.
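
The evaluation quantities reported above (MAE, MSE, r², and the standard deviation and coefficient of variation across folds) can be computed as in the following sketch. It rests on several assumptions: r² is taken to be the coefficient of determination, the standard deviation is the sample standard deviation, the coefficient of variation is the standard deviation divided by the mean, and all numbers in the example are invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean square error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumed meaning of r2 in the abstract)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fold_stability(scores):
    """Return (sample standard deviation, coefficient of variation) of per-fold scores."""
    sd = stdev(scores)
    return sd, sd / mean(scores)


# Illustrative values only: yellowing degree on a 0-1 scale.
y_true = [0.10, 0.30, 0.50, 0.70, 0.90]
y_pred = [0.12, 0.27, 0.50, 0.74, 0.88]
example_mae = mae(y_true, y_pred)
example_mse = mse(y_true, y_pred)
example_r2 = r2_score(y_true, y_pred)

# Illustrative per-fold MAE values from a hypothetical five-fold cross-validation.
fold_mae = [0.16, 0.17, 0.18, 0.16, 0.18]
mae_sd, mae_cv = fold_stability(fold_mae)
print(example_mae, example_mse, example_r2, mae_sd, mae_cv)
```

A smaller standard deviation and coefficient of variation across folds indicate that a model's error changes little from one train/test split to another, which is how the abstract uses these two quantities to compare the models.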
      Conclusion The established GMM-PSO-RF model can accurately predict the yellowing degree of tobacco leaves in different layers of the curing barn under complex curing conditions, providing a scientific basis for adjusting the curing process.