Abstract:
Objective Sow estrus recognition is essential for determining insemination timing and improving reproductive efficiency. Traditional manual estrus detection is highly subjective and poorly suited to continuous monitoring, and single-modal methods have limited robustness in complex pig-house environments. To address these problems, this study examined the effect of feature-level fusion of visible light and infrared thermal imaging on estrus recognition and proposed a non-contact intelligent estrus detection method.
Method Weaned multiparous Large White sows were selected as the research subjects. Visible light images and infrared thermal images of the vulva region were collected twice daily over a continuous period, and paired visible light/infrared samples were constructed by matching timestamps. Features were extracted with a two-branch structure. The visible light branch used ResNet-50, retaining only its backbone network, to extract spatial image features, and a BiLSTM combined with a temporal attention mechanism to model the time series. The infrared branch extracted statistical temperature features of the vulva and encoded them with fully connected layers. The high-level semantic features of the two branches were linearly projected into a shared low-dimensional space and concatenated for feature-level fusion. A multilayer perceptron (MLP) learned the complementary relationship between the modalities, and a Sigmoid classifier at the output produced the final recognition result.
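The order of operations in the fusion pipeline described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the ResNet-50 and BiLSTM stages are not reproduced (the model receives stand-in per-time-step hidden states), the infrared statistics (mean, max, min, standard deviation) are an assumed choice because the abstract does not list them, and all dimensions, the 37 °C reference temperature, the random weights, and the names (`FusionHead`, `ir_stats`, `attention_pool`) are hypothetical. It shows temporal attention pooling, fully connected infrared encoding, linear projection of both branches into a shared space, concatenation, an MLP, and a Sigmoid output.

```python
import math
import random

# Hypothetical sizes; the abstract does not state the actual dimensions.
VIS_DIM = 8      # stand-in for the BiLSTM hidden-state size
IR_DIM = 4       # number of infrared statistics (assumed: mean, max, min, std)
PROJ_DIM = 4     # shared low-dimensional space for both modalities
HIDDEN = 6       # MLP hidden width
IR_REF = 37.0    # hypothetical reference temperature used to center readings


def ir_stats(temps):
    """Statistical temperature features of the vulva region (assumed set)."""
    n = len(temps)
    mean = sum(temps) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in temps) / n)
    return [mean, max(temps), min(temps), std]


def attention_pool(states, w):
    """Temporal attention: softmax-weighted sum of per-time-step hidden states."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in states]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    return [sum(a * h[d] for a, h in zip(alphas, states)) for d in range(len(states[0]))]


def linear(x, W, b):
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    # Stable for large positive and negative logits.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rand_layer(n_out, n_in, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class FusionHead:
    """Feature-level fusion: project each branch, concatenate, MLP, Sigmoid."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.attn_w = [rng.uniform(-0.5, 0.5) for _ in range(VIS_DIM)]
        self.ir_enc = rand_layer(IR_DIM, IR_DIM, rng)      # fully connected IR encoder
        self.vis_proj = rand_layer(PROJ_DIM, VIS_DIM, rng)  # visible -> shared space
        self.ir_proj = rand_layer(PROJ_DIM, IR_DIM, rng)    # infrared -> shared space
        self.fc1 = rand_layer(HIDDEN, 2 * PROJ_DIM, rng)    # MLP over fused vector
        self.fc2 = rand_layer(1, HIDDEN, rng)

    def forward(self, vis_states, ir_temps):
        v = attention_pool(vis_states, self.attn_w)
        r = relu(linear(ir_stats([t - IR_REF for t in ir_temps]), *self.ir_enc))
        fused = linear(v, *self.vis_proj) + linear(r, *self.ir_proj)  # list concat
        h = relu(linear(fused, *self.fc1))
        return sigmoid(linear(h, *self.fc2)[0])  # estrus probability


# Usage: 6 time steps of stand-in BiLSTM states plus vulva temperature readings.
rng = random.Random(1)
states = [[rng.gauss(0.0, 1.0) for _ in range(VIS_DIM)] for _ in range(6)]
p = FusionHead().forward(states, [36.8, 37.2, 37.5, 37.1])
print(f"estrus probability: {p:.3f}")
```

Random weights are used only to make the data flow runnable; in practice the layers would be trained end to end (for example with a binary cross-entropy loss) in a deep-learning framework.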
Result Single-modal models, different fusion strategies, and ablation variants were compared. On the test set, the feature-level fusion model achieved an accuracy of 93.1%, a recall of 94.5%, and an AUC of 96.2%. Compared with the visible light-only model, its accuracy was 1.5 percentage points higher; compared with the infrared-only model, its accuracy was 4.6 percentage points higher and its precision 8.3 percentage points higher. Its accuracy was also 1.0 and 0.3 percentage points higher than that of input-level fusion and decision-level fusion, respectively. In the ablation experiments, introducing LSTM, BiLSTM, and the temporal attention mechanism in turn raised accuracy from 89.8% to 91.4%, 92.2%, and 93.1%, respectively.
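Because the gains above are reported in percentage points, the accuracies of the comparison models are implied by subtraction from the fused model's 93.1%. The short calculation below makes that explicit and contrasts absolute (percentage-point) gains with relative gains. Its only inputs are figures stated in this abstract; the derived baselines are arithmetic consequences of those figures, not separately reported results. Precision is left out because the fused model's precision is not given here. Values are stored in tenths of a percent so the subtraction is exact.

```python
# Accuracies in tenths of a percent (931 means 93.1%) to keep the arithmetic exact.
FUSED = 931

# Percentage-point accuracy gains of the fused model, as stated in the abstract.
gains_pp = {
    "visible light only": 15,
    "infrared only": 46,
    "input-level fusion": 10,
    "decision-level fusion": 3,
}


def implied_baseline(gain_tenths, fused=FUSED):
    """Comparison-model accuracy implied by a percentage-point gain."""
    return fused - gain_tenths


for name, g in gains_pp.items():
    base = implied_baseline(g)
    rel = 100 * g / base  # relative gain, as a percent of the baseline accuracy
    print(f"{name}: {base / 10:.1f}% -> {FUSED / 10:.1f}% "
          f"(+{g / 10:.1f} pp, {rel:.2f}% relative)")

# Ablation: no temporal model, +LSTM, +BiLSTM, +BiLSTM with temporal attention.
ablation = [898, 914, 922, 931]
steps = [b - a for a, b in zip(ablation, ablation[1:])]
print("per-step gains (pp):", [s / 10 for s in steps])
```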
Conclusion Vulvar phenotypic information from visible light images and physiological information from infrared thermal images are strongly complementary. Feature-level fusion circumvents the heterogeneity between the modalities at the raw-data level, enables cross-modal feature interaction at a high semantic level, and markedly improves the accuracy of sow estrus recognition.