Abstract
Integrating domain knowledge into machine learning (ML) models is critical for achieving reliable and interpretable predictions in complex scientific fields such as geoscience. Several recent studies on Neuro-Symbolic AI (NSAI) frameworks have successfully combined symbolic geological knowledge with traditional ML algorithms to improve the prediction of mineral deposit types. The rapid development of Large Language Models (LLMs) opens new opportunities to further enhance NSAI applications. In this study, we used an LLM to construct the symbolic component of NSAI by automatically extracting, structuring, and transforming descriptive knowledge from authoritative geoscience textbooks into a machine-readable format. The resulting knowledge base captures the geochemical signatures, lithological settings, and alteration features associated with various mineral systems. This structured knowledge was integrated into a decision tree classifier by augmenting each sample with a vectorized representation of its corresponding deposit type. Compared with conventional ML models trained solely on geochemical data, our NSAI model achieved significantly higher accuracy on the test sets, indicating improved generalization. The NSAI model also performed consistently across a broader set of deposit types, including those with extremely limited training data: it improved predictive stability and accuracy even for minority classes with only 3 to 5 samples, where traditional ML models tend to overfit or fail. This robustness underscores the value of incorporating expert-level geological knowledge into data-driven pipelines. In addition, SHAP (SHapley Additive exPlanations) analysis revealed that the symbolic knowledge vectors contributed substantially to the model's decisions, confirming their importance for both interpretability and predictive power.
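The knowledge-encoding step summarized above, where structured records for each deposit type are turned into vectors and attached to each sample's geochemical features, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the record schema (`DepositKnowledge`), the multi-hot encoding, the helper names, and the example terms are all hypothetical placeholders rather than the actual extracted content. The sketch also assumes each sample is already paired with a knowledge record, and it omits the decision tree training and the SHAP analysis.

```python
from dataclasses import dataclass


# Hypothetical schema for one LLM-extracted knowledge record; the field
# names are illustrative assumptions, not the study's actual format.
@dataclass(frozen=True)
class DepositKnowledge:
    deposit_type: str
    geochemical_signature: frozenset
    lithology: frozenset
    alteration: frozenset


def _terms(record):
    """Tag each term with its category so identical words stay distinct."""
    return (
        {("geochem", t) for t in record.geochemical_signature}
        | {("lithology", t) for t in record.lithology}
        | {("alteration", t) for t in record.alteration}
    )


def build_vocabulary(records):
    """Collect every distinct (category, term) pair in a stable sorted order."""
    vocab = set()
    for record in records:
        vocab |= _terms(record)
    return sorted(vocab)


def knowledge_vector(record, vocab):
    """Multi-hot encoding: 1.0 where the record mentions a vocabulary term."""
    present = _terms(record)
    return [1.0 if term in present else 0.0 for term in vocab]


def augment(geochem_features, record, vocab):
    """Concatenate a sample's numeric geochemistry with its knowledge vector."""
    return list(geochem_features) + knowledge_vector(record, vocab)


# Illustrative records only; not taken from the study's knowledge base.
records = [
    DepositKnowledge(
        "porphyry Cu",
        frozenset({"Cu", "Mo", "Au"}),
        frozenset({"felsic intrusive"}),
        frozenset({"potassic", "phyllic"}),
    ),
    DepositKnowledge(
        "VMS",
        frozenset({"Cu", "Zn", "Pb"}),
        frozenset({"volcanic"}),
        frozenset({"chloritic"}),
    ),
]
vocab = build_vocabulary(records)
# Two made-up geochemical measurements followed by the 10-term knowledge vector.
row = augment([120.0, 4.5], records[0], vocab)
```

A multi-hot encoding is one simple choice here because each dimension maps to a single named geological feature, which keeps per-feature attributions (for example, SHAP values on a tree model) readable in geological terms. Learned text embeddings would be an alternative, at the cost of that one-to-one interpretability.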
Our work demonstrates that LLM-guided knowledge extraction offers an effective and scalable way to integrate structured domain knowledge into mineral prediction tasks. We hope this work also provides insights for other geoscientific applications of NSAI.