Abstract
Failure event narratives contain detailed and valuable information describing how failures initiate and propagate. Event causality analysis can improve the understanding of failure physics and facilitate the use of non-failure data (e.g., near-misses and degradations) to complement the limited pool of failure data that is common in high-reliability industries such as the nuclear power industry. Automatically extracting event causality from text, however, is challenging because of complex and diverse language structures and causal patterns and the lack of access to large, annotated datasets for training. Existing automated mining approaches are mainly knowledge-based: they extract causality using a set of predefined keywords and rules and have difficulty achieving good performance. In this paper, we propose a novel large language model (LLM)-based approach for automated causality extraction. It leverages the strong capability of LLMs to understand intricate language patterns in long-range contexts and to accurately extract cause-and-effect pairs from text. The proposed approach has a two-step framework: causality detection and causality extraction. The causality detection step trains a deep learning model to identify texts that contain causality. The causality extraction step develops a T5-CE LLM to identify and extract cause-and-effect pairs in each text sample. A large, annotated dataset of U.S. nuclear power plant failure event reports is used to train and evaluate the models. Model evaluation is performed using three performance metrics: precision, recall, and F1 score. The proposed approach can effectively detect implicit and embedded causalities across multiple sentences.
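
To make the three evaluation metrics concrete, the following is a minimal sketch of how precision, recall, and F1 score could be computed over extracted cause-and-effect pairs. It assumes exact matching of (cause, effect) pairs after lowercasing and whitespace normalization, which may differ from the matching criterion used in this work; the function name `pair_prf` and the example pairs are hypothetical and serve only as illustration.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (cause, effect)


def _normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return {
        (" ".join(cause.lower().split()), " ".join(effect.lower().split()))
        for cause, effect in pairs
    }


def pair_prf(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact pair matching (an assumed criterion)."""
    pred = _normalize(predicted)
    ref = _normalize(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotated and predicted pairs for one event narrative.
gold_pairs = [
    ("pump seal degradation", "coolant leak"),
    ("coolant leak", "reactor trip"),
]
predicted_pairs = [
    ("Pump seal degradation", "coolant leak"),  # matches after normalization
    ("operator error", "reactor trip"),         # not in the annotated set
]

precision, recall, f1 = pair_prf(predicted_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one of two predicted pairs is correct and one of two annotated pairs is recovered, so all three metrics equal 0.5. A partial-overlap or token-level matching scheme would give different values for the same predictions.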