Failure Event Mining With Fine-Tuned Large Language Model: Case Study of Analyzing United States Nuclear Power Plant Failure Event Reports

Sai Zhang; Shahidur Rahoman Sohag; Min Xian; Shoukun Sun; Zhegang Ma

doi:10.1111/risa.70191

Journal article, open access, peer reviewed

Risk Analysis, Vol. 46(3), e70191

03/2026

PMID: 41715937

Keywords: failure event narrative; deep learning; nuclear power plant; text mining; large language model; causality extraction

Abstract

Failure event narratives contain detailed and valuable information describing how failures initiate and propagate. Event causality analysis can improve the understanding of failure physics and facilitate the use of non-failure data (e.g., near-misses and degradations) to complement the limited pool of failure data, a limitation common in high-reliability industries such as the nuclear power industry. Automatically extracting event causality from text, however, is challenging because of complex and diverse language structures and causal patterns, and because large, annotated datasets for use as training data are rarely available. Existing automated mining approaches are mainly knowledge-based: they extract causality using a set of predefined keywords and rules, and they have difficulty achieving good performance. In this paper, we propose a novel large language model (LLM)-based approach for automated causality extraction. It leverages the strong capability of LLMs to understand intricate language patterns in long-range contexts and to accurately extract cause-and-effect pairs from text. The proposed approach is a two-step framework: causality detection and causality extraction. The causality detection step trains a deep learning model to identify texts that contain causality. The causality extraction step develops a T5-CE LLM to identify and extract cause-and-effect pairs in each text sample. A large, annotated dataset of U.S. nuclear power plant failure event reports was used to train and evaluate the models. Model performance was evaluated using three metrics: precision, recall, and F1 score. The proposed approach can effectively detect implicit and embedded causalities, including those that span multiple sentences.
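The two-step design and the three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. `detect` and `extract` are hypothetical callables standing in for the deep learning detector and the T5-CE model; the toy keyword rule and the example sentences are invented for illustration and are not drawn from the paper's dataset. Scoring assumes case-insensitive exact matching of cause-and-effect pairs with micro-averaged precision, recall, and F1; the paper's matching criterion may differ.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (cause, effect)


def run_pipeline(
    texts: List[str],
    detect: Callable[[str], bool],
    extract: Callable[[str], List[Pair]],
) -> List[List[Pair]]:
    """Step 1 flags texts that contain causality; step 2 extracts pairs only from flagged texts."""
    return [extract(text) if detect(text) else [] for text in texts]


def _normalize(pair: Pair) -> Pair:
    # Assumed matching rule: case-insensitive exact match after trimming whitespace.
    cause, effect = pair
    return cause.strip().lower(), effect.strip().lower()


def precision_recall_f1(
    predicted: List[List[Pair]], gold: List[List[Pair]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over cause-and-effect pairs."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same texts")
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p = {_normalize(x) for x in pred}
        g = {_normalize(x) for x in ref}
        tp += len(p & g)  # pairs extracted and present in the annotation
        fp += len(p - g)  # pairs extracted but not annotated
        fn += len(g - p)  # annotated pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy stand-ins for the trained models: a single keyword rule.
def toy_detect(text: str) -> bool:
    return " due to " in text


def toy_extract(text: str) -> List[Pair]:
    effect, cause = text.rstrip(".").split(" due to ", 1)
    return [(cause, effect)]


# Invented example sentences and annotations (not from the paper's dataset).
texts = [
    "The pump tripped due to low suction pressure.",
    "Routine surveillance test completed satisfactorily.",
    "A breaker opened, which caused loss of power to the bus.",
]
gold = [
    [("low suction pressure", "the pump tripped")],
    [],
    [("a breaker opened", "loss of power to the bus")],
]

preds = run_pipeline(texts, toy_detect, toy_extract)
p, r, f = precision_recall_f1(preds, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=1.000 R=0.500 F1=0.667
```

The keyword rule misses the third sentence, whose causal link is signaled by "caused" rather than "due to", so recall drops to 0.5. This illustrates the limitation the abstract attributes to keyword-and-rule approaches; in the proposed framework, both stand-ins would be replaced by the trained detection and extraction models.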

Details

Title: Failure Event Mining With Fine-Tuned Large Language Model: Case Study of Analyzing United States Nuclear Power Plant Failure Event Reports
Creators: Sai Zhang - Idaho National Laboratory
Shahidur Rahoman Sohag - University of Idaho
Min Xian - University of Idaho
Shoukun Sun - University of Idaho
Zhegang Ma - Idaho National Laboratory
Publication Details: Risk Analysis, Vol. 46(3), e70191
Publisher: Wiley
Grant note: 31310019N0006 / U.S. Nuclear Regulatory Commission
Identifiers: 996892012301851
Academic Unit: Institute for Modeling Collaboration and Innovation; Initiative for Bioinformatics and Evolutionary Studies; Computer Science
Language: English
Resource Type: Journal article