Output list
Preprint
Posted to a preprint site 05/20/2026
Metaphor requires a language model to resolve a token whose contextual meaning diverges from its basic literal sense. Understanding how transformer models organize this reinterpretation across depth remains an open problem in mechanistic interpretability. We introduce conditional scale entropy (CSE), a wavelet-derived measure of how broadly transformer computation engages across frequency scales at each layer position. Two theorems establish that CSE is invariant to update magnitude, isolating the structural pattern of updates from their intensity. Using CSE, we find that metaphorical tokens produce significantly higher spectral breadth than literal tokens at contiguous layer positions on every decoder-only architecture tested, from 124M to 20B parameters (GPT-2 family, LLaMA-2 7B, GPT-oss 20B). The effect survives cluster-based permutation correction, recurs in the early-to-mid relative depth range across models, and converges with an independent analysis of 200 naturalistic VUA pairs. Specificity controls further show that the effect is not explained by semantic complexity or by matched propositional content. These results identify multi-scale coordination as a consistent signature of metaphorical language processing in the decoder-only architectures examined, and establish CSE as a principled tool for characterizing cross-depth structure in transformers.
Preprint
UA-Net: Uncertainty-Aware Network for TRISO Image Semantic Segmentation
Posted to a preprint site 04/16/2026
Tristructural isotropic (TRISO)-coated particle fuels undergo dimensional changes and chemical reactions during high-temperature neutron irradiation. Post-irradiation materialography helps understand processes that impact fuel performance, such as coating integrity and fission product retention. Conventionally, experts manually evaluate features in thousands of cross sections of sub-mm-sized samples, which is tedious and subjective. In this work, we propose UA-Net, a deep learning framework that segments five characteristic regions of TRISO fuel micrographs and generates an uncertainty map for predictions. The model uses a multi-stage pretraining strategy, starting with general image representations learned from ImageNet, followed by fine-tuning on TRISO micrographs from various irradiation experiments and AGR-5/6/7 particle cross sections. A meta-model for uncertainty prediction is integrated to identify small defects in TRISO images. UA-Net was evaluated on a test set of 102 images, achieving mean Intersection over Union (mIoU) and mean Precision (mP) of 95.5% and 97.3%, respectively. The meta-model achieved a specificity of 91.8% and sensitivity of 93.5%, demonstrating strong performance in detecting misclassifications. The model was also applied to new TRISO images for qualitative evaluation, showing high accuracy in extracting layer regions.
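The two segmentation metrics quoted above can be illustrated with a minimal sketch. It assumes flat lists of integer class labels and the standard per-class definitions IoU = TP / (TP + FP + FN) and precision = TP / (TP + FP), averaged over classes; the function names and the toy labels are hypothetical, and the preprint's own evaluation code may differ.

```python
def class_counts(pred, truth, cls):
    """Return (true positives, false positives, false negatives) for one class."""
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
    return tp, fp, fn


def mean_iou_and_precision(pred, truth, classes):
    """Average per-class IoU and precision, skipping classes with no support."""
    ious, precisions = [], []
    for cls in classes:
        tp, fp, fn = class_counts(pred, truth, cls)
        if tp + fp + fn > 0:          # class appears in prediction or truth
            ious.append(tp / (tp + fp + fn))
        if tp + fp > 0:               # class was predicted at least once
            precisions.append(tp / (tp + fp))
    if not ious or not precisions:
        raise ValueError("no class present in the inputs")
    return sum(ious) / len(ious), sum(precisions) / len(precisions)


# Toy example with three classes (illustrative labels, not TRISO data).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, mp = mean_iou_and_precision(pred, truth, classes=[0, 1, 2])
print(f"mIoU={miou:.3f}, mP={mp:.3f}")
```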
Preprint
RU-Net for Automatic Characterization of TRISO Fuel Cross Sections
Posted to a preprint site 09/10/2025
Pages 1-30
During irradiation, phenomena such as kernel swelling and buffer densification may impact the performance of tristructural isotropic (TRISO) particle fuel. Post-irradiation microscopy is often used to identify these irradiation-induced morphologic changes. However, each fuel compact generally contains thousands of TRISO particles. Manually performing the work to get statistical information on these phenomena is cumbersome and subjective. To reduce the subjectivity inherent in that process and to accelerate data analysis, we used convolutional neural networks (CNNs) to automatically segment cross-sectional images of microscopic TRISO layers. CNNs are a class of machine-learning algorithms specifically designed for processing structured grid data. They have gained popularity in recent years due to their remarkable performance in various computer vision tasks, including image classification, object detection, and image segmentation. In this research, we generated a large irradiated TRISO layer dataset with more than 2,000 microscopic images of cross-sectional TRISO particles and the corresponding annotated images. Based on these annotated images, we used different CNNs to automatically segment different TRISO layers. These CNNs include RU-Net (developed in this study), as well as three existing architectures: U-Net, Residual Network (ResNet), and Attention U-Net. The preliminary results show that the model based on RU-Net performs best in terms of Intersection over Union (IoU). Using CNN models, we can expedite the analysis of TRISO particle cross sections, significantly reducing the manual labor involved and improving the objectivity of the segmentation results.
Preprint
Posted to a preprint site 01/24/2025
Recent advancements in machine learning-based methods have demonstrated great
potential for improved property prediction in material science. However,
reliable estimation of the confidence intervals for the predicted values
remains a challenge, due to the inherent complexities in material modeling.
This study introduces a novel approach for uncertainty quantification in
fatigue life prediction of metal materials based on integrating knowledge from
physics-based fatigue life models and machine learning models. The proposed
approach employs physics-based input features estimated using the Basquin
fatigue model to augment the experimentally collected data of fatigue life.
Furthermore, a physics-informed loss function that enforces boundary
constraints for the estimated fatigue life of considered materials is
introduced for the neural network models. Experimental validation on datasets
comprising collected data from fatigue life tests for Titanium alloys and
Carbon steel alloys demonstrates the effectiveness of the proposed approach.
The synergy between physics-based models and data-driven models enhances the
consistency in predicted values and improves uncertainty interval estimates.
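As a rough illustration of a Basquin-derived input feature, the sketch below uses the common stress-life form of the Basquin relation, sigma_a = sigma_f' * (2 N_f)^b, and inverts it for the number of cycles. The coefficient values, the feature name `log10_basquin_life`, and the dictionary layout are assumptions made for the example; the abstract does not specify how the features are constructed.

```python
import math


def basquin_stress(cycles, sigma_f, b):
    """Stress amplitude predicted by Basquin: sigma_a = sigma_f * (2N)^b."""
    return sigma_f * (2.0 * cycles) ** b


def basquin_life(stress_amplitude, sigma_f, b):
    """Invert Basquin for cycles to failure: N = 0.5 * (sigma_a / sigma_f)^(1/b)."""
    if stress_amplitude <= 0 or sigma_f <= 0 or b >= 0:
        raise ValueError("expect positive stresses and a negative exponent b")
    return 0.5 * (stress_amplitude / sigma_f) ** (1.0 / b)


def augment(sample, sigma_f, b):
    """Return a copy of the sample with a physics-based log-life feature added."""
    n_est = basquin_life(sample["stress_amplitude"], sigma_f, b)
    return {**sample, "log10_basquin_life": math.log10(n_est)}


# Hypothetical coefficients (MPa, dimensionless), chosen only for illustration.
SIGMA_F, B = 1000.0, -0.1
sample = {"stress_amplitude": 400.0, "temperature": 25.0}
print(augment(sample, SIGMA_F, B))
```

The augmented feature could then be passed to a data-driven model alongside the measured inputs.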
Preprint
GCSAM: Gradient Centralized Sharpness Aware Minimization
Posted to a preprint site 01/20/2025
The generalization performance of deep neural networks (DNNs) is a critical
factor in achieving robust model behavior on unseen data. Recent studies have
highlighted the importance of sharpness-based measures in promoting
generalization by encouraging convergence to flatter minima. Among these
approaches, Sharpness-Aware Minimization (SAM) has emerged as an effective
optimization technique for reducing the sharpness of the loss landscape,
thereby improving generalization. However, SAM's computational overhead and
sensitivity to noisy gradients limit its scalability and efficiency. To address
these challenges, we propose Gradient-Centralized Sharpness-Aware Minimization
(GCSAM), which incorporates Gradient Centralization (GC) to stabilize gradients
and accelerate convergence. GCSAM normalizes gradients before the ascent step,
reducing noise and variance, and improving stability during training. Our
evaluations indicate that GCSAM consistently outperforms SAM and the Adam
optimizer in terms of generalization and computational efficiency. These
findings demonstrate GCSAM's effectiveness across diverse domains, including
general and medical imaging tasks. Our code is available at https://github.com/mhassann22/GCSAM
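The idea of centralizing a gradient before the SAM ascent step can be sketched in a few lines. This is a simplified illustration, not the released GCSAM code: it assumes a 2-D gradient stored as a list of rows, with each row holding the gradient of one output unit's weight vector, and it uses the usual SAM perturbation rho * g / ||g||. The function names and the value of `rho` are hypothetical.

```python
import math


def centralize(grad):
    """Gradient centralization: subtract each row's mean so every row sums to zero."""
    return [[g - sum(row) / len(row) for g in row] for row in grad]


def sam_perturbation(grad, rho=0.05):
    """SAM ascent direction: scale the gradient to have L2 norm rho."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        return [[0.0 for _ in row] for row in grad]
    return [[rho * g / norm for g in row] for row in grad]


def gc_ascent_step(weights, grad, rho=0.05):
    """Perturb the weights along the centralized gradient (the ascent half of SAM)."""
    eps = sam_perturbation(centralize(grad), rho)
    return [[w + e for w, e in zip(w_row, e_row)]
            for w_row, e_row in zip(weights, eps)]


weights = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
grad = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
print(gc_ascent_step(weights, grad))
```

In a full SAM-style update, the loss gradient would be recomputed at the perturbed weights and applied to the original weights; that descent half is omitted here.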
Preprint
Posted to a preprint site 12/17/2024
Producing large images using small diffusion models is gaining increasing
popularity, as the cost of training large models could be prohibitive. A common
approach involves jointly generating a series of overlapped image patches and
obtaining large images by merging adjacent patches. However, results from
existing methods often exhibit obvious artifacts, e.g., seams and inconsistent
objects and styles. To address these issues, we proposed Guided Fusion (GF),
which mitigates the negative impact from distant image regions by applying a
weighted average to the overlapping regions. Moreover, we proposed
Variance-Corrected Fusion (VCF), which corrects data variance after
averaging, producing more accurate fusion for the Denoising Diffusion
Probabilistic Model. Furthermore, we proposed a one-shot Style Alignment (SA),
which generates a coherent style for large images by adjusting the initial
input noise without adding extra computational burden. Extensive experiments
demonstrated that the proposed fusion methods improved the quality of the
generated image significantly. As a plug-and-play module, the proposed method
can be widely applied to enhance other fusion-based methods for large image
generation.
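A generic version of patch merging by weighted averaging can be sketched in one dimension. The linear ramp weights below are an assumption chosen for illustration and are not the GF weighting itself; likewise, `variance_scale` only shows the textbook fact that a weighted average of independent unit-variance samples has variance equal to the sum of squared weights, which is one plausible reading of why a variance correction is needed after averaging.

```python
import math


def blend_overlap(left, right, overlap):
    """Merge two 1-D patches whose last/first `overlap` samples cover the same region."""
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter patch length")
    merged = list(left[:len(left) - overlap])
    for i in range(overlap):
        # Weight shifts linearly from the left patch to the right patch.
        w_right = (i + 1) / (overlap + 1)
        w_left = 1.0 - w_right
        merged.append(w_left * left[len(left) - overlap + i] + w_right * right[i])
    merged.extend(right[overlap:])
    return merged


def variance_scale(weights):
    """Factor that restores unit variance after averaging independent unit-variance noise."""
    return 1.0 / math.sqrt(sum(w * w for w in weights))


print(blend_overlap([1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0], overlap=2))
print(variance_scale([0.5, 0.5]))
```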
Preprint
Posted to a preprint site 09/30/2024
Identifying and classifying shutdown initiating events (SDIEs) is critical for developing low-power shutdown probabilistic risk assessment for nuclear power plants. Existing computational approaches cannot achieve satisfactory performance because large labeled datasets are unavailable, event types are imbalanced, and labels are noisy. To address these challenges, we propose a hybrid pipeline that integrates a knowledge-informed machine learning model to prescreen non-SDIEs and a large language model (LLM) to classify SDIEs into four types. In the prescreening stage, we propose a set of 44 SDIE text patterns that consist of the most salient keywords and phrases from six SDIE types. Text vectorization based on the SDIE patterns generates feature vectors that are highly separable with a simple binary classifier. The second stage builds a Bidirectional Encoder Representations from Transformers (BERT)-based LLM, which learns generic English language representations from self-supervised pretraining on a large dataset and adapts to SDIE classification through fine-tuning on an SDIE dataset. The proposed approaches are evaluated on a dataset of 10,928 events using precision, recall, F1 score, and average accuracy. The results demonstrate that the prescreening stage can exclude more than 97% of non-SDIEs, and the LLM achieves an average accuracy of 93.4% for SDIE classification.
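The pattern-based vectorization in the prescreening stage can be pictured with a small sketch. The three patterns below are invented placeholders, not the 44 patterns from the preprint, and the "any pattern matched" rule stands in for the simple binary classifier, whose actual form is not given in the abstract.

```python
import re

# Hypothetical keyword patterns; the real pattern set is not listed in the abstract.
PATTERNS = [
    r"loss of offsite power",
    r"reactor trip",
    r"shutdown cooling",
]


def vectorize(text, patterns=PATTERNS):
    """Binary feature vector: 1 where the pattern occurs in the text, else 0."""
    lowered = text.lower()
    return [1 if re.search(p, lowered) else 0 for p in patterns]


def prescreen(text, patterns=PATTERNS):
    """Keep an event for the second stage if at least one pattern matches."""
    return any(vectorize(text, patterns))


events = [
    "Loss of offsite power occurred while shutdown cooling was in service.",
    "Routine maintenance was completed on schedule.",
]
for event in events:
    print(vectorize(event), prescreen(event))
```

Events that pass the prescreen would then go to the fine-tuned classifier described in the abstract.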
Preprint
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?
Posted to a preprint site 08/07/2024
Effective clinical deployment of deep learning models in healthcare demands high generalization performance to ensure accurate diagnosis and treatment planning. In recent years, significant research has focused on improving the generalization of deep learning models by regularizing the sharpness of the loss landscape. Among the optimization approaches that explicitly minimize sharpness, Sharpness-Aware Minimization (SAM) has shown potential in enhancing generalization performance on general domain image datasets. This success has led to the development of several advanced sharpness-based algorithms aimed at addressing the limitations of SAM, such as Adaptive SAM, Surrogate-Gap SAM, Weighted SAM, and Curvature Regularized SAM. These sharpness-based optimizers have shown improvements in model generalization compared to conventional stochastic gradient descent optimizers and their variants on general domain image datasets, but they have not been thoroughly evaluated on medical images. This work provides a review of recent sharpness-based methods for improving the generalization of deep learning networks and evaluates the methods' performance on medical breast ultrasound images. Our findings indicate that the initial SAM method successfully enhances the generalization of various deep learning models. While Adaptive SAM improves generalization of convolutional neural networks, it fails to do so for vision transformers. Other sharpness-based optimizers, however, do not demonstrate consistent results. The results reveal that, contrary to findings in the non-medical domain, SAM is the only recommended sharpness-based optimizer that consistently improves generalization in medical image analysis, and further research is necessary to refine the variants of SAM to enhance generalization performance in this field.
Preprint
Causality Extraction from Nuclear Licensee Event Reports Using a Hybrid Framework
Posted to a preprint site 04/08/2024
arXiv.org
Industry-wide nuclear power plant operating experience is a critical source of raw data for performing parameter estimations in reliability and risk models. Much operating experience information pertains to failure events and is stored as reports containing unstructured data, such as narratives. Event reports are essential for understanding how failures are initiated and propagated, including the numerous causal relations involved. Causal relation extraction using deep learning represents a significant frontier in the field of natural language processing (NLP), and is crucial since it enables the interpretation of intricate narratives and connections contained within vast amounts of written information. This paper proposes a hybrid framework for causality detection and extraction from nuclear licensee event reports (LERs). The main contributions are as follows: we (1) compiled an LER corpus with 20,129 text samples for causality analysis, (2) developed an interactive tool for labeling cause-effect pairs, (3) built a deep-learning-based approach for causal relation detection, and (4) developed a knowledge-based cause-effect extraction approach.
Preprint
A2DMN: Anatomy-Aware Dilated Multiscale Network for Breast Ultrasound Semantic Segmentation
Posted to a preprint site 03/22/2024
arXiv.org
In recent years, convolutional neural networks for semantic segmentation of breast ultrasound (BUS) images have shown great success; however, two major challenges still exist. 1) Most current approaches inherently lack the ability to utilize tissue anatomy, resulting in misclassified image regions. 2) They struggle to produce accurate boundaries due to the repeated down-sampling operations. To address these issues, we propose a novel breast anatomy-aware network for capturing fine image details and a new smoothness term that encodes breast anatomy. It incorporates context information across multiple spatial scales to generate more accurate semantic boundaries. Extensive experiments are conducted to compare the proposed method and eight state-of-the-art approaches using a BUS dataset with 325 images. The results demonstrate the proposed method significantly improves the segmentation of the muscle, mammary, and tumor classes and produces more accurate fine details of tissue boundaries.