Abstract
Effective clinical deployment of deep learning models in healthcare demands high generalization performance to ensure accurate diagnosis and treatment planning. This requirement is critical because deep learning models often face limited data availability and imbalanced datasets across different hospitals, device settings, and patient populations. In recent years, significant research has focused on improving the generalization of deep learning models by regularizing the sharpness of the loss landscape, that is, the curvature of the error surface that the models are trained to minimize. Among the optimization approaches that explicitly minimize sharpness, Sharpness-Aware Minimization (SAM) has shown substantial potential for enhancing generalization performance on general-domain image datasets. This success has motivated the development of several advanced sharpness-based algorithms that address the limitations of SAM, including Adaptive SAM (ASAM), Surrogate-Gap SAM (GSAM), Weighted SAM (WSAM), and Curvature Regularized SAM (CR-SAM). These algorithms have improved model generalization over conventional stochastic gradient descent optimizers and their variants on non-medical image datasets. However, their efficacy on medical images has not been thoroughly evaluated.
This thesis aims to fill this gap by evaluating the generalization performance of these sharpness-based optimizers on breast ultrasound (BUS) images, comparing them against the baseline performance achieved with the Adam optimizer. The experiments involve two CNN-based classification models (ResNet50 and VGG16) and two Vision Transformer models (ViT and Swin Transformer). The experimental results indicate that SAM enhances the generalization of various deep learning models on BUS images. ASAM improves generalization for the convolutional neural networks but fails to do so for the Vision Transformers.
The remaining sharpness-based optimizers (GSAM, WSAM, and CR-SAM) do not show consistent improvements across our experiments. These results suggest that, contrary to findings in the non-medical domain, SAM is the only sharpness-based optimizer among those evaluated that consistently improves generalization in this medical image analysis task.
Further analysis, including Hessian-based evaluations, supports the hypothesis that flatter minima correlate with better generalization. The Hessian calculations show that, for all tested models, SAM yields a flatter loss landscape than the standard Adam optimizer. These insights underscore the importance of further research to refine SAM and its variants, with the goal of enhancing generalization performance in medical image analysis and ensuring the reliable clinical deployment of deep learning models.