Abstract
The transformative potential of deep learning is often hindered in critical, data-scarce applications, such as materials science and medical imaging, where acquiring and meticulously annotating large datasets is prohibitively costly and labor-intensive. This dissertation confronts this dual challenge by introducing a synergistic suite of methods that enables the development of powerful deep learning models with limited resources. The research delivers novel, interconnected contributions in three core areas: data-efficient instance segmentation, rapid interactive annotation, and high-fidelity synthetic data generation.

The first contribution introduces the Multitask Instance Segmentation Network (MTIS-Net), a model engineered for data-scarce environments. By integrating a dual-decoder architecture that simultaneously segments object regions and their boundaries, MTIS-Net achieves a recall of over 90% with a minimal number of training samples, drastically reducing the annotation burden required to train effective segmentation models.
The second contribution accelerates dataset creation through Cascade-Forward Refinement with Iterative Click Loss (CFR-ICL), an efficient interactive image segmentation model. This method introduces a novel loss function that directly optimizes for fewer user clicks and a unified inference process that refines segmentation quality without requiring extra network modules. This approach facilitates rapid and intuitive human-AI collaboration, significantly speeding up the annotation workflow and reducing the number of required clicks by up to 33.2% compared to state-of-the-art methods on benchmark datasets.
The third contribution is a novel framework for generating large-content images, featuring Guided and Variance-Corrected Fusion (GVCF) and one-shot Style Alignment (SA). This approach enables small, pre-trained diffusion models to produce high-fidelity, seamless, and arbitrarily sized images by correcting statistical discrepancies during the fusion process and aligning style with minimal computational overhead. This serves as a powerful tool for data augmentation, capable of synthesizing realistic and diverse images to extend sparse training sets and improve model generalization.
Collectively, these components form a cohesive, human-in-the-loop framework that combines synthetic data generation, interactive labeling, and efficient model training. This dissertation delivers a significant advance in data-efficient deep learning, offering a comprehensive solution that makes the development of advanced models more accessible, practical, and effective for a wide range of critical applications operating under severe data constraints.