Abstract
Generating large images with small diffusion models is increasingly popular,
because training large models can be prohibitively expensive. A common
approach jointly generates a series of overlapping image patches and merges
adjacent patches into a large image. However, results from
existing methods often exhibit obvious artifacts, e.g., seams and inconsistent
objects and styles. To address these issues, we propose Guided Fusion (GF),
which mitigates the negative impact of distant image regions by applying a
weighted average to the overlapping regions. Moreover, we propose
Variance-Corrected Fusion (VCF), which corrects the data variance after
averaging, yielding more accurate fusion for the Denoising Diffusion
Probabilistic Model. Furthermore, we propose a one-shot Style Alignment (SA),
which generates a coherent style for large images by adjusting the initial
input noise without adding extra computational burden. Extensive experiments
demonstrate that the proposed fusion methods significantly improve the quality
of the generated images. As a plug-and-play module, the proposed method
can be widely applied to enhance other fusion-based methods for large image
generation.
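
To make the fusion idea concrete, the following is a minimal 1-D sketch of
weighted-average fusion with an optional variance correction. It is not the
exact formulation of GF or VCF. The linear ramp weights, the function names
ramp_weights and fuse, and the assumption that the fused values are
independent, zero-mean, unit-variance noise are all illustrative choices made
here. The sketch relies only on the standard fact that a weighted average of
independent unit-variance samples has variance sum(w_i^2) / (sum(w_i))^2,
which is below one wherever patches overlap.

```python
import numpy as np

def ramp_weights(patch_len, overlap):
    """Hypothetical weights: ramp up/down over the overlap, flat in the middle.

    Assumes 2 * overlap <= patch_len so the two ramps do not collide.
    """
    w = np.ones(patch_len)
    ramp = np.arange(1, overlap + 1) / (overlap + 1)
    w[:overlap] = ramp
    w[patch_len - overlap:] = ramp[::-1]
    return w

def fuse(patches, starts, total_len, weights, correct_variance=False):
    """Fuse overlapping 1-D patches (last axis = position) by weighted average.

    With correct_variance=True the result is rescaled so that independent,
    zero-mean, unit-variance inputs give a unit-variance output (an assumption
    of this sketch; a real sampler would rescale only the stochastic term).
    """
    patches = [np.asarray(p, dtype=float) for p in patches]
    lead = patches[0].shape[:-1]          # optional leading batch dimensions
    num = np.zeros(lead + (total_len,))   # sum of w_i * x_i
    den = np.zeros(total_len)             # sum of w_i
    sq = np.zeros(total_len)              # sum of w_i^2
    for p, s in zip(patches, starts):
        e = s + p.shape[-1]
        num[..., s:e] += weights * p
        den[s:e] += weights
        sq[s:e] += weights ** 2
    fused = num / den
    if correct_variance:
        # Variance of the weighted average is sq / den^2; divide by its root.
        fused = fused / np.sqrt(sq / den ** 2)
    return fused

# Two patches of length 8 overlapping by 4 positions, fused into length 12.
rng = np.random.default_rng(0)
w = ramp_weights(8, 4)
a = rng.standard_normal((50000, 8))
b = rng.standard_normal((50000, 8))
plain = fuse([a, b], [0, 4], 12, w)
fixed = fuse([a, b], [0, 4], 12, w, correct_variance=True)
print(plain.var(axis=0).round(2))  # drops below 1 inside the overlap
print(fixed.var(axis=0).round(2))  # close to 1 at every position
```

In this sketch the weights favor the patch whose center is nearer to each
pixel, so contributions from the far edge of a neighboring patch are
down-weighted, and the rescaling restores the variance that plain averaging
removes in the overlap.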