Abstract
Modern human productivity and daily life rely on identifying ground objects in remote sensing images (RSIs). Traditional remote sensing object detection (RSOD) techniques lack the timeliness and accuracy that practical applications demand. Existing deep-learning algorithms still struggle with RSIs because objects vary widely in shape and scale, and a significant proportion of them are small. To address these challenges, we propose PSWP-DETR, a Transformer-based network that leverages adaptive deformation learning and multi-scale integration for enhanced object detection in remote sensing. First, we propose PradatorConv (PdConv) to handle the significant shape changes of objects: it adaptively learns horizontal and vertical deformations to perceive the complex geometric features of RSIs. Second, we propose Scale-wise Differential Modules (SDM), which comprise multi-scale convolution and Edge Captor Convolution (ECC). SDM integrates features across scales and captures edge characteristics and local textures, which benefits the detection of multi-scale objects, especially tiny objects with limited feature information. Finally, we propose the Whale Particle Optimization (WPO) algorithm for learning rate optimization, which improves convergence speed and accuracy. Experiments on the VisDrone2019-DET, DIOR, and AI-TOD datasets demonstrate that PSWP-DETR achieves the best accuracy among the compared methods, offering useful insights for future RSOD work. The source code will be available at https://github.com/Get1star/PSWP-DETR.git.