Abstract
Recommender Systems (RSs) have become critical tools for navigating the overwhelming volume of digital content, helping users discover items of interest in domains such as e-commerce, media streaming, and social networks. One of the most commonly used techniques in RSs is Collaborative Filtering (CF), which generates personalized suggestions by analyzing the interaction patterns between users and items. However, traditional CF methods face challenges of data sparsity, scalability, and variability in user behavior. The effect of these factors is difficult to isolate on real-world datasets, where noise, sparsity, and user counts cannot be controlled and therefore confound performance analysis. This research addresses this gap by using synthetic data and evolutionary optimization to systematically evaluate CF performance under controlled conditions.
The model proposed in this research combines CF with a Genetic Algorithm (GA) and uses a customized similarity metric, SimGen, to improve prediction accuracy under diverse data conditions. Synthetic datasets are generated with controlled variations in rating patterns (uniform and varying), noise levels, missing rates, user scales, and group complexities, resulting in various experimental configurations. This framework enables a systematic exploration of CF performance and optimization in environments that simulate real-world complexity with controlled variation.
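To make the idea of controlled variation concrete, the following is a minimal sketch of how such a synthetic rating matrix might be generated. It is not the generator used in this research: the function name make_synthetic_ratings, the 1-5 rating scale, the group-prototype construction, the reading of "uniform" as one constant rating per group and "varying" as item-specific ratings per group, and the noise model (random replacement of a rating) are all assumptions made for illustration.

```python
import numpy as np

def make_synthetic_ratings(n_users, n_items, n_groups, pattern="uniform",
                           noise_level=0.1, missing_rate=0.5, seed=0):
    """Hypothetical generator; every modelling choice here is an assumption."""
    rng = np.random.default_rng(seed)

    # Assumption: each user group shares a prototype rating vector on a 1-5 scale.
    if pattern == "uniform":
        # One constant rating per group, repeated across all items.
        base = rng.integers(1, 6, size=(n_groups, 1))
        prototypes = np.repeat(base, n_items, axis=1)
    elif pattern == "varying":
        # Item-specific ratings per group.
        prototypes = rng.integers(1, 6, size=(n_groups, n_items))
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")

    # Assign each user to a group and copy that group's prototype.
    groups = rng.integers(0, n_groups, size=n_users)
    ratings = prototypes[groups].astype(float)

    # Assumed noise model: each entry is replaced by a random rating
    # with probability noise_level.
    noisy = rng.random(ratings.shape) < noise_level
    ratings[noisy] = rng.integers(1, 6, size=int(noisy.sum()))

    # Sparsity: each entry is hidden (NaN) with probability missing_rate.
    missing = rng.random(ratings.shape) < missing_rate
    ratings[missing] = np.nan

    return ratings, groups
```

Sweeping pattern, noise_level, missing_rate, n_users, and n_groups over a grid would then yield one experimental configuration per combination, with each factor varied independently of the others.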
The results show that the GA-enhanced CF model handles diverse data conditions while maintaining consistently low prediction times and stable memory usage, even in large-scale, high-sparsity scenarios. Accuracy improves with increased user count, indicating better generalization and reduced overfitting, particularly in datasets with varying rating patterns. Noise has only a moderate impact on accuracy, so the model remains robust to its presence. Sparsity significantly affects accuracy, especially under complex rating patterns, whereas group complexity has a smaller influence. Notably, the model performs best under varying rating patterns, where it generalizes effectively despite increased data complexity, making it suitable for realistic recommendation scenarios.