Abstract
The rising popularity of data science and machine learning (ML) across diverse domains, often driven by users with limited computational expertise, reflects the growing commoditization of ML tools. However, the advanced technical and mathematical knowledge demanded by current ML frameworks poses a formidable barrier for non-experts, preventing them from fully exploiting these powerful platforms.
In response, we introduce MQL, a novel declarative query language for ML application design, alongside its corresponding query processing engine. We demonstrate that abstracting ML concepts, in the manner of SQL, can preserve both processing efficiency and analytical fidelity. Our implementation defines MQL semantics through a mapping to widely understood ML code fragments. By leveraging task-specific meta-features, heuristic knowledge, and standard assessment methods, our system ranks candidate ML libraries and selects the best-ranked algorithms, relieving users of these choices.
We present mapping algorithms that ensure each MQL program retains its intended semantics, and we report experimental evaluations in which MQL’s algorithmic selections match or surpass human-engineered solutions in both performance and model accuracy. By offering declarative queries as a high-level alternative to traditional coding, MQL significantly reduces the complexity of constructing data analysis pipelines, thereby democratizing machine learning application design.