Abstract
At matriculation, university advising typically operates under tight informational constraints, often with no access to post-enrolment interaction history. We propose a unified, leakage-controlled pipeline that (i) predicts early dropout risk and (ii) generates cold-start programme recommendations using only pre-enrolment signals, with an optional early-warning variant that additionally incorporates first-term academic aggregates. The pipeline instantiates lightweight multimodal components: a tabular RNN, a DistilBERT encoder for short profile sentences, and a cross-attention fusion module, trained and evaluated end-to-end on a public benchmark (UCI id 697; n = 3630 students across 17 programmes). For dropout prediction, fusing text with numeric features yields the strongest thresholded performance (Hybrid RNN–DistilBERT: F1 ≈ 0.9161, MCC ≈ 0.7750), while simple ensembling modestly improves threshold-free discrimination (AUROC up to ≈ 0.9488 via Stacking Ensemble, compared to ≈ 0.9459 for Weighted Ensemble). A text-only branch performs substantially worse, indicating that numeric demographics and early curricular aggregates carry most of the predictive signal at this horizon. For programme recommendation, pre-enrolment demographics alone support actionable rankings (Demographic MLP: NDCG@10 ≈ 0.5793, Top-10 ≈ 0.9380), outperforming a popularity prior by roughly 25–27 percentage points in NDCG@10; adding text yields only marginal improvements in hit rate and does not improve NDCG on this cohort. Methodologically, we apply leakage guards, deterministic preprocessing, stratified splits, and comprehensive metric reporting to enable reproducibility on non-proprietary data. Practically, the pipeline supports orientation-time triage via high-recall early warning and shortlist generation for programme selection.
Overall, the results cast matriculation-time advising as a joint prediction–recommendation problem solvable with carefully engineered pre-enrolment views and lightweight multimodal models, without relying on historical interactions.
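The recommendation results above are reported as NDCG@10 and Top-10 hit rate. The following is a minimal sketch of how these two metrics can be computed, assuming each student has exactly one relevant programme (the one actually enrolled in) and the model outputs one score per programme. The function names are hypothetical and this is not the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple


def rank_of(scores: Sequence[float], true_idx: int) -> int:
    """1-based rank of the true programme; ties are counted against it (pessimistic)."""
    target = scores[true_idx]
    ahead = sum(
        1 for i, s in enumerate(scores)
        if s > target or (s == target and i != true_idx)
    )
    return ahead + 1


def ndcg_at_k(scores: Sequence[float], true_idx: int, k: int = 10) -> float:
    # With a single relevant item the ideal DCG is 1 / log2(2) = 1,
    # so NDCG@k reduces to 1 / log2(rank + 1) when the item is in the top k.
    r = rank_of(scores, true_idx)
    return 1.0 / math.log2(r + 1) if r <= k else 0.0


def hit_at_k(scores: Sequence[float], true_idx: int, k: int = 10) -> float:
    # Top-k hit rate: 1 if the enrolled programme appears in the shortlist.
    return 1.0 if rank_of(scores, true_idx) <= k else 0.0


def evaluate(all_scores: Sequence[Sequence[float]],
             true_indices: Sequence[int], k: int = 10) -> Tuple[float, float]:
    """Mean NDCG@k and mean hit@k over a cohort of students."""
    n = len(true_indices)
    ndcg = sum(ndcg_at_k(s, t, k) for s, t in zip(all_scores, true_indices)) / n
    hit = sum(hit_at_k(s, t, k) for s, t in zip(all_scores, true_indices)) / n
    return ndcg, hit


if __name__ == "__main__":
    # Two hypothetical students scored over 17 programmes.
    student_a = [0.9] + [0.1] * 16            # enrolled programme ranked 1st
    student_b = [0.5, 0.4, 0.3] + [0.0] * 14  # enrolled programme ranked 3rd
    print(evaluate([student_a, student_b], [0, 2]))
```

Running the script prints `(0.75, 1.0)`: the first student contributes an NDCG of 1.0, the second 1 / log2(4) = 0.5, and both enrolled programmes fall inside the top 10. This illustrates why Top-10 hit rate can sit far above NDCG@10 when the true programme is shortlisted but not ranked first.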