Abstract
Pedestrian and cyclist safety at urban intersections remains a critical challenge for transportation agencies, because these vulnerable road users face substantial crash risk in complex traffic environments. Identifying high-risk locations and the factors that contribute to crashes is essential for improving road safety. This study developed an explainable machine learning framework to predict the occurrence of motor vehicle-involved pedestrian and cyclist crashes at urban intersections, using five years of crash, geometric, operational, and socioeconomic data from a large sample of urban intersections. Five supervised machine learning algorithms were trained and evaluated: Binary Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, and Random Forest. Overall, the models showed strong predictive performance, with accuracies approaching 91% and high discriminative ability. The Binary Logistic Regression and Random Forest models achieved the highest area under the receiver operating characteristic curve (AUC) values, 0.961 and 0.964, respectively. To enhance transparency, SHapley Additive exPlanations (SHAP) values were used to quantify the contribution of each predictor and to examine feature effects at both the global and local levels. The results indicate that roadway hierarchy, intersection markings, and total entering volume are among the most influential determinants of crash likelihood, whereas socioeconomic variables exhibit weaker but interpretable effects.