Abstract
This research uses historical crash, roadway, and socio-economic data from urban intersections in Idaho, covering 2015 to 2019, to evaluate the effectiveness of machine learning algorithms on two tasks: regression (predicting crash frequency) and classification (predicting whether a crash occurred). The algorithms investigated are Linear Regression (Logistic Regression for classification), K-Nearest Neighbors, Support Vector Machines, Decision Trees, and Random Forests. By testing models with different numbers of features, the study examines the principles of Occam’s Razor and the Curse of Dimensionality. The results identify Linear Regression and Random Forests as the top performers for regression and classification, respectively, and, with some exceptions, largely affirm that using fewer features yields better performance. Notably, roadway characteristics such as Total Entering Vehicles, Major Road Classification, Minor Road Classification, and Intersection Control emerged as crucial predictors of crashes, underscoring the significant influence of intersection design on pedestrian and bicycle safety.