Abstract
This project explored several ways of analyzing an image of a skin mole to detect the presence of cancer. The machine learning techniques used were logistic regression, perceptron, Adaline, a fully connected neural network, a support vector machine, a neural network combined with a support vector machine, a decision tree, XGBoost, a random forest, a basic convolutional neural network, MobileNetV2, and ResNet50. The dataset consisted of 1,500 medical-grade images obtained from the ISIC Archive, of which 750 were labeled benign and 750 were labeled malignant. After being rescaled and padded to 224x224 pixels, the images were processed in two ways. In the first, a mask was applied to each image so that four features of the mole could be extracted: asymmetry, border irregularity, color, and diameter, the so-called ABCD features. These four features were then fed into the nine non-convolutional models listed above, from logistic regression through random forest. In the second, the images were fed directly into the basic convolutional neural network, MobileNetV2, and ResNet50. In addition, the ABCD features and the images were combined and run through modified MobileNetV2 and ResNet50 models. Models trained on the ABCD features alone achieved consistently low accuracy, with training accuracy rarely exceeding 70% unless overfitting occurred.
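The rescale-and-pad step can be illustrated with a small geometry sketch. It assumes the rescaling preserves aspect ratio and that the padding is split evenly on both sides; the abstract states neither, so treat both as assumptions. The function name `letterbox_geometry` and its parameters are hypothetical and are not taken from the project code.

```python
def letterbox_geometry(width, height, target=224):
    """Compute resize and padding sizes for fitting an image into a square.

    Assumption (not stated in the source): the longer side is scaled to
    `target` pixels with the aspect ratio preserved, and the shorter side
    is padded symmetrically up to `target`.

    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Scale factor that maps the longer side onto the target size.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    # Remaining pixels are filled with padding, split as evenly as possible.
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top
```

Under these assumptions, a 1024x768 image would be resized to 224x168 and padded with 28 rows above and below. An image library would then perform the actual resize and padding using these values.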
Among the convolutional neural networks, the basic network produced training and cross-validation accuracies of around 70%, while the original and modified versions of the more advanced models produced the highest training and cross-validation accuracies overall. The original MobileNetV2 model reached a training accuracy of 77.9% and a cross-validation accuracy of 73.4%, and the modified MobileNetV2 model reached 81.1% and 75.1%, respectively. The original ResNet50 model reached a training accuracy of 90.8% and a cross-validation accuracy of 78.1%, while the modified ResNet50 model reached 93.2% and 79.5%, respectively. To assess whether adding the four ABCD features to the image information fed into the ResNet50 convolutional neural network improved performance, a distribution curve of the accuracies of the modified and original networks was plotted.
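One way to put numbers alongside such a pair of accuracy distributions is sketched below. It assumes each network was trained several times, giving a list of accuracies per network, and it fits a normal curve to each list. The abstract says only that a distribution curve was plotted, so the normal fit and the overlap measure are swapped-in illustrations, not the project's method. The function name `compare_accuracy_distributions` is hypothetical.

```python
from statistics import NormalDist


def compare_accuracy_distributions(original_accs, modified_accs):
    """Summarize two sets of accuracies with fitted normal curves.

    Assumption (not from the source): each argument is a sequence of
    accuracies from repeated runs of one network. Both sequences need a
    non-zero spread, otherwise the overlap is undefined and
    NormalDist.overlap raises StatisticsError.
    """
    if len(original_accs) < 2 or len(modified_accs) < 2:
        raise ValueError("need at least two accuracy values per network")

    # Fit a normal distribution (sample mean and sample stdev) to each set.
    original = NormalDist.from_samples(original_accs)
    modified = NormalDist.from_samples(modified_accs)

    return {
        "original_mean": original.mean,
        "original_stdev": original.stdev,
        "modified_mean": modified.mean,
        "modified_stdev": modified.stdev,
        # Positive when the modified network is more accurate on average.
        "mean_gain": modified.mean - original.mean,
        # Shared area under the two fitted curves, between 0 and 1.
        "overlap": original.overlap(modified),
    }
```

An overlap near 1 would suggest the two networks are hard to tell apart on accuracy, while a small overlap together with a positive `mean_gain` would be consistent with an improvement from the added features. This summary complements a plot and does not replace a formal significance test.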