Abstract
Large language models (LLMs) such as ChatGPT, Claude, and Gemini are now heavily relied on and trusted by users of all ages. Because these artificial intelligence (AI) models are a recent innovation, their inner workings are not yet publicly known; nevertheless, their outputs are often used verbatim, without verification or review. This thesis presents a study of how contextual bias affects an AI task in which a model evaluates a piece of code for security vulnerabilities and offers a rewrite of that code. The study found that contextual bias does affect the results of both the security assessment and the subsequent rewrite. Randomness in the models also contributes to variation in the outputs. The study also tested security strength across different programming languages for each LLM. These findings should serve as a warning to developers and researchers that AI outputs need to be reviewed. More broadly, they can support efforts to improve the use of AI systems in code development and research and to better understand the potential impact of contextual bias on AI model output.