Output list
Preprint
Ethical Implications of Training Deceptive AI
Posted to a preprint site 03/10/2026
Deceptive behavior in AI systems is no longer theoretical: large language models strategically mislead without producing false statements, maintain deceptive strategies through safety training, and coordinate deception in multi-agent settings. While the European Union's AI Act prohibits deployment of deceptive AI systems, it explicitly exempts research and development, creating a necessary but unstructured space in which no established framework governs how deception research should be conducted or how risk should scale with capability. This paper proposes a Deception Research Levels (DRL) framework, a classification system for deceptive algorithm research modeled on the Biosafety Level system used in biological research. The DRL framework classifies research by risk profile rather than researcher intent, assessing deceptive mechanisms across five dimensions grounded in the AI4People ethical framework: Pillar Implication, Severity, Reversibility, Scale, and Vulnerability. Classification follows a "highest dimension wins" approach, assigning one of four risk levels with cumulative safeguards ranging from standard documentation at DRL-1 to regulatory notification and third-party security audits at DRL-4. A dual-development mandate at DRL-3 and above requires that detection and mitigation methods be developed alongside any deceptive capability. We apply the framework to eight case studies spanning all four levels and demonstrate that ecological validity of the deceptive mechanism emerges as a consistent, non-independent indicator of classification level. The DRL framework is intended to fill the governance gap between regulated deployment and unstructured research, supporting both beneficial applications and defensive research under conditions where safeguards are proportional to the potential for harm.
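The "highest dimension wins" rule described in the abstract can be sketched as a small classifier. This is an illustrative reconstruction, not the paper's implementation: the dimension names come from the abstract, while the 1-4 scoring scale and the function name are assumptions.

```python
# Hypothetical sketch of "highest dimension wins" DRL classification:
# each of the five AI4People-grounded dimensions is scored on the four
# risk levels, and the overall level is the maximum across dimensions.
# Names and the 1-4 scale are illustrative assumptions.

DIMENSIONS = ("pillar_implication", "severity", "reversibility",
              "scale", "vulnerability")

def classify_drl(scores: dict) -> int:
    """Return the DRL level (1-4) as the highest single-dimension score."""
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing dimension: {dim}")
        if scores[dim] not in (1, 2, 3, 4):
            raise ValueError(f"score out of range for {dim}")
    # One high-risk dimension is enough to raise the whole classification.
    return max(scores[dim] for dim in DIMENSIONS)

example = {"pillar_implication": 2, "severity": 3, "reversibility": 1,
           "scale": 2, "vulnerability": 1}
level = classify_drl(example)  # severity alone pushes this to DRL-3
```

Under this sketch, a DRL-3 result would trigger the dual-development mandate regardless of how low the other four dimensions score.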
Preprint
Intentional Deception as Controllable Capability in LLM Agents
Posted to a preprint site 03/08/2026
1 - 36
As LLM-based agents increasingly operate in multi-agent systems, understanding adversarial manipulation becomes critical for defensive design. We present a systematic study of intentional deception as an engineered capability, using LLM-to-LLM interactions within a text-based RPG where parameterized behavioral profiles (9 alignments x 4 motivations, yielding 36 profiles with explicit ethical ground truth) serve as our experimental testbed. Unlike accidental deception from misalignment, we investigate a two-stage system that infers target agent characteristics and generates deceptive responses steering targets toward actions counter to their beliefs and motivations. We find that deceptive intervention produces differential effects concentrated in specific behavioral profiles rather than distributed uniformly, and that 88.5% of successful deceptions employ misdirection (true statements with strategic framing) rather than fabrication, indicating fact-checking defenses would miss the large majority of adversarial responses. Motivation, inferable at 98%+ accuracy, serves as the primary attack vector, while belief systems remain harder to identify (49% inference ceiling) or exploit. These findings identify which agent profiles require additional safeguards and suggest that current fact-verification approaches are insufficient against strategically framed deception.
Dataset
Published 03/06/2026
Processed behavioral datasets and raw game logs from a large-scale study of LLM-based agent behavioral inference. Agents assigned one of 36 profiles (9 moral alignments x 4 motivations) navigate a procedurally generated dungeon environment. The deposit contains raw game logs (17,411 games, 1,575,377 sequences) and two processed training datasets used for BiLSTM and Longformer classification experiments respectively. Associated paper: Starace and Soule, Behavioral Inference at Scale: The Fundamental Asymmetry Between Motivations and Belief Systems. Full documentation in README.md.
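The 36-profile grid (9 moral alignments x 4 motivations) used across these entries can be reconstructed as a simple cross product. This is a sketch for illustration: the alignment axes follow the Dungeons & Dragons convention cited in the related papers, and the fourth motivation label is an assumption, since the conference paper names only wealth, wanderlust, and safety.

```python
# Illustrative reconstruction of the 36-profile grid (9 alignments x 4
# motivations). The D&D-style alignment axes are from the related papers;
# the fourth motivation label ("power") is an assumption.
from itertools import product

ALIGNMENTS = [f"{law} {moral}" for law, moral in product(
    ("lawful", "neutral", "chaotic"), ("good", "neutral", "evil"))]
MOTIVATIONS = ["wealth", "wanderlust", "safety", "power"]  # last label assumed

PROFILES = list(product(ALIGNMENTS, MOTIVATIONS))
assert len(PROFILES) == 36  # 9 alignments x 4 motivations
```

Each of the 17,411 game logs would then be labeled with one of these profile tuples as ground truth for the BiLSTM and Longformer classification experiments.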
Journal article
Deceptive algorithms in games: A systematic literature review
Published 01/2026
Entertainment Computing, 56, 1 - 14
This systematic literature review examines the evolving landscape of deception in video games and artificial intelligence (AI). The integration of deceptive strategies in AI, particularly within gaming environments, represents a growing area of interest with significant implications for both gameplay and broader applications, such as cybersecurity. Through a systematic review of 97 papers, 79 were excluded after introduction analysis revealed focus on deception outside gaming contexts (e.g., advertising, propaganda, movement detection), leaving 18 papers directly applicable to game-based deception. Of these 18, 61% provided formal or contextual definitions while 39% relied on assumed understanding. The review categorizes the current body of research into three primary areas: definitions of deception, methods for implementing and mitigating deception, and the frameworks used to analyze these strategies. The review highlights the diversity in the conceptualization of deception, ranging from formal definitions grounded in game theory, to more context-specific operational definitions. Key models such as signaling games (information asymmetry scenarios), Stackelberg games (leader–follower dynamics), and hypergames (perception-based interactions) are explored alongside AI-driven approaches like reinforcement learning (trial-and-error learning) and generative neural networks, which simulate and detect deception in complex environments. The review identifies significant gaps in the standardization of definitions and the practical implementation of deceptive strategies, calling for further interdisciplinary research to address these challenges. The ethical implications of deploying deceptive AI systems are discussed, emphasizing the need for comprehensive frameworks that balance innovation with responsible usage. 
Future research must prioritize standardized definitions and interdisciplinary collaboration across ethics, law, and the social sciences to address the expanding applications and ethical implications of deceptive AI technologies.
Conference paper
Modeling Player Types with LLMs: A Framework for Belief- and Motivation-Driven NPC Behavior
Published 2026
Serious Games, 228 - 244
This paper explores the potential for large language models (LLMs), specifically ChatGPT-4o, to engage in role-playing games (RPGs) by making decisions based on predefined belief systems and motivations. Using a text-based dungeon crawler environment, the LLM was assigned structured character profiles incorporating alignments from Dungeons & Dragons and motivations—wealth accumulation, wanderlust, or safety—to guide decision-making. This approach supports player modeling by enabling the creation of non-player characters (NPCs) that reflect diverse player types, facilitating personalized, adaptive serious games. We also introduce a system for evaluating an LLM’s effectiveness in character generation, offering a structured framework for assessing its ability to maintain consistent, motivation-driven behavior. LLMs demonstrated improved decision-making accuracy ranging from 75% to 93% under the structured framework. The lowest performance appeared in chaotic and evil profiles—behavioral patterns often attenuated during pretraining—while the highest accuracy was found in lawful and neutral profiles oriented toward safety. These findings highlight the potential for LLMs to enhance game design through richer NPC interactions and more dynamic, player-adaptive experiences.
Preprint
Systematic Evaluation of Multi-modal Approaches to Complex Player Profile Classification
Posted to a preprint site 09/06/2025
1 - 11
Modern adaptive games require nuanced player understanding, yet most models use simplified 5-10 category taxonomies that fail to capture diversity. Behavioral clustering cannot distinguish players with different motivations who act similarly. We present a systematic evaluation of multi-modal classification at scale, combining behavioral telemetry with semantic context to support 36 player profiles. Using 19,413 gameplay sessions from an AI-controlled text-based RPG, we compared behavioral-only baselines with multi-modal approaches that integrate action sequences and semantic descriptions. Traditional clustering achieved only 10% accuracy for 36-category classification, limited by semantic conflation where opposite actions produced identical features. Our multi-modal LSTM processing action-text pairs improved accuracy to 21%, showing both potential and limits of non-conversational data. Analysis by behavioral complexity revealed that non-neutral profiles reached 42% accuracy (15x above random), while neutral profiles dropped to 25% (9x above random). Identical actions such as "help the merchant" cannot reveal whether a player is neutral or strategically waiting. Without access to reasoning, even multi-modal models struggle, though above-baseline results confirm a meaningful signal. Since prediction beyond 20 categories remains unexplored, our findings establish benchmarks for complex player modeling. Behavioral data alone plateaus near 10% for 36 categories, while multi-modal integration enables 25%. For designers, this shows that personality-based adaptation requires conversational interaction, as predefined choices cannot capture intent. Our evaluation at 36-category scale offers guidance for building adaptive games that better understand their players.
Journal article
Space Medicine Meets Serious Games: Boosting Engagement with the Medimon Creature Collector
Published 08/07/2025
Multimodal Technologies and Interaction, 9, 8, 80
Serious games that integrate educational content with engaging gameplay mechanics hold promise for reducing cognitive load and increasing student motivation in STEM and health science education. This preliminary study presents the development and evaluation of the Medimon NASA Demo, a game-based learning prototype designed to teach undergraduate students about the musculoskeletal and visual systems—two critical domains in space medicine. Participants (n = 23) engaged with the game over a two-week self-regulated learning period. The game employed mnemonic-based characters, visual storytelling, and turn-based battle mechanics to reinforce medical concepts. Quantitative results demonstrated significant learning gains, with posttest scores increasing by an average of 23% and a normalized change of c = 0.4. Engagement levels were high across multiple dimensions of situational interest, and 74% of participants preferred the game over traditional formats. Qualitative analysis of open-ended responses revealed themes related to intrinsic appeal, perceived learning efficacy, interaction design, and cognitive resource management. While the game had minimal impact on short-term STEM career interest, its educational potential was clearly supported. These findings suggest that mnemonic-driven serious games like Medimon can effectively enhance engagement and learning in health science education, especially when aligned with real-world contexts such as space medicine.
Book chapter
Deceptive Algorithms in Massive Multiplayer Online Role Playing Games (MMOs)
Published 2025
Serious Games, 414 - 420
This paper proposes using a text-based dungeon crawler adventure as a case study to explore methods for implementing deception in video games. The study proposes a framework for integrating deception into gameplay, leveraging the alignment system from Dungeons and Dragons to define character behavior and motivation. The proposed approach would create an environment that allows researchers to observe AI-controlled characters in a dynamically generated environment that leverages LLMs. The framework is designed to address the issue of monotony in current games by training a deceptive agent, or villain, to recognize and exploit player beliefs and intentions. This adds complexity and depth to the gaming experience, making it more engaging and dynamic. Future research directions include integrating human players into the game environment and transitioning to 3-D gaming platforms, potentially leading to more immersive experiences, particularly in massive multiplayer online role-playing games (MMORPGs). By exploring the intersection of AI, deception, and gaming, this paper contributes to the evolving interactive entertainment landscape, paving the way for more sophisticated and captivating game experiences.
Conference paper
On Students’ Perception of Compiler Syntax Error Messages: A Human Factors Approach
Published 2025
HCI International 2024 – Late Breaking Papers, 3 - 16
International Conference on Human-Computer Interaction
Error messages play a crucial role in helping developers, especially novices, find and fix errors. This paper explores students' perceptions of using compiler syntax error messages to find and repair erroneous programs. We conducted an experiment in which participants had to find and fix errors, followed by reflection on their experiences with three tasks. In task one, participants evaluated the error messages using a proposed rubric for user experiences with compiler error messages. In task two, they described the error messages through open-ended questions, while in task three, they suggested alternative messages. Eighteen error messages from three compilers were ranked based on the rubric. The results indicate that users prefer error messages that are informative, human-centric, and provide accurate and precise information about the error and its resolution. Reported difficulties with error messages include lack of clarity and misdirection.
Conference proceeding
A Parsing Technique for Enhancing Compiler Syntax Error Messages for Student Programmers
Published 10/13/2024
Proceedings - Frontiers in Education Conference, 1 - 7
2024 IEEE Frontiers in Education Conference, 10/13/2024–10/16/2024, Washington, DC
Contribution: This full research paper presents an innovative parsing technique that aims to improve syntax error messages for undergraduate students. The quality of syntax error messages generated by the new parsing technique was evaluated and compared with the messages produced by mainstream compilers. Background: Unfortunately, compiler error messages are often unhelpful. The study explains some intrinsic challenges faced in generating good syntax error messages and presents a global, local, and expression-level (GLE) parsing technique to overcome some of these challenges. GLE is a three-phase parsing technique that prioritizes parsing the large code components before descending into the details. The first phase parses the functional structures and ignores errors in the syntax of the smaller constructions. The second phase parses the control structures and ignores errors in the expressions and other statements. The third phase parses the expressions and statements excluded from phase two. Research Question: Can GLE parsing techniques help generate better syntax error messages? Methodology: The study evaluated the quality of syntax error messages generated by the proposed GLE parsing technique. The evaluation was done in a controlled experiment with a within-group design where participants found and fixed errors in erroneous programs using accompanying error messages from different compilers. The independent variable is the compiler type. The dependent variable is the quality of syntax error messages. The quality of syntax error messages is measured by three factors: the success rate of finding errors in erroneous programs, the success rate of fixing syntax errors in erroneous programs, and mean time to find and fix erroneous programs. Three questions were used to evaluate the "helping in finding errors" quality of the error message: 1) what is the error in the program? 2) in which line is the error? 3) what is the cause of the error?
One question was used to evaluate the quality of "helping in fixing errors": "how to fix the error?" The time that participants used to find and fix a program was calculated. The participants were 51 undergraduate students in the Computer Science and Engineering department at the New Mexico Institute of Mining and Technology. Findings: The results show a statistically significant difference in finding errors and fixing erroneous programs using messages generated by the proposed GLE parsing technique versus two mainstream compilers: GNU GCC and Microsoft Visual C++. No significant difference exists in the time to find and fix. The results indicate that the proposed GLE parsing technique can help generate better error messages for undergraduate students.
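The coarse-to-fine idea behind the three GLE phases can be sketched in miniature. This is a toy illustration under stated assumptions, not the paper's parser: phase 1 recovers function-level structure from brace nesting while tolerating errors inside bodies, phase 2 scans each body for control-structure headers, and phase 3 checks the remaining simple statements. All function names and the missing-semicolon heuristic are hypothetical.

```python
# Toy sketch of GLE-style coarse-to-fine parsing (illustrative, not the
# paper's implementation). Phases mirror the abstract: structure first,
# control flow second, statement-level detail last.
import re

def phase1_functions(src: str):
    """Phase 1: split top-level { ... } blocks by brace depth,
    ignoring any errors inside the bodies."""
    blocks, depth, start = [], 0, None
    for i, ch in enumerate(src):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                blocks.append(src[start + 1:i])
    return blocks

def phase2_controls(body: str):
    """Phase 2: collect control-structure keywords, deferring
    expression- and statement-level errors to phase 3."""
    return re.findall(r"\b(if|while|for|switch)\b", body)

def phase3_statement_errors(body: str):
    """Phase 3: flag simple statements missing a terminating semicolon
    (a crude stand-in for expression-level diagnosis)."""
    errors = []
    for line in body.splitlines():
        stmt = line.strip()
        if stmt and not stmt.endswith((";", "{", "}")) and not re.match(
                r"\b(if|while|for|switch)\b", stmt):
            errors.append(f"possible missing ';': {stmt!r}")
    return errors

code = "int main() {\n  int x = 1\n  if (x) { x = 2; }\n}"
```

Because phase 1 only tracks braces, the missing semicolon on `int x = 1` does not prevent recovery of the function body; it surfaces only in phase 3, where a message can be attached to the precise statement.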