From LLMs to Randomness: Analyzing Program Input Efficacy with Resource and Language Metrics
Journal article   Open access   Peer reviewed

Gavin Black, Eric Yocam, Varghese Vaidyan, Gurcan Comert and Yong Wang
IEEE Access, Vol.13, pp.1-1
01/01/2025

Abstract

Keywords: Anomaly detection; Codes; Computer crashes; Fuzzing; Fuzzing Techniques; Large Language Models; Measurement; Memory management; Program Behavior Analysis; Python; Resource Usage Metrics; Software Profiling; Statistical analysis; Testing Software
Security-focused program testing typically concentrates on crash detection and code coverage while overlooking additional system behaviors that can impact program confidentiality and availability. To address this gap, we propose a statistical framework that combines embedding-based anomaly detection, resource usage metrics, and resource-state distance measures to systematically profile software behaviors beyond traditional coverage-based methods. Leveraging over 5 million labeled samples from 50 Python programs, we evaluate how these independent scoring terms distinguish among different sources of input, including Large Language Model (LLM)-generated inputs, and demonstrate how standard statistical tests (e.g., Kolmogorov-Smirnov and Kendall's τ) confirm their effectiveness. Our findings show that LLM-generated samples can trigger diverse behaviors but are often less effective at exploring resource usage dynamics (CPU, memory) than conventional fuzzing. However, combining LLM outputs with existing techniques broadens behavior coverage and reveals commonalities between commercial LLM outputs. We provide open-source tools for this evaluation framework, demonstrating the potential to refine software testing by integrating behavior metrics into security-testing workflows.
url
https://doi.org/10.1109/ACCESS.2025.3571205
Published (Version of record) Open
