Arxiv Selected Papers

2026-02-03

Generated at 2026-02-03 14:07 UTC

ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis

Author(s): Mantas Baksys, Stefan Zetzsche, Olivier Bouissou, Remi Delmas, Soonho Kong, Sean B. Holden

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-02-02 17:29 UTC

TL;DR

              
The problem is a lack of verified, formally correct code to train large language models, which limits their ability to generate code with machine-checked safety guarantees for critical systems.
The core method is an automated pipeline that translates Python solutions into Dafny programs, automatically generates specifications, proves them correct, and decomposes the results into a large-scale synthetic training dataset.
This matters because it directly scales the ability to automatically generate memory-safe and functionally correct low-level code, moving from probabilistic trust in testing to deterministic trust in proofs for security-critical software.

2026-01-29

Generated at 2026-01-29 14:01 UTC

Neural Theorem Proving for Verification Conditions: A Real-World Benchmark

Author(s): Qiyuan Xu, Xiaokun Luan, Renxi Wang, Joshua Ong Jun Leang, Peixin Wang, Haonan Li, Wenda Li, Conrad Watt

Primary Category: cs.AI

Categories: cs.AI, cs.PL, cs.SE

Last Updated: 2026-01-28 18:25 UTC

TL;DR

              
The problem is that real-world program verification often produces verification conditions too difficult for current automated theorem provers, forcing developers to write manual proofs and slowing down practical verification efforts.
The core method is creating a multi-language benchmark (NTP4VC) from verification conditions extracted via industrial tools from actual OS kernels, then testing both general and fine-tuned large language models on their ability to prove these conditions.
This matters because it directly measures and pushes progress on automating a critical bottleneck in software verification, which is essential for building memory-safe and secure low-level systems like kernels without prohibitive manual effort.

2026-01-28

Generated at 2026-01-28 13:52 UTC

Discovering 100+ Compiler Defects in 72 Hours via LLM-Driven Semantic Logic Recomposition

Author(s): Xingbang He, Yuanwei Chen, Hao Wu, Jikang Zhang, Zicheng Wang, Ligeng Chen, Junjie Peng, Haiyang Wei, Yi Qian, Tiantai Zhang, Linzhang Wang, Bing Mao

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-01-27 14:51 UTC

TL;DR

              
The problem is that existing compiler fuzzing methods—relying on syntactic mutations or general LLM fine-tuning—often lose the specific semantic logic (like an out-of-bounds access) that historically triggered bugs, limiting the diversity and effectiveness of generated test programs.
The core method is FeatureFuzz, which extracts "features" from past bug reports: each feature pairs a natural-language description of a bug-prone invariant with a concrete code snippet. It then recomposes these semantic features into coherent groups and instantiates them into valid C programs to fuzz compilers.
This matters because compilers are a root-of-trust in software supply chains; defects can lead to security vulnerabilities or silent miscompilation. By preserving and recombining proven bug-triggering semantics, FeatureFuzz more effectively uncovers deep, critical compiler bugs—finding 113 bugs in GCC/LLVM in 72 hours, with 97 confirmed by developers.

2026-01-27

Generated at 2026-01-27 13:52 UTC

Handling Scope Checks (Extended Version)

Author(s): Michael Lee, Ningning Xie, Oleg Kiselyov, Jeremy Yallop

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-01-26 18:55 UTC

TL;DR

              
Addresses the risk of scope extrusion in metaprogramming, where effect handlers can generate ill-scoped code that may lead to unsafe program behavior, a gap between theoretical type systems and practical dynamic checks in metalanguages.
Introduces a formal calculus (λ₍₍ₒₚ₎₎) to model dynamic checks and proposes the "Cause-for-Concern" check, a dynamic mechanism that detects scope extrusion by tracking environment interactions without relying on static types.
Enhances memory and system security by preventing scope extrusion that could cause undefined behavior in low-level code generation, offering a practical, provably correct alternative to static type systems for safer metaprogramming in languages like MetaOCaml.

Symmetric Proofs of Parameterized Programs

Author(s): Ruotong Cheng, Azadeh Farzan

Primary Category: cs.LO

Categories: cs.LO, cs.PL

Last Updated: 2026-01-26 18:06 UTC

TL;DR

              
Addresses the challenge of verifying safety for infinite-state parameterized programs with complex network topologies, where manual or exhaustive verification is infeasible due to scalability.
Introduces parametric proof spaces that leverage local symmetry in program topologies to reuse proofs for isomorphic neighborhoods, avoiding axiomatization of the topology.
Enables automated, scalable safety verification for low-level concurrent or distributed systems, directly enhancing memory and system security by provably preventing errors in parameterized architectures like grids or rings.

2026-01-15

Generated at 2026-01-15 13:44 UTC

Path-optimal symbolic execution of heap-manipulating programs

Author(s): Pietro Braione, Giovanni Denaro, Luca Guglielmo

Primary Category: cs.SE

Categories: cs.SE, cs.LO

Last Updated: 2026-01-14 16:36 UTC

TL;DR

              
Addresses path explosion in symbolic execution for heap-manipulating programs, where current methods create redundant analysis traces due to aliasing, unlike numeric-input programs which fork only per program path.
Introduces POSE, a path-optimal symbolic execution algorithm that ensures each program path is analyzed exactly once by precisely tracking heap states and aliasing conditions.
Improves memory safety and low-level security analysis by enabling efficient, exhaustive exploration of heap-dependent program behaviors, crucial for finding bugs in systems code with data structures.

2026-01-14

Generated at 2026-01-14 13:43 UTC

Memory DisOrder: Memory Re-orderings as a Timerless Side-channel

Author(s): Sean Siddens, Sanya Srivastava, Reese Levine, Josiah Dykstra, Tyler Sorensen

Primary Category: cs.CR

Categories: cs.CR, cs.AR

Last Updated: 2026-01-13 17:59 UTC

TL;DR

              
Problem Addressed: Relaxed memory models in CPUs/GPUs allow memory operations to be re-ordered, and this re-ordering frequency can be influenced by activity from other processes, creating a side-channel that does not rely on traditional timers.
Core Technical Method: A fuzzing campaign identifies that memory re-orderings can be observed across processes on mainstream processors (X86, Arm, Apple CPUs, and NVIDIA/AMD/Apple GPUs). These re-orderings are used to infer other processes' activity, enabling timerless side-channel attacks such as covert communication and application fingerprinting.
Impact on Security: This exposes a new side-channel vector that bypasses timer-based mitigations, allowing cross-process information leakage. Demonstrated attacks include covert channels (up to 30 Kb/s on X86 CPUs) and reliable fingerprinting of applications like DNN architectures, posing risks to memory safety and system isolation in shared hardware environments.

STELP: Secure Transpilation and Execution of LLM-Generated Programs

Author(s): Swapnil Shinde, Sahil Wadhwa, Andy Luo, Akshay Gupta, Mohammad Shahed Sorower

Primary Category: cs.SE

Categories: cs.SE, cs.AI

Last Updated: 2026-01-13 17:55 UTC

TL;DR

              
Addresses the risk of deploying LLM-generated code directly into production systems, which can contain vulnerabilities like data poisoning, malicious attacks, and hallucinations leading to system malfunctions.
Proposes STELP, a secure transpiler and executor that runs LLM-generated code in a controlled environment, using safety checks and isolation to prevent harmful execution.
Matters for system security by enabling safe use of autonomous AI code generation in production, where human review and traditional testing are impractical or insufficient.

2026-01-08

Generated at 2026-01-08 13:44 UTC

Deadlock-free Context-free Session Types

Author(s): Andreia Mordido, Jorge A. Pérez

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-01-07 16:27 UTC

TL;DR

              
Addresses the risk of deadlocks in message-passing concurrent programs using expressive, tree-like communication protocols (context-free session types), which standard session types cannot handle.
Uses a priority-based type system that statically enforces both adherence to these complex protocols and deadlock freedom, even with recursion, polymorphism, and cyclic communication topologies.
Matters for system security and low-level safety by guaranteeing reliable, predictable concurrent execution without deadlocks, a critical failure mode in distributed systems and parallel software.

2026-01-07

Generated at 2026-01-07 13:43 UTC

FSLH: Flexible Mechanized Speculative Load Hardening

Author(s): Jonathan Baumann, Roberto Blanco, Léon Ducruet, Sebastian Harwig, Catalin Hritcu

Primary Category: cs.CR

Categories: cs.CR, cs.PL

Last Updated: 2026-01-06 14:20 UTC

TL;DR

              
Addresses the incomplete security of Speculative Load Hardening (SLH) against Spectre v1 attacks: Selective SLH only protects constant-time cryptographic code, while Ultimate SLH secures arbitrary programs but imposes prohibitive performance overhead by treating all data as secret.
Introduces a flexible SLH mechanism that generalizes both Selective and Ultimate SLH, allowing security policies to be tailored per program—applying strong hardening only where needed—and formally proves its security using the Rocq theorem prover.
Provides a practical, verifiable defense for non-cryptographic software, enabling efficient memory safety against speculative execution attacks without assuming all data is secret, thus bridging the gap between security and performance for low-level system code.

2025-12-30

Generated at 2025-12-30 13:39 UTC

How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench

Author(s): Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee

Primary Category: cs.CR

Categories: cs.CR, cs.LG, cs.SE

Last Updated: 2025-12-29 16:44 UTC

TL;DR

              
The paper addresses the real risk that LLM-generated patches for fixing software bugs can introduce new security vulnerabilities, particularly in complex real-world codebases, unlike prior studies on simpler synthetic tasks.
It conducts a large-scale security analysis of over 20,000 GitHub issues, comparing patches from developers, a standalone LLM (Llama 3.3), and three agentic frameworks, using automated vulnerability detection tools to identify introduced flaws.
The core finding is that LLM-generated patches often exhibit unique vulnerability patterns not seen in developer patches, with agentic frameworks producing more insecure patches when granted higher autonomy, highlighting a gap in current automated repair tools.
This matters for system security because it shows that deploying LLM-based program repair without safeguards can degrade memory safety and low-level security, necessitating proactive risk assessments that consider both code characteristics and issue context to prevent exploitation.

2025-12-25

Generated at 2025-12-25 13:36 UTC

CoTDeceptor:Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents

Author(s): Haoyang Li, Mingjin Li, Jinxin Zuo, Siqi Li, Xiao Li, Hao Wu, Yueming Lu, Xiaochuan He

Primary Category: cs.CR

Categories: cs.CR, cs.MA

Last Updated: 2025-12-24 15:55 UTC

TL;DR

              
Addresses the risk that advanced LLM code-review agents, which use Chain-of-Thought reasoning, can be systematically fooled by obfuscated malicious code, allowing backdoors to slip into software supply chains.
Uses an automated framework (CoTDeceptor) that generates multi-stage, evolving code obfuscation strategies specifically designed to disrupt the LLM's reasoning and semantic analysis processes.
Matters because it demonstrates a critical weakness in current AI-powered security auditing; such evasion enables the propagation of hidden vulnerabilities, undermining trust in automated code review and supply chain safety.

2025-12-17

Generated at 2025-12-17 13:36 UTC

Lexo: Eliminating Stealthy Supply-Chain Attacks via LLM-Assisted Program Regeneration

Author(s): Evangelos Lamprou, Julian Dai, Grigoris Ntousakis, Martin C. Rinard, Nikos Vasilakis

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2025-12-16 04:40 UTC

TL;DR

              
Problem: Stealthy supply-chain attacks in open-source software, where components appear normal but execute hidden malicious code only in specific target environments, evading conventional detection.
Core method: Lexo automatically learns a component's full observable behavior by generating input-output pairs, then uses multiple LLM instances guided by correctness and coverage metrics to synthesize a new, functionally equivalent version that excludes malicious code.
Security impact: It provides a concrete defense against memory-safety and system-security threats by regenerating components to eliminate hidden malicious functionality, even when direct LLM prompts fail, ensuring low-level safety without breaking compatibility.

The Functional Machine Calculus III: Control

Author(s): Willem Heijltjes

Primary Category: cs.PL

Categories: cs.PL, cs.LO

Last Updated: 2025-12-15 21:21 UTC

TL;DR

              
Addresses the gap between functional and imperative programming by extending a foundational calculus to model control flow like conditionals, loops, and exceptions, which are essential for real-world programming but not natively captured in pure lambda-calculus.
Introduces an operational semantics based on a Krivine machine extended with multiple operand stacks and a continuation stack, enabling precise modeling of sequential, branching, and looping computation within a functional framework.
Provides a unified model where an imperative language can be embedded with guarantees like confluence and type-based termination, directly enhancing memory safety and system security by enabling formal verification of low-level control flow in a well-behaved, analyzable calculus.

Taint Analysis for Graph APIs Focusing on Broken Access Control

Author(s): Leen Lambers, Lucas Sakizloglou, Taisiya Khakharova, Fernando Orejas

Primary Category: cs.CR

Categories: cs.CR, cs.LO, cs.SE

Last Updated: 2025-12-15 16:54 UTC

TL;DR

              
Addresses broken access control in Graph APIs, where users may improperly access or manipulate data without required privileges, a critical security vulnerability in modern web services.
Uses static taint analysis by modeling GraphQL API calls as graph transformation rules, applying Critical Pair Analysis to detect both direct and indirect tainted information flows between privileged data sources and sensitive sinks.
Enables systematic identification of privilege escalation risks in GraphQL APIs, such as unauthorized data access, and guides dynamic testing to verify these vulnerabilities, directly enhancing memory safety and system security by preventing exploitation of access control flaws.

2025-12-12

Generated at 2025-12-12 13:38 UTC

Towards Cumulative Abstract Semantics via Handlers

Author(s): Cade Lueker, Andrew Fox, Bor-Yuh Evan Chang

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-12-11 17:55 UTC

TL;DR

              
Addresses the rigidity in abstract interpretation frameworks that intertwine syntax and semantics, hindering modular adjustments like path sensitivity or direction of analysis.
Uses scoped effects and effect handlers to separate syntax elimination from domain-specific semantics, allowing semantic fragments to accumulate against a fixed syntax.
Enables flexible, reusable interpreters for diverse static analyses, enhancing memory safety by supporting precise, customizable low-level program analyses without complex monad transformers.

Lax Modal Lambda Calculi

Author(s): Nachiappan Valliappan

Primary Category: cs.LO

Categories: cs.LO, cs.PL

Last Updated: 2025-12-11 16:15 UTC

TL;DR

              
Addresses the underdevelopment of programming language type theories that incorporate diamond modalities, which are essential for modeling certain computational effects and strong functors in functional programming.
Introduces a family of typed lambda calculi based on sublogics of Lax logic, providing formal semantics and proving properties like normalization and equational completeness, with Agda formalizations.
Enhances memory and system security by enabling more precise type systems that can statically enforce safety properties for effectful computations, reducing runtime errors and vulnerabilities.

2025-12-09

Generated at 2025-12-09 13:39 UTC

Polymorphic Coverage Types

Author(s): Zhe Zhou, Ashish Mishra, Benjamin Delaware, Suresh Jagannathan

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-12-08 18:19 UTC

TL;DR

              
Addresses the problem of verifying whether property-based testing generators can produce all required inputs, as current manual or post-mortem checks are unreliable and don't scale with generator complexity.
Introduces coverage types as refinement types that statically guarantee the set of values an expression must produce, using a polymorphic type system to support higher-order functions and monadic combinators common in testing frameworks.
Provides a formal, automated method to ensure test generators achieve full input coverage, which is critical for memory and system security because incomplete testing can miss vulnerabilities in low-level code.

2025-12-03

Generated at 2025-12-03 13:40 UTC

TypeDis: A Type System for Disentanglement

Author(s): Alexandre Moine, Stephanie Balzer, Alex Xu, Sam Westrick

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-12-02 18:17 UTC

TL;DR

              
The paper addresses the difficulty of statically verifying *disentanglement*—a runtime property where parallel tasks operate on separate memory regions—without requiring complex, manual proofs in separation logic.
It introduces TypeDis, a type system using timestamps on types to identify the allocating task, with features like subtiming (a form of subtyping) and polymorphism to allow timestamp changes at control-flow join points.
This matters for memory safety and system security because it enables automatic, task-local garbage collection without synchronization, reducing runtime overhead and preventing data races in parallel programs through compile-time guarantees.

Belobog: Move Language Fuzzing Framework For Real-World Smart Contracts

Author(s): Wanxu Xia, Ziqiao Kong, Zhengwei Li, Yi Lu, Pan Li, Liqun Yang, Yang Liu, Xiapu Luo, Shaohua Li

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2025-12-02 16:36 UTC

TL;DR

              
Addresses the risk that Move smart contracts, despite strong static typing, still contain vulnerabilities (like logic errors or complex condition bugs) that evade the type system and can lead to financial losses in blockchains managing billions in assets.
Uses a type-aware fuzzer that builds a type graph from the contract’s code to generate and mutate only well-typed transactions, combined with concolic execution to handle complex runtime checks and explore deeper paths.
Matters because it provides the first effective automated testing for Move contracts, detecting real-world exploits (e.g., in Cetus and Nemo) and critical vulnerabilities that human audits find, directly enhancing memory and asset safety in low‑contract environments.

2025-11-27

Generated at 2025-11-27 13:35 UTC

SV-LIB 1.0: A Standard Exchange Format for Software-Verification Tasks

Author(s): Dirk Beyer, Gidon Ernst, Martin Jonáš, Marian Lingsch-Rosenfeld

Primary Category: cs.PL

Categories: cs.PL, cs.SC, cs.SE

Last Updated: 2025-11-26 15:44 UTC

TL;DR

              
Addresses the fragmentation in verification tools by creating a common exchange format (SV-LIB) that allows tools designed for one language (e.g., C) to be reused for others, reducing redundant development.
Uses an imperative-style intermediate language with SMT-LIB expressions to represent programs, specifications, and verification witnesses, enabling interoperability with existing SMT-solver-based verification infrastructure.
Enhances memory and system security by enabling independent witness validation and broader tool reuse, ensuring verification results (proofs or counterexamples) are trustworthy and not tool-specific.

2025-11-26

Generated at 2025-11-26 13:38 UTC

Translating Large-Scale C Repositories to Idiomatic Rust

Author(s): Saman Dehghan, Tianran Sun, Tianxiang Wu, Zihan Li, Reyhaneh Jabbarvand

Primary Category: cs.SE

Categories: cs.SE, cs.PL

Last Updated: 2025-11-25 18:42 UTC

TL;DR

              
Addresses the gap in C-to-Rust translation where existing methods either produce unsafe, unidiomatic code at scale or are too computationally expensive to apply to large repositories.
Uses a fully automated pipeline (Rustine) that generates compilable, idiomatic Rust by balancing safety features and readability, avoiding excessive unsafe constructs and raw pointers.
Matters because it enables practical migration of legacy C systems to memory-safe Rust, reducing security risks from memory corruption vulnerabilities without requiring prohibitive resources.

2025-11-25

Generated at 2025-11-25 13:37 UTC

Can LLMs Recover Program Semantics? A Systematic Evaluation with Symbolic Execution

Author(s): Rong Feng, Suman Saha

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2025-11-24 13:55 UTC

TL;DR

              
Addresses the gap in automated tools' ability to recover original program semantics from obfuscated code, which hinders vulnerability detection and program analysis.
Uses fine-tuned LLMs enhanced with symbolic execution artifacts (SMT constraints, path statistics, test cases) to deobfuscate programs transformed by control-flow flattening and encoding techniques.
Improves semantic preservation and compilation success in deobfuscated code, directly strengthening vulnerability analysis and security auditing capabilities for obscured software.

LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

Author(s): Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan

Primary Category: cs.AI

Categories: cs.AI, cs.CR

Last Updated: 2025-11-24 10:31 UTC

TL;DR

              
Identifies that LLM-generated C/C++ code frequently contains known vulnerability types (CWEs) linked to real exploits (CVEs), creating security risks in production systems.
Uses static analysis to detect memory safety flaws (e.g., buffer overflows, use-after-free) in code produced by ten different LLMs across varied prompts.
Shows automated code generation introduces memory corruption risks unless rigorously validated, undermining system security foundations in unsafe languages.

Summary-Mediated Repair: Can LLMs use code summarisation as a tool for program repair?

Author(s): Lukas Twist

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2025-11-24 05:33 UTC

TL;DR

              
Addresses the problem that LLMs often generate code with subtle implementation bugs not caught during direct repair, despite strong high-level understanding shown in summaries.
Uses natural-language code summaries as an intermediate step in program repair, where error-aware diagnostic summaries help identify mismatches between intent and implementation.
Improves memory safety and low-level reliability by catching implementation flaws before deployment, reducing risks from latent bugs in generated code.

Reasoning about External Calls

Author(s): Sophia Drossopoulou, Julian Mackay, Susan Eisenbach, James Noble

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-11-23 22:23 UTC

TL;DR

              
Addresses the challenge of reasoning about internal code safety when it interacts with untrusted external code, where external code effects are unknown and potentially harmful.
Introduces capability access assertions and effect-limiting specifications in a Hoare logic framework to verify that internal code maintains security properties despite external calls.
Enables formal verification that defensive programming using encapsulation and object capabilities effectively constrains external code's potential damage to memory and system resources.

2025-11-20

Generated at 2025-11-20 13:34 UTC

Soteria: Efficient Symbolic Execution as a Functional Library

Author(s): Sacha-Élie Ayoun, Opale Sjöstedt, Azalea Raad

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-11-19 14:16 UTC

TL;DR

              
Addresses performance and feature limitations in symbolic execution tools caused by intermediate languages, which hinder accurate analysis of complex language semantics like Rust's aliasing rules.
Provides a functional OCaml library for building symbolic execution engines directly in source languages, enabling precise, configurable analysis without intermediate translation.
Enables the first sound Rust symbolic execution engine supporting Tree Borrows, crucial for verifying memory safety and preventing bugs in unsafe code blocks of systems software.

2025-11-18

Generated at 2025-11-18 13:35 UTC

Cost-Driven Synthesis of Sound Abstract Interpreters

Author(s): Qiuhan Gu, Avaljot Singh, Gagandeep Singh

Primary Category: cs.PL

Categories: cs.PL, cs.LG

Last Updated: 2025-11-17 18:16 UTC

TL;DR

              
Addresses the difficulty of manually constructing abstract interpreters that guarantee global soundness, a key requirement for reliable static analysis in security and verification.
Uses LLMs to synthesize abstract interpreters by formulating the task as a constrained optimization problem with a mathematically grounded cost function for unsoundness, combined with syntactic/semantic validation and cost-guided feedback.
Enables automatic generation of sound, high-precision analyzers for complex operations (e.g., nonlinear functions in neural networks), reducing manual effort and improving the trustworthiness of memory safety and system security analyses.

It's a Feature, Not a Bug: Secure and Auditable State Rollback for Confidential Cloud Applications

Author(s): Quinn Burke, Anjo Vahldiek-Oberwagner, Michael Swift, Patrick McDaniel

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2025-11-17 17:53 UTC

TL;DR

              
Addresses the risk of replay and rollback attacks on cloud applications through untrusted storage, which can reintroduce stale data to compromise decisions, while also allowing legitimate rollbacks for recovery from corruption or misconfiguration.
Uses a reference monitor to mediate state transitions, enforce authorization policies, ensure atomic updates and rollbacks, and produce a tamper-evident log for transparency and auditability.
Enhances memory safety and system security by enabling secure, auditable state management that prevents unauthorized rollback attacks without blocking necessary operational recoveries, reducing vulnerabilities in confidential cloud environments.

2025-11-12

Generated at 2025-11-12 13:36 UTC

QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities

Author(s): Claire Wang, Ziyang Li, Saikat Dutta, Mayur Naik

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2025-11-12 01:59 UTC

TL;DR

              
Addresses the difficulty of manually writing static analysis queries (e.g., in CodeQL) to detect vulnerabilities from CVE descriptions, which requires specialized security and program analysis expertise.
Uses an LLM-driven agent with execution feedback, constrained by a structured interface to a Language Server Protocol for syntax and a RAG database for semantic guidance, to automatically generate CodeQL queries from CVE metadata.
Achieves 53.4% success in generating queries that correctly identify vulnerabilities in vulnerable code versions but not in patched ones, compared to 10% without the framework, enhancing scalable detection of memory safety and security flaws in software.

2025-11-06

Generated at 2025-11-06 13:35 UTC

SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories

Author(s): Chihao Shen, Connor Dilgren, Purva Chiniya, Luke Griffith, Yu Ding, Yizheng Chen

Primary Category: cs.CR

Categories: cs.CR, cs.AI

Last Updated: 2025-11-05 16:42 UTC

TL;DR

              
Addresses the lack of benchmarks for evaluating security-aware code completion in realistic C/C++ repository contexts, where current models miss real-world vulnerability patterns.
Creates SecRepoBench with 318 tasks across 27 C/C++ repositories covering 15 Common Weakness Enumerations (CWEs) to test code agents and LLMs.
Shows code agents significantly outperform standalone LLMs, revealing current AI coding tools' limitations in preventing memory safety vulnerabilities like buffer overflows or use-after-free errors.
Provides measurable evidence that real-world secure coding requires more than raw LLM capability, impacting software supply chain security and vulnerability reduction.

2025-10-28

Generated at 2025-10-28 16:09 UTC

Linear effects, exceptions, and resource safety: a Curry-Howard correspondence for destructors

Author(s): Sidney Congard, Guillaume Munch-Maccagnoni, Rémi Douence

Primary Category: cs.PL

Categories: cs.PL, cs.LO

Last Updated: 2025-10-27 16:56 UTC

TL;DR

              
Addresses the challenge of combining linear types, effects (like resource allocation/deallocation), and exceptions while preserving resource safety guarantees in programming languages.
Uses a monadic framework with destructors (inspired by C++/Rust) that trigger automatic cleanup actions, modeling moves and exceptions through side-effecting type system rules.
Ensures resources like memory are never leaked even when exceptions occur, directly improving memory safety and reliability in low-level systems programming.

Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy

Author(s): Roham Koohestani, Ziyou Li, Anton Podkopaev, Maliheh Izadi

Primary Category: cs.AI

Categories: cs.AI, cs.FL

Last Updated: 2025-10-27 16:22 UTC

TL;DR

              
Addresses the inability to formally verify the safety and predictability of AI agents due to their undefined computational power boundaries.
Maps agent memory architectures to automata classes: reflex agents to finite automata, hierarchical agents to pushdown automata, and reflective agents to Turing machines.
Enables formal verification of agent behavior, allowing security guarantees for memory-safe agent designs and risk quantification for undecidable systems.

Proceedings of the Combined 32nd International Workshop on Expressiveness in Concurrency and 22nd Workshop on Structural Operational Semantics

Author(s): Cinzia Di Giusto, Giorgio Bacci

Primary Category: cs.LO

Categories: cs.LO, cs.FL

Last Updated: 2025-10-27 11:01 UTC

TL;DR

              
This is a workshop proceedings, not a research paper addressing a specific security problem or presenting a technical method.
It contains multiple papers on formal semantics and concurrency theory rather than proposing a single security mechanism.
Understanding formal semantics matters for memory safety as it provides mathematical foundations for verifying correctness of low-level systems and concurrent programs.

Beyond Imprecise Distance Metrics: LLM-Predicted Target Call Stacks for Directed Greybox Fuzzing

Author(s): Yifan Zhang, Xin Zhang

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2025-10-27 08:17 UTC

TL;DR

              
Problem: Directed greybox fuzzers waste time on irrelevant execution paths due to imprecise static analysis-based distance metrics that over-approximate mutation probabilities.
Method: Uses LLMs to predict vulnerability-triggering call stacks from static call graphs, then prioritizes seeds whose execution paths overlap most with these precise call sequences.
Impact: Achieves 1.86x–3.09x faster vulnerability triggering in real programs and found 10 new CVEs, demonstrating concrete improvements in fuzzing efficiency and memory safety bug discovery.

Anvil: A General-Purpose Timing-Safe Hardware Description Language

Author(s): Jason Zhijingcheng Yu, Aditya Ranjan Jha, Umang Mathur, Trevor E. Carlson, Prateek Saxena

Primary Category: cs.AR

Categories: cs.AR, cs.PL, B.5.2; D.3.1

Last Updated: 2025-10-27 04:57 UTC

TL;DR

              
Addresses timing hazards in hardware design caused by unintended signal value changes during multi-cycle operations, a common source of critical bugs in digital circuits.
Uses a type system that statically enforces timing contracts between register mutations and signal usages, preventing unsafe value changes without sacrificing cycle-level timing control.
Eliminates a major class of hardware vulnerabilities by ensuring timing safety across module boundaries, directly preventing erratic behavior that could compromise system reliability and security.