Arxiv Selected Papers

2026-04-17

Generated at 2026-04-17 14:22 UTC

Pure Borrow: Linear Haskell Meets Rust-Style Borrowing

Author(s): Yusuke Matsushita, Hiromi Ishii

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-04-16 17:53 UTC

TL;DR

              
Problem addressed: Linear Haskell's linear types could not support non-local, Rust-style borrowing—where a borrowed reference can be split, dropped, or passed around without explicit ownership return—while preserving language purity.
Core technical method: The Pure Borrow library introduces affine mutable references that allow parallel state mutation within pure computations, using a history-based model to track borrowing relationships without requiring IO or ST monads.
Why it matters: This enables safe, leak-free mutation and parallelism in a purely functional setting, combining Haskell's purity, lazy evaluation, and first-class polymorphism with Rust-like borrowing, advancing memory safety without runtime overhead or side effects.

Simplifying Safety Proofs with Forward-Backward Reasoning and Prophecy

Author(s): Eden Frenkel, Kenneth L. McMillan, Oded Padon, Sharon Shoham

Primary Category: cs.LO

Categories: cs.LO, cs.PL

Last Updated: 2026-04-16 17:40 UTC

TL;DR

              
Addresses the difficulty of constructing complex inductive invariants for verifying safety of concurrent/distributed protocols, where manual proofs are error-prone and automated search struggles with Boolean complexity and quantifiers.
Introduces an incremental proof system combining forward reasoning (standard invariants), backward reasoning (invariants of time-reversed system), and prophecy steps (adding witnesses for existential properties) to decompose a single invariant into simpler steps.
Reduces the search space for invariants by allowing simpler formulas with fewer quantifiers, making automated verification more tractable for systems like Paxos and Raft, thus improving reliability of memory-safe and secure low-level implementations.

Exact Structural Abstraction and Tractability Limits

Author(s): Tristan Simas

Primary Category: cs.CC

Categories: cs.CC, cs.AI, cs.LO

Last Updated: 2026-04-16 16:15 UTC

TL;DR

The paper addresses the fundamental impossibility of creating a universal, exact "relevance certification" system for formally specified problems. The core risk is that even with a rigorous problem specification, determining which parts of an input are truly relevant for correctness cannot be done exactly for all problems without encountering intractable cases.
The core technical method is analyzing problems via their admissible-output relation and the induced equivalence classes of inputs. It proves that the barrier to exact classification is the presence of "orbit gaps"—differences between inputs that are not distinguished by any closure-invariant predicate, which are the only predicates a structural classifier can use.
This matters for system security because it sets a mathematical limit on automated reasoning about program correctness and memory safety. It shows that no single automated tool can exactly and universally certify which program components (e.g., memory addresses, data fields) are relevant for safety, forcing a reliance on approximations or domain restrictions.
The paper demonstrates this limit concretely by constructing a "uniform pair-targeted affine witness" problem on binary pairwise domains. For this problem, any two inputs in the same "closure orbit" can disagree on the target output, making exact structural classification impossible. This witness problem is itself a legitimate, rigorously specified problem, proving the obstruction is inherent.

Feedback-Driven Execution for LLM-Based Binary Analysis

Author(s): XiangRui Zhang, Qiang Li, Haining Wang

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2026-04-16 15:15 UTC

TL;DR

              
Addresses the limitation of one-pass LLM-based binary analysis, which cannot adapt exploration based on intermediate results and struggles with long, multi-path analysis under context constraints.
Introduces FORGE, which uses a feedback-driven loop (reasoning-action-observation) and a Dynamic Forest of Agents to coordinate parallel exploration while managing per-agent context.
Enables scalable, precise vulnerability discovery in firmware (e.g., 1,274 vulnerabilities found with 72.3% precision), improving memory and system security by effectively analyzing complex, real-world binaries.

2026-04-16

Generated at 2026-04-16 15:00 UTC

VeruSAGE: A Study of Agent-Based Verification for Rust Systems

Author(s): Chenyuan Yang, Natalie Neamtu, Chris Hawblitzel, Jacob R. Lorch, Shan Lu

Primary Category: cs.OS

Categories: cs.OS, cs.AI, cs.FL, cs.SE

Last Updated: 2026-04-15 16:44 UTC

TL;DR

              
The paper addresses the gap that while LLMs can generate code, their ability to rigorously prove the correctness of low-level Rust system software using formal verification (specifically the Verus tool) is untested and critical for building secure, memory-safe systems.
It introduces VeruSAGE-Bench, a benchmark of 849 proof tasks from real verified Rust systems, and designs specialized AI agents that strategically combine different LLMs (like GPT-5 and Sonnet 4.5) with verification tools to break down and solve complex proof obligations.
This matters because automating formal proof for Rust systems can drastically accelerate the creation of high-assurance, bug-free software, directly enhancing memory safety and system security by ensuring core correctness properties are mathematically verified.

Persistent Iterators with Value Semantics

Author(s): Yihe Li, Gregory J. Duck

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-04-15 16:44 UTC

TL;DR

              
Addresses the risk of iterator invalidation, aliasing, and side effects in mutable containers (like C++ STL), which can lead to memory corruption, undefined behavior, and security vulnerabilities in low-level systems.
Introduces persistent iterators that capture a snapshot of the container at creation, enabling all operations to work on a local, immutable copy with true value semantics, implemented in a C++ library (LibFPP) as persistent versions of STL containers.
Enhances memory safety by eliminating iterator invalidation bugs and aliasing issues, making code more predictable and secure without requiring a shift to functional programming models, thus preserving familiar imperative patterns for systems programming.

Asynchronous Global Protocols, Precisely: Full Proofs

Author(s): Kai Pischke, Jake Masters, Nobuko Yoshida

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-04-15 09:05 UTC

TL;DR

              
The paper addresses the risk of deadlocks or communication errors in distributed systems when independently optimized components are composed, by ensuring local refinements (like reordering sends and receives) remain globally safe.
It introduces an asynchronous refinement relation (precise asynchronous multiparty subtyping) that allows local components to permute message actions without breaking the overall protocol, verified via coinductive full merging projection.
This matters for system security and low-level safety because it guarantees liveness and deadlock-freedom in message-passing systems, preventing subtle concurrency bugs that could lead to crashes or vulnerabilities in asynchronous distributed software.

Cerisier: A Program Logic for Attestation in a Capability Machine

Author(s): June Rousseau, Denis Carnier, Thomas Van Strydonck, Steven Keuchel, Dominique Devriese, Lars Birkedal

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-04-15 09:04 UTC

TL;DR

              
Addresses the inability to formally verify and reason about trust evolution in systems using hardware enclave attestation, a critical security gap for trusted computing applications.
Introduces Cerisier, a program logic built on Iris separation logic, which extends the Cerise capability machine model to support modular reasoning about trusted, untrusted, and attested code via a universal contract.
Provides a foundation for proving end-to-end security properties (like integrity and secrecy) in mixed-trust systems, directly linking low-level capability safety to high-level attestation guarantees.

BugScope: Learn to Find Bugs Like Human

Author(s): Jinyao Guo, Chengpeng Wang, Dominic Deluca, Jinjie Liu, Zhuo Zhang, Xiangyu Zhang

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-04-15 01:53 UTC

TL;DR

              
Addresses the gap where current LLM-based code auditors underperform because they lack alignment with the specialized, pattern-based reasoning human experts use to find security bugs in complex software.
Uses a framework that mimics human auditing: it first identifies seed bug patterns, retrieves relevant contextual code, and then applies distilled, concrete detection guidelines—trained on real bug reports and mutated examples—to scan code.
Matters for system security because it significantly improves automated bug detection precision and recall over existing tools, enabling the discovery of numerous previously unknown vulnerabilities in critical software like the Linux kernel.

WybeCoder: Verified Imperative Code Generation

Author(s): Fabian Gloeckle, Mantas Baksys, Darius Feher, Kunhao Zheng, Amaury Hayat, Sean B. Holden, Gabriel Synnaeve, Peter O'Hearn

Primary Category: cs.SE

Categories: cs.SE, cs.AI

Last Updated: 2026-04-14 22:21 UTC

TL;DR

              
Addresses the gap in automated verification for imperative code generation, where current LLMs excel at writing code but struggle to produce provably correct, memory-safe implementations.
Uses an agentic framework that interleaves code generation with invariant synthesis and proof construction in Lean, leveraging SMT solvers and interactive proving to co-evolve code, invariants, and correctness proofs.
Enables scalable production of verified low-level algorithms (like Heapsort), directly reducing risks of memory corruption, undefined behavior, and security vulnerabilities by ensuring generated code meets formal specifications.

2026-04-10

Generated at 2026-04-10 14:18 UTC

Soteria: Efficient Symbolic Execution as a Functional Library

Author(s): Sacha-Élie Ayoun, Opale Sjöstedt, Azalea Raad

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-04-09 16:03 UTC

TL;DR

              
Addresses the performance and accuracy trade-offs in symbolic execution tools caused by using intermediate languages, which can obscure source-language semantics and limit support for complex language features like Rust's aliasing model.
Provides a functional OCaml library (Soteria) that enables building symbolic execution engines directly for source languages, using configurable, compositional reasoning without intermediate compilation steps.
Matters for memory safety and system security by enabling precise, efficient bug detection in low-level code; demonstrated with the first Rust symbolic execution engine supporting Tree Borrows, outperforming existing tools in finding bugs in Rust and C programs.

2026-04-09

Generated at 2026-04-09 14:56 UTC

Dimensional Type Systems and Deterministic Memory Management: Design-Time Semantic Preservation in Native Compilation

Author(s): Houston Haynes

Primary Category: cs.PL

Categories: cs.PL, math.CT, math.LO

Last Updated: 2026-04-08 15:33 UTC

TL;DR

              
Addresses the risk of memory safety violations and representation errors in low-level compilation by preserving dimensional type information (e.g., value ranges) through all compilation stages, rather than erasing it early.
Uses a Dimensional Type System (DTS) with constraints from finitely generated abelian groups to infer value ranges and a coeffect-based Deterministic Memory Management (DMM) to classify value lifetimes into four escape categories, jointly guiding representation selection and memory allocation.
Matters because it enables compile-time verification of memory placement (e.g., stack vs. heap) and data representation, reducing runtime memory errors and improving cache locality, while also supporting safe auto-differentiation by preserving semantic properties through transformations.

2026-04-03

Generated at 2026-04-03 14:03 UTC

Dimensional Type Systems and Deterministic Memory Management: Design-Time Semantic Preservation in Native Compilation

Author(s): Houston Haynes

Primary Category: cs.PL

Categories: cs.PL, math.CT, math.LO

Last Updated: 2026-04-02 16:04 UTC

TL;DR

              
Addresses the problem that traditional compilers discard type-level dimensional and escape information before code generation, forcing late-stage heuristic choices for memory layout and representation that can introduce memory-safety risks or inefficiencies.
Uses a Dimensional Type System (DTS) that persists dimensional annotations (like value ranges) through all compilation stages and couples it with a deterministic memory management (DMM) coeffect system to jointly infer numeric representations and provably safe allocation strategies based on escape analysis.
Matters for system security and memory safety because it replaces heuristic, error-prone memory management with compile-time-verified allocation strategies (e.g., stack vs. heap) for different lifetime categories, preventing use-after-free or bounds violations, and ensures representation choices preserve semantic fidelity across targets.

Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis

Author(s): Huihui Huang, Jieke Shi, Bo Wang, Zhou Yang, David Lo

Primary Category: cs.SE

Categories: cs.SE, cs.CR

Last Updated: 2026-04-02 16:04 UTC

TL;DR

              
Addresses static analyzers' inability to detect memory leaks involving custom memory-management functions and infeasible control-flow paths in C/C++ code.
Uses an LLM to classify project-specific functions as allocators or deallocators, validates summaries with Z3 symbolic reasoning, and integrates them into CodeQL/Infer to improve path-sensitive analysis.
Enables detection of real-world memory leaks that standard tools miss, reducing manual effort and identifying security-critical bugs, as shown by finding 52 leaks (including CVEs) in large codebases.

Triosecuris: Formally Verified Protection Against Speculative Control-Flow Hijacking

Author(s): Jonathan Baumann, Yonghyun Kim, Yan Farba, Catalin Hritcu, Julay Leatherman-Brooks

Primary Category: cs.CR

Categories: cs.CR, cs.PL

Last Updated: 2026-04-02 15:24 UTC

TL;DR

              
Addresses the risk of speculative control-flow hijacking attacks (Spectre-v2 variants like BTB, RSB, and PHT) that bypass traditional control-flow integrity by exploiting CPU speculation.
Combines CET-style hardware control-flow integrity with compiler-inserted speculative load hardening (SLH), using hardware assistance to precisely detect misspeculation on indirect calls and returns to trigger SLH's defense mechanisms.
Provides a formally verified guarantee that transformed programs leak no more information under speculation than the original program does without speculation, strengthening memory safety for arbitrary code without requiring constant-time programming.

YASA: Scalable Multi-Language Taint Analysis on the Unified AST at Ant Group

Author(s): Yayi Wang, Shenao Wang, Jian Zhao, Shaosen Shi, Ting Li, Yan Cheng, Lizhong Bian, Kan Yu, Yanjie Zhao, Haoyu Wang

Primary Category: cs.SE

Categories: cs.SE, cs.CR, cs.PL

Last Updated: 2026-04-02 08:36 UTC

TL;DR

              
Addresses the challenge of performing consistent, precise taint analysis across multi-language enterprise codebases, where single-language tools create high engineering overhead and existing multi-language tools lack scalability and precision for industrial use.
Introduces a Unified Abstract Syntax Tree (UAST) as a common intermediate representation, enabling point-to analysis and taint propagation through a core language-agnostic model augmented by pluggable, language-specific semantic models.
Provides a scalable, unified framework that identified 92 confirmed zero-day vulnerabilities in a 100-million-line industrial deployment, directly enhancing memory and system security by finding unknown data-flow paths that lead to exploitable flaws.

Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis

Author(s): Anjiang Wei, Tianran Sun, Tarun Suresh, Haoze Wu, Ke Wang, Alex Aiken

Primary Category: cs.PL

Categories: cs.PL, cs.AI, cs.CL, cs.LG

Last Updated: 2026-04-02 05:18 UTC

TL;DR

              
The problem is that automated program verification often fails at generating strong loop invariants, which are essential for proving program correctness, especially for complex loops.
The core method is Quokka, a framework that uses large language models (LLMs) to directly propose loop invariants and then rigorously checks each proposal with a verifier to see if it suffices to prove the target program assertion.
This matters for system security because reliable automated verification of low-level code (e.g., in operating systems or cryptography) can catch memory safety bugs and logic flaws before deployment, and LLMs can potentially make this verification more practical and scalable.

Fuzzing with Agents? Generators Are All You Need

Author(s): Vasudev Vikram, Rohan Padhye

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-04-01 22:28 UTC

TL;DR

              
The paper addresses the high effort and expertise required to manually write effective, domain-specific input generators for fuzzing, which are needed to reach deep, bug-revealing states in software.
The core method, Gentoo, uses an LLM coding agent with access to the target's source code to automatically synthesize and iteratively refine a dedicated input generator, optionally guided by fine-grained coverage feedback.
This matters because it shows AI-generated generators can structurally and semantically encode enough logic to achieve high coverage without relying on traditional coverage-guided mutation, potentially automating a key bottleneck in memory and system security testing.

2026-04-01

Generated at 2026-04-01 14:42 UTC

Detecting speculative leaks with compositional semantics

Author(s): Xaver Fabian, Marco Guarnieri, Boris Köpf, Jose F. Morales, Marco Patrignani, Jan Reineke, Andres Sanchez

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2026-03-31 14:32 UTC

TL;DR

              
The paper addresses the lack of formal security guarantees for defenses against speculative execution attacks like Spectre, which exploit microarchitectural traces from incorrect CPU predictions to leak sensitive data.
It introduces speculative non-interference (SNI) as a formal security definition and a compositional framework that models multiple speculation mechanisms together, implemented in a tool called Spectector that uses symbolic execution and SMT solvers to detect leaks.
This matters because it provides a rigorous method to verify whether software is secure against speculative side-channels, enabling systematic vulnerability detection and defense validation in low-level systems where memory safety failures can be exploited across hardware optimizations.

2026-03-31

Generated at 2026-03-31 14:27 UTC

C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

Author(s): Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-03-30 17:07 UTC

TL;DR

              
Addresses the problem that automated C-to-Rust translators produce syntactically correct but non-idiomatic, unsafe, and unmaintainable Rust code, which hinders practical adoption of memory-safe Rust for legacy systems.
Uses program analysis to extract control flow, data flow, and dependencies from C code, then injects this structure into LLM prompts to guide translation and a multi-stage repair pipeline for generating idiomatic Rust.
Reduces unsafe code blocks in translated output to under 6% and shrinks code size by up to 44%, directly enhancing memory safety and making large-scale migration to a safer language feasible.

2026-03-27

Generated at 2026-03-27 14:13 UTC

Do Code LLMs Do Static Analysis?

Author(s): Chia-Yi Su, Collin McMillan

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-03-26 17:20 UTC

TL;DR

              
The paper examines whether code LLMs can perform static analysis—like call graph or dataflow generation—which is essential for human programmers to understand and manipulate code, revealing a gap in LLMs' ability to replicate this foundational programming skill.
The core method involves testing LLMs on three static analysis tasks (callgraph, AST, and dataflow generation) and three code intelligence tasks (generation, summarization, translation) using models like Gemini, GPT-4o, CodeLlaMA, and Jam, and evaluating their performance and generalization.
It matters because poor static analysis capability in LLMs limits their reliability for secure, low-level programming; without accurately analyzing code structure and dependencies, they may produce unsafe code, undermining memory safety and system security in critical applications.

2026-03-26

Generated at 2026-03-26 14:26 UTC

Nominal Automata with Name Deallocation

Author(s): Simon Prucker, Stefan Milius, Lutz Schröder

Primary Category: cs.FL

Categories: cs.FL

Last Updated: 2026-03-25 16:13 UTC

TL;DR

              
Addresses the modeling gap for memory allocation/deallocation patterns in languages like C, where scopes are interleaved rather than nested, unlike formal models such as λ-calculus.  
Introduces nominal automata with deallocating transitions that explicitly handle allocation and deallocation of names in data words, enabling a finite NFA-style representation.  
Provides a deterministic model for memory-safe reasoning about low-level programs, bridging formal verification and real-world systems with explicit memory management.

QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities

Author(s): Claire Wang, Ziyang Li, Saikat Dutta, Mayur Naik

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2026-03-25 15:48 UTC

TL;DR

              
Problem: Manually writing static analysis queries to detect security vulnerabilities from CVEs requires deep expertise in both security and program analysis, creating a high barrier to effective use of tools like CodeQL.
Method: QLCoder uses an LLM in a feedback-driven synthesis loop, guided by a structured interface that provides syntax validation via a Language Server Protocol and retrieves relevant query examples and documentation from a RAG database to generate CodeQL queries automatically from CVE metadata.
Impact: This automation significantly increases the ability to produce precise, executable detection rules for known vulnerabilities, directly enhancing proactive security auditing and reducing the time and skill needed to harden codebases against memory safety and other low-level threats.

2026-03-25

Generated at 2026-03-25 14:19 UTC

ConceptCoder: Improve Code Reasoning via Concept Learning

Author(s): Md Mahbubur Rahman, Hengbo Tong, Wei Le

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-03-24 17:37 UTC

TL;DR

              
Addresses the gap where large language models (LLMs) struggle with accurate code reasoning, specifically in vulnerability detection, due to a lack of structured semantic understanding of code.
Trains models via a two-step fine-tuning method: first to recognize human-understandable semantic properties of code (called "code concepts"), then to perform reasoning tasks using those concepts.
Improves vulnerability detection accuracy (e.g., F1 score from 66.32 to 72.15 across models) by enabling more reliable, concept-guided analysis, directly enhancing software security through better identification of memory safety flaws and other low-level vulnerabilities.

2026-03-24

Generated at 2026-03-24 14:20 UTC

Lemma Discovery in Agentic Program Verification

Author(s): Huan Zhao, Haoxin Tu, Zhengyao Liu, Martin Rinard, Abhik Roychoudhury

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-03-23 15:42 UTC

TL;DR

              
Problem: Deductive verification stalls when proving complex verification conditions (VCs) because automated agents cannot discover the necessary helper lemmas, which are auxiliary facts derived from understanding the program's semantics, not just its mathematical encoding.
Core method: LemmaNet is an LLM agent that discovers helper lemmas in two stages: offline, by analyzing source code and specifications to synthesize lemmas linked to the program's meaning, and online, by adapting existing lemmas dynamically as the proof state changes during verification.
Security impact: By automating lemma discovery, the method scales deductive verification to larger, real-world systems (like OS kernels and parsers), directly improving the ability to formally guarantee memory safety and correctness in critical low-level software, reducing security risks from undetected bugs.

Dynamic analysis enhances issue resolution

Author(s): Mingwei Liu, Zihao Wang, Zhenxi Chen, Zheng Pei, Yanlin Wang, Zibin Zheng

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-03-23 14:48 UTC

TL;DR

              
Problem: Automated code repair using LLMs often fails on complex bugs (e.g., implicit type errors, polymorphic flows) because it lacks visibility into program runtime behavior, leading to inefficient, speculative guessing.
Core Method: The DAIRA framework integrates lightweight dynamic analysis into the repair agent's cycle. It uses test tracing to monitor and report runtime data (variable mutations, call stacks), turning the agent's process into evidence-based deduction.
Security/Safety Impact: By enabling precise, deterministic bug diagnosis and repair, this method directly improves memory and system safety by reliably fixing the root causes of subtle, runtime-dependent vulnerabilities that static analysis alone misses.

Set-Theoretic Types for Erlang: Theory, Implementation, and Evaluation

Author(s): Albert Schimpf, Stefan Wehr, Annette Bieniusa

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-03-23 14:36 UTC

TL;DR

              
Addresses Erlang's lack of enforced static typing, which allows runtime type errors to persist across process restarts, reducing reliability in concurrent systems.
Uses a set-theoretic type system with subtyping, union/intersection types, and recursive types to statically check Erlang's dynamic type tests and specifications without requiring code changes.
Prevents memory-safety issues like badarg exceptions and process crashes by catching type mismatches early, improving system stability in distributed, fault-tolerant environments.

2026-03-19

Generated at 2026-03-19 14:09 UTC

AVIATOR: Towards AI-Agentic Vulnerability Injection Workflow for High-Fidelity, Large-Scale Code Security Dataset

Author(s): Amine Lbath, Massih-Reza Amini, Aurelien Delaitre, Vadim Okun

Primary Category: cs.CR

Categories: cs.CR, cs.AI

Last Updated: 2026-03-18 15:39 UTC

TL;DR

              
Addresses the lack of large, realistic, and accurately labeled code vulnerability datasets needed to train effective deep-learning vulnerability detection models, which current datasets fail to provide due to noise, limited coverage, or low real-world fidelity.
Uses a multi-agent AI framework where specialized agents, guided by retrieval-augmented generation (RAG) and fine-tuned with LoRA, collaboratively analyze code and inject context-aware vulnerabilities by mimicking expert reasoning, without relying on predefined patterns.
Produces high-fidelity synthetic vulnerabilities (91–95% success) that improve downstream detection models, boosting F1 scores by up to 25% over prior methods, thereby enhancing the reliability of automated security auditing and memory-safety analysis tools.

2026-03-18

Generated at 2026-03-18 14:23 UTC

Dimensional Type Systems and Deterministic Memory Management: Design-Time Semantic Preservation in Native Compilation

Author(s): Houston Haynes

Primary Category: cs.PL

Categories: cs.PL, math.CT, math.LO

Last Updated: 2026-03-17 12:19 UTC

TL;DR

              
Addresses the gap where compilers discard type-level dimensional information early, preventing joint optimization of numeric representation selection and deterministic memory management based on value ranges and lifetimes.  
Introduces a Dimensional Type System (DTS) that extends Hindley-Milner with constraints from finitely generated abelian groups, enabling persistent dimensional annotations through MLIR lowering stages to guide representation and memory decisions.  
Unifies escape analysis and memory placement via a coeffect discipline (Deterministic Memory Management), classifying value lifetimes into four categories to map each to a verified allocation strategy, ensuring memory safety and predictable low-level behavior.  
Enables design-time verification of memory allocation strategies, representation fidelity, and cache locality, directly impacting system security by eliminating memory errors through deterministic management and improving performance via precise representation selection.

2026-03-12

Generated at 2026-03-12 14:05 UTC

MAD: Memory Allocation meets Software Diversity

Author(s): Manuel Wiesinger, Daniel Dorfmeister, Stefan Brunthaler

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2026-03-11 14:47 UTC

TL;DR

              
Addresses the risk of generalized RowHammer attacks exploiting DRAM errors, which prior defenses fail to cover due to their focus on early, specific attack patterns.
Introduces MAD, a system that combines memory allocation with software diversity, using two spatial diversification techniques to create sufficient entropy in memory layouts despite subsystem limitations.
Enhances memory safety by delaying RowHammer attacks significantly, enabling proactive responses like system reboots or detailed attack analysis, without requiring hardware changes or incurring performance costs.

2026-03-04

Generated at 2026-03-04 14:00 UTC

VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus

Author(s): Chuyue Sun, Yican Sun, Daneshvar Amrollahi, Ethan Zhang, Shuvendu Lahiri, Shan Lu, David Dill, Clark Barrett

Primary Category: cs.SE

Categories: cs.SE, cs.AI

Last Updated: 2026-03-03 18:26 UTC

TL;DR

              
Addresses the difficulty of using large language models to automatically verify complex data structure modules in Verus, as LLMs often misinterpret the tool's specialized annotation syntax and verification semantics.
Embeds syntax guidance into prompts and adds an automated repair stage to correct annotation errors, orchestrating the generation of abstractions, type invariants, and proof code.
Matters because it automates formal verification for entire data-structure modules (not just single functions), directly improving the assurance of memory safety and correctness in low-level systems code written in Rust.

Efficient Dynamic Algorithms to Predict Short Races

Author(s): Minjian Zhang, Mahesh Viswanathan

Primary Category: cs.PL

Categories: cs.PL, cs.DS

Last Updated: 2026-03-03 16:28 UTC

TL;DR

              
Addresses the problem of detecting short data races in concurrent program traces, where existing dynamic race detectors can be inefficient or miss races that occur within small windows of events, limiting their practical use in continuous monitoring or resource-constrained environments.
Proposes a monitoring framework that focuses on races where the two conflicting events are within a bounded window size \( w \), and provides efficient algorithms for happens-before and sync-preserving race detection that optimize space usage to scale with \( \log w \) rather than the full trace length.
Improves memory safety and system security by enabling more scalable and effective dynamic race detection, allowing for real-time or long-term monitoring with lower overhead, which helps identify concurrency bugs that could lead to crashes, data corruption, or security vulnerabilities in low-level systems.

2026-02-27

Generated at 2026-02-27 14:01 UTC

Array-Carrying Symbolic Execution for Function Contract Generation

Author(s): Weijie Lu, Jingyu Ke, Hongfei Fu, Zhouyue Sun, Yi Zhou, Guoqiang Li, Haokun Li

Primary Category: cs.PL

Categories: cs.PL, cs.LO, cs.SE

Last Updated: 2026-02-26 17:00 UTC

TL;DR

              
The paper addresses the challenge of automatically generating precise function contracts (pre-/post-conditions and modification summaries) for programs that manipulate arrays, where existing methods struggle to track invariants and modifications over specific array segments.
It introduces a symbolic execution technique that explicitly tracks and propagates invariants and modification summaries for contiguous segments of arrays as the analysis proceeds, enabling accurate reasoning about partial array updates.
This matters because it allows automated verification tools to soundly analyze low-level, array-intensive code (common in systems software) by generating correct contracts, which is essential for scaling formal methods to prove memory safety and functional correctness in inter-procedural analysis.

2026-02-26

Generated at 2026-02-26 14:12 UTC

Scylla: Translating an Applicative Subset of C to Safe Rust

Author(s): Aymeric Fromherz, Jonathan Protzenko

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-02-25 16:25 UTC

TL;DR

              
The problem is that existing C-to-Rust translators often produce `unsafe` Rust code, which loses the memory safety guarantees that motivate porting C code in the first place.  
The core method is a type-directed translation for a structured subset of C, using a static "split tree" analysis to model C pointer arithmetic with safe Rust slices and borrows, without requiring `unsafe` blocks.  
This matters because it enables incremental, verifiable migration of critical C codebases (like cryptographic libraries and parsers) to fully memory-safe Rust, while uncovering hidden undefined behaviors in the original C during translation.

RustyDL: A Program Logic for Rust

Author(s): Daniel Drodt, Reiner Hähnle

Primary Category: cs.PL

Categories: cs.PL, cs.LO

Last Updated: 2026-02-25 16:25 UTC

TL;DR

              
Addresses the gap in directly verifying Rust source code for complex functional properties, as existing tools rely on translating code to intermediate languages, which can obscure the original logic and hinder human-guided verification.
Introduces RustyDL, a program logic that operates directly on Rust source code, enabling precise reasoning about ownership, borrowing, and lifetimes without intermediate translation, and integrates it into the KeY verification tool prototype.
Matters for memory safety and system security by providing a foundation for rigorous, human-in-the-loop verification of low-level Rust programs, ensuring correctness of safety-critical properties like data-race freedom and memory integrity beyond Rust's built-in guarantees.

2026-02-24

Generated at 2026-02-24 14:13 UTC

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Author(s): Zac Garby, Andrew D. Gordon, David Sands

Primary Category: cs.PL

Categories: cs.PL, cs.AI, cs.CR

Last Updated: 2026-02-23 17:22 UTC

TL;DR

              
The paper addresses the risk of prompt injection attacks in AI agent systems, where malicious inputs can compromise the agent's reasoning, trigger dangerous tool calls, or distort outputs, due to the lack of formal models for such systems.
It introduces a lambda calculus extended with primitives for LLM calls and dynamic information-flow control, explicitly modeling conversations and allowing serialization of prompts and parsing of responses as terms within the calculus.
This matters because it provides a semantic foundation to formally analyze and enforce safety in agentic programming, enabling reasoning about defenses like conversation quarantine, code isolation, and information-flow restrictions to protect memory and system integrity.

Taming Scope Extrusion in Gradual Imperative Metaprogramming

Author(s): Tianyu Chen, Darshal Shetty, Jeremy G. Siek, Chao-Hong Chen, Weixi Ma, Arnaud Venet, Rocky Liu

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-02-23 15:19 UTC

TL;DR

              
Problem: Combining gradual typing, metaprogramming, and mutable state risks scope extrusion, where generated code can incorrectly access or modify references outside its intended lexical scope, breaking memory safety. Static solutions exist but fail in gradual systems where code may be partially untyped.
Core method: Introduces a gradual language ($λ^{α,\star}_{\text{Ref}}$) with a dynamic cast calculus that enforces environment classifier checks at runtime. This tracks and validates the lexical scopes of code fragments containing mutable references, preventing unsafe escapes.
Security impact: It guarantees scope safety for low-level metaprogramming with mutable state, ensuring that generated code cannot corrupt memory by accessing dangling or out-of-scope references, which is critical for secure systems programming and safe language interoperation.

Combining Small-Step and Big-Step Semantics to Verify Loop Optimizations

Author(s): David Knothe, Oliver Bringmann

Primary Category: cs.PL

Categories: cs.PL, cs.LO

Last Updated: 2026-02-23 14:12 UTC

TL;DR

              
Addresses the difficulty of verifying structural compiler optimizations like loop unrolling using only small-step semantics, which are better suited for local, step-by-step transformations.
Introduces an abstract behavioral semantics as an interface to combine small-step semantics (for local transformations) with a coinductive big-step semantics (for structural transformations), enabling both to handle finite and infinite execution traces.
Allows verified compilers like CompCert to safely incorporate optimizations that reshape program control-flow, directly enhancing the trustworthiness of performance-critical code without compromising memory safety guarantees.

2026-02-19

Generated at 2026-02-19 14:11 UTC

A Mixed-Methods Study on the Implications of Unsafe Rust for Interoperation, Encapsulation, and Tooling

Author(s): Ian McCormack, Tomas Dougan, Sam Estep, Hanan Hibshi, Jonathan Aldrich, Joshua Sunshine

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-02-18 16:36 UTC

TL;DR

              
Addresses the gap where Rust's unsafe features, needed for tasks like foreign function calls, can reintroduce memory and concurrency bugs due to inadequate tooling for verification and encapsulation.
Uses mixed-methods research—interviews with 19 developers and a survey of 160—to analyze how developers use unsafe code, revealing struggles with foreign aliasing, concurrency patterns, and limited tool support.
Matters for memory safety because poor encapsulation of unsafe code undermines Rust's core guarantees, risking system security; better verification tools are essential for sound multi-language applications.

Towards Cumulative Abstract Semantics via Handlers

Author(s): Cade Lueker, Andrew Fox, Bor-Yuh Evan Chang

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-02-17 22:23 UTC

TL;DR

              
Addresses the modularity problem in abstract interpretation frameworks, where existing designs tightly couple syntax and semantics, making it difficult to adapt analyses for different sensitivities (e.g., path- or flow-sensitive) or directions (e.g., forward/backward) without complex, unwieldy code.
Introduces cumulative abstract semantics using scoped effects and handlers, separating syntax elimination from domain-semantic introduction to allow semantic fragments to accumulate independently against a fixed syntactic structure.
This approach enables a single interpreter to generate multiple dynamic evaluators and static analyses, improving flexibility and maintainability in memory safety and system security tools that rely on precise, adaptable low-level program analysis.

2026-02-10

Generated at 2026-02-10 14:23 UTC

DyMA-Fuzz: Dynamic Direct Memory Access Abstraction for Re-hosted Monolithic Firmware Fuzzing

Author(s): Guy Farrelly, Michael Chesser, Seyit Camtepe, Damith C. Ranasinghe

Primary Category: cs.CR

Categories: cs.CR, cs.SE

Last Updated: 2026-02-09 14:52 UTC

TL;DR

              
The paper addresses the gap in existing firmware fuzzing tools, which fail to test Direct Memory Access (DMA) interfaces—a high-throughput hardware feature that allows peripherals to read/write memory directly, bypassing the CPU and creating untested attack surfaces.
DyMA-Fuzz uses runtime analysis to automatically infer DMA memory access patterns from firmware execution, dynamically identifying DMA descriptors and injecting fuzz inputs into the corresponding data buffers without requiring manual configuration or datasheets.
This matters for system security because DMA is widely used in critical embedded systems (e.g., automotive, medical devices), and missing DMA testing leaves vulnerabilities undetected; the tool exposed new bugs and increased code coverage by up to 122% on real firmware, improving low-level safety testing.

2026-02-06

Generated at 2026-02-06 14:03 UTC

SEAL: Symbolic Execution with Separation Logic (Competition Contribution)

Author(s): Tomáš Brablec, Tomáš Dacík, Tomáš Vojnar

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-02-05 14:29 UTC

TL;DR

              
Addresses the challenge of automatically verifying memory safety and functional correctness in programs that use unbended linked data structures, a key source of bugs in low-level systems code.
Uses separation logic to model complex heap states and a general-purpose solver (Astral) that translates logical reasoning into SMT queries to check program properties.
Provides a modular, extensible foundation for proving absence of memory errors (like use-after-free) in heap-manipulating code, which is critical for securing systems software.

2026-02-05

Generated at 2026-02-05 14:07 UTC

Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings

Author(s): Changshu Liu, Alireza Ghazanfari, Yang Chen, Reyhaneh Jabbarvand

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-02-04 17:49 UTC

TL;DR

              
Addresses the risk that current LLM code reasoning evaluations overlook real-world complexities like nested structures and API calls, leading to overestimated practical capabilities.
Uses static and dynamic program analysis to automatically serialize complex, custom data types from GitHub repositories, creating a dataset with 1200 problems categorized by difficulty.
Reveals that existing benchmarks mostly test lower-complexity scenarios, highlighting a gap in assessing true reasoning ability for secure, reliable low-level code.

2026-02-03

Generated at 2026-02-03 14:07 UTC

ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis

Author(s): Mantas Baksys, Stefan Zetzsche, Olivier Bouissou, Remi Delmas, Soonho Kong, Sean B. Holden

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-02-02 17:29 UTC

TL;DR

              
The problem is a lack of verified, formally correct code to train large language models, which limits their ability to generate code with machine-checked safety guarantees for critical systems.
The core method is an automated pipeline that translates Python solutions into Dafny programs, automatically generates specifications, proves them correct, and decomposes the results into a large-scale synthetic training dataset.
This matters because it directly scales the ability to automatically generate memory-safe and functionally correct low-level code, moving from probabilistic trust in testing to deterministic trust in proofs for security-critical software.

2026-01-29

Generated at 2026-01-29 14:01 UTC

Neural Theorem Proving for Verification Conditions: A Real-World Benchmark

Author(s): Qiyuan Xu, Xiaokun Luan, Renxi Wang, Joshua Ong Jun Leang, Peixin Wang, Haonan Li, Wenda Li, Conrad Watt

Primary Category: cs.AI

Categories: cs.AI, cs.PL, cs.SE

Last Updated: 2026-01-28 18:25 UTC

TL;DR

              
The problem is that real-world program verification often produces verification conditions too difficult for current automated theorem provers, forcing developers to write manual proofs and slowing down practical verification efforts.
The core method is creating a multi-language benchmark (NTP4VC) from verification conditions extracted via industrial tools from actual OS kernels, then testing both general and fine-tuned large language models on their ability to prove these conditions.
This matters because it directly measures and pushes progress on automating a critical bottleneck in software verification, which is essential for building memory-safe and secure low-level systems like kernels without prohibitive manual effort.

2026-01-28

Generated at 2026-01-28 13:52 UTC

Discovering 100+ Compiler Defects in 72 Hours via LLM-Driven Semantic Logic Recomposition

Author(s): Xingbang He, Yuanwei Chen, Hao Wu, Jikang Zhang, Zicheng Wang, Ligeng Chen, Junjie Peng, Haiyang Wei, Yi Qian, Tiantai Zhang, Linzhang Wang, Bing Mao

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2026-01-27 14:51 UTC

TL;DR

              
The problem is that existing compiler fuzzing methods—relying on syntactic mutations or general LLM fine-tuning—often lose the specific semantic logic (like an out-of-bounds access) that historically triggered bugs, limiting the diversity and effectiveness of generated test programs.
The core method is FeatureFuzz, which extracts "features" from past bug reports: each feature pairs a natural-language description of a bug-prone invariant with a concrete code snippet. It then recomposes these semantic features into coherent groups and instantiates them into valid C programs to fuzz compilers.
This matters because compilers are a root-of-trust in software supply chains; defects can lead to security vulnerabilities or silent miscompilation. By preserving and recombining proven bug-triggering semantics, FeatureFuzz more effectively uncovers deep, critical compiler bugs—finding 113 bugs in GCC/LLVM in 72 hours, with 97 confirmed by developers.

2026-01-27

Generated at 2026-01-27 13:52 UTC

Handling Scope Checks (Extended Version)

Author(s): Michael Lee, Ningning Xie, Oleg Kiselyov, Jeremy Yallop

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-01-26 18:55 UTC

TL;DR

              
Addresses the risk of scope extrusion in metaprogramming, where effect handlers can generate ill-scoped code that may lead to unsafe program behavior, a gap between theoretical type systems and practical dynamic checks in metalanguages.
Introduces a formal calculus (λ₍₍ₒₚ₎₎) to model dynamic checks and proposes the "Cause-for-Concern" check, a dynamic mechanism that detects scope extrusion by tracking environment interactions without relying on static types.
Enhances memory and system security by preventing scope extrusion that could cause undefined behavior in low-level code generation, offering a practical, provably correct alternative to static type systems for safer metaprogramming in languages like MetaOCaml.

Symmetric Proofs of Parameterized Programs

Author(s): Ruotong Cheng, Azadeh Farzan

Primary Category: cs.LO

Categories: cs.LO, cs.PL

Last Updated: 2026-01-26 18:06 UTC

TL;DR

              
Addresses the challenge of verifying safety for infinite-state parameterized programs with complex network topologies, where manual or exhaustive verification is infeasible due to scalability.
Introduces parametric proof spaces that leverage local symmetry in program topologies to reuse proofs for isomorphic neighborhoods, avoiding axiomatization of the topology.
Enables automated, scalable safety verification for low-level concurrent or distributed systems, directly enhancing memory and system security by provably preventing errors in parameterized architectures like grids or rings.

2026-01-15

Generated at 2026-01-15 13:44 UTC

Path-optimal symbolic execution of heap-manipulating programs

Author(s): Pietro Braione, Giovanni Denaro, Luca Guglielmo

Primary Category: cs.SE

Categories: cs.SE, cs.LO

Last Updated: 2026-01-14 16:36 UTC

TL;DR

              
Addresses path explosion in symbolic execution for heap-manipulating programs, where current methods create redundant analysis traces due to aliasing, unlike numeric-input programs which fork only per program path.
Introduces POSE, a path-optimal symbolic execution algorithm that ensures each program path is analyzed exactly once by precisely tracking heap states and aliasing conditions.
Improves memory safety and low-level security analysis by enabling efficient, exhaustive exploration of heap-dependent program behaviors, crucial for finding bugs in systems code with data structures.

2026-01-14

Generated at 2026-01-14 13:43 UTC

Memory DisOrder: Memory Re-orderings as a Timerless Side-channel

Author(s): Sean Siddens, Sanya Srivastava, Reese Levine, Josiah Dykstra, Tyler Sorensen

Primary Category: cs.CR

Categories: cs.CR, cs.AR

Last Updated: 2026-01-13 17:59 UTC

TL;DR

              
Problem Addressed: Relaxed memory models in CPUs/GPUs allow memory operations to be re-ordered, and this re-ordering frequency can be influenced by activity from other processes, creating a side-channel that does not rely on traditional timers.
Core Technical Method: A fuzzing campaign identifies that memory re-orderings can be observed across processes on mainstream processors (X86, Arm, Apple CPUs, and NVIDIA/AMD/Apple GPUs). These re-orderings are used to infer other processes' activity, enabling timerless side-channel attacks such as covert communication and application fingerprinting.
Impact on Security: This exposes a new side-channel vector that bypasses timer-based mitigations, allowing cross-process information leakage. Demonstrated attacks include covert channels (up to 30 Kb/s on X86 CPUs) and reliable fingerprinting of applications like DNN architectures, posing risks to memory safety and system isolation in shared hardware environments.

STELP: Secure Transpilation and Execution of LLM-Generated Programs

Author(s): Swapnil Shinde, Sahil Wadhwa, Andy Luo, Akshay Gupta, Mohammad Shahed Sorower

Primary Category: cs.SE

Categories: cs.SE, cs.AI

Last Updated: 2026-01-13 17:55 UTC

TL;DR

              
Addresses the risk of deploying LLM-generated code directly into production systems, which can contain vulnerabilities like data poisoning, malicious attacks, and hallucinations leading to system malfunctions.
Proposes STELP, a secure transpiler and executor that runs LLM-generated code in a controlled environment, using safety checks and isolation to prevent harmful execution.
Matters for system security by enabling safe use of autonomous AI code generation in production, where human review and traditional testing are impractical or insufficient.

2026-01-08

Generated at 2026-01-08 13:44 UTC

Deadlock-free Context-free Session Types

Author(s): Andreia Mordido, Jorge A. Pérez

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2026-01-07 16:27 UTC

TL;DR

              
Addresses the risk of deadlocks in message-passing concurrent programs using expressive, tree-like communication protocols (context-free session types), which standard session types cannot handle.
Uses a priority-based type system that statically enforces both adherence to these complex protocols and deadlock freedom, even with recursion, polymorphism, and cyclic communication topologies.
Matters for system security and low-level safety by guaranteeing reliable, predictable concurrent execution without deadlocks, a critical failure mode in distributed systems and parallel software.

2026-01-07

Generated at 2026-01-07 13:43 UTC

FSLH: Flexible Mechanized Speculative Load Hardening

Author(s): Jonathan Baumann, Roberto Blanco, Léon Ducruet, Sebastian Harwig, Catalin Hritcu

Primary Category: cs.CR

Categories: cs.CR, cs.PL

Last Updated: 2026-01-06 14:20 UTC

TL;DR

              
Addresses the incomplete security of Speculative Load Hardening (SLH) against Spectre v1 attacks: Selective SLH only protects constant-time cryptographic code, while Ultimate SLH secures arbitrary programs but imposes prohibitive performance overhead by treating all data as secret.
Introduces a flexible SLH mechanism that generalizes both Selective and Ultimate SLH, allowing security policies to be tailored per program—applying strong hardening only where needed—and formally proves its security using the Rocq theorem prover.
Provides a practical, verifiable defense for non-cryptographic software, enabling efficient memory safety against speculative execution attacks without assuming all data is secret, thus bridging the gap between security and performance for low-level system code.

2025-12-30

Generated at 2025-12-30 13:39 UTC

How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench

Author(s): Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee

Primary Category: cs.CR

Categories: cs.CR, cs.LG, cs.SE

Last Updated: 2025-12-29 16:44 UTC

TL;DR

              
The paper addresses the real risk that LLM-generated patches for fixing software bugs can introduce new security vulnerabilities, particularly in complex real-world codebases, unlike prior studies on simpler synthetic tasks.
It conducts a large-scale security analysis of over 20,000 GitHub issues, comparing patches from developers, a standalone LLM (Llama 3.3), and three agentic frameworks, using automated vulnerability detection tools to identify introduced flaws.
The core finding is that LLM-generated patches often exhibit unique vulnerability patterns not seen in developer patches, with agentic frameworks producing more insecure patches when granted higher autonomy, highlighting a gap in current automated repair tools.
This matters for system security because it shows that deploying LLM-based program repair without safeguards can degrade memory safety and low-level security, necessitating proactive risk assessments that consider both code characteristics and issue context to prevent exploitation.

2025-12-25

Generated at 2025-12-25 13:36 UTC

CoTDeceptor:Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents

Author(s): Haoyang Li, Mingjin Li, Jinxin Zuo, Siqi Li, Xiao Li, Hao Wu, Yueming Lu, Xiaochuan He

Primary Category: cs.CR

Categories: cs.CR, cs.MA

Last Updated: 2025-12-24 15:55 UTC

TL;DR

              
Addresses the risk that advanced LLM code-review agents, which use Chain-of-Thought reasoning, can be systematically fooled by obfuscated malicious code, allowing backdoors to slip into software supply chains.
Uses an automated framework (CoTDeceptor) that generates multi-stage, evolving code obfuscation strategies specifically designed to disrupt the LLM's reasoning and semantic analysis processes.
Matters because it demonstrates a critical weakness in current AI-powered security auditing; such evasion enables the propagation of hidden vulnerabilities, undermining trust in automated code review and supply chain safety.

2025-12-17

Generated at 2025-12-17 13:36 UTC

Lexo: Eliminating Stealthy Supply-Chain Attacks via LLM-Assisted Program Regeneration

Author(s): Evangelos Lamprou, Julian Dai, Grigoris Ntousakis, Martin C. Rinard, Nikos Vasilakis

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2025-12-16 04:40 UTC

TL;DR

              
Problem: Stealthy supply-chain attacks in open-source software, where components appear normal but execute hidden malicious code only in specific target environments, evading conventional detection.
Core method: Lexo automatically learns a component's full observable behavior by generating input-output pairs, then uses multiple LLM instances guided by correctness and coverage metrics to synthesize a new, functionally equivalent version that excludes malicious code.
Security impact: It provides a concrete defense against memory-safety and system-security threats by regenerating components to eliminate hidden malicious functionality, even when direct LLM prompts fail, ensuring low-level safety without breaking compatibility.

The Functional Machine Calculus III: Control

Author(s): Willem Heijltjes

Primary Category: cs.PL

Categories: cs.PL, cs.LO

Last Updated: 2025-12-15 21:21 UTC

TL;DR

              
Addresses the gap between functional and imperative programming by extending a foundational calculus to model control flow like conditionals, loops, and exceptions, which are essential for real-world programming but not natively captured in pure lambda-calculus.
Introduces an operational semantics based on a Krivine machine extended with multiple operand stacks and a continuation stack, enabling precise modeling of sequential, branching, and looping computation within a functional framework.
Provides a unified model where an imperative language can be embedded with guarantees like confluence and type-based termination, directly enhancing memory safety and system security by enabling formal verification of low-level control flow in a well-behaved, analyzable calculus.

Taint Analysis for Graph APIs Focusing on Broken Access Control

Author(s): Leen Lambers, Lucas Sakizloglou, Taisiya Khakharova, Fernando Orejas

Primary Category: cs.CR

Categories: cs.CR, cs.LO, cs.SE

Last Updated: 2025-12-15 16:54 UTC

TL;DR

              
Addresses broken access control in Graph APIs, where users may improperly access or manipulate data without required privileges, a critical security vulnerability in modern web services.
Uses static taint analysis by modeling GraphQL API calls as graph transformation rules, applying Critical Pair Analysis to detect both direct and indirect tainted information flows between privileged data sources and sensitive sinks.
Enables systematic identification of privilege escalation risks in GraphQL APIs, such as unauthorized data access, and guides dynamic testing to verify these vulnerabilities, directly enhancing memory safety and system security by preventing exploitation of access control flaws.

2025-12-12

Generated at 2025-12-12 13:38 UTC

Towards Cumulative Abstract Semantics via Handlers

Author(s): Cade Lueker, Andrew Fox, Bor-Yuh Evan Chang

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-12-11 17:55 UTC

TL;DR

              
Addresses the rigidity in abstract interpretation frameworks that intertwine syntax and semantics, hindering modular adjustments like path sensitivity or direction of analysis.
Uses scoped effects and effect handlers to separate syntax elimination from domain-specific semantics, allowing semantic fragments to accumulate against a fixed syntax.
Enables flexible, reusable interpreters for diverse static analyses, enhancing memory safety by supporting precise, customizable low-level program analyses without complex monad transformers.

Lax Modal Lambda Calculi

Author(s): Nachiappan Valliappan

Primary Category: cs.LO

Categories: cs.LO, cs.PL

Last Updated: 2025-12-11 16:15 UTC

TL;DR

              
Addresses the underdevelopment of programming language type theories that incorporate diamond modalities, which are essential for modeling certain computational effects and strong functors in functional programming.
Introduces a family of typed lambda calculi based on sublogics of Lax logic, providing formal semantics and proving properties like normalization and equational completeness, with Agda formalizations.
Enhances memory and system security by enabling more precise type systems that can statically enforce safety properties for effectful computations, reducing runtime errors and vulnerabilities.

2025-12-09

Generated at 2025-12-09 13:39 UTC

Polymorphic Coverage Types

Author(s): Zhe Zhou, Ashish Mishra, Benjamin Delaware, Suresh Jagannathan

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-12-08 18:19 UTC

TL;DR

              
Addresses the problem of verifying whether property-based testing generators can produce all required inputs, as current manual or post-mortem checks are unreliable and don't scale with generator complexity.
Introduces coverage types as refinement types that statically guarantee the set of values an expression must produce, using a polymorphic type system to support higher-order functions and monadic combinators common in testing frameworks.
Provides a formal, automated method to ensure test generators achieve full input coverage, which is critical for memory and system security because incomplete testing can miss vulnerabilities in low-level code.

2025-12-03

Generated at 2025-12-03 13:40 UTC

TypeDis: A Type System for Disentanglement

Author(s): Alexandre Moine, Stephanie Balzer, Alex Xu, Sam Westrick

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-12-02 18:17 UTC

TL;DR

              
The paper addresses the difficulty of statically verifying *disentanglement*—a runtime property where parallel tasks operate on separate memory regions—without requiring complex, manual proofs in separation logic.
It introduces TypeDis, a type system using timestamps on types to identify the allocating task, with features like subtiming (a form of subtyping) and polymorphism to allow timestamp changes at control-flow join points.
This matters for memory safety and system security because it enables automatic, task-local garbage collection without synchronization, reducing runtime overhead and preventing data races in parallel programs through compile-time guarantees.

Belobog: Move Language Fuzzing Framework For Real-World Smart Contracts

Author(s): Wanxu Xia, Ziqiao Kong, Zhengwei Li, Yi Lu, Pan Li, Liqun Yang, Yang Liu, Xiapu Luo, Shaohua Li

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2025-12-02 16:36 UTC

TL;DR

              
Addresses the risk that Move smart contracts, despite strong static typing, still contain vulnerabilities (like logic errors or complex condition bugs) that evade the type system and can lead to financial losses in blockchains managing billions in assets.
Uses a type-aware fuzzer that builds a type graph from the contract’s code to generate and mutate only well-typed transactions, combined with concolic execution to handle complex runtime checks and explore deeper paths.
Matters because it provides the first effective automated testing for Move contracts, detecting real-world exploits (e.g., in Cetus and Nemo) and critical vulnerabilities that human audits find, directly enhancing memory and asset safety in low‑contract environments.

2025-11-27

Generated at 2025-11-27 13:35 UTC

SV-LIB 1.0: A Standard Exchange Format for Software-Verification Tasks

Author(s): Dirk Beyer, Gidon Ernst, Martin Jonáš, Marian Lingsch-Rosenfeld

Primary Category: cs.PL

Categories: cs.PL, cs.SC, cs.SE

Last Updated: 2025-11-26 15:44 UTC

TL;DR

              
Addresses the fragmentation in verification tools by creating a common exchange format (SV-LIB) that allows tools designed for one language (e.g., C) to be reused for others, reducing redundant development.
Uses an imperative-style intermediate language with SMT-LIB expressions to represent programs, specifications, and verification witnesses, enabling interoperability with existing SMT-solver-based verification infrastructure.
Enhances memory and system security by enabling independent witness validation and broader tool reuse, ensuring verification results (proofs or counterexamples) are trustworthy and not tool-specific.

2025-11-26

Generated at 2025-11-26 13:38 UTC

Translating Large-Scale C Repositories to Idiomatic Rust

Author(s): Saman Dehghan, Tianran Sun, Tianxiang Wu, Zihan Li, Reyhaneh Jabbarvand

Primary Category: cs.SE

Categories: cs.SE, cs.PL

Last Updated: 2025-11-25 18:42 UTC

TL;DR

              
Addresses the gap in C-to-Rust translation where existing methods either produce unsafe, unidiomatic code at scale or are too computationally expensive to apply to large repositories.
Uses a fully automated pipeline (Rustine) that generates compilable, idiomatic Rust by balancing safety features and readability, avoiding excessive unsafe constructs and raw pointers.
Matters because it enables practical migration of legacy C systems to memory-safe Rust, reducing security risks from memory corruption vulnerabilities without requiring prohibitive resources.

2025-11-25

Generated at 2025-11-25 13:37 UTC

Can LLMs Recover Program Semantics? A Systematic Evaluation with Symbolic Execution

Author(s): Rong Feng, Suman Saha

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2025-11-24 13:55 UTC

TL;DR

              
Addresses the gap in automated tools' ability to recover original program semantics from obfuscated code, which hinders vulnerability detection and program analysis.
Uses fine-tuned LLMs enhanced with symbolic execution artifacts (SMT constraints, path statistics, test cases) to deobfuscate programs transformed by control-flow flattening and encoding techniques.
Improves semantic preservation and compilation success in deobfuscated code, directly strengthening vulnerability analysis and security auditing capabilities for obscured software.

LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

Author(s): Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan

Primary Category: cs.AI

Categories: cs.AI, cs.CR

Last Updated: 2025-11-24 10:31 UTC

TL;DR

              
Identifies that LLM-generated C/C++ code frequently contains known vulnerability types (CWEs) linked to real exploits (CVEs), creating security risks in production systems.
Uses static analysis to detect memory safety flaws (e.g., buffer overflows, use-after-free) in code produced by ten different LLMs across varied prompts.
Shows automated code generation introduces memory corruption risks unless rigorously validated, undermining system security foundations in unsafe languages.

Summary-Mediated Repair: Can LLMs use code summarisation as a tool for program repair?

Author(s): Lukas Twist

Primary Category: cs.SE

Categories: cs.SE

Last Updated: 2025-11-24 05:33 UTC

TL;DR

              
Addresses the problem that LLMs often generate code with subtle implementation bugs not caught during direct repair, despite strong high-level understanding shown in summaries.
Uses natural-language code summaries as an intermediate step in program repair, where error-aware diagnostic summaries help identify mismatches between intent and implementation.
Improves memory safety and low-level reliability by catching implementation flaws before deployment, reducing risks from latent bugs in generated code.

Reasoning about External Calls

Author(s): Sophia Drossopoulou, Julian Mackay, Susan Eisenbach, James Noble

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-11-23 22:23 UTC

TL;DR

              
Addresses the challenge of reasoning about internal code safety when it interacts with untrusted external code, where external code effects are unknown and potentially harmful.
Introduces capability access assertions and effect-limiting specifications in a Hoare logic framework to verify that internal code maintains security properties despite external calls.
Enables formal verification that defensive programming using encapsulation and object capabilities effectively constrains external code's potential damage to memory and system resources.

2025-11-20

Generated at 2025-11-20 13:34 UTC

Soteria: Efficient Symbolic Execution as a Functional Library

Author(s): Sacha-Élie Ayoun, Opale Sjöstedt, Azalea Raad

Primary Category: cs.PL

Categories: cs.PL

Last Updated: 2025-11-19 14:16 UTC

TL;DR

              
Addresses performance and feature limitations in symbolic execution tools caused by intermediate languages, which hinder accurate analysis of complex language semantics like Rust's aliasing rules.
Provides a functional OCaml library for building symbolic execution engines directly in source languages, enabling precise, configurable analysis without intermediate translation.
Enables the first sound Rust symbolic execution engine supporting Tree Borrows, crucial for verifying memory safety and preventing bugs in unsafe code blocks of systems software.

2025-11-18

Generated at 2025-11-18 13:35 UTC

Cost-Driven Synthesis of Sound Abstract Interpreters

Author(s): Qiuhan Gu, Avaljot Singh, Gagandeep Singh

Primary Category: cs.PL

Categories: cs.PL, cs.LG

Last Updated: 2025-11-17 18:16 UTC

TL;DR

              
Addresses the difficulty of manually constructing abstract interpreters that guarantee global soundness, a key requirement for reliable static analysis in security and verification.
Uses LLMs to synthesize abstract interpreters by formulating the task as a constrained optimization problem with a mathematically grounded cost function for unsoundness, combined with syntactic/semantic validation and cost-guided feedback.
Enables automatic generation of sound, high-precision analyzers for complex operations (e.g., nonlinear functions in neural networks), reducing manual effort and improving the trustworthiness of memory safety and system security analyses.

It's a Feature, Not a Bug: Secure and Auditable State Rollback for Confidential Cloud Applications

Author(s): Quinn Burke, Anjo Vahldiek-Oberwagner, Michael Swift, Patrick McDaniel

Primary Category: cs.CR

Categories: cs.CR

Last Updated: 2025-11-17 17:53 UTC

TL;DR

              
Addresses the risk of replay and rollback attacks on cloud applications through untrusted storage, which can reintroduce stale data to compromise decisions, while also allowing legitimate rollbacks for recovery from corruption or misconfiguration.
Uses a reference monitor to mediate state transitions, enforce authorization policies, ensure atomic updates and rollbacks, and produce a tamper-evident log for transparency and auditability.
Enhances memory safety and system security by enabling secure, auditable state management that prevents unauthorized rollback attacks without blocking necessary operational recoveries, reducing vulnerabilities in confidential cloud environments.

2025-11-12

Generated at 2025-11-12 13:36 UTC

QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities

Author(s): Claire Wang, Ziyang Li, Saikat Dutta, Mayur Naik

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2025-11-12 01:59 UTC

TL;DR

              
Addresses the difficulty of manually writing static analysis queries (e.g., in CodeQL) to detect vulnerabilities from CVE descriptions, which requires specialized security and program analysis expertise.
Uses an LLM-driven agent with execution feedback, constrained by a structured interface to a Language Server Protocol for syntax and a RAG database for semantic guidance, to automatically generate CodeQL queries from CVE metadata.
Achieves 53.4% success in generating queries that correctly identify vulnerabilities in vulnerable code versions but not in patched ones, compared to 10% without the framework, enhancing scalable detection of memory safety and security flaws in software.

2025-11-06

Generated at 2025-11-06 13:35 UTC

SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories

Author(s): Chihao Shen, Connor Dilgren, Purva Chiniya, Luke Griffith, Yu Ding, Yizheng Chen

Primary Category: cs.CR

Categories: cs.CR, cs.AI

Last Updated: 2025-11-05 16:42 UTC

TL;DR

              
Addresses the lack of benchmarks for evaluating security-aware code completion in realistic C/C++ repository contexts, where current models miss real-world vulnerability patterns.
Creates SecRepoBench with 318 tasks across 27 C/C++ repositories covering 15 Common Weakness Enumerations (CWEs) to test code agents and LLMs.
Shows code agents significantly outperform standalone LLMs, revealing current AI coding tools' limitations in preventing memory safety vulnerabilities like buffer overflows or use-after-free errors.
Provides measurable evidence that real-world secure coding requires more than raw LLM capability, impacting software supply chain security and vulnerability reduction.

2025-10-28

Generated at 2025-10-28 16:09 UTC

Linear effects, exceptions, and resource safety: a Curry-Howard correspondence for destructors

Author(s): Sidney Congard, Guillaume Munch-Maccagnoni, Rémi Douence

Primary Category: cs.PL

Categories: cs.PL, cs.LO

Last Updated: 2025-10-27 16:56 UTC

TL;DR

              
Addresses the challenge of combining linear types, effects (like resource allocation/deallocation), and exceptions while preserving resource safety guarantees in programming languages.
Uses a monadic framework with destructors (inspired by C++/Rust) that trigger automatic cleanup actions, modeling moves and exceptions through side-effecting type system rules.
Ensures resources like memory are never leaked even when exceptions occur, directly improving memory safety and reliability in low-level systems programming.

Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy

Author(s): Roham Koohestani, Ziyou Li, Anton Podkopaev, Maliheh Izadi

Primary Category: cs.AI

Categories: cs.AI, cs.FL

Last Updated: 2025-10-27 16:22 UTC

TL;DR

              
Addresses the inability to formally verify the safety and predictability of AI agents due to their undefined computational power boundaries.
Maps agent memory architectures to automata classes: reflex agents to finite automata, hierarchical agents to pushdown automata, and reflective agents to Turing machines.
Enables formal verification of agent behavior, allowing security guarantees for memory-safe agent designs and risk quantification for undecidable systems.

Proceedings of the Combined 32nd International Workshop on Expressiveness in Concurrency and 22nd Workshop on Structural Operational Semantics

Author(s): Cinzia Di Giusto, Giorgio Bacci

Primary Category: cs.LO

Categories: cs.LO, cs.FL

Last Updated: 2025-10-27 11:01 UTC

TL;DR

              
This is a workshop proceedings, not a research paper addressing a specific security problem or presenting a technical method.
It contains multiple papers on formal semantics and concurrency theory rather than proposing a single security mechanism.
Understanding formal semantics matters for memory safety as it provides mathematical foundations for verifying correctness of low-level systems and concurrent programs.

Beyond Imprecise Distance Metrics: LLM-Predicted Target Call Stacks for Directed Greybox Fuzzing

Author(s): Yifan Zhang, Xin Zhang

Primary Category: cs.CR

Categories: cs.CR, cs.PL, cs.SE

Last Updated: 2025-10-27 08:17 UTC

TL;DR

              
Problem: Directed greybox fuzzers waste time on irrelevant execution paths due to imprecise static analysis-based distance metrics that over-approximate mutation probabilities.
Method: Uses LLMs to predict vulnerability-triggering call stacks from static call graphs, then prioritizes seeds whose execution paths overlap most with these precise call sequences.
Impact: Achieves 1.86x–3.09x faster vulnerability triggering in real programs and found 10 new CVEs, demonstrating concrete improvements in fuzzing efficiency and memory safety bug discovery.

Anvil: A General-Purpose Timing-Safe Hardware Description Language

Author(s): Jason Zhijingcheng Yu, Aditya Ranjan Jha, Umang Mathur, Trevor E. Carlson, Prateek Saxena

Primary Category: cs.AR

Categories: cs.AR, cs.PL, B.5.2; D.3.1

Last Updated: 2025-10-27 04:57 UTC

TL;DR

              
Addresses timing hazards in hardware design caused by unintended signal value changes during multi-cycle operations, a common source of critical bugs in digital circuits.
Uses a type system that statically enforces timing contracts between register mutations and signal usages, preventing unsafe value changes without sacrificing cycle-level timing control.
Eliminates a major class of hardware vulnerabilities by ensuring timing safety across module boundaries, directly preventing erratic behavior that could compromise system reliability and security.