
AI-Powered Security Audits for Apache Trusted Releases

DevRel-A-Tron 5000
Developer Relations Bot
Andrew Musselman
Managing Partner

External security audits are essential — and expensive. For open-source projects operating on volunteer time and limited budgets, a comprehensive code audit against a standard like OWASP ASVS can take months and cost tens of thousands of dollars. Remediation adds even more time and expense on top of that. We set out to see whether AI agents could dramatically compress that timeline.

The target was Apache Trusted Releases (ATR), the Apache Software Foundation's new platform for securing and automating the software release process. ATR is designed to help ASF projects comply with regulatory requirements like the EU Cyber Resilience Act and US CISA guidance. But for ATR to credibly serve that role, it needs to meet the highest security standards itself. That made it an ideal candidate for a thorough audit — and a proving ground for a new approach to conducting one.

The Goal

We set out to perform a full OWASP Application Security Verification Standard (ASVS) Version 5 compliance review of the ATR source code, covering all three assurance levels. ASVS defines 345 individual requirements across those levels: 70 at Level 1 (the baseline for all applications), 183 at Level 2 (recommended for applications handling sensitive data), and 92 at Level 3 (reserved for critical applications requiring maximum security). The immediate remediation target was all Level 1 and Level 2 findings for ATR's upcoming Beta release.

The Approach

Rather than reviewing the code manually against each of those 345 requirements, we built an automated audit pipeline using AI agents constructed with Gofannon, the open-source agent toolkit from The AI Alliance.

The pipeline is orchestrated by a central agent that coordinates several stages. First, the orchestrator fetches the ATR source repository and stores the code. It then retrieves the relevant ASVS requirement, identifies and batches the source files most likely to be relevant to that requirement, and hands them off to an auditing agent that examines the code for compliance. Finally, a reporting agent compiles the findings, deduplicates issues, filters likely false positives, and files consolidated reports with severity ratings and remediation guidance directly to GitHub as labeled and prioritized issues.
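As a structural sketch, the stages above might look like the following plain-Python pipeline. The function and field names here are illustrative only, not Gofannon's actual API, and the auditing step is a stand-in for the real LLM-backed agent:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    requirement_id: str  # e.g. an ASVS identifier like "V2.1.1"
    file: str
    severity: str        # "critical" | "high" | "medium" | "low"
    detail: str

def select_relevant_files(files, requirement):
    """Pick the source files most likely to matter for this requirement.

    Illustrative heuristic: keyword match against file paths. The real
    pipeline uses richer context to decide relevance.
    """
    keywords = requirement["keywords"]
    return [f for f in files if any(k in f.lower() for k in keywords)]

def audit_batch(requirement, batch):
    """Stand-in for the auditing agent: examine one batch of files."""
    return [Finding(requirement["id"], f, "medium", "placeholder finding")
            for f in batch if "auth" in f]

def run_audit(files, requirements, batch_size=5):
    """Orchestrator loop: select, batch, audit, and collect findings."""
    findings = []
    for req in requirements:
        relevant = select_relevant_files(files, req)
        # Hand batches to the auditing agent, then collect results.
        for i in range(0, len(relevant), batch_size):
            findings.extend(audit_batch(req, relevant[i:i + batch_size]))
    # A reporting stage would then deduplicate findings, filter likely
    # false positives, and file labeled GitHub issues.
    return findings
```

The key design point is that each stage is independent: the file-selection heuristic, the auditing agent, and the reporting step can each be improved or swapped out without touching the others.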

In practice, this means we can kick off a full audit batch overnight and have a set of prioritized findings ready for the team to triage the next morning. A traditional audit engagement might take weeks of back-and-forth between an external firm and the development team. With this approach, the cycle from code to findings to remediation planning collapses to days.

The cost difference is significant. A manual ASVS audit at this scope — 345 requirements across an entire codebase — would typically run well into five figures and require dedicated security consultants over several weeks. Our agent pipeline runs for the cost of API calls: a full Level 1 audit comes in at low three figures, a small fraction of what a traditional engagement would cost. That changes the calculus for open-source projects that have long treated comprehensive security audits as something they simply couldn't afford.

Auditing Work in Progress, Not Just Finished Products

One of the most valuable aspects of this approach is that it doesn't require a finished product. Traditional security audits are typically scheduled as a milestone event — you wait until the code is feature-complete, engage an external firm, wait for their report, and then scramble to remediate before release. That model doesn't fit well with how open-source software is actually developed, where code evolves continuously and releases happen on community timelines.

Because our audit pipeline can run against any commit in the repository, we can audit work-in-progress code as it's being written. We've run the ATR audit multiple times against different points in the codebase's development, and each run surfaces findings relevant to the code as it exists at that moment. This means security issues get caught and addressed while the code is still fresh in developers' minds, rather than months later when context has been lost. It also means that as new features land, they can be audited incrementally rather than waiting for a comprehensive review of the entire project.

Model Selection and Refinements

We evaluated several large language models for the auditing task, comparing outputs from Anthropic and OpenAI models. Claude Opus 4.5 proved the strongest performer for the core audit work, and Claude Opus 4.6 has since surpassed it.

Over several iterations we made quality improvements that significantly increased the accuracy and usefulness of the findings. These included extensive prompt guidance and guardrails, improved application documentation provided as context, and in-line linter-style comments added to the source code to help the models understand intent. We also found that including relevant upstream ASF libraries (such as asfquart) in the context was important — without them the agents would flag issues that were actually handled by framework code outside the ATR repository. Batching files carefully to stay within context window and token limits was essential to retaining fidelity in the results.
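The batching step can be sketched as a simple token-budget packer. This is a minimal illustration, not the pipeline's actual implementation; `estimate_tokens` is a hypothetical callback, and the common rough estimate of one token per four characters is only a heuristic:

```python
def batch_files_by_tokens(files, token_budget, estimate_tokens):
    """Group files into batches that each fit within a context budget.

    `estimate_tokens` maps a file path to an estimated token count; a
    crude but common estimate for source code is len(text) // 4.
    """
    batches, current, used = [], [], 0
    for path in files:
        cost = estimate_tokens(path)
        # Start a new batch when adding this file would exceed the budget.
        if current and used + cost > token_budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Keeping each batch comfortably under the model's context limit leaves room for the ASVS requirement text, documentation, and upstream library context that also have to travel with every request.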

Each round of triage feeds back into the audit tooling. When reviewers identify false positives or missed findings, those patterns inform improvements to prompt design, context selection, and filtering — so each subsequent run produces higher-quality results than the last.

Results

Across multiple audit runs we've generated over a hundred GitHub Issues covering critical, high, medium, and low severity findings, along with a separate set of Issues that helped us improve the audit tooling itself. The volume and specificity of findings have improved substantially with each iteration as the feedback loop between human triage and agent refinement matures.

More importantly, the team has been able to work through triage and begin remediation on a rolling basis rather than waiting for a single monolithic audit report. Findings come in batches, get reviewed collaboratively, and feed directly into the development workflow.

What's Next

We are continuing to expand audit coverage across ASVS Levels 1, 2, and 3 with each iteration. But perhaps most importantly, we are beginning to pilot this approach with other Apache Software Foundation projects. The tooling we built for ATR is not specific to ATR — the same agent pipeline can ingest any source repository and audit it against ASVS or other structured security standards. We are working with ASF Project Management Committees (PMCs) and ASF Infrastructure to identify projects that would benefit from this kind of automated review.

If your ASF project is interested in piloting an AI-assisted security audit, we'd welcome the conversation. The goal is to make thorough, standards-based security review accessible to every Apache project — not just the ones with the budget for a traditional external audit. Reach out to us at dev@tooling.apache.org or join the channel #apache-trusted-releases on ASF Slack to get involved.
