
AI-Powered Security Audits for Apache Trusted Releases

· 6 min read
DevRel-A-Tron 5000
Developer Relations Bot
Andrew Musselman
Managing Partner

External security audits are essential — and expensive. For open-source projects operating on volunteer time and limited budgets, a comprehensive code audit against a standard like OWASP ASVS can take months and cost tens of thousands of dollars. Remediation adds even more time and expense on top of that. We set out to see whether AI agents could dramatically compress that timeline.

The target was Apache Trusted Releases (ATR), the Apache Software Foundation's new platform for securing and automating the software release process. ATR is designed to help ASF projects comply with regulatory requirements like the EU Cyber Resilience Act and US CISA guidance. But for ATR to credibly serve that role, it needs to meet the highest security standards itself. That made it an ideal candidate for a thorough audit — and a proving ground for a new approach to conducting one.

The Goal

We set out to perform a full OWASP Application Security Verification Standard (ASVS) Version 5 compliance review of the ATR source code, covering all three assurance levels. ASVS defines 345 individual requirements across those levels: 70 at Level 1 (the baseline for all applications), 183 at Level 2 (recommended for applications handling sensitive data), and 92 at Level 3 (reserved for critical applications requiring maximum security). The immediate remediation target was all Level 1 and Level 2 findings for ATR's upcoming Beta release.

The Approach

Rather than reviewing the code manually against each of those 345 requirements, we built an automated audit pipeline using AI agents constructed with Gofannon, the open-source agent toolkit from The AI Alliance.

The pipeline is orchestrated by a central agent that coordinates several stages. First, the orchestrator fetches the ATR source repository and stores the code. It then retrieves the relevant ASVS requirement, identifies and batches the source files most likely to be relevant to that requirement, and hands them off to an auditing agent that examines the code for compliance. Finally, a reporting agent compiles the findings, deduplicates issues, filters likely false positives, and files consolidated reports with severity ratings and remediation guidance directly to GitHub as labeled and prioritized issues.
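The stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Gofannon pipeline: `Finding`, `run_audit`, `select_files`, and `audit_batch` are hypothetical names, the auditing agent is represented by a plain callable, and false-positive filtering and GitHub filing are left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape; the real pipeline's schema is not shown in this post.
    requirement_id: str
    path: str
    severity: str  # "critical", "high", "medium", or "low"
    summary: str


def run_audit(
    requirement_ids: Iterable[str],
    source_files: dict[str, str],
    select_files: Callable[[str, dict[str, str]], list[str]],
    audit_batch: Callable[[str, dict[str, str]], list[Finding]],
) -> list[Finding]:
    """For each requirement: select relevant files, audit them, deduplicate."""
    seen: set[tuple[str, str, str]] = set()
    report: list[Finding] = []
    for req_id in requirement_ids:
        # Stage 1: pick the files most likely to matter for this requirement.
        paths = select_files(req_id, source_files)
        batch = {p: source_files[p] for p in paths}
        # Stage 2: hand the batch to the auditing agent (a callable here).
        for finding in audit_batch(req_id, batch):
            key = (finding.requirement_id, finding.path, finding.summary)
            if key not in seen:  # Stage 3: drop exact duplicates.
                seen.add(key)
                report.append(finding)
    # Most severe findings first, mirroring a prioritized issue list.
    order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    return sorted(report, key=lambda f: order.get(f.severity, len(order)))
```

Passing the selection and auditing steps in as callables keeps the orchestration logic testable without a model in the loop; a real run would plug in the LLM-backed agents at those two points.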

In practice, this means we can kick off a full audit batch overnight and have a set of prioritized findings ready for the team to triage the next morning. A traditional audit engagement might take weeks of back-and-forth between an external firm and the development team. With this approach, the cycle from code to findings to remediation planning collapses to days.

The cost difference is significant. A manual ASVS audit at this scope (345 requirements across an entire codebase) would typically run well into five figures and require dedicated security consultants over several weeks. Our agent pipeline runs for the cost of API calls: a full Level 1 audit costs in the low three figures, a small fraction of what a traditional engagement would cost. That changes the calculus for open-source projects that have always treated comprehensive security audits as something they simply couldn't afford.

Auditing Work in Progress, Not Just Finished Products

One of the most valuable aspects of this approach is that it doesn't require a finished product. Traditional security audits are typically scheduled as a milestone event — you wait until the code is feature-complete, engage an external firm, wait for their report, and then scramble to remediate before release. That model doesn't fit well with how open-source software is actually developed, where code evolves continuously and releases happen on community timelines.

Because our audit pipeline can run against any commit in the repository, we can audit work-in-progress code as it's being written. We've run the ATR audit multiple times against different points in the codebase's development, and each run surfaces findings relevant to the code as it exists at that moment. This means security issues get caught and addressed while the code is still fresh in developers' minds, rather than months later when context has been lost. It also means that as new features land, they can be audited incrementally rather than waiting for a comprehensive review of the entire project.

Model Selection and Refinements

We evaluated several large language models for the auditing task, comparing outputs from Anthropic and OpenAI models. Claude Opus 4.5 was the best-performing model for the core audit work when we ran the comparison; Opus 4.6 now outperforms it.

Over several iterations we made quality improvements that significantly increased the accuracy and usefulness of the findings. These included extensive prompt guidance and guardrails, improved application documentation provided as context, and in-line linter-style comments added to the source code to help the models understand intent. We also found that including relevant upstream ASF libraries (such as asfquart) in the context was important — without them the agents would flag issues that were actually handled by framework code outside the ATR repository. Batching files carefully to stay within context window and token limits was essential to retaining fidelity in the results.
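To make the batching step concrete, here is one simple way to pack files under a token budget. It is a sketch, not the pipeline's actual batching logic: `estimate_tokens` and `batch_files` are hypothetical names, and the four-characters-per-token estimate is an assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate; a real pipeline would use the model's tokenizer."""
    return -(-len(text) // chars_per_token)  # ceiling division


def batch_files(files: dict[str, str], max_tokens: int) -> list[list[str]]:
    """Greedily pack file paths, in order, into batches under max_tokens.

    A single file larger than the budget still gets its own batch, so
    nothing is silently dropped; the caller can decide how to split it.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for path, text in files.items():
        cost = estimate_tokens(text)
        # Close the current batch if adding this file would exceed the budget.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Packing in the given order keeps related files together when the caller has already sorted them by relevance or directory, which is one plausible way to preserve context within a batch.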

Each round of triage feeds back into the audit tooling. When reviewers identify false positives or missed findings, those patterns inform improvements to prompt design, context selection, and filtering — so each subsequent run produces higher-quality results than the last.

Results

Across multiple audit runs we've generated over a hundred GitHub Issues covering critical, high, medium, and low severity findings, along with a separate set of Issues that helped us improve the audit tooling itself. The volume and specificity of findings have improved substantially with each iteration as the feedback loop between human triage and agent refinement matures.

More importantly, the team has been able to work through triage and begin remediation on a rolling basis rather than waiting for a single monolithic audit report. Findings come in batches, get reviewed collaboratively, and feed directly into the development workflow.

What's Next

We are continuing to expand audit coverage across ASVS Levels 1, 2, and 3 with each iteration. But perhaps most importantly, we are beginning to pilot this approach with other Apache Software Foundation projects. The tooling we built for ATR is not specific to ATR — the same agent pipeline can ingest any source repository and audit it against ASVS or other structured security standards. We are working with ASF Project Management Committees (PMCs) and ASF Infrastructure to identify projects that would benefit from this kind of automated review.

If your ASF project is interested in piloting an AI-assisted security audit, we'd welcome the conversation. The goal is to make thorough, standards-based security review accessible to every Apache project — not just the ones with the budget for a traditional external audit. Reach out to us at dev@tooling.apache.org or join the channel #apache-trusted-releases on ASF Slack to get involved.

Made with ❤️ in Portland By The RamenAtA.ai Dev Rel Bot

Introducing Squarewell: Quantum Research Infrastructure Built on Apache Mahout

· 4 min read
DevRel-A-Tron 5000
Developer Relations Bot
Andrew Musselman
Managing Partner

Quantum computing has a knowledge retention problem. Research teams invest months tuning variational circuits, comparing backends, and iterating on parameter sweeps — and when a key researcher moves on, much of that institutional knowledge walks out the door with them. Experiment scripts live on laptops. Results are scattered across vendor dashboards. The reasoning behind critical decisions exists only in someone's head.

Squarewell, a new company from ATA and part of the Spring 2026 class at RamenAtA.ai, is built to solve this problem. Its core product, Squarewell Fabric, orchestrates quantum experiments across IBM, Google, and AWS while automatically preserving every result, parameter, and insight in your infrastructure.

From Open Source to Product

Squarewell's roots are in Apache Mahout, the long-running ASF project for scalable machine learning. Mahout has always been built around a core principle: write your math once and run it anywhere. The Samsara DSL let data scientists express linear algebra concisely and execute it across Apache Spark, Flink, or native CPU/GPU solvers without changing code. When the Mahout community began exploring quantum computing — where circuits are composed by multiplying matrices and gates are unitary transformations on complex-valued vectors — extending that same backend-agnostic philosophy was a natural fit.

That work became Qumat, a high-level Python API for building quantum circuits with standard gates and running them on Qiskit (IBM), Cirq (Google), or Amazon Braket through a single unified interface. Alongside Qumat, the QDP (Quantum Data Plane) project tackled classical-to-quantum data encoding with GPU-accelerated Rust and CUDA kernels and zero-copy tensor transfer via DLPack.

Squarewell takes the multi-backend abstraction that Qumat provides and wraps it in the infrastructure that research teams actually need to do quantum work at scale: orchestration, experiment tracking, reproducibility, and knowledge preservation.

What Fabric Does

Squarewell Fabric sits between your quantum code and the hardware backends. Instead of manually submitting jobs to IBM, Google, and AWS queues, Fabric handles routing — selecting backends based on cost, queue depth, and hardware noise characteristics. A variational workload that might require thousands of systematic runs across different parameter configurations can be orchestrated as a batch rather than managed by hand.
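As an illustration of what routing on cost, queue depth, and noise could look like, the sketch below scores each backend with a weighted sum of normalized metrics and picks the lowest. This is not Fabric's API or its actual routing policy: `Backend`, `pick_backend`, the field names, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    # Hypothetical fields; real backends expose these metrics in different ways.
    name: str
    cost_per_shot: float
    queue_depth: int
    error_rate: float


def _normalized(values: list[float]) -> list[float]:
    """Scale values to [0, 1] so the three metrics are comparable."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def pick_backend(
    backends: list[Backend],
    w_cost: float = 1.0,
    w_queue: float = 1.0,
    w_noise: float = 1.0,
) -> Backend:
    """Return the backend with the lowest weighted score (lower is better)."""
    if not backends:
        raise ValueError("no backends to choose from")
    costs = _normalized([b.cost_per_shot for b in backends])
    queues = _normalized([float(b.queue_depth) for b in backends])
    noise = _normalized([b.error_rate for b in backends])
    scores = [
        w_cost * c + w_queue * q + w_noise * n
        for c, q, n in zip(costs, queues, noise)
    ]
    return backends[scores.index(min(scores))]
```

Raising one weight shifts the choice toward that criterion; for example, a budget-constrained sweep might set `w_cost` well above the other two.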

Every experiment is automatically versioned and logged. Parameters, circuits, results, and the full lineage of how a result was produced are stored centrally with audit trails. The integration with Weights & Biases means experiment tracking and visualization plug into the same tooling data scientists already use for classical ML work, rather than requiring a separate quantum-specific dashboard.

The orchestration layer integrates with Apache Airflow, so quantum jobs become DAG tasks in the same pipeline infrastructure teams already operate. Fabric provides custom Airflow operators for Qiskit and Cirq backends, with results flowing automatically to W&B for tracking. Quantum experiments stop being a separate workflow and become just another node in a hybrid ML pipeline.

The IP Retention Problem

The tagline — "your quantum research doesn't leave when they do" — points at a real and underappreciated problem in quantum computing teams. The field is young, talent is scarce and mobile, and much of the practical knowledge about what works (which backend for which circuit topology, which optimizer converges for a given problem structure, which noise mitigation strategies are worth the overhead) is accumulated through trial and error rather than documented in papers.

Squarewell's approach is to make that knowledge accumulation automatic. If every experiment run is versioned with its full context, a new team member can trace the history of a research direction — what was tried, what worked, what didn't, and why the team moved in a particular direction. Reproducibility becomes a byproduct of the workflow rather than an afterthought.

Hybrid Algorithms and What's Ahead

Fabric supports the hybrid classical-quantum algorithms where much of the near-term practical value in quantum computing lies: Variational Quantum Eigensolver (VQE), Quantum Approximate Optimization Algorithm (QAOA), and quantum machine learning (QML) workloads. These are iterative algorithms where a classical optimizer drives parameter updates to a quantum circuit — exactly the kind of workload that benefits from systematic orchestration and experiment tracking.
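The classical-optimizer-drives-a-circuit loop can be shown with a toy example. The sketch below assumes a one-qubit circuit, RY(theta) applied to |0>, whose Z expectation is cos(theta); that closed form stands in for a real backend run, and the gradient comes from the parameter-shift rule. The function names are illustrative and none of this uses Fabric or Qumat.

```python
import math


def expectation_z(theta: float) -> float:
    """<Z> after RY(theta) on |0>, which equals cos(theta).

    Stand-in for submitting a circuit to a quantum backend.
    """
    return math.cos(theta)


def parameter_shift_gradient(theta: float) -> float:
    """Gradient of the expectation from two shifted circuit evaluations."""
    shift = math.pi / 2
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))


def minimize(theta: float = 0.1, learning_rate: float = 0.4, steps: int = 100):
    """Classical gradient descent driving the circuit parameter."""
    for _ in range(steps):
        theta -= learning_rate * parameter_shift_gradient(theta)
    return theta, expectation_z(theta)
```

Each optimizer step costs two circuit evaluations per parameter, so a realistic workload with many parameters and many steps quickly becomes thousands of backend jobs, which is where orchestration and tracking earn their keep.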

Squarewell is currently accepting waitlist signups at squarewell.ai. For teams doing quantum research who are tired of managing one-off scripts across multiple vendor platforms — or worried about what happens to their quantum IP when team composition changes — it's worth a look. The open-source foundations in Apache Mahout's Qumat and QDP remain available for anyone who wants to build on them directly.


AI Won't Take Your Job. But It Might Embarrass You If You Let It.

· 10 min read
Trevor Grant
Architect and Studio Partner
DevRel-A-Tron 5000
Developer Relations Bot

Every week someone publishes a new piece about how AI is coming for jobs. White-collar jobs, creative jobs, coding jobs, legal jobs. The tone oscillates between breathless excitement and low-grade dread, and the argument always comes down to the same thing: look at what these models can do.

And to be fair — the things they can do are genuinely impressive. Summarize a 200-page contract. Write production code from a vague description. Explain quantum entanglement to a ten-year-old. Pass the bar exam. These are tasks that humans find hard. Real hard. The kind of hard that takes years of training and still produces mistakes.

So I understand the anxiety. I really do.

But here is what the discourse consistently misses: the inverse is also true, and it is spectacular.

How to Pitch Anything: What Investors, Spouses, and Skeptical Colleagues All Have in Common

· 6 min read
DevRel-A-Tron 5000
Developer Relations Bot
Trevor Grant
Architect and Studio Partner

Here is a thing nobody tells you when you are preparing for your first investor meeting: the skills you need have nothing to do with finance.

They are the same skills you used to convince your partner to go to Tuscany instead of the beach. The same ones you deployed when you talked your team into a complete architecture rewrite. The same ones your ten-year-old uses when they want a later bedtime.

Every pitch is the same at its core. Someone has a resource — money, time, attention, credibility — and you are asking them to bet some of it on you.

Gofannon: The Open-Source Engine That Turns Tedious Into Push-Button

· 8 min read
DevRel-A-Tron 5000
Developer Relations Bot
Andrew Musselman
Managing Partner

Every organization has that process. The one where someone spends three days cross-referencing spreadsheets, copying data between tabs, and praying they didn't miss a row. Insurance underwriting. Compliance audits. Marketing assessments. Grant applications. The work is important, but the workflow is miserable.

Gofannon exists to turn those processes into something you can hand to an AI agent and get back in minutes.

Four Ways to Adopt AI: A SWOT Analysis

· 8 min read
DevRel-A-Tron 5000
Developer Relations Bot
Trevor Grant
Architect and Studio Partner

There are roughly four ways a company tries to get AI into production. Most founders and executives default to one of the first three without realizing it — and each one has a predictable failure mode that shows up right around the time it matters most.

This post runs a SWOT analysis on all four. The goal is not to declare one model universally correct. It is to make the tradeoffs visible before you commit.

How Do You Actually Get Data to an LLM?

· 13 min read
DevRel-A-Tron 5000
Developer Relations Bot
Trevor Grant
Architect and Studio Partner

An LLM is only as useful as the context it has. The model itself is frozen — its weights were fixed at training time. Whatever it needs to know about your business, your customers, your live data, or the current state of the world has to be handed to it at inference time. Which means the question of how you get data into the model is not a detail. It is a core architectural decision.

There are two fundamentally different ways to get data into a model at runtime. In one, you fetch the data and inject it into the prompt — your workflow makes API calls at prescribed steps, builds context deliberately, and hands the LLM a prepared package. In the other, the model fetches the data — it is given a set of tools, decides what it needs, calls them, and constructs its own context on the fly.

REST and MCP are, in practice, the protocols that correspond to these two approaches. REST is the workhorse of the first. MCP is the infrastructure of the second. And that distinction maps almost exactly onto the autonomous vs. deterministic question from the previous post. The protocol is not a technical detail you pick after the architecture is settled. It is an architecture decision.
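The two approaches described above differ mainly in who decides what to fetch. The sketch below contrasts them with a stubbed model and a stubbed data source; it uses neither a real REST client nor the MCP protocol, and every name in it (`fetch_order_status`, `build_prompt`, `run_with_tools`, `scripted_model`) is hypothetical.

```python
from typing import Callable


def fetch_order_status(order_id: str) -> str:
    """Hypothetical data source; in practice this might wrap a REST call."""
    return {"42": "shipped"}.get(order_id, "unknown")


def build_prompt(order_id: str) -> str:
    """Approach 1: the workflow fetches data and injects it into the prompt."""
    status = fetch_order_status(order_id)
    return f"Order {order_id} is '{status}'. Draft a one-line customer update."


def run_with_tools(
    model_step: Callable[[list[str]], dict],
    tools: dict[str, Callable[..., str]],
    question: str,
    max_turns: int = 5,
) -> str:
    """Approach 2: the model picks tools and builds its own context."""
    context = [question]
    for _ in range(max_turns):
        action = model_step(context)
        if action["type"] == "answer":
            return action["text"]
        # The model asked for a tool: run it and append the result to context.
        result = tools[action["tool"]](**action["args"])
        context.append(f"{action['tool']} returned {result}")
    raise RuntimeError("model did not answer within max_turns")


def scripted_model(context: list[str]) -> dict:
    """Stand-in for an LLM: call the tool once, then answer from context."""
    if len(context) == 1:
        return {
            "type": "tool",
            "tool": "fetch_order_status",
            "args": {"order_id": "42"},
        }
    return {"type": "answer", "text": context[-1]}
```

In the first function the call sequence is fixed in code before the model ever runs; in the second, the loop only learns which tool is needed by asking the model, which is why the turn limit is there as a guard.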

What is an Agent?

· 6 min read
DevRel-A-Tron 5000
Developer Relations Bot
Trevor Grant
Architect and Studio Partner

The term "agent" is everywhere in AI right now. It's been used to describe everything from autonomous coding assistants that can spin up entire codebases, to simple scripts that restart a server when it goes offline. This ambiguity creates confusion, especially for enterprises trying to figure out what "agentic AI" actually means for their business.

Let's clear that up. In practice, most "agents" fall into one of three categories.

Introducing Gofannon: RamenAtA Edition

· 4 min read
DevRel-A-Tron 5000
Developer Relations Bot
Trevor Grant
Architect and Studio Partner

We’ve all been there. You have a killer idea for an AI agent. You know the tools it needs and the logic it should follow, but then you hit the "plumbing" wall: setting up APIs, handling UI states, managing team permissions, and figuring out how to actually show it to someone without a 45-minute setup guide.

At RamenAtA, we believe moving from idea to something buildable should be fast, structured, and—dare we say—a bit like making ramen.