
We audited one of our own internal tools. Here is what we found

AI code quality · Zegaware engineering · 9 min read

Last updated: 2 July 2026

We used to keep a small internal tool that gave our team passwordless access to the open-source applications we run for ourselves. It was built quickly, with outside help, and we never put it through the senior review we run on everything we build for clients. Before deciding whether to keep it, we finally did. It failed, in the ordinary ways that unreviewed software fails: a hard-coded encryption key, credentials stored without proper protection, an authorisation check that had been disabled, a framework years past its security support, and a test suite that proved almost nothing. None of it needed AI to happen, and that is the point. These failures come from skipping the review, not from who or what writes the code.

We review a lot of software at Zegaware, and every verdict carries a named senior engineer's sign-off. It was uncomfortable, and instructive, to point the same review at ourselves. This is an honest account of what we found, why an internal tool ends up in that state, and what we decided to do about it.

Why an internal tool ends up unreviewed

The tool was never meant to be a product. It gave the team passwordless access to the various open-source applications we host for our own use, so that nobody had to pass shared passwords around. It ran inside our own environment, for occasional internal use by a handful of people.

Our senior engineers were committed to client work, and this was a low-stakes convenience, so we made a deliberate decision at the time: we built it quickly with outside help and did not put it through the review we run on client software. That decision is the whole story. It is the same decision a founder makes when a weekend build goes straight to customers, or a team makes when it lets a coding assistant write a feature and ships it unread. The code gets written. The review that would have caught its problems never happens. When we finally ran that review, years later, it found exactly what a skipped review lets accumulate.

What the review found

None of the findings were exotic. Every one maps to a well-known weakness class, which is why an audit surfaces them quickly.

A hard-coded encryption key. One password was encrypted before it was stored, which is the right instinct, but the key was written directly into the source and committed to the repository. Anyone who could read the code could undo the encryption. MITRE tracks this as its own weakness, CWE-321, because it quietly defeats the control it appears to provide [1], and secrets in source remain one of the most common findings in any review. GitGuardian recorded 28.65 million new secrets pushed to public repositories in a single year [2].
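One common way to keep a key out of the repository is to read it from the process environment at start-up and refuse to run without it. The following is a minimal Python sketch of that idea, not the tool's actual code (the tool was a Laravel application). The variable name `CREDENTIAL_KEY`, the 32-byte key length, and the `load_key` helper are all assumptions made for illustration.

```python
import base64
import binascii
import os

KEY_ENV_VAR = "CREDENTIAL_KEY"  # hypothetical variable name
KEY_LENGTH = 32                 # assumed: a 256-bit key


class MissingKeyError(RuntimeError):
    """Raised when no usable key is configured."""


def load_key(environ=os.environ):
    """Return the key bytes from the environment, or fail loudly."""
    encoded = environ.get(KEY_ENV_VAR)
    if not encoded:
        raise MissingKeyError(f"{KEY_ENV_VAR} is not set")
    try:
        # validate=True rejects characters outside the base64 alphabet
        key = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise MissingKeyError(f"{KEY_ENV_VAR} is not valid base64") from exc
    if len(key) != KEY_LENGTH:
        raise MissingKeyError(f"{KEY_ENV_VAR} must decode to {KEY_LENGTH} bytes")
    return key
```

The design choice that matters here is the failure mode: a missing or malformed key stops the application instead of falling back to a default baked into the source.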

Credentials without proper protection. Most of the credentials the tool held were stored without encryption at rest. This is the OWASP Top 10 category of Cryptographic Failures, which sits at number two in the 2021 list for how routinely sensitive data is left unprotected in storage and transit [3].

A disabled authorisation check. A part of the code whose job was to authorise requests to a sensitive route had been disabled and never re-enabled, so the route accepted whatever reached it. Broken Access Control is the top category in the OWASP Top 10 2021 [4], and it is the one we find most often, because it stays invisible when the person who built the software tests only the path they expect to work.
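A disabled check is easier to catch when routes deny by default and an unguarded route has to be declared public on purpose. The sketch below illustrates that pattern in Python under stated assumptions: the `Route` type, the `dispatch` and `unguarded_routes` helpers, and the example paths are all hypothetical and do not come from the tool we audited.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Route:
    path: str
    handler: Callable[[dict], str]
    guard: Optional[Callable[[dict], bool]] = None  # None means "no check"
    public: bool = False  # must be set explicitly to allow an unguarded route


def unguarded_routes(routes):
    """Return paths that have no guard and were not explicitly marked public."""
    return [r.path for r in routes if r.guard is None and not r.public]


def dispatch(route, request):
    """Deny by default: run the handler only if the route is public or its guard passes."""
    if route.public:
        return route.handler(request)
    if route.guard is None or not route.guard(request):
        return "403 Forbidden"
    return route.handler(request)


def is_admin(request):
    return request.get("role") == "admin"


ROUTES = [
    Route("/health", lambda req: "200 OK", public=True),
    Route("/credentials", lambda req: "200 OK", guard=is_admin),
    Route("/export", lambda req: "200 OK"),  # guard "temporarily" removed
]
```

With this shape, removing a guard closes the route rather than opening it, and a check such as `unguarded_routes(ROUTES)` can run in a build pipeline to list any route left in that state.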

A stack past its security support. The tool ran on Laravel 9 and PHP 8.1. Laravel 9 stopped receiving security patches on 6 February 2024, and PHP 8.1 reached the end of its security support on 31 December 2025 [5][6]. A dependency had drifted too, carrying a published directory-traversal advisory with no fixed release available [7]. This is the category of Vulnerable and Outdated Components [8], the one that grows on its own while nobody writes a line.
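Support dates are easy to check mechanically once they are written down. Here is a minimal Python sketch, assuming a hand-maintained table of end-of-support dates; the two entries are the dates cited above, and the `past_security_support` helper is a name invented for this example.

```python
from datetime import date

# Dates as cited in this article; keep them in sync with the upstream support pages.
SECURITY_SUPPORT_ENDS = {
    ("laravel", "9"): date(2024, 2, 6),
    ("php", "8.1"): date(2025, 12, 31),
}


def past_security_support(components, today):
    """Return the components whose security support ended before `today`."""
    expired = []
    for name, version in components:
        ends = SECURITY_SUPPORT_ENDS.get((name, version))
        if ends is None:
            continue  # unknown component: this sketch makes no claim about it
        if today > ends:
            expired.append((name, version, ends))
    return expired
```

Run on a schedule against the versions actually deployed, a check like this turns silent drift into a visible failure. A real setup would more likely pull the dates from the vendors' published support pages than from a hard-coded table.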

Tests that proved almost nothing. The tool had a test suite. It confirmed that the application started, and little else. A green suite of that kind is worse than none, because it reads as reassurance while asserting no real behaviour.
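The difference between a smoke test and a test that asserts behaviour can be shown in a few lines. The Python sketch below uses a hypothetical `handle` function as a stand-in for an application; the paths, token value, and status codes are assumptions chosen for illustration.

```python
def handle(path, token=None):
    """Hypothetical stand-in for an application under test."""
    if path == "/health":
        return 200
    if path == "/credentials":
        return 200 if token == "valid-token" else 401
    return 404


def test_smoke():
    # Passes as long as the application responds at all.
    assert handle("/health") == 200


def test_sensitive_route_rejects_missing_token():
    assert handle("/credentials") == 401


def test_sensitive_route_rejects_wrong_token():
    assert handle("/credentials", token="guess") == 401


def test_sensitive_route_accepts_valid_token():
    assert handle("/credentials", token="valid-token") == 200
```

If the token check inside `handle` were disabled, `test_smoke` would still pass while the two rejection tests would fail. That is the behaviour a suite has to pin down before a green run means anything.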

The author is not the point. The review is

It would be easy to read that list as an argument about competence, or about AI. It is neither. The tool was written by people, before the current generation of coding assistants existed. What let the problems accumulate was not the author. It was the absence of a review.

That is why the same failures show up in AI-built software. A coding assistant produces a great deal of code very quickly, and if none of it is reviewed, it carries the same weakness classes and accumulates them faster. Veracode's testing across more than 150 models found that 45% of AI-generated code introduces at least one flaw from the OWASP Top 10 [9]. A Carnegie Mellon benchmark found AI agents produced functionally correct code about 61% of the time, but only 10.5% of those solutions were also secure [10]. The gap between working and safe is not an AI problem. It is a review problem, and AI simply makes more code to review.

The uncomfortable lesson for us was that we knew this and skipped the review anyway, because the tool felt too small to bother with. That is precisely the judgement we tell clients not to trust.

What we did about it

We did not patch the tool. We retired it.

That is a real audit outcome, not an evasion. Part of a review is deciding what is safe to keep, what can be fixed, and what should be rebuilt. Here the honest answer was that a credential-handling tool on an end-of-life stack was not worth repairing, and keeping it would contradict everything we tell clients. So we decommissioned it safely and rotated the internal credentials it had held.

Retiring your own work is uncomfortable. It is also the clearest demonstration of what a review is for. The deliverable was never a list of complaints. It was a decision we were willing to put our name to.

Why this matters for what you are shipping

If a small internal tool can drift into this state in a few years, a product assembled at speed with a coding assistant can arrive there far sooner, because far more unreviewed code is produced in far less time. Forrester predicts that 75% of technology decision-makers will see their technical debt rise to a moderate or high level of severity by 2026, and names the rapid development of AI solutions as an accelerant [11].

The remedy is not to stop building quickly. It is to put a review between building something and relying on it, every time, including on the things that feel too small to bother with. We learned that on our own code. We now apply it to ours as well as to yours.

Frequently asked questions

Was the tool built with AI?

No. It predates the current generation of coding assistants, and it was built by people. We are publishing it precisely because the failures are not unique to AI. They are what any unreviewed code accumulates, whether it is written by an outside developer in a hurry, a busy team, or a coding assistant. AI changes the volume and the speed, not the nature of the problem.

It was only an internal tool. Why did it matter?

Because internal tools still hold real credentials, and because the lesson generalises. The point is not the blast radius of this particular tool, which was small. It is that a review we run for every client was skipped on our own software for years, and that skipping it is exactly how these problems accumulate anywhere.

Does this mean old software is always unsafe?

No. Age is not the fault. Unreviewed accumulation is. Software on a supported stack, reviewed on a cadence and patched as its dependencies move, stays safe for a long time. The risk lives in the code nobody has looked at since it was built.

What would you do differently now?

Put even small, internal, low-stakes software through the same review as client work. Design the controls in from the start rather than adding them later, keep the stack supported, and review on a schedule so that years cannot pass in silence.

How does this apply to AI-built software?

Directly. The same failure classes appear, and because the code is produced in volume and at speed, there is more of it to review and less time before the problems build up. That is why we assess AI-built software against recognised standards before it ships, rather than trusting that a working demo means a safe product.

We put our own code through it

We are not here to talk anyone out of building quickly, with AI or without it. We are here to answer the one question a demo cannot: is it safe to rely on? We asked it of our own internal tool, did not like every answer, and acted on all of them.

If you have built something you are about to put in front of customers, scale, or sell, a senior engineer can tell you honestly where it stands before more depends on it. Book a Vibe Code Audit, and we will tell you what we find and put our name to it, the same way we did with our own. You can also read what a vibe code audit actually finds across other people's systems, how we approach securing software you built or inherited, and our view on whether AI-generated code is safe to ship.

Sources

  1. MITRE, CWE-321: Use of Hard-coded Cryptographic Key. https://cwe.mitre.org/data/definitions/321.html
  2. GitGuardian, The State of Secrets Sprawl 2026, 17 March 2026. https://blog.gitguardian.com/the-state-of-secrets-sprawl-2026/
  3. OWASP, Top 10 2021, A02: Cryptographic Failures. https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
  4. OWASP, Top 10 2021, A01: Broken Access Control. https://owasp.org/Top10/A01_2021-Broken_Access_Control/
  5. Laravel, Release notes and support policy (Laravel 9 security fixes ended 6 February 2024). https://endoflife.date/laravel
  6. PHP, Supported Versions (PHP 8.1 security support ended 31 December 2025). https://www.php.net/supported-versions.php
  7. GitHub Advisory Database, CVE-2025-65345: alexusmai/laravel-file-manager Directory Traversal (GHSA-rr44-8j7r-jg2q). https://github.com/advisories/GHSA-rr44-8j7r-jg2q
  8. OWASP, Top 10 2021, A06: Vulnerable and Outdated Components. https://owasp.org/Top10/A06_2021-Vulnerable_and_Outdated_Components/
  9. Veracode, Spring 2026 GenAI Code Security Update, 24 March 2026. https://www.veracode.com/blog/spring-2026-genai-code-security/
  10. Songwen Zhao et al., "Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks" (SUSVIBES benchmark), Carnegie Mellon University, arXiv:2512.03262, 2026. https://arxiv.org/abs/2512.03262
  11. Forrester, "Predictions 2025: Technology And Security", 22 October 2024. https://www.forrester.com/press-newsroom/forrester-predictions-2025-tech-security/
