With the increasing popularity of AI-created code, NetSPI sought to assess the potential types of vulnerabilities present in an (almost) exclusively AI-coded web application. The results of this experiment can be used to predict industry trends.

TL;DR

  • The increased usage of AI coding platforms is potentially introducing vulnerabilities into “vibe coded” applications
  • We vibe coded our own application and tested the application for security issues
  • Security in the application was a mixed bag, but there were some glaring issues

This is not an anti-AI post. AI is only going to get bigger.
Research like this will help those of us who are trying to keep data and systems safe be PROACTIVE instead of REACTIVE.

What is Vibe Coding?

First, we need to understand what’s behind the term “vibe coding.” Vibe coding is something that’s only possible with AI chat interfaces built into code editors and IDEs (integrated development environments). In this scenario, we’re using and referencing Cursor, but there are similar experiences available with Copilot, Windsurf, etc.

You could type a prompt like “Add a login page,” and the AI looks at what already exists in your code and does its best to add a login page given that context. It’s super neat!

Most developers (we hope) are using a mix of AI-powered coding and self-driven effort, and checking the AI’s work as they go.

You could totally take AI coding to the extreme and hardly touch the code base at all. You could pump up the vibes, pump down the braincells. Exclusively use prompts, it’s faster! There’s no need to read the changes before accepting them, because you can just test the functionality and ask for changes based on that. Before you know it, you have an app!

NOW that’s what I call: Vibe Coding!

Experiment Outline

Yes, the idea of vibe coding is funny. But no one’s actually doing that, right? RIGHT?

They are. And even if they’re doing some things themselves, we’ll see more AI-coded components in existing codebases. Businesses and individuals will be able to churn out code at a speed they never could before. It’s hard to say no to that productivity increase.

Because of that, we couldn’t resist peeling back the layers of “what is the AI even doing?” Will we see security vulnerabilities go away? Will we see certain vulnerabilities minimized and others show up more frequently? What’s going to happen? We’ve got to know!!!

To answer all of these questions, we did a 4-phased experiment:

  1. Coding: We made a vibe-coded web application
  2. Security Audit: We asked the AI to review the codebase and suggest security changes
  3. Security Implementation: We implemented the security suggestions
  4. Pentest: We performed a pentest to see which vulnerabilities were there

Note: we performed 3 total security audits. After the first, we implemented its suggestions. Then we asked for a second audit and implemented those suggestions as well. Finally, we asked for a third audit, at which point it started contradicting previous suggestions (or giving different ones), so we elected not to implement them.

Example Security Audit Prompt:

“Review the entire codebase for this project and suggest any security changes or updates that would make this app more secure.”

Rules

Because we are security professionals, we had to implement some rules to make this experiment as relevant as possible. We wanted to know what the “default AI” behavior would be.

  1. We kept the prompts vague and didn’t specify how we wanted things implemented.
    1. E.g. “Ensure only the admin role can access XYZ functionality”
  2. We were only allowed to touch the code itself if it was a very simple fix or something equivalent to copy/pasting.
    1. This lowered the tokens needed, but also, frankly, getting it to do little fixes was sometimes difficult. It had a tendency to over-engineer solutions that should’ve been simple.
  3. For troubleshooting, we would provide error messages, but the AI needed to decide on the path to fix.
  4. We could only make security recommendations if it was something the AI had already implemented. This was a rule added during the experiment. You’ll see why!
  5. We used a variety of models throughout, but we wanted to give it the best shot. Therefore, we used the most trusted models for large code changes and the security audits (e.g. claude-4-sonnet thinking in MAX mode – one of the top code generation LLMs at the time of coding).

The App

Now, what did we actually build? A dental app! We call it DentAIst (pronounced den-tayst) so that you would constantly think you were misspelling “dentist” – isn’t that fun?

DentAIst has Admin, Reception, Dentist, Dental Assistant, and Patient roles.

It has appointment scheduling, room assignment, and tooth status tracking as well as X-ray image upload, billing and invoicing, and user management. We thought this would be a good mix of functionality for a variety of potential vulnerability categories.

Vibe Coding App Example
This screenshot shows the appointment scheduling functionality.

Stories By Category

To show the process of the different phases from default AI behavior, through the security audit, and then to pentesting, let’s take it by category.

Passwords & the Database

Starting simply, when we first asked for the app to have users with corresponding passwords, the AI helpfully created a database for us.

However, by default, the AI did not:

  • Hash the passwords in the database
  • Encrypt the database itself
  • Suggest any sort of password policy

We won’t get into why these are all BIG OOPSIES in this post (we’ll be here all day!); just note that these are all insecure practices.
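
For reference, here is a minimal sketch of what the hashing piece could look like. The post doesn’t show the app’s actual stack, so this assumes Python with the bcrypt library:

```python
import bcrypt  # assumption: Python + bcrypt; the app's real stack isn't shown here

def hash_password(plaintext: str) -> bytes:
    # Hash with a per-user salt; only this value should ever hit the database.
    return bcrypt.hashpw(plaintext.encode("utf-8"), bcrypt.gensalt())

def verify_password(plaintext: str, stored_hash: bytes) -> bool:
    # Compare a login attempt against the stored hash; plaintext is never stored.
    return bcrypt.checkpw(plaintext.encode("utf-8"), stored_hash)
```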

However, that’s the worst-case scenario. If developers ask the AI to perform a security audit, these issues can get caught; in our case, all of them were flagged in the first audit.

Screenshot showing part of the 1st security audit done by the AI

Previously we were at a 1.8/10 and now we’re at 8.6/10, a 378% improvement!
WOW! Where did the AI get those numbers? Who knows?!

Injections

This is generally a broad vulnerability category, but the main ones that would’ve been relevant were SQL injection (SQLi) and Cross-Site Scripting (XSS).

SQLi

SQLi happened to be in a good spot because the app used a framework with built-in query parameterization, which is the main SQLi remediation technique.

It’s a good thing that this was essentially sorted from the beginning, because it wasn’t until the 2nd security audit that the AI thought it would be noteworthy to even consider input validation and parameterized queries.

While this framework happened to be secure by default, many are not. In enterprise settings, developers are often working with no say over existing frameworks and technology stacks. In those cases, it’s possible that they would be wide open to SQLi if relying on AI’s defaults.
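
For anyone unfamiliar with the difference, here is a quick sketch (generic Python with sqlite3 as a stand-in; the app’s real framework and schema aren’t shown here):

```python
import sqlite3  # stand-in driver; the app's real framework/DBMS isn't shown here

conn = sqlite3.connect("dentaist.db")

def find_user_unsafe(username: str):
    # Vulnerable: user input is concatenated directly into the SQL string.
    return conn.execute(
        f"SELECT * FROM users WHERE username = '{username}'"
    ).fetchone()

def find_user_safe(username: str):
    # Parameterized: the driver binds the value, so input can't alter the query.
    return conn.execute(
        "SELECT * FROM users WHERE username = ?", (username,)
    ).fetchone()
```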

A screenshot showing part of the 2nd security audit done by the AI

XSS

Now, for XSS, this is a wild ride, strap in. In the beginning, there was nothing. AI defaults left the application wide open to XSS.

In the 1st security audit, AI bestowed upon us: Security Headers. It was very pleased with itself, stating that the security headers were comprehensive and production secure.

The noteworthy security header in this case was the wily beast we call the Content Security Policy (CSP). Unfortunately, hoping for a perfectly configured CSP header was a fool’s errand. It was mostly good, but it was set up in a way that left the door open for bypasses. One example of a bypass would be uploading a .js or .html file to the application and referencing it in a <script> tag.
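
To make that concrete, here is a rough sketch of the kind of policy involved (illustrative only; this assumes a Flask-style app and a hypothetical /uploads path, not the app’s actual configuration):

```python
from flask import Flask  # assumption: Flask-style app, used only for illustration

app = Flask(__name__)

@app.after_request
def add_csp(response):
    # 'self' blocks scripts from attacker-controlled origins, but anything the
    # app itself serves still satisfies the policy.
    response.headers["Content-Security-Policy"] = (
        "default-src 'self'; script-src 'self'"
    )
    return response

# Hypothetical bypass: if users can upload files that get served from the same
# origin (e.g. /uploads/evil.js), an injected
#   <script src="/uploads/evil.js"></script>
# is still allowed by script-src 'self'.
```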

Also, and this is very noteworthy, there were CSP errors while trying to get the app working again after implementing the security fixes. By default, the AI had made every script call inline in the HTML. This isn’t a best practice, because it conflicts with the parts of the CSP that try to prevent XSS.

In keeping with the rules we set out for ourselves, once the CSP was implemented, we would just paste the error message. The AI’s solution was, of course, to helpfully suggest removing that pesky CSP header that was causing the error. We had to keep redirecting it to keep the CSP header that it had itself suggested. Eventually we got it to stop trying to remove the header by adding “Note that I want any changes to be able to be used in production” to each prompt.

The conclusion here is that even if you are trying to do things that are security-minded, if it goes against the AI defaults, you often have to go out of your way to make sure that the security implementations don’t get overwritten.

In the 2nd security audit, the AI decided that XSS prevention needed HTML encoding and script tag removal. The first one is great and will prevent most XSS scenarios. It really should be context-specific output encoding, but HTML encoding would have been fine in the scenarios in this app.

Unfortunately, it was only doing this in a few spots. There were many instances where output wasn’t encoded, leaving the app vulnerable to HTML injection. It would’ve been equally vulnerable to XSS if we had let the AI remove the CSP header like it kept trying to.

Also, its suggestion to remove script tags is OK, but there are many other HTML tags that facilitate XSS. And even then, it wasn’t removing script tags recursively. One could simply nestle a script tag in a warm blanket of another script tag to bypass that: <scr<script>ipt> becomes <script> after only the middle tag is removed.
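
Here is a small sketch of both points (illustrative only, not the AI’s actual code): a single-pass tag removal just reassembles the payload, while output encoding neutralizes it:

```python
import html

payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"

# Non-recursive removal: stripping "<script>"/"</script>" once rebuilds the tag.
stripped = payload.replace("<script>", "").replace("</script>", "")
print(stripped)  # <script>alert(1)</script>

# Output encoding: the browser renders this as text instead of executing it.
encoded = html.escape(payload)
print(encoded)   # &lt;scr&lt;script&gt;ipt&gt;alert(1)&lt;/scr&lt;/script&gt;ipt&gt;
```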

At the end of the day, we have a sub-par implementation of an incomplete fix.

Authorization

We have saved the best for last with this category.

Before implementing any authorization checks, we had the baseline functionality set up, and we had users with specific roles. There was already a login page that would give a session token in the form of a cookie. This is fine, no issues there.

To implement access controls (note this is still before the security audit), we asked the AI to restrict access to both a page and its associated functionality based on a user’s role. What’s noteworthy is that we were careful not to specify anything that would imply we wanted either a server-side or a client-side-only implementation. Someone with security in mind reading our request would implement a server-side check of the user’s role.

Here’s a portion of the prompt we used for that implementation:

Implement role-based access controls to adhere to the following:

Admin (can access everything)
- Login
- Index/Homepage
- Patient List
<etc.>

Ensure that each user role can only access and perform the functionality associated with the pages listed above.

Instead of doing that, and despite the existing session cookies and the user role being an attribute it could check server-side, it decided that adding a new “User-Role” request header was the thing to do.

Here are some screenshots of the response to add role-based access controls:
Note that it calls the use of the User-Role header “enterprise-grade” – yikes!
Hopefully the old-timey meme implies how light-speed dumb this is

A low-privileged malicious actor could simply take an existing request and change the User-Role header to “admin” to become an admin in the eyes of the server.
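
Roughly like this (a sketch using Python’s requests library; the endpoint, cookie, and values are hypothetical, not the app’s real ones):

```python
import requests

# Hypothetical endpoint and session cookie for a low-privileged user.
cookies = {"session": "low-priv-session-token"}

# The server trusted this client-supplied header for authorization decisions,
# so a low-privileged user could simply claim to be an admin.
resp = requests.get(
    "https://dentaist.example/api/admin/users",
    cookies=cookies,
    headers={"User-Role": "admin"},
)
print(resp.status_code)  # the vulnerable version happily returns admin data
```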

At least there was nowhere to go but up from here in the 1st security audit! At that time, it suggested that we remove the User-Role header and start using JWTs instead of cookies.

Which is good in theory, although there’s nothing inherently more secure about JWTs. The 2nd security audit didn’t mention anything else regarding Authorization, so it must be all good.

Spoiler alert: it was not all good.

Nothing reveals the true security of an app like a pentest. And the roughest area was surrounding authorization issues. There are 2 main categories of Authorization issues for web applications: Missing Function Level Access Control (MFLAC) and Insecure Direct Object Reference (IDOR).

MFLACs: These checks were either completely there or completely absent. The AI did a good job of preventing unauthenticated or unauthorized users from reaching an API call they shouldn’t have any access to. HTML pages, however, were wide open to any unauthenticated user. For pages that should only be reachable after authenticating, most people want to keep even the HTML away from prying eyes.

IDORs: This is where it got tricky. If someone should be able to access functionality but not be able to see/do everything, this led to vulnerabilities throughout the application. For example, a regular user should be able to see their own profile information. They should not be able to change the ID to the ID of another user and access the other user’s information. That’s a non-trivial vulnerability on any application.
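
The underlying fix is an object-level check on the server: whatever ID the client sends, the record has to belong to (or be permitted for) the authenticated user. A generic sketch, with hypothetical names rather than the app’s real code:

```python
from dataclasses import dataclass

# Hypothetical data shapes for illustration.
@dataclass
class User:
    id: int
    role: str

@dataclass
class PatientRecord:
    id: int
    owner_id: int

def authorize_patient_view(current_user: User, record: PatientRecord) -> PatientRecord:
    # Object-level check: the ID in the request is never trusted on its own.
    if current_user.role != "admin" and record.owner_id != current_user.id:
        raise PermissionError("Not authorized to view this patient record")
    return record
```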

Even worse, in this app, reception roles were able to add new patients (for example, if someone was visiting the office for the first time). However, the restriction to the patient role was enforced client-side only, so a malicious reception user could create an admin account. Then, they could log in as that account and have full admin access to the app. Note that when prompting to create this functionality, we even specified “ensure that the [new patient] functionality can only add users of the ‘patient’ role.”

This screenshot shows functionality to add a patient
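
The missing piece was enforcing that restriction on the server, regardless of what the client-side form allows. A minimal sketch (field names are hypothetical):

```python
ALLOWED_ROLE = "patient"  # the only role this workflow should ever create

def create_patient_account(submitted: dict) -> dict:
    # Server-side enforcement: reject any other role the client sends,
    # instead of trusting the form to hide the option.
    role = submitted.get("role", ALLOWED_ROLE)
    if role != ALLOWED_ROLE:
        raise PermissionError("This workflow can only create patient accounts")
    return {**submitted, "role": ALLOWED_ROLE}
```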

Similarly, there were a fair number of Business Logic vulnerabilities. This is a broad category in and of itself, but suffice it to say that despite the 2nd audit claiming there was protection against “Business Logic Attacks,” these issues were still present during the pentest.

Making edits to a field via a browser’s dev tools to bypass client-side protections

For example, in the billing functionality, we gave it the formula:
total amount - insurance coverage = amount owed

It wasn’t wise enough to extrapolate that the insurance coverage shouldn’t be more than the total amount and that the amount owed shouldn’t be negative. This dentist office won’t last very long if they have to pay the patients!
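
The checks it missed are simple to state in code; here is a sketch (field names are illustrative, not the app’s real ones):

```python
def amount_owed(total_amount: float, insurance_coverage: float) -> float:
    # Business rules the AI never inferred on its own:
    # amounts can't be negative, and coverage can't exceed the bill.
    if total_amount < 0 or insurance_coverage < 0:
        raise ValueError("Amounts must be non-negative")
    if insurance_coverage > total_amount:
        raise ValueError("Insurance coverage cannot exceed the total amount")
    return total_amount - insurance_coverage
```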

Implications of Vulnerabilities

As funny as the AI behaviors are in this instance, what we observed can be extrapolated into some much more telling trends:

  1. Authorization issues will go up
    1. Specifically of the IDOR variety. If you are trusting AI to understand the finer-grained details of privileges and who should do what, it’s just not ready for that yet.
  2. Business logic issues will go up
    1. Anywhere there is a context-specific flow or logic, AI will struggle with the finer details. Sometimes those finer details can be vitally important… like not letting payments be negative!
  3. Injections will likely go down
    1. For injections like SQLi in newly developed apps, more and more frameworks and database management systems are implementing “secure by default” methodologies. Note that AI changes to existing applications would likely not improve their security posture around injection.
    2. For injections like XSS, these might be mitigated by things that seem like a good fix, like CSP, but the AI didn’t do a good job of implementing the actual remediation (output encoding). That leaves the door open for more nuanced exploitation, or for lesser but still present issues like HTML tag injection.

What Can YOU Do?

Are you an application owner?

You need to have your application penetration tested! Specifically, tested by people who have the ability to understand how the application should behave and can perform security tests based on that expectation.

Not only does automated tooling (e.g., an AI coder) have a hard time making apps with robust authorization controls, it also has a hard time testing for those issues. Testers can use tools to help make the process faster, but at the end of the day, there’s a person interpreting the results. It’s still too complex and nuanced for automation to replace.

Seek out testing that includes cross-role, cross-user, and cross-tenant checks; that way you can make sure testers are checking from every angle.

Also, as we observed with AI trying to go back on previous security updates, there’s the possibility of new vulnerabilities being introduced with every addition to a codebase. So frequent and change-based penetration testing is also a great idea.

Did I mention that we offer a suite of testing services that will help with all of the above? 😀 Sorry, I had to!

Are you a penetration tester?

No way, me too!

We need to never let up on the gas when it comes to being thorough about Authorization testing. Test every request! Tools like Auth Analyzer help with that, especially in conjunction with Macros. Auth Analyzer helps automate sending requests from different user perspectives, and using Macros with certain requests makes testing them possible. For example, if you want to test a “delete a record” API call, use a Macro to “create a record” before every delete; that way you can still make several “delete a record” requests.

The automation helps send the requests, but the results should still be reviewed manually and compared with the developer’s intent for which roles should do what within the app.

Also, emphasize nuanced testing as opposed to relying on scanners. Scanners and payload lists should definitely be used; I’m not saying they shouldn’t be. However, they shouldn’t be all you’re doing when testing. Scanners can knock out many basic test cases so that you can spend more time on the advanced ones. A scanner doesn’t have the ability to look at a misconfigured CSP and adjust the type of XSS payloads to ones that are more likely to work.

Are you someone who works at one of these AI coder companies?

This is a plea from someone who has really enjoyed the AI coding experience: please make it more secure by default. Include system prompts that assume the user eventually wants to use their code in a production environment.

Also, periodically recommend that the end user perform a security audit. Ideally, make sure that the audit doesn’t cost the user any additional tokens. This could look something like a notification that says “would you like a security audit?” (Clippy wouldn’t have been so annoying if that’s what he was offering!). Perhaps this prompt would be on a time-based cadence, or after X number of lines of code.

Whichever AI coder starts adding in security by default and proactively encouraging security from its users will have my vote in the AI arms race!

Are you a user of AI coders?

No way, me too!

There are a variety of ways that you can provide rules to your AI coder. For example, here’s a good spot to start (note the accompanying blog post). You can also generate rules within most AI coders; for example, within Cursor’s chat: /Generate Cursor Rules. You will need to review and fine-tune any rules created to fit your own environment and goals. There are a variety of resources out there for how to augment a rules file, too many to compile here, but knowing what to look for is the first step!

Once you’ve compiled a list of rules to help the AI coder be more secure by default, it’s still necessary to both understand secure coding practices and check an AI coder’s work to make sure it’s adhering to those practices.

It’s tedious, but you should not have “auto code” mode on. Different coders have different terms for it, but it’s the mode where the AI auto-applies code changes and runs tools on your behalf. That’s how you can end up in a spot where it has made irreversible changes. Also, read through all suggestions BEFORE accepting the changes.

Are you a person?

No way, me too!

As someone who might just want to have some “here’s what I would do”-type advice for operating in a world that will be filled with more AI-coded apps, my biggest recommendation is to not aimlessly download things off the app store. And if you are downloading, be very careful about where you’re putting sensitive information like your credit card number.

There’s nothing new there, but with the accessibility of AI coders, people are likely to take advantage of that, churn out apps to make a quick buck, and not worry too much about security.

Conclusion

Regardless of which of the above takeaways are the most relevant to you, you will be affected by AI-created code. We’re doing our best to anticipate trends, prepare accordingly, and help everyone stay safe out there. We hope you do the same!