Webinar Recap: The AI Balancing Act: Benchmarking LLMs for Usability vs. Security

TL;DR

Security or usability? When it comes to large language models (LLMs), it’s not always possible to have both. In a recent webinar, Kurtis Shelton and Defy Security’s John Tarn break down how modern security teams are approaching LLM security without sacrificing too much functionality.

In this webinar recap article, explore expert insights from one of the minds behind NetSPI’s Open LLM Security Benchmark, explore real-world trade-offs between technical, regulatory, and compliance-based approaches, and walk away with a tactical framework for benchmarking LLMs in your own company.

Watch the Webinar.

Whether you’re implementing GenAI models, building your own internally, or guiding product teams, asking the right questions can better secure your business. Here’s what we covered:

Common vulnerabilities in today’s LLMs (and how they’re being exploited)
What “usable” vs “secure” actually means in practice
How to build a repeatable framework for LLM benchmarking
Our top three pieces of advice on benchmarking LLMs

The Current State of AI Vulnerabilities

LLMs are transforming how we work, especially within the cybersecurity industry. Their potential for efficiency and innovation is immense, yet it introduces a familiar tension: the struggle between usability and security. As businesses race to adopt Generative AI (GenAI), understanding this fundamental trade-off and how to navigate it is paramount to protecting your business without losing the functionality that makes these tools valuable.

The core philosophy guiding modern AI security acknowledges that vulnerability is inevitable. After all, math is math, and because these models are designed to perform a particular function, they will always have some level of vulnerability. This “inevitability of breakability” mindset is not just to show that no model may be truly “defensible”, but rather to help ensure that when a model breaks, the surrounding infrastructure can cushion the impact.

AI vulnerabilities generally fall into three broad categories:

1. Technology-Specific Vulnerabilities

This is the area that draws the most industry interest, focusing on exploiting the model’s target function. Common threats include:

Model Extraction: Stealing a model’s target function without accessing the original training data.
Evasion: Perturbing a model to return improper classifications, effectively tricking it.
Data Extraction/Inference: Verbatim data extraction or mathematically verifying that a specific piece of data was used in the model’s training repository.
Poisoning: Introducing malformed data or hard-coding a particular malicious behavior during training or deployment.
Prompt Injection (Direct & Indirect): This is a primary concern with LLMs. It involves coercing the model to behave outside its guardrails, often by directly interfacing with it (direct) or by seeding malicious data somewhere the model will later parse (indirect). This is especially prevalent when models leverage protocols like model context protocol (MCP) or frameworks like agent-to-agent (A2A) to interface with external functionality.

2. Compliance and Regulation

This category is often overlooked but critically important, especially with generative models. The versatility and stochasticity (randomness) of LLMs make tracing their decision-making processes difficult. As John Tarn, Defy Security Solutions Architect, highlights, regulations like GDPR are pushing core principles such as lawfulness, fairness, and transparency.

Key concerns for compliance and regulations include:

Traceability and Bias: Can we track when an LLM becomes biased and provide a mechanism for correction?
Purpose Limitation: Establishing guardrails to ensure an agent cannot act in a certain way (e.g., providing complete misinformation or materially affecting a citizen’s life, as in a home loan process).
Data Minimization: Ensuring the model is trained with enough data to be useful, but not so much that it contains excessive personally identifiable information (PII) or protected health information (PHI).
Integrity and Confidentiality: Proving confidential data will never be returned in a prompt request and establishing an audit process to back that up.

Finding the ‘Sweet Spot’ with Benchmarking

The core struggle in AI security is bridging the gap between security groups, who traditionally “want nothing” (as little risk as possible), and data groups, who “want everything” (as much data as possible for performance).

The Intersection of Function

To find the right balance, businesses must triangulate three things:

Model’s Target Function: What the math is actually doing.
Model’s Business Function: How it’s making or preserving/generating money for the business.
Vulnerability Umbrella: The full spectrum of potential threats.

The intersection of these three points tells you which vulnerabilities you should actually care about. For example, an image classification model used by a bank to verify check amounts is at a far greater risk from evasion than the same model used by a pet shop for fun breed classification.

Usability vs. Breakability

NetSPI’s benchmarking framework for LLMs focuses on two key measurements. The first is breakability, which is essentially how easy it is to jailbreak a model or use prompt injection to elicit overt, undesirable behavior. The second is usability, as in how useful the model remains when contextualized within the given business’s operational reality. Benchmarking allows organizations to measure these values over time, determining if a security guardrail is providing the proper influence and if the risk of remediation (e.g., adversarial training, which can diminish target function) is worth the security gain.

The AI Interface Problem

While current efforts are positive, with some frontier models showing improved security without a noticeable drop in usability, new challenges are on the horizon, largely driven by the stochasticity and human-like interfaces of LLMs.

The most pressing future concern is the interaction of LLMs with existing security tools.

Tools like SQL Map or Nmap, which are formulaic in nature, struggle when an LLM is placed in front of a system. If a user can free-type a natural language query that the LLM then translates into a SQL request for a database, the random, human-adjacent language element of the LLM makes traditional, formulaic attacks unreliable.

The future of benchmarking and security must therefore include:

Technology Fingerprinting: Quickly determining which external technologies an LLM is actively interfacing with (e.g., a database, an email client).
Contextual Breakability: Benchmarking an LLM’s willingness to go outside its guardrails specifically to achieve a meaningful, technology-specific vulnerability (e.g., leveraging prompt injection to trigger a SQL injection due to improper permissions).

Top Advice for Securing Your LLMs

To successfully navigate the AI balancing act, organizations must shift their mindset and approach to implementation.

1. Start with the Business Goal, Not the Tool

Before deploying any LLM, define what you are trying to accomplish. As John Tarn advises, “What is this going to improve in your business?” Determine the desired outcome for your business function (e.g., faster SOC response, better threat intel collation) before selecting the inputs, models, and tools. Simultaneously, pull in Governance, Risk, and Compliance (GRC) to define the risk tolerance and blast radius if the model is breached, poisoned, or tampered with.

2. Appoint or Find a Liaison

The most valuable asset in this process is the person who can act as the liaison between the data and security groups. This individual possesses the knowledge to bridge the two mindsets, helping both teams understand how to secure the technology without totally gutting its utility.

3. Reject the Assumption of Security

The single biggest mistake companies make when moving too quickly with GenAI projects is assuming the models are secure. Never assume an LLM will track permissions, enforce business logic, or handle sensitive data tracking as these are tasks that should be handled at the server or client level. Build your application with the knowledge that the model will eventually behave in a stochastic and unpredictable way, and ensure your surrounding infrastructure is properly postured to mitigate the damage.

By focusing on the intersection of function, accepting the inevitability of vulnerability, and prioritizing security-informed decision-making, organizations can leverage the transformative power of LLMs with a calculated and managed level of risk.

NetSPI Can Help

By focusing on the intersection of function and anticipating future threats like the stochasticity problem, NetSPI ensures your strategy covers all three pillars: technical, compliance, and regulation. We provide the math and the traceable empirical evidence so you can deploy your LLMs with confidence and control. Contact us to see what we can do for your business or watch the webinar replay for more details.

Hardware Pentesing

Chubb partners with NetSPI to bring attack surface management to its policyholders

Partner with NetSPI