How App Integration Transactions Increase the Attack Surface of LLMs

OpenAI recently released Apps SDK, AgentKit, and “Buy It” within its LLM ecosystem.
While the underlying concept has existed in frameworks such as Model Context Protocol (MCP) and n8n, this release is a major step in bringing these capabilities to the mainstream. Through integrations with platforms like Zillow, Spotify, Etsy, Walmart, and dozens of others, we’re watching LLMs accelerate from text generators into active agents capable of making purchases, scheduling appointments, and executing transactions through third-party integrations.
While the technology demonstrates impressive capabilities, the fast-paced deployment timeline we’re seeing across LLM providers can be particularly challenging from a security perspective. For example, Apps SDK, AgentKit, and “Buy It” were released days apart to accelerate third-party integrations, enabling developers to quickly connect their services to ChatGPT. The speed of rollout means security teams have limited time to understand and mitigate emerging risks before these integrations reach millions of users. Each new integration multiplies the potential attack surface, creating pathways for unauthorized transactions, data exfiltration, and supply chain compromises.
From a security perspective, evolving our approach is critical. We will soon face far more scenarios in which a compromised or manipulated LLM can interact with entire ecosystems of commerce and service platforms. That shift will require LLM deployments to be far more restrictive and security-oriented: filtering prompt injections and gating the actions an agent is allowed to take before anything executes.
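To make that gating concrete, here is a minimal sketch that validates a model-proposed action against an allowlist and a hard spending limit before execution. It assumes a hypothetical agent setup where tool calls surface as structured objects; none of the names below come from OpenAI’s SDKs or any specific framework.

```python
# Minimal policy gate between model output and tool execution (hypothetical names).
# The idea: agent-proposed actions are validated against an explicit allowlist
# and hard limits before anything runs, regardless of what the model "wants".
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str   # e.g. "create_order"
    args: dict  # arguments proposed by the model

ALLOWED_TOOLS = {"search_products", "create_order"}  # explicit allowlist
MAX_PURCHASE_USD = 100.00                            # hard spending ceiling

def approve(call: ToolCall) -> bool:
    """Return True only if the model-proposed action passes policy checks."""
    if call.name not in ALLOWED_TOOLS:
        return False
    if call.name == "create_order":
        amount = float(call.args.get("total_usd", 0))
        # A prompt-injected "buy 500 gift cards" request fails here even if
        # the model itself was convinced it should comply.
        if amount <= 0 or amount > MAX_PURCHASE_USD:
            return False
    return True
```

The specific checks will vary by integration, but the principle holds: the model’s output is treated as untrusted input to a deterministic policy layer, not as an instruction to be executed on trust.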
The bottom line: this isn’t just another feature update. It’s a fundamental expansion of what LLMs can do, lowering the bar for functionality that was technically possible before but required more effort and a deeper understanding of the integration process. Now that far less thought goes into what’s happening behind the scenes and how these agent ensembles work together, our security measures need to evolve accordingly.
The Weakest Links in the AI Chain
The security of these new transactional AI systems depends on many factors, and several of them are prone to failure. Industry forecasts suggest that through 2029, more than 50% of successful cyberattacks against AI agents will exploit access control issues, using direct or indirect prompt injection as the attack vector. Most companies rely on one of the five mainstream model providers: OpenAI, Google, Meta, xAI, or Anthropic. This consolidation means businesses are at the mercy of the provider’s implementation quality and the model’s hallucination rate. Granted, those models are generally more resilient than a self-hosted, self-trained one.
However, the most significant vulnerabilities often arise from third-party integrations, as these introduce more moving parts and variability. Common issues include:
- Poor Authentication/Authorization: Inadequate verification of user identity and permissions can allow attackers to impersonate users or escalate privileges (a minimal authorization sketch follows this list).
- Weak System Prompt Controls: Insufficiently robust system prompts that define the AI’s behavior and limitations can be bypassed, allowing attackers to override intended safeguards via direct or indirect prompt injection.
- Race Conditions and Workflow Integrations: The complex interplay between the LLM, third-party apps, and user inputs can create race conditions or other workflow vulnerabilities that attackers can exploit to bypass security checks (see the second sketch below).
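To illustrate the first item, here is a minimal sketch of per-request authorization on a third-party integration endpoint. The token store, scope names, and helper function are hypothetical and not drawn from any particular platform; the point is that the integration re-verifies identity and permission on every call instead of trusting whatever the agent forwards.

```python
# Minimal per-request authorization check for an integration endpoint (hypothetical).
import hmac

# In a real deployment tokens and scopes come from a secrets store and an
# identity provider; hard-coded here only to keep the sketch self-contained.
VALID_TOKENS = {"token-abc123": {"user": "alice", "scopes": {"orders:read"}}}

def authorize(token: str, required_scope: str) -> str | None:
    """Return the authenticated user if the token grants the required scope."""
    for known, claims in VALID_TOKENS.items():
        if hmac.compare_digest(token, known):  # constant-time comparison
            return claims["user"] if required_scope in claims["scopes"] else None
    return None

# An agent forwarding a purchase request must present a token with the write
# scope; a read-only token (or a guessed one) is rejected.
print(authorize("token-abc123", "orders:write"))  # -> None: scope missing
```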
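And to illustrate the third item, here is a minimal sketch of a time-of-check/time-of-use gap in an agent purchase workflow, alongside the usual fix of making the limit check and the spend a single atomic step. The budget store and function names are hypothetical.

```python
# Time-of-check/time-of-use (TOCTOU) gap in a purchase flow, and an atomic fix.
import threading

spend_lock = threading.Lock()
budget = {"remaining_usd": 100.0}

def charge_unsafe(amount: float) -> bool:
    # Vulnerable: two concurrent agent workflows can both pass the check
    # before either deducts, overspending the budget.
    if budget["remaining_usd"] >= amount:
        budget["remaining_usd"] -= amount
        return True
    return False

def charge_atomic(amount: float) -> bool:
    # Fixed: the check and the deduction happen under one lock, so parallel
    # tool calls cannot interleave between them.
    with spend_lock:
        if budget["remaining_usd"] >= amount:
            budget["remaining_usd"] -= amount
            return True
        return False
```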
Why “Buy It” Is Just the Beginning
OpenAI is the first to bring this type of feature in-house for its LLM, but others will likely follow suit. The reality is that LLMs are currently extremely expensive: training and operating these models is incredibly resource-intensive, with reports indicating OpenAI’s costs ran into the billions in 2024. As a result, AI model providers will continue pursuing new revenue streams, and new features, ad revenue, and transaction fees from e-commerce integrations seem like the next logical step.
As this technology matures, more applications will adopt it, expanding the attack surface and increasing the urgency for thorough security testing and validation with partners like NetSPI.
Secure LLM Implementations with NetSPI
The integration of transactional capabilities into LLMs represents a monumental shift in AI security. Traditional security measures are no longer sufficient to address these new, complex threats. Organizations must proactively test the resilience of their AI and LLM implementations to protect against unauthorized actions and financial loss.
NetSPI’s AI/ML Penetration Testing service is specifically designed to assess the security of LLM implementations across a wide spectrum of applications. Our team helps identify and remediate vulnerabilities in AI workflows before they can be exploited. Schedule a demo to see how proactive security can protect your AI-powered applications.