Think Outside the Prompt: Creative Approaches to Attacking AI

The practitioners of machine learning were recently reminded that we are not isolated from security threats. Recently, the PyTorch-nightly dependency chain was compromised. According to its website, PyTorch is “an open source machine learning framework that accelerates the path from research prototyping to production deployment.” Meaning, if compromised by adversaries a baseline result could be running malware on their systems that would extract credentials and other sensitive information from the build computers, but it could have been much worse. Depending on the attack scenario, the AI models themselves could have been modified or extracted, which would result in various attack scenarios depending on the intended use of the model.

While machine learning seems like a new and exciting field, it isn’t a new discipline. Artificial intelligence (AI) has been a mathematical discipline since we figured out how to use a delta function to gradually decrease the error of a given equation. Yet, we have not put the emphasis on security like we have with other computer programming disciplines.

Machine learning is a new discipline and requires new strategies to secure, because like all software projects there are always vulnerabilities. So, how can one hack an artificial intelligence? Well there are a number of ways depending on the technique used to implement it.

Hacking AI IRL

Let’s start with what happened recently. One of the major machine learning frameworks fell victim to what appears to be a type squatting attack. An attacker created a library that was similar or in some cases the same as a library that is called by an application, when the application, a python framework in this instance is updated, the repository pulls from the match that has the most priority which can be the most recently updated.

This resulted in a framework that was kept at the cutting edge of development, installing code that would copy protected information out of the OS that the framework was running as that could be used to create a breach into that application.

As bad as that is, it could be worse.

It could be possible to create an edge case or blind spot in an application built on a tampered framework that would affect its outcome during training. Imagine if some data set that the programmer intends to train the model on was provided by the injected code. It would be possible for any number of scenarios to occur, from market manipulation, self-driving cars not recognizing pedestrians who wore the wrong color shirt, to any number of weird scenarios. This leads me to another means to corrupt an AI.

Data Poisoning Attacks

There is an old programming adage that goes, “garbage in, garbage out” this is pretty much the thought process of a poisoning data attack. This is an attack in which an attacker corrupts the training data used to create an AI model, either during the initial or ongoing learning tracks.

For example, one of the most famous examples of this is Tai, a model created by Microsoft to handle natural language processing. However, it was designed to learn continuously, and a select few individuals decided to provide Tai with a dataset that consisted of offensive and biased language that made its responses reflect the data that it was trained with. Any AI that learns from information after it’s been put into production can be poisoned. Examples would be security software that monitors standard behavior being slowly modified so that “unusual” tasks become “normal,” or self-driving cars that are programmed to adjust to sensors that degrade may learn to react poorly to some incoming sensor data.

If the data provided to a model isn’t poisoned, it can still be hijacked. The most common method is model inversion or invasion, in this instance an attacker would need to have gained access to the model itself, and developed an area of the model where the classification may not be as strong as desired and look for cases that may not be entirely correct. One method to accomplish this may be to fuzz the inputs of a model and look for changes, but a more effective method would be to implement an AI to attack the original; such a technique would be creating Generative Adversarial Networks to create models that are close to the original training set in key points but different enough that they should not be accepted if the model was as precise as was intended.

In some case it may be possible to reverse engineer the model, by viewing the activation weighs of each of the hidden layers in order to speed up the above attacks, but this is a complicated process and can be easily protected against by adding a layer of encryption to the completed model.

How to Protect ML and AI from Attacks

There are a couple of methods that can be used to protect machine learning and artificial intelligence projects that are reasonably effective.

Protect your development environment: this includes your source code, your training sets, and filtering any data that gets added into your model.
Use a diverse training set: Depending on the data used for the model, consider employing more diverse data sets. While GAN’s are primarily thought of to help produce realistic images and sounds, they do so by finding connections and pathways that trick other AI’s into accepting they are created by something other than an AI. If the GAN starts to develop a training set that deviates greatly from the preset training data, then it may introduce weird edge cases where unexpected results may result. Such as finding patterns where Masterprints and “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition” and Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning are possible.
Watermark or encrypt the model to make it harder for attackers to reverse engineer or clone.
Risk assessment: Before deploying any machine learning model, it’s important to assess the risks that the model may be targeted by and what risk it might pose — this includes identifying any potential attack vectors, and the likelihood/impact of a successful attack.
Monitoring and logging: flag suspicious access and any data supplied to the model. Occasionally confirm that no data resulted in atypical results.
Continual testing: Regularly testing the model to check that it hasn’t deviated in an unwarranted manner, check the hidden layers for areas that may be vulnerable to bad input. This can be done via fuzzing, visual analysis, removing layers and checking results, or just continuous evaluation of the model.

It’s important to note that these steps may not be a one-time occurrence. Protecting any software project is a continuous process, even one that’s operation is sometimes referred to as black magic, as the threat landscape and the technology is constantly changing. Additionally, it’s important to have a team that has knowledge in both the realms of AI and cybersecurity. The malicious mindset is invaluable to protecting assets such as this and even the most thorough engineer can miss some of the more devious “what-ifs.”

Chubb partners with NetSPI to bring attack surface management to its policyholders

Partner with NetSPI

Think Outside the Prompt: Creative Approaches to Attacking AI

Larry 'Patch' Trowell

Hacking AI IRL

Data Poisoning Attacks

How to Protect ML and AI from Attacks

Explore More Blog Posts

ADPathFinder: OpenGraph Attack Path Mapping in BloodHound CE

Confidence Over Noise: Introducing Continuous AI Findings Validation

Bypassing Microsoft Entra Conditional Access Policies via Nested App Authentication