Common Terminology in Adversarial Machine Learning

Artificial Intelligence (AI) and Machine Learning (ML) have vast applications in cyberspace. With their quick adoption and seemingly limitless possibilities, the industry needs authorities who can provide expertise and perspective to help guide other professionals in their exploration of Large Language Models (LLMs). One of the best ways to start learning a new area is by studying the common terminology practitioners use. We created this glossary of terms to help anyone researching AI and ML gain an understanding of discussions around Adversarial Machine Learning.

Artificial Intelligence (AI) versus Machine Learning (ML) 

Before we dive in, let’s level set on the differences between AI and ML, or perhaps the lack thereof.  

Artificial Intelligence 

Artificial Intelligence is a broader field that focuses on creating machines that can perform tasks that typically require human intelligence. It aims to build systems that can reason, learn, perceive, and understand natural language, among other capabilities. AI encompasses various techniques, and machine learning is one of its subfields. 

Machine Learning 

Machine Learning is a subset of AI that deals with designing algorithms and models that enable computers to learn from data without explicit programming. Instead of being programmed with specific rules, ML models use patterns and examples to improve their performance on a given task. ML can be further divided into different categories, such as supervised learning, unsupervised learning, and reinforcement learning, each suited for different types of learning tasks. 

While they are closely related areas, they do have nuanced differences. To put it concisely, AI is a broader field that encompasses various techniques and methods to create intelligent systems, while ML is a specific approach within AI that focuses on learning from data to improve task performance. 

At this point in time, definitions within the realm of Adversarial Machine Learning (AML) lack standardization. We recognize the significance of setting clear and robust definitions to shape the future of AML, which is why our team is actively engaged in refining and solidifying these definitions to help establish industry standards. By leveraging NetSPI’s expertise and in-house knowledge, we strive to present definitions that are not only comprehensive but also accurate and relevant to the current state of AML.

Key Terminology in AI Cybersecurity

Adversarial Attacks: Techniques employed to create adversarial examples and exploit the vulnerabilities of machine learning models.
Adversarial Example Detection: Methods designed to distinguish adversarial examples from regular clean examples and prevent their misclassification.
Adversarial Examples: AML hinges on the idea that machine learning models can be deceived and manipulated by subtle modifications to input data, known as adversarial examples. These adversarial examples are carefully crafted to cause the model to misclassify or make incorrect predictions, leading to potentially harmful consequences. Adversarial attacks can have significant implications, ranging from evading spam filters and malware detection systems to fooling autonomous vehicles’ object recognition systems.
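As an illustration, a gradient-sign perturbation (in the spirit of the Fast Gradient Sign Method) can flip the decision of a toy linear classifier with a small, uniform-magnitude change to each feature. The weights, input, and epsilon below are entirely hypothetical, not drawn from any real system:

```python
import numpy as np

# Hypothetical linear "model": predict class 1 when w @ x > 0, else class 0.
w = np.array([0.5, -0.3, 0.8])
x = np.array([1.0, 2.0, -0.5])  # clean input, scored as class 0

# Gradient-sign perturbation: step each feature in the sign of the
# gradient of the score w.r.t. the input (for a linear model, just w).
epsilon = 0.5
x_adv = x + epsilon * np.sign(w)

print(w @ x)      # clean score (negative: class 0)
print(w @ x_adv)  # adversarial score (positive: class 1)
```

Even though each feature moved by at most 0.5, the sign of the score flips, illustrating how small coordinated perturbations can change a prediction.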
Adversarial Learning/Training: A learning approach that involves training models to be robust against adversarial examples or actively generating adversarial examples to evaluate the model’s vulnerability.
Adversarial Machine Learning (AML): A field that focuses on studying the vulnerabilities of machine learning models to adversarial attacks and developing strategies to enhance their security and robustness.
Adversarial Perturbations: Small, carefully crafted changes to the input data that are imperceptible to humans but can cause significant misclassification by the machine learning model.
Adversarial Robustness Evaluation: The process of assessing the robustness of a machine learning model against adversarial attacks, often involving stress testing the model with various adversarial examples.
Adversarial Training: A defense technique involving the augmentation of the training set with adversarial examples to improve the model’s robustness.
Autoencoders: Neural network models trained to reconstruct the input data from a compressed representation, useful for unsupervised learning and dimensionality reduction tasks.
Batch Normalization: A technique used to improve the training stability and speed of neural networks by normalizing the inputs of each layer.
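A minimal sketch of the normalization step, omitting the learned scale and shift parameters that real batch normalization layers also apply; the batch values are made up:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch (rows = examples) to zero mean and unit
    variance per feature column; eps guards against division by zero."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
normed = batch_norm(batch)
```

After normalization, each feature column has roughly zero mean and unit variance, regardless of the scale of the raw inputs.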
Bias-Variance Tradeoff: The tradeoff between a model’s ability to fit the training data well (low bias) and its ability to generalize to new data (low variance).
Black-Box Attacks: Adversarial attacks where the attacker has limited knowledge about the target model, usually through input-output interactions.
Certified Defenses: Defense methods that provide a “certificate” guaranteeing the robustness of a trained model against perturbations within a specified bound.
Cross-Entropy Loss: A loss function commonly used in classification tasks that measures the dissimilarity between the predicted probabilities and the true class labels.
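A minimal sketch of cross-entropy for a single example, using made-up probability vectors:

```python
import numpy as np

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one example:
    the negative log of the probability assigned to the true class."""
    return -np.log(probs[true_class])

# A confident correct prediction incurs low loss...
low = cross_entropy(np.array([0.9, 0.05, 0.05]), true_class=0)
# ...while a confident wrong prediction incurs high loss.
high = cross_entropy(np.array([0.05, 0.9, 0.05]), true_class=0)
```

The asymmetry is the point: the loss grows rapidly as the model becomes confidently wrong, which is what drives the gradients used in training.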
Data Augmentation: A technique used to increase the diversity and size of the training dataset by generating new samples through transformations of existing data.
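As a toy sketch, horizontal flipping doubles an image dataset; the tiny arrays below are illustrative stand-ins for real images:

```python
import numpy as np

def augment_with_flips(images):
    """Double a dataset by appending a horizontally flipped copy
    of each image (shape: num_images x height x width)."""
    flipped = images[:, :, ::-1]  # reverse each image along its width axis
    return np.concatenate([images, flipped], axis=0)

data = np.arange(12).reshape(2, 2, 3)  # two tiny 2x3 "images"
augmented = augment_with_flips(data)
```

Real pipelines combine many such label-preserving transformations (crops, rotations, noise), but the principle is the same: new training samples derived from existing ones.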
Decision Boundaries: The dividing lines or surfaces that separate different classes or categories in a classification problem. They define the regions in the input space where the model assigns different class labels to the data points. Decision boundaries can be linear or nonlinear, depending on the complexity of the classification problem and the algorithm used. The goal of training a machine learning model is to learn the optimal decision boundaries that accurately separate the different classes in the data.
Defense Mechanisms: Techniques and strategies employed to protect machine learning models against adversarial attacks.
DefenseGAN: A defense technique that uses a Generative Adversarial Network (GAN) to project adversarially perturbed images onto clean images before classification.
Deep Learning: A subfield of machine learning that utilizes artificial neural networks with multiple layers to learn hierarchical representations of data.
Discriminative Models: Models that learn the boundary between different classes or categories in the data and make predictions based on this learned decision boundary.
Dropout: A regularization technique where random units in a neural network are temporarily dropped out during training to prevent overreliance on specific neurons.
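A sketch of inverted dropout, the common variant that rescales the surviving units at training time so no adjustment is needed at inference; the activations and random seed are arbitrary:

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale
    the survivors by 1/(1-p) so the expected activation is unchanged."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones(10)          # pretend layer activations
dropped = dropout(a, p=0.5)
```

With p=0.5 each unit is either zeroed or doubled, so different random subsets of the network are trained on each pass.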
Ensemble Methods: Machine learning techniques that combine the predictions of multiple individual models to make more accurate and robust predictions or decisions. Instead of relying on a single model, ensemble methods leverage the diversity and complementary strengths of multiple models to improve overall performance.
Evasion Attacks: Adversarial attacks aimed at perturbing input data to cause misclassification or evasion of detection systems.
Feature Engineering: The process of selecting, transforming, and creating new features from the available data to improve the performance of a machine learning model.
Generative Models: Models that learn the underlying distribution of the training data and generate new samples that resemble the original data distribution.
Gradient Descent: An optimization algorithm that iteratively updates the model’s parameters in the direction of steepest descent of the loss function to minimize the loss.
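A minimal sketch minimizing a one-dimensional quadratic; the function, learning rate, and step count are illustrative choices, not a recipe for real models:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step opposite the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# The minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

The same update rule, applied to millions of parameters with gradients computed by backpropagation, is what trains most neural networks.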
Gradient Masking/Obfuscation: Defense methods that intentionally hide or obfuscate the gradient information of the model to make adversarial attacks less successful.
Gray-Box Attacks: Adversarial attacks where the attacker has partial knowledge about the target model, such as access to some internal information or limited query access.
Hyperparameters: Parameters that are not learned from data during the training process but are set by the user before training begins. These parameters control the behavior and performance of the machine learning model, in contrast to the model’s internal parameters, which are learned through optimization algorithms.
L1 and L2 Regularization: Techniques used to prevent overfitting by adding a penalty term to the model’s objective function, encouraging simplicity or smoothness.
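As a sketch, both penalties can be added directly to a base loss value; the weights and penalty coefficients below are arbitrary illustrations:

```python
import numpy as np

def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
    """Add an L1 penalty (sum of |w|, encourages sparsity) and an
    L2 penalty (sum of w^2, encourages small weights) to a loss."""
    return base_loss + l1 * np.sum(np.abs(weights)) + l2 * np.sum(weights ** 2)

w = np.array([1.0, -2.0])
loss = regularized_loss(0.5, w, l1=0.1, l2=0.01)
```

L1 tends to push individual weights exactly to zero, while L2 shrinks all weights smoothly; many libraries expose both as tunable hyperparameters.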
Mean Squared Error (MSE): A commonly used loss function that measures the average squared difference between the predicted and true values.
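A minimal sketch with made-up predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

error = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

Because the errors are squared, large misses dominate the loss, which is why MSE is sensitive to outliers.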
Neural Networks: Computational models inspired by the structure and functioning of the human brain, consisting of interconnected nodes (neurons) organized in layers.
Offensive Machine Learning (OML): The practice of leveraging machine learning techniques to design and develop attacks against machine learning systems or to exploit vulnerabilities in these systems. Offensive machine learning aims to manipulate or deceive the target models, compromising their integrity, confidentiality, or availability.
Overfitting: A phenomenon where a machine learning model becomes too specialized to the training data and fails to generalize well to new, unseen data.
Poisoning Attacks: Adversarial attacks involving the injection of malicious data into the training set to manipulate the behavior of the model.
Precision and Recall: Evaluation metrics used in binary classification tasks to measure the model’s ability to correctly identify positive samples (precision) and to find all positive samples (recall).
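A sketch computing both metrics from raw binary labels; the example labels are made up:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return precision, recall

p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

In security contexts the two often trade off: a stricter detector raises precision (fewer false alarms) but can lower recall (more missed attacks).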
Regularization Methods: Techniques that penalize large values of model parameters or gradients during training to prevent large changes in model output with small changes in input data.
Reinforcement Learning: A machine learning paradigm where an agent interacts with an environment, receiving rewards or penalties based on its actions, and learns policies that maximize a cumulative reward signal.
Robust Optimization: Defense techniques that modify the model’s learning process to minimize misclassification of adversarial examples and improve overall robustness.
Security-Accuracy Trade-off: The trade-off between the model’s accuracy on clean data and its robustness against adversarial attacks. Enhancing one aspect often comes at the expense of the other.
Semi-Supervised Learning: A learning paradigm that combines labeled and unlabeled data to improve the performance of a model by leveraging the unlabeled data to learn better representations or decision boundaries.
Supervised Learning: A machine learning approach where the model learns from labeled training data, with inputs and corresponding desired outputs provided during training.
Transfer Attacks: Adversarial attacks that exploit the transferability of adversarial examples to deceive target models with limited or no direct access.
Transfer Learning: A technique that leverages knowledge learned from one task to improve performance on a different but related task.
Transferability: The ability of adversarial examples generated for one model to deceive other similar models.
Underfitting: A phenomenon where a machine learning model fails to capture the underlying patterns in the training data, resulting in poor performance on both the training and test data.
Unsupervised Learning: A machine learning approach where the model learns patterns and structures from unlabeled data without explicit output labels.
White-Box Attacks: Adversarial attacks where the attacker has complete knowledge of the target model, including its architecture, parameters, and internal gradients.

Want to continue your education in Adversarial Machine Learning? Learn about NetSPI’s AI/ML Penetration Testing.

Discover how NetSPI’s ASM solution helps organizations identify, inventory, and reduce risk to both known and unknown assets.