Deep neural networks are vulnerable to so-called adversarial examples: inputs which are intentionally constructed to cause the model to make incorrect predictions or classifications. Adversarial examples are often visually indistinguishable from natural data samples, making them hard to detect. As such, they pose significant threats to the reliability of deep learning systems. In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing. We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse. The defense is easy to implement and can be applied to any existing model without additional training or fine-tuning. We empirically validate the defense on ImageNet against L^∞ perturbations at perturbation budgets ranging from 4/255 to 32/255. In the black-box setting, our method significantly outperforms the state of the art, especially for large perturbations. In the white-box setting, depending on the choice of base classifier, we closely match the state of the art in robust ImageNet classification while avoiding the need for additional data, larger models or expensive adversarial training routines. Our code is available at https://github.com/peck94/robust-width-defense.
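Since the purified input is simply handed to an off-the-shelf classifier, the idea can be pictured as transform-domain sparsification before inference. The sketch below is only illustrative and is not the paper's RWP reconstruction: it keeps the largest-magnitude DCT coefficients of the input (the purify helper and the keep_ratio parameter are hypothetical), assuming a float image in [0, 1].

```python
# Illustrative purification sketch, NOT the paper's exact RWP scheme:
# project the input onto its largest DCT coefficients before classification.
import numpy as np
from scipy.fft import dctn, idctn

def purify(image: np.ndarray, keep_ratio: float = 0.1) -> np.ndarray:
    """Keep only the `keep_ratio` largest-magnitude DCT coefficients (hypothetical helper)."""
    coeffs = dctn(image, norm="ortho", axes=(0, 1))      # transform spatial axes only
    k = max(1, int(keep_ratio * coeffs.size))
    threshold = np.sort(np.abs(coeffs), axis=None)[-k]   # k-th largest magnitude
    sparse = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return np.clip(idctn(sparse, norm="ortho", axes=(0, 1)), 0.0, 1.0)

# Usage: feed classifier(purify(x)) instead of classifier(x); no retraining needed.
```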
An Introduction to Adversarially Robust Deep Learning
Jonathan Peck, Bart Goossens, and Yvan Saeys
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
The widespread success of deep learning in solving machine learning problems has fueled its adoption in many fields, from speech recognition to drug discovery and medical imaging. However, deep learning systems are extremely fragile: imperceptibly small modifications to their input data can cause the models to produce erroneous output. It is very easy to generate such adversarial perturbations even for state-of-the-art models, yet immunization against them has proven exceptionally challenging. Despite over a decade of research on this problem, our solutions are still far from satisfactory and many open problems remain. In this work, we survey some of the most important contributions in the field of adversarial robustness. We pay particular attention to the reasons why past attempts at improving robustness have been insufficient, and we identify several promising areas for future research.
2023
Improving the robustness of deep neural networks to adversarial perturbations
Over the past decade, artificial neural networks have ushered in a revolution in science and society. Nowadays, neural networks are applied to various problems such as speech recognition on smartphones, self-driving cars, malware detection and even assisting doctors in making medical diagnoses. Often, neural networks achieve accuracy scores that rival or even surpass human domain experts. It is all the more surprising, then, that these same networks can be misled by minor manipulations that are invisible to the human eye. A neural network trained to identify lung cancer from MRI images, for example, can come to an incorrect diagnosis when a single pixel in the image is deliberately manipulated in a very specific way. Despite the fact that these perturbations are of no consequence to the underlying task and often would go unnoticed by human experts, neural networks tend to be incredibly sensitive to them. Developing defense methods which make our models resilient to such attacks therefore becomes paramount. In this work, I propose four methods that can be employed under different circumstances to protect systems based on artificial intelligence against adversarial attacks.
2022
Distilling Deep RL Models Into Interpretable Neuro-Fuzzy Systems
Deep Reinforcement Learning uses a deep neural network to encode a policy, which achieves very good performance in a wide range of applications but is widely regarded as a black box model. A more interpretable alternative to deep networks is given by neuro-fuzzy controllers. Unfortunately, neuro-fuzzy controllers often need a large number of rules to solve relatively simple tasks, making them difficult to interpret. In this work, we present an algorithm to distill the policy from a deep Q-network into a compact neuro-fuzzy controller. This allows us to train compact neuro-fuzzy controllers through distillation to solve tasks that they are unable to solve directly, combining the flexibility of deep reinforcement learning and the interpretability of compact rule bases. We demonstrate the algorithm on three well-known environments from OpenAI Gym, where we nearly match the performance of a DQN agent using only 2 to 6 fuzzy rules.
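As a rough picture of the distillation step (the paper's own algorithm and the neuro-fuzzy student architecture are not reproduced here), the sketch below trains a generic differentiable student to imitate the softened action values of a trained DQN teacher; teacher_dqn, student, and the batch of collected states are assumed to exist.

```python
# Hedged policy-distillation sketch: a differentiable student imitates the
# teacher DQN's softened action preferences on a batch of collected states.
import torch
import torch.nn.functional as F

def distill(teacher_dqn, student, states, epochs=200, lr=1e-3, tau=1.0):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    with torch.no_grad():
        targets = F.softmax(teacher_dqn(states) / tau, dim=-1)      # soft teacher policy
    for _ in range(epochs):
        opt.zero_grad()
        log_probs = F.log_softmax(student(states) / tau, dim=-1)
        loss = F.kl_div(log_probs, targets, reduction="batchmean")  # KL to teacher
        loss.backward()
        opt.step()
    return student
```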
2020
Inline Detection of DGA Domains Using Side Information
Raaghavi Sivaguru, Jonathan Peck, Femi Olumofin, and
2 more authors
Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish communication between an infected bot and the C&C server. In recent years, machine learning-based systems have been widely used to detect DGAs. There are several well-known state-of-the-art classifiers in the literature that can detect DGA domain names in real-time applications with high predictive performance. However, these DGA classifiers are highly vulnerable to adversarial attacks in which adversaries purposely craft domain names to evade DGA detection classifiers. In our work, we focus on hardening DGA classifiers against adversarial attacks. To this end, we train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself. Additionally, the side information features are selected such that they are easily obtainable in practice to perform inline DGA detection. The performance and robustness of these models are assessed by exposing them to one day of real-traffic data as well as domains generated by adversarial attack algorithms. We found that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
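A minimal sketch of the general recipe, with hypothetical feature names and without the paper's actual side-information set: concatenate simple domain-name features with whatever side information is available inline and fit a random forest.

```python
# Hedged sketch: combine toy domain-name features with side-information
# features and train a random forest classifier for DGA detection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def domain_features(domain: str) -> list:
    name = domain.split(".")[0]
    digits = sum(c.isdigit() for c in name)
    return [len(name), digits / max(len(name), 1)]   # length, digit ratio

def train_dga_classifier(domains, X_side, y):
    # X_side: per-domain side-information matrix; y: 0 = benign, 1 = DGA
    X_name = np.array([domain_features(d) for d in domains])
    X = np.hstack([X_name, X_side])
    return RandomForestClassifier(n_estimators=100).fit(X, y)
```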
Calibrated Multi-probabilistic Prediction as a Defense Against Adversarial Attacks
Jonathan Peck, Bart Goossens, and Yvan Saeys
In Artificial Intelligence and Machine Learning, 2020
Machine learning (ML) classifiers—in particular deep neural networks—are surprisingly vulnerable to so-called adversarial examples. These are small modifications of natural inputs which drastically alter the output of the model even though no relevant features appear to have been modified. One explanation that has been offered for this phenomenon is the calibration hypothesis, which states that the probabilistic predictions of typical ML models are miscalibrated. As a result, classifiers can often be very confident in completely erroneous predictions. Based on this idea, we propose the MultIVAP algorithm for defending arbitrary ML models against adversarial examples. Our method is inspired by the inductive Venn-ABERS predictor (IVAP) technique from the field of conformal prediction. The IVAP enjoys the theoretical guarantee that its predictions will be perfectly calibrated, thus addressing the problem of miscalibration. Experimental results on five image classification tasks demonstrate empirically that the MultIVAP has a reasonably small computational overhead and provides significantly higher adversarial robustness without sacrificing accuracy on clean data. This increase in robustness is observed both against defense-oblivious attacks as well as a defense-aware white-box attack specifically designed for the MultIVAP.
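The calibration building block can be sketched as follows. This is a simplified inductive Venn-ABERS step for a single binary score, not the full MultIVAP, and the helper name is hypothetical.

```python
# Simplified IVAP-style interval: refit isotonic regression on the calibration
# scores with the test score appended under each hypothetical label.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_interval(cal_scores, cal_labels, test_score):
    probs = []
    for label in (0, 1):
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(np.append(cal_scores, test_score), np.append(cal_labels, label))
        probs.append(float(iso.predict([test_score])[0]))
    return probs[0], probs[1]   # lower/upper calibrated probability of class 1
```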
Regional image perturbation reduces L_p norms of adversarial examples while maintaining model-to-model transferability
Utku Özbulak, Jonathan Peck, Wesley De Neve, and
3 more authors
In the 37th International Conference on Machine Learning (ICML 2020) Proceedings, 2020
Regional adversarial attacks often rely on complicated methods for generating adversarial perturbations, making it hard to compare their efficacy against well-known attacks. In this study, we show that effective regional perturbations can be generated without resorting to complex methods. We develop a very simple regional adversarial perturbation attack method based on the sign of the gradient of the cross-entropy loss, one of the most commonly used losses in adversarial machine learning. Our experiments on ImageNet with multiple models reveal that, on average, 76% of the generated adversarial examples maintain model-to-model transferability when the perturbation is applied to local image regions. Depending on the selected region, these localized adversarial examples require significantly less L_p norm distortion (for p ∈ {0, 2, ∞}) compared to their non-local counterparts. These localized attacks therefore have the potential to undermine defenses that claim robustness under the aforementioned norms.
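In spirit, the attack restricts a sign-of-gradient step to a chosen image region. The sketch below shows that idea for a PyTorch model with a binary region mask; it is illustrative and not the paper's exact procedure.

```python
# Localized sign-of-gradient perturbation (illustrative, single step):
# the perturbation is zeroed outside the chosen region via `region_mask`.
import torch
import torch.nn.functional as F

def regional_sign_attack(model, x, y, region_mask, eps=8 / 255):
    x_adv = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)          # cross-entropy loss
    loss.backward()
    delta = eps * x_adv.grad.sign() * region_mask    # restrict to the region
    return (x_adv + delta).clamp(0.0, 1.0).detach()
```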
2019
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Jonathan Peck, Claire Nie, Raaghavi Sivaguru, and
5 more authors
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple and effective, and it requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.
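To give a flavor of how simple such an attack can be (the published CharBot has additional details that are not reproduced here), a toy variant might perturb a benign domain by swapping a couple of its characters for random ones:

```python
# Toy CharBot-like perturbation of a benign domain name (simplified).
import random
import string

def perturb_domain(domain: str, n_changes: int = 2) -> str:
    name, tld = domain.rsplit(".", 1)
    chars = list(name)
    for i in random.sample(range(len(chars)), k=min(n_changes, len(chars))):
        chars[i] = random.choice(string.ascii_lowercase + string.digits)
    return "".join(chars) + "." + tld

# e.g. perturb_domain("wikipedia.org") -> "wikipev7a.org" (output varies)
```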
Hardening DGA Classifiers Utilizing IVAP
Charles Grumer, Jonathan Peck, Femi Olumofin, and
2 more authors
In 2019 IEEE International Conference on Big Data (Big Data), 2019
Domain Generation Algorithms (DGAs) are used by malware to generate a deterministic set of domains, usually by utilizing a pseudo-random seed. A malicious botmaster can establish connections between their command-and-control center (C&C) and any malware-infected machines by registering domains that will be DGA-generated given a specific seed, rendering traditional domain blacklisting ineffective. Given the nature of this threat, the real-time detection of DGA domains based on incoming DNS traffic is highly important. The use of neural network machine learning (ML) models for this task has been well-studied, but there is still substantial room for improvement. In this paper, we propose to use Inductive Venn-Abers predictors (IVAPs) to calibrate the output of existing ML models for DGA classification. The IVAP is a computationally efficient procedure which consistently improves the predictive accuracy of classifiers at the expense of not offering predictions for a small subset of inputs and consuming an additional amount of training data.
Detecting adversarial examples with inductive Venn-ABERS predictors
Jonathan Peck, Bart Goossens, and Yvan Saeys
In Proceedings of the 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2019), 2019
Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictor with the theoretical guarantee that their predictions are perfectly calibrated. We propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions when the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and significantly more robust to adversarial examples. The method appears to be competitive with the state of the art in adversarial defense, both in terms of robustness and scalability.
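The rejection rule can be pictured as follows; the threshold and helper name are illustrative, not the paper's exact settings. Abstain whenever the Venn-ABERS interval is too wide, otherwise merge it into a single probability.

```python
# Illustrative rejection rule on a Venn-ABERS probability interval [p0, p1].
def predict_or_reject(p0: float, p1: float, max_width: float = 0.2):
    if p1 - p0 > max_width:
        return None                      # reject: prediction too uncertain
    p = p1 / (1.0 - p0 + p1)             # standard merge of the interval
    return int(p >= 0.5)
```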
2017
Lower bounds on the robustness to adversarial perturbations
Jonathan Peck, Joris Roels, Bart Goossens, and
1 more author
In Advances in Neural Information Processing Systems, 2017
The input-output mappings learned by state-of-the-art neural networks are significantly discontinuous. It is possible to cause a neural network used for image recognition to misclassify its input by applying very specific, hardly perceptible perturbations to the input, called adversarial perturbations. Many hypotheses have been proposed to explain the existence of these peculiar samples as well as several methods to mitigate them. A proven explanation remains elusive, however. In this work, we take steps towards a formal characterization of adversarial perturbations by deriving lower bounds on the magnitudes of perturbations necessary to change the classification of neural networks. The bounds are experimentally verified on the MNIST and CIFAR-10 data sets.
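To illustrate the flavor of such results (this is a generic Lipschitz-based bound, not necessarily the exact bound derived in the paper): if every class score f_j is Lipschitz continuous with constant L, then any perturbation δ that changes the predicted class c of an input x must satisfy

$$\|\delta\| \;\geq\; \frac{f_c(x) - \max_{j \neq c} f_j(x)}{2L},$$

since each score gap f_c - f_j is 2L-Lipschitz and must be driven to zero before the decision can flip.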