Ph.D. student,
Ghent University

jonathan.peck@ugent.be

At ICML 2021, Radiya-Dixit and Tramèr argue that, in the long run, data poisoning attacks cannot effectively protect our privacy and thwart unauthorized exploitation of our data for facial recognition purposes. I believe their conclusions are premature as long as shortcut learning remains a significant issue in AI.

It is by now well-known to the public that the core business of companies like Google and Facebook is **indiscriminate data collection:** they thrive on collecting as much personal data as possible on every individual on the planet in order to either sell this data to third parties or develop AI tools based on them, often for the purpose of empowering their surveillance apparatus even more. The most common such tool is, of course, **facial recognition:** by encouraging users to upload photos and videos to their platforms and tagging their friends in them, these companies have in fact curated the world’s largest crowd-sourced annotated data sets of facial images. The users provide free labor in the form of annotated images, which can then be easily extracted and automatically put into a database on which a machine learning model can be trained. These models can subsequently be incorporated into street cameras, for example, enabling the state to construct a large, distributed surveillance system where any individual in public can be identified and located at any time at the mere press of a button. There are various obvious problems with this current state of affairs:

- You have no way of knowing for what exact purposes your data will be used, and no way of opting out. Google and Facebook could be supplying law enforcement with arguably legitimate information, but they could just as well be selling your data to shady brokers you’ve never heard of and who intend to use your data to essentially spy on you. In fact, we know this to be the case. It is the wet dream of stalkers, pedophiles and white-collar criminals.
- There is a very high potential for human rights abuses when facial recognition technology is implemented at scale by governments. It is known, for instance, that China uses facial recognition to track and detain Uyghur people and other non-Han Chinese people (IPVM, NYT). Closer to my home, in the Western world, there is an absolutely frightening increase in pseudo-scientific beliefs surrounding the capabilities of facial recognition technology. For example, people have claimed to use facial recognition to detect whether someone is a criminal, determine someone’s sexual orientation, infer genetic disorders, perform lie detector tests, infer personality traits and more utter nonsense. Aside from having absolutely no scientific basis, these tools claim to reveal highly personal and private information about you using just your face. Companies and governments should not be able to collect this information so easily, if at all. It is nothing short of an existential threat to our democracy.

In the absence of effective legislation to curtail the unauthorized use of personal data, the user is left to their own devices when it comes to protecting their privacy against these invasions. In a surprising turn of events given the general lack of social awareness in academia, academics have actually started to take this problem seriously in the past few years. Vincent et al. (2021) put it this way:

Many powerful computing technologies rely on implicit and explicit data contributions from the public. This dependency suggests a potential source of leverage for the public in its relationship with technology companies: by reducing, stopping, redirecting, or otherwise manipulating data contributions, the public can reduce the effectiveness of many lucrative technologies.

One of the core ideas discussed in their paper is **data poisoning:**

A data poisoning attack is an adversarial attack that inserts inaccurate or harmful training data into a data-dependent technology, thereby encouraging the model to perform poorly.

In the case of facial recognition, the most obvious avenue of attack is to leverage *adversarial perturbations.* These are deliberate modifications of the images we upload to the internet in such a way that humans barely notice or care about the difference, but the operation of typical image recognition models would be disrupted. There already exist public tools that enable users to carry out such attacks, e.g., Fawkes and LowKey. These tools make imperceptible modifications to user images that cause existing recognition systems to fail. However, as Vincent et al. also note, these are not silver bullets:

Progress in adversarial ML could actually end up reducing the public’s poisoning-based data leverage, in which case non-poisoning data levers would become more important. Fundamentally, data poisoners are engaging in a contest with data scientists. This means any data poisoning technique runs the risk of becoming outdated — if a company’s data scientists find or invent a defense, the public might lose leverage.

The main problem facing tools like Fawkes and LowKey is the fact that ML researchers can *adapt* to them, rendering the protections applied to old images irrelevant. In this vein, at the recent ICML 2021 conference, Radiya-Dixit & Tramèr (2021) published an interesting contribution titled *Data Poisoning Won’t Save You From Facial Recognition*. Their central claim is that data poisoning attacks will not, in the long term, deter facial recognition systems. The basic argument of the paper is that, for any poisoning attack, it suffices to simply wait until the attack is defeated by more sophisticated computer vision models. After all, such an attack can be carried out only once: as soon as a user has uploaded an image to the internet and the image has been scraped for use in a machine learning pipeline, the user can no longer change the perturbation. This asymmetry makes it implausible that poisoning attacks will remain effective in the long term: ML practitioners can simply archive the data until the attack is defeated, at which point they can undo the protections and use the images as training data anyway.

Here, I propose a counter-argument to this position. I argue that although specific poisoning attacks used at present may be defeated, this does not preclude the existence of attacks which *cannot* be defeated by *any* machine learning model, or at least not without breakthrough solutions to a major outstanding problem in the field. The argument is similar to a *hardness reduction* in complexity theory: basically, I claim that unless a major open problem in machine learning is resolved, one can construct data poisoning attacks that are essentially impossible to defeat. Specifically, consider the **shortcut problem** discussed by Geirhos et al. (2020):

**Shortcut problem.** “Shortcuts” are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Deep neural networks often solve problems by taking shortcuts instead of learning the intended solution, leading to a lack of generalisation and unintuitive failures.

Geirhos et al. give many examples of shortcut learning, both in artificial as well as biological neural networks:

For instance, a DNN may appear to classify cows perfectly well—but fails when tested on pictures where cows appear outside the typical grass landscape, revealing "grass" as an unintended (shortcut) predictor for "cow."

[...]

Rats learned to navigate a complex maze apparently based on subtle colour differences— very surprising given that the rat retina has only rudimentary machinery to support at best somewhat crude colour vision. Intensive investigation into this curious finding revealed that the rats had tricked the researchers: They did not use their visual system at all in the experiment and instead simply discriminated the colours by the odour of the colour paint used on the walls of the maze. Once smell was controlled for, the remarkable colour discrimination ability disappeared.

Humans are no strangers to shortcut learning, either:

Alice loves history. Always has, probably always will. At this very moment, however, she is cursing the subject: After spending weeks immersing herself in the world of Hannibal and his exploits in the Roman Empire, she is now faced with a number of exam questions that are (in her opinion) to equal parts dull and difficult. "How many elephants did Hannibal employ in his army—19, 34 or 40?" ... Alice notices that Bob, sitting in front of her, seems to be doing very well. Bob of all people, who had just boasted how he had learned the whole book chapter by rote last night.

In educational research, Bob’s reproductive learning strategy would be considered surface learning, an approach that relies on narrow testing conditions where simple discriminative generalisation strategies can be highly successful. This fulfils the characteristics of shortcut learning by giving the appearance of good performance but failing immediately under more general test settings. Worryingly, surface learning helps rather than hurts test performance on typical multiple-choice exams: Bob is likely to receive a good grade, and judging from grades alone Bob would appear to be a much better student than Alice in spite of her focus on understanding. Thus, in analogy to machine learning we again have a striking discrepancy between intended and actual learning outcome.

These *surface learning* strategies are a real problem in the field of education, as they undermine the reliability of testing outcomes and discourage students from actually trying to understand the subject matter. Unsurprisingly, research shows that in order to combat surface learning, teachers must develop questionnaires that demand a thorough understanding of the material rather than focusing on purely factual information that is readily apparent in the text. However, this is easier said than done in practice, and it is not clear at all how this could be incorporated into a machine learning pipeline in order to discourage shortcut learning, especially at scale. In particular, when we are trying to build a large-scale facial recognition system, it is hard to come up with any concrete objective functions we could use to encourage the network to truly “understand” what a face is and why it belongs to a particular person, let alone optimize such functions in an efficient training algorithm. Indeed, for many tasks, it seems like we would not even need machine learning at all if we could develop such algorithms!

The take-away here is that essentially *all* learning systems are prone to shortcut learning, which suggests that it is a fundamentally hard problem for learners to cope with. The question of how to tackle shortcut learning in AI is probably one of the biggest open problems in the entire field at this moment, and I believe it is unlikely to be resolved anywhere in the near future, if ever. My justification for this belief is the fact that even humans are prone to learning shortcuts. Hence, if we are to resolve this problem in ML, we need to develop learning algorithms that are more sophisticated than the human brain itself. At present, this seems about as likely as the human race colonizing Io, the third moon of Jupiter. It isn’t necessarily impossible, but you would be a fool to bet on it happening anywhere in our lifetime.

Furthermore, if one is aware of the particular shortcut used by a learner, it is easy to exploit it in order to sabotage the performance of the system. Hence, my basic argument is this:

Given that ML systems are prone to shortcut learning and will in all likelihood remain so for the foreseeable future, effective data poisoning attacks can be carried out by deliberately inserting shortcuts into the training data that do not generalize to the real world.

It is important to note, of course, that I cannot present a formal theorem that mathematically proves this statement beyond any doubt. However, Radiya-Dixit & Tramèr do not provide a formal mathematical proof either: they support their claim using an informal but plausible line of reasoning backed up with empirical results that seem to confirm it. Therefore, I feel that I should be able to do the same: if one is to criticize my counter-argument based on a lack of mathematical rigor, then the same criticism applies to Radiya-Dixit & Tramèr.

That said, it is easy to prove a result which comes very close to formally establishing my point. Specifically, in the following theorem, I show that one can very easily “hide” signals in the input data such that the functional relationship with the output becomes *linear.* Without these signals, it is highly unlikely that a linear predictor for facial recognition would be accurate. However, these hidden signals serve as a shortcut that the learner is very likely to exploit, because it is a very easy relationship that can offer high accuracy on training data.

**Notation.** In the following, I will use the notation $\overline{X}$ to represent the average of the components of a vector $X = [X_1, \dots, X_n]$. That is,

$$\overline{X} = \frac{1}{n}\sum_{i=1}^nX_i.$$

Furthermore, $\|X\|$ will refer to any $\ell_p$ norm of $X$:

$$\|X\| = \left(\sum_{i=1}^n|X_i|^p\right)^{1/p}.$$

**Theorem 1.** Let $\mathcal{D}$ be any data distribution over $(X,Y)$ where $X \in [0,1]^d$ and $Y \in \{-1,+1\}$. Let $\mu = \mathbb{E}[\overline{X}]$ and $\varepsilon > 0$. Then there exists an adversarial data distribution $\tilde{\mathcal{D}}$ over $(\tilde{X}, Y)$ and a vector $w \in \mathbb{R}^d$ such that the following properties hold:

- The original data $X$ and adversarial data $\tilde{X}$ are $\varepsilon$-close: $$\Pr[\|X - \tilde{X}\| \leq \varepsilon] = 1.$$
- A linear predictor can achieve high accuracy on the adversarial data distribution: $$\Pr[\mathrm{sgn}(w^\intercal\tilde{X} - \mu) = Y] \geq 1 - \exp\left(-\frac{\varepsilon^2d^{1 - 2/p}}{2}\right).$$

*Proof.* Let $\tilde{X} = X + \Delta(X,Y)$ where $\Delta(X,Y) = \alpha[Y, \dots, Y]^\intercal$ for some $\alpha > 0$. Then

$$\|X - \tilde{X}\| = \|\Delta(X,Y)\| = \alpha d^{1/p}.$$

To satisfy the first requirement, it suffices to choose $\alpha = \varepsilon/d^{1/p}$.

For the second requirement, we let $w = \frac{1}{d}[1, \dots, 1]^\intercal$. Then

$$w^\intercal\tilde{X} - \mu = \frac{1}{d}\sum_{i=1}^dX_i + \alpha Y - \mu = \overline{X} + \alpha Y - \mu.$$

In order to have $\mathrm{sgn}(w^\intercal\tilde{X} - \mu) = Y$, it is sufficient that $|\overline{X} - \mu| \leq \alpha$. Given the random vector $X = [X_1, \dots, X_d]^\intercal$, we can form the Doob martingale

$$Z_t = \mathbb{E}[\overline{X} \mid X_1, \dots, X_t].$$

This martingale is Lipschitz because the components $X_i$ are bounded:

$$|Z_{t+1} - Z_t| \leq \frac{1}{d}.$$

By Azuma’s inequality, we have

$$ \Pr[|\overline{X} - \mu| \leq \alpha] \geq 1 - \exp\left(-\frac{\alpha^2d}{2}\right) = 1 - \exp\left(-\frac{\varepsilon^2d^{1 - 2/p}}{2}\right). $$

This yields

$$\Pr[\mathrm{sgn}(w^\intercal\tilde{X} - \mu) = Y] \geq 1 - \exp\left(-\frac{\varepsilon^2d^{1 - 2/p}}{2}\right),$$

which completes the proof. $\blacksquare$

Setting $\eta = \exp\left(-\frac{\varepsilon^2d^{1 - 2/p}}{2}\right)$, we find

$$\varepsilon = \sqrt{\frac{2}{d^{1 - 2/p}}\log\frac{1}{\eta}}.$$

Hence, theorem 1 can be restated as a function of the desired level of accuracy of the linear approximation rather than the perturbation budget.

One concrete implication of theorem 1 is that, at least for high dimensional data sets in the $\ell_\infty$ threat model, it is possible to embed small perturbations in the data such that the corrupted samples have a very simple linear relationship with the output label. For example, on the CIFAR-10 data set, we would be able to achieve 77.95% accuracy using a linear predictor on corrupted data with $\varepsilon = 8/255$. On the ImageNet data set at $224 \times 224$ pixel resolution, we basically achieve 100% accuracy at this perturbation budget. In turn, this implies that any *sufficiently complex* (i.e., capable of expressing at least the class of linear functions) machine learning model trained on this data will risk learning a simple linear relation between the data and the labels regardless of what the real underlying relation actually is. This is especially true if regularization schemes are used which bias the learner towards simpler hypotheses. For the types of computer vision problems studied in modern machine learning, linear predictors are a very poor fit and hence it is highly likely that models trained on such corrupted data will not generalize at all.

Of course, this argument is not a formal proof that protection against data poisoning is impossible; the mere presence of an artificial linear relationship in the corrupted data does not imply that any particular machine learning model trained on this data will focus on it and fail to generalize. However, modern deep neural networks are notoriously prone to this type of shortcut learning where simple but non-generalizing properties of the data are used to predict the label, and it turns out to be very easy to insert such shortcuts artificially: a great example of this is the recent work by Huang et al. (2020) on “unlearnable samples.” Arguably, shortcut learning is one of the biggest open research problems in machine learning today, and I believe data poisoning will remain a concern for ML practitioners as long as it is not resolved.

I therefore believe that the conclusions of Radiya-Dixit & Tramèr (2021) are premature, since they only take into consideration a few specific data poisoning tools that exist at this time. While I certainly agree that their argument applies to the particular approaches taken by Fawkes and LowKey, I am not convinced it holds in general. Although theorem 1 only proves the existence of a specific linear relationship that can be inserted as a shortcut for binary classification problems, I am very confident that similar results hold for multi-class problems as well, and that one can likely use infinitely many distinct polynomial functions instead of the particular linear one I have considered here.

There are infinitely many simple shortcuts one can artificially embed into data without hindering human perception. In my opinion, when a data set is littered with shortcuts, it is much more likely that a machine learning system will pick up on the artificial signals and learn the shortcuts rather than the much more complicated true relationship between the input and output. All that’s missing is a user-friendly tool that puts this idea into practice.