A simple upper bound on adversarial robustness
One of the most common threat models for adversarial perturbations is the \(\ell_p\) threat model. Under this model, an input \(\tilde{x}\) is said to be \(\varepsilon\)-adversarial for a classifier \(f: X \to Y\) and original sample \(x\) if the following properties are satisfied:
\[\begin{aligned} f(x) &\neq f(\tilde{x}), & \|x - \tilde{x}\|_p &\leq \varepsilon. \end{aligned}\]That is, the classifier assigns different labels to \(x\) and \(\tilde{x}\) despite the fact that they are \(\varepsilon\)-close to each other in \(\ell_p\) norm. The idea is that \(\varepsilon\) is set sufficiently small so that no perturbation of this magnitude should change the classification of a natural data sample. It is then of course tacitly assumed that “natural” samples will always lie \(\varepsilon\)-far from any decision boundary.
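For concreteness, the definition translates directly into a check. A minimal sketch, where the classifier `f`, the samples, and the budget `eps` are all placeholders:

```python
import numpy as np

def is_adversarial(f, x, x_tilde, eps, p=2):
    """Check the two conditions of the l_p threat model: the predicted
    labels differ and the perturbation is eps-small in l_p norm."""
    return f(x) != f(x_tilde) and np.linalg.norm(x - x_tilde, ord=p) <= eps
```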
Note that the \(\ell_p\) threat model essentially only constrains the norm \(\|\delta\|\) of the perturbation \(\delta = \tilde{x} - x\). This leads to a simple impossibility result, which I prove below. I consider the setting of linear classifiers, where our model takes the form
\[f(x) = w^\intercal x + b,\]where the predicted label is the sign of \(f(x)\). I will focus on the \(\ell_2\) norm here, so \(\|\cdot\| = \|\cdot\|_2\). Before stating the result, recall the definition of a sub-Gaussian random variable. A random variable \(X\) is called sub-Gaussian with variance proxy \(\sigma^2\) if \(\mathbb{E}[X] = 0\) and
\[\mathbb{E}[\exp(sX)] \leq \exp(\sigma^2s^2/2)\]for all \(s \in \mathbb{R}\). This assumption is widely used in probability, as a large and useful class of random variables satisfies it. In particular, by Hoeffding's lemma, every bounded zero-mean random variable is sub-Gaussian: if \(X \in [a, b]\) almost surely, then \(X\) is sub-Gaussian with variance proxy \((b-a)^2/4\). In the case of linear classifiers, \(f(x)\) is bounded whenever \(x\) is bounded, so boundedness of our input samples implies sub-Gaussianity of \(f(x)\) (strictly, of its centered version; for simplicity, I will take \(\mathbb{E}[f(x)] = 0\) throughout).
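To make the boundedness claim concrete, here is a small numerical check. The uniform distribution on \([-1, 1]\) is just one example of a bounded zero-mean variable; its variance proxy below comes from Hoeffding's lemma:

```python
import numpy as np

rng = np.random.default_rng(0)

# Bounded zero-mean example: uniform on [a, b] = [-1, 1].
# Hoeffding's lemma gives variance proxy ((b - a) / 2)^2 = 1.
samples = rng.uniform(-1.0, 1.0, size=1_000_000)
sigma2 = 1.0

for s in (0.5, 1.0, 2.0, 4.0):
    mgf = np.mean(np.exp(s * samples))   # empirical E[exp(sX)]
    bound = np.exp(sigma2 * s**2 / 2)    # sub-Gaussian upper bound
    print(f"s = {s}: E[exp(sX)] ~ {mgf:.3f} <= {bound:.3f}")
```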
Lemma. Let \(X\) be sub-Gaussian with variance proxy \(\sigma^2\). Then for any \(t > 0\),
\[\begin{aligned} \Pr[X > t] &\leq \exp(-t^2/2\sigma^2), & \Pr[X < -t] &\leq \exp(-t^2/2\sigma^2). \end{aligned}\]The above lemma is a well-known result about sub-Gaussian random variables which will be very useful here. The quantity \(\sigma^2\) is referred to as the variance proxy because it satisfies
\[\mathbb{E}[X^2] \leq 4\sigma^2.\]
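As a quick sanity check of the lemma, one can compare its tail bound with Monte Carlo estimates for a standard Gaussian, which is sub-Gaussian with variance proxy \(\sigma^2 = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # N(0, 1) is sub-Gaussian, sigma^2 = 1

for t in (0.5, 1.0, 2.0, 3.0):
    upper = np.mean(x > t)           # estimate of Pr[X > t]
    lower = np.mean(x < -t)          # estimate of Pr[X < -t]
    bound = np.exp(-t**2 / 2)        # the lemma's bound on both tails
    print(f"t = {t}: {upper:.4f}, {lower:.4f} <= {bound:.4f}")
```

Now, on to the main result.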
Theorem. Let \(f(x)\) be sub-Gaussian with variance proxy \(\sigma^2\) when \(x\) is sampled according to the data distribution, and let
\[\varepsilon > \frac{\sigma}{\|w\|}\sqrt{2\ln\frac{2}{\tau}}.\]Then, with probability exceeding \(1 - \tau\) over the draw of \(x\), there exists an \(\varepsilon\)-adversarial example for \(f\) at \(x\).
Proof. First, assume \(f(x) > 0\). \(\varepsilon\)-robustness of \(f\) at \(x\) then implies \(f(x + \delta) > 0\) for every \(\delta\) with \(\|\delta\| = \varepsilon\). As such,
\[\cos\angle(w, \delta) = \frac{w^\intercal\delta}{\|w\| \cdot \varepsilon} = \frac{f(x + \delta) - f(x)}{\|w\| \cdot \varepsilon} > -\frac{f(x)}{\|w\| \cdot \varepsilon}.\]To violate the robustness of \(f\), we need only rotate \(\delta\) such that
\[\cos\angle(w, \delta) \leq -\frac{f(x)}{\|w\| \cdot \varepsilon}.\]This is always possible provided
\[\tag{1} f(x) \leq \|w\| \cdot \varepsilon.\]Now assume \(f(x) < 0\), so that robustness implies \(f(x + \delta) < 0\). Then
\[\cos\angle(w, \delta) = \frac{w^\intercal\delta}{\|w\| \cdot \varepsilon} = \frac{f(x + \delta) - f(x)}{\|w\| \cdot \varepsilon} < -\frac{f(x)}{\|w\| \cdot \varepsilon}.\]To violate the robustness of \(f\), we need only rotate \(\delta\) such that
\[\cos\angle(w, \delta) \geq -\frac{f(x)}{\|w\| \cdot \varepsilon}.\]This is always possible provided
\[\tag{2} f(x) \geq -\|w\| \cdot \varepsilon.\]Putting (1) and (2) together, we obtain the following sufficient condition for the existence of \(\varepsilon\)-adversarial examples:
\[\tag{3} -\|w\| \cdot \varepsilon \leq f(x) \leq \|w\| \cdot \varepsilon \iff |f(x)| \leq \|w\| \cdot \varepsilon.\]Specifically, if (3) is satisfied, then we can rotate \(\delta\) such that \(\|\delta\| = \varepsilon\) but a necessary condition for \(\varepsilon\)-robustness of \(f\) at \(x\) is violated, and hence \(f\) cannot be \(\varepsilon\)-robust at \(x\). Using sub-Gaussianity of \(f(x)\), a union bound over the two tail bounds of the lemma yields
\[\Pr[|f(x)| \leq \|w\| \cdot \varepsilon] \geq 1 - 2\exp(-\varepsilon^2\|w\|^2/2\sigma^2).\]To prove the existence of vulnerable samples \(x\), we need only show that
\[1 - 2\exp(-\varepsilon^2\|w\|^2/2\sigma^2) > 1 - \tau \iff \varepsilon > \frac{\sigma}{\|w\|}\sqrt{2\ln\frac{2}{\tau}}.\]This completes the proof. \(\blacksquare\)
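Note that the proof is constructive: whenever condition (3) holds, rotating \(\delta\) all the way to \(\delta = -\varepsilon \, \mathrm{sign}(f(x)) \, w / \|w\|\) achieves \(\cos\angle(w, \delta) = -\mathrm{sign}(f(x))\) and flips the sign of \(f\). A minimal sketch, where the weights, bias, input, and budget are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50
w = rng.standard_normal(n)           # placeholder linear classifier
b = 0.1

def f(x):
    return w @ x + b

x = rng.uniform(-1.0, 1.0, size=n)   # placeholder input sample
eps = abs(f(x)) / np.linalg.norm(w) * 1.01   # just large enough for (3)

# The rotation from the proof, made explicit: align delta with
# -sign(f(x)) * w, so that cos(angle(w, delta)) = -sign(f(x)).
delta = -eps * np.sign(f(x)) * w / np.linalg.norm(w)

assert abs(f(x)) <= np.linalg.norm(w) * eps   # condition (3) holds
print(f"f(x) = {f(x):+.4f}  ->  f(x + delta) = {f(x + delta):+.4f}")
```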
Setting \(\tau = 1\), so that the theorem guarantees a vulnerable set of positive probability (and hence mere existence), we obtain
Corollary. Let \(f(x)\) be sub-Gaussian with variance proxy \(\sigma^2\) when \(x\) is sampled according to the data distribution, and let
\[\varepsilon > \frac{\sigma}{\|w\|}\sqrt{2\ln 2}.\]Then there exist \(\varepsilon\)-adversarial examples for \(f\) at samples \(x\) in the support of the data distribution.
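As a quick Monte Carlo check of the theorem's probability bound, take standard Gaussian inputs as a stand-in data distribution; with \(b = 0\), \(f(x) = w^\intercal x\) is then sub-Gaussian with variance proxy exactly \(\|w\|^2\):

```python
import numpy as np

rng = np.random.default_rng(0)

n, tau = 100, 0.05
w = rng.standard_normal(n)
sigma = np.linalg.norm(w)   # f(x) = w @ x with x ~ N(0, I): sigma = ||w||

# Slightly exceed the theorem's threshold on epsilon.
eps = (sigma / np.linalg.norm(w)) * np.sqrt(2 * np.log(2 / tau)) * 1.01

x = rng.standard_normal((1_000_000, n))
fx = x @ w
frac = np.mean(np.abs(fx) <= np.linalg.norm(w) * eps)
print(f"vulnerable fraction: {frac:.4f}  (theorem: > {1 - tau})")
```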
The corollary essentially shows that, under a very mild assumption, a linear classifier cannot have adversarial robustness exceeding \(\sigma\sqrt{2\ln 2}/\|w\|\). Assuming the entries of \(w\) are of roughly constant magnitude, the \(\ell_2\) norm \(\|w\|\) scales like \(\sqrt{n}\), where \(n\) is the data dimensionality, and so the robustness of a linear classifier \(f(x) = w^\intercal x + b\) with bounded inputs \(x\) scales like
\[\mathcal{O}\left( \frac{\sigma}{\sqrt{n}} \right).\]A final important remark: we needed only sub-Gaussianity of \(f(x)\) when \(x\) is sampled from the data distribution. Hence this existence result refers not to arbitrary vectors \(x \in \mathbb{R}^n\) but to actual samples that have positive probability (or density) under the data distribution. This is in contrast to many similar existing results, which are often overly pessimistic in the sense that they only prove robustness bounds over arbitrary vectors \(x \in \mathbb{R}^n\) rather than over samples supported by the data distribution.
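To close, here is the scaling claim made concrete. The constant-magnitude weights and the fixed variance proxy \(\sigma = 1\) are assumptions for the sake of the example:

```python
import numpy as np

sigma = 1.0                    # fixed variance proxy for f(x) (assumed)

for n in (10, 100, 1_000, 10_000):
    w = np.ones(n)             # constant-magnitude entries: ||w|| = sqrt(n)
    radius = sigma * np.sqrt(2 * np.log(2)) / np.linalg.norm(w)
    print(f"n = {n:>6}: maximum robust radius ~ {radius:.4f}")
```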