Certifiable Defenses against Adversarial Attacks

While neural networks have achieved high performance on a wide range of learning tasks, their accuracy drops significantly in the presence of small adversarial perturbations to their inputs. In the last couple of years, several practical defenses based on regularization and adversarial training have been proposed, but these have often been followed by stronger attacks that defeat them. To escape this cycle, a new line of work focuses on developing certifiably robust classifiers. For these models, one can compute a robustness certificate for a given input sample: a radius such that, for ‘any’ perturbation of the input within that radius, the classification output will ‘provably’ remain unchanged. In this talk, I will present two certifiable defenses: (1) Wasserstein smoothing, which defends against non-additive Wasserstein adversarial attacks, and (2) curvature-based robust training, which certifiably defends against $L_2$ attacks by globally bounding the curvature values of the network.
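
As a sketch of what such a certificate asserts, in notation assumed here for illustration rather than taken from the talk: let $f_k(x)$ be the classifier's score for class $k$ on input $x$, let $d(\cdot,\cdot)$ be the distance that defines the threat model, and let $r(x) \ge 0$ be the certified radius. The certificate is then the statement

$$\arg\max_k f_k(x') = \arg\max_k f_k(x) \quad \text{for all } x' \text{ with } d(x, x') \le r(x).$$

For $L_2$ attacks one would take $d(x, x') = \|x' - x\|_2$; for Wasserstein attacks, which are not additive, $d$ would be a Wasserstein distance between $x$ and $x'$.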

Soheil Feizi (UMD)