How Adversarial Attacks Work. Adversarial attacks against machine learning classifiers involve constructing an input that deliberately produces the wrong classification. This article shows how these can be constructed, and includes examples generated using PyTorch which produce a sports car that gets identified as a toaster and a photo of Sylvester Stallone that gets classified as Keanu Reeves.
Recent articles
- Putting Gemini 2.5 Pro through its paces - 25th March 2025
- New audio models from OpenAI, but how much can we rely on them? - 20th March 2025
- Calling a wrap on my weeknotes - 20th March 2025