Abstract: Computer vision is playing an increasingly important role in automated
malware detection owing to the rise of image-based binary representations.
These binary images are fast to generate, require no feature engineering, and
are resilient to popular obfuscation methods. Significant research has been
conducted in this area; however, it has been restricted to small-scale or
private datasets that only a few industry labs and research teams can access.
This lack of availability hinders examination of existing work, development
of new research, and dissemination of ideas. We introduce MalNet, the largest
publicly available cybersecurity image database, offering 133x more images and
27x more classes than the only other public binary-image database. MalNet
contains over 1.2 million images across a hierarchy of 47 types and 696
families. We provide extensive analysis of MalNet, discussing its properties
and provenance. The scale and diversity of MalNet unlock new and exciting
cybersecurity opportunities for the computer vision community, enabling
discoveries and research directions that were previously not possible. The
database is publicly available at www.mal-net.org.