MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers

RJ Joyce
PhD Student
CSEE Department, UMBC

12:00pm (noon) – 1pm
Friday, October 6, 2023
Remotely via WebEx: https://umbc.webex.com/meet/sherman

Recording of Talk

Joint work with Edward Raff, Charles Nicholas, and James Holt

Abstract:

Existing research on malware classification focuses almost exclusively on two tasks: distinguishing between malicious and benign files, and classifying malware by family. Malware, however, can be categorized according to many other types of attributes, and the ability to identify these attributes in newly emerging malware using machine learning will provide significant value to analysts. In particular, we have identified four tasks which are under-represented in prior work: classification by behaviors that malware exhibit, platforms that malware run on, vulnerabilities that malware exploit, and packers that packed the malware. To obtain labels for training and evaluating ML classifiers on these tasks, we created an antivirus (AV) tagging tool called ClarAVy. ClarAVy’s sophisticated AV label parser distinguishes itself from prior AV-based taggers with the ability to accurately parse 882 different AV label formats used by 90 different AV products. We are releasing benchmark datasets for each of these four classification tasks, tagged using ClarAVy and comprising nearly 5.5 million malicious files in total. Our malware behavior dataset includes 75 distinct tags, nearly seven times more than the only prior benchmark dataset with behavioral tags. To our knowledge, we are the first to release datasets with malware platform, exploitation, and packer tags.

About the Speaker:

RJ Joyce (joyce8@umbc.edu) is a PhD student at UMBC under the supervision of Dr. Charles Nicholas and Dr. Edward Raff. Presently, RJ works as a data scientist at Booz Allen Hamilton, performing research at the intersection of malware analysis and machine learning. RJ is also a visiting lecturer at UMBC and is teaching the CMSC-426 Principles of Computer Security course this semester.

Host:

Alan T. Sherman, sherman@umbc.edu

Upcoming CDL Meetings:

  • October 20 (1-2pm) Josh Benaloh (Microsoft), ElectionGuard
  • November 3, Jason Rheinhart (Sandia), Risk analysis
  • November 17 (1-2pm) Austin Murdoch (Sixmap)
  • December 1, Enis Golaszewski (UMBC), Automatic cryptographic bindings

Support for this event was provided in part by the National Science Foundation under SFS grant DGE-1753681.

The UMBC Cyber Defense Lab meets biweekly on Fridays, 12-1 pm. All meetings are open to the public.