Obfuscated malware is malware that conceals itself to avoid detection and removal. The obfuscated malware dataset is designed to test memory-based methods for detecting obfuscated malware. It was created to represent a real-world situation as closely as possible, using malware that is prevalent in the real world. Made up of Spyware, Ransomware, and Trojan Horse malware, it provides a balanced dataset that can be used to test obfuscated malware detection systems.
This dataset uses debug mode for the memory dump process to prevent the dumping process itself from appearing in the memory dumps. This gives a more accurate picture of what an average user would have running at the time of a malware attack.
The obfuscated malware dataset focuses on simulating real-world scenarios. Figure 1 shows the breakdown of benign and malicious memory dumps. Figure 2 shows which malware families are used in each malware category: Spyware (a), Ransomware (b), and Trojan Horse (c). Figure 3 shows the malware families used across the whole dataset.
The dataset is balanced: 50% of the memory dumps are malicious and 50% are benign. The breakdown by malware family is shown in the table below. The dataset contains a total of 58,596 records, of which 29,298 are benign and 29,298 are malicious. Figure 4 shows the total count of each malware family in each malware category.
Malware category | Malware families | Count |
---|---|---|
Trojan Horse | | |
Spyware | | |
Ransomware | | |
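The balance claim can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the dataset's tooling: the record counts come from the description above, while the label strings `"Benign"` and `"Malware"` and the helper `class_shares` are hypothetical names chosen for illustration, and the actual files may encode the classes differently.

```python
from collections import Counter

# Record counts as stated in the dataset description.
BENIGN_RECORDS = 29_298
MALICIOUS_RECORDS = 29_298


def class_shares(labels):
    """Return each label's share of the total as a fraction in [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical label values (assumption); substitute the real label column
# when working with the actual dataset files.
labels = ["Benign"] * BENIGN_RECORDS + ["Malware"] * MALICIOUS_RECORDS
shares = class_shares(labels)

print(len(labels))  # 58596
print(shares)       # {'Benign': 0.5, 'Malware': 0.5}
```

The same helper could be applied to the label column of a loaded copy of the dataset to confirm that the 50/50 split holds for the records actually in use, for example after filtering or sampling.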
Tristan Carrier, Princy Victor, Ali Tekeoglu, Arash Habibi Lashkari, “Detecting Obfuscated Malware using Memory Feature Engineering”, The 8th International Conference on Information Systems Security and Privacy (ICISSP), 2022.