Sophisticated Android malware can detect when it is running in an emulator used by a malware analyst and, in response, alter its behaviour to evade detection. To overcome this issue, we installed the Android applications on real devices and captured their network traffic.
The CICAAGM dataset was captured by installing Android apps on real smartphones in a semi-automated manner. The dataset is generated from 1,900 applications in the following three categories:
Adware
Airpush: Designed to deliver unsolicited advertisements to the user's system and to steal information.
Dowgin: Designed as an advertisement library that can also steal the user’s information.
Kemoge: Designed to take over a user's Android device. This adware is a botnet hybrid that disguises itself as popular apps via repackaging.
Mobidash: Designed to display ads and to compromise the user's personal information.
Shuanet: Similar to Kemoge, Shuanet is also designed to take over a user’s device.
General malware
AVpass: Designed to be distributed in the guise of a Clock app.
FakeAV: Designed as a scam that tricks users into purchasing a full version of the software in order to remediate nonexistent infections.
FakeFlash/FakePlayer: Designed as a fake Flash app that, once installed, directs users to a website.
GGtracker: Designed for SMS fraud (sends SMS messages to a premium-rate number) and information stealing.
Penetho: Designed as a fake service (a hacktool for Android devices that can be used to crack WiFi passwords). The malware can also infect the user's computer via infected email attachments, fake updates, external media, and infected documents.
Benign
2015 Google Play market (top free popular and top free new)
2016 Google Play market (top free popular and top free new)
The CICAAGM dataset is publicly available to researchers and consists of the following items:
.pcap files - the network traffic of both the malware and benign apps (20% malware and 80% benign)
.csv files - the network traffic features extracted by the CIC-flowmeter
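The .csv feature files can be inspected with a short script. The sketch below is in Python (an assumption; the dataset description names no language) and counts flow records per label. It assumes each CSV has a header row and a label column; the column name "Label" and the label values used in the example are hypothetical placeholders, so check them against the actual files before relying on the output.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def count_flows_by_label(csv_file: TextIO, label_column: str = "Label") -> Counter:
    """Count flow records per label in a flow-feature CSV.

    Assumption: the file has a header row and a label column. The default
    name "Label" is hypothetical; pass the real column name if it differs.
    Header names are stripped of surrounding whitespace because some
    exports pad them (e.g. " Label").
    """
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader)]
    try:
        label_idx = header.index(label_column)
    except ValueError:
        raise KeyError(f"column {label_column!r} not found in header") from None
    counts: Counter = Counter()
    for row in reader:
        if not row:  # skip blank lines
            continue
        counts[row[label_idx].strip()] += 1
    return counts


def label_fractions(counts: Counter) -> Dict[str, float]:
    """Convert per-label counts into fractions of all flows."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical in-memory sample standing in for one of the .csv files.
    sample = io.StringIO(
        "Flow ID, Flow Duration, Label\n"
        "a,10,BENIGN\n"
        "b,20,BENIGN\n"
        "c,5,ADWARE\n"
    )
    counts = count_flows_by_label(sample)
    print(counts)  # Counter({'BENIGN': 2, 'ADWARE': 1})
    print(label_fractions(counts))
```

For a real file, open it with `open(path, newline="")` and pass the handle to `count_flows_by_label`. Note that the counts are per flow, not per app.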
If you use our dataset, please cite the related paper, which outlines the details of the dataset and its underlying principles: