Android malware is a root cause of many security problems on the internet. The Android malware industry is becoming increasingly disruptive, with almost 12,000 new Android malware instances appearing every day. Detecting Android malware on smartphones is therefore an essential goal for the cybersecurity community.
Android malware is one of the most serious threats on the internet and has seen an unprecedented upsurge in recent years. It remains an open challenge for cybersecurity experts. Many machine-learning techniques are available to identify and classify Android malware, and deep learning has recently emerged as a prominent classification method for such samples.
This research work proposes a new, comprehensive, and large Android malware dataset named CCCS-CIC-AndMal-2020. The dataset includes 200K benign and 200K malware samples, totalling 400K Android apps, and covers 14 prominent malware categories and 191 malware families.
To generate a representative dataset, we collaborated with the Canadian Centre for Cyber Security (CCCS) to capture 200K Android malware apps, each labeled and characterized by its family. Benign Android apps (200K) were collected from the Androzoo dataset to balance the dataset. The malware falls into 14 categories: adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy, and zero-day.
A complete taxonomy of the malware families in the captured apps is created by dividing them into eight categories: sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus, and storage & settings. The taxonomy is presented in the research paper mentioned under license (Section 5).
CCCS helped us capture real-world Android malware apps for analysis. We used VirusTotal to determine the malware family of each sample, and labeled the dataset by following a consensus of 70% of the antivirus engines so that the labels are reliable. We also searched for similar malware samples so that samples with similar characteristics are categorized together. Table 1 presents the 14 Android malware categories along with the number of families and samples in each.
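The 70% consensus rule can be sketched as a simple majority vote over per-engine labels. This is only an illustration under stated assumptions: the input format (a mapping from engine name to an already normalized family name) and the choice to count only engines that returned a label are hypothetical, and the function `consensus_family` is not part of any VirusTotal API.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_family(av_labels: Dict[str, Optional[str]],
                     threshold: float = 0.70) -> Optional[str]:
    """Return the family agreed on by at least `threshold` of the engines.

    `av_labels` maps an engine name to a normalized family name, or None
    when the engine gave no verdict (hypothetical input format).
    """
    # Assumption: engines without a verdict do not take part in the vote.
    votes = [label.lower() for label in av_labels.values() if label]
    if not votes:
        return None
    family, count = Counter(votes).most_common(1)[0]
    # Accept the most frequent family only if it reaches the threshold.
    return family if count / len(votes) >= threshold else None


if __name__ == "__main__":
    report = {
        "EngineA": "smsreg",
        "EngineB": "SmsReg",
        "EngineC": "smsreg",
        "EngineD": "smspay",
        "EngineE": None,
    }
    print(consensus_family(report))
```

Samples for which no family reaches the threshold come back as `None`; how such samples are handled is a design decision the sketch leaves open.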
Table 1: Dataset details
Category | Number of families | Number of samples |
---|---|---|
Adware | 48 | 47,210 |
Backdoor | 11 | 1,538 |
File Infector | 5 | 669 |
No Category | - | 2,296 |
PUA | 8 | 2,051 |
Ransomware | 8 | 6,202 |
Riskware | 21 | 97,349 |
Scareware | 3 | 1,556 |
Trojan | 45 | 13,559 |
Trojan-Banker | 11 | 887 |
Trojan-Dropper | 9 | 2,302 |
Trojan-SMS | 11 | 3,125 |
Trojan-Spy | 11 | 3,540 |
Zero-day | - | 13,340 |
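The sample counts in Table 1 are far from uniform: Riskware alone accounts for roughly half of the listed malware samples. The sketch below computes each category's share and one common inverse-frequency class weight. The weighting formula is a generic heuristic chosen for illustration, not something prescribed by the dataset authors.

```python
# Sample counts copied from Table 1.
SAMPLES = {
    "Adware": 47210, "Backdoor": 1538, "File Infector": 669,
    "No Category": 2296, "PUA": 2051, "Ransomware": 6202,
    "Riskware": 97349, "Scareware": 1556, "Trojan": 13559,
    "Trojan-Banker": 887, "Trojan-Dropper": 2302, "Trojan-SMS": 3125,
    "Trojan-Spy": 3540, "Zero-day": 13340,
}


def category_shares(counts):
    """Fraction of all listed samples that belongs to each category."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts):
    """Illustrative class weights: total / (number_of_classes * count)."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


if __name__ == "__main__":
    shares = category_shares(SAMPLES)
    weights = inverse_frequency_weights(SAMPLES)
    for name in sorted(SAMPLES, key=SAMPLES.get, reverse=True):
        print(f"{name:15s} share={shares[name]:.3f} weight={weights[name]:.2f}")
```

Weights of this kind can be passed to a classifier's loss function so that rare categories such as File Infector are not drowned out by Riskware and Adware.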
The families in each malware category of Table 1, along with the number of captured samples, are presented below:
Adware families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | dowgin | 2679 |
2 | adflex | 418 |
3 | admogo | 79 |
4 | adviator | 77 |
5 | adwo | 188 |
6 | airpush | 2242 |
7 | appad | 92 |
8 | appsgeyser | 60 |
9 | baiduprotect | 984 |
10 | batmobi | 458 |
11 | dianjin | 45 |
12 | dianle | 19 |
13 | domob | 103 |
14 | ewind | 1047 |
15 | feiwo | 108 |
16 | fictus | 349 |
17 | ganlet | 28 |
18 | adend | 301 |
19 | gmobi | 17 |
20 | hiddenad | 61 |
21 | hummingbad | 28 |
22 | igexin | 82 |
23 | inmobi | 330 |
24 | inoco | 5649 |
25 | kalfere | 113 |
26 | kuguo | 1015 |
27 | leadbolt | 233 |
28 | mobclick | 41 |
29 | mobidash | 1033 |
30 | mobisec | 117 |
31 | mulad | 171 |
32 | oimobi | 913 |
33 | shedun | 19036 |
34 | sprovider | 227 |
35 | viser | 31 |
36 | wooboo | 16 |
37 | xynyin | 44 |
38 | zdtad | 5694 |
39 | frupi | 43 |
40 | kyhub | 28 |
41 | stopsms | 26 |
42 | loki | 46 |
43 | kyview | 127 |
44 | pandaad | 50 |
45 | plague | 14 |
46 | accutrack | 7 |
47 | adcolony | 17 |
48 | gexin | 3 |
Backdoor families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | kapuser | 15 |
2 | kmin | 24 |
3 | fobus | 171 |
4 | mobby | 119 |
5 | hiddad | 664 |
6 | moavt | 166 |
7 | androrat | 129 |
8 | dendroid | 48 |
9 | levida | 51 |
10 | pyls | 24 |
11 | droidkungfu | 50 |
File Infector families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | commplat | 77 |
2 | leech | 99 |
3 | tachi | 45 |
4 | gudex | 14 |
5 | aqplay | 407 |
PUA families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | apptrack | 92 |
2 | cauly | 27 |
3 | secapk | 1004 |
4 | umpay | 67 |
5 | wiyun | 11 |
6 | youmi | 529 |
7 | utchi | 139 |
8 | scamapp | 99 |
Ransomware families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | masnu | 35 |
2 | congur | 252 |
3 | fusob | 67 |
4 | jisut | 820 |
5 | koler | 79 |
6 | lockscreen | 356 |
7 | slocker | 998 |
8 | smsspy | 3319 |
Riskware families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | skymobi | 10229 |
2 | anydown | 57 |
3 | badpac | 45 |
4 | deng | 58 |
5 | dnotua | 36 |
6 | jiagu | 721 |
7 | metasploit | 28 |
8 | mobilepay | 1197 |
9 | remotecode | 36 |
10 | revmob | 806 |
11 | secneo | 27 |
12 | smspay | 28512 |
13 | smsreg | 50073 |
14 | talkw | 49 |
15 | tencentprotect | 144 |
16 | tordow | 7 |
17 | triada | 493 |
18 | wapron | 93 |
19 | nqshield | 46 |
20 | kingroot | 24 |
21 | wificrack | 15 |
Scareware families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | avpass | 126 |
2 | mobwin | 23 |
3 | fakeapp | 1332 |
Trojan families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | autosms | 239 |
2 | coinge | 16 |
3 | droiddreamlight | 15 |
4 | gluper | 680 |
5 | hiddenapp | 157 |
6 | iconosys | 33 |
7 | lotoor | 661 |
8 | mobtes | 343 |
9 | mseg | 148 |
10 | qysly | 94 |
11 | rootnik | 474 |
12 | syringe | 99 |
13 | wkload | 143 |
14 | zbot | 85 |
15 | hyspu | 112 |
16 | basebridge | 63 |
17 | boogr | 218 |
18 | lovetrap | 48 |
19 | oveead | 30 |
20 | rusms | 27 |
21 | systemmonitor | 61 |
22 | uupay | 27 |
23 | wintertiger | 24 |
24 | typstu | 28 |
25 | blouns | 652 |
26 | autoins | 479 |
27 | cnsms | 3413 |
28 | gappusin | 766 |
29 | gedma | 11 |
30 | ginmaster | 130 |
31 | hypay | 360 |
32 | mytrackp | 1054 |
33 | subspod | 11 |
34 | walkfree | 15 |
35 | xinyinhe | 59 |
36 | drosel | 59 |
37 | uapush | 11 |
38 | uten | 9 |
39 | smsagent | 1166 |
40 | styricka | 833 |
41 | autoinst | 12 |
42 | noicondl | 33 |
43 | obtes | 5 |
44 | droiddream | 3 |
45 | hiddenap | 3 |
Trojan-Banker families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | asacub | 260 |
2 | fakebank | 17 |
3 | faketoken | 52 |
4 | marcher | 87 |
5 | minimob | 56 |
6 | guerrilla | 256 |
7 | bankbot | 4 |
8 | gugi | 8 |
9 | svpeng | 68 |
10 | wroba | 9 |
11 | zitmo | 40 |
Trojan-Dropper families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | locker | 1296 |
2 | rooter | 51 |
3 | xiny | 31 |
4 | boqx | 106 |
5 | hqwar | 118 |
6 | ramnit | 84 |
7 | ztorg | 500 |
8 | gorpo | 16 |
Trojan-SMS families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | opfake | 368 |
2 | hipposms | 20 |
3 | podec | 13 |
4 | feejar | 56 |
5 | smsdel | 40 |
6 | plankton | 186 |
7 | jsmshider | 21 |
8 | smsbot | 42 |
9 | boxer | 87 |
10 | fakeinst | 2148 |
11 | vietsms | 13 |
Trojan-Spy families:
Sr. No. | Family | Number of captured samples |
---|---|---|
1 | spynote | 21 |
2 | kasandra | 29 |
3 | spyagent | 48 |
4 | spyoo | 13 |
5 | tekwon | 19 |
6 | sandr | 208 |
7 | qqspy | 27 |
8 | smforw | 1873 |
9 | smsthief | 1058 |
10 | smszombie | 52 |
11 | spydealer | 1 |
For benign Android apps, we used the Androzoo dataset, which currently contains more than eight million unique Android apps and is still growing. Androzoo collects apps from several sources, including the official Android market (Google Play), Anzhi, AppChina, 1mobile, and the Genome project dataset. It publishes a weekly updated list with detailed information about every app, and it provides an HTTP API for downloading the unaltered APKs.
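A download through the HTTP API might look like the sketch below. The endpoint and the `apikey` and `sha256` query parameters reflect the public Androzoo documentation as best recalled and should be verified before use. The helper names `build_download_url` and `download_apk` are hypothetical, and an API key must be requested from the Androzoo maintainers.

```python
import re
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

# Assumed endpoint; check the current Androzoo documentation.
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"

_SHA256_RE = re.compile(r"[0-9A-Fa-f]{64}")


def build_download_url(sha256: str, api_key: str) -> str:
    """Build the download URL for one APK identified by its SHA-256."""
    if not _SHA256_RE.fullmatch(sha256):
        raise ValueError("expected a 64-character hexadecimal SHA-256 digest")
    query = urlencode({"apikey": api_key, "sha256": sha256})
    return f"{ANDROZOO_DOWNLOAD}?{query}"


def download_apk(sha256: str, api_key: str, dest_dir: str = ".") -> Path:
    """Stream one APK to <dest_dir>/<sha256>.apk and return its path."""
    dest = Path(dest_dir) / f"{sha256}.apk"
    url = build_download_url(sha256, api_key)
    with urllib.request.urlopen(url, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

The SHA-256 values to request would come from the weekly list of apps mentioned above.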
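AndroidManifest.xml contains many features that can be used for static analysis. The main extracted features include the package name, activities, services, receivers/providers, intent actions and categories, permissions, and meta-data. Alongside these, counts of bundled media files (icons, pictures, videos, audio) and the size of the app are recorded.
Table 2 presents examples of the static features extracted from the captured dataset.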
Table 2: List of static features
Feature | Values |
---|---|
Package Name | "com.fb.iwidget" |
Activities | "com.fb.iwidget.OverlayActivity" "org.acra.CrashReportDialog" "com.batch.android.BatchActionActivity" "com.fb.iwidget.MainActivity" "com.fb.iwidget.PreferencesActivity" "com.fb.iwidget.PickerActivity" "com.fb.iwidget.IntroActivity" |
Services | "com.batch.android.BatchActionService" "com.fb.iwidget.MainService" "com.fb.iwidget.SnapAccessService" |
Receivers/Providers | "com.fb.iwidget.ExpandWidgetProvider" "com.fb.iwidget.ActionReceiver" |
Intents Actions | "android.accessibilityservice.AccessibilityService" "android.appwidget.action.APPWIDGET_UPDATE" "android.intent.action.BOOT_COMPLETED" "android.intent.action.CREATE_SHORTCUT" "android.intent.action.MAIN" "android.intent.action.MY_PACKAGE_REPLACED" "android.intent.action.USER_PRESENT" "android.intent.action.VIEW" "com.fb.iwidget.action.SHOULD_REVIVE" |
Intents Categories | "android.intent.category.BROWSABLE" "android.intent.category.DEFAULT" "android.intent.category.LAUNCHER" |
Permissions | "android.permission.ACCESS_NETWORK_STATE" "android.permission.CALL_PHONE" "android.permission.INTERNET" "android.permission.RECEIVE_BOOT_COMPLETED" "android.permission.SYSTEM_ALERT_WINDOW" "com.android.vending.BILLING" "android.permission.BIND_ACCESSIBILITY_SERVICE" |
Meta-Data | "android.accessibilityservice" "android.appwidget.provider" |
#Icons | 331 |
#Pictures | 0 |
#Videos | 0 |
Audio files | 0 |
Videos | 0 |
Size of the App | 4.2M |
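The manifest-derived rows of Table 2 can be reproduced with a few lines of XML parsing. The following is a minimal sketch, not the authors' extraction tool. It assumes the manifest has already been decoded from the binary form stored in the APK into plain-text XML, and the function name `extract_manifest_features` and the abbreviated `SAMPLE_MANIFEST` are hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = f"{{{ANDROID_NS}}}name"  # fully qualified android:name attribute

# Abbreviated, hand-written manifest used only to exercise the function.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.fb.iwidget">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <activity android:name="com.fb.iwidget.MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
    <service android:name="com.fb.iwidget.MainService"/>
    <receiver android:name="com.fb.iwidget.ActionReceiver"/>
  </application>
</manifest>
""".strip()


def extract_manifest_features(manifest_xml: str) -> dict:
    """Collect component names, intents and permissions from a decoded manifest."""
    root = ET.fromstring(manifest_xml)

    def names(tag: str) -> list:
        # Sorted, de-duplicated android:name values of every <tag> element.
        return sorted({el.get(NAME) for el in root.iter(tag) if el.get(NAME)})

    return {
        "package": root.get("package"),
        "activities": names("activity"),
        "services": names("service"),
        "receivers_providers": names("receiver") + names("provider"),
        "intent_actions": names("action"),
        "intent_categories": names("category"),
        "permissions": names("uses-permission"),
        "meta_data": names("meta-data"),
    }


if __name__ == "__main__":
    for key, value in extract_manifest_features(SAMPLE_MANIFEST).items():
        print(key, value)
```

Real manifests may use relative component names such as `.MainActivity`; resolving them against the package name is left out of this sketch.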
To understand the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment: memory, API calls, network, battery, logcat, and process.
Table 3 presents the complete list of dynamic features extracted from the dynamic execution of the malware.
Table 3: List of dynamic features
Category | Feature |
---|---|
Memory | Memory_PssTotal Memory_PssClean Memory_SharedDirty Memory_PrivateDirty Memory_SharedClean Memory_PrivateClean Memory_SwapPssDirty Memory_HeapSize Memory_HeapAlloc Memory_HeapFree Memory_Views Memory_ViewRootImpl Memory_AppContexts Memory_Activities Memory_Assets Memory_AssetManagers Memory_LocalBinders Memory_ProxyBinders Memory_ParcelMemory Memory_ParcelCount Memory_DeathRecipients Memory_OpenSSLSockets Memory_WebViews |
API | API_Process_android.os.Process_start API_Process_android.app.ActivityManager_killBackgroundProcesses API_Process_android.os.Process_killProcess API_Command_java.lang.Runtime_exec API_Command_java.lang.ProcessBuilder_start API_JavaNativeInterface_java.lang.Runtime_loadLibrary API_JavaNativeInterface_java.lang.Runtime_load API_WebView_android.webkit.WebView_loadUrl API_WebView_android.webkit.WebView_loadData API_WebView_android.webkit.WebView_loadDataWithBaseURL API_WebView_android.webkit.WebView_addJavascriptInterface API_WebView_android.webkit.WebView_evaluateJavascript API_WebView_android.webkit.WebView_postUrl API_WebView_android.webkit.WebView_postWebMessage API_WebView_android.webkit.WebView_savePassword API_WebView_android.webkit.WebView_setHttpAuthUsernamePassword API_WebView_android.webkit.WebView_getHttpAuthUsernamePassword API_WebView_android.webkit.WebView_setWebContentsDebuggingEnabled API_FileIO_libcore.io.IoBridge_open API_FileIO_android.content.ContextWrapper_openFileInput API_FileIO_android.content.ContextWrapper_openFileOutput API_FileIO_android.content.ContextWrapper_deleteFile API_Database_android.content.ContextWrapper_openOrCreateDatabase API_Database_android.content.ContextWrapper_databaseList API_Database_android.content.ContextWrapper_deleteDatabase API_Database_android.database.sqlite.SQLiteDatabase_execSQL API_Database_android.database.sqlite.SQLiteDatabase_deleteDatabase API_Database_android.database.sqlite.SQLiteDatabase_getPath API_Database_android.database.sqlite.SQLiteDatabase_insert API_Database_android.database.sqlite.SQLiteDatabase_insertOrThrow API_Database_android.database.sqlite.SQLiteDatabase_insertWithOnConflict API_Database_android.database.sqlite.SQLiteDatabase_openDatabase API_Database_android.database.sqlite.SQLiteDatabase_openOrCreateDatabase API_Database_android.database.sqlite.SQLiteDatabase_query API_Database_android.database.sqlite.SQLiteDatabase_queryWithFactory API_Database_android.database.sqlite.SQLiteDatabase_rawQuery API_Database_android.database.sqlite.SQLiteDatabase_rawQueryWithFactory API_Database_android.database.sqlite.SQLiteDatabase_update API_Database_android.database.sqlite.SQLiteDatabase_updateWithOnConflict API_Database_android.database.sqlite.SQLiteDatabase_compileStatement API_Database_android.database.sqlite.SQLiteDatabase_create API_IPC_android.content.ContextWrapper_sendBroadcast API_IPC_android.content.ContextWrapper_sendStickyBroadcast API_IPC_android.content.ContextWrapper_startActivity API_IPC_android.content.ContextWrapper_startService API_IPC_android.content.ContextWrapper_stopService API_IPC_android.content.ContextWrapper_registerReceiver API_Binder_android.app.ContextImpl_registerReceiver API_Binder_android.app.ActivityThread_handleReceiver API_Binder_android.app.Activity_startActivity API_Crypto_javax.crypto.spec.SecretKeySpec_$init API_Crypto_javax.crypto.Cipher_doFinal API_Crypto-Hash_java.security.MessageDigest_digest API_Crypto-Hash_java.security.MessageDigest_update API_DeviceInfo_android.telephony.TelephonyManager_getDeviceId API_DeviceInfo_android.telephony.TelephonyManager_getSubscriberId API_DeviceInfo_android.telephony.TelephonyManager_getLine1Number API_DeviceInfo_android.telephony.TelephonyManager_getNetworkOperator API_DeviceInfo_android.telephony.TelephonyManager_getNetworkOperatorName API_DeviceInfo_android.telephony.TelephonyManager_getSimOperatorName API_DeviceInfo_android.net.wifi.WifiInfo_getMacAddress API_DeviceInfo_android.net.wifi.WifiInfo_getBSSID API_DeviceInfo_android.net.wifi.WifiInfo_getIpAddress API_DeviceInfo_android.net.wifi.WifiInfo_getNetworkId API_DeviceInfo_android.telephony.TelephonyManager_getSimCountryIso API_DeviceInfo_android.telephony.TelephonyManager_getSimSerialNumber API_DeviceInfo_android.telephony.TelephonyManager_getNetworkCountryIso API_DeviceInfo_android.telephony.TelephonyManager_getDeviceSoftwareVersion API_DeviceInfo_android.os.Debug_isDebuggerConnected API_DeviceInfo_android.content.pm.PackageManager_getInstallerPackageName API_DeviceInfo_android.content.pm.PackageManager_getInstalledApplications API_DeviceInfo_android.content.pm.PackageManager_getInstalledModules API_DeviceInfo_android.content.pm.PackageManager_getInstalledPackages API_Network_java.net.URL_openConnection API_Network_org.apache.http.impl.client.AbstractHttpClient_execute API_Network_com.android.okhttp.internal.huc.HttpURLConnectionImpl_getInputStream API_Network_com.android.okhttp.internal.http.HttpURLConnectionImpl_getInputStream API_DexClassLoader_dalvik.system.BaseDexClassLoader_findResource API_DexClassLoader_dalvik.system.BaseDexClassLoader_findResources API_DexClassLoader_dalvik.system.BaseDexClassLoader_findLibrary API_DexClassLoader_dalvik.system.DexFile_loadDex API_DexClassLoader_dalvik.system.DexFile_loadClass API_DexClassLoader_dalvik.system.DexClassLoader_$init API_Base64_android.util.Base64_decode API_Base64_android.util.Base64_encode API_Base64_android.util.Base64_encodeToString API_SystemManager_android.app.ApplicationPackageManager_setComponentEnabledSetting API_SystemManager_android.app.NotificationManager_notify API_SystemManager_android.telephony.TelephonyManager_listen API_SystemManager_android.content.BroadcastReceiver_abortBroadcast API_SMS_android.telephony.SmsManager_sendTextMessage API_SMS_android.telephony.SmsManager_sendMultipartTextMessage API_DeviceData_android.content.ContentResolver_query API_DeviceData_android.content.ContentResolver_registerContentObserver API_DeviceData_android.content.ContentResolver_insert API_DeviceData_android.content.ContentResolver_delete API_DeviceData_android.accounts.AccountManager_getAccountsByType API_DeviceData_android.accounts.AccountManager_getAccounts API_DeviceData_android.location.Location_getLatitude API_DeviceData_android.location.Location_getLongitude API_DeviceData_android.media.AudioRecord_startRecording API_DeviceData_android.media.MediaRecorder_start API_DeviceData_android.os.SystemProperties_get API_DeviceData_android.app.ApplicationPackageManager_getInstalledPackages API__sessions |
Network | Network_TotalReceivedBytes Network_TotalReceivedPackets Network_TotalTransmittedBytes Network_TotalTransmittedPackets |
Battery | Battery_wakelock Battery_service |
Logcat | Logcat_verbose Logcat_debug Logcat_info Logcat_warning Logcat_error Logcat_total |
Process | Process_total |
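As an illustration of how one row of Table 3 could be derived, the sketch below counts log lines per priority level to produce the `Logcat_*` features. It assumes the log was captured in logcat's `threadtime` format, and that `Logcat_total` counts every parsed log line. Both are assumptions, since the exact extraction procedure is not described here, and the function name `logcat_features` is hypothetical.

```python
import re
from collections import Counter

# threadtime format: "MM-DD HH:MM:SS.mmm  PID  TID L Tag: message"
_LINE = re.compile(
    r"^\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{3}\s+\d+\s+\d+\s+([VDIWEF])\s"
)

_LEVELS = {
    "V": "Logcat_verbose",
    "D": "Logcat_debug",
    "I": "Logcat_info",
    "W": "Logcat_warning",
    "E": "Logcat_error",
}


def logcat_features(lines):
    """Count log lines per priority level; unparsable lines are skipped."""
    counts = Counter()
    total = 0
    for line in lines:
        match = _LINE.match(line)
        if not match:
            continue  # e.g. "--------- beginning of main"
        total += 1
        name = _LEVELS.get(match.group(1))
        if name:
            counts[name] += 1
    features = {name: counts[name] for name in _LEVELS.values()}
    features["Logcat_total"] = total  # assumption: all parsed lines
    return features


if __name__ == "__main__":
    sample = [
        "--------- beginning of main",
        "03-17 16:13:38.811  1702  8671 D PowerManagerService: acquire lock",
        "03-17 16:13:38.819  1702  8671 I ActivityManager: Start proc",
        "03-17 16:13:39.001  2001  2001 E AndroidRuntime: FATAL EXCEPTION: main",
        "03-17 16:13:39.100  2001  2001 W System.err: something failed",
        "03-17 16:13:39.200  2001  2001 D dalvikvm: GC",
    ]
    print(logcat_features(sample))
```

Filtering the lines by the PID of the app under analysis before counting would restrict the features to a single process; that step is omitted here.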
You may redistribute, republish and mirror the CCCS-CIC-AndMal-2020 dataset in any form. However, any use or redistribution of the data must include a citation to the CCCS-CIC-AndMal-2020 dataset and the following papers.
David Sean Keyes, Beiqi Li, Gurdip Kaur, Arash Habibi Lashkari, Francois Gagnon, Frederic Massicotte, "EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics", Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS), IEEE, Canada, ON, McMaster University, 2021
Abir Rahali, Arash Habibi Lashkari, Gurdip Kaur, Laya Taheri, Francois Gagnon, and Frédéric Massicotte, "DIDroid: Android Malware Classification and Characterization Using Deep Image Learning", 10th International Conference on Communication and Network Security (ICCNS2020), Pages 70–82, Tokyo, Japan, November 2020
We thank the Mitacs Globalink Program for providing the Research Internship (GRI) opportunity and Harrison McCain Young Scholar Foundation funds from University of New Brunswick (UNB) for supporting this project. We also thank CCCS for sharing the malware samples of this dataset with us.