CCCS-CIC-AndMal-2020

A Canadian Institute for Cybersecurity (CIC) project in collaboration with the Canadian Centre for Cyber Security (CCCS)

The threat posed by Android malware is the root cause of many security problems on the internet. The Android malware industry is becoming increasingly disruptive, with almost 12,000 new Android malware instances appearing every day. Detecting Android malware on smartphones is therefore an essential goal for the cybersecurity community.

Android malware is one of the most serious threats on the internet and has seen an unprecedented upsurge in recent years. It remains an open challenge for cybersecurity experts. Many machine-learning techniques are available to identify and classify Android malware, and deep learning has recently emerged as a prominent classification method for such samples.

This research work proposes a new, comprehensive, large-scale Android malware dataset named CCCS-CIC-AndMal-2020. The dataset includes 200K benign and 200K malware samples, totalling 400K Android apps, with 14 prominent malware categories and 191 eminent malware families.

1. Introduction

To generate a representative dataset, we collaborated with CCCS to capture 200K Android malware apps, which are labeled and characterized by their corresponding family. Benign Android apps (200K) were collected from the Androzoo dataset to balance the dataset. We collected 14 malware categories: adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy and zero-day.

A complete taxonomy of all the malware families of the captured malware apps is created by dividing them into eight categories: sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus, and storage & settings. The taxonomy is presented in the research paper cited under License (Section 5).

2. Capturing data and final dataset

CCCS supported us in capturing real-world Android malware apps for analysis. We used VirusTotal to determine the malware family of each sample and labeled the dataset by requiring a consensus of 70% of the antivirus engines, which makes the labels more reliable. We then searched for similar malware samples so that samples with similar characteristics are categorized together. Table 1 presents the 14 Android malware categories along with the number of families and samples in each.
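
The 70% consensus rule can be illustrated with a short sketch. This is not the authors' labeling pipeline: the function name `consensus_label`, the input shape (a mapping from engine name to reported family, or `None` when the engine did not flag the sample) and the choice to take the ratio over detecting engines only are assumptions made for the example.

```python
from collections import Counter
from typing import Mapping, Optional


def consensus_label(verdicts: Mapping[str, Optional[str]],
                    threshold: float = 0.70) -> Optional[str]:
    """Return the family reported by at least `threshold` of the detecting engines.

    Assumption: the ratio is computed over engines that flagged the sample,
    not over every engine that scanned it.
    """
    families = [family.lower() for family in verdicts.values() if family]
    if not families:
        return None
    family, votes = Counter(families).most_common(1)[0]
    return family if votes / len(families) >= threshold else None


# Illustrative input only: engine names and verdicts are made up.
example = {"engine_a": "smsreg", "engine_b": "SmsReg", "engine_c": "smsreg",
           "engine_d": "smspay", "engine_e": None}
print(consensus_label(example))  # 3 of the 4 detecting engines agree (75%)
```

A sample whose engines disagree too much returns `None` and would need manual review or a fallback category.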

Table 1: Dataset details

Category          Number of families   Number of samples
Adware            48                   47,210
Backdoor          11                   1,538
File Infector     5                    669
No Category       -                    2,296
PUA               8                    2,051
Ransomware        8                    6,202
Riskware          21                   97,349
Scareware         3                    1,556
Trojan            45                   13,559
Trojan-Banker     11                   887
Trojan-Dropper    9                    2,302
Trojan-SMS        11                   3,125
Trojan-Spy        11                   3,540
Zero-day          -                    13,340
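
Table 1 shows that the categories are far from equal in size. The sketch below only re-enters the numbers from Table 1 and computes each category's share of the malware set, which is the kind of figure typically used when choosing stratified splits or class weights. The variable names are illustrative. With the values as listed, the family counts add up to 191 and the sample counts to 195,624.

```python
# (families, samples) per category, copied from Table 1.
# None marks categories for which no family count is given.
TABLE_1 = {
    "Adware": (48, 47_210),
    "Backdoor": (11, 1_538),
    "File Infector": (5, 669),
    "No Category": (None, 2_296),
    "PUA": (8, 2_051),
    "Ransomware": (8, 6_202),
    "Riskware": (21, 97_349),
    "Scareware": (3, 1_556),
    "Trojan": (45, 13_559),
    "Trojan-Banker": (11, 887),
    "Trojan-Dropper": (9, 2_302),
    "Trojan-SMS": (11, 3_125),
    "Trojan-Spy": (11, 3_540),
    "Zero-day": (None, 13_340),
}

total_samples = sum(samples for _, samples in TABLE_1.values())
total_families = sum(fam for fam, _ in TABLE_1.values() if fam is not None)
print(f"families: {total_families}, samples: {total_samples}")

# Print categories from largest to smallest with their share of all samples.
for category, (_, samples) in sorted(TABLE_1.items(), key=lambda kv: -kv[1][1]):
    print(f"{category:<15} {samples:>7} {samples / total_samples:6.1%}")
```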

The families in each malware category of Table 1, along with the number of captured samples per family, are presented below:


Adware

Sr. No. Family Number of captured samples
1 dowgin 2679
2 adflex 418
3 admogo 79
4 adviator 77
5 adwo 188
6 airpush 2242
7 appad 92
8 appsgeyser 60
9 baiduprotect 984
10 batmobi 458
11 dianjin 45
12 dianle 19
13 domob 103
14 ewind 1047
15 feiwo 108
16 fictus 349
17 ganlet 28
18 adend 301
19 gmobi 17
20 hiddenad 61
21 hummingbad 28
22 igexin 82
23 inmobi 330
24 inoco 5649
25 kalfere 113
26 kuguo 1015
27 leadbolt 233
28 mobclick 41
29 mobidash 1033
30 mobisec 117
31 mulad 171
32 oimobi 913
33 shedun 19036
34 sprovider 227
35 viser 31
36 wooboo 16
37 xynyin 44
38 zdtad 5694
39 frupi 43
40 kyhub 28
41 stopsms 26
42 loki 46
43 kyview 127
44 pandaad 50
45 plague 14
46 accutrack 7
47 adcolony 17
48 gexin 3

Backdoor

Sr. No. Family Number of captured samples
1 kapuser 15
2 kmin 24
3 fobus 171
4 mobby 119
5 hiddad 664
6 moavt 166
7 androrat 129
8 dendroid 48
9 levida 51
10 pyls 24
11 droidkungfu 50

File Infector

Sr. No. Family Number of captured samples
1 commplat 77
2 leech 99
3 tachi 45
4 gudex 14
5 aqplay 407

PUA

Sr. No. Family Number of captured samples
1 apptrack 92
2 cauly 27
3 secapk 1004
4 umpay 67
5 wiyun 11
6 youmi 529
7 utchi 139
8 scamapp 99

Ransomware

Sr. No. Family Number of captured samples
1 masnu 35
2 congur 252
3 fusob 67
4 jisut 820
5 koler 79
6 lockscreen 356
7 slocker 998
8 smsspy 3319

Riskware

Sr. No. Family Number of captured samples
1 skymobi 10229
2 anydown 57
3 badpac 45
4 deng 58
5 dnotua 36
6 jiagu 721
7 metasploit 28
8 mobilepay 1197
9 remotecode 36
10 revmob 806
11 secneo 27
12 smspay 28512
13 smsreg 50073
14 talkw 49
15 tencentprotect 144
16 tordow 7
17 triada 493
18 wapron 93
19 nqshield 46
20 kingroot 24
21 wificrack 15

Scareware

Sr. No. Family Number of captured samples
1 avpass 126
2 mobwin 23
3 fakeapp 1332

Trojan

Sr. No. Family Number of captured samples
1 Autosms 239
2 coinge 16
3 droiddreamlight 15
4 gluper 680
5 hiddenapp 157
6 iconosys 33
7 lotoor 661
8 mobtes 343
9 mseg 148
10 qysly 94
11 rootnik 474
12 syringe 99
13 wkload 143
14 zbot 85
15 hyspu 112
16 basebridge 63
17 boogr 218
18 lovetrap 48
19 oveead 30
20 rusms 27
21 systemmonitor 61
22 uupay 27
23 wintertiger 24
24 typstu 28
25 blouns 652
26 autoins 479
27 cnsms 3413
28 gappusin 766
29 gedma 11
30 ginmaster 130
31 hypay 360
32 mytrackp 1054
33 subspod 11
34 walkfree 15
35 xinyinhe 59
36 drosel 59
37 uapush 11
38 uten 9
39 smsagent 1166
40 styricka 833
41 autoinst 12
42 noicondl 33
43 obtes 5
44 droiddream 3
45 hiddenap 3

Trojan-Banker

Sr. No. Family Number of captured samples
1 asacub 260
2 fakebank 17
3 faketoken 52
4 marcher 87
5 minimob 56
6 guerrilla 256
7 bankbot 4
8 gugi 8
9 svpeng 68
10 wroba 9
11 zitmo 40

Trojan-Dropper

Sr. No. Family Number of captured samples
1 locker 1296
2 rooter 51
3 xiny 31
4 boqx 106
5 hqwar 118
6 ramnit 84
7 ztorg 500
8 gorpo 16

Trojan-SMS

Sr. No. Family Number of captured samples
1 opfake 368
2 hipposms 20
3 podec 13
4 feejar 56
5 smsdel 40
6 plankton 186
7 jsmshider 21
8 smsbot 42
9 boxer 87
10 fakeinst 2148
11 vietsms 13

Trojan-Spy

Sr. No. Family Number of captured samples
1 spynote 21
2 kasandra 29
3 spyagent 48
4 spyoo 13
5 tekwon 19
6 sandr 208
7 qqspy 27
8 smforw 1873
9 smsthief 1058
10 smszombie 52
11 spydealer 1

For benign Android apps, we used the Androzoo dataset, which currently contains more than eight million unique Android apps and is still growing. Androzoo's collection architecture gathers apps from several sources, including the official Android market (Google Play), Anzhi, AppChina, 1mobile, and the Genome project dataset. It maintains a weekly updated list with detailed information about every app, and it provides an HTTP API for downloading the unaltered APKs in full.
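
As a rough illustration of the HTTP API mentioned above, the sketch below builds a download URL for one APK identified by its SHA-256. The endpoint and the `apikey`/`sha256` parameter names follow the public Androzoo documentation as far as we know and should be verified before use; the helper name `androzoo_url`, the upper-casing of the hash and the placeholder key are assumptions of this example.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the current Androzoo documentation before relying on it.
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"


def androzoo_url(sha256: str, api_key: str) -> str:
    """Build the download URL for one APK identified by its SHA-256 hash."""
    sha256 = sha256.strip().upper()
    if len(sha256) != 64 or any(c not in "0123456789ABCDEF" for c in sha256):
        raise ValueError("expected a 64-character hexadecimal SHA-256")
    query = urlencode({"apikey": api_key, "sha256": sha256})
    return f"{ANDROZOO_DOWNLOAD}?{query}"


# Placeholder values: a real API key has to be requested from the maintainers.
url = androzoo_url("0" * 64, "YOUR_API_KEY")
print(url)
```

The actual download is left out on purpose; any HTTP client can fetch the resulting URL once a valid key is available.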

3. Static analysis

AndroidManifest.xml contains many features that can be used for static analysis. The main extracted features include:

  • Activities: An Android activity is one screen of the app's user interface.
  • Broadcast receivers and providers
  • Metadata: An additional option for storing information that can be accessed throughout the entire project.
  • Permissions requested by the application: Permissions protect the privacy of the user and are needed to access sensitive user data (such as contacts and SMS).
  • System features (such as camera and internet)
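
A minimal sketch of how such features could be read from a manifest is given below. It assumes the manifest has already been decoded to plain XML (inside an APK it is stored in a binary form, so a decoder such as apktool is needed first). The function name `manifest_features`, the returned keys and the sample manifest are illustrative and do not reproduce the authors' extraction tool.

```python
import xml.etree.ElementTree as ET

# Attribute namespace used by Android manifests.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> dict:
    """Collect component names, permissions and system features from decoded XML."""
    root = ET.fromstring(manifest_xml)

    def names(path: str) -> list:
        # Return the android:name attribute of every element matching `path`.
        return [el.get(ANDROID_NS + "name") for el in root.iterfind(path)
                if el.get(ANDROID_NS + "name")]

    return {
        "package": root.get("package"),
        "activities": names("application/activity"),
        "receivers": names("application/receiver"),
        "providers": names("application/provider"),
        "meta_data": names(".//meta-data"),
        "permissions": names("uses-permission"),
        "system_features": names("uses-feature"),
    }


# Made-up manifest used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-feature android:name="android.hardware.camera"/>
  <application>
    <activity android:name="com.example.app.MainActivity"/>
    <receiver android:name="com.example.app.BootReceiver"/>
    <meta-data android:name="android.appwidget.provider"/>
  </application>
</manifest>"""

features = manifest_features(SAMPLE)
print(features["permissions"], features["activities"])
```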

Table 2 presents examples of the static features extracted from the captured dataset.

Table 2: List of static features

Feature               Values
Package Name          "com.fb.iwidget"
Activities            "com.fb.iwidget.OverlayActivity"
                      "org.acra.CrashReportDialog"
                      "com.batch.android.BatchActionActivity"
                      "com.fb.iwidget.MainActivity"
                      "com.fb.iwidget.PreferencesActivity"
                      "com.fb.iwidget.PickerActivity"
                      "com.fb.iwidget.IntroActivity"
Services              "com.batch.android.BatchActionService"
                      "com.fb.iwidget.MainService"
                      "com.fb.iwidget.SnapAccessService"
Receivers/Providers   "com.fb.iwidget.ExpandWidgetProvider"
                      "com.fb.iwidget.ActionReceiver"
Intents Actions       "android.accessibilityservice.AccessibilityService"
                      "android.appwidget.action.APPWIDGET_UPDATE"
                      "android.intent.action.BOOT_COMPLETED"
                      "android.intent.action.CREATE_SHORTCUT"
                      "android.intent.action.MAIN"
                      "android.intent.action.MY_PACKAGE_REPLACED"
                      "android.intent.action.USER_PRESENT"
                      "android.intent.action.VIEW"
                      "com.fb.iwidget.action.SHOULD_REVIVE"
Intents Categories    "android.intent.category.BROWSABLE"
                      "android.intent.category.DEFAULT"
                      "android.intent.category.LAUNCHER"
Permissions           "android.permission.ACCESS_NETWORK_STATE"
                      "android.permission.CALL_PHONE"
                      "android.permission.INTERNET"
                      "android.permission.RECEIVE_BOOT_COMPLETED"
                      "android.permission.SYSTEM_ALERT_WINDOW"
                      "com.android.vending.BILLING"
                      "android.permission.BIND_ACCESSIBILITY_SERVICE"
Meta-Data             "android.accessibilityservice"
                      "android.appwidget.provider"
#Icons                331
#Pictures             0
#Videos               0
Audio files           0
Videos                0
Size of the App       4.2M

4. Dynamic analysis

To understand the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment. The main extracted features include:

  • Memory: Memory features describe the activities malware performs by utilizing memory.
  • API: Application Programming Interface (API) features describe the communication between two applications.
  • Network: Network features describe the data transmitted to and received from other devices in the network. They indicate foreground and background network usage.
  • Battery: Battery features describe the malware's access to battery wakelocks and services.
  • Logcat: Logcat features capture the log messages written for functions performed by malware.
  • Process: Process features count the total number of processes the malware interacts with.
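
To make the Logcat category more concrete, the sketch below counts log lines per priority level and reports them under the Logcat feature names listed in Table 3. It is a guess at how such counters could be derived, not the authors' tooling: it assumes the log was captured in logcat's "threadtime" layout, that only the five levels named in Table 3 are counted, and that `Logcat_total` is their sum. The sample lines are invented.

```python
import re

# Priority letter in the "threadtime" layout:
#   MM-DD HH:MM:SS.mmm  PID  TID  P Tag: message
THREADTIME = re.compile(
    r"^\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+\d+\s+\d+\s+([VDIWE])\s")

LEVELS = {"V": "Logcat_verbose", "D": "Logcat_debug", "I": "Logcat_info",
          "W": "Logcat_warning", "E": "Logcat_error"}


def logcat_counts(lines) -> dict:
    """Count parsed log lines per level; unparsable lines are ignored."""
    counts = {name: 0 for name in LEVELS.values()}
    for line in lines:
        match = THREADTIME.match(line)
        if match:
            counts[LEVELS[match.group(1)]] += 1
    # Assumption: the total is the sum of the five per-level counters.
    counts["Logcat_total"] = sum(counts.values())
    return counts


# Invented log excerpt for demonstration.
sample_log = [
    "--------- beginning of main",
    "03-17 16:13:12.123  1234  1256 I ActivityManager: Start proc",
    "03-17 16:13:12.456  1234  1256 W PackageManager: Unknown permission",
    "03-17 16:13:13.001  1234  1300 E AndroidRuntime: FATAL EXCEPTION",
    "03-17 16:13:13.002  1234  1300 D dalvikvm: GC_CONCURRENT freed",
]
print(logcat_counts(sample_log))
```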

Table 3 presents the complete list of features extracted from the dynamic execution of malware.

Table 3: List of dynamic features

Category  Feature
Memory    Memory_PssTotal
          Memory_PssClean
          Memory_SharedDirty
          Memory_PrivateDirty
          Memory_SharedClean
          Memory_PrivateClean
          Memory_SwapPssDirty
          Memory_HeapSize
          Memory_HeapAlloc
          Memory_HeapFree
          Memory_Views
          Memory_ViewRootImpl
          Memory_AppContexts
          Memory_Activities
          Memory_Assets
          Memory_AssetManagers
          Memory_LocalBinders
          Memory_ProxyBinders
          Memory_ParcelMemory
          Memory_ParcelCount
          Memory_DeathRecipients
          Memory_OpenSSLSockets
          Memory_WebViews
API       API_Process_android.os.Process_start
          API_Process_android.app.ActivityManager_killBackgroundProcesses
          API_Process_android.os.Process_killProcess
          API_Command_java.lang.Runtime_exec
          API_Command_java.lang.ProcessBuilder_start
          API_JavaNativeInterface_java.lang.Runtime_loadLibrary
          API_JavaNativeInterface_java.lang.Runtime_load
          API_WebView_android.webkit.WebView_loadUrl
          API_WebView_android.webkit.WebView_loadData
          API_WebView_android.webkit.WebView_loadDataWithBaseURL
          API_WebView_android.webkit.WebView_addJavascriptInterface
          API_WebView_android.webkit.WebView_evaluateJavascript
          API_WebView_android.webkit.WebView_postUrl
          API_WebView_android.webkit.WebView_postWebMessage
          API_WebView_android.webkit.WebView_savePassword
          API_WebView_android.webkit.WebView_setHttpAuthUsernamePassword
          API_WebView_android.webkit.WebView_getHttpAuthUsernamePassword
          API_WebView_android.webkit.WebView_setWebContentsDebuggingEnabled
          API_FileIO_libcore.io.IoBridge_open
          API_FileIO_android.content.ContextWrapper_openFileInput
          API_FileIO_android.content.ContextWrapper_openFileOutput
          API_FileIO_android.content.ContextWrapper_deleteFile
          API_Database_android.content.ContextWrapper_openOrCreateDatabase
          API_Database_android.content.ContextWrapper_databaseList
          API_Database_android.content.ContextWrapper_deleteDatabase
          API_Database_android.database.sqlite.SQLiteDatabase_execSQL
          API_Database_android.database.sqlite.SQLiteDatabase_deleteDatabase
          API_Database_android.database.sqlite.SQLiteDatabase_getPath
          API_Database_android.database.sqlite.SQLiteDatabase_insert
          API_Database_android.database.sqlite.SQLiteDatabase_insertOrThrow
          API_Database_android.database.sqlite.SQLiteDatabase_insertWithOnConflict
          API_Database_android.database.sqlite.SQLiteDatabase_openDatabase
          API_Database_android.database.sqlite.SQLiteDatabase_openOrCreateDatabase
          API_Database_android.database.sqlite.SQLiteDatabase_query
          API_Database_android.database.sqlite.SQLiteDatabase_queryWithFactory
          API_Database_android.database.sqlite.SQLiteDatabase_rawQuery
          API_Database_android.database.sqlite.SQLiteDatabase_rawQueryWithFactory
          API_Database_android.database.sqlite.SQLiteDatabase_update
          API_Database_android.database.sqlite.SQLiteDatabase_updateWithOnConflict
          API_Database_android.database.sqlite.SQLiteDatabase_compileStatement
          API_Database_android.database.sqlite.SQLiteDatabase_create
          API_IPC_android.content.ContextWrapper_sendBroadcast
          API_IPC_android.content.ContextWrapper_sendStickyBroadcast
          API_IPC_android.content.ContextWrapper_startActivity
          API_IPC_android.content.ContextWrapper_startService
          API_IPC_android.content.ContextWrapper_stopService
          API_IPC_android.content.ContextWrapper_registerReceiver
          API_Binder_android.app.ContextImpl_registerReceiver
          API_Binder_android.app.ActivityThread_handleReceiver
          API_Binder_android.app.Activity_startActivity
          API_Crypto_javax.crypto.spec.SecretKeySpec_$init
          API_Crypto_javax.crypto.Cipher_doFinal
          API_Crypto-Hash_java.security.MessageDigest_digest
          API_Crypto-Hash_java.security.MessageDigest_update
          API_DeviceInfo_android.telephony.TelephonyManager_getDeviceId
          API_DeviceInfo_android.telephony.TelephonyManager_getSubscriberId
          API_DeviceInfo_android.telephony.TelephonyManager_getLine1Number
          API_DeviceInfo_android.telephony.TelephonyManager_getNetworkOperator
          API_DeviceInfo_android.telephony.TelephonyManager_getNetworkOperatorName
          API_DeviceInfo_android.telephony.TelephonyManager_getSimOperatorName
          API_DeviceInfo_android.net.wifi.WifiInfo_getMacAddress
          API_DeviceInfo_android.net.wifi.WifiInfo_getBSSID
          API_DeviceInfo_android.net.wifi.WifiInfo_getIpAddress
          API_DeviceInfo_android.net.wifi.WifiInfo_getNetworkId
          API_DeviceInfo_android.telephony.TelephonyManager_getSimCountryIso
          API_DeviceInfo_android.telephony.TelephonyManager_getSimSerialNumber
          API_DeviceInfo_android.telephony.TelephonyManager_getNetworkCountryIso
          API_DeviceInfo_android.telephony.TelephonyManager_getDeviceSoftwareVersion
          API_DeviceInfo_android.os.Debug_isDebuggerConnected
          API_DeviceInfo_android.content.pm.PackageManager_getInstallerPackageName
          API_DeviceInfo_android.content.pm.PackageManager_getInstalledApplications
          API_DeviceInfo_android.content.pm.PackageManager_getInstalledModules
          API_DeviceInfo_android.content.pm.PackageManager_getInstalledPackages
          API_Network_java.net.URL_openConnection
          API_Network_org.apache.http.impl.client.AbstractHttpClient_execute
          API_Network_com.android.okhttp.internal.huc.HttpURLConnectionImpl_getInputStream
          API_Network_com.android.okhttp.internal.http.HttpURLConnectionImpl_getInputStream
          API_DexClassLoader_dalvik.system.BaseDexClassLoader_findResource
          API_DexClassLoader_dalvik.system.BaseDexClassLoader_findResources
          API_DexClassLoader_dalvik.system.BaseDexClassLoader_findLibrary
          API_DexClassLoader_dalvik.system.DexFile_loadDex
          API_DexClassLoader_dalvik.system.DexFile_loadClass
          API_DexClassLoader_dalvik.system.DexClassLoader_$init
          API_Base64_android.util.Base64_decode
          API_Base64_android.util.Base64_encode
          API_Base64_android.util.Base64_encodeToString
          API_SystemManager_android.app.ApplicationPackageManager_setComponentEnabledSetting
          API_SystemManager_android.app.NotificationManager_notify
          API_SystemManager_android.telephony.TelephonyManager_listen
          API_SystemManager_android.content.BroadcastReceiver_abortBroadcast
          API_SMS_android.telephony.SmsManager_sendTextMessage
          API_SMS_android.telephony.SmsManager_sendMultipartTextMessage
          API_DeviceData_android.content.ContentResolver_query
          API_DeviceData_android.content.ContentResolver_registerContentObserver
          API_DeviceData_android.content.ContentResolver_insert
          API_DeviceData_android.content.ContentResolver_delete
          API_DeviceData_android.accounts.AccountManager_getAccountsByType
          API_DeviceData_android.accounts.AccountManager_getAccounts
          API_DeviceData_android.location.Location_getLatitude
          API_DeviceData_android.location.Location_getLongitude
          API_DeviceData_android.media.AudioRecord_startRecording
          API_DeviceData_android.media.MediaRecorder_start
          API_DeviceData_android.os.SystemProperties_get
          API_DeviceData_android.app.ApplicationPackageManager_getInstalledPackages
          API__sessions
Network   Network_TotalReceivedBytes
          Network_TotalReceivedPackets
          Network_TotalTransmittedBytes
          Network_TotalTransmittedPackets
Battery   Battery_wakelock
          Battery_service
Logcat    Logcat_verbose
          Logcat_debug
          Logcat_info
          Logcat_warning
          Logcat_error
          Logcat_total
Process   Process_total
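
Because every feature name in Table 3 starts with its category, the columns of a feature file can be grouped by prefix, for example to train or evaluate on one category at a time. The sketch below assumes that the column names in the distributed files match Table 3; the helper name `group_by_category` and the extra `Category` label column are hypothetical.

```python
from collections import defaultdict

# The six dynamic feature categories from Table 3.
CATEGORIES = ("Memory", "API", "Network", "Battery", "Logcat", "Process")


def group_by_category(columns) -> dict:
    """Group column names by the prefix before the first underscore."""
    groups = defaultdict(list)
    for column in columns:
        prefix = column.split("_", 1)[0]
        # Columns without a known prefix (e.g. labels) go to "other".
        groups[prefix if prefix in CATEGORIES else "other"].append(column)
    return dict(groups)


# A few names from Table 3 plus a hypothetical label column.
columns = [
    "Memory_PssTotal",
    "API_Command_java.lang.Runtime_exec",
    "API__sessions",
    "Network_TotalReceivedBytes",
    "Battery_wakelock",
    "Logcat_total",
    "Process_total",
    "Category",
]
for category, names in group_by_category(columns).items():
    print(category, names)
```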

5. License

You may redistribute, republish and mirror the CCCS-CIC-AndMal-2020 dataset in any form. However, any use or redistribution of the data must include a citation to the CCCS-CIC-AndMal-2020 dataset and the following papers.

David Sean Keyes, Beiqi Li, Gurdip Kaur, Arash Habibi Lashkari, Francois Gagnon, Frederic Massicotte, "EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics", Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS), IEEE, Canada, ON, McMaster University, 2021

Abir Rahali, Arash Habibi Lashkari, Gurdip Kaur, Laya Taheri, Francois Gagnon, and Frédéric Massicotte, "DIDroid: Android Malware Classification and Characterization Using Deep Image Learning", 10th International Conference on Communication and Network Security (ICCNS2020), Pages 70–82, Tokyo, Japan, November 2020

Acknowledgements

We thank the Mitacs Globalink Program for providing the Research Internship (GRI) opportunity and the Harrison McCain Young Scholar Foundation funds from the University of New Brunswick (UNB) for supporting this project. We also thank CCCS for sharing the malware samples of this dataset with us.
