The main goal of this research is to provide cybersecurity researchers working on advanced persistent threat (APT) detection with a dataset collected during an APT campaign in an Industrial Internet of Things (IIoT) environment. To achieve this, we designed an attack scenario based on the operations of the APT29 attack group. We implemented the attacks in an IIoT environment and collected both provenance logs and network traffic data.
The main contributions of our research are as follows:
We have developed a simulation testbed to create a controlled environment that supports IIoT research and is particularly effective for simulating APT scenarios. This testbed, built on the Brown-IIoTbed framework architecture, incorporates a mix of virtual and physical components to accurately reflect the complexity and dynamics of real-world IIoT systems.
At the core of our testbed is the NS3 network simulator, which operates on an Ubuntu host. The setup includes two Ubuntu virtual machines and two Kali Linux virtual machines, all hosted on a system running NS3. Additionally, the testbed features two Raspberry Pi devices and two IoT sensors. Raspberry Pi1 is equipped with OpenPLC and utilizes the Modbus protocol for communication, while Raspberry Pi2 functions as a WiFi access point to facilitate enhanced connectivity for the IoT sensors.
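To make the Modbus communication mentioned above more concrete, the following is a minimal sketch of how a Modbus TCP "read holding registers" request is framed. It is not taken from the testbed: the transaction ID, unit ID, register address, and register count are illustrative assumptions, and the helper name `build_read_holding_registers` is hypothetical.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code


def build_read_holding_registers(transaction_id, unit_id, start_addr, quantity):
    """Build a Modbus TCP 'read holding registers' request frame."""
    # PDU: function code (1 byte), starting address (2 bytes), register count (2 bytes)
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_addr, quantity)
    # MBAP header: transaction id, protocol id (0 for Modbus), length, unit id.
    # The length field counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Illustrative values only; they are assumptions, not testbed settings.
frame = build_read_holding_registers(transaction_id=1, unit_id=1, start_addr=0, quantity=2)
print(frame.hex())  # 000100000006010300000002
```

Frames of this shape are what a capture of the PLC traffic would be expected to contain, which can help when inspecting the network logs by hand.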
The dataset contains two data types, provenance data and network logs, each collected during two phases of the experiment. The provenance data files are in CSV format and contain the nodes and edges of the provenance graph. Each node in the provenance data is assigned a unique 32-digit ID, which the edge entries use to establish connections between nodes in the graph.
Besides the IDs, the provenance data files comprise 32 features in total. However, because heterogeneous nodes and edges are all stored in a single file, not all features apply to every node or edge type, so many fields are populated with NaN values. The provenance data includes two main node types: Process and Artifact. The Artifact node type is further categorized into subtypes such as file, directory, network socket, link, and unknown, the latter being used for provenance node types that do not fit into the existing subtypes. The common edge types in the provenance graph are: "Used" (from Process to Artifact), "WasGeneratedBy" (WGB; from Artifact to Process), "WasTriggeredBy" (WTB; from Process to Process), and "WasDerivedFrom" (WDF; from Artifact to Artifact).
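The node and edge layout described above can be illustrated with a small sketch that separates nodes from edges in a mixed CSV and checks each edge against the expected source and destination node types. The column names (`id`, `type`, `subtype`, `from`, `to`) and the shortened IDs are assumptions made for illustration; the actual files use their own headers and 32-digit IDs.

```python
import csv
import io

# Expected (source type, destination type) for each common edge type.
EDGE_RULES = {
    "Used": ("Process", "Artifact"),
    "WasGeneratedBy": ("Artifact", "Process"),
    "WasTriggeredBy": ("Process", "Process"),
    "WasDerivedFrom": ("Artifact", "Artifact"),
}

# Hypothetical column names and shortened IDs, for illustration only.
# Fields that do not apply to a row are left empty, mirroring the NaN values.
SAMPLE_CSV = """id,type,subtype,from,to
a1,Process,,,
b2,Artifact,file,,
c3,Process,,,
,Used,,a1,b2
,WasGeneratedBy,,b2,c3
,WasTriggeredBy,,c3,a1
,WasDerivedFrom,,b2,a1
"""


def load_graph(text):
    """Split a mixed node/edge CSV into a node-type map and an edge list."""
    nodes, edges = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        if row["type"] in EDGE_RULES:
            edges.append((row["type"], row["from"], row["to"]))
        else:
            # Any row that is not one of the four common edge types
            # is treated as a node in this sketch.
            nodes[row["id"]] = row["type"]
    return nodes, edges


def invalid_edges(nodes, edges):
    """Return edges whose endpoint node types do not match EDGE_RULES."""
    bad = []
    for edge_type, src, dst in edges:
        if (nodes.get(src), nodes.get(dst)) != EDGE_RULES[edge_type]:
            bad.append((edge_type, src, dst))
    return bad


nodes, edges = load_graph(SAMPLE_CSV)
print(invalid_edges(nodes, edges))  # [('WasDerivedFrom', 'b2', 'a1')]
```

In the sample, the last edge is deliberately wrong (it points from an Artifact to a Process), so it is the only one reported. A check like this can serve as a quick sanity test before building a graph for a detection model.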
The other data type in the dataset is the network logs, captured using NS3 during the experiments and stored in pcap format. These pcap files can be further processed into CSV format, and various features can be extracted from them. The last file in the dataset is the Attack Information file, which contains all necessary information about the attacks performed during phase 2 of the experiments. This information includes the attack time, the attack PID, and the attack category. The file helps researchers analyze how the dataset behaves during the attacks.
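As a rough illustration of the pcap-to-CSV step, the sketch below reads the per-packet record headers of a classic (microsecond-resolution) pcap file using only the standard library and writes timestamp and length columns to CSV. It assumes the captures use the classic pcap layout, and it runs on a synthetic two-packet capture built in memory rather than on the dataset files. The function name `pcap_to_rows` and the CSV column names are hypothetical. For richer features, a dedicated tool such as tshark or Scapy would be the more usual choice.

```python
import csv
import io
import struct


def pcap_to_rows(data):
    """Return (timestamp, captured_len, original_len) for each packet record."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    rows = []
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        rows.append((ts_sec + ts_usec / 1e6, incl_len, orig_len))
        offset += 16 + incl_len  # record header plus captured bytes
    return rows


# Synthetic capture for illustration: global header plus two tiny packets.
capture = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
capture += struct.pack("<IIII", 10, 500000, 4, 4) + b"\x00" * 4
capture += struct.pack("<IIII", 11, 0, 2, 60) + b"\x00" * 2

rows = pcap_to_rows(capture)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["timestamp", "captured_len", "original_len"])  # hypothetical column names
writer.writerows(rows)
print(out.getvalue())
```

The resulting timestamps could then be compared against the attack times in the Attack Information file to mark which packets fall inside an attack window.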
| Tactic | Technique ID | Attack Type | APT Group |
|---|---|---|---|
| Collection | T1074 | Data Staged: Local Data Staging | APT28, APT29, APT39, APT3 |
| | T1005 | Data from Local System | Andariel, APT28, APT29 |
| | T1119 | Automated Collection | APT1, APT28, Chimera |
| | T1113 | Screen Capture | APT28, APT39, Carbanak |
| | T1115 | Clipboard Data | APT29, APT29, APT38 |
| Exfiltration | T1560 | Archive Collected Data: Archive via Utility | APT28, APT29, APT32 |
| | T1041 | Exfiltration Over C2 Channel | Lazarus, APT3, APT32 |
| Command and Control | T1105 | Ingress Tool Transfer | Lazarus, APT29, APT3 |
| Persistence | T1546 | Event Triggered Execution | APT28, APT29, APT3 |
| | T1136 | Create Account: Local Account | Dragonfly, FIN13, APT29 |
| Discovery | T1087 | Account Discovery: Local Account | APT1, APT3, Chimera |
| | T1016 | System Network Configuration Discovery: Internet Connection Discovery | FIN13, Gamaredon, APT29 |
| | | System Network Configuration Discovery: Wi-Fi Discovery | Magic Hound, Wizard Spider |
| | T1033 | System Owner/User Discovery | Chimera, Dragonfly, APT3 |
| | T1518 | Software Discovery | HEXANE, MuddyWater |
| | T1069 | Permission Groups Discovery: Local Groups | Chimera, HEXANE, APT29 |
| | T1082 | System Information Discovery | Chimera, APT3, APT32 |
| | T1083 | File and Directory Discovery | APT28, APT29, APT32 |
| | T1018 | Remote System Discovery | Chimera, APT29, APT32 |
| Credential Access | T1552 | Unsecure Credentials: Credentials In Files | APT3, APT33, FIN13 |
| | | Unsecure Credentials: Bash History | - |
| | T1555 | Credentials from Password Stores: Credentials from Web Browsers | APT33, APT39, HEXANE |
| Lateral Movement | T1021 | Remote Services: SSH | APT29, APT39, Lazarus |
| Defence Evasion | T1036 | Masquerading: Right-to-Left Override | APT28, APT29, Dragonfly |
| | T1485 | Data Destruction | APT38, Gamaredon, Lazarus |
Webinar explanation about CIC IoT datasets: "From Profiling to Protection: Leveraging Datasets for Enhanced IoT Security" by Dr. Sajjad Dadkhah, Assistant Professor and Cybersecurity R&D Team Lead, with Q&A by Sumit Kundu.
YouTube video: CICAPT-IIOT: A Provenance-Based APT Attack Dataset for IIOT Environment by Erfan Ghiasvand, Cybersecurity Software Developer, Canadian Institute for Cybersecurity with introduction and Q&A by Sumit Kundu.
The authors would like to thank the Canadian Institute for Cybersecurity (CIC) and the National Research Council Canada (NRC) for their financial and educational support.
E. Ghiasvand, S. Ray, S. Iqbal, S. Dadkhah, and A. Ghorbani. "Resilience Against APTs: A Provenance-Based IIoT Dataset for Cybersecurity Research," submitted to the ESORICS 2024 Conference.
E. Ghiasvand, S. Ray, S. Iqbal, S. Dadkhah, and A. A. Ghorbani. "CICAPT-IIOT: A provenance-based APT attack dataset for IIoT environment," preprint, July 2024.