Canadian Institute for Cybersecurity

CIC Modbus dataset 2023 (CICModbusDataset2023)

The CIC Modbus Dataset contains network (pcap) captures and attack logs from a simulated substation network. The dataset is categorized into two groups: an attack dataset and a benign dataset.

The attack dataset includes network traffic captures that simulate various types of Modbus protocol attacks in a substation environment. The attacks are reconnaissance, query flooding, loading payloads, delay response, modify length parameters, false data injection, stacking Modbus frames, brute force write and baseline replay. These attacks are based on some of the techniques in the MITRE ICS ATT&CK framework.
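Two of the attacks listed above (modify length parameters and stacking Modbus frames) act on Modbus/TCP framing. The sketch below shows one way to split a TCP payload into frames, assuming standard Modbus/TCP framing in which every frame starts with a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID). The names `MbapFrame` and `split_modbus_frames` are illustrative and are not part of the dataset tooling.

```python
import struct
from typing import List, NamedTuple


class MbapFrame(NamedTuple):
    transaction_id: int
    protocol_id: int
    length: int   # bytes that follow the length field (unit ID + PDU)
    unit_id: int
    pdu: bytes    # function code followed by data


def split_modbus_frames(payload: bytes) -> List[MbapFrame]:
    """Split a TCP payload into Modbus/TCP frames using the MBAP length field.

    Raises ValueError when a header is truncated or the declared length
    does not fit the bytes that are actually present.
    """
    frames = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 7:
            raise ValueError("truncated MBAP header")
        tid, pid, length, uid = struct.unpack_from(">HHHB", payload, offset)
        end = offset + 6 + length  # 6 bytes precede the counted region
        if length < 2 or end > len(payload):
            raise ValueError("declared length inconsistent with payload")
        frames.append(MbapFrame(tid, pid, length, uid, payload[offset + 7:end]))
        offset = end
    return frames
```

More than one frame in a single payload, or a `ValueError` from a length mismatch, may be worth a closer look, but neither is proof of an attack: TCP segmentation can also split or merge legitimate frames.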

The benign dataset, by contrast, consists of normal network traffic captures representing legitimate Modbus communication within the substation network.

The purpose of this dataset is to facilitate research, analysis, and development of intrusion detection systems, anomaly detection algorithms and other security mechanisms for substation networks using the Modbus protocol.

Architecture

The CIC Modbus Dataset was generated from Wireshark captures obtained from a simulated testbed. The testbed is a simulated Docker environment in which Docker containers represent IEDs and SCADA HMIs. Python scripts run the logic of the IEDs and SCADA HMIs.

The logic of an IED is to change its voltage values randomly at periodic intervals, or when it receives a request from the SCADA HMI to do so. The logic of the SCADA HMI is to tap-change based on the values received from the IED, and to close or open based on overvoltage or undervoltage.
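The control loop described above can be sketched in a few lines. This is a rough in-process illustration, not the testbed scripts: the thresholds, the tap step size, the per-unit voltage scale and the names `Ied`, `hmi_decide` and `hmi_poll` are all assumptions, and the Modbus transport between the two devices is left out.

```python
import random

# Hypothetical per-unit values; the dataset description gives none.
NOMINAL = 1.00
TAP_BAND = 0.02    # tap-change when the voltage leaves nominal +/- this band
TRIP_BAND = 0.10   # open when the voltage leaves nominal +/- this band
TAP_STEP = 0.0125  # assumed voltage change per tap step


class Ied:
    """Holds a voltage value that drifts randomly and can be adjusted on request."""

    def __init__(self, voltage: float = NOMINAL, seed=None):
        self.voltage = voltage
        self.closed = True
        self._rng = random.Random(seed)

    def tick(self) -> None:
        # Periodic random change of the voltage value.
        self.voltage += self._rng.uniform(-0.03, 0.03)

    def apply_tap(self, steps: int) -> None:
        # Change requested by the SCADA HMI.
        self.voltage += steps * TAP_STEP


def hmi_decide(voltage: float) -> str:
    """Return the action this HMI sketch takes for a voltage reading."""
    deviation = voltage - NOMINAL
    if abs(deviation) > TRIP_BAND:
        return "open"
    if deviation > TAP_BAND:
        return "tap_down"
    if deviation < -TAP_BAND:
        return "tap_up"
    return "hold"


def hmi_poll(ied: Ied) -> str:
    """Read the IED voltage, apply the chosen action, and return its name."""
    action = hmi_decide(ied.voltage)
    if action == "open":
        ied.closed = False
    elif action == "tap_down":
        ied.apply_tap(-1)
    elif action == "tap_up":
        ied.apply_tap(1)
    return action
```

Calling `ied.tick()` followed by `hmi_poll(ied)` in a loop gives a benign baseline of reads and occasional writes, which is the kind of traffic pattern the attacks in the dataset deviate from.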

The containers were built to contain either the detection code (Java jar files) and scripts, or only the scripts. IEDs or SCADA HMIs that contain only the scripts are the insecure devices. The secure IEDs or SCADA HMIs contain both the jar files and scripts. Each secure device contains an agent that sends detection scores to a central agent.

Data collection

The CIC Modbus Dataset was collected using the following methods:

  • Network interface card (NIC) capture: The network traffic of each Intelligent Electronic Device (IED) within the substation network was captured using tcpdump. This allowed for the collection of specific traffic related to individual devices.
  • Docker bridge capture: The network traffic of the entire substation network was captured by monitoring the Docker bridge. This provided a comprehensive view of the network, including communication between different devices.
  • Attack scenarios: The dataset covers attacks conducted in three different scenarios: attacks from devices external to the network, attacks from compromised IEDs and attacks from compromised Human-Machine Interfaces (HMIs). Each scenario generated specific logs capturing the corresponding attack activity.

Data format

The CIC Modbus Dataset is provided in the following formats:

  • Network captures: The network captures are stored in PCAP (Packet Capture) format. The captures are chunked into 100 MB files that are named in sequential order; each file represents a portion of the overall network traffic.
  • Logs: The logs generated by the attack tools and the trust model are stored in CSV (Comma-Separated Values) format. The logs are grouped by dates, and each record within the log files is timestamped, providing a chronological view of the captured events.
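Because the captures are split into sequentially named chunks, they need to be processed in numeric rather than plain alphabetical order. The sketch below shows a natural-sort helper for that purpose. The file names in the usage note are hypothetical, since the exact naming scheme is not given here.

```python
import re
from typing import List


def natural_key(name: str):
    """Sort key that compares digit runs numerically, so 'x-10' follows 'x-9'."""
    # re.split with a capturing group alternates text and digit runs,
    # so the same list positions always hold the same type.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def order_chunks(names: List[str]) -> List[str]:
    """Return capture chunk names in their sequential order."""
    return sorted(names, key=natural_key)
```

For example, `order_chunks(["capture-10.pcap", "capture-2.pcap", "capture-1.pcap"])` places `capture-2.pcap` before `capture-10.pcap`, which a plain `sorted()` call would not.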

Data dictionary

The CIC Modbus Dataset includes several fields or attributes across the different files. Here is a breakdown of the fields, their data types, possible values or categories and explanations.

PCAP files (network captures)

  • Source IP address: The source IP address of the network packet. (String)
  • Destination IP address: The destination IP address of the network packet. (String)
  • Other IP-related fields: Depending on the specific PCAP file, additional IP-related fields may be present, such as protocol, port numbers, etc.

The IPs of the devices are shown below:

  • Secure IEDs
    • IED1A – 185.175.0.4
    • IED4C – 185.175.0.8
  • Normal IEDs
    • IED1B – 185.175.0.5
  • Secure SCADA HMI – 185.175.0.2
  • Normal SCADA HMI – 185.175.0.3
  • Central Agent – 185.175.0.6
  • Attacker – 185.175.0.7
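For analysis scripts, the address list above can be kept as a lookup table. The sketch below uses the addresses exactly as listed; the names `DEVICE_BY_IP`, `device_name` and `describe_flow` are illustrative.

```python
from typing import Optional

# Addresses as listed in the dataset description.
DEVICE_BY_IP = {
    "185.175.0.2": "Secure SCADA HMI",
    "185.175.0.3": "Normal SCADA HMI",
    "185.175.0.4": "IED1A (secure)",
    "185.175.0.5": "IED1B (normal)",
    "185.175.0.6": "Central Agent",
    "185.175.0.7": "Attacker",
    "185.175.0.8": "IED4C (secure)",
}


def device_name(ip: str) -> Optional[str]:
    """Return the device label for an address, or None if it is not listed."""
    return DEVICE_BY_IP.get(ip)


def describe_flow(src_ip: str, dst_ip: str) -> str:
    """Render a source/destination pair as readable device labels."""
    src = DEVICE_BY_IP.get(src_ip, "unknown")
    dst = DEVICE_BY_IP.get(dst_ip, "unknown")
    return f"{src} -> {dst}"
```

The attacker address alone is not a sufficient label for malicious traffic, since the dataset also covers attacks launched from compromised IEDs and compromised HMIs.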

Logs (CSV files)

  • csv
    • Timestamp: The timestamp of the attack event. (Date or time)
    • TargetIP: The IP address of the targeted device. (String)
    • Attack: The type of attack. (String)
    • TransactionID: The ID of the transaction associated with the attack. (String)
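The attack logs can be read with Python's standard `csv` module. The sketch below assumes that each log file has a header row using the four field names listed above, and it keeps the timestamp as a string because its exact format is not specified here. The function names are illustrative.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

EXPECTED = ("Timestamp", "TargetIP", "Attack", "TransactionID")


def read_attack_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse attack-log rows from an open text file or any iterable of lines."""
    reader = csv.DictReader(lines)
    missing = [c for c in EXPECTED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Short rows yield None for absent fields, so fall back to "".
    return [{c: (row[c] or "").strip() for c in EXPECTED} for row in reader]


def attacks_per_target(rows: List[Dict[str, str]]) -> Counter:
    """Count log records per (TargetIP, Attack) pair."""
    return Counter((r["TargetIP"], r["Attack"]) for r in rows)
```

With a file on disk, `read_attack_log(open(path, newline=""))` returns the rows, and `attacks_per_target(rows).most_common()` gives a quick overview of which devices were targeted by which attack.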

Dataset usage

The CIC Modbus Dataset provides valuable resources for various research and practical applications, including:

  • Research on trust in securing substations: Researchers can utilize the pcap files to analyze trust-related aspects in securing substations. This includes evaluating trust models, assessing the effectiveness of security mechanisms and investigating trust-based intrusion detection systems.
  • Machine learning techniques: The pcap files can serve as a valuable training and evaluation resource for machine learning models. Researchers can develop and apply ML techniques, such as anomaly detection, classification or clustering, to enhance the security of substation networks.

To facilitate accurate labeling and analysis, it is recommended to extract IP-specific versions of the pcap files for research purposes. This allows for precise identification and classification of network traffic associated with specific IP addresses.
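One way to produce an IP-specific version of a capture is to copy only the packets whose IPv4 source or destination matches a given address. The sketch below does this with the standard library only. It assumes classic pcap files (not pcapng) with Ethernet link type and no VLAN tags, which may not hold for every file in the dataset; the function names are illustrative.

```python
import io
import socket
import struct
from typing import BinaryIO

# Magic bytes of classic pcap files, mapped to the struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def filter_pcap_by_ip(src: BinaryIO, dst: BinaryIO, ip: str) -> int:
    """Copy packets whose IPv4 source or destination equals `ip`; return the count."""
    wanted = socket.inet_aton(ip)
    global_header = src.read(24)
    if len(global_header) < 24 or global_header[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    endian = PCAP_MAGICS[global_header[:4]]
    linktype = struct.unpack(endian + "I", global_header[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) is handled in this sketch")
    dst.write(global_header)
    kept = 0
    while True:
        record_header = src.read(16)
        if len(record_header) < 16:
            break
        incl_len = struct.unpack(endian + "I", record_header[8:12])[0]
        frame = src.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated final record
        # EtherType sits at bytes 12-13; IPv4 addresses at bytes 26-33.
        is_ipv4 = len(frame) >= 34 and frame[12:14] == b"\x08\x00"
        if is_ipv4 and wanted in (frame[26:30], frame[30:34]):
            dst.write(record_header)
            dst.write(frame)
            kept += 1
    return kept


def filter_pcap_bytes(data: bytes, ip: str) -> bytes:
    """In-memory convenience wrapper around filter_pcap_by_ip."""
    out = io.BytesIO()
    filter_pcap_by_ip(io.BytesIO(data), out, ip)
    return out.getvalue()
```

With files opened in binary mode, `filter_pcap_by_ip(infile, outfile, "185.175.0.4")` writes a capture containing only traffic to or from IED1A. Established tools such as tcpdump or Wireshark display filters can do the same job and handle more link types.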

Webinar example of dataset use: "Securing Substations with Trust, Risk Posture, and Multi-Agent Systems: A Comprehensive Approach" by Dr. Kwasi Boakye-Boateng, Postdoctoral Fellow, Canadian Institute for Cybersecurity, and Q&A with Sumit Kundu.

Acknowledgments

The creators of the CIC Modbus Dataset would like to acknowledge the following organizations for their contributions and support:

Contact information

For any inquiries, feedback, or collaboration opportunities related to the CIC Modbus Dataset, please contact:

License

You may redistribute, republish and mirror the CIC Modbus Dataset 2023 in any form. However, any use or redistribution of the data must include a citation to the CIC Modbus Dataset 2023 and the following paper:

Kwasi Boakye-Boateng, Ali A. Ghorbani, and Arash Habibi Lashkari, "Securing Substations with Trust, Risk Posture and Multi-Agent Systems: A Comprehensive Approach," 20th International Conference on Privacy, Security and Trust (PST), Copenhagen, Denmark, August 2023.
