The CIC Modbus Dataset contains network (pcap) captures and attack logs from a simulated substation network. The dataset is categorized into two groups: an attack dataset and a benign dataset.
The attack dataset includes network traffic captures that simulate various types of Modbus protocol attacks in a substation environment. The attacks are reconnaissance, query flooding, loading payloads, delay response, modify length parameters, false data injection, stacking Modbus frames, brute force write, and baseline replay. These attacks are based on some techniques in the MITRE ATT&CK for ICS framework.
The benign dataset, by contrast, consists of normal network traffic captures representing legitimate Modbus communication within the substation network.
The purpose of this dataset is to facilitate research, analysis, and development of intrusion detection systems, anomaly detection algorithms and other security mechanisms for substation networks using the Modbus protocol.
The CIC Modbus Dataset was generated from Wireshark captures obtained from a simulated testbed. The testbed is a Docker environment in which containers represent intelligent electronic devices (IEDs) and SCADA human-machine interfaces (HMIs). Python scripts were written to run the logic of the IEDs and SCADA HMIs.
An IED changes its voltage values randomly at periodic intervals, or when a SCADA HMI sends a request to do so. A SCADA HMI performs tap changes based on the values it receives from an IED, and closes or opens in response to overvoltage or undervoltage.
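The device logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the testbed's actual scripts: the nominal voltage, the tap and trip bands, the tap step size, the `Ied` class, and the `hmi_decide` and `poll_once` functions are all assumptions made for the example, as is the reading that "open" and "close" refer to a breaker that opens when the voltage leaves the trip band.

```python
import random
from dataclasses import dataclass

# All numeric values below are illustrative assumptions, not dataset values.
NOMINAL_V = 120.0   # assumed nominal voltage
TAP_BAND = 0.05     # assumed: request a tap change beyond +/-5% deviation
TRIP_BAND = 0.10    # assumed: open the breaker beyond +/-10% deviation
TAP_STEP_V = 2.0    # assumed voltage shift per tap step


@dataclass
class Ied:
    """Hypothetical stand-in for one IED's state."""
    voltage: float = NOMINAL_V
    breaker_closed: bool = True

    def drift(self, rng):
        """Periodic random change of the voltage value."""
        self.voltage += rng.uniform(-5.0, 5.0)

    def apply_tap(self, step):
        """Change requested by the SCADA HMI: each step shifts the voltage."""
        self.voltage += step * TAP_STEP_V


def hmi_decide(voltage):
    """Return (tap_step, breaker_closed) for one voltage reading."""
    deviation = (voltage - NOMINAL_V) / NOMINAL_V
    if abs(deviation) > TRIP_BAND:
        return 0, False      # overvoltage or undervoltage: open
    if deviation > TAP_BAND:
        return -1, True      # voltage high: tap down, stay closed
    if deviation < -TAP_BAND:
        return 1, True       # voltage low: tap up, stay closed
    return 0, True           # within band: no action


def poll_once(ied, rng):
    """One cycle: the IED drifts, the HMI reads the value and reacts."""
    ied.drift(rng)
    step, closed = hmi_decide(ied.voltage)
    ied.apply_tap(step)
    ied.breaker_closed = closed
    return step, closed


if __name__ == "__main__":
    rng = random.Random(0)
    ied = Ied()
    for _ in range(5):
        step, closed = poll_once(ied, rng)
        print(f"voltage={ied.voltage:.2f} tap_step={step} closed={closed}")
```

In the testbed these exchanges happen over Modbus between containers; the sketch keeps both roles in one process only to show the decision logic.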
The containers were built to contain either the detection code (Java jar files) and scripts, or only the scripts. IEDs or SCADA HMIs that contain only the scripts are the insecure devices. The secure IEDs or SCADA HMIs contain both the jar files and scripts. Each secure device contains an agent that sends detection scores to a central agent.
The CIC Modbus Dataset was collected using the following methods:
The CIC Modbus Dataset is provided in the following formats:
The CIC Modbus Dataset includes several fields or attributes across the different files. Here is a breakdown of the fields, their data types, possible values or categories and explanations.
The IPs of the devices are shown below:
The CIC Modbus Dataset provides valuable resources for various research and practical applications, including:
To facilitate accurate labeling and analysis, it is recommended to extract IP-specific versions of the pcap files for research purposes. This allows for precise identification and classification of network traffic associated with specific IP addresses.
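One way to produce such IP-specific files is sketched below using only the Python standard library. The sketch assumes classic pcap files (not pcapng) with Ethernet link type and IPv4 traffic; the function names `frame_involves_ip`, `filter_pcap_by_ip`, and `extract_ip_pcap` are hypothetical, and the address in the usage note is a placeholder rather than one of the dataset's device IPs.

```python
import socket
import struct

# Magic bytes of classic pcap files mapped to the struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def frame_involves_ip(frame, target):
    """True if an Ethernet frame carries IPv4 with target as src or dst."""
    if len(frame) < 14:
        return False
    ethertype, ip_start = frame[12:14], 14
    if ethertype == b"\x81\x00" and len(frame) >= 18:  # 802.1Q VLAN tag
        ethertype, ip_start = frame[16:18], 18
    if ethertype != b"\x08\x00" or len(frame) < ip_start + 20:
        return False
    src = frame[ip_start + 12:ip_start + 16]
    dst = frame[ip_start + 16:ip_start + 20]
    return target in (src, dst)


def filter_pcap_by_ip(data, ip):
    """Return pcap bytes holding only the packets sent to or from ip."""
    endian = PCAP_MAGICS.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")
    target = socket.inet_aton(ip)
    out = [data[:24]]            # keep the 24-byte global header unchanged
    offset = 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        end = offset + 16 + incl_len
        if end > len(data):
            break                # truncated final record: stop cleanly
        if frame_involves_ip(data[offset + 16:end], target):
            out.append(data[offset:end])
        offset = end
    return b"".join(out)


def extract_ip_pcap(src_path, dst_path, ip):
    """Read src_path, write the packets involving ip to dst_path."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(filter_pcap_by_ip(data, ip))
```

A call such as `extract_ip_pcap("capture.pcap", "capture_192.0.2.10.pcap", "192.0.2.10")` would then write one per-device file; both file names and the address are placeholders. The sketch reads the whole capture into memory, so very large files may be better handled by a streaming reader or a dedicated packet library.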
Webinar example of dataset use: "Securing Substations with Trust, Risk Posture, and Multi-Agent Systems: A Comprehensive Approach" by Dr. Kwasi Boakye-Boateng, Postdoctoral Fellow, Canadian Institute for Cybersecurity and Q&A with Sumit Kundu.
The creators of the CIC Modbus Dataset would like to acknowledge the following organizations for their contributions and support:
For any inquiries, feedback, or collaboration opportunities related to the CIC Modbus Dataset, please contact:
You may redistribute, republish, and mirror the CIC Modbus Dataset 2023 in any form. However, any use or redistribution of the data must include a citation to the CIC Modbus Dataset 2023 and the following paper.
Kwasi Boakye-Boateng, Ali A. Ghorbani, and Arash Habibi Lashkari, "Securing Substations with Trust, Risk Posture and Multi-Agent Systems: A Comprehensive Approach," 20th International Conference on Privacy, Security and Trust (PST), Copenhagen, Denmark, August 2023.