This project aims to generate a state-of-the-art dataset for profiling, behavioural analysis, and vulnerability testing of IoT devices that use different protocols, such as IEEE 802.11, ZigBee and Z-Wave. The following illustrates the main objectives of the CIC-IoT dataset project:
The current CIC IoT dataset project and the activities around it can be summarized in the following steps:
Our lab network was configured around a 64-bit Windows machine with two network interface cards: one connected to the network gateway and the other connected to an unmanaged network switch. Wireshark, the open-source network protocol analyzer, listens on both interfaces simultaneously and saves the captured traffic as packet capture (pcap) files. IoT devices that require an Ethernet connection are connected to this switch. A smart automation hub, Vera Plus, is also connected to the unmanaged switch; it creates our wireless IoT environment and serves IoT devices compatible with Wi-Fi, ZigBee, Z-Wave and Bluetooth.
To collect the data, we captured the network traffic of the IoT devices passing through the gateway in six different types of experiments, using Wireshark for the manual experiments and dumpcap for the semi-automated ones. All the experiments can be organized as follows:
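For the semi-automated experiments, a script can assemble the dumpcap command that records both interfaces of the capture machine. The sketch below only builds the argument list; the interface names ("Ethernet", "Ethernet 2"), the output file name and the optional stop duration are hypothetical placeholders, not the settings used in the lab. It relies on standard dumpcap options: `-i` (repeatable, one per interface), `-w` (output file) and `-a duration:N` (stop after N seconds). Note that dumpcap writes pcapng by default, which is also the format needed to store several interfaces in one file.

```python
from typing import List, Optional, Sequence

# Hypothetical interface names; "dumpcap -D" lists the real ones on a given machine.
DEFAULT_INTERFACES = ("Ethernet", "Ethernet 2")


def build_dumpcap_command(
    output_file: str,
    interfaces: Sequence[str] = DEFAULT_INTERFACES,
    duration_s: Optional[int] = None,
) -> List[str]:
    """Build a dumpcap argument list that captures on every given interface."""
    cmd = ["dumpcap"]
    for iface in interfaces:
        # -i may be repeated to capture on several interfaces at once
        cmd += ["-i", iface]
    # write the captured packets to this file (pcapng by default)
    cmd += ["-w", output_file]
    if duration_s is not None:
        # autostop condition: end the capture after duration_s seconds
        cmd += ["-a", f"duration:{duration_s}"]
    return cmd
```

A scheduler could then launch one capture per experiment run with `subprocess.run(build_dumpcap_command("run01.pcapng", duration_s=600), check=True)`. Keeping command construction separate from execution makes the exact capture settings easy to log alongside each pcap file.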
After generating the dataset, we performed a case study on transferability: training a model on data collected in our lab and transferring the trained model to another lab for testing. We conducted 20 different experiments based on the number of devices sampled from the United States lab.
Forty-eight features were extracted from both the training dataset from our lab and the testing dataset from the other lab. Three device-type classes were used in this experiment: Audio, Camera and Home Automation. The training dataset required labels, whereas the test dataset did not, since the device type is what the model had to predict.
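The asymmetry between the labelled training set and the unlabelled test set can be illustrated with a toy classifier. This page does not name the classifier used in the case study, so the sketch below swaps in a simple nearest-centroid classifier purely for illustration. It uses two features instead of the 48 described above, and all feature values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train_centroids(features, labels):
    """Average the labelled training vectors of each device type (needs labels)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Label each unlabelled vector with its closest centroid (Euclidean distance)."""
    return [min(centroids, key=lambda label: dist(vec, centroids[label])) for vec in features]


# Hypothetical feature vectors from the training lab, with device-type labels.
train_X = [[0.9, 0.1], [1.0, 0.2], [0.1, 0.9], [0.2, 1.0], [0.5, 0.5]]
train_y = ["Audio", "Audio", "Camera", "Camera", "Home Automation"]
centroids = train_centroids(train_X, train_y)

# Hypothetical vectors from the other lab: no labels are supplied.
test_X = [[0.95, 0.2], [0.45, 0.55]]
predictions = predict(centroids, test_X)
```

Only `train_centroids` consumes labels; `predict` works on raw feature vectors, which mirrors why the test dataset from the other lab needed no labels.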
After training, the model is transferred to the other lab and tested on each device to predict that device's class. For example, if the Amazon Echo Dot is tested on the trained model, the classifier should predict that it belongs to the Audio device type. This works by counting how many of the device's feature samples the classifier assigns to each device type; the device type with the highest count is predicted as the class of the device in question.
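The counting step above is a majority vote over the classifier's per-sample predictions for one device. A minimal sketch, assuming those predictions are already available as a list of device-type strings (the example predictions are hypothetical):

```python
from collections import Counter


def predict_device_type(sample_predictions):
    """Return the device type predicted most often across one device's samples."""
    if not sample_predictions:
        raise ValueError("no predictions for this device")
    # Counter tallies how many samples were assigned to each device type;
    # most_common(1) returns the type with the highest count.
    label, _count = Counter(sample_predictions).most_common(1)[0]
    return label


# Hypothetical per-sample predictions for an Amazon Echo Dot.
echo_dot_predictions = ["Audio", "Camera", "Audio", "Home Automation", "Audio"]
echo_dot_type = predict_device_type(echo_dot_predictions)
```

Here `echo_dot_type` is `"Audio"`, since three of the five samples were classified as Audio. When two device types tie, `Counter.most_common` returns the one encountered first; a different tie-breaking rule would need to be chosen explicitly.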
The main dataset directory (CIC IoT Dataset) contains six subdirectories, one for each experiment, namely:
The project is not currently under active development, but contributions are welcome. Please contact one of the authors of the paper.
YouTube video: Label Flipping Mitigation in Deep-Learning-Based IoT Profiling by Dr. Euclides Carlos Pinto Neto.
Webinar explanation about CIC IoT datasets: "From Profiling to Protection: Leveraging Datasets for Enhanced IoT Security" by Dr. Sajjad Dadkhah, Assistant Professor and Cybersecurity R&D Team Lead with Q&A by Sumit Kundu.
YouTube video: IoTProMo: Securing IoT Networks using Device Profiling and Monitoring by Alireza Zohourian with Q&A by Sumit Kundu.
The authors would like to thank the Canadian Institute for Cybersecurity for its financial and educational support.
Sajjad Dadkhah, Hassan Mahdikhani, Priscilla Kyei Danso, Alireza Zohourian, Kevin Anh Truong, Ali A. Ghorbani, “Towards the development of a realistic multidimensional IoT profiling dataset”, Submitted to: The 19th Annual International Conference on Privacy, Security & Trust (PST2022) August 22-24, 2022, Fredericton, Canada.