IoT-DIAD 2024 | Datasets | Research | Canadian Institute for Cybersecurity | UNB


Canadian Institute for Cybersecurity

CIC IoT-DIAD 2024 dataset

A dual-function dataset for IoT device identification and anomaly detection

The primary goal of this research is to introduce a comprehensive IoT attack dataset that supports both IoT device identification and anomaly detection, with the aim of advancing security analytics for real-world IoT environments. To build it, 33 distinct attacks were carried out within an IoT topology of 105 devices at the Canadian Institute for Cybersecurity.

These attacks are classified into seven categories: DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai. All attacks are executed by malicious IoT devices targeting other IoT devices.

The proposed approach combines packet-based and flow-based feature extraction to obtain a diverse set of features for robust anomaly detection and device classification. The novel combined feature set includes attributes from several domains that are specifically extracted for IoT device identification, such as HTTPS-related features, TLS handshake information, and User-Agent strings. For anomaly detection, it adds specialised attributes such as stream, channel, and jitter metrics, which are calculated over several time intervals to improve the model's ability to detect anomalies. The following workflow illustrates the integrated framework for the IoT Device Identification and Anomaly Detection System.

Data descriptions

  • Benign
  • DDoS
  • Brute Force
  • Spoofing
  • DoS
  • Recon
  • Web-based
  • Mirai
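Because the data contain one benign class and seven attack categories, the same labels can support either a binary benign-versus-attack task or a multi-class category task. The following Python sketch shows that mapping; the exact label strings stored in the CSV files are an assumption here (check each subdirectory's README.txt), and the helper names are hypothetical.

```python
# Hypothetical helpers: collapse the eight traffic classes listed above into
# a binary benign/attack label or a stable multi-class index.
# The label strings are assumed to match the category names on this page.

CATEGORIES = ("Benign", "DDoS", "Brute Force", "Spoofing",
              "DoS", "Recon", "Web-based", "Mirai")


def to_binary(category: str) -> int:
    """Return 0 for benign traffic and 1 for any of the seven attack categories."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return 0 if category == "Benign" else 1


def to_index(category: str) -> int:
    """Return an integer class id for multi-class classification."""
    return CATEGORIES.index(category)  # raises ValueError for unknown names
```

For example, `to_binary("Mirai")` returns 1 and `to_index("Recon")` returns 5. Keeping the tuple order fixed means the integer ids stay the same across training runs.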

IoT device tables

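The following table presents the complete set of behaviour-based features extracted with the packet-based approach for both device identification (DI) and anomaly detection (AD). Feature 2 (device_mac) serves as the label for DI (Label 1), and feature 134 is the label for AD (Label 2).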

No Feature name
1 stream
2 (device_mac) Label 1 for DI
3 src_ip
4 dst_ip
5 src_port
6 dst_port
7 inter_arrival_time
8 time_since_previously_displayed_frame
9 port_class_dst
10 l4_tcp
11 l4_udp
12 ttl
13 eth_size
14 tcp_window_size
15 payload_entropy
16 handshake_version
17 handshake_cipher_suites_length
18 handshake_cipher_suites
19 handshake_extensions_length
20 tls_server
21 handshake_sig_hash_alg_len
22 http_request_method
23 http_host
24 http_response_code
25 User_Agent
26 dns_server
27 dns_query_type
28 dns_len_qry
29 dns_interval
30 dns_len_ans
31 eth_src_oui
32 eth_dst_oui
33 payload_length
34 highest_layer
35 http_uri
36 http_content_len
37 http_content_type
38 icmp_type
39 icmp_checksum_status
40 icmp_data_size
41 ntp_interval
42 most_freq_spot
43 min_et
44 q1
45 min_e
46 var_e
47 q1_e
48 sum_p
49 min_p
50 max_p
51 med_p
52 average_p
53 var_p
54 q3_p
55 q1_p
56 iqr_p
57 l3_ip_dst_count
58 jitter
59 stream_1_count
60 stream_1_mean
61 stream_1_var
62 src_ip_1_count
63 src_ip_1_mean
64 src_ip_1_var
65 src_ip_mac_1_count
66 src_ip_mac_1_mean
67 src_ip_mac_1_var
68 channel_1_count
69 channel_1_mean
70 channel_1_var
71 stream_jitter_1_sum
72 stream_jitter_1_mean
73 stream_jitter_1_var
74 stream_5_count
75 stream_5_mean
76 stream_5_var
77 src_ip_5_count
78 src_ip_5_mean
79 src_ip_5_var
80 src_ip_mac_5_count
81 src_ip_mac_5_mean
82 src_ip_mac_5_var
83 channel_5_count
84 channel_5_mean
85 channel_5_var
86 stream_jitter_5_sum
87 stream_jitter_5_mean
88 stream_jitter_5_var
89 stream_10_count
90 stream_10_mean
91 stream_10_var
92 src_ip_10_count
93 src_ip_10_mean
94 src_ip_10_var
95 src_ip_mac_10_count
96 src_ip_mac_10_mean
97 src_ip_mac_10_var
98 channel_10_count
99 channel_10_mean
100 channel_10_var
101 stream_jitter_10_sum
102 stream_jitter_10_mean
103 stream_jitter_10_var
104 stream_30_count
105 stream_30_mean
106 stream_30_var
107 src_ip_30_count
108 src_ip_30_mean
109 src_ip_30_var
110 src_ip_mac_30_count
111 src_ip_mac_30_mean
112 src_ip_mac_30_var
113 channel_30_count
114 channel_30_mean
115 channel_30_var
116 stream_jitter_30_sum
117 stream_jitter_30_mean
118 stream_jitter_30_var
119 stream_60_count
120 stream_60_mean
121 stream_60_var
122 src_ip_60_count
123 src_ip_60_mean
124 src_ip_60_var
125 src_ip_mac_60_count
126 src_ip_mac_60_mean
127 src_ip_mac_60_var
128 channel_60_count
129 channel_60_mean
130 channel_60_var
131 stream_jitter_60_sum
132 stream_jitter_60_mean
133 stream_jitter_60_var
134 Label 2 for AD
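The windowed features above (the stream_*, src_ip_*, src_ip_mac_*, channel_* and stream_jitter_* families) are described on this page only as statistics calculated over different time intervals. The following Python sketch shows one plausible way to compute them, under several assumptions that are not confirmed by the source: each family groups packets by a key (a stream, a source IP, and so on), the suffixes 1, 5, 10, 30 and 60 are trailing windows in seconds, the aggregated quantity is packet size, and jitter means the inter-arrival gaps within the window. The paper's actual definitions (which may, for example, use exponentially damped windows) should be taken from the publication cited below. The function and packet field names are hypothetical.

```python
from collections import defaultdict, deque
from statistics import fmean, pvariance

WINDOWS = (1, 5, 10, 30, 60)  # assumed to be seconds


def windowed_features(packets, prefix, key_fn, windows=WINDOWS, jitter=False):
    """Return one feature dict per packet.

    packets: dicts with at least 'time' (float) and 'size' (int), sorted by time.
    prefix:  feature-name prefix, e.g. 'stream' or 'src_ip'.
    key_fn:  maps a packet to its grouping key (stream id, source IP, ...).
    jitter:  if True, also emit '<prefix>_jitter_<w>_sum/mean/var' computed
             over inter-arrival gaps inside each window.
    """
    history = defaultdict(deque)  # key -> deque of (time, size)
    horizon = max(windows)
    rows = []
    for pkt in packets:
        now = pkt["time"]
        hist = history[key_fn(pkt)]
        hist.append((now, pkt["size"]))
        # Evict packets older than the largest window; the current packet stays.
        while now - hist[0][0] > horizon:
            hist.popleft()
        row = {}
        for w in windows:
            recent = [(t, s) for t, s in hist if now - t <= w]
            sizes = [s for _, s in recent]
            row[f"{prefix}_{w}_count"] = len(sizes)
            row[f"{prefix}_{w}_mean"] = fmean(sizes)
            row[f"{prefix}_{w}_var"] = pvariance(sizes)
            if jitter:
                # Gaps between consecutive packets; 0.0 when only one packet.
                gaps = [b[0] - a[0] for a, b in zip(recent, recent[1:])] or [0.0]
                row[f"{prefix}_jitter_{w}_sum"] = sum(gaps)
                row[f"{prefix}_jitter_{w}_mean"] = fmean(gaps)
                row[f"{prefix}_jitter_{w}_var"] = pvariance(gaps)
        rows.append(row)
    return rows


# Usage: one stream of three packets (field names are hypothetical).
packets = [
    {"time": 0.0, "size": 100, "stream": 7},
    {"time": 0.5, "size": 300, "stream": 7},
    {"time": 3.0, "size": 200, "stream": 7},
]
rows = windowed_features(packets, "stream", lambda p: p["stream"], jitter=True)
print(rows[2]["stream_1_count"], rows[2]["stream_5_count"])  # prints: 1 3
```

The same function would produce the src_ip_mac_* family with a key such as `lambda p: (p["src_ip"], p["src_mac"])`; a deque per key keeps eviction cheap, at the cost of rescanning each window for every packet.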

The following table presents the complete set of flow-based features specifically extracted for anomaly detection in IoT devices.

No Feature name
1 Flow ID
2 Src IP
3 Src Port
4 Dst IP
5 Dst Port
6 Protocol
7 Timestamp
8 Flow Duration
9 Total Fwd Packet
10 Total Bwd packets
11 Total Length of Fwd Packet
12 Total Length of Bwd Packet
13 Fwd Packet Length Max
14 Fwd Packet Length Min
15 Fwd Packet Length Mean
16 Fwd Packet Length Std
17 Bwd Packet Length Max
18 Bwd Packet Length Min
19 Bwd Packet Length Mean
20 Bwd Packet Length Std
21 Flow Bytes/s
22 Flow Packets/s
23 Flow IAT Mean
24 Flow IAT Std
25 Flow IAT Max
26 Flow IAT Min
27 Fwd IAT Total
28 Fwd IAT Mean
29 Fwd IAT Std
30 Fwd IAT Max
31 Fwd IAT Min
32 Bwd IAT Total
33 Bwd IAT Mean
34 Bwd IAT Std
35 Bwd IAT Max
36 Bwd IAT Min
37 Fwd PSH Flags
38 Bwd PSH Flags
39 Fwd URG Flags
40 Bwd URG Flags
41 Fwd Header Length
42 Bwd Header Length
43 Fwd Packets/s
44 Bwd Packets/s
45 Packet Length Min
46 Packet Length Max
47 Packet Length Mean
48 Packet Length Std
49 Packet Length Variance
50 FIN Flag Count
51 SYN Flag Count
52 RST Flag Count
53 PSH Flag Count
54 ACK Flag Count
55 URG Flag Count
56 CWR Flag Count
57 ECE Flag Count
58 Down/Up Ratio
59 Average Packet Size
60 Fwd Segment Size Avg
61 Bwd Segment Size Avg
62 Fwd Bytes/Bulk Avg
63 Fwd Packet/Bulk Avg
64 Fwd Bulk Rate Avg
65 Bwd Bytes/Bulk Avg
66 Bwd Packet/Bulk Avg
67 Bwd Bulk Rate Avg
68 Subflow Fwd Packets
69 Subflow Fwd Bytes
70 Subflow Bwd Packets
71 Subflow Bwd Bytes
72 FWD Init Win Bytes
73 Bwd Init Win Bytes
74 Fwd Act Data Pkts
75 Fwd Seg Size Min
76 Active Mean
77 Active Std
78 Active Max
79 Active Min
80 Idle Mean
81 Idle Std
82 Idle Max
83 Idle Min
84 Label
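CICFlowMeter aggregates packets into bidirectional flows and reports per-direction and whole-flow statistics. The Python sketch below computes a handful of the columns listed above for one already-assembled flow, to show what they measure. It is not CICFlowMeter's implementation: flow assembly and timeouts are left out, taking the forward direction from the flow's first packet is stated here as an assumption, and so are the byte-counting convention and the reporting of durations and inter-arrival times in microseconds. The `Packet` class and `flow_features` function are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Packet:
    time: float    # capture time in seconds
    length: int    # bytes counted for this packet (convention assumed)
    forward: bool  # True if sent in the flow's forward direction


def flow_features(packets):
    """Compute a few CICFlowMeter-style columns for one non-empty flow."""
    pkts = sorted(packets, key=lambda p: p.time)
    fwd = [p for p in pkts if p.forward]
    bwd = [p for p in pkts if not p.forward]
    duration = pkts[-1].time - pkts[0].time  # seconds
    iats = [b.time - a.time for a, b in zip(pkts, pkts[1:])]
    total_bytes = sum(p.length for p in pkts)
    fwd_lengths = [p.length for p in fwd] or [0]
    return {
        "Flow Duration": duration * 1e6,  # microseconds (assumed unit)
        "Total Fwd Packet": len(fwd),
        "Total Bwd packets": len(bwd),
        "Total Length of Fwd Packet": sum(p.length for p in fwd),
        "Total Length of Bwd Packet": sum(p.length for p in bwd),
        "Fwd Packet Length Max": max(fwd_lengths),
        "Fwd Packet Length Mean": fmean(fwd_lengths),
        # Rates are per second; guard single-packet flows (zero duration).
        "Flow Bytes/s": total_bytes / duration if duration else 0.0,
        "Flow Packets/s": len(pkts) / duration if duration else 0.0,
        "Flow IAT Mean": fmean(iats) * 1e6 if iats else 0.0,
    }


# Usage: a three-packet flow spanning two seconds.
flow = [Packet(0.0, 100, True), Packet(0.5, 200, False), Packet(2.0, 100, True)]
print(flow_features(flow)["Flow Bytes/s"])  # prints: 200.0
```

The dictionary keys reuse the column names exactly as they appear in the table, including their inconsistent capitalisation, so the output lines up with the CSV headers.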

Researchers working on IoT device identification or anomaly detection can use the extracted features in the CSV files directly to train machine learning and deep learning models; labels are provided for each task.

Dataset directories

The main dataset directory (CIC IoT-DIAD 2024) contains two subdirectories, each holding network traffic features extracted from the pcap files with a different feature extraction approach:

  • AD_Flow-based-features: Contains features extracted with CICFlowMeter (.csv files). These are intended for anomaly detection (AD) and attack classification studies on IoT devices.
  • DI_AD_Packet-based-features: Contains features extracted through packet-by-packet analysis of the pcap files (.csv files). These features can be used for both device identification (DI) and anomaly detection (AD) studies on IoT devices.
  • README.txt: Each subdirectory contains a README.txt file that describes the features in the corresponding .csv files.
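Given the layout described above, a small Python sketch can index the downloaded files by the task each subdirectory supports. The subdirectory names come from the list above; the root path is a placeholder for wherever the archive is extracted, and because the layout below each subdirectory is not described here, the sketch searches recursively for .csv files. The function name is hypothetical.

```python
from pathlib import Path

# Subdirectory names from the list above, mapped to the tasks each supports.
TASKS_BY_SUBDIR = {
    "AD_Flow-based-features": ("AD",),
    "DI_AD_Packet-based-features": ("DI", "AD"),
}


def index_dataset(root):
    """Return {csv_path: tasks} for every .csv file under the dataset root."""
    index = {}
    for subdir, tasks in TASKS_BY_SUBDIR.items():
        base = Path(root) / subdir
        if not base.is_dir():
            continue  # tolerate a partial download
        for csv_path in sorted(base.rglob("*.csv")):
            index[csv_path] = tasks
    return index


# Usage (placeholder path): index_dataset("CIC IoT-DIAD 2024")
```

Recording the supported tasks alongside each path makes it easy to select, for example, only the packet-based files when training a device-identification model.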

Acknowledgements

The authors express their gratitude to Mastercard Vancouver Tech Hub and the Canadian Institute for Cybersecurity (CIC) for their financial and educational support.

CIC IoT dataset 2023: Neto EC, Dadkhah S, Ferreira R, Zohourian A, Lu R, Ghorbani AA. CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors. 2023 Jun 26;23(13):5941.

Citation

Further details on the feature descriptions, the feature extraction methodology, and the baseline machine learning models used for evaluation and comparison are available in the following paper. Researchers using this dataset are requested to cite it.

M. Rabbani, J. Gui, F. Nejati, Z. Zhou, A. Kaniyamattam, M. Mirani, G. Piya, I. Opushnyev, R. Lu, A. A. Ghorbani. "Device Identification and Anomaly Detection in IoT Environments," IEEE Internet of Things Journal, Dec 2024.
