Most recent studies on DoS or DDoS detection models target general networks. As a result, no DoS or DDoS dataset exists for electric vehicle (EV) charging infrastructure. In addition, existing datasets record the number of packets received during a specific period, whereas our dataset provides more diverse machine learning features, including packet access counts and system status information on charging facilities. The dataset in this work can contribute to EV charging system analyses and provide training and testing features for a DoS or DDoS attack detection classifier. To create this dataset, we developed a simulator that models multiple EVs, charging stations (CSs) and a GS of the charging infrastructure network, and implemented four attack scenarios.
The dataset consists of four attack scenarios based on (i) Correct EV ID, (ii) Wrong EV, (iii) Wrong EV Timestamp and (iv) Wrong CS Timestamp. To generate the dataset, we implemented an EV authentication protocol on the simulator. The protocol uses a session key, a Hash-based Message Authentication Code (HMAC), a timestamp, and AND, OR and XOR operations.
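As a rough illustration of how a session key, an HMAC and a timestamp can be combined in an authentication check, the sketch below uses Python's standard `hmac` module. The message layout (`ev_id|timestamp`), the 5-second freshness window and all function names are assumptions made for illustration; this is not the protocol implemented in the simulator, and the AND, OR and XOR steps are omitted.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 5.0  # assumed freshness window, not taken from the dataset


def make_request(ev_id: str, session_key: bytes, timestamp: float) -> dict:
    """Build an authentication request tagged with HMAC-SHA256."""
    message = f"{ev_id}|{timestamp:.3f}".encode()
    tag = hmac.new(session_key, message, hashlib.sha256).hexdigest()
    return {"ev_id": ev_id, "timestamp": timestamp, "tag": tag}


def verify_request(request: dict, known_keys: dict, now: float) -> bool:
    """Accept only a registered EV ID, a fresh timestamp and a valid tag."""
    session_key = known_keys.get(request["ev_id"])
    if session_key is None:
        return False  # unknown EV ID
    if abs(now - request["timestamp"]) > MAX_SKEW_SECONDS:
        return False  # timestamp outside the accepted window
    message = f"{request['ev_id']}|{request['timestamp']:.3f}".encode()
    expected = hmac.new(session_key, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, request["tag"])
```

In this sketch, a request with an unregistered ID, a manipulated timestamp or a tag computed with the wrong key is rejected, which loosely mirrors the kinds of invalid requests the attack scenarios generate.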
The first step consisted of data collection. A simulator was built in Python in the Ubuntu environment to collect the data. We used ACN-Network for that purpose and retrieved charging times from Sept. 5, 2019 to Sept. 6, 2020. We provide features to reflect different attack scenarios. We identified the following features of the CS and GS on the simulator: (i) data indicating the Linux kernel overhead based on CPU cycles, branch instructions and general instructions, (ii) data representing system performance status based on the number of consumed CPU cycles, branch instructions and general instructions, and (iii) the time differences in legitimate authentication trials or DDoS attacks. The data in (i) and (ii) were collected in real time using Perf.
This dataset consists of profiling results such as the performance overhead and system resource consumption of the profiling target. Another type of feature captures the time differences between EV authentications at each CS.
To create a normal scenario, we used the monthly charging count for one year (Sept. 5, 2019 to Sept. 6, 2020) in the ACN network. Based on this information, a CS and a normal EV charging scenario were constructed. Each CS runs as a separate process, and Perf profiles it by referring to the PID of that process. EVs are created as threads. Fig. 1 shows the structure of the simulator used in this study.
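To make the PID-based profiling concrete, the sketch below builds a `perf stat` command that attaches to one process and parses its comma-separated output for the three counters used in this dataset. The helper names and the sampling window are assumptions, and the exact commands used in our simulator are not reproduced here. The CSV layout (value in the first field, event name in the third) follows `perf stat -x` output but can vary between perf versions.

```python
def build_perf_command(pid: int, seconds: int = 1) -> list:
    """Command that counts events for one PID while `sleep` runs."""
    return [
        "perf", "stat",
        "-x", ",",                               # comma-separated output
        "-e", "cycles,instructions,branches",    # hardware events to count
        "-p", str(pid),                          # attach to the CS or GS process
        "sleep", str(seconds),                   # measurement window
    ]


def parse_perf_csv(text: str) -> dict:
    """Extract {event: count} from `perf stat -x,` output."""
    counters = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        # Skip entries such as "<not counted>" or "<not supported>"
        if value.isdigit():
            counters[event] = int(value)
    return counters
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)`; `perf stat` writes its counters to standard error by default, so that is the stream to parse.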
The normal EV charging scenario was constructed based on a single year of ACN Network data. In simulations, it is impractical to collect data over long periods. Therefore, the long-term normal scenario time is converted based on the time spent in the attack scenario. In this way, a year-long normal charging schedule can be replayed alongside an attack scenario within tens of minutes. In a normal charging scenario, each EV may show a charging time interval of several hours or days at a specific CS, but these intervals are time-scaled using the following formula:
The scenario based on DDoS attacks with false authentication and timestamp manipulation consists of four parts.
When attacks occur on top of the normal scenario, multiple attacking EVs and normal EVs compete with each other to be authenticated by the GS through a CS. Normal EVs attempt authentication with the correct session key and timestamp, while attacking EVs execute the four attack scenarios above. The strength of the attack differs for each scenario. DDoS attacks are of four types:
The full attack mode launches many identical attacking EVs against all CSs simultaneously. In this dataset, 2,000 attack EVs are executed for each CS. The random attack mode arbitrarily chooses the victim CS. The Gaussian analysis attack mode models a smart DDoS attack: it uses Gaussian analysis to create a distribution similar to the normal EV authentication distribution, which makes it difficult for a detection model to differentiate between attack and normal authentication. Although its impact is weaker than that of the full attack, it can cause service delays without being easily detected through statistical analysis.
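The difference between the full and random modes can be sketched as a target-selection step. The function below is hypothetical and only illustrates the idea. It assumes that the random mode picks a single victim CS and reuses the 2,000-EV figure, neither of which is specified above, and it leaves out the Gaussian analysis mode.

```python
import random

ATTACK_EVS_PER_CS = 2000  # per-CS attacker count stated for this dataset


def plan_attack(mode: str, cs_ids: list, rng: random.Random) -> dict:
    """Return how many attacking EVs to launch against each CS."""
    if mode == "full":
        # Every CS is attacked at the same time
        return {cs: ATTACK_EVS_PER_CS for cs in cs_ids}
    if mode == "random":
        # Assumption: one arbitrarily chosen victim CS receives the attack
        victim = rng.choice(cs_ids)
        return {cs: (ATTACK_EVS_PER_CS if cs == victim else 0) for cs in cs_ids}
    raise ValueError(f"unsupported attack mode: {mode}")
```

Passing a seeded `random.Random` instance keeps the choice of victim reproducible between simulation runs.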
If the level of DoS attacks is adjusted by analyzing the Gaussian distribution of the number of authentications in a normal scenario, an attack detection model's false positive and false negative rates can increase. In general, the entropy of the data distributions of the normal and attack scenarios can be maximized by following the natural distribution of the number of authentication requests according to the Gaussian distribution (ϕ). The difference between an attack and a normal authentication attempt then becomes ambiguous, so the difficulty of classification inevitably increases.
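One simple way to picture this Gaussian-shaped attack is to fit a mean and standard deviation to the per-interval authentication counts of a normal scenario and then draw attack request counts from the same distribution. The sketch below does this with Python's standard library. The function names and the example counts are illustrative assumptions, not the procedure used to generate the dataset.

```python
import random
import statistics


def fit_gaussian(normal_counts: list) -> tuple:
    """Estimate mean and sample standard deviation of normal request counts."""
    return statistics.mean(normal_counts), statistics.stdev(normal_counts)


def sample_attack_counts(mean: float, std: float, n_intervals: int,
                         rng: random.Random) -> list:
    """Draw per-interval attack request counts that mimic normal traffic."""
    counts = []
    for _ in range(n_intervals):
        value = round(rng.gauss(mean, std))
        counts.append(max(0, value))  # a request count cannot be negative
    return counts


# Hypothetical authentication counts per interval in a normal scenario
normal_counts = [8, 10, 12, 9, 11, 10, 10]
mu, sigma = fit_gaussian(normal_counts)
attack_counts = sample_attack_counts(mu, sigma, 1000, random.Random(42))
```

Because the sampled counts share the mean and spread of normal traffic, a detector that relies only on request volume has little signal to separate the two.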
The following features were measured in real time with Perf on the CS and GS while normal EV authentications or DDoS attacks occurred.
Feature Name | Description |
---|---|
Time delta | Interval between an authentication and the one immediately preceding it. |
Instruction overhead | Overhead information about libraries and code symbols used in the Linux kernel, obtained by counting the instructions executed by the profiling target (CS or GS). |
CPU cycle overhead | Overhead information about libraries and code symbols used in the Linux kernel, obtained from the cycles that Perf measures for the profiling target. |
Branch overhead | Overhead information about libraries and code symbols used in the Linux kernel, obtained by counting the branch instructions executed by the profiling target. |
Cycles | Number of cycles consumed by the profiling target. |
Instructions | Number of instructions executed by the profiling target. |
Branches | Number of branch instructions executed by the profiling target. |
"Time delta" represents the time difference between consecutive EV authentications. This feature can facilitate cosine similarity analysis to discern DoS or DDoS attacks. "Instruction overhead" is a feature obtained by measuring the overhead of each symbol based on the number of instructions executed in the profiling target CS or GS.
Likewise, "CPU cycle overhead" and "Branch overhead" are features regarding the overhead of symbols. The former concerns the number of cycles consumed in each symbol and the latter concerns the number of branch instructions executed in each symbol. "Cycles," "Instructions" and "Branches" represent the total number of system-wide consumed cycles, instructions and branch instructions in the CS or GS.
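As an illustration of how "Time delta" might feed a cosine similarity analysis, the sketch below derives deltas from authentication timestamps, bins them into a histogram and compares two histograms. The binning scheme and function names are assumptions. The dataset description does not prescribe how the similarity should be computed.

```python
import math


def time_deltas(timestamps: list) -> list:
    """Intervals between consecutive authentications, in input units."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def histogram(deltas: list, bin_width: float, n_bins: int) -> list:
    """Count deltas per fixed-width bin; the last bin absorbs overflow."""
    counts = [0] * n_bins
    for delta in deltas:
        index = min(int(delta / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for an empty histogram
    return dot / (norm_u * norm_v)
```

A low similarity between the delta histogram of an observed window and that of a known normal window would suggest that authentication requests are arriving with an unusual rhythm.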
The dataset directory and file naming structures are as follows:
The authors gratefully acknowledge the support from the Canadian Institute for Cybersecurity (CIC) and the funding support from the Canada Research Chair and Atlantic Canada Opportunities Agency (ACOA).
Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.