1. AutoCyber: A Cold Start Problem

Vehicles are becoming computers on wheels, and with that comes a growing risk of being compromised.

As vehicles become increasingly connected, their attack surface grows. In particular, in-vehicle networks are prone to attack because they were originally designed with no security in mind, and one of the most common attacks is injecting messages onto a vehicle’s CAN Bus.

In fact, the issue received widespread media attention in 2015, and a Senate bill was proposed that year and recently reintroduced in 2019 to “ensure cybersecurity in increasingly computerized vehicles”.

Similar threats exist for aircraft, smart factories, smart buildings, and of course the growing number of IoT appliances. But let’s focus on automotive cybersecurity in this tutorial.


Cyberattacks

Two of the proposed requirements in the SPY Car Act are:

  • All entry points to the electronic systems of each motor vehicle manufactured for sale in the United States shall be equipped with reasonable measures to protect against hacking attacks
  • Any motor vehicle manufactured for sale in the United States that presents an entry point shall be equipped with capabilities to immediately detect, report, and stop attempts to intercept driving data or control the vehicle

It is not obvious that such an intrusion detection system (IDS) could work, but it turns out that with careful system design we can construct one with Human1st.AI. The nature of CAN bus data and vehicle operation is that normal traffic is highly regular (unlike an open node on the internet), and we can leverage this regularity to build an IDS.

Let’s dive in!

1a. CAN data basics

Let’s familiarize ourselves with vehicle data.

Controller Area Network (CAN Bus) is a common in-vehicle network architecture. It was designed to avoid massive bundles of physical wires between Electronic Control Units (ECUs) in a vehicle. The payload of a CAN packet (also called a message) contains data from one or more ECUs, which we refer to as sensors, such as Car Speed, Steering Wheel Angle, Yaw Rate, Longitudinal Acceleration (Gx), and Lateral Acceleration (Gy).

CAN Bus’ simple communication protocol makes it vulnerable to cyberattacks: messages are broadcast to every node on the bus and there is no authentication. Injection attacks are therefore common against CAN Bus.
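To make the payload idea concrete, here is a minimal sketch of decoding an 8-byte CAN payload into sensor values. The byte layout and scaling factors below are made up for illustration; a real vehicle’s signal mapping is defined by its manufacturer (typically in a DBC file), not by this code.

```python
import struct

def decode_yaw_frame(payload: bytes):
    """Decode a hypothetical 8-byte payload into YawRate, Gx, Gy.

    Layout (illustrative only): four big-endian signed 16-bit fields,
    the last one unused. Scales are also made up for illustration.
    """
    assert len(payload) == 8
    raw_yaw, raw_gx, raw_gy, _ = struct.unpack(">hhhh", payload)
    return {
        "YawRate": raw_yaw * 0.01,  # hypothetical scale: 0.01 deg/s per bit
        "Gx": raw_gx * 0.001,       # hypothetical scale: 1 mg per bit
        "Gy": raw_gy * 0.001,
    }

# Build a frame with raw values 19, 2, -2 and decode it back
frame = struct.pack(">hhhh", 19, 2, -2, 0)
print(decode_yaw_frame(frame))  # YawRate ≈ 0.19, Gx ≈ 0.002, Gy ≈ -0.002
```

Because each CAN ID carries a fixed payload layout like this, messages from the same ID always update the same group of sensors together, which is visible in the data below.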

Note
All the tutorial notebooks and code are available from the H1st GitHub project at https://github.com/h1st-ai/h1st/tree/master/examples/AutoCyber.

Simply go ahead and clone it, then follow along.

The following dataset is originally based on https://zenodo.org/record/3267184#.XpHta1NKhQJ, with important additional processing done by Arimo: recreating a realistic message frequency for each CAN ID, which is crucial for this problem. Following along with the tutorial will make clear why this is needed.

For convenience, we provide a utility function to download this dataset, which is about 200 MB in size.


import util
data_files = util.load_data()

Fetching https://h1st-tutorial-autocyber.s3.amazonaws.com/h1st_autocyber_tutorial_data.zip ...
data_files['attack_files'][:5]

['data/attack-samples/20181116_Driver1_Trip4-1.parquet',
 'data/attack-samples/20181203_Driver1_Trip9-2.parquet',
 'data/attack-samples/20181116_Driver1_Trip4-0.parquet',
 'data/attack-samples/20181214_Driver3_Trip7-2.parquet',
 'data/attack-samples/20181203_Driver1_Trip9-1.parquet']
 
 
Timestamp SteeringAngle CarSpeed YawRate Gx Gy Label AttackSensor AttackMethod AttackParams AttackEventIndex
0 0.013574 67.604385 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
1 0.024278 NaN NaN 0.189777 0.002458 -0.002173 Normal NA NA 0.0 <NA>
2 0.024343 NaN 0.0 NaN NaN NaN Normal NA NA 0.0 <NA>
3 0.027083 67.608772 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
4 0.037508 NaN NaN 0.189665 0.002375 -0.002151 Normal NA NA 0.0 <NA>
5 0.038148 67.613159 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
6 0.043605 67.617538 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
7 0.045474 NaN 0.0 NaN NaN NaN Normal NA NA 0.0 <NA>
8 0.048622 NaN NaN 0.189554 0.002292 -0.002130 Normal NA NA 0.0 <NA>
9 0.056639 67.621925 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
10 0.059873 67.626312 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
11 0.062288 NaN NaN 0.189442 0.002208 -0.002109 Normal NA NA 0.0 <NA>
12 0.068527 NaN 0.0 NaN NaN NaN Normal NA NA 0.0 <NA>
13 0.072090 67.630699 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
14 0.073968 NaN NaN 0.189330 0.002125 -0.002088 Normal NA NA 0.0 <NA>
15 0.083613 67.635086 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
16 0.086092 NaN NaN 0.189219 0.002041 -0.002066 Normal NA NA 0.0 <NA>
17 0.092441 NaN 0.0 NaN NaN NaN Normal NA NA 0.0 <NA>
18 0.096228 67.639473 NaN NaN NaN NaN Normal NA NA 0.0 <NA>
19 0.096762 67.643860 NaN NaN NaN NaN Normal NA NA 0.0 <NA>

Note that the data has a particular rhythm to it: each non-NA CarSpeed or YawRate value arrives at a regular interval, and YawRate/Gx/Gy messages always arrive together. In technical parlance, these are three different CAN IDs with different message payloads.
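This rhythm can be made visible by computing the inter-arrival time of each message type. The sketch below uses a tiny synthetic frame with the same column layout as the tutorial dataframe (the timestamps and values are made up, but mimic the ~22 ms CarSpeed and ~12 ms YawRate periods seen above):

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the tutorial dataframe: CarSpeed messages
# every ~22 ms, YawRate messages every ~12 ms (values are illustrative).
toy = pd.DataFrame({
    "Timestamp": [0.000, 0.012, 0.022, 0.024, 0.036, 0.044, 0.048],
    "CarSpeed":  [0.0,   np.nan, 0.1,  np.nan, np.nan, 0.2,  np.nan],
    "YawRate":   [np.nan, 0.19, np.nan, 0.19,  0.19,  np.nan, 0.19],
})

# A sensor's messages are the rows where its value is non-NA; their
# timestamp diffs give the message period for that CAN ID.
for sensor in ["CarSpeed", "YawRate"]:
    ts = toy.loc[toy[sensor].notna(), "Timestamp"]
    print(sensor, ts.diff().dropna().round(3).tolist())
# CarSpeed [0.022, 0.022]
# YawRate [0.012, 0.012, 0.012]
```

Running the same computation on the real dataframe shows how tightly each CAN ID sticks to its period, which is exactly the regularity an IDS can exploit.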

1b. Simulating attacks

Now comes the hard and fun part: we only have normal data. How can we develop an intrusion detection system?

The first natural step is to generate attack data. There are many ways to simulate such attacks, but the cheapest is simply to inject fake messages into the stored data stream.

A more realistic (and also more expensive) method to safely simulate attacks is to inject messages directly onto the CAN bus while the vehicle is stationary (engine on, transmission in park), or while the vehicle is in motion in a controlled driving environment or test track, such as the tests conducted by the NHTSA.
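The cheap variant, injecting fake messages into a stored stream, can be sketched as follows. This is a simplified illustration, not the tutorial’s actual generator; the function name and parameters are hypothetical, and it injects a flat-valued sensor reading at a fixed period, much like the flat injected YawRate visible in the attack sample below.

```python
import numpy as np
import pandas as pd

def inject_flat_attack(df, sensor="YawRate", start=1.0, duration=0.5,
                       period=0.012, value=0.131, seed=0):
    """Insert fake `sensor` messages at a fixed period into a stored CAN
    log and label them Attack. Simplified illustration only."""
    rng = np.random.default_rng(seed)
    ts = np.arange(start, start + duration, period)
    fake = pd.DataFrame({
        "Timestamp": ts + rng.normal(0, 1e-4, len(ts)),  # tiny timing jitter
        sensor: value,                                   # flat injected value
        "Label": "Attack",
    })
    out = pd.concat([df.assign(Label="Normal"), fake], ignore_index=True)
    # Interleave fake messages with the real ones in time order
    return out.sort_values("Timestamp").reset_index(drop=True)

# Toy normal stream: a YawRate message every 12 ms for 2 seconds
normal = pd.DataFrame({"Timestamp": np.arange(0, 2, 0.012), "YawRate": 0.19})
attacked = inject_flat_attack(normal)
print(attacked["Label"].value_counts())
```

Note that the injected messages do not replace the normal ones; real injection attacks add extra traffic on top of the legitimate ECU’s messages, which is why the attack samples below show both normal and attack YawRate rows interleaved.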

For convenience, we have provided some synthetic samples (generated using aegis_datagen.py). We can visualize one such attack as follows.


SENSORS = ["SteeringAngle", "CarSpeed", "YawRate", "Gx", "Gy"]

df.loc[df.AttackEventIndex == 3, ["Timestamp", "Label", "AttackSensor"] + SENSORS].head(20)
Timestamp Label AttackSensor SteeringAngle CarSpeed YawRate Gx Gy
48734 216.321254 Normal NA NaN NaN 0.132825 -0.040180 -0.127843
48735 216.323931 Normal NA NaN NaN 0.134205 -0.037794 -0.128857
48736 216.326549 Normal NA NaN 46.249397 NaN NaN NaN
48737 216.330955 Normal NA 2.3928 NaN NaN NaN NaN
48738 216.333062 Attack YawRate NaN NaN 0.131447 -0.037794 -0.128857
48739 216.335049 Normal NA NaN NaN 0.135585 -0.035409 -0.129870
48740 216.342751 Normal NA 2.4144 NaN NaN NaN NaN
48741 216.344695 Attack YawRate NaN NaN 0.131461 -0.035409 -0.129870
48742 216.348972 Normal NA NaN 46.203197 NaN NaN NaN
48743 216.349076 Normal NA NaN NaN 0.136965 -0.033023 -0.130883
48744 216.352874 Normal NA NaN NaN 0.138345 -0.030637 -0.131897
48745 216.355882 Normal NA 2.4360 NaN NaN NaN NaN
48746 216.356924 Attack YawRate NaN NaN 0.131457 -0.030637 -0.131897
48747 216.366063 Normal NA NaN NaN 0.139725 -0.028252 -0.132910
48748 216.367203 Normal NA 2.4576 NaN NaN NaN NaN
48749 216.368773 Attack YawRate NaN NaN 0.131455 -0.028252 -0.132910
48750 216.370226 Normal NA NaN 46.156998 NaN NaN NaN
48751 216.377833 Normal NA NaN NaN 0.141105 -0.025866 -0.133924
48752 216.378901 Normal NA 2.4792 NaN NaN NaN NaN
48753 216.380554 Attack YawRate NaN NaN 0.131486 -0.025866 -0.133924
import matplotlib.pyplot as plt

# Take the first 10 seconds of attack event #3, keeping only YawRate rows
z = df[df.AttackEventIndex == 3]
yr = z[(z.Timestamp > z.Timestamp.min()) & (z.Timestamp < z.Timestamp.min() + 10)].dropna(subset=["YawRate"])

# Plot normal vs. injected YawRate messages
att = yr[yr["Label"] == "Attack"]
normal = yr[yr["Label"] == "Normal"]
plt.plot(normal.Timestamp, normal.YawRate, label="normal")
plt.plot(att.Timestamp, att.YawRate, label="attack")
plt.legend()


The key question is: can an ML/anomaly-detection system distinguish the injected messages from the normal ones?