How MIT researchers use machine learning to detect IP hijackings before they occur

The goal is to predict incidents in advance by tracing them back to the actual hijackers.

The internet uses routing tables to determine how and where data is sent and received. Without accurate and reliable tables, the internet would be like a highway system with no signs or signals to direct the traffic to the right places. Of course, cybercriminals find a way to corrupt just about everything that makes the internet work, and routing is no exception.

IP hijacking, or BGP (Border Gateway Protocol) hijacking, is a process in which hackers and cybercriminals take over groups of IP addresses by corrupting the routing tables that use BGP. The purpose is to redirect traffic on the public internet or on private business networks to the hijackers’ own networks where they can intercept, view, and even modify the packets of data. As such, IP hijacking has been used to send spam and malware and steal Bitcoin. IP hijacking has also been aimed at individuals on home networks as well as organizations with private networks, and has been backed by nation-states such as China, according to researchers.

Thwarting IP hijacking has been a challenge, as most attempts focus on tackling it while it is in progress. Now, a team of researchers from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and the University of California San Diego (UCSD) is developing a way to combat IP hijackings before they occur. A research paper written by lead author and MIT graduate student Cecilia Testart and MIT senior research scientist David Clark, alongside MIT postdoc Philipp Richter, data scientist Alistair King, and research scientist Alberto Dainotti of UCSD’s Center for Applied Internet Data Analysis (CAIDA), described the project in detail.

How to stop perpetrators

With IP hijacking, cybercriminals exploit a security weakness in BGP, a protocol that allows different networks and parts of the internet to communicate with each other so that the data reaches the correct destination. In an IP hijack, the bad actors are able to convince nearby networks that the best path to a specific IP address is through their own network.
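
To make the idea of winning the "best path" concrete, here is a minimal Python sketch of one well-known way it can happen: routers forward traffic along the most specific (longest) matching prefix, so a network that announces a narrower block can attract traffic meant for the legitimate owner. The article does not say which technique any particular hijacker used, and this is not the full BGP decision process. The function name `best_route`, the documentation-range addresses, and the AS numbers are all assumptions chosen for illustration.

```python
import ipaddress


def best_route(routing_table, destination):
    """Return the (network, origin) entry with the longest prefix that
    contains `destination`, or None if no entry matches."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), origin)
        for prefix, origin in routing_table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical table using addresses and AS numbers reserved for documentation.
table = {"203.0.113.0/24": "AS64500 (legitimate origin)"}
before = best_route(table, "203.0.113.10")

# A hijacker announces a more specific block covering half of the /24.
table["203.0.113.0/25"] = "AS64511 (hijacker)"
after = best_route(table, "203.0.113.10")

print(before[1])  # origin chosen before the bogus announcement
print(after[1])   # origin chosen after it
```

In this toy table, traffic to 203.0.113.10 shifts to the second origin once the narrower prefix appears, while addresses in the other half of the /24 still reach the original network.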

The key to stopping an IP hijacking is to trace it back to the actual perpetrators before it happens rather than when it is already in progress. And to do that, the team is using a new machine learning system. By detecting some of the common traits of “serial hijackers,” the team taught the system to catch around 800 suspicious networks, some of which had hijacked IP addresses for many years.

“Network operators normally have to handle such incidents reactively and on a case-by-case basis, making it easy for cybercriminals to continue to thrive,” Testart said in a press release. “This is a key first step in being able to shed light on serial hijackers’ behavior and proactively defend against their attacks.”

Specific traits of hijackers

To zero in on serial IP hijackings, the team grabbed information from network operator mailing lists and from historical BGP data taken every five minutes from the global routing table. By analyzing that information, they were able to detect specific traits of hijackers and then train their system to automatically identify those traits.

Specifically, the machine learning system tagged networks with three key traits in terms of the blocks of IP addresses they use:

Volatile changes in activity. The blocks of addresses used by hijackers appear to vanish faster than do those used by legitimate networks. On average, addresses used by hijackers disappeared after 50 days, compared with two years for legitimate addresses.
Multiple address blocks. Serial IP hijackers often advertise more blocks of IP addresses, or network prefixes. The median number was 41 compared with 23 for legitimate networks.
IP addresses in multiple countries. Most networks don’t have foreign IP addresses, while serial hijackers are more likely to register addresses in other countries and continents.
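
The three traits above can be pictured as simple per-network features. The Python sketch below is only an illustration, not the researchers' system: the `PrefixRecord` fields, the function names, and the cut-off values are assumptions. The cut-offs are borrowed loosely from the figures quoted in this article, whereas the actual system is a trained classifier rather than a set of fixed thresholds.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class PrefixRecord:
    """One address block announced by a network (hypothetical schema)."""
    prefix: str
    first_seen: date
    last_seen: date
    country: str  # assumed: country where the block is registered


def summarize_network(records):
    """Compute one feature per trait for a single network."""
    durations = [(r.last_seen - r.first_seen).days for r in records]
    return {
        "median_visibility_days": median(durations),
        "prefix_count": len({r.prefix for r in records}),
        "country_count": len({r.country for r in records}),
    }


def trait_flags(features, max_days=50, min_prefixes=41, min_countries=2):
    """Mark which traits are present, using illustrative cut-offs only."""
    return {
        "short_lived": features["median_visibility_days"] <= max_days,
        "many_prefixes": features["prefix_count"] >= min_prefixes,
        "multi_country": features["country_count"] >= min_countries,
    }


# Made-up records using documentation address ranges.
records = [
    PrefixRecord("198.51.100.0/24", date(2019, 1, 1), date(2019, 2, 10), "US"),
    PrefixRecord("203.0.113.0/24", date(2019, 3, 1), date(2019, 3, 31), "NL"),
    PrefixRecord("192.0.2.0/24", date(2019, 5, 1), date(2019, 6, 30), "BR"),
]
features = summarize_network(records)
flags = trait_flags(features)
print(features)
print(flags)
```

A real pipeline would derive the first-seen and last-seen dates from the five-minute routing table snapshots and feed features like these to a model instead of comparing them with hand-picked numbers.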

One challenge is that some IP hijackings can be the result of human error rather than a malicious attack. As a result, the team had to manually identify false positives, which accounted for around 20% of the results from the system. To cut down on the manual work, the team said it hopes that future versions of the system will be able to handle this step with less human intervention.

Ultimate goal

The ultimate goal is for this type of machine learning system to be used in actual production environments.

“This project could nicely complement the existing best solutions to prevent such abuse that include filtering, anti-spoofing, coordination via contact databases, and sharing routing policies so that other networks can validate it,” David Plonka, a senior research scientist at Akamai Technologies who was not involved in the work, said in a press release. “It remains to be seen whether misbehaving networks will continue to be able to game their way to a good reputation. But this work is a great way to either validate or redirect the network operator community’s efforts to put an end to these present dangers.”