Update: the source code for the 2D IP map generation tool ipmap is available under BSD licence here: ipmap-0.2

Google Maps is one of the few sites on the Internet that totally works. The interface is clean, and even new users can quickly grasp the drag-and-drop and zoom functions. The simplicity of the interface was the motivation for this project. Why can't the Google Maps navigation concepts be applied to visualizing other high-dimensional data?

It turns out that displaying other data with the Google Maps interface is not too hard because the Google Maps backend is very flexible. After seeing Kyle's work overlaying other maps at a MESH meeting, I kept wondering if the longitude/latitude coordinate system could be remapped to display other data. Almost all the heavy lifting (fetching images, compositing them with overlays, and adding markers) is done on the client (JavaScript in the browser), so it should be possible to replace the backend map server and retain all the fancy navigation capability.

The approach Google took to providing detailed maps while scaling to millions of users is to divide each map into small chunks (256 x 256 pixels) and send only nearby chunks to the client. The client is responsible for requesting the specific map chunks from Google as the user scrolls and zooms around the map. So with a few lines of JavaScript it's possible to replace the backend image server with your own web server serving your own map images. For more information check out Mapki, which is an excellent reference for Google Maps hacks.
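To make the chunking idea concrete, here is a minimal Python sketch of how a client could work out which 256 x 256 tiles cover the visible part of a map. It is not Google's code or its API; the function name `visible_tiles` and the pixel-based viewport arguments are assumptions made purely for illustration.

```python
TILE = 256  # tile side in pixels, as described above


def visible_tiles(left, top, width, height, tile=TILE):
    """Return (column, row) indices of the tiles that intersect a viewport.

    `left` and `top` are the map pixel coordinates of the viewport's
    top-left corner; `width` and `height` are its size in pixels.
    (Hypothetical helper, not part of any Google Maps API.)
    """
    first_col = left // tile
    last_col = (left + width - 1) // tile
    first_row = top // tile
    last_row = (top + height - 1) // tile
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 400 x 300 viewport whose corner sits at pixel (200, 100)
# touches three tile columns and two tile rows.
print(visible_tiles(200, 100, 400, 300))
```

Only those tiles need to be fetched, which is why the server can stay a plain image host.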

Our research group deals with tons of data (think > 20TB and growing) and visualizing this data can be very challenging. One of the fundamental challenges is that when viewing the entire dataset one loses the detail of small but equally important features. It's just like viewing a broad map. It's interesting to have a map of all of Europe, but if you are trying to drive from Piccadilly Circus in London to Cambridge, such a broad map is not helpful. Two datasets we deal with that are interesting at many different levels are BGP data and darknet data.

BGP Advertisements:

BGP is the routing protocol used in the core of the Internet and it is used to distribute IP address reachability information. For example, if my organization is assigned the block of addresses 141.211.0.0 through 141.211.255.255 (141.211.0.0/16 in CIDR notation), then I can tell the rest of the world I have 141.211.0.0/16 using BGP. Another Internet user in India trying to reach me will send a packet that reaches the core Internet routers, which know how to reach 141.211.0.0/16 because they observed the BGP announcement.
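For readers new to CIDR notation, the relationship between the /16 and its address range can be checked with Python's standard `ipaddress` module. This is only an illustration of the notation; the host address 141.211.4.5 is a made-up example.

```python
import ipaddress

# The /16 from the example above: the first 16 bits are fixed,
# the remaining 16 bits are free.
net = ipaddress.ip_network("141.211.0.0/16")

print(net[0])             # 141.211.0.0
print(net[-1])            # 141.211.255.255
print(net.num_addresses)  # 65536

# Any address inside the block matches the advertised prefix.
# (141.211.4.5 is a hypothetical host chosen for illustration.)
print(ipaddress.ip_address("141.211.4.5") in net)  # True
```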

Since address blocks (also called prefixes) must be advertised to be globally reachable, one can obtain a good view of the entire Internet by looking at all the prefixes advertised by all organizations on the Internet. Together, there are about 180,000 unique prefixes in the BGP routing table today.

The problem is how to visualize the huge amounts of high-dimensional BGP routing data. One interesting approach is to plot AS relationships in a radial space. Another approach is to plot prefixes in a quadrant-based 2D space. Patrick McDaniel at Penn State also used the technique to visualize BGP advertisements. The basic idea is to recursively choose a quadrant in a 2D plot based on the most significant 2 bits of an IP address. An example of finding the location in 2D space for a 6-bit number is shown in the figure on the left. Manish Karir over at Merit Network constructed a tool called Flamingo that uses the technique and has some additional visualizations.
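The recursive quadrant choice can be sketched in a few lines of Python. This is a sketch under assumptions, not the ipmap code: the text does not say which 2-bit value maps to which quadrant, so here the first bit of each pair is assumed to pick the top or bottom half and the second bit the left or right half. The function name `quad_xy` and the sample number are illustrative and may not match the figure.

```python
def quad_xy(value, bits):
    """Map a `bits`-bit number to (x, y), consuming 2 bits per level.

    Assumed quadrant layout (not specified in the text):
    first bit of each pair = vertical half, second bit = horizontal half.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    x = y = 0
    # Walk from the most significant pair down to the least significant.
    for shift in range(bits - 2, -1, -2):
        pair = (value >> shift) & 0b11
        x = (x << 1) | (pair & 1)   # second bit of the pair
        y = (y << 1) | (pair >> 1)  # first bit of the pair
    return x, y


# 6-bit example: the pairs are 11, 01, 10, so three quadrant choices
# place the number on an 8 x 8 grid.
print(quad_xy(0b110110, 6))  # (6, 5)
```

Because each pair only narrows the position within the quadrant chosen by the previous pairs, all addresses sharing an even-length prefix land in the same square of the plot.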

The quad-based visualization technique provides a novel way of viewing BGP data; however, the resulting map is huge. If a single IP address (/32) is represented by a single pixel, the resulting map is 65,536 pixels by 65,536 pixels. That means it is only possible to view part of the map, or a lower-resolution version of the map, at any one time. Which brings us back to Google Maps. Google Maps is great at viewing huge image spaces at multiple resolutions. So I set about making Google Maps plot BGP data on a full map of IPv4 address space.
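The size of the full map follows directly from the address length, as this small Python check shows. The constant names are illustrative, and the 256-pixel tile side is the Google Maps chunk size mentioned earlier.

```python
ADDRESS_BITS = 32  # IPv4
TILE_SIDE = 256    # pixels per tile side

# Half of the address bits go to x and half to y.
side = 2 ** (ADDRESS_BITS // 2)
tiles_per_side = side // TILE_SIDE

print(side)                 # 65536 pixels per side
print(side * side)          # 4294967296 pixels, one per address
print(tiles_per_side ** 2)  # 65536 tiles at full resolution
```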

The process turned out to be very simple. I wrote a short program that took in a list of prefixes and produced a set of 256x256 pixel tiles for 8 different resolutions. I simply loaded the image files on a web server and wrote a tiny bit of glue JavaScript to make Google Maps point at my image server. You can browse the result by clicking on the image below. At the highest zoom level each pixel represents a /16 and at the lowest a /32. Two bits are required to generate the next deeper zoom level, so there are 8 zoom levels in all.
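As a rough idea of what such a tile generator has to compute, the sketch below finds the pixel rectangle a prefix covers at a given resolution and the tile that holds its corner. It is not the ipmap source: the names `quad_xy`, `prefix_rect`, and `tile_of` are hypothetical, and the quadrant layout (first bit of each pair vertical, second bit horizontal) is an assumption.

```python
import ipaddress

TILE = 256  # tile side in pixels


def quad_xy(value, bits):
    """Map a `bits`-bit number to (x, y), 2 bits per level (assumed layout)."""
    x = y = 0
    for shift in range(bits - 2, -1, -2):
        pair = (value >> shift) & 0b11
        x = (x << 1) | (pair & 1)
        y = (y << 1) | (pair >> 1)
    return x, y


def prefix_rect(prefix, pixel_len):
    """Return (x, y, width, height) in pixels for an IPv4 prefix.

    `pixel_len` is the (even) prefix length one pixel stands for,
    e.g. 16 when each pixel is a /16, 32 when each pixel is a /32.
    """
    net = ipaddress.ip_network(prefix)
    # Keep only the bits that are resolvable at this resolution.
    top = int(net.network_address) >> (32 - pixel_len)
    x, y = quad_xy(top, pixel_len)
    # Free bits alternate between x and y, starting with whichever
    # comes next after the prefix; odd-length prefixes are 2:1 rectangles.
    free = max(pixel_len - net.prefixlen, 0)
    return x, y, 1 << ((free + 1) // 2), 1 << (free // 2)


def tile_of(x, y):
    """Return the (column, row) of the tile containing pixel (x, y)."""
    return x // TILE, y // TILE


x, y, w, h = prefix_rect("141.211.0.0/16", 32)
print((x, y, w, h), tile_of(x, y))
```

Under these assumptions a /16 fills exactly one 256x256 tile when each pixel is a /32, and shrinks to a single pixel when each pixel is a /16.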


Click to explore BGP data using the Google Maps interface

Darknet Data:

Another interesting source of IP address data is darknets. Darknets are blocks of unused or unreachable address space. This means there are no legitimate hosts using the darknet IP addresses. Any traffic that is observed at these non-productive dark addresses must be the result of some outside process, such as worm- or virus-infected hosts or misconfiguration.

I work with the Internet Motion Sensor project at the University of Michigan which monitors darknets around the Internet. The darknet data collected by the project indicates interesting trends in the behavior of Internet worms and infected hosts around the world.

Using the same quad-based visualization technique, I plotted the source IP addresses detected by the IMS darknets. You can browse the result by clicking on the image below.

There are a couple of very interesting observations one can draw from the data. First, there appears to be a significant amount of spoofing, as indicated by the evenly distributed noise across the map (and across bogon addresses). Second, there are distinct and significant hotspots, many of which correspond to address space belonging to broadband providers.


Click to explore darknet data using the Google Maps interface

Thanks to Kyle Mulka, Jose Nazario, Marius Eriksen, and Manish Karir for excellent feedback and suggestions.