Active Internet measurement relies on responses to active probes such as ICMP Echo Request or TCP SYN messages. It is valuable because it enables researchers to measure the Internet without privileged data from ISPs. Researchers use active measurement to study Internet topology, route dynamics, and link bandwidth by sending many packets through selected links, and to measure RTTs and reliability by probing many addresses. A fundamental challenge in active measurement design is allocating and limiting measurement traffic by carefully choosing where measurements are sent and how many samples are taken per measurement. Minimizing measurement load is important because heavy measurement traffic may appear malicious. If network operators regard measurement traffic as an attack, they can blacklist its sources, which reduces the completeness and accuracy of the measurement. Another challenge is bias, which can arise when destinations do not respond or when destinations are selected in a biased way. Biases can lead to misleading conclusions and thus should be minimized.
In this dissertation, I develop a general approach to reducing the measurement load and bias of active Internet measurement, based on the insight that both can be reduced by letting individual Internet addresses represent larger aggregates. I first develop a technique, called Hobbit, that identifies and aggregates topologically proximate addresses by comparing traceroute results. Because load-balanced paths can cause incorrect inferences of topological proximity, Hobbit distinguishes route differences caused by load balancing from those caused by distinct route entries. A distinctive contribution of Hobbit is that it can aggregate even discontiguous addresses, which matters because fragmented allocations of IPv4 addresses are common in the Internet.
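The idea of aggregating by topological proximity can be illustrated with a minimal sketch. It assumes, purely for illustration, that each /24 block is summarized by the set of last-hop routers observed in traceroutes toward its addresses, and that blocks with identical sets belong to the same aggregate. This is a simplification and not necessarily Hobbit's actual algorithm; the function name `aggregate_blocks` and all addresses are hypothetical.

```python
from collections import defaultdict


def aggregate_blocks(last_hops_by_block):
    """Group /24 blocks whose observed last-hop router sets are identical.

    Comparing sets rather than single paths means that several last hops
    seen because of load balancing do not split a block from its peers.
    (Assumed simplification, not the dissertation's exact method.)
    """
    groups = defaultdict(list)
    for block, last_hops in last_hops_by_block.items():
        groups[frozenset(last_hops)].append(block)
    # Sort for a deterministic result.
    return sorted(sorted(blocks) for blocks in groups.values())


# Hypothetical observations using documentation address ranges.
observations = {
    "192.0.2.0/24": {"10.0.0.1", "10.0.0.2"},
    "198.51.100.0/24": {"10.0.0.2", "10.0.0.1"},
    "203.0.113.0/24": {"10.0.9.9"},
}
aggregates = aggregate_blocks(observations)
```

In this sketch the first two blocks are not contiguous in address space, yet they fall into one aggregate because they share the same set of last-hop routers.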
I apply Hobbit to IPv4 addresses and identify 0.51M aggregates of addresses (i.e., Hobbit blocks) that contain 1.77M /24 blocks. I evaluate the homogeneity of Hobbit blocks using RTTs and show that Hobbit blocks are as homogeneous as /24s even though they are generally larger than /24s. I then demonstrate that Hobbit blocks improve the efficiency of Internet topology mapping by comparing strategies that select destinations from Hobbit blocks with strategies that select them from /24 blocks. I also quantify the efficiency improvement in latency estimation that can be achieved by using Hobbit blocks. I show that Hobbit blocks tend to be stable over time, and I analyze the measurement cost of generating Hobbit blocks.
Finally, I demonstrate that Hobbit blocks can improve the representativeness of network measurement. I develop a methodology that measures the representativeness of a measurement and show that active Internet measurement may not be representative even if the entire IPv4 space is probed. Using Hobbit blocks, I adapt weighting adjustment, a common bias correction technique in surveys, to active Internet measurement. I evaluate the weighting adjustment using various kinds of samples and show that it reduces biases in most cases. Given Hobbit blocks, the weighting adjustment incurs no measurement cost. I make Hobbit blocks publicly available and update them every month for researchers who want to perform weighting adjustment or improve the efficiency of network measurement.
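As a rough illustration of weighting adjustment, the sketch below computes a post-stratified mean in which each block's sample mean is weighted by the block's size rather than by how many responses happened to come from it. The block labels, sizes, and RTT values are hypothetical, and the estimator shown is the generic survey technique, not necessarily the exact form used in the dissertation.

```python
from collections import defaultdict


def weighted_mean(samples, block_sizes):
    """Post-stratified mean: per-block sample means weighted by block size.

    samples: iterable of (block, value) pairs, e.g. (block id, RTT in ms).
    block_sizes: mapping from block id to number of addresses (assumed known).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for block, value in samples:
        sums[block] += value
        counts[block] += 1
    # Only blocks that appear in the sample contribute to the estimate.
    total = sum(block_sizes[b] for b in counts)
    return sum(block_sizes[b] * sums[b] / counts[b] for b in counts) / total


# Hypothetical sample: block "A" is over-represented relative to its size.
samples = [("A", 10.0), ("A", 10.0), ("A", 10.0), ("B", 50.0)]
block_sizes = {"A": 256, "B": 768}

naive = sum(value for _, value in samples) / len(samples)
adjusted = weighted_mean(samples, block_sizes)
```

Here the unweighted mean is pulled toward the over-sampled block, while the adjusted mean reflects each block's share of the address population. The adjustment reuses existing responses and block sizes, so it requires no extra probing.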
Chair: Dr. Neil Spring
Dean’s rep: Dr. Rama Chellappa
Members: Dr. Bobby Bhattacharjee
Dr. David Mount
Dr. David Levin