log in  |  register  |  feedback?  |  help  |  web accessibility
Logo
Capturing Community Behavior in Very Large Networks using MapReduce
Tuesday, December 4, 2012, 3:30-4:30 pm Calendar
  • You are subscribed to this talk through .
  • You are watching this talk through .
  • You are subscribed to this talk. (unsubscribe, watch)
  • You are watching this talk. (unwatch, subscribe)
  • You are not subscribed to this talk. (watch, subscribe)
Abstract

Graphs and networks are used to model interactions in a variety of contexts, and there is a growing need to accurately measure and model large-scale networks. We consider especially the role of triangles, which are useful for measuring social cohesion. This talk will focus on two topics: (1) Accurately estimating the number of triangles and clustering coefficients for very large-scale networks, and (2) Generating very large-scale artificial networks that capture the degree distribution and clustering coefficients of observed data.

Triangles form a basic pattern of interest for understanding networks because they reflect friend-of-friend relationships. However, counting triangles can be extremely expensive in terms of both computational time and memory requirements. We explain how sampling can be used to accurately estimate the number of triangles and the number of triangles per degree (or degree range) in an undirected graph, as well as the number of each type of directed triangle in a directed graph. Using Hoeffing's inequality, we can predict exactly how many samples we need for a desired error and confidence level, and this sample size is independent of the size of the graph. We describe a MapReduce implementation and provide examples demonstrating its utility.

Once we know how to measure triangles, we can calculate the clustering coefficients and use this data as input to a generative model. Ideally, a good model should be able to reproduce the observations, yet few models are able to do so. We hypothesize that any graph with a heavy-tailed degree distribution and large clustering coefficients must contain a scale-free collection of dense ER subgraphs. From this, we propose the Block Two-Level Erd?s-Rényi (BTER) model, and demonstrate that it accurately captures the observable properties of many real-world social networks. Moreover, the BTER model can scale to very large networks, and we describe our MapReduce implementation and recent experimental results.

This is joint work with Ali Pinar, Todd Plantenga, C. Seshadhri (Sandia National Labs) and Christine Task (Purdue University).

This talk is organized by Howard Elman