PhD Defense: Automating Performance Diagnosis in Networked Systems
Justin McCann - University of Maryland, College Park
Tuesday, April 10, 2012, 2:00-3:00 pm
Abstract

THE DISSERTATION DEFENSE FOR THE DEGREE OF Ph.D. IN COMPUTER SCIENCE FOR

Justin McCann

Diagnosing performance degradation in distributed systems is a complex and difficult task.  Software that performs well in one environment may be unusably slow in another, and determining the root cause is time-consuming and error-prone, even in environments in which all the data may be available. End users have an even more difficult time trying to diagnose system performance, since both software and network problems have the same symptom: a stalled application.

The central thesis of this dissertation is that the source of performance stalls in a distributed system can be automatically detected and diagnosed with very limited information: the dependency graph of data flows through the system, and a few counters common to almost all data processing systems. Our automated fault detection system requires as little as two bits of information per module: one to indicate whether the module is actively processing data, and one to indicate whether the module is waiting on its dependents. We prove this thesis by implementing the idea and demonstrating its effectiveness in two distinct environments: an individual host's networking stack, and a distributed streams processing system. Using real applications, we show that our approach correctly diagnoses 99% of networking-related stalls due to application, connection-specific, or network-wide performance problems, with a false positive rate under 3%. Our prototype system for diagnosing messaging stalls in a commercial streams processing system correctly finds 93% of message-processing stalls, with a false positive rate of 2%.      
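To make the two-bits-per-module idea concrete, here is a minimal sketch in Python. It is not the dissertation's algorithm: the `Module` class, the `find_stall_suspects` function, the module names, and the rule "follow the chain of waiting modules until it ends at a module that is neither active nor waiting" are all assumptions made for illustration, based only on the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical per-module state: the two bits plus dependency edges."""
    name: str
    active: bool   # bit 1: module is actively processing data
    waiting: bool  # bit 2: module is waiting on the modules it depends on
    deps: list = field(default_factory=list)  # names of modules it waits on


def find_stall_suspects(modules, start):
    """Walk the dependency graph from `start`, following waiting modules.

    A module is reported as a suspect when the chain of waits ends at it
    and it is not processing data either (assumed heuristic).
    """
    suspects = []
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:  # guard against cycles in the graph
            continue
        seen.add(name)
        m = modules[name]
        if m.waiting and m.deps:
            stack.extend(m.deps)   # blame moves down to what it waits on
        elif not m.active:
            suspects.append(name)  # idle and not waiting: the wait chain ends here
    return sorted(suspects)


# Hypothetical data flow: app -> socket -> tcp -> nic
modules = {m.name: m for m in [
    Module("app", active=False, waiting=True, deps=["socket"]),
    Module("socket", active=False, waiting=True, deps=["tcp"]),
    Module("tcp", active=False, waiting=True, deps=["nic"]),
    Module("nic", active=False, waiting=False),
]}

suspects = find_stall_suspects(modules, "app")
print(suspects)  # ['nic']
```

In this toy example every module above `nic` is waiting, so the stalled application is traced down to the one module that is neither processing nor waiting. A real system would also need to sample the bits over time and handle modules that are busy but slow, which this sketch ignores.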

Examining Committee:

Committee Chair: Dr. Michael Hicks

Dean’s Representative: Dr. Mark Shayman

Committee Members: Dr. Peter Keleher, Dr. James Reggia, Dr. Neil Spring

EVERYONE IS INVITED TO ATTEND THE PRESENTATION PORTION OF THIS DEFENSE

This talk is organized by Jeff Foster