Me and My Software Engineering Research
Friday, November 7, 2014, 11:00 am-12:00 pm
Abstract

Software is a fundamental enabling technology, supporting virtually every sector of our economy. For example, studies show that software accounts for nearly all growth in labor productivity over the last 20 years or so. To sustain this growth, software systems are becoming ever larger and more complex. My research focuses on software engineering: the tools, methods and processes that help us deliver large-scale software systems that work as they’re supposed to and that are delivered on time and on budget.

In this talk I’ll introduce myself, discuss some of my recent research results aimed at efficiently testing large-scale configurable software systems, and briefly outline several new projects I’m currently starting up. In particular I’ll focus on some recent work developing and evaluating a new system and algorithm called iTree.

iTree - Efficiently Discovering High-Coverage Configurations

Modern software systems are increasingly configurable. While this has many benefits, it also makes some software engineering tasks, such as software testing, much harder. This is because, in theory, unique errors could be hiding in any configuration, and, therefore, every configuration may need to undergo expensive testing. As this is generally infeasible, developers need cost-effective techniques for selecting which specific configurations they will test. One popular selection approach is combinatorial interaction testing (CIT), where the developer selects a strength t and then computes a covering array (a set of configurations) in which all t-way combinations of configuration option settings appear at least once.
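
To make the covering-array definition concrete, here is a minimal Python sketch that checks whether a set of configurations covers every t-way combination of option settings. The option names (cache, logging, ssl) and the helper functions are hypothetical and chosen only for illustration; they are not taken from the talk or from any CIT tool.

```python
from itertools import combinations, product


def uncovered_combinations(configs, options, t):
    """Return every t-way combination of option settings missing from configs.

    options: dict mapping option name -> list of possible settings
    configs: list of dicts mapping option name -> chosen setting
    """
    missing = []
    for chosen in combinations(sorted(options), t):
        for values in product(*(options[name] for name in chosen)):
            hit = any(
                all(config[name] == value for name, value in zip(chosen, values))
                for config in configs
            )
            if not hit:
                missing.append(tuple(zip(chosen, values)))
    return missing


def is_covering_array(configs, options, t):
    """True if every t-way combination appears in at least one configuration."""
    return not uncovered_combinations(configs, options, t)


# Hypothetical system with three binary options (8 configurations in total).
OPTIONS = {"cache": [0, 1], "logging": [0, 1], "ssl": [0, 1]}

# Four configurations chosen so that every pair of settings appears once.
PAIRWISE = [
    {"cache": 0, "logging": 0, "ssl": 0},
    {"cache": 0, "logging": 1, "ssl": 1},
    {"cache": 1, "logging": 0, "ssl": 1},
    {"cache": 1, "logging": 1, "ssl": 0},
]

print(is_covering_array(PAIRWISE, OPTIONS, 2))
print(len(uncovered_combinations(PAIRWISE, OPTIONS, 3)))
```

In this toy example the four configurations form a strength-2 covering array, yet four of the eight 3-way combinations never appear, which illustrates why the choice of t matters.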

In prior work, we demonstrated several limitations of the CIT approach. In particular, we found that a given system’s effective configuration space (the minimal set of configurations needed to achieve a specific goal) may comprise only a tiny subset of the system’s full configuration space. We also found that this effective configuration space may not be well approximated by t-way covering arrays. Based on these insights we have developed an algorithm called interaction tree discovery (iTree).

iTree is an iterative learning algorithm that efficiently searches for a small set of configurations that closely approximates a system’s effective configuration space. On each iteration iTree tests the system on a small sample of carefully chosen configurations, monitors the system’s behaviors, and then applies machine learning techniques to discover which combinations of option settings are potentially responsible for any newly observed behaviors. This information is used in the next iteration to pick a new sample of configurations that are likely to reveal further new behaviors.
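
The shape of such an iterative loop can be sketched in Python as follows. This is not the iTree implementation: the system under test is a toy function, and the machine learning step is replaced by a simple frequency heuristic that picks the option setting most over-represented among configurations that revealed new behavior. All names (OPTIONS, run_tests, suspect_setting, discover) are assumptions made for illustration.

```python
import random

# Hypothetical configuration model: four binary options.
OPTIONS = {"a": [0, 1], "b": [0, 1], "c": [0, 1], "d": [0, 1]}


def run_tests(config):
    """Toy stand-in for testing the system: returns the set of covered blocks."""
    covered = {"base"}
    if config["a"] == 1:
        covered.add("A")
        if config["b"] == 1:
            covered.add("A&B")
            if config["c"] == 0:
                covered.add("A&B&!C")
    if config["d"] == 1:
        covered.add("D")
    return covered


def sample(focus, rng, n):
    """Draw n configurations that keep the focus settings and randomize the rest."""
    return [
        {name: focus.get(name, rng.choice(values)) for name, values in OPTIONS.items()}
        for _ in range(n)
    ]


def suspect_setting(revealing, batch):
    """Assumed heuristic (not ML): the setting most over-represented
    among the configurations that revealed new coverage."""
    best, best_score = {}, 0.0
    for name, values in OPTIONS.items():
        for value in values:
            in_revealing = sum(c[name] == value for c in revealing) / len(revealing)
            in_batch = sum(c[name] == value for c in batch) / len(batch)
            if in_revealing - in_batch > best_score:
                best, best_score = {name: value}, in_revealing - in_batch
    return best


def discover(iterations=5, sample_size=4, seed=0):
    """Iteratively sample, observe new coverage, and refocus the next sample."""
    rng = random.Random(seed)
    covered, chosen, focus = set(), [], {}
    for _ in range(iterations):
        batch = sample(focus, rng, sample_size)
        revealing = []
        for config in batch:
            gained = run_tests(config) - covered
            if gained:
                covered |= gained
                chosen.append(config)
                revealing.append(config)
        # Refocus on a suspicious setting, or fall back to unconstrained sampling.
        focus = suspect_setting(revealing, batch) if revealing else {}
    return chosen, covered


chosen, covered = discover()
print(len(chosen), sorted(covered))
```

The sketch keeps only configurations that add coverage, so the returned set stays small; a real learner would track combinations of settings rather than a single setting per iteration.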

We have evaluated the iTree algorithm by comparing the coverage it achieves with that of covering arrays and randomly generated configuration sets. We have also evaluated its scalability by using it to test MySQL, a 1M+ LOC database system. Our results strongly suggest that the iTree algorithm is highly scalable and can identify a high-coverage test set of configurations more effectively than existing methods.

(Joint work with Charles Song and Jeffrey S. Foster)

This talk is organized by Jeff Foster