Talks

Software Testing and Verification with Grammatical Inference

Tuesday, March 6, 2012, 11:00 am-12:00 pm

You are subscribed to this talk through .
You are watching this talk through .
You are subscribed to this talk. (unsubscribe, watch)
You are watching this talk. (unwatch, subscribe)
You are not subscribed to this talk. (watch, subscribe)

Abstract

Recent progress in automated deductive reasoning techniques has triggered a small scientific revolution in automated software testing and verification in general, and software-security in particular. Indeed, it is becoming increasingly difficult to find new automated software analysis papers that do not use or assume existence of propositional satisfiability and satisfiability-modulo theories solvers. In this talk, I will discuss the first results of my efforts to answer a simple question: What kind of automated reasoning will trigger the next scientific revolution in automated software analysis? As a potential candidate, I will present grammatical inference (GI), a class of techniques for learning formal languages. By inductively generalizing from examples, GI techniques nicely complement the capabilities of existing (deductive) approaches. Furthermore, the inferred grammars (or automata) are themselves analyzable objects, which can be used as approximations or abstractions of the behavior of the analyzed program. There is a broad class of problems in program analysis that are currently out of the reach of deductive techniques. I'll analyze several such open problems, namely (1) test generation for programs whose inputs have to adhere to complex grammars, and (2) automated verification of web sanitizers, and offer some solutions. The common trait of the above mentioned problems is that they require exact or approximate understanding of the relation between program's input and output. I'll explain how GI techniques can infer such relations and serve as a basis for inventing new powerful software analyses. In particular, I'll show that GI-based automated testing can find six times more vulnerabilities than the state-of-the-art methods, increasing code coverage by up to 58%. The novel symbolic learning technique that I'll present learns input-output relations of web sanitizers (for subsequent verification) in seconds, while construction of the same relations used to take several hours of a human expert's time in the past.

Bio

Domagoj Babic received his Dipl.Ing. in Electrical Engineering and M.Sc. in Computer Science from the Zagreb University (Faculty of Electrical Engineering and Computing) in 2001 and 2003. He received his Ph.D. in Computer Science in 2008 from the University of British Columbia. After spending a bit more than a year in industry, he joined UC Berkeley, as a research scientist. Domagoj's research focuses on various aspects of software analysis (software verification, testing, and security), decision procedures, grammatical inference, and applied formal methods in general. He is a recipient of the Canada's NSERC PDF Research Fellowship (2010-2012), Microsoft Graduate Research Fellowship (2005-2007), Fulbright Fellowship (declined to attend UBC), and several awards at international programming competitions (1st place at the 2007 Satisfiability Modulo Theories competition in the bit-vector arithmetic category and 3rd place at the 2005 Satisfiability Testing competition in the satisfiable-crafted instances category).

This talk is organized by Rance Cleaveland