PhD Defense: Learning Temporal Properties from Data Streams

Samuel Huang

Abstract

Verifying that systems behave as expected is a cornerstone of computing. In formal verification approaches, engineers capture their intentions, or specifications, mathematically, often using logic. The verification task is then to confirm that the system satisfies its specifications. In a formal setting this endeavor typically involves the construction of mathematical proofs, which are either constructed automatically, as in the case of so-called model-checking techniques, or by humans with machine assistance, as in the case of theorem-proving-based methodologies.

In practice, formal verification faces a number of obstacles. One involves the construction of formal specifications in the first place. Another is the lack of availability of analyzable system artifacts (source code, executables) about which proofs can be constructed. In this dissertation we propose techniques for inferring formal specifications, in the form of Linear Temporal Logic formulas, from system executions, and models when these are available, as a means of addressing these concerns.

We first illustrate how guided generation of test cases (i.e. guided exploration of the system’s input/output space) can be leveraged to develop and then refine the hypothesis set of invariants that a system satisfies. This approach was successful in revealing a system property required in code specification of an automotive application provided to us but missing in the implementation. An unexpected (undocumented) specification was also discovered through our analysis. The approach has since been applied by other researchers to several other automotive applications.

Second, we develop techniques to mine properties from models of systems and/or their executions. In some cases compact, finite state representations of a system is available. In this scenario we employ a novel automaton-based approach to mine properties matching a user-specified template. In other cases such white-box knowledge is not available and we must work over executions of the system rather than the system itself. In this scenario we apply a novel variant of Linear Temporal Logic (LTL) using finite-sequence semantics to again mine properties based on a specified template.

Lastly, we consider situations where standard formal properties are insufficient due to uncertainty or being overly simplified. Oftentimes properties that are not satisfied 100% of the time can be very interesting during a system inspection or system redesign. For example, natural noise manifesting in data streams from system executions such as missing evidence and changing system behavior could lead to significant properties being overlooked if a strict “exactitude” were enforced. We explore the notion of “incomplete” properties which we term partial invariants by formulating a new “Noisy Linear Temporal Logic” which is an extension of standard LTL. We consider several representations of such noise and show decidability of the language emptiness problem for some of the variants.

Examining Committee:

In practice, formal verification faces a number of obstacles. One involves the construction of formal specifications in the first place. Another is the lack of availability of analyzable system artifacts (source code, executables) about which proofs can be constructed. In this dissertation we propose techniques for inferring formal specifications, in the form of Linear Temporal Logic formulas, from system executions, and models when these are available, as a means of addressing these concerns.

We first illustrate how guided generation of test cases (i.e. guided exploration of the system’s input/output space) can be leveraged to develop and then refine the hypothesis set of invariants that a system satisfies. This approach was successful in revealing a system property required in code specification of an automotive application provided to us but missing in the implementation. An unexpected (undocumented) specification was also discovered through our analysis. The approach has since been applied by other researchers to several other automotive applications.

Second, we develop techniques to mine properties from models of systems and/or their executions. In some cases compact, finite state representations of a system is available. In this scenario we employ a novel automaton-based approach to mine properties matching a user-specified template. In other cases such white-box knowledge is not available and we must work over executions of the system rather than the system itself. In this scenario we apply a novel variant of Linear Temporal Logic (LTL) using finite-sequence semantics to again mine properties based on a specified template.

Lastly, we consider situations where standard formal properties are insufficient due to uncertainty or being overly simplified. Oftentimes properties that are not satisfied 100% of the time can be very interesting during a system inspection or system redesign. For example, natural noise manifesting in data streams from system executions such as missing evidence and changing system behavior could lead to significant properties being overlooked if a strict “exactitude” were enforced. We explore the notion of “incomplete” properties which we term partial invariants by formulating a new “Noisy Linear Temporal Logic” which is an extension of standard LTL. We consider several representations of such noise and show decidability of the language emptiness problem for some of the variants.

Examining Committee:

Chair: Dr. Rance Cleaveland

Dean's rep: Dr. Steven Marcus

Members: Dr. Adam Porter

Dr. Jim Reggia

Dr. David Van Horn

Dean's rep: Dr. Steven Marcus

Members: Dr. Adam Porter

Dr. Jim Reggia

Dr. David Van Horn

Bio

Samuel Huang is a PhD candidate in the Computer Science department at the University of Maryland, where he works with Prof. Rance Cleaveland. His research interests lie at the intersection of formal methods and machine learning, with special interests in cyber-physical systems. He recently has also been interested in challenges related to provable security.

This talk is organized by Tom Hurst