Talks

Testing Theorems, Fully Automatically

Segev Elazar Mittelman

5105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Thursday, April 9, 2026, 12:30-1:30 pm

You are subscribed to this talk through .
You are watching this talk through .
You are subscribed to this talk. (unsubscribe, watch)
You are watching this talk. (unwatch, subscribe)
You are not subscribed to this talk. (watch, subscribe)

Abstract

While the ultimate goal of interactive theorem proving is to prove theorems, it can really help to test them first. Testing theorems, specifically using property-based testing, helps users identify incorrect definitions and theorem statements before they waste time on a proof that could never succeed. Unfortunately, the testing infrastructure provided by modern theorem provers has yet to reach its full potential. Even QuickChick, the state-of-the-art property-based testing framework for Rocq, which offers random generation for data satisfying inductively defined relations, often requires substantial effort and expertise to be used effectively.

This is in part because this effectiveness is heavily sensitive to both the order that hypotheses appear within a theorem, and to the order that inductive constraints appear within the inductive relations involved.

In this talk, we present a novel strategy for testing theorems that is highly effective, fully automatic, and robust to equivalent formulations of theorem and definition statements. To do so, we characterize the exponentially large space of possible QuickChick-style properties and generators as solutions to a constrained scheduling problem. To find the best property or generator in this space, we estimate effectiveness by introducing a notion of “density” for inductive relations, which approximates the tendency for a generator to succeed given arbitrary inputs. We implement our algorithm on top of the QuickChick framework for Rocq and evaluate it in a number of case studies from the literature, demonstrating that our push-button automation is on par with and in some cases even more effective at finding bugs than expertly handcrafted tests.

This talk is organized by Finn