Dragana Milovancevic
Biography
I'm a PhD student in the LARA group at EPFL, under the supervision of Viktor Kuncak. My research interests are in the field of formal verification, and include equivalence checking and automated grading.Publications
Infoscience publications
Proving Termination via Measure Transfer in Equivalence Checking (Extended Version)
Program verification can benefit from proofs with varied induction schemas. A natural class of induction schemas, functional induction, consists of those derived from definitions of functions. For such inductive proofs to be sound, it is necessary to establish that the functions terminate, which is a challenging problem on its own. In this paper, we consider termination in the context of equivalence checking of a candidate program against a provably terminating reference program annotated with termination measures. Using equivalence checking, our approach automatically matches function calls in the reference and candidate programs and proves termination by transferring measures from a measure-annotated program to one without annotations. We evaluate this approach on existing and newly written termination benchmarks, as well as on exercises in programming courses. Our evaluation corpus comprises around 10K lines of code. We show empirically that the termination measures of reference programs often successfully prove the termination of equivalent candidate programs, ensuring the soundness of inductive reasoning in a fully automated manner.
2024Proving Termination via Measure Transfer in Equivalence Checking
Program verification can benefit from proofs with varied induction schemas. A natural class of induction schemas, functional induction, consists of those derived from definitions of functions. For such inductive proofs to be sound, it is necessary to establish that the functions terminate, which is a challenging problem on its own. In this paper, we consider termination in the context of equivalence checking of a candidate program against a provably terminating reference program annotated with termination measures. Using equivalence checking, our approach automatically matches function calls in the reference and candidate programs and proves termination via measure transfer. We evaluate this approach on existing and newly written termination benchmarks, as well as on exercises in programming courses. Our evaluation corpus comprises around 10K lines of code. We show empirically that the termination measures of reference programs often successfully prove the termination of equivalent candidate programs, ensuring the soundness of inductive reasoning in a fully automated manner.
2024Formal Autograding in a Classroom (Experience Report)
We report our first experience in enhancing automated grading in an undergraduate programming course using formal verification. In our experiment, we deploy a program verifier to check the equivalence between student submissions and reference solutions, alongside the existing, testing-based grading infrastructure. We find that we can use the conservative nature of program equivalence checking to our advantage. We were able to use such equivalence to differentiate student submissions according to their high-level program structure, in particular their recursion pattern, even when their input-output behaviour is identical. Consequently, we achieve (1) higher confidence in correctness of idiomatic solutions but also (2) more thorough assessment of solution landscape that reveals solutions beyond those envisioned by instructors.
2024Formula Normalizations in Verification
We propose a new approach for normalization and simplification of logical formulas. Our approach is based on algorithms for lattice-like structures. Specifically, we present two efficient algorithms for computing a normal form and deciding the word problem for two subtheories of Boolean algebra, giving a sound procedure for propositional logical equivalence that is incomplete in general but complete with respect to a subset of Boolean algebra axioms. We first show a new algorithm to produce a normal form for expressions in the theory of ortholattices (OL) in time O(n^2). We also consider an algorithm, recently presented but never evaluated in practice, producing a normal form for a slightly weaker theory, orthocomplemented bisemilattices (OCBSL), in time O(n log(n)^2). For both algorithms, we present an implementation and show efficiency in two domains. First, we evaluate the algorithms on large propositional expressions, specifically combinatorial circuits from a benchmark suite, as well as on large random formulas. Second, we implement and evaluate the algorithms in the Stainless verifier, a tool for verifying the correctness of Scala programs. We used these algorithms as a basis for a new formula simplifier, which is applied before valid verification conditions are saved into a persistent cache. The results show that normalization substantially increases cache hit ratio in large benchmarks.
2023. 35th International Conference on Computer Aided Verification, Paris, France.Proving and Disproving Equivalence of Functional Programming Assignments
We present an automated approach to verify the correctness of programming assignments, such as the ones that arise in a functional programming course. Our approach takes as input student submissions and reference solutions, and uses equivalence checking to automatically prove or disprove correctness of each submission. To be effective in the context of a real-world programming course, an automated grading system must be both robust, to support programs written in a variety of style, and scalable, to treat hundreds of submissions at once. We achieve robustness by handling recursion using functional induction and by handling auxiliary functions using function call matching. We achieve scalability using a clustering algorithm that leverages the transitivity of equivalence to discover intermediate reference solutions among student submissions. We implement our approach on top of the Stainless verification system, to support equivalence checking of Scala programs. We evaluate our system and its components on over 4000 programs drawn from a functional programming course and from the program equivalence checking literature; this is the largest such evaluation to date. We show that our system is capable of proving program correctness by generating inductive equivalence proofs, and providing counterexamples for incorrect programs, with a high success rate.
2023. p. 928 - 951. DOI : 10.1145/3591258.