Frederik Schmitt

I am a Ph.D. student in the Reactive Systems Group at CISPA Helmholtz Center for Information Security and at the Computer Science Department of Saarland University.


My research targets the integration of neural and symbolic reasoning, with major applications to automated verification. Key results are:

  • end-to-end generation of proof certificates when learning formal logics (ICLR’21, NeurIPS’24).
  • first deep learning solution for synthesizing circuits from temporal logic (NeurIPS’21, ICLR’23).
  • first interactive formalization of natural language to temporal specifications with LLMs (CAV’23).

Check out our new preprint on synthesizing correct-by-design Verilog code from formal specifications using Large Reasoning Models!

+49 681 87083 2462
CISPA, Saarbrücken
E9 1 / 1.09
Frederik Schmitt
Publications
    Learning Better Representations from Less Data for Propositional Satisfiability
    Mohamed Ghanem, Frederik Schmitt, Julian Siber and Bernd Finkbeiner · NeurIPS 2024 · Spotlight Paper

    Training neural networks on NP-complete problems typically demands very large amounts of training data and often needs to be coupled with computationally expensive symbolic verifiers to ensure output correctness. In this paper, we present NeuRes, a neuro-symbolic approach to address both challenges for propositional satisfiability, being the quintessential NP-complete problem. By combining certificate-driven training and expert iteration, our model learns better representations than models trained for classification only, with a much higher data efficiency - requiring orders of magnitude less training data. NeuRes employs propositional resolution as a proof system to generate proofs of unsatisfiability and to accelerate the process of finding satisfying truth assignments, exploring both possibilities in parallel. To realize this, we propose an attention-based architecture that autoregressively selects pairs of clauses from a dynamic formula embedding to derive new clauses. Furthermore, we employ expert iteration whereby model-generated proofs progressively replace longer teacher proofs as the new ground truth. This enables our model to reduce a dataset of proofs generated by an advanced solver by around 32% after training on it with no extra guidance. This shows that NeuRes is not limited by the optimality of the teacher algorithm owing to its self-improving workflow. We show that our model achieves far better performance than NeuroSAT in terms of both correctly classified and proven instances.

    PDF
    Details
    NeuroSynt: A Neuro-symbolic Portfolio Solver for Reactive Synthesis
    Matthias Cosler, Christopher Hahn, Ayham Omar and Frederik Schmitt · TACAS 2024

    We introduce NeuroSynt, a neuro-symbolic portfolio solver framework for reactive synthesis. At the core of the solver lies a seamless integration of neural and symbolic approaches to solving the reactive synthesis problem. To ensure soundness, the neural engine is coupled with model checkers verifying the predictions of the underlying neural models. The open-source implementation of NeuroSynt provides an integration framework for reactive synthesis in which new neural and state-of-the-art symbolic approaches can be seamlessly integrated. Extensive experiments demonstrate its efficacy in handling challenging specifications, enhancing the state-of-the-art reactive synthesis solvers, with NeuroSynt contributing novel solves in the current SYNTCOMP benchmarks.

    nl2spec: Interactively Translating Unstructured Natural Language to Temporal Logics with Large Language Models
    Matthias Cosler, Christopher Hahn, Daniel Mendoza, Frederik Schmitt and Caroline Trippel · CAV 2023

    A rigorous formalization of desired system requirements is indispensable when performing any verification task. This often limits the application of verification techniques, as writing formal specifications is an error-prone and time-consuming manual task. To facilitate this, we present nl2spec, a framework for applying Large Language Models (LLMs) to derive formal specifications (in temporal logics) from unstructured natural language. In particular, we introduce a new methodology to detect and resolve the inherent ambiguity of system requirements in natural language: we utilize LLMs to map subformulas of the formalization back to the corresponding natural language fragments of the input. Users iteratively add, delete, and edit these sub-translations to amend erroneous formalizations, which is easier than manually redrafting the entire formalization. The framework is agnostic to specific application domains and can be extended to similar specification languages and new neural models. We perform a user study to obtain a challenging dataset, which we use to run experiments on the quality of translations. We provide an open-source implementation, including a web-based frontend.

    Iterative Circuit Repair Against Formal Specifications
    Matthias Cosler, Frederik Schmitt, Christopher Hahn and Bernd Finkbeiner · ICLR 2023

    We present a deep learning approach for repairing sequential circuits against formal specifications given in linear-time temporal logic (LTL). Given a defective circuit and its formal specification, we train Transformer models to output circuits that satisfy the corresponding specification. We propose a separated hierarchical Transformer for multimodal representation learning of the formal specification and the circuit. We introduce a data generation algorithm that enables generalization to more complex specifications and out-of-distribution datasets. In addition, our proposed repair mechanism significantly improves the automated synthesis of circuits from LTL specifications with Transformers. It improves the state-of-the-art by 6.8 percentage points on held-out instances and 11.8 percentage points on an out-of-distribution dataset from the annual reactive synthesis competition.

    Neural Circuit Synthesis from Specification Patterns
    Frederik Schmitt, Christopher Hahn, Markus N. Rabe and Bernd Finkbeiner · NeurIPS 2021

    We train hierarchical Transformers on the task of synthesizing hardware circuits directly out of high-level logical specifications in linear-time temporal logic (LTL). The LTL synthesis problem is a well-known algorithmic challenge with a long history and an annual competition is organized to track the improvement of algorithms and tooling over time. New approaches using machine learning might open a lot of possibilities in this area, but suffer from the lack of sufficient amounts of training data. In this paper, we consider a method to generate large amounts of additional training data, i.e., pairs of specifications and circuits implementing them. We ensure that this synthetic data is sufficiently close to human-written specifications by mining common patterns from the specifications used in the synthesis competitions. We show that hierarchical Transformers trained on this synthetic data solve a significant portion of problems from the synthesis competitions, and even out-of-distribution examples from a recent case study.

    Teaching Temporal Logics to Neural Networks
    Christopher Hahn, Frederik Schmitt, Jens Kreber, Markus N. Rabe and Bernd Finkbeiner · ICLR 2021

    We study two fundamental questions in neuro-symbolic computing: can deep learning tackle challenging problems in logics end-to-end, and can neural networks learn the semantics of logics. In this work we focus on linear-time temporal logic (LTL), as it is widely used in verification. We train a Transformer on the problem to directly predict a solution, i.e. a trace, to a given LTL formula. The training data is generated with classical solvers, which, however, only provide one of many possible solutions to each formula. We demonstrate that it is sufficient to train on those particular solutions to formulas, and that Transformers can predict solutions even to formulas from benchmarks from the literature on which the classical solver timed out. Transformers also generalize to the semantics of the logics: while they often deviate from the solutions found by the classical solvers, they still predict correct solutions to most formulas.

    Advanced Factoring Strategies for Decoupled Search Using Linear Programming
    Frederik Schmitt, Daniel Gnad and Jörg Hoffmann · ICAPS 2019

    Star-topology decoupled state space search decomposes a planning task and searches over the component state spaces instead of multiplying the state variables. This can lead to an exponential reduction of the search effort. To do so, in a preprocess before the search, the given planning task is partitioned into factors, such that the interaction between these factors takes the form of a star topology. Prior work has identified several ways to automatically decompose planning tasks, however, was not able to release the full potential of decoupled search. We try to close this gap by introducing an integer linear programming formulation of the factoring process, allowing us to explicitly specify the properties that a factoring should have. We prove that our approach returns the factoring that maximizes the number of factors, if this is the objective, and employ two other properties to assess the quality of a factoring. Our experimental evaluation shows that this leads to superior performance and substantially increases the applicability of decoupled search.

Preprints
    Natural Synthesis: Outperforming Reactive Synthesis Tools with Large Reasoning Models
    Frederik Schmitt, Matthias Cosler, Niklas Metzger, Julian Siber, Vladimir Krsmanovic, Mohamed Ghanem and Bernd Finkbeiner · arXiv preprint

    Reactive synthesis, the problem of automatically constructing a hardware circuit from a logical specification, is a long-standing challenge in formal verification. It is elusive for two reasons: It is algorithmically hard, and writing formal specifications by hand is notoriously difficult. In this paper, we tackle both sides of the problem. For the algorithmic side, we present a neuro-symbolic approach to reactive synthesis that couples large reasoning models with model checkers to iteratively repair a synthesized Verilog implementation via sound symbolic feedback. Our approach solves more benchmarks than the best dedicated tools in the annual synthesis competition and extends to constructing parameterized systems, a problem known to be undecidable. On the specification side, we introduce an autoformalization step that shifts the specification task from temporal logic to natural language by introducing a hand-authored dataset of natural-language specifications for evaluation. We demonstrate performance comparable to that of starting from formal specifications, establishing natural synthesis as a viable end-to-end workflow.

    Formal Specifications from Natural Language
    Christopher Hahn, Frederik Schmitt, Julia J. Tillman, Niklas Metzger, Julian Siber and Bernd Finkbeiner · arXiv preprint

    We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representation: 1) regular expressions (regex), frequently used in programming and search; 2) First-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models maintain their generalization capabilities from pre-trained knowledge of natural language to generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state-of-the-art for translating into regular expressions, with the benefits of being easy to access, efficient to fine-tune, and without a particular need for domain-specific reasoning.

Contributed Talks & Workshops
    Neural Circuit Synthesis with Pre-trained Language Models
    Frederik Schmitt, Matthias Cosler and Bernd Finkbeiner · DAV 2023

    This extended abstract reports preliminary results on fine-tuning pre-trained language models for solving reactive synthesis problems end-to-end. In recent work, hierarchical Transformer neural networks have been successfully trained from scratch to synthesize sequential circuits directly out of formal specifications. We improve over existing approaches by fine-tuning CodeT5 models that have been pre-trained on both natural language and programming languages. Our experiments show improved generalization and sample efficiency compared to the previous approach.

    Deep Learning for Temporal Logics
    Frederik Schmitt, Christopher Hahn, Jens Kreber, Markus N. Rabe and Bernd Finkbeiner · AITP 2021

    Temporal logics are a well established formal specification paradigm to specify the behavior of systems, and serve as inputs to industrial-strength verification tools. We report on current advances in applying deep learning to temporal logical reasoning tasks, showing that models can even solve instances where competitive classical algorithms timed out.

Teaching
Contact Data Privacy Policy Imprint
Home People Publications
More