Towards certified virtual machine based regular expression parsing

Date:

Regular expressions (REs) are pervasive in computing. We use REs in text editors, string search tools (like GNU-Grep) and lexical analysers generators. Most of these tools rely on converting regular expressions to its corresponding finite state machine or use REs derivatives for directly parse an input string. In this work, we investigate the suitability of another approach: instead of using derivatives or generate a finite state machine for a given RE, we developed a virtual machine (VM) for parsing regular languages, in such a way that a RE is merely a program executed by the VM over the input string. We developed a prototype implementation in Haskell, tested it using QuickCheck and provided proof sketches of its correctness with respect to RE standard inductive semantics.

Slides