Regex - A Formally Verified Regular Expression Engine #
This library provides a robust, formally verified implementation of regular expressions for Lean 4. It implements two distinct matching algorithms (a backtracker and a virtual machine), both with complete mathematical proofs of their correctness.
Features #
- Formally Verified: Both matching algorithms are mathematically proven to correctly implement the semantics of regular expressions
- Real-World Regex Features: Support for capture groups, character classes, substring matching, and more
- Two Matching Engines: Choose between a backtracker or virtual machine implementation
- Simple API: Clean, easy-to-use interface for common regex operations
Installation #
Add lean-regex to your Lean project by adding the following to your lakefile.toml
:
[[require]]
name = "Regex"
git = "https://github.com/pandaman64/lean-regex"
rev = "v4.20.0"
subDir = "regex"
or
require Regex from git "https://github.com/pandaman64/lean-regex.git" @ "v4.20.0" / "regex"
to your lakefile.lean
.
Basic Usage #
import Regex
-- Create a regex at compile-time using re! syntax
def dateRegexExample := re! r"\d{4}-\d{2}-\d{2}"
-- Find and return matches (and their positions as components of each `Substring`)
def allMatches := dateRegexExample.findAll
"2025-05-24: Something happened\\n2025-05-26: Another thing happened"
-- #["2025-05-24".toSubstring, "2025-05-26".toSubstring]
#eval allMatches
-- #[{ byteIdx := 0 }, { byteIdx := 32 }]
#eval allMatches.map (·.startPos)
-- Capture groups
def groupRegexExample := re! r"(a+)(b*)"
def captures := groupRegexExample.capture "aaabb"
/-
Returns captured groups: group 0 = whole match, group 1 = "aaa", group 2 = "bb":
some {
haystack := "aaabb",
buffer := #[
some { byteIdx := 0 }, some { byteIdx := 5 },
some { byteIdx := 0 }, some { byteIdx := 3 },
some { byteIdx := 3 }, some { byteIdx := 5 }
]
}
-/
#eval captures
-- Additional utility methods are available
def utilityRegexExample := re! r"a+"
def haystack := "a1aa2aaa3"
-- some "a"
#eval utilityRegexExample.extract haystack
-- #["a", "aa", "aaa"]
#eval utilityRegexExample.extractAll haystack
-- true
#eval utilityRegexExample.test haystack
-- 3
#eval utilityRegexExample.count haystack
/-
Splits a string using regex matches as breakpoints:
#["", "1", "2", "3"]
Note the empty substring due to the match "a" at the beginning of the input:
"a1aa2aaa3" = "" ++ "a" ++ "1" ++ "aa" ++ "2" ++ "aaa" ++ "3"
-/
#eval utilityRegexExample.split haystack
def transformRegexExample := re! r"(a+)(b*)|(c+)"
def countString (name : String) (input : Option Substring) : String :=
input.map (Substring.toString)
|>.map (fun s => toString s.length ++ name)
|>.getD ""
def countTransform (captures : CapturedGroups) : String :=
let as := captures.get 1 |> countString "a"
let bs := captures.get 2 |> countString "b"
let cs := captures.get 3 |> countString "c"
as ++ bs ++ cs
-- "1a0b 2a0b 2a1b 1c 2c"
#eval transformRegexExample.transformAll "a aa aab c cc" countTransform
-- Transforming matches into a fixed string is also available using `.replace` and `.replaceAll`.
#guard transformRegexExample.transformAll "a aa aab c cc" (fun _ => ".")
= transformRegexExample.replaceAll "a aa aab c cc" "."
API Overview #
The library provides the following main structures:
Regex
: The main structure representing a compiled regular expressionMatches
: A structure for iterating through all matches of a regex in a stringCaptures
: A structure for iterating through all matches with their capture groupsCapturedGroups
: A structure representing the capture groups from a single match
Regex operations include:
Regex.find
: Find the first match of a regex in a stringRegex.findAll
: Find all matches of a regex in a stringRegex.extract
: Extract the first match of a regex in a stringRegex.extractAll
: Extract all matches of a regex in a stringRegex.capture
: Capture the first match with its capture groupsRegex.captureAll
: Capture all matches with their capture groupsRegex.replace
: Replace the first match with a replacement stringRegex.replaceAll
: Replace all matches with a replacement stringRegex.transform
: Transform the first match using a rule based on its captured groupsRegex.transformAll
: Transform all matches using a rule based on their captured groupsRegex.test
: Test if a regex matches a stringRegex.count
: Count the number of regex matches in a stringRegex.split
: Split a string using regex matches as breakpoints
Creating Regexes #
Regexes can be created in two ways:
At compile time using the
re!
macro:let regex := re! r"pattern"
This checks the regex syntax at compile time and reports errors during compilation.
At runtime using
Regex.parse
orRegex.parse!
:let regex ← Regex.parse "pattern" -- Returns Except with possible error let regex := Regex.parse! "pattern" -- Panics on invalid syntax
Matching Engines #
The library provides two matching engines:
- Virtual Machine (default): Efficient for most regular expressions
- Backtracker: Can be enabled with
{ regex with useBacktracker := true }
Both engines are formally verified to be correct implementations of regex semantics.