Documentation

Regex

Regex - A Formally Verified Regular Expression Engine #

This library provides a robust, formally verified implementation of regular expressions for Lean 4. It implements two distinct matching algorithms (a backtracker and a virtual machine), both with complete mathematical proofs of their correctness.

Features #

Installation #

Add lean-regex to your Lean project by adding the following to your lakefile.toml:

[[require]]
name = "Regex"
git = "https://github.com/pandaman64/lean-regex"
rev = "v4.20.0"
subDir = "regex"

or

require Regex from git "https://github.com/pandaman64/lean-regex.git" @ "v4.20.0" / "regex"

to your lakefile.lean.

Basic Usage #

import Regex

-- Create a regex at compile-time using re! syntax
def dateRegexExample := re! r"\d{4}-\d{2}-\d{2}"

-- Find and return matches (and their positions as components of each `Substring`)
def allMatches := dateRegexExample.findAll
  "2025-05-24: Something happened\\n2025-05-26: Another thing happened"

-- #["2025-05-24".toSubstring, "2025-05-26".toSubstring]
#eval allMatches

-- #[{ byteIdx := 0 }, { byteIdx := 32 }]
#eval allMatches.map (·.startPos)

-- Capture groups
def groupRegexExample := re! r"(a+)(b*)"

def captures := groupRegexExample.capture "aaabb"

/-
Returns captured groups: group 0 = whole match, group 1 = "aaa", group 2 = "bb":

some {
  haystack := "aaabb",
  buffer := #[
    some { byteIdx := 0 }, some { byteIdx := 5 },
    some { byteIdx := 0 }, some { byteIdx := 3 },
    some { byteIdx := 3 }, some { byteIdx := 5 }
  ]
}
-/
#eval captures

-- Additional utility methods are available
def utilityRegexExample := re! r"a+"

def haystack := "a1aa2aaa3"

-- some "a"
#eval utilityRegexExample.extract haystack

-- #["a", "aa", "aaa"]
#eval utilityRegexExample.extractAll haystack

-- true
#eval utilityRegexExample.test haystack

-- 3
#eval utilityRegexExample.count haystack

/-
Splits a string using regex matches as breakpoints:

#["", "1", "2", "3"]

Note the empty substring due to the match "a" at the beginning of the input:
"a1aa2aaa3" = "" ++ "a" ++ "1" ++ "aa" ++ "2" ++ "aaa" ++ "3"
-/
#eval utilityRegexExample.split haystack

def transformRegexExample := re! r"(a+)(b*)|(c+)"

def countString (name : String) (input : Option Substring) : String :=
  input.map (Substring.toString)
    |>.map (fun s => toString s.length ++ name)
    |>.getD ""

def countTransform (captures : CapturedGroups) : String :=
  let as := captures.get 1 |> countString "a"
  let bs := captures.get 2 |> countString "b"
  let cs := captures.get 3 |> countString "c"
  as ++ bs ++ cs

-- "1a0b 2a0b 2a1b 1c 2c"
#eval transformRegexExample.transformAll "a aa aab c cc" countTransform

-- Transforming matches into a fixed string is also available using `.replace` and `.replaceAll`.
#guard transformRegexExample.transformAll "a aa aab c cc" (fun _ => ".")
     = transformRegexExample.replaceAll   "a aa aab c cc" "."

API Overview #

The library provides the following main structures:

Regex operations include:

Creating Regexes #

Regexes can be created in two ways:

  1. At compile time using the re! macro:

    let regex := re! r"pattern"
    

    This checks the regex syntax at compile time and reports errors during compilation.

  2. At runtime using Regex.parse or Regex.parse!:

    let regex ← Regex.parse "pattern" -- Returns Except with possible error
    let regex := Regex.parse! "pattern" -- Panics on invalid syntax
    

Matching Engines #

The library provides two matching engines:

  1. Virtual Machine (default): Efficient for most regular expressions
  2. Backtracker: Can be enabled with { regex with useBacktracker := true }

Both engines are formally verified to be correct implementations of regex semantics.