RIS.regexp: regular expressions and the free Kleene algebra.

Set Implicit Arguments.

Require Import tools algebra language.
Require Import Bool.
Section regexp.
  Context {X : Set}.

Definitions

Regular expressions are terms generated by the following syntax.
  Inductive regexp : Set :=
  | e_zero : regexp
  | e_un : regexp
  | e_add : regexp regexp regexp
  | e_prod : regexp regexp regexp
  | e_star : regexp regexp
  | atomic : X regexp.

  Notation " ⟪ l ⟫" := (atomic l).

The size of an expression is the number of occurrences of variables appearing in it.
  Fixpoint sizeExpr e :=
    match e with
    | e_un | e_zero ⇒ 0
    | _ ⇒ 1
    | e_add e f | e_prod e fsizeExpr e + sizeExpr f
    | e_star esizeExpr e
    end.

We may associate to an expression e the regular language e.
  Reserved Notation " ⟦ e ⟧ ".
  Fixpoint reg_lang (e : regexp) : language :=
    match e with
    | e_un𝟭
    | e_zero𝟬
    | e_prod e fe · f
    | e_add e fe f
    | e_star ee
    | l fun ww = [l]
    end
  where " ⟦ e ⟧ " := (reg_lang e).


Given a set of identities Ax (i.e. a binary relation over expressions) and a set of quasi-equations Ax' (i.e. a quaternary relation), we define the relation {| Ax , Ax' |} e == f as the smallest congruence stable by both Ax and Ax'.
  Reserved Notation " {| Ax , Ax' |} ⊢ e == f " (at level 80).
  Inductive ax_eq (Ax : relation regexp) (Ax' : regexp regexp regexp regexp Prop)
    : relation regexp :=
  | ax_eq_refl e : {|Ax,Ax'|} e == e
  | ax_eq_sym e f : {|Ax,Ax'|} e == f {|Ax,Ax'|} f == e
  | ax_eq_trans e f g : {|Ax,Ax'|} e == f {|Ax,Ax'|} f == g {|Ax,Ax'|} e == g
  | ax_eq_prod e f e' f' : {|Ax,Ax'|} e == f {|Ax,Ax'|} e' == f' {|Ax,Ax'|} e · e' == f · f'
  | ax_eq_add e f e' f' : {|Ax,Ax'|} e == f {|Ax,Ax'|} e' == f' {|Ax,Ax'|} e e' == f f'
  | ax_eq_star e f : {|Ax,Ax'|} e == f {|Ax,Ax'|} e == f
  | ax_eq_ax e f : Ax e f {|Ax,Ax'|} e == f
  | ax_eq_ax' e f g h : Ax' e f g h {|Ax,Ax'|} e == f {|Ax,Ax'|} g == h
  where " {| Ax , Ax' |} ⊢ e == f " := (ax_eq Ax Ax' e f).

  Definition ax_inf Ax Ax' e f := {|Ax,Ax'|} e f == f.
  Notation " {| Ax , Ax' |} ⊢ e =<= f " := (ax_inf Ax Ax' e f) (at level 80).

  Hint Constructors ax_eq.

Basic properties

Properties of the language interpretation

  Lemma Σ_lang L u : Σ L u e, e L e u.

  Lemma split_star e u v :
    e (u++v)
    (u=[] v=[])
     u1 u2 v1 v2, u = u1++u2 v=v1++v2 e u1 e (u2++v1) e v2.

Properties of the axiomatic equivalence

  Global Instance ax_eq_Equivalence Ax Ax' : Equivalence (ax_eq Ax Ax').

  Global Instance prod_ax_eq Ax Ax' :
    Proper (ax_eq Ax Ax' ==> ax_eq Ax Ax' ==> ax_eq Ax Ax') prod.

  Global Instance join_ax_eq Ax Ax' :
    Proper (ax_eq Ax Ax' ==> ax_eq Ax Ax' ==> ax_eq Ax Ax') join.

  Global Instance star_ax_eq Ax Ax':
    Proper (ax_eq Ax Ax'==> ax_eq Ax Ax') star.

If the axioms Ax,Ax' are sound with respect to the language interpretation, so is the relation {|Ax,Ax'|}⊢_==_.
  Remark soundness_proof (Ax: relation regexp)(Ax' : regexp regexp regexp regexp Prop):
    ( e f, Ax e f e f)
    ( e f g h, Ax' e f g h e f g h)
     e f, {|Ax,Ax'|} e == f e f.

  Section gen_proofs.
    Context {Ax : relation regexp}{Ax' : regexp regexp regexp regexp Prop}.

A set of axioms is good enough if it entails that is idempotent, associative and commutative.
    Class GoodEnoughAxiom Ax :=
      {join_idem : e, Ax (e e) e;
       join_assoc : e f g, Ax (e (f g)) ((e f) g);
       join_comm : e f, Ax (e f) (f e)}.

    Context {G : GoodEnoughAxiom Ax}.

In this case, we can show that {|Ax,Ax'|} e =<= f is a partial order, thus yielding a semi-lattice structure.
    Global Instance ax_inf_PreOrder : PreOrder (fun e f{|Ax,Ax'|} e =<= f).

    Global Instance ax_inf_PartialOrder :
      PartialOrder (ax_eq Ax Ax') (fun e f{|Ax,Ax'|} e =<= f).

    Global Instance joinOrder_ax : JoinOrder _ (ax_eq Ax Ax') (ax_inf Ax Ax') join.

    Lemma ax_eq_inf e f : {| Ax,Ax'|} e == f {| Ax,Ax'|} e =<= f.

    Global Instance Semilattice_ax : Semilattice _ (ax_eq Ax Ax') join.

Ax is very good if in addition it entails distributivity of · over .
    Class VeryGoodAxioms Ax :=
      {left_distr : e f g, Ax (e · (f g)) (e·f e·g);
       right_distr : e f g, Ax ((e f g) (e·g f·g)}.

    Context {V : VeryGoodAxioms Ax}.

In this case, the product preserves the ordering.
    Global Instance prod_ax_inf :
      Proper (ax_inf Ax Ax' ==> ax_inf Ax Ax' ==> ax_inf Ax Ax') prod.

  End gen_proofs.

The axiomatization KA.

  Reserved Notation " e =KA f " (at level 80).
  Reserved Notation " e <=KA f " (at level 80).

  Inductive KA : relation regexp :=
  | KA_prod_assoc e f g : e · (f · g) =KA (e · f) · g
  | KA_add_assoc e f g : e (f g) =KA (e f) g
  | KA_add_comm e f : e f =KA f e
  | KA_left_distr e f g : e · (f g) =KA e·f e·g
  | KA_right_distr e f g : (e f g =KA e·g f·g
  | KA_zero e : 𝟬 KA e
  | KA_un_left e : 𝟭 · e =KA e
  | KA_un_right e : e · 𝟭 =KA e
  | KA_left_zero e : 𝟬 · e =KA 𝟬
  | KA_right_zero e : e · 𝟬 =KA 𝟬
  | KA_idem e : e KA e
  | KA_star_unfold e : 𝟭 e · e KA e
  where " e =KA f " := (KA e f)
  and " e <=KA f " := (KA (e f) f).

  Hint Constructors KA.

  Inductive KA' : regexp regexp regexp regexp Prop :=
  | KA_star_left_ind e f : KA' (e · f f) f (e · f f) f
  | KA_star_right_ind e f : KA' (e · f e) e (e · f e) e.

  Hint Constructors KA'.

  Global Instance goodKA : GoodEnoughAxiom KA.
  Global Instance verygoodKA : VeryGoodAxioms KA.

  Lemma star_left_ind e f : {| KA,KA'|} e · f =<= f {| KA,KA'|} e · f =<= f.
  Lemma star_right_ind e f : {| KA,KA'|} e · f =<= e {| KA,KA'|} e · f =<= e.

The set of expressions equipped with the axioms KA,KA' forms a Kleene algebra.
  Global Instance KA_regexp : KleeneAlgebra regexp (ax_eq KA KA') (ax_inf KA KA').
  Global Instance Semilattice_KA : Semilattice _ (ax_eq KA KA') join := join_semilattice.

Axiomatic equivalence is sound with respect to the language interpretation.

Equatity modulo associativity, commutativity, idempotence and units.

  Inductive ACI0 : relation regexp :=
  | ACI0_prod_assoc e f g : ACI0(e · (f · g)) ((e · f) · g)
  | ACI0_add_assoc e f g : ACI0 (e (f g)) ((e f) g)
  | ACI0_add_comm e f : ACI0 (e f) (f e)
  | ACI0_add_idem e : ACI0 (e e) e
  | ACI0_add_zero e : ACI0 (e 𝟬) e
  | ACI0_prod_one e : ACI0 (𝟭 · e) e.
  Definition Empt : regexp regexp regexp regexp Prop := fun _ _ _ _False.
  Hint Constructors ACI0.

  Global Instance goodACI0 : GoodEnoughAxiom ACI0.

  Infix " ≡0 " := (ax_eq ACI0 Empt) (at level 80).
  Infix " ≦0 " := (ax_inf ACI0 Empt) (at level 80).

  Lemma KA_ACI0 e f : e ≡0 f {|KA,KA'|}⊢e==f.

Utilities

Variables

Var e computes the set of variables appearing in e.
  Fixpoint Var (e : regexp) : list X :=
    match e with
    | e_un | e_zero[]
    | x[x]
    | e_prod e f | e_add e fVar e ++ Var f
    | e_star eVar e
    end.

  Lemma Var_spec e w : e w w Var e.

Empty word

ϵ e yields 𝟭 if e contains the empty word, 𝟬 otherwise.
  Fixpoint ϵ (e : regexp) : regexp :=
    match e with
    | e_un | e_star _e_un
    | e_zero | _e_zero
    | e_prod e f
      match (ϵ e,ϵ f) with
      | (e_un,e_un)e_un
      | _e_zero
      end
    | e_add e f
      match (ϵ e,ϵ f) with
      | (e_un,_) | (_,e_un)e_un
      | _e_zero
      end
    end.

  Lemma nil_prod_lang (l m : @language X) : (l·m) [] l [] m [].

  Lemma ϵ_zero_or_un e : {ϵ e = e_un} + {ϵ e = e_zero}.

  Lemma ϵ_spec e : ϵ e = e_un e [].

  Lemma ϵ_inf_e e : {|KA,KA'|} ϵ e =<= e.

ϵ e is 𝟭 exactly when the inequation 𝟭<=e is provable in KA.
  Lemma ϵ_ax_spec e : ϵ e = un {|KA,KA'|} un =<= e.

  Lemma ϵ_zero e : ϵ e = e_zero ¬ e [].

  Remark ϵ_sub_id e : {|KA,KA'|} ϵ e =<= 𝟭.

  Lemma ϵ_add e f : {|KA,KA'|} ϵ (e f) == ϵ e ϵ f.

  Lemma ϵ_prod e f : {|KA,KA'|} ϵ (e · f) == ϵ e · ϵ f.

  Global Instance ϵ_proper : Proper (ax_eq KA KA' ==> ax_eq KA KA') ϵ.

  Global Instance ϵ_proper_inf : Proper (ax_inf KA KA' ==> ax_inf KA KA') ϵ.

  Lemma ϵ_idem e : ϵ (ϵ e) = ϵ e.

  Lemma ϵ_KA e f : {|KA, KA'|}⊢ e == f ϵ e = ϵ f.

Empty language

test0 e is a boolean, it returns true if and only if the language of e is empty.
  Fixpoint test0 e :=
    match e with
    | e_zerotrue
    | e_un | e_star _ | _false
    | e_add e ftest0 e && test0 f
    | e_prod e ftest0 e || test0 f
    end.

  Lemma test0_false e : test0 e = false u, e u.

test0 e is true exactly when the equation e=𝟬 is provable in KA.
  Lemma test0_spec e : test0 e = true {|KA,KA'|} e == e_zero.

  Lemma test0_Σ L :
    test0 (Σ L) = forallb test0 L.


  Lemma test0_KA e f : {|KA, KA'|}⊢ e == f test0 e = test0 f.

  Lemma test0_ϵ e : test0 e = true test0 (ϵ e) = true.

Unit language

test1 e checks whether e=𝟭 is provable in KA. If the value is false, then e is either the empty language or it contains a non-empty word a::u.
  Fixpoint test1 e :=
    match e with
    | e_zerofalse
    | e_untrue
    | _false
    | e_star etest1 e || test0 e
    | e_add e f(test1 e && test1 f)||(test1 e && test0 f)||(test0 e && test1 f)
    | e_prod e ftest1 e && test1 f
    end.

  Lemma test0_test1_false e : test0 e = false test1 e = false a u, e (a::u).


  Lemma test1_spec e : test1 e = true {|KA,KA'|} e == e_un.

Finite sums

  Fixpoint join_list {A} f (E : list A) : regexp :=
    match E with
    | []𝟬
    | x::E(f x) (join_list f E)
    end.

  Notation "Σ_{ X } f" := (join_list f X) (at level 35).

  Lemma join_list_app {A} f (E F : list A) : Σ_{E++F} f ≡0 Σ_{E} f Σ_{F} f.

  Lemma join_list_monotone {A} f (E F : list A) : E F Σ_{E} f ≦0 Σ_{F} f.

  Lemma join_list_add {B} (A : list B) f g :
    Σ_{A} (fun xf x g x) ≡0 Σ_{A} f Σ_{A} g.

  Lemma join_list_left_distr {B} (A : list B) f e :
    {|KA,KA'|} Σ_{A} (fun xe · f x) == e · Σ_{A} f.

  Lemma join_list_right_distr {B} (A : list B) f e :
    {|KA,KA'|} Σ_{A} (fun xf x · e) == Σ_{A} f · e.

  Lemma join_list_equivalent {C} (A B : list C) f g :
    ( x, x A {|KA,KA'|} f x == g x) A B
    {|KA,KA'|} Σ_{A} f == Σ_{B} g.

  Lemma join_list_equivalent_ACI0 {C} (A B : list C) f g :
    ( x, x A f x ≡0 g x) A B Σ_{A} f ≡0 Σ_{B} g .

  Lemma join_list_zero {B} (A : list B) : Σ_{A} (fun _e_zero) ≡0 𝟬.

  Lemma Σ_app L M : Σ L Σ M ≡0 Σ (L++M).

  Lemma Σ_incl0 L M : L M Σ L ≦0 Σ M.

  Global Instance Σ_equivalent : Proper (@equivalent _ ==> ax_eq ACI0 Empt) Σ.

  Lemma Σ_map_concat l :
    Σ (map Σ l) ≡0 Σ (concat l).

  Lemma Σ_map_equiv {A} (f g : A regexp) (L : list A) :
    ( e, e L {| KA , KA' |} f e == g e) {| KA , KA' |} Σ (map f L) == Σ (map g L).

  Lemma ϵ_Σ_un L : ( e, e L ϵ e = un) ϵ (Σ L) = un.

  Lemma ϵ_Σ_zero L : ( e, e L ϵ e = zero) ϵ (Σ L) = zero.

Finite products

  Definition Π : list regexp regexp := fold_right prod un.

  Lemma Π_lang E u :
    Π E u U, u = fold_right (@app _) [] U
                    U = E
                    n, nth n E un (nth n U []).

  Lemma fold_star_prodList (e : regexp) l :
   {| KA , KA' |} fold_right (fun f1 f2e_prod (e_prod e f1) f2) e l == Π (pad e l).

  Remark idem_fold_star (g : regexp) P :
    fold_right (fun e fg · e · f ) (g ) P g · fold_right (fun e fg · e · f ) (g ) P.

Derivatives

From now on we assume that the alphabet X has decidable equality.
  Context {decX: decidable_set X }.

This entails quite easily that the syntactic equality of regular expressions is decidable.
  Global Instance decidable_regexp : decidable_set regexp.
    Fixpoint eq_regexp e1 e2 :=
      match (e1,e2) with
      | (e_zero,e_zero) | (e_un,e_un)true
      | (x,y)x =?= y
      | (e_add e1 f1,e_add e2 f2) | (e_prod e1 f1,e_prod e2 f2)
                                    eq_regexp e1 e2 && eq_regexp f1 f2
      | (e_star e,e_star f)eq_regexp e f
      | _false
      end.
  Qed.

δ x e is the Brzozowski derivative of e at x.
  Fixpoint δ (x:X) (e : regexp) :=
    match e with
    | e_un | e_zeroe_zero
    | yif x =?= y
            then e_un
            else e_zero
    | e_prod e f
      if ϵ e =?= 𝟭
      then δ x e · f δ x f
      else δ x e · f
    | e_add e fδ x e δ x f
    | e_star eδ x e · e
    end.


  Proposition fundamental_theorem e A :
    Var e A {|KA,KA'|} e == ϵ e Σ_{A} (fun xx · δ x e).

  Lemma δ_lang e a w : e (a::w) δ a e w.

  Lemma δ_inf_e a e : {| KA , KA' |} a · (δ a e) =<= e.

  Lemma δ_Σ l L : δ l (Σ L) = Σ (map (δ l) L).

  Lemma δ_proper_KA l :
    Proper ((ax_eq KA KA') ==> (ax_eq KA KA')) (δ l).

  Lemma δ_proper_KA_inf l :
    Proper ((ax_inf KA KA') ==> (ax_inf KA KA')) (δ l).

  Lemma test0_δ x e : test0 e = true test0 (δ x e) = true.

  Remark δ_prod l e f : {|KA, KA'|}⊢ δ l (e·f) == δ l e ·f ϵ e·δ l f.

δ may be extended to words in the obvious way.
  Fixpoint δ_words w e :=
    match w with
    | []e
    | a::wδ_words w (δ a e)
    end.

  Notation " δ⋆ " := δ_words.

  Lemma δ_words_zero w : δ w e_zero = e_zero.

  Lemma δ_words_add e f w : δ w (e f) = (δ w e) (δ w f).

  Lemma δ_words_lang e : e (fun wϵ (δ w e) = 𝟭).

  Lemma δ_words_last e u a : δ (u++[a]) e = δ a (δ u e).

Prefixes of a regular language

  Fixpoint prefixes e :=
    match e with
    | e_zero[]
    | e_un[𝟭]
    | x[𝟭;x]
    | e_add e fprefixes e ++ prefixes f
    | e_prod e fif test0 e || test0 f
                   then []
                   else prefixes e ++ map (prod e) (prefixes f)
    | e_star eif test0 e
                  then [𝟭]
                  else map (prod (e)) (prefixes e)
    end.

  Lemma prefixes_spec e : u, Σ (prefixes e) u v, e (u++v).
      Transparent joinLang.
      Opaque joinLang.
        Transparent joinLang.
             Opaque joinLang.

  Lemma test0_prefixes e : test0 e = true prefixes e = [].

End regexp.

Notation "Σ_{ X } f" := (join_list f X) (at level 35).
Notation " ⟪ l ⟫" := (atomic l).

Notation " ⟦ e ⟧ " := (reg_lang e).

Notation " {| Ax , Ax' |} ⊢ e == f " := (ax_eq Ax Ax' e f) (at level 80).

Notation " {| Ax , Ax' |} ⊢ e =<= f " := (ax_inf Ax Ax' e f) (at level 80).
Hint Constructors ax_eq.
Hint Constructors KA.
Hint Constructors KA'.
Notation " e =KA f " := (ax_eq KA KA' e f) (at level 80).
Notation " e <=KA f " := (ax_inf KA KA' e f) (at level 80).
Notation " δ⋆ " := δ_words.

Section Antimirov.
  Context {X : Set}{decX: decidable_set X }.

Antimirov derivatives


  Fixpoint δA a e :=
    match e with
    | e_zero | e_un[]
    | b if a =?= b then [e_un] else []
    | e_add e fδA a e ++ δA a f
    | e_prod e f
      map (fun gg · f) (δA a e)++(if ϵ e =?= e_un then (δA a f) else [])
    | e_star emap (fun gg · e) (δA a e)
    end.

  Fixpoint stateSpace (e:@regexp X) :=
    e::match e with
       | e_zero[]
       | e_un[]
       | x [e_un]
       | e_add e f(stateSpace e) ++ (stateSpace f)
       | e_prod e fmap (fun gg · f) (stateSpace e)++(if negb (test0 e) then stateSpace f else [])
       | e_star emap (fun gg · e) (stateSpace e)
       end.

  Lemma stateSpace_refl e : e stateSpace e.

  Lemma test0_stateSpace e f :
    test0 f = true e stateSpace f test0 e = true.

  Lemma stateSpace_trans e f :
    f stateSpace e stateSpace f stateSpace e.

  Lemma δA_stateSpace e a : δA a e stateSpace e.

  Lemma Antimirov_lang e a w : e (a::w) f, f δA a e f w.

  Remark In_δA_stateSpace_incl : ( e f a, f δA a e stateSpace f stateSpace e).

 Lemma test0_δA e x : test0 e = true f, f δA x e test0 f = true.

  Lemma stateSpace_Var e f : e stateSpace f Var e Var f.
End Antimirov.