Module Acutis.Matching

Compile patterns into decision trees.

Every pattern represents a one-dimensional path across a multi-dimensional data structure. A list of patterns is a two-dimensional matrix of paths. In order to transverse these paths efficiently, we need to combine them into a tree.

We take advantage of a few properties that can make our tree simpler. We only test discrete static values: integers, strings, etc. We sort record fields and replace omitted fields with wildcards, so every pattern "lines up" with the others in 2D space. We also sort the tested values (the integers, strings, etc.).

Every node has a set of integer IDs. These keep track of which values could potentially be bound to an identifier. Due to the merging and expanding of trees, nodes may have multiple IDs. Each leaf at the end of the branches contains a map of names to their IDs. This is necessary because multiple patterns may merge that use different or overlapping names for bindings.

The most complicated part is how we handle wildcards. When we merge trees, each wildcard "expands" into its joining node. All of the nodes that come after the wildcard will also expand into all of the nodes after the other node. This has trade-offs. One advantage is that we can guarantee that every node is only visited once at runtime. One disadvantage is that some patterns may produce extremely large trees.

Detecting redundant patterns is almost "free" with this strategy because merging a redundant tree fails to produce a new, different tree.

Example patterns and their resulting trees.

Here are a few example patterns juxtaposed with the trees they produce. The trees are written in pseudo-code, since the real trees are much more verbose.

A basic list of integers.

{% match a
   with 0  %} a
{% with 10 %} b
{% with 20 %} c
{% with _  %} d
{% /match %}
|- case 0 -> a
|- case 10 -> b
|- case 20 -> c
|- wildcard -> d

A record nested in a tuple.

(Note: internally, all arguments passed to a match statement are implicitly wrapped in a tuple-like structure.)

{% match a, b
   with {a: 10, b: 11}, 12 %} a
{% with {b: 21, a: 20}, 22 %} b
{% with _, _ %} c
{% /match %}
key 0
  |- begin nest
      |- key "a"
      |   |- case 10
      |   |   |- key "b"
      |   |       |- case 11
      |   |           |- end nest
      |   |               |- key 1
      |   |                   |- case 12 -> a
      |   |                   |- wildcard -> c
      |   |- case 20
      |       |- key "b"
      |           |- case 21
      |               |- end nest
      |                   |- key 1
      |                       |- case 22 -> b
      |                       |- wildcard -> c
      |- wildcard
          |----------------- key 1
                              |- wildcard -> c

A list.

Remember that the [1, 2, ...tl] list syntax is basically sugar for !(1, !(2, tl)).

{% match a
   with [] %} a
{% with [x] %} b
{% with [x, ...y] %} c
{% /match %}
|- nil -> a
|- cons
    |- begin nest
        |- key 0
            |- wildcard
                |- key 1
                    |- nil
                    |  |- end nest -> b
                    |
                    |- cons
                        |- wildcard
                            |- end nest -> c

Nested tuples with wildcards.

We can see the effect of wildcard expansion. Some paths are duplicated throughout the tree. For larger, more complex patterns, this can create unexpectedly enormous trees.

{% match              a,  b
   with               x, 41 %} a
{% with  ((10, 20), 30), 40 %} b
{% with               y,  z %} c
{% /match %}
key 0
  |- begin nest
      |- key 0
      |   |- begin nest
      |       |- key 0
      |           |- case 10
      |               |- key 1
      |                   |- case 20
      |                       |- end nest
      |                           |- key 1
      |                               |- case 30
      |                                   |- end nest
      |                                       |- key 1
      |                                           |- case 40 -> b
      |                                           |- case 41 -> a
      |                                           |- wildcard -> c
      |- wildcard
          |------------------------------------- key 1
                                                  |- case 41 -> a
                                                  |- wildcard -> c

Type definitions.

type internal_check_cases
type ('leaf, 'key) tree =
  1. | Switch of {
    1. key : 'key;
    2. ids : Set.Int.t;
    3. cases : ('leaf, 'key) switchcase;
    4. wildcard : ('leaf, 'key) tree option;
    5. check_cases : internal_check_cases;
    }
    (*

    A Switch represents a list of discreet values to test (i.e., 1, "a", etc.). If none of the values match the input, then the wildcard is used.

    *)
  2. | Nest of {
    1. key : 'key;
    2. ids : Set.Int.t;
    3. child : ('leaf, 'key) nest;
    4. wildcard : ('leaf, 'key) tree option;
    }
    (*

    A Nest represents a structure such as tuple or a record.

    *)
  3. | Nil of {
    1. key : 'key;
    2. ids : Set.Int.t;
    3. child : ('leaf, 'key) tree;
    }
    (*

    A null or []. The child points to the next node in the tree.

    *)
  4. | Cons of {
    1. key : 'key;
    2. ids : Set.Int.t;
    3. child : ('leaf, 'key) tree;
    }
    (*

    A not-null value or a non-empty list. The child points to a node representing the "current" data, either a Wildcard or a Nest.

    *)
  5. | Nil_or_cons of {
    1. key : 'key;
    2. ids : Set.Int.t;
    3. nil : ('leaf, 'key) tree;
    4. cons : ('leaf, 'key) tree;
    }
    (*

    An exhaustive combination of Nil and Cons.

    *)
  6. | Wildcard of {
    1. key : 'key;
    2. ids : Set.Int.t;
    3. child : ('leaf, 'key) tree;
    }
    (*

    Wildcards simply point to the next node in the tree.

    *)
  7. | Optional of {
    1. child : ('leaf, 'key) tree;
    2. next : ('leaf, 'key) tree option;
    }
    (*

    Optionals are only used, and always used, inside of dictionary nests. They denote that the item in child does not need to be present during runtime. If following a child path fails, then follow the next path instead. next is analogous to a wildcard except that it cannot bind a value to an ID.

    *)
  8. | End of 'leaf

This is a polymorphic "nested data type." Each tree can use itself as its own type variable, i.e. (('a, 'key) tree, 'key) tree. This allows the End nodes to be fully polymorphic. They can either lead to a leaf or back to their containing tree. This type nesting corresponds to the nesting of physical patterns.

One way to think about this is that our input patterns are non-nested types, but their physical structure can be nested in many dimensions. Our trees are the inverse. We need a tree that is always structurally two-dimensional, but its type can be nested on multiple dimensions.

Nested types are simple to create, but complicated to manipulate. Functions cannot consume these types under normal polymorphism rules. We need to use explicitly polymorphic type annotations and GADTs.

and ('leaf, 'key) nest =
  1. | Int_keys of (('leaf, 'key) tree, int) tree
  2. | String_keys of (('leaf, 'key) tree, string) tree
and ('leaf, 'key) switchcase = {
  1. data : [ `Int of int | `Float of float | `String of string ];
  2. if_match : ('leaf, 'key) tree;
  3. next : ('leaf, 'key) switchcase option;
}

The switch cases work like linked lists of values to test. If an input matches a value, then we follow its associated tree. If not, we try the next case.

module Exits : sig ... end

Each "exit" is given a key which we can use to look up the AST nodes to follow after the tree. We use keys because exits can be copied when trees merge, and we don't want to duplicate entire trees.

type leaf = {
  1. names : int Map.String.t;
  2. exit : Exits.key;
}
type 'a t = {
  1. tree : (leaf, int) tree;
  2. exits : 'a Exits.t;
}

Functions.

Raises Error.Acutis_error if the cases are non-exhaustive or if there is an unused case.

val to_sexp : ('a -> Sexp.t) -> 'a t -> Sexp.t