type-inference-engine

Language of expressions

The inference engine infers types for expressions in the following language:

   expression = literal | variable | funcall | let
   funcall    = "(" + function + expression* + ")"
   variable   = symbol
   function   = symbol
   let        = (let (binding+) expression)
   binding    = (variable expression)
   

Here | means non-ambiguous choice operator, i.e. if an expression is a literal, it cannot be a variable of a function call. To determine what is a literal and what is not, a sequence of tests is performed on each term before it is classified as either a variable or a function call. For more information look at parse-expr function in the API section. For example, both (+ (sin x)(- y z)) and (foo x y (- x y)) are expressions, but ((x y)(y x)), ((foo x)) and (foo x) y are not (the latter is actualy two correct expressions).

An example provided with this engine in the system type-inference-engine/example has the following defined literals:

  • Integers are all literals recognized by Common Lisp as such e.g. 1, 2, 3 etc.
  • Floats are all literals recognized by Common Lisp as such e.g. 1.0, 2.32, 3.2f5, 3.43d0 etc.
  • T(as a boolean) and NIL(as a boolean, a list or NULL).
  • Cons cells, e.g. (3 . 5), (nil . t). Lists are also recognized as conses.
  • Vectors, e.g. #(1 2 3).
  • Multidimensional arrays, e.g. #2a((1)(2)(3)).

It also contains the following known functions:

  • 1+, 1-: Take a numeric argument and return it incremented or decremented by 1 (as in Common Lisp).
  • sin, cos: Take a numeric argument and return its sine or cosine (as in Common Lisp).
  • length: Take a sequence (read the next section if you do not know what it is) and return its length as an integer (as in Common Lisp).
  • +, -, *, /: Arithmetic functions as in Common Lisp, but these functions are two-args.
  • floor, ceiling: Take a numeric argument and round it (return an integer).
  • elt: Take a sequence and an index (integer) and return an element from that sequence.
  • cons: Constructor for cons cells.
  • numberp: Take a value and return either T(if the value is a number) or NIL(otherwise).
  • zerop: Take a number and return either T or NIL.
  • =: Take two numbers and return either T or NIL.
  • if: Take a boolean and any two values x and y. Return the x if the boolean is T and y otherwise. Note, that this is an ordinary function, so it evaluates both x and y before giving a result.

Unlike Common Lisp, all functions in this language return exactly one value. Note that these functions are not mandatory for the engine and can be replaced with your own set. Also note that these functions are not evaluated. E.g. inferring types for an expression (sin x) does not actually evaluate x or (sin x). You do not have to bind x with anything. As the last important notice: there is no generalized booleans in the example language (but you can redefine this behaviour): the first argument of ifmust be either T or NIL.

An expression is internally represented as a directed graph of parallel assignment statements in the following form: \((x_1, x_2, \dots, x_m) \leftarrow (f_1 (y_{11}, y_{12}, \dots, y_{1n_1}), \dots, f_m (y_{m1}, y_{m2}, \dots, y_{mn_m})) \) That means that a statement contains \(m\) assignments. When a statement is executed \(m\) functions from \(f_1\) to \(f_m\) are evaluated with values in corresponding variables \(y_{ij}\) as arguments and their results are assigned to (all distinct) variables \(x_i\). For example an expression (sin (1+ y)) is transformed in something like following:

   1: (y, v1, v2) ← initialize variables with some safe values
   2: v1 ← y + 1
   3: v2 ← sin v1
   

with edges \(1 \rightarrow 2\), \(2 \rightarrow 3\), \(3 \rightarrow 1\). Note, that the last statement is always connected to the first. This knowleadge about internal representation will be useful later when we will read a section about type inference.