The Details about the Unification Algorithm implemented in Javascript
For details about the Unification Algorithm itself, please scroll down to the chapter "The Unification Algorithm".Why Javascript?
Often, when I talk with people about my implementation of the Unification Algorithm and I tell them that I use Javascript as the programming language to implement an example, I get a rather sceptic reaction. This doesn't surprise me. Javascript is a very misunderstood programming language.
Some people know Javascript from flashy and irritating web sites. Other people even think that Javascript is like Java. Of course the similarity of the names doesn't help. Javascript is very different from Java.
What is Javascript? What does it offer?
- An interpreted language (scripting language)
- Syntax is similar to C or Java
- Functional (Values can be functions and higher order functions are possible)
- Functions are closures (Some local variables are kept between invocations)
- Lambdas (anonymous functions) are possible
- Nested functions are possible
- Eager evaluation
- Object based (objects are function closures)
- Objects/Function closures are like dynamic records
- Prototype-based Object Orientation
- Dynamic typing
- Weak or no subtyping
- Very dynamic
- Sometimes a bit confusing
- Supports Perl-like Regular Expressions
Some links about Javascript
- Wikipedia Javascript
- The World's Most Misunderstood Programming Language
- Classical Inheritance in Javascript
- Private Static Members in Javascript
- jslint: The Javascript Verifier
- Lambda calculus to Javascript source-to-source filter
- Curried JavaScript functions
- JScript is a Functional Language
The Unification Algorithm
This text is a summary from the exercise text describing the Unification Algorithm, partially expressed differently in a form which I understand better.
The unification algorithm is used in Prolog to infer queries from a knowledge base using substitutions or for type inference in programming languages. This requires finding substitutes that make different logical expressions look identical. This process is called unification and is a key component of all first-order inference algorithms. The Unify algorithm takes two rules or sentences and returns a unifier for them if one exists:
Unify(p, q) = θ where Subst(θ, p) = Subst(θ, q).
An unifier is a set of n substitutions { var_1/val_1, ..., var_n/val_n } where each variable var_i in an expression is to be replaced by the value val_i. For example the substitution x/Jane applied to the expression Knows(John, x) yields the expression Knows(John, Jane). The unifier has the property that the two rules p and q become equal if all substitutions in the set are applied to them.
For some expressions there is more than one unifier. The different unifiers have different results of the substitutions. We look for the most general unifier, which yields the "most general" equal form of the substituted rules. I think the most general form is the form which keeps as many variables "open" as possible. For example, the same variable appearing in both rules at the same place is left as is.
The algorithm consists of the functions Unify and Unify-Var. An occur check in Unify-Var is needed: if a variable is contained in an expression, the unification must fail, because no consistent unifier can be constructed in this case.
In the algorithm the expressions p and q are both recursively explored. If one of the expressions is a variable, proceed to the function Unify-Var, which finds and adds a substitution for the variable. Compound rules and rule lists are broken up and unified for each element separately. If p and q are equal, however, just return the substitution accumulated so far.
Failures in unification come from rule or argument lists of inequal lengths, or from the occur check, or from two different constants.
The Rule object explained more in detail
For the Unification Algorithm I defined some "helper classes". One class is "Rule". A Rule object is both a single rule and a rule list (because the Unification Algorithm accepts both single rules and rule lists).
Some definitions
- Leaf rule
- A rule without arguments. A variable or a constant. Examples: John or x.
- Constant
- A leaf rule which is never to be substituted.
- Variable
- A leaf rule which can be substituted.
- Compound rule
- A rule with arguments. Example: Knows(x, John).
- Rule name
- The name of a compound rule, constant or variable. Rule lists don't have a name. Variables have a name of only one letter.
- Rule list
- Zero or more rules. Separated by comma. Arguments of a compound rule are extracted as a rule list by Rule.args().
- Nil rule
- An empty rule list. Rule.head() and Rule.tail() don't work for nil rules (for leaf rules neither)!
- Variable names for compound rules
- Not only leafs can be variables. The names of compound rules can be variables, too. For example f(Jane, John) can be substituted to Knows(Jane, John) by {f/Knows}.
Here some part of the implementation in Javascript (with comments to explain some specialities in Javascript) is shown below. I didn't include everything to keep the example short.
// Objects are function closures. Function arguments are never // typed explicitly in Javascript. function Rule(op, list) { // 'this' is the reference to the closure. // Return the length of the rule/argument list. this.length = function() { return list.length; } // Return the first element of the argument list. this.head = function() { return list[0]; } // Return the rest rule list without the first argument. // slice() is an array method to return a sub-array. // new Rule() is needed because a Rule object must be returned. this.tail = function() { return new Rule("", list.slice(1, list.length)); } // add, join, op, args etc. are omitted. // Define an empty array as default for the parameter 'list' if (!list) list = new Array(); // Is the parameter 'list' a Rule list, extract the array instead if (is_list(list)) list = list.list; // Define an empty name as default for the rule name if (!op) op = ""; // Set the public members name and list this.name = op; this.list = list; } // Rule.prototype is the prototype object of Rule. Functions and // members defined in the prototype are available to all Rule instances. // Pretty-print a rule (using [] to delimit lists if show_list is true) Rule.prototype.dump = function(show_list) { ... } // Non-method functions which work on Rule instances // Non-method, because is_leaf(rule) and others returns false reliably // if rule is undefined, but rule.is_leaf() throws an exception. // Is 'rule' a Rule object? // First test rule, then rule.constructor, then compare it with the // constructor function Rule(). Each Javascript object instance has // the special member 'constructor' which points to the constructor // function. Very useful to find the "type" of an object instance. function is_rule(rule) { return rule && rule.constructor && rule.constructor == Rule; } // A Rule is leaf when it has a name and an empty argument list // rule.length() is the method defined inside the Rule object. // Empty strings evaluate to false. function is_leaf(rule) { return is_rule(rule) && rule.length() > 0 && rule.name; }
To create a rule Knows(John,Jane), one invokes:
var rule = new Rule("Knows", [new Rule("John"), new Rule("Jane")]);
To create a rule list John, Jane, Knows(John,Jane), one invokes:
var list = new Rule("", [ new Rule("John"), new Rule("Jane"), new Rule("Knows", [new Rule("John"), new Rule("Jane")]) ]);
We see that it is easier to use a parser. I wrote a parser. With the parser the rule list is defined more compactly:
var list = parse("John, Jane, Knows(John, Jane)");
The parser is a recursive-descent parser and uses regular expressions to scan the text. It is very compact (25 lines of code). The inner function parse(leaf) consumes the string variable (a closure member of the outer function parse(string)) and terminates on an empty string or too many closing parentheses. In a preprocessing phase first white-space is found with the regular expression /\s/g and removed, and then bad characters and illegal separator strings are catched with the regular expression /.{0,5}([^A-Za-z(),]|[,(][,()]).{0,15}/. The leading five and trailing fifteen characters are used for the error message to show the context. The parser is not perfect (its syntax is a bit too lax).
The implementation of Unify in Javascript
The Unification Algorithm itself renders in Javascript in a compact and rather easily understood form. I omitted logging and internal error handling.
function unify(x, y, subst) { if (!subst) subst = new Substitutions(); if (subst == failure) return failure; if (is_leaf_equal(x, y)) return subst; if (is_nil(x) && is_nil(y)) return subst; if (is_variable(x)) return unify_var(x, y, subst); if (is_variable(y)) return unify_var(y, x, subst); // Split up compound rules or rule lists and unify each part separately // For lists, check additionally that they aren't empty (nil) if (is_compound(x) && is_compound(y)) return unify(x.args(), y.args(), unify(x.op(), y.op(), subst)); if (is_list(x) && is_list(y) && !(is_nil(x) || is_nil(y))) return unify(x.tail(), y.tail(), unify(x.head(), y.head(), subst)); return failure; } function unify_var(variable, expression, subst) { var vs = variable.dump(); var es = expression.dump(); if (subst.subst[vs]) return unify(subst.subst[vs], expression, subst); if (subst.subst[es]) return unify(variable, subst.subst[es], subst); if (occurs(variable, expression)) return failure; // Substitution vs/expression found and add it subst.subst[vs] = expression; return subst; }