Type unification

A couple of days ago we saw the spec reference the concept of “type unification”. Today we start through that explanation….

Type unification

Type inference solves type equations through type unification. Type unification recursively compares the LHS and RHS types of an equation, where either or both types may be or contain bound type parameters, and looks for type arguments for those type parameters such that the LHS and RHS match (become identical or assignment-compatible, depending on context).

Let be break up this long paragraph with a quick example.

So we have a type equation P ➞ A… let’s imagine that P, the type parameter, is defined as [P ~int|~float64]. Then A represents an argument of int(3). Type unification is the process of comparing the left-hand side (LHS; ~int|~float) and the right-hand side (RHS; int(3)) to look for an identical match. In this case, ~int and int(3) match, to the equation is solved, and the type parameter is int.

If the RHS represents an untyped constant, example, let’s say simply the literal 3, then it’s not an identical match, but an assignment-compatible match, because the untyped constnat 3 is assignable to int (and to float64, for that matter, but it would default to int in this case).

To that effect, type inference maintains a map of bound type parameters to inferred type arguments; this map is consulted and updated during type unification. Initially, the bound type parameters are known but the map is empty. During type unification, if a new type argument A is inferred, the respective mapping P ➞ A from type parameter to argument is added to the map. Conversely, when comparing types, a known type argument (a type argument for which a map entry already exists) takes the place of its corresponding type parameter. As type inference progresses, the map is populated more and more until all equations have been considered, or until unification fails. Type inference succeeds if no unification step fails and the map has an entry for each type parameter.

In other words, an internal mapping is kept of all relevant P ➞ A values in the equation set. This is important because there may be multiple places where the same P ➞ A mapping are relevant, and once it’s solved for one of the places, that solution can be applied to the others (or trigger a type inference failure, if they conflict). Consider:

func sum[S ~int|~float64](a, b S) S { return a+b }
func product[P ~int|~float64](a, b P) P { return a*b }

x := product(sum(1, 2), sum(3, 4))

In this case we have three generic function calls in play: Two to sum(), and one to product(). Our complete set of type parameter equations looks something like this:

P₁ ➞ sum(1, 2)
P₂ ➞ sum(3, 4)
S₁ ➞ 1
S₂ ➞ 2
S₃ ➞ 3
S₄ ➞ 4

And our initial (empty) mapping of type parameters to inferred types something like:

P ➞ ?
S ➞ ?

Type unification then goes through the list of equations, and looks for one it can solve. If we go in the order presented above, the first one that can be solved is S₁, because we know the default type of the untyped constnat 1 is int, and that is assignable to S’s type parameter of ~int|~float. So the map of inferred types looks like:

P ➞ ?
S ➞ int

If we apply this newly inferred int type to all the other instances of S in the list of equations above, we see that there are no conflicts. And we are also able to continue to solve the P₁ and P₂ equations, because the RHS of the equations is now known also to be int.

On the other hand, if our function call looked like this:

x := product(sum(1, float54(2)), sum(3, 4))

we’d be in a different situation. We’d solve the S₁ equation the same way, but when we tried to apply the solution of int to S₂, we’d see that int and float64 are not assignment-compatible, even though both are assignable to ~int|~float64, and type inference would fail.

The spec offers its own example, as well:

For example, given the type equation with the bound type parameter P
  [10]struct{ elem P, list []P } ≡_A [10]struct{ elem string; list []string }
type inference starts with an empty map. Unification first compares the top-level structure of the LHS and RHS types. Both are arrays of the same length; they unify if the element types unify. Both element types are structs; they unify if they have the same number of fields with the same names and if the field types unify. The type argument for P is not known yet (there is no map entry), so unifying P with string adds the mapping P ➞ string to the map. Unifying the types of the list field requires unifying []P and []string and thus P and string. Since the type argument for P is known at this point (there is a map entry for P), its type argument string takes the place of P. And since string is identical to string, this unification step succeeds as well. Unification of the LHS and RHS of the equation is now finished. Type inference succeeds because there is only one type equation, no unification step failed, and the map is fully populated.

Quotes from The Go Programming Language Specification Version of August 2, 2023

Type unification

January 15, 2024