Source code representation: Characters, Letters and digits

January 6, 2023

Today’s section of the Go Spec is pretty straight forward. It defines a few character classes, which are important very quickly when we start talking about names of things (variables, functions, packages, etc) in Go.

In fact, this is probably the area of the Go spec that I have referenced most frequently. Any time a question like “Is that a valid variable name?” comes up, you’ll end up referring to this section of the spec.


The following terms are used to denote specific Unicode character categories:

newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point categorized as "Letter" */ .
unicode_digit  = /* a Unicode code point categorized as "Number, decimal digit" */ .

In The Unicode Standard 8.0, Section 4.5 “General Category” defines a set of character categories. Go treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo as Unicode letters, and those in the Number category Nd as Unicode digits.

Letters and digits

The underscore character _ (U+005F) is considered a lowercase letter.

letter        = unicode_letter | "_" .
decimal_digit = "0"  "9" .
binary_digit  = "0" | "1" .
octal_digit   = "0"  "7" .
hex_digit     = "0"  "9" | "A"  "F" | "a"  "f" .

Some of these character classes are completely intuitive. decimal_digit is the digits 0 through 9. Of course. likewise binary_digit, octal_digit, and hex_digit mean exactly what you might expect.

The bit that’s perhaps the most non-intuitive is that the underscore character (_) is considered a letter for the purposes of Go source code. (And a lowercase letter at that.)

This is important, as we’ll see soon, because it means that the definition of an identifier, which includes only letters and unicode_digits, also includes the _ character, which is probably not otherwise obvious.

The Go Programming Language Specification, Version of June 29, 2022

Share this

Related Content

Empty structs

We finally we have enough knowledge for the EBNF format not to seem completely foreign, so let’s jump back and take a look at that, with the examples provided in the spec… Struct types … StructType = "struct" "{" { FieldDecl ";" } "}" . FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] . EmbeddedField = [ "*" ] TypeName [ TypeArgs ] . Tag = string_lit . // An empty struct.

Struct tags

Struct types … A field declaration may be followed by an optional string literal tag, which becomes an attribute for all the fields in the corresponding field declaration. An empty tag string is equivalent to an absent tag. The tags are made visible through a reflection interface and take part in type identity for structs but are otherwise ignored. struct { x, y float64 "" // an empty tag string is like an absent tag name string "any string is permitted as a tag" _ [4]byte "ceci n'est pas un champ de structure" } // A struct corresponding to a TimeStamp protocol buffer.

Struct method promotion

Yesterday we saw an example of struct field promotion. But methods (which we haven’t really discussed yet) can also be promoted. Struct types … Given a struct type S and a named type T, promoted methods are included in the method set of the struct as follows: If S contains an embedded field T, the method sets of S and *S both include promoted methods with receiver T. The method set of *S also includes promoted methods with receiver *T.