Lexical elements: Rune literals pt 2
January 25, 2023
Let’s continue our exploration of rune literals, which began on Monday. To recap: a rune in Go represents a Unicode code point. Continuing from there…
Rune literals
A rune literal is expressed as one or more characters enclosed in single quotes, as in 'x' or '\n'. Within the quotes, any character may appear except newline and unescaped single quote. A single quoted character represents the Unicode value of the character itself, while multi-character sequences beginning with a backslash encode values in various formats.

The simplest form represents the single character within the quotes; since Go source text is Unicode characters encoded in UTF-8, multiple UTF-8-encoded bytes may represent a single integer value. For instance, the literal 'a' holds a single byte representing a literal a, Unicode U+0061, value 0x61, while 'ä' holds two bytes (0xc3 0xa4) representing a literal a-dieresis, U+00E4, value 0xe4.
There are a few things I want to call out about this section of the spec that aren’t always obvious, or are easily forgotten, especially if you’re not already very familiar with Unicode.
- A rune represents a single “character” (technically: a Unicode code point, see Monday’s discussion).
- A rune is not a single byte. (In fact, rune is an alias for int32, so it’s actually 4 bytes.)
- A rune is not necessarily a single visible character, as many visible characters are built by combining multiple code points.
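A minimal sketch of the second and third points, using only the standard library. The helper names runeSize and countRunes are made up for this illustration, and the "e" plus combining acute accent pair is my own example, not one from the spec.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// runeSize reports how many bytes a rune value occupies in memory.
func runeSize() int {
	var r rune
	return int(unsafe.Sizeof(r))
}

// countRunes reports how many Unicode code points (runes) s contains.
func countRunes(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	var r rune = 'x'
	var i int32 = r // no conversion needed: rune is an alias for int32
	fmt.Println(i, runeSize()) // 120 4

	// "e" followed by U+0301 (combining acute accent) is typically
	// rendered as one visible character, but it is two code points.
	fmt.Println(countRunes("e\u0301")) // 2
}
```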
As pointed out in the spec, both 'a' and 'ä' are valid rune literals. The first corresponds to a single byte, 0x61, in both ASCII and UTF-8. The second corresponds to two UTF-8 bytes: 0xc3, 0xa4. So it’s immediately clear that a rune literal may span multiple bytes of source text.
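To see the difference between a rune's integer value and its UTF-8 encoding, here is a small sketch built on the standard unicode/utf8 package. The describe helper is a hypothetical name of my own; it simply re-encodes the rune and formats what it finds.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe returns the code point, integer value, and UTF-8 bytes of r.
func describe(r rune) string {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return fmt.Sprintf("%U value=%#x utf8=% x", r, r, buf[:n])
}

func main() {
	fmt.Println(describe('a')) // U+0061 value=0x61 utf8=61
	fmt.Println(describe('ä')) // U+00E4 value=0xe4 utf8=c3 a4
}
```

Note that the value of 'ä' is 0xe4, a single integer, even though the literal takes two bytes in the source file.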
But recall the example from Monday as well: 'ў' (U+045E) is a valid rune literal, encoded as two bytes: 0xd1, 0x9e. In contrast, the visually identical 'ў' (U+0443 followed by U+0306) is not a valid rune literal, because it contains two Unicode code points, each of two bytes: у (0xd1, 0x83) followed by the combining breve mark, ˘ (0xcc, 0x86).
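The two forms can be compared in a short program. This sketch spells both out with \u escapes inside string literals so that there is no ambiguity about which code points are involved; listCodePoints is a helper name I made up for the example.

```go
package main

import "fmt"

// listCodePoints returns the code points of s in U+XXXX notation.
func listCodePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	precomposed := "\u045e"      // ў as a single code point
	decomposed := "\u0443\u0306" // у followed by a combining breve

	fmt.Println(len(precomposed), listCodePoints(precomposed)) // 2 [U+045E]
	fmt.Println(len(decomposed), listCodePoints(decomposed))   // 4 [U+0443 U+0306]
	fmt.Println(precomposed == decomposed)                     // false
}
```

Both strings usually render identically, yet they differ in byte length, in rune count, and under ==. Only the first fits in a rune literal.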
As you might expect, this can be an easy place to get tripped up. What you see on the screen is quite frequently not the whole story. I know of no fool-proof way to solve this confusion. The best I know is to be aware that the confusion exists, so when you see an error along the lines of “more than one character in rune literal”, you know where to begin your search.
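One way to begin that search, offered as a suggestion rather than a complete fix, is to paste the suspicious text into a string literal and print it in ASCII-only quoted form. The sketch below uses strconv.QuoteToASCII from the standard library; the reveal wrapper is a hypothetical name of my own.

```go
package main

import (
	"fmt"
	"strconv"
)

// reveal returns s as a quoted, ASCII-only Go string literal,
// so every non-ASCII code point shows up as an explicit escape.
func reveal(s string) string {
	return strconv.QuoteToASCII(s)
}

func main() {
	fmt.Println(reveal("\u045e"))       // "\u045e"
	fmt.Println(reveal("\u0443\u0306")) // "\u0443\u0306"
}
```

If the output shows more than one escape, the text holds more than one code point, and it will not fit in a single rune literal.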
Quotes from The Go Programming Language Specification, Version of June 29, 2022