Lexical elements: Rune literals pt 2

January 25, 2023

Let’s continue our exploration of rune literals, which began on Monday. To summarize Monday’s post: a rune in Go represents a Unicode code point. Continuing from there…

Rune literals

A rune literal is expressed as one or more characters enclosed in single quotes, as in 'x' or '\n'. Within the quotes, any character may appear except newline and unescaped single quote. A single quoted character represents the Unicode value of the character itself, while multi-character sequences beginning with a backslash encode values in various formats.
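To make the quoting rules concrete, here is a small sketch (the `literals` variable is just an illustrative name) showing that each rune literal is simply an integer code point, whether it is written as a plain character or as a backslash escape. Note that a single quote inside a rune literal has to be escaped.

```go
package main

import "fmt"

// Rune literals are untyped constants; stored in a []rune they become int32 values.
var literals = []rune{
	'x',  // a single character: U+0078
	'\n', // a backslash escape for newline: U+000A
	'\'', // an unescaped single quote is not allowed, so it must be escaped: U+0027
}

func main() {
	for _, r := range literals {
		// %T prints int32, because rune is only another name for int32.
		fmt.Printf("%T %d %U\n", r, r, r)
	}
	// Output:
	// int32 120 U+0078
	// int32 10 U+000A
	// int32 39 U+0027
}
```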

The simplest form represents the single character within the quotes; since Go source text is Unicode characters encoded in UTF-8, multiple UTF-8-encoded bytes may represent a single integer value. For instance, the literal 'a' holds a single byte representing a literal a, Unicode U+0061, value 0x61, while 'ä' holds two bytes (0xc3 0xa4) representing a literal a-dieresis, U+00E4, value 0xe4.
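The difference between a rune’s integer value and the bytes used to encode it is easy to see by converting the rune back to a string and inspecting its bytes. This is a minimal sketch; the `encode` helper is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// encode returns the UTF-8 bytes that represent r, as they would appear in Go source text.
func encode(r rune) []byte {
	return []byte(string(r))
}

func main() {
	for _, r := range []rune{'a', 'ä'} {
		// The rune's value and its UTF-8 encoding are two different things.
		fmt.Printf("%q: value %#x, %U, UTF-8 bytes % x\n", r, r, r, encode(r))
	}
	// Output:
	// 'a': value 0x61, U+0061, UTF-8 bytes 61
	// 'ä': value 0xe4, U+00E4, UTF-8 bytes c3 a4
}
```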

There are a few things I want to call out about this section of the spec that aren’t always obvious, or are easily forgotten, especially if you’re not already very familiar with Unicode.

  • A rune represents a single “character” (technically, a Unicode code point; see Monday’s discussion).
  • A rune is not a single byte. (In fact, rune is an alias for int32, so it actually occupies 4 bytes.)
  • A rune is not necessarily a single visible character, as many visible characters are built by combining multiple code points.
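The second point can be checked directly. The sketch below (with a hypothetical `runeSize` helper) shows that a rune can be assigned to an int32 without conversion, because they are the same type, and that it occupies four bytes in memory regardless of which character it holds.

```go
package main

import (
	"fmt"
	"unsafe"
)

// runeSize reports how many bytes a rune value occupies in memory.
func runeSize() uintptr {
	return unsafe.Sizeof(rune(0))
}

func main() {
	var r rune = 'a'
	var i int32 = r // no conversion needed: rune and int32 are the same type

	fmt.Println(runeSize()) // 4, even for an ASCII character like 'a'
	fmt.Println(i == 'a')   // true
}
```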

As pointed out in the spec, both 'a' and 'ä' are valid rune literals. The first is encoded as a single byte, 0x61, in both ASCII and UTF-8. The second is encoded as two UTF-8 bytes: 0xc3, 0xa4. So it’s immediately clear that a single rune may take multiple bytes to encode.

But recall the example from Monday as well: 'ў' is a valid rune literal, a single code point (U+045E) encoded as two bytes: 0xd1, 0x9e. In contrast, the visually identical 'ў' is not a valid rune literal, because it contains two Unicode code points of two bytes each: у (U+0443; 0xd1, 0x83) followed by the combining breve (U+0306; 0xcc, 0x86).

As you might expect, this is an easy place to get tripped up. What you see on the screen is frequently not the whole story. I know of no foolproof way to avoid this confusion. The best I know is to be aware that it exists, so that when you see an error along the lines of “more than one character in rune literal”, you know where to begin your search.
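When that error does appear, one way to begin the search is to dump the code points of the offending text. The helpers below, `describe` and `fitsInRuneLiteral`, are hypothetical names for illustration. The `%+q` verb prints each rune with ASCII-only escapes, so an invisible combining mark shows up as something like `'\u0306'` rather than disappearing onto the previous character.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// describe lists each code point in s with its byte offset,
// so hidden combining marks become visible.
func describe(s string) string {
	var b strings.Builder
	for i, r := range s {
		fmt.Fprintf(&b, "byte offset %d: %U %+q\n", i, r, r)
	}
	return b.String()
}

// fitsInRuneLiteral reports whether s is exactly one code point,
// i.e. whether it could appear alone between single quotes.
func fitsInRuneLiteral(s string) bool {
	return utf8.RuneCountInString(s) == 1
}

func main() {
	suspect := "\u0443\u0306" // looks like ў, but is у + combining breve
	fmt.Println(fitsInRuneLiteral(suspect))
	fmt.Print(describe(suspect))
	// Output:
	// false
	// byte offset 0: U+0443 '\u0443'
	// byte offset 2: U+0306 '\u0306'
}
```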

Quotes from The Go Programming Language Specification, Version of June 29, 2022



