Lexical elements: Rune literals pt 3

January 26, 2023

Let’s continue our disection of rune literals. If you missed the parts, check them out from Monday when we discussed Unicode, and yesterday when we discussed quoting single characters.

Today we’re looking at the various escape sequences supported by the rune literal syntax.

Rune literals

Several backslash escapes allow arbitrary values to be encoded as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.

Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. Hexadecimal escapes satisfy this condition by construction. The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.

Let’s take these one at a time.

  • A single octal byte\OOO

    You’ll probably never use this, so let’s get it out of the way first. But it’s allowed. You can specify a byte using octal notation. However, note that as described, this is limited to values 0-255 inclusive, which means you can create an invalid rune representation this way:

    var x = rune('\400') // # 400 octal == 256 decimal

    Produces the following error:

    octal escape value 256 > 255
  • One, two, or four hexidecimal bytes\xXX, \uXXXX, \UXXXXXXXX

    This allows you to a single byte with two hexidecimal digits, (\xXX), two bytes with four digits (\uXXXX), or the full 4 bytes of a rune with eight hexidecimal digits (\UXXXXXXXX).

And finally, there are some special escape sequences supported for rune literals:

After a backslash, certain single-character escapes represent special values:

\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within rune literals)
\"   U+0022 double quote  (valid escape only within string literals)

(That last one arguably doesn’t belong here, as it’s not valid in a rune literal, but it’s nice to know that it’s explicitly excluded here.)

Let’s round out today’s email with the rest of the rune literal section, which is just the boring EBNF syntax, and some examples, which we don’t need to discuss in any detail.

An unrecognized character following a backslash in a rune literal is illegal.

rune_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
'\''         // rune literal containing single quote character
'aa'         // illegal: too many characters
'\k'         // illegal: k is not recognized after a backslash
'\xa'        // illegal: too few hexadecimal digits
'\0'         // illegal: too few octal digits
'\400'       // illegal: octal value over 255
'\uDFFF'     // illegal: surrogate half
'\U00110000' // illegal: invalid Unicode code point

Quotes from The Go Programming Language Specification, Version of June 29, 2022

Share this

Related Content

Empty structs

We finally we have enough knowledge for the EBNF format not to seem completely foreign, so let’s jump back and take a look at that, with the examples provided in the spec… Struct types … StructType = "struct" "{" { FieldDecl ";" } "}" . FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] . EmbeddedField = [ "*" ] TypeName [ TypeArgs ] . Tag = string_lit . // An empty struct.

Struct tags

Struct types … A field declaration may be followed by an optional string literal tag, which becomes an attribute for all the fields in the corresponding field declaration. An empty tag string is equivalent to an absent tag. The tags are made visible through a reflection interface and take part in type identity for structs but are otherwise ignored. struct { x, y float64 "" // an empty tag string is like an absent tag name string "any string is permitted as a tag" _ [4]byte "ceci n'est pas un champ de structure" } // A struct corresponding to a TimeStamp protocol buffer.

Struct method promotion

Yesterday we saw an example of struct field promotion. But methods (which we haven’t really discussed yet) can also be promoted. Struct types … Given a struct type S and a named type T, promoted methods are included in the method set of the struct as follows: If S contains an embedded field T, the method sets of S and *S both include promoted methods with receiver T. The method set of *S also includes promoted methods with receiver *T.