Lexical elements: Rune literals pt 3

January 26, 2023

Let’s continue our disection of rune literals. If you missed the parts, check them out from Monday when we discussed Unicode, and yesterday when we discussed quoting single characters.

Today we’re looking at the various escape sequences supported by the rune literal syntax.

Rune literals

Several backslash escapes allow arbitrary values to be encoded as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.

Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. Hexadecimal escapes satisfy this condition by construction. The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.

Let’s take these one at a time.

  • A single octal byte\OOO

    You’ll probably never use this, so let’s get it out of the way first. But it’s allowed. You can specify a byte using octal notation. However, note that as described, this is limited to values 0-255 inclusive, which means you can create an invalid rune representation this way:

    var x = rune('\400') // # 400 octal == 256 decimal
    

    Produces the following error:

    octal escape value 256 > 255
    
  • One, two, or four hexidecimal bytes\xXX, \uXXXX, \UXXXXXXXX

    This allows you to a single byte with two hexidecimal digits, (\xXX), two bytes with four digits (\uXXXX), or the full 4 bytes of a rune with eight hexidecimal digits (\UXXXXXXXX).

And finally, there are some special escape sequences supported for rune literals:

After a backslash, certain single-character escapes represent special values:

\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within rune literals)
\"   U+0022 double quote  (valid escape only within string literals)

(That last one arguably doesn’t belong here, as it’s not valid in a rune literal, but it’s nice to know that it’s explicitly excluded here.)

Let’s round out today’s email with the rest of the rune literal section, which is just the boring EBNF syntax, and some examples, which we don’t need to discuss in any detail.

An unrecognized character following a backslash in a rune literal is illegal.

rune_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
'a'
'ä'
'本'
'\t'
'\000'
'\007'
'\377'
'\x07'
'\xff'
'\u12e4'
'\U00101234'
'\''         // rune literal containing single quote character
'aa'         // illegal: too many characters
'\k'         // illegal: k is not recognized after a backslash
'\xa'        // illegal: too few hexadecimal digits
'\0'         // illegal: too few octal digits
'\400'       // illegal: octal value over 255
'\uDFFF'     // illegal: surrogate half
'\U00110000' // illegal: invalid Unicode code point

Quotes from The Go Programming Language Specification, Version of June 29, 2022


Share this

Direct to your inbox, daily. I respect your privacy .

Unsure? Browse the archive .

Related Content


Lexical elements: Rune literals pt 2

Let’s continue our exploration of rune literals, which began on Monday. In summary from Monday, a rune in Go represents a Unicode code point. Continuing from there… Rune literals A rune literal is expressed as one or more characters enclosed in single quotes, as in 'x' or '\n'. Within the quotes, any character may appear except newline and unescaped single quote. A single quoted character represents the Unicode value of the character itself, while multi-character sequences beginning with a backslash encode values in various formats.


Lexical elements: Rune literals pt 1, Intro to Unicode

Runes… Oh boy! This is one of bits of Go that shines for its elegant simplicity, but constantly trips up everyone (myself included). As such, I think this may be a 2-, or maybe even a 3-parter. Let’s get started. Rune literals A rune literal represents a rune constant, an integer value identifying a Unicode code point. If you’re already familiar with Unicode, and have a strong understanding of what a “code point” is, you can probably skip this one.


Taking the address of literals

Composite literals … Taking the address of a composite literal generates a pointer to a unique variable initialized with the literal’s value. var pointer *Point3D = &Point3D{y: 1000} This particular feature of the language is immensely useful. Without it, the above code snippet would become much longer: var point Point3D = Point3D{y: 1000} var pointer *Point3D = &point In fact, this is the exact situation we are in for non-composites.

Get daily content like this in your inbox!

Subscribe