Iteration over strings

June 18, 2024

For statements with range clause

  1. For a string value, the “range” clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
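The last sentence of that rule, about invalid UTF-8, is easy to skim past. Here is a minimal sketch of what it looks like in practice. The `step` type and `rangeSteps` helper are names made up for this illustration; they are not from the spec.

```go
package main

import "fmt"

// step holds one (byte index, rune) pair produced by a range loop.
// Illustrative type, not part of any library.
type step struct {
	Index int
	Rune  rune
}

// rangeSteps collects every pair that ranging over s yields.
func rangeSteps(s string) []step {
	var steps []step
	for i, r := range s {
		steps = append(steps, step{Index: i, Rune: r})
	}
	return steps
}

func main() {
	// "\xff" is a single byte that can never appear in valid UTF-8.
	for _, st := range rangeSteps("a\xffb") {
		fmt.Printf("%d %U\n", st.Index, st.Rune)
	}
}
```

This prints `0 U+0061`, `1 U+FFFD`, and `2 U+0062`: the invalid byte is reported as the replacement character, and the loop advances by exactly one byte before continuing.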

This might seem pretty straightforward, but there are some interesting subtleties that arise from the fact that ranging over a string steps by Unicode code point, not by byte. Consider the difference between these two snippets:

for a, b := range "Hello, 世界" {
	fmt.Println(a, b)
}

which produces the following output:

0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 19990
10 30028

and this variant:

for a, b := range []byte("Hello, 世界") {
	fmt.Println(a, b)
}

which produces this different output:

0 72
1 101
2 108
3 108
4 111
5 44
6 32
7 228
8 184
9 150
10 231
11 149
12 140

In the latter case, we step over the string byte by byte. In the former, because the string Hello, 世界 contains multi-byte Unicode code points, ranging over it as a string steps code point by code point, which is why the index jumps from 7 to 10.

This is a very important distinction to keep in mind. Sometimes you’ll want to range over a string byte-by-byte, in which case you first want to convert it to a byte slice, as we did in the second code example. Other times, particularly when processing text, you’ll want the default behavior of ranging over a string.
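If you only need the bytes, indexing the string directly is another option, since `s[i]` yields a byte. The sketch below also compares the byte length with the code point count, using `utf8.RuneCountInString` from the standard library. The helper name `byteValues` is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteValues visits s byte by byte with plain indexing,
// without first converting it to a []byte.
func byteValues(s string) []byte {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		out = append(out, s[i]) // s[i] is a byte, not a rune
	}
	return out
}

func main() {
	s := "Hello, 世界"
	fmt.Println(len(s))                    // length in bytes
	fmt.Println(utf8.RuneCountInString(s)) // length in code points
	fmt.Println(byteValues(s)[7:])         // the bytes of 世界
}
```

For this string, the byte length is 13 and the code point count is 9, and the final line prints `[228 184 150 231 149 140]`, matching the tail of the byte-by-byte output above.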

This may have you wondering about a related question: How do you access the nth Unicode code point, or rune, of a string?

There’s no shorthand for doing this in Go, as there is for accessing the nth byte. But you can accomplish it with a loop, as we’ve just seen. Let’s wrap such a loop in a convenience function for demonstration purposes:

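// runeAt returns the nth rune (counting from 0) in str, or 0 if str
// contains fewer than n+1 runes.
func runeAt(str string, n int) rune {
	// Quick reject: a string never has more runes than bytes.
	if n < 0 || n >= len(str) {
		return 0
	}
	count := 0 // runes seen so far; the range index is a byte offset, so we count separately
	for _, r := range str {
		if count == n {
			return r
		}
		count++
	}
	return 0
}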

See it in the playground
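Another way to reach the nth code point is to convert the string to a `[]rune` and index into that. This is a sketch of the trade-off, not a recommendation: the conversion decodes the whole string up front, so it does more work than a loop that can stop early. The name `runeAtSlice` is made up for this example.

```go
package main

import "fmt"

// runeAtSlice returns the nth rune (counting from 0) in str,
// or 0 if n is out of range. Hypothetical helper for illustration.
func runeAtSlice(str string, n int) rune {
	runes := []rune(str) // decodes every code point in str
	if n < 0 || n >= len(runes) {
		return 0
	}
	return runes[n]
}

func main() {
	fmt.Println(string(runeAtSlice("Hello, 世界", 7))) // 世
	fmt.Println(runeAtSlice("Hello, 世界", 9))         // 0, since the string has only 9 runes
}
```

Note that rune index 7 is 世 here, which also happens to start at byte index 7 because everything before it is single-byte ASCII. Rune index 8 is 界, which starts at byte index 10.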

Quotes from The Go Programming Language Specification, language version go1.22 (Feb 6, 2024)

