Source code representation

January 5, 2023

Source code representation

Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the unqualified term character to refer to a Unicode code point in the source text.

Each code point is distinct; for instance, uppercase and lowercase letters are different characters.

Implementation restriction: For compatibility with other tools, a compiler may disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code point in the source text. A byte order mark may be disallowed anywhere else in the source.

I suppose it should be no surprise that Go source code is explicitly written in UTF-8, since Ken Thompson and Rob Pike co-created both Go and UTF-8!

What this means is that it’s possible to unambiguously write any Unicode text into your Go source code. Having dealt with ambiguous encodings and special runtime modules in other languages, I find this to be quite a nice thing!

You can see this demonstrated at the Go Playground, whose default Hello-World program includes some Chinese:

func main() {
	fmt.Println("Hello, 世界")
}

But it works just as well with any other Unicode characters.

This does mean there’s room for some confusion, as there are many Unicode characters that (in)famously look alike.
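Here is a small sketch of two ways look-alikes can bite. It uses escape sequences so the difference is visible on the page; the helper `codePoints` is a name I made up for this example. The first pair shows the "not canonicalized" rule from the spec, and the second shows a Latin letter next to its Cyrillic twin.

```go
package main

import "fmt"

// codePoints is a hypothetical helper for this example. It formats the
// code points of s in U+XXXX notation, e.g. "[U+0065 U+0301]".
func codePoints(s string) string {
	return fmt.Sprintf("%U", []rune(s))
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combining := "e\u0301"  // e followed by a combining acute accent
	latin := "a"            // U+0061 LATIN SMALL LETTER A
	cyrillic := "\u0430"    // U+0430 CYRILLIC SMALL LETTER A

	// Both pairs usually render identically, yet neither pair compares equal.
	fmt.Println(precomposed == combining, codePoints(precomposed), codePoints(combining))
	fmt.Println(latin == cyrillic, codePoints(latin), codePoints(cyrillic))
}
```

If you need to treat such strings as equal, you have to normalize or map them yourself; the compiler won't do it for you.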

The Go Programming Language Specification, Version of June 29, 2022

