Floating-point operators

February 8, 2024

For floating-point and complex numbers, +x is the same as x, while -x is the negation of x. The result of a floating-point or complex division by zero is not specified beyond the IEEE-754 standard; whether a run-time panic occurs is implementation-specific.

I find this to be quite interesting. An implementation may choose to panic, or not, if you attempt to divide a floating-point or complex number by zero. And indeed, the standard implementation does not panic, as you can see by running the following code in the Playground or on your own machine:

package main

import "fmt"

func main() {
	var x float64 = 1.23
	fmt.Println(x / 0)
}

Outputs:

+Inf
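To round out the picture, here is a minimal sketch of the other IEEE-754 results you can get from dividing by zero. The `divide` helper is just a name made up for this illustration, and the outputs in the comments assume the standard Go toolchain; another implementation is allowed to behave differently.

```go
package main

import "fmt"

// divide is a hypothetical helper that performs ordinary float64 division.
func divide(a, b float64) float64 {
	return a / b
}

func main() {
	fmt.Println(divide(1.23, 0))  // +Inf with the standard toolchain
	fmt.Println(divide(-1.23, 0)) // -Inf with the standard toolchain
	fmt.Println(divide(0, 0))     // NaN with the standard toolchain

	// NaN is the only value that is not equal to itself.
	n := divide(0, 0)
	fmt.Println(n != n) // true
}
```

If your code needs to detect these cases, `math.IsInf` and `math.IsNaN` from the standard library are the usual tools.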

But wait, that’s not all… there are other implementation-dependent details!

An implementation may combine multiple floating-point operations into a single fused operation, possibly across statements, and produce a result that differs from the value obtained by executing and rounding the instructions individually.

So, as a simple example, consider code that looks like this:

x := float32(0.0000000001)
y := x
x = x * 12.3456789
x = x / 12.3456789
fmt.Println(y, x) // 1e-10 9.9999994e-11

An implementation is free to “fuse” the multiplication and the subsequent division, effectively eliminating both, for a more accurate result.
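If you would rather have each step rounded to `float32` no matter what the implementation could do, the spec rule quoted just below is the lever: an explicit conversion pins the rounding. Here is a minimal sketch of that choice. The `roundTrip32` helper is a made-up name, and I'm deliberately not promising a specific output, since the last digits are exactly the part that rounding decides.

```go
package main

import "fmt"

// roundTrip32 is a hypothetical helper: it multiplies and then divides by
// the same factor. The explicit float32 conversions round after each step,
// so the two operations cannot be fused into one.
func roundTrip32(x float32) float32 {
	const factor = 12.3456789
	p := float32(x * factor)
	return float32(p / factor)
}

func main() {
	y := float32(0.0000000001)
	x := roundTrip32(y)
	// Prints the original value, the round-tripped value,
	// and whether they are bit-for-bit equal.
	fmt.Println(y, x, x == y)
}
```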

… An explicit floating-point type conversion rounds to the precision of the target type, preventing fusion that would discard that rounding.

For instance, some architectures provide a “fused multiply and add” (FMA) instruction that computes x*y + z without rounding the intermediate result x*y. These examples show when a Go implementation can use that instruction:

// FMA allowed for computing r, because x*y is not explicitly rounded:
r  = x*y + z
r  = z;   r += x*y
t  = x*y; r = t + z
*p = x*y; r = *p + z
r  = x*y + float64(z)

// FMA disallowed for computing r, because it would omit rounding of x*y:
r  = float64(x*y) + z
r  = z; r += float64(x*y)
t  = float64(x*y); r = t + z

So if you want your implementation/platform to (possibly) fuse floating-point operations, avoid explicit conversions to floating-point types on intermediate results, which make that impossible. In reality: if in doubt, don’t worry about this level of micro-optimization.
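To see what that intermediate rounding actually costs, here is a minimal sketch that compares the two forms directly. It assumes Go 1.14 or later, where `math.FMA` is available to request a fused multiply-add explicitly. The helper names (`operands`, `unfused`, `fused`) are made up for this illustration, and the inputs are hand-picked so that the exact value of x*x + z is 2^-60.

```go
package main

import (
	"fmt"
	"math"
)

// operands returns x and z chosen so that the exact value of x*x + z
// is 2^-60, too small to survive rounding x*x to float64 first.
func operands() (x, z float64) {
	x = 1 + math.Ldexp(1, -30)    // 1 + 2^-30
	z = -(1 + math.Ldexp(1, -29)) // -(1 + 2^-29)
	return x, z
}

// unfused rounds the product to float64 before adding. The explicit
// conversion is what rules out fusion.
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

// fused computes x*y + z with a single rounding via math.FMA.
func fused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

func main() {
	x, z := operands()
	fmt.Println(unfused(x, x, z) == 0)                // true: the 2^-60 term was rounded away
	fmt.Println(fused(x, x, z) == math.Ldexp(1, -60)) // true: the 2^-60 term survived
}
```

A plain `x*y + z` without the conversion could give either answer, depending on whether the implementation chooses to fuse it. That is the whole point of the spec's wording.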

Quotes from The Go Programming Language Specification Version of August 2, 2023

