On Friday, I shared a short code snippet, and asked how it could be improved. If you missed it, go read the challenge now, then come back.
I got a good number of responses, so today I’m going to share a few of them, and offer my own answer.
First off, Jim Kalafut, and several others, pointed out a bug in the code I shared: one line read return nil, but should have been return decodedBytes, nil. Thanks for pointing that out. With that in mind, the (correct) original code to be improved is:
func decodeString(encoded string) ([]byte, error) {
	var decodedBytes []byte
	var err error
	trimmed := strings.TrimRight(encoded, "=")
	decodedBytes, err = base64.StdEncoding.DecodeString(trimmed)
	if err != nil {
		decodedBytes, err = base64.URLEncoding.DecodeString(trimmed)
		if err != nil {
			decodedBytes, err = base64.RawURLEncoding.DecodeString(trimmed)
			if err != nil {
				return decodedBytes, nil
			}
		}
	}
	return decodedBytes, err
}
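To make the intended behavior concrete, here’s a quick sketch of how the function might be called; the example inputs are mine, not from the original challenge:

// Assumes decodeString, above, is in scope.
b, err := decodeString("Zm9vCg==") // standard Base64 for "foo\n", with padding
fmt.Printf("%q %v\n", b, err)      // "foo\n" <nil>

b, err = decodeString("Zm9vCg") // the same value, in raw (unpadded) URL encoding
fmt.Printf("%q %v\n", b, err)   // "foo\n" <nil>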
By far the most common suggestion I received was to loop over a slice of decoders, to reduce nesting. Mark Ayers, for example, suggested:
func decodeString(encoded string) ([]byte, error) {
	trimmed := strings.TrimRight(encoded, "=")
	// Try decoding with various Base64 encodings
	decoders := []func(string) ([]byte, error){
		base64.StdEncoding.DecodeString,
		base64.URLEncoding.DecodeString,
		base64.RawURLEncoding.DecodeString,
	}
	for _, decoder := range decoders {
		if decoded, err := decoder(trimmed); err == nil {
			return decoded, nil
		}
	}
	return nil, fmt.Errorf("failed to decode string")
}
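Out of curiosity, here’s how this version behaves on a few inputs of my own choosing (note that, as written, it now also needs the fmt import for the final error):

// Assumes Mark's decodeString, above, is in scope.
for _, in := range []string{"Zm9vCg==", "Zm9vCg", "not base64!"} {
	b, err := decodeString(in)
	fmt.Printf("%q => %q, %v\n", in, b, err)
}
// Output:
// "Zm9vCg==" => "foo\n", <nil>
// "Zm9vCg" => "foo\n", <nil>
// "not base64!" => "", failed to decode string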
I agree that this is a big improvement. So kudos to everyone who suggested it!
It’s also the approach I almost implemented when I ran into this code, until I noticed something else odd about the code that only one reader, Marc Journeux, hinted at:
- I do not understand the TrimRight to remove the = character. Base64 strings are made of 6 bits groups (2^6=64) so = characters can be added for padding. RAW URL base64 encoding does not use padding, so looking for a = character might save one decode call.
He’s on the right path with this comment.
Let’s take a closer look at what the code is doing, with this in mind:
- We trim trailing = characters from the input, i.e. Zm9vCg== becomes Zm9vCg.
- We attempt to decode the new string using the StdEncoding, which uses the standard Base64 alphabet, with padding.
- We attempt to decode using URLEncoding, which uses a URL-safe Base64 alphabet, with padding.
- We attempt to decode using RawURLEncoding, which uses a URL-safe Base64 alphabet, without padding.
I’m guessing by now the problem is pretty obvious.
Of the three decoders we attempt to use, only one is possibly valid after trimming padding! StdEncoding and URLEncoding will both fail on all inputs that are missing the needed padding.
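Don’t take my word for it; a few lines in the Go Playground demonstrate the claim (the sample input is my own):

trimmed := "Zm9vCg" // "Zm9vCg==" with its padding trimmed

_, err := base64.StdEncoding.DecodeString(trimmed)
fmt.Println(err != nil) // true: StdEncoding insists on padding

_, err = base64.URLEncoding.DecodeString(trimmed)
fmt.Println(err != nil) // true: so does URLEncoding

b, err := base64.RawURLEncoding.DecodeString(trimmed)
fmt.Printf("%q %v\n", b, err) // "foo\n" <nil>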
With this realization, we can simplify the code significantly:
func decodeString(encoded string) ([]byte, error) {
	trimmed := strings.TrimRight(encoded, "=")
	return base64.RawURLEncoding.DecodeString(trimmed)
}
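A quick sanity check (inputs mine) shows it handles both padded and unpadded strings:

b, err := decodeString("Zm9vCg==") // padding is trimmed, then raw-URL decoded
fmt.Printf("%q %v\n", b, err)      // "foo\n" <nil>

b, err = decodeString("Zm9vCg") // already unpadded; the trim is a no-op
fmt.Printf("%q %v\n", b, err)   // "foo\n" <nil>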
Now some of you may be thinking: “Wait! Not all Base64 encoded strings have padding!”
You’re right, of course. Padding is only needed when the number of input bytes is not divisible by 3. This means that for 1/3 of inputs, URLEncoding could work. But only by accident: for those inputs, trimming the padding is a no-op.
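For example, here’s an input (of my own choosing) that decodes to exactly 6 bytes, so its encoded form carries no padding to begin with:

// "foobar" is 6 bytes (divisible by 3), so its Base64 form, "Zm9vYmFy",
// has no padding for TrimRight to remove.
b, err := base64.URLEncoding.DecodeString("Zm9vYmFy")
fmt.Printf("%q %v\n", b, err) // "foobar" <nil>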
Further, there’s a chance that StdEncoding could work for some inputs. Any input that happens to contain neither + nor / (which are replaced with - and _ respectively, for URL encoding), and also doesn’t contain padding, would decode just fine with the standard encoding.
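To see the alphabet difference concretely, here’s a two-byte input (again, my own example) whose encoded form uses both URL-safe characters:

src := []byte{0xfb, 0xff}
fmt.Println(base64.StdEncoding.EncodeToString(src))    // +/8=
fmt.Println(base64.RawURLEncoding.EncodeToString(src)) // -_8

// StdEncoding rejects the URL-safe - and _ characters outright...
_, err := base64.StdEncoding.DecodeString("-_8")
fmt.Println(err != nil) // true

// ...while RawURLEncoding decodes them fine.
b, err := base64.RawURLEncoding.DecodeString("-_8")
fmt.Printf("%q %v\n", b, err) // "\xfb\xff" <nil>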
But of course, we want a function that works all the time, with all inputs. And RawURLEncoding handles all of those cases properly.
Now one final note:
This function assumes that inputs may or may not include padding. That strikes me as an odd requirement. I have a strong suspicion that this function could be further simplified to:
func decodeString(encoded string) ([]byte, error) {
	return base64.URLEncoding.DecodeString(encoded)
}
Which no longer trims the padding.
But I can neither confirm nor deny that this will work, based on the tests I have available to me. So I’m taking the more conservative approach, and retaining the functionality to support both input formats.
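If I were making this change, I’d also want a table-driven test to pin down both input formats. Here’s a sketch, with hypothetical cases of my own (real inputs, from whatever actually produces these strings, would be better):

func TestDecodeString(t *testing.T) {
	tests := []struct {
		name, in, want string
	}{
		{"padded", "Zm9vCg==", "foo\n"},
		{"unpadded", "Zm9vCg", "foo\n"},
		{"url-safe characters", "-_8", "\xfb\xff"},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := decodeString(tt.in)
			if err != nil {
				t.Fatal(err)
			}
			if string(got) != tt.want {
				t.Errorf("got %q, want %q", got, tt.want)
			}
		})
	}
}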
I might also assume that the inputs could be of either the standard or URL-encoding variants. But I don’t see any actual evidence of that from the tests available to me. So I’m assuming the inclusion of the standard encoding in the original version was just an oversight, or maybe a first attempt. (It’s likely this code was created by an LLM, too, which might explain some of this.)