Experilous

Tl;dr:
If numerical determinism is important for your application, never use a PRNG to generate more than one function parameter at a time.

I just got bit by C++’s unfortunate underspecificity regarding the order of evaluation of function parameters. I was working on adapting a procedural generation algorithm (based loosely on my planet generator code) from floating point numbers to fixed point numbers. The goal was to ensure that regardless of compiler, OS, or CPU architecture, the same code would generate the same planet, if starting from the same pseudo-random number seed and using the same generation parameters. Despite IEEE 754 being an extensively designed and very mature standard, in practice there are more than enough other variables at play to make floating point unreliable from machine to machine if exact replication is required. Just search for “floating point determinism” for plenty of examples.

Quite obnoxiously, but not unexpectedly, the results with the new fixed point code were substantially different from the floating point results. Not completely different; I could tell that some steps were behaving nearly identically, which proved that most of my fixed point code was functioning correctly. Tracking down the source of the discrepancy was harder. Was my implementation of a cross product backwards, leading to blatantly wrong vectors? Or was it something more subtle, such as a chaotic variable tipping just enough across its threshold to lead to radically different behavior?

Here’s what the generated maps looked like, the original floating point on the left, the fixed point in the middle, and the image difference between them on the right:

Map generated with floating point numbers, fixed point numbers, and the image difference between them. Severe discrepancies, but the continents are the same shape.

After going through and attempting to compare results of individual computations from my old floating point code to the new fixed point code, and nearly running out of relevant bits of code to examine, I came upon the code that gave each of my tectonic plates different physical properties. I could tell by the map generated that the shape of the continents were the same regardless of number type, but the elevations at the plate boundaries were different. I had already checked all the code that estimates plate boundary stresses, the code that propagates those stresses inward away from the boundaries, and the code that translates those stresses into elevation values. The only thing left, then, was the code that generated the movements of the plates that would produce all the different stresses and elevations.

At first glance, nothing seemed out of the ordinary. I was simply randomly generating a few different values to determine the drift, spin, and oceanic properties of each plate. Given the nature my half-finished transition from float to fixed (I was still randomly generating floating point numbers, and just converting them to fixed point afterward), I couldn’t see any way in which I was grossly miscalculating anything here.

And then I remembered that C++ does not specify the order of evaluation of expressions passed in as function parameters. I was using the emplace_back() function of a std::vector to construct a std::tuple. Every single one of the five parameters were expressions that included a call to my pseudo-random number generator. Naturally, to get the same results out of a PRNG, you have to use the sequence of generated numbers in exactly the same order, or else things will work out quite differently. I decided to pull these expressions out and assign their results to five local variables first, and then emplace them into the vector once they were calculated.

After doing this, the map generated was something else different altogether. Continents shapes weren’t even the same, but I could tell that the shape of the underlying tectonic plates still were. That indicated that the oceanic variable was now being generated in a different order also, whereas before I guess it was consistent between the floating point and fixed point programs.

Knowing that this was sufficient to cause significant differences, I proceeded to determine if it was the only problem tripping me up. To do this, I simply fiddled around with the explicit order of the expressions, and was finally able to find an order that generated essentially the same results as I had achieved with the floating point code. You can tell by the image difference below that discrepancies are quite minor, and are pretty much what one might expect from changing the numeric type used in a non-chaotic simulation.

Map generated with floating point numbers, fixed point numbers, and the image difference between them. After correcting order of evaluations, discrepancies are minimal, more in line with what would be expected.

So after all that investigation into how I was messing up my fixed point calculations, it turns out I wasn’t getting them wrong at all! My compiler simply decided to change the order of function parameter evaluation on me after switching from floating point to fixed point. And if a single version of a single compiler on a single OS and architecture changes its mind, I can be certain that I’ll get different results on different compilers and for different platforms! I could have missed this little bug for a long time if I hadn’t ran into it while verifying the correctness of my fixed point calculations. I might have thought that all was proceeding smoothly, only to be dismayed when building the application for other platforms. Even then, I’d only notice if I were comparing results across platforms. More likely, I’d release the broken program and start getting bug reports from users sharing planet seeds with each other; not good!

The conclusion? If numerical determinism is important for your application, never use a PRNG to generate more than one function parameter at a time.

1 Comment

Comment by Robert de Forest — 2014/12/30 @ 17:49

Nice. I was not aware of this aspect of C++ and presumably it can vary between implementations and the wisdom applies to other languages as well. Semantics is tricky.

In my own code I tend to avoid putting anything other than a variable or constant in the arguments to a function, but I didn’t realize before now that there could even be a good reason for that. The one exception is in JavaScript where I in-line my callback functions if they’re small enough:

function thisThenThat(foo, that, bar) {
this(foo, function () { that(bar) });
}

But the declaration of and passing of the anonymous function has no side-effects, so in principle it’s the same.

I’m filing “function arguments with side effects” as a bad code smell in my book. That I don’t have. But should.

Thanks for the interesting post!

Leave a comment