While working on the Make It Random library for Unity, I took up an engineering challenge: generation of random floating point numbers between zero and one quickly and with perfect uniform distribution. The most common method, dividing a random integer by the full range of possible integers, has a few substantial flaws that I hoped to avoid:

- First, it involves a division operation, which could naturally slow things down a bit.
- Secondly, although the distribution on a high level is uniform, it’s more clumpy for higher numbers closer to one, and more diffusely spread for lower numbers closer to zero, so the uniformity is imperfect.
- Third, the quantity of possible integers generated rarely maps cleanly to the possible number of floating point values available, and so different floating point values are likely to have at least slightly different probabilities of occurring, further weakening the uniformity.
- And fourth, even when the mapping from integers to floating point values is carefully managed, if the range of values needed is not a power of two, this adds further problems to any attempt to get perfect uniformity without sacrificing performance, due to how typical random engines generate data as chunks of bits.

With the help of various sources around the internet plus some clever use of probability mathematics, I was able to conquer all of these difficulties, providing perfectly uniform and fast generation of floating point numbers in the unit range. This includes all four variants of whether the lower and upper bounds are inclusive or exclusive. The techniques involved are explained below.

The first two are not at all new, but are included because I have not seen them discussed as often as I think they deserve, and because they help build the context for the third technique. This last technique is something I devised on my own, and although I have no doubt other smart people have already discovered it or something similar, I never ran across it while researching, so I’m eager to share it with others in this post. Plus, I suspect that the general technique can be usefully applied to other random value generation beyond just floating point numbers, so the more people are aware of it, the better. (more…)

While working on Worldbuilder v0.2, I spent a fair amount of time implementing an algorithm for generating distance fields on the surface of a sphere. It was admittedly a struggle, with many false starts, but I finally stumbled upon a solution that works well, producing a distance field with a high degree of accuracy, and executing very quickly with the help of the GPU.

Most of the literature I could find on the topic was focused either on generating distance fields for flat 2D images, or for full 3D space. In both cases, space was always Euclidean, whereas distance on the surface of a sphere behaves quite differently. Additionally, most algorithms I ran across were focused on calculating distances from a collection of points, but for reasons I’ll discuss below, I needed to calculate accurate distances from polygon outlines. Some of the 3D algorithms were tantalizingly close to what I wanted, since they often started with a triangle mesh as their input, but the 3D aspect greatly increased the complexity of the algorithms relative to my needs, while still not addressing my non-Euclidean needs.

In the end, the algorithm that I finally implemented is honestly nothing very impressive, and I kind of figured it out by accident. But it works, and it works well, despite being something of an ugly hack. As it might help others needing to do something similar, allow me to share the details of this algorithm, along with the journey getting there. Though admittedly, the journey does get a bit verbose at times, so feel free to jump straight to the final algorithm. I tried to keep that section relatively self-contained. Just know that you’re skipping over loads of pretty pictures. (more…)

Lua is a dynamically typed language, with only a small selection of primitive types. Any aggregate or otherwise complex types must use tables (i.e., dictionaries/associative arrays) in some form, which will significantly impact the performance of field/method access as well as the efficiency of data storage (and thus cache usage). LuaJIT’s foreign function interface (FFI) let’s us work around those limitations by offering us the ability to use C declarations to define new types with much greater control and efficiency. In this blog post, I’ll share how, for my Worldbuilder project, I used the FFI to implement a fairly performant fixed point number type. Public domain source code can be downloaded here. (more…)

I mentioned in my previous post that I needed to do some serious performance optimizations on my Lua code, and indicated in an edit that after a day’s worth of effort, I managed to speed things up by roughly a factor of 13. I have some more work to do, but in the hopes that the steps I’ve taken so far are helpful to anyone else going through a similar process, allow me to describe some of them before they slip my mind. (more…)

It took me three weeks of design, redesign, more redesign, lots of implementing scattered within, and three intense days of debugging a giant mass of previously untested code, but I finally have a basic modular system in place for running all procedural generation from Lua. This will enable me (and eventually anyone else) to quickly experiment with various algorithms for every stage of planet creation and presentation.

Unfortunately, I have a lot of optimizing investigations to do, because it seems to be running about 100 times slower than the prior C++ code. But at least it generates the exact same planet (give or take a few subtle discrepancies due to slightly different math here and there). Based on some of my earlier experiments at the beginning of the month, I’m pretty sure I can bring that up to within at least 10% of the speed of my C++ implementation, and quite possibly within 50% of its speed. Just need to profile and figure out the bottlenecks. (*Edit: A day’s worth of investigation has gotten me up to around 13%, or 7.5 times slower than the C++ implementation. That should be acceptable for the moment.*)

A cool thing about the architecture I ended up with is that not only will it naturally support a rich modularity of hooking up algorithms at different stages to each other, but that the way this modularity is exposed will also automatically enable a significant degree of concurrent execution on multiple cores with little to no effort on the part of the script writer. Right now I have only implemented a single threaded execution model, but I should be able to change these details under the hood when I get to that stage in the project, and the Lua scripts won’t know the difference. If you’re curious, allow me to provide an overview of how I’ve designed this modularity and concurrency. (more…)

Tl;dr:

If numerical determinism is important for your application, **never** use a PRNG to generate more than one function parameter at a time.

I just got bit by C++’s unfortunate underspecificity regarding the order of evaluation of function parameters. I was working on adapting a procedural generation algorithm (based loosely on my planet generator code) from floating point numbers to fixed point numbers. The goal was to ensure that regardless of compiler, OS, or CPU architecture, the same code would generate the same planet, if starting from the same pseudo-random number seed and using the same generation parameters. Despite IEEE 754 being an extensively designed and very mature standard, in practice there are more than enough other variables at play to make floating point unreliable from machine to machine if exact replication is required. Just search for “floating point determinism” for plenty of examples.

Quite obnoxiously, but not unexpectedly, the results with the new fixed point code were substantially different from the floating point results. Not completely different; I could tell that some steps were behaving nearly identically, which proved that *most* of my fixed point code was functioning correctly. Tracking down the source of the discrepancy was harder. Was my implementation of a cross product backwards, leading to blatantly wrong vectors? Or was it something more subtle, such as a chaotic variable tipping just enough across its threshold to lead to radically different behavior? (more…)

A few weeks ago I introduced you to what I felt was an impressive little data structure that can provide constant time complexity for the standard operations, stores data contiguously, and provides stable handles that remain valid across all operations other than removing the referenced items themselves. Today I want to extend the basic data structure with some additional functionality which will be focused on iterating over the contiguous items in order. While iterating, we may wish to modify the collection as we go (such as conditionally removing some items). We may also wish to rearrange the items so that a simple linear traversal will also happen to hit the items in a particular order (such as sorting 3D objects from front to back to take advantage of a depth buffer and reduce fill-rate). (more…)

I recently ran across a couple of blog articles describing a data structure that provides a set of properties that I often am seeking:

- Contiguous data storage, good for linear iteration and maximizing cache utilization
- Constant-time insertion, deletion, and relocation of elements
- Stable external references to elements regardless of internal relocation
- Constant-time look-up from external references without hashing

(The two articles were from the makers of the Molecule Engine and the BitSquid engine, respectively.)

Fascinated by data structures and algorithms in general, and lured by the appeal of these properties, I set out to study the concept and implement my own version. It’s quite simple to implement in a basic form to achieve the above four properties, and can easily be extended to acquire some additional behaviors at the cost of additional memory or extra operations. In this first article, I’ll walk through the basic concept, as well as a simple implementation in JavaScript. Later articles will explore some of the possible extensions. (more…)

In beginning the simulation code of my city builder over the past few weeks, I’ve run into some difficulties with integers that quite surprised me. For a variety of reasons, I want to ensure that the simulation code is absolutely completely deterministic, regardless of platform, compiler, machine, or any other variable. While I’d need to do further research to properly justify this statement in all its details, my impression is that this is much easier to guarantee if one sticks to integer computations, avoiding floating point representations of numbers.

Of course, if one uses just integers in all computations (or even the somewhat fancier fixed point variants), then loss of precision becomes more important to deal with, whereas with floating point numbers, one can often get by ignoring such details. Division in particular is the prime culprit. (more…)