Understand the data, and understand how it is used.

Over the past few weeks, I’ve been struggling to get past a point of excessive complexity in the simulation code for my city builder prototype.  I went through a few different refactoring attempts, trying and generally failing to manage the complexity.  But the most recent attempt finally appears to have succeeded, and I think it has also gotten me one step closer to properly grokking data oriented design.

My progress first ground to a halt after I had implemented the construction of roads and houses.  Construction itself was handled identically for both, in terms of hiring construction labor and making progress on construction over time.  But upon completion of houses, I wanted to create residents that would move into the city.

The naïve solution would simply be to loop over all construction objects, and for each one, if it is complete and it is a house, execute the code for creating residents.  In line with existence based processing, however, I really wanted to avoid checking the type of each construction object, and conditionally running different code for each type.  Once the game got complex enough, this could quickly get out of control.  Run-time polymorphism might keep the code complexity under control, but the processing complexity for the CPU and memory would still be awful.
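As a rough illustration of that naïve loop (not the prototype's actual code; `StructureKind`, `Construction`, and `count_new_households_naive` are hypothetical names, and the 0-to-1 progress value is an assumption), it might look something like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types; the post does not show the real ones.
enum class StructureKind { Road, House };

struct Construction {
    StructureKind kind;
    float progress;  // assumed range: 0.0 = just started, 1.0 = complete
};

// Naive approach: one mixed collection, with a type check on every element.
// Returns how many completed houses are ready to receive residents.
std::size_t count_new_households_naive(const std::vector<Construction>& all) {
    std::size_t households = 0;
    for (const Construction& c : all) {
        assert(c.progress >= 0.0f && c.progress <= 1.0f);
        // Each new structure type tends to add another branch like this one.
        if (c.progress >= 1.0f && c.kind == StructureKind::House) {
            ++households;
        }
    }
    return households;
}
```

Every road in the collection is visited and rejected, which is exactly the per-element type check described above.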

What I needed instead was a collection of construction objects that I knew were entirely just houses, so that I could loop over them and create residents without the need to check for construction type.
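A minimal sketch of that idea, assuming a dedicated array that only ever holds houses under construction (`HouseConstruction`, `Resident`, and `settle_completed_houses` are made-up names, and one resident per house is a simplification):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Being in this array is what says "this is a house"; there is no type tag.
struct HouseConstruction {
    int structure_id;
    float progress;  // assumed: 1.0 means complete
};

struct Resident {
    int home_structure_id;
};

// Creates one resident per completed house and removes that house from the
// under-construction array. Returns the number of houses completed.
std::size_t settle_completed_houses(std::vector<HouseConstruction>& houses,
                                    std::vector<Resident>& residents) {
    std::size_t completed = 0;
    std::size_t i = 0;
    while (i < houses.size()) {
        assert(houses[i].progress >= 0.0f);
        if (houses[i].progress >= 1.0f) {
            residents.push_back(Resident{houses[i].structure_id});
            houses[i] = houses.back();  // swap-and-pop; order is not preserved
            houses.pop_back();
            ++completed;
        } else {
            ++i;
        }
    }
    return completed;
}
```

The only remaining branch asks whether the house is finished; nothing asks what kind of thing it is, because roads never enter this array.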

Up until this point, all my data was stored in dynamic arrays and hash tables as seemed appropriate.  I used unique integer ids a lot to represent various types of entities.  I had agent ids for entities that make decisions in the game world, land ids for notable tiles of land, structure ids for buildings, roads, and such, and construction ids for structures that currently had an incomplete level of ongoing construction.  So to add a collection of housing construction, all I’d need to do is store a construction id for each house under construction.

This approach ran into two problems, the second of which I did not immediately recognize and will get to in a moment.  The first was that the number of arrays and hash tables that I needed to keep synchronized was quickly becoming unwieldy.  I felt like the major difficulty was that the relationships between all these data structures were informal, and I had to remember to keep them all properly synchronized everywhere in code where I manipulated them.  I thought if I could just formalize the relationships, the code would become much simpler and more reliable.  What I wanted was essentially the foreign key of relational databases.
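To make the synchronization burden concrete, here is a hedged sketch of the id-based layout described above.  The names (`World`, `begin_construction`, `finish_construction`) and the choice of containers are assumptions for illustration, not the prototype's real code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ConstructionId = std::uint32_t;
using StructureId = std::uint32_t;

struct Construction {
    StructureId structure;
    float progress;
};

struct World {
    // Primary storage for every ongoing construction.
    std::unordered_map<ConstructionId, Construction> constructions;
    // Informal "foreign keys": ids of the constructions that are houses.
    std::vector<ConstructionId> housing_constructions;
    ConstructionId next_construction_id = 1;
};

ConstructionId begin_construction(World& world, StructureId structure, bool is_house) {
    const ConstructionId id = world.next_construction_id++;
    const bool inserted =
        world.constructions.emplace(id, Construction{structure, 0.0f}).second;
    assert(inserted);  // ids are never reused in this sketch
    (void)inserted;
    if (is_house) {
        world.housing_constructions.push_back(id);
    }
    return id;
}

void finish_construction(World& world, ConstructionId id) {
    // Nothing enforces that both containers are updated together; forgetting
    // the second step would leave a dangling id in housing_constructions.
    world.constructions.erase(id);
    auto& housing = world.housing_constructions;
    housing.erase(std::remove(housing.begin(), housing.end(), id), housing.end());
}
```

With only two containers this is manageable, but every additional index adds another place that each insert and erase has to remember.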

This led me down an interesting but ultimately futile path of writing a quick-and-dirty implementation of tables, indexes, and foreign keys.  I learned a lot about C++ type traits, template metaprogramming, and variadic template parameters.  Partway through the project I also discovered Boost.MultiIndex, and decided to write a wrapper around this nicely designed library and save myself some trouble (at least in the short term; the library wasn’t a perfect match for my needs, but it was close).  Unfortunately, once I got my simulation code updated with this new library, not only were my compile times painfully long for such a small project (no surprise), but I was still having difficulty figuring out how to manage housing construction and new residents.

In despair, I more or less started over from scratch.  And since this housing construction issue was where I had consistently run into problems, I decided to start there.  But more specifically, I decided to start with the code, hoping that the data would make itself clear.  I simply wrote the function that checks for completed housing construction and creates new residents to live there.  The whole time I was simply assuming that the data was structured in whatever way was most convenient for the function I was writing; this made the code incredibly simple, naturally.  I then wrote a few more pieces of processing in this way, sometimes copying my old code, sometimes writing different code.  Eventually, I started plugging in the gaps between the different processes, adapting the form that each process presumed that the data had, or adding in some transformation processes to adapt data from one form to another.  Only once I had most of the processes in place did I actually begin to define the data structures that I had previously just been imagining.
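The code-first workflow described above might be sketched roughly as follows.  Everything here is hypothetical: `create_residents` stands in for the function written first against an idealized input, `HouseSite` for the form the construction process actually produces, and `extract_completed` for a transformation step added later to bridge the two.  The `capacity` field is an assumption as well.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// The input the resident-creation code would like to receive.
struct CompletedHouse {
    int structure_id;
    int capacity;  // assumed: number of residents the house can hold
};

struct Resident {
    int home_structure_id;
};

// Written first, assuming the data already arrives in the convenient form.
void create_residents(const std::vector<CompletedHouse>& completed,
                      std::vector<Resident>& residents) {
    for (const CompletedHouse& house : completed) {
        assert(house.capacity >= 0);
        for (int i = 0; i < house.capacity; ++i) {
            residents.push_back(Resident{house.structure_id});
        }
    }
}

// Defined later: the form the construction process naturally works with.
struct HouseSite {
    int structure_id;
    int capacity;
    float progress;  // assumed: 1.0 means complete
};

// Transformation step that adapts one form to the other. Completed sites are
// removed from `sites` and returned in the form create_residents expects.
std::vector<CompletedHouse> extract_completed(std::vector<HouseSite>& sites) {
    std::vector<CompletedHouse> done;
    std::vector<HouseSite> remaining;
    for (const HouseSite& site : sites) {
        if (site.progress >= 1.0f) {
            done.push_back(CompletedHouse{site.structure_id, site.capacity});
        } else {
            remaining.push_back(site);
        }
    }
    sites = std::move(remaining);
    return done;
}
```

A simulation step would then call something like `create_residents(extract_completed(sites), residents)`, with each function staying simple because the other one handles the shape of the data.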

I eventually got all of this to compile, which felt good.  It was unpleasant writing so much code that wouldn’t compile for so long because the data structures weren’t actually defined until the end.  The prototype was able to perform just like it had before, but the underlying processing and data structures were somewhat different from before.  Although I had started with the housing construction process, I hadn’t actually integrated it fully into the simulation, so now I faced the real test:  Creating new residents in the city for each house that is constructed.

Fortunately, the next few days of work not only accomplished this elusive task, but also saw further significant progress with an additional agent type (mining company), structure (mine), and much more active use of the economic system to produce, trade, and consume resources.

[Screenshot: houses, industry, construction, and the approve and play buttons]

At this point, I think I can use this progress as evidence that I have managed to structure my code (and my mind) into a better form.  And in retrospect, I realize that this is because I got back to the core principle of data oriented design, as I summarized it at the top:  Understand the data, and understand how it is used.  I’ve read proponents of data oriented design describe how it is hard to come up with many generic and reusable patterns for this style of programming.  You ultimately just need to deeply explore and understand the problem you’re trying to solve, and then structure the data and processing accordingly.  Given my past heavy influence from object oriented design, my brain wants to rebel against this notion.  But I’m slowly beginning to develop a new conception that does not value generic reusable code quite so strongly.

We’ll see how that pans out.  Hopefully this coming week will continue to see rapid progress on the prototype.  It was supposed to be done two and a half weeks ago.  I am clearly doing a fine job of maintaining the venerable software development tradition of missing milestones. :-D
