Jemgine

10Jun/100

“Sparse Components”

There was an interesting discussion on the sweng-gamedev mailing list recently about 'sparse components'. It branched into the topic of software architecture and design. In between the bickering and the hate for object-oriented programming and C++, there was some real useful information. As with anything, it takes time to wade through the crap to get the gems, but I've already done that.

Sparse Components aren't a new idea. The term isn't terribly accurate, either. What it is is a step away from the usual way of doing things with object-oriented programming where you'd have an entity object, and a family of virtual functions on that entity object that are over-ridden for specific entity types. The component-based model is already a step away from that, in that there's a single entity type (in the programming language) and you create many entity types (in the game) by mixing and matching components. But in most component-based system, the component object itself is a polymorphic object with behavior controlled by a family of virtual functions. And, in fact, that's how I've been doing it all along; I'll get into why this is or isn't a good idea.

So what's wrong with this approach? It's certainly better than having a hierarchy of entity types for reasons I won't get into here - lets just assume you already know the benefit of the component model and want to use it. The problem is those virtual functions and how you call them. It is not virtual function call overhead; that's so minuscule to not be worth worrying about. If you have a container of heterogeneous types and you're firing off virtual functions on them, you're potentially calling different code for every single call. And that's the problem, you're constantly moving code into and out of the cache. The biggest cost is that cache miss. That's where a non-virtual component interface comes in.

Instead of keeping a list of entities which each contain a list of components, keep several lists of components. A list for physics components, a list for render components, etc. Now you can update all your physics components at once. You can keep using your virtual functions - you'll still see an immediate benefit because each call is going to the same virtual function; the cache will stay nice and warm and you won't miss at all.

But we can go one step better. We keep this list of physics components in the physics module. The physics module knows how to update them, so instead of calling a function on the component, it just does it. Updating these objects becomes a tight loop over homogeneous data. We can do the same for render components to apply transforms or whatever. Instead of a single loop over heterogeneous data doing many things, we have several smaller loops over homogeneous data doing one thing each. There are far fewer cache misses. This is what the term 'sparse components' is failing to describe. The behavior lives in the module, not the component itself.

This isn't quite how I've been doing it. I have something of a hybrid approach going. Some systems work that way, some don't. My physics almost do, entirely accidentally. The physics module is it's own self contained thing, and however it works internally, I don't think it invokes a lot of virtual functions. The cache should be doing fine, and there's not much I can do about it either way anyway. So I was accidentally using a 'sparse component'; my component was just data, no behavior. The behavior was all handled by the module running it. But not every system works this way (And I'm afraid not every system can) but I'm going to do as much as possible to improve it. I might even have to dig into Box2d and make changes there.

This method does not reap a benefit for a small number of game objects and components. Lets make that clear. If there are only a few dozen objects, and especially if they are all different, there is no benefit to structuring your game this way. Several different loops that update just a few objects each isn't going to be faster than a single loop that updates a few dozen objects in different ways. The overhead of the additional lists might even make it slower. You need many objects to reap the real benefits of this approach. Hundreds or thousands of them.

And C# is really, really bad at this sort of thing. Just an FYI. It's because you can't stick classes into continuous memory, and structs can only be dumb data. Maybe it's a good excuse to normalize your data. Mostly it's just a pain in the ass, and almost makes me want to switch back. If only everything else about C# wasn't so nice.

The discussion shifted to stream-based processing and a new way of thinking about your programs. Every program can be broken down into two things. State Objects and Processes. Processes take State Objects as input, and produce State Objects as output. This seems kind of silly and obvious at the same time. That's exactly how every program ever made works, it's just a matter of realizing this, and then using it to our advantage. You already take an object and transform it through some process into a new object. (Okay, you don't always make a copy. But that's just nomenclature pedantry.) But what happens when you structure your program with this in mind? Instead of transforming an object, you transform many objects.

Your process can take a stream of objects and transform them and output another stream of objects. This is the same optimization we got with Sparse Components! I've not yet explored all the benefits of this new way of thinking. There's a psychological barrier; I'm solving problems now that I know how to solve with object-oriented programming techniques. I need a problem that's not trivial to solve with what I already know. This stream-based paradigm is very similar to the graphics pipeline, though. Verticies are input as a stream and processed by the vertex shader, which produces a new stream of verticies. These verticies are fed into the rasterizer or a tessellation shader, and so on. The key here, I think, is that you don't perform the entire operation on each object one at a time, you perform part of the operation on every object, and then more on every object, and so on.

We'll see what happens. First, I need to make some minor changes to Jemgine's physics module. Changes that can take advantage of this new knowledge right now. And then I might explore any benefit an intrusive linked list would have over C#'s generic collections. And I should see if there are any updates to box2d.xna available.

About Bleck

No description. Please complete your profile.
Comments (0) Trackbacks (0)

No comments yet.


Leave a comment


Trackbacks are disabled.