Hard work refactoring the Game Loop and Display List
The image above, that of the Fellowship of the Ring looking out towards their final destination, far away in the distance, feels very appropriate right now.
Felipe and I have absolutely worked our butts off on Phaser 3 this week: 27 major commits, hundreds of lines of source code, and piles of tests and examples created. We accomplished some major tasks. The SpriteBatch was overhauled, adding in lots of new features. The Blitter object was merged into the core renderer, blend mode support was added, and numerous small tweaks and fixes were made all over the place. And yet we're still not where we wanted to be.
It honestly feels like we're in Mordor, trying to scale Mount Doom.
Part of the work we did this week was to carefully evaluate the main game loop. I literally broke the whole loop down, function by function, and we graphed out the whole thing. Then we created some performance tests for key parts of it, and they clearly exposed some issues. Everything worked; it just didn't run as fast as either of us wanted. A lot of time was lost iterating the scene graph each frame, with too much branching going on, killing the speed advantage our new batch renderer had given us. In our tech demos we were happily blasting 1 million objects around the screen. Inside the full game loop, we struggled to hit 10,000 (at 60fps). It was a particularly demoralizing find.
After much discussion we spent a couple of days exploring alternatives. We looked at using an RTree for spatial separation of game objects, and while the figures were very impressive, it wasn't suitable for a scene graph as it lost all context of parenting. On the plus side, we do now have a superbly fast RTree implementation which we can use for the broadphase checks in physics (replacing the slow QuadTree from v2).
Then we looked at ways to speed up iterating through the display list: swapping to doubly linked lists and a truly flat rendering list, to avoid branching and misprediction and to improve data locality. This did indeed start performing a lot better. We then threw viewport culling into the mix, and started tackling the Transform class. Every single game object in the world has a Transform, and it's vital to pretty much everything, being the custodian of game object placement and representation in the world. So it was essential that it was fast while still offering the features you all need (like easily being able to scale a sprite, or get the true world coordinates of a sprite without first having to wait until a render pass has occurred, as you do in v2). The problem is that an immediate (as opposed to deferred) Transform is more expensive. But as Felipe said to me: "the fastest game engine in the world is an empty file", and he's right - it's useless of course, but it's fast. And what we're building here has to be useful too.
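To make the flat-list idea a bit more concrete, here is a minimal sketch of it. This is not the Phaser 3 code: `SceneNode`, `flatten` and `cull` are hypothetical names made up for illustration, and the transform is cut down to a position plus a uniform scale (no rotation). It assumes the tree is walked once to produce a flat array of world-space entries, after which culling is a single linear pass with no parent lookups.

```javascript
// Hypothetical scene-graph node: position and scale are local to the parent.
class SceneNode {
    constructor(name, x, y, scale = 1) {
        this.name = name;
        this.x = x;
        this.y = y;
        this.scale = scale;
        this.children = [];
    }

    add(child) {
        this.children.push(child);
        return this;
    }
}

// Walk the tree once and emit a flat array of world-space entries.
// An explicit stack is used instead of recursion; children are pushed in
// reverse so entries come out in depth-first (draw) order.
function flatten(root) {
    const out = [];
    const stack = [{ node: root, px: 0, py: 0, ps: 1 }];

    while (stack.length > 0) {
        const { node, px, py, ps } = stack.pop();

        // Simplified transform: the parent's scale affects the child's offset.
        const worldScale = ps * node.scale;
        const worldX = px + node.x * ps;
        const worldY = py + node.y * ps;

        out.push({ name: node.name, x: worldX, y: worldY, scale: worldScale });

        for (let i = node.children.length - 1; i >= 0; i--) {
            stack.push({ node: node.children[i], px: worldX, py: worldY, ps: worldScale });
        }
    }

    return out;
}

// Point-based viewport test (assumption: a real engine would test bounds,
// not just the origin). One linear pass over the flat list.
function cull(list, view) {
    return list.filter((e) =>
        e.x >= view.x && e.x < view.x + view.width &&
        e.y >= view.y && e.y < view.y + view.height);
}

const world = new SceneNode('world', 0, 0);
const ship = new SceneNode('ship', 100, 100, 2);
const turret = new SceneNode('turret', 10, 0);
const farAway = new SceneNode('farAway', 5000, 5000);

world.add(ship).add(farAway);
ship.add(turret);

const flat = flatten(world);
const visible = cull(flat, { x: 0, y: 0, width: 800, height: 600 });

console.log(visible.map((e) => e.name)); // [ 'world', 'ship', 'turret' ]
```

The turret ends up at x = 120 because its local offset of 10 is doubled by the ship's scale. Unlike a purely spatial structure, the flat list here is built from the parent chain, so that context isn't lost; the cost is that the list has to be rebuilt, or patched, whenever the hierarchy changes.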
So all in all it's been a mixed week. We've poured hours and hours into it. I personally have been coding until the small hours several times (4am? oh hello again!) and while I know we've made great progress, it feels like we've actually slipped down Mount Doom a bit, rather than climbed any higher. Then again, that is the life of software development, isn't it? It can't always go well, you can't always move forwards, but as long as you're still trying, and as long as you're still heading in the right direction, you'll get there eventually.
Here's hoping for an easier next week! :)
Phaser 3 Mailing List and Developers Guide
If you're interested in helping evolve the shape of Phaser 3, then please join the Phaser 3 Google Group. Discussions this week have included varying render loops. The group is for anyone who wishes to help shape what the Phaser 3 API and feature-set will contain.
The Phaser 3 Developers Guide is available. Essential reading for anyone who'd like to help build Phaser 3.