Phaser 4 Dev Report 2
Hello World!
After the release of Phaser 3.20 I was excited to dive right back into Phaser 4 dev again. I had already spent time refining the approach, so I was happy with how classes should be structured and exported, and with how things are built in the different packages under the new @phaserjs namespace. You can read about all of this in the first Dev Report.
I knew that the next step would be to get a renderer in place. This was essential not only for testing features like the new scene graph, but also for building more visual features from here on. Plus, it leads to eye candy, and you can never have too much of that!
The whole point of Phaser 4 is to embrace every last aspect of modern web browsers and practices, which of course includes WebGL2. I spent a week working through lots of tutorials, coding test after test to get to grips with the new features WebGL2 contains. Fundamentally, it's very similar to WebGL1. Everything is still heavily state-based and the process is all about putting the GPU into the correct state in order to perform an action. It's this state management that takes up most of the code, and where it's easy to get tripped up: leave an element in the incorrect state, then call a GL function that bombs out with a less-than-helpful error, and you're in for lots of backtracking and debugging.
I knew that I wanted the WebGL2 Renderer to be as minimal as possible and less opinionated than in Phaser 3. In v3 the renderer class is responsible for a huge batch of features, from the scissor stack to pipeline registration to texture creation. This makes it difficult to extend or enhance, because you can never be too sure what else is relying on it. While researching WebGL2 I came across Tarek Sherif's PicoGL library. I really liked the way it focused on managing the GPU state and the creation of common WebGL elements such as vertex buffers, render textures, uniform buffers and draw calls.
It is minimal to the core. There's no scene graph, and not even a single shader is bundled into the main library. It's just a small set of building blocks that you pull together as required, each one of which carefully tracks the GPU state. It's very similar to how the renderer was being built for Lazer several years ago, just more fully featured, as it had actually been completed. This made it perfect for my needs, so I took the key parts of Pico, ported them to TypeScript, tidied things up a bit, and set about testing it out. To give you an idea of the approach, let's create the famous WebGL triangle:
If you've ever followed a WebGL tutorial this is nearly always step 1. It's a good starting test because it involves creating simple vertex and fragment shaders, along with a vertex buffer for the triangle data and a vertex array that the draw call can use. Here's the code:
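As a sketch of those steps in raw WebGL2 and TypeScript (the actual demo uses the ported Pico-style classes, so treat every identifier below as illustrative rather than the library's API):

```typescript
// Minimal WebGL2 triangle using plain GL calls. The real demo wires the
// same steps through the ported Pico-style building blocks.
const canvas = document.querySelector('canvas') as HTMLCanvasElement;
const gl = canvas.getContext('webgl2') as WebGL2RenderingContext;

const vsSource = `#version 300 es
layout(location = 0) in vec2 position;
layout(location = 1) in vec3 color;
out vec3 vColor;
void main () {
    vColor = color;
    gl_Position = vec4(position, 0.0, 1.0);
}`;

const fsSource = `#version 300 es
precision highp float;
in vec3 vColor;
out vec4 fragColor;
void main () {
    fragColor = vec4(vColor, 1.0);
}`;

function createShader (type: number, source: string): WebGLShader
{
    const shader = gl.createShader(type) as WebGLShader;
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    return shader;
}

const program = gl.createProgram() as WebGLProgram;
gl.attachShader(program, createShader(gl.VERTEX_SHADER, vsSource));
gl.attachShader(program, createShader(gl.FRAGMENT_SHADER, fsSource));
gl.linkProgram(program);

// One buffer for positions, one for colors, bound together by a VAO.
const positions = new Float32Array([ -0.5, -0.5, 0.5, -0.5, 0.0, 0.5 ]);
const colors = new Float32Array([ 1, 0, 0, 0, 1, 0, 0, 0, 1 ]);

const vao = gl.createVertexArray();
gl.bindVertexArray(vao);

gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, 0, 0);

gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, colors, gl.STATIC_DRAW);
gl.enableVertexAttribArray(1);
gl.vertexAttribPointer(1, 3, gl.FLOAT, false, 0, 0);

gl.useProgram(program);
gl.bindVertexArray(vao);
gl.drawArrays(gl.TRIANGLES, 0, 3);
```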
The majority of the lines of source code are the two shaders. The rest of it is pretty straightforward: a few verts, some colors and a VAO to bind it all together. It won't set the world on fire, but from small acorns grow big trees, and all that.
The next stage was to get a textured quad on-screen. This was very straightforward and led me to do a range of purely synthetic performance tests. I was curious to see what kind of frame rate differences were to be had compared to WebGL1, so I set about trying all sorts of tests, from batches of small indexed interleaved buffers (as Phaser 3 does it) to single giant data arrays. I tried all kinds of things, even enabling the beta multi-draw WebGL2 extension in Canary. The stats were quite insane, and it was pleasing to see I was often hitting the limits of the GPU before those of the CPU. For example, I was pushing 2,097,152 quads per draw call and batching 12 draws together, for over 25 million quads at a solid 60 fps.
(^ that's 1 million bits of fruit, more than enough for your 5-a-day)
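For anyone unfamiliar with the first of those layouts: "indexed interleaved" means the per-vertex data is woven into one buffer, with an index buffer turning four vertices into the two triangles of each quad. A rough sketch of the idea, with an illustrative layout rather than Phaser 3's actual format:

```typescript
// Illustrative interleaved quad layout: x, y, u, v per vertex (4 floats),
// four vertices per quad, six indices per quad. Program and shader setup
// mirror the triangle sketch above.
const gl = (document.querySelector('canvas') as HTMLCanvasElement)
    .getContext('webgl2') as WebGL2RenderingContext;

const QUADS = 10000;
const FLOATS_PER_VERTEX = 4;
const vertexData = new Float32Array(QUADS * 4 * FLOATS_PER_VERTEX);
const indexData = new Uint16Array(QUADS * 6);

for (let i = 0; i < QUADS; i++)
{
    const v = i * 4; // first vertex of this quad
    indexData.set([ v, v + 1, v + 2, v + 2, v + 3, v ], i * 6);
}

const vao = gl.createVertexArray();
gl.bindVertexArray(vao);

gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, vertexData, gl.DYNAMIC_DRAW);

const stride = FLOATS_PER_VERTEX * 4; // bytes per vertex
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, stride, 0); // position
gl.enableVertexAttribArray(1);
gl.vertexAttribPointer(1, 2, gl.FLOAT, false, stride, 8); // uv

gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indexData, gl.STATIC_DRAW);

// Each frame: rewrite vertexData with the current sprite transforms, then:
gl.bufferSubData(gl.ARRAY_BUFFER, 0, vertexData);
gl.drawElements(gl.TRIANGLES, QUADS * 6, gl.UNSIGNED_SHORT, 0);
```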
However, these tests used a tiny 2x2 texture. Increase the texture size to 32x32 and the figures drop accordingly (down to just 1,048,576 quads) - which means I was clearly hitting the fillrate limit of my GPU, as the CPU load was next to nothing. I've got a GeForce GTX 1660 in my main work PC, which has a theoretical fillrate of 88 billion texture fills per second. Given my monitor's refresh rate, that allows me 1.17 billion pixels per frame (88 billion ÷ a 75 Hz refresh). This led me to create a few pure fillrate tests: no textures, just pixels being pushed from the shader. After some testing, I found I could comfortably draw 662 million pixels at 60fps, or just over 1 billion at 40fps.
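If you want to try a fillrate test yourself, it needs surprisingly little code. Here's a minimal sketch of the idea: a full-screen triangle generated in the vertex shader, drawn repeatedly each frame, with the overdraw count as the knob to turn:

```typescript
// Pure fillrate probe: one clip-space triangle covering the whole screen,
// drawn OVERDRAW times per frame, no textures involved.
const canvas = document.querySelector('canvas') as HTMLCanvasElement;
const gl = canvas.getContext('webgl2') as WebGL2RenderingContext;

const vs = `#version 300 es
void main () {
    // Full-screen triangle built from gl_VertexID, so no vertex buffer needed.
    vec2 p = vec2((gl_VertexID << 1) & 2, gl_VertexID & 2);
    gl_Position = vec4(p * 2.0 - 1.0, 0.0, 1.0);
}`;

const fs = `#version 300 es
precision highp float;
out vec4 color;
void main () {
    color = vec4(0.0, 0.5, 1.0, 1.0);
}`;

function compile (type: number, source: string): WebGLShader
{
    const shader = gl.createShader(type) as WebGLShader;
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    return shader;
}

const program = gl.createProgram() as WebGLProgram;
gl.attachShader(program, compile(gl.VERTEX_SHADER, vs));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fs));
gl.linkProgram(program);
gl.useProgram(program);
gl.bindVertexArray(gl.createVertexArray()); // empty VAO, no attributes read

const OVERDRAW = 600; // raise this until the frame rate drops

function frame (): void
{
    for (let i = 0; i < OVERDRAW; i++)
    {
        gl.drawArrays(gl.TRIANGLES, 0, 3);
    }

    requestAnimationFrame(frame);
}

requestAnimationFrame(frame);
```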
I'm sure the fillrates you see on GPU packaging these days are just a bit of marketing fluff, most likely just the ROPs multiplied by the GPU clock speed. Even so, it was good to see what WebGL2 is capable of in the hands of Chrome Canary (and indeed my ported renderer code). After all, being able to fill the screen 600+ times over with pixels isn't too shabby. It also highlights an interesting issue with "2D HTML5 performance tests". For example, Bunny Mark is an infamous 2D test. If you've not seen it, it just checks to see how many bunny sprites can be bounced around the screen. The thing is, if you were to replace those bunny sprites with a texture half the size (or, indeed, double the size) it would have a huge impact on the results. This is also why retina displays are such a drain on the GPU. However, just like my fillrate tests, it does provide a baseline from which to work.
These tests led me to think that, when it comes to a typical 2D game, probably the single biggest killer of frame rate is overdraw. Phaser, as with most 2D frameworks, draws from back-to-front. So it renders your game background first, followed by foreground objects, sprites, UI and so on, layering it up as it iterates through the display list. In cases where lots of these elements overlap each other, you've got overdraw going on - pixels 'at the back' being drawn over again and again. And when the biggest GPU bottleneck is fillrate, this is something you really want to avoid. As with most things in 2D, this issue has been solved in various ways in the past. One such method is using a depth buffer, with sprites given incremental z-depths and drawn front-to-back. Another is using an alpha and a color pass: drawing front-to-back and then back-to-front, discarding pixels as it goes based on the pass results. It's an area I'll be looking much more into as I build the scene graph and sprite renderer next.
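To make the depth buffer approach concrete, here's a hedged sketch of how a front-to-back opaque pass might look. None of this is Phaser API; the Sprite type and the z-assignment scheme are placeholders:

```typescript
// Front-to-back opaque pass: the depth buffer kills overdraw, because
// pixels already covered by a nearer sprite fail the depth test and are
// never shaded.
const gl = (document.querySelector('canvas') as HTMLCanvasElement)
    .getContext('webgl2') as WebGL2RenderingContext;

gl.enable(gl.DEPTH_TEST);
gl.depthFunc(gl.LESS);
gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

type Sprite = { draw: (z: number) => void }; // hypothetical sprite type

function renderOpaque (displayList: Sprite[]): void
{
    const count = displayList.length;

    // Walk the display list in reverse, so the top-most sprite draws
    // first at the nearest depth. Smaller z = nearer under gl.LESS.
    for (let i = count - 1; i >= 0; i--)
    {
        const z = 1 - (i + 1) / (count + 1);

        displayList[i].draw(z); // the shader writes z into gl_Position.z
    }
}
```

The catch, as ever in 2D, is alpha: anti-aliased sprite edges aren't fully opaque, which is exactly what the second, back-to-front pass in the other approach deals with.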
As the renderer stands now it's just a bunch of bricks. How those bricks are assembled is entirely up to you, and me. The code I showed above isn't how I expect you to have to use Phaser, of course. However, that doesn't mean you can't. It also means Phaser isn't locked to a 2D renderer:
With a small alteration to the shader, to pass in a few transform matrices, we have a textured spinning cube. Push that demo a little further with a scene uniform buffer and we have 64,000 uniquely spinning textured cubes in a single draw call, at a rock-solid 60fps thanks to WebGL2 instancing:
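For reference, the instancing side of that demo boils down to very little: per-instance data with an attribute divisor, shared camera matrices in a uniform buffer, and one instanced draw. A sketch under those assumptions (attribute locations and buffer layout are illustrative; program and cube vertex buffer are set up as in the triangle sketch):

```typescript
// One draw call for 64,000 cubes (a 40 x 40 x 40 grid) via instancing.
const gl = (document.querySelector('canvas') as HTMLCanvasElement)
    .getContext('webgl2') as WebGL2RenderingContext;

const GRID = 40;
const CUBES = GRID * GRID * GRID; // 64,000

// One vec3 offset per cube, laid out as a grid around the origin.
const offsets = new Float32Array(CUBES * 3);
let i = 0;

for (let x = 0; x < GRID; x++)
{
    for (let y = 0; y < GRID; y++)
    {
        for (let z = 0; z < GRID; z++)
        {
            offsets.set([ x - GRID / 2, y - GRID / 2, z - GRID / 2 ], i);
            i += 3;
        }
    }
}

gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, offsets, gl.STATIC_DRAW);

// Attribute location 2 = per-instance offset. The divisor of 1 makes it
// advance once per instance instead of once per vertex.
gl.enableVertexAttribArray(2);
gl.vertexAttribPointer(2, 3, gl.FLOAT, false, 0, 0);
gl.vertexAttribDivisor(2, 1);

// Shared camera matrices live in a uniform buffer, bound once per scene.
const sceneUbo = gl.createBuffer();
gl.bindBuffer(gl.UNIFORM_BUFFER, sceneUbo);
gl.bufferData(gl.UNIFORM_BUFFER, 2 * 16 * 4, gl.DYNAMIC_DRAW); // view + projection
gl.bindBufferBase(gl.UNIFORM_BUFFER, 0, sceneUbo);

// 36 vertices per cube, 64,000 instances, one call.
gl.drawArraysInstanced(gl.TRIANGLES, 0, 36, CUBES);
```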
Please don't let me give you the wrong impression: I have no plans to build support for 3D directly into Phaser 4. Yet I absolutely, 100%, will let you have access to all of the core renderer features, meaning there will be nothing stopping you from creating the above for yourself, without having to make the renderer bend over backwards, or do some kind of three.js integration kung-fu, to make it happen. What's more, the new renderer is nicely broken up into isolated modules, making use of TypeScript interfaces to achieve strictness in objects. As it stands today, with all features enabled, it's under 7KB min+gz.
As with all Phaser 4 development, it's all being done in the public repos under the new @phaserjs npm namespace, as are all of the tests shown above. I also wrote quite a long guide on how new modules and tests are created, more for my own use than anything, but I figured it would make handy reading for anyone wanting to learn more about the project structure. I also registered the domain phaser4.io yesterday, so I'll try to get all the demos uploaded there soon for you to run yourselves.
I'm happy that the renderer is now doing all that I need of it, so the next task is to get down and dirty with the scene graph and the matrix transforms. Those, combined with what I've done so far, will give us a fully-featured sprite renderer. And that's a very powerful position to be in. My plan is to carry on with this for the rest of this week, and then next week I'll divert back to Phaser 3 and publish the 3.21 release, based on all the fantastic PRs that have built up on GitHub recently. Either way, you can keep track of what's happening here, or in the Phaser Discord channel, where you'll find me every weekday between 10am and 5pm GMT.