- Compatible
- Minimal sacrifices to performance
- Massive API
Feeling dizzy after reading all this? Well, you can imagine the fun I had figuring out these oddities with
OpenGL, and having to write code to emulate those fixes.
Keep in mind, in my project I'm only emulating a small subset of OpenGL. I can't even begin to imagine how
much emulation goes on in the background in the drivers to translate everything in OpenGL perfectly to
the GPU. All the shaders, states, UBOs, textures, data, it's a lot when you really think about it!
My 4th year uni project is about adding a new rendering backend to the Processing framework, one that
uses Vulkan instead of OpenGL (which it currently uses).
After writing a whole translation layer for my 4th year project... yeah. I can definitely say that there's
some overhead needed to translate OpenGL calls into Vulkan calls. Here's just a few examples of weird
OpenGL quirks I had to emulate in Vulkan:
And, when I finally got to benchmark it for the first time, I saw something I was not expecting at all: the GPU
usage was down. Like, way down. 50% less GPU usage.
I was not expecting that... I thought Vulkan mainly lowered CPU usage as the GPU just carried out the same
tasks as OpenGL... but apparently not! Needless to say, this discovery will make my benchmarks look even
better than I initially thought.
But why all this GPU usage in OpenGL? Well, remember when I said that I suspect OpenGL drivers add extra
microcode to the GPU for compatibility? I feel like that's what's going on. The indices short overflow is one
example I can think of off the top of my head. But hey, this is just a theory.
Vulkan
OpenGL
So, after writing all these thousands of lines of code to get Processing running in Vulkan, was it worth it?
So far, yes.
It really depends on the sketch, but in benchmarking sketches, Vulkan was about 1.5x faster than OpenGL.
In some cases, this might not sound like much, but consider this:
Your game runs at 40fps in OpenGL. But because 40 doesn't divide evenly into 60, it ends up skipping a frame
every few frames, the framerate is not smooth and looks quite jittery, so naturally you have to gear down to
30fps.
If you use Vulkan, you'd gain just enough performance to target a constant 60fps.
So even though it only boosted performance by 20fps, it might as well be doubling the framerate because we
were stuck on 30fps. Neato!
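Here's a little back-of-the-envelope model of that jitter. It's a simplification (class and method names are mine, and it assumes the renderer never blocks on presentation): each finished frame is only shown at the next 60Hz refresh, so 25ms frames alternate between being on screen for one refresh and two.

```java
public class FramePacing {
    // When a frame finishes rendering at time t (ms), vsync shows it at the
    // next display refresh boundary: ceil(t / refreshMs) * refreshMs.
    static double[] presentTimes(double frameMs, double refreshMs, int frames) {
        double[] shown = new double[frames];
        double finished = 0;
        for (int i = 0; i < frames; i++) {
            finished += frameMs; // frame i finishes rendering here
            shown[i] = Math.ceil(finished / refreshMs) * refreshMs;
        }
        return shown;
    }

    public static void main(String[] args) {
        double refreshMs = 1000.0 / 60.0;                // 60Hz display, ~16.7ms per tick
        double[] p = presentTimes(25.0, refreshMs, 4);   // 25ms frames = 40fps
        for (int i = 1; i < p.length; i++) {
            // Alternates ~16.7ms / ~33.3ms between presents: the visible jitter.
            System.out.printf("frame shown after %.1fms%n", p[i] - p[i - 1]);
        }
    }
}
```

With 16.7ms frames (a true 60fps) every delta comes out identical, which is why targeting a constant 60fps looks so much smoother.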
Not only this, but in some cases, such as the lines test, it was running at 20fps in OpenGL and 50fps in Vulkan...
an absolutely insane performance boost.
Because the codebase spans thousands of lines of code, I instead wrote a translation layer that translates
OpenGL calls to Vulkan calls via Processing's PGL abstraction layer. Easy conceptually, but it turns out
OpenGL has a lot of quirks that make it a difficult job.
See, OpenGL was designed so that no matter what GPU you ran it on, your application would look the same,
even if it needs to sacrifice a bit of performance to do this.
Vulkan was also designed so that your application looks the same across different GPUs. However, it's also
designed to resemble the majority of GPU hardware as closely as possible, which eliminates having to
sacrifice performance for compatibility... mostly. But of course, the API is a lot bigger, and you, the
developer, need to write a lot of code to make up for it.
02/01/2025
13:05:54
Learning Vulkan in 2024
-> Write X to buffer A
-> Draw buffer A
-> Write Y to buffer A
-> Draw buffer A
-> Submit queue
-> Draw buffer A (Y was last written so this gets rendered instead of X, uhoh)
-> Draw buffer A (Y is rendered)
To fix this without losing performance, we automatically utilise several buffers for each draw call.
If a buffer that a recorded draw call already references gets written to again, we just create a new
buffer and tell the next draw call to use that buffer instead, all while disguising it as the same
buffer on the OpenGL end.
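A minimal sketch of that trick (all names here are mine, not the real project's): one OpenGL-visible buffer is backed by several underlying buffers, so a draw recorded earlier in the frame keeps the data it saw even if the "same" buffer is overwritten before the queue is submitted.

```java
import java.util.ArrayList;
import java.util.List;

// One GL-visible buffer, several backing buffers behind the scenes.
public class AliasedBuffer {
    private final List<byte[]> backing = new ArrayList<>();
    private boolean pinnedByDraw = false; // a recorded draw references the latest backing buffer

    // glBufferData equivalent: if a recorded draw already references the
    // current backing buffer, allocate a fresh one instead of overwriting it.
    void write(byte[] data) {
        if (backing.isEmpty() || pinnedByDraw) {
            backing.add(data.clone());
            pinnedByDraw = false;
        } else {
            backing.set(backing.size() - 1, data.clone());
        }
    }

    // Draw-call equivalent: record which backing buffer this draw reads.
    int recordDraw() {
        pinnedByDraw = true;
        return backing.size() - 1;
    }

    byte[] dataSeenBy(int drawHandle) {
        return backing.get(drawHandle);
    }

    public static void main(String[] args) {
        AliasedBuffer a = new AliasedBuffer();
        a.write(new byte[]{'X'});
        int draw1 = a.recordDraw();   // should render X
        a.write(new byte[]{'Y'});     // silently lands in a new backing buffer
        int draw2 = a.recordDraw();   // should render Y
        System.out.println((char) a.dataSeenBy(draw1)[0]); // X
        System.out.println((char) a.dataSeenBy(draw2)[0]); // Y
    }
}
```

The real version juggles actual Vulkan buffer handles and recycles them between frames, but the bookkeeping idea is the same.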
- Vulkan expects you to predefine bindings in the form of IDs for attaching attributes to shaders.
Instead of accessing the attribute named "inVertices", you access the attribute at binding index 0.
In reality, it's more complicated than that, but that means I had to write an abstraction system
where we keep track of what attrib name belonged to what binding ID.
- In OpenGL, up is positive Y and down is negative Y. In Vulkan, it's the other way round. To fix this,
  I simply added a piece of code to the vertex shader (as it's being translated from OpenGL GLSL to
  Vulkan GLSL) which inverts gl_Position.y. It would be better to intercept the transformation matrix,
  but it's barely a performance penalty and it's more compatible with base OpenGL.
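The attrib-name-to-binding bookkeeping boils down to a map like this (a sketch with my own names, mimicking how glGetAttribLocation hands out a stable index per name):

```java
import java.util.HashMap;
import java.util.Map;

// Map GLSL attribute names to fixed Vulkan binding indices: OpenGL code asks
// for "inVertices" by name, Vulkan wants a numeric binding like 0.
public class AttribBindings {
    private final Map<String, Integer> bindings = new HashMap<>();

    // First lookup assigns the next free index; later lookups reuse it, the
    // way glGetAttribLocation keeps returning the same location for a name.
    int bindingFor(String attribName) {
        return bindings.computeIfAbsent(attribName, k -> bindings.size());
    }

    public static void main(String[] args) {
        AttribBindings b = new AttribBindings();
        System.out.println(b.bindingFor("inVertices")); // 0
        System.out.println(b.bindingFor("inColor"));    // 1
        System.out.println(b.bindingFor("inVertices")); // still 0
    }
}
```

The real thing also has to bake these indices into the pipeline layout and the patched shader source, which is where the "more complicated than that" comes in.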
- Compatible
- Sacrifice performance
- Basic API
-> Write X to buffer A
-> Draw buffer A (X is rendered)
-> (wait until a write can be allowed)
-> Write Y to buffer A
-> Draw buffer A (Y is rendered)
I can imagine a lot of emulation also takes place inside the GPU; perhaps the drivers are adding extra bits
of microcode to your OpenGL shaders to emulate the legacy hardware that OpenGL was originally based on,
before the Vulkan days. I'll explain why in just a sec.
Why?
- OpenGL is an old, slow API that originates from the 90s and has been superseded by newer APIs like DirectX 12,
Metal, and the star of the show, Vulkan.
- Big and complicated Processing sketches can get pretty slow due to OpenGL (theoretically)
- I'm curious: what happens if it were to use Vulkan instead of OpenGL? Could we see a performance
improvement?
- And the most important reason of all: because it's fun. And I want to learn Vulkan.
Speaking of theories, I have another theory as to why OpenGL and Vulkan commands take a while to process.
Well, aside from OpenGL's compatibility emulation, of course.
But first, I want to mention I added multithreading to the Vulkan renderer in Processing, where it
automatically assigns a thread for each GL command called. I won't get into too much detail (trust me, I've
learned loads about multithreading), but one bottleneck I kept coming across was that calling interrupt()
to wake a thread from sleep took a long time: 20 microseconds, which added up super quickly.
Turns out, context switching in an operating system is a costly process. Why? I have yet to find out.
But my solution to this problem is simple but ugly: busy-wait for a certain amount of time before sleeping,
to avoid the wasted time calling interrupt().
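That spin-then-sleep idea looks roughly like this (a sketch with my own names, not the actual renderer code; the spin window is a tuning knob):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Busy-wait briefly for new work before sleeping, so the common fast path
// never pays for being woken up (a context switch costing ~20us in my tests).
public class SpinThenPark {
    static final long SPIN_NANOS = 50_000; // how long to spin before giving up

    // Returns true if work arrived; the fast path never leaves user mode.
    static boolean awaitWork(AtomicBoolean workAvailable) {
        long deadline = System.nanoTime() + SPIN_NANOS;
        while (System.nanoTime() < deadline) {
            if (workAvailable.get()) return true; // fast path: no sleep, no wake-up
            Thread.onSpinWait();                  // hint to the CPU that we're spinning
        }
        LockSupport.parkNanos(1_000_000);         // slow path: actually sleep (~1ms)
        return workAvailable.get();
    }

    public static void main(String[] args) {
        // Work is already queued, so the spin path catches it immediately.
        System.out.println(awaitWork(new AtomicBoolean(true)));
    }
}
```

The ugly part is that spinning burns CPU for nothing when no work arrives, which is exactly the trade the renderer makes to dodge the wake-up cost.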
What's my point to all this?
Well, my theory as to why OpenGL/Vulkan commands take so long to process is context switching.
Whenever you call a command like vkCmdDrawIndexed, the process has to go into kernel mode to run the
driver code that handles vkCmdDrawIndexed: a context switch. When that is done, we need to context switch
back to user mode to continue our Java program.
This is partially backed by a blog I read somewhere (I can't remember where) which looked at
Vulkan commands at the machine-code level. Apparently a lot of security checks happen when context switching,
which is one of the reasons why it takes so long.
- Indices that overflow (using shorts) are accounted for in OpenGL: an offset
  to the vertex index (of 32768) is added when that overflow is detected. In
  Vulkan, you must know when your indices overflow and specify this offset
  yourself. The table on the left shows that overflow point.
- OpenGL submits commands to the GPU as they're called, whereas Vulkan sends
  commands in batches on each frame, which is much faster as it avoids
  synchronisation overhead. But this means that buffers which are overwritten
  in the same frame will have the incorrect values, because:
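A minimal sketch of the index-overflow bookkeeping from the first point (class and method names are mine, and I'm assuming blocks of 32768 vertices as described above): with 16-bit indices, vertices past the overflow point can't be addressed directly, so the draw gets a vertex offset that the GPU adds to every index, which OpenGL drivers would otherwise handle for you.

```java
public class IndexOverflow {
    static final int BLOCK = 32768; // overflow point for (signed) short indices

    // The vertex offset to pass along with the draw (vkCmdDrawIndexed takes
    // a vertexOffset parameter for exactly this purpose).
    static int vertexOffsetFor(int globalIndex) {
        return (globalIndex / BLOCK) * BLOCK;
    }

    // The 16-bit index actually stored in the index buffer.
    static int localIndexFor(int globalIndex) {
        return globalIndex % BLOCK;
    }

    public static void main(String[] args) {
        // Vertex 40000 is past the overflow point...
        System.out.println(vertexOffsetFor(40000)); // 32768
        System.out.println(localIndexFor(40000));   // 7232
        // ...while vertex 100 needs no offset at all.
        System.out.println(vertexOffsetFor(100));   // 0
    }
}
```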
My Vulkan project
Wow... it's been a long, lonnng time since I've written a blog entry here. And man, has a lot changed.
It is now 2025 and I think I'm going to start off this year by talking about Vulkan.
I'll maybe talk about what's been happening in 2024 later on. Maybe.
Also, since my last entry, Timeway can now scroll vertically infinitely, which means that I'm not limited to
a screen's height for each entry. So scroll on down while I ramble on about what I've learned about
Vulkan in 2024.
But, I think that wraps it up for what I've learned about Vulkan. Of course, there's way more that I haven't
talked about: texturing, buffers, loads about multithreading, descriptor sets, pipelines, I could go on.
What I've talked about here are some of the niche but interesting details I've noticed while spending
hours upon hours on this project. Maybe I'll talk more about Vulkan later on, who knows?