Timeway - Optimisation update

The value we're interested in here is the PRObject render time, represented by the
green bar (as you can see, it's a pretty massive bar). However, I will later show optimisations
for the terrain render code (brown bar), as the culling also applies to the terrain.

Eliminates need for sorting while rendering without glitches,
very fast on the CPU side, relatively easy to code, no memory garbage (mostly).

The PRObject rendering code takes about 8600 microseconds to execute: that is 8.6 milliseconds of
the 16 millisecond time budget we have to render the frame. That is pretty poor, especially
considering we're running this on pretty powerful hardware (a Surface Laptop Studio 2).

Each tree in Timeway uses its own separate texture, which is very inefficient: it requires a state change for
each tree object, and there could be hundreds, or even thousands, of trees in a scene. This optimisation combines all the
tree textures into one texture, and selects each tree's image using UV coordinates. No need for texture switching.
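
As a rough illustration of the UV lookup, here is a minimal sketch that assumes the combined texture is a simple grid of equally sized tiles. The post doesn't say how the textures are actually packed, so the class name `AtlasUv`, the method `tileUv` and the grid layout are all hypothetical.

```java
/** Hypothetical helper (not Timeway's code): maps a tile index in a grid-packed atlas to UV bounds. */
final class AtlasUv {
    /** Returns {u0, v0, u1, v1} in normalised [0, 1] texture coordinates. */
    static float[] tileUv(int index, int columns, int rows) {
        if (columns <= 0 || rows <= 0 || index < 0 || index >= columns * rows) {
            throw new IllegalArgumentException("tile index outside atlas grid");
        }
        int col = index % columns;   // tiles are numbered left to right...
        int row = index / columns;   // ...then top to bottom (an assumed layout)
        float u0 = (float) col / columns;
        float v0 = (float) row / rows;
        float u1 = (float) (col + 1) / columns;
        float v1 = (float) (row + 1) / rows;
        return new float[] { u0, v0, u1, v1 };
    }
}
```

With an atlas like this, every tree can bind the same texture and only the four UV values differ per tree, which is what removes the per-object state change.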

One rendering code overhaul and a whole load of bug fixing later, we get these results:

Potentially solves the PShape overhead problem if I manage to pull it off,
potentially fast-ish, no visual glitches, no memory garbage

Optimisation 2: Remove immediate mode rendering

31/05/2025
14:26:41

For Timeway 0.1.6, I decided to apply some optimisations that I've wanted to do for a long,
long time. They are all related to the Pixel Realm, and consist of the following:

A bit hard to code.

Every object in a scene is rendered using Processing's immediate mode; this is very inefficient as geometry is generated on
the CPU and buffered to the GPU every frame. Additionally, this causes loads of garbage in the JVM, which means the
garbage collector has to run very frequently, using even more CPU time. We can optimise by generating geometry only once
and then calling draw commands every frame.
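
The "generate once, draw every frame" idea can be sketched in plain Java as below. This is only an illustration of the caching pattern, not Timeway's rendering code: the class `CachedGeometry` and its methods are hypothetical, and a real renderer would upload the array to the GPU once rather than just holding it in memory.

```java
/** Hypothetical cache (not Timeway's code): builds vertex data on first use, then reuses it. */
final class CachedGeometry {
    private final java.util.function.Supplier<float[]> builder;
    private float[] vertices;      // null until the geometry is first built
    private int buildCount = 0;    // how many times the builder has run

    CachedGeometry(java.util.function.Supplier<float[]> builder) {
        this.builder = builder;
    }

    /** Called every frame; only the first call allocates and fills the array. */
    float[] vertices() {
        if (vertices == null) {
            vertices = builder.get();
            buildCount++;
        }
        return vertices;
    }

    int buildCount() {
        return buildCount;
    }
}
```

Because later frames return the same array, no new objects are allocated per frame, so the garbage collector has nothing extra to clean up from this path.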

Cons:

Modify Processing's PShape code to be faster

But there was one more option which I ultimately decided to choose:

Cons:

Results

I will collect 4 separate runs for each optimisation.

Overall

Slow, causes massive memory garbage, choosing this option feels like admitting defeat.

Optimisation 3: Culling

Just stick to immediate rendering

Use OpenGL to bypass the PShape overhead.

Admittedly, the culling algorithm used is very lightweight and basic; there are still a lot of
off-screen objects being rendered. But even with this simple algorithm, we've almost slashed
the rendering time in half!

Also, I applied this culling to the terrain (and the water), so we have some bonus time reductions:

It's not much (about 0.4 milliseconds faster), but there's something even more important it
improves on: reduced memory garbage.

Using a memory usage bar in Timeway, we can see how memory is used:

This involves calling glDraw() on each object, which is slower than Processing's batched
"retained" mode, but faster than generating all the geometry on the fly (as well as calling
glDraw() for each object). More crucially, we bypass whatever is making PShape so
slow and go straight to the glDraw() call.

Pros:

Inflexible, cannot sort from furthest to nearest so visibility glitches occur
(see my blog post titled "My solution to the visibility problem"), and we lose accurate
transparency.

Pros:

Benchmarking

Timeway has a built-in benchmarking feature which measures how long certain sections of the
program take on the CPU, in microseconds. It records over a number of frames (180 in this case)
and averages them:
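
The averaging itself can be sketched as follows. This is a minimal illustration under my own assumptions, not Timeway's actual benchmarking code: the class `FrameAverager` and its methods are hypothetical, and the caller is assumed to measure each frame's duration itself (for example with `System.nanoTime()` divided by 1000).

```java
/** Hypothetical timer (not Timeway's code): averages per-frame durations over a fixed window. */
final class FrameAverager {
    private final int windowFrames;
    private long totalMicros = 0;
    private int frames = 0;
    private long lastAverageMicros = -1;   // -1 until the first window completes

    FrameAverager(int windowFrames) {
        if (windowFrames <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.windowFrames = windowFrames;
    }

    /** Records one frame's duration; returns true when a window has just completed. */
    boolean record(long micros) {
        totalMicros += micros;
        frames++;
        if (frames < windowFrames) {
            return false;
        }
        lastAverageMicros = totalMicros / windowFrames;
        totalMicros = 0;   // start a fresh window
        frames = 0;
        return true;
    }

    long lastAverageMicros() {
        return lastAverageMicros;
    }
}
```

Averaging over a window like this smooths out one-off spikes (such as a garbage collection pause landing in a single frame), so the reported number reflects the typical cost of a section.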

Slow on the GPU side, we lose accurate transparency, generally bad practice
to use the "discard" function.

Pros:

2. Removing immediate rendering


1. Combined texture on trees

For getting speedup information, I'll be using a test scene:

No coding required; why bother when it already renders accurately?

You can see from the red bar that rendering the terrain is almost 3 times faster now.

I could improve this culling algorithm by adding a frustum culling stage to further cut out
unseen objects and terrain, but I'll leave it here for now. I can do this in a future update.

Cons:

If this image were animated, you would see the bar fill up before dropping back down when
it reaches ~90% memory usage. It fills up continuously because of
memory garbage in the JVM; the speed at which it fills depends on how much
garbage we're creating. Immediate mode makes that bar fill up VERY quickly,
especially in our test scene. If I plotted a graph, it would look a bit like this:

Pros:

Fast-ish, renders visually correct with accurate transparency, definitely
no memory garbage.

Rendering a scene is now ~66% faster. Additionally, I tested it on my old laptop,
the Surface Book 2; the older version ran at 10 frames per second on the test scene (yikes!).
The new version runs at 30-40 frames per second, a massive improvement.

Overall, I'm really happy with Timeway 0.1.6, and I'll release it once I've tested it and patched
any other bugs that I find.

First, the base times pre-optimisation:

After:

Pros:

Optimisation 1: Combined tree textures

Very difficult to do, and I don't know if it's even possible.

Overall, this means a measurement which is not recorded by the benchmark - CPU usage - is
reduced quite a bit. On my machine, CPU usage went down from ~10% to ~8%, going by
Task Manager alone. Of course, lower CPU usage means better performance, but also less power usage,
which is a goal I'm also trying to achieve with Timeway.

This optimisation was hard to implement because Processing's
retained-mode feature, "PShape", turned out to be really slow for rendering each object; in fact,
it was slower than rendering in immediate mode!

There were some options I could take, but they had their limitations:

However, with the new optimisation, there's a massive improvement; the memory garbage rate
is MUCH lower. It looks a bit like this now:

In this optimisation, we avoid texture switching by combining all textures into one single
texture object, and simply use UV coordinates to tell the GPU which parts of the image to
render. Already, that is a massive improvement: we're using ~3.2 milliseconds less time!

Before:

Cons:

Use the "discard" method on the GPU and use Processing's retained mode.

Easy to code, super-fast rendering, no memory garbage (mostly).

Processing's retained mode rendering

Up until now, every object within the viewing radius was rendered whether it was visible on-screen or not. This optimisation
employs a very basic culling algorithm which has a very fast check, and cuts many objects which aren't visible on screen.
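
The post doesn't describe the actual check, so the following is only one example of what a cheap culling test can look like: it rejects objects that lie entirely behind the camera on the ground plane. The class `SimpleCull`, the method `isBehindCamera`, and the convention that a yaw of 0 faces +Z are all assumptions for illustration.

```java
/** Hypothetical culling check (not Timeway's algorithm): rejects objects behind the camera. */
final class SimpleCull {
    /**
     * Returns true when an object of the given radius lies entirely behind the camera,
     * using only the ground plane (X and Z). Assumes yaw 0 faces +Z (an arbitrary convention).
     */
    static boolean isBehindCamera(float camX, float camZ, float yaw,
                                  float objX, float objZ, float radius) {
        float forwardX = (float) Math.sin(yaw);
        float forwardZ = (float) Math.cos(yaw);
        float dx = objX - camX;
        float dz = objZ - camZ;
        // Signed distance of the object's centre along the camera's forward direction.
        float along = dx * forwardX + dz * forwardZ;
        return along < -radius;
    }
}
```

A test like this costs a handful of multiplications per object, which is why even a crude check can pay for itself. It still lets through objects that are in front of the camera but off to the side, which matches the observation that a simple algorithm leaves some off-screen objects being rendered.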

Cons:

3. Culling