Currently pipeline compilation accesses the render pass cache, which
presents thread safety challenges. The framebuffer and render pass
caches are also slow and gross.
This adds a `gpu_pass` object which is basically just a VkRenderPass
object. The graphics module creates and caches these in
`lovrPassSetCanvas`. They are used when compiling pipelines and
beginning render passes.
Framebuffers are no longer cached; instead, they're created on the fly and immediately condemned when beginning a render pass. This is fine, because framebuffers are super cheap to create.
There's still technically a thread safety issue with the `gpu_pass`
object caching, but it's much easier to solve that with a lock in
`lovrPassSetCanvas` compared to trying to make core/gpu's render pass
cache thread safe.
This is all still a temporary measure until we can use a more
"ergonomic" render pass API like dynamic rendering.
Oh also, we stopped using automatic layout transitions. They seem to be pessimistic in drivers, and they require tying render pass objects to texture usage, which is annoying. So now we do attachment layout transitions manually before beginning and after ending a render pass, which isn't so bad.
Fixes a validation layer error, and may result in a performance improvement. I think this technically means you can't use discard/FragDepth to adjust depth buffer values, but that's kinda niche.
+Z is the front face in a cubemap, not -Z. Currently cubemap faces are
flipped in both the X and Z directions.
Some kind of flip is required because cubemaps use a left-handed coordinate space, while lovr and Vulkan use a right-handed one.
Equirect does not need any changes.
The goal is to support more combinations of shader stages, both for
vertex-only shaders and mesh/raytracing shaders in the future. In
general most of the logic that conflated "stage count" with "shader
type" has been changed to look at individual shader stages.
Now all regular Buffer objects are suballocated from big 4MB buffers,
like the temporary buffers.
Each of the big buffers is refcounted.
Wherever a `gpu_buffer` is used, the offset of the "view" into the big
buffer needs to be taken into account as well.
A practical advantage of this is that vertex/index buffers usually do
not need to be rebound between draws when drawing a different model/mesh.
Pretty much the same, but fixes an obscure lifetime issue with pass
buffer allocation.
If a pass's buffer fills up, it can't be recycled immediately because
that might cause the buffer to get reused too soon if the pass is
submitted multiple times (with draws added in between that exceed 4MB of
temp buffer allocations).
Instead, the strategy is to have the pass keep track of "full" buffers
in a pass-local freelist.
Whenever the pass is submitted, each full buffer is stamped with the current tick, so we know when it's safe to reuse them.
When a pass is reset (or destroyed), all of its full buffers are
returned to the global freelist of stream buffers. The current one can
remain in use.
Pooling the buffers of all the passes globally like this increases reuse
opportunities but also increases contention.
These changes also serve as preparation to suballocate Buffer objects in
the graphics module, which makes it easier to batch more stuff.
Requiring arrays of bindings is really bad for API usability, so the existing single-descriptor API remains backwards compatible: if you specify a count of zero, the old "by-value" union entry is used, but if you specify a count > 0, an array of bindings is used instead.
- Font:getLines/Pass:text use temp memory for strings/vertices.
- Due to the recent morgue fix, resizing the atlas will now do a GPU
submit to flush the transfers before destroying the atlas.
- This GPU submit also rewinds the temp allocator, invalidating the
temp memory used for the lines/vertices, causing a use-after-free.
There are 2 changes here:
- Only flush transfers if the destroyed resource actually has pending
work (we're already tracking this w/ lastTransferRead/Write).
- Restore the allocator cursor after doing the submit.
Having to restore the allocator offset is kinda tricky/complicated,
which sucks. Other solutions might be:
- Avoid using temp memory in those Font methods. More generally, adopt
a rule where you can't use temp memory if it's possible that there
will be a submit while you're using the temp memory.
- Find some way to stop destroying the old font atlas during a resize?
- Don't rewind the temp allocator on every GPU submit. Instead only
rewind it at the end of a frame, or only when Lua does the submit.
We were conflating the "parent struct of the Sync pointer" with the "object to be refcounted", but the two aren't the same for texture views.
To fix it, add both pointers to the Access struct. This sucks because
it increases the size from 16 bytes to 24 bytes.
There might be other solutions like a "texture view mask" or having
buffers/textures store a pointer to their sync or something. But these
have some drawbacks as well. May revisit in future.
The morgue is a fixed-size queue for GPU resources that are waiting to
be destroyed. There's been an annoying issue with it for a while where
destroying too many objects at once will trigger a "Morgue overflow!"
error. Even innocuous projects that create more than 1024 textures will
see this during a normal quit.
One way to solve this problem is to make the queue unbounded instead of
bounded. However, this can hide problems and lead to more catastrophic
failure modes.
A better solution is to add "backpressure", where we avoid putting
things in the queue if it's full, or find some way to deal with them.
In this case it means finding a way to destroy stuff in the morgue when
it's full, to make space for more victims.
We weren't able to add backpressure reliably before, because command
buffers could have commands that reference the condemned resources.
This was mostly a problem for texture transfers -- if you create
thousands of textures in a loop, we'd have a giant command buffer with
commands to transfer pixels to the textures. If these textures were
destroyed before submitting anything, the morgue would fill up, and we
wouldn't have any way to clear space because there was still a pending
command buffer that needed to act on the textures!
A simple change is to flush all pending transfers whenever a buffer or
texture is destroyed. This lets us add backpressure to the morgue
because we can guarantee that there are no pending command buffers that
refer to an object in the morgue.
For backpressure, we try to destroy the oldest object in the morgue if
the GPU is done using it. If that doesn't work, we'll wait on the fence
for its tick and destroy it. This *should* always work, although in an
extreme case you could vkDeviceWaitIdle and clear out the entire morgue.
It should also be noted that in general command buffers need to be
flushed when destroying objects that they refer to. However, for our
particular usage patterns, we only need to flush state.stream when a
buffer or texture is destroyed. Pass objects already refcount their
buffers and textures and their commands are software command buffers, so
they don't require any special handling. Other objects like shaders,
pipelines, descriptor set layouts, etc. all survive until shutdown, so
those don't impact anything either.
There were numerous problems with the previous effort to add support for
linear views of sRGB storage textures. Here's another attempt:
- Images are always created with the linear version of their format.
- The default texture view uses the sRGB format if the parent is sRGB.
- Use VkImageViewUsageCreateInfo to specify the usage for render/storage views.
- sRGB image views always have their storage bit forcibly cleared.
The storage view now behaves more like the existing renderView -- if we
detect that you couldn't use the default texture view for storage, we'll
create one that is guaranteed to be usable for storage bindings (by
clearing the sRGB flag on it).
Previously this would include multiple descriptors with the same
binding, which isn't allowed. Instead, just reuse the inter-stage
tracking/merging for intra-stage resources as well.