It's fairly common for depth-stencil formats to not support linear
filtering, which is annoying because it means you can't create depth
textures with the `sample` usage. Instead, we'll just ignore the
LINEAR format feature bit for now.
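A minimal sketch of the relaxed support check. It assumes format support is reported as a bitmask similar to Vulkan's format feature flags. The `FEATURE_*` bits and both function names are hypothetical, not lovr's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical format feature bits, loosely modeled on VkFormatFeatureFlags.
enum {
  FEATURE_SAMPLE = 1u << 0,         // can be sampled in a shader
  FEATURE_SAMPLE_LINEAR = 1u << 1,  // can be sampled with linear filtering
  FEATURE_DEPTH = 1u << 2           // can be a depth-stencil attachment
};

// Relaxed check: the `sample` usage only needs the SAMPLE bit.
// The LINEAR bit is ignored, because many depth-stencil formats lack it.
static bool supportsSampleUsage(uint32_t features) {
  return (features & FEATURE_SAMPLE) != 0;
}

// For comparison: a stricter check that also demands linear filtering.
static bool supportsSampleUsageStrict(uint32_t features) {
  uint32_t required = FEATURE_SAMPLE | FEATURE_SAMPLE_LINEAR;
  return (features & required) == required;
}
```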
In the future I'd like to fix this by silently demoting an individual
texture's filtering to nearest when linear filtering is not supported
for its format, but this requires per-texture sampler settings, which
aren't done yet.
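The planned demotion might look something like the sketch below. `FilterMode` and `effectiveFilter` are hypothetical names. The result depends on each texture's format, which is why the demotion needs per-texture sampler settings instead of a single shared sampler.

```c
#include <stdbool.h>

// Hypothetical filter modes, not lovr's actual enum.
typedef enum { FILTER_NEAREST, FILTER_LINEAR } FilterMode;

// Returns the filter a texture should actually be sampled with: the requested
// filter, unless it is linear and the texture's format can't filter linearly,
// in which case it is silently demoted to nearest.
static FilterMode effectiveFilter(FilterMode requested, bool formatSupportsLinear) {
  if (requested == FILTER_LINEAR && !formatSupportsLinear) {
    return FILTER_NEAREST;
  }
  return requested;
}
```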
Currently pipeline compilation accesses the render pass cache, which
presents thread safety challenges. The framebuffer and render pass
caches are also slow and gross.
This adds a `gpu_pass` object which is basically just a VkRenderPass
object. The graphics module creates and caches these in
`lovrPassSetCanvas`. They are used when compiling pipelines and
beginning render passes.
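Below is a rough sketch of the caching idea. It assumes a render pass is fully determined by a small key that describes the canvas (attachment formats, sample count, load ops). `PassKey`, `PassCache`, and `getPass` are hypothetical. The real `gpu_pass` wraps an actual VkRenderPass, while this stand-in only assigns it a number.

```c
#include <stdint.h>
#include <string.h>

#define MAX_CACHED_PASSES 16

// Hypothetical canvas description: everything the render pass depends on.
// All fields are uint32_t so memcmp never compares padding bytes.
typedef struct {
  uint32_t colorFormats[4];
  uint32_t colorCount;
  uint32_t depthFormat;
  uint32_t samples;
  uint32_t loadOps;  // bitmask: clear vs. load per attachment
} PassKey;

// Hypothetical stand-in for a gpu_pass.
typedef struct {
  PassKey key;
  uint32_t id;
} Pass;

typedef struct {
  Pass entries[MAX_CACHED_PASSES];
  uint32_t count;
  uint32_t created;  // number of passes actually created
} PassCache;

// Find a pass matching `key`, creating and caching one on a miss.
// Returns NULL if the fixed-size cache is full.
static Pass* getPass(PassCache* cache, const PassKey* key) {
  for (uint32_t i = 0; i < cache->count; i++) {
    if (!memcmp(&cache->entries[i].key, key, sizeof(PassKey))) {
      return &cache->entries[i];
    }
  }
  if (cache->count == MAX_CACHED_PASSES) return NULL;
  Pass* pass = &cache->entries[cache->count++];
  pass->key = *key;
  pass->id = ++cache->created;  // real code would create the VkRenderPass here
  return pass;
}
```

In this model, `lovrPassSetCanvas` would build a key from the canvas and call `getPass`. The returned pass would then be used both for pipeline compilation and for beginning the render pass.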
Framebuffers are no longer cached. Instead, they are created when a
render pass begins and are immediately condemned. This is fine,
because framebuffers are super cheap.
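In this sketch, "condemned" means queued for destruction once the GPU has finished the tick that used the object. The example is a minimal ring-buffer version of that idea. It assumes each submission is identified by an increasing tick. `Morgue`, `condemn`, and `expunge` are illustrative names.

```c
#include <stdint.h>

#define MORGUE_SIZE 64  // power of two, so ring indices survive uint32 wraparound

// Hypothetical deferred-destruction queue. An object condemned during tick N
// may still be in use by the GPU, so it is destroyed only after tick N completes.
typedef struct {
  uint64_t handles[MORGUE_SIZE];
  uint32_t ticks[MORGUE_SIZE];
  uint32_t head, tail;  // ring buffer indices; tail - head = count
  uint32_t destroyed;   // number of objects destroyed so far
} Morgue;

// Queue `handle` for destruction. Returns 0 if the morgue is full.
static int condemn(Morgue* m, uint64_t handle, uint32_t tick) {
  if (m->tail - m->head == MORGUE_SIZE) return 0;
  m->handles[m->tail % MORGUE_SIZE] = handle;
  m->ticks[m->tail % MORGUE_SIZE] = tick;
  m->tail++;
  return 1;
}

// Destroy everything condemned on or before the last tick the GPU finished.
static void expunge(Morgue* m, uint32_t completedTick) {
  while (m->head != m->tail && m->ticks[m->head % MORGUE_SIZE] <= completedTick) {
    // real code would destroy handles[head % MORGUE_SIZE] here (e.g. the framebuffer)
    m->destroyed++;
    m->head++;
  }
}
```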
There's still technically a thread safety issue with the `gpu_pass`
object caching, but it's much easier to solve that with a lock in
`lovrPassSetCanvas` compared to trying to make core/gpu's render pass
cache thread safe.
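This sketch shows where such a lock could go. It uses a C11 `atomic_flag` spinlock so the example stays self-contained. The important part is that the whole find-or-insert sequence runs under the lock. The cache layout and `findOrInsertPass` are hypothetical, and a real implementation might use a proper mutex instead.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PASS_CACHE_SIZE 32

// Hypothetical global pass cache guarded by a spinlock.
static atomic_flag passCacheLock = ATOMIC_FLAG_INIT;
static uint32_t passCacheKeys[PASS_CACHE_SIZE];
static uint32_t passCacheCount;

static void spinLock(atomic_flag* flag) {
  while (atomic_flag_test_and_set_explicit(flag, memory_order_acquire)) {
    // busy-wait until the current holder releases the lock
  }
}

static void spinUnlock(atomic_flag* flag) {
  atomic_flag_clear_explicit(flag, memory_order_release);
}

// Stand-in for the cache lookup in lovrPassSetCanvas. Returns the index of
// `key`, inserting it on a miss, or -1 if the cache is full. Lookup and insert
// share one critical section, so two threads can't both miss and insert twice.
static int32_t findOrInsertPass(uint32_t key) {
  int32_t index = -1;
  spinLock(&passCacheLock);
  for (uint32_t i = 0; i < passCacheCount; i++) {
    if (passCacheKeys[i] == key) { index = (int32_t) i; break; }
  }
  if (index < 0 && passCacheCount < PASS_CACHE_SIZE) {
    passCacheKeys[passCacheCount] = key;
    index = (int32_t) passCacheCount++;
  }
  spinUnlock(&passCacheLock);
  return index;
}
```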
This is all still a temporary measure until we can use a more
"ergonomic" render pass API like dynamic rendering.
Oh, also, we stopped using automatic layout transitions. They seem to
be pessimistic in drivers, and they require tying render pass objects
to texture usage, which is annoying. So now we do attachment layout
transitions manually before beginning and after ending a render pass,
which isn't so bad.
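Below is a simplified model of the manual transitions. It assumes each attachment has one layout it lives in outside of render passes. The layout enum, `Attachment`, and the helper names are hypothetical. `transition` only counts the barriers that real code would record with `vkCmdPipelineBarrier`.

```c
#include <stdint.h>

// Hypothetical image layouts, loosely modeled on VkImageLayout.
typedef enum {
  LAYOUT_GENERAL,
  LAYOUT_SHADER_READ,
  LAYOUT_ATTACHMENT
} Layout;

typedef struct {
  Layout layout;          // layout the texture uses outside of render passes
  uint32_t transitions;   // number of barriers recorded
} Attachment;

// Stand-in for recording a layout-changing barrier. No-op transitions are skipped.
static void transition(Attachment* attachment, Layout from, Layout to) {
  if (from != to) attachment->transitions++;
}

// Before beginning the render pass: move every attachment into the
// attachment layout. Assumption: the render pass itself then uses the
// attachment layout for both its initial and final layouts.
static void beforeRenderPass(Attachment* attachments, uint32_t count) {
  for (uint32_t i = 0; i < count; i++) {
    transition(&attachments[i], attachments[i].layout, LAYOUT_ATTACHMENT);
  }
}

// After ending the render pass: move every attachment back to its own layout.
static void afterRenderPass(Attachment* attachments, uint32_t count) {
  for (uint32_t i = 0; i < count; i++) {
    transition(&attachments[i], LAYOUT_ATTACHMENT, attachments[i].layout);
  }
}
```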
Fixes a validation layer complaint, and may improve performance. I
think this technically means you can't use discard/FragDepth to adjust
depth buffer values, but that's kinda niche.
+Z is the front face in a cubemap, not -Z. Currently cubemap faces are
flipped in both the X and Z directions.
Some kind of flip is required because cubemaps use a left-handed
coordinate space instead of lovr's/vulkan's right-handed coordinate
space.
Equirect does not need any changes.
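The sketch below makes the handedness flip explicit during face selection. It assumes -Z is forward in lovr's right-handed space and uses the usual +X, -X, +Y, -Y, +Z, -Z face order. Negating Z is one way to do the flip. It is not necessarily the exact change made here.

```c
// Cubemap face order used by Vulkan (and GL).
enum { FACE_PX, FACE_NX, FACE_PY, FACE_NY, FACE_PZ, FACE_NZ };

static float absf(float v) { return v < 0.f ? -v : v; }

// Map a direction in a right-handed space (assumed -Z forward) to the cubemap
// face it would sample. Cubemaps are left-handed, so Z is negated first, which
// makes forward (-Z) land on the +Z face.
static int cubemapFace(float x, float y, float z) {
  z = -z;
  float ax = absf(x), ay = absf(y), az = absf(z);
  if (ax >= ay && ax >= az) return x >= 0.f ? FACE_PX : FACE_NX;
  if (ay >= az) return y >= 0.f ? FACE_PY : FACE_NY;
  return z >= 0.f ? FACE_PZ : FACE_NZ;
}
```

Flipping both X and Z amounts to a 180° rotation about Y, so it doesn't change handedness. Flipping a single axis does.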
Goal is to support more combinations of shader stages, both for
vertex-only shaders and mesh/raytracing shaders in the future. In
general most of the logic that conflated "stage count" with "shader
type" has been changed to look at individual shader stages.
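The sketch below checks individual stages instead of counting them, using hypothetical `STAGE_*` flags. A vertex-only shader has one stage, which is the same count as a compute shader. That is exactly where count-based logic goes wrong.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical shader stage flags; lovr's real enum may differ.
enum {
  STAGE_VERTEX = 1u << 0,
  STAGE_FRAGMENT = 1u << 1,
  STAGE_COMPUTE = 1u << 2
};

// Each query looks at the specific stages present, not at how many there are.
static bool isGraphics(uint32_t stages) {
  return (stages & STAGE_VERTEX) != 0;
}

static bool isCompute(uint32_t stages) {
  return stages == STAGE_COMPUTE;
}

static bool hasFragment(uint32_t stages) {
  return (stages & STAGE_FRAGMENT) != 0;
}
```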
Now all regular Buffer objects are suballocated from big 4MB buffers,
like the temporary buffers.
Each of the big buffers is refcounted.
Wherever a `gpu_buffer` is used, the offset of the "view" into the big
buffer needs to be taken into account as well.
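Here is a minimal bump-allocator sketch of the suballocation scheme. It assumes a Buffer becomes a view (a big buffer plus an offset) and that each live view holds a reference on its big buffer. `Block`, `BufferView`, and the helper functions are hypothetical. A real allocator would also start a new 4MB block when one fills up.

```c
#include <stdint.h>

#define BLOCK_SIZE (4u << 20)  // 4MB big buffers

// Hypothetical big GPU buffer that many Buffer objects are carved out of.
typedef struct {
  uint32_t cursor;  // bump allocation offset
  uint32_t refs;    // number of live views into this block
} Block;

// Hypothetical Buffer representation: the big block plus an offset into it.
typedef struct {
  Block* block;
  uint32_t offset;
  uint32_t size;
} BufferView;

// Bump-allocate `size` bytes at the given alignment. Returns 0 if it doesn't fit.
static int suballocate(Block* block, uint32_t size, uint32_t align, BufferView* out) {
  uint32_t offset = (block->cursor + align - 1) / align * align;
  if (offset > BLOCK_SIZE || size > BLOCK_SIZE - offset) return 0;
  block->cursor = offset + size;
  block->refs++;
  *out = (BufferView) { block, offset, size };
  return 1;
}

// Drop a view's reference. When this returns 0, the block can be recycled.
static uint32_t releaseView(BufferView* view) {
  return --view->block->refs;
}

// Anything that hands the buffer to the GPU must add the view's offset.
static uint32_t gpuOffset(const BufferView* view, uint32_t localOffset) {
  return view->offset + localOffset;
}
```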
A practical advantage of this is that vertex/index buffers usually do
not need to be rebound between draws when drawing a different model/mesh.
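One way this could play out at draw time is sketched below. It assumes the view's offset is folded into the draw's first vertex rather than into the vertex buffer binding. `DrawState` and `prepareDraw` are hypothetical, and the offset is assumed to be a multiple of the vertex stride.

```c
#include <stdint.h>

// Hypothetical per-encoder binding state.
typedef struct {
  const void* boundVertexBlock;  // big buffer currently bound for vertices
  uint32_t bindCalls;            // rebinds issued
} DrawState;

// Rebind only when the underlying big buffer changes. Returns the firstVertex
// to draw with, folding in the view's offset.
// Assumes `offset` is a multiple of `stride`.
static uint32_t prepareDraw(DrawState* state, const void* block, uint32_t offset,
                            uint32_t stride, uint32_t firstVertex) {
  if (state->boundVertexBlock != block) {
    state->boundVertexBlock = block;  // real code would bind the big buffer at offset 0
    state->bindCalls++;
  }
  return offset / stride + firstVertex;
}
```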
Pretty much the same, but fixes an obscure lifetime issue with pass
buffer allocation.
If a pass's buffer fills up, it can't be recycled immediately because
that might cause the buffer to get reused too soon if the pass is
submitted multiple times (with draws added in between that exceed 4MB of
temp buffer allocations).
Instead, the strategy is to have the pass keep track of "full" buffers
in a pass-local freelist.
Whenever the pass is submitted, the tick of each full buffer is set to
the current tick, so we know the tick when it's safe to reuse them.
When a pass is reset (or destroyed), all of its full buffers are
returned to the global freelist of stream buffers. The current one can
remain in use.
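The sketch below models the pass-local freelist with intrusive linked lists. `StreamBuffer`, `passRetireCurrent`, `passSubmit`, `passReset`, and `acquire` are hypothetical names. For brevity, `acquire` only checks the head of the global freelist instead of searching for any buffer whose tick has completed.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical 4MB temp buffer.
typedef struct StreamBuffer {
  uint32_t tick;              // safe to reuse once the GPU finishes this tick
  struct StreamBuffer* next;  // intrusive freelist link
} StreamBuffer;

typedef struct {
  StreamBuffer* current;  // buffer currently being allocated from
  StreamBuffer* full;     // pass-local list of buffers that filled up
} Pass;

// The current buffer filled up. Park it on the pass's own list rather than the
// global freelist, since the pass may be submitted again, then switch to `fresh`.
static void passRetireCurrent(Pass* pass, StreamBuffer* fresh) {
  pass->current->next = pass->full;
  pass->full = pass->current;
  pass->current = fresh;
}

// On submit, stamp every full buffer with the submit tick. The latest
// submission therefore decides when each buffer becomes safe to reuse.
static void passSubmit(Pass* pass, uint32_t tick) {
  for (StreamBuffer* b = pass->full; b; b = b->next) b->tick = tick;
}

// On reset or destroy, move all full buffers to the global freelist.
// The current buffer stays with the pass.
static void passReset(Pass* pass, StreamBuffer** globalFreelist) {
  while (pass->full) {
    StreamBuffer* b = pass->full;
    pass->full = b->next;
    b->next = *globalFreelist;
    *globalFreelist = b;
  }
}

// Reuse the head of the global freelist only if its tick has completed.
// Returns NULL if it isn't ready, in which case the caller allocates a new buffer.
static StreamBuffer* acquire(StreamBuffer** globalFreelist, uint32_t completedTick) {
  StreamBuffer* b = *globalFreelist;
  if (!b || b->tick > completedTick) return NULL;
  *globalFreelist = b->next;
  b->next = NULL;
  return b;
}
```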
Pooling the buffers of all the passes globally like this increases reuse
opportunities but also increases contention.
These changes also serve as preparation to suballocate Buffer objects in
the graphics module, which makes it easier to batch more stuff.
Arrays of bindings are really bad for API usability, so the existing
single-descriptor API remains backwards compatible. If you specify a
count of zero, the old "by-value" union entry is used. If you specify a
count > 0, an array of bindings is used instead.
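Here is a sketch of a binding struct with that shape, using a C11 anonymous union. The type and field names are hypothetical. The behavior follows the description: `count == 0` selects the by-value entry, and `count > 0` selects an array.

```c
#include <stdint.h>

// Hypothetical descriptor payloads.
typedef struct { uint64_t handle; uint32_t offset, extent; } BufferBinding;
typedef struct { uint64_t handle; } TextureBinding;

typedef struct {
  uint32_t number;  // binding slot
  uint32_t count;   // 0: use the by-value entry; > 0: use the array
  union {
    BufferBinding buffer;      // old single-descriptor form
    TextureBinding texture;
    BufferBinding* buffers;    // array form, `count` elements
    TextureBinding* textures;
  };
} Binding;

// Number of descriptors this binding describes.
static uint32_t descriptorCount(const Binding* b) {
  return b->count == 0 ? 1 : b->count;
}

// Resolve the i-th buffer descriptor regardless of which form was used.
static const BufferBinding* getBuffer(const Binding* b, uint32_t i) {
  return b->count == 0 ? &b->buffer : &b->buffers[i];
}
```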
- Font:getLines/Pass:text use temp memory for strings/vertices.
- Due to the recent morgue fix, resizing the atlas will now do a GPU
submit to flush the transfers before destroying the atlas.
- This GPU submit also rewinds the temp allocator, invalidating the
temp memory used for the lines/vertices, causing a use-after-free.
There are 2 changes here:
- Only flush transfers if the destroyed resource actually has pending
work (we're already tracking this w/ lastTransferRead/Write).
- Restore the allocator cursor after doing the submit.
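Both changes are sketched together below. The sketch assumes `lastTransferRead`/`lastTransferWrite` hold the tick of the most recent transfer. Under that assumption, a resource has unsubmitted transfer work only if one of them equals the current tick. `TempAllocator`, `Resource`, `submit`, and `destroyResource` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical linear temp allocator.
typedef struct {
  uint32_t cursor;
  uint32_t submits;  // number of submits performed
} TempAllocator;

// Stand-in for a GPU submit, which rewinds the temp allocator as a side effect.
static void submit(TempAllocator* temp) {
  temp->submits++;
  temp->cursor = 0;
}

// Hypothetical resource with the tick of its last transfer read/write.
typedef struct {
  uint32_t lastTransferRead;
  uint32_t lastTransferWrite;
} Resource;

// Only transfers recorded during the current (not yet submitted) tick are pending.
static bool hasPendingTransfers(const Resource* r, uint32_t currentTick) {
  return r->lastTransferRead == currentTick || r->lastTransferWrite == currentTick;
}

static void destroyResource(Resource* r, TempAllocator* temp, uint32_t currentTick) {
  if (hasPendingTransfers(r, currentTick)) {
    uint32_t saved = temp->cursor;  // temp memory the caller may still be using
    submit(temp);                   // flush the pending transfers
    temp->cursor = saved;           // restore the cursor so that memory stays valid
  }
  // real code would now condemn the resource
}
```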
Having to restore the allocator offset is kinda tricky/complicated,
which sucks. Other solutions might be:
- Avoid using temp memory in those Font methods. More generally, adopt
a rule where you can't use temp memory if it's possible that there
will be a submit while you're using the temp memory.
- Find some way to stop destroying the old font atlas during a resize?
- Don't rewind the temp allocator on every GPU submit. Instead only
rewind it at the end of a frame, or only when Lua does the submit.
We were conflating "parent struct of Sync pointer" and "object to be
refcounted", which are not the same object for texture views.
To fix it, add both pointers to the Access struct. This sucks because
it increases the size from 16 bytes to 24 bytes.
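The hypothetical layout below shows where the size change comes from. Two pointers plus 8 bytes of access flags come to 24 bytes on a 64-bit build, compared with 16 bytes when there is a single pointer. The field names are illustrative, not lovr's actual definition.

```c
#include <stdint.h>

// Hypothetical per-resource synchronization state.
typedef struct { uint32_t lastWriteTick; } Sync;

// Hypothetical Access entry. For a texture view, `sync` points into the parent
// texture, while `object` is the view itself (the thing to refcount).
typedef struct {
  Sync* sync;      // hazard-tracking state (the parent texture's, for a view)
  void* object;    // object to keep alive until the GPU is done with it
  uint32_t phase;  // pipeline stages involved
  uint32_t cache;  // access types involved
} Access;
```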
There might be other solutions like a "texture view mask" or having
buffers/textures store a pointer to their sync or something. But these
have some drawbacks as well. May revisit in future.