If the number of skinned vertices in a Model doesn't fit in a single
dispatch (~2M vertices on 32-sized subgroups or ~500K vertices on
8-sized subgroups), split it into multiple dispatches.
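The splitting arithmetic can be sketched as follows. This is a minimal illustration, not the engine's code: it assumes one vertex per invocation, a workgroup size equal to the subgroup size, and a limit of 65535 workgroups per dispatch (which is what yields the ~2M and ~500K figures above). The names skin_vertices and dispatch_fn are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed per-dispatch workgroup limit (65535 * 32 = ~2M, 65535 * 8 = ~500K)
#define MAX_WORKGROUPS 65535u

// Hypothetical callback standing in for the real compute dispatch call
typedef void dispatch_fn(uint32_t firstVertex, uint32_t workgroupCount, void* userdata);

// Issues as many dispatches as needed to cover vertexCount vertices.
// Returns the number of dispatches issued.
static uint32_t skin_vertices(uint32_t vertexCount, uint32_t subgroupSize, dispatch_fn* dispatch, void* userdata) {
  assert(subgroupSize > 0);
  uint32_t maxPerDispatch = MAX_WORKGROUPS * subgroupSize;
  uint32_t dispatches = 0;
  uint32_t first = 0;
  while (first < vertexCount) {
    uint32_t remaining = vertexCount - first;
    uint32_t count = remaining < maxPerDispatch ? remaining : maxPerDispatch;
    uint32_t groups = (count + subgroupSize - 1) / subgroupSize; // round up
    if (dispatch) dispatch(first, groups, userdata);
    first += count;
    dispatches++;
  }
  return dispatches;
}
```

Each dispatch receives the index of its first vertex, so the shader can offset its invocation index accordingly.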
- Pass:mesh accepts tables for vertices/indices
- Add Pass:setVertexFormat to set format used for table-based meshes
- Pass:send accepts tables for buffers
- Pass:send supports arbitrarily nested structs/arrays for push constants
- Buffer formats support arbitrarily nested structs/arrays
- Zero-length buffers are valid and represent structs
- Fields can have names using 'name'
- Field types can be tables of other fields (structs)
- Fields can have 'length' key
- newBuffer syntax has been changed to put format first (old version
still works)
- Buffers can be created from shader variables, avoiding the need to
declare a matching format.
- Pass:clear/Pass:read use byte offsets instead of indices
- Pass:copy uses byte offsets when copying a Buffer to a Buffer
- Deprecate lovr.graphics.getBuffer (tables can be used instead)
- core/spv just returns the type of image variables instead of trying to
validate them.
- When Shader is loading resources, it will reject combined image
samplers, uniform/texel buffers, and input attachments, with better
error messages that include the binding number of the invalid resource.
There were certain cases where an unrenderable texture view could be
created from a renderable parent texture (e.g. it has multiple mipmap
levels). This would leave renderView as NULL, which would cause a
crash.
If the target index is missing, the state will apply to all targets.
Fixes undefined behavior when setting color state in a pass with
multiple color attachments.
- Allow multisampled render pass to have a single-sample depth attachment
- Add a new depthResolve feature, indicating whether it's supported
- Fix stencil load/save
- Minor changes to render pass caching
- Currently the depth resolve is done using the first sample. A future
improvement would be to expose/use the min/max/average resolve modes.
Only when a readback is read back before a pass is created.
Should really change gpu to know if the frame has started yet and adjust
the tick index accordingly.
Fixes easily-encounterable GPU OOM on discrete cards.
Currently when mapping CPU-accessible GPU memory, there are only two
types of memory: write and read.
The "write" allocations try to use the special 256MB pinned memory
region, with the thought that since this memory is usually for vertices,
uniforms, etc. it should be fast.
However, this memory is also used for staging buffers for buffers and
textures, which can easily exceed the 256MB (or 246MB on NV) limit upon
creating a handful of large textures.
To fix this, we're going to separate WRITE mappings into STREAM and
STAGING. STREAM will act like the old CPU_WRITE mapping type and use
the same memory type. STAGING will use plain host-visible memory and
avoid hogging the precious 256MB memory region.
STAGING also uses a different allocation strategy. Instead of creating
a big buffer with a zone for each tick, it's a more traditional linear
allocator that allocates in 4MB chunks and condemns the chunk if it ever
fills up. This is a better fit for staging buffer lifetimes since there's
usually a bunch of them at startup and then a small/sporadic amount
afterwards. The buffer doesn't need to double in size, and it doesn't
need to be kept around after the transfers are issued. The memory
really is single-use and won't roll over from frame to frame like the
other scratchpads.
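A rough sketch of the STAGING strategy described above, using host malloc as a stand-in for host-visible GPU memory. All names here (StagingAllocator, staging_alloc, staging_release_condemned) are hypothetical, and giving oversized requests a dedicated chunk is an assumption the description above does not cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE ((size_t) 4 << 20) // 4MB chunks, per the description above

typedef struct Chunk {
  struct Chunk* next;
  char* data;    // stand-in for a mapped host-visible buffer
  size_t size;
  size_t cursor;
} Chunk;

typedef struct {
  Chunk* current;   // chunk currently being filled
  Chunk* condemned; // full chunks, destroyed once their transfers are done
} StagingAllocator;

static Chunk* chunk_create(size_t size) {
  Chunk* chunk = malloc(sizeof(Chunk));
  if (!chunk) return NULL;
  chunk->data = malloc(size);
  if (!chunk->data) { free(chunk); return NULL; }
  chunk->next = NULL;
  chunk->size = size;
  chunk->cursor = 0;
  return chunk;
}

// Linear allocation: bump the cursor, condemn the chunk when it fills up
static void* staging_alloc(StagingAllocator* a, size_t size, size_t align) {
  assert(align > 0 && (align & (align - 1)) == 0); // power of two
  Chunk* chunk = a->current;
  size_t offset = chunk ? (chunk->cursor + align - 1) & ~(align - 1) : 0;
  if (!chunk || offset + size > chunk->size) {
    if (chunk) {
      chunk->next = a->condemned;
      a->condemned = chunk;
      a->current = NULL;
    }
    // Assumption: requests larger than a chunk get their own chunk
    chunk = chunk_create(size > CHUNK_SIZE ? size : CHUNK_SIZE);
    if (!chunk) return NULL;
    a->current = chunk;
    offset = 0;
  }
  chunk->cursor = offset + size;
  return chunk->data + offset;
}

// Call once the GPU has finished the transfers that read condemned chunks
static void staging_release_condemned(StagingAllocator* a) {
  while (a->condemned) {
    Chunk* next = a->condemned->next;
    free(a->condemned->data);
    free(a->condemned);
    a->condemned = next;
  }
}
```

Nothing is recycled or doubled in size: a condemned chunk is simply destroyed after its transfers complete, which matches the single-use lifetime of staging memory.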
The animation compute shader was not specializing the workgroup size
properly, so it was only working on GPUs with a subgroup size of 32.
The Quest 1 has a subgroup size of 32 and the Quest 2 has a subgroup
size of 64, so this resulted in hand models breaking on Quest 2 only!
- glowTexture is on by default, but still requires the glow flag.
- occlusionTexture is named ambientOcclusion, and is on by default,
but is still not used by any builtin shaders/helpers.
Sigh, back to getPass. I don't even know at this point. Basically now
that we came up with a half-solution for temp buffers, it makes sense to
apply this to passes as well, since we aren't going with the workstream
idea and temp passes are more convenient than retained passes.
- They no longer live in temporary memory, but in a dedicated pool.
- There are error checks for using a temporary buffer after it's invalid
- However, these are imperfect, and could be improved. One idea is to
avoid recycling a temporary buffer until its refcount decays (i.e.
Lua finally decides to garbage collect it). This would explode
memory usage sometimes, so it could only be enabled when
t.graphics.debug is true.
A lot of cleanup can happen now that C doesn't push delayed errors to
Lua. This was happening for Pico and WebVR, neither of which are used
anymore.
Also default vsync to true but force it off if VR is active.
The sync was totally wrong here. It's a bit better now. However there
are some general sync issues that need to be fixed. Basically a Pass
that does reads and writes or multiple writes doesn't work properly, for
various reasons. I think sync needs to be split into 2 phases -- first
process all the reads and merge barrier bits into the barrier of the
last writer, then process all the writes and set 'final' resource state
for stuff in the pass. Due to branch prediction it may be better to
have 2 separate lists -- one for reads and one for writes. And I'm not
100% sure on how to reconcile a Pass that is doing reads and writes to
the same resource yet, still thinking about it.
It uses newPass instead of getPass. Temporary objects had lifetime
issues that were nearly impossible to solve. And normal objects are
easier to understand because they behave like all other LÖVR objects.
However, Pass commands are not retained from frame to frame. Pass
objects must be re-recorded before every submit, and must be reset
before being recorded again.
Pass objects now provide a natural place for render-pass-related info
like clears and texture handles. They also allow more information to be
precomputed which should reduce overhead a bit.
It is now possible to request a stencil buffer and antialiasing on the
window and headset textures, via conf.lua.
lovr.graphics.setBackground should instead set the clear color on the
window pass. Though we're still going to try to do spherical harmonics
in some capacity.
There are still major issues with OpenXR that are going to be ironed
out, and the desktop driver hasn't been converted over to the new
headset Pass system yet. So lovr.headset integration is a bit WIP.
There are some issues with immediately tracking readbacks in the global
linked list of pending readbacks:
- The Pass might not get submitted, in which case the readback will be
"dangling" and never complete (or it will erroneously think it's
completed but its buffer will contain garbage data).
- Thread safety issues of modifying a global data structure from a Pass.
Instead, Pass will locally track the readbacks it performs, and only at
submit time will those readbacks get added to the global list.
(There are a few refcounting mistakes now; those will get cleaned up.)
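The deferred tracking can be sketched like this. The types and function names are hypothetical and the refcounting is omitted; the point is only that a Pass owns a private list and the global list is touched exclusively at submit time.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Readback {
  struct Readback* next;
} Readback;

typedef struct {
  Readback* head;
  Readback* tail;
} ReadbackList;

typedef struct {
  ReadbackList readbacks; // readbacks recorded by this Pass, not yet pending
} Pass;

// Recording a read only touches the Pass's own list
static void pass_read(Pass* pass, Readback* readback) {
  assert(readback);
  readback->next = NULL;
  if (pass->readbacks.tail) pass->readbacks.tail->next = readback;
  else pass->readbacks.head = readback;
  pass->readbacks.tail = readback;
}

// At submit time, splice the Pass's readbacks onto the global pending list
static void pass_submit(Pass* pass, ReadbackList* pending) {
  if (!pass->readbacks.head) return;
  if (pending->tail) pending->tail->next = pass->readbacks.head;
  else pending->head = pass->readbacks.head;
  pending->tail = pass->readbacks.tail;
  pass->readbacks.head = pass->readbacks.tail = NULL;
}
```

A Pass that is never submitted never adds anything to the pending list, so its readbacks cannot dangle there. The sketch assumes submission happens on a single thread.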
- Rename/reorder some projection matrix functions.
- Make perspective functions flip Y and use 0-1 NDC range.
- Flip winding and font vertices based on handedness.
This stuff is really confusing.
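For reference, one common way to write a perspective matrix with a flipped Y axis and a 0-1 depth range is sketched below. It assumes column-major storage and a right-handed view space looking down -Z; whether this matches the engine's exact conventions is an assumption. The function takes the tangent of half the vertical field of view to keep the example free of math library calls, and project is a hypothetical helper for checking results.

```c
#include <assert.h>
#include <string.h>

// Column-major perspective matrix: Y is flipped, depth maps near -> 0, far -> 1
static void perspective(float m[16], float tanHalfFovy, float aspect, float n, float f) {
  assert(tanHalfFovy > 0.f && aspect > 0.f && n > 0.f && f > n);
  memset(m, 0, 16 * sizeof(float));
  m[0] = 1.f / (aspect * tanHalfFovy);
  m[5] = -1.f / tanHalfFovy; // negative sign flips Y
  m[10] = f / (n - f);
  m[11] = -1.f;              // w = -z for points in front of the camera
  m[14] = (n * f) / (n - f);
}

// Transforms a view-space point and performs the perspective divide
static void project(const float m[16], float x, float y, float z, float ndc[3]) {
  float w = m[3] * x + m[7] * y + m[11] * z + m[15];
  ndc[0] = (m[0] * x + m[4] * y + m[8] * z + m[12]) / w;
  ndc[1] = (m[1] * x + m[5] * y + m[9] * z + m[13]) / w;
  ndc[2] = (m[2] * x + m[6] * y + m[10] * z + m[14]) / w;
}
```

With this matrix, a point on the top edge of the near plane lands at NDC y = -1 and depth 0, and a point on the far plane lands at depth 1.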
Pass stores a small 16-bucket cache of vertices/indices it recently
generated. Draws that have relatively predictable geometry can provide
a hash along with their draw. The Pass will reuse vertices based on the
hash, when possible, and return a NULL vertex pointer to let the draw-er
know they don't need to generate any vertices.
This provides a dramatic speedup when drawing the same shape many times
in a row. The overhead is negligible, with benefits kicking in with
just a small handful of repeated draws (3-5 for cubes, fewer for more
complex shapes).
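A simplified sketch of the idea, assuming a direct-mapped cache where the low bits of the hash pick the bucket and a hash of 0 means "don't cache". The struct layout, the fixed-size vertex array, and the name pass_vertices are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BUCKETS 16
#define MAX_VERTICES 1024

typedef struct { float x, y, z; } Vertex;

typedef struct {
  uint64_t hashes[CACHE_BUCKETS]; // hash of the geometry stored in each bucket
  uint32_t starts[CACHE_BUCKETS]; // where that geometry begins in `vertices`
  Vertex vertices[MAX_VERTICES];
  uint32_t count;
} Pass;

// Returns a pointer the caller must fill with `count` vertices, or NULL if
// vertices with the same hash were generated recently and can be reused.
// Either way, *start receives the index of the first vertex to draw.
static Vertex* pass_vertices(Pass* pass, uint64_t hash, uint32_t count, uint32_t* start) {
  uint32_t bucket = (uint32_t) (hash & (CACHE_BUCKETS - 1));
  if (hash != 0 && pass->hashes[bucket] == hash) {
    *start = pass->starts[bucket];
    return NULL;
  }
  assert(pass->count + count <= MAX_VERTICES);
  *start = pass->count;
  pass->count += count;
  if (hash != 0) {
    pass->hashes[bucket] = hash; // evicts whatever was in the bucket
    pass->starts[bucket] = *start;
  }
  return pass->vertices + *start;
}
```

A lookup is a single compare against one bucket, which is why the overhead stays negligible even when the cache misses.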
Originally we made the font texture f16 due to "clamping" of the
distance field, and kept it as floats (but f32 since conversion isn't
automatic with Vulkan) here. However, clamping isn't really an issue.
You can increase the spread of the font to literally get a wider spread
of the SDF for glows, etc. Switching to u8 uses 4x less texture memory,
which is significant.
It can be used to push the current cursor onto the stack, perform some
temp allocations, and then pop the stack to "free" them all at once.
This can be nice if you're doing some temporary allocations that aren't
going to be needed when the function returns, since it reduces the
amount of allocator growth a bit.
This allocator is meant to be threadlocal eventually, so there are no
thread-safety concerns.
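The push/pop pattern might look like the sketch below. It uses a fixed-capacity bump allocator for brevity (the allocator described above can grow), and the names temp_alloc, temp_push, and temp_pop are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_DEPTH 8

typedef struct {
  char* memory;
  size_t capacity;
  size_t cursor;
  size_t stack[STACK_DEPTH]; // saved cursors
  size_t depth;
} TempAllocator;

// Bump allocation, aligned to 8 bytes
static void* temp_alloc(TempAllocator* a, size_t size) {
  size_t offset = (a->cursor + 7) & ~(size_t) 7;
  assert(offset + size <= a->capacity);
  a->cursor = offset + size;
  return a->memory + offset;
}

// Remember the current cursor
static void temp_push(TempAllocator* a) {
  assert(a->depth < STACK_DEPTH);
  a->stack[a->depth++] = a->cursor;
}

// Restore the cursor, "freeing" everything allocated since the matching push
static void temp_pop(TempAllocator* a) {
  assert(a->depth > 0);
  a->cursor = a->stack[--a->depth];
}
```

A function that needs scratch space calls temp_push on entry and temp_pop before returning, so its allocations never accumulate in the allocator.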
- Padding is automatically computed from spread.
- Spread increases detail at small sizes.
- Remove failure cases where padding < spread/2
- UVs are un16x2, making room for color
- Don't center glyphs inside their atlas bounding box
- Cache normalized UVs and update them (for glyphs and vertices) when
the atlas changes size.
- Updating the UVs is UGLY and duplicates a lot of code. It may be
better to normalize the UVs on the fly, or just re-render the entire
string if the atlas is updated.