Only when a readback is read back before a pass is created.
The gpu module should really know whether the frame has started yet and
adjust the tick index accordingly.
Fixes easily-encounterable GPU OOM on discrete cards.
Currently when mapping CPU-accessible GPU memory, there are only two
types of memory: write and read.
The "write" allocations try to use the special 256MB pinned memory
region, with the thought that since this memory is usually for vertices,
uniforms, etc. it should be fast.
However, this memory is also used for the staging buffers that upload
buffer and texture data, which can easily exceed the 256MB (or 246MB on
NV) limit after creating a handful of large textures.
To fix this, we're going to separate WRITE mappings into STREAM and
STAGING. STREAM will act like the old CPU_WRITE mapping type and use
the same memory type. STAGING will use plain host-visible memory and
avoid hogging the precious 256MB memory region.
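A rough sketch of how the two mapping modes could pick a memory type.
The flag names, MapMode, and pickMemoryType are made up for
illustration and only mirror the usual Vulkan property bits; real code
would presumably also fall back to plain host-visible memory when the
pinned region is missing or full.

```c
#include <stdint.h>

// Hypothetical property bits, standing in for the Vulkan memory flags.
enum {
  MEM_DEVICE_LOCAL  = 1 << 0,
  MEM_HOST_VISIBLE  = 1 << 1,
  MEM_HOST_COHERENT = 1 << 2
};

// Hypothetical mapping modes, named after the split described above.
typedef enum { MAP_STREAM, MAP_STAGING } MapMode;

// Returns the index of the first memory type that suits the mode, or -1.
static int pickMemoryType(const uint32_t* types, int count, MapMode mode) {
  uint32_t required = MEM_HOST_VISIBLE | MEM_HOST_COHERENT;
  uint32_t avoided = 0;

  if (mode == MAP_STREAM) {
    required |= MEM_DEVICE_LOCAL; // the small pinned region
  } else {
    avoided |= MEM_DEVICE_LOCAL;  // keep staging data out of that region
  }

  for (int i = 0; i < count; i++) {
    if ((types[i] & required) == required && (types[i] & avoided) == 0) {
      return i;
    }
  }

  return -1;
}
```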
STAGING also uses a different allocation strategy. Instead of creating
a big buffer with a zone for each tick, it's a more traditional linear
allocator that allocates in 4MB chunks and condemns the chunk if it ever
fills up. This is a better fit for staging buffer lifetimes since there's
usually a bunch of them at startup and then a small/sporadic amount
afterwards. The buffer doesn't need to double in size, and it doesn't
need to be kept around after the transfers are issued. The memory
really is single-use and won't roll over from frame to frame like the
other scratchpads.
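For reference, a minimal sketch of that kind of chunked linear
allocator. This is not the actual implementation: malloc stands in for
creating a mapped host-visible GPU buffer, the names are hypothetical,
and giving oversized requests their own chunk is an assumption. Real
code would also wait for the GPU to finish with a condemned chunk
before destroying it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE ((size_t) 4 << 20) // 4MB chunks, as described above

typedef struct Chunk {
  struct Chunk* next;
  uint8_t* data;  // stand-in for a mapped host-visible GPU buffer
  size_t size;
  size_t cursor;  // offset of the first unused byte
} Chunk;

typedef struct {
  Chunk* current;   // chunk currently being filled
  Chunk* condemned; // full chunks waiting to be destroyed
} StagingAllocator;

static Chunk* chunkCreate(size_t size) {
  Chunk* chunk = malloc(sizeof(Chunk));
  if (!chunk) return NULL;
  chunk->data = malloc(size);
  if (!chunk->data) { free(chunk); return NULL; }
  chunk->next = NULL;
  chunk->size = size;
  chunk->cursor = 0;
  return chunk;
}

// Bump-allocates from the current chunk. align must be a power of two.
static void* stagingAlloc(StagingAllocator* a, size_t size, size_t align) {
  Chunk* chunk = a->current;
  size_t offset = chunk ? (chunk->cursor + align - 1) & ~(align - 1) : 0;

  if (!chunk || offset + size > chunk->size) {
    // The request doesn't fit: condemn the chunk and start a fresh one.
    if (chunk) { chunk->next = a->condemned; a->condemned = chunk; }
    chunk = chunkCreate(size > CHUNK_SIZE ? size : CHUNK_SIZE);
    a->current = chunk;
    if (!chunk) return NULL;
    offset = 0;
  }

  chunk->cursor = offset + size;
  return chunk->data + offset;
}

// Destroys condemned chunks and returns how many were freed.
static size_t stagingReap(StagingAllocator* a) {
  size_t count = 0;
  while (a->condemned) {
    Chunk* next = a->condemned->next;
    free(a->condemned->data);
    free(a->condemned);
    a->condemned = next;
    count++;
  }
  return count;
}
```

Nothing here doubles in size or rolls over between frames: once the
startup burst is over, at most one partially filled chunk stays alive.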
The animation compute shader was not specializing the workgroup size
properly, so it was only working on GPUs with a subgroup size of 32.
The Quest 1 has a subgroup size of 32 and the Quest 2 has a subgroup
size of 64, so this resulted in hand models breaking on Quest 2 only!
- glowTexture is on by default, but still requires the glow flag.
- occlusionTexture is named ambientOcclusion, and is on by default,
  but is still not used by any builtin shaders/helpers.
Sigh, back to getPass. I don't even know at this point. Basically now
that we came up with a half-solution for temp buffers, it makes sense to
apply this to passes as well, since we aren't going with the workstream
idea and temp passes are more convenient than retained passes.
- They no longer live in temporary memory, but in a dedicated pool.
- There are error checks for using a temporary buffer after it's invalid.
  - However, these are imperfect, and could be improved. One idea is to
    avoid recycling a temporary buffer until its refcount decays (i.e.
    Lua finally decides to garbage collect it). This would explode
    memory usage sometimes, so it could only be enabled when
    t.graphics.debug is true.
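A toy model of the check and of the refcount idea, to show where the
check falls short. Everything here is hypothetical (TempPool, the fixed
pool size, the debug field standing in for t.graphics.debug), and a
real pool would grow instead of returning NULL when it runs out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4

typedef struct {
  uint32_t tick; // frame the buffer was last handed out on
  uint32_t ref;  // stand-in for the reference Lua holds on the object
} TempBuffer;

typedef struct {
  TempBuffer buffers[POOL_SIZE];
  uint32_t cursor; // next slot to hand out this frame
  uint32_t tick;   // current frame, starts at 1 so zeroed slots are stale
  bool debug;      // stand-in for t.graphics.debug
} TempPool;

// Hands out a slot for the current frame, recycling slots from old frames.
static TempBuffer* tempPoolGet(TempPool* pool) {
  while (pool->cursor < POOL_SIZE) {
    TempBuffer* buffer = &pool->buffers[pool->cursor++];
    // Debug mode: don't recycle a slot something still refers to.
    if (pool->debug && buffer->ref > 0) continue;
    buffer->tick = pool->tick;
    buffer->ref = 1;
    return buffer;
  }
  return NULL;
}

// The error check: a temporary buffer is only usable on its own frame.
static bool tempBufferIsValid(const TempPool* pool, const TempBuffer* buffer) {
  return buffer->tick == pool->tick;
}

static void tempPoolNextFrame(TempPool* pool) {
  pool->tick++;
  pool->cursor = 0;
}
```

The imperfection shows up when a stale handle points at a slot that has
already been recycled on the current frame: the tick matches again, so
the check passes. Skipping referenced slots in debug mode avoids that
at the cost of more live buffers.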
A lot of cleanup can happen now that C doesn't push delayed errors to
Lua. This was happening for Pico and WebVR, neither of which is used
anymore.
Also default vsync to true but force it off if VR is active.
The sync was totally wrong here. It's a bit better now. However there
are some general sync issues that need to be fixed. Basically a Pass
that does reads and writes or multiple writes doesn't work properly, for
various reasons. I think sync needs to be split into 2 phases -- first
process all the reads and merge barrier bits into the barrier of the
last writer, then process all the writes and set 'final' resource state
for stuff in the pass. Due to branch prediction it may be better to
have 2 separate lists -- one for reads and one for writes. And I'm not
100% sure on how to reconcile a Pass that is doing reads and writes to
the same resource yet, still thinking about it.
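One possible shape for the 2-phase version, as a sketch only. The
types, the phase bits, and the convention that each pass gets a barrier
slot before it and one after it are all assumptions, and it does not
settle the same-resource read+write case: the read is simply resolved
against the old state before the write replaces it.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical pipeline phase bits.
enum { PHASE_VERTEX = 1 << 0, PHASE_FRAGMENT = 1 << 1, PHASE_COMPUTE = 1 << 2 };

typedef struct {
  uint32_t prev; // phases the barrier waits on
  uint32_t next; // phases the barrier blocks
} Barrier;

typedef struct {
  Barrier* barrier;    // barrier that runs after the last writer, if any
  uint32_t writePhase; // phase of the last write
  uint32_t readPhase;  // phases that have read since the last write
} Resource;

typedef struct {
  Resource* resource;
  uint32_t phase;
} Access;

// before/after are the barrier slots that run before and after this pass.
static void syncPass(Barrier* before, Barrier* after,
  const Access* reads, size_t readCount,
  const Access* writes, size_t writeCount)
{
  // Phase 1: merge every read into the barrier of the last writer.
  for (size_t i = 0; i < readCount; i++) {
    Resource* r = reads[i].resource;
    if (r->barrier) {
      r->barrier->prev |= r->writePhase;
      r->barrier->next |= reads[i].phase;
    }
    r->readPhase |= reads[i].phase;
  }

  // Phase 2: process writes and record the final state of each resource.
  for (size_t i = 0; i < writeCount; i++) {
    Resource* r = writes[i].resource;
    uint32_t hazards = r->readPhase | r->writePhase;
    if (hazards) {
      // Earlier reads and writes must finish before this pass writes.
      before->prev |= hazards;
      before->next |= writes[i].phase;
    }
    r->barrier = after;
    r->writePhase = writes[i].phase;
    r->readPhase = 0;
  }
}
```

Keeping reads and writes in two separate arrays also means each loop
body takes the same path for every element, which is the branch
prediction point above.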