Lately I’ve been really interested in performing shading calculations at a frequency lower than per-pixel. Depending on factors like the frequency of the lighting effect and the surface orientation relative to the camera, you can get away with computing values at a lower resolution or with adaptive sampling.
Probably the most widely used technique along these lines is the “Off-screen Particles” method that NVIDIA first used in the Vulcan demo and later wrote up in GPU Gems 3. The basic idea is that you blend your particles into a texture at lower than screen resolution to save on fill rate. Particles, like smoke, are notorious fill rate hogs due to massive amounts of overdraw and potentially expensive per-pixel lighting computations. In order to get correct particle occlusion, a manually downsampled depth buffer has to be used for depth testing the lower resolution particle buffer. This introduces artifacts near particle-scene intersections, where the low resolution depth test leaves gaps. These are resolved by performing high-resolution particle blending in the problem areas, which are identified with an edge detection filter performed on the low res buffer’s alpha channel. However, I have heard from a few developer friends that this fix-up pass is usually not needed because the artifacts are not noticeable in the majority of cases.
An additional optimization is to render particles into one of a few different resolution buffers. For example, particles that are very close to the camera likely need very little screen space resolution because their higher frequency features are covering large amounts of screen real estate. Additionally, particles closest to the camera are going to contribute the most to overall fill rate (compared to other scene particles) because they are likely taking up most if not all of the screen pixels. You’d mostly see this situation in non-uniform volumetric fog or dense smoke conditions.
For more info on the original method, see the online version of the Vulcan paper or take a look at the “High-Speed, Off-Screen Particles” article in GPU Gems 3.
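To make the downsampled depth buffer from the off-screen particle method a bit more concrete, here is a minimal CPU-side sketch in Python. The function names, the list-of-rows buffer layout, and the choice of reduction are my own assumptions for illustration, not code from the NVIDIA write-up. In practice this would be a small pixel shader pass.

```python
def downsample_depth(depth, factor=2, reduce=max):
    """Reduce a 2D depth buffer (list of rows) by `factor` in each dimension.

    Each low-res texel takes `reduce` over its factor x factor block of
    full-res depths. Which reduction to use is an assumption here.
    """
    h, w = len(depth), len(depth[0])
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [depth[yy][xx]
                     for yy in range(y, min(y + factor, h))
                     for xx in range(x, min(x + factor, w))]
            row.append(reduce(block))
        out.append(row)
    return out


def particle_visible(particle_depth, low_res_depth, x, y):
    """Less-than depth test against the low-res buffer (larger = farther)."""
    return particle_depth < low_res_depth[y][x]


full_res = [[0.2, 0.9],
            [0.9, 0.9]]
far_biased = downsample_depth(full_res, reduce=max)   # keeps the farthest depth
near_biased = downsample_depth(full_res, reduce=min)  # keeps the nearest depth
```

With `max`, a particle at depth 0.5 passes the test for the whole block even though one full-res pixel should occlude it; with `min` it is rejected for the whole block even though three pixels should show it. Either way the low-res test is wrong somewhere along the silhouette, which is exactly where the fix-up pass comes in.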
A few months ago I mentioned the Bilateral Upsampling method used by Sloan et al. in their paper “Image-Based Proxy Accumulation for Real-Time Soft Global Illumination”. The concept is to bilinearly interpolate lower resolution illumination results while applying a weighting function to interpolants so that values aren’t interpolated across boundaries. In some situations it is required to perform additional computation near boundaries. This situation is handled similarly to the off-screen particle method: edge detection and selective higher resolution calculation in edge regions. Read my previous entry and the paper for more info.
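Here is a rough sketch of the weighted interpolation idea in Python. It is a simplification under my own assumptions: the weighting function only looks at depth differences, the `1 / (eps + |dz|)` falloff is just one plausible choice, and the function and parameter names are made up for this example rather than taken from the paper.

```python
import math


def bilateral_upsample(lo, lo_depth, hi_depth, factor=2, eps=1e-4):
    """Upsample low-res values `lo` to the resolution of `hi_depth`.

    Each output pixel blends its four nearest low-res samples. The usual
    bilinear weight is scaled by how closely the low-res sample's depth
    matches the full-res pixel's depth, then the weights are renormalized.
    """
    lh, lw = len(lo), len(lo[0])
    hh, hw = len(hi_depth), len(hi_depth[0])
    out = [[0.0] * hw for _ in range(hh)]
    for y in range(hh):
        for x in range(hw):
            # Position in low-res texel space (texel centers at integers).
            fx = (x + 0.5) / factor - 0.5
            fy = (y + 0.5) / factor - 0.5
            x0, y0 = math.floor(fx), math.floor(fy)
            tx, ty = fx - x0, fy - y0
            total = 0.0
            wsum = 0.0
            for dy, wy in ((0, 1.0 - ty), (1, ty)):
                for dx, wx in ((0, 1.0 - tx), (1, tx)):
                    sx = min(max(x0 + dx, 0), lw - 1)  # clamp to buffer edge
                    sy = min(max(y0 + dy, 0), lh - 1)
                    dz = abs(hi_depth[y][x] - lo_depth[sy][sx])
                    w = wx * wy / (eps + dz)
                    total += w * lo[sy][sx]
                    wsum += w
            out[y][x] = total / wsum
    return out


# Two low-res samples on either side of a depth discontinuity.
lo = [[0.0, 1.0]]
lo_depth = [[1.0, 5.0]]
hi_depth = [[1.0, 1.0, 5.0, 5.0],
            [1.0, 1.0, 5.0, 5.0]]
result = bilateral_upsample(lo, lo_depth, hi_depth)
```

Plain bilinear filtering would give the second pixel in each row a value of 0.25, smearing the far surface's illumination onto the near one. With the depth weight, the pixels on each side of the discontinuity stay very close to their own side's value.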
Adaptive Soft Shadows with Push-Pull Reconstruction
(Images from the paper: Orange pixels are where computation is skipped)
Another trick was used by Gaël Guennebaud et al. in the 2007 Eurographics paper “High-Quality Adaptive Soft Shadow Mapping”. This trick computes soft shadow amounts at adaptive resolution rather than simply at a lower resolution as in the previously discussed methods. While not as simple to upsample, computing values adaptively based on surface orientation and shadow frequency allocates more fidelity in regions that need it. The other methods simply pick some lower resolution and forgo any higher frequency information. Adaptive sample locations are determined dynamically by computing a per-pixel value in a screen-space pre-pass, based on surface normal and penumbra width (determined by examining the shadow map), and thresholding it against a repeated pattern. This is sort of along the lines of what is done with dithering. The pattern is a bit stochastic, such that some pixels are bound to be identified as needing computation regardless of the scene/visibility complexity contained in those pixels, but more or all of the pixels inside the pattern will be identified in cases where the surface is quickly changing or the penumbra is very tight. After pixels are identified, a relatively expensive soft shadow technique is used to compute visibility at those locations. These visibility values are then interpolated to fill in all screen pixels.
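The thresholding step can be sketched as follows. To be clear about the assumptions: I am using a standard 4x4 ordered-dither (Bayer) matrix as a stand-in for the paper's pattern, and `importance` is a hypothetical per-pixel value in [0, 1] that you would derive from the surface normal and penumbra width. Neither is the paper's actual formulation.

```python
# Standard 4x4 ordered-dither matrix, used here as a stand-in pattern.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def select_samples(importance):
    """Return a boolean mask: True where the expensive evaluation should run.

    `importance` is a 2D list of values in [0, 1]. The pattern is tiled
    across the screen and each pixel is compared against its threshold.
    """
    mask = []
    for y, row in enumerate(importance):
        mask_row = []
        for x, value in enumerate(row):
            threshold = BAYER_4X4[y % 4][x % 4] / 16.0
            mask_row.append(value >= threshold)
        mask.append(mask_row)
    return mask


def count_selected(mask):
    return sum(sum(1 for flag in row if flag) for row in mask)


flat = select_samples([[0.0] * 4 for _ in range(4)])  # smooth, wide penumbra
busy = select_samples([[1.0] * 4 for _ in range(4)])  # tight penumbra / edge
```

Because the lowest threshold in the tile is zero, one pixel per 4x4 tile is always selected even when the importance is zero everywhere, and every pixel is selected once the importance reaches one. That gives the behavior described above: a guaranteed sparse base sampling that densifies where the surface or shadow changes quickly.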
So, the question you may be asking is: how do you interpolate non-uniformly spaced samples? The previous two methods discussed here computed at a lower resolution where all samples are equally spaced. Bilinear interpolation suffices there and we are all quite familiar with that. It turns out that the answer was provided over 20 years ago in a SIGGRAPH ’87 paper by Don P. Mitchell titled “Generating Antialiased Images at Low Sampling Densities”. This method was also used by Gortler et al. in “The Lumigraph” at SIGGRAPH ’96 and by Grossman in his ’97 thesis “Point Sample Rendering”.
The concept is that you start with a screen sized buffer containing your initial sparse sampling. All pixels containing a value are given a weight of 1 and all others are given a weight of 0. In the “Pull” stage, you construct a pyramid of lower resolution images (each one half the resolution of the previous, like mipmaps) with accompanying weights calculated as a sum of the weights in the previous higher resolution map. Then in the following “Push” phase you reconstruct each successive higher resolution image by filling in gaps with values from the lower resolution image based on the accumulated weights from the Pull phase. I prefer to keep the math and notation out of the blog, so check out pages 23-24 of Grossman’s thesis for a thorough explanation. This method is a bit hairier than the other two, so it would probably only pay off in situations where your computation is fairly expensive.
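For those who would rather read code than notation, here is a stripped-down sketch of the two phases in Python. It is a simplified variant under my own assumptions, not a transcription of Grossman's formulation: it uses a 2x2 box filter on the way down, a nearest-neighbor lookup on the way up, and clamps the accumulated weight to 1 when blending. The function names are made up for this example.

```python
def pull(values, weights):
    """Build one coarser level: weights are summed, values weight-averaged."""
    h, w = len(values), len(values[0])
    ch, cw = (h + 1) // 2, (w + 1) // 2
    cv = [[0.0] * cw for _ in range(ch)]
    cwt = [[0.0] * cw for _ in range(ch)]
    for y in range(h):
        for x in range(w):
            cwt[y // 2][x // 2] += weights[y][x]
            cv[y // 2][x // 2] += weights[y][x] * values[y][x]
    for y in range(ch):
        for x in range(cw):
            if cwt[y][x] > 0.0:
                cv[y][x] /= cwt[y][x]
    return cv, cwt


def push(fine_v, fine_w, coarse_v):
    """Fill gaps in the finer level from the reconstructed coarser level."""
    out = []
    for y, row in enumerate(fine_v):
        out_row = []
        for x, v in enumerate(row):
            a = min(fine_w[y][x], 1.0)  # confidence in the finer value
            out_row.append(a * v + (1.0 - a) * coarse_v[y // 2][x // 2])
        out.append(out_row)
    return out


def push_pull(values, weights):
    """Reconstruct a dense image from sparse samples (weight 1 = sample)."""
    levels = [(values, weights)]
    while len(levels[-1][0]) > 1 or len(levels[-1][0][0]) > 1:
        levels.append(pull(*levels[-1]))
    result = levels[-1][0]
    for v, w in reversed(levels[:-1]):
        result = push(v, w, result)
    return result


# 4x4 buffer with two known samples in opposite corners.
values = [[0.0] * 4 for _ in range(4)]
weights = [[0.0] * 4 for _ in range(4)]
values[0][0], weights[0][0] = 1.0, 1.0
values[3][3], weights[3][3] = 3.0, 1.0
dense = push_pull(values, weights)
```

The known samples survive untouched because their weight is 1, pixels near a sample pick up its value from the first coarser level that contains it, and pixels far from everything fall back to the average stored at the top of the pyramid. The nearest-neighbor lookup makes the result blocky; a smoother filter in the push phase would be the first thing to improve.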
I should also note that the adaptive soft shadows paper is an extension of Guennebaud’s 2006 paper “Real-time Soft Shadow Mapping by Backprojection”, which is also cool (and simpler).