Imperfect Shadow Maps

December 19, 2008 at 3:41 pm (Papers)

“Imperfect Shadow Maps for Efficient Computation of Indirect Illumination” by Ritschel et al., a real-time indirect lighting can be summarized as follows: it solves the visibility problem present in the paper “Splatting Indirect Illumination” by Dachsbacher and Stamminger.

The splatting indirect illumination method works by rendering what the authors call a reflective shadow map. A RSM is a collection of images that capture information of surfaces visible from a light source. The RSM is then sampled to choose surfaces that will be used as Virtual Point Lights. Indirect lighting is then calculated as the sum of the direct lighting contribution of these VPLs. The idea of approximating radiosity with point lights was first described in the paper Instant Radiosity. In order to light the scene with each VPL, the method performs deferred shading by rendering some proxy geometry that bounds the influence of the light and effectively splats the illumination from that (indirect) light onto the scene.

The problem with this method is that the illumination is splatted onto the scene without any information about the visibility of that VPL. The surface being splatted upon could be completely obscured by an occluder, but would receive the full amount of bounced lighting. What you would really need here is a shadow map rendered for each VPL. But in order to get good indirect illumination you need hundreds or thousands of VPLs, which requires hundreds or thousands of shadow maps. Let’s face it, that ain’t happenin’ in real-time. First of all, you’d have to render your scene X number of times, which means you’d have to limit the complexity of your scene or use some kind of adaptive technique like progressive meshes. But on top of that you’d have X number of draw calls, which have their own amount of overhead.

So what Imperfect Shadow Maps does is figure out a way to render hundreds or thousands of shadow maps in one draw call and with dramatically reduced amounts of geometry.

The paper achieves this by rendering 1024 paraboloid shadow maps of a sparse point representation of the scene. During preprocessing, many points are distributed uniformly across the scene. Then, n sets of ~8k points are constructed, where n is the number of VPLs the algorithm will use at run-time. The number 8k is not mentioned in the paper but the author stated this number in his SIGGRAPH Asia presentation. The points in these sets are chosen randomly. At run-time, each of the n sets of points are rendered to its respective paraboloid depth map.

Ok, you’re rendering a bunch of sparse points to a low-res (128×128 or less) shadow map. As you may suspect, it’s going to look like garbage:


It’s a Cornell box, can’t you tell?

The authors get clever here and use pull-push upsampling to fill holes between the points, being smart and using some thresholding to make sure they dont fill holes around depth discontinuties. Anyway, after the holes are filled the shadow maps still kind of look bad:


But it doesn’t matter so much because the indirect illumination is smooth and you’re going to adding the contribution of hundreds of these things at each pixel, so the incorrect visibility of each individual VPL gets smoothed out in the end.

That’s the basic idea.

The authors present some other cool things in the paper, like how to adaptively choose VPLs from the RSMs, and they also use the trick from “Non-interleaved Deferred Shading of Interleaved Sample Patterns” (talked about here) and only process a subset of the VPLs at each pixel.

Also, there is a paper that just got accepted to I3D called “Multiresolution Splatting for Indirect Illumination” by Nichols and Wyman that is a perfect fit for this paper. I’ll probably post a bit about that tomorrow.

Imperfect Shadow Maps for Efficient Computation of Indirect Illumination

Tobias Ritschel, Thorsten Grosch, Min H. Kim, Hans-Peter Seidel, Carsten Dachsbacher, Jan Kautz ACM Trans. on Graphics (Proceedings SIGGRAPH Asia 2008), 27(5), 2008.

Splatting indirect illumination

Dachsbacher, C. and Stamminger, M. 2006. In Proceedings of the 2006 Symposium on interactive 3D Graphics and Games (Redwood City, California, March 14 – 17, 2006). I3D ’06. ACM, New York, NY, 93-100.

Permalink 6 Comments

Let’s Have a Min/Max Party

December 11, 2008 at 5:54 am (News, Papers)

Today I was waiting for a session to begin at SIGGRAPH ASIA and began to think about how there are several cool papers that exploit min/max  images. A min/max image is an image pyramid that is sort of like a quadtree. The bottom level of the hierarchy is the original image while the elements in each subsequent level of the hierarchy contain the minimum and maximum of four elements in the previous level. So it’s sort of like a mip map, but instead of averaging values, you store the min and max of the previous level. This min/max hierarchy can be generated quickly in log n passes but can be used for making conservative estimations for large regions of your image. Refer to the following papers:

Maximum Mipmaps for Fast, Accurate, and Scalable Dynamic Height Field Rendering by A. Tevs, I. Ihrke, H.-P. Seidel

– Uses min/max maps to ray trace height fields. I feel like this idea has been around for ages but here it is all packaged up with a neat little bow.

Fast GPU Ray Tracing of Dynamic Meshes using Geometry Images by Nathan Carr, Jared Hoberock, Keenan Crane, John C. Hart.

– Uses min/max hierarchies of Geometry Images to accelerate the ray tracing of meshes.

Real-time Soft Shadow Mapping by Backprojection by Gaël Guennebaud, Loïc Barthe, Mathias Paulin
High-Quality Adaptive Soft Shadow Mapping by Gaël Guennebaud, Loïc Barthe, Mathias Paulin

–  I’ve ranted about these papers before. These works generate min/max hierarchies of shadow camera depth images to perform efficient blocker searches for soft shadow rendering, and also to determine penumbra regions for further optimization.

March of the Froblins SIGGRAPH course notes by Jeremy Shopf, Joshua Barczak, Christopher Oat, Natalya Tatarchuk

– Used a min/max hierarchy of the depth buffer to occlusion cull agents in our crowd simulation. Technically this only used the max portion of the hierarchy, but I didn’t want to title this Let’s have a Min/Max Party (Min is Optional).

Anyway, I think it’s kind of neat. I’m going to make another post tomorrow night about an awesome paper that’s here at the conference but I don’t want to write about it until I have a chance to clear up some nebulous parts of the paper with the author.

In other news, I received official word that the GDC lecture I proposed was accepted so I guess I will be seeing some of you in San Francisco next year in March. I’m excited about this talk because it came directly out of a post on this blog. Turns out this isn’t a waste of time after all!

Permalink 2 Comments

Pixel-Correct Shadow Maps with Temporal Reprojection …

November 6, 2008 at 10:14 pm (Papers) (, )

.. and Shadow Test Confidence!

Image from the paper depicting convergence on pixel-correct shadows

Image from the paper depicting convergence on pixel-correct shadows

This paper is in the running for both the longest graphics paper title and neatest lil’ shadow rendering method in recent memory. It’s pretty cool in it’s own right but I’ve had a special affection for papers that exploit temporal coherence as of late. I’d like to implement this in the near future (wouldn’t think it would take much longer than writing two blog posts.. why do I do this again?!) to see how usable it is in practice.

So.. the basic idea here is that each rasterization of the scene used for a shadow map provides a limited amount of information about occluders from the view point of the light (due to its discrete nature). This is the source of spatial aliasing (blockiness). However, rasterizations of the scene over several frames provides much more information. This paper exploits this fact to generate pixel-correct shadows by “honing in” on the correct answer over many frames and relying on the human eye’s inability to adapt quickly to notice this adaptive process. This method is lightweight, simple and should fit right into an existing rendering pipeline.

Here are the salient points of the paper:

– The method uses a screen-space history buffer that maintains information about per-pixel visibility over the past few frames. This is similar to the reprojection cache (another awesome temporally-exploitive paper, links below).

– Algorithm consists of four steps:

1) Calculate current frame’s per-pixel visibility using traditional shadow mapping.

2) Transform each pixel to the history buffer by transforming the pixel’s position using the transformation matrices of the camera in the current and previous frame.

3) Update the history buffer using this frame’s visibility test results.

4) Shadow scene with updated history buffer.

– The history buffer is updated with exponential smoothing according to some confidence value that describes how close the sample is to a correct visibility result.

– The confidence value is calculated as the distance between a pixel’s position in shadow map space and the closest shadow map texel center. This makes sense.. a scene position that maps exactly onto a shadow map texel has a correct visibility test result.

– The shadow map must contain different rasterizations of the scene over time or no new information is added to the system. This is achieved by sub-pixel jittering in both translation and rotation of the shadow camera.


Daniel Scherzer, Stefan Jeschke, Michael Wimmer. Pixel-Correct Shadow Maps with Temporal Reprojection and Shadow Test Confidence. In Rendering Techniques 2007 (Proceedings Eurographics Symposium on Rendering).

P. Sitthi-amorn, J. Lawrence, L. Yang, P. V. Sander, D. Nehab. An Improved Shading Cache for Modern GPUs. ACM SIGGRAPH Symposium on Graphics Hardware 2008.

D. Nehab, P. V. Sander, J. Lawrence, N. Tatarchuk, J. Isidoro. Accelerating Real-Time Shading with Reverse Reprojection Caching. ACM SIGGRAPH Symposium on Graphics Hardware 2007.

Permalink 2 Comments

Larrabee paper and articles

August 4, 2008 at 11:20 am (Papers, Presentations) (, )

Amidst a flurry of articles from technical websites, Intel also released the paper (non-ACM link) on the Larrabee architecture that will be presented at SIGGRAPH next week.

Articles discussing some details that were released in a presentation by Larry Siler:

Permalink 6 Comments

Geometry-Aware Framebuffer Level of Detail

July 6, 2008 at 10:26 pm (Papers)

Image from \
Comparison image from “Geometry-Aware Framebuffer Level of Detail”

Here’s a nice compact little paper from GH 2008 on scaling down shading computation in a rendered scene.

1) Render G-buffer (normals, depth) at a lower resolution (full res * some resizing factor r )

2) Render scene, using bilateral filter (orig paper, SIGGRAPH course, bilateral upsampling ) to upsample some or all of the shading components in a discontinuity respecting manner. Expensive, lower frequency computations should be performed at the lower resolution while inexpensive high-frequency shading (such as specular) should be done at normal resolution.

3) Adjust r based on current framerate vs. some baseline desired framerate to maintain a more constant framerate.

This method is limited by the fact that you have to render the scene geometry twice and highly detailed geometry may introduce artifacts when upsampling, but it is interesting and definitely worth a quick read.

Geometry-Aware Framebuffer Level of Detail
Lei Yang, Pedro V. Sander, Jason Lawrence
Eurographics Symposium on Rendering 2008

Permalink 2 Comments

Non-interleaved Deferred Shading of Interleaved Sample Patterns

June 25, 2008 at 11:19 pm (Papers)

Continuing with the theme of my last substantive post (it’s been awhile, but it was Keepin’ It Low-Res), here’s a paper from Graphics Hardware 2006 that deals with computing lighting at a resolution lower than the screen resolution. The exact application is in this case is Instant Radiosity style global illumination. As a refresh, instant radiosity approximates bounce lighting by tracing rays from the light sources and placing point lights where the ray intersects the scene to approximate reflected light. The radiance equation can then be approximated with a Monte Carlo integration of radiance computed by evaluating a set of point lights.

With a deferred rendering pipeline, you can reduce some of the shader bottlenecks you’d encounter evaluating hundreds of these placed point lights and shadow map comparisons by rendering proxies for each light source so that shading is computed only at relevant pixels. However, it would be even better if only had to evaluate a subset of those lights per-pixel. Obviously, that is the goal of this paper. The devil is in the details though, as performing incoherent lighting calculations (as you may do in a naive implementation) doesn’t jive so well on graphics hardware where coherency is key.

Let’s first talk about the naive implementation mentioned in the last paragraph. How might you first go about selectively shading different pixels with a different set of lights? You could start out by using dynamic flow control. Something like:

if ScreenSpacePos.x % 2 == 0 && ScreenSpacePos.y % 2 == 0

    for( int i = 0; i < nLightsSet1; i++ )
        // Compute shading for light i, in set 1
else if ScreenSpacePos.x % 2 == 1 && ScreenSpacePos.y % 2 == 0
    for( int i = 0; i < nLightsSet2; i++ )
       // Compute shading for light i, in set 2
else if ... // The other two cases

This is obviously bad for coherency. Within a group of pixels assigned to one SIMD, you are guaranteed to follow all four paths and essentially all four paths will be executed for each pixel in that SIMD group. So you will be doing 4x more work per-pixel than if all pixels in that group were from VPL set 1. You could similarly try and use a stencil buffer to mask all pixels for a given light set, but given the alternating light set pattern the stencil mask is incoherent and you won’t get any of the benefits of Hi-Stencil. The same thing goes for using depth to mask pixels (Hi-Z is trashed). I’m pretty sure in the last few generations of hardware that there isn’t a performance difference between using stencil or depth to cull computation, FYI.

I think you’re getting the idea. All of the GBuffer texels for a given light set should be computed coherently, so ideally all of the aforementioned GBuffer texels should be organized next to each other in the GBuffer. As an aside, it is important to understand that textures are block allocated on the GPU. They are allocated in 2D blocks because texture fetches are spatially coherent in 2D. When you fetch one texel, you’re typically going to fetch another one in the 2D neighborhood of the last fetch. Therefore when I say that GBuffer texels for a given light set should be next to each other in the Gbuffer I mean in blocks as depicted in Figure 1:

Figure 1 Left: Interleaved texels Center: Correctly coalesced texels Right: Non-cache friendly coalescence

In order to reorganize the GBuffer texels in such a fashion, the authors first suggest a one-pass approach. Say we have four VPL light sets. While we want them uniformly distributed across the viewport in a regular fashion so that we can interpolate the shaded values at the end, we want them organized in blocks to be coherent when shading. Texel (x,y) in the shuffled GBuffer should be mapped to texel ( (x % 2)*2 + x/2, (y%2)*2 + y/2) in the initial GBuffer. This will work fine, but the fetching of the initial GBuffer texels is incoherent (word count: 23) because they’re being fetched from all over the GBuffer instead of in a smaller neighborhood.

The authors suggest a two-pass approach instead. Rather than jumping straight to the shuffling described in the last paragraph, the shuffling is performed in cache-friendly blocks.

The shading is then performed on the reorganized GBuffer and then a de-swizzle or “gathering” step is performed to undo the mapping that was performed in previous steps. When the shaded pixels are de-swizzled, each pixel only has the shading for one subset of the overall set of lights in the scene. The authors discuss some clever little tricks to quickly filter these shading values in a discontinuity-respecting manner so that the computed irradiance is smooth across the image. I’ll leave that for you to parse from the paper.

Non-interleaved Deferred Shading of Interleaved Sample Patterns

Benjamin Segovia, Jean-Claude Iehl , Richard Mitanchey and Bernard Péroche Proceedings of SIGGRAPH/Eurographics Workshop on Graphics Hardware 2006

Permalink 4 Comments

SIGGRAPH 2008 Papers Posted

April 28, 2008 at 2:13 pm (Papers)

Not many papers are linked at this time, but Ke-Sen Huang has updated his graphics’ paper archive with SIGGRAPH 2008 papers. I am especially interested in “3D Unsharp Masking for Scene Coherent Enhancement”,  “Real-Time Smoke Rendering Using Compensated Ray Marching”,  and “Real-Time, All-Frequency Shadows in Dynamic Scenes”, but there are many papers that sound like they could be very interesting.

Permalink Leave a Comment

Keepin’ it Low-Res

April 24, 2008 at 3:24 am (Papers)

Lately I’ve been really interested in performing shading calculations at a frequency lower than per-pixel. Depending on factors like the frequency of the lighting effect and surface orientation relative to camera, you can get away with computing values at a lower resolution or at an adaptive sampling.

Off-screen particles

The likely most used technique along these lines is the “Off-screen Particles” method that NVIDIA first used in the Vulcan demo and then wrote up later in GPU Gems 3. The basic idea is that you blend your particles into a lower than screen resolution texture to save on fill rate. Particles, like smoke, are notorious fill rate hogs due to massive amounts of overdraw and potentially expensive per-pixel lighting computations. In order to get correct particle occlusion, a manually downsampled depth buffer has to be used for depth testing the lower resolution particle buffer. This introduces problems near particle-scene intersections due to gaps created by this low resolution depth testing. This is resolved by performing high-resolution particle blending in problem areas. Problem areas are identified with an edge detection filter performed on the low res buffer’s alpha channel. However, I have heard from a few developer friends that in most cases this fix-up pass to correct problems is not needed because artifacts are not noticable in the majority of cases.

An additional optimization is to render particles in to one of a few different resolution buffers. For example, particles that are very close the camera likely needed very little screen space resolution because higher frequency features are covering large amounts of screen real estate. Additionally, particles closest to the camera are going to contribute the most to overall fill rate (compared to other scene particles) because they are likely taking up most if not all of the screen pixels. You’d mostly see this situation in non-uniform volumetric fog or dense smoke conditions.

For more info on the original method, see the online version of the Vulcan paper or take a look at the “High-Speed, Off-Screen Particles” article in GPU Gems 3.

Bilateral Upsampling

A few months ago I mentioned the Bilateral Upsampling method used by Sloan et al. in their paper “Image-Based Proxy Accumulation for Real-Time Soft Global Illumination”. The concept is to bilinearly interpolate lower resolution illumination results while applying a weighting function to interpolants so that values aren’t interpolated across boundaries. In some situations it is required to perform additional computation near boundaries. This situation is handled similarly to the off-screen particle method: edge detection and selective higher resolution calculation in edge regions. Read my previous entry and the paper for more info.

Adaptive Soft Shadows with Push-Pull Reconstruction
Orange indicates pixels where no computation was performed

(Images from the paper: Orange pixels are where computation is skipped)

Another trick was used by Gaël Guennebaud et al. in the 2007 Eurographics paper “High-Quality Adaptive Soft Shadow Mapping”. This trick computes soft shadow amounts at adaptive resolution rather than simply at a lower resolution as in the previously discussed methods. While not as simple to upsample, computing values adaptively based on surface orientation and shadow frequency allocates more fidelity in regions that need it. The other methods simply pick some lower resolution and forgo any higher frequency information. Adaptive sample locations are determined dynamically by computing a per-pixel value in screenspace pre-pass based on surface normal and penumbra width (determined by examining the shadow map) and thresholding it against a repeated pattern. This is sort of along the lines of what is done with dithering. This pattern is a bit stochastic such that some pixels are bound to be identified as needing computation regardless of the scene/visibility complexity contained in those pixels, but more or all pixels inside the pattern will be identified in cases where the surface is quickly changing or the penumbra is very tight. After pixels are identified, a relatively expensive soft shadow technique is used to compute visibilty at those locations. These visibility values are then interpolated to fill in all screen pixels.

So, the question you may be asking is.. how do you interpolate non-uniformly spaced samples? The previous two methods discussed here computed at a lower resolution where all samples are equally spaced. Bilinear interpolation suffices there and we are all quite familiar with that. It turns out that the answer was provided over 20 years ago in a SIGGRAPH ’87 paper by Don P. Mitchell titled “Generating Antialiased Images at Low Sampling Densities”. This method was also used by Gortler et al. in “The Lumigraph” at SIGGRAPH 96 and by Grossman in his ’97 thesis “Point Sample Rendering“.

The concept is that you start with a screen sized buffer containing your initial sparse sampling. All pixels containing a value are given a weight of 1 and all others are given a weight of 0. In the “Pull” stage, you construct a pyramid of lower resolution images (each one half the resolution of the previous, like mipmaps) with accompanying weights calculated as a sum of the weights in the previous higher resolution map. Then in the following “Push” phase you reconstruct each successive higher resolution image by filling in gaps with values from the lower resolution image based on the accumulated weights from the Pull phase. I prefer to keep the math and notation out of the blog so check out page 23-24 of Grossman’s thesis for a thorough explanation. This method is a bit hairier than the other two so it would probably only pay off in situations when you computation is fairly expensive.

I should also note that the adaptive soft shadows paper is an extension on Guennebaud’s 2006 paper “Real-time Soft Shadow Mapping by Backprojection” which is also cool (and simpler).

Permalink 9 Comments

Light Indexed Deferred Rendering

January 24, 2008 at 3:19 pm (Demos, Papers)


Damian Trebilco posted a paper and demo of an interesting approach to deferred rendering on the Beyond3D forums. Instead of rendering out several buffers of geometry information (gbuffers), the author renders the light volumes with unique IDs into a buffer. Standard forward rendering is then used and the per-pixel light indices are used to index light information. The great benefit of doing it this way is that you don’t have all of the bandwidth of outputting these gbuffers. Once you turn on MSAA, these already large buffers become increasingly costly. Another benefit is that handling transparent surfaces becomes much easier.

With this approach, the worry of packing geometry and material information into as few gbuffers as possible is replaced with the worry of storing your light IDs and handling the max number of lights that might overlap one pixel. There are a few other gotchas, but you should read the paper for a comparison with standard forward rendering and traditional deferred rendering. Worth a read!

Permalink 3 Comments

I3D 2008 Papers and Registration

January 16, 2008 at 4:54 am (Papers)


I3D 2008 (Feb 15th-17th) is creeping up on us quickly. Early registration ends in about 15 minutes @ the I3D website. Papers have begun to trickle into Kensen Huang’s graphics conference paper page for I3D 2008.

A few papers of interest that have jumped out at me are:

Bouthors et al. Interactive multiple anisotropic scattering in clouds ( gallery/thread on

Modeling anisotropic light scattering in clouds with beautiful results.

Kim Hardware-Aware Analysis and Optimization of Stable Fluids

The author analyzes performance of Stam’s Stable Fluids algorithm in terms of load/store : ALU ratio and access patterns, reports experimental results supporting theorized performance, and offers two optimizations to alleviate the bandwidth bottleneck.

Wyman Hierarchical Caustic Maps

Haven’t had a chance to look at this one yet but caustics papers always warrant a look IMHO.

Kloetzli et al. Interactive Volume Isosurface Rendering using BT Volumes

Paper isn’t publicly available yet but I’ve seen the algorithm running and it is sweet.

Permalink 2 Comments

Next page »