Graphics Hardware 2008 Slides

July 2, 2008 at 3:36 pm (Presentations)

Graphics Hardware 2008 organizers have begun to post slides from the conference here. I don’t think the papers are online yet and the proceedings didn’t come with a DVD this year so I guess everyone will just have to wait :)


Non-interleaved Deferred Shading of Interleaved Sample Patterns

June 25, 2008 at 11:19 pm (Papers)

Continuing with the theme of my last substantive post (it’s been a while, but it was Keepin’ It Low-Res), here’s a paper from Graphics Hardware 2006 that deals with computing lighting at a resolution lower than the screen resolution. The exact application in this case is Instant Radiosity style global illumination. As a refresher, instant radiosity approximates bounce lighting by tracing rays from the light sources and placing virtual point lights (VPLs) where the rays intersect the scene to approximate reflected light. The radiance equation can then be approximated with a Monte Carlo integration of radiance computed by evaluating a set of point lights.

With a deferred rendering pipeline, you can reduce some of the shader bottlenecks you’d encounter evaluating hundreds of these placed point lights and shadow map comparisons by rendering proxies for each light source so that shading is computed only at relevant pixels. However, it would be even better if you only had to evaluate a subset of those lights per pixel. That is the goal of this paper. The devil is in the details though, as performing incoherent lighting calculations (as you might in a naive implementation) doesn’t jibe well with graphics hardware, where coherency is key.

Let’s first talk about the naive implementation mentioned in the last paragraph. How might you first go about selectively shading different pixels with a different set of lights? You could start out by using dynamic flow control. Something like:

if( ScreenSpacePos.x % 2 == 0 && ScreenSpacePos.y % 2 == 0 )
{
    // Even column, even row: this pixel belongs to light set 1
    for( int i = 0; i < nLightsSet1; i++ )
    {
        // Compute shading for light i, in set 1
    }
}
else if( ScreenSpacePos.x % 2 == 1 && ScreenSpacePos.y % 2 == 0 )
{
    // Odd column, even row: this pixel belongs to light set 2
    for( int i = 0; i < nLightsSet2; i++ )
    {
        // Compute shading for light i, in set 2
    }
}
else if ... // The other two cases

This is obviously bad for coherency. Within a group of pixels assigned to one SIMD, all four paths are guaranteed to be taken, so essentially all four paths will be executed for each pixel in that SIMD group. You will be doing roughly 4x more work per pixel than if all pixels in that group were from VPL set 1. You could similarly try to use a stencil buffer to mask all pixels for a given light set, but given the alternating light set pattern the stencil mask is incoherent and you won’t get any of the benefits of Hi-Stencil. The same thing goes for using depth to mask pixels (Hi-Z is trashed). I’m pretty sure that in the last few generations of hardware there isn’t a performance difference between using stencil or depth to cull computation, FYI.

I think you’re getting the idea. All of the GBuffer texels for a given light set should be computed coherently, so ideally all of those texels should be organized next to each other in the GBuffer. As an aside, it is important to understand that textures are block allocated on the GPU. They are allocated in 2D blocks because texture fetches are spatially coherent in 2D. When you fetch one texel, you’re typically going to fetch another one in the 2D neighborhood of the last fetch. Therefore, when I say that GBuffer texels for a given light set should be next to each other in the GBuffer, I mean in blocks as depicted in Figure 1:

Figure 1 Left: Interleaved texels Center: Correctly coalesced texels Right: Non-cache friendly coalescence

In order to reorganize the GBuffer texels in such a fashion, the authors first suggest a one-pass approach. Say we have four VPL light sets. While we want them uniformly distributed across the viewport in a regular fashion so that we can interpolate the shaded values at the end, we want them organized in blocks to be coherent when shading. Texel (x, y) in the shuffled GBuffer should be mapped to texel ( (x % 2)*2 + x/2, (y % 2)*2 + y/2 ) in the initial GBuffer. This will work fine, but the fetching of the initial GBuffer texels is incoherent because they’re being fetched from all over the GBuffer instead of in a smaller neighborhood.
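To make the one-pass mapping concrete, here is a small sketch in Python (CPU side, purely illustrative). The function names, the `width`/`height` parameters and the pattern size `n` are my own assumptions, not the paper's notation. As far as I can tell, the formula quoted above is the special case of a 4x4 buffer with a 2x2 interleaving pattern, so the sketch generalizes the constant 2 to the block size.

```python
def shuffled_to_initial(x, y, width, height, n=2):
    """Map a texel of the shuffled (block-coalesced) GBuffer back to the
    texel of the initial, interleaved GBuffer it should be read from.
    Assumes width and height are multiples of the n x n pattern size."""
    bw, bh = width // n, height // n  # size of one light-set block
    # x // bw selects the light set, x % bw the position inside its block
    return (x % bw) * n + x // bw, (y % bh) * n + y // bh


def initial_to_shuffled(x, y, width, height, n=2):
    """Inverse mapping, i.e. where an interleaved texel lands after the shuffle."""
    bw, bh = width // n, height // n
    # x % n selects the light set, x // n the position inside its block
    return (x % n) * bw + x // n, (y % n) * bh + y // n
```

The inverse mapping is also what a de-swizzle step would need after shading, since it tells you where each screen pixel's result ended up in the shuffled buffer.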

The authors suggest a two-pass approach instead. Rather than jumping straight to the shuffling described in the last paragraph, the shuffling is performed in cache-friendly blocks.

The shading is then performed on the reorganized GBuffer and then a de-swizzle or “gathering” step is performed to undo the mapping that was performed in previous steps. When the shaded pixels are de-swizzled, each pixel only has the shading for one subset of the overall set of lights in the scene. The authors discuss some clever little tricks to quickly filter these shading values in a discontinuity-respecting manner so that the computed irradiance is smooth across the image. I’ll leave that for you to parse from the paper.

Non-interleaved Deferred Shading of Interleaved Sample Patterns

Benjamin Segovia, Jean-Claude Iehl, Richard Mitanchey and Bernard Péroche. Proceedings of SIGGRAPH/Eurographics Workshop on Graphics Hardware 2006


ATI HD 4870 and 4850

June 25, 2008 at 5:01 pm (News)

*Shameless home-team raving content below*

The NDA on the new ATI hardware (R770) was lifted today, so a boat-load of reviews are out. And they’re all glowing. I’ve been with the company for about a year and a half and NVIDIA has had us pinned down from the day I started until now. It’s nice to have a winning product out there. If you’re thinking about buying a mid-range graphics card ($199-$299), or you want to help ensure that I have a job in the future, now is the time.

Techreport
Anandtech
Hexus
Hardware.fr (french)
arstechnica
Extremetech
Hardwarecanucks
computerbase (german)
Hothardware
Rage3d
hardocp


Hold it now

June 23, 2008 at 10:21 am (Uncategorized)

So.. it’s been about 2 months since I posted on this blog. I have every intention of posting again! I have been crunching like a mutha for the past two months and haven’t had time to form coherent, non-demo-related thoughts, let alone do normal things like prepare my own food or enjoy a sunny afternoon. But that’s over now, so I hope to start updating this blog regularly. I realized that I haven’t posted anything related to demos that I’ve worked on at ATI, so I might do that first. Coming soon.


SIGGRAPH 2008 Papers Posted

April 28, 2008 at 2:13 pm (Papers)

Not many papers are linked at this time, but Ke-Sen Huang has updated his graphics paper archive with SIGGRAPH 2008 papers. I am especially interested in “3D Unsharp Masking for Scene Coherent Enhancement”, “Real-Time Smoke Rendering Using Compensated Ray Marching”, and “Real-Time, All-Frequency Shadows in Dynamic Scenes”, but there are many papers that sound like they could be very interesting.

http://kesen.huang.googlepages.com/sig2008.html


Keepin’ it Low-Res

April 24, 2008 at 3:24 am (Papers)

Lately I’ve been really interested in performing shading calculations at a frequency lower than per-pixel. Depending on factors like the frequency of the lighting effect and surface orientation relative to the camera, you can get away with computing values at a lower resolution or with adaptive sampling.

Off-screen particles

Probably the most widely used technique along these lines is the “Off-screen Particles” method that NVIDIA first used in the Vulcan demo and then wrote up later in GPU Gems 3. The basic idea is that you blend your particles into a lower than screen resolution texture to save on fill rate. Particles, like smoke, are notorious fill rate hogs due to massive amounts of overdraw and potentially expensive per-pixel lighting computations. In order to get correct particle occlusion, a manually downsampled depth buffer has to be used for depth testing the lower resolution particle buffer. This introduces problems near particle-scene intersections due to gaps created by this low resolution depth testing. This is resolved by performing high-resolution particle blending in problem areas. Problem areas are identified with an edge detection filter performed on the low res buffer’s alpha channel. However, I have heard from a few developer friends that this fix-up pass is usually not needed because the artifacts are not noticeable in the majority of cases.

An additional optimization is to render particles into one of a few different resolution buffers. For example, particles that are very close to the camera likely need very little screen space resolution because their higher frequency features are covering large amounts of screen real estate. Additionally, particles closest to the camera are going to contribute the most to overall fill rate (compared to other scene particles) because they are likely taking up most if not all of the screen pixels. You’d mostly see this situation in non-uniform volumetric fog or dense smoke conditions.

For more info on the original method, see the online version of the Vulcan paper or take a look at the “High-Speed, Off-Screen Particles” article in GPU Gems 3.

Bilateral Upsampling

A few months ago I mentioned the Bilateral Upsampling method used by Sloan et al. in their paper “Image-Based Proxy Accumulation for Real-Time Soft Global Illumination”. The concept is to bilinearly interpolate lower resolution illumination results while applying a weighting function to the interpolants so that values aren’t interpolated across boundaries. In some situations additional computation is required near boundaries. This situation is handled similarly to the off-screen particle method: edge detection and selective higher resolution calculation in edge regions. Read my previous entry and the paper for more info.
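Here is a rough CPU-side sketch of the idea in Python, just to show where the extra weight goes. Everything specific here is my assumption rather than the paper's formulation: the function name, using depth alone as the boundary term (normals are often used as well), the `1 / (eps + |depth difference|)` falloff, and the fixed integer `scale` between the two resolutions.

```python
import math


def bilateral_upsample(low, low_depth, hi_depth, scale=2, eps=1e-4):
    """Upsample the low-res buffer `low` to the resolution of `hi_depth`.
    Each of the four bilinear taps is additionally weighted by how closely
    its low-res depth matches the depth of the high-res pixel."""
    lh, lw = len(low), len(low[0])
    hh, hw = len(hi_depth), len(hi_depth[0])
    out = [[0.0] * hw for _ in range(hh)]
    for y in range(hh):
        for x in range(hw):
            # Position of this pixel in low-res texel space (centers at +0.5)
            fx = (x + 0.5) / scale - 0.5
            fy = (y + 0.5) / scale - 0.5
            x0, y0 = math.floor(fx), math.floor(fy)
            tx, ty = fx - x0, fy - y0
            total = wsum = 0.0
            for dy, wy in ((0, 1.0 - ty), (1, ty)):
                for dx, wx in ((0, 1.0 - tx), (1, tx)):
                    sx = min(max(x0 + dx, 0), lw - 1)  # clamp to the border
                    sy = min(max(y0 + dy, 0), lh - 1)
                    # Boundary term: large when the depths agree, tiny otherwise
                    w_depth = 1.0 / (eps + abs(hi_depth[y][x] - low_depth[sy][sx]))
                    w = wx * wy * w_depth
                    total += w * low[sy][sx]
                    wsum += w
            out[y][x] = total / wsum
    return out
```

With uniform depth this reduces to plain bilinear interpolation. Across a depth discontinuity the taps from the other side of the edge are almost entirely suppressed, which is the behavior you want.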

Adaptive Soft Shadows with Push-Pull Reconstruction

(Images from the paper: Orange pixels are where computation is skipped)

Another trick was used by Gaël Guennebaud et al. in the 2007 Eurographics paper “High-Quality Adaptive Soft Shadow Mapping”. This trick computes soft shadow amounts at adaptive resolution rather than simply at a lower resolution as in the previously discussed methods. While not as simple to upsample, computing values adaptively based on surface orientation and shadow frequency allocates more fidelity in regions that need it. The other methods simply pick some lower resolution and forgo any higher frequency information. Adaptive sample locations are determined dynamically by computing a per-pixel value in a screen-space pre-pass based on surface normal and penumbra width (determined by examining the shadow map) and thresholding it against a repeated pattern. This is sort of along the lines of what is done with dithering. This pattern is a bit stochastic such that some pixels are bound to be identified as needing computation regardless of the scene/visibility complexity contained in those pixels, but more or all pixels inside the pattern will be identified in cases where the surface is quickly changing or the penumbra is very tight. After pixels are identified, a relatively expensive soft shadow technique is used to compute visibility at those locations. These visibility values are then interpolated to fill in all screen pixels.
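The thresholding step can be sketched in a few lines of Python. To be clear about what is assumed: the paper uses its own pattern, and I am swapping in a standard 4x4 Bayer dither matrix purely for illustration. The `importance` value in [0, 1] stands in for whatever the pre-pass derives from the surface normal and penumbra width, and the function name is hypothetical.

```python
# Standard 4x4 ordered-dither (Bayer) matrix, used here as a stand-in pattern
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def needs_sample(x, y, importance):
    """Decide whether pixel (x, y) gets the expensive computation.
    `importance` is an assumed per-pixel value in [0, 1]."""
    threshold = BAYER_4X4[y % 4][x % 4] / 16.0  # pattern repeats every 4 pixels
    return importance >= threshold
```

Because one cell of the tile has a threshold of zero, at least one pixel in every 4x4 tile is always selected, and a tile with importance 1 selects all sixteen.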

So, the question you may be asking is.. how do you interpolate non-uniformly spaced samples? The previous two methods discussed here computed at a lower resolution where all samples are equally spaced. Bilinear interpolation suffices there and we are all quite familiar with that. It turns out that the answer was provided over 20 years ago in a SIGGRAPH ’87 paper by Don P. Mitchell titled “Generating Antialiased Images at Low Sampling Densities”. This method was also used by Gortler et al. in “The Lumigraph” at SIGGRAPH ’96 and by Grossman in his ’97 thesis “Point Sample Rendering”.

The concept is that you start with a screen sized buffer containing your initial sparse sampling. All pixels containing a value are given a weight of 1 and all others are given a weight of 0. In the “Pull” stage, you construct a pyramid of lower resolution images (each one half the resolution of the previous, like mipmaps) with accompanying weights calculated as a sum of the weights in the previous higher resolution map. Then in the following “Push” phase you reconstruct each successive higher resolution image by filling in gaps with values from the lower resolution image based on the accumulated weights from the Pull phase. I prefer to keep the math and notation out of the blog, so check out pages 23-24 of Grossman’s thesis for a thorough explanation. This method is a bit hairier than the other two, so it would probably only pay off in situations where your computation is fairly expensive.
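For the curious, here is a deliberately simplified sketch of push-pull in Python. It is not the exact formulation from Grossman's thesis: I assume a square, power-of-two image, a 2x2 box filter in the Pull phase, weights clamped to 1, and nearest-neighbor lookup of the coarser level in the Push phase (a smoother filter would give nicer results). The function names are mine.

```python
def pull(values, weights):
    """Build the next coarser level: weighted average of each 2x2 block,
    with the block's accumulated weight clamped to 1."""
    n = len(values) // 2
    cv = [[0.0] * n for _ in range(n)]
    cw = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            wsum = vsum = 0.0
            for dy in (0, 1):
                for dx in (0, 1):
                    w = weights[2 * y + dy][2 * x + dx]
                    wsum += w
                    vsum += w * values[2 * y + dy][2 * x + dx]
            if wsum > 0.0:
                cv[y][x] = vsum / wsum
            cw[y][x] = min(wsum, 1.0)
    return cv, cw


def push_pull(values, weights):
    """Fill the gaps (weight 0) of a sparse image from coarser levels."""
    if len(values) == 1:
        return [row[:] for row in values]
    coarse_values, coarse_weights = pull(values, weights)
    coarse_filled = push_pull(coarse_values, coarse_weights)
    n = len(values)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            w = min(weights[y][x], 1.0)
            # Push: keep known values, fill the rest from the coarser level
            out[y][x] = w * values[y][x] + (1.0 - w) * coarse_filled[y // 2][x // 2]
    return out
```

Pixels that had a sample keep it untouched, while empty pixels pick up the value from the finest pyramid level that has any data covering them.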

I should also note that the adaptive soft shadows paper is an extension of Guennebaud’s 2006 paper “Real-time Soft Shadow Mapping by Backprojection”, which is also cool (and simpler).


Horizon Split Ambient Occlusion

February 27, 2008 at 3:55 am (Game techniques, Presentations)

I have so much I want to write about what I saw at I3D and GDC that sitting down and doing it seems daunting. I will try and post my thoughts over the next few days.

One interesting poster at I3D (and also a short talk at GDC) was an extension to SSAO. The idea is to calculate a piecewise linear approximation of the horizon à la horizon mapping. This is achieved by sampling the depth values along m steps (they used 8) in n equally spaced directions (8 again) in the tangent frame. At each depth sample, you update the horizon value if the current depth sample is higher than the current estimate of the horizon in that direction. Sampling in this fashion reduces over-occlusion. This is actually very similar to the approximation to AO that Dachsbacher and Tatarchuk described in their poster at I3D last year, “Prism Parallax Occlusion Mapping with Accurate Silhouette Generation”. All of the authors of the poster are NVIDIA guys, and they announced that they will release a whitepaper describing their method.
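Since the whitepaper isn't out yet, here is only my guess at the sampling loop, sketched in Python on a plain heightfield. The assumptions are significant: I treat the tangent plane as flat (z up) instead of building a per-pixel tangent frame from the depth buffer, I use nearest-texel lookups, and I turn each horizon into visibility with a simple `1 - sin(h)` estimate that is not taken from the poster. The function name and parameters are hypothetical.

```python
import math


def horizon_ao(height, px, py, n_dirs=8, m_steps=8, step=1.0):
    """Estimate visibility at (px, py) of a heightfield by marching m_steps
    samples in each of n_dirs equally spaced directions and tracking the
    highest horizon seen along each direction."""
    rows, cols = len(height), len(height[0])
    h0 = height[py][px]
    visibility = 0.0
    for d in range(n_dirs):
        angle = 2.0 * math.pi * d / n_dirs
        dx, dy = math.cos(angle), math.sin(angle)
        horizon = 0.0  # tangent of the horizon elevation; 0 is the flat plane
        for s in range(1, m_steps + 1):
            sx = int(round(px + dx * s * step))
            sy = int(round(py + dy * s * step))
            if not (0 <= sx < cols and 0 <= sy < rows):
                break
            slope = (height[sy][sx] - h0) / (s * step)
            if slope > horizon:  # sample rises above the current horizon
                horizon = slope
        sin_h = horizon / math.sqrt(1.0 + horizon * horizon)
        visibility += 1.0 - sin_h  # assumed estimate of the open fraction
    return visibility / n_dirs
```

A flat heightfield comes back fully visible, while a pixel at the bottom of a pit is almost entirely occluded.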


2007: The year SSAO broke

February 10, 2008 at 4:39 pm (Demos, Game techniques)


(image from RGBA demo kindernoiser)

One of my very first posts on this blog discussed the image space ambient occlusion paper by Shanmugam et al. That post has more hits than any other entry I’ve made here. So I thought I would take a few minutes and survey the state of the art in screen space ambient occlusion.

Crytek gave a presentation at SIGGRAPH 2007 in the Real-time Rendering course. They didn’t reveal any exact implementation details, but did reveal that they are only using the depth buffer from their z pre-pass (unlike the Shanmugam paper, which uses depth and normals). They additionally noted that they use a per-pixel rotated disc of sample offsets to reduce sampling artifacts. This technique is common for reducing sampling artifacts in shadow mapping.

Iñigo Quilez (of RGBA demoscene group, yay) has created a website dedicated to his use of SSAO in the RGBA demo kindernoiser. One unique contribution is that he suggests doing a post-SSAO blur to lessen sampling artifacts. He has given the most detail on the implementation of his technique, by far. A thread which follows some of the early results he was getting is @ gamedev. He also provides an analytic solution for the ambient occlusion due to a sphere here.

Megan Fox developed a technique which she refers to as “Crease shading”. It differs very little from the original Shanmugam technique. However, she provides an excellent breakdown of the technique and some comparison screenshots.

A tweak that I implemented for SSAO is to perform the ambient occlusion computation on a lower resolution texture and then upsample using a boundary-respecting filter. I suggested on this blog (here) that the bilateral upsampling technique works well. I have received email from a few people who have tried it and are getting pretty good results.

So.. a whole lot of people were very interested in SSAO. Rightly so. Approximating an expensive technique like dynamic ambient occlusion (which itself is an approximation!) in a geometry-independent manner is a valuable tool. Of course, it has some serious drawbacks, such as not being able to account for occlusion from back-facing polygons and causing over-occlusion, to name a few. If anybody knows of any other implementations/discussions of SSAO anywhere on the web, please post them here.


Light Indexed Deferred Rendering

January 24, 2008 at 3:19 pm (Demos, Papers)


Damian Trebilco posted a paper and demo of an interesting approach to deferred rendering on the Beyond3D forums. Instead of rendering out several buffers of geometry information (gbuffers), the author renders the light volumes with unique IDs into a buffer. Standard forward rendering is then used and the per-pixel light indices are used to index light information. The great benefit of doing it this way is that you don’t have all of the bandwidth of outputting these gbuffers. Once you turn on MSAA, these already large buffers become increasingly costly. Another benefit is that handling transparent surfaces becomes much easier.

With this approach, the worry of packing geometry and material information into as few gbuffers as possible is replaced with the worry of storing your light IDs and handling the max number of lights that might overlap one pixel. There are a few other gotchas, but you should read the paper for a comparison with standard forward rendering and traditional deferred rendering. Worth a read!
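To show the shape of the idea, here is a toy Python sketch. It is not the paper's implementation, which packs the IDs into render target channels on the GPU. The light layout `(id, cx, cy, radius)` with integer screen-space coordinates, the cap of four lights per pixel, the scalar light intensities and the function names are all assumptions made for illustration.

```python
MAX_LIGHTS_PER_PIXEL = 4  # assumed cap on overlapping lights


def build_light_index_buffer(width, height, lights):
    """'Render' each light volume (a screen-space circle here) and record
    its ID in every pixel it covers, up to the per-pixel cap.
    `lights` is a list of (light_id, cx, cy, radius) with integer values."""
    buf = [[[] for _ in range(width)] for _ in range(height)]
    for light_id, cx, cy, radius in lights:
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
                if inside and len(buf[y][x]) < MAX_LIGHTS_PER_PIXEL:
                    buf[y][x].append(light_id)
    return buf


def shade_pixel(x, y, index_buffer, light_table, albedo):
    """Forward-shade one pixel using only the lights indexed at that pixel.
    `light_table` maps a light ID to a scalar intensity."""
    return albedo * sum(light_table[i] for i in index_buffer[y][x])
```

The forward pass only touches the lights listed for its pixel, and any light past the cap is silently dropped, which is exactly the overlap limit you have to budget for with this approach.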


I3D 2008 Papers and Registration

January 16, 2008 at 4:54 am (Papers)


I3D 2008 (Feb 15th-17th) is creeping up on us quickly. Early registration ends in about 15 minutes @ the I3D website. Papers have begun to trickle into Ke-Sen Huang’s graphics conference paper page for I3D 2008.

A few papers of interest that have jumped out at me are:

Bouthors et al. Interactive multiple anisotropic scattering in clouds ( gallery/thread on gamedev.net)

Modeling anisotropic light scattering in clouds with beautiful results.

Kim Hardware-Aware Analysis and Optimization of Stable Fluids

The author analyzes performance of Stam’s Stable Fluids algorithm in terms of load/store : ALU ratio and access patterns, reports experimental results supporting theorized performance, and offers two optimizations to alleviate the bandwidth bottleneck.

Wyman Hierarchical Caustic Maps

Haven’t had a chance to look at this one yet but caustics papers always warrant a look IMHO.

Kloetzli et al. Interactive Volume Isosurface Rendering using BT Volumes

Paper isn’t publicly available yet but I’ve seen the algorithm running and it is sweet.

