Larrabee paper and articles

August 4, 2008 at 11:20 am (Papers, Presentations)

Amidst a flurry of articles from technical websites, Intel also released the paper (non-ACM link) on the Larrabee architecture that will be presented at SIGGRAPH next week.

Articles discussing some details that were released in a presentation by Larry Seiler:

Permalink 6 Comments

Shader X2 Online

July 10, 2008 at 3:29 pm (Tips & Tricks)

GameDev.net is hosting free online copies of the ShaderX2 books.

ShaderX2: Introduction & Tutorials with DirectX 9

ShaderX2: Shader Programming Tips and Tricks with DirectX 9

Pilfered from the Real-Time Rendering book blog.

Permalink Leave a Comment

Geometry-Aware Framebuffer Level of Detail

July 6, 2008 at 10:26 pm (Papers)

Comparison image from “Geometry-Aware Framebuffer Level of Detail”

Here’s a nice, compact little paper from EGSR 2008 on scaling down shading computation in a rendered scene.

1) Render G-buffer (normals, depth) at a lower resolution (full resolution × some resizing factor r)

2) Render scene, using a bilateral filter (original paper, SIGGRAPH course, bilateral upsampling) to upsample some or all of the shading components in a discontinuity-respecting manner. Expensive, lower-frequency computations should be performed at the lower resolution, while inexpensive, high-frequency shading (such as specular) should be done at normal resolution.

3) Adjust r based on current framerate vs. some baseline desired framerate to maintain a more constant framerate.
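Step 3 is essentially a little feedback controller on r. A minimal sketch (the function name, gain, and clamps are my own, not from the paper):

```python
def adjust_resize_factor(r, frame_ms, target_ms, gain=0.1, r_min=0.25, r_max=1.0):
    """Nudge the G-buffer resizing factor r toward a target frame time.
    Slower-than-target frames shrink r (less shading work); faster
    frames grow it back toward full resolution."""
    error = (target_ms - frame_ms) / target_ms  # > 0 means we have headroom
    r *= 1.0 + gain * error
    return max(r_min, min(r_max, r))
```

Called once per frame with the measured frame time, this roughly settles on the largest r that sustains the baseline framerate.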

This method is limited by the fact that you have to render the scene geometry twice, and highly detailed geometry may introduce artifacts when upsampling, but it is interesting and definitely worth a quick read.

Geometry-Aware Framebuffer Level of Detail
Lei Yang, Pedro V. Sander, Jason Lawrence
Eurographics Symposium on Rendering 2008

Permalink 2 Comments

Graphics Hardware 2008 Slides

July 2, 2008 at 3:36 pm (Presentations)

The Graphics Hardware 2008 organizers have begun to post slides from the conference here. I don’t think the papers are online yet, and the proceedings didn’t come with a DVD this year, so I guess everyone will just have to wait 🙂

Permalink Leave a Comment

Non-interleaved Deferred Shading of Interleaved Sample Patterns

June 25, 2008 at 11:19 pm (Papers)

Continuing with the theme of my last substantive post (it’s been awhile, but it was Keepin’ It Low-Res), here’s a paper from Graphics Hardware 2006 that deals with computing lighting at a resolution lower than the screen resolution. The exact application in this case is Instant Radiosity-style global illumination. As a refresher, instant radiosity approximates bounce lighting by tracing rays from the light sources and placing point lights where each ray intersects the scene to approximate reflected light. The radiance equation can then be approximated with a Monte Carlo integration of radiance computed by evaluating a set of point lights.
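For intuition, the sum over placed point lights (VPLs) looks something like the sketch below. Plain Python with scalar flux and a clamped inverse-square falloff; this is an illustrative estimator, not the paper’s exact formulation:

```python
import math

def shade_with_vpls(p, n, vpls):
    """Sum the contribution of virtual point lights (VPLs) at surface
    point p with unit normal n. Each VPL is (position, normal, flux)."""
    def sub(a, b):  return tuple(x - y for x, y in zip(a, b))
    def dot(a, b):  return sum(x * y for x, y in zip(a, b))

    radiance = 0.0
    for vp, vn, flux in vpls:
        d = sub(vp, p)
        dist2 = dot(d, d)
        inv_len = 1.0 / math.sqrt(dist2)
        w = tuple(x * inv_len for x in d)          # direction to the VPL
        cos_r = max(0.0, dot(n, w))                # cosine at the receiver
        cos_l = max(0.0, -dot(vn, w))              # cosine at the VPL
        radiance += flux * cos_r * cos_l / (dist2 + 1e-4)  # clamped falloff
    return radiance
```

The clamp on the distance term is the usual hack to keep VPLs that land very close to a receiver from blowing up into bright splotches.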

With a deferred rendering pipeline, you can reduce some of the shader bottlenecks you’d encounter evaluating hundreds of these placed point lights and shadow map comparisons by rendering proxies for each light source, so that shading is computed only at relevant pixels. However, it would be even better if you only had to evaluate a subset of those lights per pixel. Obviously, that is the goal of this paper. The devil is in the details though, as performing incoherent lighting calculations (as you might in a naive implementation) doesn’t jibe so well with graphics hardware, where coherency is key.

Let’s first talk about the naive implementation mentioned in the last paragraph. How might you first go about selectively shading different pixels with a different set of lights? You could start out by using dynamic flow control. Something like:

if( (ScreenSpacePos.x % 2 == 0) && (ScreenSpacePos.y % 2 == 0) )
{
    for( int i = 0; i < nLightsSet1; i++ )
        color += ShadeLight( lightSet1[i] ); // shading for light i, in set 1
}
else if( (ScreenSpacePos.x % 2 == 1) && (ScreenSpacePos.y % 2 == 0) )
{
    for( int i = 0; i < nLightsSet2; i++ )
        color += ShadeLight( lightSet2[i] ); // shading for light i, in set 2
}
else ... // The other two cases

This is obviously bad for coherency. Within a group of pixels assigned to one SIMD, you are guaranteed to follow all four paths, and essentially all four paths will be executed for each pixel in that SIMD group. So you will be doing 4x more work per pixel than if all pixels in that group were from VPL set 1. You could similarly try to use a stencil buffer to mask all pixels for a given light set, but given the alternating light set pattern, the stencil mask is incoherent and you won’t get any of the benefits of Hi-Stencil. The same thing goes for using depth to mask pixels (Hi-Z is trashed). I’m pretty sure that in the last few generations of hardware there isn’t a performance difference between using stencil or depth to cull computation, FYI.

I think you’re getting the idea. All of the GBuffer texels for a given light set should be computed coherently, so ideally all of the aforementioned GBuffer texels should be organized next to each other in the GBuffer. As an aside, it is important to understand that textures are block allocated on the GPU. They are allocated in 2D blocks because texture fetches are spatially coherent in 2D. When you fetch one texel, you’re typically going to fetch another one in the 2D neighborhood of the last fetch. Therefore, when I say that GBuffer texels for a given light set should be next to each other in the GBuffer, I mean in blocks as depicted in Figure 1:

Figure 1. Left: interleaved texels. Center: correctly coalesced texels. Right: non-cache-friendly coalescence.

In order to reorganize the GBuffer texels in such a fashion, the authors first suggest a one-pass approach. Say we have four VPL light sets. While we want them uniformly distributed across the viewport in a regular fashion so that we can interpolate the shaded values at the end, we want them organized in blocks to be coherent when shading. For a w×h buffer, texel (x, y) in the shuffled GBuffer should be fetched from texel ( (x % (w/2))*2 + x/(w/2), (y % (h/2))*2 + y/(h/2) ) in the initial GBuffer (integer division). This will work fine, but the fetching of the initial GBuffer texels is incoherent because they’re being fetched from all over the GBuffer instead of from a smaller neighborhood.
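The index arithmetic for the 2×2 pattern (four light sets) can be sketched in a few lines. Python here just to illustrate the mapping; w and h are the buffer dimensions, and the real thing of course happens in a shader:

```python
def interleaved_to_shuffled(x, y, w, h):
    """Interleaved texel (x, y) -> its position in the block-shuffled
    G-buffer: pixel parity picks the quadrant (light set), the rest is
    the offset inside that quadrant."""
    hw, hh = w // 2, h // 2
    return ((x % 2) * hw + x // 2, (y % 2) * hh + y // 2)

def shuffled_to_interleaved(x, y, w, h):
    """Inverse mapping: where a shuffled-buffer pixel fetches its
    source texel from in the interleaved G-buffer. This is also the
    mapping the final 'gathering' pass has to undo."""
    hw, hh = w // 2, h // 2
    set_x, in_x = divmod(x, hw)   # quadrant index, offset inside it
    set_y, in_y = divmod(y, hh)
    return (in_x * 2 + set_x, in_y * 2 + set_y)
```

The two functions are exact inverses, which is what lets the de-swizzle step at the end put every shaded pixel back where it came from.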

The authors suggest a two-pass approach instead. Rather than jumping straight to the shuffling described in the last paragraph, the shuffling is performed in cache-friendly blocks.

The shading is then performed on the reorganized GBuffer and then a de-swizzle or “gathering” step is performed to undo the mapping that was performed in previous steps. When the shaded pixels are de-swizzled, each pixel only has the shading for one subset of the overall set of lights in the scene. The authors discuss some clever little tricks to quickly filter these shading values in a discontinuity-respecting manner so that the computed irradiance is smooth across the image. I’ll leave that for you to parse from the paper.

Non-interleaved Deferred Shading of Interleaved Sample Patterns

Benjamin Segovia, Jean-Claude Iehl, Richard Mitanchey and Bernard Péroche
Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2006

Permalink 4 Comments

ATI HD 4870 and 4850

June 25, 2008 at 5:01 pm (News)

*Shameless home-team raving content below*

The NDA on the new ATI hardware (R770) was lifted today, so a boat-load of reviews are out. And they’re all glowing. I’ve been with the company for about a year and a half, and NVIDIA has had us pinned down from the day I started until now. It’s nice to have a winning product out there. If you’re thinking about buying a mid-range graphics card ($199-$299), or you want to help ensure that I have a job in the future, now is the time.

Hexus (french)
computerbase (german)

Permalink 1 Comment

Hold it now

June 23, 2008 at 10:21 am (Uncategorized)

So.. it’s been about 2 months since I posted on this blog. I have every intention of posting again!  I have been crunching like a mutha for the past two months and haven’t had time to form coherent, non-demo related thoughts let alone do normal things like prepare my own food or enjoy a sunny afternoon. But that’s over now, so I hope to start updating this blog regularly.  I realized that I haven’t posted anything related to demos that I’ve worked on at ATI so I might do that first.  Coming soon.

Permalink 1 Comment

SIGGRAPH 2008 Papers Posted

April 28, 2008 at 2:13 pm (Papers)

Not many papers are linked at this time, but Ke-Sen Huang has updated his graphics paper archive with SIGGRAPH 2008 papers. I am especially interested in “3D Unsharp Masking for Scene Coherent Enhancement”, “Real-Time Smoke Rendering Using Compensated Ray Marching”, and “Real-Time, All-Frequency Shadows in Dynamic Scenes”, but there are many papers that sound like they could be very interesting.

Permalink Leave a Comment

Keepin’ it Low-Res

April 24, 2008 at 3:24 am (Papers)

Lately I’ve been really interested in performing shading calculations at a frequency lower than per-pixel. Depending on factors like the frequency of the lighting effect and surface orientation relative to the camera, you can get away with computing values at a lower resolution or with adaptive sampling.

Off-screen particles

Probably the most widely used technique along these lines is the “Off-screen Particles” method that NVIDIA first used in the Vulcan demo and later wrote up in GPU Gems 3. The basic idea is that you blend your particles into a texture at lower than screen resolution to save on fill rate. Particles, like smoke, are notorious fill-rate hogs due to massive amounts of overdraw and potentially expensive per-pixel lighting computations. In order to get correct particle occlusion, a manually downsampled depth buffer has to be used for depth testing the lower-resolution particle buffer. This introduces problems near particle-scene intersections due to gaps created by this low-resolution depth testing. It is resolved by performing high-resolution particle blending in problem areas, which are identified with an edge-detection filter performed on the low-res buffer’s alpha channel. However, I have heard from a few developer friends that in most cases this fix-up pass is not needed because artifacts are not noticeable in the majority of cases.

An additional optimization is to render particles into one of a few different resolution buffers. For example, particles that are very close to the camera likely need very little screen-space resolution because their (originally high-frequency) features are stretched across large amounts of screen real estate. Additionally, particles closest to the camera are going to contribute the most to overall fill rate (compared to other scene particles) because they are likely covering most if not all of the screen pixels. You’d mostly see this situation in non-uniform volumetric fog or dense smoke conditions.

For more info on the original method, see the online version of the Vulcan paper or take a look at the “High-Speed, Off-Screen Particles” article in GPU Gems 3.
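The manual depth downsample is the one fiddly piece. A sketch of the idea, taking the farthest sample in each 2×2 tile under a larger-is-farther depth convention (which reduction to use is itself a quality trade-off, and this Python stand-in is mine, not NVIDIA’s code):

```python
def downsample_depth(depth, take=max):
    """Reduce a full-resolution depth buffer (list of rows) by 2x per
    axis, collapsing each 2x2 tile with `take`. With take=max and a
    larger-is-farther convention this keeps the farthest sample."""
    h, w = len(depth), len(depth[0])
    return [[take(depth[y][x],     depth[y][x + 1],
                  depth[y + 1][x], depth[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

The low-res particle buffer is then depth tested against this reduced buffer before being composited back over the full-resolution scene.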

Bilateral Upsampling

A few months ago I mentioned the Bilateral Upsampling method used by Sloan et al. in their paper “Image-Based Proxy Accumulation for Real-Time Soft Global Illumination”. The concept is to bilinearly interpolate lower-resolution illumination results while applying a weighting function to the interpolants so that values aren’t interpolated across boundaries. In some situations, additional computation is required near boundaries. This is handled similarly to the off-screen particle method: edge detection and selective higher-resolution calculation in edge regions. Read my previous entry and the paper for more info.

Adaptive Soft Shadows with Push-Pull Reconstruction
(Images from the paper: orange pixels are where computation is skipped)

Another trick was used by Gaël Guennebaud et al. in the 2007 Eurographics paper “High-Quality Adaptive Soft Shadow Mapping”. Here, soft shadow amounts are computed at adaptive resolution rather than simply at a lower resolution as in the previously discussed methods. While not as simple to upsample, computing values adaptively based on surface orientation and shadow frequency allocates more fidelity to the regions that need it; the other methods simply pick some lower resolution and forgo any higher-frequency information. Adaptive sample locations are determined dynamically by computing a per-pixel value in a screen-space pre-pass, based on surface normal and penumbra width (determined by examining the shadow map), and thresholding it against a repeated pattern, sort of along the lines of what is done with dithering. The pattern is a bit stochastic, such that some pixels are bound to be identified as needing computation regardless of the scene/visibility complexity contained in those pixels, but more or even all pixels inside the pattern will be identified in cases where the surface is quickly changing or the penumbra is very tight. After pixels are identified, a relatively expensive soft shadow technique is used to compute visibility at those locations. These visibility values are then interpolated to fill in all screen pixels.
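The threshold-against-a-repeated-pattern step works like ordered dithering. A sketch with a 4×4 Bayer matrix standing in for the paper’s pattern (the matrix, threshold rule, and the “importance” value are my own illustration, not the paper’s):

```python
# 4x4 ordered-dither thresholds; dividing by 16 maps them into [0, 1).
BAYER4 = [[ 0,  8,  2, 10],
          [12,  4, 14,  6],
          [ 3, 11,  1,  9],
          [15,  7, 13,  5]]

def select_pixel(importance, x, y):
    """Mark pixel (x, y) for full soft-shadow evaluation when its
    importance (derived from normal variation / penumbra width) beats
    the repeated threshold pattern. Higher importance -> more of the
    pattern's pixels pass, up to all of them."""
    return importance > BAYER4[y % 4][x % 4] / 16.0
```

Because the thresholds are spread over the tile, even a small importance selects a sparse-but-nonzero set of pixels, which is exactly the property the interpolation step relies on.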

So, the question you may be asking is: how do you interpolate non-uniformly spaced samples? The previous two methods discussed here computed at a lower resolution where all samples are equally spaced; bilinear interpolation suffices there, and we are all quite familiar with that. It turns out that the answer was provided over 20 years ago in a SIGGRAPH ’87 paper by Don P. Mitchell titled “Generating Antialiased Images at Low Sampling Densities”. This method was also used by Gortler et al. in “The Lumigraph” at SIGGRAPH ’96 and by Grossman in his ’97 thesis “Point Sample Rendering”.

The concept is that you start with a screen-sized buffer containing your initial sparse sampling. All pixels containing a value are given a weight of 1 and all others are given a weight of 0. In the “Pull” stage, you construct a pyramid of lower-resolution images (each one half the resolution of the previous, like mipmaps) with accompanying weights calculated as a sum of the weights in the previous higher-resolution map. Then, in the following “Push” phase, you reconstruct each successively higher-resolution image by filling in gaps with values from the lower-resolution image, based on the accumulated weights from the Pull phase. I prefer to keep the math and notation out of the blog, so check out pages 23-24 of Grossman’s thesis for a thorough explanation. This method is a bit hairier than the other two, so it would probably only pay off in situations where your computation is fairly expensive.
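The pull/push idea is easiest to see in 1-D. A hedged sketch (unweighted blend on the push side and simple pairwise reduction; Mitchell and Grossman do the full 2-D, weight-blended version):

```python
def push_pull_1d(values, weights):
    """Fill holes in a sparsely sampled 1-D signal: weight 1 = sample,
    weight 0 = hole. Pull builds a coarser pyramid of weighted averages;
    push fills holes at each finer level from the level below."""
    levels = [(values[:], weights[:])]
    # Pull: each coarser level averages pairs, weighted by sample weights.
    while len(levels[-1][0]) > 1:
        v, w = levels[-1]
        nv, nw = [], []
        for i in range(0, len(v), 2):
            ws = w[i] + (w[i + 1] if i + 1 < len(v) else 0.0)
            vs = w[i] * v[i] + (w[i + 1] * v[i + 1] if i + 1 < len(v) else 0.0)
            nw.append(min(ws, 1.0))               # clamp accumulated weight
            nv.append(vs / ws if ws > 0 else 0.0)
        levels.append((nv, nw))
    # Push: blend coarse values into any pixel whose weight is < 1.
    for lvl in range(len(levels) - 2, -1, -1):
        v, w = levels[lvl]
        cv, _ = levels[lvl + 1]
        for i in range(len(v)):
            if w[i] < 1.0:
                v[i] = w[i] * v[i] + (1.0 - w[i]) * cv[i // 2]
                w[i] = 1.0
    return levels[0][0]
```

Holes inherit values from whatever pyramid level first “sees” a real sample, so isolated samples spread smoothly instead of leaving zero-filled gaps.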

I should also note that the adaptive soft shadows paper is an extension of Guennebaud’s 2006 paper “Real-time Soft Shadow Mapping by Backprojection”, which is also cool (and simpler).

Permalink 9 Comments

Horizon Split Ambient Occlusion

February 27, 2008 at 3:55 am (Game techniques, Presentations)

I have so much I want to write about what I saw at I3D and GDC that sitting down and doing it seems daunting. I will try and post my thoughts over the next few days.

One interesting poster at I3D (and also a short talk at GDC) was an extension to SSAO. The idea is to calculate a piecewise-linear approximation of the horizon, à la horizon mapping. This is achieved by sampling the depth values along m steps (they used 8) in n equally spaced directions (8 again) in the tangent frame. At each depth sample, you update the horizon value if the current depth sample is higher than the current estimate of the horizon in that direction. Sampling in this fashion reduces over-occlusion. This is actually very similar to the approximation to AO that Dachsbacher and Tatarchuk described in their poster at I3D last year, “Prism Parallax Occlusion Mapping with Accurate Silhouette Generation”. All of the authors of the poster are NVIDIA guys, and they announced that they will release a whitepaper describing their method.
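Per direction, the horizon update is just a running max over elevation angles. A 1-D heightfield sketch of one of the n directions (names and parameters are mine, not from the poster):

```python
import math

def horizon_angle_1d(heights, x, step=1, m=8):
    """March m samples along one direction of a heightfield row and keep
    the steepest elevation angle seen above the start point: a
    piecewise-linear horizon estimate in that direction."""
    h0 = heights[x]
    horizon = 0.0                       # flat ground = horizon at angle 0
    for i in range(1, m + 1):
        xi = x + i * step
        if xi >= len(heights):
            break
        dh = heights[xi] - h0
        if dh > 0:                      # only samples above us can occlude
            horizon = max(horizon, math.atan2(dh, i * step))
    return horizon
```

Doing this for all n directions and converting the horizon angles to an occlusion estimate gives the piecewise-linear approximation the poster describes, while ignoring geometry below the horizon keeps it from over-occluding.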

Permalink 3 Comments
