:: Welcome to homepage of Maciej Sawitus ::



Deferred Shading Demo
19 July, 2005
(Updated 21 July, 2005)





Download the demo (3.3 MB) - source and binary

Hi everyone,

[ Note: There is a new version of this demo available here. ]

This demo presents 5 different multiple render target (MRT)
configurations used for deferred shading. By pressing the <R> key
you can switch between the 2 available renderers: a forward renderer
(the traditional one, implemented only for quality and speed
comparison) and a deferred renderer with the following 5 modes:

Average FPS was measured on a GeForce 6600 TD at 800x600.
FPS (A+D)   = ambient and diffuse only
FPS (A+D+S) = ambient, diffuse and specular

Mode | FPS (A+D) | FPS (A+D+S) | Quality   | Render target formats / data storage (in 4 render targets)
-----+-----------+-------------+-----------+-----------------------------------------------------------
0    | 35 - 40   | 20 - 30     | Good      | A8R8G8B8 - Color (R8G8B8), unused A8
     |           |             |           | R32F - Position as depth in clip space (R32F)
     |           |             |           | A8R8G8B8 - Normal in view space biased (R8G8B8), unused A8
     |           |             |           | A8R8G8B8 - Material: ambient, diffuse, specular, shininess biased
-----+-----------+-------------+-----------+-----------------------------------------------------------
1    | 25 - 30   | 18 - 25     | Excellent | A16R16G16B16F - Color (R16G16B16F), unused A16F
     |           |             |           | A16R16G16B16F - Position in view space (R16G16B16F), unused A16F
     |           |             |           | A16R16G16B16F - Normal in view space (R16G16B16F), unused A16F
     |           |             |           | A16R16G16B16F - Material: ambient, diffuse, specular, shininess
-----+-----------+-------------+-----------+-----------------------------------------------------------
2    | ~24       | ~20         | Very poor | A16R16G16B16 - Color (R16G16B16), unused A16
     |           |             |           | A16R16G16B16 - Position as depth in clip space packed (R16G16B16), unused A16
     |           |             |           | A16R16G16B16 - Normal in view space biased (R16G16B16), unused A16
     |           |             |           | A16R16G16B16 - Material: ambient, diffuse, specular, shininess biased
-----+-----------+-------------+-----------+-----------------------------------------------------------
3    | 19        | ~17         | Very poor | A8R8G8B8 - Color (R8G8B8), unused A8
     |           |             |           | G16R16F - Position as depth in clip space (G16F), unused R16F
     |           |             |           | A8R8G8B8 - Normal in view space biased (R8G8B8), unused A8
     |           |             |           | A8R8G8B8 - Material: ambient, diffuse, specular, shininess biased
-----+-----------+-------------+-----------+-----------------------------------------------------------
4    | ~28       | ~22         | Very poor | A8R8G8B8 - Color (R8G8B8), unused A8
     |           |             |           | A8R8G8B8 - Position as depth in clip space packed (R8G8B8), unused A8
     |           |             |           | A8R8G8B8 - Normal in view space biased (R8G8B8), unused A8
     |           |             |           | A8R8G8B8 - Material: ambient, diffuse, specular, shininess biased
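
The terms "biased" and "packed" in the table above can be illustrated with a
small CPU-side sketch. This is only an assumption about what such encodings
typically look like, not the demo's actual shader code: "biasing" is taken to
mean remapping a signed value from [-1, 1] to an unsigned channel, and "packing"
is taken to mean spreading one depth value over three 8-bit channels as 24-bit
fixed point. All function names here (biasToByte, packDepth24, ...) are
hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// "Biasing" (assumed meaning): map a signed value from [-1, 1]
// into an unsigned 8-bit channel.
inline std::uint8_t biasToByte(float v)
{
    assert(std::isfinite(v));
    const float clamped = std::fmin(std::fmax(v, -1.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround((clamped * 0.5f + 0.5f) * 255.0f));
}

// Inverse of biasToByte: map an 8-bit channel back to [-1, 1].
inline float unbiasFromByte(std::uint8_t b)
{
    return static_cast<float>(b) / 255.0f * 2.0f - 1.0f;
}

// "Packing" (assumed meaning): store a depth value from [0, 1] in three
// 8-bit channels as a 24-bit fixed-point number, most significant byte first.
inline std::array<std::uint8_t, 3> packDepth24(float depth)
{
    assert(std::isfinite(depth));
    const double clamped = std::fmin(std::fmax(static_cast<double>(depth), 0.0), 1.0);
    const auto q = static_cast<std::uint32_t>(std::llround(clamped * 16777215.0));
    return { static_cast<std::uint8_t>(q >> 16),
             static_cast<std::uint8_t>((q >> 8) & 0xFFu),
             static_cast<std::uint8_t>(q & 0xFFu) };
}

// Inverse of packDepth24: rebuild the depth value from the three channels.
inline float unpackDepth24(const std::array<std::uint8_t, 3>& c)
{
    const std::uint32_t q = (std::uint32_t{c[0]} << 16)
                          | (std::uint32_t{c[1]} << 8)
                          |  std::uint32_t{c[2]};
    return static_cast<float>(q / 16777215.0);
}
```

With 8 bits per component a biased normal is only accurate to about 1/255 per
axis, while the 24-bit packed depth survives a round trip almost unchanged.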

To run this demo you need Direct3D 9.0d installed and a card
capable of:
- creating 4 floating-point render targets (4 x 16 bits each)
- using Pixel Shader 2.0
- post-pixel-shader blending operations for MRT (alpha blending)

Currently all GeForce 6 class cards support these features.
Thanks to helpful guys from the GameDev community, I know that it
runs well on Radeon 9800 too (probably on 9500 and up as well).
The table below contains results for all tested cards; all
tests were done in R32F_Position_R8G8B8_Normal mode, at 800x600
and with ambient and diffuse lighting only:

Card                   | Drivers | Average FPS | Tester      | Additional issues
-----------------------+---------+-------------+-------------+------------------------------------------------------------
GeForce 6600 TD        | 77.72   | 35 - 40     | me :-)      | - jumpy FPS for some time after recreating render targets
                       |         |             |             | - sometimes recreating render targets results in much
                       |         |             |             |   worse performance
                       |         |             |             | - see below the table for more issues
Radeon 9800 Pro 128 MB | ?       | 85          | Konfusius   | ---
GeForce 6800 GT        | ?       | 110         | NoodleizzeR | ---
X800 XT                | ?       | 130 - 200   | ?           | ---
GeForce 6 Ultra        | ?       | 120         | blue_knight | ---
Radeon 9800 Pro 128 MB | ?       | 100+        | evanofsky   | ---
Radeon 9800 Pro 128 MB | ?       | 90          | pbryant     | ---
GeForce 6800 GO        | ?       | 85          | pbryant     | - only 30 FPS in mode R16G16B16_Position_R16G16B16_Normal
                       |         |             |             | - every 3rd or so time, in the same mode, it drops to 10 FPS
GeForce 6600 GT        | 77.72   | 59 - 73     | vEEcEE      | ---

For more info and valuable remarks from other GameDev guys, see this GameDev.net thread.

Issues found when implementing the deferred renderer (probably
very specific to GeForce cards):

- the optimization of stencil-masking pixels that are not lit
  by a light turned out not to be an optimization at all (but it is
  still necessary to correctly determine lit pixels); you can
  see the stencil test in action by pressing  and disabling
  a few lights (just to see it more clearly)

- the fastest (and still good quality) deferred renderer mode for me was:
    R32F for position (as depth; stored in clip space)
    A8R8G8B8 for normals (biased; stored in world space)

- the best quality deferred renderer mode was obviously:
    R16G16B16F for position (stored in world space)
    R16G16B16F for normal (stored in world space)

- the rendering speed of the deferred renderer differed
  depending on when (yes, when) the render target textures were
  allocated; e.g. for me the R16G16B16 (non-float) mode was
  usually about 2 times slower when switched on for the first
  time than when switched on for the second time (every time
  I switch, I recreate all required render targets); it looks
  like the card drivers are doing some unpredictable work when
  allocating / deallocating render target textures
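
Storing position only as depth means the lighting pass has to rebuild the full
position for every pixel. A minimal CPU-side sketch of that reconstruction
follows. It assumes a left-handed Direct3D-style perspective projection (the
same layout as D3DXMatrixPerspectiveFovLH) and that the stored value is the
post-divide depth z/w in [0, 1]; the demo's shaders may do this differently.
The types and function names (Vec3, Projection, reconstructViewPosition, ...)
are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// The four projection parameters that matter for reconstruction.
struct Projection { float xScale, yScale, nearZ, farZ; };

inline Projection makeProjection(float fovY, float aspect, float nearZ, float farZ)
{
    assert(nearZ > 0.0f && farZ > nearZ);
    const float yScale = 1.0f / std::tan(fovY * 0.5f);
    return { yScale / aspect, yScale, nearZ, farZ };
}

// Forward transform: view-space point -> (ndcX, ndcY, depth = z/w).
inline Vec3 projectToNdc(const Projection& p, const Vec3& v)
{
    const float q = p.farZ / (p.farZ - p.nearZ);
    return { v.x * p.xScale / v.z,
             v.y * p.yScale / v.z,
             q * (v.z - p.nearZ) / v.z };
}

// Inverse transform: rebuild the view-space position of a pixel from its
// normalized device coordinates and the depth read from the render target.
inline Vec3 reconstructViewPosition(const Projection& p,
                                    float ndcX, float ndcY, float depth)
{
    assert(depth >= 0.0f && depth <= 1.0f);
    const float q = p.farZ / (p.farZ - p.nearZ);
    const float viewZ = q * p.nearZ / (q - depth);  // solves depth = q * (z - n) / z
    return { ndcX * viewZ / p.xScale, ndcY * viewZ / p.yScale, viewZ };
}
```

The trade-off is one channel of storage instead of three, paid for with a few
extra instructions per lit pixel.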
  
    
One more thing to try is to linearize / delinearize and
scale / rescale depth in clip space when using any of the non-float
depth storage formats - this would possibly result in a better
distribution of depth values.
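
To see why linearizing might help, the sketch below quantizes depth to 16 bits
in two ways and measures the error in view-space distance after decoding. It is
only one possible reading of "linearize and scale": hyperbolic depth is the
usual post-divide z/w of a Direct3D-style projection, and linear depth is
view-space z rescaled to [0, 1] between the near and far planes. The names
(DepthRange, hyperbolicDepth, linearDepth, ...) are hypothetical and not taken
from the demo.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct DepthRange { double nearZ; double farZ; };

// Post-divide depth (z/w) of a Direct3D-style perspective projection.
inline double hyperbolicDepth(const DepthRange& r, double viewZ)
{
    return r.farZ / (r.farZ - r.nearZ) * (1.0 - r.nearZ / viewZ);
}

inline double viewZFromHyperbolic(const DepthRange& r, double d)
{
    return r.nearZ * r.farZ / (r.farZ - d * (r.farZ - r.nearZ));
}

// View-space z rescaled linearly to [0, 1].
inline double linearDepth(const DepthRange& r, double viewZ)
{
    return (viewZ - r.nearZ) / (r.farZ - r.nearZ);
}

inline double viewZFromLinear(const DepthRange& r, double d)
{
    return r.nearZ + d * (r.farZ - r.nearZ);
}

// Simulate storing a [0, 1] value in a 16-bit integer channel.
inline std::uint16_t quantize16(double d)
{
    assert(d >= 0.0 && d <= 1.0);
    return static_cast<std::uint16_t>(std::llround(d * 65535.0));
}

inline double dequantize16(std::uint16_t q) { return q / 65535.0; }

// Absolute view-space error after a 16-bit store/load round trip.
inline double roundTripErrorHyperbolic(const DepthRange& r, double viewZ)
{
    const double d = dequantize16(quantize16(hyperbolicDepth(r, viewZ)));
    return std::fabs(viewZFromHyperbolic(r, d) - viewZ);
}

inline double roundTripErrorLinear(const DepthRange& r, double viewZ)
{
    const double d = dequantize16(quantize16(linearDepth(r, viewZ)));
    return std::fabs(viewZFromLinear(r, d) - viewZ);
}
```

For a near plane at 1 and a far plane at 1000, a point at view-space distance
900 comes back several units off with hyperbolic depth but less than 0.01 units
off with linear depth, because hyperbolic depth spends most of its precision
close to the near plane.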

Another would be to store all materials (emissive, ambient,
diffuse, specular and shininess) in a lookup texture and reference
them by id, thus maybe even saving one whole render target.
However, in practice there are several more factors we would like
to store per pixel, for example: ambient occlusion, a gloss mask
or an "is shadowed" flag.
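
The material lookup idea can be sketched on the CPU side as a small table
indexed by an 8-bit id. This is a hypothetical illustration, not code from the
demo: the Material layout and the MaterialTable class are assumptions, and in a
real renderer the table would be uploaded as a texture that the lighting shader
samples using the id written into one 8-bit channel.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One entry of the lookup table (scalar coefficients, as in the material
// render target described in the text, plus emissive).
struct Material {
    float emissive;
    float ambient;
    float diffuse;
    float specular;
    float shininess;
};

// Table of at most 256 materials, so an id fits in a single 8-bit channel.
class MaterialTable {
public:
    // Registers a material and returns the id to write per pixel.
    std::uint8_t add(const Material& m)
    {
        if (materials_.size() >= 256) {
            throw std::length_error("MaterialTable: more than 256 materials");
        }
        materials_.push_back(m);
        return static_cast<std::uint8_t>(materials_.size() - 1);
    }

    // What the lighting pass would do: fetch the material by its id.
    const Material& lookup(std::uint8_t id) const
    {
        return materials_.at(id);  // throws std::out_of_range for unknown ids
    }

    std::size_t size() const { return materials_.size(); }

private:
    std::vector<Material> materials_;
};
```

Usage would be along the lines of `auto id = table.add({0.0f, 0.1f, 0.8f, 0.5f, 32.0f});`
in the geometry pass setup and `table.lookup(id)` when lighting. The cost is an
extra dependent fetch per pixel and a limit on the number of distinct materials.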

If you find a bug or have suggestions regarding the demo,
just let me know.

Have fun,
Maciej Sawitus


last modified: 2008.09.29