Deferred Shading Demo
19 July, 2005
(Updated 21 July, 2005)


Download the demo (3.3 MB) - source and binary
Hi everyone,
[ Note: There is a new version of this demo available here. ]
This demo presents 5 different multiple render target (MRT)
configurations used for deferred shading. Press the <R> key
to switch between the 2 available renderers: a forward renderer
(the traditional one, implemented only for quality and speed
comparison) and a deferred renderer with the following 5 modes:
| Mode ID | Render target formats / data storage (in 4 render targets) | Average FPS on GeForce 6600 TD (800x600, ambient and diffuse only) | Average FPS on GeForce 6600 TD (800x600, ambient, diffuse and specular) | Quality |
| --- | --- | --- | --- | --- |
| 0 | A8R8G8B8 - Color (R8G8B8), unused A8; R32F - Position as depth in clip space (R32F); A8R8G8B8 - Normal in view space biased (R8G8B8), unused A8; A8R8G8B8 - Material: ambient, diffuse, specular, shininess biased | 35 - 40 | 20 - 30 | Good |
| 1 | A16R16G16B16F - Color (R16G16B16F), unused A16F; A16R16G16B16F - Position in view space (R16G16B16F), unused A16F; A16R16G16B16F - Normal in view space (R16G16B16F), unused A16F; A16R16G16B16F - Material: ambient, diffuse, specular, shininess | 25 - 30 | 18 - 25 | Excellent |
| 2 | A16R16G16B16 - Color (R16G16B16), unused A16; A16R16G16B16 - Position as depth in clip space packed (R16G16B16), unused A16; A16R16G16B16 - Normal in view space biased (R16G16B16), unused A16; A16R16G16B16 - Material: ambient, diffuse, specular, shininess biased | ~24 | ~20 | Very poor |
| 3 | A8R8G8B8 - Color (R8G8B8), unused A8; G16R16F - Position as depth in clip space (G16F), unused R16F; A8R8G8B8 - Normal in view space biased (R8G8B8), unused A8; A8R8G8B8 - Material: ambient, diffuse, specular, shininess biased | 19 | ~17 | Very poor |
| 4 | A8R8G8B8 - Color (R8G8B8), unused A8; A8R8G8B8 - Position as depth in clip space packed (R8G8B8), unused A8; A8R8G8B8 - Normal in view space biased (R8G8B8), unused A8; A8R8G8B8 - Material: ambient, diffuse, specular, shininess biased | ~28 | ~22 | Very poor |
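To make the terms "biased" and "position as depth in clip space" more concrete, here is a minimal CPU-side sketch in C++. It is not taken from the demo source: the `Vec3` and `Projection` types and all function names are hypothetical, and the projection is assumed to be a left-handed perspective projection with depth in [0, 1] (the convention of D3DXMatrixPerspectiveFovLH). In the demo this work would be done in shaders.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { double x, y, z; };

// Bias a unit-vector component from [-1, 1] into [0, 1] and quantize it
// to 8 bits, as an integer render target channel would store it.
std::uint8_t biasTo8Bit(double c) {
    assert(c >= -1.0 && c <= 1.0);
    return static_cast<std::uint8_t>(std::lround((c * 0.5 + 0.5) * 255.0));
}

// Undo the bias: map an 8-bit channel value back to [-1, 1].
double unbiasFrom8Bit(std::uint8_t v) {
    return (v / 255.0) * 2.0 - 1.0;
}

// Hypothetical helper holding the parameters of the assumed projection.
struct Projection {
    double xScale;  // cot(fovY / 2) / aspect
    double yScale;  // cot(fovY / 2)
    double zNear;
    double zFar;
};

Projection makeProjection(double fovY, double aspect, double zn, double zf) {
    assert(zn > 0.0 && zf > zn);
    const double yScale = 1.0 / std::tan(fovY * 0.5);
    return { yScale / aspect, yScale, zn, zf };
}

// Project a view-space point. Returns NDC x and y plus the clip-space
// depth z/w, which is the single value a depth-only position target holds.
Vec3 project(const Vec3& p, const Projection& pr) {
    assert(p.z > 0.0);
    const double q = pr.zFar / (pr.zFar - pr.zNear);
    return { p.x * pr.xScale / p.z,
             p.y * pr.yScale / p.z,
             q * (1.0 - pr.zNear / p.z) };
}

// Rebuild the view-space position from the pixel's NDC x, y and the stored
// depth by inverting the projection above.
Vec3 reconstructViewPosition(double ndcX, double ndcY, double depth,
                             const Projection& pr) {
    const double viewZ =
        pr.zNear * pr.zFar / (pr.zFar - depth * (pr.zFar - pr.zNear));
    return { ndcX * viewZ / pr.xScale, ndcY * viewZ / pr.yScale, viewZ };
}
```

With 8-bit storage the normal round trip is accurate to within 1/255 per component, while the position round trip is exact up to floating-point error as long as the depth itself is stored in a float format.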
To run this demo you need Direct3D 9.0d installed and a card
capable of:
- creating 4 floating-point render targets (4 x 16 bits each)
- using Pixel Shader 2.0
- post-pixel blending operations for MRT (alpha blending)
Currently all GeForce 6 class cards support these features.
Thanks to helpful people from the GameDev community, I know that
it runs well on Radeon 9800 too (probably 9500 and up as well).
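The requirements listed above could be checked at startup along the following lines. This is only a sketch, not the demo's code: `CapsSummary`, `Missing` and `firstMissingFeature` are hypothetical names, and the struct is a plain stand-in that an application would fill from the Direct3D 9 queries named in the comments.

```cpp
#include <cassert>

// Hypothetical, API-independent summary of what the application queried.
struct CapsSummary {
    unsigned numSimultaneousRTs;    // from D3DCAPS9::NumSimultaneousRTs
    unsigned pixelShaderMajor;      // major part of D3DCAPS9::PixelShaderVersion
    bool     mrtPostPixelBlending;  // D3DPMISCCAPS_MRTPOSTPIXELSHADERBLENDING
                                    // set in D3DCAPS9::PrimitiveMiscCaps
    bool     fp16RenderTarget;      // IDirect3D9::CheckDeviceFormat accepted a
                                    // 4 x 16-bit float render target format
};

enum class Missing { None, RenderTargets, PixelShader, MrtBlending, Fp16Target };

// Returns the first requirement the card fails, or Missing::None.
Missing firstMissingFeature(const CapsSummary& caps) {
    if (caps.numSimultaneousRTs < 4) return Missing::RenderTargets;
    if (caps.pixelShaderMajor < 2)   return Missing::PixelShader;
    if (!caps.mrtPostPixelBlending)  return Missing::MrtBlending;
    if (!caps.fp16RenderTarget)      return Missing::Fp16Target;
    return Missing::None;
}
```

Reporting the first missing feature, rather than a bare yes/no, makes it easier to tell a user why the deferred renderer is unavailable on their card.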
The table below lists the results for all tested cards. All
tests used the R32F_Position_R8G8B8_Normal mode at 800x600
with ambient and diffuse lighting only:
| Card | Drivers | Average FPS | Additional issues | Tester |
| --- | --- | --- | --- | --- |
| GeForce 6600 TD | 77.72 | 35 - 40 | jumpy FPS for some time after recreating render targets; sometimes recreating render targets results in much worse performance; see below the table for more issues | me :-) |
| Radeon 9800 Pro 128 MB | ? | 85 | --- | Konfusius |
| GeForce 6800 GT | ? | 110 | --- | NoodleizzeR |
| X800 XT | ? | 130 - 200 | --- | ? |
| GeForce 6 Ultra | ? | 120 | --- | blue_knight |
| Radeon 9800 Pro 128 MB | ? | 100+ | --- | evanofsky |
| Radeon 9800 Pro 128 MB | ? | 90 | --- | pbryant |
| GeForce 6800 GO | ? | 85 | only 30 FPS in mode R16G16B16_Position_R16G16B16_Normal; every 3rd or so time, in the same mode, it drops to 10 FPS | pbryant |
| GeForce 6600 GT | 77.72 | 59 - 73 | --- | vEEcEE |
For more info and valuable remarks from other GameDev members, see this GameDev.net thread.
Issues found when implementing the deferred renderer (probably
very specific to GeForce cards):
- the optimization of stencil-masking pixels not lit by the
light turned out not to be an optimization at all (but it's
still necessary to correctly determine lit pixels); you can
see the stencil test in action by pressing and disabling
a few lights (just to see more clearly)
- the fastest (and good quality) deferred renderer mode for me was:
R32F for position (as depth; stored in clip space)
A8R8G8B8 for normals (biased; stored in world space)
- the best quality deferred renderer mode was obviously:
R16G16B16F for position (stored in world space)
R16G16B16F for normal (stored in world space)
- rendering speed with the deferred renderer depended on
when (yes, when) the render target textures were allocated;
e.g. for me the R16G16B16 (non-float) mode was usually about
2 times slower when switched on for the first time than when
switched on for the second time (every time I switch, I
recreate all required render targets); it looks like the
card drivers do something unpredictable when allocating /
deallocating render target textures
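One factor that plausibly contributes to the speed differences between modes is how many bytes each pixel occupies across the 4 render targets. The sketch below computes that footprint for the five modes from the mode table. The format sizes follow from the format names; the type and function names (`Format`, `GBufferLayout`, `gbufferBytes`, `kModes`) are hypothetical and not taken from the demo.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Render target formats, named as in the mode table.
enum class Format { A8R8G8B8, R32F, G16R16F, A16R16G16B16, A16R16G16B16F };

std::size_t bytesPerPixel(Format f) {
    switch (f) {
        case Format::A8R8G8B8:      return 4;  // 4 x 8 bits
        case Format::R32F:          return 4;  // 1 x 32 bits
        case Format::G16R16F:       return 4;  // 2 x 16 bits
        case Format::A16R16G16B16:  return 8;  // 4 x 16 bits
        case Format::A16R16G16B16F: return 8;  // 4 x 16 bits (float)
    }
    return 0;  // not reached for valid enum values
}

using GBufferLayout = std::array<Format, 4>;

// Sum of the per-pixel sizes of all 4 render targets.
std::size_t gbufferBytesPerPixel(const GBufferLayout& layout) {
    std::size_t total = 0;
    for (Format f : layout) total += bytesPerPixel(f);
    return total;
}

// Total G-buffer size for a given resolution.
std::size_t gbufferBytes(const GBufferLayout& layout,
                         std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return gbufferBytesPerPixel(layout) * width * height;
}

// Layouts of modes 0 to 4, in the order color, position, normal, material.
const std::array<GBufferLayout, 5> kModes = {{
    {{ Format::A8R8G8B8, Format::R32F, Format::A8R8G8B8, Format::A8R8G8B8 }},
    {{ Format::A16R16G16B16F, Format::A16R16G16B16F,
       Format::A16R16G16B16F, Format::A16R16G16B16F }},
    {{ Format::A16R16G16B16, Format::A16R16G16B16,
       Format::A16R16G16B16, Format::A16R16G16B16 }},
    {{ Format::A8R8G8B8, Format::G16R16F, Format::A8R8G8B8, Format::A8R8G8B8 }},
    {{ Format::A8R8G8B8, Format::A8R8G8B8, Format::A8R8G8B8, Format::A8R8G8B8 }},
}};
```

At 800x600 this gives 16 bytes per pixel (7,680,000 bytes in total) for modes 0, 3 and 4, and 32 bytes per pixel (15,360,000 bytes) for modes 1 and 2. Footprint is clearly not the whole story, though: modes 3 and 4 are the same size as mode 0 but measured slower.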
One more thing to try is to linearize / delinearize and
scale / rescale the clip-space depth when using any of the
non-float depth storage formats; this may give a better
distribution of depth values.
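One reading of this idea is sketched below: instead of quantizing the clip-space depth z/w directly, store the view-space depth rescaled linearly to [0, 1]. The code assumes a Direct3D-style perspective projection with depth in [0, 1]; all function names are hypothetical, and this has not been tried in the demo.

```cpp
#include <cassert>
#include <cmath>

// Clip-space depth z/w in [0, 1] for a view-space depth z.
double clipDepthFromViewZ(double z, double zn, double zf) {
    assert(zn > 0.0 && zf > zn && z >= zn && z <= zf);
    return zf / (zf - zn) * (1.0 - zn / z);
}

double viewZFromClipDepth(double d, double zn, double zf) {
    return zn * zf / (zf - d * (zf - zn));
}

// Linearized alternative: view-space depth rescaled to [0, 1].
double linearDepthFromViewZ(double z, double zn, double zf) {
    assert(zf > zn);
    return (z - zn) / (zf - zn);
}

double viewZFromLinearDepth(double d, double zn, double zf) {
    return zn + d * (zf - zn);
}

// Simulate storing a [0, 1] value in an unsigned integer channel.
double quantize(double v, int bits) {
    assert(bits > 0 && bits < 32);
    v = std::fmin(1.0, std::fmax(0.0, v));         // clamp rounding overshoot
    const double maxCode = std::ldexp(1.0, bits) - 1.0;  // 2^bits - 1
    return std::round(v * maxCode) / maxCode;
}

// Absolute view-space depth error after a store / load round trip.
double clipDepthError(double z, double zn, double zf, int bits) {
    const double stored = quantize(clipDepthFromViewZ(z, zn, zf), bits);
    return std::fabs(viewZFromClipDepth(stored, zn, zf) - z);
}

double linearDepthError(double z, double zn, double zf, int bits) {
    const double stored = quantize(linearDepthFromViewZ(z, zn, zf), bits);
    return std::fabs(viewZFromLinearDepth(stored, zn, zf) - z);
}
```

For example, with a near plane at 1, a far plane at 1000 and a single 8-bit channel, a point at view depth 500 is read back at the far plane when stored as clip-space depth (an error of 500), but is off by only about 1.5 when stored as linear depth. The price is coarser precision close to the near plane, where clip-space depth concentrates most of its values.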
Another would be to store all materials (emissive, ambient,
diffuse, specular and shininess) in a lookup texture and reference
them by ID, possibly saving a whole render target.
However, in practice there are several more factors we would like
to store per pixel, for example ambient occlusion, a gloss mask
or an "is shadowed" flag.
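The lookup idea can be illustrated on the CPU side as follows. This is a sketch under assumptions, not the demo's design: `Material` and `MaterialTable` are hypothetical names, and the ID is assumed to fit into a single 8-bit channel (for instance one of the unused alpha channels), which limits the table to 256 materials. On the GPU the table would be a small texture indexed by the ID.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-material factors that would otherwise be written to a render target.
struct Material {
    float emissive;
    float ambient;
    float diffuse;
    float specular;
    float shininess;
};

// Table of materials referenced by an 8-bit ID stored per pixel.
class MaterialTable {
public:
    // Registers a material and returns the ID to write into the G-buffer.
    std::uint8_t add(const Material& m) {
        assert(materials_.size() < 256);  // the ID must fit in 8 bits
        materials_.push_back(m);
        return static_cast<std::uint8_t>(materials_.size() - 1);
    }

    // Lighting pass: fetch the full material for the ID read from a pixel.
    const Material& lookup(std::uint8_t id) const {
        assert(id < materials_.size());
        return materials_[id];
    }

private:
    std::vector<Material> materials_;
};
```

Because only the ID is stored per pixel, adding a factor such as emissive costs nothing in the G-buffer, but truly per-pixel values like ambient occlusion still need a channel of their own.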
If you find a bug or have suggestions regarding the demo,
just let me know.
Have fun,
Maciej Sawitus