Friday, December 28, 2018

Reverse engineering the rendering of The Witcher 3, part 9 - GBuffer

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".


Welcome,

This is the ninth part of my series about rendering in The Witcher 3. Click here for full index.

In this part I will show some details about the geometry buffer (GBuffer) in The Witcher 3.

I assume here that you know the basics of deferred shading.
Quick recap: the idea is to, well, defer rendering by not calculating all the final lighting and shading immediately, but splitting the calculations into two stages instead.
In the first one (the geometry pass) we fill the GBuffer with surface data (position, normals, specular color, etc.), and in the second one (the lighting pass) we combine everything and calculate lighting.

Deferred shading is a hugely popular approach because it allows lighting to be calculated in a single full-screen pass, and techniques like tile-based deferred shading improve its performance greatly.

Simply speaking, the GBuffer is a collection of textures holding properties of the geometry. It's very important to design its layout carefully. For a real-life example, see for instance The Rendering Technologies of Crysis 3.

After this brief introduction, let's take a look at an example frame from The Witcher 3: Blood & Wine:
One of many inns in Toussaint

The main GBuffer consists of three full-screen render targets in DXGI_FORMAT_R8G8B8A8_UNORM format
and a DXGI_FORMAT_D24_UNORM_S8_UINT depth+stencil buffer.

Here are screenshots of them:
Render Target 0 - RGB channels, surface color

Render Target 0 - A channel. I have no idea what it is, really.

Render Target 1 - RGB channels. We have normal vectors in the [0-1] range here.

Render Target 1 - A channel. Looks like reflectance!

Render Target 2 - RGB channels. Looks like specular color!
The A channel is black in this scene (but it is used later).

Depth buffer. Note that reversed depth is used here.

Stencil buffer, used to mark certain types of pixels (like skin, vegetation, etc.)
This is not the whole GBuffer. The lighting pass also uses reflection probes and other buffers, but they are not the subject of this post.

Before I start the "main" part of this post, some general observations first:


General observations


1) The only buffer to clear is depth/stencil.

If you analyze the aforementioned textures in any good frame analyzer, you may be a little surprised, because there is no "Clear" call on any of them except the depth/stencil buffer.

So in reality RenderTarget1 looks like this (notice the "blurred" pixels at the far plane):

This is a simple and nice optimization.
Takeaway: ClearRenderTargetView calls are not free, so use them only when really necessary.


2) Reversed depth rocks

Many articles have already been written about the precision of floating-point depth buffers. The Witcher 3 uses reversed-Z, which is a natural choice for an open-world game with long draw distances.

For DirectX the switch shouldn't be difficult:

a) Clear the depth buffer with "0" instead of "1".
In the traditional approach the depth buffer is cleared with the far value of "1". After reversing depth, the new "far" value is 0, so this has to change.

b) Swap the near and far clip values when calculating the projection matrix.

c) Change the depth test from "Less" to "Greater".

For OpenGL there is a bit more work (see the articles mentioned above), but it is really worth the effort.
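As a rough illustration of steps a) to c), here is a minimal C++ sketch of only the depth part of a D3D-style perspective projection. It assumes a left-handed, row-vector convention with view-space depth z in [near, far], so that after the perspective divide depth = a + b / z. Swapping near and far in the conventional formula is enough to map the near plane to 1 and the far plane to 0. The struct and function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Depth terms of a D3D-style (left-handed, row-vector) perspective projection:
// after the perspective divide, depth = a + b / zView.
struct DepthTerms {
    float a;
    float b;
};

// Conventional mapping: near plane -> 0, far plane -> 1.
DepthTerms standardDepthTerms(float zNear, float zFar) {
    return { zFar / (zFar - zNear), -zNear * zFar / (zFar - zNear) };
}

// Reversed-Z (step b): the same formula with near and far swapped,
// so the near plane maps to 1 and the far plane to 0.
DepthTerms reversedDepthTerms(float zNear, float zFar) {
    return standardDepthTerms(zFar, zNear);
}

// Value that ends up in the depth buffer for a point at view-space depth zView.
float projectDepth(const DepthTerms& t, float zView) {
    return t.a + t.b / zView;
}
```

Because closer points now get larger depth values, the buffer is cleared to 0 and the depth test uses D3D11_COMPARISON_GREATER instead of D3D11_COMPARISON_LESS.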


3) Do not store world position

It is that simple. Reconstruct the world position from depth in the lighting pass.
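A minimal C++ sketch of the idea, assuming the reversed-Z mapping depth = a + b / z with a = n / (n - f) and b = n * f / (f - n) (near plane at 1, far plane at 0), and a symmetric perspective projection whose [0][0] and [1][1] entries are p00 and p11. It recovers the view-space position; multiplying that by the inverse view matrix (not shown) gives the world position. All names here are hypothetical, not taken from the game.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// Inverts the reversed-Z mapping depth = a + b / zView, where
// a = n / (n - f) and b = n * f / (f - n) (near plane -> 1, far plane -> 0).
float viewDepthFromReversedDepth(float depth, float zNear, float zFar) {
    float a = zNear / (zNear - zFar);
    float b = zNear * zFar / (zFar - zNear);
    return b / (depth - a);
}

// Reconstructs the view-space position of a pixel from its depth.
// ndcX, ndcY: pixel position in normalized device coordinates, [-1, 1].
// p00, p11:   projection scale factors, 1 / (aspect * tan(fovY / 2)) and 1 / tan(fovY / 2).
Float3 viewPositionFromDepth(float ndcX, float ndcY, float depth,
                             float p00, float p11, float zNear, float zFar) {
    float zView = viewDepthFromReversedDepth(depth, zNear, zFar);
    // The projection produced ndcX = xView * p00 / zView (likewise for y); undo it.
    return { ndcX * zView / p00, ndcY * zView / p11, zView };
}
```

Only the depth value and the pixel coordinates are needed, which is why storing position in the GBuffer is unnecessary.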


Pixel Shader

What I want to show in this post is the pixel shader which feeds the GBuffer with surface data.
So by now we know that it stores at least color, normals and specular.
Of course, it's not as simple as you may think.

The problem with this pixel shader is that it comes in many variants. They differ in the number of textures consumed and the number of parameters read from the constant buffer (probably the constant buffer which describes the material).

I decided to use this nice barrel for the analysis:
Our heroic barrel!
And please give a warm welcome to the textures used:

So we have albedo, normal map and specular color. Pretty common scenario.

Before we start, a few words about the geometry inputs:
The geometry comes with position, texcoord, normal and tangent buffers.
The vertex shader outputs at least texcoords and normalized tangent/normal/bitangent vectors, previously multiplied by the world matrix. For more complicated materials (e.g. with two diffuse or normal maps) the vertex shader can output other data, but here I wanted to show the simple cases.


Pixel Shader as assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb4[3], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s13, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_resource_texture2d (float,float,float,float) t2  
    dcl_resource_texture2d (float,float,float,float) t13  
    dcl_input_ps linear v0.zw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.xyz  
    dcl_input_ps linear v3.xyz  
    dcl_input_ps_sgv v4.x, isfrontface  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xyzw  
    dcl_temps 3  
   0: sample_indexable(texture2d)(float,float,float,float) r0.xyzw, v1.xyxx, t1.xyzw, s0  
   1: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t0.xyzw, s0  
   2: add r1.w, r1.y, r1.x  
   3: add r1.w, r1.z, r1.w  
   4: mul r2.x, r1.w, l(0.333300)  
   5: add r2.y, l(-1.000000), cb4[1].x  
   6: mul r2.y, r2.y, l(0.500000)  
   7: mov_sat r2.z, r2.y  
   8: mad r1.w, r1.w, l(-0.666600), l(1.000000)  
   9: mad r1.w, r2.z, r1.w, r2.x  
  10: mul r2.xzw, r1.xxyz, cb4[0].xxyz  
  11: mul_sat r2.xzw, r2.xxzw, l(1.500000, 0.000000, 1.500000, 1.500000)  
  12: mul_sat r1.w, abs(r2.y), r1.w  
  13: add r2.xyz, -r1.xyzx, r2.xzwx  
  14: mad r1.xyz, r1.wwww, r2.xyzx, r1.xyzx  
  15: max r1.w, r1.z, r1.y  
  16: max r1.w, r1.w, r1.x  
  17: lt r1.w, l(0.220000), r1.w  
  18: movc r1.w, r1.w, l(-0.300000), l(-0.150000)  
  19: mad r1.w, v0.z, r1.w, l(1.000000)  
  20: mul o0.xyz, r1.wwww, r1.xyzx  
  21: add r0.xyz, r0.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
  22: add r0.xyz, r0.xyzx, r0.xyzx  
  23: mov r1.x, v0.w  
  24: mov r1.yz, v1.zzwz  
  25: mul r1.xyz, r0.yyyy, r1.xyzx  
  26: mad r1.xyz, v3.xyzx, r0.xxxx, r1.xyzx  
  27: mad r0.xyz, v2.xyzx, r0.zzzz, r1.xyzx  
  28: uge r1.x, l(0), v4.x  
  29: if_nz r1.x  
  30:  dp3 r1.x, v2.xyzx, r0.xyzx  
  31:  mul r1.xyz, r1.xxxx, v2.xyzx  
  32:  mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx  
  33: endif  
  34: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t2.xyzw, s0  
  35: max r1.w, r1.z, r1.y  
  36: max r1.w, r1.w, r1.x  
  37: lt r1.w, l(0.200000), r1.w  
  38: movc r2.xyz, r1.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)  
  39: add r2.xyz, -r1.xyzx, r2.xyzx  
  40: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx  
  41: lt r1.x, r0.w, l(0.330000)  
  42: mul r1.y, r0.w, l(0.950000)  
  43: movc r1.x, r1.x, r1.y, l(0.330000)  
  44: add r1.x, -r0.w, r1.x  
  45: mad o1.w, v0.z, r1.x, r0.w  
  46: dp3 r0.w, r0.xyzx, r0.xyzx  
  47: rsq r0.w, r0.w  
  48: mul r0.xyz, r0.wwww, r0.xyzx  
  49: max r0.w, abs(r0.y), abs(r0.x)  
  50: max r0.w, r0.w, abs(r0.z)  
  51: lt r1.xy, abs(r0.zyzz), r0.wwww  
  52: movc r1.yz, r1.yyyy, abs(r0.zzyz), abs(r0.zzxz)  
  53: movc r1.xy, r1.xxxx, r1.yzyy, abs(r0.yxyy)  
  54: lt r1.z, r1.y, r1.x  
  55: movc r1.xy, r1.zzzz, r1.xyxx, r1.yxyy  
  56: div r1.z, r1.y, r1.x  
  57: div r0.xyz, r0.xyzx, r0.wwww  
  58: sample_l(texture2d)(float,float,float,float) r0.w, r1.xzxx, t13.yzwx, s13, l(0)  
  59: mul r0.xyz, r0.wwww, r0.xyzx  
  60: mad o1.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)  
  61: mov o0.w, cb4[2].x  
  62: mov o2.w, l(0)  
  63: ret  

The shader has a few stages. I will describe each main part of it separately.
But first, as always, a screenshot with the values from the constant buffer:

Albedo

We start with the hard stuff. It's not as simple as "OutputColor.rgb = Texture.Sample(uv).rgb".
After we sample the RGB of the color texture (line 1), lines 2-14 implement something I called a "desaturation filter". Let me show you the HLSL code:

 float3 albedoColorFilter( in float3 color, in float desaturationFactor, in float3 desaturationValue )  
 {  
   float sumColorComponents = color.r + color.g + color.b;  
    
   float averageColorComponentValue = 0.3333 * sumColorComponents;  
   float oneMinusAverageColorComponentValue = 1.0 - averageColorComponentValue;  
     
   float factor = 0.5 * (desaturationFactor - 1.0);  
     
   float avgColorComponent = lerp(averageColorComponentValue, oneMinusAverageColorComponentValue, saturate(factor));  
   float3 desaturatedColor = saturate(color * desaturationValue * 1.5);  
    
   float mask = saturate( avgColorComponent * abs(factor) );  
   
   float3 finalColor = lerp( color, desaturatedColor, mask );  
   return finalColor;  
 }  

For the majority of objects this code does nothing but return the original color from the texture. This is achieved with suitable "material cbuffer" values: cb4_v1.x is set to 1.0, which makes factor equal to 0.0, so the mask is 0.0 and the lerp instruction returns the input color.
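To double-check this, here is a C++ port of albedoColorFilter. It is a minimal CPU-side sketch: the Float3 struct and the saturate/lerpF helpers are stand-ins for the HLSL type and intrinsics. With desaturationFactor = 1.0 the factor, and therefore the mask, is 0, so the input color comes back unchanged; larger values pull the color towards color * desaturationValue * 1.5.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };

// Stand-ins for the HLSL saturate() and lerp() intrinsics.
static float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }
static float lerpF(float a, float b, float t) { return a + t * (b - a); }

// CPU port of the albedoColorFilter HLSL function above.
Float3 albedoColorFilter(Float3 color, float desaturationFactor, Float3 desaturationValue) {
    float averageColorComponentValue = 0.3333f * (color.r + color.g + color.b);
    float factor = 0.5f * (desaturationFactor - 1.0f);
    float avgColorComponent = lerpF(averageColorComponentValue,
                                    1.0f - averageColorComponentValue, saturate(factor));
    float mask = saturate(avgColorComponent * std::fabs(factor));

    Float3 desaturatedColor = { saturate(color.r * desaturationValue.r * 1.5f),
                                saturate(color.g * desaturationValue.g * 1.5f),
                                saturate(color.b * desaturationValue.b * 1.5f) };
    return { lerpF(color.r, desaturatedColor.r, mask),
             lerpF(color.g, desaturatedColor.g, mask),
             lerpF(color.b, desaturatedColor.b, mask) };
}
```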

However, there are some exceptions. The highest value of desaturationFactor I found was 4.0 (it is never below 1.0), and desaturationValue depends on the material. It can be something like (0.2, 0.3, 0.4); there are no strict rules. Of course I couldn't resist implementing this in my own DX11 framework, and here are the results, all with desaturationValue equal to float3( 0.25, 0.3, 0.45 ):

desaturationFactor = 1.0 (no effect)

desaturationFactor = 2.0

desaturationFactor = 3.0

desaturationFactor = 4.0
I'm sure it just applies material parameters, but that's not the end of the albedo part.
Lines 15-20 perform the final touches:
  15: max r1.w, r1.z, r1.y   
  16: max r1.w, r1.w, r1.x   
  17: lt r1.w, l(0.220000), r1.w   
  18: movc r1.w, r1.w, l(-0.300000), l(-0.150000)   
  19: mad r1.w, v0.z, r1.w, l(1.000000)   
  20: mul o0.xyz, r1.wwww, r1.xyzx   

v0.z is an output of the vertex shader, and here it is equal to zero. Remember it, because v0.z will be used a couple more times later.

It looks like some kind of factor, and all this code seems to darken the albedo a little, but since v0.z is equal to 0, the color is left untouched. HLSL:

   /* ALBEDO */  
   // optional desaturation (?) filter  
   float3 albedoColor = albedoColorFilter( colorTex, cb4_v1.x, cb4_v0.rgb );  
   float albedoMaxComponent = getMaxComponent( albedoColor );  
     
   // I really have no idea what this is  
   // In most of cases Vertex Shader outputs "paramZ" as 0  
   float paramZ = Input.out0.z;  // note, mostly 0  
   
   // Note that 0.70 and 0.85 are not present in the output assembly.  
   // Because I wanted to use lerp here, I had to adjust them manually.  
   float param = (albedoMaxComponent > 0.22) ? 0.70 : 0.85;  
   float mulParam = lerp(1, param, paramZ);  
   
   // Output  
   pout.RT0.rgb = albedoColor * mulParam;  
   pout.RT0.a = cb4_v2.x;  
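The lerp form can be checked against the assembly with a small C++ sketch of lines 15-20 (the function name is hypothetical; paramZ stands for v0.z). The shader computes 1 + paramZ * (-0.30 or -0.15), which equals lerp(1.0, 0.70 or 0.85, paramZ).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mirrors assembly lines 15-20: the returned factor multiplies albedo.rgb
// before it is written to RT0. paramZ stands for v0.z.
float albedoDarkeningFactor(float r, float g, float b, float paramZ) {
    float maxComponent = std::max(std::max(r, g), b);
    // Lines 17-18: pick the offset based on the brightest channel.
    float offset = (maxComponent > 0.22f) ? -0.30f : -0.15f;
    // Line 19: 1 + paramZ * offset == lerp(1.0, 0.70 or 0.85, paramZ).
    return 1.0f + paramZ * offset;
}
```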

Regarding RT0.a: as you can see, it comes from the material's constant buffer, but since the shader has no debug information, it's hard to say exactly what it is. Maybe translucency?

We are done with the first render target!

Normals

We start by unpacking the normal map, then we perform normal mapping as usual:
   /* NORMALS */   
   float3 sampledNormal = ((normalTex.xyz - 0.5) * 2);  
   
   // Data to construct TBN matrix  
   float3 Tangent = Input.TangentW.xyz;  
   float3 Normal = Input.NormalW.xyz;  
   float3 Bitangent;  
   Bitangent.x = Input.out0.w;  
   Bitangent.yz = Input.out1.zw;  
   
   // remove this saturate in a real scenario; this is a hack to make sure the normal-TBN multiplication  
   // will have 'mad' instructions in the assembly instead of a bunch of 'mov's
   Bitangent = saturate(Bitangent);  
     
   float3x3 TBN = float3x3(Tangent, Bitangent, Normal);  
   float3 normal = mul( sampledNormal, TBN );  

Nothing really surprising so far.
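For reference, the same two steps as a C++ sketch (a CPU-side illustration with a hypothetical Vec3 struct): unpacking from [0, 1] to [-1, 1], then the row-vector multiplication mul(sampledNormal, float3x3(T, B, N)), which is just a weighted sum of the tangent, bitangent and normal.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Unpacks a normal-map texel from [0, 1] to [-1, 1] (assembly lines 21-22).
Vec3 unpackNormal(Vec3 texel) {
    return { (texel.x - 0.5f) * 2.0f, (texel.y - 0.5f) * 2.0f, (texel.z - 0.5f) * 2.0f };
}

// Equivalent of HLSL mul(n, float3x3(T, B, N)): the tangent-space normal
// weights the rows T, B and N (assembly lines 23-27).
Vec3 tangentToWorld(Vec3 n, Vec3 T, Vec3 B, Vec3 N) {
    return { n.x * T.x + n.y * B.x + n.z * N.x,
             n.x * T.y + n.y * B.y + n.z * N.y,
             n.x * T.z + n.y * B.z + n.z * N.z };
}
```

A flat texel (0.5, 0.5, 1.0) therefore yields the interpolated vertex normal itself.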

Take a look at lines 28-33:
  28: uge r1.x, l(0), v4.x   
  29: if_nz r1.x   
  30: dp3 r1.x, v2.xyzx, r0.xyzx   
  31: mul r1.xyz, r1.xxxx, v2.xyzx   
  32: mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx   
  33: endif   

We can roughly write it this way:
   [branch] if (bIsFrontFace <= 0)  
   {  
      float cosTheta = dot(Input.NormalW, normal);  
      float3 invNormal = cosTheta * Input.NormalW;  
      normal = normal - 2*invNormal;  
   }  

I'm not sure if this is the proper way to write it. If you know what type of mathematical operation this is, let me know.
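For what it's worth, the branch computes normal - 2 * dot(N, normal) * N with N = Input.NormalW. For a unit-length N this is a reflection of normal about the plane perpendicular to N, the same formula HLSL's reflect(i, n) intrinsic uses. A small C++ sketch (hypothetical names) shows the effect: the component along N changes sign, while the tangential part and the length are preserved.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirrors the back-face branch (assembly lines 30-32):
// normal - 2 * dot(N, normal) * N, a reflection about the plane orthogonal to N.
// Assumes N (Input.NormalW) is unit length.
Vec3 flipAgainstGeometricNormal(Vec3 normal, Vec3 N) {
    float cosTheta = dot(N, normal);
    return { normal.x - 2.0f * cosTheta * N.x,
             normal.y - 2.0f * cosTheta * N.y,
             normal.z - 2.0f * cosTheta * N.z };
}
```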

We see that the pixel shader uses SV_IsFrontFace.
What's that? Documentation (I wanted to write 'msdn' but..) comes to the rescue:

"Specifies whether a triangle is front facing. For lines and points, IsFrontFace has the value true. The exception is lines drawn out of triangles (wireframe mode), which sets IsFrontFace the same way as rasterizing the triangle in solid mode. Can be written to by the geometry shader, and read by the pixel shader."

I also wanted to check it for myself. Indeed, the effect is visible only in wireframe mode. I believe the purpose of this piece of code is to calculate normals (and therefore lighting) properly in wireframe mode.
Here is a comparison in wireframe mode: the final scene color with this trick off/on, as well as the GBuffer normals [0-1] texture with this trick off/on:

Scene color without the trick

Scene color with the trick
Normals [0-1] without the trick

Normals [0-1] with the trick
Have you noticed that every render target of the GBuffer is in R8G8B8A8_UNORM format? That means there are 256 possible values per component. Is that enough for storing normals?

Storing high-quality normals in a reasonable number of bytes in the GBuffer is a known problem, but fortunately there is a lot of material to learn from.
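To get a feeling for the problem, here is a minimal C++ sketch of the naive approach: remap each component with n * 0.5 + 0.5 and round it to the nearest of 256 levels. The worst-case error per component is half a level, i.e. 1/255 in [-1, 1] space, which the sweep below measures. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Encodes one normal component from [-1, 1] into an 8-bit UNORM value,
// rounding to the nearest of the 256 levels.
std::uint8_t encodeUnorm8(float n) {
    float unorm = n * 0.5f + 0.5f;                    // [-1, 1] -> [0, 1]
    unorm = std::fmin(std::fmax(unorm, 0.0f), 1.0f);  // saturate
    return static_cast<std::uint8_t>(std::lround(unorm * 255.0f));
}

// Decodes an 8-bit UNORM value back into [-1, 1].
float decodeUnorm8(std::uint8_t v) {
    return (static_cast<float>(v) / 255.0f) * 2.0f - 1.0f;
}

// Largest absolute per-component round-trip error over samples + 1 evenly spaced inputs.
float maxQuantizationError(int samples) {
    float worst = 0.0f;
    for (int i = 0; i <= samples; ++i) {
        float n = -1.0f + 2.0f * static_cast<float>(i) / static_cast<float>(samples);
        float roundTrip = decodeUnorm8(encodeUnorm8(n));
        worst = std::fmax(worst, std::fabs(roundTrip - n));
    }
    return worst;
}
```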

Some of you probably already know which technique is used here. I should mention that throughout the whole geometry pass there is one additional texture bound to slot #13...:


Ha! The Witcher 3 uses a technique known as "Best Fit Normals". I will not go into details here (refer to the presentation). It was developed by Crytek around 2009-2010, and since the CryEngine source code is publicly available, so is BFN.

BFN is what gives the normals texture its "grainy" look.
After scaling the normal by the best-fit factor, we encode it from the [-1, 1] range to [0, 1].
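The encoding in assembly lines 46-60 can be sketched in C++ as follows. This is an illustration under assumptions: the fitting texture bound to slot #13 is replaced by a hypothetical callback fittingScale(u, v), because its contents are not reproduced here, and the 8-bit quantization of the render target is not modeled. The lookup coordinates are the larger of the two non-major components of |n| and the ratio smaller / larger; the normal is divided by its largest absolute component before being scaled and remapped to [0, 1].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for sampling the best-fit-normals texture (slot #13) at (u, v).
using FittingLookup = std::function<float(float, float)>;

// Mirrors assembly lines 46-60: normalize, build the lookup coordinates,
// project onto the unit cube, scale by the fitted length, remap to [0, 1].
Vec3 encodeBestFitNormal(Vec3 n, const FittingLookup& fittingScale) {
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    n = { n.x / len, n.y / len, n.z / len };

    float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
    float maxAbs = std::max(std::max(ay, ax), az);

    // The two components that are not the major axis (lines 51-53).
    float minorA, minorB;
    if (az < maxAbs) {
        if (ay < maxAbs) { minorA = az; minorB = ay; }  // x is the major axis
        else             { minorA = az; minorB = ax; }  // y is the major axis
    } else {
        minorA = ay; minorB = ax;                       // z is the major axis
    }

    // Lines 54-56: u = larger minor component, v = smaller / larger.
    float u = std::max(minorA, minorB);
    float smaller = std::min(minorA, minorB);
    // The shader divides unconditionally; this guard avoids 0/0 for axis-aligned normals.
    float v = (u > 0.0f) ? smaller / u : 0.0f;

    float scale = fittingScale(u, v);
    return { n.x / maxAbs * scale * 0.5f + 0.5f,
             n.y / maxAbs * scale * 0.5f + 0.5f,
             n.z / maxAbs * scale * 0.5f + 0.5f };
}

// Decoding only needs the remap back to [-1, 1] and a normalization.
Vec3 decodeBestFitNormal(Vec3 e) {
    Vec3 n = { e.x * 2.0f - 1.0f, e.y * 2.0f - 1.0f, e.z * 2.0f - 1.0f };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}
```

As I understand the technique, the stored length carries no information of its own; the fitted scale is chosen so that, after 8-bit quantization, the direction is represented as accurately as possible.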

Specular 

We start at line 34 by sampling the specular texture:
  34: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t2.xyzw, s0   
  35: max r1.w, r1.z, r1.y   
  36: max r1.w, r1.w, r1.x   
  37: lt r1.w, l(0.200000), r1.w   
  38: movc r2.xyz, r1.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)   
  39: add r2.xyz, -r1.xyzx, r2.xyzx   
  40: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx   

As you can see, there is a "darkening" filter similar to the one for albedo:
calculate the component with the maximum value, then calculate a "darker" color and interpolate it with the original specular color using a parameter from the vertex shader... which is set to 0, so we output the color from the texture.

HLSL:
   /* SPECULAR */  
   float3 specularTex = texture2.Sample( samplerAnisoWrap, Texcoords ).rgb;  
   
   // Similar algorithm as in Albedo. Calculate max component, compare this with  
   // some threshold and calculate "minimum" value if needed.  
   // Because in the scene I analyzed paramZ was set to zero, value from texture will be  
   // the final result.  
   float specularMaxComponent = getMaxComponent( specularTex );  
   float3 specB = (specularMaxComponent > 0.2) ? specularTex : float3(0.12, 0.12, 0.12);  
   float3 finalSpec = lerp(specularTex, specB, paramZ);  
   pout.RT2.xyz = finalSpec;  
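The same pattern as a small C++ sketch of assembly lines 35-40 (hypothetical names; paramZ stands for v0.z): with paramZ = 0 the texel passes through unchanged, with paramZ = 1 dim texels are replaced by a flat 0.12.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };

// Mirrors assembly lines 35-40: keep the texel if its brightest channel is
// above 0.2, otherwise use a flat 0.12, then blend by paramZ (v0.z).
Float3 filterSpecular(Float3 tex, float paramZ) {
    float maxComponent = std::max(std::max(tex.r, tex.g), tex.b);
    Float3 specB = (maxComponent > 0.2f) ? tex : Float3{ 0.12f, 0.12f, 0.12f };
    return { tex.r + paramZ * (specB.r - tex.r),
             tex.g + paramZ * (specB.g - tex.g),
             tex.b + paramZ * (specB.b - tex.b) };
}
```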

Reflectivity

I have no idea whether this name is proper for this parameter, since I don't know how it affects the lighting pass. The thing is that the alpha channel of the input normal map holds additional data:
Alpha channel of "normal map" texture. (c) CD Projekt Red
Assembly:
  41: lt r1.x, r0.w, l(0.330000)   
  42: mul r1.y, r0.w, l(0.950000)   
  43: movc r1.x, r1.x, r1.y, l(0.330000)   
  44: add r1.x, -r0.w, r1.x   
  45: mad o1.w, v0.z, r1.x, r0.w   

Say hello to our old friend, 'v0.z'! This is similar to both albedo and specular:
   /* REFLECTIVITY */  
   float reflectivity = normalTex.a;  
   float reflectivity2 = (reflectivity < 0.33) ? (reflectivity * 0.95) : 0.33;  
     
   float finalReflectivity = lerp(reflectivity, reflectivity2, paramZ);  
   pout.RT1.a = finalReflectivity;  
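And once more as a C++ sketch of lines 41-45 (hypothetical names; paramZ stands for v0.z): with paramZ = 0 the alpha from the normal map is written as-is, with paramZ = 1 values below 0.33 are scaled by 0.95 and everything else is clamped to 0.33.

```cpp
#include <cassert>
#include <cmath>

// Mirrors assembly lines 41-45: values below 0.33 are scaled by 0.95, anything
// else becomes 0.33, and the result is blended with the original by paramZ.
float filterReflectivity(float reflectivity, float paramZ) {
    float reflectivity2 = (reflectivity < 0.33f) ? reflectivity * 0.95f : 0.33f;
    return reflectivity + paramZ * (reflectivity2 - reflectivity);
}
```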

Nice! This is the end of the analysis of the first pixel shader variant.

In terms of the result, here is a comparison of my shader (left) with the original one (right):
These differences do not affect the calculations, so my job is done here ;)



Pixel Shader - "Albedo + Normals" variant

I decided to show you one more variant, this time with albedo & normal maps only, without a specular texture. The assembly is a bit longer:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb4[8], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s13, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_resource_texture2d (float,float,float,float) t13  
    dcl_input_ps linear v0.zw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.xyz  
    dcl_input_ps linear v3.xyz  
    dcl_input_ps_sgv v4.x, isfrontface  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xyzw  
    dcl_temps 4  
   0: mul r0.x, v0.z, cb4[0].x  
   1: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, v1.xyxx, t1.xyzw, s0  
   2: sample_indexable(texture2d)(float,float,float,float) r0.yzw, v1.xyxx, t0.wxyz, s0  
   3: add r2.x, r0.z, r0.y  
   4: add r2.x, r0.w, r2.x  
   5: add r2.z, l(-1.000000), cb4[2].x  
   6: mul r2.yz, r2.xxzx, l(0.000000, 0.333300, 0.500000, 0.000000)  
   7: mov_sat r2.w, r2.z  
   8: mad r2.x, r2.x, l(-0.666600), l(1.000000)  
   9: mad r2.x, r2.w, r2.x, r2.y  
  10: mul r3.xyz, r0.yzwy, cb4[1].xyzx  
  11: mul_sat r3.xyz, r3.xyzx, l(1.500000, 1.500000, 1.500000, 0.000000)  
  12: mul_sat r2.x, abs(r2.z), r2.x  
  13: add r2.yzw, -r0.yyzw, r3.xxyz  
  14: mad r0.yzw, r2.xxxx, r2.yyzw, r0.yyzw  
  15: max r2.x, r0.w, r0.z  
  16: max r2.x, r0.y, r2.x  
  17: lt r2.x, l(0.220000), r2.x  
  18: movc r2.x, r2.x, l(-0.300000), l(-0.150000)  
  19: mad r0.x, r0.x, r2.x, l(1.000000)  
  20: mul o0.xyz, r0.xxxx, r0.yzwy  
  21: add r0.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
  22: add r0.xyz, r0.xyzx, r0.xyzx  
  23: mov r1.x, v0.w  
  24: mov r1.yz, v1.zzwz  
  25: mul r1.xyz, r0.yyyy, r1.xyzx  
  26: mad r0.xyw, v3.xyxz, r0.xxxx, r1.xyxz  
  27: mad r0.xyz, v2.xyzx, r0.zzzz, r0.xywx  
  28: uge r0.w, l(0), v4.x  
  29: if_nz r0.w  
  30:  dp3 r0.w, v2.xyzx, r0.xyzx  
  31:  mul r1.xyz, r0.wwww, v2.xyzx  
  32:  mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx  
  33: endif  
  34: add r0.w, -r1.w, l(1.000000)  
  35: log r1.xyz, cb4[3].xyzx  
  36: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  37: exp r1.xyz, r1.xyzx  
  38: mad r0.w, r0.w, cb4[4].x, cb4[5].x  
  39: mul_sat r1.xyz, r0.wwww, r1.xyzx  
  40: log r1.xyz, r1.xyzx  
  41: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)  
  42: exp r1.xyz, r1.xyzx  
  43: max r0.w, r1.z, r1.y  
  44: max r0.w, r0.w, r1.x  
  45: lt r0.w, l(0.200000), r0.w  
  46: movc r2.xyz, r0.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)  
  47: add r2.xyz, -r1.xyzx, r2.xyzx  
  48: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx  
  49: lt r0.w, r1.w, l(0.330000)  
  50: mul r1.x, r1.w, l(0.950000)  
  51: movc r0.w, r0.w, r1.x, l(0.330000)  
  52: add r0.w, -r1.w, r0.w  
  53: mad o1.w, v0.z, r0.w, r1.w  
  54: lt r0.w, l(0), cb4[7].x  
  55: and o2.w, r0.w, l(0.064706)  
  56: dp3 r0.w, r0.xyzx, r0.xyzx  
  57: rsq r0.w, r0.w  
  58: mul r0.xyz, r0.wwww, r0.xyzx  
  59: max r0.w, abs(r0.y), abs(r0.x)  
  60: max r0.w, r0.w, abs(r0.z)  
  61: lt r1.xy, abs(r0.zyzz), r0.wwww  
  62: movc r1.yz, r1.yyyy, abs(r0.zzyz), abs(r0.zzxz)  
  63: movc r1.xy, r1.xxxx, r1.yzyy, abs(r0.yxyy)  
  64: lt r1.z, r1.y, r1.x  
  65: movc r1.xy, r1.zzzz, r1.xyxx, r1.yxyy  
  66: div r1.z, r1.y, r1.x  
  67: div r0.xyz, r0.xyzx, r0.wwww  
  68: sample_l(texture2d)(float,float,float,float) r0.w, r1.xzxx, t13.yzwx, s13, l(0)  
  69: mul r0.xyz, r0.wwww, r0.xyzx  
  70: mad o1.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)  
  71: mov o0.w, cb4[6].x  
  72: ret

The differences between this variant and previous one are:

a) lines 0 and 19: the interpolation parameter v0.z is multiplied by cb4[0].x from the constant buffer, but this product is used only to interpolate albedo at line 19. For the other outputs the 'usual' v0.z is used.


b) lines 54-55: o2.w now depends on the condition ( cb4[7].x > 0.0 )

We already know this "comparison + and" pattern from the luminance histogram calculation in TW3, so we can write this as:
 pout.RT2.w = (cb4_v7.x > 0.0) ? (16.5/255.0) : 0.0;  
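Why does this work? In shader bytecode, comparison instructions such as lt write 0xFFFFFFFF for true and 0 for false, so an and with the bit pattern of a float either keeps that float or turns it into 0.0 (0.064706 is simply 16.5/255 printed with six decimals). A minimal C++ sketch of lines 54-55, with a hypothetical function name and std::memcpy used for the bit casts:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Emulates assembly lines 54-55: `lt` produces an all-ones or all-zeros mask,
// and `and` with the bit pattern of 16.5/255 keeps or clears that float.
float selectWithMask(float cb4_v7_x) {
    std::uint32_t mask = (0.0f < cb4_v7_x) ? 0xFFFFFFFFu : 0u;
    const float value = 16.5f / 255.0f;  // ~0.064706
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits &= mask;  // all-ones keeps the float, zero yields the bits of +0.0f
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```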


c) lines 34-42: a completely different calculation of specular.

There is no specular texture. Let's see the assembly responsible for that:
  34: add r0.w, -r1.w, l(1.000000)   
  35: log r1.xyz, cb4[3].xyzx   
  36: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)   
  37: exp r1.xyz, r1.xyzx   
  38: mad r0.w, r0.w, cb4[4].x, cb4[5].x   
  39: mul_sat r1.xyz, r0.wwww, r1.xyzx   
  40: log r1.xyz, r1.xyzx   
  41: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)   
  42: exp r1.xyz, r1.xyzx   

Note that (1 - reflectivity) is used here. Luckily, this is quite simple in HLSL:
   float oneMinusReflectivity = 1.0 - normalTex.a;  
   float3 specularTex = pow(cb4_v3.rgb, 2.2);  
   oneMinusReflectivity = oneMinusReflectivity * cb4_v4.x + cb4_v5.x;  
   specularTex = saturate(specularTex * oneMinusReflectivity);  
   specularTex = pow(specularTex, 1.0/2.2);  
   
   // proceed as in the first variant...  
   float specularMaxComponent = getMaxComponent( specularTex ); 
   ... 
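A C++ sketch of assembly lines 34-42 (hypothetical names; specularColor, scale and bias stand for cb4_v3.rgb, cb4_v4.x and cb4_v5.x). My reading, which is an assumption, is that the pow(x, 2.2) / pow(x, 1/2.2) pair (computed in the shader with log/exp) converts the constant color to an approximately linear space, scales it by the remapped (1 - reflectivity), and converts it back.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };

// Sketch of lines 34-42 of the "Albedo + Normals" variant.
// reflectivity is the alpha channel of the normal map.
Float3 emulatedSpecular(Float3 specularColor, float scale, float bias, float reflectivity) {
    float oneMinusReflectivity = (1.0f - reflectivity) * scale + bias;
    auto channel = [&](float c) {
        float linear = std::pow(c, 2.2f);  // approx. gamma -> linear
        float scaled = std::min(std::max(linear * oneMinusReflectivity, 0.0f), 1.0f);  // saturate
        return std::pow(scaled, 1.0f / 2.2f);  // back to gamma space
    };
    return { channel(specularColor.r), channel(specularColor.g), channel(specularColor.b) };
}
```

With scale = 1 and bias = 0 a fully non-reflective texel (alpha 0) reproduces the constant color, and the result fades to black as alpha approaches 1.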

On a side note, this variant uses a slightly larger material constant buffer (cb4[8] instead of cb4[3]). These extra values are used to emulate the specular color here.

The rest of the shader is the same as in the previous variant.

72 lines of assembly is a little too much for WinMerge to display at once, so just believe me: it's almost the same assembly as the original. Or you can grab my HLSLexplorer and see for yourself! ;)


Summary

...and if you've come this far, maybe you're willing to come a little further.

Things that seem simple rarely are in real life, and feeding the GBuffer in The Witcher 3 is no exception. I've shown you only the simplest variants of the pixel shaders responsible for it, plus some observations that apply to deferred shading in general.

For the most patient ones (or vice versa), here are both pixel shader variants on pastebin:





Feel free to comment.

I hope you enjoyed it.
Thanks for reading!
