Saturday, 6 April 2019

Reverse engineering the rendering of The Witcher 3, part 13a - witcher senses (objects & intensity map)

Welcome back!

So far, almost none of the effects and techniques explained in this series have been specific to The Witcher 3. You can find things like tonemapping, vignette or average luminance calculation in almost every modern video game. Even the drunk effect is fairly widespread.

That's why I decided to take a closer look at the rendering mechanics of "witcher senses". Since Geralt is a witcher, his senses are much sharper than an ordinary human's. He can therefore see and hear more than other people, which greatly helps him in his investigations. The witcher senses mechanic lets the player visualize these traces.

Here is a demonstration of this effect:

And one more, in better lighting:

As you can see, there are two types of objects: the ones Geralt can interact with (yellow outline) and traces related to an investigation (red outline). Once Geralt investigates a red trace, it can turn into a yellow one (video #1). Note that the whole screen becomes more greyish and a fish-eye effect is applied (video #2).

The effect is quite complex, so I decided to split it across three blog posts.
The first one describes the selection of objects, the second one covers the generation of the outline, and the third one focuses on combining everything together.

Selection of objects

As I mentioned, there are two types of objects, so we want to distinguish between them. In The Witcher 3 this is done with the stencil buffer. While the GBuffer is generated, meshes meant to be marked as "traces" (red) are rendered with stencil = 8. Meshes marked in yellow as "interesting" are rendered with stencil = 4.

For example, the following two textures show a frame with witcher senses visible and the corresponding stencil buffer:



Stencil buffer - a brief refresher

The stencil buffer is often used to identify drawn meshes by assigning the same ID to certain categories of meshes.

The idea is to use the Always comparison function with the Replace operation when both the stencil and depth tests pass, and the Keep operation in all other cases.

Here's how to implement it with D3D11:

 D3D11_DEPTH_STENCIL_DESC depthstencilState = {};  // zero-initialize so no field is left undefined  
 // Set depth parameters....  
   
 // Enable stencil  
 depthstencilState.StencilEnable = TRUE;  
   
 // Read & write all bits  
 depthstencilState.StencilReadMask = 0xFF;  
 depthstencilState.StencilWriteMask = 0xFF;  
   
 // Stencil operator for front face  
 depthstencilState.FrontFace.StencilFunc = D3D11_COMPARISON_ALWAYS;  
 depthstencilState.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.FrontFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE;  
   
 // Stencil operator for back face.  
 depthstencilState.BackFace.StencilFunc = D3D11_COMPARISON_ALWAYS;  
 depthstencilState.BackFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.BackFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.BackFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE;  
   
 pDevice->CreateDepthStencilState( &depthstencilState, &m_pDS_AssignValue );  


The stencil value to write to the buffer is passed as StencilRef via an API call:

 // from now on set stencil buffer values to 8  
 pDevCon->OMSetDepthStencilState( m_pDS_AssignValue, 8 );  
 ...  
 pDevCon->DrawIndexed( ... );  
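
To make the effect of this state more concrete, here is a small CPU-side model of how the stencil value of a single pixel changes under the settings above. It is only an illustrative sketch: `stencilAfterDraw` and its parameters are hypothetical names of mine, not part of D3D11, and the write mask handling follows the usual "only masked bits are written" rule.

```cpp
#include <cstdint>

// CPU-side model of the stencil update for the state above:
// StencilFunc = ALWAYS, StencilPassOp = REPLACE, StencilDepthFailOp = KEEP.
// Hypothetical helper for illustration only, not a D3D11 API.
std::uint8_t stencilAfterDraw(std::uint8_t current, std::uint8_t stencilRef,
                              std::uint8_t writeMask, bool depthTestPassed)
{
    // ALWAYS: the stencil test itself never fails, so StencilFailOp is never hit.
    if (!depthTestPassed)
        return current; // StencilDepthFailOp = KEEP

    // StencilPassOp = REPLACE: write StencilRef, limited to the writable bits.
    return static_cast<std::uint8_t>((current & ~writeMask) | (stencilRef & writeMask));
}
```

With a write mask of 0xFF and StencilRef = 8, every visible pixel of the mesh ends up with the value 8, while pixels where the depth test fails keep whatever was there before.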

Rendering intensities

In terms of implementation, this pass uses one full-screen R11G11B10_FLOAT texture that stores interesting objects and traces in the R and G channels, respectively.

Why exactly do we need intensity? It turns out that Geralt's senses have, well, a limited radius, so a particular object only starts to be outlined when the player is close enough.

See this aspect in action:




We start by clearing the intensity texture to black.
Then two fullscreen draw calls are performed: the first one for "traces" and the second one for interesting objects:


The first draw call is for traces - green channel:

And the second one for interesting objects - red channel:


Okay, but how do we decide which pixels are considered here? We have to use the stencil buffer!
During each of these calls a stencil test is performed to accept only those pixels which were previously marked with "8" (first draw call) or "4" (second draw call).

Visualization of stencil test for traces:

...and for interesting objects:

How is the test performed in this case? For the basics of the stencil test, here is a good blog post about it. The general stencil test formula is as follows:
 if ((StencilRef & StencilReadMask) OP (StencilValue & StencilReadMask))  
   accept pixel  
 else  
   discard pixel  

where:
StencilRef is the value passed with the API call,
StencilReadMask is the mask used to read the stencil value (note that it appears on both the left and the right side),
OP is the comparison operator, set through the API,
StencilValue is the value of the stencil buffer at the currently processed pixel.

It's important to be aware that we use binary ANDs to calculate operands.
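
The formula can be written down as a tiny function. This is a minimal sketch, not driver code: the `CompareOp` enum and `stencilTest` are hypothetical names that only loosely mirror the comparison functions of the real API.

```cpp
#include <cstdint>

// Comparison operators, loosely mirroring the D3D11 comparison functions.
// Hypothetical enum for this sketch only.
enum class CompareOp { Never, Less, Equal, LessEqual, Greater, NotEqual, GreaterEqual, Always };

// Returns true when the pixel is accepted by the stencil test.
bool stencilTest(std::uint8_t stencilRef, std::uint8_t readMask,
                 std::uint8_t stencilValue, CompareOp op)
{
    // Both operands are masked first (binary AND), then compared.
    const unsigned lhs = stencilRef & readMask;
    const unsigned rhs = stencilValue & readMask;

    switch (op)
    {
    case CompareOp::Never:        return false;
    case CompareOp::Less:         return lhs <  rhs;
    case CompareOp::Equal:        return lhs == rhs;
    case CompareOp::LessEqual:    return lhs <= rhs;
    case CompareOp::Greater:      return lhs >  rhs;
    case CompareOp::NotEqual:     return lhs != rhs;
    case CompareOp::GreaterEqual: return lhs >= rhs;
    case CompareOp::Always:       return true;
    }
    return false; // unreachable for valid enum values
}
```

Note that the mask is applied to both sides before the comparison, so bits outside the read mask never influence the result.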

Knowing the basics, let's look at the settings used during these draw calls:
Stencil state for traces

Stencil state for interesting objects

Ha! As we can see, ReadMask is the only difference. Let's try it by substituting these values into the stencil test equation, with "less than" as the comparison operator:
 Let StencilReadMask = 0x08 and StencilRef = 0:  
   
 For a pixel with stencil = 8:  
 0 & 0x08 < 8 & 0x08  
 0 < 8
 TRUE  
   
 For a pixel with stencil = 4:  
 0 & 0x08 < 4 & 0x08  
 0 < 0  
 FALSE  
   

Ha! Clever. As you can see, in this scenario we don't compare the stencil value itself; instead, we check whether a particular bit of the stencil buffer value is set. Every pixel of the stencil buffer is a uint8, so its values lie in the range [0-255].
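
Here is the same bit check as a compile-time sketch for both passes. The names are hypothetical, and the read mask of 0x04 for interesting objects is my assumption, inferred from the fact that those pixels were marked with 4 and that ReadMask is the only difference between the two states. The value 0x0C is a made-up example of a pixel with both bits set.

```cpp
#include <cstdint>

// Stencil IDs written while the GBuffer is generated.
constexpr std::uint8_t kTraceBit       = 0x08; // red outline
constexpr std::uint8_t kInterestingBit = 0x04; // yellow outline (mask assumed)

// "Less than" test with StencilRef = 0: (0 & mask) < (value & mask).
// The left side is always 0, so the test passes exactly when a masked bit is set.
constexpr bool passesBitTest(std::uint8_t stencilValue, std::uint8_t readMask)
{
    return (0u & readMask) < (static_cast<unsigned>(stencilValue) & readMask);
}

static_assert( passesBitTest(8, kTraceBit),       "trace pixel passes the trace pass");
static_assert(!passesBitTest(4, kTraceBit),       "interesting pixel fails the trace pass");
static_assert( passesBitTest(4, kInterestingBit), "interesting pixel passes its own pass");
static_assert(!passesBitTest(8, kInterestingBit), "trace pixel fails the interesting pass");
static_assert( passesBitTest(0x0C, kTraceBit),    "other bits do not disturb the check");
```

The last assertion is the practical benefit of this approach: unlike an equality comparison against the full value, the bit check keeps working when other bits of the stencil value are in use.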

Side note: all DrawIndexed(36) calls are related to rendering footsteps as traces, so the final look of the intensity map in this particular frame is:



But before the stencil test there is a pixel shader. Both 28738 and 28748 use the same pixel shader:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[2], immediateIndexed  
    dcl_constantbuffer cb3[8], immediateIndexed  
    dcl_constantbuffer cb12[214], immediateIndexed  
    dcl_sampler s15, mode_default  
    dcl_resource_texture2d (float,float,float,float) t15  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xyzw  
    dcl_output o3.xyzw  
    dcl_temps 2  
   0: mul r0.xy, v0.xyxx, cb0[1].zwzz  
   1: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t15.xyzw, s15  
   2: mul r1.xyzw, v0.yyyy, cb12[211].xyzw  
   3: mad r1.xyzw, cb12[210].xyzw, v0.xxxx, r1.xyzw  
   4: mad r0.xyzw, cb12[212].xyzw, r0.xxxx, r1.xyzw  
   5: add r0.xyzw, r0.xyzw, cb12[213].xyzw  
   6: div r0.xyz, r0.xyzx, r0.wwww  
   7: add r0.xyz, r0.xyzx, -cb3[7].xyzx  
   8: dp3 r0.x, r0.xyzx, r0.xyzx  
   9: sqrt r0.x, r0.x  
  10: mul r0.y, r0.x, l(0.120000)  
  11: log r1.x, abs(cb3[6].y)  
  12: mul r1.xy, r1.xxxx, l(2.800000, 0.800000, 0.000000, 0.000000)  
  13: exp r1.xy, r1.xyxx  
  14: mad r0.zw, r1.xxxy, l(0.000000, 0.000000, 120.000000, 120.000000), l(0.000000, 0.000000, 1.000000, 1.000000)  
  15: lt r1.x, l(0.030000), cb3[6].y  
  16: movc r0.xy, r1.xxxx, r0.yzyy, r0.xwxx  
  17: div r0.x, r0.x, r0.y  
  18: log r0.x, r0.x  
  19: mul r0.x, r0.x, l(1.600000)  
  20: exp r0.x, r0.x  
  21: add r0.x, -r0.x, l(1.000000)  
  22: max r0.x, r0.x, l(0)  
  23: mul o0.xyz, r0.xxxx, cb3[0].xyzx  
  24: mov o0.w, cb3[0].w  
  25: mov o1.xyzw, cb3[1].xyzw  
  26: mov o2.xyzw, cb3[2].xyzw  
  27: mov o3.xyzw, cb3[3].xyzw  
  28: ret  

This pixel shader effectively writes to only one render target, and that target (R11G11B10_FLOAT) has no alpha channel, so lines 24-27 are redundant.

The first thing that happens here is sampling depth (with a point clamp sampler) in line 1. This value is used to reconstruct the world position by multiplying with a special matrix (lines 2-5) and then performing the perspective division (line 6).
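
The reconstruction step can be mirrored on the CPU as follows. This is a sketch under assumptions: the `Float3`, `Matrix4` and `reconstructWorldPos` names are mine, and the matrix is treated as four rows (as cb12[210..213] are used in the assembly) whose actual contents are not known here.

```cpp
#include <array>

struct Float3 { float x, y, z; };
using Float4  = std::array<float, 4>;
using Matrix4 = std::array<Float4, 4>; // four rows, used like cb12[210..213]

// Mirrors lines 2-6 of the assembly: the row vector (pixelX, pixelY, depth, 1)
// is multiplied by a 4x4 matrix, followed by the perspective division.
Float3 reconstructWorldPos(float pixelX, float pixelY, float depth,
                           const Matrix4& screenToWorld)
{
    Float4 p{};
    for (int i = 0; i < 4; ++i)
    {
        p[i] = pixelX * screenToWorld[0][i]
             + pixelY * screenToWorld[1][i]
             + depth  * screenToWorld[2][i]
             +          screenToWorld[3][i];
    }

    // Perspective division (line 6).
    return { p[0] / p[3], p[1] / p[3], p[2] / p[3] };
}
```

Note that the input is the pixel position from SV_Position rather than a normalized [0-1] or [-1, 1] coordinate, so the matrix has to account for the viewport size as well.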

Given Geralt's position (cb3[7].xyz - please note this is not the camera position!), the distance from Geralt to this particular point is calculated (lines 7-9).

Inputs which are important for this shader are:
- cb3[0].rgb - color of output. This can be float3(0, 1, 0) (traces) or float3(1, 0, 0)  (interesting objects),
- cb3[6].y - distance scaling factor. This directly affects radius and intensity of final output.

Next come some slightly tricky formulas that calculate the intensity from the distance between Geralt and the object. My guess is that all coefficients were selected experimentally.
The final output is color*intensity.
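
To get a feeling for how the scaling factor shapes the falloff, the intensity part (lines 10-22 of the assembly) can be ported to the CPU. This is a sketch with hypothetical names (`sensesIntensity`, `sensesRadius`). The radius helper is my own derivation: the intensity reaches zero once the ratio raised to 1.6 reaches 1, which happens exactly when the numerator equals the denominator.

```cpp
#include <algorithm>
#include <cmath>

// Denominator of the falloff ratio: 1 + 120 * |scaling|^exponent.
// The exponent is 2.8 when scaling > 0.03 and 0.8 otherwise.
float sensesDenominator(float distanceScaling)
{
    const float exponent = (distanceScaling > 0.03f) ? 2.8f : 0.8f;
    return 1.0f + 120.0f * std::pow(std::fabs(distanceScaling), exponent);
}

// Intensity for a point at 'distance' from Geralt (distance >= 0).
float sensesIntensity(float distance, float distanceScaling)
{
    const float numerator = (distanceScaling > 0.03f) ? distance * 0.12f : distance;
    const float ratio = numerator / sensesDenominator(distanceScaling);
    return std::max(0.0f, 1.0f - std::pow(ratio, 1.6f));
}

// Distance at which the intensity drops to zero (numerator == denominator).
float sensesRadius(float distanceScaling)
{
    const float denominator = sensesDenominator(distanceScaling);
    return (distanceScaling > 0.03f) ? denominator / 0.12f : denominator;
}
```

For example, with a scaling factor of 0 the radius is exactly 1 world unit, and the radius grows quickly as the factor increases.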


The HLSL would be something like this:
 struct FSInput  
 {  
      float4 param0 : SV_Position;  
 };  
   
 struct FSOutput  
 {  
      float4 param0 : SV_Target0;  
      float4 param1 : SV_Target1;  
      float4 param2 : SV_Target2;  
      float4 param3 : SV_Target3;  
 };  
   
 float3 getWorldPos( float2 screenPos, float depth )  
 {  
   float4 worldPos = float4(screenPos, depth, 1.0);  
   worldPos = mul( worldPos, screenToWorld );  
     
   return worldPos.xyz / worldPos.w;  
 }  
   
 FSOutput EditedShaderPS(in FSInput IN)  
 {  
   // * Inputs    
   // Directly affects radius of the effect  
   float distanceScaling = cb3_v6.y;  
     
   // Color of output at the end  
   float3 color = cb3_v0.rgb;  
        

   // Sample depth  
   float2 uv = IN.param0.xy * cb0_v1.zw;  
   float depth = texture15.Sample( sampler15, uv ).x;  
     
   // Reconstruct world position  
   float3 worldPos = getWorldPos( IN.param0.xy, depth );  
   
   // Calculate distance from Geralt to world position of particular object  
   float dist_geraltToWorld = length( worldPos - cb3_v7.xyz );  
     
   // Calculate two squeezing params  
   float t0 = 1.0 + 120*pow( abs(distanceScaling), 2.8 );  
   float t1 = 1.0 + 120*pow( abs(distanceScaling), 0.8 );  
     
   // Determine numerator and denominator  
   float2 params;  
   params = (distanceScaling > 0.03) ? float2(dist_geraltToWorld * 0.12, t0) : float2(dist_geraltToWorld, t1);  
     
   // Distance Geralt <-> Object  
   float numerator = params.x;   
     
   // Hiding factor  
   float denominator = params.y;  
     
   // Raise to power of 1.6  
   float param = pow( numerator / denominator, 1.6 );  
     
   // Calculate final intensity  
   float intensity = max(0.0, 1.0 - param );   
     
     
   // * Final outputs.  
   // *  
   // * This PS outputs only one color, the rest  
   // * is redundant. I just added this to keep 1-1 ratio with  
   // * original assembly.  
   FSOutput OUT = (FSOutput)0;  
   OUT.param0.xyz = color * intensity;  
     
   // == redundant ==  
   OUT.param0.w = cb3_v0.w;  
   OUT.param1 = cb3_v1;  
   OUT.param2 = cb3_v2;  
   OUT.param3 = cb3_v3;  
   // ===============  
   
   return OUT;  
 }  

And a small comparison between the original (left) and my (right) shader assembly.

This was the first stage of the witcher senses effect. Actually, the easiest one.
Go to the second one here.

Feel free to comment and share.
Thanks for reading!

PS. This is my first post as BSc in Computer Science. Feels great! :)
