Saturday, July 6, 2019

HLSLexplorer 1.01 - UI improvements and D3D12 rendering backend

Hi,

A new version of HLSLexplorer, 1.01, is done and available to download from here (Google Drive). The executable works on Windows 10 and was built against Windows 10 SDK 10.0.17763.0.
If you would like to play with the source, it's on GitHub.



Let's go briefly through the most important changes:

1) Recent Files List

Quite self-explanatory. I've been wasting too much time looking for particular files, so I added a recent files list. It is saved as recent.dat in HLSLexplorer's main directory. So far there is a limit of 8 files, but you can easily change it in the source. There is also a fancy separator and an option to clear the recents (no, it doesn't ask you to confirm).

2) Go to Line

I've always liked the Go To Line dialog from Visual Studio. Simple and very intuitive. So I did the same for HLSLexplorer, because why not. It's useful for shaders since the compiler usually gives you hints about which lines are worth checking for problems. Just click on the text window you're interested in (HLSL or any of the three assembly outputs) and Go to Line will know where to look. Of course, Ctrl+G works.



3) D3D12 backend for rendering

HLSLexplorer has just received support for a D3D12 rendering backend. Well, finally, one could say.
That doesn't mean D3D11 is gone. Because all the real-time Pixel Shader preview does is render a fullscreen triangle, I came up with a simple IRenderer interface which nicely wraps all the needed functionality.

Although the D3D12 backend is still really basic, I found it fun to write because I started learning D3D12 not so long ago, so it was a good opportunity to learn something new.

The new backend currently works with 2D textures only and is still quite error-prone. F7 opens the PS preview with D3D11, F8 opens one with D3D12.

The D3D12 renderer supports pixel shaders for both Shader Model 5.0 and 6.0+. So yeah, you can now visualize wave intrinsics! 😎


4) Minor stuff

I got rid of all the DirectXTK stuff since everything I needed from it was texture loading.
Luckily, there are standalone versions of DDSTextureLoader and WICTextureLoader for both D3D11 and D3D12. You can grab them from the DirectXTex GitHub.

The Real-Time Pixel Shader Preview now shows which API is used and what the backbuffer resolution is.


Conclusion

These are the most important changes I introduced in the latest version of HLSLexplorer.
Feedback is always welcome :)


Thanks for reading. Take care!

Sunday, June 16, 2019

Reverse engineering the rendering of The Witcher 3, part 15 - fog

Fog can be implemented in many ways. However, the times when we could apply simple distance-based fog and be happy with it are gone forever (probably). Living in a land of programmable shaders has opened the door to new, crazy, but most importantly physically-correct and visually plausible solutions.

The current trend in fog rendering is probably compute-shader based (see this presentation by Bartłomiej Wroński for more details).

Although the mentioned presentation appeared in 2014 and The Witcher 3 was released in 2015/2016, fog in the last part of Geralt's adventures is completely screen-space, applied as a typical postprocess.

Before we start our 15th reverse engineering session in this series, I must say that I've tried to understand the fog in The Witcher 3 at least 5 times over the last year - and each time I failed. The assembly, as you'll see in a minute, is quite complex, which makes producing a readable HLSL fog shader I won't be ashamed of almost impossible.

However, I managed to find a fog shader on the internet which almost immediately caught my attention by its similarity to The Witcher 3 in terms of variable names and general instruction order. That shader was not *exactly* the same as the one in the game, so I had to work on it a little bit. My point is, most of the HLSL you'll see here was not created/reversed by me, with two exceptions. Keep that in mind.


Here is the assembly of the fog pixel shader - it's worth noting it's the same for the whole game (the 2015 base and both DLCs):
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[2], immediateIndexed  
    dcl_constantbuffer cb12[214], immediateIndexed  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_resource_texture2d (float,float,float,float) t2  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 7  
   0: ftou r0.xy, v0.xyxx  
   1: mov r0.zw, l(0, 0, 0, 0)  
   2: ld_indexable(texture2d)(float,float,float,float) r1.x, r0.xyww, t0.xyzw  
   3: mad r1.y, r1.x, cb12[22].x, cb12[22].y  
   4: lt r1.y, r1.y, l(1.000000)  
   5: if_nz r1.y  
   6:  utof r1.yz, r0.xxyx  
   7:  mul r2.xyzw, r1.zzzz, cb12[211].xyzw  
   8:  mad r2.xyzw, cb12[210].xyzw, r1.yyyy, r2.xyzw  
   9:  mad r1.xyzw, cb12[212].xyzw, r1.xxxx, r2.xyzw  
  10:  add r1.xyzw, r1.xyzw, cb12[213].xyzw  
  11:  div r1.xyz, r1.xyzx, r1.wwww  
  12:  ld_indexable(texture2d)(float,float,float,float) r2.xyz, r0.xyww, t1.xyzw  
  13:  ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t2.xyzw  
  14:  max r0.x, r0.x, cb3[1].x  
  15:  add r0.yzw, r1.xxyz, -cb12[0].xxyz  
  16:  dp3 r1.x, r0.yzwy, r0.yzwy  
  17:  sqrt r1.x, r1.x  
  18:  add r1.y, r1.x, -cb3[0].x  
  19:  add r1.zw, -cb3[0].xxxz, cb3[0].yyyw  
  20:  div_sat r1.y, r1.y, r1.z  
  21:  mad r1.y, r1.y, r1.w, cb3[0].z  
  22:  add r0.x, r0.x, l(-1.000000)  
  23:  mad r0.x, r1.y, r0.x, l(1.000000)  
  24:  div r0.yzw, r0.yyzw, r1.xxxx  
  25:  mad r1.y, r0.w, cb12[22].z, cb12[0].z  
  26:  add r1.x, r1.x, -cb12[22].z  
  27:  max r1.x, r1.x, l(0)  
  28:  min r1.x, r1.x, cb12[42].z  
  29:  mul r1.z, r0.w, r1.x  
  30:  mul r1.w, r1.x, cb12[43].x  
  31:  mul r1.zw, r1.zzzw, l(0.000000, 0.000000, 0.062500, 0.062500)  
  32:  dp3 r0.y, cb12[38].xyzx, r0.yzwy  
  33:  add r0.z, r0.y, cb12[42].x  
  34:  add r0.w, cb12[42].x, l(1.000000)  
  35:  div_sat r0.z, r0.z, r0.w  
  36:  add r0.w, -cb12[43].z, cb12[43].y  
  37:  mad r0.z, r0.z, r0.w, cb12[43].z  
  38:  mul r0.w, abs(r0.y), abs(r0.y)  
  39:  mad_sat r2.w, r1.x, l(0.002000), l(-0.300000)  
  40:  mul r0.w, r0.w, r2.w  
  41:  lt r0.y, l(0), r0.y  
  42:  movc r3.xyz, r0.yyyy, cb12[39].xyzx, cb12[41].xyzx  
  43:  add r3.xyz, r3.xyzx, -cb12[40].xyzx  
  44:  mad r3.xyz, r0.wwww, r3.xyzx, cb12[40].xyzx  
  45:  movc r4.xyz, r0.yyyy, cb12[45].xyzx, cb12[47].xyzx  
  46:  add r4.xyz, r4.xyzx, -cb12[46].xyzx  
  47:  mad r4.xyz, r0.wwww, r4.xyzx, cb12[46].xyzx  
  48:  ge r0.y, r1.x, cb12[48].y  
  49:  if_nz r0.y  
  50:   add r0.y, r1.y, cb12[42].y  
  51:   mul r0.w, r0.z, r0.y  
  52:   mul r1.y, r0.z, r1.z  
  53:   mad r5.xyzw, r1.yyyy, l(16.000000, 15.000000, 14.000000, 13.000000), r0.wwww  
  54:   max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)  
  55:   add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  56:   div_sat r5.xyzw, r1.wwww, r5.xyzw  
  57:   add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  58:   mul r1.z, r5.y, r5.x  
  59:   mul r1.z, r5.z, r1.z  
  60:   mul r1.z, r5.w, r1.z  
  61:   mad r5.xyzw, r1.yyyy, l(12.000000, 11.000000, 10.000000, 9.000000), r0.wwww  
  62:   max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)  
  63:   add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  64:   div_sat r5.xyzw, r1.wwww, r5.xyzw  
  65:   add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  66:   mul r1.z, r1.z, r5.x  
  67:   mul r1.z, r5.y, r1.z  
  68:   mul r1.z, r5.z, r1.z  
  69:   mul r1.z, r5.w, r1.z  
  70:   mad r5.xyzw, r1.yyyy, l(8.000000, 7.000000, 6.000000, 5.000000), r0.wwww  
  71:   max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)  
  72:   add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  73:   div_sat r5.xyzw, r1.wwww, r5.xyzw  
  74:   add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  75:   mul r1.z, r1.z, r5.x  
  76:   mul r1.z, r5.y, r1.z  
  77:   mul r1.z, r5.z, r1.z  
  78:   mul r1.z, r5.w, r1.z  
  79:   mad r5.xy, r1.yyyy, l(4.000000, 3.000000, 0.000000, 0.000000), r0.wwww  
  80:   max r5.xy, r5.xyxx, l(0, 0, 0, 0)  
  81:   add r5.xy, r5.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  82:   div_sat r5.xy, r1.wwww, r5.xyxx  
  83:   add r5.xy, -r5.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  84:   mul r1.z, r1.z, r5.x  
  85:   mul r1.z, r5.y, r1.z  
  86:   mad r0.w, r1.y, l(2.000000), r0.w  
  87:   max r0.w, r0.w, l(0)  
  88:   add r0.w, r0.w, l(1.000000)  
  89:   div_sat r0.w, r1.w, r0.w  
  90:   add r0.w, -r0.w, l(1.000000)  
  91:   mul r0.w, r0.w, r1.z  
  92:   mad r0.y, r0.y, r0.z, r1.y  
  93:   max r0.y, r0.y, l(0)  
  94:   add r0.y, r0.y, l(1.000000)  
  95:   div_sat r0.y, r1.w, r0.y  
  96:   add r0.y, -r0.y, l(1.000000)  
  97:   mad r0.y, -r0.w, r0.y, l(1.000000)  
  98:   add r0.z, r1.x, -cb12[48].y  
  99:   mul_sat r0.z, r0.z, cb12[48].z  
  100:  else  
  101:   mov r0.yz, l(0.000000, 1.000000, 0.000000, 0.000000)  
  102:  endif  
  103:  log r0.y, r0.y  
  104:  mul r0.w, r0.y, cb12[42].w  
  105:  exp r0.w, r0.w  
  106:  mul r0.y, r0.y, cb12[48].x  
  107:  exp r0.y, r0.y  
  108:  mul r0.yw, r0.yyyw, r0.zzzz  
  109:  mad_sat r1.xy, r0.wwww, cb12[189].xzxx, cb12[189].ywyy  
  110:  add r5.xyz, -r3.xyzx, cb12[188].xyzx  
  111:  mad r5.xyz, r1.xxxx, r5.xyzx, r3.xyzx  
  112:  add r0.z, cb12[188].w, l(-1.000000)  
  113:  mad r0.z, r1.y, r0.z, l(1.000000)  
  114:  mul_sat r5.w, r0.z, r0.w  
  115:  lt r0.z, l(0), cb12[192].x  
  116:  if_nz r0.z  
  117:   mad_sat r1.xy, r0.wwww, cb12[191].xzxx, cb12[191].ywyy  
  118:   add r6.xyz, -r3.xyzx, cb12[190].xyzx  
  119:   mad r3.xyz, r1.xxxx, r6.xyzx, r3.xyzx  
  120:   add r0.z, cb12[190].w, l(-1.000000)  
  121:   mad r0.z, r1.y, r0.z, l(1.000000)  
  122:   mul_sat r3.w, r0.z, r0.w  
  123:   add r1.xyzw, -r5.xyzw, r3.xyzw  
  124:   mad r5.xyzw, cb12[192].xxxx, r1.xyzw, r5.xyzw  
  125:  endif  
  126:  mul r0.z, r0.x, r5.w  
  127:  mul r0.x, r0.x, r0.y  
  128:  dp3 r0.y, l(0.333000, 0.555000, 0.222000, 0.000000), r2.xyzx  
  129:  mad r1.xyz, r0.yyyy, r4.xyzx, -r2.xyzx  
  130:  mad r0.xyw, r0.xxxx, r1.xyxz, r2.xyxz  
  131:  add r1.xyz, -r0.xywx, r5.xyzx  
  132:  mad r0.xyz, r0.zzzz, r1.xyzx, r0.xywx  
  133: else  
  134:  mov r0.xyz, l(0, 0, 0, 0)  
  135: endif  
  136: mov o0.xyz, r0.xyzx  
  137: mov o0.w, l(1.000000)  
  138: ret  
   

This shader is quite long, let's be honest. Probably too long for an effective reverse engineering process ;)


Here is an example sunset scene with fog:


Let's take a look at inputs:

In terms of textures, we have the depth buffer, ambient occlusion and the HDR color buffer:
Input depth buffer


Input ambient occlusion


Input HDR color buffer

... and the result of fog shader in this scene looks like this:

HDR texture after applying fog

Depth buffer is used to reconstruct world position. This is a common pattern in shaders of The Witcher 3.

Having ambient occlusion data (if enabled) allows us to darken the fog. A very smart idea - it may seem obvious, but I've never thought of it that way. I'll come back to this aspect later.

The shader begins by determining whether the pixel is on the sky. When the pixel lies on the sky (depth == 1.0), the shader returns black. If a pixel belongs to the scene (depth < 1.0), we reconstruct its world position using the depth buffer (lines 7-11) and proceed with the fog calculation.
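As a side note, the reconstruction in lines 6-11 boils down to combining the pixel position and the hardware depth with four cbuffer rows. A minimal sketch, assuming cb12[210]-cb12[213] hold the rows of a screen-to-world transform (my naming, not the game's):

   // Sketch of assembly lines 6-11; row210..row213 stand for cb12[210]..cb12[213],
   // pixelPos comes from SV_Position (lines 0 and 6), hwDepth from the depth buffer (line 2).
   float3 ReconstructWorldPos( float2 pixelPos, float hwDepth,
                               float4 row210, float4 row211, float4 row212, float4 row213 )
   {
     float4 positionWS = pixelPos.x * row210
                       + pixelPos.y * row211
                       + hwDepth    * row212
                       +              row213;

     return positionWS.xyz / positionWS.w;   // perspective divide (line 11)
   }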

The fog pass takes place shortly after deferred shading. You can see that some forward-pass specific elements are still missing. In this particular scene, deferred light volumes have been applied and, after that, Geralt's hair/face/eyes were rendered.

The first thing to know about fog in The Witcher 3 is that it consists of two parts: "fog color" and "aerial color".
 struct FogResult  
 {  
    float4 paramsFog;     // RGB: color, A: influence  
    float4 paramsAerial;  // RGB: color, A: influence  
 };  

For each part, three colors are provided: front, middle and back. So the constant buffer contains data such as "FogColorFront", "FogColorMiddle", "AerialColorBack" etc. See the inputs:



   // *** Inputs *** //  
   float3 FogSunDir = cb12_v38.xyz;  
   float3 FogColorFront = cb12_v39.xyz;  
   float3 FogColorMiddle = cb12_v40.xyz;  
   float3 FogColorBack = cb12_v41.xyz;  
     
   float4 FogBaseParams = cb12_v42;  
   float4 FogDensityParamsScene = cb12_v43;  
   float4 FogDensityParamsSky = cb12_v44;  
     
   float3 AerialColorFront = cb12_v45.xyz;  
   float3 AerialColorMiddle = cb12_v46.xyz;  
   float3 AerialColorBack = cb12_v47.xyz;  
   float4 AerialParams = cb12_v48; 


Before calculating the final colors, we need a few vectors and dot products. The shader has access to the pixel's world position, the camera position (cb12[0].xyz) and the fog/light direction (cb12[38].xyz). This allows us to calculate the dot product between the view vector and the fog direction.
   float3 frag_vec = fragPosWorldSpace.xyz - customCameraPos.xyz;  
   float frag_dist = length(frag_vec);  
     
   float3 frag_dir = frag_vec / frag_dist;  
   
   float dot_fragDirSunDir = dot(GlobalLightDirection.xyz, frag_dir);  


Calculating the mixing gradient involves squaring the absolute value of the dot product and then multiplying the result by a distance-based parameter:
   float3 curr_col_fog;  
   float3 curr_col_aerial;  
   {  
     float _dot = dot_fragDirSunDir;  
   
     float _dd = _dot;  
     {  
       const float _distOffset = -150;  
       const float _distRange = 500;  
       const float _mul = 1.0 / _distRange;  
       const float _bias = _distOffset * _mul;  
   
       _dd = abs(_dd);  
       _dd *= _dd;  
       _dd *= saturate( frag_dist * _mul + _bias );  
     }  
   
     curr_col_fog = lerp( FogColorMiddle.xyz, (_dot>0.0f ? FogColorFront.xyz : FogColorBack.xyz), _dd );  
     curr_col_aerial = lerp( AerialColorMiddle.xyz, (_dot>0.0f ? AerialColorFront.xyz : AerialColorBack.xyz), _dd );  
   }  

This code block clearly shows where the heck that 0.002 and -0.300 come from: _mul = 1/500 = 0.002 and _bias = -150/500 = -0.3. As you can see, the dot product between the view and light vectors is responsible for the choice between the 'front' and 'back' colors. Clever!

Here is a visualization of final gradient (_dd).


Calculating the aerial/fog influence is much more complicated, though. As you can see, we have more parameters than just RGB colors; they include e.g. the density of the scene. We use ray marching (16 steps, which is why the loop can be unrolled) to determine the amount of fog and a scale factor:

Having a [camera ---> world] vector, we can divide all its components by 16 - this will be our ray marching step. As you can see below, only the .z component (height) is considered in the calculations (curr_pos_z_step).

You can read more about raymarched fog, for instance, here.

   float fog_amount = 1;  
   float fog_amount_scale = 0;  
   [branch]  
   if ( frag_dist >= AerialParams.y )  
   {  
     float curr_pos_z_base = (customCameraPos.z + FogBaseParams.y) * density_factor;  
     float curr_pos_z_step = frag_step.z * density_factor;  
   
     [unroll]  
     for ( int i=16; i>0; --i )  
     {  
       fog_amount *= 1 - saturate( density_sample_scale / (1 + max( 0.0, curr_pos_z_base + (i) * curr_pos_z_step ) ) );  
     }  
   
     fog_amount = 1 - fog_amount;  
     fog_amount_scale = saturate( (frag_dist - AerialParams.y) * AerialParams.z );  
   }  
   
   FogResult ret;  
   
   ret.paramsFog = float4 ( curr_col_fog, fog_amount_scale * pow( abs(fog_amount), final_exp_fog ) );  
   ret.paramsAerial = float4 ( curr_col_aerial, fog_amount_scale * pow( abs(fog_amount), final_exp_aerial ) );  

The amount of fog clearly depends on height (the .z component); at the end the fog amount is raised to the fog/aerial exponent.

final_exp_fog and final_exp_aerial come from the constant buffer and they control how the fog and aerial colors influence the world as height rises.

Fog Override

The shader I found did not include this fragment of assembly:
  109:  mad_sat r1.xy, r0.wwww, cb12[189].xzxx, cb12[189].ywyy  
  110:  add r5.xyz, -r3.xyzx, cb12[188].xyzx  
  111:  mad r5.xyz, r1.xxxx, r5.xyzx, r3.xyzx  
  112:  add r0.z, l(-1.000000), cb12[188].w  
  113:  mad r0.z, r1.y, r0.z, l(1.000000)  
  114:  mul_sat r5.w, r0.w, r0.z  
  115:  lt r0.z, l(0.000000), cb12[192].x  
  116:  if_nz r0.z  
  117:   mad_sat r1.xy, r0.wwww, cb12[191].xzxx, cb12[191].ywyy  
  118:   add r6.xyz, -r3.xyzx, cb12[190].xyzx  
  119:   mad r3.xyz, r1.xxxx, r6.xyzx, r3.xyzx  
  120:   add r0.z, l(-1.000000), cb12[190].w  
  121:   mad r0.z, r1.y, r0.z, l(1.000000)  
  122:   mul_sat r3.w, r0.w, r0.z  
  123:   add r1.xyzw, -r5.xyzw, r3.xyzw  
  124:   mad r5.xyzw, cb12[192].xxxx, r1.xyzw, r5.xyzw  
  125:  endif   

From what I managed to understand, this looks like a double override of fog color and influence.
Most of the time there is only one override (cb12_v192.x is 0.0), but in this particular case its value is ~0.22, so we perform the second override as well.



 #ifdef OVERRIDE_FOG  
     
   // Override  
   float fog_influence = ret.paramsFog.w; // r0.w  
   
   float override1ColorScale = cb12_v189.x;  
   float override1ColorBias = cb12_v189.y;  
   float3 override1Color = cb12_v188.rgb;  
     
   float override1InfluenceScale = cb12_v189.z;  
   float override1InfluenceBias = cb12_v189.w;  
   float override1Influence = cb12_v188.w;  
     
   float override1ColorAmount = saturate(fog_influence * override1ColorScale + override1ColorBias);  
   float override1InfluenceAmount = saturate(fog_influence * override1InfluenceScale + override1InfluenceBias);    
     

   float4 paramsFogOverride;  
   paramsFogOverride.rgb = lerp(curr_col_fog, override1Color, override1ColorAmount ); // ***r5.xyz   
     
   float param1 = lerp(1.0, override1Influence, override1InfluenceAmount); // r0.x  
   paramsFogOverride.w = saturate(param1 * fog_influence ); // ** r5.w  
   
     
   const float extraFogOverride = cb12_v192.x;  
     
   [branch]   
   if (extraFogOverride > 0.0)  
   {  
     float override2ColorScale = cb12_v191.x;  
     float override2ColorBias = cb12_v191.y;  
     float3 override2Color = cb12_v190.rgb;  
     
     float override2InfluenceScale = cb12_v191.z;  
     float override2InfluenceBias = cb12_v191.w;  
     float override2Influence = cb12_v190.w;  
       
     float override2ColorAmount = saturate(fog_influence * override2ColorScale + override2ColorBias);  
     float override2InfluenceAmount = saturate(fog_influence * override2InfluenceScale + override2InfluenceBias);  
      

     float4 paramsFogOverride2;  
     paramsFogOverride2.rgb = lerp(curr_col_fog, override2Color, override2ColorAmount); // r3.xyz   
           
     float ov_param1 = lerp(1.0, override2Influence, override2InfluenceAmount); // r0.z  
     paramsFogOverride2.w = saturate(ov_param1 * fog_influence); // r3.w  
   
     paramsFogOverride = lerp(paramsFogOverride, paramsFogOverride2, extraFogOverride);  
   
   }  
   ret.paramsFog = paramsFogOverride;  
     
 #endif   

Here is our final scene without the fog override (first image), with a single override (second image) and with the double override (third image, the final result):




Adjusting ambient occlusion

The shader I found also did not use ambient occlusion at all. Let's take a look at the AO texture again and at the code which is of interest to us:

  13:  ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t2.xyzw  
  14:  max r0.x, r0.x, cb3[1].x  
  15:  add r0.yzw, r1.xxyz, -cb12[0].xxyz  
  16:  dp3 r1.x, r0.yzwy, r0.yzwy  
  17:  sqrt r1.x, r1.x  
  18:  add r1.y, r1.x, -cb3[0].x  
  19:  add r1.zw, -cb3[0].xxxz, cb3[0].yyyw  
  20:  div_sat r1.y, r1.y, r1.z  
  21:  mad r1.y, r1.y, r1.w, cb3[0].z  
  22:  add r0.x, r0.x, l(-1.000000)  
  23:  mad r0.x, r1.y, r0.x, l(1.000000)  

Maybe this scene is not the best example because there is no detail on the distant island. Anyway, let's take a look at the constant buffer used for adjusting the ambient occlusion value:


We start by loading AO from the texture and then we perform a max instruction. In this scene cb3_v1.x is really high (0.96888), which pretty much makes our AO very, very subtle.
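In HLSL this corresponds to something like the following (a sketch; texAO and pixelPos are my names for the t2 texture and the SV_Position-derived pixel coordinates):

   Texture2D<float> texAO : register(t2);   // the SSAO buffer

   // lines 13-14
   float inputAO = texAO.Load( int3(pixelPos, 0) );
   inputAO = max( inputAO, cb3_v1.x );      // cb3_v1.x = 0.96888 in this scene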

The next portion of code calculates the distance between the camera and the pixel's world position.

I believe the code sometimes speaks for itself, so here is the HLSL which performs most of this adjustment:
 float AdjustAmbientOcclusion(in float inputAO, in float worldToCameraDistance)  
 {  
   // *** Inputs *** //  
   const float aoDistanceStart = cb3_v0.x;  
   const float aoDistanceEnd = cb3_v0.y;  
   const float aoStrengthStart = cb3_v0.z;  
   const float aoStrengthEnd = cb3_v0.w;  
      
   // * Adjust AO  
   float aoDistanceIntensity = linstep( aoDistanceStart, aoDistanceEnd, worldToCameraDistance );  
   float aoStrength = lerp(aoStrengthStart, aoStrengthEnd, aoDistanceIntensity);   
   float adjustedAO = lerp(1.0, inputAO, aoStrength);  
     
   return adjustedAO;   
 }  

The calculated camera-world distance is fed into the linstep function. We know this function already - it appeared in the cirrus clouds shader from The Witcher 3.
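For reference, here is its definition again (the same helper that appears in the cirrus clouds post further down this page):

 float linstep( float min, float max, float v )  
 {  
   return saturate( (v - min) / (max - min) );  
 }  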

As you can see, the constant buffer holds start/end distance values for AO. The output of linstep drives the AO strength (also from the cbuffer) - and the strength affects the output AO value.


A quick example: A pixel is far away, let's say the distance is equal to 500.

linstep returns 1.0;
aoStrength is equal to aoStrengthEnd;
This results in an AO value that is blended about 77% of the way (the end strength) from 1.0 toward the input value.

The input AO for this function was previously max-ed.

Combining all this together

Once we have a color and an influence for both the fog color and the aerial color, it's time for the final combine.

We start by attenuating the influence using the adjusted AO:

   ...
   FogResult fog = CalculateFog( worldPos, CameraPosition, fogStart, ao, false );  
      
   // Apply AO to influence  
   fog.paramsFog.w *= ao;  
   fog.paramsAerial.w *= ao;  
      
   // Mix fog with scene color  
   outColor = ApplyFog(fog, colorHDR);  


Okay, the whole magic happens in ApplyFog function:

 float3 ApplyFog(FogResult fog, float3 color)  
 {  
   const float3 LuminanceFactors = float3(0.333f, 0.555f, 0.222f);  
   
   float3 aerialColor = dot(LuminanceFactors, color) * fog.paramsAerial.xyz;  
   color = lerp(color, aerialColor, fog.paramsAerial.w);  
   color = lerp(color, fog.paramsFog.xyz, fog.paramsFog.w);  
    
   return color.xyz;  
 }  

At first we calculate the "luminance" of the pixel:

Then, we multiply it by the aerial color:

Then, we combine the HDR color with the aerial one:


The last step is combining the intermediate result with the fog color:


And that's all :)

Some debug screenshots:

Aerial influence

Aerial color

Fog influence

Fog color

Final scene without fog at all

Final scene with aerial fog only

Final scene - main fog only

Final scene with all fog - again, but for easier comparison

Summary

I believe you can get the most out of this post if you take a look at the shader - it's here.
I'm happy to say that this shader is exactly the same as the original one :)

In general, the final result strongly depends on the values passed to the shader. This is not a "magic" solution which gives a perfect color output out of the box; it requires a lot of iterations and tweaking by artists to make the final result look decent. I think this can be a long process, but once it's done, the final result is really convincing - just like this sunset scene.
The sky shader from The Witcher 3 also uses these fog calculations in order to make a smooth color transition around the horizon. However, for the sky shader a different set of density coefficients is provided.

As a reminder - most of the shader (except for the AO adjustment and the overrides) was not created/reversed by me. All kudos and praise go to CD PROJEKT RED. Please support them, they are doing a great job.

Thanks for reading!

Saturday, May 11, 2019

Reverse engineering the rendering of The Witcher 3, part 14 - cirrus clouds

When it comes to being outdoors, the sky is one of those aspects which decide whether the world in a game is believable. Think about that for a while - the sky literally takes, let's say, 40-50% of the whole screen most of the time. The sky is a lot more than just a nice gradient. We have stars, the Sun, the Moon and, finally, clouds.

While the current trend is apparently to render clouds volumetrically using raymarching (see this one), clouds in The Witcher 3 are completely texture-based. I have been looking at them for some time already but, obviously, things turned out to be more complicated than I initially expected. If you have been following the series you know that there are differences between the "Blood and Wine" addon and the rest of the game. And guess what - there are some changes in terms of clouds in B&W too.

There are a few layers of clouds in The Witcher 3. Depending on the current weather, we can have only cirrus clouds, altocumulus, maybe a few from the stratus family (during a storm, for instance). Or, what the heck, we can have nothing at all.

Some layers vary in terms of input textures and the shaders used to render them. This affects their complexity and the length of the pixel shader assembly (obviously).

Despite all this diversity, there are some common patterns we can observe in the clouds rendering of The Witcher 3. First of all, they are all rendered in the forward pass, which is absolutely the right choice. All of them use blending (see below). This way it's much easier to control how a particular layer covers the sky - the alpha value from the pixel shader drives it.
What's more interesting, some layers are rendered twice with the same settings.

After an evaluation I picked the shortest shader I could find - in order to (1) have the largest probability of completely reverse engineering it, and (2) be able to understand every aspect of it.
I'll take a closer look at cirrus clouds from The Witcher 3: Blood and Wine.

Here is an example frame:
Before rendering
After first rendering pass
After second rendering pass

In this particular frame the cirrus clouds are the first layer being rendered. As you can see, the layer is rendered twice, which increases its intensity.

Geometry and vertex shader

Before the pixel shader part, a short paragraph about the geometry and vertex shader used. The mesh representing the clouds is something similar to a typical skydome:

All vertices are contained in the [0-1] range, so in order to center the mesh around the (0,0,0) point, a scale+bias is applied before the worldViewProj transform (we already know this pattern from previous parts of this series). For clouds, the mesh is largely stretched along the XY plane (Z is up) to cover more than the view frustum. The result is as follows:

Apart from that, the mesh has normal and tangent vectors. The vertex shader also calculates the bitangent vector with a cross product - all three are output in normalized form. Moreover, there is a per-vertex fog calculation (color and intensity).
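To make the scale+bias mentioned above concrete, here is a minimal sketch of the vertex position handling (my own naming and parameters, not the game's exact vertex shader):

   // Vertices are stored in [0,1]; a per-mesh scale+bias recenters them around (0,0,0)
   // and stretches them (strongly along XY for the cloud dome) before the usual transform.
   float4 TransformSkydomeVertex( float3 posUnorm, float3 meshScale, float3 meshBias,
                                  float4x4 worldViewProj )
   {
     float3 posLocal = posUnorm * meshScale + meshBias;
     return mul( float4(posLocal, 1.0), worldViewProj );
   }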

Pixel Shader


The pixel shader assembly is as follows:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[10], immediateIndexed  
    dcl_constantbuffer cb1[9], immediateIndexed  
    dcl_constantbuffer cb12[238], immediateIndexed  
    dcl_constantbuffer cb4[13], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps linear v0.xyzw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.w  
    dcl_input_ps linear v3.xyzw  
    dcl_input_ps linear v4.xyz  
    dcl_input_ps linear v5.xyz  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)  
   1: dp3 r0.w, r0.xyzx, r0.xyzx  
   2: rsq r0.w, r0.w  
   3: mul r0.xyz, r0.wwww, r0.xyzx  
   4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx  
   5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx  
   6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0  
   7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
   8: add r1.xyz, r1.xyzx, r1.xyzx  
   9: dp3 r0.w, r1.xyzx, r1.xyzx  
  10: rsq r0.w, r0.w  
  11: mul r1.xyz, r0.wwww, r1.xyzx  
  12: mul r2.xyz, r1.yyyy, v3.xyzx  
  13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx  
  14: mov r3.xy, v1.zwzz  
  15: mov r3.z, v3.w  
  16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx  
  17: dp3_sat r0.x, r0.xyzx, r1.xyzx  
  18: add r0.y, -cb4[2].x, cb4[3].x  
  19: mad r0.x, r0.x, r0.y, cb4[2].x  
  20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx  
  21: rsq r0.y, r0.y  
  22: mul r0.yz, r0.yyyy, -cb0[9].xxyx  
  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  28: mul r1.xy, r1.zzzz, r1.xyxx  
  29: dp2_sat r0.y, r0.yzyy, r1.xyxx  
  30: add r0.y, r0.y, r0.y  
  31: min r0.y, r0.y, l(1.000000)  
  32: add r0.z, -cb4[0].x, cb4[1].x  
  33: mad r0.z, r0.y, r0.z, cb4[0].x  
  34: mul r0.x, r0.x, r0.z  
  35: log r0.x, r0.x  
  36: mul r0.x, r0.x, l(2.200000)  
  37: exp r0.x, r0.x  
  38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx  
  39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx  
  40: mul r2.xyz, r0.xxxx, r1.xyzx  
  41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx  
  42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx  
  43: add r1.x, -cb4[7].x, cb4[8].x  
  44: div_sat r0.w, r0.w, r1.x  
  45: mul r1.x, r1.w, cb4[9].x  
  46: mad r1.y, -cb4[9].x, r1.w, r1.w  
  47: mad r0.w, r0.w, r1.y, r1.x  
  48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx  
  49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0  
  51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x  
  52: mad_sat r1.x, cb4[12].x, v2.w, r1.x  
  53: mul r0.w, r0.w, r1.x  
  54: mul_sat r0.w, r0.w, cb4[6].x  
  55: mul o0.xyz, r0.wwww, r0.xyzx  
  56: mov o0.w, r0.w  
  57: ret  

In terms of input, there are two tiled textures. One of them contains a normal map (xyz channels) and the cloud shape (a channel). The second one is noise for shape perturbation.

Normal map (c) CD Projekt Red
Cloud shape (c) CD Projekt Red
Noise texture (c) CD Projekt Red

The main constant buffer with the clouds parameters is cb4. For this frame its values are:
Apart from that, values from other cbuffers are used as well. Don't worry, we'll get through those too.

Z-Inverted sunlight direction

The first thing which happens in the shader is the calculation of the normalized, Z-inverted sunlight direction:
   0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)  
   1: dp3 r0.w, r0.xyzx, r0.xyzx  
   2: rsq r0.w, r0.w  
   3: mul r0.xyz, r0.wwww, r0.xyzx  

   float3 invertedSunlightDir = normalize(lightDir * float3(1, 1, -1) );

As I mentioned earlier, Z is the up axis, while cb0[9] is the sunlight direction. This vector goes into the Sun - this is important! You can check this by writing a simple compute shader which performs a simple NdotL and injecting it into the deferred shading pass :)
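A minimal sketch of such a debug compute shader - entirely my own; the resource bindings and the G-buffer normal encoding here are assumptions, not the game's:

 Texture2D<float4>   GBufferNormals : register(t0); // assumed: world-space normals packed in [0,1]
 RWTexture2D<float4> Output         : register(u0); // e.g. the HDR target, for a quick preview

 cbuffer cbDebug : register(b0)
 {
   float3 SunlightDir;   // the vector from cb0[9] of the clouds shader
 };

 [numthreads(8, 8, 1)]
 void DebugNdotL( uint3 id : SV_DispatchThreadID )
 {
   float3 N = normalize( GBufferNormals[id.xy].xyz * 2.0 - 1.0 );
   float NdotL = saturate( dot( N, normalize(SunlightDir) ) );
   Output[id.xy] = float4( NdotL, NdotL, NdotL, 1.0 );
 }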

Sampling the clouds texture

The next step is to calculate the texcoords for sampling the "clouds" texture, sample it, unpack the normal vector and normalize it.

   4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx   
   5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx   
   6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0   
   7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)   
   8: add r1.xyz, r1.xyzx, r1.xyzx   
   9: dp3 r0.w, r1.xyzx, r1.xyzx   
  10: rsq r0.w, r0.w   
   
   
   // Calc sampling coords  
   float2 cloudTextureUV = Texcoords * textureScale + elapsedTime * speedFactors;  
   
   // Sample texture and get data from it  
   float4 cloudTextureValue = texture0.Sample( sampler0, cloudTextureUV ).rgba;  
   float3 normalMap = cloudTextureValue.xyz;  
   float cloudShape = cloudTextureValue.a;  
   
   // Unpack normal and normalize it  
   float3 unpackedNormal = (normalMap - 0.5) * 2.0;  
   unpackedNormal = normalize(unpackedNormal);  

Let's go through this slowly.
In order to have motion of clouds, we need the elapsed time in seconds ( cb0[0].x ) multiplied by a speed factor which affects how fast the clouds move across the sky ( cb4[5].xy ).
UVs are stretched over the skydome geometry I was talking about before; we also need texture scaling coefficients which affect the size of the clouds ( cb4[4].xy ).

Final formula is:
samplingUV = Input.TextureUV * textureScale + time * speedMultiplier;

After sampling all 4 channels we have the normal map (rgb channels) and the cloud shape (a channel).
To unpack the normal map from the [0; 1] to the [-1; 1] range, we use the following formula:

unpackedNormal = (packedNormal - 0.5) * 2.0;

We could also use this one:
unpackedNormal = packedNormal * 2.0 - 1.0;

And finally we normalize the unpacked normal vector.

Normal mapping

Having the normal, tangent and bitangent vectors from the vertex shader and the normal vector from the normal map, we perform normal mapping the usual way.

  11: mul r1.xyz, r0.wwww, r1.xyzx  
  12: mul r2.xyz, r1.yyyy, v3.xyzx  
  13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx  
  14: mov r3.xy, v1.zwzz  
  15: mov r3.z, v3.w  
  16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx  
    
   // Perform bump mapping  
   float3 SkyTangent = Input.Tangent;  
   float3 SkyNormal = (float3( Input.Texcoords.zw, Input.param3.w ));  
   float3 SkyBitangent = Input.param3.xyz;  
        
   float3x3 TBN = float3x3(SkyTangent, SkyBitangent, SkyNormal);  
   float3 finalNormal = (float3)mul( unpackedNormal, (TBN) );  

Highlight intensity (1)

The next step involves calculating NdotL, which affects the amount of highlighting at a particular pixel.
Consider the following piece of assembly:
  17: dp3_sat r0.x, r0.xyzx, r1.xyzx  
  18: add r0.y, -cb4[2].x, cb4[3].x  
  19: mad r0.x, r0.x, r0.y, cb4[2].x  

Here is the visualization of NdotL in considered frame:

This (saturated) dot product is used to interpolate between minIntensity and maxIntensity.
This way, parts of the clouds which are more exposed to sunlight will be brighter:
   // Calculate cosine between normal and up-inv lightdir  
   float NdotL = saturate( dot(invertedSunlightDir, finalNormal) );  
   
   // Param 1, line 19, r0.x  
   float intensity1 = lerp( param1Min, param1Max, NdotL );  

Highlight intensity (2)

There is one more factor which affects the intensity of the clouds.
Clouds which lie in the section of the sky where the Sun is present should be highlighted more. To do that, we calculate a gradient based on the XY plane.
This gradient is used to lerp between min/max values, in a similar fashion as in part (1).
So in theory we could darken the clouds which are on the opposite side of the Sun, but in this particular frame it does not happen because param2Min and param2Max ( cb4[0].x and cb4[1].x, respectively) are both set to 1.0f.

  20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx  
  21: rsq r0.y, r0.y  
  22: mul r0.yz, r0.yyyy, -cb0[9].xxyx  
  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  28: mul r1.xy, r1.zzzz, r1.xyxx  
  29: dp2_sat r0.y, r0.yzyy, r1.xyxx  
  30: add r0.y, r0.y, r0.y  
  31: min r0.y, r0.y, l(1.000000)  
  32: add r0.z, -cb4[0].x, cb4[1].x  
  33: mad r0.z, r0.y, r0.z, cb4[0].x  
  34: mul r0.x, r0.x, r0.z  
  35: log r0.x, r0.x  
  36: mul r0.x, r0.x, l(2.200000)  
  37: exp r0.x, r0.x   
   
   
   // Calculate normalized -lightDir.xy (20-22)  
   float2 lightDirXY = normalize( -lightDir.xy );  
   
   // Calculate world to camera  
   float3 vWorldToCamera = ( CameraPos - WorldPos );  
   float worldToCamera_distance = length(vWorldToCamera);  
        
   // normalize vector  
   vWorldToCamera = normalize( vWorldToCamera );  
        
   
   float LdotV = saturate( dot(lightDirXY, vWorldToCamera.xy) );  
   float highlightedSkySection = saturate( 2*LdotV );  
   float intensity2 = lerp( param2Min, param2Max, highlightedSkySection );  
   
   float finalIntensity = pow( intensity2 *intensity1, 2.2);  

At the very end we multiply both intensities and raise the result to the power of 2.2.

Clouds color

Calculating the color of the clouds starts with two values from the constant buffer which indicate the color of clouds near the Sun and of clouds on the opposite side of the sky. They are lerped with highlightedSkySection.

Then, the result is multiplied by finalIntensity.
And at the end, the result is mixed with fog (which was calculated in the vertex shader for the sake of performance).

  38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx  
  39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx  
  40: mul r2.xyz, r0.xxxx, r1.xyzx  
  41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx  
  42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx  
   
  float3 cloudsColor = lerp( cloudsColorBack, cloudsColorFront, highlightedSkySection );  
  cloudsColor *= finalIntensity;  
  cloudsColor = lerp( cloudsColor, FogColor, FogAmount );  

Making sure cirrus clouds are more visible on horizon

It's not really visible in this frame, but in fact this layer is more visible near the horizon than above Geralt's head. Here's how it's done.

You might have noticed that we already calculated the length of the worldToCamera vector while computing the second intensity:

  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w

Let's find the next appearances of this length in the assembly:

  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  ...  
  43: add r1.x, -cb4[7].x, cb4[8].x  
  44: div_sat r0.w, r0.w, r1.x  

Whoa, what do we have here?
cb4[7].x and cb4[8].x have values of 2000.0 and 7000.0, respectively.

It turns out this is a use of a function called linstep.
It takes three parameters: min/max, which define a range, and v, which is the value.
The way it works: if v is within the [min, max] range, it returns a linear interpolation between 0.0 and 1.0. On the other hand, if v is out of bounds, linstep returns 0.0 or 1.0.

A simple example:

linstep( 1000.0, 2000.0, 999.0) = 0.0
linstep( 1000.0, 2000.0, 1500.0) = 0.5
linstep( 1000.0, 2000.0, 2000.0) = 1.0

So it's quite similar to smoothstep from HLSL, except in this case a linear interpolation is performed instead of a Hermite one.
linstep is not present in HLSL, but it's very useful. Really worth having in your toolbox.
 // linstep:  
 //  
 //  Returns a linear interpolation between 0 and 1 if "v" is in the range [min, max]   
 //  if "v" is <= min, the output is 0  
 //  if "v" is >= max, the output is 1  
   
 float linstep( float min, float max, float v )  
 {  
   return saturate( (v - min) / (max - min) );  
 }  


Returning to The Witcher 3:
Once we have deduced this factor, which indicates how far a particular piece of the sky is from Geralt, we use it to attenuate the clouds intensity:

  45: mul r1.x, r1.w, cb4[9].x  
  46: mad r1.y, -cb4[9].x, r1.w, r1.w  
  47: mad r0.w, r0.w, r1.y, r1.x  
   
   float distanceAttenuation = linstep( fadeDistanceStart, fadeDistanceEnd, worldToCamera_distance );  
    
   float fadedCloudShape = closeCloudsHidingFactor * cloudShape;  
   cloudShape = lerp( fadedCloudShape, cloudShape, distanceAttenuation );  

cloudShape is the .a channel from the first texture, while closeCloudsHidingFactor is a value from the constant buffer which controls how visible the clouds above Geralt's head are. In every frame I tested it was set to 0.0, which means no clouds there. As distanceAttenuation gets closer to 1.0 (the distance from the camera to the skydome increases), the clouds become more and more visible.

Sampling noise texture

For the noise texture, the calculation of the sampling coordinates is the same as for the clouds texture, except that a different set of textureScale and speedMultiplier values is used.

Of course, to sample all these textures a sampler with wrap addressing mode is used.

  48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx  
  49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0  
   
   // Calc sampling coords for noise  
   float2 noiseTextureUV = Texcoords * textureScaleNoise + elapsedTime * speedFactorsNoise;  
   
   // Sample texture and get data from it  
   float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;  

Getting all this together

Once we have a noise value, we have to combine it with cloudShape.
I had some problems with understanding the lines with "param2.w" (which is always 1.0) and noiseMult (set to 5.0, it comes from the cbuffer).

Anyway, what's important here is the final value, generalCloudsVisibility, which affects how visible the clouds are.

Take a look at the final noise value as well. The output color is cloudsColor multiplied by the final noise, which is also output in the alpha channel.

  51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x
  52: mad_sat r1.x, cb4[12].x, v2.w, r1.x
  53: mul r0.w, r0.w, r1.x
  54: mul_sat r0.w, r0.w, cb4[6].x
  55: mul o0.xyz, r0.wwww, r0.xyzx
  56: mov o0.w, r0.w
  57: ret   

   // Sample noise texture and get data from it  
   float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;  
   noiseTextureValue = noiseTextureValue * noiseMult - noiseMult;  
     
   float noiseValue = saturate( noiseMult * Input.param2.w + noiseTextureValue);  
   noiseValue *= cloudShape;  
     
   float finalNoise = saturate( noiseValue * generalCloudsVisibility);  
   
   return float4( cloudsColor*finalNoise, finalNoise );  

Summary

So here we are at the end. The final result looks really convincing.
Time for a comparison. My shader on the left, the original one on the right:

The shader is here if you're interested.

Feel free to comment and thanks for reading! :)
M.

Saturday, April 6, 2019

Reverse engineering the rendering of The Witcher 3, part 13c - witcher senses (fisheye effect & final combining)

Welcome!

This is the last part of reverse engineering witcher senses effect from The Witcher 3: Wild Hunt.

A quick look at what we have so far: in the first part, a full-screen intensity map was generated which tells how visible the effect will be depending on distance. In the second part I investigated the "outline map" in more detail, which is responsible for the outline and the "moving" look of the final effect.

We have arrived at the last stop. We need to combine all of this together! The last pass is a fullscreen quad. The inputs are: the color buffer, the outline map and the intensity map.

Before:



After:


And a video (once again) to show how the effect is applied:


As you can see, besides applying an outline to objects which Geralt can see/hear, a fisheye effect is applied to the whole screen, and the whole screen (corners especially) gets greyish, to make you feel like a real monster hunter in action.

Full pixel shader assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[3], immediateIndexed  
    dcl_constantbuffer cb3[7], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s2, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t2  
    dcl_resource_texture2d (float,float,float,float) t3  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 7  
   0: div r0.xy, v0.xyxx, cb0[2].xyxx  
   1: mad r0.zw, r0.xxxy, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)  
   2: mov r1.yz, abs(r0.zzwz)  
   3: div r0.z, cb0[2].x, cb0[2].y  
   4: mul r1.x, r0.z, r1.y  
   5: add r0.zw, r1.xxxz, -cb3[2].xxxy  
   6: mul_sat r0.zw, r0.zzzw, l(0.000000, 0.000000, 0.555556, 0.555556)  
   7: log r0.zw, r0.zzzw  
   8: mul r0.zw, r0.zzzw, l(0.000000, 0.000000, 2.500000, 2.500000)  
   9: exp r0.zw, r0.zzzw  
  10: dp2 r0.z, r0.zwzz, r0.zwzz  
  11: sqrt r0.z, r0.z  
  12: min r0.z, r0.z, l(1.000000)  
  13: add r0.z, -r0.z, l(1.000000)  
  14: mov_sat r0.w, cb3[6].x  
  15: add_sat r1.xy, -r0.xyxx, l(0.030000, 0.030000, 0.000000, 0.000000)  
  16: add r1.x, r1.y, r1.x  
  17: add_sat r0.xy, r0.xyxx, l(-0.970000, -0.970000, 0.000000, 0.000000)  
  18: add r0.x, r0.x, r1.x  
  19: add r0.x, r0.y, r0.x  
  20: mul r0.x, r0.x, l(20.000000)  
  21: min r0.x, r0.x, l(1.000000)  
  22: add r1.xy, v0.xyxx, v0.xyxx  
  23: div r1.xy, r1.xyxx, cb0[2].xyxx  
  24: add r1.xy, r1.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)  
  25: dp2 r0.y, r1.xyxx, r1.xyxx  
  26: mul r1.xy, r0.yyyy, r1.xyxx  
  27: mul r0.y, r0.w, l(0.100000)  
  28: mul r1.xy, r0.yyyy, r1.xyxx  
  29: max r1.xy, r1.xyxx, l(-0.400000, -0.400000, 0.000000, 0.000000)  
  30: min r1.xy, r1.xyxx, l(0.400000, 0.400000, 0.000000, 0.000000)  
  31: mul r1.xy, r1.xyxx, cb3[1].xxxx  
  32: mul r1.zw, r1.xxxy, cb0[2].zzzw  
  33: mad r1.zw, v0.xxxy, cb0[1].zzzw, -r1.zzzw  
  34: sample_indexable(texture2d)(float,float,float,float) r2.xyz, r1.zwzz, t0.xyzw, s0  
  35: mul r3.xy, r1.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)  
  36: sample_indexable(texture2d)(float,float,float,float) r0.y, r3.xyxx, t2.yxzw, s2  
  37: mad r3.xy, r1.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000), l(0.500000, 0.000000, 0.000000, 0.000000)  
  38: sample_indexable(texture2d)(float,float,float,float) r2.w, r3.xyxx, t2.yzwx, s2  
  39: mul r2.w, r2.w, l(0.125000)  
  40: mul r3.x, cb0[0].x, l(0.100000)  
  41: add r0.x, -r0.x, l(1.000000)  
  42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)  
  43: mov r3.yzw, l(0, 0, 0, 0)  
  44: mov r4.x, r0.y  
  45: mov r4.y, r2.w  
  46: mov r4.z, l(0)  
  47: loop  
  48:  ige r4.w, r4.z, l(8)  
  49:  breakc_nz r4.w  
  50:  itof r4.w, r4.z  
  51:  mad r4.w, r4.w, l(0.785375), -r3.x  
  52:  sincos r5.x, r6.x, r4.w  
  53:  mov r6.y, r5.x  
  54:  mul r5.xy, r0.xxxx, r6.xyxx  
  55:  mad r5.zw, r5.xxxy, l(0.000000, 0.000000, 0.125000, 0.125000), r1.zzzw  
  56:  mul r6.xy, r5.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)  
  57:  sample_indexable(texture2d)(float,float,float,float) r4.w, r6.xyxx, t2.yzwx, s2  
  58:  mad r4.x, r4.w, l(0.125000), r4.x  
  59:  mad r5.zw, r5.zzzw, l(0.000000, 0.000000, 0.500000, 0.500000), l(0.000000, 0.000000, 0.500000, 0.000000)  
  60:  sample_indexable(texture2d)(float,float,float,float) r4.w, r5.zwzz, t2.yzwx, s2  
  61:  mad r4.y, r4.w, l(0.125000), r4.y  
  62:  mad r5.xy, r5.xyxx, r1.xyxx, r1.zwzz  
  63:  sample_indexable(texture2d)(float,float,float,float) r5.xyz, r5.xyxx, t0.xyzw, s0  
  64:  mad r3.yzw, r5.xxyz, l(0.000000, 0.125000, 0.125000, 0.125000), r3.yyzw  
  65:  iadd r4.z, r4.z, l(1)  
  66: endloop  
  67: sample_indexable(texture2d)(float,float,float,float) r0.xy, r1.zwzz, t3.xyzw, s0  
  68: mad_sat r0.xy, -r0.xyxx, l(0.800000, 0.750000, 0.000000, 0.000000), r4.xyxx  
  69: dp3 r1.x, r3.yzwy, l(0.300000, 0.300000, 0.300000, 0.000000)  
  70: add r1.yzw, -r1.xxxx, r3.yyzw  
  71: mad r1.xyz, r0.zzzz, r1.yzwy, r1.xxxx  
  72: mad r1.xyz, r1.xyzx, l(0.600000, 0.600000, 0.600000, 0.000000), -r2.xyzx  
  73: mad r1.xyz, r0.wwww, r1.xyzx, r2.xyzx  
  74: mul r0.yzw, r0.yyyy, cb3[4].xxyz  
  75: mul r2.xyz, r0.xxxx, cb3[5].xyzx  
  76: mad r0.xyz, r0.yzwy, l(1.200000, 1.200000, 1.200000, 0.000000), r2.xyzx  
  77: mov_sat r2.xyz, r0.xyzx  
  78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
  79: add r0.yzw, -r1.xxyz, r2.xxyz  
  80: mad o0.xyz, r0.xxxx, r0.yzwy, r1.xyzx  
  81: mov o0.w, l(1.000000)  
  82: ret   


82 lines means a lot of work! Let's get into it!

Take a look at inputs first:
   // *** Inputs       
     
   // * Zoom amount, always 1  
   float zoomAmount = cb3_v1.x;  
     
   // Another value which affect fisheye effect  
   // but always set to float2(1.0, 1.0).  
   float2 amount = cb0_v2.zw;  
     
   // Elapsed time in seconds  
   float time = cb0_v0.x;  
     
   // Colors of witcher senses  
   float3 colorInteresting = cb3_v5.rgb;  
   float3 colorTraces = cb3_v4.rgb;  
     
   // Was always set to float2(0.0, 0.0).  
   // Setting this to higher values  
   // makes "grey corners" effect weaker.  
   float2 offset = cb3_v2.xy;  
     
   // Dimensions of fullscreen  
   float2 texSize = cb0_v2.xy;  
   float2 invTexSize = cb0_v1.zw;  
   
   // Main value which causes fisheye effect [0-1]  
   const float fisheyeAmount = saturate( cb3_v6.x );  

The main value responsible for the amount of the effect is fisheyeAmount. I guess it rises gradually from 0.0 to 1.0 once Geralt starts to use his senses. The rest of the values are rather constant, but I guess some of them are different if the user disables the fisheye effect in the gameplay options (I haven't checked it).


The first thing which happens in the shader is calculating the mask responsible for the grey corners:
   0: div r0.xy, v0.xyxx, cb0[2].xyxx   
   1: mad r0.zw, r0.xxxy, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)   
   2: mov r1.yz, abs(r0.zzwz)   
   3: div r0.z, cb0[2].x, cb0[2].y   
   4: mul r1.x, r0.z, r1.y   
   5: add r0.zw, r1.xxxz, -cb3[2].xxxy   
   6: mul_sat r0.zw, r0.zzzw, l(0.000000, 0.000000, 0.555556, 0.555556)   
   7: log r0.zw, r0.zzzw   
   8: mul r0.zw, r0.zzzw, l(0.000000, 0.000000, 2.500000, 2.500000)   
   9: exp r0.zw, r0.zzzw   
  10: dp2 r0.z, r0.zwzz, r0.zwzz   
  11: sqrt r0.z, r0.z   
  12: min r0.z, r0.z, l(1.000000)   
  13: add r0.z, -r0.z, l(1.000000)   

In HLSL we can write it this way:
   // Main uv  
   float2 uv = PosH.xy / texSize;  
     
   // Scale at first from [0-1] to [-1;1], then calculate abs  
   float2 uv3 = abs( uv * 2.0 - 1.0);   
        
   // Aspect ratio  
   float aspectRatio = texSize.x / texSize.y;  
        
   // * Mask used to make corners grey  
   float mask_gray_corners;  
   {  
     float2 newUv = float2( uv3.x * aspectRatio, uv3.y ) - offset;  
     newUv = saturate( newUv / 1.8 );  
     newUv = pow(newUv, 2.5);  
       
     mask_gray_corners = 1-min(1.0, length(newUv) );  
   }  

At first, UVs in the [-1; 1] range are calculated, along with their absolute values. Then, some clever "squeezing" takes place. The final mask looks this way:

I'll come back to this mask later.


Now I'm going to intentionally omit a few lines of assembly and take a closer look at code responsible for "zooming" effect.
  22: add r1.xy, v0.xyxx, v0.xyxx   
  23: div r1.xy, r1.xyxx, cb0[2].xyxx   
  24: add r1.xy, r1.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)   
  25: dp2 r0.y, r1.xyxx, r1.xyxx   
  26: mul r1.xy, r0.yyyy, r1.xyxx   
  27: mul r0.y, r0.w, l(0.100000)   
  28: mul r1.xy, r0.yyyy, r1.xyxx   
  29: max r1.xy, r1.xyxx, l(-0.400000, -0.400000, 0.000000, 0.000000)   
  30: min r1.xy, r1.xyxx, l(0.400000, 0.400000, 0.000000, 0.000000)   
  31: mul r1.xy, r1.xyxx, cb3[1].xxxx   
  32: mul r1.zw, r1.xxxy, cb0[2].zzzw   
  33: mad r1.zw, v0.xxxy, cb0[1].zzzw, -r1.zzzw   

At first "double" texture coordinates are calculated and float2(1, 1) is subtracted:
   float2 uv4 = 2 * PosH.xy;  
   uv4 /= cb0_v2.xy;  
   uv4 -= float2(1.0, 1.0);  

Such texcoord can be visualised as:

Then a dot product is calculated as dot(uv4, uv4), which yields a mask:

which is used to multiply the aforementioned texcoords:
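In HLSL (a small piece that is missing from the snippets above):

   // lines 25-26: the squared distance from the screen center scales the texcoords
   float dist2 = dot( uv4, uv4 );
   uv4 *= dist2;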

Important: in the upper left corner (black pixels) the values are negative. The reason why they are represented as black (0.0) is the limited precision of the R11G11B10_FLOAT format. There is no sign bit there, so we cannot store negative values.

Later an attenuation factor is calculated (As I mentioned before, fisheyeAmount changes from 0.0 to 1.0).
   float attenuation = fisheyeAmount * 0.1;  
   uv4 *= attenuation;  

Later we have a clamp (max/min) and one more multiplication.
This way an offset is calculated. To get the final uv which will be used to sample the color texture, we just subtract:

float2 colorUV = mainUv - offset;
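Putting lines 28-33 together, here is a sketch of how the offset and colorUV are obtained (zoomAmount and amount are the cbuffer values listed in the inputs; offsetUV is the value reused later in the sampling loop):

   uv4 *= attenuation;                              // line 28
   uv4 = clamp( uv4, -0.4, 0.4 );                   // lines 29-30
   float2 offsetUV = uv4 * zoomAmount;              // line 31 (zoomAmount is always 1.0 here)

   float2 offset = offsetUV * amount;               // line 32 (amount is float2(1.0, 1.0))
   float2 colorUV = PosH.xy * invTexSize - offset;  // line 33, i.e. mainUv - offset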

Sampling the color texture with colorUV, we get an image distorted around the corners:



Outlines

The next step is to sample the outline map to find outlines. This is quite easy: at first we find the texcoords to sample the interesting objects' outline, then the same for the traces:
   // * Sample outline map  
        
   // interesting objects (upper left square)  
   float2 outlineUV = colorUV * 0.5;  
   float outlineInteresting = texture2.Sample( sampler2, outlineUV ).x; // r0.y  
        
   // traces (upper right square)  
   outlineUV = colorUV * 0.5 + float2(0.5, 0.0);  
   float outlineTraces = texture2.Sample( sampler2, outlineUV ).x; // r2.w  
        
   outlineInteresting /= 8.0; // r4.x  
   outlineTraces /= 8.0; // r4.y  

interesting objects from outline map
traces from outline map
It's worth noticing that we only sample the .x channel of the outline map and only its upper squares are considered.

Movement

To make the traces move with time, a trick quite similar to the one from the drunk effect is used. A unit circle is introduced and we sample the outline map (both for interesting objects and for traces) as well as the color texture 8 times.

Note that we divided found outlines by 8.0 just a moment ago.

Because we are in texture coordinate space [0-1]², using a circle radius of 1 to circle around a particular pixel would give us unacceptable artifacts:


So, before going further, let's find out how the radius is calculated. To do that, we have to go back to the assembly lines 15-21 we skipped. A small problem is that the radius calculation is scattered across the shader (probably due to clever shader compiler optimizations or so). So, there is one part (15-21) and a second one (41-42):
  15: add_sat r1.xy, -r0.xyxx, l(0.030000, 0.030000, 0.000000, 0.000000)  
  16: add r1.x, r1.y, r1.x  
  17: add_sat r0.xy, r0.xyxx, l(-0.970000, -0.970000, 0.000000, 0.000000)  
  18: add r0.x, r0.x, r1.x  
  19: add r0.x, r0.y, r0.x  
  20: mul r0.x, r0.x, l(20.000000)  
  21: min r0.x, r0.x, l(1.000000)  
  ...  
  41: add r0.x, -r0.x, l(1.000000)  
  42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)  

As you can see, we consider only texels within [0.00 - 0.03] of each screen edge, sum their values up, multiply by 20 and saturate. Here is how it looks just after lines 15-21:


 And just after line 41:

Then at line 42 we multiply the above by 0.03, which is the circle radius for the whole screen. As you can see, the radius gets smaller near the edges of the screen.
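A possible HLSL equivalent of lines 15-21 (the "1.0 - x" and the "* 0.03" from lines 41-42 show up below as the circle_radius adjustment in the loop setup):

   // how close the pixel is to any screen edge, within 3% of the [0-1] range
   float2 nearTopLeft     = saturate( 0.03 - uv );              // line 15
   float2 nearBottomRight = saturate( uv - 0.97 );              // line 17

   float circle_radius = nearTopLeft.x + nearTopLeft.y +
                         nearBottomRight.x + nearBottomRight.y; // lines 16, 18-19
   circle_radius = min( circle_radius * 20.0, 1.0 );            // lines 20-21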


Having that, we can take a look at the assembly responsible for the movement:
  40: mul r3.x, cb0[0].x, l(0.100000)  
  41: add r0.x, -r0.x, l(1.000000)  
  42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)  
  43: mov r3.yzw, l(0, 0, 0, 0)  
  44: mov r4.x, r0.y  
  45: mov r4.y, r2.w  
  46: mov r4.z, l(0)  
  47: loop  
  48:  ige r4.w, r4.z, l(8)  
  49:  breakc_nz r4.w  
  50:  itof r4.w, r4.z  
  51:  mad r4.w, r4.w, l(0.785375), -r3.x  
  52:  sincos r5.x, r6.x, r4.w  
  53:  mov r6.y, r5.x  
  54:  mul r5.xy, r0.xxxx, r6.xyxx  
  55:  mad r5.zw, r5.xxxy, l(0.000000, 0.000000, 0.125000, 0.125000), r1.zzzw  
  56:  mul r6.xy, r5.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)  
  57:  sample_indexable(texture2d)(float,float,float,float) r4.w, r6.xyxx, t2.yzwx, s2  
  58:  mad r4.x, r4.w, l(0.125000), r4.x  
  59:  mad r5.zw, r5.zzzw, l(0.000000, 0.000000, 0.500000, 0.500000), l(0.000000, 0.000000, 0.500000, 0.000000)  
  60:  sample_indexable(texture2d)(float,float,float,float) r4.w, r5.zwzz, t2.yzwx, s2  
  61:  mad r4.y, r4.w, l(0.125000), r4.y  
  62:  mad r5.xy, r5.xyxx, r1.xyxx, r1.zwzz  
  63:  sample_indexable(texture2d)(float,float,float,float) r5.xyz, r5.xyxx, t0.xyzw, s0  
  64:  mad r3.yzw, r5.xxyz, l(0.000000, 0.125000, 0.125000, 0.125000), r3.yyzw  
  65:  iadd r4.z, r4.z, l(1)  
  66: endloop  

Let's take a moment to stop here. At line 40 we have a time factor - simply elapsedTime * 0.1. At line 43 we have an accumulator for the color samples fetched inside the loop.

r0.x (lines 41-42) is the radius of the circle as we know it now. r4.x (line 44) is the outline of interesting objects, r4.y (line 45) - the outline of traces (both divided by 8 a moment ago!) and r4.z (line 46) - the loop counter.

As one can expect, the loop has 8 iterations. We start by calculating the angle in radians as i * PI_4, which gives a full 2*PI cycle. The angle is perturbed with time.

Using sincos we determine the sampling point (on the unit circle) and we adjust the radius with a multiplication (line 54).

After that we circle around the pixel and sample the outlines and the color. After the loop we will have average values (thanks to the division by 8) of the outlines and the color.
   float timeParam = time * 0.1;  
     
   // adjust circle radius  
   circle_radius = 1.0 - circle_radius;  
   circle_radius *= 0.03;  
        
   float3 color_circle_main = float3(0.0, 0.0, 0.0);  
        
   [loop]  
   for (int i=0; 8 > i; i++)  
   {  
      // full 2*PI = 360 angles cycle  
      const float angleRadians = (float) i * PI_4 - timeParam;  
             
      // unit circle  
      float2 unitCircle;  
      sincos(angleRadians, unitCircle.y, unitCircle.x); // unitCircle.x = cos, unitCircle.y = sin  
             
      // adjust radius  
      unitCircle *= circle_radius;  
             
      // * base texcoords (circle) - note we also scale radius here by 8  
      // * probably because of dimensions of outline map.  
      // line 55  
      float2 uv_outline_base = colorUV + unitCircle / 8.0;  
                       
      // * interesting objects (circle)  
      float2 uv_outline_interesting_circle = uv_outline_base * 0.5;  
      float outline_interesting_circle = texture2.Sample( sampler2, uv_outline_interesting_circle ).x;  
      outlineInteresting += outline_interesting_circle / 8.0;  
             
      // * traces (circle)  
      float2 uv_outline_traces_circle = uv_outline_base * 0.5 + float2(0.5, 0.0);  
      float outline_traces_circle = texture2.Sample( sampler2, uv_outline_traces_circle ).x;  
      outlineTraces += outline_traces_circle / 8.0;  
             
      // * sample color texture (zooming effect) with perturbation  
      float2 uv_color_circle = colorUV + unitCircle * offsetUV;  
      float3 color_circle = texture0.Sample( sampler0, uv_color_circle ).rgb;  
      color_circle_main += color_circle / 8.0;  
   }  
        

Sampling the color is quite similar, but to the base colorUV we add the offset multiplied by the "unit" circle.

Intensities

After the loop we sample intensity map and adjust final intensities (because intensity map has no idea about outlines):
  67: sample_indexable(texture2d)(float,float,float,float) r0.xy, r1.zwzz, t3.xyzw, s0  
  68: mad_sat r0.xy, -r0.xyxx, l(0.800000, 0.750000, 0.000000, 0.000000), r4.xyxx  

HLSL:
   // * Sample intensity map  
   float2 intensityMap = texture3.Sample( sampler0, colorUV ).xy;  
     
   float intensityInteresting = intensityMap.r;  
   float intensityTraces = intensityMap.g;  
        
   // * Adjust outlines  
   float mainOutlineInteresting = saturate( outlineInteresting - 0.8*intensityInteresting );  
   float mainOutlineTraces = saturate( outlineTraces - 0.75*intensityTraces ); 

Gray corners and final combining

The gray color near corners is calculated using dot product (assembly line 69):
   // * Greyish color  
   float3 color_greyish = dot( color_circle_main, float3(0.3, 0.3, 0.3) ).xxx;  



Then we have two interpolations. The first one combines the gray color with the "circled" one using the first mask I described - so the corners are grey. Additionally, there is a 0.6 factor which darkens the final image:

The second one combines the regular color with the one above using fisheyeAmount. That means the screen is getting progressively darker (thanks to the 0.6 multiplication above) and more gray around the corners! Genius.

HLSL:
   // * Determine main color.  
   // (1) At first, combine "circled" color with gray one.  
   // Now we have have greyish corners here.  
   float3 mainColor = lerp( color_greyish, color_circle_main, mask_gray_corners ) * 0.6;  
     
   // (2) Then mix "regular" color with the above.  
   // Please note this operation makes corners gradually gray (because fisheyeAmount rises from 0 to 1)
   // and gradually darker (because of 0.6 multiplier).  
   mainColor = lerp( color, mainColor, fisheyeAmount );  


Now we can move on to outlining objects.
The colors (red and yellow) are taken from the constant buffer.
   // * Determine color of witcher senses  
   float3 senses_traces = mainOutlineTraces * colorTraces;  
   float3 senses_interesting = mainOutlineInteresting * colorInteresting;  
   float3 senses_total = 1.2 * senses_traces + senses_interesting;   



Phew! We are almost at the finish line!
We have the final color, we have the color of the witcher senses... all we have to do is combine them somehow!

This is not just simple addition. At first, we calculate a dot product:
  78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
   
  float dot_senses_total = saturate( dot(senses_total, float3(1.0, 1.0, 1.0) ) );  

which looks like this:

And this, at the very end, is used to interpolate between the color and the (saturated) witcher senses:
  76: mad r0.xyz, r0.yzwy, l(1.200000, 1.200000, 1.200000, 0.000000), r2.xyzx  
  77: mov_sat r2.xyz, r0.xyzx  
  78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
  79: add r0.yzw, -r1.xxyz, r2.xxyz  
  80: mad o0.xyz, r0.xxxx, r0.yzwy, r1.xyzx  
  81: mov o0.w, l(1.000000)  
  82: ret  
   
   float3 senses_total = 1.2 * senses_traces + senses_interesting;   
     
   // * Final combining  
   float3 senses_total_sat = saturate(senses_total);  
   float dot_senses_total = saturate( dot(senses_total, float3(1.0, 1.0, 1.0) ) );  
        
   float3 finalColor = lerp( mainColor, senses_total_sat, dot_senses_total );  
   return float4( finalColor, 1.0 );  



This is the end.


The full shader is available here.
Comparison of my (left) and original (right) shaders:


If you have come this far, congratulations. Feel free to comment.
I hope you enjoyed this mini-series! The "witcher senses" mechanic contains a lot of brilliant ideas and the final result is really convincing.

Thank you very much for reading!


PS. A decent part of this mini-series was done with High Contrast in the background :)