Sunday, 16 June 2019

Reverse engineering the rendering of The Witcher 3, part 15 - fog

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".

Fog can be implemented in many ways. However, the times when we could apply simple distance-based fog and be happy with it are gone forever (probably). Living in the land of programmable shaders has opened the door to new, crazy and, most importantly, physically correct and visually plausible solutions.

The current trend in fog rendering is probably based on compute shaders (see this presentation by Bartłomiej Wroński for more details).

Although that presentation appeared in 2014 and The Witcher 3 was released in 2015/2016, fog in the last part of Geralt's adventures is completely screen-based and applied as a typical post-process.

Before we start our 15th reverse engineering session in this series, I must say that I tried to understand the fog in The Witcher 3 at least five times in the last year, and each time I failed. The assembly, as you'll see in a minute, is quite complex, which makes writing a readable HLSL fog shader I wouldn't be ashamed of almost impossible.

However, I managed to find a fog shader on the internet which almost immediately caught my attention because of its similarity to The Witcher 3 in terms of variable names and the general order of instructions. That shader was not *exactly* the same as the one in the game, so I had to work on it a little bit. My point is that most of the HLSL you'll see here was not created/reversed by me, with two exceptions. Keep that in mind.

Here is the assembly of the pixel shader for fog. It's worth noting that it's the same for the whole game (2015 base game and both DLCs):
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[2], immediateIndexed  
    dcl_constantbuffer cb12[214], immediateIndexed  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_resource_texture2d (float,float,float,float) t2  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 7  
   0: ftou r0.xy, v0.xyxx  
   1: mov r0.zw, l(0, 0, 0, 0)
   2: ld_indexable(texture2d)(float,float,float,float) r1.x, r0.xyww, t0.xyzw  
   3: mad r1.y, r1.x, cb12[22].x, cb12[22].y  
   4: lt r1.y, r1.y, l(1.000000)  
   5: if_nz r1.y  
   6:  utof r1.yz, r0.xxyx  
   7:  mul r2.xyzw, r1.zzzz, cb12[211].xyzw  
   8:  mad r2.xyzw, cb12[210].xyzw, r1.yyyy, r2.xyzw  
   9:  mad r1.xyzw, cb12[212].xyzw, r1.xxxx, r2.xyzw  
  10:  add r1.xyzw, r1.xyzw, cb12[213].xyzw  
  11:  div r1.xyz, r1.xyzx, r1.wwww
  12:  ld_indexable(texture2d)(float,float,float,float) r2.xyz, r0.xyww, t1.xyzw
  13:  ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t2.xyzw  
  14:  max r0.x, r0.x, cb3[1].x  
  15:  add r0.yzw, r1.xxyz, -cb12[0].xxyz  
  16:  dp3 r1.x, r0.yzwy, r0.yzwy  
  17:  sqrt r1.x, r1.x  
  18:  add r1.y, r1.x, -cb3[0].x  
  19:  add r1.zw, -cb3[0].xxxz, cb3[0].yyyw
  20:  div_sat r1.y, r1.y, r1.z  
  21:  mad r1.y, r1.y, r1.w, cb3[0].z  
  22:  add r0.x, r0.x, l(-1.000000)  
  23:  mad r0.x, r1.y, r0.x, l(1.000000)  
  24:  div r0.yzw, r0.yyzw, r1.xxxx  
  25:  mad r1.y, r0.w, cb12[22].z, cb12[0].z  
  26:  add r1.x, r1.x, -cb12[22].z  
  27:  max r1.x, r1.x, l(0)  
  28:  min r1.x, r1.x, cb12[42].z  
  29:  mul r1.z, r0.w, r1.x  
  30:  mul r1.w, r1.x, cb12[43].x  
  31:  mul r1.zw, r1.zzzw, l(0.000000, 0.000000, 0.062500, 0.062500)
  32:  dp3 r0.y, cb12[38].xyzx, r0.yzwy  
  33:  add r0.z, r0.y, cb12[42].x  
  34:  add r0.w, cb12[42].x, l(1.000000)  
  35:  div_sat r0.z, r0.z, r0.w  
  36:  add r0.w, -cb12[43].z, cb12[43].y  
  37:  mad r0.z, r0.z, r0.w, cb12[43].z  
  38:  mul r0.w, abs(r0.y), abs(r0.y)  
  39:  mad_sat r2.w, r1.x, l(0.002000), l(-0.300000)  
  40:  mul r0.w, r0.w, r2.w  
  41:  lt r0.y, l(0), r0.y  
  42:  movc r3.xyz, r0.yyyy, cb12[39].xyzx, cb12[41].xyzx
  43:  add r3.xyz, r3.xyzx, -cb12[40].xyzx
  44:  mad r3.xyz, r0.wwww, r3.xyzx, cb12[40].xyzx
  45:  movc r4.xyz, r0.yyyy, cb12[45].xyzx, cb12[47].xyzx
  46:  add r4.xyz, r4.xyzx, -cb12[46].xyzx
  47:  mad r4.xyz, r0.wwww, r4.xyzx, cb12[46].xyzx
  48:  ge r0.y, r1.x, cb12[48].y  
  49:  if_nz r0.y  
  50:   add r0.y, r1.y, cb12[42].y  
  51:   mul r0.w, r0.z, r0.y  
  52:   mul r1.y, r0.z, r1.z  
  53:   mad r5.xyzw, r1.yyyy, l(16.000000, 15.000000, 14.000000, 13.000000), r0.wwww  
  54:   max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)  
  55:   add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  56:   div_sat r5.xyzw, r1.wwww, r5.xyzw  
  57:   add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  58:   mul r1.z, r5.y, r5.x  
  59:   mul r1.z, r5.z, r1.z  
  60:   mul r1.z, r5.w, r1.z  
  61:   mad r5.xyzw, r1.yyyy, l(12.000000, 11.000000, 10.000000, 9.000000), r0.wwww  
  62:   max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)  
  63:   add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  64:   div_sat r5.xyzw, r1.wwww, r5.xyzw  
  65:   add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  66:   mul r1.z, r1.z, r5.x  
  67:   mul r1.z, r5.y, r1.z  
  68:   mul r1.z, r5.z, r1.z  
  69:   mul r1.z, r5.w, r1.z  
  70:   mad r5.xyzw, r1.yyyy, l(8.000000, 7.000000, 6.000000, 5.000000), r0.wwww  
  71:   max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)  
  72:   add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  73:   div_sat r5.xyzw, r1.wwww, r5.xyzw  
  74:   add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  75:   mul r1.z, r1.z, r5.x  
  76:   mul r1.z, r5.y, r1.z  
  77:   mul r1.z, r5.z, r1.z  
  78:   mul r1.z, r5.w, r1.z  
  79:   mad r5.xy, r1.yyyy, l(4.000000, 3.000000, 0.000000, 0.000000), r0.wwww  
  80:   max r5.xy, r5.xyxx, l(0, 0, 0, 0)  
  81:   add r5.xy, r5.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  82:   div_sat r5.xy, r1.wwww, r5.xyxx  
  83:   add r5.xy, -r5.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  84:   mul r1.z, r1.z, r5.x  
  85:   mul r1.z, r5.y, r1.z  
  86:   mad r0.w, r1.y, l(2.000000), r0.w  
  87:   max r0.w, r0.w, l(0)  
  88:   add r0.w, r0.w, l(1.000000)  
  89:   div_sat r0.w, r1.w, r0.w  
  90:   add r0.w, -r0.w, l(1.000000)  
  91:   mul r0.w, r0.w, r1.z  
  92:   mad r0.y, r0.y, r0.z, r1.y  
  93:   max r0.y, r0.y, l(0)  
  94:   add r0.y, r0.y, l(1.000000)  
  95:   div_sat r0.y, r1.w, r0.y  
  96:   add r0.y, -r0.y, l(1.000000)  
  97:   mad r0.y, -r0.w, r0.y, l(1.000000)  
  98:   add r0.z, r1.x, -cb12[48].y  
  99:   mul_sat r0.z, r0.z, cb12[48].z  
  100:  else  
  101:   mov r0.yz, l(0.000000, 1.000000, 0.000000, 0.000000)  
  102:  endif  
  103:  log r0.y, r0.y  
  104:  mul r0.w, r0.y, cb12[42].w  
  105:  exp r0.w, r0.w  
  106:  mul r0.y, r0.y, cb12[48].x  
  107:  exp r0.y, r0.y  
  108:  mul r0.yw, r0.yyyw, r0.zzzz  
  109:  mad_sat r1.xy, r0.wwww, cb12[189].xzxx, cb12[189].ywyy  
  110:  add r5.xyz, -r3.xyzx, cb12[188].xyzx
  111:  mad r5.xyz, r1.xxxx, r5.xyzx, r3.xyzx
  112:  add r0.z, cb12[188].w, l(-1.000000)  
  113:  mad r0.z, r1.y, r0.z, l(1.000000)  
  114:  mul_sat r5.w, r0.z, r0.w  
  115:  lt r0.z, l(0), cb12[192].x  
  116:  if_nz r0.z  
  117:   mad_sat r1.xy, r0.wwww, cb12[191].xzxx, cb12[191].ywyy  
  118:   add r6.xyz, -r3.xyzx, cb12[190].xyzx
  119:   mad r3.xyz, r1.xxxx, r6.xyzx, r3.xyzx
  120:   add r0.z, cb12[190].w, l(-1.000000)  
  121:   mad r0.z, r1.y, r0.z, l(1.000000)  
  122:   mul_sat r3.w, r0.z, r0.w  
  123:   add r1.xyzw, -r5.xyzw, r3.xyzw  
  124:   mad r5.xyzw, cb12[192].xxxx, r1.xyzw, r5.xyzw  
  125:  endif  
  126:  mul r0.z, r0.x, r5.w  
  127:  mul r0.x, r0.x, r0.y  
  128:  dp3 r0.y, l(0.333000, 0.555000, 0.222000, 0.000000), r2.xyzx  
  129:  mad r1.xyz, r0.yyyy, r4.xyzx, -r2.xyzx
  130:  mad r0.xyw, r0.xxxx, r1.xyxz, r2.xyxz  
  131:  add r1.xyz, -r0.xywx, r5.xyzx
  132:  mad r0.xyz, r0.zzzz, r1.xyzx, r0.xywx
  133: else  
  134:  mov r0.xyz, l(0, 0, 0, 0)
  135: endif  
  136: mov o0.xyz, r0.xyzx
  137: mov o0.w, l(1.000000)  
  138: ret  

This shader is quite long, let's be honest. Probably too long for an effective reverse engineering process ;)

Here is an example sunset scene with fog:

Let's take a look at inputs:

In terms of textures, we have the depth buffer (t0), ambient occlusion (t2) and the HDR color buffer (t1):
Input depth buffer

Input ambient occlusion

Input HDR color buffer

... and the result of fog shader in this scene looks like this:

HDR texture after applying fog

The depth buffer is used to reconstruct world position. This is a common pattern in the shaders of The Witcher 3.

Having ambient occlusion data (if enabled) allows us to darken the fog. A very smart idea; it may seem obvious, but I had never thought of it that way. I'll come back to this aspect later.

The shader begins by determining whether the pixel is on the sky. If the pixel lies on the sky (depth == 1.0), the shader returns black. If the pixel belongs to the scene (depth < 1.0), we reconstruct its world position using the depth buffer (lines 7-11) and proceed with the fog calculation.
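The sky test and the world-position reconstruction (lines 3-4 and 7-11 of the assembly) can be sketched on the CPU as follows. This is a minimal Python sketch, not the game's code: the function names and the parameter names (`depth_scale`, `depth_bias`, `m210`..`m213`) are hypothetical labels for cb12[22].xy and cb12[210..213], and the constants in the example are made up so the result is easy to check by hand.

```python
def is_scene_pixel(depth, depth_scale, depth_bias):
    # Lines 3-4: mad followed by lt. Pixels that fail this test are treated as sky.
    return depth * depth_scale + depth_bias < 1.0


def reconstruct_world_pos(px, py, depth, m210, m211, m212, m213):
    # Lines 7-10: weighted sum of four float4 constants (a matrix multiply).
    h = [m210[i] * px + m211[i] * py + m212[i] * depth + m213[i] for i in range(4)]
    # Line 11: divide xyz by w.
    return tuple(c / h[3] for c in h[:3])


# Made-up constants (NOT taken from the game).
pos = reconstruct_world_pos(
    10.0, 20.0, 0.5,
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 2.0),
)
print(pos)
```

Note that the pixel coordinates go in unnormalized (the assembly converts the integer position back to float with utof), so the four constants presumably already include the screen-to-clip-space mapping.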

The fog pass takes place shortly after the deferred shading process. You can see that some forward-pass elements are still missing. In this particular scene, deferred light volumes were applied and, after that, Geralt's hair/face/eyes were rendered.

The first thing to know about fog in The Witcher 3 is that it consists of two parts: "fog color" and "aerial color".
 struct FogResult
 {
    float4 paramsFog;     // RGB: color, A: influence
    float4 paramsAerial;  // RGB: color, A: influence
 };

For each part, three colors are provided: front, middle and back. So the constant buffer contains data such as "FogColorFront", "FogColorMiddle", "AerialColorBack" etc. See the inputs:

   // *** Inputs *** //
   float3 FogSunDir = cb12_v38.xyz;
   float3 FogColorFront = cb12_v39.xyz;
   float3 FogColorMiddle = cb12_v40.xyz;
   float3 FogColorBack = cb12_v41.xyz;
   float4 FogBaseParams = cb12_v42;
   float4 FogDensityParamsScene = cb12_v43;
   float4 FogDensityParamsSky = cb12_v44;
   float3 AerialColorFront = cb12_v45.xyz;
   float3 AerialColorMiddle = cb12_v46.xyz;
   float3 AerialColorBack = cb12_v47.xyz;
   float4 AerialParams = cb12_v48;

Before calculating the final colors, we need to calculate some vectors and dot products. The shader has access to the pixel's world position, the camera position (cb12[0].xyz) and the fog/light direction (cb12[38].xyz). This allows us to calculate the dot product between the view vector and the fog direction.
   // 'fragPosWorldSpace' (reconstructed pixel world position) is an assumed name
   float3 frag_vec = fragPosWorldSpace.xyz - customCameraPos.xyz;
   float frag_dist = length(frag_vec);
   float3 frag_dir = frag_vec / frag_dist;
   float dot_fragDirSunDir = dot(FogSunDir.xyz, frag_dir);

Calculating the mixing gradient involves squaring the absolute value of the dot product and then multiplying the result by a distance-based parameter:
   float3 curr_col_fog;
   float3 curr_col_aerial;
   {
     float _dot = dot_fragDirSunDir;
     float _dd = _dot;
     {
       const float _distOffset = -150;
       const float _distRange = 500;
       const float _mul = 1.0 / _distRange;
       const float _bias = _distOffset * _mul;
       _dd = abs(_dd);
       _dd *= _dd;
       _dd *= saturate( frag_dist * _mul + _bias );
     }
     curr_col_fog = lerp( FogColorMiddle.xyz, (_dot>0.0f ? FogColorFront.xyz : FogColorBack.xyz), _dd );
     curr_col_aerial = lerp( AerialColorMiddle.xyz, (_dot>0.0f ? AerialColorFront.xyz : AerialColorBack.xyz), _dd );
   }

This code block makes it clear where the heck that 0.002 and -0.300 (line 39 of the assembly) come from: 1/500 = 0.002 and -150/500 = -0.3. As you can see, the dot product between the view and light vectors is responsible for the choice between the 'front' and 'back' colors. Clever!

Here is a visualization of the final gradient (_dd).
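To get a feel for how the gradient behaves, here is a minimal Python sketch of the same math. It follows the HLSL in this post; the helper names (`color_gradient`, `pick_color`, `lerp3`) are hypothetical, and the colors in the usage example below are made up.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def color_gradient(dot_dir_sun, frag_dist, dist_offset=-150.0, dist_range=500.0):
    mul = 1.0 / dist_range     # 0.002 in the assembly
    bias = dist_offset * mul   # -0.3 in the assembly
    dd = abs(dot_dir_sun) ** 2
    return dd * saturate(frag_dist * mul + bias)


def lerp3(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def pick_color(front, middle, back, dot_dir_sun, frag_dist):
    # Looking towards the sun picks 'front', looking away picks 'back'.
    side = front if dot_dir_sun > 0.0 else back
    return lerp3(middle, side, color_gradient(dot_dir_sun, frag_dist))
```

With these defaults the gradient is zero for anything closer than 150 units (the middle color wins), and it reaches its full range at 650 units, where only the squared dot product decides how much of the front/back color is mixed in. For example, `pick_color((1.0, 0.5, 0.25), (0.0, 0.0, 0.0), (0.2, 0.2, 0.2), 1.0, 1000.0)` returns the front color unchanged.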

Calculating the aerial/fog influence is much more complicated, though. As you can see, we have more parameters than just RGB colors. They include e.g. the density of the scene. We use ray marching (16 steps, which is why the loop can be unrolled) to determine the amount of fog and the scale factor.

Having a [camera--->world] vector, we can divide all its components by 16; this will be our ray marching step. As you can see below, only the .z component (height) is considered in the calculations (curr_pos_z_step).

You can read more about raymarched fog, for instance, here.

   float fog_amount = 1;
   float fog_amount_scale = 0;
   if ( frag_dist >= AerialParams.y )
   {
     float curr_pos_z_base = (customCameraPos.z + FogBaseParams.y) * density_factor;
     float curr_pos_z_step = frag_step.z * density_factor;
     for ( int i=16; i>0; --i )
     {
       fog_amount *= 1 - saturate( density_sample_scale / (1 + max( 0.0, curr_pos_z_base + (i) * curr_pos_z_step ) ) );
     }
     fog_amount = 1 - fog_amount;
     fog_amount_scale = saturate( (frag_dist - AerialParams.y) * AerialParams.z );
   }
   FogResult ret;
   ret.paramsFog = float4 ( curr_col_fog, fog_amount_scale * pow( abs(fog_amount), final_exp_fog ) );
   ret.paramsAerial = float4 ( curr_col_aerial, fog_amount_scale * pow( abs(fog_amount), final_exp_aerial ) );

The amount of fog clearly depends on height (the .z component). At the end, the fog amount is raised to the fog/aerial power.

final_exp_fog and final_exp_aerial come from the constant buffer (cb12[42].w and cb12[48].x in lines 104 and 106 of the assembly) and they control how the fog and aerial colors influence the world as height increases.
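The 16-step loop can be tried out on the CPU with the minimal Python sketch below. It is one reading of lines 29-31 and 50-97 of the assembly rather than the original source: the parameter names are hypothetical, `z_offset` stands for FogBaseParams.y, and `density_scale` is assumed to be FogDensityParamsScene.x (line 30). The density factor is simply passed in.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def fog_amount(cam_z, frag_dir_z, frag_dist, density_factor, density_scale,
               z_offset, steps=16):
    # One march step is 1/16 of the camera-to-pixel vector (0.0625 in line 31);
    # only its height (.z) matters.
    step_z = frag_dir_z * frag_dist / steps * density_factor
    base_z = (cam_z + z_offset) * density_factor
    sample_scale = frag_dist * density_scale / steps
    transmitted = 1.0
    for i in range(steps, 0, -1):
        height = max(0.0, base_z + i * step_z)
        # The higher the sample, the less fog it contributes.
        transmitted *= 1.0 - saturate(sample_scale / (1.0 + height))
    return 1.0 - transmitted


# Horizontal ray at height 0 versus the same ray 100 units higher (made-up values).
low = fog_amount(0.0, 0.0, 160.0, 1.0, 0.05, 0.0)
high = fog_amount(100.0, 0.0, 160.0, 1.0, 0.05, 0.0)
print(low, high)
```

In this example every sample of the low ray removes half of what is left, so almost all light is replaced by fog, while the high ray picks up only a few percent. That is the height falloff described above.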

Fog Override

The shader I found did not include this fragment of assembly:
  109:  mad_sat r1.xy, r0.wwww, cb12[189].xzxx, cb12[189].ywyy  
  110:  add r5.xyz, -r3.xyzx, cb12[188].xyzx
  111:  mad r5.xyz, r1.xxxx, r5.xyzx, r3.xyzx
  112:  add r0.z, l(-1.000000), cb12[188].w  
  113:  mad r0.z, r1.y, r0.z, l(1.000000)  
  114:  mul_sat r5.w, r0.w, r0.z  
  115:  lt r0.z, l(0.000000), cb12[192].x  
  116:  if_nz r0.z  
  117:   mad_sat r1.xy, r0.wwww, cb12[191].xzxx, cb12[191].ywyy  
  118:   add r6.xyz, -r3.xyzx, cb12[190].xyzx
  119:   mad r3.xyz, r1.xxxx, r6.xyzx, r3.xyzx
  120:   add r0.z, l(-1.000000), cb12[190].w  
  121:   mad r0.z, r1.y, r0.z, l(1.000000)  
  122:   mul_sat r3.w, r0.w, r0.z  
  123:   add r1.xyzw, -r5.xyzw, r3.xyzw  
  124:   mad r5.xyzw, cb12[192].xxxx, r1.xyzw, r5.xyzw  
  125:  endif   

From what I managed to understand, this looks like a double override of fog color and influence.
Most of the time there is only one override (cb12_v192.x is 0.0), but in this particular case its value is ~0.22, so we perform the second override.

 #ifdef OVERRIDE_FOG
   // Override
   float fog_influence = ret.paramsFog.w; // r0.w
   float override1ColorScale = cb12_v189.x;
   float override1ColorBias = cb12_v189.y;
   float3 override1Color = cb12_v188.rgb;
   float override1InfluenceScale = cb12_v189.z;
   float override1InfluenceBias = cb12_v189.w;
   float override1Influence = cb12_v188.w;
   float override1ColorAmount = saturate(fog_influence * override1ColorScale + override1ColorBias);
   float override1InfluenceAmount = saturate(fog_influence * override1InfluenceScale + override1InfluenceBias);

   float4 paramsFogOverride;
   paramsFogOverride.rgb = lerp(curr_col_fog, override1Color, override1ColorAmount ); // r5.xyz
   float param1 = lerp(1.0, override1Influence, override1InfluenceAmount); // r0.z
   paramsFogOverride.w = saturate(param1 * fog_influence ); // r5.w
   const float extraFogOverride = cb12_v192.x;
   if (extraFogOverride > 0.0)
   {
     float override2ColorScale = cb12_v191.x;
     float override2ColorBias = cb12_v191.y;
     float3 override2Color = cb12_v190.rgb;
     float override2InfluenceScale = cb12_v191.z;
     float override2InfluenceBias = cb12_v191.w;
     float override2Influence = cb12_v190.w;
     float override2ColorAmount = saturate(fog_influence * override2ColorScale + override2ColorBias);
     float override2InfluenceAmount = saturate(fog_influence * override2InfluenceScale + override2InfluenceBias);

     float4 paramsFogOverride2;
     paramsFogOverride2.rgb = lerp(curr_col_fog, override2Color, override2ColorAmount); // r3.xyz
     float ov_param1 = lerp(1.0, override2Influence, override2InfluenceAmount); // r0.z
     paramsFogOverride2.w = saturate(ov_param1 * fog_influence); // r3.w
     paramsFogOverride = lerp(paramsFogOverride, paramsFogOverride2, extraFogOverride);
   }
   ret.paramsFog = paramsFogOverride;
 #endif

Here is our final scene without fog override (first image), single override (second image) and double override (third image, final result):
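Since both overrides do exactly the same thing with different constants, the logic can be condensed into one helper that is applied twice. The following is a minimal Python sketch of that idea; the function names and the tuple layout of an override are hypothetical, and all numbers in the example are made up.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    return a + (b - a) * t


def apply_override(base_color, influence, color, color_scale, color_bias,
                   strength, infl_scale, infl_bias):
    color_amount = saturate(influence * color_scale + color_bias)
    infl_amount = saturate(influence * infl_scale + infl_bias)
    rgb = tuple(lerp(c, o, color_amount) for c, o in zip(base_color, color))
    w = saturate(lerp(1.0, strength, infl_amount) * influence)
    return rgb + (w,)


def double_override(base_color, influence, first, second, second_weight):
    # 'first' and 'second' are (color, color_scale, color_bias,
    # strength, infl_scale, infl_bias) tuples.
    result = apply_override(base_color, influence, *first)
    if second_weight > 0.0:
        other = apply_override(base_color, influence, *second)
        result = tuple(lerp(a, b, second_weight) for a, b in zip(result, other))
    return result


first = ((1.0, 1.0, 1.0), 2.0, 0.0, 0.5, 2.0, 0.0)
second = ((0.0, 0.0, 0.0), 0.0, 0.0, 1.0, 0.0, 0.0)
single = double_override((0.0, 0.0, 0.0), 0.5, first, second, 0.0)
blended = double_override((0.0, 0.0, 0.0), 0.5, first, second, 0.5)
print(single, blended)
```

Both overrides start from the same un-overridden fog color and influence; the second one does not stack on top of the first, it is blended with it by the weight from cb12_v192.x.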

Adjusting ambient occlusion

The shader I found also did not use ambient occlusion at all. Let's take a look at the AO texture again and at the code we are interested in:

  13:  ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t2.xyzw  
  14:  max r0.x, r0.x, cb3[1].x  
  15:  add r0.yzw, r1.xxyz, -cb12[0].xxyz  
  16:  dp3 r1.x, r0.yzwy, r0.yzwy  
  17:  sqrt r1.x, r1.x  
  18:  add r1.y, r1.x, -cb3[0].x  
  19:  add r1.zw, -cb3[0].xxxz, cb3[0].yyyw
  20:  div_sat r1.y, r1.y, r1.z  
  21:  mad r1.y, r1.y, r1.w, cb3[0].z  
  22:  add r0.x, r0.x, l(-1.000000)  
  23:  mad r0.x, r1.y, r0.x, l(1.000000)  

Maybe this scene is not the best example because there is no detail on the distant island. Anyway, let's take a look at the constant buffer which is used for adjusting the ambient occlusion value:

We start by loading AO from the texture, and then we perform a max instruction. In this scene cb3_v1.x is really high (0.96888), which pretty much makes our AO very, very subtle.

The next portion of code calculates the distance between the camera and the pixel's world position.

I believe the code sometimes speaks for itself, so see the HLSL which performs most of this adjustment:
 float AdjustAmbientOcclusion(in float inputAO, in float worldToCameraDistance)
 {
   // *** Inputs *** //
   const float aoDistanceStart = cb3_v0.x;
   const float aoDistanceEnd = cb3_v0.y;
   const float aoStrengthStart = cb3_v0.z;
   const float aoStrengthEnd = cb3_v0.w;
   // * Adjust AO
   float aoDistanceIntensity = linstep( aoDistanceStart, aoDistanceEnd, worldToCameraDistance );
   float aoStrength = lerp(aoStrengthStart, aoStrengthEnd, aoDistanceIntensity);
   float adjustedAO = lerp(1.0, inputAO, aoStrength);
   return adjustedAO;
 }

The calculated camera-to-pixel distance is passed to the linstep function. We know this function already: it appeared in the cirrus clouds shader from The Witcher 3.

As you can see, the constant buffer contains start/end distance values for AO. The output of linstep controls the strength of AO (also from the cbuffer), and the strength controls the output AO value.

A quick example: a pixel is far away, let's say the distance is equal to 500.

linstep returns 1.0;
aoStrength is equal to aoStrengthEnd;
the function returns lerp(1.0, inputAO, aoStrengthEnd), i.e. a mix of about 77% of the input value (the end strength) and 23% of 1.0.

The input AO for this function was previously max-ed.
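The quick example can be worked through numerically with the minimal Python sketch below. `linstep` is written out the way lines 18-20 of the assembly compute it. Only the AO floor (0.96888) and the end strength (about 0.77) come from this scene; the start/end distances and the start strength are made-up placeholders, and the function names are hypothetical.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def linstep(lo, hi, x):
    # Linear 0..1 ramp between lo and hi, clamped outside that range.
    return saturate((x - lo) / (hi - lo))


def lerp(a, b, t):
    return a + (b - a) * t


def adjust_ao(input_ao, dist, dist_start, dist_end,
              strength_start, strength_end, ao_min):
    ao = max(input_ao, ao_min)                         # line 14
    t = linstep(dist_start, dist_end, dist)            # lines 18-20
    strength = lerp(strength_start, strength_end, t)   # line 21
    return lerp(1.0, ao, strength)                     # lines 22-23


# A fairly dark AO sample (0.5) on a pixel 500 units away.
result = adjust_ao(0.5, 500.0, 0.0, 100.0, 1.0, 0.77, 0.96888)
print(result)
```

The max-ed AO is 0.96888, and blending it towards 1.0 at 77% strength gives roughly 0.976, so with these constants even a heavily occluded pixel darkens the fog by less than 3%.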

Combining all this together

Once we have the color and influence for both the fog color and the aerial color, it's time for the final combine.

We start by attenuating the influence using the adjusted AO:

   FogResult fog = CalculateFog( worldPos, CameraPosition, fogStart, ao, false );  
   // Apply AO to influence  
   fog.paramsFog.w *= ao;  
   fog.paramsAerial.w *= ao;  
   // Mix fog with scene color  
   outColor = ApplyFog(fog, colorHDR);  

Okay, the whole magic happens in the ApplyFog function:

 float3 ApplyFog(FogResult fog, float3 color)
 {
   const float3 LuminanceFactors = float3(0.333f, 0.555f, 0.222f);
   float3 aerialColor = dot(LuminanceFactors, color) * fog.paramsAerial.xyz;
   color = lerp(color, aerialColor, fog.paramsAerial.w);
   color = lerp(color, fog.paramsFog.xyz, fog.paramsFog.w);
   return color;
 }

First, we calculate the "luminance" of pixels:

Then, we multiply it with aerial color:

Then, we combine HDR color with aerial one:

The last step is combining the intermediate result with fog color:

And that's all :)
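For experimenting with the final combine outside of a shader, the same steps (lines 126-132 of the assembly) can be written as a minimal Python sketch. The function signature is hypothetical and the colors in the usage example are made up; the luminance weights are the ones from the shader.

```python
LUMINANCE = (0.333, 0.555, 0.222)


def lerp3(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def apply_fog(color, fog_rgb, fog_w, aerial_rgb, aerial_w, ao=1.0):
    # AO attenuates both influences before mixing.
    fog_w *= ao
    aerial_w *= ao
    # The aerial color is tinted by the "luminance" of the scene color.
    lum = sum(l * c for l, c in zip(LUMINANCE, color))
    aerial = tuple(lum * a for a in aerial_rgb)
    color = lerp3(color, aerial, aerial_w)
    return lerp3(color, fog_rgb, fog_w)


scene = (0.2, 0.4, 0.6)
untouched = apply_fog(scene, (1.0, 1.0, 1.0), 0.0, (0.5, 0.5, 0.5), 0.0)
aerial_only = apply_fog((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 0.0, (0.5, 0.5, 0.5), 1.0)
print(untouched, aerial_only)
```

One detail worth noticing: the luminance weights sum to 1.11 rather than 1.0, so a pure white pixel with full aerial influence ends up at 1.11 times the aerial color.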

Some debug screenshots:

Aerial influence

Aerial color

Fog influence

Fog color

Final scene without fog at all

Final scene with aerial fog only

Final scene - main fog only

Final scene with all fog - again, but for easier comparison


I believe you can get the most out of it if you take a look at the shader - it's here
I'm glad to say that this shader is exactly the same as the original one, which makes me really happy :)

In general, the final result strongly depends on the values passed to the shader. This is not a "magic" solution which gives perfect color output; it requires a lot of iterations and tweaking by artists to make the final result look decent. I think this can be a long process, but once you do it, the final result is really convincing, just like this sunset scene.
The sky shader from The Witcher 3 also uses these fog calculations in order to make a smooth color transition around the horizon. However, for the sky shader a different set of density coefficients is provided (presumably FogDensityParamsSky from the inputs listed earlier).

As a reminder - most of the shader (except for adjusting AO and overriding) was not created/reversed by me. All kudos and praise go to CD PROJEKT RED. Please support them, they are doing a great job.

Thanks for reading!
