coding adventures: Reverse engineering the rendering of The Witcher 3, part 11

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".

Welcome back!

In the 11th part of the series we will take a look at how lightning is rendered in The Witcher 3: Wild Hunt.

Lightning bolts are rendered slightly after the distant rain shafts effect, but still in the forward pass. You can see them in action in the following video:

They last for a very short time, so it's best to play this video at 0.25x speed.
You can see that they are not static images; their intensity changes slightly over time.

There are many similarities to the distant rain shafts in terms of rendering nuances, such as the same blending (additive) and depth (test enabled, no depth write) states.

Scene without lightning

Scene with lightning

In terms of geometry, lightning bolts in The Witcher 3 are tree-like meshes. This particular one is represented by the following mesh:

It's provided with UV coordinates and normal vectors. They will be useful in the vertex shader stage.

Vertex Shader

Let's take a look at the assembly of the vertex shader:

 vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb1[9], immediateIndexed  
    dcl_constantbuffer cb2[6], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xy  
    dcl_input v2.xyz  
    dcl_input v4.xyzw  
    dcl_input v5.xyzw  
    dcl_input v6.xyzw  
    dcl_input v7.xyzw  
    dcl_output o0.xy  
    dcl_output o1.xyzw  
    dcl_output_siv o2.xyzw, position  
    dcl_temps 3  
   0: mov o0.xy, v1.xyxx  
   1: mov o1.xyzw, v7.xyzw  
   2: mul r0.xyzw, v5.xyzw, cb1[0].yyyy  
   3: mad r0.xyzw, v4.xyzw, cb1[0].xxxx, r0.xyzw  
   4: mad r0.xyzw, v6.xyzw, cb1[0].zzzz, r0.xyzw  
   5: mad r0.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
   6: mov r1.w, l(1.000000)  
   7: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx  
   8: dp4 r2.x, r1.xyzw, v4.xyzw  
   9: dp4 r2.y, r1.xyzw, v5.xyzw  
  10: dp4 r2.z, r1.xyzw, v6.xyzw  
  11: add r2.xyz, r2.xyzx, -cb1[8].xyzx  
  12: dp3 r1.w, r2.xyzx, r2.xyzx  
  13: rsq r1.w, r1.w  
  14: div r1.w, l(1.000000, 1.000000, 1.000000, 1.000000), r1.w  
  15: mul r1.w, r1.w, l(0.000001)  
  16: mad r2.xyz, v2.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)  
  17: mad r1.xyz, r2.xyzx, r1.wwww, r1.xyzx  
  18: mov r1.w, l(1.000000)  
  19: dp4 o2.x, r1.xyzw, r0.xyzw  
  20: mul r0.xyzw, v5.xyzw, cb1[1].yyyy  
  21: mad r0.xyzw, v4.xyzw, cb1[1].xxxx, r0.xyzw  
  22: mad r0.xyzw, v6.xyzw, cb1[1].zzzz, r0.xyzw  
  23: mad r0.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  24: dp4 o2.y, r1.xyzw, r0.xyzw  
  25: mul r0.xyzw, v5.xyzw, cb1[2].yyyy  
  26: mad r0.xyzw, v4.xyzw, cb1[2].xxxx, r0.xyzw  
  27: mad r0.xyzw, v6.xyzw, cb1[2].zzzz, r0.xyzw  
  28: mad r0.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  29: dp4 o2.z, r1.xyzw, r0.xyzw  
  30: mul r0.xyzw, v5.xyzw, cb1[3].yyyy  
  31: mad r0.xyzw, v4.xyzw, cb1[3].xxxx, r0.xyzw  
  32: mad r0.xyzw, v6.xyzw, cb1[3].zzzz, r0.xyzw  
  33: mad r0.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  34: dp4 o2.w, r1.xyzw, r0.xyzw  
  35: ret

This vertex shader has a lot in common with the one used for distant rain shafts, so I won't repeat myself. The one major difference I want to show is in lines 11-18:

  11: add r2.xyz, r2.xyzx, -cb1[8].xyzx  
  12: dp3 r1.w, r2.xyzx, r2.xyzx  
  13: rsq r1.w, r1.w  
  14: div r1.w, l(1.000000, 1.000000, 1.000000, 1.000000), r1.w  
  15: mul r1.w, r1.w, l(0.000001)  
  16: mad r2.xyz, v2.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)  
  17: mad r1.xyz, r2.xyzx, r1.wwww, r1.xyzx  
  18: mov r1.w, l(1.000000)  
  19: dp4 o2.x, r1.xyzw, r0.xyzw

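To start, cb1[8].xyz is the camera position while r2.xyz is the world-space position, so line 11 calculates the vector from the camera to the world position. Then lines 12-15 compute length( worldPos - cameraPos ) * 0.000001 (rsq followed by 1.0 / x gives the square root of the squared length).

v2.xyz is the normal vector from the input geometry. Line 16 unpacks it from the [0, 1] range to [-1, 1].
Then the vertex is pushed along its normal. Note that line 17 adds the offset to r1.xyz, which is the position before the instance world transform, while the distance itself is measured in world space:

finalPos = pos + length( worldPos - cameraPos ) * 0.000001 * normalVector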

HLSL snippet for this operation would be something like this:

      ...  
      // final world-space position  
      float3 vNormal = Input.NormalW * 2.0 - 1.0;  
      float lencameratoworld = length( PositionL - g_cameraPos.xyz) * 0.000001;  
   
      PositionL += vNormal*lencameratoworld;  
   
      // SV_Position   
      float4x4 matModelViewProjection = mul(g_viewProjMatrix, matInstanceWorld );   
      Output.PositionH = mul( float4(PositionL, 1.0), transpose(matModelViewProjection) );      
   
      return Output;

This operation causes a slight "explosion" of the mesh (in the direction of the normal vectors). I ran some simple experiments, replacing 0.000001 with a few different values. Here are the results:

0.000002

0.000005

0.00001

0.000025
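To get a feel for how small this displacement is, here is a minimal CPU-side sketch in Python (not shader code). The function name `displace` and the sample vertex are made up for illustration, and the sketch assumes an identity instance transform, so the local and world-space positions coincide.

```python
import math

def displace(position, camera_pos, packed_normal, scale=0.000001):
    """Push a vertex along its normal by (distance to camera * scale)."""
    # Unpack the normal from [0, 1] to [-1, 1] (line 16 of the assembly).
    normal = [2.0 * c - 1.0 for c in packed_normal]
    # length(worldPos - cameraPos) * scale (lines 11-15).
    to_vertex = [p - c for p, c in zip(position, camera_pos)]
    offset = math.sqrt(sum(d * d for d in to_vertex)) * scale
    # position + normal * offset (line 17).
    return [p + n * offset for p, n in zip(position, normal)]

# Hypothetical vertex 1000 units from the camera, packed normal pointing along +X.
vertex = [0.0, 0.0, 1000.0]
camera = [0.0, 0.0, 0.0]
for scale in (0.000001, 0.000025):
    print(scale, displace(vertex, camera, [1.0, 0.5, 0.5], scale))
```

Because the offset grows with the distance to the camera, a bolt 1000 units away moves by roughly 0.001 units with the original factor and roughly 0.025 units with the largest factor tried above.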

Pixel Shader

OK, we're done with the vertex shader. Time to see the assembly of the pixel shader!

 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[1], immediateIndexed  
    dcl_constantbuffer cb2[3], immediateIndexed  
    dcl_constantbuffer cb4[5], immediateIndexed  
    dcl_input_ps linear v0.x  
    dcl_input_ps linear v1.w  
    dcl_output o0.xyzw  
    dcl_temps 1  
   0: mad r0.x, cb0[0].x, cb4[4].x, v0.x  
   1: add r0.y, r0.x, l(-1.000000)  
   2: round_ni r0.y, r0.y  
   3: ishr r0.z, r0.y, l(13)  
   4: xor r0.y, r0.y, r0.z  
   5: imul null, r0.z, r0.y, r0.y  
   6: imad r0.z, r0.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
   7: imad r0.y, r0.y, r0.z, l(146956042240.000000)  
   8: and r0.y, r0.y, l(0x7fffffff)  
   9: round_ni r0.z, r0.x  
  10: frc r0.x, r0.x  
  11: add r0.x, -r0.x, l(1.000000)  
  12: ishr r0.w, r0.z, l(13)  
  13: xor r0.z, r0.z, r0.w  
  14: imul null, r0.w, r0.z, r0.z  
  15: imad r0.w, r0.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  16: imad r0.z, r0.z, r0.w, l(146956042240.000000)  
  17: and r0.z, r0.z, l(0x7fffffff)  
  18: itof r0.yz, r0.yyzy  
  19: mul r0.z, r0.z, l(0.000000001)  
  20: mad r0.y, r0.y, l(0.000000001), -r0.z  
  21: mul r0.w, r0.x, r0.x  
  22: mul r0.x, r0.x, r0.w  
  23: mul r0.w, r0.w, l(3.000000)  
  24: mad r0.x, r0.x, l(-2.000000), r0.w  
  25: mad r0.x, r0.x, r0.y, r0.z  
  26: add r0.y, -cb4[2].x, cb4[3].x  
  27: mad_sat r0.x, r0.x, r0.y, cb4[2].x  
  28: mul r0.x, r0.x, v1.w  
  29: mul r0.yzw, cb4[0].xxxx, cb4[1].xxyz  
  30: mul r0.xyzw, r0.xyzw, cb2[2].wxyz  
  31: mul o0.xyz, r0.xxxx, r0.yzwy  
  32: mov o0.w, r0.x  
  33: ret

The good thing: this code is not that long.
The bad thing:

   3: ishr r0.z, r0.y, l(13)  
   4: xor r0.y, r0.y, r0.z  
   5: imul null, r0.z, r0.y, r0.y  
   6: imad r0.z, r0.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
   7: imad r0.y, r0.y, r0.z, l(146956042240.000000)  
   8: and r0.y, r0.y, l(0x7fffffff)

...what the heck is this???

To be honest, it's not the first time I've seen that piece of cra... assembly in the shaders of The Witcher 3. But when I came across it for the first time, I was like: "oh crap, wtf is this?".

Indeed, you can find something like this in a few shaders of TW3. I won't go through the many adventures I've had with this one now, but let me just say the answer is integer noise:

 // For more details see: http://libnoise.sourceforge.net/noisegen/  
 float integerNoise( int n )  
 {  
      n = (n >> 13) ^ n;  
      int nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7fffffff;  
      return ((float)nn / 1073741824.0);  
 }

Phew. As you can see, it's invoked twice in the pixel shader. The libnoise website gives some useful tips on how to implement smooth noise properly. I'll get back to this in a minute.
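The odd-looking float literals in the assembly are most likely just the disassembler printing integer constants as if they were floats. The Python sketch below (a CPU-side check, not shader code; the helper name `int_bits_as_float` is my own) mirrors integerNoise and decodes those literals back to the integers used in the HLSL.

```python
import struct

def integer_noise(n):
    """CPU-side mirror of integerNoise(); n is a signed 32-bit integer."""
    n = (n >> 13) ^ n  # Python's >> on ints is an arithmetic shift, like ishr
    # Python ints never overflow, but the final mask keeps only the low
    # 31 bits, which is exactly what 32-bit wraparound arithmetic leaves.
    nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7FFFFFFF
    return nn / 1073741824.0

def int_bits_as_float(i):
    """Reinterpret a 32-bit integer bit pattern as a float32."""
    return struct.unpack("<f", struct.pack("<i", i))[0]

print(0xEC4D)                          # 60493
print(int_bits_as_float(1376312589))   # 146956042240.0
print(int_bits_as_float(19990303))     # tiny positive value, around 3e-38
print([integer_noise(n) for n in range(4)])
```

Note that with this normalization (dividing a 31-bit value by 2^30) the result lies in the [0, 2) range.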

Take a look at line 0. The animation is driven by the following formula:

animation = elapsedTime * animationSpeed + TextureUV.x

These values, after being floored (round_ni instruction), are the inputs to the integer noise. Generally, we calculate the noise value for two neighboring integers and then interpolate between them (see libnoise's website for details).

Okay, but this is integer noise, while all the values mentioned so far (including the floored ones) are floats!
Notice that there are no ftoi instructions here. My guess is that the programmers at CD Projekt Red used the asint HLSL intrinsic here, which performs a "reinterpret_cast" of a floating-point value and treats its bit pattern as an integer.
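To illustrate the difference between a bit reinterpretation and a numeric conversion, here is a small Python sketch. The `asint` helper below is a CPU-side stand-in for the HLSL intrinsic, built on the standard `struct` module, and the sample value 3.7 is arbitrary.

```python
import math
import struct

def asint(x):
    """Reinterpret the bits of a float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

animation = 3.7
floored = float(math.floor(animation))  # 3.0, as round_ni would produce

print(int(floored))         # numeric conversion (what ftoi would do): 3
print(hex(asint(floored)))  # bit reinterpretation: 0x40400000
```

Either way, consecutive floored values map to distinct integers, which is all the noise function needs as a seed.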

The interpolation weight for the two values is calculated in lines 10-11:

interpolationWeight = 1.0 - frac( animation );

This approach lets us interpolate between the values over time.
To make the noise smooth, this interpolant is passed to an s-curve function:

 float s_curve( float x )  
 {  
   float x2 = x * x;  
   float x3 = x2 * x;  
     
   // -2x^3 + 3x^2  
   return -2.0*x3 + 3.0*x2;  
 }

Smoothstep function [libnoise.sourceforge.net]

This function is known as "smoothstep". But as you can see from the assembly, this is not the smoothstep intrinsic from HLSL. The intrinsic clamps its input to make sure the values are correct. Since we know that our interpolationWeight will always be in the [0, 1] range, we can safely omit these checks.
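As a quick sanity check, the Python sketch below compares the bare cubic with a clamped smoothstep written the way the intrinsic is usually documented (the `smoothstep` function here is my own reimplementation, not the HLSL intrinsic itself).

```python
def s_curve(x):
    """Bare cubic -2x^3 + 3x^2, with no clamping."""
    return -2.0 * x * x * x + 3.0 * x * x

def smoothstep(edge0, edge1, x):
    """Clamped Hermite interpolation between edge0 and edge1."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

# Inside [0, 1] both give the same result, up to rounding.
for i in range(11):
    x = i / 10.0
    assert abs(s_curve(x) - smoothstep(0.0, 1.0, x)) < 1e-12

# Outside [0, 1] only the clamped version stays in range.
print(s_curve(1.5), smoothstep(0.0, 1.0, 1.5))
```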

Calculating the final value involves a few multiplications. Notice that the final output alpha changes with the noise value. This is handy, because it affects the opacity of the rendered lightning, just like in real life.
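Putting the pieces together, the following Python sketch emulates the noise part of the pixel shader on the CPU. It is only an approximation under stated assumptions: Python computes in double precision rather than float32, and the speed, min and max values are made up, since the real ones come from constant buffers.

```python
import math
import struct

def asint(x):
    """Reinterpret the bits of a float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

def integer_noise(n):
    n = (n >> 13) ^ n
    nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7FFFFFFF
    return nn / 1073741824.0

def s_curve(x):
    return -2.0 * x * x * x + 3.0 * x * x

def smooth_noise(animation):
    """Blend the noise of floor(animation) and floor(animation - 1)."""
    n0 = integer_noise(asint(float(math.floor(animation))))
    n1 = integer_noise(asint(float(math.floor(animation - 1.0))))
    weight = 1.0 - (animation - math.floor(animation))  # 1 - frac(animation)
    return n0 + s_curve(weight) * (n1 - n0)             # lerp(n0, n1, s)

def lightning_amount(elapsed_time, u, speed, min_amount, max_amount):
    noise = smooth_noise(elapsed_time * speed + u)
    amount = min_amount + (max_amount - min_amount) * noise
    return min(max(amount, 0.0), 1.0)                   # saturate

# Hypothetical parameters: speed 10, amount between 0.2 and 1.0, u = 0.3.
for frame in range(5):
    t = frame / 60.0
    print(round(t, 3), lightning_amount(t, 0.3, 10.0, 0.2, 1.0))
```

Since the weight goes from 1 down to 0 within each unit interval, the value at an integer animation step equals the noise of the previous integer, so the curve stays continuous from one interval to the next.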

The final pixel shader:

 cbuffer cbPerFrame : register (b0)  
 {  
   float4 cb0_v0;  
   float4 cb0_v1;  
   float4 cb0_v2;  
   float4 cb0_v3;  
 }  
   
 cbuffer cb2 : register (b2)  
 {  
   float4 cb2_v0;  
   float4 cb2_v1;  
   float4 cb2_v2;  
   float4 cb2_v3;  
 }  
   
 cbuffer cb4 : register (b4)  
 {  
   float4 cb4_v0;  
   float4 cb4_v1;  
   float4 cb4_v2;  
   float4 cb4_v3;  
   float4 cb4_v4;  
 }  
   
 struct VS_OUTPUT  
 {  
   float2 Texcoords : Texcoord0;  
   float4 InstanceLODParams : INSTANCE_LOD_PARAMS;  
   float4 PositionH : SV_Position;  
 };  
   
 // Shaders in TW3 use integer noise.  
 // For more details see: http://libnoise.sourceforge.net/noisegen/  
 float integerNoise( int n )  
 {  
   n = (n >> 13) ^ n;  
   int nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7fffffff;  
   return ((float)nn / 1073741824.0);  
 }  
   
 float s_curve( float x )  
 {  
   float x2 = x * x;  
   float x3 = x2 * x;  
   
   // -2x^3 + 3x^2  
   return -2.0*x3 + 3.0*x2;  
 }  
   
 float4 Lightning_TW3_PS( in VS_OUTPUT Input ) : SV_Target
 {  
   // * Inputs  
   float elapsedTime = cb0_v0.x;  
   float animationSpeed = cb4_v4.x;  
   
   float minAmount = cb4_v2.x;  
   float maxAmount = cb4_v3.x;  
   
   float colorMultiplier = cb4_v0.x;  
   float3 colorFilter = cb4_v1.xyz;  
   float3 lightningColorRGB = cb2_v2.rgb;  
   
   
   // Animation using time and X texcoord  
   float animation = elapsedTime * animationSpeed + Input.Texcoords.x;  
   
   // Input parameters for Integer Noise.  
   // They are floored; note that asint is used here.  
   // That might be an optimization to avoid "ftoi" instructions.  
   int intX0 = asint( floor(animation) );  
   int intX1 = asint( floor(animation-1.0) );  
   
   float n0 = integerNoise( intX0 );  
   float n1 = integerNoise( intX1 );    
   
   // We interpolate "backwards" here.  
   float weight = 1.0 - frac(animation);  
   
   // Following the instructions from libnoise, we perform  
   // smooth interpolation here with cubic s-curve function.  
   float noise = lerp( n0, n1, s_curve(weight) );  
   
   // Make sure we are in [0.0 - 1.0] range.  
   float lightningAmount = saturate( lerp(minAmount, maxAmount, noise) );  
   lightningAmount *= Input.InstanceLODParams.w;    // 1.0  
   lightningAmount *= cb2_v2.w;             // 1.0  
   
   // Calculate final lightning color   
   float3 lightningColor = colorMultiplier * colorFilter;  
   lightningColor *= lightningColorRGB;  
   
   float3 finalLightningColor = lightningColor * lightningAmount;  
   return float4( finalLightningColor, lightningAmount );  
 }

Summary

In this post I described how lightning is rendered in The Witcher 3.
I'm more than happy that the assembly generated from my shader is the same as the original one!

On the left - my shader, on the right - original assembly

I hope you enjoyed it! Thanks for reading.

Feel free to comment and take care,
M.


Monday, March 4, 2019
