Sunday, 20 January 2019

Reverse engineering the rendering of The Witcher 3, part 10 - distant rain shafts

Welcome to the 10th part of the series! Woohooo! :)

This time we will take a look at a really cool atmospheric effect: distant rain/light shafts near the horizon. The easiest way to encounter them in the game is to visit the Skellige Islands.



Personally, I love such atmospheric phenomena and I was really curious how the graphics programmers from CD Projekt Red implemented this one. Let's find out!

Here are two screenshots before and after applying rain shafts:

Before rain shafts

After rain shafts

Geometry

Our first stop is geometry. The idea is to use a small cylinder:
Cylinder in local space
In terms of local-space position it's pretty small: all position components are in the [0.0 - 1.0] range.

The input layout for this draw call looks like this...

What is important to us here: Texcoords and Instance_Transform.

Texcoords are mapped pretty simply: U on both the upper and the lower base is in the [0.02777 - 1.02734] range. V is equal to 1.0 on the lower base and to 0.0 on the upper one. As you can see, it would be pretty simple to generate this mesh procedurally.
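To illustrate how little is needed to build such a mesh, here is a minimal C++ sketch. The U range and the V values come from the captured mesh described above. Everything else is my assumption: the names `ShaftVertex` and `BuildShaftCylinder`, the segment count, the cylinder axis running along local Z, and the circle being remapped to fill [0, 1].

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ShaftVertex {
    float x, y, z;  // local-space position, each component in [0, 1]
    float u, v;     // texture coordinates
};

// Hypothetical helper: builds the side wall of a cylinder as (segments + 1)
// pairs of vertices (lower base, upper base). The real mesh layout used by
// the game is not known beyond the ranges mentioned in the text.
std::vector<ShaftVertex> BuildShaftCylinder(int segments) {
    const float kPi = 3.14159265358979f;
    const float kUMin = 0.02777f;  // U range observed in the captured mesh
    const float kUMax = 1.02734f;

    std::vector<ShaftVertex> vertices;
    vertices.reserve(static_cast<std::size_t>(segments + 1) * 2);

    for (int i = 0; i <= segments; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(segments);
        const float angle = t * 2.0f * kPi;

        // Unit circle remapped from [-1, 1] to [0, 1].
        const float x = 0.5f + 0.5f * std::cos(angle);
        const float y = 0.5f + 0.5f * std::sin(angle);
        const float u = kUMin + t * (kUMax - kUMin);

        vertices.push_back({x, y, 0.0f, u, 1.0f});  // lower base: V = 1
        vertices.push_back({x, y, 1.0f, u, 0.0f});  // upper base: V = 0
    }
    return vertices;
}
```

The vertices come out in an order that can be drawn directly as a triangle strip; an indexed triangle list would work just as well.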

Since we have this small cylinder in local space, we multiply it by a world matrix, which is provided by the per-instance INSTANCE_TRANSFORM input element. Let's check the values of this matrix:



It looks quite intimidating, doesn't it? Don't worry, let's do some decomposing and see what this matrix hides!
 XMMATRIX mat( -227.7472,  159.8043,  374.0736, -116.4951,
               -194.7577, -173.3836, -494.4982,  238.6908,
               -14.16466, -185.4743,  784.564,   -1.45565,
                0.0,       0.0,       0.0,        1.0 );

 mat = XMMatrixTranspose( mat );

 XMVECTOR vScale;
 XMVECTOR vRotateQuat;
 XMVECTOR vTranslation;
 XMMatrixDecompose( &vScale, &vRotateQuat, &vTranslation, mat );

 // Rotation matrix...
 XMMATRIX matRotate = XMMatrixRotationQuaternion( vRotateQuat );

Results are really interesting:

vRotateQuat: (0.0924987569, -0.314900011, 0.883411944, -0.334462732)
vScale: (299.999969, 300.000000, 1000.00012)
vTranslation: (-116.495102, 238.690796, -1.45564997)

It's important to know camera position at this particular frame: ( -116.5338, 234.8695, 2.09 )

As you can see, we scale the cylinder to make it pretty big in world space (in TW3 the Z axis points up), translate it with respect to the camera position and rotate it.
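The scale reported by XMMatrixDecompose can be cross-checked without DirectXMath: for a rotation-times-scale matrix, the scale of each axis is the length of the corresponding column of the upper-left 3x3 block of the INSTANCE_TRANSFORM rows, and the translation sits in the fourth component of each row. A small sketch, where `Float3`, `AxisScale`, `ExtractScale` and `ExtractTranslation` are names made up for this example:

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// The three INSTANCE_TRANSFORM rows from the captured frame.
// xyz of each row = rotation * scale, w = translation.
static const float kInstanceRows[3][4] = {
    {-227.7472f,  159.8043f,  374.0736f, -116.4951f},
    {-194.7577f, -173.3836f, -494.4982f,  238.6908f},
    {-14.16466f, -185.4743f,  784.564f,    -1.45565f},
};

// Scale of one axis = length of that column of the upper-left 3x3 block.
float AxisScale(int column) {
    float sum = 0.0f;
    for (int row = 0; row < 3; ++row) {
        sum += kInstanceRows[row][column] * kInstanceRows[row][column];
    }
    return std::sqrt(sum);
}

Float3 ExtractScale() {
    return {AxisScale(0), AxisScale(1), AxisScale(2)};
}

Float3 ExtractTranslation() {
    return {kInstanceRows[0][3], kInstanceRows[1][3], kInstanceRows[2][3]};
}
```

The column lengths come out at roughly 300, 300 and 1000, which agrees with vScale above.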
Here is the cylinder after vertex shader transform:

Cylinder after transforming by vertex shader. See how it's placed relative to view frustum


Vertex Shader

The input geometry and the vertex shader are strictly dependent on each other.
Let's take a closer look at the assembly of the vertex shader:

 vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb1[7], immediateIndexed  
    dcl_constantbuffer cb2[6], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xy  
    dcl_input v4.xyzw  
    dcl_input v5.xyzw  
    dcl_input v6.xyzw  
    dcl_input v7.xyzw  
    dcl_output o0.xyz  
    dcl_output o1.xyzw  
    dcl_output_siv o2.xyzw, position  
    dcl_temps 2  
   0: mov o0.xy, v1.xyxx  
   1: mul r0.xyzw, v5.xyzw, cb1[6].yyyy  
   2: mad r0.xyzw, v4.xyzw, cb1[6].xxxx, r0.xyzw  
   3: mad r0.xyzw, v6.xyzw, cb1[6].zzzz, r0.xyzw  
   4: mad r0.xyzw, cb1[6].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
   5: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx  
   6: mov r1.w, l(1.000000)  
   7: dp4 o0.z, r1.xyzw, r0.xyzw  
   8: mov o1.xyzw, v7.xyzw  
   9: mul r0.xyzw, v5.xyzw, cb1[0].yyyy  
  10: mad r0.xyzw, v4.xyzw, cb1[0].xxxx, r0.xyzw  
  11: mad r0.xyzw, v6.xyzw, cb1[0].zzzz, r0.xyzw  
  12: mad r0.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  13: dp4 o2.x, r1.xyzw, r0.xyzw  
  14: mul r0.xyzw, v5.xyzw, cb1[1].yyyy  
  15: mad r0.xyzw, v4.xyzw, cb1[1].xxxx, r0.xyzw  
  16: mad r0.xyzw, v6.xyzw, cb1[1].zzzz, r0.xyzw  
  17: mad r0.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  18: dp4 o2.y, r1.xyzw, r0.xyzw  
  19: mul r0.xyzw, v5.xyzw, cb1[2].yyyy  
  20: mad r0.xyzw, v4.xyzw, cb1[2].xxxx, r0.xyzw  
  21: mad r0.xyzw, v6.xyzw, cb1[2].zzzz, r0.xyzw  
  22: mad r0.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  23: dp4 o2.z, r1.xyzw, r0.xyzw  
  24: mul r0.xyzw, v5.xyzw, cb1[3].yyyy  
  25: mad r0.xyzw, v4.xyzw, cb1[3].xxxx, r0.xyzw  
  26: mad r0.xyzw, v6.xyzw, cb1[3].zzzz, r0.xyzw  
  27: mad r0.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  28: dp4 o2.w, r1.xyzw, r0.xyzw  
  29: ret  

Besides simply passing through Texcoords (line 0) and Instance_LOD_Params (line 8), two more things are needed for output: SV_Position (obviously) and Height (the .z component) of the world position.

Remember that local space is in the [0-1] range? Well, the vertex shader uses scale & bias to adjust the local position just before applying the world matrix. Smart!
In this case, we have scale = float3(4, 4, 2) and bias = float3(-2, -2, -1).
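To make the remapping concrete, here is a tiny C++ sketch of the scale & bias step (the single `mad` on line 5 of the assembly). `Vec3` and `ApplyScaleBias` are names I made up for this example:

```cpp
struct Vec3 { float x, y, z; };

// Per-component scale and bias, i.e. position * scale + bias.
Vec3 ApplyScaleBias(const Vec3& p, const Vec3& scale, const Vec3& bias) {
    return {p.x * scale.x + bias.x,
            p.y * scale.y + bias.y,
            p.z * scale.z + bias.z};
}
```

With scale = (4, 4, 2) and bias = (-2, -2, -1), the local corner (0, 0, 0) lands at (-2, -2, -1) and (1, 1, 1) lands at (2, 2, 1), so the cylinder ends up centered on the origin before the instance transform is applied.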

The pattern you can notice between lines 9 and 28 is the multiplication of two row-major matrices.
Just take a look at the final vertex shader in HLSL :)
 cbuffer cbPerFrame : register (b1)  
 {  
   row_major float4x4 g_viewProjMatrix;  
   row_major float4x4 g_rainShaftsViewProjMatrix;  
 }  
   
 cbuffer cbPerObject : register (b2)  
 {  
   float4x4 g_mtxWorld;  
   float4 g_modelScale;  
   float4 g_modelBias;  
 }  
   
 struct VS_INPUT  
 {  
   float3 PositionW : POSITION;  
   float2 Texcoord : TEXCOORD;  
   float3 NormalW : NORMAL;  
   float3 TangentW : TANGENT;  
   float4 InstanceTransform0 : INSTANCE_TRANSFORM0;  
   float4 InstanceTransform1 : INSTANCE_TRANSFORM1;  
   float4 InstanceTransform2 : INSTANCE_TRANSFORM2;  
   float4 InstanceLODParams  : INSTANCE_LOD_PARAMS;  
 };  
   
 struct VS_OUTPUT  
 {  
   float3 TexcoordAndZ : Texcoord0;  
   
   float4 LODParams : LODParams;  
   float4 PositionH : SV_Position;  
 };  
   
 VS_OUTPUT RainShaftsVS( VS_INPUT Input )  
 {  
   VS_OUTPUT Output = (VS_OUTPUT)0;  
   
   // simple data passing  
   Output.TexcoordAndZ.xy = Input.Texcoord;  
   Output.LODParams = Input.InstanceLODParams;  
   
   // world space  
   float3 meshScale = g_modelScale.xyz;  // float3( 4, 4, 2 );
   float3 meshBias =  g_modelBias.xyz;   // float3( -2, -2, -1 );
   float3 PositionL = Input.PositionW * meshScale + meshBias;  
   
   // Manually build instanceLod and viewProj matrices from float4s:
   float4x4 matInstanceLod = float4x4(Input.InstanceTransform0, Input.InstanceTransform1,  
   Input.InstanceTransform2 , float4(0, 0, 0, 1) );  
   
   // World-space Height (.z)  
   float4x4 matWorldInstanceLod = mul( g_rainShaftsViewProjMatrix, matInstanceLod );  
   Output.TexcoordAndZ.z = mul( float4(PositionL, 1.0), transpose(matWorldInstanceLod) ).z;  
   
   // SV_Position
   float4x4 matModelViewProjection = mul(g_viewProjMatrix, matInstanceLod);  
   Output.PositionH = mul( float4(PositionL, 1.0), transpose(matModelViewProjection) );       
 
   return Output;  
 }   


Comparison between my VS (left) and original one (right):


The differences do not affect the calculations ;) I injected my VS into the frame and everything is alright!


Pixel Shader

Finally....! To start, I'll show you the inputs.
There are two textures involved, a noise texture and the depth buffer:



Values from constant buffers:





And pixel shader assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[8], immediateIndexed  
    dcl_constantbuffer cb2[3], immediateIndexed  
    dcl_constantbuffer cb12[23], immediateIndexed  
    dcl_constantbuffer cb4[8], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s15, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t15  
    dcl_input_ps linear v0.xyz  
    dcl_input_ps linear v1.w  
    dcl_input_ps_siv v2.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 1  
   0: mul r0.xy, cb0[0].xxxx, cb4[5].xyxx  
   1: mad r0.xy, v0.xyxx, cb4[4].xyxx, r0.xyxx  
   2: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t0.xyzw, s0  
   3: add r0.y, -cb4[2].x, cb4[3].x  
   4: mad_sat r0.x, r0.x, r0.y, cb4[2].x  
   5: mul r0.x, r0.x, v0.y  
   6: mul r0.x, r0.x, v1.w  
   7: mul r0.x, r0.x, cb4[1].x  
   8: mul r0.yz, v2.xxyx, cb0[1].zzwz  
   9: sample_l(texture2d)(float,float,float,float) r0.y, r0.yzyy, t15.yxzw, s15, l(0)  
  10: mad r0.y, r0.y, cb12[22].x, cb12[22].y  
  11: mad r0.y, r0.y, cb12[21].x, cb12[21].y  
  12: max r0.y, r0.y, l(0.000100)  
  13: div r0.y, l(1.000000, 1.000000, 1.000000, 1.000000), r0.y  
  14: add r0.y, r0.y, -v0.z  
  15: mul_sat r0.y, r0.y, cb4[6].x  
  16: mul_sat r0.x, r0.y, r0.x  
  17: mad r0.y, cb0[7].y, r0.x, -r0.x  
  18: mad r0.x, cb4[7].x, r0.y, r0.x  
  19: mul r0.xyz, r0.xxxx, cb4[0].xyzx  
  20: log r0.xyz, r0.xyzx  
  21: mul r0.xyz, r0.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  22: exp r0.xyz, r0.xyzx  
  23: mul r0.xyz, r0.xyzx, cb2[2].xyzx  
  24: mul o0.xyz, r0.xyzx, cb2[2].wwww  
  25: mov o0.w, l(0)  
  26: ret  

Phew! Quite a lot of stuff, but actually it's not that bad.


So, what happens here? At first we calculate animated UVs using the elapsed time from a cbuffer (cb0[0].x) and some scale/offsets. These texcoords are used to sample the noise texture (line 2).

Once we have the noise value from the texture, we interpolate between min/max values (usually 0 and 1).
Then we perform some multiplications, for instance by the V texture coordinate (remember that the V coordinate goes from 1 to 0?) - line 5.

This way we calculated "intensity mask" - it looks like this:



Notice that distant objects (lighthouse, mountains...) are gone. This happens because the cylinder passes the depth test: the cylinder is not on the far plane and is drawn in front of the aforementioned objects:
depth test
We want to make it look as if the rain shafts are further away (not necessarily on the far plane, though). To achieve that, we compute another mask, the "far objects mask".

We compute it with the following formula:
mask = saturate( (FrustumDepth - CylinderWorldSpaceHeight) * 0.001 );

(0.001 comes from cbuffer)

which gives us desired mask:


( I explained a bit how frustum depth is extracted from depth buffer in my post about sharpen )

Personally, I think this could be done cheaper, without calculating the world-space height in the VS, by multiplying the frustum depth by a smaller number, like 0.0004.


Then we multiply one with another, which yields final effect mask:


Having this final mask (line 16) we have another interpolation which pretty much does nothing (at least in tested scenario), then we multiply the final mask with shafts color (line 19), perform gamma correction (lines 20-22) and perform final multiplications (23-24).
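A side note on lines 20-22: the log / mul / exp triple is simply how the compiler expands pow(x, 2.2). In shader assembly `log` is a base-2 logarithm and `exp` is 2^x, so the sequence computes exp2(2.2 * log2(x)). A quick C++ sketch of the same identity (the function name is made up):

```cpp
#include <cmath>

// pow(x, exponent) written the way the compiler emits it:
// exp2(exponent * log2(x)). Meant for x > 0.
float PowViaLogExp(float x, float exponent) {
    return std::exp2(exponent * std::log2(x));
}
```

For positive inputs this gives the same result as std::pow up to floating-point rounding.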

At the end we return color with zero alpha. This is because blending is enabled in this pass:

FinalColor = Source * 1.0 + (1.0 - SrcAlpha) * DestColor.

Simply speaking, we perform additive blending here. If this pixel shader returns (0, 0, 0) the color will remain the same.

Here is final HLSL - I think after this description it will be much easier to follow:
 struct VS_OUTPUT  
 {  
   float3 TexcoordAndWorldspaceHeight : Texcoord0;  
   float4 LODParams : LODParams;    // float4(1,1,1,1)  
   float4 PositionH : SV_Position;  
 };  
   
 float getFrustumDepth( in float depth )  
 {  
   // from [1-0] to [0-1]  
   float d = depth * cb12_v22.x + cb12_v22.y;  
   
   // special coefficients
   d = d * cb12_v21.x + cb12_v21.y;  
   
   // return frustum depth  
   return 1.0 / max(d, 1e-4);  
 }  
   
 float4 EditedShaderPS( in VS_OUTPUT Input ) : SV_Target0  
 {  
   // * Input from Vertex Shader  
   float2 InputUV = Input.TexcoordAndWorldspaceHeight.xy;  
   float WorldHeight = Input.TexcoordAndWorldspaceHeight.z;  
   float LODParam = Input.LODParams.w;  
   
   // * Inputs  
   float elapsedTime = cb0_v0.x;  
   float2 uvAnimation = cb4_v5.xy;  
   float2 uvScale = cb4_v4.xy;    
   float minValue = cb4_v2.x; // 0.0  
   float maxValue = cb4_v3.x; // 1.0  
   float3 shaftsColor = cb4_v0.rgb;  // RGB( 147, 162, 173 )  
   
   float3 finalColorFilter = cb2_v2.rgb; // float3( 1.175, 1.296, 1.342 );  
   float finalEffectIntensity = cb2_v2.w;  
   
   float2 invViewportSize = cb0_v1.zw;  
   
   float depthScale = cb4_v6.x;  // 0.001  
   
   // sample noise  
   float2 uvOffsets = elapsedTime * uvAnimation;  
   float2 uv = InputUV * uvScale + uvOffsets;    
   float disturb = texture0.Sample( sampler0, uv ).x;  
   
   // * Intensity mask  
   float intensity = saturate( lerp(minValue, maxValue, disturb) );  
   intensity *= InputUV.y;   // transition from (0, 1)  
   intensity *= LODParam;   // usually 1.0  
   intensity *= cb4_v1.x;   // 1.0    
   
   // Sample depth  
   float2 ScreenUV = Input.PositionH.xy * invViewportSize;  
   float hardwareDepth = texture15.SampleLevel( sampler15, ScreenUV, 0 ).x;  
   float frustumDepth = getFrustumDepth( hardwareDepth );  
   
   
   // * Calculate mask covering distant objects behind cylinder.  
   
   // Seems that the input really is world-space height (.z component, see vertex shader)  
   float depth = frustumDepth - WorldHeight;  
   float distantObjectsMask = saturate( depth * depthScale );  
   
   // * calculate final mask  
   float finalEffectMask = saturate( intensity * distantObjectsMask );  
   
   // cb0_v7.y and cb4_v7.x are set to 1.0 so I didn't bother with naming them :)  
   float paramX = finalEffectMask;  
   float paramY = cb0_v7.y * finalEffectMask;  
   float effectAmount = lerp(paramX, paramY, cb4_v7.x);  
   
   // color of shafts comes from constant buffer
   float3 effectColor = effectAmount * shaftsColor;  
   
   // gamma correction  
   effectColor = pow(effectColor, 2.2);  
   
   // final multiplications  
   effectColor *= finalColorFilter;  
   effectColor *= finalEffectIntensity;  
   
   // return with zero alpha 'cause the blending used here is:  
   // SourceColor * 1.0 + (1.0 - SrcAlpha) * DestColor  
   return float4( effectColor, 0.0 );  
 }   

I'm happy to say that my PS produces the same assembly as the original one ;)

I hope you enjoyed it.
Thanks for reading! :)

M.

Friday, 28 December 2018

Reverse engineering the rendering of The Witcher 3, part 9 - GBuffer

Welcome,

This is the ninth part of my series about rendering in The Witcher 3. Click here for full index.

In this part I will show some details about the geometry buffer (GBuffer) in The Witcher 3.

I assume here that you know the basics of deferred shading. 
Quick recap: the idea is to, well, defer rendering by not calculating all final lighting and shading immediately, but instead to separate the calculations into two stages.
In the first one (geometry pass) we fill the GBuffer with data about the surface (position, normals, specular color etc.) and in the second one (lighting pass) we combine everything and calculate lighting.

Deferred shading is a hugely popular approach because it allows calculating lighting in one full-screen pass with techniques like tile-based deferred shading, which greatly improves performance.

Simply speaking, the GBuffer is a collection of textures with properties of the geometry. It's very important to design its layout carefully. For a real-life example, see for instance The Rendering Technologies of Crysis 3.

After this brief introduction, let's take a look at an example frame from The Witcher 3: Blood & Wine:
One of many inns in Toussaint

The main GBuffer consists of three fullscreen render targets with DXGI_FORMAT_R8G8B8A8_UNORM format
and DXGI_FORMAT_D24_UNORM_S8_UINT depth+stencil buffer.

Here are screenshots of them:
Render Target 0 - RGB channels, surface color

Render Target 0 - A channel. I have no idea what it is, really.

Render Target 1 - RGB channels. We have normal vectors in [0-1] range here.

Render Target 1 - A channel. Looks like reflectance!

Render Target 2 - RGB channels. Looks like specular color!
A channel is black in this scene (but it is used later)

Depth buffer. Note that reversed depth is used here

Stencil buffer to mark certain type of pixels (like skin, vegetation etc)
This is not the whole GBuffer. The lighting pass also uses reflection probes and other buffers, but that is not the subject of this post.

Before I start the "main" part of this post, some general observations first:


General observations


1) The only buffer to clear is depth/stencil.

If you analyze the aforementioned textures in any good frame analyzer you may be a little surprised, because there is no "Clear" call on them, with the exception of depth/stencil.

So in reality RenderTarget1 looks like this (notice "blurred" pixels on far plane):

This is a simple and nice optimization.
Takeaway: ClearRenderTargetView calls are not free, so use them only when really necessary.


2) Reversed depth rocks

Many articles have already been written about the precision of floating-point depth buffers. The Witcher 3 uses reversed-Z, which is a natural choice for an open-world game with long draw distances.

For DirectX the switch shouldn't be difficult:

a) Clear the depth buffer with "0" instead of "1".
In the traditional approach we clear the depth buffer with the far value of "1". After reversing depth, the new "far" value is zero, so we need to change that.

b) Flip near and far clip values when calculating projection matrix

c) Change depth test from "Less" to "Greater".

For OpenGL there is a bit more work (see mentioned articles) but it is really worth the effort.


3) Do not store world position

It is that simple. Reconstruct world position from depth in lighting pass.


Pixel Shader

What I want to show in this post is pixel shader which feeds GBuffer with surface data. 
So we know by now that we store at least color, normals and specular.
Of course it's not as simple as you may think.

The problem with this pixel shader is that it comes in many variants. They differ in number of textures consumed and number of parameters used from constant buffer (probably constant buffer which describes material).

I decided to use this nice barrel for the analysis:
Our heroic barrel!
And please give warm welcome to textures used:

So we have albedo, normal map and specular color. Pretty common scenario.

Before we start, a few words about geometry inputs:
The geometry comes with position, texcoords, normal and tangent buffers.
The vertex shader outputs at least texcoords and normalized tangent/normal/bitangent vectors, multiplied earlier by the world matrix. For more complicated materials (like ones with two diffuse or normal maps) the vertex shader can output other data, but I wanted to show the simple cases here.


Pixel Shader as assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb4[3], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s13, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_resource_texture2d (float,float,float,float) t2  
    dcl_resource_texture2d (float,float,float,float) t13  
    dcl_input_ps linear v0.zw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.xyz  
    dcl_input_ps linear v3.xyz  
    dcl_input_ps_sgv v4.x, isfrontface  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xyzw  
    dcl_temps 3  
   0: sample_indexable(texture2d)(float,float,float,float) r0.xyzw, v1.xyxx, t1.xyzw, s0  
   1: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t0.xyzw, s0  
   2: add r1.w, r1.y, r1.x  
   3: add r1.w, r1.z, r1.w  
   4: mul r2.x, r1.w, l(0.333300)  
   5: add r2.y, l(-1.000000), cb4[1].x  
   6: mul r2.y, r2.y, l(0.500000)  
   7: mov_sat r2.z, r2.y  
   8: mad r1.w, r1.w, l(-0.666600), l(1.000000)  
   9: mad r1.w, r2.z, r1.w, r2.x  
  10: mul r2.xzw, r1.xxyz, cb4[0].xxyz  
  11: mul_sat r2.xzw, r2.xxzw, l(1.500000, 0.000000, 1.500000, 1.500000)  
  12: mul_sat r1.w, abs(r2.y), r1.w  
  13: add r2.xyz, -r1.xyzx, r2.xzwx  
  14: mad r1.xyz, r1.wwww, r2.xyzx, r1.xyzx  
  15: max r1.w, r1.z, r1.y  
  16: max r1.w, r1.w, r1.x  
  17: lt r1.w, l(0.220000), r1.w  
  18: movc r1.w, r1.w, l(-0.300000), l(-0.150000)  
  19: mad r1.w, v0.z, r1.w, l(1.000000)  
  20: mul o0.xyz, r1.wwww, r1.xyzx  
  21: add r0.xyz, r0.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
  22: add r0.xyz, r0.xyzx, r0.xyzx  
  23: mov r1.x, v0.w  
  24: mov r1.yz, v1.zzwz  
  25: mul r1.xyz, r0.yyyy, r1.xyzx  
  26: mad r1.xyz, v3.xyzx, r0.xxxx, r1.xyzx  
  27: mad r0.xyz, v2.xyzx, r0.zzzz, r1.xyzx  
  28: uge r1.x, l(0), v4.x  
  29: if_nz r1.x  
  30:  dp3 r1.x, v2.xyzx, r0.xyzx  
  31:  mul r1.xyz, r1.xxxx, v2.xyzx  
  32:  mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx  
  33: endif  
  34: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t2.xyzw, s0  
  35: max r1.w, r1.z, r1.y  
  36: max r1.w, r1.w, r1.x  
  37: lt r1.w, l(0.200000), r1.w  
  38: movc r2.xyz, r1.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)  
  39: add r2.xyz, -r1.xyzx, r2.xyzx  
  40: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx  
  41: lt r1.x, r0.w, l(0.330000)  
  42: mul r1.y, r0.w, l(0.950000)  
  43: movc r1.x, r1.x, r1.y, l(0.330000)  
  44: add r1.x, -r0.w, r1.x  
  45: mad o1.w, v0.z, r1.x, r0.w  
  46: dp3 r0.w, r0.xyzx, r0.xyzx  
  47: rsq r0.w, r0.w  
  48: mul r0.xyz, r0.wwww, r0.xyzx  
  49: max r0.w, abs(r0.y), abs(r0.x)  
  50: max r0.w, r0.w, abs(r0.z)  
  51: lt r1.xy, abs(r0.zyzz), r0.wwww  
  52: movc r1.yz, r1.yyyy, abs(r0.zzyz), abs(r0.zzxz)  
  53: movc r1.xy, r1.xxxx, r1.yzyy, abs(r0.yxyy)  
  54: lt r1.z, r1.y, r1.x  
  55: movc r1.xy, r1.zzzz, r1.xyxx, r1.yxyy  
  56: div r1.z, r1.y, r1.x  
  57: div r0.xyz, r0.xyzx, r0.wwww  
  58: sample_l(texture2d)(float,float,float,float) r0.w, r1.xzxx, t13.yzwx, s13, l(0)  
  59: mul r0.xyz, r0.wwww, r0.xyzx  
  60: mad o1.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)  
  61: mov o0.w, cb4[2].x  
  62: mov o2.w, l(0)  
  63: ret  

The shader has a few stages. I will describe each main part of this shader separately.
But first, as always, a screenshot with values from the constant buffer:

Albedo

We start with the hard stuff. It's not as simple as "OutputColor.rgb = Texture.Sample(uv).rgb".
After we sample the RGB of the color texture (line 1), the following lines (2-14) are something which I called a "desaturation filter". Let me show you the HLSL code:

 float3 albedoColorFilter( in float3 color, in float desaturationFactor, in float3 desaturationValue )  
 {  
   float sumColorComponents = color.r + color.g + color.b;  
    
   float averageColorComponentValue = 0.3333 * sumColorComponents;  
   float oneMinusAverageColorComponentValue = 1.0 - averageColorComponentValue;  
     
   float factor = 0.5 * (desaturationFactor - 1.0);  
     
   float avgColorComponent = lerp(averageColorComponentValue, oneMinusAverageColorComponentValue, saturate(factor));  
   float3 desaturatedColor = saturate(color * desaturationValue * 1.5);  
    
   float mask = saturate( avgColorComponent * abs(factor) );  
   
   float3 finalColor = lerp( color, desaturatedColor, mask );  
   return finalColor;  
 }  

For the majority of objects, this code does nothing but return the original color from the texture. This is achieved by proper "material cbuffer" values. cb4_v1.x is set to 1.0, which results in a mask equal to 0.0, so the lerp instruction gives back the input color.
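The no-op case is easy to confirm on the CPU. Below is a straight C++ port of the HLSL function above; the `Rgb`, `Saturate` and `Lerp` helpers are mine and only stand in for the HLSL built-ins.

```cpp
#include <algorithm>
#include <cmath>

struct Rgb { float r, g, b; };

float Saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }
float Lerp(float a, float b, float t) { return a + t * (b - a); }

// CPU port of albedoColorFilter from the HLSL listing.
Rgb AlbedoColorFilter(const Rgb& color, float desaturationFactor,
                      const Rgb& desaturationValue) {
    const float average = 0.3333f * (color.r + color.g + color.b);
    const float factor = 0.5f * (desaturationFactor - 1.0f);

    const float avgComponent = Lerp(average, 1.0f - average, Saturate(factor));
    const float mask = Saturate(avgComponent * std::fabs(factor));

    const Rgb target = {Saturate(color.r * desaturationValue.r * 1.5f),
                        Saturate(color.g * desaturationValue.g * 1.5f),
                        Saturate(color.b * desaturationValue.b * 1.5f)};

    return {Lerp(color.r, target.r, mask),
            Lerp(color.g, target.g, mask),
            Lerp(color.b, target.b, mask)};
}
```

With desaturationFactor = 1.0 the factor is 0, the mask is 0 and the input color comes back unchanged.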

However, there are some exceptions. The highest value of desaturationFactor I found was 4.0 (it is never below 1.0), and desaturationValue depends on the material. It can be something like (0.2, 0.3, 0.4); there are no strict rules. Of course I couldn't resist implementing this in my own DX11 framework, and here are the results, all with desaturationValue equal to float3( 0.25, 0.3, 0.45 ):

desaturationFactor = 1.0 (no effect)

desaturationFactor = 2.0

desaturationFactor = 3.0

desaturationFactor = 4.0
I'm sure it's just applying material parameters, but it's not the end of the albedo part.
Lines 15-20 perform final touches:
  15: max r1.w, r1.z, r1.y   
  16: max r1.w, r1.w, r1.x   
  17: lt r1.w, l(0.220000), r1.w   
  18: movc r1.w, r1.w, l(-0.300000), l(-0.150000)   
  19: mad r1.w, v0.z, r1.w, l(1.000000)   
  20: mul o0.xyz, r1.wwww, r1.xyzx   

v0.z is output from Vertex Shader and it's equal to zero. Remember it, because v0.z will be used later a couple of times.

It seems to be some factor and all this code looks like darkening albedo a little bit, but since v0.z is equal to 0, the color is untouched. HLSL:

   /* ALBEDO */  
   // optional desaturation (?) filter  
   float3 albedoColor = albedoColorFilter( colorTex, cb4_v1.x, cb4_v0.rgb );  
   float albedoMaxComponent = getMaxComponent( albedoColor );  
     
   // I really have no idea what this is  
   // In most of cases Vertex Shader outputs "paramZ" as 0  
   float paramZ = Input.out0.z;  // note, mostly 0  
   
   // Note that 0.70 and 0.85 are not present in the output assembly
   // Because I wanted to use lerp here I had to adjust them manually.  
   float param = (albedoMaxComponent > 0.22) ? 0.70 : 0.85;  
   float mulParam = lerp(1, param, paramZ);  
   
   // Output  
   pout.RT0.rgb = albedoColor * mulParam;  
   pout.RT0.a = cb4_v2.x;  

Regarding RT0.a, as you can see, it comes from the material's constant buffer, but since the shader has no debug information it's hard to say exactly what this is. Maybe translucency?

We are done with the first render target!

Normals

We start by unpacking normal map, then we perform normal mapping as usual:
   /* NORMALS */   
   float3 sampledNormal = ((normalTex.xyz - 0.5) * 2);  
   
   // Data to construct TBN matrix  
   float3 Tangent = Input.TangentW.xyz;  
   float3 Normal = Input.NormalW.xyz;  
   float3 Bitangent;  
   Bitangent.x = Input.out0.w;  
   Bitangent.yz = Input.out1.zw;  
   
   // remove this saturate in real scenario, this is a hack to make sure normal-tbn multiplication  
   // will have 'mad' instructions in assembly instead a bunch of 'mov's
   Bitangent = saturate(Bitangent);  
     
   float3x3 TBN = float3x3(Tangent, Bitangent, Normal);  
   float3 normal = mul( sampledNormal, TBN );  

Nothing really surprising so far.

Take a look at lines 28-33:
  28: uge r1.x, l(0), v4.x   
  29: if_nz r1.x   
  30: dp3 r1.x, v2.xyzx, r0.xyzx   
  31: mul r1.xyz, r1.xxxx, v2.xyzx   
  32: mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx   
  33: endif   

We can roughly write it this way:
   [branch] if (bIsFrontFace <= 0)  
   {  
      float cosTheta = dot(Input.NormalW, normal);  
      float3 invNormal = cosTheta * Input.NormalW;  
      normal = normal - 2*invNormal;  
   }  

I'm not sure if this is a proper way of writing this. If you know what type of mathematical operation this is - let me know.

We see that the pixel shader uses SV_IsFrontFace.
What's that? Documentation (I wanted to write 'msdn' but..) comes to the rescue:

"Specifies whether a triangle is front facing. For lines and points, IsFrontFace has the value true. The exception is lines drawn out of triangles (wireframe mode), which sets IsFrontFace the same way as rasterizing the triangle in solid mode. Can be written to by the geometry shader, and read by the pixel shader."

I also wanted to check it for myself. Indeed, the effect is visible in wireframe mode only. I believe the purpose of this piece of code is to properly calculate normals (therefore, lighting) in wireframe mode.
Here is a comparison: Both wireframe final scene color with this trick off/on as well as gbuffer normal [0-1] texture with this trick off/on:

Scene color without the trick

Scene color with the trick
Normals [0-1] without the trick

Normals [0-1] with the trick
Have you noticed that the format of every rendertarget of GBuffer is R8G8B8A8_UNORM? That means we have 256 possible values per one component. Is it enough for storing normals?

Storing high-quality normals with a reasonable amount of bytes in the GBuffer is a known problem, but fortunately there is a lot of material to learn from.

Some of you probably already know which technique is used here. I'd like to point out that in the whole geometry pass there is one additional texture attached to slot #13...:


Ha! The Witcher 3 uses a technique known as "Best Fit Normals". I will not go into details here (refer to the presentation). It was invented around 2009-2010 by Crytek and since CryEngine is open source, BFN is open source too.

BFN causes the "grainy" look of the normals texture.
After scaling the normal with the best fit, we encode it from the [-1, 1] to the [0, 1] range.
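To get a feel for why 8 bits per component are tight for normals, here is a small C++ sketch of the [-1, 1] to [0, 1] encoding followed by 8-bit quantization, plus the matching decode. It covers only this last step, not the best-fit lookup itself, and the function names are made up.

```cpp
#include <cmath>
#include <cstdint>

// [-1, 1] -> [0, 1] -> 8-bit UNORM, as in an R8G8B8A8_UNORM render target.
std::uint8_t EncodeComponent(float n) {
    const float unorm = n * 0.5f + 0.5f;
    return static_cast<std::uint8_t>(std::lround(unorm * 255.0f));
}

// 8-bit UNORM -> [0, 1] -> [-1, 1], as a lighting pass would do.
float DecodeComponent(std::uint8_t stored) {
    return (static_cast<float>(stored) / 255.0f) * 2.0f - 1.0f;
}
```

A round trip can move each component by up to about 1/255, which is the quantization error that Best Fit Normals tries to reduce by picking a better length for the stored vector.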

Specular 

We start from line 34, by sampling specular texture:
  34: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t2.xyzw, s0   
  35: max r1.w, r1.z, r1.y   
  36: max r1.w, r1.w, r1.x   
  37: lt r1.w, l(0.200000), r1.w   
  38: movc r2.xyz, r1.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)   
  39: add r2.xyz, -r1.xyzx, r2.xyzx   
  40: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx   

As you can see, there is a similar "darkening" filter as with albedo:
Calculate the component with the max value, then calculate a "darker" color and interpolate with the original specular color using a parameter from the vertex shader... which is set to 0, so we output the color from the texture.

HLSL:
   /* SPECULAR */  
   float3 specularTex = texture2.Sample( samplerAnisoWrap, Texcoords ).rgb;  
   
   // Similar algorithm as in Albedo. Calculate max component, compare this with  
   // some threshold and calculate "minimum" value if needed.  
   // Because in the scene I analyzed paramZ was set to zero, value from texture will be  
   // the final result.  
   float specularMaxComponent = getMaxComponent( specularTex );  
   float3 specB = (specularMaxComponent > 0.2) ? specularTex : float3(0.12, 0.12, 0.12);  
   float3 finalSpec = lerp(specularTex, specB, paramZ);  
   pout.RT2.xyz = finalSpec;  

Reflectivity

I have no idea if this is the proper name for this parameter, since I don't know how it affects the lighting pass. The thing is that the alpha channel of the input normal map has additional data:
Alpha channel of "normal map" texture. (c) CD Projekt Red
Assembly:
  41: lt r1.x, r0.w, l(0.330000)   
  42: mul r1.y, r0.w, l(0.950000)   
  43: movc r1.x, r1.x, r1.y, l(0.330000)   
  44: add r1.x, -r0.w, r1.x   
  45: mad o1.w, v0.z, r1.x, r0.w   

Say hello to our old friend, 'v0.z'! This is similar to both albedo and specular:
   /* REFLECTIVITY */  
   float reflectivity = normalTex.a;  
   float reflectivity2 = (reflectivity < 0.33) ? (reflectivity * 0.95) : 0.33;  
     
   float finalReflectivity = lerp(reflectivity, reflectivity2, paramZ);  
   pout.RT1.a = finalReflectivity;  

Nice! This is the end of analyzing the first variant of pixel shader.

In terms of result, here is a comparison of my shader (left) with the original one (right):
These differences do not affect calculations so my job is done here ;)



Pixel Shader - "Albedo + Normals" variant

I decided to show you one more variant - now with albedo & normal maps only - without specular texture. The assembly is a bit longer:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb4[8], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s13, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_resource_texture2d (float,float,float,float) t13  
    dcl_input_ps linear v0.zw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.xyz  
    dcl_input_ps linear v3.xyz  
    dcl_input_ps_sgv v4.x, isfrontface  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xyzw  
    dcl_temps 4  
   0: mul r0.x, v0.z, cb4[0].x  
   1: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, v1.xyxx, t1.xyzw, s0  
   2: sample_indexable(texture2d)(float,float,float,float) r0.yzw, v1.xyxx, t0.wxyz, s0  
   3: add r2.x, r0.z, r0.y  
   4: add r2.x, r0.w, r2.x  
   5: add r2.z, l(-1.000000), cb4[2].x  
   6: mul r2.yz, r2.xxzx, l(0.000000, 0.333300, 0.500000, 0.000000)  
   7: mov_sat r2.w, r2.z  
   8: mad r2.x, r2.x, l(-0.666600), l(1.000000)  
   9: mad r2.x, r2.w, r2.x, r2.y  
  10: mul r3.xyz, r0.yzwy, cb4[1].xyzx  
  11: mul_sat r3.xyz, r3.xyzx, l(1.500000, 1.500000, 1.500000, 0.000000)  
  12: mul_sat r2.x, abs(r2.z), r2.x  
  13: add r2.yzw, -r0.yyzw, r3.xxyz  
  14: mad r0.yzw, r2.xxxx, r2.yyzw, r0.yyzw  
  15: max r2.x, r0.w, r0.z  
  16: max r2.x, r0.y, r2.x  
  17: lt r2.x, l(0.220000), r2.x  
  18: movc r2.x, r2.x, l(-0.300000), l(-0.150000)  
  19: mad r0.x, r0.x, r2.x, l(1.000000)  
  20: mul o0.xyz, r0.xxxx, r0.yzwy  
  21: add r0.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
  22: add r0.xyz, r0.xyzx, r0.xyzx  
  23: mov r1.x, v0.w  
  24: mov r1.yz, v1.zzwz  
  25: mul r1.xyz, r0.yyyy, r1.xyzx  
  26: mad r0.xyw, v3.xyxz, r0.xxxx, r1.xyxz  
  27: mad r0.xyz, v2.xyzx, r0.zzzz, r0.xywx  
  28: uge r0.w, l(0), v4.x  
  29: if_nz r0.w  
  30:  dp3 r0.w, v2.xyzx, r0.xyzx  
  31:  mul r1.xyz, r0.wwww, v2.xyzx  
  32:  mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx  
  33: endif  
  34: add r0.w, -r1.w, l(1.000000)  
  35: log r1.xyz, cb4[3].xyzx  
  36: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  37: exp r1.xyz, r1.xyzx  
  38: mad r0.w, r0.w, cb4[4].x, cb4[5].x  
  39: mul_sat r1.xyz, r0.wwww, r1.xyzx  
  40: log r1.xyz, r1.xyzx  
  41: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)  
  42: exp r1.xyz, r1.xyzx  
  43: max r0.w, r1.z, r1.y  
  44: max r0.w, r0.w, r1.x  
  45: lt r0.w, l(0.200000), r0.w  
  46: movc r2.xyz, r0.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)  
  47: add r2.xyz, -r1.xyzx, r2.xyzx  
  48: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx  
  49: lt r0.w, r1.w, l(0.330000)  
  50: mul r1.x, r1.w, l(0.950000)  
  51: movc r0.w, r0.w, r1.x, l(0.330000)  
  52: add r0.w, -r1.w, r0.w  
  53: mad o1.w, v0.z, r0.w, r1.w  
  54: lt r0.w, l(0), cb4[7].x  
  55: and o2.w, r0.w, l(0.064706)  
  56: dp3 r0.w, r0.xyzx, r0.xyzx  
  57: rsq r0.w, r0.w  
  58: mul r0.xyz, r0.wwww, r0.xyzx  
  59: max r0.w, abs(r0.y), abs(r0.x)  
  60: max r0.w, r0.w, abs(r0.z)  
  61: lt r1.xy, abs(r0.zyzz), r0.wwww  
  62: movc r1.yz, r1.yyyy, abs(r0.zzyz), abs(r0.zzxz)  
  63: movc r1.xy, r1.xxxx, r1.yzyy, abs(r0.yxyy)  
  64: lt r1.z, r1.y, r1.x  
  65: movc r1.xy, r1.zzzz, r1.xyxx, r1.yxyy  
  66: div r1.z, r1.y, r1.x  
  67: div r0.xyz, r0.xyzx, r0.wwww  
  68: sample_l(texture2d)(float,float,float,float) r0.w, r1.xzxx, t13.yzwx, s13, l(0)  
  69: mul r0.xyz, r0.wwww, r0.xyzx  
  70: mad o1.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)  
  71: mov o0.w, cb4[6].x  
  72: ret

The differences between this variant and previous one are:

a) lines 0, 19: the interpolation parameter v0.z is multiplied by cb4[0].x from the constant buffer, but this product is used only to interpolate albedo at line 19. For the other outputs, the 'usual' v0.z is used.


b) lines 54-55: o2.w is now set only when ( cb4[7].x > 0.0 )

We already know this "someComparison - and" pattern from the luminance histogram calculation in TW3, so we can write it as:
 pout.RT2.w = (cb4_v7.x > 0.0) ? (16.5/255.0) : 0.0;  
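
The reason this pattern works: in shader assembly, lt writes an all-ones mask (0xFFFFFFFF) when the comparison is true and 0 otherwise, and the following and applies that mask to the raw bits of the float literal. A small C++ sketch of this emulation (selectByMask is a hypothetical helper name, not something from the shader):

```cpp
#include <cstdint>
#include <cstring>

// Emulates the "lt" + "and" pair from assembly lines 54-55.
float selectByMask(float cbValue, float literal)
{
    // lt r0.w, l(0), cb4[7].x  ->  all ones if (0 < cbValue), else zero
    const std::uint32_t mask = (0.0f < cbValue) ? 0xFFFFFFFFu : 0u;

    // Reinterpret the float literal as raw bits.
    std::uint32_t bits = 0;
    std::memcpy(&bits, &literal, sizeof(bits));

    // and o2.w, r0.w, l(0.064706)  ->  keep all bits or clear all bits
    bits &= mask;

    float result = 0.0f;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

An all-zero bit pattern is exactly 0.0f, so the result is either the literal itself or zero, which is what the ternary operator expresses in HLSL.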


c) lines 34-42: completely different calculation of specular.

There is no specular texture. Let's look at the assembly responsible for that:
  34: add r0.w, -r1.w, l(1.000000)   
  35: log r1.xyz, cb4[3].xyzx   
  36: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)   
  37: exp r1.xyz, r1.xyzx   
  38: mad r0.w, r0.w, cb4[4].x, cb4[5].x   
  39: mul_sat r1.xyz, r0.wwww, r1.xyzx   
  40: log r1.xyz, r1.xyzx   
  41: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)   
  42: exp r1.xyz, r1.xyzx   

Note that (1 - reflectivity) is used here. Luckily, this is quite simple in HLSL:
   float oneMinusReflectivity = 1.0 - normalTex.a;  
   float3 specularTex = pow(cb4_v3.rgb, 2.2);  
   oneMinusReflectivity = oneMinusReflectivity * cb4_v4.x + cb4_v5.x;  
   specularTex = saturate(specularTex * oneMinusReflectivity);  
   specularTex = pow(specularTex, 1.0/2.2);  
   
   // proceed as in the first variant...  
   float specularMaxComponent = getMaxComponent( specularTex ); 
   ... 
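
For experimenting with these constants outside the shader, the same steps can be sketched on the CPU in C++. This is only an illustration: Float3, saturate and emulateSpecular are names I made up, and the roles of the three constant buffer values are my reading of the assembly.

```cpp
#include <algorithm>
#include <cmath>

struct Float3 { float x, y, z; };

// Same semantics as the HLSL saturate() intrinsic.
static float saturate(float v)
{
    return std::min(std::max(v, 0.0f), 1.0f);
}

// specColor ~ cb4[3].xyz, scale ~ cb4[4].x, bias ~ cb4[5].x,
// reflectivity ~ normalTex.a (mirrors assembly lines 34-42).
Float3 emulateSpecular(Float3 specColor, float scale, float bias, float reflectivity)
{
    const float factor = (1.0f - reflectivity) * scale + bias;

    auto channel = [factor](float c) {
        const float linear = std::pow(c, 2.2f);                   // gamma -> linear
        return std::pow(saturate(linear * factor), 1.0f / 2.2f);  // back to gamma
    };

    return { channel(specColor.x), channel(specColor.y), channel(specColor.z) };
}
```

The shader itself expresses each pow as a log / mul / exp triple, which is how pow usually shows up in compiled assembly.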

On a side note, this variant has a slightly larger constant buffer with material data. The extra values are used to emulate the specular color here.

The rest of the shader is the same as in the previous variant.

72 lines of assembly is a little too much for WinMerge to display at once so just believe me it's almost the same assembly as in original. Or you can grab my HLSLexplorer and see it for yourself! ;)


Summary

...and if you've come this far, maybe you're willing to come a little further.

Nothing that seems simple really is in real life, and feeding the GBuffer in The Witcher 3 is no exception. I've just shown you the simplest variants of the pixel shaders responsible for it, along with some observations which apply to deferred shading in general.

For the most patient ones (or vice versa), both variants of the pixel shader are on pastebin:





Feel free to comment.

I hope you enjoyed it.
Thanks for reading!

Saturday, 22 December 2018

Reverse engineering the rendering of The Witcher 3, part 8 - The Moon and lunar phases

Welcome,

In the 8th part of this series I will investigate the Moon shader from The Witcher 3 (more specifically, from "Blood and Wine" expansion pack).

The Moon is an important element of the night sky and it can be quite challenging to make it believable, but in TW3 it's a pleasure to walk around at night.
Just take a look at this scene!


Before I get to the pixel shader, a few words about rendering nuances. In terms of geometry it's just a sphere (see below) which comes with texture coordinates, normal and tangent vectors. The vertex shader calculates the world space position as well as the normalized normal, tangent and bitangent vectors (the last one using a cross product), all transformed by the world matrix.
To make sure that the Moon lies completely on the far plane, the MinDepth and MaxDepth fields of the D3D11_VIEWPORT structure are set to 0.0 (the same trick is used for the skydome). The Moon is rendered just after the sky.
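
To see why equal MinDepth and MaxDepth pin the whole sphere to a single depth value, recall that Direct3D maps normalized device depth z to the viewport range as MinDepth + z * (MaxDepth - MinDepth). A minimal C++ sketch of that mapping (viewportDepth is a made-up helper for illustration, not a D3D11 API):

```cpp
// Depth part of the viewport transform:
// maps normalized device depth into [minDepth, maxDepth].
float viewportDepth(float zNdc, float minDepth, float maxDepth)
{
    return minDepth + zNdc * (maxDepth - minDepth);
}
```

With both fields set to 0.0 the multiplier is zero, so every pixel of the Moon ends up with depth 0.0 no matter what the vertex shader produced.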

Sphere used to draw the Moon
Alright, I think we are ready to go. Let's see the pixel shader:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[1], immediateIndexed  
    dcl_constantbuffer cb2[3], immediateIndexed  
    dcl_constantbuffer cb12[267], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_input_ps linear v1.w  
    dcl_input_ps linear v2.xyzw  
    dcl_input_ps linear v3.xy  
    dcl_input_ps linear v4.xy  
    dcl_output o0.xyzw  
    dcl_temps 3  
   0: mov r0.x, -cb0[0].w  
   1: mov r0.y, l(0)  
   2: add r0.xy, r0.xyxx, v2.xyxx  
   3: sample_indexable(texture2d)(float,float,float,float) r0.xyzw, r0.xyxx, t0.xyzw, s0  
   4: add r0.xyz, r0.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
   5: log r0.w, r0.w  
   6: mul r0.w, r0.w, l(2.200000)  
   7: exp r0.w, r0.w  
   8: add r0.xyz, r0.xyzx, r0.xyzx  
   9: dp3 r1.x, r0.xyzx, r0.xyzx  
  10: rsq r1.x, r1.x  
  11: mul r0.xyz, r0.xyzx, r1.xxxx  
  12: mul r1.xy, r0.yyyy, v3.xyxx  
  13: mad r0.xy, v4.xyxx, r0.xxxx, r1.xyxx  
  14: mad r0.xy, v2.zwzz, r0.zzzz, r0.xyxx  
  15: mad r0.z, cb0[0].y, l(0.033864), cb0[0].w  
  16: mul r0.z, r0.z, l(6.283185)  
  17: sincos r1.x, r2.x, r0.z  
  18: mov r2.y, r1.x  
  19: dp2_sat r0.x, r0.xyxx, r2.xyxx  
  20: mul r0.xyz, r0.xxxx, cb12[266].xyzx  
  21: mul r0.xyz, r0.xyzx, r0.wwww  
  22: mul r0.xyz, r0.xyzx, cb2[2].xyzx  
  23: add_sat r0.w, -v1.w, l(1.000000)  
  24: mul r0.w, r0.w, cb2[2].w  
  25: mul o0.xyz, r0.wwww, r0.xyzx  
  26: mov o0.w, l(0)  
  27: ret  

The main reason I selected the shader from the "Blood and Wine" expansion pack is simple - it's shorter ;)

First we calculate the offset for texture sampling.
cb0[0].w is used as an offset along the X axis. This simple trick lets us simulate rotation of the Moon around its axis.

Example values from constant buffer


There is one texture (1024x512) attached as input. The normal map is encoded in the RGB channels and the color of the Moon's surface is in the alpha channel. Smart!

Alpha channel of the texture - color of the Moon's surface. (c) CD Projekt Red

RGB channels of the texture - normal map. (c) CD Projekt Red
Once we have proper texture coordinates, we sample RGBA channels. We have to unpack normal map and perform gamma correction of surface color. So far our HLSL shader can be written for example like this:
 float4 MoonPS(in InputStruct IN) : SV_Target0  
 {  
   // Texcoords offset  
   float2 uvOffsets = float2(-cb0_v0.w, 0.0);  
     
   // Final texcoords  
   float2 uv = IN.param2.xy + uvOffsets;  
   
   // Sample texture  
   float4 sampledTexture = texture0.Sample( sampler0, uv);  
   
   // Moon surface color - perform gamma correction  
   float moonColorTex = pow(sampledTexture.a, 2.2 );  
   
   // Unpack normal from [0,1] to [-1,1] range.  
   // Note: sampledTexture.xyz * 2.0 - 1.0 works the same way  
   float3 sampledNormal = normalize((sampledTexture.xyz - 0.5) * 2);  

The next step is to perform normal mapping, but only on the XY components. (In The Witcher 3, the Z axis is up and the whole Z channel of the texture is 1.0.) We can do it like this:
   // Tangent space vectors  
   float3 Tangent = IN.param4.xyz;  
   float3 Normal = float3(IN.param2.zw, IN.param3.w);  
   float3 Bitangent = IN.param3.xyz;  
        
   // TBN matrix   
   float3x3 TBN = float3x3(Tangent, Bitangent, Normal);  
        
   // Calculate XY normal vector  
   // Squeeze TBN matrix to float3x2: 3 rows, 2 columns  
   float2 vNormal = mul(sampledNormal, (float3x2)TBN).xy;  
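
Written out by hand, the squeezed multiplication is just three scaled 2D vectors added together, which is exactly what assembly lines 12-14 do. A small C++ sketch under that reading (Float2, Float3, unpackNormal and transformNormalXY are illustrative names of my own):

```cpp
#include <cmath>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Unpack from [0,1] to [-1,1] and normalize (assembly lines 4 and 8-11).
// Like the shader, this does not guard against a zero-length vector.
Float3 unpackNormal(Float3 rgb)
{
    const Float3 n = { (rgb.x - 0.5f) * 2.0f,
                       (rgb.y - 0.5f) * 2.0f,
                       (rgb.z - 0.5f) * 2.0f };
    const float invLen = 1.0f / std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x * invLen, n.y * invLen, n.z * invLen };
}

// Equivalent of mul(sampledNormal, (float3x2)TBN): only the X and Y
// components of the tangent, bitangent and normal vectors take part.
Float2 transformNormalXY(Float3 n, Float2 tangentXY, Float2 bitangentXY, Float2 normalXY)
{
    return { n.x * tangentXY.x + n.y * bitangentXY.x + n.z * normalXY.x,
             n.x * tangentXY.y + n.y * bitangentXY.y + n.z * normalXY.y };
}
```

A texel that decodes to (0, 0, 1) therefore returns the XY part of the interpolated vertex normal unchanged.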

Now it's time for my favourite part of this shader. Take a look at lines 15-16 again:
  15: mad r0.z, cb0[0].y, l(0.033864), cb0[0].w  
  16: mul r0.z, r0.z, l(6.283185)

Well, what's this mysterious 0.033864? It seems to make no sense at first sight, but if we calculate its reciprocal, we get ~29.53, which is the length of the synodic month in days! Now this is what I call attention to detail!
We can safely assume that cb0[0].y is the number of days which have passed during gameplay. The bias that served as the X-axis texture offset (cb0[0].w) is added here as well.

Once we have this ratio, we multiply it by 2*Pi.
Then, using sincos, we calculate another 2D vector.

The lunar phase is simulated by calculating the dot product between the normal vector and this "lunar" vector.
   // Lunar phase.  
   // We calculate days/29.53 + bias.  
   float phase = cb0_v0.y * (1.0 / SYNODIC_MONTH_LENGTH) + cb0_v0.w;  
   
   // Multiply by 2*PI. This way 29.53 will be a full period  
   // for sin/cos functions.  
   phase *= TWOPI;  
        
   // Calculate sine and cosine of lunar phase.  
   float outSin = 0.0;  
   float outCos = 0.0;  
   sincos(phase, outSin, outCos);  
        
   // Calculate lunar phase  
   float lunarPhase = saturate( dot(vNormal, float2(outCos, outSin)) );  
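
To check how the phase term behaves over a month, the same formula can be evaluated on the CPU. Below is a minimal C++ sketch; the constant and function names are mine, and it divides by 29.53 instead of multiplying by 0.033864 as the shader does.

```cpp
#include <algorithm>
#include <cmath>

constexpr float kSynodicMonthDays = 29.53f;
constexpr float kTwoPi = 6.283185f;

// days ~ cb0[0].y, bias ~ cb0[0].w, (normalX, normalY) ~ vNormal.
float lunarPhase(float days, float bias, float normalX, float normalY)
{
    // One synodic month maps to one full turn of the "lunar" vector.
    const float angle = (days / kSynodicMonthDays + bias) * kTwoPi;

    // dot(vNormal, float2(cos, sin)), then saturate as dp2_sat does.
    const float d = normalX * std::cos(angle) + normalY * std::sin(angle);
    return std::min(std::max(d, 0.0f), 1.0f);
}
```

For a normal pointing along +X the result is 1.0 at day 0, drops to 0.0 half a synodic month later and returns to about 1.0 after 29.53 days.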

See some screenshots with various lunar phases:




The last step is to perform a series of multiplications to calculate final color.
   // Perform a series of multiplications to calculate final color.  
   
   // cb12_v266.xyz is used to boost Moon's glow and color.  
   // for example (1.54, 2.82, 4.13)  
   float3 moonSurfaceGlowColor = cb12_v266.xyz;  
   
   float3 moonColor = lunarPhase * moonSurfaceGlowColor;  
   moonColor = moonColorTex * moonColor;  
     
   // cb2_v2.xyz is probably a filter, like (1.0, 1.0, 1.0)  
   moonColor *= cb2_v2.xyz;  
        
   // I'm not really sure what this thing is, maybe some horizon opacity value.  
   // Anyway, it doesn't seem to have that much influence to final color  
   // as parameters above.  
   float paramHorizon = saturate(1.0 - IN.param1.w);  
   paramHorizon *= cb2_v2.w;  
        
   moonColor *= paramHorizon;  
   
   // Output final color with zero alpha  
   return float4(moonColor, 0.0);  

You may wonder why this shader outputs 0.0 alpha. Well, the Moon is rendered with blending enabled:
This approach lets the background (sky) color show through wherever this shader returns black.

If you are interested in the full shader, it's here. It has some big constant buffers and should be ready to inject in place of the original one in RenderDoc (just rename "MoonPS" to "EditedShaderPS").

Last but not least, I wanted to share results with you:
On the left - my shader, on the right - original shader from the game.
The difference is really minor and has no impact on the results.

As you can see, this shader was quite easy to reconstruct.
I hope you enjoyed it.

Thanks for reading!