## Sunday, 20 January 2019

### Reverse engineering the rendering of The Witcher 3, part 10 - distant rain shafts

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".

Welcome to the 10th part of the series! Woohooo! :)

See previous parts here.

This time we will take a look at a really cool atmospheric effect: distant rain/light shafts near the horizon. The easiest way to encounter them in the game is to visit the Skellige Islands.

Personally, I really love such atmospheric phenomena and I was really curious how the graphics programmers at CD Projekt Red implemented this one. Let's find out!

Here are two screenshots before and after applying rain shafts:

 Before rain shafts

 After rain shafts

### Geometry

Our first stop is geometry. The idea is to use a small cylinder:
 Cylinder in local space
In terms of local-space position, it's pretty small: every coordinate is in the [0.0, 1.0] range.

The input layout for this draw call looks like this...

What is important to us here: Texcoords and Instance_Transform.

Texcoords are wrapped pretty simply: U on both the upper and the lower base is in the [0.02777, 1.02734] range. V is equal to 1.0 on the lower base and 0.0 on the upper one. As you can see, it would be pretty simple to generate this mesh procedurally.
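To illustrate the "procedurally" remark, here is a minimal C++ sketch of such a generator. Only the texcoord ranges come from the captured mesh. The `Vertex` struct, the segment count, and the assumption that the cylinder axis is local Z with the ring centered at (0.5, 0.5) and radius 0.5 are mine, not something read from the game data.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vertex
{
    float x, y, z; // local-space position, kept inside [0, 1]
    float u, v;    // texcoords
};

// Hypothetical helper: builds the side wall of the cylinder as
// (lower, upper) vertex pairs. Layout and segment count are assumptions.
std::vector<Vertex> BuildRainShaftCylinder( int segments )
{
    assert( segments >= 3 );

    const float kPi  = 3.14159265358979f;
    const float uMin = 0.02777f; // U range observed in the captured mesh
    const float uMax = 1.02734f;

    std::vector<Vertex> vertices;
    vertices.reserve( 2 * ( segments + 1 ) );

    for ( int i = 0; i <= segments; ++i )
    {
        const float t     = static_cast<float>( i ) / static_cast<float>( segments );
        const float angle = t * 2.0f * kPi;

        const float x = 0.5f + 0.5f * std::cos( angle );
        const float y = 0.5f + 0.5f * std::sin( angle );
        const float u = uMin + t * ( uMax - uMin );

        vertices.push_back( { x, y, 0.0f, u, 1.0f } ); // lower base: V = 1
        vertices.push_back( { x, y, 1.0f, u, 0.0f } ); // upper base: V = 0
    }
    return vertices;
}
```

Calling `BuildRainShaftCylinder( 36 )` yields 74 vertices that can be indexed as a triangle strip; the first and last pair share a position but differ in U, which is what lets the texture wrap around the wall.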

Since we have this small cylinder in local space, we multiply it by a world matrix, which is provided by the per-instance INSTANCE_TRANSFORM input element. Let's check the values of this matrix:

It looks quite intimidating, doesn't it? Don't worry, let's do some decomposing and see what this matrix hides!
``````
XMMATRIX mat( -227.7472,  159.8043,  374.0736, -116.4951,
-194.7577, -173.3836, -494.4982,  238.6908,
-14.16466, -185.4743,  784.564,   -1.45565,
0.0, 0.0, 0.0, 1.0 );

mat = XMMatrixTranspose( mat );

XMVECTOR vScale;
XMVECTOR vRotateQuat;
XMVECTOR vTranslation;
XMMatrixDecompose( &vScale, &vRotateQuat, &vTranslation, mat );

// Rotation matrix...
XMMATRIX matRotate = XMMatrixRotationQuaternion( vRotateQuat );
``````

Results are really interesting:

vRotateQuat: (0.0924987569, -0.314900011, 0.883411944, -0.334462732)
vScale: (299.999969, 300.000000, 1000.00012)
vTranslation: (-116.495102, 238.690796, -1.45564997)
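These numbers can be cross-checked without DirectXMath. The sketch below is plain C++ and relies on two assumptions: the captured rows store the translation in the last column (which is why the snippet above transposes before decomposing), and the matrix contains no shear. Under those assumptions, the scale along each local axis is just the length of the corresponding column of the 3x3 part. The helper names are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Row = std::array<float, 4>;

// The three INSTANCE_TRANSFORM rows as captured (the 4th row is 0, 0, 0, 1).
const std::array<Row, 3> kInstanceTransform = { {
    { -227.7472f,  159.8043f,  374.0736f, -116.4951f },
    { -194.7577f, -173.3836f, -494.4982f,  238.6908f },
    { -14.16466f, -185.4743f,  784.564f,   -1.45565f },
} };

// Scale along one local axis = length of that column of the 3x3 part.
float AxisScale( int column )
{
    assert( column >= 0 && column < 3 );

    float sum = 0.0f;
    for ( const Row& row : kInstanceTransform )
        sum += row[column] * row[column];
    return std::sqrt( sum );
}

// Translation is simply the last column.
std::array<float, 3> Translation()
{
    return { kInstanceTransform[0][3],
             kInstanceTransform[1][3],
             kInstanceTransform[2][3] };
}
```

`AxisScale( 0 )`, `AxisScale( 1 )` and `AxisScale( 2 )` come out at roughly 300, 300 and 1000, in line with `vScale` above, and `Translation()` reproduces `vTranslation`.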

It's important to know the camera position in this particular frame: ( -116.5338, 234.8695, 2.09 )

As you can see, we scale the cylinder to make it pretty big in world space (in TW3 the Z axis points up), translate it relative to the camera position, and rotate it.
Here is the cylinder after the vertex shader transform:

 Cylinder after transforming by vertex shader. See how it's placed relative to view frustum

Input geometry and vertex shader are strictly dependent on each other.
Let's take a closer look at the vertex shader assembly:

``````
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb1[7], immediateIndexed
dcl_constantbuffer cb2[6], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xy
dcl_input v4.xyzw
dcl_input v5.xyzw
dcl_input v6.xyzw
dcl_input v7.xyzw
dcl_output o0.xyz
dcl_output o1.xyzw
dcl_output_siv o2.xyzw, position
dcl_temps 2
0: mov o0.xy, v1.xyxx
1: mul r0.xyzw, v5.xyzw, cb1[6].yyyy
2: mad r0.xyzw, v4.xyzw, cb1[6].xxxx, r0.xyzw
3: mad r0.xyzw, v6.xyzw, cb1[6].zzzz, r0.xyzw
4: mad r0.xyzw, cb1[6].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
5: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
6: mov r1.w, l(1.000000)
7: dp4 o0.z, r1.xyzw, r0.xyzw
8: mov o1.xyzw, v7.xyzw
9: mul r0.xyzw, v5.xyzw, cb1[0].yyyy
10: mad r0.xyzw, v4.xyzw, cb1[0].xxxx, r0.xyzw
11: mad r0.xyzw, v6.xyzw, cb1[0].zzzz, r0.xyzw
12: mad r0.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
13: dp4 o2.x, r1.xyzw, r0.xyzw
14: mul r0.xyzw, v5.xyzw, cb1[1].yyyy
15: mad r0.xyzw, v4.xyzw, cb1[1].xxxx, r0.xyzw
16: mad r0.xyzw, v6.xyzw, cb1[1].zzzz, r0.xyzw
17: mad r0.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
18: dp4 o2.y, r1.xyzw, r0.xyzw
19: mul r0.xyzw, v5.xyzw, cb1[2].yyyy
20: mad r0.xyzw, v4.xyzw, cb1[2].xxxx, r0.xyzw
21: mad r0.xyzw, v6.xyzw, cb1[2].zzzz, r0.xyzw
22: mad r0.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
23: dp4 o2.z, r1.xyzw, r0.xyzw
24: mul r0.xyzw, v5.xyzw, cb1[3].yyyy
25: mad r0.xyzw, v4.xyzw, cb1[3].xxxx, r0.xyzw
26: mad r0.xyzw, v6.xyzw, cb1[3].zzzz, r0.xyzw
27: mad r0.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
28: dp4 o2.w, r1.xyzw, r0.xyzw
29: ret
``````

Besides simply passing through Texcoords (line 0) and Instance_LOD_Params (line 8), two more things are needed for output: SV_Position (obviously) and Height (the .z component) of the world position.

Remember that local space is in the [0, 1] range? Well, the vertex shader uses a scale and bias to adjust the local position just before applying the world matrix. Smart!
In this case, we have scale = float3(4, 4, 2) and bias = float3(-2, -2, -1).
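To make the remapping concrete, here is a tiny C++ sketch of that one `mad` instruction (line 5 of the assembly). `Float3` and `ApplyScaleBias` are names invented for this example.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };

// Mirrors line 5 of the vertex shader: mad r1.xyz, v0, cb2[4], cb2[5]
Float3 ApplyScaleBias( Float3 p, Float3 scale, Float3 bias )
{
    return { p.x * scale.x + bias.x,
             p.y * scale.y + bias.y,
             p.z * scale.z + bias.z };
}
```

With scale (4, 4, 2) and bias (-2, -2, -1), the corner (0, 0, 0) maps to (-2, -2, -1), the corner (1, 1, 1) maps to (2, 2, 1), and (0.5, 0.5, 0.5) lands on the origin. So the remapped mesh is centered on the origin, which means the instance rotation and scale pivot around the middle of the cylinder rather than around one of its corners.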

The pattern you can notice between lines 9 and 28 is the multiplication of two row-major matrices.
Just take a look at the final vertex shader in HLSL :)
``````
cbuffer cbPerFrame : register (b1)
{
row_major float4x4 g_viewProjMatrix;
row_major float4x4 g_rainShaftsViewProjMatrix;
}

cbuffer cbPerObject : register (b2)
{
float4x4 g_mtxWorld;
float4 g_modelScale;
float4 g_modelBias;
}

struct VS_INPUT
{
float3 PositionW : POSITION;
float2 Texcoord : TEXCOORD;
float3 NormalW : NORMAL;
float3 TangentW : TANGENT;
float4 InstanceTransform0 : INSTANCE_TRANSFORM0;
float4 InstanceTransform1 : INSTANCE_TRANSFORM1;
float4 InstanceTransform2 : INSTANCE_TRANSFORM2;
float4 InstanceLODParams  : INSTANCE_LOD_PARAMS;
};

struct VS_OUTPUT
{
float3 TexcoordAndZ : Texcoord0;

float4 LODParams : LODParams;
float4 PositionH : SV_Position;
};

VS_OUTPUT RainShaftsVS( VS_INPUT Input )
{
VS_OUTPUT Output = (VS_OUTPUT)0;

// simple data passing
Output.TexcoordAndZ.xy = Input.Texcoord;
Output.LODParams = Input.InstanceLODParams;

// world space
float3 meshScale = g_modelScale.xyz;  // float3( 4, 4, 2 );
float3 meshBias =  g_modelBias.xyz;   // float3( -2, -2, -1 );
float3 PositionL = Input.PositionW * meshScale + meshBias;

// Manually build instanceWorld matrix from float4s:
float4x4 matInstanceWorld = float4x4(Input.InstanceTransform0, Input.InstanceTransform1,
Input.InstanceTransform2 , float4(0, 0, 0, 1) );

// World-space Height (.z)
float4x4 matWorldInstanceLod = mul( g_rainShaftsViewProjMatrix, matInstanceWorld );
Output.TexcoordAndZ.z = mul( float4(PositionL, 1.0), transpose(matWorldInstanceLod) ).z;

// SV_Position
float4x4 matModelViewProjection = mul(g_viewProjMatrix, matInstanceWorld );
Output.PositionH = mul( float4(PositionL, 1.0), transpose(matModelViewProjection) );

return Output;
}
``````
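The repeated `mul` / `mad` / `mad` / `mad` + `dp4` groups in the assembly may look odd at first, so here is a small C++ sketch of what one such group computes. It builds a single row of the combined matrix from the three instance rows (the fourth one is the constant 0, 0, 0, 1) and then takes a dot product with the position. The reference function does the same thing in the more familiar order: world transform first, then the view-projection row. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Float4 = std::array<float, 4>;

float Dot( const Float4& a, const Float4& b )
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// One output component, the way the assembly computes it (e.g. lines 9-13):
//   row    = W0 * vp.x + W1 * vp.y + W2 * vp.z + (0,0,0,1) * vp.w
//   result = dot( position, row )
float TransformComponent( const Float4& viewProjRow,
                          const std::array<Float4, 3>& instanceRows,
                          const Float4& position )
{
    const Float4 lastRow = { 0.0f, 0.0f, 0.0f, 1.0f };

    Float4 row = {};
    for ( int c = 0; c < 4; ++c )
    {
        row[c] = instanceRows[0][c] * viewProjRow[0]
               + instanceRows[1][c] * viewProjRow[1]
               + instanceRows[2][c] * viewProjRow[2]
               + lastRow[c]         * viewProjRow[3];
    }
    return Dot( position, row );
}

// Reference: transform the position to world space first,
// then project it with the same view-projection row.
float TransformComponentReference( const Float4& viewProjRow,
                                   const std::array<Float4, 3>& instanceRows,
                                   const Float4& position )
{
    const Float4 world = { Dot( instanceRows[0], position ),
                           Dot( instanceRows[1], position ),
                           Dot( instanceRows[2], position ),
                           position[3] };
    return Dot( viewProjRow, world );
}
```

Both functions return the same value for any input (up to floating-point rounding), which is why the compiler is free to emit the row-building form: it never needs the full 4x4 product, only the rows that feed the outputs.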

Comparison between my VS (left) and original one (right):

The differences do not affect the calculations ;) I injected my VS into the frame and everything is alright!

Finally, the pixel shader! To start, I'll show you the inputs.
There are two textures involved, a noise texture and the depth buffer:

Values from constant buffers:

``````
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[8], immediateIndexed
dcl_constantbuffer cb2[3], immediateIndexed
dcl_constantbuffer cb12[23], immediateIndexed
dcl_constantbuffer cb4[8], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s15, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t15
dcl_input_ps linear v0.xyz
dcl_input_ps linear v1.w
dcl_input_ps_siv v2.xy, position
dcl_output o0.xyzw
dcl_temps 1
0: mul r0.xy, cb0[0].xxxx, cb4[5].xyxx
1: mad r0.xy, v0.xyxx, cb4[4].xyxx, r0.xyxx
2: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t0.xyzw, s0
4: mad_sat r0.x, r0.x, r0.y, cb4[2].x
5: mul r0.x, r0.x, v0.y
6: mul r0.x, r0.x, v1.w
7: mul r0.x, r0.x, cb4[1].x
8: mul r0.yz, v2.xxyx, cb0[1].zzwz
9: sample_l(texture2d)(float,float,float,float) r0.y, r0.yzyy, t15.yxzw, s15, l(0)
10: mad r0.y, r0.y, cb12[22].x, cb12[22].y
11: mad r0.y, r0.y, cb12[21].x, cb12[21].y
12: max r0.y, r0.y, l(0.000100)
13: div r0.y, l(1.000000, 1.000000, 1.000000, 1.000000), r0.y
15: mul_sat r0.y, r0.y, cb4[6].x
16: mul_sat r0.x, r0.y, r0.x
17: mad r0.y, cb0[7].y, r0.x, -r0.x
18: mad r0.x, cb4[7].x, r0.y, r0.x
19: mul r0.xyz, r0.xxxx, cb4[0].xyzx
20: log r0.xyz, r0.xyzx
21: mul r0.xyz, r0.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
22: exp r0.xyz, r0.xyzx
23: mul r0.xyz, r0.xyzx, cb2[2].xyzx
24: mul o0.xyz, r0.xyzx, cb2[2].wwww
25: mov o0.w, l(0)
26: ret
``````

Phew! Quite a lot of stuff, but actually it's not that bad.

So, what happens here? First, we calculate animated UVs using the elapsed time from a cbuffer (cb0[0].x) and some scale/offsets. These texcoords are used to sample the noise texture (line 2).

Once we have the noise value from the texture, we use it to interpolate between min/max values (usually 0 and 1).
Then we perform some multiplications, for example by the V texture coordinate (remember that V goes from 1 to 0?) in line 5.
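The steps above can be sketched on the CPU as follows. This is only an illustration: the noise sample is passed in as a plain number because there is no texture here, and names such as `intensityScale` (my label for cb4[1].x) are assumptions.

```cpp
#include <algorithm>
#include <cassert>

struct Float2 { float x, y; };

float Saturate( float v )
{
    return std::min( std::max( v, 0.0f ), 1.0f );
}

// Lines 0-1 of the pixel shader: scaled texcoords scrolled over time.
Float2 AnimatedUV( Float2 uv, Float2 uvScale, Float2 uvAnimation, float elapsedTime )
{
    return { uv.x * uvScale.x + elapsedTime * uvAnimation.x,
             uv.y * uvScale.y + elapsedTime * uvAnimation.y };
}

// Lines 4-7: remap the noise value, then fade it by V, the LOD parameter
// and a global intensity factor.
float IntensityMask( float noise, float minValue, float maxValue,
                     float v, float lodParam, float intensityScale )
{
    float intensity = Saturate( minValue + noise * ( maxValue - minValue ) );
    intensity *= v;              // 1 at the lower base, 0 at the upper one
    intensity *= lodParam;       // usually 1.0
    intensity *= intensityScale; // 1.0 in the tested frame
    return intensity;
}
```

Because of the multiplication by V, the mask is strongest at the bottom of the cylinder and fades to zero at its top edge, so the shafts have no hard upper border.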

This way we calculated "intensity mask" - it looks like this:

Notice that distant objects (lighthouse, mountains...) are gone. This happens because the cylinder passes the depth test: it is not on the far plane and is drawn in front of those objects:
 depth test
We want to pretend that the rain shafts are farther away (not necessarily on the far plane, though). To achieve that, we compute another mask, a "far objects mask".

It is computed with the following formula:
farObjectsMask = saturate( (FrustumDepth - CylinderWorldSpaceHeight) * 0.001 );

(0.001 comes from cbuffer)
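As a quick numeric illustration of that formula, here is a C++ sketch. The function and parameter names are made up, and the inputs are arbitrary example values, not numbers captured from the frame.

```cpp
#include <algorithm>
#include <cassert>

// farObjectsMask = saturate( (frustumDepth - cylinderValue) * depthScale )
float FarObjectsMask( float frustumDepth, float cylinderValue, float depthScale )
{
    const float v = ( frustumDepth - cylinderValue ) * depthScale;
    return std::min( std::max( v, 0.0f ), 1.0f );
}
```

With `depthScale` = 0.001 the mask ramps linearly over 1000 units: geometry whose frustum depth is at or below the cylinder value gets 0 (no shafts drawn over it), and anything 1000 or more units beyond it gets the full effect.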

(I explained a bit about how frustum depth is extracted from the depth buffer in my post about sharpen.)

Personally, I think this could be done more cheaply, without calculating world-space height in the VS, by multiplying frustum depth by a smaller number, like 0.0004.

Then we multiply both masks, which yields the final one:

Having this final mask (line 16), we perform another interpolation which pretty much does nothing (at least in the tested scenario). Then we multiply the final mask by the shafts color (line 19), perform gamma correction (lines 20-22) and apply the final multiplications (lines 23-24).
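Two of these steps are easy to misread in the assembly, so here is a small C++ sketch of both. The function names are invented for this example. The first one shows why the interpolation is a no-op when both constants are 1.0; the second shows that the `log` / `mul` / `exp` triple is just `pow( x, 2.2 )` spelled out with base-2 logarithms.

```cpp
#include <cassert>
#include <cmath>

// Lines 17-18: lerp( mask, k * mask, t ) written as two mads.
float OptionalMaskScale( float mask, float k, float t )
{
    const float delta = k * mask - mask; // mad r0.y, cb0[7].y, r0.x, -r0.x
    return t * delta + mask;             // mad r0.x, cb4[7].x, r0.y, r0.x
}

// Lines 20-22: log2, multiply by the exponent, exp2.
float PowViaLogExp( float x, float exponent )
{
    return std::exp2( exponent * std::log2( x ) );
}
```

With `k` = 1 and `t` = 1, `OptionalMaskScale` returns the mask unchanged. `PowViaLogExp( 0.0f, 2.2f )` returns 0, because `log2( 0 )` is negative infinity and `exp2` of that is 0, so pixels outside the mask stay black and add nothing to the frame.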

At the end we return color with zero alpha. This is because blending is enabled in this pass:

FinalColor = SourceColor * 1.0 + (1.0 - SourceAlpha) * DestColor.

If you are a bit rusty on how blending works, here is a quick explanation:
SourceColor is the RGB output of the pixel shader, while DestColor is the current RGB color of the pixel in the render target. Because SourceAlpha is always 0.0, the aforementioned equation simplifies to:

FinalColor = SourceColor + DestColor.

Simply speaking, we perform additive blending here. If this pixel shader returns (0, 0, 0) the color will remain the same.
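The blend equation is easy to try out on the CPU. The sketch below is plain C++ with made-up names; it assumes the equation quoted above, which, if I read it correctly, corresponds to a source factor of `D3D11_BLEND_ONE` and a destination factor of `D3D11_BLEND_INV_SRC_ALPHA`.

```cpp
#include <cassert>

struct Color3 { float r, g, b; };

// final = src * 1 + dst * (1 - srcAlpha)
Color3 Blend( Color3 src, float srcAlpha, Color3 dst )
{
    const float dstFactor = 1.0f - srcAlpha;
    return { src.r + dstFactor * dst.r,
             src.g + dstFactor * dst.g,
             src.b + dstFactor * dst.b };
}
```

With `srcAlpha` = 0 the destination factor is 1, so the result is a plain sum of both colors, and a black source leaves the render target untouched.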

Here is the final HLSL. I think after this description it will be much easier to follow:
``````
struct VS_OUTPUT
{
float3 TexcoordAndWorldspaceHeight : Texcoord0;
float4 LODParams : LODParams;    // float4(1,1,1,1)
float4 PositionH : SV_Position;
};

float getFrustumDepth( in float depth )
{
// from [1-0] to [0-1]
float d = depth * cb12_v22.x + cb12_v22.y;

// special coefficients
d = d * cb12_v21.x + cb12_v21.y;

// return frustum depth
return 1.0 / max(d, 1e-4);
}

float4 EditedShaderPS( in VS_OUTPUT Input ) : SV_Target0
{
// * Input from Vertex Shader
float2 InputUV = Input.TexcoordAndWorldspaceHeight.xy;
float WorldHeight = Input.TexcoordAndWorldspaceHeight.z;
float LODParam = Input.LODParams.w;

// * Inputs
float elapsedTime = cb0_v0.x;
float2 uvAnimation = cb4_v5.xy;
float2 uvScale = cb4_v4.xy;
float minValue = cb4_v2.x; // 0.0
float maxValue = cb4_v3.x; // 1.0
float3 shaftsColor = cb4_v0.rgb;  // RGB( 147, 162, 173 )

float3 finalColorFilter = cb2_v2.rgb; // float3( 1.175, 1.296, 1.342 );
float finalEffectIntensity = cb2_v2.w;

float2 invViewportSize = cb0_v1.zw;

float depthScale = cb4_v6.x;  // 0.001

// sample noise
float2 uvOffsets = elapsedTime * uvAnimation;
float2 uv = InputUV * uvScale + uvOffsets;
float disturb = texture0.Sample( sampler0, uv ).x;

float intensity = saturate( lerp(minValue, maxValue, disturb) );
intensity *= InputUV.y;   // transition from (0, 1)
intensity *= LODParam;   // usually 1.0
intensity *= cb4_v1.x;   // 1.0

// Sample depth
float2 ScreenUV = Input.PositionH.xy * invViewportSize;
float hardwareDepth = texture15.SampleLevel( sampler15, ScreenUV, 0 ).x;
float frustumDepth = getFrustumDepth( hardwareDepth );

// * Calculate mask covering distant objects behind cylinder.

// Seems that the input really is world-space height (.z component, see vertex shader)
float depth = frustumDepth - WorldHeight;
float distantObjectsMask = saturate( depth * depthScale );

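// Combine both masks (line 16 of the assembly)
float finalEffectMask = saturate( intensity * distantObjectsMask );

// cb0_v7.y and cb4_v7.x are set to 1.0 so I didn't bother with naming them :)
float paramX = finalEffectMask;
float paramY = cb0_v7.y * finalEffectMask;
float effectAmount = lerp(paramX, paramY, cb4_v7.x);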

// color of shafts comes from constant buffer
float3 effectColor = effectAmount * shaftsColor;

// gamma correction
effectColor = pow(effectColor, 2.2);

// final multiplications
effectColor *= finalColorFilter;
effectColor *= finalEffectIntensity;

// return with zero alpha 'cause the blending used here is:
// SourceColor * 1.0 + (1.0 - SrcAlpha) * DestColor
return float4( effectColor, 0.0 );
}
``````

I'm happy to say that my PS produces the same assembly as the original ;)

I hope you enjoyed it.