Sunday, 20 January 2019

Reverse engineering the rendering of The Witcher 3, part 10 - distant rain shafts

Welcome to the 10th part of the series! Woohooo! :)

See previous parts here.

This time we will take a look at a really cool atmospheric effect - distant rain/light shafts near the horizon. The easiest way to encounter them in the game is to visit the Skellige Islands.



Personally I really love such atmospheric phenomena and I was really curious how the graphics programmers at CD Projekt Red implemented this one. Let's find out!

Here are two screenshots before and after applying rain shafts:

Before rain shafts

After rain shafts

Geometry

Our first stop is geometry. The idea is to use a small cylinder:
Cylinder in local space
In local space it's pretty small - every position component is in the [0.0 - 1.0] range.

The input layout for this draw call looks like this...

What is important to us here: Texcoords and Instance_Transform.

Texcoords are wrapped pretty simply: U of both the upper and the lower base is in the [0.02777 - 1.02734] range. V is equal to 1.0 on the lower base and 0.0 on the upper one. As you can see, it would be pretty simple to generate this mesh procedurally.
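To illustrate the "procedural" remark, here is a minimal C++ sketch of such a generator. It is not the game's code: the `ShaftVertex` layout, the `BuildShaftCylinder` name, the `segments` parameter and the choice of Z as the cylinder axis are all my assumptions. Only the position range, the U range and the V values come from the capture described above.

```cpp
#include <cmath>
#include <vector>

// Hypothetical vertex layout: local-space position plus texcoords.
struct ShaftVertex {
    float x, y, z;  // local-space position, each component in [0, 1]
    float u, v;     // texture coordinates
};

// Builds the side wall of a unit cylinder as two rings of vertices.
// 'segments' is an illustrative tessellation parameter, not a value from the game.
std::vector<ShaftVertex> BuildShaftCylinder(int segments) {
    const float kTwoPi = 6.28318530718f;
    const float uStart = 0.02777f;  // U range quoted in the article
    const float uEnd   = 1.02734f;

    std::vector<ShaftVertex> vertices;
    vertices.reserve(2 * (segments + 1));
    for (int i = 0; i <= segments; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(segments);
        const float angle = t * kTwoPi;
        // Circle of radius 0.5 centered at (0.5, 0.5) keeps X and Y in [0, 1].
        const float x = 0.5f + 0.5f * std::cos(angle);
        const float y = 0.5f + 0.5f * std::sin(angle);
        const float u = uStart + t * (uEnd - uStart);
        vertices.push_back({x, y, 0.0f, u, 1.0f});  // lower base: V = 1
        vertices.push_back({x, y, 1.0f, u, 0.0f});  // upper base: V = 0
    }
    return vertices;
}
```

Calling `BuildShaftCylinder(36)` would give 74 vertices that can be indexed as a triangle strip; the first and last column share a position but differ in U, which lets the texture wrap around the cylinder.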

As we have this small cylinder in local space, we multiply it by the world matrix, which is provided by the per-instance INSTANCE_TRANSFORM input elements. Let's check the values of this matrix:



It looks quite intimidating, doesn't it? Don't worry, let's do some decomposing and see what this matrix hides!
 // The three float4 rows of INSTANCE_TRANSFORM plus (0, 0, 0, 1)
 XMMATRIX mat( -227.7472,  159.8043,  374.0736, -116.4951,
               -194.7577, -173.3836, -494.4982,  238.6908,
               -14.16466, -185.4743,  784.564,   -1.45565,
                0.0, 0.0, 0.0, 1.0 );

 // DirectXMath expects the translation in the last row, so transpose first
 mat = XMMatrixTranspose( mat );

 XMVECTOR vScale;
 XMVECTOR vRotateQuat;
 XMVECTOR vTranslation;
 XMMatrixDecompose( &vScale, &vRotateQuat, &vTranslation, mat );

 // Rotation matrix...
 XMMATRIX matRotate = XMMatrixRotationQuaternion( vRotateQuat );

Results are really interesting:

vRotateQuat: (0.0924987569, -0.314900011, 0.883411944, -0.334462732)
vScale: (299.999969, 300.000000, 1000.00012)
vTranslation: (-116.495102, 238.690796, -1.45564997)
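The scale and translation can also be cross-checked by hand, without DirectXMath. The sketch below is only a sanity check under the assumption that the matrix is a pure rotation-scale-translation with no shear: the length of each column of the upper-left 3x3 block is then the scale along that local axis, and the fourth column is the translation. The helper names (`ColumnLength`, `ExtractScale`, `ExtractTranslation`) are mine.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// The three float4 rows of the per-instance transform, exactly as captured.
const double kInstanceRows[3][4] = {
    { -227.7472,  159.8043,  374.0736, -116.4951 },
    { -194.7577, -173.3836, -494.4982,  238.6908 },
    { -14.16466, -185.4743,  784.564,   -1.45565 },
};

// Length of column c of the upper-left 3x3 block = scale along local axis c
// (valid only when the matrix has no shear, which is assumed here).
double ColumnLength(int c) {
    return std::sqrt(kInstanceRows[0][c] * kInstanceRows[0][c] +
                     kInstanceRows[1][c] * kInstanceRows[1][c] +
                     kInstanceRows[2][c] * kInstanceRows[2][c]);
}

Vec3 ExtractScale() {
    return { ColumnLength(0), ColumnLength(1), ColumnLength(2) };
}

// The fourth column holds the world-space translation.
Vec3 ExtractTranslation() {
    return { kInstanceRows[0][3], kInstanceRows[1][3], kInstanceRows[2][3] };
}
```

The column lengths come out at roughly 300, 300 and 1000, which agrees with vScale above.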

It's important to know the camera position in this particular frame: ( -116.5338, 234.8695, 2.09 )

As you can see, we scale the cylinder to make it pretty big in world space (in TW3 the Z axis points up), translate it with respect to the camera position and rotate it.
Here is the cylinder after vertex shader transform:

Cylinder after transforming by vertex shader. See how it's placed relative to view frustum


Vertex Shader

Input geometry and the vertex shader are strictly dependent on each other.
Let's take a closer look at the vertex shader assembly:

 vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb1[7], immediateIndexed  
    dcl_constantbuffer cb2[6], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xy  
    dcl_input v4.xyzw  
    dcl_input v5.xyzw  
    dcl_input v6.xyzw  
    dcl_input v7.xyzw  
    dcl_output o0.xyz  
    dcl_output o1.xyzw  
    dcl_output_siv o2.xyzw, position  
    dcl_temps 2  
   0: mov o0.xy, v1.xyxx  
   1: mul r0.xyzw, v5.xyzw, cb1[6].yyyy  
   2: mad r0.xyzw, v4.xyzw, cb1[6].xxxx, r0.xyzw  
   3: mad r0.xyzw, v6.xyzw, cb1[6].zzzz, r0.xyzw  
   4: mad r0.xyzw, cb1[6].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
   5: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx  
   6: mov r1.w, l(1.000000)  
   7: dp4 o0.z, r1.xyzw, r0.xyzw  
   8: mov o1.xyzw, v7.xyzw  
   9: mul r0.xyzw, v5.xyzw, cb1[0].yyyy  
  10: mad r0.xyzw, v4.xyzw, cb1[0].xxxx, r0.xyzw  
  11: mad r0.xyzw, v6.xyzw, cb1[0].zzzz, r0.xyzw  
  12: mad r0.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  13: dp4 o2.x, r1.xyzw, r0.xyzw  
  14: mul r0.xyzw, v5.xyzw, cb1[1].yyyy  
  15: mad r0.xyzw, v4.xyzw, cb1[1].xxxx, r0.xyzw  
  16: mad r0.xyzw, v6.xyzw, cb1[1].zzzz, r0.xyzw  
  17: mad r0.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  18: dp4 o2.y, r1.xyzw, r0.xyzw  
  19: mul r0.xyzw, v5.xyzw, cb1[2].yyyy  
  20: mad r0.xyzw, v4.xyzw, cb1[2].xxxx, r0.xyzw  
  21: mad r0.xyzw, v6.xyzw, cb1[2].zzzz, r0.xyzw  
  22: mad r0.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  23: dp4 o2.z, r1.xyzw, r0.xyzw  
  24: mul r0.xyzw, v5.xyzw, cb1[3].yyyy  
  25: mad r0.xyzw, v4.xyzw, cb1[3].xxxx, r0.xyzw  
  26: mad r0.xyzw, v6.xyzw, cb1[3].zzzz, r0.xyzw  
  27: mad r0.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw  
  28: dp4 o2.w, r1.xyzw, r0.xyzw  
  29: ret  

Apart from simply passing through Texcoords (line 0) and Instance_LOD_Params (line 8), two more things are needed for output: SV_Position (obviously) and Height (the .z component) of the world position.

Remember that local space is in the [0-1] range? Well, the vertex shader uses a scale and bias to adjust the local position just before applying the world matrix. Smart!
In this case, we have scale = float3(4, 4, 2) and bias = float3(-2, -2, -1), which remaps the cylinder to [-2, 2] in X and Y and to [-1, 1] in Z.
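As a quick illustration of the scale and bias step (assembly line 5), here is a tiny C++ sketch. `Float3` and `ApplyScaleBias` are hypothetical names; the scale and bias values are the ones read from the constant buffer above.

```cpp
struct Float3 { float x, y, z; };

// Mirrors assembly line 5: mad r1.xyz, v0.xyz, scale, bias
Float3 ApplyScaleBias(Float3 p, Float3 scale, Float3 bias) {
    return { p.x * scale.x + bias.x,
             p.y * scale.y + bias.y,
             p.z * scale.z + bias.z };
}
```

With scale (4, 4, 2) and bias (-2, -2, -1), the corner (0, 0, 0) maps to (-2, -2, -1) and (1, 1, 1) maps to (2, 2, 1), so the cylinder ends up centered at the origin before the instance transform is applied.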

The pattern you can notice between lines 9 and 28 is the multiplication of two row-major matrices.
Just take a look at the final vertex shader in HLSL :)
 cbuffer cbPerFrame : register (b1)  
 {  
   row_major float4x4 g_viewProjMatrix;  
   row_major float4x4 g_rainShaftsViewProjMatrix;  
 }  
   
 cbuffer cbPerObject : register (b2)  
 {  
   float4x4 g_mtxWorld;  
   float4 g_modelScale;  
   float4 g_modelBias;  
 }  
   
 struct VS_INPUT  
 {  
   float3 PositionW : POSITION;  
   float2 Texcoord : TEXCOORD;  
   float3 NormalW : NORMAL;  
   float3 TangentW : TANGENT;  
   float4 InstanceTransform0 : INSTANCE_TRANSFORM0;  
   float4 InstanceTransform1 : INSTANCE_TRANSFORM1;  
   float4 InstanceTransform2 : INSTANCE_TRANSFORM2;  
   float4 InstanceLODParams  : INSTANCE_LOD_PARAMS;  
 };  
   
 struct VS_OUTPUT  
 {  
   float3 TexcoordAndZ : Texcoord0;  
   
   float4 LODParams : LODParams;  
   float4 PositionH : SV_Position;  
 };  
   
 VS_OUTPUT RainShaftsVS( VS_INPUT Input )  
 {  
   VS_OUTPUT Output = (VS_OUTPUT)0;  
   
   // simple data passing  
   Output.TexcoordAndZ.xy = Input.Texcoord;  
   Output.LODParams = Input.InstanceLODParams;  
   
   // world space  
   float3 meshScale = g_modelScale.xyz;  // float3( 4, 4, 2 );
   float3 meshBias =  g_modelBias.xyz;   // float3( -2, -2, -1 );
   float3 PositionL = Input.PositionW * meshScale + meshBias;  
   
   // Manually build instanceWorld matrix from float4s:  
   float4x4 matInstanceWorld = float4x4(Input.InstanceTransform0, Input.InstanceTransform1,  
   Input.InstanceTransform2 , float4(0, 0, 0, 1) );  
   
   // World-space Height (.z)  
   float4x4 matWorldInstanceLod = mul( g_rainShaftsViewProjMatrix, matInstanceWorld );  
   Output.TexcoordAndZ.z = mul( float4(PositionL, 1.0), transpose(matWorldInstanceLod) ).z;  
   
   // SV_Position  
   float4x4 matModelViewProjection = mul(g_viewProjMatrix, matInstanceWorld );  
   Output.PositionH = mul( float4(PositionL, 1.0), transpose(matModelViewProjection) );       
 
   return Output;  
 }   
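The mul/mad/mad/mad + dp4 pattern from the assembly can be reproduced on the CPU to see why it amounts to a matrix product. In the hedged C++ sketch below (`Float4`, `Dot4`, `CombinedRow` and `TwoStepTransform` are my own names), one row of viewProj * instanceWorld is built as a linear combination of the instance rows, exactly like assembly lines 9-12, and dotting it with the position gives the same result as transforming to world space first.

```cpp
#include <array>

using Float4 = std::array<float, 4>;

float Dot4(const Float4& a, const Float4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Mirrors assembly lines 9-12: one row of (viewProj * instanceWorld), built as
// vpRow.x * W0 + vpRow.y * W1 + vpRow.z * W2 + vpRow.w * (0, 0, 0, 1).
Float4 CombinedRow(const Float4& vpRow,
                   const Float4& w0, const Float4& w1, const Float4& w2) {
    const Float4 w3 = {0.0f, 0.0f, 0.0f, 1.0f};
    Float4 row{};
    for (int i = 0; i < 4; ++i) {
        row[i] = vpRow[0] * w0[i] + vpRow[1] * w1[i] +
                 vpRow[2] * w2[i] + vpRow[3] * w3[i];
    }
    return row;
}

// Reference path: transform the position to world space first,
// then take the dot product with the view-projection row.
float TwoStepTransform(const Float4& vpRow,
                       const Float4& w0, const Float4& w1, const Float4& w2,
                       const Float4& pos) {
    const Float4 world = { Dot4(w0, pos), Dot4(w1, pos), Dot4(w2, pos), pos[3] };
    return Dot4(vpRow, world);
}
```

`Dot4(CombinedRow(vpRow, w0, w1, w2), pos)` corresponds to one `dp4` in the shader (line 13 for o2.x) and matches `TwoStepTransform` for the same inputs, up to floating-point rounding.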


Comparison between my VS (left) and original one (right):


The differences do not affect the calculations ;) I injected my VS into the frame and everything is alright!


Pixel Shader

Finally....! To start, I'll show you the inputs:
There are two textures involved: a noise texture and the depth buffer:



Values from constant buffers:





And pixel shader assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[8], immediateIndexed  
    dcl_constantbuffer cb2[3], immediateIndexed  
    dcl_constantbuffer cb12[23], immediateIndexed  
    dcl_constantbuffer cb4[8], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s15, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t15  
    dcl_input_ps linear v0.xyz  
    dcl_input_ps linear v1.w  
    dcl_input_ps_siv v2.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 1  
   0: mul r0.xy, cb0[0].xxxx, cb4[5].xyxx  
   1: mad r0.xy, v0.xyxx, cb4[4].xyxx, r0.xyxx  
   2: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t0.xyzw, s0  
   3: add r0.y, -cb4[2].x, cb4[3].x  
   4: mad_sat r0.x, r0.x, r0.y, cb4[2].x  
   5: mul r0.x, r0.x, v0.y  
   6: mul r0.x, r0.x, v1.w  
   7: mul r0.x, r0.x, cb4[1].x  
   8: mul r0.yz, v2.xxyx, cb0[1].zzwz  
   9: sample_l(texture2d)(float,float,float,float) r0.y, r0.yzyy, t15.yxzw, s15, l(0)  
  10: mad r0.y, r0.y, cb12[22].x, cb12[22].y  
  11: mad r0.y, r0.y, cb12[21].x, cb12[21].y  
  12: max r0.y, r0.y, l(0.000100)  
  13: div r0.y, l(1.000000, 1.000000, 1.000000, 1.000000), r0.y  
  14: add r0.y, r0.y, -v0.z  
  15: mul_sat r0.y, r0.y, cb4[6].x  
  16: mul_sat r0.x, r0.y, r0.x  
  17: mad r0.y, cb0[7].y, r0.x, -r0.x  
  18: mad r0.x, cb4[7].x, r0.y, r0.x  
  19: mul r0.xyz, r0.xxxx, cb4[0].xyzx  
  20: log r0.xyz, r0.xyzx  
  21: mul r0.xyz, r0.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  22: exp r0.xyz, r0.xyzx  
  23: mul r0.xyz, r0.xyzx, cb2[2].xyzx  
  24: mul o0.xyz, r0.xyzx, cb2[2].wwww  
  25: mov o0.w, l(0)  
  26: ret  

Phew! Quite a lot of stuff, but actually it's not that bad.


So, what happens here? First we calculate animated UVs using the elapsed time from the cbuffer (cb0[0].x) and some scale/offsets. These texcoords are used to sample the noise texture (line 2).

Once we have the noise value from the texture, we interpolate between min/max values (usually 0 and 1).
Then we perform some multiplications, for example by the V texture coordinate (remember that V goes from 1 to 0?) - line 5.
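The steps above (assembly lines 0-7) can be sketched on the CPU like this. It is only an illustration: the noise value is passed in directly instead of being sampled from a texture, and the names `AnimatedUV` and `IntensityMask` are mine.

```cpp
#include <algorithm>

struct Float2 { float x, y; };

float Saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }
float Lerp(float a, float b, float t) { return a + t * (b - a); }

// Assembly lines 0-1: uv * uvScale + elapsedTime * uvAnimation
Float2 AnimatedUV(Float2 uv, Float2 uvScale, Float2 uvAnimation, float elapsedTime) {
    return { uv.x * uvScale.x + elapsedTime * uvAnimation.x,
             uv.y * uvScale.y + elapsedTime * uvAnimation.y };
}

// Assembly lines 3-7: remap the noise, then fade by V and two scalar factors.
float IntensityMask(float noise, float minValue, float maxValue,
                    float v, float lodParam, float cbIntensity) {
    const float intensity = Saturate(Lerp(minValue, maxValue, noise));
    return intensity * v * lodParam * cbIntensity;
}
```

Because of the multiplication by V, the mask fades to zero towards the upper base of the cylinder (V = 0) and keeps the full noise value at the lower base (V = 1).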

This way we have calculated the "intensity mask" - it looks like this:



Notice that distant objects (lighthouse, mountains...) are gone. This happens because the cylinder passes the depth test - the cylinder is not on the far plane and is drawn in front of the aforementioned objects:
depth test
We want to mimic that the rain shafts are further away (not necessarily on the far plane, though). To achieve that, we compute another mask, the "far objects mask".

We compute it with the following formula:
farObjectsMask = saturate( (FrustumDepth - CylinderWorldSpaceHeight) * 0.001 );

(0.001 comes from cbuffer)

which gives us desired mask:


(I explained a bit how frustum depth is extracted from the depth buffer in my post about the sharpen effect.)

Personally, I think this could be done more cheaply, without calculating the world-space height in the VS, by multiplying the frustum depth by a smaller number, like 0.0004.


Then we multiply both masks, which yields the final one:


With this final mask (line 16) we perform another interpolation which pretty much does nothing (at least in the tested scenario), then we multiply the final mask by the shafts color (line 19), perform gamma correction (lines 20-22) and perform the final multiplications (lines 23-24).
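Two details of this step are easy to verify on the CPU. The sketch below (helper names are mine) shows that the log / mul / exp sequence from lines 20-22 is just pow(x, 2.2), since the D3D assembly `log` and `exp` instructions are base-2, and that the interpolation from lines 17-18 returns the mask unchanged when both constants are 1.0.

```cpp
#include <cmath>

// Assembly lines 20-22: log2(x), multiply by the exponent, exp2.
float PowViaLogExp(float x, float exponent) {
    return std::exp2(exponent * std::log2(x));
}

float Lerp(float a, float b, float t) { return a + t * (b - a); }

// Assembly lines 17-18: lerp( mask, cb0[7].y * mask, cb4[7].x )
float EffectAmount(float mask, float cb0_7y, float cb4_7x) {
    return Lerp(mask, cb0_7y * mask, cb4_7x);
}
```

With cb0[7].y = 1.0 both ends of the lerp are the same value, so `EffectAmount(mask, 1.0f, 1.0f)` is simply `mask`, whatever cb4[7].x is.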

At the end we return the color with zero alpha. This is because blending is enabled in this pass:

FinalColor = SourceColor * 1.0 + (1.0 - SourceAlpha) * DestColor.

If you are a bit rusty on how blending works, here is a quick explanation:
SourceColor is the RGB output from the pixel shader while DestColor is the current RGB color of the pixel in the render target. Because SourceAlpha is always 0.0, the aforementioned equation simplifies to:

FinalColor = SourceColor + DestColor.

Simply speaking, we perform additive blending here. If this pixel shader returns (0, 0, 0), the color will remain the same.
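For completeness, the blend equation can be written out as a few lines of C++. This only models the arithmetic of the equation above; the real work is done by the fixed-function output merger, and `Color` and `Blend` are illustrative names.

```cpp
struct Color { float r, g, b; };

// FinalColor = SourceColor * 1.0 + (1.0 - SourceAlpha) * DestColor
Color Blend(Color src, float srcAlpha, Color dst) {
    const float dstFactor = 1.0f - srcAlpha;
    return { src.r + dstFactor * dst.r,
             src.g + dstFactor * dst.g,
             src.b + dstFactor * dst.b };
}
```

With srcAlpha = 0 the destination factor is 1, so the result is src + dst, and a black source leaves the destination untouched.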

Here is the final HLSL - I think after this description it will be much easier to follow:
 struct VS_OUTPUT  
 {  
   float3 TexcoordAndWorldspaceHeight : Texcoord0;  
   float4 LODParams : LODParams;    // float4(1,1,1,1)  
   float4 PositionH : SV_Position;  
 };  
   
 float getFrustumDepth( in float depth )  
 {  
   // from [1-0] to [0-1]  
   float d = depth * cb12_v22.x + cb12_v22.y;  
   
   // special coefficients  
   d = d * cb12_v21.x + cb12_v21.y;  
   
   // return frustum depth  
   return 1.0 / max(d, 1e-4);  
 }  
   
 float4 EditedShaderPS( in VS_OUTPUT Input ) : SV_Target0  
 {  
   // * Input from Vertex Shader  
   float2 InputUV = Input.TexcoordAndWorldspaceHeight.xy;  
   float WorldHeight = Input.TexcoordAndWorldspaceHeight.z;  
   float LODParam = Input.LODParams.w;  
   
   // * Inputs  
   float elapsedTime = cb0_v0.x;  
   float2 uvAnimation = cb4_v5.xy;  
   float2 uvScale = cb4_v4.xy;    
   float minValue = cb4_v2.x; // 0.0  
   float maxValue = cb4_v3.x; // 1.0  
   float3 shaftsColor = cb4_v0.rgb;  // RGB( 147, 162, 173 )  
   
   float3 finalColorFilter = cb2_v2.rgb; // float3( 1.175, 1.296, 1.342 );  
   float finalEffectIntensity = cb2_v2.w;  
   
   float2 invViewportSize = cb0_v1.zw;  
   
   float depthScale = cb4_v6.x;  // 0.001  
   
   // sample noise  
   float2 uvOffsets = elapsedTime * uvAnimation;  
   float2 uv = InputUV * uvScale + uvOffsets;    
   float disturb = texture0.Sample( sampler0, uv ).x;  
   
   // * Intensity mask  
   float intensity = saturate( lerp(minValue, maxValue, disturb) );  
   intensity *= InputUV.y;   // transition from (0, 1)  
   intensity *= LODParam;   // usually 1.0  
   intensity *= cb4_v1.x;   // 1.0    
   
   // Sample depth  
   float2 ScreenUV = Input.PositionH.xy * invViewportSize;  
   float hardwareDepth = texture15.SampleLevel( sampler15, ScreenUV, 0 ).x;  
   float frustumDepth = getFrustumDepth( hardwareDepth );  
   
   
   // * Calculate mask covering distant objects behind cylinder.  
   
   // Seems that the input really is world-space height (.z component, see vertex shader)  
   float depth = frustumDepth - WorldHeight;  
   float distantObjectsMask = saturate( depth * depthScale );  
   
   // * calculate final mask  
   float finalEffectMask = saturate( intensity * distantObjectsMask );  
   
   // cb0_v7.y and cb4_v7.x are set to 1.0 so I didn't bother with naming them :)  
   float paramX = finalEffectMask;  
   float paramY = cb0_v7.y * finalEffectMask;  
   float effectAmount = lerp(paramX, paramY, cb4_v7.x);  
   
   // color of shafts comes from constant buffer  
   float3 effectColor = effectAmount * shaftsColor;  
   
   // gamma correction  
   effectColor = pow(effectColor, 2.2);  
   
   // final multiplications  
   effectColor *= finalColorFilter;  
   effectColor *= finalEffectIntensity;  
   
   // return with zero alpha 'cause the blending used here is:  
   // SourceColor * 1.0 + (1.0 - SrcAlpha) * DestColor  
   return float4( effectColor, 0.0 );  
 }   

I'm happy to say that my PS produces the same assembly as the original one ;)

I hope you enjoyed it.
Thanks for reading! :)

M.
