Saturday, 11 May 2019

Reverse engineering the rendering of The Witcher 3, part 14 - cirrus clouds

When it comes to outdoor scenes, the sky is one of the aspects that decide whether a game world is believable. Think about it for a while - the sky literally takes up, let's say, 40-50% of the whole screen most of the time. The sky is a lot more than just a nice gradient. We have stars, the Sun, the Moon and, finally, clouds.

While the current trend is apparently to render clouds volumetrically using raymarching (see this one), clouds in The Witcher 3 are completely texture-based. I have been looking at them for some time already but, obviously, things are more complicated than I initially expected. If you have been following the series, you know that there are differences between the "Blood and Wine" addon and the rest of the game. And guess what - there are some changes in terms of clouds in B&W too.

There are a few layers of clouds in The Witcher 3. Depending on the current weather, we can have only cirrus clouds, altocumulus, maybe a few from the stratus family (during a storm, for instance). Or, what the heck, we can have nothing at all.

Some layers vary in terms of input textures and shaders used to render them. This affects their complexity and length of pixel shader assembly (obviously).

Despite all this diversity, there are some common patterns we can observe in the cloud rendering of The Witcher 3. First of all, all layers are rendered in the forward pass, which is absolutely the right choice. All of them use blending (see below). This way it's much easier to control how much a particular layer covers the sky - the alpha value from the pixel shader affects it.
What's more interesting, some layers are rendered twice with the same settings.

After an evaluation I picked the shortest shader I could find, in order to (1) have the best chance of reverse engineering it completely and (2) be able to understand every aspect of it.
I'll take a closer look at cirrus clouds from The Witcher 3: Blood and Wine.

Here is an example frame:
Before rendering
After first rendering pass
After second rendering pass

In this particular frame cirrus clouds are the first layer being rendered. As you can see, it's rendered twice which increases its intensity.

Geometry and vertex shader

Before the pixel shader part, a short paragraph about the geometry and vertex shader used. The mesh that represents the clouds is similar to a typical skydome:

All vertices are contained in the [0-1] range, so in order to center the mesh around the (0,0,0) point, scale+bias is applied before the worldViewProj transform (we already know this pattern from previous parts of this series). For clouds the mesh is heavily stretched along the XY plane (Z is up) to cover more than the view frustum. The result is as follows:

Apart from that, the mesh has normal and tangent vectors. The vertex shader also calculates the bitangent vector with a cross product - all three are output in normalized form. Moreover, there is a per-vertex fog calculation (color and intensity).
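
The two vertex shader steps described above can be sketched on the CPU in Python. This is only an illustration: the scale and bias values are hypothetical (the real ones live in a constant buffer not shown here), and the operand order of the cross product is an assumption.

```python
def scale_bias(position, scale, bias):
    # position * scale + bias, applied per component before worldViewProj
    return tuple(p * s + b for p, s, b in zip(position, scale, bias))

def cross(a, b):
    # Standard 3D cross product
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Hypothetical scale/bias: any pair with bias = -scale / 2 centers [0, 1] around 0.
scale = (2.0, 2.0, 2.0)
bias = (-1.0, -1.0, -1.0)
center = scale_bias((0.5, 0.5, 0.5), scale, bias)

# Assumed operand order (normal x tangent); the shader may use the opposite one.
bitangent = cross((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
```

With a larger scale on X and Y than on Z, the same formula would also produce the stretching along the XY plane.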

Pixel Shader


The pixel shader assembly is as follows:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[10], immediateIndexed  
    dcl_constantbuffer cb1[9], immediateIndexed  
    dcl_constantbuffer cb12[238], immediateIndexed  
    dcl_constantbuffer cb4[13], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps linear v0.xyzw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.w  
    dcl_input_ps linear v3.xyzw  
    dcl_input_ps linear v4.xyz  
    dcl_input_ps linear v5.xyz  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)  
   1: dp3 r0.w, r0.xyzx, r0.xyzx  
   2: rsq r0.w, r0.w  
   3: mul r0.xyz, r0.wwww, r0.xyzx  
   4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx  
   5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx  
   6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0  
   7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
   8: add r1.xyz, r1.xyzx, r1.xyzx  
   9: dp3 r0.w, r1.xyzx, r1.xyzx  
  10: rsq r0.w, r0.w  
  11: mul r1.xyz, r0.wwww, r1.xyzx  
  12: mul r2.xyz, r1.yyyy, v3.xyzx  
  13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx  
  14: mov r3.xy, v1.zwzz  
  15: mov r3.z, v3.w  
  16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx  
  17: dp3_sat r0.x, r0.xyzx, r1.xyzx  
  18: add r0.y, -cb4[2].x, cb4[3].x  
  19: mad r0.x, r0.x, r0.y, cb4[2].x  
  20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx  
  21: rsq r0.y, r0.y  
  22: mul r0.yz, r0.yyyy, -cb0[9].xxyx  
  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  28: mul r1.xy, r1.zzzz, r1.xyxx  
  29: dp2_sat r0.y, r0.yzyy, r1.xyxx  
  30: add r0.y, r0.y, r0.y  
  31: min r0.y, r0.y, l(1.000000)  
  32: add r0.z, -cb4[0].x, cb4[1].x  
  33: mad r0.z, r0.y, r0.z, cb4[0].x  
  34: mul r0.x, r0.x, r0.z  
  35: log r0.x, r0.x  
  36: mul r0.x, r0.x, l(2.200000)  
  37: exp r0.x, r0.x  
  38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx  
  39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx  
  40: mul r2.xyz, r0.xxxx, r1.xyzx  
  41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx  
  42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx  
  43: add r1.x, -cb4[7].x, cb4[8].x  
  44: div_sat r0.w, r0.w, r1.x  
  45: mul r1.x, r1.w, cb4[9].x  
  46: mad r1.y, -cb4[9].x, r1.w, r1.w  
  47: mad r0.w, r0.w, r1.y, r1.x  
  48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx  
  49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0  
  51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x  
  52: mad_sat r1.x, cb4[12].x, v2.w, r1.x  
  53: mul r0.w, r0.w, r1.x  
  54: mul_sat r0.w, r0.w, cb4[6].x  
  55: mul o0.xyz, r0.wwww, r0.xyzx  
  56: mov o0.w, r0.w  
  57: ret  

In terms of input, there are two tiled textures. One of them contains normal map (xyz channels) and cloud shape (a channel). The second one is noise for shape perturbations.

Normal map (c) CD Projekt Red
Cloud shape (c) CD Projekt Red
Noise texture (c) CD Projekt Red

The main constant buffer with clouds parameters is cb4. For this frame their values are:
Apart from that, there are other values used from other cbuffers. Don't worry, we'll get through these ones as well.

Z-Inverted sunlight direction

The first thing which happens in the shader is calculation of normalized, Z-inverted direction of sunlight:
   0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)  
   1: dp3 r0.w, r0.xyzx, r0.xyzx  
   2: rsq r0.w, r0.w  
   3: mul r0.xyz, r0.wwww, r0.xyzx  

   float3 invertedSunlightDir = normalize(lightDir * float3(1, 1, -1) );

As I mentioned earlier, Z is the up axis, while cb0[9] is the sunlight direction. This vector points towards the Sun - this is important! You can check this by writing a simple compute shader which performs a simple NdotL and injecting it into the deferred shading pass :)
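
For checking the math outside the GPU, here is a minimal Python sketch of assembly lines 0-3. The input direction is a made-up value, not one captured from the game.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def inverted_sunlight_dir(light_dir):
    # Mirrors assembly lines 0-3: flip the Z component, then normalize.
    x, y, z = light_dir
    return normalize((x, y, -z))

# Hypothetical light direction, used only for illustration.
sun = inverted_sunlight_dir((0.3, 0.4, -0.5))
```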

Sampling the clouds texture

The next step is to calculate texcoords to sample "clouds" texture, sample it, unpack normal vector and normalize it.

   4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx   
   5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx   
   6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0   
   7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)   
   8: add r1.xyz, r1.xyzx, r1.xyzx   
   9: dp3 r0.w, r1.xyzx, r1.xyzx   
  10: rsq r0.w, r0.w   
   
   
   // Calc sampling coords  
   float2 cloudTextureUV = Texcoords * textureScale + elapsedTime * speedFactors;  
   
   // Sample texture and get data from it  
   float4 cloudTextureValue = texture0.Sample( sampler0, cloudTextureUV ).rgba;  
   float3 normalMap = cloudTextureValue.xyz;  
   float cloudShape = cloudTextureValue.a;  
   
   // Unpack normal and normalize it  
   float3 unpackedNormal = (normalMap - 0.5) * 2.0;  
   unpackedNormal = normalize(unpackedNormal);  

Let's go through this slowly.
In order to have motion of the clouds, we need the elapsed time in seconds ( cb0[0].x ) multiplied by a speed factor which affects how fast clouds move across the sky ( cb4[5].xy ).
UVs are stretched over the skydome geometry I mentioned before; we also need texture scaling coefficients which affect the size of clouds ( cb4[4].xy ).

Final formula is:
samplingUV = Input.TextureUV * textureScale + time * speedMultiplier;

After sampling of all 4 channels we have normal map (rgb channels) and cloud shape (a channel).
To unpack normal map from [0; 1] to [-1; 1] range, we use the following formula:

unpackedNormal = (packedNormal - 0.5) * 2.0;

We could also use this one:
unpackedNormal = packedNormal * 2.0 - 1.0;

And finally we normalize the unpacked normal vector.
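
The UV formula and the normal unpacking are easy to verify on the CPU. Below is a small Python sketch of both. The scale, speed and time values are invented for the example and are not the ones from the captured frame.

```python
import math

def cloud_uv(texcoord, texture_scale, elapsed_time, speed):
    # samplingUV = uv * textureScale + time * speedMultiplier (assembly lines 4-5)
    return tuple(t * s + elapsed_time * v
                 for t, s, v in zip(texcoord, texture_scale, speed))

def unpack_normal(packed):
    # Remap [0, 1] to [-1, 1], then renormalize (assembly lines 7-11)
    n = [(c - 0.5) * 2.0 for c in packed]
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

# Hypothetical inputs: scale 4x, scrolling 0.01 UV units per second along U.
uv = cloud_uv((0.25, 0.5), (4.0, 4.0), 10.0, (0.01, 0.0))

# A "flat" normal map texel (0.5, 0.5, 1.0) unpacks to straight up in tangent space.
flat = unpack_normal((0.5, 0.5, 1.0))
```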

Normal mapping

Having normal, tangent and bitangent vectors from the vertex shader and the normal vector from the normal map, we perform normal mapping the usual way.

  11: mul r1.xyz, r0.wwww, r1.xyzx  
  12: mul r2.xyz, r1.yyyy, v3.xyzx  
  13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx  
  14: mov r3.xy, v1.zwzz  
  15: mov r3.z, v3.w  
  16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx  
    
   // Perform bump mapping  
   float3 SkyTangent = Input.Tangent;  
   float3 SkyNormal = (float3( Input.Texcoords.zw, Input.param3.w ));  
   float3 SkyBitangent = Input.param3.xyz;  
        
   float3x3 TBN = float3x3(SkyTangent, SkyBitangent, SkyNormal);  
   float3 finalNormal = (float3)mul( unpackedNormal, (TBN) );  
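
Written out component by component, lines 12-16 compute T * n.x + B * n.y + N * n.z. A short Python sketch of that product follows. The basis vectors are hypothetical and chosen only so the result is easy to read.

```python
def apply_tbn(n, tangent, bitangent, normal):
    # finalNormal = T * n.x + B * n.y + N * n.z (assembly lines 12-16)
    return tuple(t * n[0] + b * n[1] + nn * n[2]
                 for t, b, nn in zip(tangent, bitangent, normal))

# Hypothetical basis: tangent space rotated 90 degrees around the Z (up) axis.
tangent = (0.0, 1.0, 0.0)
bitangent = (-1.0, 0.0, 0.0)
normal = (0.0, 0.0, 1.0)

# A flat normal map texel keeps the vertex normal unchanged.
unperturbed = apply_tbn((0.0, 0.0, 1.0), tangent, bitangent, normal)

# A texel pointing along tangent-space X ends up along the tangent.
tilted = apply_tbn((1.0, 0.0, 0.0), tangent, bitangent, normal)
```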

Highlight intensity (1)

The next step involves calculating NdotL, which affects the amount of highlighting at a particular pixel.
Consider the following piece of assembly:
  17: dp3_sat r0.x, r0.xyzx, r1.xyzx  
  18: add r0.y, -cb4[2].x, cb4[3].x  
  19: mad r0.x, r0.x, r0.y, cb4[2].x  

Here is the visualization of NdotL in considered frame:

This (saturated) dot product is used to interpolate between a minimum and a maximum intensity ( cb4[2].x and cb4[3].x, called param1Min and param1Max in the snippet below).
This way parts of clouds which are more exposed to sunlight will be brighter.
   // Calculate cosine between normal and up-inv lightdir  
   float NdotL = saturate( dot(invertedSunlightDir, finalNormal) );  
   
   // Param 1, line 19, r0.x  
   float intensity1 = lerp( param1Min, param1Max, NdotL );  
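
A quick numeric sketch in Python shows how the first intensity behaves. The min/max values of 0.2 and 1.0 are made up for the example, since the actual cbuffer values for this frame are not listed in the text.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def lerp(a, b, t):
    return a + (b - a) * t

def intensity1(n_dot_l, param1_min, param1_max):
    # Lines 17-19: saturate the dot product, then lerp between min and max.
    return lerp(param1_min, param1_max, saturate(n_dot_l))

# Hypothetical parameters, not values read from cb4.
facing_sun = intensity1(1.0, 0.2, 1.0)
half_lit = intensity1(0.5, 0.2, 1.0)
facing_away = intensity1(-0.3, 0.2, 1.0)
```

Normals facing away from the light never go below the minimum, so the back side of a cloud keeps some brightness instead of turning black.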

Highlight intensity (2)

There is one more factor which affects the intensity of clouds.
Clouds in the section of the sky where the Sun is present should be more highlighted. To do that, we calculate a gradient based on the XY plane.
This gradient is used to lerp between min/max values, in a similar fashion to part (1).
So in theory we could request darkening of clouds which are on the opposite side from the Sun, but in this particular frame it does not happen because param2Min and param2Max ( cb4[0].x and cb4[1].x, respectively) are both set to 1.0f.

  20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx  
  21: rsq r0.y, r0.y  
  22: mul r0.yz, r0.yyyy, -cb0[9].xxyx  
  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  28: mul r1.xy, r1.zzzz, r1.xyxx  
  29: dp2_sat r0.y, r0.yzyy, r1.xyxx  
  30: add r0.y, r0.y, r0.y  
  31: min r0.y, r0.y, l(1.000000)  
  32: add r0.z, -cb4[0].x, cb4[1].x  
  33: mad r0.z, r0.y, r0.z, cb4[0].x  
  34: mul r0.x, r0.x, r0.z  
  35: log r0.x, r0.x  
  36: mul r0.x, r0.x, l(2.200000)  
  37: exp r0.x, r0.x   
   
   
   // Calculate normalized -lightDir.xy (20-22)  
   float2 lightDirXY = normalize( -lightDir.xy );  
   
   // Calculate world to camera  
   float3 vWorldToCamera = ( CameraPos - WorldPos );  
   float worldToCamera_distance = length(vWorldToCamera);  
        
   // normalize vector  
   vWorldToCamera = normalize( vWorldToCamera );  
        
   
   float LdotV = saturate( dot(lightDirXY, vWorldToCamera.xy) );  
   float highlightedSkySection = saturate( 2*LdotV );  
   float intensity2 = lerp( param2Min, param2Max, highlightedSkySection );  
   
   float finalIntensity = pow( intensity2 *intensity1, 2.2);  

At the very end we multiply both intensities and raise the result to the power of 2.2.
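
To see how the gradient behaves, here is a CPU-side Python sketch of lines 20-37. The positions and the light direction are hypothetical. Note that, as in the assembly, the world-to-camera vector is normalized in 3D and only then its XY part is used.

```python
import math

def saturate(x):
    return max(0.0, min(1.0, x))

def lerp(a, b, t):
    return a + (b - a) * t

def highlighted_sky_section(light_dir, world_pos, camera_pos):
    # Lines 20-22: normalized -lightDir.xy
    lx, ly = -light_dir[0], -light_dir[1]
    l_len = math.hypot(lx, ly)
    # Lines 23-28: 3D world-to-camera vector, normalized, XY part only
    v = [c - w for c, w in zip(camera_pos, world_pos)]
    v_len = math.sqrt(sum(c * c for c in v))
    l_dot_v = saturate((lx / l_len) * (v[0] / v_len) + (ly / l_len) * (v[1] / v_len))
    # Lines 30-31: double it and clamp to 1
    return saturate(2.0 * l_dot_v)

def final_intensity(intensity1, param2_min, param2_max, section):
    # Lines 32-37: lerp, multiply, then pow(x, 2.2)
    return (lerp(param2_min, param2_max, section) * intensity1) ** 2.2

# Hypothetical setup: Sun towards +X, camera at the origin.
light_dir = (1.0, 0.0, 0.5)
towards_sun = highlighted_sky_section(light_dir, (5000.0, 0.0, 1000.0), (0.0, 0.0, 0.0))
away_from_sun = highlighted_sky_section(light_dir, (-5000.0, 0.0, 1000.0), (0.0, 0.0, 0.0))
```

With param2Min and param2Max both at 1.0, as in the analyzed frame, final_intensity reduces to intensity1 raised to 2.2 regardless of the gradient.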

Clouds color

Calculating the color of clouds starts with two values from the constant buffer ( cb12[236] and cb12[237], respectively) which indicate the color of clouds near the Sun and of clouds on the opposite side of the sky. They are lerped with highlightedSkySection.

Then, the result is multiplied by finalIntensity.
And at the end, the result is mixed with fog (it was calculated in the vertex shader for the sake of performance).

  38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx  
  39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx  
  40: mul r2.xyz, r0.xxxx, r1.xyzx  
  41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx  
  42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx  
   
  float3 cloudsColor = lerp( cloudsColorBack, cloudsColorFront, highlightedSkySection );  
  cloudsColor *= finalIntensity;  
  cloudsColor = lerp( cloudsColor, FogColor, FogAmount );  
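
The same three steps as a Python sketch, useful for checking the order of operations in lines 38-42. All colors below are invented for the example.

```python
def lerp3(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def clouds_color(color_back, color_front, section, intensity, fog_color, fog_amount):
    base = lerp3(color_back, color_front, section)   # lines 38-39
    lit = tuple(c * intensity for c in base)         # line 40
    return lerp3(lit, fog_color, fog_amount)         # lines 41-42

# Hypothetical colors, not values from cb12.
back = (0.2, 0.3, 0.5)
front = (1.0, 0.8, 0.6)
fog = (0.5, 0.5, 0.5)

no_fog = clouds_color(back, front, 1.0, 0.5, fog, 0.0)
full_fog = clouds_color(back, front, 1.0, 0.5, fog, 1.0)
```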

Making sure cirrus clouds are more visible on horizon

It's not really visible in this frame, but in fact this layer is more visible near the horizon than above Geralt's head. Here's how it's done.

You might have noticed that we calculated the length of the worldToCamera vector while calculating the second intensity:

  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w

Let's find the next appearances of this length in the assembly:

  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  ...  
  43: add r1.x, -cb4[7].x, cb4[8].x  
  44: div_sat r0.w, r0.w, r1.x  

Whoa, what do we have here?
cb4[7].x and cb4[8].x have values of 2000.0 and 7000.0, respectively.

It turns out this is a use of a function called linstep: lines 27, 43 and 44 compute saturate( (distance - cb4[7].x) / (cb4[8].x - cb4[7].x) ).
It takes three parameters: min/max, which define the range, and v, which is the value.
The way it works is that if v is within the [min, max] range, it returns a linear interpolation in [0.0, 1.0]. On the other hand, if v is out of bounds, linstep returns 0.0 or 1.0.

A simple example:

linstep( 1000.0, 2000.0, 999.0) = 0.0
linstep( 1000.0, 2000.0, 1500.0) = 0.5
linstep( 1000.0, 2000.0, 2000.0) = 1.0

So it's quite similar to smoothstep from HLSL, except that in this case linear interpolation is performed instead of a Hermite one.
Linstep is not present in HLSL, but it's very useful. It's really worth having in your toolbox.
 // linstep:  
 //  
 // Returns a linear interpolation between 0 and 1 if "v" is in the range [min, max]   
 // if "v" is <= min, the output is 0  
 // if "v" is >= max, the output is 1  
   
 float linstep( float min, float max, float v )  
 {  
   return saturate( (v - min) / (max - min) );  
 }  
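
For a quick sanity check outside a shader, here is the same function in Python, next to a Hermite smoothstep for comparison. The parameter names lo and hi are my own choice, to avoid shadowing Python's built-in min and max.

```python
def linstep(lo, hi, v):
    # Linear ramp from 0 at lo to 1 at hi, clamped outside the range
    return max(0.0, min(1.0, (v - lo) / (hi - lo)))

def smoothstep(lo, hi, v):
    # Same clamped ramp, followed by the Hermite polynomial t * t * (3 - 2 * t)
    t = linstep(lo, hi, v)
    return t * t * (3.0 - 2.0 * t)

# The three examples from the text
below = linstep(1000.0, 2000.0, 999.0)
middle = linstep(1000.0, 2000.0, 1500.0)
top = linstep(1000.0, 2000.0, 2000.0)

# The fade range from this frame: 2000 to 7000
witcher_mid = linstep(2000.0, 7000.0, 4500.0)
```

Both functions agree at the ends and in the middle of the range. They differ in between: at a quarter of the way, linstep gives 0.25 while smoothstep gives 0.15625.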


Returning to The Witcher 3:
Once we have computed this factor, which indicates how far a particular piece of the sky is from Geralt, we use it to attenuate the clouds' intensity:

  45: mul r1.x, r1.w, cb4[9].x  
  46: mad r1.y, -cb4[9].x, r1.w, r1.w  
  47: mad r0.w, r0.w, r1.y, r1.x  
   
   float distanceAttenuation = linstep( fadeDistanceStart, fadeDistanceEnd, worldToCamera_distance );  
    
   float fadedCloudShape = closeCloudsHidingFactor * cloudShape;  
   cloudShape = lerp( fadedCloudShape, cloudShape, distanceAttenuation );  

cloudShape is the .a channel from the first texture, while closeCloudsHidingFactor is a value from the constant buffer ( cb4[9].x ) which controls how visible clouds are above Geralt's head. In every frame I tested it was set to 0.0, which means no clouds there. As distanceAttenuation gets closer to 1.0 (the distance from the camera to the skydome increases), clouds become more and more visible.
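
Plugging in the values from this frame (fade range 2000 to 7000, hiding factor 0.0) gives a feel for the fade. The Python sketch below does that. The cloud shape value of 0.8 is a made-up texel.

```python
def linstep(lo, hi, v):
    return max(0.0, min(1.0, (v - lo) / (hi - lo)))

def lerp(a, b, t):
    return a + (b - a) * t

def faded_cloud_shape(cloud_shape, hiding_factor, distance, fade_start, fade_end):
    # Lines 27 and 43-47: linstep on distance, then lerp between faded and full shape
    attenuation = linstep(fade_start, fade_end, distance)
    return lerp(hiding_factor * cloud_shape, cloud_shape, attenuation)

# 0.8 is a hypothetical alpha value from the clouds texture.
overhead = faded_cloud_shape(0.8, 0.0, 2000.0, 2000.0, 7000.0)
halfway = faded_cloud_shape(0.8, 0.0, 4500.0, 2000.0, 7000.0)
horizon = faded_cloud_shape(0.8, 0.0, 7000.0, 2000.0, 7000.0)
```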

Sampling noise texture

For the noise texture, the calculation of sampling coordinates is the same as for the clouds texture, except that it uses a separate set of textureScale and speedMultiplier values.

Of course, to sample all these textures a sampler with wrap addressing mode is used.

  48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx  
  49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0  
   
   // Calc sampling coords for noise  
   float2 noiseTextureUV = Texcoords * textureScaleNoise + elapsedTime * speedFactorsNoise;  
   
   // Sample texture and get data from it  
   float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;  
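
Wrap addressing is what lets the UVs grow with time without ever leaving the texture. Here is a minimal Python sketch of the idea, ignoring filtering. The scale and speed values are hypothetical.

```python
import math

def wrap(coord):
    # Wrap addressing repeats the texture at every integer boundary,
    # so only the fractional part of the coordinate matters.
    return coord - math.floor(coord)

def scrolled_noise_uv(texcoord, texture_scale, elapsed_time, speed):
    # Same formula as for the clouds texture (lines 48-49), then wrapped
    return tuple(wrap(t * s + elapsed_time * v)
                 for t, s, v in zip(texcoord, texture_scale, speed))

# Hypothetical scale and speed, not values from cb4[10] and cb4[11].
uv = scrolled_noise_uv((0.5, 0.5), (2.0, 2.0), 100.0, (0.0125, 0.0))
```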

Getting all this together

Once we have a noise value, we have to combine it with cloudShape.
I had some problems understanding these lines with "param2.w" (which is always 1.0) and noiseMult (it was set to 5.0; it comes from the cbuffer).

Anyway, what's important here is the final value generalCloudsVisibility, which affects how visible the clouds are.

Take a look at the final noise value as well. The output color is cloudsColor multiplied by the final noise, which is also output in the alpha channel.

  51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x
  52: mad_sat r1.x, cb4[12].x, v2.w, r1.x
  53: mul r0.w, r0.w, r1.x
  54: mul_sat r0.w, r0.w, cb4[6].x
  55: mul o0.xyz, r0.wwww, r0.xyzx
  56: mov o0.w, r0.w
  57: ret   

   // Sample noise texture and get data from it  
   float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;  
   noiseTextureValue = noiseTextureValue * noiseMult - noiseMult;  
     
   float noiseValue = saturate( noiseMult * Input.param2.w + noiseTextureValue);  
   noiseValue *= cloudShape;  
     
   float finalNoise = saturate( noiseValue * generalCloudsVisibility);  
   
   return float4( cloudsColor*finalNoise, finalNoise );  
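
One way to read lines 51-52: algebraically they are saturate( noiseMult * (noise + param2.w - 1) ), so with param2.w equal to 1.0 the expression collapses to saturate( noise * noiseMult ). The Python sketch below checks that with the values mentioned above (noiseMult = 5.0, param2.w = 1.0). The noise, shape, visibility and color inputs are made up.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def final_noise(noise_sample, noise_mult, param2_w, cloud_shape, visibility):
    remapped = noise_sample * noise_mult - noise_mult            # line 51
    noise_value = saturate(noise_mult * param2_w + remapped)     # line 52
    return saturate(noise_value * cloud_shape * visibility)      # lines 53-54

def output_color(clouds_color, noise):
    # Lines 55-56: color multiplied by the final noise, which is also the alpha
    return tuple(c * noise for c in clouds_color) + (noise,)

# Hypothetical inputs: noise texel 0.1, full cloud shape, visibility 1.0
thin = final_noise(0.1, 5.0, 1.0, 1.0, 1.0)
# Any noise texel of 0.2 or more saturates when noiseMult is 5.0
dense = final_noise(0.5, 5.0, 1.0, 0.8, 1.0)
pixel = output_color((1.0, 0.5, 0.25), dense)
```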

Summary

So here we are at the end. The final result looks really convincing.
Time for a comparison. My shader - left, original one - right:

The shader is here if you're interested.

Feel free to comment and thanks for reading! :)
M.
