Saturday, 11 May 2019

Reverse engineering the rendering of The Witcher 3, part 14 - cirrus clouds

When it comes to outdoor scenes, the sky is one of the elements that decide whether a game world is believable. Think about it for a moment - the sky takes up, let's say, 40-50% of the screen most of the time. And the sky is a lot more than just a nice gradient: we have stars, the Sun, the Moon and, finally, clouds.

While the current trend is apparently to render clouds volumetrically using raymarching (see this one), clouds in The Witcher 3 are completely texture-based. I have been looking at them for some time already but, obviously, things are more complicated than I initially expected. If you have been following the series, you know that there are differences between the "Blood and Wine" add-on and the rest of the game. And guess what - there are some changes in terms of clouds in B&W too.

There are a few layers of clouds in The Witcher 3. Depending on the current weather, we can have only cirrus clouds, altocumulus, maybe a few from the stratus family (during a storm, for instance). Or, what the heck, we can have nothing at all.

Some layers differ in their input textures and in the shaders used to render them. This affects their complexity and, obviously, the length of the pixel shader assembly.

Despite all this diversity, there are some common patterns in how The Witcher 3 renders clouds. First of all, they are all rendered in the forward pass, which is absolutely the right choice. All of them use blending (see below). This makes it much easier to control how a particular layer covers the sky - the alpha value from the pixel shader determines it.
What's more interesting, some layers are rendered twice with the same settings.

After some evaluation I picked the shortest shader I could find, in order to (1) have the best chance of reverse engineering it completely and (2) be able to understand every aspect of it.
So I'll take a closer look at the cirrus clouds from The Witcher 3: Blood and Wine.

Here is an example frame:
Before rendering
After first rendering pass
After second rendering pass

In this particular frame cirrus clouds are the first layer being rendered. As you can see, it's rendered twice which increases its intensity.

Geometry and vertex shader

Before we get to the pixel shader, a short paragraph about the geometry and the vertex shader. The mesh used for clouds is similar to a typical skydome:

All vertices are contained in the [0-1] range, so in order to center the mesh around the (0,0,0) point, a scale+bias is applied before the worldViewProj transform (we already know this pattern from previous parts of this series). For clouds the mesh is heavily stretched along the XY plane (Z is up) to cover more than the view frustum. The result is as follows:

Apart from that, the mesh has normal and tangent vectors. The vertex shader also calculates the bitangent vector with a cross product - all three are output in normalized form. Moreover, there is a per-vertex fog calculation (color and intensity).

Pixel Shader


The pixel shader assembly is as follows:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[10], immediateIndexed  
    dcl_constantbuffer cb1[9], immediateIndexed  
    dcl_constantbuffer cb12[238], immediateIndexed  
    dcl_constantbuffer cb4[13], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps linear v0.xyzw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.w  
    dcl_input_ps linear v3.xyzw  
    dcl_input_ps linear v4.xyz  
    dcl_input_ps linear v5.xyz  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)  
   1: dp3 r0.w, r0.xyzx, r0.xyzx  
   2: rsq r0.w, r0.w  
   3: mul r0.xyz, r0.wwww, r0.xyzx  
   4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx  
   5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx  
   6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0  
   7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)  
   8: add r1.xyz, r1.xyzx, r1.xyzx  
   9: dp3 r0.w, r1.xyzx, r1.xyzx  
  10: rsq r0.w, r0.w  
  11: mul r1.xyz, r0.wwww, r1.xyzx  
  12: mul r2.xyz, r1.yyyy, v3.xyzx  
  13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx  
  14: mov r3.xy, v1.zwzz  
  15: mov r3.z, v3.w  
  16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx  
  17: dp3_sat r0.x, r0.xyzx, r1.xyzx  
  18: add r0.y, -cb4[2].x, cb4[3].x  
  19: mad r0.x, r0.x, r0.y, cb4[2].x  
  20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx  
  21: rsq r0.y, r0.y  
  22: mul r0.yz, r0.yyyy, -cb0[9].xxyx  
  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  28: mul r1.xy, r1.zzzz, r1.xyxx  
  29: dp2_sat r0.y, r0.yzyy, r1.xyxx  
  30: add r0.y, r0.y, r0.y  
  31: min r0.y, r0.y, l(1.000000)  
  32: add r0.z, -cb4[0].x, cb4[1].x  
  33: mad r0.z, r0.y, r0.z, cb4[0].x  
  34: mul r0.x, r0.x, r0.z  
  35: log r0.x, r0.x  
  36: mul r0.x, r0.x, l(2.200000)  
  37: exp r0.x, r0.x  
  38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx  
  39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx  
  40: mul r2.xyz, r0.xxxx, r1.xyzx  
  41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx  
  42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx  
  43: add r1.x, -cb4[7].x, cb4[8].x  
  44: div_sat r0.w, r0.w, r1.x  
  45: mul r1.x, r1.w, cb4[9].x  
  46: mad r1.y, -cb4[9].x, r1.w, r1.w  
  47: mad r0.w, r0.w, r1.y, r1.x  
  48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx  
  49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0  
  51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x  
  52: mad_sat r1.x, cb4[12].x, v2.w, r1.x  
  53: mul r0.w, r0.w, r1.x  
  54: mul_sat r0.w, r0.w, cb4[6].x  
  55: mul o0.xyz, r0.wwww, r0.xyzx  
  56: mov o0.w, r0.w  
  57: ret  

In terms of input, there are two tiled textures. The first one contains a normal map (xyz channels) and the cloud shape (alpha channel). The second one is noise used to perturb the shape.

Normal map (c) CD Projekt Red
Cloud shape (c) CD Projekt Red
Noise texture (c) CD Projekt Red

The main constant buffer with clouds parameters is cb4. For this frame their values are:
Apart from that, there are other values used from other cbuffers. Don't worry, we'll get through these ones as well.

Z-Inverted sunlight direction

The first thing which happens in the shader is calculation of normalized, Z-inverted direction of sunlight:
   0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)  
   1: dp3 r0.w, r0.xyzx, r0.xyzx  
   2: rsq r0.w, r0.w  
   3: mul r0.xyz, r0.wwww, r0.xyzx  

   float3 invertedSunlightDir = normalize(lightDir * float3(1, 1, -1) );

As I mentioned earlier, Z is the up axis, while cb0[9] is the sunlight direction. This vector points towards the Sun - this is important! You can check this by writing a simple compute shader which performs a plain NdotL and injecting it into the deferred shading pass :)

Sampling the clouds texture

The next step is to calculate the texcoords for the "clouds" texture, sample it, unpack the normal vector and normalize it.

   4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx   
   5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx   
   6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0   
   7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)   
   8: add r1.xyz, r1.xyzx, r1.xyzx   
   9: dp3 r0.w, r1.xyzx, r1.xyzx   
  10: rsq r0.w, r0.w   
   
   
   // Calc sampling coords  
   float2 cloudTextureUV = Texcoords * textureScale + elapsedTime * speedFactors;  
   
   // Sample texture and get data from it  
   float4 cloudTextureValue = texture0.Sample( sampler0, cloudTextureUV ).rgba;  
   float3 normalMap = cloudTextureValue.xyz;  
   float cloudShape = cloudTextureValue.a;  
   
   // Unpack normal and normalize it  
   float3 unpackedNormal = (normalMap - 0.5) * 2.0;  
   unpackedNormal = normalize(unpackedNormal);  

Let's go through this slowly.
To make the clouds move, we need the elapsed time in seconds ( cb0[0].x ) multiplied by a speed factor which controls how fast the clouds move across the sky ( cb4[5].xy ).
The UVs are stretched over the skydome geometry I was talking about before. We also need texture scaling coefficients, which affect the size of the clouds ( cb4[4].xy ).

Final formula is:
samplingUV = Input.TextureUV * textureScale + time * speedMultiplier;

After sampling of all 4 channels we have normal map (rgb channels) and cloud shape (a channel).
To unpack normal map from [0; 1] to [-1; 1] range, we use the following formula:

unpackedNormal = (packedNormal - 0.5) * 2.0;

We could also use this one:
unpackedNormal = packedNormal * 2.0 - 1.0;

And finally we normalize the unpacked normal vector.
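
If you want to check this arithmetic outside of a shader, here is a minimal CPU-side sketch in Python. It is not the game's code: the function names and all input values are my own, hypothetical ones, chosen only so the numbers are easy to follow.

```python
import math

def scrolled_uv(uv, texture_scale, elapsed_time, speed):
    """uv * textureScale + time * speedMultiplier (assembly lines 4-5)."""
    return tuple(u * s + elapsed_time * v
                 for u, s, v in zip(uv, texture_scale, speed))

def unpack_normal(packed):
    """Remap [0, 1] texel values to [-1, 1], then normalize (lines 7-11)."""
    n = [(c - 0.5) * 2.0 for c in packed]
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

# Hypothetical inputs, not values captured from the game.
uv = scrolled_uv((0.25, 0.5), (2.0, 2.0), 10.0, (0.01, 0.0))
flat = unpack_normal((0.5, 0.5, 1.0))  # a "flat" normal map texel

# Both unpacking formulas from the text agree.
p = 0.75
first_form = (p - 0.5) * 2.0
second_form = p * 2.0 - 1.0
```

With these inputs the UV ends up at roughly (0.6, 1.0) and the flat texel unpacks to a normal pointing straight along +Z.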

Normal mapping

Having the normal, tangent and bitangent vectors from the vertex shader and the normal vector from the normal map, we perform normal mapping the usual way.

  11: mul r1.xyz, r0.wwww, r1.xyzx  
  12: mul r2.xyz, r1.yyyy, v3.xyzx  
  13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx  
  14: mov r3.xy, v1.zwzz  
  15: mov r3.z, v3.w  
  16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx  
    
   // Perform bump mapping  
   float3 SkyTangent = Input.Tangent;  
   float3 SkyNormal = (float3( Input.Texcoords.zw, Input.param3.w ));  
   float3 SkyBitangent = Input.param3.xyz;  
        
   float3x3 TBN = float3x3(SkyTangent, SkyBitangent, SkyNormal);  
   float3 finalNormal = (float3)mul( unpackedNormal, (TBN) );  
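
To see why the three mul/mad instructions (lines 12, 13 and 16) are the same thing as mul( unpackedNormal, TBN ), here is a small Python sketch. It assumes the usual HLSL convention that float3x3(a, b, c) takes its arguments as rows and that mul(vector, matrix) treats the vector as a row vector. The basis vectors below are hypothetical, not taken from the game.

```python
def mul_vec_mat3(v, rows):
    """HLSL-style mul(vector, matrix); the matrix is given as three rows."""
    return tuple(v[0] * rows[0][i] + v[1] * rows[1][i] + v[2] * rows[2][i]
                 for i in range(3))

# Hypothetical orthonormal basis (rows of the TBN matrix).
sky_tangent = (0.0, 1.0, 0.0)
sky_bitangent = (0.0, 0.0, 1.0)
sky_normal = (1.0, 0.0, 0.0)
tbn = (sky_tangent, sky_bitangent, sky_normal)

unpacked_normal = (0.6, 0.0, 0.8)

# Equivalent to x * tangent + y * bitangent + z * normal,
# which is exactly what assembly lines 12, 13 and 16 compute.
final_normal = mul_vec_mat3(unpacked_normal, tbn)
```

Here final_normal comes out as (0.8, 0.6, 0.0): the z component of the normal map follows the surface normal and the x component follows the tangent.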

Highlight intensity (1)

The next step is calculating NdotL, which determines the amount of highlighting at a particular pixel.
Consider the following piece of assembly:
  17: dp3_sat r0.x, r0.xyzx, r1.xyzx  
  18: add r0.y, -cb4[2].x, cb4[3].x  
  19: mad r0.x, r0.x, r0.y, cb4[2].x  

Here is the visualization of NdotL in considered frame:

This (saturated) dot product is used to interpolate between a minimum and a maximum intensity, param1Min and param1Max ( cb4[2].x and cb4[3].x, respectively).
This way the parts of clouds which are more exposed to sunlight will be brighter:
   // Calculate cosine between normal and up-inv lightdir  
   float NdotL = saturate( dot(invertedSunlightDir, finalNormal) );  
   
   // Param 1, line 19, r0.x  
   float intensity1 = lerp( param1Min, param1Max, NdotL );  

Highlight intensity (2)

There is one more factor which affects the intensity of clouds.
Clouds in the section of the sky where the Sun is present should be more highlighted. To do that, we calculate a gradient based on the XY plane.
This gradient is used to lerp between min/max values, in a similar fashion to part (1).
So in theory we could ask for darker clouds on the side opposite the Sun, but in this particular frame that does not happen because param2Min and param2Max ( cb4[0].x and cb4[1].x, respectively) are both set to 1.0f.

  20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx  
  21: rsq r0.y, r0.y  
  22: mul r0.yz, r0.yyyy, -cb0[9].xxyx  
  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  28: mul r1.xy, r1.zzzz, r1.xyxx  
  29: dp2_sat r0.y, r0.yzyy, r1.xyxx  
  30: add r0.y, r0.y, r0.y  
  31: min r0.y, r0.y, l(1.000000)  
  32: add r0.z, -cb4[0].x, cb4[1].x  
  33: mad r0.z, r0.y, r0.z, cb4[0].x  
  34: mul r0.x, r0.x, r0.z  
  35: log r0.x, r0.x  
  36: mul r0.x, r0.x, l(2.200000)  
  37: exp r0.x, r0.x   
   
   
   // Calculate normalized -lightDir.xy (20-22)  
   float2 lightDirXY = normalize( -lightDir.xy );  
   
   // Calculate world to camera  
   float3 vWorldToCamera = ( CameraPos - WorldPos );  
   float worldToCamera_distance = length(vWorldToCamera);  
        
   // normalize vector  
   vWorldToCamera = normalize( vWorldToCamera );  
        
   
   float LdotV = saturate( dot(lightDirXY, vWorldToCamera.xy) );  
   float highlightedSkySection = saturate( 2*LdotV );  
   float intensity2 = lerp( param2Min, param2Max, highlightedSkySection );  
   
   float finalIntensity = pow( intensity2 *intensity1, 2.2);  

At the very end we multiply both intensities and raise the product to the power of 2.2.
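
Two patterns in this assembly are worth recognizing: add + mad is a lerp, and log / mul / exp is a pow. Below is a minimal Python sketch of the whole intensity calculation that makes both explicit. The function and parameter names are mine, and the min/max values in the usage example are hypothetical (only param2Min = param2Max = 1.0 comes from the captured frame).

```python
import math

def saturate(x):
    return min(max(x, 0.0), 1.0)

def lerp(a, b, t):
    """HLSL lerp; in the assembly it shows up as add (b - a) + mad."""
    return a + t * (b - a)

def final_intensity(n_dot_l, l_dot_v, p1_min, p1_max, p2_min, p2_max):
    intensity1 = lerp(p1_min, p1_max, saturate(n_dot_l))   # lines 17-19
    highlighted = saturate(2.0 * saturate(l_dot_v))        # lines 29-31
    intensity2 = lerp(p2_min, p2_max, highlighted)         # lines 32-33
    product = intensity1 * intensity2                      # line 34
    if product <= 0.0:
        return 0.0  # log2(0) is undefined on the CPU, so handle it here
    # lines 35-37: pow(x, 2.2) expressed as exp2(2.2 * log2(x))
    return 2.0 ** (2.2 * math.log2(product))

# Hypothetical param1 range; param2Min/Max = 1.0 as in the analyzed frame.
lit = final_intensity(1.0, 0.0, 0.5, 1.0, 1.0, 1.0)
unlit = final_intensity(0.0, 1.0, 0.5, 1.0, 1.0, 1.0)
```

With param2Min and param2Max both at 1.0 the second factor drops out, so only NdotL changes the result: a fully lit texel gives 1.0 and an unlit one gives 0.5 raised to 2.2.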

Clouds color

Calculating the color of clouds starts with two values from the constant buffer: the color of clouds near the Sun and the color of clouds on the opposite side of the sky. They are lerped with highlightedSkySection.

Then the result is multiplied by finalIntensity.
At the end, the result is mixed with fog (which was calculated in the vertex shader for the sake of performance).

  38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx  
  39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx  
  40: mul r2.xyz, r0.xxxx, r1.xyzx  
  41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx  
  42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx  
   
  float3 cloudsColor = lerp( cloudsColorBack, cloudsColorFront, highlightedSkySection );  
  cloudsColor *= finalIntensity;  
  cloudsColor = lerp( cloudsColor, FogColor, FogAmount );  

Making sure cirrus clouds are more visible on horizon

It's not really visible in this frame, but this layer is in fact more visible near the horizon than above Geralt's head. Here's how it's done.

You might have noticed that we calculated the length of the worldToCamera vector while calculating the second intensity:

  23: add r1.xyz, -v4.xyzx, cb1[8].xyzx  
  24: dp3 r0.w, r1.xyzx, r1.xyzx  
  25: rsq r1.z, r0.w  
  26: sqrt r0.w, r0.w

Let's find the next appearances of this length in the assembly:

  26: sqrt r0.w, r0.w  
  27: add r0.w, r0.w, -cb4[7].x  
  ...  
  43: add r1.x, -cb4[7].x, cb4[8].x  
  44: div_sat r0.w, r0.w, r1.x  

Whoa, what do we have here?
cb4[7].x and cb4[8].x have values of 2000.0 and 7000.0, respectively.

It turns out this is a function called linstep.
It takes three parameters: min and max, which define a range, and v, which is the value.
If v lies within the [min, max] range, linstep maps it linearly to [0.0, 1.0]. If v is out of bounds, linstep returns 0.0 (below min) or 1.0 (above max).

A simple example:

linstep( 1000.0, 2000.0, 999.0) = 0.0
linstep( 1000.0, 2000.0, 1500.0) = 0.5
linstep( 1000.0, 2000.0, 2000.0) = 1.0

So it's quite similar to smoothstep from HLSL, except that it performs a linear interpolation instead of a Hermite one.
Linstep is not present in HLSL, but it's very useful and really worth having in your toolbox.
 // linstep:  
 //  
 // Returns a linear interpolation between 0 and 1 if "v" is in the range [min, max]   
 // if "v" is <= min, the output is 0  
 // if "v" is >= max, the output is 1  
   
 float linstep( float min, float max, float v )  
 {  
   return saturate( (v - min) / (max - min) );  
 }  
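
To double-check the three examples above, and to see the difference from smoothstep in numbers, here is a small CPU-side sketch in Python. It mirrors the HLSL function one to one; the smoothstep here is my own reference implementation of the standard Hermite formula, added only for comparison.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def linstep(lo, hi, v):
    """Linear ramp from 0 at lo to 1 at hi, clamped outside the range."""
    return saturate((v - lo) / (hi - lo))

def smoothstep(lo, hi, v):
    """Same clamped ramp, followed by the Hermite curve 3t^2 - 2t^3."""
    t = linstep(lo, hi, v)
    return t * t * (3.0 - 2.0 * t)

# The examples from the text.
below = linstep(1000.0, 2000.0, 999.0)    # 0.0
middle = linstep(1000.0, 2000.0, 1500.0)  # 0.5
top = linstep(1000.0, 2000.0, 2000.0)     # 1.0

# A quarter of the way through the range the two functions differ.
linear_quarter = linstep(0.0, 1.0, 0.25)     # 0.25
smooth_quarter = smoothstep(0.0, 1.0, 0.25)  # 0.15625
```

Both functions agree at the ends and in the middle of the range; in between, smoothstep eases in and out while linstep stays a straight line.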


Returning to The Witcher 3:
Once we have this factor, which indicates how far a particular piece of the sky is from Geralt, we use it to attenuate the clouds' intensity:

  45: mul r1.x, r1.w, cb4[9].x  
  46: mad r1.y, -cb4[9].x, r1.w, r1.w  
  47: mad r0.w, r0.w, r1.y, r1.x  
   
   float distanceAttenuation = linstep( fadeDistanceStart, fadeDistanceEnd, worldToCamera_distance );  
    
   float fadedCloudShape = closeCloudsHidingFactor * cloudShape;  
   cloudShape = lerp( fadedCloudShape, cloudShape, distanceAttenuation );  

cloudShape is the .a channel from the first texture, while closeCloudsHidingFactor is a value from the constant buffer ( cb4[9].x ) which controls how visible clouds are above Geralt's head. In every frame I tested it was set to 0.0, which means no clouds directly overhead. As distanceAttenuation gets closer to 1.0 (the distance from the camera to the skydome increases), clouds become more and more visible.
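
The three instructions at lines 45-47 are just a lerp in disguise. Here is a minimal Python sketch that follows the assembly step by step. The function name is mine, the 2000.0 / 7000.0 defaults are the values observed in this frame, and the cloud shape value in the usage example is hypothetical.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def faded_cloud_shape(cloud_shape, hiding_factor, distance,
                      fade_start=2000.0, fade_end=7000.0):
    # lines 27, 43-44: linstep(fadeDistanceStart, fadeDistanceEnd, distance)
    t = saturate((distance - fade_start) / (fade_end - fade_start))
    near = cloud_shape * hiding_factor                  # line 45
    delta = cloud_shape - hiding_factor * cloud_shape   # line 46
    # line 47: near + t * delta == lerp(near, cloud_shape, t)
    return t * delta + near

# Hypothetical cloud shape of 0.8, hiding factor 0.0 as in the tested frames.
overhead = faded_cloud_shape(0.8, 0.0, 1000.0)   # closer than fade start
halfway = faded_cloud_shape(0.8, 0.0, 4500.0)    # middle of the fade range
horizon = faded_cloud_shape(0.8, 0.0, 8000.0)    # beyond fade end
```

With the hiding factor at 0.0 the cloud is fully hidden up to 2000 units from the camera, half as strong at 4500 units and at full strength from 7000 units on.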

Sampling noise texture

For the noise texture, the sampling coordinates are calculated the same way as for the clouds texture, except that it has its own set of textureScale and speedMultiplier.

Of course, a sampler with the wrap addressing mode is used to sample all these textures.

  48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx  
  49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0  
   
   // Calc sampling coords for noise  
   float2 noiseTextureUV = Texcoords * textureScaleNoise + elapsedTime * speedFactorsNoise;  
   
   // Sample texture and get data from it  
   float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;  

Getting all this together

Once we have a noise value, we have to combine it with cloudShape.
I had some problems understanding these lines with "param2.w" (which is always 1.0) and noiseMult (set to 5.0, it comes from the cbuffer).

Anyway, what's important here is the final value generalCloudsVisibility ( cb4[6].x ), which controls how visible the clouds are.

Take a look at the final noise value as well. The output color is cloudsColor multiplied by the final noise, which is also written to the alpha channel.

  51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x
  52: mad_sat r1.x, cb4[12].x, v2.w, r1.x
  53: mul r0.w, r0.w, r1.x
  54: mul_sat r0.w, r0.w, cb4[6].x
  55: mul o0.xyz, r0.wwww, r0.xyzx
  56: mov o0.w, r0.w
  57: ret   

   // Sample noise texture and get data from it  
   float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;  
   noiseTextureValue = noiseTextureValue * noiseMult - noiseMult;  
     
   float noiseValue = saturate( noiseMult * Input.param2.w + noiseTextureValue);  
   noiseValue *= cloudShape;  
     
   float finalNoise = saturate( noiseValue * generalCloudsVisibility);  
   
   return float4( cloudsColor*finalNoise, finalNoise );  
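
One way to read lines 51-52: algebraically they compute saturate( noiseMult * (noise - 1 + param2.w) ), so with param2.w = 1.0 this collapses to saturate( noiseMult * noise ). The Python sketch below follows the assembly and lets you check that. The function name is mine, noiseMult = 5.0 and param2.w = 1.0 are the values mentioned above, and the visibility default of 1.0 is only a placeholder assumption.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def final_noise(noise, cloud_shape, noise_mult=5.0, param2_w=1.0,
                visibility=1.0):
    noise_term = noise * noise_mult - noise_mult                 # line 51
    noise_value = saturate(noise_mult * param2_w + noise_term)   # line 52
    noise_value *= cloud_shape                                   # line 53
    return saturate(noise_value * visibility)                    # line 54

# Hypothetical noise samples with a fully opaque cloud shape.
thin = final_noise(0.1, 1.0)    # 5 * 0.1 = 0.5
dense = final_noise(0.5, 1.0)   # 5 * 0.5 = 2.5, saturated to 1.0
```

So with noiseMult = 5.0 any noise value above 0.2 already saturates, which suggests the noise mostly erodes the cloud shape where the noise texture is dark. A param2.w below 1.0 would shift that threshold up and thin the clouds out further.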

Summary

So here we are at the end. The final result looks really convincing.
Time for a comparison. My shader - left, original one - right:

The shader is here if you're interested.

Feel free to comment and thanks for reading! :)
M.

Saturday, 6 April 2019

Reverse engineering the rendering of The Witcher 3, part 13c - witcher senses (fisheye effect & final combining)

Welcome!

This is the last part of reverse engineering witcher senses effect from The Witcher 3: Wild Hunt.

A quick look at what we have so far: in the first part a full-screen intensity map was generated, which tells how visible the effect will be depending on distance. In the second part I investigated the "outline map" in more detail; it is responsible for the outline and the "moving" look of the final effect.

We have arrived at the last stop. We need to combine it all together! The last pass is a fullscreen quad. The inputs are: color buffer, outline map and intensity map.

Before:



After:


And a video (once again) to show how the effect is applied:


As you can see, besides applying outlines to objects which Geralt can see/hear, a fisheye effect is applied to the whole screen, and the whole screen (corners especially) turns greyish to make you feel like a real monster hunter in action.

Full pixel shader assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[3], immediateIndexed  
    dcl_constantbuffer cb3[7], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s2, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t2  
    dcl_resource_texture2d (float,float,float,float) t3  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 7  
   0: div r0.xy, v0.xyxx, cb0[2].xyxx  
   1: mad r0.zw, r0.xxxy, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)  
   2: mov r1.yz, abs(r0.zzwz)  
   3: div r0.z, cb0[2].x, cb0[2].y  
   4: mul r1.x, r0.z, r1.y  
   5: add r0.zw, r1.xxxz, -cb3[2].xxxy  
   6: mul_sat r0.zw, r0.zzzw, l(0.000000, 0.000000, 0.555556, 0.555556)  
   7: log r0.zw, r0.zzzw  
   8: mul r0.zw, r0.zzzw, l(0.000000, 0.000000, 2.500000, 2.500000)  
   9: exp r0.zw, r0.zzzw  
  10: dp2 r0.z, r0.zwzz, r0.zwzz  
  11: sqrt r0.z, r0.z  
  12: min r0.z, r0.z, l(1.000000)  
  13: add r0.z, -r0.z, l(1.000000)  
  14: mov_sat r0.w, cb3[6].x  
  15: add_sat r1.xy, -r0.xyxx, l(0.030000, 0.030000, 0.000000, 0.000000)  
  16: add r1.x, r1.y, r1.x  
  17: add_sat r0.xy, r0.xyxx, l(-0.970000, -0.970000, 0.000000, 0.000000)  
  18: add r0.x, r0.x, r1.x  
  19: add r0.x, r0.y, r0.x  
  20: mul r0.x, r0.x, l(20.000000)  
  21: min r0.x, r0.x, l(1.000000)  
  22: add r1.xy, v0.xyxx, v0.xyxx  
  23: div r1.xy, r1.xyxx, cb0[2].xyxx  
  24: add r1.xy, r1.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)  
  25: dp2 r0.y, r1.xyxx, r1.xyxx  
  26: mul r1.xy, r0.yyyy, r1.xyxx  
  27: mul r0.y, r0.w, l(0.100000)  
  28: mul r1.xy, r0.yyyy, r1.xyxx  
  29: max r1.xy, r1.xyxx, l(-0.400000, -0.400000, 0.000000, 0.000000)  
  30: min r1.xy, r1.xyxx, l(0.400000, 0.400000, 0.000000, 0.000000)  
  31: mul r1.xy, r1.xyxx, cb3[1].xxxx  
  32: mul r1.zw, r1.xxxy, cb0[2].zzzw  
  33: mad r1.zw, v0.xxxy, cb0[1].zzzw, -r1.zzzw  
  34: sample_indexable(texture2d)(float,float,float,float) r2.xyz, r1.zwzz, t0.xyzw, s0  
  35: mul r3.xy, r1.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)  
  36: sample_indexable(texture2d)(float,float,float,float) r0.y, r3.xyxx, t2.yxzw, s2  
  37: mad r3.xy, r1.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000), l(0.500000, 0.000000, 0.000000, 0.000000)  
  38: sample_indexable(texture2d)(float,float,float,float) r2.w, r3.xyxx, t2.yzwx, s2  
  39: mul r2.w, r2.w, l(0.125000)  
  40: mul r3.x, cb0[0].x, l(0.100000)  
  41: add r0.x, -r0.x, l(1.000000)  
  42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)  
  43: mov r3.yzw, l(0, 0, 0, 0)  
  44: mov r4.x, r0.y  
  45: mov r4.y, r2.w  
  46: mov r4.z, l(0)  
  47: loop  
  48:  ige r4.w, r4.z, l(8)  
  49:  breakc_nz r4.w  
  50:  itof r4.w, r4.z  
  51:  mad r4.w, r4.w, l(0.785375), -r3.x  
  52:  sincos r5.x, r6.x, r4.w  
  53:  mov r6.y, r5.x  
  54:  mul r5.xy, r0.xxxx, r6.xyxx  
  55:  mad r5.zw, r5.xxxy, l(0.000000, 0.000000, 0.125000, 0.125000), r1.zzzw  
  56:  mul r6.xy, r5.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)  
  57:  sample_indexable(texture2d)(float,float,float,float) r4.w, r6.xyxx, t2.yzwx, s2  
  58:  mad r4.x, r4.w, l(0.125000), r4.x  
  59:  mad r5.zw, r5.zzzw, l(0.000000, 0.000000, 0.500000, 0.500000), l(0.000000, 0.000000, 0.500000, 0.000000)  
  60:  sample_indexable(texture2d)(float,float,float,float) r4.w, r5.zwzz, t2.yzwx, s2  
  61:  mad r4.y, r4.w, l(0.125000), r4.y  
  62:  mad r5.xy, r5.xyxx, r1.xyxx, r1.zwzz  
  63:  sample_indexable(texture2d)(float,float,float,float) r5.xyz, r5.xyxx, t0.xyzw, s0  
  64:  mad r3.yzw, r5.xxyz, l(0.000000, 0.125000, 0.125000, 0.125000), r3.yyzw  
  65:  iadd r4.z, r4.z, l(1)  
  66: endloop  
  67: sample_indexable(texture2d)(float,float,float,float) r0.xy, r1.zwzz, t3.xyzw, s0  
  68: mad_sat r0.xy, -r0.xyxx, l(0.800000, 0.750000, 0.000000, 0.000000), r4.xyxx  
  69: dp3 r1.x, r3.yzwy, l(0.300000, 0.300000, 0.300000, 0.000000)  
  70: add r1.yzw, -r1.xxxx, r3.yyzw  
  71: mad r1.xyz, r0.zzzz, r1.yzwy, r1.xxxx  
  72: mad r1.xyz, r1.xyzx, l(0.600000, 0.600000, 0.600000, 0.000000), -r2.xyzx  
  73: mad r1.xyz, r0.wwww, r1.xyzx, r2.xyzx  
  74: mul r0.yzw, r0.yyyy, cb3[4].xxyz  
  75: mul r2.xyz, r0.xxxx, cb3[5].xyzx  
  76: mad r0.xyz, r0.yzwy, l(1.200000, 1.200000, 1.200000, 0.000000), r2.xyzx  
  77: mov_sat r2.xyz, r0.xyzx  
  78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
  79: add r0.yzw, -r1.xxyz, r2.xxyz  
  80: mad o0.xyz, r0.xxxx, r0.yzwy, r1.xyzx  
  81: mov o0.w, l(1.000000)  
  82: ret   


82 lines means a lot of work! Let's get into it!

Take a look at inputs first:
   // *** Inputs       
     
   // * Zoom amount, always 1  
   float zoomAmount = cb3_v1.x;  
     
   // Another value which affects fisheye effect  
   // but always set to float2(1.0, 1.0).  
   float2 amount = cb0_v2.zw;  
     
   // Elapsed time in seconds  
   float time = cb0_v0.x;  
     
   // Colors of witcher senses  
   float3 colorInteresting = cb3_v5.rgb;  
   float3 colorTraces = cb3_v4.rgb;  
     
   // Was always set to float2(0.0, 0.0).  
   // Setting this to higher values  
   // makes "grey corners" effect weaker.  
   float2 offset = cb3_v2.xy;  
     
   // Dimensions of fullscreen  
   float2 texSize = cb0_v2.xy;  
   float2 invTexSize = cb0_v1.zw;  
   
   // Main value which causes fisheye effect [0-1]  
   const float fisheyeAmount = saturate( cb3_v6.x );  

The main value responsible for the strength of the effect is fisheyeAmount. I guess it rises gradually from 0.0 to 1.0 once Geralt starts using his senses. The rest of the values are rather constant, but I guess some of them differ if the user disables the fisheye effect in gameplay options (I haven't checked it).


The first thing which happens in the shader is calculating mask responsible for grey corners:
   0: div r0.xy, v0.xyxx, cb0[2].xyxx   
   1: mad r0.zw, r0.xxxy, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)   
   2: mov r1.yz, abs(r0.zzwz)   
   3: div r0.z, cb0[2].x, cb0[2].y   
   4: mul r1.x, r0.z, r1.y   
   5: add r0.zw, r1.xxxz, -cb3[2].xxxy   
   6: mul_sat r0.zw, r0.zzzw, l(0.000000, 0.000000, 0.555556, 0.555556)   
   7: log r0.zw, r0.zzzw   
   8: mul r0.zw, r0.zzzw, l(0.000000, 0.000000, 2.500000, 2.500000)   
   9: exp r0.zw, r0.zzzw   
  10: dp2 r0.z, r0.zwzz, r0.zwzz   
  11: sqrt r0.z, r0.z   
  12: min r0.z, r0.z, l(1.000000)   
  13: add r0.z, -r0.z, l(1.000000)   

In HLSL we can write it this way:
   // Main uv  
   float2 uv = PosH.xy / texSize;  
     
   // Scale at first from [0-1] to [-1;1], then calculate abs  
   float2 uv3 = abs( uv * 2.0 - 1.0);   
        
   // Aspect ratio  
   float aspectRatio = texSize.x / texSize.y;  
        
   // * Mask used to make corners grey  
   float mask_gray_corners;  
   {  
     float2 newUv = float2( uv3.x * aspectRatio, uv3.y ) - offset;  
     newUv = saturate( newUv / 1.8 );  
     newUv = pow(newUv, 2.5);  
       
     mask_gray_corners = 1-min(1.0, length(newUv) );  
   }  

First, the UVs are remapped to the [-1; 1] range and their absolute values are taken. Then some clever "squeezing" takes place. The final mask looks like this:

I'll come back to this mask later.
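
To get a feel for the numbers behind this mask, here is a minimal Python port of the HLSL above (0.555556 in the assembly is simply 1 / 1.8). The function name is mine, and the 1920x1080 resolution in the usage example is an assumption made only for illustration.

```python
import math

def saturate(x):
    return min(max(x, 0.0), 1.0)

def mask_gray_corners(uv, tex_size, offset=(0.0, 0.0)):
    aspect_ratio = tex_size[0] / tex_size[1]                      # line 3
    x = abs(uv[0] * 2.0 - 1.0) * aspect_ratio - offset[0]         # lines 1-5
    y = abs(uv[1] * 2.0 - 1.0) - offset[1]
    x = saturate(x / 1.8) ** 2.5                                  # lines 6-9
    y = saturate(y / 1.8) ** 2.5
    return 1.0 - min(1.0, math.hypot(x, y))                       # lines 10-13

# Assumed 16:9 render target, offset = (0, 0) as observed in the game.
size = (1920.0, 1080.0)
center = mask_gray_corners((0.5, 0.5), size)   # screen center
corner = mask_gray_corners((1.0, 1.0), size)   # bottom right corner
side = mask_gray_corners((1.0, 0.5), size)     # middle of the right edge
```

The mask is exactly 1.0 in the center and drops to almost 0.0 in the corners. Because of the aspect ratio multiplication, the left and right edges of a 16:9 image are nearly as dark as the corners.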


Now I'm going to intentionally omit a few lines of assembly and take a closer look at code responsible for "zooming" effect.
  22: add r1.xy, v0.xyxx, v0.xyxx   
  23: div r1.xy, r1.xyxx, cb0[2].xyxx   
  24: add r1.xy, r1.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)   
  25: dp2 r0.y, r1.xyxx, r1.xyxx   
  26: mul r1.xy, r0.yyyy, r1.xyxx   
  27: mul r0.y, r0.w, l(0.100000)   
  28: mul r1.xy, r0.yyyy, r1.xyxx   
  29: max r1.xy, r1.xyxx, l(-0.400000, -0.400000, 0.000000, 0.000000)   
  30: min r1.xy, r1.xyxx, l(0.400000, 0.400000, 0.000000, 0.000000)   
  31: mul r1.xy, r1.xyxx, cb3[1].xxxx   
  32: mul r1.zw, r1.xxxy, cb0[2].zzzw   
  33: mad r1.zw, v0.xxxy, cb0[1].zzzw, -r1.zzzw   

At first "double" texture coordinates are calculated and float2(1, 1) is subtracted:
   float2 uv4 = 2 * PosH.xy;  
   uv4 /= cb0_v2.xy;  
   uv4 -= float2(1.0, 1.0);  

Such texcoord can be visualised as:

Then dot product is calculated as dot(uv4, uv4), which yields a mask:

which is used to multiply with aforementioned texcoords:

Important: in the upper left corner (black pixels) the values are negative. They are shown as black (0.0) because the R11G11B10_FLOAT format has no sign bit, so it cannot store negative values.

Next, an attenuation factor is calculated (as I mentioned before, fisheyeAmount changes from 0.0 to 1.0).
   float attenuation = fisheyeAmount * 0.1;  
   uv4 *= attenuation;  

Then we have a clamp (max/min) to the [-0.4; 0.4] range and one multiplication by zoomAmount.
This gives us an offset (lines 32-33 additionally scale it by amount, which is always 1.0 here). To calculate the final UV used to sample the color texture, we just subtract:

float2 colorUV = mainUv - offset;
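
Putting lines 22-33 together, here is a minimal Python sketch of the whole colorUV calculation. The function name is mine, invTexSize is assumed to be exactly 1 / texSize, and the 1920x1080 resolution is a hypothetical example.

```python
def clamp(x, lo, hi):
    return min(max(x, lo), hi)

def fisheye_color_uv(pos, tex_size, fisheye_amount,
                     zoom_amount=1.0, amount=(1.0, 1.0)):
    uv4 = [2.0 * p / s - 1.0 for p, s in zip(pos, tex_size)]   # lines 22-24
    d = uv4[0] * uv4[0] + uv4[1] * uv4[1]                      # line 25
    attenuation = fisheye_amount * 0.1                         # line 27
    offset = [clamp(c * d * attenuation, -0.4, 0.4) * zoom_amount
              for c in uv4]                                    # lines 26, 28-31
    main_uv = [p / s for p, s in zip(pos, tex_size)]           # pos * invTexSize
    # lines 32-33: scale the offset by "amount" and subtract it
    return tuple(m - o * a for m, o, a in zip(main_uv, offset, amount))

size = (1920.0, 1080.0)  # assumed resolution
center = fisheye_color_uv((960.0, 540.0), size, 1.0)
corner = fisheye_color_uv((1920.0, 1080.0), size, 1.0)
disabled = fisheye_color_uv((1920.0, 1080.0), size, 0.0)
```

With fisheyeAmount at 1.0 the center of the screen is untouched, while the bottom right corner reads the color buffer at roughly (0.8, 0.8). Pixels are pulled towards the center more and more strongly the further they are from it, which is what creates the fisheye look.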

Sampling the input color texture with colorUV, we get an image distorted around the corners:



Outlines

The next step is to sample the outline map to find outlines. This is quite easy: first we find the texcoords to sample the outline of interesting objects, then we do the same for traces:
   // * Sample outline map  
        
   // interesting objects (upper left square)  
   float2 outlineUV = colorUV * 0.5;  
   float outlineInteresting = texture2.Sample( sampler2, outlineUV ).x; // r0.y  
        
   // traces (upper right square)  
   outlineUV = colorUV * 0.5 + float2(0.5, 0.0);  
   float outlineTraces = texture2.Sample( sampler2, outlineUV ).x; // r2.w  
        
   outlineInteresting /= 8.0; // r4.x  
   outlineTraces /= 8.0; // r4.y  

interesting objects from outline map
traces from outline map
Note that we only sample the .x channel of the outline map and that only its upper squares are considered.

Movement

To make the traces move over time, a trick quite similar to the one from the drunk effect is used. A unit circle is introduced, and we sample 8 times: both the outline map (for interesting objects and for traces) and the color texture.

Note that we divided the found outlines by 8.0 just a moment ago.

Because we are in texture coordinate space [0-1]², circling around a particular pixel with a radius of 1 would give us unacceptable artifacts:


So, before going further, let's find out how the radius is calculated. To do that, we have to go back to the skipped assembly lines 15-21. A small problem is that the calculation of this radius is scattered across the shader (probably due to shader compiler optimizations). So there is one part (15-21) and a second one (41-42):
  15: add_sat r1.xy, -r0.xyxx, l(0.030000, 0.030000, 0.000000, 0.000000)  
  16: add r1.x, r1.y, r1.x  
  17: add_sat r0.xy, r0.xyxx, l(-0.970000, -0.970000, 0.000000, 0.000000)  
  18: add r0.x, r0.x, r1.x  
  19: add r0.x, r0.y, r0.x  
  20: mul r0.x, r0.x, l(20.000000)  
  21: min r0.x, r0.x, l(1.000000)  
  ...  
  41: add r0.x, -r0.x, l(1.000000)  
  42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)  

As you can see, we consider only texels within [0.00 - 0.03] of each screen edge, sum these values up, multiply by 20 and clamp to 1.0. Here is how it looks just after lines 15-21:


And just after line 41:

Then at line 42 we multiply the above by 0.03, which is the circle radius for the whole screen. As you can see, the radius gets smaller near the edges of the screen.
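
The scattered radius calculation (lines 15-21 and 41-42) can be collected into one small function. Below is a minimal Python sketch of it; the function name is mine and the sample UVs are picked only for illustration.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def circle_radius(uv):
    # lines 15-16: distance into the 0.03 band at the left/top edges
    near_low = saturate(0.03 - uv[0]) + saturate(0.03 - uv[1])
    # line 17: the same for the right/bottom edges
    near_high = saturate(uv[0] - 0.97) + saturate(uv[1] - 0.97)
    # lines 18-21: sum, multiply by 20, clamp to 1
    edge = min((near_low + near_high) * 20.0, 1.0)
    # lines 41-42: invert and scale to the maximum radius of 0.03
    return (1.0 - edge) * 0.03

center = circle_radius((0.5, 0.5))     # full radius
left_edge = circle_radius((0.0, 0.5))  # reduced radius
corner = circle_radius((0.0, 0.0))     # radius collapses to zero
```

In the middle of the screen the radius is the full 0.03. Exactly on the left edge it shrinks to about 0.012, and in the corners it collapses to 0.0, so the circle shrinks close to the screen borders.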


Having that, we can take a look at the assembly responsible for movement:
  40: mul r3.x, cb0[0].x, l(0.100000)  
  41: add r0.x, -r0.x, l(1.000000)  
  42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)  
  43: mov r3.yzw, l(0, 0, 0, 0)  
  44: mov r4.x, r0.y  
  45: mov r4.y, r2.w  
  46: mov r4.z, l(0)  
  47: loop  
  48:  ige r4.w, r4.z, l(8)  
  49:  breakc_nz r4.w  
  50:  itof r4.w, r4.z  
  51:  mad r4.w, r4.w, l(0.785375), -r3.x  
  52:  sincos r5.x, r6.x, r4.w  
  53:  mov r6.y, r5.x  
  54:  mul r5.xy, r0.xxxx, r6.xyxx  
  55:  mad r5.zw, r5.xxxy, l(0.000000, 0.000000, 0.125000, 0.125000), r1.zzzw  
  56:  mul r6.xy, r5.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)  
  57:  sample_indexable(texture2d)(float,float,float,float) r4.w, r6.xyxx, t2.yzwx, s2  
  58:  mad r4.x, r4.w, l(0.125000), r4.x  
  59:  mad r5.zw, r5.zzzw, l(0.000000, 0.000000, 0.500000, 0.500000), l(0.000000, 0.000000, 0.500000, 0.000000)  
  60:  sample_indexable(texture2d)(float,float,float,float) r4.w, r5.zwzz, t2.yzwx, s2  
  61:  mad r4.y, r4.w, l(0.125000), r4.y  
  62:  mad r5.xy, r5.xyxx, r1.xyxx, r1.zwzz  
  63:  sample_indexable(texture2d)(float,float,float,float) r5.xyz, r5.xyxx, t0.xyzw, s0  
  64:  mad r3.yzw, r5.xxyz, l(0.000000, 0.125000, 0.125000, 0.125000), r3.yyzw  
  65:  iadd r4.z, r4.z, l(1)  
  66: endloop  

Let's stop here for a moment. At line 40 we have the time factor - simply elapsedTime * 0.1. At line 43 we have the accumulator for the color texture fetched inside the loop.

r0.x (lines 41-42) is the circle radius, as we now know. r4.x (line 44) is the outline of interesting objects, r4.y (line 45) is the outline of traces (previously divided by 8!) and r4.z (line 46) is the loop counter.

As one can expect, the loop has 8 iterations. We start by calculating the angle in radians as i * PI_4, so the eight steps cover 2*PI - a full cycle. The angle is shifted with time.

Using sincos we determine the sampling point on the unit circle and scale it by the radius (line 54).

After that we circle around the pixel and sample the outlines and the color. After the loop we have the average values (thanks to dividing by 8) of the outlines and the color.
   float timeParam = time * 0.1;  
     
   // adjust circle radius  
   circle_radius = 1.0 - circle_radius;  
   circle_radius *= 0.03;  
        
   float3 color_circle_main = float3(0.0, 0.0, 0.0);  
        
   [loop]  
   for (int i=0; 8 > i; i++)  
   {  
      // full cycle: 8 * PI_4 = 2*PI (360 degrees)  
      const float angleRadians = (float) i * PI_4 - timeParam;  
             
      // unit circle  
      float2 unitCircle;  
      sincos(angleRadians, unitCircle.y, unitCircle.x); // unitCircle.x = cos, unitCircle.y = sin  
             
      // adjust radius  
      unitCircle *= circle_radius;  
             
      // * base texcoords (circle) - note we also scale radius here by 8  
      // * probably because of dimensions of outline map.  
      // line 55  
      float2 uv_outline_base = colorUV + unitCircle / 8.0;  
                       
      // * interesting objects (circle)  
      float2 uv_outline_interesting_circle = uv_outline_base * 0.5;  
      float outline_interesting_circle = texture2.Sample( sampler2, uv_outline_interesting_circle ).x;  
      outlineInteresting += outline_interesting_circle / 8.0;  
             
      // * traces (circle)  
      float2 uv_outline_traces_circle = uv_outline_base * 0.5 + float2(0.5, 0.0);  
      float outline_traces_circle = texture2.Sample( sampler2, uv_outline_traces_circle ).x;  
      outlineTraces += outline_traces_circle / 8.0;  
             
      // * sample color texture (zooming effect) with perturbation  
      float2 uv_color_circle = colorUV + unitCircle * offsetUV;  
      float3 color_circle = texture0.Sample( sampler0, uv_color_circle ).rgb;  
      color_circle_main += color_circle / 8.0;  
   }  
        

Sampling of color is quite similar, but to base colorUV we add offset multiplied by "unit" circle.
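
To get a feel for where the 8 taps land, here is a minimal CPU-side sketch in C++. It is not part of the original shader: `circleOffsets`, `Float2` and the parameter names are made up for illustration, and I assume PI_4 is simply PI / 4.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// Hypothetical helper: returns the 8 offsets used to circle around a pixel.
std::array<Float2, 8> circleOffsets(float elapsedTime, float circleRadius)
{
    assert(circleRadius >= 0.0f);

    const float kPi4 = 0.785398163f;            // PI / 4 (assumption)
    const float timeParam = elapsedTime * 0.1f; // same scaling as assembly line 40

    std::array<Float2, 8> offsets{};
    for (int i = 0; i < 8; ++i)
    {
        // 8 steps of PI/4 cover a full cycle; time rotates the whole pattern
        const float angle = static_cast<float>(i) * kPi4 - timeParam;
        offsets[i] = Float2{ std::cos(angle) * circleRadius,
                             std::sin(angle) * circleRadius };
    }
    return offsets;
}
```

Every offset has length circleRadius and the 8 offsets sum to (almost) zero, so averaging the taps does not shift the image in any direction; time only rotates the pattern.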

Intensities

After the loop we sample intensity map and adjust final intensities (because intensity map has no idea about outlines):
  67: sample_indexable(texture2d)(float,float,float,float) r0.xy, r1.zwzz, t3.xyzw, s0  
  68: mad_sat r0.xy, -r0.xyxx, l(0.800000, 0.750000, 0.000000, 0.000000), r4.xyxx  

HLSL:
   // * Sample intensity map  
   float2 intensityMap = texture3.Sample( sampler0, colorUV ).xy;  
     
   float intensityInteresting = intensityMap.r;  
   float intensityTraces = intensityMap.g;  
        
   // * Adjust outlines  
   float mainOutlineInteresting = saturate( outlineInteresting - 0.8*intensityInteresting );  
   float mainOutlineTraces = saturate( outlineTraces - 0.75*intensityTraces ); 

Gray corners and final combining

The gray color near corners is calculated using dot product (assembly line 69):
   // * Greyish color  
   float3 color_greyish = dot( color_circle_main, float3(0.3, 0.3, 0.3) ).xxx;  



Then we have two interpolations. The first one combines the gray color with the "circled" one using the first mask I described - so the corners are grey. Additionally there is a 0.6 factor which darkens the result:

The second one combines the "regular" color with the above one using fisheyeAmount. That means the screen is getting progressively darker (thanks to the 0.6 multiplication above) and more gray around the corners! Genius.

HLSL:
   // * Determine main color.  
   // (1) At first, combine "circled" color with gray one.  
   // Now we have greyish corners here.  
   float3 mainColor = lerp( color_greyish, color_circle_main, mask_gray_corners ) * 0.6;  
     
   // (2) Then mix "regular" color with the above.  
   // Please note this operation makes corners gradually gray (because fisheyeAmount rises from 0 to 1)
   // and gradually darker (because of 0.6 multiplier).  
   mainColor = lerp( color, mainColor, fisheyeAmount );  
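
The two interpolations are easy to check on the CPU. Below is a small C++ sketch under my own naming (`lerp3`, `sensesMainColor` and `Float3` are hypothetical helpers, not something from the game); it only mirrors the HLSL above.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };

// Component-wise linear interpolation, same semantics as HLSL lerp()
Float3 lerp3(Float3 a, Float3 b, float t)
{
    return Float3{ a.r + (b.r - a.r) * t,
                   a.g + (b.g - a.g) * t,
                   a.b + (b.b - a.b) * t };
}

// Hypothetical CPU-side version of the two interpolations.
Float3 sensesMainColor(Float3 color, Float3 circled,
                       float maskGrayCorners, float fisheyeAmount)
{
    // dot(circled, float3(0.3, 0.3, 0.3)) broadcast to all channels
    const float g = 0.3f * (circled.r + circled.g + circled.b);
    const Float3 greyish{ g, g, g };

    // (1) grey where the mask is 0, circled color where it is 1, then darken
    Float3 mainColor = lerp3(greyish, circled, maskGrayCorners);
    mainColor = Float3{ mainColor.r * 0.6f, mainColor.g * 0.6f, mainColor.b * 0.6f };

    // (2) fade from the regular color to the processed one
    return lerp3(color, mainColor, fisheyeAmount);
}
```

With fisheyeAmount = 0 the function returns the regular color untouched; with fisheyeAmount = 1 and a mask of 0 every channel becomes the same grey value times 0.6.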


Now we can move to outlining objects.
Colors (red and yellow) are taken from constant buffer.
   // * Determine color of witcher senses  
   float3 senses_traces = mainOutlineTraces * colorTraces;  
   float3 senses_interesting = mainOutlineInteresting * colorInteresting;  
   float3 senses_total = 1.2 * senses_traces + senses_interesting;   



Phew! We are almost at the finish line!
We have the final color, we have the color of witcher senses... all we have to do is to combine them somehow!

This is not just simple adding. At first, we calculate dot product:
  78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
   
  float dot_senses_total = saturate( dot(senses_total, float3(1.0, 1.0, 1.0) ) );  

which looks like this:

And this is, at the very end, used to interpolate between color and (saturated) witcher senses:
  76: mad r0.xyz, r0.yzwy, l(1.200000, 1.200000, 1.200000, 0.000000), r2.xyzx  
  77: mov_sat r2.xyz, r0.xyzx  
  78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
  79: add r0.yzw, -r1.xxyz, r2.xxyz  
  80: mad o0.xyz, r0.xxxx, r0.yzwy, r1.xyzx  
  81: mov o0.w, l(1.000000)  
  82: ret  
   
   float3 senses_total = 1.2 * senses_traces + senses_interesting;   
     
   // * Final combining  
   float3 senses_total_sat = saturate(senses_total);  
   float dot_senses_total = saturate( dot(senses_total, float3(1.0, 1.0, 1.0) ) );  
        
   float3 finalColor = lerp( mainColor, senses_total_sat, dot_senses_total );  
   return float4( finalColor, 1.0 );  
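
For completeness, the same final combining as a CPU-side C++ sketch. The names (`combineWithSenses`, `Float3`, `saturate`) are mine, and the red/yellow colors in the usage note are just example values standing in for the constant buffer colors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };

float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Hypothetical CPU-side version of the final combining step.
Float3 combineWithSenses(Float3 mainColor,
                         float outlineTraces, Float3 colorTraces,
                         float outlineInteresting, Float3 colorInteresting)
{
    // senses_total = 1.2 * senses_traces + senses_interesting
    const Float3 total{
        1.2f * outlineTraces * colorTraces.r + outlineInteresting * colorInteresting.r,
        1.2f * outlineTraces * colorTraces.g + outlineInteresting * colorInteresting.g,
        1.2f * outlineTraces * colorTraces.b + outlineInteresting * colorInteresting.b };

    // Blend weight: saturated sum of channels (dp3_sat with float3(1, 1, 1))
    const float w = saturate(total.r + total.g + total.b);

    // lerp( mainColor, saturate(senses_total), w )
    return Float3{ mainColor.r + (saturate(total.r) - mainColor.r) * w,
                   mainColor.g + (saturate(total.g) - mainColor.g) * w,
                   mainColor.b + (saturate(total.b) - mainColor.b) * w };
}
```

Where both outlines are zero the weight is zero and the pixel keeps mainColor; where an outline is strong the weight saturates to 1 and the pixel takes the pure senses color.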



This is the end.


The full shader is available here.
Comparison of my (left) and original (right) shaders:


If you have come this far, congratulations. Feel free to comment.
I hope you enjoyed this mini-series! In "witcher senses" mechanics there are a lot of brilliant ideas and the final result is really convincing.

Thank you very much for reading!


PS. A decent part of this mini-series was done with High Contrast in background :)

Reverse engineering the rendering of The Witcher 3, part 13b - witcher senses (outline map)

Welcome,

This is the second part of demystifying Witcher Senses effect from The Witcher 3: Wild Hunt.

Once again, example scene we are working on:


In the first post I showed a bit how "intensity map" is being generated.
We have one full-resolution R11G11B10_FLOAT texture which can look like this:


The green channel represents "traces" and red one - interesting objects Geralt can interact with.

Having this we can move to the next stage - I called it "outline map".

This is a bit strange 512x512 R16G16_FLOAT texture. What's important here, it's implemented in ping-pong fashion. That means, outline map from previous frame is input (along with intensity map) for generating a new outline map in current frame.

You can probably implement ping-pong buffers in many ways, but my personal favourite is as follows (pseudocode):
 // Declarations  
 Texture2D m_texOutlineMap[2];  
 uint m_outlineIndex = 0;  
   
 // Rendering  
 void Render()  
 {  
   pDevCon->SetInputTexture( m_texOutlineMap[m_outlineIndex] );  
   pDevCon->SetOutputTexture( m_texOutlineMap[!m_outlineIndex] );  
   ...  
   pDevCon->Draw(...);  
   
   // after draw  
   m_outlineIndex = !m_outlineIndex;  
 }  

Such an approach, where the input is always [m_outlineIndex] and the output is always [!m_outlineIndex], allows for nice flexibility in terms of applying postFXs in general.
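
The index juggling can be wrapped in a tiny helper. This is just a sketch of the idea in plain C++, with no D3D types involved; `PingPong` is a hypothetical class of mine and `Resource` would be whatever texture handle your engine uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: owns two resources and flips their roles each frame.
template <typename Resource>
class PingPong
{
public:
    PingPong(Resource a, Resource b) : m_resources{{a, b}} {}

    Resource& input()  { return m_resources[m_index]; }       // written last frame
    Resource& output() { return m_resources[m_index ^ 1u]; }  // written this frame

    // Call once after the draw: last output becomes next input
    void swap() { m_index ^= 1u; }

private:
    std::array<Resource, 2> m_resources;
    std::uint32_t m_index = 0;
};
```

For an index that is only ever 0 or 1, `m_index ^ 1u` does the same job as `!m_outlineIndex` in the pseudocode.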

Let's take a look at pixel shader:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[1], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s1, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps linear v2.xy  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: add r0.xyzw, v2.xyxy, v2.xyxy  
   1: round_ni r1.xy, r0.zwzz  
   2: frc r0.xyzw, r0.xyzw  
   3: add r1.zw, r1.xxxy, l(0.000000, 0.000000, -1.000000, -1.000000)  
   4: dp2 r1.z, r1.zwzz, r1.zwzz  
   5: add r1.z, -r1.z, l(1.000000)  
   6: max r2.w, r1.z, l(0)  
   7: dp2 r1.z, r1.xyxx, r1.xyxx  
   8: add r3.xyzw, r1.xyxy, l(-1.000000, -0.000000, -0.000000, -1.000000)  
   9: add r1.x, -r1.z, l(1.000000)  
  10: max r2.x, r1.x, l(0)  
  11: dp2 r1.x, r3.xyxx, r3.xyxx  
  12: dp2 r1.y, r3.zwzz, r3.zwzz  
  13: add r1.xy, -r1.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  14: max r2.yz, r1.xxyx, l(0, 0, 0, 0)  
  15: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r0.zwzz, t1.xyzw, s1  
  16: dp4 r1.x, r1.xyzw, r2.xyzw  
  17: add r2.xyzw, r0.zwzw, l(0.003906, 0.000000, -0.003906, 0.000000)  
  18: add r0.xyzw, r0.xyzw, l(0.000000, 0.003906, 0.000000, -0.003906)  
  19: sample_indexable(texture2d)(float,float,float,float) r1.yz, r2.xyxx, t1.zxyw, s1  
  20: sample_indexable(texture2d)(float,float,float,float) r2.xy, r2.zwzz, t1.xyzw, s1  
  21: add r1.yz, r1.yyzy, -r2.xxyx  
  22: sample_indexable(texture2d)(float,float,float,float) r0.xy, r0.xyxx, t1.xyzw, s1  
  23: sample_indexable(texture2d)(float,float,float,float) r0.zw, r0.zwzz, t1.zwxy, s1  
  24: add r0.xy, -r0.zwzz, r0.xyxx  
  25: max r0.xy, abs(r0.xyxx), abs(r1.yzyy)  
  26: min r0.xy, r0.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  27: mul r0.xy, r0.xyxx, r1.xxxx  
  28: sample_indexable(texture2d)(float,float,float,float) r0.zw, v2.xyxx, t0.zwxy, s0  
  29: mad r0.w, r1.x, l(0.150000), r0.w  
  30: mad r0.x, r0.x, l(0.350000), r0.w  
  31: mad r0.x, r0.y, l(0.350000), r0.x  
  32: mul r0.yw, cb3[0].zzzw, l(0.000000, 300.000000, 0.000000, 300.000000)  
  33: mad r0.yw, v2.xxxy, l(0.000000, 150.000000, 0.000000, 150.000000), r0.yyyw  
  34: ftoi r0.yw, r0.yyyw  
  35: bfrev r0.w, r0.w  
  36: iadd r0.y, r0.w, r0.y  
  37: ishr r0.w, r0.y, l(13)  
  38: xor r0.y, r0.y, r0.w  
  39: imul null, r0.w, r0.y, r0.y  
  40: imad r0.w, r0.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  41: imad r0.y, r0.y, r0.w, l(146956042240.000000)  
  42: and r0.y, r0.y, l(0x7fffffff)  
  43: itof r0.y, r0.y  
  44: mad r0.y, r0.y, l(0.000000001), l(0.650000)  
  45: add_sat r1.xyzw, v2.xyxy, l(0.001953, 0.000000, -0.001953, 0.000000)  
  46: sample_indexable(texture2d)(float,float,float,float) r0.w, r1.xyxx, t0.yzwx, s0  
  47: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.zwzz, t0.xyzw, s0  
  48: add r0.w, r0.w, r1.x  
  49: add_sat r1.xyzw, v2.xyxy, l(0.000000, 0.001953, 0.000000, -0.001953)  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t0.xyzw, s0  
  51: sample_indexable(texture2d)(float,float,float,float) r1.y, r1.zwzz, t0.yxzw, s0  
  52: add r0.w, r0.w, r1.x  
  53: add r0.w, r1.y, r0.w  
  54: mad r0.w, r0.w, l(0.250000), -r0.z  
  55: mul r0.w, r0.y, r0.w  
  56: mul r0.y, r0.y, r0.z  
  57: mad r0.x, r0.w, l(0.900000), r0.x  
  58: mad r0.y, r0.y, l(-0.240000), r0.x  
  59: add r0.x, r0.y, r0.z  
  60: mov_sat r0.z, cb3[0].x  
  61: log r0.z, r0.z  
  62: mul r0.z, r0.z, l(100.000000)  
  63: exp r0.z, r0.z  
  64: mad r0.z, r0.z, l(0.160000), l(0.700000)  
  65: mul o0.xy, r0.zzzz, r0.xyxx  
  66: mov o0.zw, l(0, 0, 0, 0)  
  67: ret  


As you can see, the output of the outline map is divided into four equal squares and this is the first thing we need to look at:
   0: add r0.xyzw, v2.xyxy, v2.xyxy  
   1: round_ni r1.xy, r0.zwzz  
   2: frc r0.xyzw, r0.xyzw  
   3: add r1.zw, r1.xxxy, l(0.000000, 0.000000, -1.000000, -1.000000)  
   4: dp2 r1.z, r1.zwzz, r1.zwzz  
   5: add r1.z, -r1.z, l(1.000000)  
   6: max r2.w, r1.z, l(0)  
   7: dp2 r1.z, r1.xyxx, r1.xyxx  
   8: add r3.xyzw, r1.xyxy, l(-1.000000, -0.000000, -0.000000, -1.000000)  
   9: add r1.x, -r1.z, l(1.000000)  
  10: max r2.x, r1.x, l(0)  
  11: dp2 r1.x, r3.xyxx, r3.xyxx  
  12: dp2 r1.y, r3.zwzz, r3.zwzz  
  13: add r1.xy, -r1.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  14: max r2.yz, r1.xxyx, l(0, 0, 0, 0)  

We start by calculating floor( TextureUV * 2.0 ), which gives:

To determine individual squares, a small function is used:
 float getParams(float2 uv)  
 {  
      float d = dot(uv, uv);  
      d = 1.0 - d;  
      d = max( d, 0.0 );  
   
      return d;  
 }  

Note that this function returns 1.0 when the input is float2(0.0, 0.0).
We have this case in the upper left corner. To have the same situation for the upper right corner, we have to subtract float2(1, 0) from the floored texcoords; for the green square subtract float2(0, 1) and for the yellow one - float2(1.0, 1.0).

So:
   float2 flooredTextureUV = floor( 2.0 * TextureUV );  
   ...
     
   float2 uv1 = flooredTextureUV;  
   float2 uv2 = flooredTextureUV + float2(-1.0, -0.0);   
   float2 uv3 = flooredTextureUV + float2( -0.0, -1.0);  
   float2 uv4 = flooredTextureUV + float2(-1.0, -1.0);  
   
   float4 mask;  
   mask.x = getParams( uv1 );  
   mask.y = getParams( uv2 );  
   mask.z = getParams( uv3 );  
   mask.w = getParams( uv4 );  

Each of mask components is equal to one or zero and is responsible for one square within texture. For instance mask.r and mask.w:
mask.r

mask.w

Once we have obtained the mask, let's move further. Line 15 samples the intensity map. Please note that the intensity texture is R11G11B10_FLOAT, while we sample all rgba components. In this scenario, .a is set implicitly to 1.0f.

Texcoords used for this operation can be calculated as frac( TextureUV * 2.0 ). So the result of this operation looks for example like this:

Do you see similarity?

The next step is really smart - a 4-components dot product (dp4) is performed:
  16: dp4 r1.x, r1.xyzw, r2.xyzw   

This way, in upper left square we have only red channel (therefore, only interesting objects), in upper right - only green channel (only traces) and in lower right - everything (because .w component of intensity was implicitly set to 1.0). Brilliant idea. The result of dot product looks this way:
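
The mask and the dp4 trick can be verified quickly on the CPU. Here is a hedged C++ sketch: `quadrantMask` and `masterFilter` are my names, TextureUV is assumed to be in [0, 1), and the sampled intensity is passed in by hand with alpha already set to 1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// Same as getParams() above: 1 for (0, 0), 0 for the other floored inputs
float getParams(Float2 uv)
{
    return std::max(0.0f, 1.0f - (uv.x * uv.x + uv.y * uv.y));
}

// One-hot mask selecting the square that textureUV falls into
Float4 quadrantMask(Float2 textureUV)
{
    const float fx = std::floor(2.0f * textureUV.x);
    const float fy = std::floor(2.0f * textureUV.y);
    return Float4{ getParams(Float2{fx,        fy}),
                   getParams(Float2{fx - 1.0f, fy}),
                   getParams(Float2{fx,        fy - 1.0f}),
                   getParams(Float2{fx - 1.0f, fy - 1.0f}) };
}

// dp4 of the sampled intensity (alpha implicitly 1.0) with the mask
float masterFilter(Float4 intensity, Float4 mask)
{
    return intensity.x * mask.x + intensity.y * mask.y
         + intensity.z * mask.z + intensity.w * mask.w;
}
```

Because exactly one mask component is 1, the dot product just picks one channel per square. The remaining (third) square picks the blue channel, which, as far as I can tell from the intensity pass, stays empty.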


Having this masterFilter, we are ready to determine outlines of objects. This is not as hard as one might expect. The algorithm is quite similar to the one applied to sharpen - we have to obtain the max abs difference of values.

Here's what happens: we sample four texels near the currently processed one (important: texel size in this case is 1.0/256.0 !) and calculate maximum absolute differences for both red and green channels:
   float fTexel = 1.0 / 256;  
     
   float2 sampling1 = TextureUV + float2( fTexel, 0 );  
   float2 sampling2 = TextureUV + float2( -fTexel, 0 );  
   float2 sampling3 = TextureUV + float2( 0, fTexel );  
   float2 sampling4 = TextureUV + float2( 0, -fTexel );  
     
   float2 intensity_x0 = texIntensityMap.Sample( sampler1, sampling1 ).xy;  
   float2 intensity_x1 = texIntensityMap.Sample( sampler1, sampling2 ).xy;  
   float2 intensity_diff_x = intensity_x0 - intensity_x1;  
     
   float2 intensity_y0 = texIntensityMap.Sample( sampler1, sampling3 ).xy;  
   float2 intensity_y1 = texIntensityMap.Sample( sampler1, sampling4 ).xy;  
   float2 intensity_diff_y = intensity_y0 - intensity_y1;  
     
   float2 maxAbsDifference = max( abs(intensity_diff_x), abs(intensity_diff_y) );  
   maxAbsDifference = saturate(maxAbsDifference);  

Now - if we multiply filter and maxAbsDifference...

So simple and so effective.
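
The same "max abs difference" idea on a plain array, as a minimal C++ sketch. This is a single-channel simplification with clamped integer addressing instead of a texture sampler; `Image` and `edgeStrength` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tiny single-channel image with clamp-to-edge addressing (an assumption)
struct Image
{
    int width;
    int height;
    std::vector<float> texels;

    float at(int x, int y) const
    {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                      + static_cast<std::size_t>(x)];
    }
};

// saturate( max( |right - left|, |down - up| ) )
float edgeStrength(const Image& img, int x, int y)
{
    const float dx = img.at(x + 1, y) - img.at(x - 1, y);
    const float dy = img.at(x, y + 1) - img.at(x, y - 1);
    return std::min(1.0f, std::max(std::fabs(dx), std::fabs(dy)));
}
```

Flat regions give 0 and only texels next to a change in intensity give a non-zero value, which is exactly what an outline is. The shader does this for the red and green channels at once.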

Once we have outlines, we sample outline map from previous frame.
Then, to get the "ghosting" effect, we mix some of the values calculated in the current pass with values from the outline map.

Say "hi" to our old friend - integer noise. It's present here as well. Animation parameters ( cb3[0].zw ) are from constant buffer and they change with time.
   float2 outlines = masterFilter * maxAbsDifference;  
     
   // Sample outline map  
   float2 outlineMap = texOutlineMap.Sample( samplerLinearWrap, uv ).xy;  
     
   // I guess it's related with ghosting   
   float paramOutline = masterFilter*0.15 + outlineMap.y;  
   paramOutline += 0.35 * outlines.r;  
   paramOutline += 0.35 * outlines.g;  
     
   // input for integer noise  
   float2 noiseWeights = cb3_v0.zw;
   float2 noiseInputs = 150.0*uv + 300.0*noiseWeights;  
   int2 iNoiseInputs = (int2) noiseInputs;  
     
   float noise0 = clamp( integerNoise( iNoiseInputs.x + reversebits(iNoiseInputs.y) ), -1, 1 ) + 0.65; // r0.y  
     

Side note: If you would like to implement Witcher Senses on your own I suggest to clamp integer noise to [-1;1] range (as its website says). There is no clamp in original TW3 shader but without clamping I had awful artifacts and whole outline map was unstable.
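
For reference, this is how I read assembly lines 37-44 when translated to C++. Treat it as a sketch with loud assumptions: only 0xec4d (60493) is printed as an integer in the disassembly, so the other two constants (19990303 and 1376312589) and the 1 / 2^30 scale are my decoding of the float-printed literals. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Lines 37-42: hash the integer input and clear the sign bit.
std::uint32_t integerNoiseBits(std::int32_t n)
{
    // n = (n >> 13) ^ n, with an arithmetic shift like ishr
    const std::uint32_t x = static_cast<std::uint32_t>((n >> 13) ^ n);

    // x * (x * x * 60493 + C1) + C2, wrapping modulo 2^32
    // (C1 and C2 are assumptions, see the paragraph above)
    const std::uint32_t h = x * (x * x * 60493u + 19990303u) + 1376312589u;

    return h & 0x7fffffffu;
}

// Lines 43-44: convert to float, scale and add 0.65.
float noiseFactor(std::int32_t n)
{
    const float scale = 1.0f / 1073741824.0f; // 1 / 2^30 (assumption)
    return static_cast<float>(integerNoiseBits(n)) * scale + 0.65f;
}
```

With this reading the factor lands roughly between 0.65 and 2.65, without any clamp. In the shader the input n would be iNoiseInputs.x + reversebits(iNoiseInputs.y).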

Then, we sample outline map the same way as intensity map before (this time size of texel is 1.0/512.0) and calculate average value of  .x component:

  // sampling of outline map  
   fTexel = 1.0 / 512.0;  
     
   sampling1 = saturate( uv + float2( fTexel, 0 ) );  
   sampling2 = saturate( uv + float2( -fTexel, 0 ) );  
   sampling3 = saturate( uv + float2( 0, fTexel ) );  
   sampling4 = saturate( uv + float2( 0, -fTexel ) );  
     
   float outline_x0 = texOutlineMap.Sample( sampler0, sampling1 ).x;  
   float outline_x1 = texOutlineMap.Sample( sampler0, sampling2 ).x;  
   float outline_y0 = texOutlineMap.Sample( sampler0, sampling3 ).x;  
   float outline_y1 = texOutlineMap.Sample( sampler0, sampling4 ).x;  
   float averageOutline = (outline_x0+outline_x1+outline_y0+outline_y1) / 4.0;  

Then, following the assembly, a difference between average and value in that particular pixel is computed and perturbed with integer noise:
   // perturb with noise  
   float frameOutlineDifference = averageOutline - outlineMap.x;  
   frameOutlineDifference *= noise0;  

The next step is to perturb the value from the "old" outline map with noise - this is the main line which gives the blocky look to the output texture.

There are some more calculations later and, at the very end, "damping" is calculated.
   // the main place which gives the blocky look of the texture  
   float newNoise = outlineMap.x * noise0;  
     
   float newOutline = frameOutlineDifference * 0.9 + paramOutline;  
   newOutline -= 0.24*newNoise;  
     
   // 59: add r0.x, r0.y, r0.z  
   float2 finalOutline = float2( outlineMap.x + newOutline, newOutline);  
     
   // * calculate damping  
   float dampingParam = saturate( cb3_v0.x );  
   dampingParam = pow( dampingParam, 100 );    
     
   float damping = 0.7 + 0.16*dampingParam;  
   
   
   // * final multiplication  
   float2 finalColor = finalOutline * damping;  
   return float4(finalColor, 0, 0);
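
The damping part is worth a quick numeric look. A small C++ sketch follows; `outlineDamping` is a hypothetical name and the parameter stands for cb3[0].x, whose exact meaning I don't know.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical CPU-side version of the damping factor (assembly lines 60-64)
float outlineDamping(float param)
{
    const float s = std::min(1.0f, std::max(0.0f, param)); // saturate
    return 0.7f + 0.16f * std::pow(s, 100.0f);
}
```

Because of the power of 100, the factor stays at 0.7 for almost the whole input range and only climbs towards 0.86 when the parameter is very close to 1 (for 0.99 it is about 0.76).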


Here is a small video which shows outline map in action:



If you are interested in the complete pixel shader, it's here. It's compatible with RenderDoc.
What's interesting (and, to be honest, slightly frustrating): even though its assembly is the same as the original shader from The Witcher 3, the final look of the outline map in RenderDoc changes!

On a side note - in the last pass (link below) you will see that only .r channel of outline map is being used. So why do we need .g channel then? I guess it's some sort of ping-pong buffer within the texture - please note that .r contains .g channel + some new value.

We have arrived to the end of the second part. Go to the last one here.


I hope you enjoyed it.
Thanks for reading!

Reverse engineering the rendering of The Witcher 3, part 13a - witcher senses (objects & intensity map)

Welcome back!

So far, almost every effect/technique explained in this series was not really Witcher 3-related. I mean, you can find things like tonemapping, vignette or calculating average luminance almost in every modern video game. Even drunk effect is something quite widespread.

That's why I decided to take a closer look at "witcher senses" rendering mechanics. Since Geralt is a witcher, his senses are much sharper compared to an ordinary human. Therefore, he can see and hear more than other people, which greatly helps him with his investigations. The witcher senses mechanic allows the player to visualize these traces.

Here is a demonstration of this effect:

And one more, in better lighting:

As you can see, there are 2 types of objects: the ones Geralt can interact with (yellow outline) and traces related to the investigation (red outline). Once Geralt investigates a red trace, it can turn into a yellow one (video #1). Please note that the whole screen gets more greyish and a fish-eye effect is applied (video #2).

The effect is quite complex so I decided to split it across three blog posts.
In the first one I will describe selection of objects, in the second one - generation of outline and the third one will focus on final combining all this together.

Selection of objects

As I mentioned, there are two types of objects so we want to distinguish between them. In The Witcher 3 it's done by using the stencil buffer. While generating the GBuffer, meshes which are meant to be marked as "traces" (red) are rendered with stencil = 8. Meshes which are marked with yellow color as "interesting" are rendered with stencil = 4.

For example, the following two textures show example frame with visible witcher senses and corresponding stencil buffer:



Stencil buffer - a brief refresher

Stencil buffer is generally quite often used to identify meshes drawn by assigning the same ID to certain categories of meshes.

The idea is to use the Always comparison function with the Replace operation when the stencil test passes and the Keep operation in other cases.

Here's how to implement it with D3D11:

 D3D11_DEPTH_STENCIL_DESC depthstencilState = {};  // zero-initialize before filling in  
 // Set depth parameters....  
   
 // Enable stencil  
 depthstencilState.StencilEnable = TRUE;  
   
 // Read & write all bits  
 depthstencilState.StencilReadMask = 0xFF;  
 depthstencilState.StencilWriteMask = 0xFF;  
   
 // Stencil operator for front face  
 depthstencilState.FrontFace.StencilFunc = D3D11_COMPARISON_ALWAYS;  
 depthstencilState.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.FrontFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE;  
   
 // Stencil operator for back face.  
 depthstencilState.BackFace.StencilFunc = D3D11_COMPARISON_ALWAYS;  
 depthstencilState.BackFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.BackFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;  
 depthstencilState.BackFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE;  
   
 pDevice->CreateDepthStencilState( &depthstencilState, &m_pDS_AssignValue );  


Stencil value to write to the buffer is passed as StencilRef via API call:

 // from now on set stencil buffer values to 8  
 pDevCon->OMSetDepthStencilState( m_pDS_AssignValue, 8 );  
 ...  
 pDevCon->DrawIndexed( ... );  

Rendering intensities

For this pass, in terms of implementation there is one R11G11B10_FLOAT full-screen texture which will be used to save interesting objects and traces in R and G channel, respectively.

In terms of intensity - why exactly do we need it? It turns out that Geralt's senses have, well, a limited radius, so a particular object starts to be outlined only when the player is close enough.

See this aspect in action:




We start by clearing the intensity texture with black color.
Then two fullscreen draw calls are performed: the first one for "traces" and the second one for interesting objects:


The first draw call is for traces - green channel:

And the second one for interesting objects - red channel:


Okay, but how do we distinguish here which pixels will be considered? We have to use the stencil buffer!
During each of these calls a stencil test is performed to accept only these pixels which were marked before with "8" (first Draw call) or "4".

Visualization of stencil test for traces:

...and for interesting objects:

How is the test performed in this case? For basics about the stencil test, here is a good blog post about it. The general stencil test formula is as follows:
 if ( (StencilRef & StencilReadMask) OP (StencilValue & StencilReadMask) )  
   accept pixel  
 else  
   discard pixel  

where:
StencilRef is value passed with API call,
StencilReadMask is a mask used to read stencil value (note it's present on both left and right side),
OP is operator used for comparing, it's set through API,
StencilValue is value of stencil buffer in currently processed pixel.

It's important to be aware that we use binary ANDs to calculate operands.

Knowing the basics, let's see settings used during these drawcalls:
Stencil state for traces

Stencil state for interesting objects

Ha! As we can see, ReadMask is the only difference. Let's try it! Let's substitute these values into the stencil test equation:
 Let StencilReadMask = 0x08 and StencilRef = 0:  
   
 For a pixel with stencil = 8:  
 0 & 0x08 < 8 & 0x08  
 0 < 8
 TRUE  
   
 For a pixel with stencil = 4:  
 0 & 0x08 < 4 & 0x08  
 0 < 0  
 FALSE  
   

Ha! Clever. As you can see, in this scenario we don't compare stencil value but rather we check if particular bit of stencil buffer value is set on. Every pixel of stencil buffer is uint8, so we have [0-255].
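
The same check written as a tiny C++ function, in case you want to try other values. This is only a model of the formula above with OP being "less than", as in the worked example; `stencilTestLess` is a hypothetical name, and the 0x04 mask in the usage note is my assumption for the interesting objects pass.

```cpp
#include <cassert>
#include <cstdint>

// Model of the stencil test with a "less than" comparison:
// (StencilRef & StencilReadMask) < (StencilValue & StencilReadMask)
bool stencilTestLess(std::uint8_t stencilRef,
                     std::uint8_t readMask,
                     std::uint8_t stencilValue)
{
    return (stencilRef & readMask) < (stencilValue & readMask);
}
```

With StencilRef = 0 the test passes exactly when the masked bit is set: `stencilTestLess(0, 0x08, 8)` is true and `stencilTestLess(0, 0x08, 4)` is false, while a 0x04 mask flips both answers.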

Side note: All DrawIndexed(36) calls are related with rendering footsteps as traces, so the final look of intensity map in this particular frame is:



But before stencil test there is a pixel shader. Both 28738 and 28748 use the same pixel shader:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[2], immediateIndexed  
    dcl_constantbuffer cb3[8], immediateIndexed  
    dcl_constantbuffer cb12[214], immediateIndexed  
    dcl_sampler s15, mode_default  
    dcl_resource_texture2d (float,float,float,float) t15  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xyzw  
    dcl_output o3.xyzw  
    dcl_temps 2  
   0: mul r0.xy, v0.xyxx, cb0[1].zwzz  
   1: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t15.xyzw, s15  
   2: mul r1.xyzw, v0.yyyy, cb12[211].xyzw  
   3: mad r1.xyzw, cb12[210].xyzw, v0.xxxx, r1.xyzw  
   4: mad r0.xyzw, cb12[212].xyzw, r0.xxxx, r1.xyzw  
   5: add r0.xyzw, r0.xyzw, cb12[213].xyzw  
   6: div r0.xyz, r0.xyzx, r0.wwww  
   7: add r0.xyz, r0.xyzx, -cb3[7].xyzx  
   8: dp3 r0.x, r0.xyzx, r0.xyzx  
   9: sqrt r0.x, r0.x  
  10: mul r0.y, r0.x, l(0.120000)  
  11: log r1.x, abs(cb3[6].y)  
  12: mul r1.xy, r1.xxxx, l(2.800000, 0.800000, 0.000000, 0.000000)  
  13: exp r1.xy, r1.xyxx  
  14: mad r0.zw, r1.xxxy, l(0.000000, 0.000000, 120.000000, 120.000000), l(0.000000, 0.000000, 1.000000, 1.000000)  
  15: lt r1.x, l(0.030000), cb3[6].y  
  16: movc r0.xy, r1.xxxx, r0.yzyy, r0.xwxx  
  17: div r0.x, r0.x, r0.y  
  18: log r0.x, r0.x  
  19: mul r0.x, r0.x, l(1.600000)  
  20: exp r0.x, r0.x  
  21: add r0.x, -r0.x, l(1.000000)  
  22: max r0.x, r0.x, l(0)  
  23: mul o0.xyz, r0.xxxx, cb3[0].xyzx  
  24: mov o0.w, cb3[0].w  
  25: mov o1.xyzw, cb3[1].xyzw  
  26: mov o2.xyzw, cb3[2].xyzw  
  27: mov o3.xyzw, cb3[3].xyzw  
  28: ret  

This pixel shader writes to only one render target, so lines 24-27 are redundant.

The first thing which takes place here is sampling depth (with point clamp sampler), line 1. This value is used to reconstruct world position by multiplication with special matrix and, after that, perspective division (lines 2-6).

Having Geralt's position ( cb3[7].xyz - please note this is not a camera position! ) distance from Geralt to this particular point is calculated (lines 7-9).

Inputs which are important for this shader are:
- cb3[0].rgb - color of output. This can be float3(0, 1, 0) (traces) or float3(1, 0, 0)  (interesting objects),
- cb3[6].y - distance scaling factor. This directly affects radius and intensity of final output.

Later we have slightly tricky formulas to calculate intensity depending on the distance from Geralt to the object. My guess is that all coefficients were selected experimentally.
Final output is color*intensity.


The HLSL would be something like this:
 struct FSInput  
 {  
      float4 param0 : SV_Position;  
 };  
   
 struct FSOutput  
 {  
      float4 param0 : SV_Target0;  
      float4 param1 : SV_Target1;  
      float4 param2 : SV_Target2;  
      float4 param3 : SV_Target3;  
 };  
   
 float3 getWorldPos( float2 screenPos, float depth )  
 {  
   float4 worldPos = float4(screenPos, depth, 1.0);  
   worldPos = mul( worldPos, screenToWorld );  
     
   return worldPos.xyz / worldPos.w;  
 }  
   
 FSOutput EditedShaderPS(in FSInput IN)  
 {  
   // * Inputs    
   // Directly affects radius of the effect  
   float distanceScaling = cb3_v6.y;  
     
   // Color of output at the end  
   float3 color = cb3_v0.rgb;  
        

   // Sample depth  
   float2 uv = IN.param0.xy * cb0_v1.zw;  
   float depth = texture15.Sample( sampler15, uv ).x;  
     
   // Reconstruct world position  
   float3 worldPos = getWorldPos( IN.param0.xy, depth );  
   
   // Calculate distance from Geralt to world position of particular object  
   float dist_geraltToWorld = length( worldPos - cb3_v7.xyz );  
     
   // Calculate two squeezing params  
   float t0 = 1.0 + 120*pow( abs(distanceScaling), 2.8 );  
   float t1 = 1.0 + 120*pow( abs(distanceScaling), 0.8 );  
     
   // Determine numerator and denominator  
   float2 params;  
   params = (distanceScaling > 0.03) ? float2(dist_geraltToWorld * 0.12, t0) : float2(dist_geraltToWorld, t1);  
     
   // Distance Geralt <-> Object  
   float numerator = params.x;   
     
   // Hiding factor  
   float denominator = params.y;  
     
   // Raise to power of 1.6  
   float param = pow( numerator / denominator, 1.6 );  
     
   // Calculate final intensity  
   float intensity = max(0.0, 1.0 - param );   
     
     
   // * Final outputs.  
   // *  
   // * This PS outputs only one color, the rest  
   // * is redundant. I just added this to keep 1-1 ratio with  
   // * original assembly.  
   FSOutput OUT = (FSOutput)0;  
   OUT.param0.xyz = color * intensity;  
     
   // == redundant ==  
   OUT.param0.w = cb3_v0.w;  
   OUT.param1 = cb3_v1;  
   OUT.param2 = cb3_v2;  
   OUT.param3 = cb3_v3;  
   // ===============  
   
   return OUT;  
 }  
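
To see how the falloff behaves without running the game, the intensity part can be copied to the CPU. A minimal C++ sketch follows, with `sensesIntensity` as a hypothetical name and the distance passed in directly instead of being reconstructed from depth.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical CPU-side copy of the intensity formula from the shader above.
float sensesIntensity(float distance, float distanceScaling)
{
    assert(distance >= 0.0f);

    const bool  wide = distanceScaling > 0.03f;
    const float a    = std::fabs(distanceScaling);

    // Same branch as the movc at assembly line 16
    const float numerator   = wide ? distance * 0.12f : distance;
    const float denominator = 1.0f + 120.0f * std::pow(a, wide ? 2.8f : 0.8f);

    const float param = std::pow(numerator / denominator, 1.6f);
    return std::max(0.0f, 1.0f - param);
}
```

For example, with distanceScaling = 1.0 the denominator is 121, so the intensity reaches zero at a distance of roughly 1008 units (121 / 0.12) and equals 1 right at Geralt's position.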

And small comparison between original (left) and my (right) shader assembly.

This was the first stage of witcher senses effect. Actually, the easiest one.
Go to the second one here.

Feel free to comment and share.
Thanks for reading!

PS. This is my first post as BSc in Computer Science. Feels great! :)