Tuesday, 13 November 2018

Reverse engineering the rendering of The Witcher 3, part 6 - sharpen


Hi,

Today we will take a closer look at another postprocess from The Witcher 3 - sharpen.
Sharpening makes an output image a bit crisper. The effect is known from Photoshop and other image editors.

In The Witcher 3, sharpening has two presets: low and high. I will discuss the differences between them later; for now, let's take a look at some screenshots:

"Low" setting - before
"Low" setting - after


"High" setting - before
"High" setting - after
If you want to see more (interactive) comparisons, see the relevant section of Nvidia's performance guide for The Witcher 3. As you can see, the effect is particularly visible on grass and foliage.

In this post we will investigate a frame from the very beginning of the game. I selected this one on purpose, because it shows both terrain (long draw distance) and the skydome.

In terms of input, sharpening requires color buffer t0 (LDR, after tonemapping and lens flares) and depth buffer t1.

Let's look at the pixel shader assembly:

 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[3], immediateIndexed  
    dcl_constantbuffer cb12[23], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 7  
   0: ftoi r0.xy, v0.xyxx  
   1: mov r0.zw, l(0, 0, 0, 0)  
   2: ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t1.xyzw  
   3: mad r0.x, r0.x, cb12[22].x, cb12[22].y  
   4: mad r0.y, r0.x, cb12[21].x, cb12[21].y  
   5: max r0.y, r0.y, l(0.000100)  
   6: div r0.y, l(1.000000, 1.000000, 1.000000, 1.000000), r0.y  
   7: mad_sat r0.y, r0.y, cb3[1].z, cb3[1].w  
   8: add r0.z, -cb3[1].x, cb3[1].y  
   9: mad r0.y, r0.y, r0.z, cb3[1].x  
  10: add r0.y, r0.y, l(1.000000)  
  11: ge r0.x, r0.x, l(1.000000)  
  12: movc r0.x, r0.x, l(0), l(1.000000)  
  13: mul r0.z, r0.x, r0.y  
  14: round_z r1.xy, v0.xyxx  
  15: add r1.xy, r1.xyxx, l(0.500000, 0.500000, 0.000000, 0.000000)  
  16: div r1.xy, r1.xyxx, cb3[0].zwzz  
  17: sample_l(texture2d)(float,float,float,float) r2.xyz, r1.xyxx, t0.xyzw, s0, l(0)  
  18: lt r0.z, l(0), r0.z  
  19: if_nz r0.z  
  20:  div r3.xy, l(0.500000, 0.500000, 0.000000, 0.000000), cb3[0].zwzz  
  21:  add r0.zw, r1.xxxy, -r3.xxxy  
  22:  sample_l(texture2d)(float,float,float,float) r4.xyz, r0.zwzz, t0.xyzw, s0, l(0)  
  23:  mov r3.zw, -r3.xxxy  
  24:  add r5.xyzw, r1.xyxy, r3.zyxw  
  25:  sample_l(texture2d)(float,float,float,float) r6.xyz, r5.xyxx, t0.xyzw, s0, l(0)  
  26:  add r4.xyz, r4.xyzx, r6.xyzx  
  27:  sample_l(texture2d)(float,float,float,float) r5.xyz, r5.zwzz, t0.xyzw, s0, l(0)  
  28:  add r4.xyz, r4.xyzx, r5.xyzx  
  29:  add r0.zw, r1.xxxy, r3.xxxy  
  30:  sample_l(texture2d)(float,float,float,float) r1.xyz, r0.zwzz, t0.xyzw, s0, l(0)  
  31:  add r1.xyz, r1.xyzx, r4.xyzx  
  32:  mul r3.xyz, r1.xyzx, l(0.250000, 0.250000, 0.250000, 0.000000)  
  33:  mad r1.xyz, -r1.xyzx, l(0.250000, 0.250000, 0.250000, 0.000000), r2.xyzx  
  34:  max r0.z, abs(r1.z), abs(r1.y)  
  35:  max r0.z, r0.z, abs(r1.x)  
  36:  mad_sat r0.z, r0.z, cb3[2].x, cb3[2].y  
  37:  mad r0.x, r0.y, r0.x, l(-1.000000)  
  38:  mad r0.x, r0.z, r0.x, l(1.000000)  
  39:  dp3 r0.y, l(0.212600, 0.715200, 0.072200, 0.000000), r2.xyzx  
  40:  dp3 r0.z, l(0.212600, 0.715200, 0.072200, 0.000000), r3.xyzx  
  41:  max r0.w, r0.y, l(0.000100)  
  42:  div r1.xyz, r2.xyzx, r0.wwww  
  43:  add r0.y, -r0.z, r0.y  
  44:  mad r0.x, r0.x, r0.y, r0.z  
  45:  max r0.x, r0.x, l(0)  
  46:  mul r2.xyz, r0.xxxx, r1.xyzx  
  47: endif  
  48: mov o0.xyz, r2.xyzx  
  49: mov o0.w, l(1.000000)  
  50: ret  

Fifty lines of assembly seems like a pretty doable task. Let's get started.


Sharpen amount generation

The first step is to load the depth buffer (lines 0-2). Note that The Witcher 3 uses reversed depth (1.0 - near, 0.0 - far). As you may know, hardware depth is mapped in a non-linear way (see this article for details).

Lines 3-6 map this hardware depth [1.0 - 0.0] to [near - far] values in a very interesting way (these are the values you set during MatrixPerspectiveFov). See the values from the constant buffer:


With a near value of 0.2 and a far value of 5000, I believe you can calculate the values of cb12_v21.xy this way:

cb12_v21.y = 1.0 / near
cb12_v21.x = - (1.0 / near) + (1.0 / near) * (near / far)

This piece of code appears quite often in shaders from TW3, so I believe it's just a function.
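
To check the formulas above with real numbers, here is a small CPU-side sketch in Python (my own model, not code from the game). The names `frustum_depth` and `depth01` are made up for this example. `depth01` stands for the value sitting in r0.x after line 3, and I assume it runs from 0.0 at the near plane to 1.0 at the far plane, which is what the two formulas imply. Note that the expression for cb12_v21.x simplifies to 1/far - 1/near.

```python
NEAR = 0.2      # near plane, value from the post
FAR = 5000.0    # far plane, value from the post

# cb12_v21.xy reconstructed with the formulas above
CB12_V21_Y = 1.0 / NEAR
CB12_V21_X = -(1.0 / NEAR) + (1.0 / NEAR) * (NEAR / FAR)   # same as 1/far - 1/near


def frustum_depth(depth01):
    """Mirror assembly lines 4-6: mad, max with 0.0001, then reciprocal."""
    d = depth01 * CB12_V21_X + CB12_V21_Y
    d = max(d, 0.0001)   # guards the division below
    return 1.0 / d


# 0.0 should land on the near plane and 1.0 on the far plane
for depth01 in (0.0, 0.5, 0.99, 1.0):
    print(depth01, frustum_depth(depth01))
```

The loop also shows how non-linear the mapping is: half of the input range is spent on the first fraction of a unit in front of the camera.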

Once we have the "frustum depth", line 7 uses scale/bias to create an interpolation coefficient (we use saturate here to make sure it's clamped to the [0-1] range).


cb3_v1.xy are the intensities of sharpening at near and far distances, respectively. Let's call them "sharpenNear" and "sharpenFar". They are the only difference between the "Low" and "High" presets of this effect in The Witcher 3.

Now it's time to use the obtained coefficient. Lines 8-9 are just lerp(sharpenNear, sharpenFar, interpolationCoeff). What is this for? Thanks to that we can have a different intensity near Geralt and far away from him. See:



It may be barely visible, but here we interpolated the intensity of sharpening near the player (2.17751) with the intensity of the effect far away (1.91303), based on distance. Once we have calculated it, we add 1.0 to the intensity (line 10). What is this for? Let's assume that the lerp above gave us 0.0. After adding 1.0 we get 1.0, of course, and this is the value which will not affect the pixel during sharpening. More on this later.
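
Lines 7-10 are easy to try out on the CPU. The Python sketch below is only a model of that arithmetic, using the Kaer Morhen "high" values listed at the end of this post. The helper names are mine, and treating sharpenDistanceScale and sharpenDistanceBias as cb3_v1.z and cb3_v1.w is my assumption based on line 7.

```python
def saturate(x):
    """Clamp to [0, 1], like HLSL saturate()."""
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    """Linear interpolation, like HLSL lerp()."""
    return a + (b - a) * t


# Kaer Morhen, "high" preset (values listed at the end of this post)
SHARPEN_NEAR = 2.17751
SHARPEN_FAR = 1.91303
DISTANCE_SCALE = 0.06665    # assumed to be cb3_v1.z
DISTANCE_BIAS = -0.33256    # assumed to be cb3_v1.w


def sharpen_intensity(frustum_depth):
    """Lines 7-10: scale/bias + saturate, lerp near/far, then add 1.0."""
    t = saturate(frustum_depth * DISTANCE_SCALE + DISTANCE_BIAS)
    return lerp(SHARPEN_NEAR, SHARPEN_FAR, t) + 1.0


for depth in (0.2, 5.0, 10.0, 20.0, 5000.0):
    print(depth, sharpen_intensity(depth))
```

With these numbers the coefficient stays at 0 up to a depth of about 5 and reaches 1 at about 20, so the whole near-to-far transition happens quite close to the camera.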

During sharpening process we don't want to affect sky. We can achieve this using simple conditional test:

   // Do not perform sharpen on sky  
   float fSkyboxTest = (fDepth >= 1.0) ? 0 : 1;  

In The Witcher 3 the depth value for sky pixels is 1.0, so we use it to get some sort of "binary filter" (fun fact: step does not work properly in this case).
Now we can multiply the interpolated intensity by the "sky filter":


This multiplication takes place in line 13.
Example shader code:
   // Calculate final sharpen amount  
   float fSharpenAmount = fSharpenIntensity * fSkyboxTest;  


Sampling center of the pixel

There is an aspect of SV_Position which will be important here: the half-pixel offset. It turns out that the pixel at the top-left corner (0, 0) is not (0, 0) in terms of SV_Position.xy, but (0.5, 0.5). Wow!

Here we want to sample at the center of the pixel, so take a look at lines 14-16. We can write it in HLSL:
   // Sample the center of the pixel.   
   // Get rid of "half-pixel" offset from SV_Position.xy.  
   float2 uvCenter = trunc( Input.Position.xy );  

   // Add half-pixel to make sure we will sample the center of the pixel  
   uvCenter += float2(0.5, 0.5);  
   uvCenter /= g_Viewport.xy;  

Later we sample the input color texture at the "uvCenter" texcoords. Don't worry, the result of the sampling will be the same as with the "typical" (SV_Position.xy / ViewportSize.xy).
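
If you want to convince yourself of that equivalence, you can run the arithmetic on the CPU. In this Python sketch the viewport size is an arbitrary choice of mine, and `sv_position` simply models the rule that SV_Position.xy holds pixel centers (index + 0.5). That is an assumption which should hold for an ordinary full-screen pass.

```python
import math

VIEWPORT = (8, 4)   # arbitrary small viewport, just for the check


def sv_position(px, py):
    """SV_Position.xy for the pixel with integer index (px, py)."""
    return (px + 0.5, py + 0.5)


def uv_center(pos):
    """Lines 14-16: trunc, add half a pixel, divide by viewport size."""
    return ((math.trunc(pos[0]) + 0.5) / VIEWPORT[0],
            (math.trunc(pos[1]) + 0.5) / VIEWPORT[1])


def uv_typical(pos):
    """The "typical" way: SV_Position.xy / ViewportSize.xy."""
    return (pos[0] / VIEWPORT[0], pos[1] / VIEWPORT[1])


# Collect every pixel where the two methods disagree
mismatches = [
    (px, py)
    for py in range(VIEWPORT[1])
    for px in range(VIEWPORT[0])
    if uv_center(sv_position(px, py)) != uv_typical(sv_position(px, py))
]
print("pixels where the two methods differ:", len(mismatches))
```

The trunc followed by adding 0.5 only matters when the incoming position is not already at a pixel center; for positions that are, both paths produce identical texcoords.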

To sharpen or not to sharpen

The decision whether to sharpen or not is based on fSharpenAmount.

   // Get the value of current pixel  
   float3 colorCenter = TexColorBuffer.SampleLevel( samplerLinearClamp, uvCenter, 0 ).rgb;  
     
   // Final result  
   float3 finalColor = colorCenter;  
   
   if ( fSharpenAmount > 0 )  
   {  
     // do the sharpening here...  
   }  
   
  return float4( finalColor, 1 );  
   

Sharpen

It's time to look at the heart of the algorithm.
Basically:
- sample the input color texture four times at the corners of the pixel,
- add the samples and calculate the average value,
- calculate the difference between "center" and "cornerAverage",
- find the maximum absolute component of the difference,
- adjust the max. abs. component using scale+bias values,
- determine the amount of the effect using the max. abs. component,
- calculate the luma of "centerColor" and "averageColor",
- divide the colorCenter by its luma,
- calculate the new, interpolated luma using the amount of the effect,
- multiply the colorCenter by the new luma.

Seems like lots of things and it was a challenge for me to understand it, since I've never played with sharpening filters. 

Let's start with sampling pattern. As you can see in the assembly, there are four texture fetches.
It is best shown using this image of a pixel (Paint level expert):
All fetches in the shader use bilinear sampling (D3D11_FILTER_MIN_MAG_LINEAR_MIP_POINT).

The offset from center to any corner is (±0.5, ±0.5), depending on corner.
See how this can be done in HLSL? Let's see:
    float2 uvCorner;  
    float2 uvOffset = float2( 0.5, 0.5 ) / g_Viewport.xy;  // remember about division!
    
    float3 colorCorners = 0;  
             
    // Top left corner  
    // -0.5, -0.5  
    uvCorner = uvCenter - uvOffset;  
    colorCorners += TexColorBuffer.SampleLevel( samplerLinearClamp, uvCorner, 0 ).rgb;  
   
    // Top right corner  
    // +0.5, -0.5  
    uvCorner = uvCenter + float2(uvOffset.x, -uvOffset.y);  
    colorCorners += TexColorBuffer.SampleLevel( samplerLinearClamp, uvCorner, 0 ).rgb;  
   
    // Bottom left corner  
    // -0.5, +0.5  
    uvCorner = uvCenter + float2(-uvOffset.x, uvOffset.y);  
    colorCorners += TexColorBuffer.SampleLevel( samplerLinearClamp, uvCorner, 0 ).rgb;  
   
    // Bottom right corner  
    // +0.5, +0.5  
    uvCorner = uvCenter + uvOffset;  
    colorCorners += TexColorBuffer.SampleLevel( samplerLinearClamp, uvCorner, 0 ).rgb;  
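
One detail worth spelling out: a bilinear fetch taken exactly at a pixel corner returns the average of the four texels that share this corner. That means the average of the four corner fetches works out to a 3x3 blur with weights 1-2-1 / 2-4-2 / 1-2-1, divided by 16. The Python sketch below checks this on a tiny single-channel image. It is my own CPU-side model and not code from the game: the helper names are made up, clamp addressing is assumed from the sampler name, and I assume the color buffer has the same resolution as the viewport.

```python
import math


def texel(img, x, y):
    """Read one texel with clamp addressing (assumed from samplerLinearClamp)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def sample_bilinear(img, px, py):
    """Bilinear fetch at pixel-space (px, py); texel centers sit at index + 0.5."""
    fx, fy = px - 0.5, py - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0
    top = texel(img, x0, y0) * (1 - tx) + texel(img, x0 + 1, y0) * tx
    bottom = texel(img, x0, y0 + 1) * (1 - tx) + texel(img, x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bottom * ty


def corner_average(img, x, y):
    """Average of four bilinear fetches at the corners of pixel (x, y)."""
    cx, cy = x + 0.5, y + 0.5
    corners = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
    return sum(sample_bilinear(img, cx + dx, cy + dy) for dx, dy in corners) / 4.0


def tent_3x3(img, x, y):
    """Direct 3x3 filter with weights [1 2 1; 2 4 2; 1 2 1] / 16."""
    weights = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    total = sum(weights[j][i] * texel(img, x + i - 1, y + j - 1)
                for j in range(3) for i in range(3))
    return total / 16.0


# A single bright texel on a black background
image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
print(corner_average(image, 1, 1), tent_3x3(image, 1, 1))
```

So with only four fetches the shader gets a nine-texel blurred version of the neighborhood, which is what the center pixel is compared against in the next steps.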

So now we have all four samples summed in "colorCorners" variable. Let's perform the next steps:

   // Calculate the average of four corners  
   float3 averageColorCorners = colorCorners / 4.0;  
   
   // Calculate the color difference  
   float3 diffColor = colorCenter - averageColorCorners;  
   
   // Find max absolute RGB component of the difference  
   float fDiffColorMaxComponent = max( abs(diffColor.x), max( abs(diffColor.y), abs(diffColor.z) ) );  
   
   // Adjust this factor  
   float fDiffColorMaxComponentScaled = saturate( fDiffColorMaxComponent * sharpenLumScale + sharpenLumBias );  
   
   // Calculate how much pixel will be sharpened.  
   // Note the "1.0" here - this is why we added "1.0" before to fSharpenIntensity.  
   float fPixelSharpenAmount = lerp(1.0, fSharpenAmount, fDiffColorMaxComponentScaled);  
    
   // Calculate luminance of "center" of the pixel and luminance of average value.  
   float lumaCenter = dot( LUMINANCE_RGB, finalColor );  
   float lumaCornersAverage = dot( LUMINANCE_RGB, averageColorCorners );  
       
   // divide "centerColor" by its luma  
   float3 fColorBalanced = colorCenter / max( lumaCenter, 1e-4 );  
     
   // Calc the new luma  
   float fPixelLuminance = lerp(lumaCornersAverage, lumaCenter, fPixelSharpenAmount);  
       
   // Calc the output color  
   finalColor = fColorBalanced * max(fPixelLuminance, 0.0);  
}

return float4(finalColor, 1.0);
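
To get a feeling for what these steps do to actual numbers, here is a CPU-side Python model of the body of the if block. It is only a sketch for experimenting, not the game's code: the function names are mine, the luma weights are the constants from assembly lines 39-40, and the parameters are the Kaer Morhen "high" values listed at the end of this post (with the near intensity used as the sharpen amount).

```python
LUMINANCE_RGB = (0.2126, 0.7152, 0.0722)   # weights from assembly lines 39-40


def saturate(x):
    """Clamp to [0, 1], like HLSL saturate()."""
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    """Linear interpolation; t outside [0, 1] extrapolates, as in HLSL."""
    return a + (b - a) * t


def luma(rgb):
    return sum(w * c for w, c in zip(LUMINANCE_RGB, rgb))


def sharpen_pixel(center, corner_avg, sharpen_amount, lum_scale, lum_bias):
    """CPU model of the sharpening steps for one pixel."""
    diff_max = max(abs(c - a) for c, a in zip(center, corner_avg))
    edge = saturate(diff_max * lum_scale + lum_bias)
    pixel_amount = lerp(1.0, sharpen_amount, edge)

    luma_center = luma(center)
    luma_avg = luma(corner_avg)
    balanced = [c / max(luma_center, 1e-4) for c in center]

    # pixel_amount > 1.0 extrapolates past luma_center, away from the blurred luma
    new_luma = max(lerp(luma_avg, luma_center, pixel_amount), 0.0)
    return [c * new_luma for c in balanced]


# Kaer Morhen "high": near intensity 2.17751 + 1.0, lum scale -1.0, lum bias 2.0
AMOUNT = 2.17751 + 1.0
print(sharpen_pixel((0.6, 0.6, 0.6), (0.5, 0.5, 0.5), AMOUNT, -1.0, 2.0))
print(sharpen_pixel((0.6, 0.6, 0.6), (0.6, 0.6, 0.6), AMOUNT, -1.0, 2.0))
```

The first pixel is brighter than its neighborhood, so its luma gets pushed further up (0.6 becomes roughly 0.82). The second one sits in a flat area and comes out unchanged. With a sharpen amount of exactly 1.0 the lerp returns the center luma and the pixel is not modified at all, which is why 1.0 was added to the intensity earlier.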

The edge detection is done by calculating max. abs. component of the difference. Smart! See its visualization:
Visualization of maximum absolute component of the difference.


Phew. The final HLSL shader is available here. Sorry for the rather poor formatting. Feel free to use my HLSLexplorer and play with the code.

I am happy to say that the code above gives exactly the same assembly as in the game! :)

To sum up, The Witcher 3's sharpening shader is very well written (notice that fPixelSharpenAmount is larger than 1.0! That is interesting...). Also, the primary way to modify the intensity of the effect is through the near/far intensities. In the game, they are not constant throughout the gameplay; I collected some example values:

Skellige:

preset  sharpenNear  sharpenFar  sharpenDistanceScale  sharpenDistanceBias  sharpenLumScale  sharpenLumBias
low     0.40         0.20        0.025                 -0.25                -13.33333        1.33333
high    2.0          1.8         0.025                 -0.25                -13.33333        1.33333

Kaer Morhen:

preset  sharpenNear  sharpenFar  sharpenDistanceScale  sharpenDistanceBias  sharpenLumScale  sharpenLumBias
low     0.57751      0.31303     0.06665               -0.33256             -1.0             2.0
high    2.17751      1.91303     0.06665               -0.33256             -1.0             2.0


That's it for today. I hope you enjoyed it :)
Thanks for reading!

M.

4 comments:

  1. Hi! Would you be able to decompile the shaders from Metro 2033? They are in a strange format, as if it were some kind of package. I can't even unpack it, so I thought I'd ask someone experienced in the subject. Regards

    Replies
    1. Hi! Unfortunately I haven't played Metro 2033. From what I found, it uses DX11, so it should be possible to pull something out of the shaders with RenderDoc.