Thursday, 7 September 2017

Reverse engineering the rendering of The Witcher 3, part 1 - tonemapping

Hi!

Tonemapping is one of the rendering stages you are almost certain to encounter in modern AAA games.
Quick refresher: in real life there is a huge range of luminance, while our computer screens usually have a limited one (8 bits per color channel, which gives values 0-255). This is where tonemapping comes into play: it maps a wide range of illumination into the limited one. Usually there are two inputs to this process: a floating-point HDR image with color values exceeding 1.0, and the average luminance of the scene (the latter can be calculated in a few ways, possibly with eye adaptation to simulate the behavior of the human eye, but this is not important here).

The next (and final) step consists of obtaining an exposure, calculating the exposed color and processing it through a tonemapping curve. This is where things start to get a bit messy, because new concepts appear, like "white point" and "middle gray". There are at least a few popular curves, and Matt Pettineo's article "A Closer Look at Tone Mapping" investigates some of them.
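For reference, the generic pipeline described above (average luminance, then exposure, then exposed color, then curve) fits in a few lines. This is a minimal C++ sketch under stated assumptions: exposure is the common "middle gray / average luminance" form, and the simple Reinhard operator x / (1 + x) is used purely as a placeholder curve. Neither is taken from The Witcher 3, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical RGB triple used throughout this sketch.
struct Color { float r, g, b; };

// Exposure that maps the average scene luminance onto middleGray.
// The 1e-4 floor avoids dividing by zero for a completely black frame.
float computeExposure(float avgLuminance, float middleGray)
{
    assert(middleGray > 0.0f);
    return middleGray / std::max(avgLuminance, 1e-4f);
}

// Placeholder curve (Reinhard), NOT the curve used by The Witcher 3.
float reinhard(float x) { return x / (1.0f + x); }

// Expose the HDR color, then push each channel through the curve.
Color tonemapReinhard(Color hdr, float avgLuminance, float middleGray)
{
    const float exposure = computeExposure(avgLuminance, middleGray);
    return { reinhard(hdr.r * exposure),
             reinhard(hdr.g * exposure),
             reinhard(hdr.b * exposure) };
}
```

Any other curve plugs in where reinhard() sits; the rest of this post is about which exposure formula and which curve TW3 actually uses.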

To be honest, I've always had problems with implementing tonemapping properly in my code. There are at least a few different examples online which, luckily, turned out to be helpful... well, up to a point. Some of them take HDR luminance, white point and middle gray into account, some do not, which doesn't really help. I wanted a "battle-proven" implementation.

Recently I've started messing around with the rendering of The Witcher 3. This game has some awesome rendering trickery, and it's great in terms of story, music, gameplay... everything.


Ah, before I forget! This post is the first of a short series investigating some rendering solutions from The Witcher 3. It certainly won't be as comprehensive as Adrian Courrèges's GTA V graphics study, at least for now :)
We'll start by reverse-engineering the tonemapping. Let's go!

We will be working with a RenderDoc capture of this frame, taken during one of the main quests in Novigrad City, with all settings maxed:



After some searching, I found the tonemapping draw call! As I mentioned earlier, it takes an HDR color buffer (texture #0, full res) and the average luminance of the scene (texture #1, 1x1, floating-point, calculated earlier by a compute shader).


Let's take a look at the pixel shader assembly:

 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[17], immediateIndexed  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: ld_indexable(texture2d)(float,float,float,float) r0.x, l(0, 0, 0, 0), t1.xyzw  
   1: max r0.x, r0.x, cb3[4].y  
   2: min r0.x, r0.x, cb3[4].z  
   3: max r0.x, r0.x, l(0.000100)  
   4: mul r0.y, cb3[16].x, l(11.200000)  
   5: div r0.x, r0.x, r0.y  
   6: log r0.x, r0.x  
   7: mul r0.x, r0.x, cb3[16].z  
   8: exp r0.x, r0.x  
   9: mul r0.x, r0.y, r0.x  
  10: div r0.x, cb3[16].x, r0.x  
  11: ftou r1.xy, v0.xyxx  
  12: mov r1.zw, l(0, 0, 0, 0)  
  13: ld_indexable(texture2d)(float,float,float,float) r0.yzw, r1.xyzw, t0.wxyz  
  14: mul r0.xyz, r0.yzwy, r0.xxxx  
  15: mad r1.xyz, cb3[7].xxxx, r0.xyzx, cb3[7].yyyy  
  16: mul r2.xy, cb3[8].yzyy, cb3[8].xxxx  
  17: mad r1.xyz, r0.xyzx, r1.xyzx, r2.yyyy  
  18: mul r0.w, cb3[7].y, cb3[7].z  
  19: mad r3.xyz, cb3[7].xxxx, r0.xyzx, r0.wwww  
  20: mad r0.xyz, r0.xyzx, r3.xyzx, r2.xxxx  
  21: div r0.xyz, r0.xyzx, r1.xyzx  
  22: mad r0.w, cb3[7].x, l(11.200000), r0.w  
  23: mad r0.w, r0.w, l(11.200000), r2.x  
  24: div r1.x, cb3[8].y, cb3[8].z  
  25: add r0.xyz, r0.xyzx, -r1.xxxx  
  26: max r0.xyz, r0.xyzx, l(0, 0, 0, 0)  
  27: mul r0.xyz, r0.xyzx, cb3[16].yyyy  
  28: mad r1.y, cb3[7].x, l(11.200000), cb3[7].y  
  29: mad r1.y, r1.y, l(11.200000), r2.y  
  30: div r0.w, r0.w, r1.y  
  31: add r0.w, -r1.x, r0.w  
  32: max r0.w, r0.w, l(0)  
  33: div o0.xyz, r0.xyzx, r0.wwww  
  34: mov o0.w, l(1.000000)  
  35: ret  

A few things are worth noticing here. First of all, the luminance loaded from the texture is not necessarily the value that gets used: it is clamped (the max/min instructions) to a range set by artists in the constant buffer. This is handy, because it prevents the scene from being overexposed or underexposed. Sounds pretty obvious, but I've never done this before. Second, anyone familiar with tonemapping curves will quickly recognize this "11.2", as it is essentially the white point value from John Hable's Uncharted 2 tonemapping curve.
The A-F parameters are loaded from the cbuffer.
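To make the curve concrete, here is a C++ sketch of the Uncharted 2 evaluation as this shader performs it (instructions 15-33) for a single channel: the curve at the exposed color, clamped and scaled by an extra numerator multiplier (which the shader reads from cb3_v16.y), divided by the clamped curve value at the white point 11.2. The struct and function names are hypothetical. The parameter values used in the checks (A = 0.15, B = 0.50, C = 0.10, D = 0.20, E = 0.02, F = 0.30) are Hable's published Uncharted 2 defaults, not values captured from The Witcher 3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// A-F of the Uncharted 2 curve (in the shader: A-C = cb3[7].xyz, D-F = cb3[8].xyz).
struct U2Params { float A, B, C, D, E, F; };

// Hable's curve; the "- E/F" term makes the curve pass through 0 at x = 0.
float u2Func(const U2Params& p, float x)
{
    return ((x * (p.A * x + p.C * p.B) + p.D * p.E) /
            (x * (p.A * x + p.B) + p.D * p.F)) - p.E / p.F;
}

// Normalize by the curve's value at the white point, with the same
// max(..., 0) clamps that appear in the shader assembly.
float toneMapU2(const U2Params& p, float exposedColor, float numMultiplier)
{
    const float kWhitePoint = 11.2f;
    const float numerator = std::max(u2Func(p, exposedColor), 0.0f) * numMultiplier;
    const float denominator = std::max(u2Func(p, kWhitePoint), 0.0f);
    assert(denominator > 0.0f && "curve must be positive at the white point");
    return numerator / denominator;
}
```

Dividing by the white-point value is what makes an exposed value of exactly 11.2 map to numMultiplier, so with a multiplier of 1 the white point lands on 1.0.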
Okay, there are also three more parameters: cb3_v16.x, cb3_v16.y, cb3_v16.z. We can investigate their values:

Some guessing:
'x' - I think it's some sort of "white scale" or middle gray, as it is multiplied by 11.2 (line 4) and then used as the numerator when calculating the exposure adjustment (line 10).
'y' - I called it "u2 numerator multiplier", you'll see why in a moment.
'z' - "exponent param", as it is used in the log/mul/exp triple (essentially exponentiation).
Please take these variable names with a grain of salt!

Also:
cb3_v4.yz - min/max values of allowed luminance,
cb3_v7.xyz - A-C params of Uncharted2 curve,
cb3_v8.xyz - D-F params of Uncharted2 curve.
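The exposure part of the shader (instructions 0-10) can be ported to C++ to check the math on the CPU. This is a sketch: the function name is hypothetical, the parameter names follow the guesses above, and the 0.18 used in the checks is only an illustrative middle-gray value. Writing m for cb3_v16.x, p for cb3_v16.z, W = 11.2 and L for the clamped luminance, the shader computes exposure = m / (W * m * (L / (W * m))^p). For p = 1 this reduces to the classic m / L, and for p = 0 to a constant 1 / W, so the "exponent param" appears to control how strongly the exposure follows the measured luminance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// White point of Hable's Uncharted 2 curve, hard-coded in the shader.
constexpr float kWhitePoint = 11.2f;

// Port of instructions 0-10. Parameter names are guesses:
//   minLum / maxLum = cb3_v4.y / cb3_v4.z
//   middleGray      = cb3_v16.x
//   exponentParam   = cb3_v16.z
float witcherExposure(float avgLuminance, float minLum, float maxLum,
                      float middleGray, float exponentParam)
{
    assert(middleGray > 0.0f);

    // 1-3: clamp to the artist-defined range, then keep it strictly positive.
    float lum = std::min(std::max(avgLuminance, minLum), maxLum);
    lum = std::max(lum, 1e-4f);

    // 4-8: (lum / (middleGray * W)) ^ exponentParam, done via log/mul/exp in the asm.
    const float scaledWhitePoint = middleGray * kWhitePoint;
    float adapted = std::pow(lum / scaledWhitePoint, exponentParam);

    // 9-10: scale back up and invert to get the exposure multiplier.
    adapted *= scaledWhitePoint;
    return middleGray / adapted;
}
```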


Now the hard part: writing an HLSL shader which will give us exactly the same assembly.
This can be very tricky, and the longer the shader, the harder the task. Luckily, some time ago I wrote a tool which allows me to quickly view the HLSL->asm output.
Ladies and gentlemen... please give a warm welcome to D3DShaderDisassembler! :)



After some experimenting with the code, here is the final "The Witcher 3 Tonemapping" HLSL:

 cbuffer cBuffer : register (b3)  
 {  
   float4 cb3_v0;  
   float4 cb3_v1;  
   float4 cb3_v2;  
   float4 cb3_v3;  
   float4 cb3_v4;  
   float4 cb3_v5;  
   float4 cb3_v6;  
   float4 cb3_v7;  
   float4 cb3_v8;  
   float4 cb3_v9;  
   float4 cb3_v10;  
   float4 cb3_v11;  
   float4 cb3_v12;  
   float4 cb3_v13;  
   float4 cb3_v14;  
   float4 cb3_v15;  
   float4 cb3_v16, cb3_v17;  
 }  
   
 Texture2D     TexHDRColor          : register (t0);  
 Texture2D     TexAvgLuminance     : register (t1);  
   
 struct VS_OUTPUT_POSTFX  
 {  
   float4 Position : SV_Position;  
 };  
   
 float3 U2Func( float A, float B, float C, float D, float E, float F, float3 x )  
 {  
      return ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)) - E/F;  
 }  
   
 float3 ToneMapU2Func( float A, float B, float C, float D, float E, float F, float3 color, float numMultiplier )  
 {  
      float3 numerator =  U2Func( A, B, C, D, E, F, color );  
      numerator = max( numerator, 0 );  
      numerator.rgb *= numMultiplier;  
   
      float3 denominator = U2Func( A, B, C, D, E, F, 11.2 );  
      denominator = max( denominator, 0 );  
   
      return numerator / denominator;  
 }  
   
   
   
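 float4 ToneMappingPS( VS_OUTPUT_POSTFX Input) : SV_Target0
 {
      // Average scene luminance (1x1 texture), clamped to the artist-defined range
      float avgLuminance = TexAvgLuminance.Load( int3(0, 0, 0) );
      avgLuminance = clamp( avgLuminance, cb3_v4.y, cb3_v4.z );
      avgLuminance = max( avgLuminance, 1e-4 );

      // Exposure = cb3_v16.x / (W * cb3_v16.x * pow(avgLum / (W * cb3_v16.x), cb3_v16.z)), W = 11.2
      float scaledWhitePoint = cb3_v16.x * 11.2;

      float luma = avgLuminance / scaledWhitePoint;
      luma = pow( luma, cb3_v16.z );

      luma = luma * scaledWhitePoint;
      luma = cb3_v16.x / luma;  // "luma" now holds the exposure multiplier

      float3 HDRColor = TexHDRColor.Load( uint3(Input.Position.xy, 0) ).rgb;

      // Uncharted 2 curve: A-C = cb3_v7.xyz, D-F = cb3_v8.xyz, numerator multiplier = cb3_v16.y
      float3 color = ToneMapU2Func( cb3_v7.x, cb3_v7.y, cb3_v7.z, cb3_v8.x, cb3_v8.y,
         cb3_v8.z, luma*HDRColor, cb3_v16.y);

      return float4(color, 1);
 }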


And a quick screenshot from my tool to prove it:

Voilà! :)
I believe this is a quite faithful implementation of TW3 tonemapping, at least in terms of the assembly.
I already have this in my framework and it works well! Stay tuned for more!

I said "quite" because I have no idea why the denominator in ToneMapU2Func is clamped with max(..., 0). That clamp doesn't protect against division by zero; if anything, it could cause one.
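A small C++ experiment shows what that clamp does to the final division, assuming IEEE 754 float semantics on the host (the static_assert checks this; GPU behavior is not tested here). A positive white-point value passes through unchanged, while a negative one becomes 0 and the division yields infinity, or NaN when the numerator is 0 as well. The helper name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// This sketch relies on IEEE 754 results for float division by zero.
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 floats");

// Hypothetical helper mirroring the last step of ToneMapU2Func:
// numerator / max(denominator, 0).
float divideByClampedDenominator(float numerator, float denominator)
{
    // std::max(NaN, 0.0f) returns NaN, so rule that case out explicitly.
    assert(!std::isnan(denominator));
    return numerator / std::max(denominator, 0.0f);
}
```

With a positive denominator the clamp is a no-op; with a negative one it turns a finite (if odd) result into infinity or NaN, so the clamp does not act as a division-by-zero guard.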


Well... we could end right now, but quite by accident I found another variant of the tonemapping shader in TW3, in this frame at a beautiful dusk (interestingly, with minimum graphics settings!)


Let's check this out. First, the shader assembly:

 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[18], immediateIndexed  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps_siv v0.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 5  
   0: ld_indexable(texture2d)(float,float,float,float) r0.x, l(0, 0, 0, 0), t1.xyzw  
   1: max r0.y, r0.x, cb3[9].y  
   2: max r0.x, r0.x, cb3[4].y  
   3: min r0.x, r0.x, cb3[4].z  
   4: min r0.y, r0.y, cb3[9].z  
   5: max r0.xy, r0.xyxx, l(0.000100, 0.000100, 0.000000, 0.000000)  
   6: mul r0.z, cb3[17].x, l(11.200000)  
   7: div r0.y, r0.y, r0.z  
   8: log r0.y, r0.y  
   9: mul r0.y, r0.y, cb3[17].z  
  10: exp r0.y, r0.y  
  11: mul r0.y, r0.z, r0.y  
  12: div r0.y, cb3[17].x, r0.y  
  13: ftou r1.xy, v0.xyxx  
  14: mov r1.zw, l(0, 0, 0, 0)  
  15: ld_indexable(texture2d)(float,float,float,float) r1.xyz, r1.xyzw, t0.xyzw  
  16: mul r0.yzw, r0.yyyy, r1.xxyz  
  17: mad r2.xyz, cb3[11].xxxx, r0.yzwy, cb3[11].yyyy  
  18: mul r3.xy, cb3[12].yzyy, cb3[12].xxxx  
  19: mad r2.xyz, r0.yzwy, r2.xyzx, r3.yyyy  
  20: mul r1.w, cb3[11].y, cb3[11].z  
  21: mad r4.xyz, cb3[11].xxxx, r0.yzwy, r1.wwww  
  22: mad r0.yzw, r0.yyzw, r4.xxyz, r3.xxxx  
  23: div r0.yzw, r0.yyzw, r2.xxyz  
  24: mad r1.w, cb3[11].x, l(11.200000), r1.w  
  25: mad r1.w, r1.w, l(11.200000), r3.x  
  26: div r2.x, cb3[12].y, cb3[12].z  
  27: add r0.yzw, r0.yyzw, -r2.xxxx  
  28: max r0.yzw, r0.yyzw, l(0, 0, 0, 0)  
  29: mul r0.yzw, r0.yyzw, cb3[17].yyyy  
  30: mad r2.y, cb3[11].x, l(11.200000), cb3[11].y  
  31: mad r2.y, r2.y, l(11.200000), r3.y  
  32: div r1.w, r1.w, r2.y  
  33: add r1.w, -r2.x, r1.w  
  34: max r1.w, r1.w, l(0)  
  35: div r0.yzw, r0.yyzw, r1.wwww  
  36: mul r1.w, cb3[16].x, l(11.200000)  
  37: div r0.x, r0.x, r1.w  
  38: log r0.x, r0.x  
  39: mul r0.x, r0.x, cb3[16].z  
  40: exp r0.x, r0.x  
  41: mul r0.x, r1.w, r0.x  
  42: div r0.x, cb3[16].x, r0.x  
  43: mul r1.xyz, r1.xyzx, r0.xxxx  
  44: mad r2.xyz, cb3[7].xxxx, r1.xyzx, cb3[7].yyyy  
  45: mul r3.xy, cb3[8].yzyy, cb3[8].xxxx  
  46: mad r2.xyz, r1.xyzx, r2.xyzx, r3.yyyy  
  47: mul r0.x, cb3[7].y, cb3[7].z  
  48: mad r4.xyz, cb3[7].xxxx, r1.xyzx, r0.xxxx  
  49: mad r1.xyz, r1.xyzx, r4.xyzx, r3.xxxx  
  50: div r1.xyz, r1.xyzx, r2.xyzx  
  51: mad r0.x, cb3[7].x, l(11.200000), r0.x  
  52: mad r0.x, r0.x, l(11.200000), r3.x  
  53: div r1.w, cb3[8].y, cb3[8].z  
  54: add r1.xyz, -r1.wwww, r1.xyzx  
  55: max r1.xyz, r1.xyzx, l(0, 0, 0, 0)  
  56: mul r1.xyz, r1.xyzx, cb3[16].yyyy  
  57: mad r2.x, cb3[7].x, l(11.200000), cb3[7].y  
  58: mad r2.x, r2.x, l(11.200000), r3.y  
  59: div r0.x, r0.x, r2.x  
  60: add r0.x, -r1.w, r0.x  
  61: max r0.x, r0.x, l(0)  
  62: div r1.xyz, r1.xyzx, r0.xxxx  
  63: add r0.xyz, r0.yzwy, -r1.xyzx  
  64: mad o0.xyz, cb3[13].xxxx, r0.xyzx, r1.xyzx  
  65: mov o0.w, l(1.000000)  
  66: ret  
   

It may look intimidating at first, but actually it's not that bad. After a quick analysis we can notice that the Uncharted 2 function is called twice, with different sets of input data (A-F, min/max luminance...). I haven't encountered such a solution before.

And the HLSL:
 cbuffer cBuffer : register (b3)  
 {  
   float4 cb3_v0;  
   float4 cb3_v1;  
   float4 cb3_v2;  
   float4 cb3_v3;  
   float4 cb3_v4;  
   float4 cb3_v5;  
   float4 cb3_v6;  
   float4 cb3_v7;  
   float4 cb3_v8;  
   float4 cb3_v9;  
   float4 cb3_v10;  
   float4 cb3_v11;  
   float4 cb3_v12;  
   float4 cb3_v13;  
   float4 cb3_v14;  
   float4 cb3_v15;  
   float4 cb3_v16, cb3_v17;  
 }  
   
 Texture2D     TexHDRColor     : register (t0);  
 Texture2D     TexAvgLuminance     : register (t1);  
   
 float3 U2Func( float A, float B, float C, float D, float E, float F, float3 x )  
 {  
      return ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)) - E/F;  
 }  
   
 float3 ToneMapU2Func( float A, float B, float C, float D, float E, float F, float3 color, float numMultiplier )  
 {  
      float3 numerator =  U2Func( A, B, C, D, E, F, color );  
      numerator = max( numerator, 0 );  
      numerator.rgb *= numMultiplier;  
   
      float3 denominator = U2Func( A, B, C, D, E, F, 11.2 );  
      denominator = max( denominator, 0 );  
   
      return numerator / denominator;  
 }  
   
 struct VS_OUTPUT_POSTFX  
 {  
   float4 Position : SV_Position;  
 };  
   
 float getExposure(float avgLuminance, float minLuminance, float maxLuminance, float middleGray, float powParam)  
 {  
      avgLuminance = clamp( avgLuminance, minLuminance, maxLuminance );  
      avgLuminance = max( avgLuminance, 1e-4 );  
   
      float scaledWhitePoint = middleGray * 11.2;  
   
      float luma = avgLuminance / scaledWhitePoint;  
      luma = pow( luma, powParam);  
   
      luma = luma * scaledWhitePoint;  
      float exposure = middleGray / luma;  
      return exposure;  
 }  
   
 float4 ToneMappingPS( VS_OUTPUT_POSTFX Input) : SV_Target0
 {
      float avgLuminance = TexAvgLuminance.Load( int3(0, 0, 0) );

      // Two independent exposures, each with its own luminance range, middle gray and exponent
      float exposure1 = getExposure( avgLuminance, cb3_v9.y, cb3_v9.z, cb3_v17.x, cb3_v17.z);
      float exposure2 = getExposure( avgLuminance, cb3_v4.y, cb3_v4.z, cb3_v16.x, cb3_v16.z);

      float3 HDRColor = TexHDRColor.Load( uint3(Input.Position.xy, 0) ).rgb;

      // Two Uncharted 2 curves with separate A-F sets (cb3_v11/cb3_v12 and cb3_v7/cb3_v8)
      float3 color1 = ToneMapU2Func( cb3_v11.x, cb3_v11.y, cb3_v11.z, cb3_v12.x, cb3_v12.y,
         cb3_v12.z, exposure1*HDRColor, cb3_v17.y);

      float3 color2 = ToneMapU2Func( cb3_v7.x, cb3_v7.y, cb3_v7.z, cb3_v8.x, cb3_v8.y,
         cb3_v8.z, exposure2*HDRColor, cb3_v16.y);

      // Blend the two results: cb3_v13.x = 0 gives color2, 1 gives color1
      float3 finalColor = lerp( color2, color1, cb3_v13.x );
      return float4(finalColor, 1);
 }
   

So basically we have two sets of control parameters: we calculate two tonemapped colors and, at the end, interpolate between them using cb3_v13.x. Smart!
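Instructions 63-64 of the second shader compute color1 - color2 and then apply a single mad with cb3[13].x, which is exactly how lerp(color2, color1, t) expands. The C++ sketch below, with a hypothetical Float3 type, spells that out: t = 0 returns the result of the cb3_v16/cb3_v7/cb3_v8 set, and t = 1 the result of the cb3_v17/cb3_v11/cb3_v12 set. What sets cb3_v13.x in the game is not visible from the shader itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 3-component vector standing in for HLSL's float3.
struct Float3 { float x, y, z; };

// Same arithmetic as instructions 63-64:
//   r0 = color1 - color2
//   o0 = cb3[13].x * r0 + color2
// which is HLSL's lerp(color2, color1, t).
Float3 blendTonemapped(Float3 color2, Float3 color1, float t)
{
    assert(std::isfinite(t));
    return { color2.x + t * (color1.x - color2.x),
             color2.y + t * (color1.y - color2.y),
             color2.z + t * (color1.z - color2.z) };
}
```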

Feel free to comment; maybe there's something I have missed.
I hope you enjoyed this post :)

Have a good day,
M.

4 comments:

  1. Nice work! Is your D3DShaderDisassembler available to download somewhere? It looks quite useful :)

    1. Thank you! D3DShaderDisassembler will be available soon - need to fix some nasty bugs :) Keep an eye on my twitter.

  2. Awesome! I love it when people dig into stuff they love, to better understand it!
