Friday, 31 August 2018

Reverse engineering the rendering of The Witcher 3, part 5 - drunk effect

Hi,

Let's take a look at how the drunk effect is implemented in The Witcher 3: Wild Hunt.
If you haven't played it yet, drop whatever you're doing, buy it and play it, or at least watch these videos:

Evening:


Night:


First, we see a "double rotating" image, pretty common when you're not sober in real life. The farther a pixel is from the center of the image, the stronger the rotation effect. I posted the second video at night on purpose, because you can clearly see this rotation on the stars (do you see 8 separate points?).

The second part of the TW3 drunk effect, maybe not so visible at first sight, is a slight zooming in and out. It's visible near the center.

It's probably obvious that this effect is a typical post-process (pixel shader). However, its position in the pipeline may not be so obvious. It turns out that the drunk effect is applied just *after* tonemapping and just before motion blur (the drunk image is the input for motion blur).

Let's start the assembly game:

 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[2], immediateIndexed  
    dcl_constantbuffer cb3[3], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_input_ps_siv v1.xy, position  
    dcl_output o0.xyzw  
    dcl_temps 8  
   0: mad r0.x, cb3[0].y, l(-0.100000), l(1.000000)  
   1: mul r0.yz, cb3[1].xxyx, l(0.000000, 0.050000, 0.050000, 0.000000)  
   2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx  
   3: dp2 r0.w, r1.xyxx, r1.xyxx  
   4: sqrt r1.z, r0.w  
   5: mul r0.w, r0.w, l(10.000000)  
   6: min r0.w, r0.w, l(1.000000)  
   7: mul r0.w, r0.w, cb3[0].y  
   8: mul r2.xyzw, r0.yzyz, r1.zzzz  
   9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw  
  10: mul r3.xy, r0.xxxx, r1.xyxx  
  11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy  
  12: add r3.xyzw, r3.xyzw, cb3[2].xyxy  
  13: add r2.xyzw, r2.xyzw, cb3[2].xyxy  
  14: mul r0.x, r0.w, cb3[0].x  
  15: mul r0.x, r0.x, l(5.000000)  
  16: mul r4.xyzw, r0.xxxx, cb3[0].zwzw  
  17: mad r5.xyzw, r4.zwzw, l(1.000000, 0.000000, -1.000000, -0.000000), r2.xyzw  
  18: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r5.xyxx, t0.xyzw, s0  
  19: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r5.zwzz, t0.xyzw, s0  
  20: add r5.xyzw, r5.xyzw, r6.xyzw  
  21: mad r6.xyzw, r4.zwzw, l(0.707000, 0.707000, -0.707000, -0.707000), r2.xyzw  
  22: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0  
  23: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0  
  24: add r5.xyzw, r5.xyzw, r7.xyzw  
  25: add r5.xyzw, r6.xyzw, r5.xyzw  
  26: mad r6.xyzw, r4.zwzw, l(0.000000, 1.000000, -0.000000, -1.000000), r2.xyzw  
  27: mad r2.xyzw, r4.xyzw, l(-0.707000, 0.707000, 0.707000, -0.707000), r2.xyzw  
  28: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0  
  29: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0  
  30: add r5.xyzw, r5.xyzw, r7.xyzw  
  31: add r5.xyzw, r6.xyzw, r5.xyzw  
  32: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r2.xyxx, t0.xyzw, s0  
  33: sample_indexable(texture2d)(float,float,float,float) r2.xyzw, r2.zwzz, t0.xyzw, s0  
  34: add r5.xyzw, r5.xyzw, r6.xyzw  
  35: add r2.xyzw, r2.xyzw, r5.xyzw  
  36: mul r2.xyzw, r2.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500)  
  37: mad r5.xyzw, r4.zwzw, l(1.000000, 0.000000, -1.000000, -0.000000), r3.zwzw  
  38: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r5.xyxx, t0.xyzw, s0  
  39: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r5.zwzz, t0.xyzw, s0  
  40: add r5.xyzw, r5.xyzw, r6.xyzw  
  41: mad r6.xyzw, r4.zwzw, l(0.707000, 0.707000, -0.707000, -0.707000), r3.zwzw  
  42: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0  
  43: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0  
  44: add r5.xyzw, r5.xyzw, r7.xyzw  
  45: add r5.xyzw, r6.xyzw, r5.xyzw  
  46: mad r6.xyzw, r4.zwzw, l(0.000000, 1.000000, -0.000000, -1.000000), r3.zwzw  
  47: mad r3.xyzw, r4.xyzw, l(-0.707000, 0.707000, 0.707000, -0.707000), r3.xyzw  
  48: sample_indexable(texture2d)(float,float,float,float) r4.xyzw, r6.xyxx, t0.xyzw, s0  
  49: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0  
  50: add r4.xyzw, r4.xyzw, r5.xyzw  
  51: add r4.xyzw, r6.xyzw, r4.xyzw  
  52: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r3.xyxx, t0.xyzw, s0  
  53: sample_indexable(texture2d)(float,float,float,float) r3.xyzw, r3.zwzz, t0.xyzw, s0  
  54: add r4.xyzw, r4.xyzw, r5.xyzw  
  55: add r3.xyzw, r3.xyzw, r4.xyzw  
  56: mad r2.xyzw, r3.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500), r2.xyzw  
  57: mul r0.x, cb3[0].y, l(8.000000)  
  58: mul r0.xy, r0.xxxx, cb3[0].zwzz  
  59: mad r0.z, cb3[1].y, l(0.020000), l(1.000000)  
  60: mul r1.zw, r0.zzzz, r1.xxxy  
  61: mad r1.xy, r1.xyxx, r0.zzzz, cb3[2].xyxx  
  62: mad r3.xy, r1.zwzz, r0.xyxx, r1.xyxx  
  63: mul r0.xy, r0.xyxx, r1.zwzz  
  64: mad r0.xy, r0.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), r1.xyxx  
  65: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0  
  66: sample_indexable(texture2d)(float,float,float,float) r4.xyzw, r0.xyxx, t0.xyzw, s0  
  67: sample_indexable(texture2d)(float,float,float,float) r3.xyzw, r3.xyxx, t0.xyzw, s0  
  68: add r1.xyzw, r1.xyzw, r3.xyzw  
  69: add r1.xyzw, r4.xyzw, r1.xyzw  
  70: mad r2.xyzw, -r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333), r2.xyzw  
  71: mul r1.xyzw, r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333)  
  72: mul r0.xyzw, r0.wwww, r2.xyzw  
  73: mad o0.xyzw, cb3[0].yyyy, r0.xyzw, r1.xyzw  
  74: ret  

Two separate constant buffers are being used here. Let's check their values:


A few of them are interesting for us:
cb0_v0.x - elapsed time (seconds)
cb0_v1.xyzw - viewport size and inverse viewport size (aka pixel size)

cb3_v0.x - rotation around pixel, always set to 1.0.
cb3_v0.y - amount of drunk effect. After the effect is triggered, it does not start at full intensity but rises gradually from 0.0 to 1.0.
cb3_v0.zw - texel size (used in lines 16 and 58).
cb3_v1.xy - pixel offsets (more on this later). This is a sin/cos pair, so you can use sincos(time) in the shader if you want.
cb3_v2.xy - center of effect, usually float2( 0.5, 0.5 ).
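To make the layout concrete, here is a hypothetical CPU-side sketch in Python (standing in for whatever engine code actually fills the buffer) that packs cb3 the way the list above describes. The function name, its arguments, and putting sin in x and cos in y are assumptions; the last one is consistent with line 59 using cb3[1].y as cos(time).

```python
import math


def pack_drunk_cb3(time_s, drunk_amount, viewport_w, viewport_h):
    """Hypothetical packing of cb3 for the drunk effect (names are illustrative)."""
    texel_w = 1.0 / viewport_w
    texel_h = 1.0 / viewport_h
    return {
        # x: rotation around pixel (always 1.0), y: drunk amount, zw: texel size
        "cb3_v0": (1.0, drunk_amount, texel_w, texel_h),
        # xy: sin/cos pair of the elapsed time (the "pixel offsets")
        "cb3_v1": (math.sin(time_s), math.cos(time_s)),
        # xy: effect center in normalized UV space
        "cb3_v2": (0.5, 0.5),
    }


cb3 = pack_drunk_cb3(time_s=0.0, drunk_amount=1.0, viewport_w=1024, viewport_h=512)
print(cb3["cb3_v1"])  # (0.0, 1.0)
```

In a real renderer drunk_amount would be ramped from 0.0 to 1.0 over time rather than set directly.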

What we want to focus on here is understanding how this works, instead of blindly rewriting the assembly.

We will start with the first lines:

 ps_5_0  
   0: mad r0.x, cb3[0].y, l(-0.100000), l(1.000000)  
   1: mul r0.yz, cb3[1].xxyx, l(0.000000, 0.050000, 0.050000, 0.000000)  
   2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx  
   3: dp2 r0.w, r1.xyxx, r1.xyxx  
   4: sqrt r1.z, r0.w  

Line 0 computes something I called the "zoom factor"; you'll see why in a minute.
Right after that (line 1), we calculate the "rotation offsets". It's just the input sin/cos pair multiplied by 0.05.

Lines 2-4: first, we calculate the vector from the effect center to the texture UV (line 2 turns the pixel position into a UV by multiplying it by the inverse viewport size). Then we calculate its squared length (line 3) and its length (line 4), i.e. the distance from the center to the texel.
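The same first lines (0-4) can be mirrored on the CPU. This is a Python sketch, not the game's code; the function and argument names are made up, and `pixel_pos` plays the role of SV_Position (v1.xy).

```python
import math


def preamble(pixel_pos, inv_viewport, drunk_amount, sin_cos, center=(0.5, 0.5)):
    """Lines 0-4 of the shader, one float at a time."""
    # line 0: "zoom factor", 1.0 when sober, 0.9 at full drunk amount
    zoom_factor = 1.0 - 0.1 * drunk_amount
    # line 1: "rotation offsets" = input sin/cos pair scaled by 0.05
    rotation_offsets = (0.05 * sin_cos[0], 0.05 * sin_cos[1])
    # line 2: pixel position * inverse viewport size = UV, minus the effect center
    from_center = (pixel_pos[0] * inv_viewport[0] - center[0],
                   pixel_pos[1] * inv_viewport[1] - center[1])
    # line 3: squared distance from the center (dp2 of the vector with itself)
    dist_sq = from_center[0] ** 2 + from_center[1] ** 2
    # line 4: distance from the center
    dist = math.sqrt(dist_sq)
    return zoom_factor, rotation_offsets, from_center, dist_sq, dist


zf, rot, d, d2, dist = preamble((768.0, 256.0), (1 / 1024, 1 / 512), 1.0, (0.0, 1.0))
print(d, d2, dist)  # (0.25, 0.0) 0.0625 0.25
```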

Zoomed texture coordinates


Let's take a look at the following assembly:
   8: mul r2.xyzw, r0.yzyz, r1.zzzz  
   9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw  
  10: mul r3.xy, r0.xxxx, r1.xyxx  
  11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy  
  12: add r3.xyzw, r3.xyzw, cb3[2].xyxy  
  13: add r2.xyzw, r2.xyzw, cb3[2].xyxy 

Since both halves of each register hold the same values (.xyxy), we can safely analyse only one pair of floats.
To begin with, r0.yz holds the "rotation offsets", r1.z the distance from the center to the texel, r1.xy the vector from the center to the texel, and r0.x the "zoom factor".

To understand it, let zoomFactor = 1.0 for now, so we can write:
   8: mul r2.xyzw, r0.yzyz, r1.zzzz  
   9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw  
  13: add r2.xyzw, r2.xyzw, cb3[2].xyxy 
  r2.xy = (texel - center) * zoomFactor - rotationOffsets * distanceFromCenter + center

  But zoomFactor = 1.0:
  r2.xy = texel - center - rotationOffsets * distanceFromCenter + center
  r2.xy = texel - rotationOffsets * distanceFromCenter

Similarly for r3.xy:
  10: mul r3.xy, r0.xxxx, r1.xyxx  
  11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy  
  12: add r3.xyzw, r3.xyzw, cb3[2].xyxy  

  r3.xy = rotationOffsets * distanceFromCenter + zoomFactor * (texel - center) + center 

  But zoomFactor = 1.0:
  r3.xy = rotationOffsets * distanceFromCenter + texel - center + center
  r3.xy = texel + rotationOffsets * distanceFromCenter

Sweet. So right now we basically have the current TextureUV (texel) +/- the rotation offsets scaled by the distance from the center, but what about zoomFactor? Take a look at line 0.
zoomFactor = 1.0 - 0.1 * drunkAmount. For the maximum drunkAmount (1.0), zoomFactor = 0.9, and the zoomed texcoords are now calculated as:

  baseTexcoordsA = 0.9 * texel + 0.1 * center + rotationOffsets * distanceFromCenter
  baseTexcoordsB = 0.9 * texel + 0.1 * center - rotationOffsets * distanceFromCenter

Or, maybe more intuitively, it's just a linear interpolation between the normalized texture coordinates and the center by a factor of 0.1 * drunkAmount. This is what "zooms in" the image. The best way to understand it is to play with it, so here is a link to a Shadertoy which shows it in action.
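A small Python sketch of lines 8-13 (one UV at a time) computes both base coordinates and checks the "interpolate towards the center" reading numerically. The helper names are made up for this example; `lerp` follows HLSL semantics, x + t * (y - x).

```python
import math


def lerp(x, y, t):
    # HLSL lerp semantics: x + t * (y - x)
    return x + t * (y - x)


def base_texcoords(uv, center, rotation_offsets, dist, zoom_factor):
    """Lines 8-13: returns (baseTexcoordsA, baseTexcoordsB) = (r3.xy, r2.xy)."""
    zoomed = [(u - c) * zoom_factor + c for u, c in zip(uv, center)]
    a = tuple(z + r * dist for z, r in zip(zoomed, rotation_offsets))  # r3.xy
    b = tuple(z - r * dist for z, r in zip(zoomed, rotation_offsets))  # r2.xy
    return a, b


uv, center = (0.8, 0.3), (0.5, 0.5)
drunk_amount = 1.0
zoom_factor = 1.0 - 0.1 * drunk_amount
dist = math.dist(uv, center)
a, b = base_texcoords(uv, center, (0.0, 0.05), dist, zoom_factor)

# The midpoint of A and B is the zoomed UV: uv pulled 10% towards the center.
for k in range(2):
    mid = 0.5 * (a[k] + b[k])
    assert math.isclose(mid, lerp(uv[k], center[k], 0.1 * drunk_amount))
```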

Texcoords offset

The whole piece of assembly:
   2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx
   3: dp2 r0.w, r1.xyxx, r1.xyxx  
   5: mul r0.w, r0.w, l(10.000000)  
   6: min r0.w, r0.w, l(1.000000)  
   7: mul r0.w, r0.w, cb3[0].y  
  14: mul r0.x, r0.w, cb3[0].x  
  15: mul r0.x, r0.x, l(5.000000)           // texcoords offset intensity
  16: mul r4.xyzw, r0.xxxx, cb3[0].zwzw     // texcoords offset

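produces a radial gradient; let's call it the "offset intensity mask". Actually, it produces two: one in r0.w (we will use it later) and a second one, 5 times stronger, in r0.x (line 15). The latter serves as a multiplier for the texel size, so it controls the offset strength. Both grow with the squared distance from the center and stop growing once 10 * dist² reaches 1.0 (dist ≈ 0.316); beyond that they depend only on the drunk amount.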
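The two masks are easy to tabulate. Here is a Python sketch of lines 5-7 and 14-16 (function names are hypothetical); `rotation_around_pixel` is cb3_v0.x, which is always 1.0 in the game.

```python
def offset_intensity(dist_sq, drunk_amount, rotation_around_pixel=1.0):
    """Lines 5-7 and 14-15: the two "offset intensity masks"."""
    mask = min(dist_sq * 10.0, 1.0) * drunk_amount       # r0.w, reused at the end
    intensity = mask * rotation_around_pixel * 5.0      # r0.x, 5x stronger
    return mask, intensity


def texcoords_offset(intensity, texel_size):
    # line 16: sampling radius in UV units (stored twice in r4.xyzw)
    return (intensity * texel_size[0], intensity * texel_size[1])


for dist in (0.0, 0.1, 0.2, 0.3, 0.4, 0.6):
    mask, intensity = offset_intensity(dist * dist, drunk_amount=1.0)
    print(f"dist={dist:.1f}  mask={mask:.3f}  radius={intensity:.2f} texels")
```

At full drunk amount the sampling radius therefore tops out at 5 texels.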

Sampling - rotation part


Next comes a series of texture fetches. There are actually 2 series of 8 samples each, one for each "side" (baseTexcoordsA and baseTexcoordsB). In HLSL we can write it this way:

   static const float2 pointsAroundPixel[8] =
    {
        float2(1.0, 0.0),
        float2(-1.0, 0.0),
        float2(0.707,  0.707),
        float2(-0.707, -0.707),
        float2(0.0,  1.0),
        float2(0.0, -1.0),
        float2(-0.707, 0.707),
        float2(0.707, -0.707)
    };

    float4 colorA = 0;
    float4 colorB = 0;

    int i=0;
    [unroll] for (i = 0; i < 8; i++)
    {
        colorA += TexColorBuffer.Sample( samplerLinearClamp, baseTexcoordsA + texcoordsOffset * pointsAroundPixel[i] );
    }
    colorA /= 16.0;

    [unroll] for (i = 0; i < 8; i++)
    {
        colorB += TexColorBuffer.Sample( samplerLinearClamp, baseTexcoordsB + texcoordsOffset * pointsAroundPixel[i] );
    }
    colorB /= 16.0;

    float4 rotationPart = colorA + colorB;

The trick is that we add to baseTexcoordsA/B an additional offset lying on the unit circle around the pixel, multiplied by the previously mentioned "texcoords offset" (offset intensity times texel size). The farther the pixel is from the center, the larger the radius of the circle around it (until the mask saturates), and since we sample it 8 times, this is clearly visible on the stars. The values of pointsAroundPixel are multiples of 45 degrees on the unit circle (see https://en.wikipedia.org/wiki/Unit_circle).
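The same 16-tap loop can be written on the CPU as a Python sketch. `sample(u, v)` is a hypothetical stand-in for TexColorBuffer.Sample that returns a single channel; the module-level checks confirm the offsets sit on the unit circle, 45 degrees apart.

```python
import math

# Offsets used by the shader (lines 17, 21, 26, 27 and 37, 41, 46, 47)
POINTS_AROUND_PIXEL = [
    (1.0, 0.0), (-1.0, 0.0),
    (0.707, 0.707), (-0.707, -0.707),
    (0.0, 1.0), (0.0, -1.0),
    (-0.707, 0.707), (0.707, -0.707),
]


def rotation_part(sample, base_a, base_b, offset):
    """16 taps: 8 around each base UV, summed and divided by 16.

    `sample(u, v)` is a stand-in for TexColorBuffer.Sample returning one channel.
    """
    total = 0.0
    for base_u, base_v in (base_a, base_b):
        for px, py in POINTS_AROUND_PIXEL:
            total += sample(base_u + offset[0] * px, base_v + offset[1] * py)
    return total / 16.0


# Every point lies (almost) on the unit circle, 45 degrees apart.
for px, py in POINTS_AROUND_PIXEL:
    assert abs(math.hypot(px, py) - 1.0) < 1e-3
    assert round(math.degrees(math.atan2(py, px))) % 45 == 0
```

Because the offsets are symmetric, a flat or linear image passes through unchanged; only detail such as stars gets smeared into the 8-point pattern.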

Sampling - zooming in/out part

The second part of the drunk effect in The Witcher 3 is zooming "in and out". Let's look at the assembly responsible for it:

  56: mad r2.xyzw, r3.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500), r2.xyzw  // the rotation part is stored in r2 register

  57: mul r0.x, cb3[0].y, l(8.000000)
  58: mul r0.xy, r0.xxxx, cb3[0].zwzz
  59: mad r0.z, cb3[1].y, l(0.020000), l(1.000000)
  60: mul r1.zw, r0.zzzz, r1.xxxy
  61: mad r1.xy, r1.xyxx, r0.zzzz, cb3[2].xyxx
  62: mad r3.xy, r1.zwzz, r0.xyxx, r1.xyxx
  63: mul r0.xy, r0.xyxx, r1.zwzz
  64: mad r0.xy, r0.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), r1.xyxx
  65: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0
  66: sample_indexable(texture2d)(float,float,float,float) r4.xyzw, r0.xyxx, t0.xyzw, s0
  67: sample_indexable(texture2d)(float,float,float,float) r3.xyzw, r3.xyxx, t0.xyzw, s0
  68: add r1.xyzw, r1.xyzw, r3.xyzw
  69: add r1.xyzw, r4.xyzw, r1.xyzw

There are three separate texture fetches, so there are 3 different texture coordinates. Let's analyse how they are calculated. But first, some inputs for this part:
  float  zoomInOutScalePixels = drunkEffectAmount * 8.0; // line 57
  float2 zoomInOutScaleNormalizedScreenCoordinates = zoomInOutScalePixels * texelSize.xy; // line 58
  float  zoomInOutAmplitude = 1.0 + 0.02*cos(time); // line 59
  float2 zoomInOutfromCenterToTexel = zoomInOutAmplitude * fromCenterToTexel; // line 60
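A few words about the inputs. We calculate an offset in texels (8.0 * drunk amount * texel size, so up to 8 texels at full intensity) which, multiplied by the vector from the center, is later added to the base UV. The amplitude simply oscillates between 0.98 and 1.02 to give a "zooming" feeling, similar to zoomFactor in the rotation part.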

Let's start with pair #1, r1.xy (line 61):
  r1.xy = fromCenterToTexel * amplitude + center
  r1.xy = (TextureUV - Center) * amplitude + Center // you can insert here zoomInOutfromCenterToTexel
  r1.xy = TextureUV * amplitude - Center * amplitude + Center
  r1.xy = TextureUV * amplitude + Center * 1.0 - Center * amplitude
  r1.xy = TextureUV * amplitude + Center * (1.0 - amplitude)
  
  r1.xy = lerp( Center, TextureUV, amplitude);
  
  So:
  float2 zoomInOutBaseTextureUV = lerp(Center, TextureUV, amplitude);
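A quick numeric check of the last step, as a Python sketch on one UV component: HLSL's lerp(x, y, t) returns x + t * (y - x), so the expression computed on line 61 is lerp(Center, TextureUV, amplitude), with the center as the first argument. With the arguments swapped, an amplitude close to 1.0 would pull the UV almost all the way to the center.

```python
import math


def lerp(x, y, t):
    # HLSL lerp semantics: x + t * (y - x) == x * (1 - t) + y * t
    return x + t * (y - x)


def line_61(uv, center, amplitude):
    # mad r1.xy, r1.xyxx, r0.zzzz, cb3[2].xyxx -> (uv - center) * amplitude + center
    return (uv - center) * amplitude + center


for uv in (0.0, 0.3, 0.8, 1.0):
    for amplitude in (0.98, 1.0, 1.02):
        assert math.isclose(line_61(uv, 0.5, amplitude), lerp(0.5, uv, amplitude))
```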

Let's check out pair #2, r3.xy (line 62)
  r3.xy = (amplitude * fromCenterToTexel) * zoomInOutScaleNormalizedScreenCoordinates
        + zoomInOutBaseTextureUV

  So:
  float2 zoomInOutAddTextureUV0 = zoomInOutBaseTextureUV
                      + zoomInOutfromCenterToTexel*zoomInOutScaleNormalizedScreenCoordinates;


Let's check out pair #3, r0.xy (lines 63-64)
  r0.xy = zoomInOutScaleNormalizedScreenCoordinates * (amplitude * fromCenterToTexel) * 2.0 + zoomInOutBaseTextureUV

  So:
  float2 zoomInOutAddTextureUV1 = zoomInOutBaseTextureUV
                      + 2.0*zoomInOutfromCenterToTexel*zoomInOutScaleNormalizedScreenCoordinates;

All three texture fetches are added together, and the result is stored in the r1 register. It's worth noting that this pixel shader uses a sampler with "clamp" addressing.
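Putting lines 57-71 together, here is a Python sketch of the three zoom taps. The names are hypothetical, the sin/cos pair is assumed to come from an elapsed time value (cb3[1].y = cos(time)), and `clamp01` only roughly emulates "clamp" addressing by pinning UVs to [0, 1].

```python
import math


def zoom_texcoords(uv, center, drunk_amount, texel_size, time_s):
    """Lines 57-64: UVs of the three zoom taps (one float per axis)."""
    scale = [drunk_amount * 8.0 * t for t in texel_size]              # lines 57-58
    amplitude = 1.0 + 0.02 * math.cos(time_s)                        # line 59
    scaled = [amplitude * (u - c) for u, c in zip(uv, center)]       # line 60
    base = [s + c for s, c in zip(scaled, center)]                   # line 61
    tap1 = [b + s * k for b, s, k in zip(base, scaled, scale)]       # line 62
    tap2 = [b + 2.0 * s * k for b, s, k in zip(base, scaled, scale)]  # lines 63-64
    return [tuple(base), tuple(tap1), tuple(tap2)]


def clamp01(x):
    # rough stand-in for "clamp" addressing: UVs outside [0, 1] hit the border texels
    return min(max(x, 0.0), 1.0)


def zooming_part(sample, taps):
    # lines 65-71: sum of the three fetches, then multiplied by 1/3
    return sum(sample(clamp01(u), clamp01(v)) for u, v in taps) / 3.0


taps = zoom_texcoords((0.9, 0.2), (0.5, 0.5), 1.0, (1 / 1024, 1 / 512), time_s=0.0)
```

The taps lie on a line from the center through the pixel, evenly spaced, so the zoom blur is purely radial.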

Combining all together

So, right now we have the result of the rotation part in the r2 register and the sum of the 3 zoom fetches in the r1 register. Let's look at the final lines of the assembly:
  70: mad r2.xyzw, -r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333), r2.xyzw  
  71: mul r1.xyzw, r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333)  
  72: mul r0.xyzw, r0.wwww, r2.xyzw  
  73: mad o0.xyzw, cb3[0].yyyy, r0.xyzw, r1.xyzw  
  74: ret  

As for the additional inputs: r0.w comes from line 7 and is our intensity mask (note that it already includes the drunk amount once), while cb3[0].y is the amount of drunk effect.

Let's find out how it works.
Okay, my first approach was the "brute-force" way:
  float4 finalColor = intensityMask * (rotationPart - zoomingPart);
  finalColor = drunkIntensity * finalColor + zoomingPart;
  
  return finalColor;

But what the heck, nobody writes shaders this way.
I took pen and paper and wrote out this formula:
  finalColor = effectAmount * [intensityMask * (rotationPart - zoomPart)] + zoomPart
  finalColor = effectAmount * intensityMask * rotationPart - effectAmount * intensityMask * zoomPart + zoomPart

  - Let t = effectAmount * intensityMask
  - So we have:
  finalColor = t * rotationPart - t * zoomPart + zoomPart
  finalColor = t * rotationPart + zoomPart - t * zoomPart
  finalColor = t * rotationPart + (1.0 - t) * zoomPart
  finalColor = lerp( zoomingPart, rotationPart, t )

  - Finally:
  finalColor = lerp(zoomingPart, rotationPart, intensityMask * drunkIntensity);
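To convince yourself that the pen-and-paper version matches the assembly, a Python sketch can compare a literal transcription of lines 70-73 against the lerp form over a grid of inputs (function names are hypothetical, one color channel). Since r0.w already contains cb3[0].y from line 7, the blend weight works out to min(10 * dist², 1) * drunkAmount².

```python
import itertools
import math


def final_color_asm(rotation_part, zoom_sum, mask, drunk_amount):
    """Lines 70-73 transcribed literally (one channel)."""
    diff = rotation_part - zoom_sum * 0.333333   # line 70
    zooming_part = zoom_sum * 0.333333           # line 71
    masked = mask * diff                         # line 72 (mask = r0.w)
    return drunk_amount * masked + zooming_part  # line 73 (drunk_amount = cb3[0].y)


def final_color_lerp(rotation_part, zooming_part, mask, drunk_amount):
    # lerp(zoomingPart, rotationPart, intensityMask * drunkIntensity)
    t = mask * drunk_amount
    return zooming_part + t * (rotation_part - zooming_part)


values = (0.0, 0.25, 0.7, 1.0)
for rot, zsum, mask, amount in itertools.product(values, repeat=4):
    expected = final_color_asm(rot, zsum, mask, amount)
    simplified = final_color_lerp(rot, zsum * 0.333333, mask, amount)
    assert math.isclose(expected, simplified, abs_tol=1e-12)
```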

Phew! That was quite a detailed post, but it's over now ;)
Personally, I learned something while writing this one, and hopefully you did too!

The full HLSL source is here if you are interested. I checked it with my HLSLexplorer, and although it doesn't map 1:1 to the original shader, the difference is so small (1 line less) that I can safely assume it's working :)
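For experimenting without a GPU, the whole pipeline described above can also be written as a small CPU reference model. This is a Python sketch, not the HLSL source linked above: it handles one color channel, takes a hypothetical `sample(u, v)` callback in place of TexColorBuffer.Sample (clamping is left to the callback), and derives the sin/cos pair from a time value, which is an assumption about how cb3_v1 is filled.

```python
import math

POINTS_AROUND_PIXEL = [(1.0, 0.0), (-1.0, 0.0), (0.707, 0.707), (-0.707, -0.707),
                       (0.0, 1.0), (0.0, -1.0), (-0.707, 0.707), (0.707, -0.707)]


def lerp(x, y, t):
    return x + t * (y - x)


def drunk_pixel(sample, uv, drunk_amount, time_s, texel_size,
                center=(0.5, 0.5), rotation_around_pixel=1.0):
    """CPU reference of the drunk effect for one pixel and one color channel."""
    sin_t, cos_t = math.sin(time_s), math.cos(time_s)
    d = (uv[0] - center[0], uv[1] - center[1])                 # line 2
    dist_sq = d[0] * d[0] + d[1] * d[1]                        # line 3
    dist = math.sqrt(dist_sq)                                  # line 4
    mask = min(dist_sq * 10.0, 1.0) * drunk_amount             # lines 5-7

    # Rotation part: two zoomed, rotated base UVs, 8 taps around each.
    zoom_factor = 1.0 - 0.1 * drunk_amount                     # line 0
    rot = (0.05 * sin_t * dist, 0.05 * cos_t * dist)           # lines 1, 8
    zoomed = (d[0] * zoom_factor + center[0], d[1] * zoom_factor + center[1])
    radius = mask * rotation_around_pixel * 5.0                # lines 14-15
    off = (radius * texel_size[0], radius * texel_size[1])     # line 16
    total = 0.0
    for sign in (1.0, -1.0):
        base_u = zoomed[0] + sign * rot[0]
        base_v = zoomed[1] + sign * rot[1]
        for px, py in POINTS_AROUND_PIXEL:
            total += sample(base_u + off[0] * px, base_v + off[1] * py)
    rotation_part = total / 16.0                               # lines 36, 56

    # Zoom in/out part: three taps along the center-to-texel direction.
    scale = (drunk_amount * 8.0 * texel_size[0], drunk_amount * 8.0 * texel_size[1])
    amplitude = 1.0 + 0.02 * cos_t                             # line 59
    sd = (amplitude * d[0], amplitude * d[1])                  # line 60
    base = (sd[0] + center[0], sd[1] + center[1])              # line 61
    zooming_part = sum(
        sample(base[0] + k * sd[0] * scale[0], base[1] + k * sd[1] * scale[1])
        for k in (0.0, 1.0, 2.0)) / 3.0                        # lines 62-71

    # Lines 70-73 in their simplified lerp form.
    return lerp(zooming_part, rotation_part, mask * drunk_amount)
```

Feeding it a flat image should return the same flat value for every pixel, which is a handy sanity check before comparing against a frame capture.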

Let me know if you liked it.
Thanks for reading! :)
M.

3 comments:

  1. Great work, man! Do you have any pet project or something you used these shaders for? I'm interested in whether there's any showreel or demo of them applied to your project.

    1. Hey! Thank you very much! :)

      You can take a look at my DX11 framework which I'm using for personal R&D: https://github.com/astralis3d/DX11Framework
      (a bit outdated, but I'll try to post more recent version soon)

  2. Hi, is it possible to reduce graphics requirements for games this way? It would be interesting to know.
