Hi,
Let's take a look at how the drunk effect is implemented in The Witcher 3: Wild Hunt.
If you haven't played it yet,
Evening: [video]
Night: [video]
The first thing we notice is a "double rotating" image, a pretty common sight when you're not sober in real life. The farther a pixel is from the center of the image, the stronger the rotation effect. I posted the second video, recorded at night, on purpose: the rotation is clearly visible on the stars (can you see 8 separate points?).
The second part of the TW3 drunk effect, maybe not so visible at first sight, is a slight zooming in and out. It's visible near the center.
It's probably obvious that this effect is a typical post-process (a pixel shader). However, its place in the pipeline may not be so obvious. It turns out that the drunk effect is applied just *after* tonemapping and just before motion blur (the drunk image is the input for motion blur).
Let's start the assembly game:
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[2], immediateIndexed
dcl_constantbuffer cb3[3], immediateIndexed
dcl_sampler s0, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_input_ps_siv v1.xy, position
dcl_output o0.xyzw
dcl_temps 8
0: mad r0.x, cb3[0].y, l(-0.100000), l(1.000000)
1: mul r0.yz, cb3[1].xxyx, l(0.000000, 0.050000, 0.050000, 0.000000)
2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx
3: dp2 r0.w, r1.xyxx, r1.xyxx
4: sqrt r1.z, r0.w
5: mul r0.w, r0.w, l(10.000000)
6: min r0.w, r0.w, l(1.000000)
7: mul r0.w, r0.w, cb3[0].y
8: mul r2.xyzw, r0.yzyz, r1.zzzz
9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw
10: mul r3.xy, r0.xxxx, r1.xyxx
11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy
12: add r3.xyzw, r3.xyzw, cb3[2].xyxy
13: add r2.xyzw, r2.xyzw, cb3[2].xyxy
14: mul r0.x, r0.w, cb3[0].x
15: mul r0.x, r0.x, l(5.000000)
16: mul r4.xyzw, r0.xxxx, cb3[0].zwzw
17: mad r5.xyzw, r4.zwzw, l(1.000000, 0.000000, -1.000000, -0.000000), r2.xyzw
18: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r5.xyxx, t0.xyzw, s0
19: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r5.zwzz, t0.xyzw, s0
20: add r5.xyzw, r5.xyzw, r6.xyzw
21: mad r6.xyzw, r4.zwzw, l(0.707000, 0.707000, -0.707000, -0.707000), r2.xyzw
22: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0
23: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
24: add r5.xyzw, r5.xyzw, r7.xyzw
25: add r5.xyzw, r6.xyzw, r5.xyzw
26: mad r6.xyzw, r4.zwzw, l(0.000000, 1.000000, -0.000000, -1.000000), r2.xyzw
27: mad r2.xyzw, r4.xyzw, l(-0.707000, 0.707000, 0.707000, -0.707000), r2.xyzw
28: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0
29: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
30: add r5.xyzw, r5.xyzw, r7.xyzw
31: add r5.xyzw, r6.xyzw, r5.xyzw
32: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r2.xyxx, t0.xyzw, s0
33: sample_indexable(texture2d)(float,float,float,float) r2.xyzw, r2.zwzz, t0.xyzw, s0
34: add r5.xyzw, r5.xyzw, r6.xyzw
35: add r2.xyzw, r2.xyzw, r5.xyzw
36: mul r2.xyzw, r2.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500)
37: mad r5.xyzw, r4.zwzw, l(1.000000, 0.000000, -1.000000, -0.000000), r3.zwzw
38: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r5.xyxx, t0.xyzw, s0
39: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r5.zwzz, t0.xyzw, s0
40: add r5.xyzw, r5.xyzw, r6.xyzw
41: mad r6.xyzw, r4.zwzw, l(0.707000, 0.707000, -0.707000, -0.707000), r3.zwzw
42: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0
43: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
44: add r5.xyzw, r5.xyzw, r7.xyzw
45: add r5.xyzw, r6.xyzw, r5.xyzw
46: mad r6.xyzw, r4.zwzw, l(0.000000, 1.000000, -0.000000, -1.000000), r3.zwzw
47: mad r3.xyzw, r4.xyzw, l(-0.707000, 0.707000, 0.707000, -0.707000), r3.xyzw
48: sample_indexable(texture2d)(float,float,float,float) r4.xyzw, r6.xyxx, t0.xyzw, s0
49: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
50: add r4.xyzw, r4.xyzw, r5.xyzw
51: add r4.xyzw, r6.xyzw, r4.xyzw
52: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r3.xyxx, t0.xyzw, s0
53: sample_indexable(texture2d)(float,float,float,float) r3.xyzw, r3.zwzz, t0.xyzw, s0
54: add r4.xyzw, r4.xyzw, r5.xyzw
55: add r3.xyzw, r3.xyzw, r4.xyzw
56: mad r2.xyzw, r3.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500), r2.xyzw
57: mul r0.x, cb3[0].y, l(8.000000)
58: mul r0.xy, r0.xxxx, cb3[0].zwzz
59: mad r0.z, cb3[1].y, l(0.020000), l(1.000000)
60: mul r1.zw, r0.zzzz, r1.xxxy
61: mad r1.xy, r1.xyxx, r0.zzzz, cb3[2].xyxx
62: mad r3.xy, r1.zwzz, r0.xyxx, r1.xyxx
63: mul r0.xy, r0.xyxx, r1.zwzz
64: mad r0.xy, r0.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), r1.xyxx
65: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0
66: sample_indexable(texture2d)(float,float,float,float) r4.xyzw, r0.xyxx, t0.xyzw, s0
67: sample_indexable(texture2d)(float,float,float,float) r3.xyzw, r3.xyxx, t0.xyzw, s0
68: add r1.xyzw, r1.xyzw, r3.xyzw
69: add r1.xyzw, r4.xyzw, r1.xyzw
70: mad r2.xyzw, -r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333), r2.xyzw
71: mul r1.xyzw, r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333)
72: mul r0.xyzw, r0.wwww, r2.xyzw
73: mad o0.xyzw, cb3[0].yyyy, r0.xyzw, r1.xyzw
74: ret
Two separate constant buffers are being used here. Let's check their values:
cb0_v0.x - elapsed time (seconds)
cb0_v1.xyzw - viewport size & inverse viewport size (aka pixel size)
cb3_v0.x - rotation around pixel, always set to 1.0.
cb3_v0.y - amount of drunk effect. After the effect is triggered, it does not start at full intensity but rises gradually from 0.0 to 1.0.
cb3_v0.zw - texel size (used in lines 16 and 58).
cb3_v1.xy - pixel offsets (more on this later). This is a sin/cos pair, so you can use sincos(time) in the shader if you want.
cb3_v2.xy - center of the effect, usually float2( 0.5, 0.5 ).
What we want to focus on here is understanding how this works rather than blindly rewriting the assembly.
We will start from first lines:
ps_5_0
0: mad r0.x, cb3[0].y, l(-0.100000), l(1.000000)
1: mul r0.yz, cb3[1].xxyx, l(0.000000, 0.050000, 0.050000, 0.000000)
2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx
3: dp2 r0.w, r1.xyxx, r1.xyxx
4: sqrt r1.z, r0.w
Line 0 computes something I call the "zoom factor"; you'll see why in a minute.
Right after that (line 1), we calculate the "rotation offsets". It's just the input sin/cos pair multiplied by 0.05.
Lines 2-4: first, we calculate the vector from the effect center to the texture UV. Then we calculate its squared length (line 3) and its regular length (line 4), i.e. the distance from the center to the texel.
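To make these first five instructions concrete, here is a minimal CPU-side sketch in Python (not shader code). It assumes the pixel position has already been multiplied by the inverse viewport size, so `uv` is in the 0..1 range; the function and parameter names are mine and do not come from the game.

```python
import math

def drunk_setup(uv, center, drunk_amount, sin_cos):
    """Mirror assembly lines 0-4 for a single pixel (hypothetical helper)."""
    # line 0: zoom factor shrinks from 1.0 to 0.9 as the effect ramps up
    zoom_factor = 1.0 - 0.1 * drunk_amount
    # line 1: the sin/cos pair from cb3[1].xy scaled by 0.05
    rotation_offsets = (0.05 * sin_cos[0], 0.05 * sin_cos[1])
    # line 2: vector from the effect center to the current texel
    from_center = (uv[0] - center[0], uv[1] - center[1])
    # line 3: squared distance (dp2 of the vector with itself)
    dist_sq = from_center[0] ** 2 + from_center[1] ** 2
    # line 4: regular distance
    dist = math.sqrt(dist_sq)
    return zoom_factor, rotation_offsets, from_center, dist_sq, dist
```

For example, a texel at (0.8, 0.9) with the center at (0.5, 0.5) is 0.5 away from the center, and at full intensity the zoom factor is 0.9.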
Zoomed texture coordinates
8: mul r2.xyzw, r0.yzyz, r1.zzzz
9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw
10: mul r3.xy, r0.xxxx, r1.xyxx
11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy
12: add r3.xyzw, r3.xyzw, cb3[2].xyxy
13: add r2.xyzw, r2.xyzw, cb3[2].xyxy
Since they're packed this way, we can safely analyse only one pair of floats.
To start, r0.yz are the "rotation offsets", r1.z is the distance from the center to the texel, r1.xy is the vector from the center to the texel and r0.x is the "zoom factor".
To understand it, let zoomFactor = 1.0 for now. For r2.xy we can write:
8: mul r2.xyzw, r0.yzyz, r1.zzzz
9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw
13: add r2.xyzw, r2.xyzw, cb3[2].xyxy
r2.xy = (texel - center) * zoomFactor - rotationOffsets * distanceFromCenter + center
But zoomFactor = 1.0:
r2.xy = texel - center - rotationOffsets * distanceFromCenter + center
r2.xy = texel - rotationOffsets * distanceFromCenter
Similarly for r3.xy:
10: mul r3.xy, r0.xxxx, r1.xyxx
11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy
12: add r3.xyzw, r3.xyzw, cb3[2].xyxy
r3.xy = rotationOffsets * distanceFromCenter + zoomFactor * (texel - center) + center
But zoomFactor = 1.0:
r3.xy = rotationOffsets * distanceFromCenter + texel - center + center
r3.xy = texel + rotationOffsets * distanceFromCenter
Sweet. So right now we basically have the current TextureUV (texel) +/- the rotation offsets, but what about zoomFactor? Take a look at line 0.
Basically, zoomFactor = 1.0 - 0.1 * drunkAmount. For the maximum drunkAmount, zoomFactor = 0.9 and the zoomed texcoords become:
baseTexcoordsA = 0.9 * texel + 0.1 * center + rotationOffsets * distanceFromCenter
baseTexcoordsB = 0.9 * texel + 0.1 * center - rotationOffsets * distanceFromCenter
Or, maybe more intuitively, it's just a linear interpolation between the effect center and the normalized texture coordinates by zoomFactor, i.e. lerp(center, texel, zoomFactor), plus or minus the rotation offset. This "zooms in" the image. The best way to understand it is to play with it, so here is a link to Shadertoy which shows it in action.
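The two base coordinates can also be checked numerically. The following Python sketch (a CPU-side illustration, not the shader itself) assumes an HLSL-style `lerp` and uses helper names of my own choosing; `a` corresponds to r3 and `b` to r2 in the assembly.

```python
def lerp(a, b, t):
    """HLSL-style lerp: a * (1 - t) + b * t."""
    return a * (1.0 - t) + b * t

def base_texcoords(uv, center, zoom_factor, rotation_offsets, dist):
    """Zoomed UV plus/minus the distance-scaled rotation offset (lines 8-13)."""
    # Pull the texel towards the center: zoom_factor = 1.0 leaves it untouched
    zoomed = tuple(lerp(c, u, zoom_factor) for u, c in zip(uv, center))
    # The shift grows with the distance from the effect center
    shift = tuple(r * dist for r in rotation_offsets)
    a = tuple(z + s for z, s in zip(zoomed, shift))  # r3 in the assembly
    b = tuple(z - s for z, s in zip(zoomed, shift))  # r2 in the assembly
    return a, b
```

With zoom_factor = 0.9 the x coordinate of a texel at the right screen edge moves from 1.0 to 0.95 before the rotation offset is applied, which is exactly the 0.9 * texel + 0.1 * center form above.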
Texcoords offset
The whole piece of assembly:
2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx
3: dp2 r0.w, r1.xyxx, r1.xyxx
5: mul r0.w, r0.w, l(10.000000)
6: min r0.w, r0.w, l(1.000000)
7: mul r0.w, r0.w, cb3[0].y
14: mul r0.x, r0.w, cb3[0].x
15: mul r0.x, r0.x, l(5.000000) // texcoords offset intensity
16: mul r4.xyzw, r0.xxxx, cb3[0].zwzw // texcoords offset
produces a kind of gradient; let's call it the "offset intensity mask". Actually, it produces two: one in r0.w (we will use it later) and a second one, 5 times stronger, in r0.x (line 15). The latter serves as a multiplier for the texel size, so it controls the offset strength.
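As a rough illustration of how the mask behaves, here is the same arithmetic as a small Python function. The names are hypothetical; `rotation_amount` stands for cb3[0].x and `texel_size` for cb3[0].zw.

```python
def offset_intensity(dist_sq, drunk_amount, rotation_amount, texel_size):
    """Mirror assembly lines 5-7 and 14-16 (hypothetical helper)."""
    # lines 5-7: squared distance * 10, clamped to 1, scaled by the effect amount
    mask = min(dist_sq * 10.0, 1.0) * drunk_amount
    # lines 14-15: the second, 5x stronger value
    strength = mask * rotation_amount * 5.0
    # line 16: convert it to an offset expressed in texture coordinates
    texcoords_offset = (strength * texel_size[0], strength * texel_size[1])
    return mask, texcoords_offset
```

Because of the clamp, the mask reaches its maximum once the squared distance from the center exceeds 0.1 (a distance of roughly 0.32 in UV space), so at full intensity most of the screen samples on a circle with a radius of 5 texels.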
Sampling - rotation part
Next comes a series of texture samples. There are actually 2 series of 8 samples each, one on each "side". In HLSL we can write it this way:
static const float2 pointsAroundPixel[8] =
{
    float2(1.0, 0.0),
    float2(-1.0, 0.0),
    float2(0.707, 0.707),
    float2(-0.707, -0.707),
    float2(0.0, 1.0),
    float2(0.0, -1.0),
    float2(-0.707, 0.707),
    float2(0.707, -0.707)
};
float4 colorA = 0;
float4 colorB = 0;
int i = 0;
[unroll] for (i = 0; i < 8; i++)
{
    colorA += TexColorBuffer.Sample( samplerLinearClamp, baseTexcoordsA + texcoordsOffset * pointsAroundPixel[i] );
}
colorA /= 16.0;
[unroll] for (i = 0; i < 8; i++)
{
    colorB += TexColorBuffer.Sample( samplerLinearClamp, baseTexcoordsB + texcoordsOffset * pointsAroundPixel[i] );
}
colorB /= 16.0;
float4 rotationPart = colorA + colorB;
The trick is that we add to baseTexcoordsA/B an additional offset lying on the unit circle around the pixel, multiplied by the previously mentioned "texcoords offset intensity". The farther the pixel is from the center, the larger the radius of the circle around it. We sample that circle 8 times, which is clearly visible on the stars. The values of pointsAroundPixel are points at multiples of 45 degrees:
[Image: unit circle, from: https://en.wikipedia.org/wiki/Unit_circle]
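The sampling pattern can be reproduced on the CPU as a sanity check. In this Python sketch the eight points are generated from their angles instead of being hard-coded, and `sample` is a hypothetical callable standing in for the texture fetch (it returns a single float here rather than a float4).

```python
import math

# Angles in the order the shader samples them: each direction is followed by its opposite.
ANGLES_DEG = [0, 180, 45, 225, 90, 270, 135, 315]
POINTS = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in ANGLES_DEG]

def rotation_part(sample, base_a, base_b, texcoords_offset):
    """Average 8 samples around base_a and 8 around base_b (16 in total)."""
    total = 0.0
    for base in (base_a, base_b):
        for px, py in POINTS:
            u = base[0] + texcoords_offset[0] * px
            v = base[1] + texcoords_offset[1] * py
            total += sample(u, v)
    # Same as summing colorA / 16 and colorB / 16 in the HLSL version
    return total / 16.0
```

Since all 16 samples carry the same 1/16 weight, a uniformly colored image passes through unchanged; only areas with contrast, such as stars against the night sky, reveal the ring of samples.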
Sampling - zooming in/out part
The second part of the drunk effect in The Witcher 3 is zooming "in and out". Let's look at the assembly responsible for that:
56: mad r2.xyzw, r3.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500), r2.xyzw // the rotation part is stored in r2 register
57: mul r0.x, cb3[0].y, l(8.000000)
58: mul r0.xy, r0.xxxx, cb3[0].zwzz
59: mad r0.z, cb3[1].y, l(0.020000), l(1.000000)
60: mul r1.zw, r0.zzzz, r1.xxxy
61: mad r1.xy, r1.xyxx, r0.zzzz, cb3[2].xyxx
62: mad r3.xy, r1.zwzz, r0.xyxx, r1.xyxx
63: mul r0.xy, r0.xyxx, r1.zwzz
64: mad r0.xy, r0.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), r1.xyxx
65: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0
66: sample_indexable(texture2d)(float,float,float,float) r4.xyzw, r0.xyxx, t0.xyzw, s0
67: sample_indexable(texture2d)(float,float,float,float) r3.xyzw, r3.xyxx, t0.xyzw, s0
68: add r1.xyzw, r1.xyzw, r3.xyzw
69: add r1.xyzw, r4.xyzw, r1.xyzw
We see that we have three separate texture fetches, so, 3 different texture coordinates. Let's analyse how texcoords for them are calculated. But first, some inputs for this part:
float zoomInOutScalePixels = drunkEffectAmount * 8.0; // line 57
float2 zoomInOutScaleNormalizedScreenCoordinates = zoomInOutScalePixels * texelSize.xy; // line 58
float zoomInOutAmplitude = 1.0 + 0.02*cos(time); // line 59
float2 zoomInOutfromCenterToTexel = zoomInOutAmplitude * fromCenterToTexel; // line 60
A few words about the inputs. We calculate an offset in texels (8.0 * texel size at full intensity) which is later added to the base UV. The amplitude simply oscillates between 0.98 and 1.02 to give a "zooming" feeling, like zoomFactor in the rotation part.
Let's start with pair #1, r1.xy (line 61)
r1.xy = fromCenterToTexel * amplitude + center
r1.xy = (TextureUV - Center) * amplitude + Center // you can insert zoomInOutfromCenterToTexel here
r1.xy = TextureUV * amplitude - Center * amplitude + Center
r1.xy = TextureUV * amplitude + Center * 1.0 - Center * amplitude
r1.xy = TextureUV * amplitude + Center * (1.0 - amplitude)
r1.xy = lerp( Center, TextureUV, amplitude );
So:
float2 zoomInOutBaseTextureUV = lerp( Center, TextureUV, amplitude );
Let's check out pair #2, r3.xy (line 62)
r3.xy = (amplitude * fromCenterToTexel) * zoomInOutScaleNormalizedScreenCoordinates
+ zoomInOutBaseTextureUV
So:
float2 zoomInOutAddTextureUV0 = zoomInOutBaseTextureUV
+ zoomInOutfromCenterToTexel*zoomInOutScaleNormalizedScreenCoordinates;
Let's check out pair #3, r0.xy (lines 63-64)
r0.xy = zoomInOutScaleNormalizedScreenCoordinates * (amplitude * fromCenterToTexel) * 2.0 + zoomInOutBaseTextureUV
So:
float2 zoomInOutAddTextureUV1 = zoomInOutBaseTextureUV
+ 2.0*zoomInOutfromCenterToTexel*zoomInOutScaleNormalizedScreenCoordinates;
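The three coordinate pairs derived above can be sketched on the CPU as follows. This is a Python illustration under the same assumptions as the walkthrough (cb3[1].y holds cos(time), cb3[0].zw holds the texel size); the function name and its parameters are mine.

```python
def zoom_uvs(uv, center, drunk_amount, cos_time, texel_size):
    """Return the three UV pairs used by the zooming fetches (lines 57-64)."""
    # lines 57-58: 8 texels at full intensity, expressed in UV space
    scale = (drunk_amount * 8.0 * texel_size[0], drunk_amount * 8.0 * texel_size[1])
    # line 59: amplitude oscillates between 0.98 and 1.02
    amplitude = 1.0 + 0.02 * cos_time
    # line 60: scaled vector from the center to the texel
    scaled = tuple(amplitude * (u - c) for u, c in zip(uv, center))
    # line 61: pair #1, equal to lerp(center, uv, amplitude)
    base = tuple(s + c for s, c in zip(scaled, center))
    # line 62: pair #2, pushed one step further away from the center
    add0 = tuple(b + s * k for b, s, k in zip(base, scaled, scale))
    # lines 63-64: pair #3, pushed two steps away
    add1 = tuple(b + 2.0 * s * k for b, s, k in zip(base, scaled, scale))
    return base, add0, add1
```

Note that the extra offsets are proportional to the vector from the center, so the three fetches lie on a line pointing away from the center. That gives a small radial smear rather than a uniform blur, and the texel at the exact center is left untouched.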
All three texture fetches are added together, and the result is stored in the r1 register. It's worth noting that this pixel shader uses a sampler with "clamp" addressing.
Combining all together
So, right now we have the result of the rotation part in the r2 register and the sum of the 3 zooming fetches in the r1 register. Let's look at the final lines of the assembly:
70: mad r2.xyzw, -r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333), r2.xyzw
71: mul r1.xyzw, r1.xyzw, l(0.333333, 0.333333, 0.333333, 0.333333)
72: mul r0.xyzw, r0.wwww, r2.xyzw
73: mad o0.xyzw, cb3[0].yyyy, r0.xyzw, r1.xyzw
74: ret
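As for the additional inputs: r0.w comes from line 7 (it's our intensity mask) and cb3[0].y is the amount of drunk effect. The 0.333333 factor in lines 70-71 turns the sum of the three zooming fetches into their average, which I call zoomingPart below.
Let's find out how it works.
Okay, my first approach was the "brute-force" way: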
float4 finalColor = intensityMask * (rotationPart - zoomingPart);
finalColor = drunkIntensity * finalColor + zoomingPart;
return finalColor;
But what the heck, nobody writes shaders this way.
I took pen & paper and wrote this formula:
finalColor = drunkIntensity * [intensityMask * (rotationPart - zoomingPart)] + zoomingPart
finalColor = drunkIntensity * intensityMask * rotationPart - drunkIntensity * intensityMask * zoomingPart + zoomingPart
- Let t = drunkIntensity * intensityMask
- So we have:
finalColor = t * rotationPart - t * zoomingPart + zoomingPart
finalColor = t * rotationPart + zoomingPart - t * zoomingPart
finalColor = t * rotationPart + (1.0 - t) * zoomingPart
finalColor = lerp( zoomingPart, rotationPart, t )
- Finally:
finalColor = lerp(zoomingPart, rotationPart, intensityMask * drunkIntensity);
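The equivalence of the two forms is easy to confirm numerically. Here is a short Python check; it works on single floats instead of float4 colors, and the function names are mine.

```python
def lerp(a, b, t):
    """HLSL-style lerp: a * (1 - t) + b * t."""
    return a * (1.0 - t) + b * t

def blend_brute_force(rotation_part, zooming_part, mask, drunk_intensity):
    """Direct transcription of assembly lines 70-73."""
    final = mask * (rotation_part - zooming_part)
    return drunk_intensity * final + zooming_part

def blend_lerp(rotation_part, zooming_part, mask, drunk_intensity):
    """The simplified form derived above."""
    return lerp(zooming_part, rotation_part, mask * drunk_intensity)
```

Both functions agree up to floating-point rounding. One detail worth keeping in mind: the mask from line 7 already contains the drunk amount, so the effective blend weight grows with the square of the effect amount.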
Phew! That was quite a detailed post, but we're done ;)
Personally, I learned something while writing this one, and hopefully you did too!
The full HLSL source is here if you are interested. I checked it with my HLSLexplorer and although the compiled result is not a direct 1:1 match with the original shader, the difference is so small (1 line less) that I can safely assume it's working :)
Let me know if you liked it.
Thanks for reading! :)
M.
Great work, man! Do you have any pet-project or something you used this shaders for? I'm interested if there any showreel or demo of them applied to your project if there is any.
Hey! Thank you very much! :)
You can take a look at my DX11 framework which I'm using for personal R&D: https://github.com/astralis3d/DX11Framework
(a bit outdated, but I'll try to post more recent version soon)
Hi, is it possible to reduce graphics requirements for games this way? would be interesting to know.