## Monday, 17 February 2020

### Reverse engineering the rendering of The Witcher 3, part 16 - shooting stars

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".

In "The Witcher 3" there is a subtle yet nice detail in the sky - shooting stars. Interestingly, they don't seem to appear in "Blood & Wine" DLC.

Check out the video to see them in action:

Let's find out how this effect was achieved.

As you can see, the head of a shooting star is much brighter than its tail. This is an important property we will take advantage of later.

The agenda for today is quite usual: I'm going to describe some general properties first, then cover geometry-related topics, and in the end I'll move to the pixel shader, which has most of the fun stuff.

### 1. General

Just a quick overview of what's going on.

The shooting stars are drawn in the forward pass, just after the skydome and the Sky & the Moon:

DrawIndexed(720) - skydome,
DrawIndexed(2160) - a sphere for the Sky / the Moon,
DrawIndexed(36) - irrelevant here, looks like the Sun's occlusion box (?)
DrawIndexed(12) - shooting star
DrawIndexedInstanced(1116, 1) - cirrus clouds

Similarly to the cirrus clouds, each shooting star is drawn twice in a row.

 Before the first draw call

 The result of the first draw call

 The result of the second draw call

Also, as with many forward-pass elements in this game, the following blend state is used:

### 2. Geometry

Geometry-wise, the first thing worth mentioning is that each shooting star is represented by a thin quad with texcoords provided. 4 vertices, 6 indices. It's the simplest quad you can imagine.

 Zoom of a shooting star quad
 Much larger zoom of a shooting star quad. The wireframe line marks the two triangles.

Hey, but there is DrawIndexed(12) there! Does it mean we draw two shooting stars at the same time?

Yes.

In this particular frame one of the shooting stars is completely outside the view frustum.

Let's take a look at the vertex shader assembly now:
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb1[9], immediateIndexed
dcl_constantbuffer cb2[3], immediateIndexed
dcl_constantbuffer cb12[193], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xyzw
dcl_input v2.xy
dcl_input v3.xy
dcl_input v4.xy
dcl_input v5.xyz
dcl_input v6.x
dcl_input v7.x
dcl_output o0.xyzw
dcl_output o1.xyzw
dcl_output o2.xy
dcl_output o3.xyzw
dcl_output_siv o4.xyzw, position
dcl_temps 5
0: mov r0.xyz, v0.xyzx
1: mov r0.w, l(1.000000)
2: dp4 r1.x, r0.xyzw, cb2[0].xyzw
3: dp4 r1.y, r0.xyzw, cb2[1].xyzw
4: dp4 r1.z, r0.xyzw, cb2[2].xyzw
8: dp3 r0.z, r2.xyzx, r2.xyzx
9: rsq r0.z, r0.z
10: mul r2.xyz, r0.zzzz, r2.xyzx
11: dp3 r0.z, v5.xyzx, v5.xyzx
12: rsq r0.z, r0.z
13: mul r3.xyz, r0.zzzz, v5.xyzx
14: mul r4.xyz, r2.xyzx, r3.yzxy
15: mad r2.xyz, r2.zxyz, r3.zxyz, -r4.xyzx
16: dp3 r0.z, r2.xyzx, r2.xyzx
17: rsq r0.z, r0.z
18: mul r2.xyz, r0.zzzz, r2.xyzx
19: mad r0.z, v7.x, v6.x, l(1.000000)
20: mul r3.xyz, r0.zzzz, r3.xyzx
21: mul r3.xyz, r3.xyzx, v3.xxxx
22: mul r2.xyz, r2.xyzx, v3.yyyy
23: mad r0.xzw, r3.xxyz, r0.xxxx, r1.xxyz
24: mad r0.xyz, r2.xyzx, r0.yyyy, r0.xzwx
25: mov r0.w, l(1.000000)
26: dp4 o4.x, r0.xyzw, cb1[0].xyzw
27: dp4 o4.y, r0.xyzw, cb1[1].xyzw
28: dp4 o4.z, r0.xyzw, cb1[2].xyzw
29: dp4 o4.w, r0.xyzw, cb1[3].xyzw
31: dp3 r0.w, r0.xyzx, r0.xyzx
32: sqrt r0.w, r0.w
33: div r0.xyz, r0.xyzx, r0.wwww
35: max r0.w, r0.w, l(0)
36: min r0.w, r0.w, cb12[42].z
37: dp3 r0.x, cb12[38].xyzx, r0.xyzx
38: mul r0.y, abs(r0.x), abs(r0.x)
39: mad_sat r1.x, r0.w, l(0.002000), l(-0.300000)
40: mul r0.y, r0.y, r1.x
41: lt r1.x, l(0), r0.x
42: movc r1.yzw, r1.xxxx, cb12[39].xxyz, cb12[41].xxyz
44: mad r1.yzw, r0.yyyy, r1.yyzw, cb12[40].xxyz
45: movc r2.xyz, r1.xxxx, cb12[45].xyzx, cb12[47].xyzx
47: mad o0.xyz, r0.yyyy, r2.xyzx, cb12[46].xyzx
48: ge r0.y, r0.w, cb12[48].y
49: if_nz r0.y
50:  mad r0.y, r0.z, cb12[22].z, cb12[0].z
51:  mul r0.z, r0.w, r0.z
52:  mul r0.z, r0.z, l(0.062500)
53:  mul r1.x, r0.w, cb12[43].x
54:  mul r1.x, r1.x, l(0.062500)
57:  div_sat r0.x, r0.x, r2.x
59:  mad r0.x, r0.x, r2.x, cb12[43].z
61:  mul r2.x, r0.x, r0.y
62:  mul r0.z, r0.x, r0.z
63:  mad r3.xyzw, r0.zzzz, l(16.000000, 15.000000, 14.000000, 13.000000), r2.xxxx
64:  max r3.xyzw, r3.xyzw, l(0, 0, 0, 0)
65:  add r3.xyzw, r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
66:  div_sat r3.xyzw, r1.xxxx, r3.xyzw
67:  add r3.xyzw, -r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
68:  mul r2.y, r3.y, r3.x
69:  mul r2.y, r3.z, r2.y
70:  mul r2.y, r3.w, r2.y
71:  mad r3.xyzw, r0.zzzz, l(12.000000, 11.000000, 10.000000, 9.000000), r2.xxxx
72:  max r3.xyzw, r3.xyzw, l(0, 0, 0, 0)
73:  add r3.xyzw, r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
74:  div_sat r3.xyzw, r1.xxxx, r3.xyzw
75:  add r3.xyzw, -r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
76:  mul r2.y, r2.y, r3.x
77:  mul r2.y, r3.y, r2.y
78:  mul r2.y, r3.z, r2.y
79:  mul r2.y, r3.w, r2.y
80:  mad r3.xyzw, r0.zzzz, l(8.000000, 7.000000, 6.000000, 5.000000), r2.xxxx
81:  max r3.xyzw, r3.xyzw, l(0, 0, 0, 0)
82:  add r3.xyzw, r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
83:  div_sat r3.xyzw, r1.xxxx, r3.xyzw
84:  add r3.xyzw, -r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
85:  mul r2.y, r2.y, r3.x
86:  mul r2.y, r3.y, r2.y
87:  mul r2.y, r3.z, r2.y
88:  mul r2.y, r3.w, r2.y
89:  mad r2.zw, r0.zzzz, l(0.000000, 0.000000, 4.000000, 3.000000), r2.xxxx
90:  max r2.zw, r2.zzzw, l(0, 0, 0, 0)
91:  add r2.zw, r2.zzzw, l(0.000000, 0.000000, 1.000000, 1.000000)
92:  div_sat r2.zw, r1.xxxx, r2.zzzw
93:  add r2.zw, -r2.zzzw, l(0.000000, 0.000000, 1.000000, 1.000000)
94:  mul r2.y, r2.z, r2.y
95:  mul r2.y, r2.w, r2.y
96:  mad r2.x, r0.z, l(2.000000), r2.x
97:  max r2.x, r2.x, l(0)
99:  div_sat r2.x, r1.x, r2.x
101:  mul r2.x, r2.x, r2.y
102:  mad r0.x, r0.y, r0.x, r0.z
103:  max r0.x, r0.x, l(0)
105:  div_sat r0.x, r1.x, r0.x
107:  mad r0.x, -r2.x, r0.x, l(1.000000)
109:  mul_sat r0.y, r0.y, cb12[48].z
110: else
111:  mov r0.xy, l(1.000000, 0.000000, 0.000000, 0.000000)
112: endif
113: log r0.x, r0.x
114: mul r0.z, r0.x, cb12[42].w
115: exp r0.z, r0.z
116: mul r0.z, r0.z, r0.y
117: mul r0.x, r0.x, cb12[48].x
118: exp r0.x, r0.x
119: mul o0.w, r0.x, r0.y
120: mad_sat r0.xy, r0.zzzz, cb12[189].xzxx, cb12[189].ywyy
122: mad r2.xyz, r0.xxxx, r2.xyzx, r1.yzwy
124: mad r0.x, r0.y, r0.x, l(1.000000)
125: mul_sat r2.w, r0.x, r0.z
126: lt r0.x, l(0), cb12[192].x
127: if_nz r0.x
128:  mad_sat r0.xy, r0.zzzz, cb12[191].xzxx, cb12[191].ywyy
130:  mad r1.xyz, r0.xxxx, r3.xyzx, r1.yzwy
132:  mad r0.x, r0.y, r0.x, l(1.000000)
133:  mul_sat r1.w, r0.x, r0.z
135:  mad o1.xyzw, cb12[192].xxxx, r0.xyzw, r2.xyzw
136: else
137:  mov o1.xyzw, r2.xyzw
138: endif
139: mov o3.xyzw, v1.xyzw
140: mov o2.xy, v4.yxyy
141: ret

One thing that immediately catches the eye is the fog calculation here (lines 30-138). Calculating it on a per-vertex basis makes sense for performance reasons. Also, we don't need much fog precision here: meteoroids are usually above Geralt's head and don't reach the horizon.
Aerial params (rgb = color, a = influence) are saved to o0.xyzw whereas fog ones go to o1.xyzw.

o2.xy (line 140) is just texcoords.
o3.xyzw (line 139) is not relevant.

And now, a few words about calculating the world position. The vertex shader performs billboarding. First of all, the input data for the billboards comes from the vertex buffer, so let's take a look at it.

The first one is Position:

As mentioned earlier, we have 2 quads here: 8 vertices, 12 indices.
But why is the position the same for each vertex? It's quite simple: this is the position of the center of a quad.

Next, each vertex has an offset from the center to the edge of a quad:

This means that every shooting star has a world-space size of (400, 3) units (XY plane; in The Witcher 3 the Z-axis is up).

The last element every vertex has is a world-space unit direction vector which controls the movement of a shooting star:

Since the data come from the CPU side, it's hard to tell how they are calculated.

Now for the billboarding code. The idea is quite simple. First, a unit vector from the quad center to the camera is obtained:
8: dp3 r0.z, r2.xyzx, r2.xyzx
9: rsq r0.z, r0.z
10: mul r2.xyz, r0.zzzz, r2.xyzx

Then, a unit tangent vector which controls the movement of a shooting star is retrieved.
Given that this vector is already normalized on the CPU side, this normalization is redundant.
11: dp3 r0.z, v5.xyzx, v5.xyzx
12: rsq r0.z, r0.z
13: mul r3.xyz, r0.zzzz, v5.xyzx

Having the two vectors, a cross product is used to determine the bitangent vector, which is perpendicular to both input vectors.
14: mul r4.xyz, r2.xyzx, r3.yzxy
15: mad r2.xyz, r2.zxyz, r3.zxyz, -r4.xyzx
16: dp3 r0.z, r2.xyzx, r2.xyzx
17: rsq r0.z, r0.z
18: mul r2.xyz, r0.zzzz, r2.xyzx

Now we have normalized tangent (r3.xyz) and bitangent (r2.xyz) vectors.
Let's introduce Xsize and Ysize, corresponding to the TEXCOORD1 input element, for instance (-200, 1.50).

The final calculation of a world-space position is:
19: mad r0.z, v7.x, v6.x, l(1.000000)
20: mul r3.xyz, r0.zzzz, r3.xyzx
21: mul r3.xyz, r3.xyzx, v3.xxxx
22: mul r2.xyz, r2.xyzx, v3.yyyy
23: mad r0.xzw, r3.xxyz, r0.xxxx, r1.xxyz
24: mad r0.xyz, r2.xyzx, r0.yyyy, r0.xzwx
25: mov r0.w, l(1.000000)

Given that r0.x, r0.y and r0.z are equal to 1.0, the final calculation simplifies to:

worldSpacePosition = quadCenter + tangent * Xsize + bitangent * Ysize
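
To make the billboarding steps easier to follow, here is a minimal CPU-side sketch in Python. It assumes a standard cross product with the operand order (view vector, tangent) described above; the function names and the sample center, camera and direction values are made up for illustration and do not come from the game data.

```python
import math

def normalize(v):
    # Scale a 3D vector to unit length.
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    # Standard right-handed cross product (assumed, see the lead-in).
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def billboard_vertex(quad_center, camera_pos, direction, x_size, y_size):
    # Unit vector from the quad center to the camera.
    to_camera = normalize(tuple(c - q for c, q in zip(camera_pos, quad_center)))
    # Unit tangent: the movement direction of the shooting star.
    tangent = normalize(direction)
    # Unit bitangent: perpendicular to both vectors above.
    bitangent = normalize(cross(to_camera, tangent))
    # worldSpacePosition = quadCenter + tangent * Xsize + bitangent * Ysize
    return tuple(q + t * x_size + b * y_size
                 for q, t, b in zip(quad_center, tangent, bitangent))

# Hypothetical inputs, chosen only to exercise the function.
center = (0.0, 1000.0, 500.0)
camera = (0.0, 0.0, 0.0)
direction = (1.0, 0.0, 0.0)

# Four corners built from the (+-200, +-1.5) offsets mentioned in the text.
corners = [billboard_vertex(center, camera, direction, x, y)
           for x in (-200.0, 200.0) for y in (-1.5, 1.5)]
for corner in corners:
    print(corner)
```

The resulting quad is 400 units long along the tangent and 3 units wide along the bitangent, which matches the size derived from the vertex buffer.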

The last part is just multiplying the world-space position by the view-projection matrix in order to obtain SV_Position:
26: dp4 o4.x, r0.xyzw, cb1[0].xyzw
27: dp4 o4.y, r0.xyzw, cb1[1].xyzw
28: dp4 o4.z, r0.xyzw, cb1[2].xyzw
29: dp4 o4.w, r0.xyzw, cb1[3].xyzw

### 3. Pixel shader

As mentioned in the "General" section, the following blend state is used:

FinalColor = SrcColor * One + DestColor * (1.0 - SrcAlpha) = SrcColor + DestColor * (1.0 - SrcAlpha)

where SrcColor and SrcAlpha are, respectively, the .rgb and .a components output by the pixel shader, whereas DestColor is the .rgb color currently residing in the render target.

The primary factor which controls the transparency here is SrcAlpha. Many of the forward shaders from this game calculate this as an opacity and apply it in the end like so:

return float4( color * opacity, opacity );

The shooting stars shader is not an exception here. Following this pattern, let's consider three cases when opacity is equal to 1.0, 0.1, and 0.0, respectively. But first, the output once again for easier comparison:

a) opacity = 1.0

FinalColor = color * opacity + DestColor * (1.0 - opacity) = color = SrcColor

b) opacity = 0.1

FinalColor = color * opacity + DestColor * (1.0 - opacity) = 0.1 * color + 0.9 * DestColor

c) opacity = 0.0

FinalColor = color * opacity + DestColor * (1.0 - opacity) = DestColor
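
The three cases can also be checked numerically. Below is a small Python sketch of the blend equation, assuming the shader returns float4(color * opacity, opacity) as described above. The function name blend and the sample star and sky colors are hypothetical and are not values captured from the game.

```python
def blend(color, opacity, dest):
    # The pixel shader outputs premultiplied color: SrcColor = color * opacity.
    src_rgb = tuple(c * opacity for c in color)
    # FinalColor = SrcColor * One + DestColor * (1.0 - SrcAlpha)
    return tuple(s + d * (1.0 - opacity) for s, d in zip(src_rgb, dest))

star = (10.0, 10.0, 10.0)  # hypothetical HDR color of a shooting star
sky = (0.02, 0.03, 0.08)   # hypothetical color already in the render target

for opacity in (1.0, 0.1, 0.0):
    print(opacity, blend(star, opacity, sky))
```

With opacity = 1.0 the star color replaces the sky, with opacity = 0.0 the sky is left untouched, and anything in between is a weighted mix of the two.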

The fundamental idea of this shader is modelling and using an opacity function, opacity(x), which controls the opacity of a pixel along a shooting star. The main requirement is that the opacity has to reach its maximum near the end of a star (the "head") and has to smoothly fade out to 0.0 (the "tail").

It will become clear later as we get through the pixel shader assembly:
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[10], immediateIndexed
dcl_constantbuffer cb2[3], immediateIndexed
dcl_constantbuffer cb4[2], immediateIndexed
dcl_input_ps linear v0.xyzw
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v2.y
dcl_input_ps linear v3.w
dcl_output o0.xyzw
dcl_temps 4
0: mov_sat r0.x, v2.y
1: ge r0.y, r0.x, l(0.052579)
2: ge r0.z, l(0.965679), r0.x
3: and r0.y, r0.z, r0.y
4: if_nz r0.y
5:  ge r0.y, l(0.878136), r0.x
7:  mul r1.w, r0.z, l(1.211303)
8:  mov_sat r0.z, r1.w
9:  mad r0.w, r0.z, l(-2.000000), l(3.000000)
10:  mul r0.z, r0.z, r0.z
11:  mul r0.z, r0.z, r0.w
12:  mul r2.x, r0.z, l(0.084642)
13:  mov r1.yz, l(0.000000, 0.000000, 0.084642, 0.000000)
14:  movc r2.yzw, r0.yyyy, r1.yyzw, l(0.000000, 0.000000, 0.000000, 0.500000)
15:  not r0.z, r0.y
16:  if_z r0.y
17:   ge r0.y, l(0.924339), r0.x
19:   mul r1.w, r0.w, l(21.643608)
20:   mov_sat r0.w, r1.w
21:   mad r3.x, r0.w, l(-2.000000), l(3.000000)
22:   mul r0.w, r0.w, r0.w
23:   mul r0.w, r0.w, r3.x
24:   mad r1.x, r0.w, l(0.889658), l(0.084642)
25:   mov r1.yz, l(0.000000, 0.084642, 0.974300, 0.000000)
26:   movc r2.xyzw, r0.yyyy, r1.xyzw, r2.xyzw
27:  else
28:   mov r2.y, l(0)
29:   mov r0.y, l(-1)
30:  endif
31:  not r0.w, r0.y
32:  and r0.z, r0.w, r0.z
33:  if_nz r0.z
34:   ge r0.y, r0.x, l(0.924339)
36:   mul r1.w, r0.x, l(24.189651)
37:   mov_sat r0.x, r1.w
38:   mad r0.z, r0.x, l(-2.000000), l(3.000000)
39:   mul r0.x, r0.x, r0.x
40:   mul r0.x, r0.x, r0.z
41:   mad r1.x, r0.x, l(-0.974300), l(0.974300)
42:   mov r1.yz, l(0.000000, 0.974300, 0.000000, 0.000000)
43:   movc r2.xyzw, r0.yyyy, r1.xyzw, r2.xyzw
44:  endif
45: else
46:  mov r2.yzw, l(0.000000, 0.000000, 0.000000, 0.500000)
47:  mov r0.y, l(0)
48: endif
49: mov_sat r2.w, r2.w
50: mad r0.x, r2.w, l(-2.000000), l(3.000000)
51: mul r0.z, r2.w, r2.w
52: mul r0.x, r0.z, r0.x
54: mad r0.x, r0.x, r0.z, r2.y
55: movc r0.x, r0.y, r2.x, r0.x
56: mad r0.y, cb4[1].x, -cb0[9].w, l(1.000000)
57: mul_sat r0.y, r0.y, v3.w
58: mul r0.x, r0.y, r0.x
59: mul r0.yzw, cb2[2].xxyz, cb4[0].xxxx
60: mul r0.x, r0.x, cb2[2].w
61: dp3 r1.x, l(0.333000, 0.555000, 0.222000, 0.000000), r0.yzwy
62: mad r1.xyz, r1.xxxx, v0.xyzx, -r0.yzwy
63: mad r0.yzw, v0.wwww, r1.xxyz, r0.yyzw
65: mad r0.yzw, v1.wwww, r1.xxyz, r0.yyzw
66: mul o0.xyz, r0.xxxx, r0.yzwy
67: mov o0.w, r0.x
68: ret

Generally, the shader is a bit (over)complicated and I had a tough time understanding what's going on. For instance, where do values like 1.211303, 21.643608 and 24.189651 come from?

If we think about using the opacity function, we need one input. This is quite easy: a texcoord in the [0, 1] range will be helpful here (line 0), so we can apply the function over the whole length of a meteoroid.

The opacity function has 3 segments/ranges and they are defined using 4 control points:
// current status: no idea how these are generated
const float controlPoint0 = 0.052579;
const float controlPoint1 = 0.878136;
const float controlPoint2 = 0.924339;
const float controlPoint3 = 0.965679;

I have no idea how they were estimated/calculated. ( :

As you can see in the assembly, the first condition is just a check whether the input value is in the [controlPoint0 - controlPoint3] range. If not, the opacity is just 0.0.
// Input for the opacity function
float y = saturate(Input.Texcoords.y);  // r0.x

// Value of opacity function.
// 0 - no change
// 1 - full color
float opacity = 0.0;

[branch]
if (y >= controlPoint0 && y <= controlPoint3)
{
...

Deciphering the following assembly snippet is essential if we want to find out how the opacity function works:
7: mul r1.w, r0.z, l(1.211303)
8: mov_sat r0.z, r1.w
9: mad r0.w, r0.z, l(-2.000000), l(3.000000)
10: mul r0.z, r0.z, r0.z
11: mul r0.z, r0.z, r0.w
12: mul r2.x, r0.z, l(0.084642)

In line 9 there are the '-2.0' and '3.0' coefficients, which suggests the use of the smoothstep function. Well, this is a good guess.

HLSL's smoothstep, with the prototype ret smoothstep(min, max, x), always clamps x to the [min - max] range. Assembly-wise, this means subtracting min from the input value (which is r0.z at line 9), but there is nothing like that here. For max, it implies multiplying the input value, but there is nothing like 'mul_sat' here either. Instead, there is 'mov_sat'. This gives a clue that the smoothstep's min and max here are 0 and 1, respectively.
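
As a sanity check of that guess, here is a short Python sketch that replays assembly lines 8-11 one instruction at a time and compares the result with the usual Hermite form of smoothstep. The function names are mine, and the smoothstep body is the common textbook formula rather than anything taken from the game.

```python
def saturate(x):
    # Clamp to [0, 1], like the _sat modifier.
    return min(max(x, 0.0), 1.0)

def asm_lines_8_to_11(r1_w):
    r0_z = saturate(r1_w)      # 8:  mov_sat r0.z, r1.w
    r0_w = r0_z * -2.0 + 3.0   # 9:  mad r0.w, r0.z, l(-2.0), l(3.0)
    r0_z = r0_z * r0_z         # 10: mul r0.z, r0.z, r0.z
    return r0_z * r0_w         # 11: mul r0.z, r0.z, r0.w

def smoothstep(edge0, edge1, x):
    # Textbook smoothstep: clamp the normalized input, then 3t^2 - 2t^3.
    t = saturate((x - edge0) / (edge1 - edge0))
    return t * t * (3.0 - 2.0 * t)

for x in (-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 1.5):
    print(x, asm_lines_8_to_11(x), smoothstep(0.0, 1.0, x))
```

Both columns agree for every input, including values outside [0, 1], which is consistent with min = 0 and max = 1.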

We now know that x should be in [0, 1]. As stated earlier, there are 3 segments in the opacity function. This strongly suggests finding where we are within [segmentStart - segmentEnd].

float linstep(float min, float max, float v)
{
    return ( (v-min) / (max-min) );
}

For example, let's take the first segment: [0.052579 - 0.878136]. The subtraction is at line 6. If we replace the division with a multiplication, we get 1.0 / (0.878136 - 0.052579) = 1.0 / 0.825557 = ~1.211303.
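
The same arithmetic can be repeated for all three segments. The Python sketch below takes the four control points and the three multipliers that appear in the pixel shader assembly (lines 7, 19 and 36) and compares each multiplier with the reciprocal of its segment length. The variable names are mine.

```python
# Control points and per-segment multipliers copied from the assembly.
control_points = [0.052579, 0.878136, 0.924339, 0.965679]
asm_scales = [1.211303, 21.643608, 24.189651]

# 1 / (segmentEnd - segmentStart) for each pair of neighbouring control points.
computed = [1.0 / (right - left)
            for left, right in zip(control_points, control_points[1:])]

for scale, value in zip(asm_scales, computed):
    print(f"assembly: {scale:.6f}   1/(end - start): {value:.6f}")
```

The values agree up to the rounding of the printed constants, so the "mysterious" multipliers are just the division in linstep turned into a multiplication.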

The result of the smoothstep is in the [0, 1] range. The multiplication at line 12 applies the weight of a segment. Each segment has its own weight, which controls the maximum opacity in that particular segment.

This means that for the first segment [0.052579 - 0.878136] the opacity ranges over [0 - 0.084642].

A HLSL function which calculates an opacity for an arbitrary segment can be written like so:
float getOpacityFunctionValue(float x, float cpLeft, float cpRight, float weight)
{
    float val = smoothstep( 0, 1, linstep(cpLeft, cpRight, x) );
    return val * weight;
}

So the whole idea is to just call this function for a proper segment.

A quick look at the weights:
const float weight0 = 0.084642;
const float weight1 = 0.889658;
const float weight2 = 0.974300; // note: weight0+weight1 = weight2

Following the assembly, the opacity(x) function is calculated like so:
float opacity = 0.0;

[branch]
if (y >= controlPoint0 && y <= controlPoint3)
{
    // Range of v: [0, weight0]
    float v = getOpacityFunctionValue(y, controlPoint0, controlPoint1, weight0);
    opacity = v;

    [branch]
    if ( y >= controlPoint1 )
    {
        // Range of v: [0, weight1]
        float v = getOpacityFunctionValue(y, controlPoint1, controlPoint2, weight1);
        opacity = weight0 + v;

        [branch]
        if (y >= controlPoint2)
        {
            // Range of v: [0, weight2]
            float v = getOpacityFunctionValue(y, controlPoint2, controlPoint3, weight2);
            opacity = weight2 - v;
        }
    }
}
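
For experimenting outside of a shader, the HLSL above can be ported to Python almost line by line. This is only a sketch of the reconstructed function: the names CP, W, segment and opacity are mine, and the constants are the control points and weights listed earlier.

```python
CP = (0.052579, 0.878136, 0.924339, 0.965679)  # control points
W = (0.084642, 0.889658, 0.974300)             # segment weights

def saturate(x):
    return min(max(x, 0.0), 1.0)

def linstep(lo, hi, v):
    # Position of v inside [lo, hi], not clamped.
    return (v - lo) / (hi - lo)

def smoothstep01(t):
    # smoothstep(0, 1, t)
    t = saturate(t)
    return t * t * (3.0 - 2.0 * t)

def segment(x, cp_left, cp_right, weight):
    # Port of getOpacityFunctionValue.
    return smoothstep01(linstep(cp_left, cp_right, x)) * weight

def opacity(y):
    y = saturate(y)
    if not (CP[0] <= y <= CP[3]):
        return 0.0
    result = segment(y, CP[0], CP[1], W[0])                     # slow rise (tail)
    if y >= CP[1]:
        result = W[0] + segment(y, CP[1], CP[2], W[1])          # steep rise (head)
        if y >= CP[2]:
            result = W[2] - segment(y, CP[2], CP[3], W[2])      # fade out
    return result

# Print a coarse table of the function.
for i in range(0, 101, 5):
    print(f"{i / 100:.2f}  {opacity(i / 100):.4f}")
```

The function is continuous at every control point: it reaches weight0 at controlPoint1, peaks at weight2 at controlPoint2 and returns to zero at controlPoint3.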

Here is the graph of the opacity function. You can easily notice the boost of the opacity which indicates a 'head' of a shooting star:
 The opacity function graph. red - opacity value green - control points blue - weights

Once the opacity has been calculated, the rest is just final touches. There are extra multiplications: stars opacity, shooting star color and fog influence. As usual in TW3 shaders, you can also find some redundant multiplications by 1.0:
// cb4_v1.x = 1.0
float starsOpacity = 1.0 - cb0_v9.w * cb4_v1.x;
opacity *= starsOpacity;

// Calculate color of a shooting star
// cb4_v0.x = 10.0
// cb2_v2.rgb = (1.0, 1.0, 1.0)
float3 color = cb2_v2.rgb * cb4_v0.x;

// cb2_v2.w = 1
opacity *= cb2_v2.w;

FogResult fr = { Input.FogParams, Input.AerialParams };
color = ApplyFog(fr, color);

return float4( color*opacity, opacity);
}

### 4. Summary

The major difficulty was the opacity function part. Once this is deciphered, all the rest is quite easy to find out.

I mentioned earlier that the pixel shader is a bit overcomplicated. All we really care about is the value of the opacity(x) function, which is stored in r2.x (starting from line 49). However, the opacity function in the assembly produces three extra variables: minRange (r2.y), maxRange (r2.z) and value (r2.w). They are the parameters used to calculate the opacity when opacity(x) is not used:

lerp( minRange, maxRange, smoothstep(0, 1, value) );

In fact, the final opacity value is obtained in the conditional move at line 55. If the input x value is within the [controlPoint0 - controlPoint3] range, the opacity function is used, so r2.x is picked. Otherwise, if x is out of the range, the opacity is calculated from r0.x, ergo, the equation above.

I debugged a few pixels outside of the [controlPoint0 - controlPoint3] range and the final opacity was always equal to zero. That's why I think the extra calculations just described are not necessary. Just setting opacity = 0.0 at the start of the shader works the same.
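
The assembly seems to back this up. In the outer else branch (line 46) the shader writes the literals 0, 0 and 0.5 to r2.y, r2.z and r2.w, so, if I read it correctly, the fallback equation always interpolates between 0 and 0. A tiny Python sketch of that path, with helper names of my own:

```python
def smoothstep01(t):
    # smoothstep(0, 1, t)
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def lerp(a, b, t):
    return a + (b - a) * t

# Line 46: mov r2.yzw, l(0, 0, 0, 0.5)
# -> minRange (r2.y) = 0, maxRange (r2.z) = 0, value (r2.w) = 0.5
min_range, max_range, value = 0.0, 0.0, 0.5

fallback_opacity = lerp(min_range, max_range, smoothstep01(value))
print(fallback_opacity)
```

Since both ends of the lerp are zero, the interpolation factor does not matter and the out-of-range opacity is zero, which matches what I saw while debugging.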

That's all for today :)
And as always, thanks for reading.

#### 1 comment:

1. Excellent summary - fascinating!