poniedziałek, 17 lutego 2020

Reverse engineering the rendering of The Witcher 3, part 16 - shooting stars

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".



In "The Witcher 3" there is a subtle yet nice detail in the sky - shooting stars. Interestingly, they don't seem to appear in "Blood & Wine" DLC.

Check out the video to see them in action:

Let's find out how this effect was achieved.

As you can notice, the head of a shooting star is much brighter than its tail. This is an important property we will take advantage of later.

The agenda for today is quite usual: I'm going to describe some general properties first, then cover geometry-related topics, and in the end I'll move to the pixel shader which has the most of the fun stuff.



1. General

Just a quick overview what's going on.

The shooting stars are drawn in forward pass, just after the skydome and the Sky & the Moon:


DrawIndexed(720) - skydome,
DrawIndexed(2160) - a sphere for the Sky / the Moon,
DrawIndexed(36) - irrelevant here, looks like the Sun's occlusion box (?)
DrawIndexed(12) - shooting star
DrawIndexedInstanced(1116, 1) - cirrus clouds

Similarly as cirrus clouds, each shooting star is being drawn twice in a row.

Before the first draw call

The result of the first draw call

The result of the second draw call

Also, like for many forward-pass elements in this game, the following blend state is being used:




2. Geometry

Geometry-wise, the first thing worth mentioning is that each shooting star is represented by a thin quad with texcoords provided. 4 vertices, 6 indices. It's the simplest quad you can imagine.

Zoom of a shooting star quad
Much larger zoom of a shooting star quad.
You can see wireframe view of line indicating two triangles.

Hey, but there is DrawIndexed(12) there! Does it mean we draw two shooting stars at the same time?

Yes.


In this particular frame one of the shooting stars is completely outside the view frustum.

Let's take a look at the the vertex shader assembly now:
 vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb1[9], immediateIndexed  
    dcl_constantbuffer cb2[3], immediateIndexed  
    dcl_constantbuffer cb12[193], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xyzw  
    dcl_input v2.xy  
    dcl_input v3.xy  
    dcl_input v4.xy  
    dcl_input v5.xyz  
    dcl_input v6.x  
    dcl_input v7.x  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output o2.xy  
    dcl_output o3.xyzw  
    dcl_output_siv o4.xyzw, position  
    dcl_temps 5  
   0: mov r0.xyz, v0.xyzx  
   1: mov r0.w, l(1.000000)  
   2: dp4 r1.x, r0.xyzw, cb2[0].xyzw  
   3: dp4 r1.y, r0.xyzw, cb2[1].xyzw  
   4: dp4 r1.z, r0.xyzw, cb2[2].xyzw  
   5: add r0.x, v2.x, v2.y  
   6: add r0.y, -v2.y, v2.x  
   7: add r2.xyz, -r1.zxyz, cb1[8].zxyz  
   8: dp3 r0.z, r2.xyzx, r2.xyzx  
   9: rsq r0.z, r0.z  
  10: mul r2.xyz, r0.zzzz, r2.xyzx  
  11: dp3 r0.z, v5.xyzx, v5.xyzx  
  12: rsq r0.z, r0.z  
  13: mul r3.xyz, r0.zzzz, v5.xyzx  
  14: mul r4.xyz, r2.xyzx, r3.yzxy  
  15: mad r2.xyz, r2.zxyz, r3.zxyz, -r4.xyzx  
  16: dp3 r0.z, r2.xyzx, r2.xyzx  
  17: rsq r0.z, r0.z  
  18: mul r2.xyz, r0.zzzz, r2.xyzx  
  19: mad r0.z, v7.x, v6.x, l(1.000000)  
  20: mul r3.xyz, r0.zzzz, r3.xyzx  
  21: mul r3.xyz, r3.xyzx, v3.xxxx  
  22: mul r2.xyz, r2.xyzx, v3.yyyy  
  23: mad r0.xzw, r3.xxyz, r0.xxxx, r1.xxyz  
  24: mad r0.xyz, r2.xyzx, r0.yyyy, r0.xzwx  
  25: mov r0.w, l(1.000000)  
  26: dp4 o4.x, r0.xyzw, cb1[0].xyzw  
  27: dp4 o4.y, r0.xyzw, cb1[1].xyzw  
  28: dp4 o4.z, r0.xyzw, cb1[2].xyzw  
  29: dp4 o4.w, r0.xyzw, cb1[3].xyzw  
  30: add r0.xyz, r0.xyzx, -cb12[0].xyzx  
  31: dp3 r0.w, r0.xyzx, r0.xyzx  
  32: sqrt r0.w, r0.w  
  33: div r0.xyz, r0.xyzx, r0.wwww  
  34: add r0.w, r0.w, -cb12[22].z  
  35: max r0.w, r0.w, l(0)  
  36: min r0.w, r0.w, cb12[42].z  
  37: dp3 r0.x, cb12[38].xyzx, r0.xyzx  
  38: mul r0.y, abs(r0.x), abs(r0.x)  
  39: mad_sat r1.x, r0.w, l(0.002000), l(-0.300000)  
  40: mul r0.y, r0.y, r1.x  
  41: lt r1.x, l(0), r0.x  
  42: movc r1.yzw, r1.xxxx, cb12[39].xxyz, cb12[41].xxyz  
  43: add r1.yzw, r1.yyzw, -cb12[40].xxyz  
  44: mad r1.yzw, r0.yyyy, r1.yyzw, cb12[40].xxyz  
  45: movc r2.xyz, r1.xxxx, cb12[45].xyzx, cb12[47].xyzx  
  46: add r2.xyz, r2.xyzx, -cb12[46].xyzx  
  47: mad o0.xyz, r0.yyyy, r2.xyzx, cb12[46].xyzx  
  48: ge r0.y, r0.w, cb12[48].y  
  49: if_nz r0.y  
  50:  mad r0.y, r0.z, cb12[22].z, cb12[0].z  
  51:  mul r0.z, r0.w, r0.z  
  52:  mul r0.z, r0.z, l(0.062500)  
  53:  mul r1.x, r0.w, cb12[43].x  
  54:  mul r1.x, r1.x, l(0.062500)  
  55:  add r0.x, r0.x, cb12[42].x  
  56:  add r2.x, cb12[42].x, l(1.000000)  
  57:  div_sat r0.x, r0.x, r2.x  
  58:  add r2.x, -cb12[43].z, cb12[43].y  
  59:  mad r0.x, r0.x, r2.x, cb12[43].z  
  60:  add r0.y, r0.y, cb12[42].y  
  61:  mul r2.x, r0.x, r0.y  
  62:  mul r0.z, r0.x, r0.z  
  63:  mad r3.xyzw, r0.zzzz, l(16.000000, 15.000000, 14.000000, 13.000000), r2.xxxx  
  64:  max r3.xyzw, r3.xyzw, l(0, 0, 0, 0)  
  65:  add r3.xyzw, r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  66:  div_sat r3.xyzw, r1.xxxx, r3.xyzw  
  67:  add r3.xyzw, -r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  68:  mul r2.y, r3.y, r3.x  
  69:  mul r2.y, r3.z, r2.y  
  70:  mul r2.y, r3.w, r2.y  
  71:  mad r3.xyzw, r0.zzzz, l(12.000000, 11.000000, 10.000000, 9.000000), r2.xxxx  
  72:  max r3.xyzw, r3.xyzw, l(0, 0, 0, 0)  
  73:  add r3.xyzw, r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  74:  div_sat r3.xyzw, r1.xxxx, r3.xyzw  
  75:  add r3.xyzw, -r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  76:  mul r2.y, r2.y, r3.x  
  77:  mul r2.y, r3.y, r2.y  
  78:  mul r2.y, r3.z, r2.y  
  79:  mul r2.y, r3.w, r2.y  
  80:  mad r3.xyzw, r0.zzzz, l(8.000000, 7.000000, 6.000000, 5.000000), r2.xxxx  
  81:  max r3.xyzw, r3.xyzw, l(0, 0, 0, 0)  
  82:  add r3.xyzw, r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  83:  div_sat r3.xyzw, r1.xxxx, r3.xyzw  
  84:  add r3.xyzw, -r3.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)  
  85:  mul r2.y, r2.y, r3.x  
  86:  mul r2.y, r3.y, r2.y  
  87:  mul r2.y, r3.z, r2.y  
  88:  mul r2.y, r3.w, r2.y  
  89:  mad r2.zw, r0.zzzz, l(0.000000, 0.000000, 4.000000, 3.000000), r2.xxxx  
  90:  max r2.zw, r2.zzzw, l(0, 0, 0, 0)  
  91:  add r2.zw, r2.zzzw, l(0.000000, 0.000000, 1.000000, 1.000000)  
  92:  div_sat r2.zw, r1.xxxx, r2.zzzw  
  93:  add r2.zw, -r2.zzzw, l(0.000000, 0.000000, 1.000000, 1.000000)  
  94:  mul r2.y, r2.z, r2.y  
  95:  mul r2.y, r2.w, r2.y  
  96:  mad r2.x, r0.z, l(2.000000), r2.x  
  97:  max r2.x, r2.x, l(0)  
  98:  add r2.x, r2.x, l(1.000000)  
  99:  div_sat r2.x, r1.x, r2.x  
  100:  add r2.x, -r2.x, l(1.000000)  
  101:  mul r2.x, r2.x, r2.y  
  102:  mad r0.x, r0.y, r0.x, r0.z  
  103:  max r0.x, r0.x, l(0)  
  104:  add r0.x, r0.x, l(1.000000)  
  105:  div_sat r0.x, r1.x, r0.x  
  106:  add r0.x, -r0.x, l(1.000000)  
  107:  mad r0.x, -r2.x, r0.x, l(1.000000)  
  108:  add r0.y, r0.w, -cb12[48].y  
  109:  mul_sat r0.y, r0.y, cb12[48].z  
  110: else  
  111:  mov r0.xy, l(1.000000, 0.000000, 0.000000, 0.000000)  
  112: endif  
  113: log r0.x, r0.x  
  114: mul r0.z, r0.x, cb12[42].w  
  115: exp r0.z, r0.z  
  116: mul r0.z, r0.z, r0.y  
  117: mul r0.x, r0.x, cb12[48].x  
  118: exp r0.x, r0.x  
  119: mul o0.w, r0.x, r0.y  
  120: mad_sat r0.xy, r0.zzzz, cb12[189].xzxx, cb12[189].ywyy  
  121: add r2.xyz, -r1.yzwy, cb12[188].xyzx  
  122: mad r2.xyz, r0.xxxx, r2.xyzx, r1.yzwy  
  123: add r0.x, cb12[188].w, l(-1.000000)  
  124: mad r0.x, r0.y, r0.x, l(1.000000)  
  125: mul_sat r2.w, r0.x, r0.z  
  126: lt r0.x, l(0), cb12[192].x  
  127: if_nz r0.x  
  128:  mad_sat r0.xy, r0.zzzz, cb12[191].xzxx, cb12[191].ywyy  
  129:  add r3.xyz, -r1.yzwy, cb12[190].xyzx  
  130:  mad r1.xyz, r0.xxxx, r3.xyzx, r1.yzwy  
  131:  add r0.x, cb12[190].w, l(-1.000000)  
  132:  mad r0.x, r0.y, r0.x, l(1.000000)  
  133:  mul_sat r1.w, r0.x, r0.z  
  134:  add r0.xyzw, -r2.xyzw, r1.xyzw  
  135:  mad o1.xyzw, cb12[192].xxxx, r0.xyzw, r2.xyzw  
  136: else  
  137:  mov o1.xyzw, r2.xyzw  
  138: endif  
  139: mov o3.xyzw, v1.xyzw  
  140: mov o2.xy, v4.yxyy  
  141: ret  

One thing that immediately that can catch an attention is estimating a fog here (lines 30-138). Calculating it on a per-vertex basis makes sense for performance reasons. Also, we don't need such a fog precision - meteroids are usually above Geralt's head and they don't reach horizon.
Aerial params (rgb = color, a = influence) are saved to o0.xyzw whereas fog ones go to o1.xyzw.

o2.xy (line 140) is just texcoords.
o3.xyzw (line 139) is not relevant.

And now, a few words about calculating the world position. The vertex shaders performs billboarding. First of all, input data for billboards come from the vertex buffer - let's take a look at them.

The first one is Position:

As it was mentioned earlier, we have 2 quads here. 8 vertices, 12 indices.
But why the position is the same for each vertex? It's quite simple - this is the position of a center of a quad.

Next, each vertex has an offset from the center to the edge of a quad:

This means that every shooting star has a world space size of (400, 3) units. (XY plane, in The Witcher 3 Z-axis is up)

The last element every vertex has is a world-space direction unit vector which controls a movement of a shooting star:

Since the data come from the CPU side, it's hard to tell how they are calculated.

Now for the billboarding code. The idea is quite simple -first, an unit vector from quad centre to a camera is obtained:
   7: add r2.xyz, -r1.zxyz, cb1[8].zxyz  
   8: dp3 r0.z, r2.xyzx, r2.xyzx  
   9: rsq r0.z, r0.z  
  10: mul r2.xyz, r0.zzzz, r2.xyzx  


Then, an unit tangent vector which controls the movement of a shooting star is retrieved.
Given that this vector is already normalized from CPU side, this normalization is redundant.
  11: dp3 r0.z, v5.xyzx, v5.xyzx  
  12: rsq r0.z, r0.z  
  13: mul r3.xyz, r0.zzzz, v5.xyzx  


Having the two vectors, a cross product is used to determine bitangent vector which is perpendicular to the both input ones.
  14: mul r4.xyz, r2.xyzx, r3.yzxy  
  15: mad r2.xyz, r2.zxyz, r3.zxyz, -r4.xyzx  
  16: dp3 r0.z, r2.xyzx, r2.xyzx  
  17: rsq r0.z, r0.z  
  18: mul r2.xyz, r0.zzzz, r2.xyzx  


Now we have normalized tangent (r3.xyz) and bitangent (r2.xyz) vectors.
Let's introduce Xsize and Ysize, corresponding to TEXCOORD1 input element, so for instance (-200, 1.50).

The final calculation of a world-space position is:
  19: mad r0.z, v7.x, v6.x, l(1.000000)  
  20: mul r3.xyz, r0.zzzz, r3.xyzx  
  21: mul r3.xyz, r3.xyzx, v3.xxxx  
  22: mul r2.xyz, r2.xyzx, v3.yyyy  
  23: mad r0.xzw, r3.xxyz, r0.xxxx, r1.xxyz  
  24: mad r0.xyz, r2.xyzx, r0.yyyy, r0.xzwx  
  25: mov r0.w, l(1.000000)  


Given that r0.x, r0.y and r0.z are equal to 1.0, the final calculation simplifies to:

worldSpacePosition = quadCenter + tangent * Xsize + bitangent * Ysize

The last part is just multiplying the world-space position by a view-projection matrix in order to obtain SV_Position:
  26: dp4 o4.x, r0.xyzw, cb1[0].xyzw  
  27: dp4 o4.y, r0.xyzw, cb1[1].xyzw  
  28: dp4 o4.z, r0.xyzw, cb1[2].xyzw  
  29: dp4 o4.w, r0.xyzw, cb1[3].xyzw  



3. Pixel Shader

As mentioned in the "General" section, the following blend state is being used:

FinalColor = SrcColor * One + DestColor * (1.0 - SrcAlpha) =
FinalColor = SrcColor + DestColor * (1.0 - SrcAlpha)

where SrcColor and SrcAlpha are.respectively .rgb and .a components from the pixel shader whereas DestColor is .rgb color currently residing in the rendertarget.

The primary factor which controls the transparency here is SrcAlpha. Many of the forward shaders from this game calculate this as an opacity and apply it in the end like so:

return float4( color * opacity, opacity );

The shooting stars shader is not an exception here. Following this pattern, let's consider three cases when opacity is equal to 1.0, 0.1, and 0.0, respectively. But first, the output once again for easier comparison:



a) opacity = 1.0

FinalColor = color * opacity + DestColor * (1.0 - opacity) =
FinalColor = color = SrcColor




b) opacity = 0.1

FinalColor = color * opacity + DestColor * (1.0 - opacity) =
FinalColor = 0.1 * color + 0.9 * DestColor




c) opacity = 0.0

FinalColor = color * opacity + DestColor * (1.0 - opacity) = 
FinalColor = DestColor




The fundamental idea of this shader is modelling and using an opacity function opacity(x) which controls an opacity of a pixel along a shooting star. The main requirement is that opacity has to reach maximum values near the end of a star ("head") and has to smoothly fade out to 0.0 ("tail").

It will become clear later as we get through the pixel shader assembly:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb0[10], immediateIndexed  
    dcl_constantbuffer cb2[3], immediateIndexed  
    dcl_constantbuffer cb4[2], immediateIndexed  
    dcl_input_ps linear v0.xyzw  
    dcl_input_ps linear v1.xyzw  
    dcl_input_ps linear v2.y  
    dcl_input_ps linear v3.w  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: mov_sat r0.x, v2.y  
   1: ge r0.y, r0.x, l(0.052579)  
   2: ge r0.z, l(0.965679), r0.x  
   3: and r0.y, r0.z, r0.y  
   4: if_nz r0.y  
   5:  ge r0.y, l(0.878136), r0.x  
   6:  add r0.z, r0.x, l(-0.052579)  
   7:  mul r1.w, r0.z, l(1.211303)  
   8:  mov_sat r0.z, r1.w  
   9:  mad r0.w, r0.z, l(-2.000000), l(3.000000)  
  10:  mul r0.z, r0.z, r0.z  
  11:  mul r0.z, r0.z, r0.w  
  12:  mul r2.x, r0.z, l(0.084642)  
  13:  mov r1.yz, l(0.000000, 0.000000, 0.084642, 0.000000)  
  14:  movc r2.yzw, r0.yyyy, r1.yyzw, l(0.000000, 0.000000, 0.000000, 0.500000)  
  15:  not r0.z, r0.y  
  16:  if_z r0.y  
  17:   ge r0.y, l(0.924339), r0.x  
  18:   add r0.w, r0.x, l(-0.878136)  
  19:   mul r1.w, r0.w, l(21.643608)  
  20:   mov_sat r0.w, r1.w  
  21:   mad r3.x, r0.w, l(-2.000000), l(3.000000)  
  22:   mul r0.w, r0.w, r0.w  
  23:   mul r0.w, r0.w, r3.x  
  24:   mad r1.x, r0.w, l(0.889658), l(0.084642)  
  25:   mov r1.yz, l(0.000000, 0.084642, 0.974300, 0.000000)  
  26:   movc r2.xyzw, r0.yyyy, r1.xyzw, r2.xyzw  
  27:  else  
  28:   mov r2.y, l(0)  
  29:   mov r0.y, l(-1)  
  30:  endif  
  31:  not r0.w, r0.y  
  32:  and r0.z, r0.w, r0.z  
  33:  if_nz r0.z  
  34:   ge r0.y, r0.x, l(0.924339)  
  35:   add r0.x, r0.x, l(-0.924339)  
  36:   mul r1.w, r0.x, l(24.189651)  
  37:   mov_sat r0.x, r1.w  
  38:   mad r0.z, r0.x, l(-2.000000), l(3.000000)  
  39:   mul r0.x, r0.x, r0.x  
  40:   mul r0.x, r0.x, r0.z  
  41:   mad r1.x, r0.x, l(-0.974300), l(0.974300)  
  42:   mov r1.yz, l(0.000000, 0.974300, 0.000000, 0.000000)  
  43:   movc r2.xyzw, r0.yyyy, r1.xyzw, r2.xyzw  
  44:  endif  
  45: else  
  46:  mov r2.yzw, l(0.000000, 0.000000, 0.000000, 0.500000)  
  47:  mov r0.y, l(0)  
  48: endif  
  49: mov_sat r2.w, r2.w  
  50: mad r0.x, r2.w, l(-2.000000), l(3.000000)  
  51: mul r0.z, r2.w, r2.w  
  52: mul r0.x, r0.z, r0.x  
  53: add r0.z, -r2.y, r2.z  
  54: mad r0.x, r0.x, r0.z, r2.y  
  55: movc r0.x, r0.y, r2.x, r0.x  
  56: mad r0.y, cb4[1].x, -cb0[9].w, l(1.000000)  
  57: mul_sat r0.y, r0.y, v3.w  
  58: mul r0.x, r0.y, r0.x  
  59: mul r0.yzw, cb2[2].xxyz, cb4[0].xxxx  
  60: mul r0.x, r0.x, cb2[2].w  
  61: dp3 r1.x, l(0.333000, 0.555000, 0.222000, 0.000000), r0.yzwy  
  62: mad r1.xyz, r1.xxxx, v0.xyzx, -r0.yzwy  
  63: mad r0.yzw, v0.wwww, r1.xxyz, r0.yyzw  
  64: add r1.xyz, -r0.yzwy, v1.xyzx  
  65: mad r0.yzw, v1.wwww, r1.xxyz, r0.yyzw  
  66: mul o0.xyz, r0.xxxx, r0.yzwy  
  67: mov o0.w, r0.x  
  68: ret  

Generally, the shader is a bit (over)complicated and I had some tough time with understanding what's going on. For instance, where do the values like 1.211303, 21.643608, 24.189651 come from?

If we think about using the opacity function, we need one input. This is quite easy - texcoord ranged [0,1] will be helpful here (line 0) so we can apply the function on whole length of a meteroid.

The opacity function has 3 segments/ranges and they are defined using 4 control points:
   // current status: no idea how these are generated  
   const float controlPoint0 = 0.052579;  
   const float controlPoint1 = 0.878136;  
   const float controlPoint2 = 0.924339;  
   const float controlPoint3 = 0.965679;  

I have no idea how they were estimated/calculated. ( :

As you can see in the assembly, the first condition is just a check if the input value is in [controlPoint0 - controlPoint3] range. If not, the opacity is just 0.0.
   // Input for the opacity function
   float y = saturate(Input.Texcoords.y);  // r0.x
     
   // Value of opacity function.  
   // 0 - no change  
   // 1 - full color  
   float opacity = 0.0;  
     
   [branch]   
   if (y >= controlPoint0 && y <= controlPoint3)  
   {  
      ...  

Deciphering the following assembly snippet is essential if we want to find out how the opacity function works:
   6: add r0.z, r0.x, l(-0.052579)   
   7: mul r1.w, r0.z, l(1.211303)   
   8: mov_sat r0.z, r1.w   
   9: mad r0.w, r0.z, l(-2.000000), l(3.000000)   
  10: mul r0.z, r0.z, r0.z   
  11: mul r0.z, r0.z, r0.w   
  12: mul r2.x, r0.z, l(0.084642)   

In line 9 there are '-2.0' and '3.0' coefficents, which suggests the use of smoothstep function. Well, this is a good guess.

HLSL's smoothstep, with the prototype: ret smoothstep(min, max, x) always clamps x to [min-max] range. Assembly-wise, it subtracts min from the input value (which is r0.z at line 9) but there is nothing like that here. For max this implies multiplying the input value, but also, there is nothing like 'mul_sat' here. Instead, there is 'mov_sat'. This gives a clue that the smoothstep's min and max here are 0 and 1, respectively.

We know now that x should be in between [0, 1]. As stated earlier, there are 3 segments in the opacity function. This strongly suggests finding where are we in [segmentStart-segmentEnd].

Linstep function is the answer!
 float linstep(float min, float max, float v)  
 {  
   return ( (v-min) / (max-min) );  
 }  

For example, let's take the first segment: [0.052579 - 0.878136]. Subtraction is at line 6. If we replace the division to multiplication -> 1.0 / (0.878136 - 0.052579) = 1.0 / 0.825557 = ~1.211303.

The result of the smoothstep is in [0, 1] range. The multiplication at line 12 is a weight of a segment. Each segment has its weight which allows to control the maximum opacity in this particular segment.

This means, that for the first segment [0.052579 - 0.878136] its opacity ranges [0 - 0.084642].

A HLSL function which calculates an opacity for an arbitrary segment can be written like so:
 float getOpacityFunctionValue(float x, float cpLeft, float cpRight, float weight)  
 {  
   float val = smoothstep( 0, 1, linstep(cpLeft, cpRight, x) );  
   return val * weight;  
 }  

So the whole idea is to just call this function for a proper segment.

A quick look at the weights:
   const float weight0 = 0.084642;  
   const float weight1 = 0.889658;  
   const float weight2 = 0.974300; // note: weight0+weight1 = weight2  

Following the assembly, the opacity(x) function is calculated like so:
   float opacity = 0.0;

   [branch]   
   if (y >= controlPoint0 && y <= controlPoint3)  
   {  
     // Range of v: [0, weight0]  
     float v = getOpacityFunctionValue(y, controlPoint0, controlPoint1, weight0);  
     opacity = v;  
     
     [branch]  
     if ( y >= controlPoint1 )  
     {  
       // Range of v: [0, weight1]  
       float v = getOpacityFunctionValue(y, controlPoint1, controlPoint2, weight1);  
       opacity = weight0 + v;  
   
       [branch]  
       if (y >= controlPoint2)  
       {  
         // Range of v: [0, weight2]  
         float v = getOpacityFunctionValue(y, controlPoint2, controlPoint3, weight2);
         opacity = weight2 - v;          
       }  
     }  
   }  


Here is the graph of the opacity function. You can easily notice the boost of the opacity which indicates a 'head' of a shooting star:
The opacity function graph.
red - opacity value
green - control points
blue - weights

Once the opacity has been calculated, the rest is just final touches. There are extra multiplications: stars opacity, shooting star color and fog influence. As usual in TW3 shaders, you can also find some redundant multiplications by 1.0:
   // cb4_v1.x = 1.0  
   float starsOpacity = 1.0 - cb0_v9.w * cb4_v1.x;    
   opacity *= starsOpacity;  

   // Calculate color of a shooting star  
   // cb4_v0.x = 10.0
   // cb2_v2.rgb = (1.0, 1.0, 1.0)
   float3 color = cb2_v2.rgb * cb4_v0.x;
     
   // cb2_v2.w = 1  
   opacity *= cb2_v2.w;
     
   FogResult fr = { Input.FogParams, Input.AerialParams };  
   color = ApplyFog(fr, color);
     
   return float4( color*opacity, opacity);  
 }   



4. Summary

The major difficulty was the opacity function part. Once this is deciphered, all the rest is quite easy to find out.

I've mentioned earlier that the pixel shader is a bit overcomplicated. All we really care is just value of opacity(x) function, which is stored in r2.x (starting from line 49). However, the opacity function in the assembly produces three extra variables: minRange (r2.y), maxRange (r2.z) and value (r2.w). They are the parameters used to calculate opacity when opacity(x) is not used :

lerp( minRange, maxRange, smoothstep(0, 1, value) );

In fact, the final opacity value is obtained in the conditional move at line 55 - if input x value is in between [controlPoint0 - controlPoint3] range, this means that the opacity funcion is used, so r2.x is picked. Otherwise, if x is out of the range, opacity calculated from the r0.x, ergo, the equation above.

I debugged a few pixels outside of  [controlPoint0 - controlPoint3] range and the final opacity was always equal to zero. That's why I think the extra calculations just described are not necessary. Just opacity = 0.0 at the start of the shader works the same.

That's all for today :)
And as always, thanks for reading.

1 komentarz: