Thursday, 5 March 2020

Reverse engineering the rendering of The Witcher 3, part 17 - the Milky Way

This post is a part of the series "Reverse engineering the rendering of The Witcher 3".


In the previous post I explained how shooting stars are implemented in The Witcher 3. That effect doesn't appear in "Blood & Wine". In this post I'm going to cover an effect which is exclusive to that DLC: the Milky Way.

Here is a video which shows the Milky Way in action.


And a few screenshots: (1) before the skydome draw call, (2) with the Milky Way color only, (3) after the call:





The final frame with the Milky Way only (no sky color, no stars...) looks like this:


The Milky Way effect, which is one of the major changes compared to the '2015' version of the game, was briefly mentioned in part 12. Let's find out how it's implemented!

The agenda is quite usual: geometry-related stuff will be briefly covered first, pixel shader afterwards.




1. Geometry

Let's start with the skydome mesh being used. There are two major differences between the one from 2015 (base game + Hearts of Stone, I like to refer to it as simply '2015') and the Blood & Wine (2016) one:

a)  the B&W one is much denser,
b)  the B&W one makes use of normal vectors.

Here is the skydome mesh from 2015 - DrawIndexed(720)
The Witcher 3 '2015' skydome mesh - 720 indices

Here is the one from B&W - DrawIndexed(2640):
The Witcher 3 B&W skydome mesh - 2640 indices

Here is the B&W one again, this time with the normals drawn - they point towards 'the centre' of the mesh.
The Witcher 3 B&W skydome mesh with normals




2. Vertex shader

The vertex shader for the skydome is quite simple.
Here is the relevant assembly code. I omitted calculating SV_Position for simplicity's sake:
 vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb1[4], immediateIndexed  
    dcl_constantbuffer cb2[6], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xy  
    dcl_input v2.xyz  
    dcl_output o0.xyzw  
    dcl_output o1.xyzw  
    dcl_output_siv o2.xyzw, position  
    dcl_temps 3  
   0: mov o0.xy, v1.xyxx  
   1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx  
   2: mov r0.w, l(1.000000)  
   3: dp4 o0.z, r0.xyzw, cb2[0].xyzw  
   4: dp4 o0.w, r0.xyzw, cb2[1].xyzw  
   5: mad r1.xyz, v2.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)  
   6: dp3 r2.x, r1.xyzx, cb2[0].xyzx  
   7: dp3 r2.y, r1.xyzx, cb2[1].xyzx  
   8: dp3 r2.z, r1.xyzx, cb2[2].xyzx  
   9: dp3 r1.x, r2.xyzx, r2.xyzx  
  10: rsq r1.x, r1.x  
  11: mul o1.xyz, r1.xxxx, r2.xyzx  
  12: dp4 o1.w, r0.xyzw, cb2[2].xyzw  

The inputs from vertex buffer are:
1) Position in local space [0-1] - v0.xyz,
2) Texcoords - v1.xy,
3) Normal vector [0-1] - v2.xyz

The inputs from cbuffer:
1) World matrix (0-3) - classic approach: uniform scaling and translation by camera position,
2) Scale & Bias for a vertex (4-5) - a trick used throughout the game to rescale from [0-1] local space to [-1;1] one and potentially 'squeeze' meshes.

And a quick explanation of what is going on in the shader:

The shader starts by simply passing the texcoords through (line 0). To get a vertex's world position, scale and bias are applied to the local position (line 1) and the result is multiplied by the world matrix (lines 3-4, 12). The normal vector has to be remapped from [0-1] to the [-1;1] range first (line 5), then it's multiplied by the world matrix (lines 6-8) and finally normalized (lines 9-11).
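For reference, the assembly above could correspond to HLSL along these lines. Treat it as a sketch only: the struct, semantic and constant names are my own assumptions (just the register layout comes from the listing), the world matrix is spelled out as separate rows so that every dot product matches one assembly instruction, and SV_Position is left out just like in the listing.

```hlsl
// Hypothetical names; only the registers (cb2[0..5], v0..v2, o0..o1) come from the assembly.
cbuffer cbSkydome : register(b2)
{
    float4 worldRow0;      // cb2[0]
    float4 worldRow1;      // cb2[1]
    float4 worldRow2;      // cb2[2]
    float4 worldRow3;      // cb2[3], not read by the lines shown above
    float4 positionScale;  // cb2[4]
    float4 positionBias;   // cb2[5]
};

struct SkydomeVertex
{
    float3 position : POSITION;   // v0, stored in [0-1]
    float2 texcoord : TEXCOORD0;  // v1
    float3 normal   : NORMAL;     // v2, stored in [0-1]
};

struct SkydomeVaryings
{
    float4 texcoordAndWorldXY : TEXCOORD0;  // o0
    float4 normalAndWorldZ    : TEXCOORD1;  // o1
};

SkydomeVaryings ComputeSkydomeVaryings(SkydomeVertex input)
{
    SkydomeVaryings output;

    // line 0: pass the texcoords through
    output.texcoordAndWorldXY.xy = input.texcoord;

    // lines 1-2: scale & bias, then extend to a homogeneous position
    float4 localPos = float4(input.position * positionScale.xyz + positionBias.xyz, 1.0);

    // lines 3-4, 12: world-space position, split across both outputs
    output.texcoordAndWorldXY.z = dot(localPos, worldRow0);
    output.texcoordAndWorldXY.w = dot(localPos, worldRow1);
    output.normalAndWorldZ.w    = dot(localPos, worldRow2);

    // line 5: remap the normal from [0-1] to [-1;1]
    float3 localNormal = input.normal * 2.0 - 1.0;

    // lines 6-11: rotate the normal with the upper 3x3 part and normalize it
    float3 worldNormal = float3(dot(localNormal, worldRow0.xyz),
                                dot(localNormal, worldRow1.xyz),
                                dot(localNormal, worldRow2.xyz));
    output.normalAndWorldZ.xyz = normalize(worldNormal);

    return output;
}
```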

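The final output data layout is:
1) o0.xy - texcoords,
2) o0.zw - world position (xy),
3) o1.xyz - normalized normal vector in world space,
4) o1.w - world position (z).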




3. Pixel shader

The Milky Way calculations are just one part of the sky shader. The B&W version is much longer than the 2015 one: 385 lines of assembly, compared to 267 in the 2015 variant.

Let's see the fragment of the assembly responsible for the Milky Way:
  175: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0  
  176: mul r4.xyz, r4.xyzx, r4.xyzx  
  177: sample_indexable(texturecube)(float,float,float,float) r0.w, r2.xyzx, t1.yzwx, s0  
  178: dp3 r1.w, v1.xyzx, v1.xyzx  
  179: rsq r1.w, r1.w  
  180: mul r2.xyz, r1.wwww, v1.xyzx  
  181: dp3 r1.w, cb12[204].yzwy, cb12[204].yzwy  
  182: rsq r1.w, r1.w  
  183: mul r5.xyz, r1.wwww, cb12[204].yzwy  
  184: dp3 r1.w, r2.xyzx, r5.xyzx  
  185: mad_sat r0.w, r0.w, l(0.200000), r1.w  
  186: ge r1.w, l(0.497925), r0.w  
  187: if_nz r1.w  
  188:  ge r1.w, l(0.184939), r0.w  
  189:  mul r2.y, r0.w, l(5.407188)  
  190:  min r2.z, r2.y, l(1.000000)  
  191:  mad r2.w, r2.z, l(-2.000000), l(3.000000)  
  192:  mul r2.z, r2.z, r2.z  
  193:  mul r2.z, r2.z, r2.w  
  194:  mul r5.xyz, r2.zzzz, l(0.949254, 0.949254, 0.949254, 0.000000)  
  195:  mov r2.x, l(0.949254)  
  196:  movc r2.xw, r1.wwww, r2.xxxy, l(0.000000, 0.000000, 0.000000, 0.500000)  
  197:  not r4.w, r1.w  
  198:  if_z r1.w  
  199:   ge r1.w, l(0.239752), r0.w  
  200:   add r5.w, r0.w, l(-0.184939)  
  201:   mul r6.y, r5.w, l(18.243849)  
  202:   mov_sat r5.w, r6.y  
  203:   mad r6.z, r5.w, l(-2.000000), l(3.000000)  
  204:   mul r5.w, r5.w, r5.w  
  205:   mul r5.w, r5.w, r6.z  
  206:   mad r5.w, r5.w, l(-0.113726), l(0.949254)  
  207:   movc r5.xyz, r1.wwww, r5.wwww, r5.zzzz  
  208:   and r7.xyz, r1.wwww, l(0.949254, 0.949254, 0.949254, 0.000000)  
  209:   mov r6.x, l(0.835528)  
  210:   movc r2.xw, r1.wwww, r6.xxxy, r2.xxxw  
  211:   mov r2.xyzw, r2.xxxw  
  212:  else  
  213:   mov r7.xyz, l(0, 0, 0, 0)  
  214:   mov r2.xyzw, r2.xxxw  
  215:   mov r1.w, l(-1)  
  216:  endif  
  217:  not r5.w, r1.w  
  218:  and r4.w, r4.w, r5.w  
  219:  if_nz r4.w  
  220:   ge r5.w, r0.w, l(0.239752)  
  221:   ge r6.x, l(0.294564), r0.w  
  222:   and r1.w, r5.w, r6.x  
  223:   add r5.w, r0.w, l(-0.239752)  
  224:   mul r6.w, r5.w, l(18.244175)  
  225:   mov_sat r5.w, r6.w  
  226:   mad r7.w, r5.w, l(-2.000000), l(3.000000)  
  227:   mul r5.w, r5.w, r5.w  
  228:   mul r5.w, r5.w, r7.w  
  229:   mad r5.w, r5.w, l(0.015873), l(0.835528)  
  230:   movc r5.xyz, r1.wwww, r5.wwww, r5.xyzx  
  231:   movc r7.xyz, r1.wwww, l(0.835528, 0.835528, 0.835528, 0.000000), r7.xyzx  
  232:   mov r6.xyz, l(0.851401, 0.851401, 0.851401, 0.000000)  
  233:   movc r2.xyzw, r1.wwww, r6.xyzw, r2.xyzw  
  234:  endif  
  235:  not r5.w, r1.w  
  236:  and r4.w, r4.w, r5.w  
  237:  if_nz r4.w  
  238:   ge r1.w, r0.w, l(0.294564)  
  239:   add r0.w, r0.w, l(-0.294564)  
  240:   mul r6.w, r0.w, l(4.917364)  
  241:   mov_sat r0.w, r6.w  
  242:   mad r4.w, r0.w, l(-2.000000), l(3.000000)  
  243:   mul r0.w, r0.w, r0.w  
  244:   mul r0.w, r0.w, r4.w  
  245:   mad r0.w, r0.w, l(-0.851401), l(0.851401)  
  246:   movc r5.xyz, r1.wwww, r0.wwww, r5.xyzx  
  247:   movc r7.xyz, r1.wwww, l(0.851401, 0.851401, 0.851401, 0.000000), r7.xyzx  
  248:   mov r6.xyz, l(0, 0, 0, 0)  
  249:   movc r2.xyzw, r1.wwww, r6.xyzw, r2.xyzw  
  250:  endif  
  251: else  
  252:  mov r7.xyz, l(0, 0, 0, 0)  
  253:  mov r2.xyzw, l(0.000000, 0.000000, 0.000000, 0.500000)  
  254:  mov r1.w, l(0)  
  255: endif  
  256: mov_sat r2.w, r2.w  
  257: mad r0.w, r2.w, l(-2.000000), l(3.000000)  
  258: mul r2.w, r2.w, r2.w  
  259: mul r0.w, r0.w, r2.w  
  260: add r2.xyz, -r7.xyzx, r2.xyzx  
  261: mad r2.xyz, r0.wwww, r2.xyzx, r7.xyzx  
  262: movc r2.xyz, r1.wwww, r5.xyzx, r2.xyzx  
  263: mul r2.xyz, r2.xyzx, l(0.150000, 0.200000, 0.250000, 0.000000)  

It's quite intimidating, isn't it? When I saw it for the first time (it was before I saw the shooting stars shader) I thought "what the heck is this?! It's impossible to reverse it!".

But here's the thing - if you have read the shooting stars post, you can easily recognize this pattern. It works very similarly to the meteoroids! More about the curve in a minute.

The snippet starts by sampling the stars cubemap (line 175), where the sampling direction is stored in r2.xyz. As you can see, at line 177 there is a sample instruction for another cubemap. Unlike the 2015 pixel shader, the B&W one has one more texture attached - a 'noise cubemap' - and its faces look similar to this:


Before we get to the curve, let's find the input for it. First, a dot product is calculated (line 184) between the normalized normal vector of the skydome (lines 178-180) and the normalized Moon light vector (lines 181-183) - essentially NdotL.

Here is a visualization of the dot product (linear space):

The value used as input to the "milkyway curve" function is obtained at line 185:

x = saturate( noise * 0.2 + NdotL );
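Put together, lines 177-185 could be written in HLSL roughly like this. The function and parameter names are hypothetical. The only things taken from the assembly are the 0.2 noise scale, the use of a single channel of the noise cubemap, and the registers noted in the comments.

```hlsl
// Sketch of lines 177-185. All identifiers are illustrative assumptions.
float GetMilkyWayCurveInput(
    TextureCube  texNoise,       // t1, the 'noise cubemap'
    SamplerState samplerState,   // s0
    float3       sampleDir,      // r2.xyz, the same direction that is used for the stars cubemap
    float3       skydomeNormal,  // v1.xyz, the normal interpolated from the vertex shader
    float3       moonDir )       // cb12[204].yzw
{
    // line 177: the swizzle on t1 routes the first channel of the texture to r0.w
    float noise = texNoise.Sample(samplerState, sampleDir).x;

    // lines 178-184: both vectors are normalized before the dot product
    float NdotL = dot(normalize(skydomeNormal), normalize(moonDir));

    // line 185: perturb NdotL with the noise and clamp to [0-1]
    return saturate(noise * 0.2 + NdotL);
}
```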

And a visualization of such perturbed NdotL, also in linear space:


Let's move on to the Milky Way function now! It's a bit more complicated than the shooting stars one. As I mentioned in the last post, we start with a list of control points along the x axis. A quick look at the assembly and we have them:
   // Control points (x-axis)  
   float controlPoint0 = 0.0;  
   float controlPoint1 = 0.184939;  
   float controlPoint2 = 0.239752;  
   float controlPoint3 = 0.294564;  
   float controlPoint4 = 0.497925;  

How do we know that the first control point is zero? Well, it's quite simple: nothing is subtracted from x (there is no 'add' instruction) before the multiplication at line 189.

As in the shooting stars post, the control points define a number of segments; now we need to find the weights for them.
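The multipliers at lines 189, 201, 224 and 240 fit the same picture: each one is, to the printed precision, the reciprocal of the distance between two neighbouring control points. The constant names below are mine, the numbers in the comments are the ones from the assembly:

```hlsl
// Inverse segment widths, computed from the control points.
// The value in each comment is the literal that appears in the assembly.
static const float invWidth01 = 1.0 / (0.184939 - 0.0);       // ~5.407188,  line 189
static const float invWidth12 = 1.0 / (0.239752 - 0.184939);  // ~18.243849, line 201
static const float invWidth23 = 1.0 / (0.294564 - 0.239752);  // ~18.244175, line 224
static const float invWidth34 = 1.0 / (0.497925 - 0.294564);  // ~4.917364,  line 240
```

So every segment begins with 'subtract the left control point, multiply by the inverse width, clamp'. For the first segment there is nothing to subtract, and a min with 1.0 (line 190) is enough because x is already non-negative.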

It's quite simple for the first segment. The weight is 0.949254:
  194: mul r5.xyz, r2.zzzz, l(0.949254, 0.949254, 0.949254, 0.000000)   
  195: mov r2.x, l(0.949254)   

Let's try to find them for the second and third segment:
  206:  mad r5.w, r5.w, l(-0.113726), l(0.949254)   
  207:  movc r5.xyz, r1.wwww, r5.wwww, r5.zzzz   
  208:  and r7.xyz, r1.wwww, l(0.949254, 0.949254, 0.949254, 0.000000)   
  209:  mov r6.x, l(0.835528)   
  ...  
  229:  mad r5.w, r5.w, l(0.015873), l(0.835528)   
  230:  movc r5.xyz, r1.wwww, r5.wwww, r5.xyzx   
  231:  movc r7.xyz, r1.wwww, l(0.835528, 0.835528, 0.835528, 0.000000), r7.xyzx   
  232:  mov r6.xyz, l(0.851401, 0.851401, 0.851401, 0.000000)   

This is exactly the point where I paused writing because something was not right here (one of the 'Hmmmm' moments). Look, it's not as simple as multiplying by just one weight. Also, where do values like -0.113726 and 0.015873 come from?

Then I realised these values are just differences between maximum possible values at each segment ( 0.835528 - 0.949254 = -0.113726  and  0.851401 - 0.835528 = 0.015873)! Kinda obvious (one of the 'Eureka' moments).

It turns out that these values are not weights; they are just the y-coordinates of the points which form the curve!
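In other words, each segment is a plain linear interpolation between two y-values, driven by a smoothstep-shaped factor. In my own notation (y_1 and y_2 are the values at the ends of the second segment, t is the position inside the segment remapped to [0, 1]), lines 203-206 compute:

```latex
s(t) = t^{2}\,(3 - 2t), \qquad
y(t) = y_{1} + s(t)\,(y_{2} - y_{1}) = 0.949254 + s(t)\cdot(-0.113726)
```

The third segment (lines 226-229) has the same form with 0.835528 + s(t) * 0.015873, and the last one (lines 242-245) goes from 0.851401 down to zero.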

This changes and simplifies so many things. First of all we can get rid of a weight from the function we used in the previous post:
 float getSmoothTransition(float cpLeft, float cpRight, float x)  
 {  
   return smoothstep( 0, 1, linstep(cpLeft, cpRight, x) );  
 }  
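Note that linstep is not an HLSL intrinsic, so it has to be defined somewhere. A minimal definition that fits how it is used here (an assumption on my side, chosen to mirror the subtract / multiply / clamp sequence seen in the assembly) could be:

```hlsl
// Assumed helper: linearly remaps x from [minValue, maxValue] to [0, 1] and clamps the result.
float linstep(float minValue, float maxValue, float x)
{
    return saturate((x - minValue) / (maxValue - minValue));
}
```

Since the result is already clamped, smoothstep(0, 1, t) boils down to t * t * (3 - 2 * t), which is exactly the mad / mul / mul triple that keeps showing up in the assembly.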

And we can write the milkyway function like so:
 float milkyway_curve( float x )  
 {  
   // Define a set of 2D points which form the curve
   // Of course, you can use a Point2D-like struct here
 
   // Control points (x-axis)
   float controlPoint0 = 0.0;  
   float controlPoint1 = 0.184939;  
   float controlPoint2 = 0.239752;  
   float controlPoint3 = 0.294564;  
   float controlPoint4 = 0.497925;  
     
   // Values at points (y-axis)  
   float value0 = 0.0;  
   float value1 = 0.949254;  
   float value2 = 0.835528;  
   float value3 = 0.851401;  
   float value4 = 0.0;  
     
   float function_value = 0.0;  
     
   [branch] if (x <= controlPoint4)  
   {  
     [branch] if (x <= controlPoint1)  
     {  
       float t = getSmoothTransition(controlPoint0, controlPoint1, x);  
       function_value = lerp(value0, value1, t);  
     }  
       
     [branch] if (x >= controlPoint1 && x <= controlPoint2)  
     {  
       float t = getSmoothTransition(controlPoint1, controlPoint2, x);  
       function_value = lerp(value1, value2, t);  
     }  
       
     [branch] if (x >= controlPoint2 && x <= controlPoint3)  
     {  
       float t = getSmoothTransition(controlPoint2, controlPoint3, x);  
       function_value = lerp(value2, value3, t);  
     }  
       
     [branch] if (x >= controlPoint3)  
     {  
       float t = getSmoothTransition(controlPoint3, controlPoint4, x);  
       function_value = lerp(value3, value4, t);       
     }      
   }  
     
   return function_value;  
 }  

This is a general-purpose solution for any number of points which form a smooth curve. It also explains the origin of the 'weird' control point values - the developers probably used some sort of visual editor to place the points.
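To illustrate the 'any number of points' remark, the same function can be written as a loop over an array of 2D points. This is just a sketch with hypothetical names, not how the game's shader is structured (in the assembly the segments are written out one by one):

```hlsl
// Illustrative sketch, not the game's code: the same curve as a loop over 2D points.
static const int NUM_CURVE_POINTS = 5;

// x = control point, y = value at that point (numbers taken from milkyway_curve)
static const float2 curvePoints[NUM_CURVE_POINTS] =
{
    float2(0.0,      0.0),
    float2(0.184939, 0.949254),
    float2(0.239752, 0.835528),
    float2(0.294564, 0.851401),
    float2(0.497925, 0.0)
};

float evaluate_smooth_curve(float x)
{
    // Outside the range covered by the control points the function stays at zero
    float function_value = 0.0;

    for (int i = 0; i < NUM_CURVE_POINTS - 1; ++i)
    {
        float2 left  = curvePoints[i];
        float2 right = curvePoints[i + 1];

        if (x >= left.x && x <= right.x)
        {
            // Linear remap to [0, 1], then the smoothstep(0, 1, t) polynomial
            float t = saturate((x - left.x) / (right.x - left.x));
            t = t * t * (3.0 - 2.0 * t);
            function_value = lerp(left.y, right.y, t);
        }
    }

    return function_value;
}
```

Adding or moving a point then only means editing the array, which is the kind of data a visual curve editor could export.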

Of course, the same idea applies to the shooting stars code.

Here is the graph of the function:
Milkyway function graph.
Red - value of the function,
Green - x-coords
Blue - y-coords
Yellow dots - control points


Okay, what next? At line 263 we multiply the value from the function by a bluish color:
 263: mul r2.xyz, r2.xyzx, l(0.150000, 0.200000, 0.250000, 0.000000)  

But this is not the end! We just need to perform a gamma correction:
  263: mul r2.xyz, r2.xyzx, l(0.150000, 0.200000, 0.250000, 0.000000)  
  264: mad r2.xyz, r4.xyzx, l(3.000000, 3.000000, 3.000000, 0.000000), r2.xyzx  
  ...  
  269: log r2.xyz, r2.xyzx  
  270: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  271: exp r2.xyz, r2.xyzx  
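The log / mul / exp triple is simply how the compiler expands pow: in shader assembly both log and exp are base-2, so lines 269-271 are equivalent to the function below (the name is mine):

```hlsl
// Equivalent of lines 269-271. For non-negative input this matches pow(color, 2.2).
float3 ApplyGamma22(float3 color)
{
    return exp2(2.2 * log2(color));
}
```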


Fun stuff now: I assigned different colors to the control points on x-axis:
   float3 gradient0 = float3(1, 0, 0);  
   float3 gradient1 = float3(0, 1, 0);  
   float3 gradient2 = float3(0, 0, 1);  
   float3 gradient3 = float3(1, 1, 0);  
   float3 gradient4 = float3(0, 1, 1);  

And this is what I got:


That's pretty much all for the Milky Way.

At line 264 there is r4.xyz, and what it holds is...




4. Toussaint stars (bonus)

I know the title says 'the Milky Way' but I just couldn't resist briefly showing how the Toussaint stars are made. They are much brighter than the ones in Novigrad, Skellige or Velen.

I explained how the '2015' stars are calculated in part 12, time for 2016 ones!

Actually, most of the assembly involved is just:
  175: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0   
  176: mul r4.xyz, r4.xyzx, r4.xyzx   
  ...  
  264: mad r2.xyz, r4.xyzx, l(3.000000, 3.000000, 3.000000, 0.000000), r2.xyzx   
  ...   
  269: log r2.xyz, r2.xyzx   
  270: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)   
  271: exp r2.xyz, r2.xyzx   
  ...  
  302: add r0.z, -cb0[9].w, l(1.000000)  
  303: mul r2.xyz, r0.zzzz, r2.xyzx  
  304: add r2.xyz, r2.xyzx, r2.xyzx

In HLSL it can be written like so:
 // 'sampler' is a reserved word in HLSL, so the sampler state gets its own name here
 float3 stars = texStars.Sample(samplerStars, starsDir).rgb;  
 stars *= stars;  
   
 float3 milkyway = milkyway_curve(noisePerturbed) * float3(0.15, 0.20, 0.25);  
 float3 skyContribution = milkyway + 3.0 * stars;  
   
 // gamma correction  
 skyContribution = pow(skyContribution, 2.2);  
   
 // starsOpacity - 0.0 during the day (so stars and the Milky Way are not visible then), 1.0 during the night  
 float starsOpacity = 1.0 - cb0_v9.w;  
 skyContribution *= starsOpacity;  

 skyContribution *= 2;

So the stars themselves are simply boosted by 3 (line 264), and then the combined stars and Milky Way contribution is boosted by 2 (line 304). It's the old-school way, but it works great!
Of course, there is a bit more happening later (stars blinking using integer noise, etc.) but this is far beyond the scope of this post.



Conclusion

In this post I demystified how the Milky Way and stars are implemented in The Witcher 3: Blood & Wine.

Let's replace the original shader with the code we just figured out. The final frame looks like this:

while with the original shader the frame looks like so:

Not bad.

And as always, thanks for reading.
