This part of the series will be slightly different compared to the previous ones. Today I'd like to show you some aspects of the sky shaders from The Witcher 3.

Why some "stupid tricks" instead of the full shader? Well, there are a few reasons. First of all, the sky shader in The Witcher 3 is quite a complex beast. The pixel shader from the 2015 version has 267 lines of assembly, while the PS from the "Blood & Wine" DLC has 385.

Moreover, these shaders have quite a lot of inputs, which doesn't really help when trying to reverse engineer complete (and readable!) HLSL code.

Therefore, I decided to show you only some tricks from these shaders. If I find anything new, this post will be updated.

The differences between the 2015 version of the game and the B&W (2016) addon are quite notable. These include, for instance, a different calculation of stars and their blinking, and a different approach to rendering the Sun...

The *Blood & Wine* shader also calculates the Milky Way during the night.

I'll start with some basics and switch to stupid tricks later.

### Basics

As in most modern video games, The Witcher 3 uses a skydome to represent the sky. Take a look at the hemisphere used for this in The Witcher 3 (2015). On a side note, in this case the bounding box of this mesh ranges from [0,0,0] to [1,1,1] (Z is the up axis), and the mesh has smoothly distributed UVs. We'll use them later.

The idea behind a skydome is similar to a skybox (the mesh being used is the only difference). In the vertex shader we translate the skydome with respect to the observer (usually by the camera position), which gives the illusion that the sky is really far away - we'll never reach it.

If you have been following the series for a while, you know that The Witcher 3 uses reversed depth: the far plane is represented by 0.0f and the near plane by 1.0f. To make sure that the output of the skydome lies entirely on the far plane, we set *MinDepth* to the same value as *MaxDepth* in the viewport parameters.

To learn how the *MinDepth* and *MaxDepth* fields are used during the viewport transform, see the viewport documentation on docs.microsoft.com.
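For reference, the depth part of the D3D11 viewport transform maps the post-projection depth z to MinDepth + z * (MaxDepth - MinDepth). A minimal C++ sketch of that mapping is below; the struct and function names are hypothetical, and using 0.0f for both fields is my assumption for the reversed-depth far plane (the actual viewport values aren't listed in the text).

```cpp
#include <cassert>

// Hypothetical stand-in for the two depth fields of D3D11_VIEWPORT.
struct ViewportDepth {
    float minDepth;
    float maxDepth;
};

// Depth part of the viewport transform: post-projection z in [0, 1] -> window-space depth.
float viewportDepth(const ViewportDepth& vp, float ndcZ)
{
    return vp.minDepth + ndcZ * (vp.maxDepth - vp.minDepth);
}
```

With MinDepth == MaxDepth the z term vanishes, so every skydome fragment gets the same depth no matter what the vertex shader outputs.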

### Vertex Shader

Let's start with the vertex shader. In The Witcher 3 (2015), the VS assembly is as follows:

```
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb1[4], immediateIndexed
dcl_constantbuffer cb2[6], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xy
dcl_output o0.xy
dcl_output o1.xyz
dcl_output_siv o2.xyzw, position
dcl_temps 2
0: mov o0.xy, v1.xyxx
1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
2: mov r0.w, l(1.000000)
3: dp4 o1.x, r0.xyzw, cb2[0].xyzw
4: dp4 o1.y, r0.xyzw, cb2[1].xyzw
5: dp4 o1.z, r0.xyzw, cb2[2].xyzw
6: mul r1.xyzw, cb1[0].yyyy, cb2[1].xyzw
7: mad r1.xyzw, cb2[0].xyzw, cb1[0].xxxx, r1.xyzw
8: mad r1.xyzw, cb2[2].xyzw, cb1[0].zzzz, r1.xyzw
9: mad r1.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
10: dp4 o2.x, r0.xyzw, r1.xyzw
11: mul r1.xyzw, cb1[1].yyyy, cb2[1].xyzw
12: mad r1.xyzw, cb2[0].xyzw, cb1[1].xxxx, r1.xyzw
13: mad r1.xyzw, cb2[2].xyzw, cb1[1].zzzz, r1.xyzw
14: mad r1.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
15: dp4 o2.y, r0.xyzw, r1.xyzw
16: mul r1.xyzw, cb1[2].yyyy, cb2[1].xyzw
17: mad r1.xyzw, cb2[0].xyzw, cb1[2].xxxx, r1.xyzw
18: mad r1.xyzw, cb2[2].xyzw, cb1[2].zzzz, r1.xyzw
19: mad r1.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
20: dp4 o2.z, r0.xyzw, r1.xyzw
21: mul r1.xyzw, cb1[3].yyyy, cb2[1].xyzw
22: mad r1.xyzw, cb2[0].xyzw, cb1[3].xxxx, r1.xyzw
23: mad r1.xyzw, cb2[2].xyzw, cb1[3].zzzz, r1.xyzw
24: mad r1.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
25: dp4 o2.w, r0.xyzw, r1.xyzw
26: ret
```

In this scenario the VS outputs only texcoords and world-space position. In *Blood & Wine* it also outputs a normalized normal vector. I'll stay with the 2015 version, as it's simpler.

Take a look at the constant buffer marked as **cb2**. Here we have the world matrix (uniform scaling by 100 and translation by camera position). Nothing fancy. cb2_v4 and cb2_v5 are scale/bias factors that transform vertex positions from the [0, 1] range to [-1, 1]. Along the Z axis (up), however, these coefficients 'squeeze' the mesh.

We have already seen a similar VS in previous parts of the series. The general algorithm is to pass texcoords through, calculate *Position* with the scale/bias factors, then calculate *PositionW* in world space, and finally calculate the clip-space position by multiplying the *matWorld* and *matViewProj* matrices together and using their product to transform *Position* into the final SV_Position.

So, the HLSL for this vertex shader would be something like this:

```
struct InputStruct {
float3 param0 : POSITION;
float2 param1 : TEXCOORD;
float3 param2 : NORMAL;
float4 param3 : TANGENT;
};
struct OutputStruct {
float2 param0 : TEXCOORD0;
float3 param1 : TEXCOORD1;
float4 param2 : SV_Position;
};
OutputStruct EditedShaderVS(in InputStruct IN)
{
OutputStruct OUT = (OutputStruct)0;
// Simple texcoords passing
OUT.param0 = IN.param1;
// * Manually construct world and viewProj matrices from float4s:
row_major matrix matWorld = matrix(cb2_v0, cb2_v1, cb2_v2, float4(0,0,0,1) );
matrix matViewProj = matrix(cb1_v0, cb1_v1, cb1_v2, cb1_v3);
// * Some optional fun with worldMatrix
// a) Scale
//matWorld._11 = matWorld._22 = matWorld._33 = 0.225f;
// b) Translate
// X Y Z
//matWorld._14 = 520.0997;
//matWorld._24 = 74.4226;
//matWorld._34 = 113.9;
// Local space - note the scale+bias here!
//float3 meshScale = float3(2.0, 2.0, 2.0);
//float3 meshBias = float3(-1.0, -1.0, -0.4);
float3 meshScale = cb2_v4.xyz;
float3 meshBias = cb2_v5.xyz;
float3 Position = IN.param0 * meshScale + meshBias;
// World space
float4 PositionW = mul(float4(Position, 1.0), transpose(matWorld) );
OUT.param1 = PositionW.xyz;
// Clip space - original approach from The Witcher 3
matrix matWorldViewProj = mul(matViewProj, matWorld);
OUT.param2 = mul( float4(Position, 1.0), transpose(matWorldViewProj) );
return OUT;
}
```
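To check the local scale/bias and world transform outside RenderDoc, here is a minimal CPU-side C++ sketch of asm lines 1-5. The world rows are built the way cb2 is described above (uniform scale on the diagonal, camera position in the fourth column); the function and struct names are hypothetical, and the scale/bias values in the checks are the commented-out ones from the HLSL, not values read from the game.

```cpp
#include <cassert>

struct float3 { float x, y, z; };
struct float4 { float x, y, z, w; };

static float dot4(const float4& a, const float4& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Hypothetical CPU reference for the world-space part of the skydome VS.
float3 skydomeWorldPos(const float3& v, const float3& meshScale, const float3& meshBias,
                       const float3& cameraPos, float worldScale)
{
    // Asm line 1: local position = v * scale + bias; line 2: w = 1.
    const float4 p{ v.x * meshScale.x + meshBias.x,
                    v.y * meshScale.y + meshBias.y,
                    v.z * meshScale.z + meshBias.z, 1.0f };

    // Rows shaped like cb2[0..2]: uniform scale on the diagonal, camera position in .w.
    const float4 row0{ worldScale, 0.0f, 0.0f, cameraPos.x };
    const float4 row1{ 0.0f, worldScale, 0.0f, cameraPos.y };
    const float4 row2{ 0.0f, 0.0f, worldScale, cameraPos.z };

    // Asm lines 3-5: three dp4 instructions produce PositionW.
    return { dot4(p, row0), dot4(p, row1), dot4(p, row2) };
}
```

With scale (2, 2, 2) and bias (-1, -1, -0.4), the centre of the dome's base (0.5, 0.5, 0) ends up 40 units below the camera.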

Comparison of my shader (left) and the original one (right):

The great thing about RenderDoc is that it allows you to inject your own shader in place of the original one, and your changes affect the pipeline until the very end of the frame. As you can see in the HLSL code, I gave you some options to change the scaling and translation of the final geometry. You can play with them and achieve some funny results:

[Screenshot: Hail to the skydome!]

#### Optimizing the vertex shader

Do you see a problem with the original vertex shader? Per-vertex matrix-matrix multiplication is completely redundant! I found it in at least a few vertex shaders (for instance, in the distant rain shafts). We can optimize it by multiplying *PositionW* by *matViewProj* directly! So, we can replace this HLSL code:

```
// Clip space - original approach from The Witcher 3
matrix matWorldViewProj = mul(matViewProj, matWorld);
OUT.param2 = mul( float4(Position, 1.0), transpose(matWorldViewProj) );
```

with this one:

```
// Clip space - optimized version
OUT.param2 = mul( matViewProj, PositionW );
```

An optimized version produces the following assembly:

```
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB1[4], immediateIndexed
dcl_constantbuffer CB2[6], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xy
dcl_output o0.xy
dcl_output o1.xyz
dcl_output_siv o2.xyzw, position
dcl_temps 2
0: mov o0.xy, v1.xyxx
1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
2: mov r0.w, l(1.000000)
3: dp4 r1.x, r0.xyzw, cb2[0].xyzw
4: dp4 r1.y, r0.xyzw, cb2[1].xyzw
5: dp4 r1.z, r0.xyzw, cb2[2].xyzw
6: mov o1.xyz, r1.xyzx
7: mov r1.w, l(1.000000)
8: dp4 o2.x, cb1[0].xyzw, r1.xyzw
9: dp4 o2.y, cb1[1].xyzw, r1.xyzw
10: dp4 o2.z, cb1[2].xyzw, r1.xyzw
11: dp4 o2.w, cb1[3].xyzw, r1.xyzw
12: ret
```

As you can see, we reduced the number of instructions from 26 to 12 - that's quite a change. I don't know how widespread this problem is in the game, but c'mon CD Projekt Red, maybe a patch or something? :)

I'm not kidding here. You can inject my optimized shader in place of the original one in RenderDoc and see for yourself that this optimization changes nothing visually. Honestly, I don't know why CD Projekt Red decided to do per-vertex matrix-matrix multiplication...
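Why the two versions match: matrix multiplication is associative, so (matViewProj * matWorld) * p equals matViewProj * (matWorld * p) up to floating-point rounding. The original order pays for a full 4x4 by 4x4 product (64 multiplies) plus one matrix-vector product (16) per vertex, while the optimized order needs two matrix-vector products (32). A small C++ sketch comparing both orders; the helper names are hypothetical, and it uses the column-vector convention (m[row][col] * v):

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>; // m[row][col]

// Matrix * column vector: 16 multiplies.
Vec4 mulMV(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// Matrix * matrix: 64 multiplies, the per-vertex work the original VS pays for.
Mat4 mulMM(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Hypothetical helper: uniform scale s plus translation (tx, ty, tz), shaped like cb2.
Mat4 scaleTranslate(float s, float tx, float ty, float tz)
{
    return Mat4{ { Vec4{ s, 0.0f, 0.0f, tx },
                   Vec4{ 0.0f, s, 0.0f, ty },
                   Vec4{ 0.0f, 0.0f, s, tz },
                   Vec4{ 0.0f, 0.0f, 0.0f, 1.0f } } };
}

// Original order: (ViewProj * World) * p.
Vec4 clipOriginal(const Mat4& viewProj, const Mat4& world, const Vec4& p)
{
    return mulMV(mulMM(viewProj, world), p);
}

// Optimized order: ViewProj * (World * p).
Vec4 clipOptimized(const Mat4& viewProj, const Mat4& world, const Vec4& p)
{
    return mulMV(viewProj, mulMV(world, p));
}
```

In general the two orders may differ in the last bits because of rounding, which is far below anything visible on screen.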

### The Sun

In The Witcher 3 (2015), atmospheric scattering and the Sun are rendered in two separate draw calls:

[Screenshots: The Witcher 3 (2015) - before; with sky; with sky + the Sun]

Rendering the Sun in the 2015 version is pretty similar to rendering the Moon, in terms of both geometry and blend/depth states.

On the other hand, in *Blood & Wine* the sky and the Sun are rendered in one pass:

[Screenshots: The Witcher 3: Blood & Wine (2016) - before sky; with sky and the Sun]

No matter how you want to render the Sun, at some point you will need the (normalized) direction of sunlight. The most intuitive way to obtain this vector is to use spherical coordinates. Basically, you need only two values representing two angles (in radians!):

*phi* and *theta*. Once you have them, you can assume *r = 1*, since we only care about the direction, so the radius drops out. For a Z-up Cartesian coordinate system (with *theta* measured from the up axis) we can write HLSL code like this:

```
float3 vSunDir;
vSunDir.x = sin(fTheta)*cos(fPhi);
vSunDir.y = sin(fTheta)*sin(fPhi);
vSunDir.z = cos(fTheta);
vSunDir = normalize(vSunDir);
```
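A C++ version of the same formula makes it easy to check that the result is already unit length for r = 1, so the final normalize in the HLSL is only a safety net. The function names are hypothetical, and theta is assumed to be measured from the +Z (up) axis, matching vSunDir.z = cos(fTheta) above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Spherical -> Cartesian with r = 1, Z-up (theta = 0 points straight up).
Vec3 sunDirection(float theta, float phi)
{
    return { std::sin(theta) * std::cos(phi),
             std::sin(theta) * std::sin(phi),
             std::cos(theta) };
}

float lengthOf(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}
```

Since sin^2 + cos^2 = 1 holds for both angles, the length is 1 for any (theta, phi), up to rounding.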

Normally you calculate the sunlight direction in your application and then pass it to a constant buffer for further use.

Once we have the sunlight direction, we can dive into the assembly of the pixel shader from *Blood & Wine*:

```
...
100: add r1.xyw, -r0.xyxz, cb12[0].xyxz
101: dp3 r2.x, r1.xywx, r1.xywx
102: rsq r2.x, r2.x
103: mul r1.xyw, r1.xyxw, r2.xxxx
104: mov_sat r2.xy, cb12[205].yxyy
105: dp3 r2.z, -r1.xywx, -r1.xywx
106: rsq r2.z, r2.z
107: mul r1.xyw, -r1.xyxw, r2.zzzz
...
```

Okay. To start, *cb12[0].xyz* is the camera position, while *r0.xyz* stores the vertex position (an output from the vertex shader). Therefore, line 100 calculates the *worldToCamera* vector. But take a look at lines 105-107. We could write them as *normalize(-worldToCamera)*, which means we calculate the normalized *cameraToWorld* vector.

```
120: dp3_sat r1.x, cb12[203].yzwy, r1.xywx
```

Then we calculate the dot product between the *cameraToWorld* and *sunDirection* vectors! Remember that they have to be normalized. We also saturate the whole expression to clamp it to the [0, 1] range.

Cool! We have this dot product in r1.x. Let's find its next use...

```
152: log r1.x, r1.x
153: mul r1.x, r1.x, cb12[203].x
154: exp r1.x, r1.x
155: mul r1.x, r2.y, r1.x
```

The "log, mul, exp" triple is, simply speaking, exponentiation: the shader's log and exp are base 2, so exp(e * log(x)) = x^e. As you can see, we raise our cosine (the dot product of normalized vectors) to some power. You may ask, why? This way we can produce a gradient which mimics our Sun. (Line 155 affects the opacity of this gradient, so you can, for instance, set it to zero to completely hide the Sun.) See some examples:

[Screenshots: the Sun gradient with exponent = 54; exponent = 2400]

Having this gradient, we use it to interpolate between *skyColor* and *sunColor*! **To make sure there are no artifacts, we had to saturate in line 120** (a negative cosine would make the log in line 152 return NaN).

Please note that this trick can also be used to mimic the corona phenomenon around the Moon (with lower values of the exponent). For this you will need the *moonDirection* vector, which can be easily calculated with spherical coordinates.

The final HLSL can look similar to the following snippet:

```
float3 vCamToWorld = normalize( PosW - CameraPos );
float cosTheta = saturate( dot(vSunDir, vCamToWorld) );
float sunGradient = pow( cosTheta, sunExponent );
float3 color = lerp( skyColor, sunColor, sunGradient );
```
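The same gradient as a CPU-side C++ sketch, written with base-2 log/exp like asm lines 152-154 so you can see that it matches pow(). The function names are hypothetical; the checks show that a higher exponent shrinks the bright spot (compare exponent = 2 and exponent = 2400 for the same direction).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical CPU reference of the Sun gradient, mirroring asm lines 120 and 152-154.
float sunGradient(const Vec3& sunDir, const Vec3& camToWorld, float sunExponent)
{
    // dp3_sat (line 120): both vectors are assumed to be normalized.
    const float cosTheta = std::min(std::max(dot(sunDir, camToWorld), 0.0f), 1.0f);
    // log, mul, exp (lines 152-154): base-2 log/exp give pow(cosTheta, sunExponent).
    // log2(0) is -inf and exp2(-inf) is 0, so directions facing away from the Sun give 0.
    return std::exp2(std::log2(cosTheta) * sunExponent);
}

// HLSL-style lerp for one color channel: a + (b - a) * t.
float lerpScalar(float a, float b, float t) { return a + (b - a) * t; }
```

The final color is then lerpScalar(sky, sun, gradient) per channel, exactly as the lerp in the HLSL snippet.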

### Moving stars

If you made a timelapse of a clear night sky in The Witcher 3, you would notice that the stars are not static - they slowly move across the sky over time! I noticed this quite by accident and wanted to see how it was done.

Let's start with the fact that the stars in The Witcher 3 are stored in a 1024x1024x6 cubemap. If you think about it, this is a very handy solution, as a cubemap is sampled directly with a direction vector.

Consider the following piece of assembly:

```
159: add r1.xyz, -v1.xyzx, cb1[8].xyzx
160: dp3 r0.w, r1.xyzx, r1.xyzx
161: rsq r0.w, r0.w
162: mul r1.xyz, r0.wwww, r1.xyzx
163: mul r2.xyz, cb12[204].zwyz, l(0.000000, 0.000000, 1.000000, 0.000000)
164: mad r2.xyz, cb12[204].yzwy, l(0.000000, 1.000000, 0.000000, 0.000000), -r2.xyzx
165: mul r4.xyz, r2.xyzx, cb12[204].zwyz
166: mad r4.xyz, r2.zxyz, cb12[204].wyzw, -r4.xyzx
167: dp3 r4.x, r1.xyzx, r4.xyzx
168: dp2 r4.y, r1.xyxx, r2.yzyy
169: dp3 r4.z, r1.xyzx, cb12[204].yzwy
170: dp3 r0.w, r4.xyzx, r4.xyzx
171: rsq r0.w, r0.w
172: mul r2.xyz, r0.wwww, r4.xyzx
173: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0
```

To calculate the final sampling vector (line 173), we start by calculating the normalized *worldToCamera* vector (lines 159-162).

Then we calculate 2 cross products (163-164, 165-166) with *moonDirection*, and later perform 3 dot products to get the final sampling vector. HLSL:

```
float3 vWorldToCamera = normalize( g_CameraPos.xyz - Input.PositionW.xyz );
float3 vMoonDirection = cb12_v204.yzw;
float3 vStarsSamplingDir = cross( vMoonDirection, float3(0, 0, 1) );
float3 vStarsSamplingDir2 = cross( vStarsSamplingDir, vMoonDirection );
float dirX = dot( vWorldToCamera, vStarsSamplingDir2 );
float dirY = dot( vWorldToCamera, vStarsSamplingDir );
float dirZ = dot( vWorldToCamera, vMoonDirection);
float3 dirXYZ = normalize( float3(dirX, dirY, dirZ) );
float3 starsColor = texNightStars.Sample( samplerAnisoWrap, dirXYZ ).rgb;
```
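My tentative reading of this construction (an interpretation, not something confirmed by the game's code): *vStarsSamplingDir* is perpendicular to both the moon direction and +Z, and *vStarsSamplingDir2* is perpendicular to both of those, so together with *vMoonDirection* they form an orthogonal frame attached to the Moon. The three dot products express the view direction in that frame, so as the moon direction changes during the night, the frame rotates and the star field rotates with it. Two caveats follow from the math: the two cross-product axes have length sin(angle between the moon direction and +Z) rather than 1, and the frame degenerates when the Moon is exactly at the zenith. A C++ sketch with hypothetical names:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 normalize(const Vec3& v)
{
    const float inv = 1.0f / std::sqrt(dot(v, v));
    return { v.x * inv, v.y * inv, v.z * inv };
}

// Re-expresses the view direction in a frame built around the moon direction.
Vec3 starsSamplingDir(const Vec3& worldToCamera, const Vec3& moonDir)
{
    const Vec3 axisY = cross(moonDir, Vec3{ 0.0f, 0.0f, 1.0f }); // lines 163-164
    const Vec3 axisX = cross(axisY, moonDir);                     // lines 165-166
    return normalize(Vec3{ dot(worldToCamera, axisX),             // line 167
                           dot(worldToCamera, axisY),             // line 168
                           dot(worldToCamera, moonDir) });        // line 169, then 170-172
}
```

Looking along the moon direction always samples the cubemap's +Z, wherever the Moon currently is; that's what makes the stars follow it.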

Note to self: This is really well thought out, and I definitely have to investigate it in more detail.

Note to readers: If you know more about this operation, let me know!

### Blinking stars

Another nice trick I wanted to investigate in more detail is the blinking of stars. If you walk around, let's say, the outskirts of Novigrad City on a clear night, you can notice that the stars are blinking.

I was curious how this was implemented. The difference between the 2015 version and *Blood & Wine* is quite big here; for simplicity, I'll stay with the 2015 version.

So we start just after sampling *starsColor* from the previous section:

```
174: mul r0.w, v0.x, l(100.000000)
175: round_ni r1.w, r0.w
176: mad r2.w, v0.y, l(50.000000), cb0[0].x
177: round_ni r4.w, r2.w
178: bfrev r4.w, r4.w
179: iadd r5.x, r1.w, r4.w
180: ishr r5.y, r5.x, l(13)
181: xor r5.x, r5.x, r5.y
182: imul null, r5.y, r5.x, r5.x
183: imad r5.y, r5.y, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
184: imad r5.x, r5.x, r5.y, l(146956042240.000000)
185: and r5.x, r5.x, l(0x7fffffff)
186: itof r5.x, r5.x
187: mad r5.y, v0.x, l(100.000000), l(-1.000000)
188: round_ni r5.y, r5.y
189: iadd r4.w, r4.w, r5.y
190: ishr r5.z, r4.w, l(13)
191: xor r4.w, r4.w, r5.z
192: imul null, r5.z, r4.w, r4.w
193: imad r5.z, r5.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
194: imad r4.w, r4.w, r5.z, l(146956042240.000000)
195: and r4.w, r4.w, l(0x7fffffff)
196: itof r4.w, r4.w
197: add r5.z, r2.w, l(-1.000000)
198: round_ni r5.z, r5.z
199: bfrev r5.z, r5.z
200: iadd r1.w, r1.w, r5.z
201: ishr r5.w, r1.w, l(13)
202: xor r1.w, r1.w, r5.w
203: imul null, r5.w, r1.w, r1.w
204: imad r5.w, r5.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
205: imad r1.w, r1.w, r5.w, l(146956042240.000000)
206: and r1.w, r1.w, l(0x7fffffff)
207: itof r1.w, r1.w
208: mul r1.w, r1.w, l(0.000000001)
209: iadd r5.y, r5.z, r5.y
210: ishr r5.z, r5.y, l(13)
211: xor r5.y, r5.y, r5.z
212: imul null, r5.z, r5.y, r5.y
213: imad r5.z, r5.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
214: imad r5.y, r5.y, r5.z, l(146956042240.000000)
215: and r5.y, r5.y, l(0x7fffffff)
216: itof r5.y, r5.y
217: frc r0.w, r0.w
218: add r0.w, -r0.w, l(1.000000)
219: mul r5.z, r0.w, r0.w
220: mul r0.w, r0.w, r5.z
221: mul r5.xz, r5.xxzx, l(0.000000001, 0.000000, 3.000000, 0.000000)
222: mad r0.w, r0.w, l(-2.000000), r5.z
223: frc r2.w, r2.w
224: add r2.w, -r2.w, l(1.000000)
225: mul r5.z, r2.w, r2.w
226: mul r2.w, r2.w, r5.z
227: mul r5.z, r5.z, l(3.000000)
228: mad r2.w, r2.w, l(-2.000000), r5.z
229: mad r4.w, r4.w, l(0.000000001), -r5.x
230: mad r4.w, r0.w, r4.w, r5.x
231: mad r5.x, r5.y, l(0.000000001), -r1.w
232: mad r0.w, r0.w, r5.x, r1.w
233: add r0.w, -r4.w, r0.w
234: mad r0.w, r2.w, r0.w, r4.w
235: mad r2.xyz, r0.wwww, l(0.000500, 0.000500, 0.000500, 0.000000), r2.xyzx
236: sample_indexable(texturecube)(float,float,float,float) r2.xyz, r2.xyzx, t0.xyzw, s0
237: log r4.xyz, r4.xyzx
238: mul r4.xyz, r4.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
239: exp r4.xyz, r4.xyzx
240: log r2.xyz, r2.xyzx
241: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
242: exp r2.xyz, r2.xyzx
243: mul r2.xyz, r2.xyzx, r4.xyzx
```

Huh. Let's take a look at the very end of this quite big piece of assembly.

Once we have sampled *starsColor* in line 173, we calculate some *offset* value. This *offset* is used to perturb the first sampling direction (r2.xyz, line 235). Then we sample the stars cubemap again, perform gamma correction on these two values (237-242) and multiply them together (243).

Simple, isn't it? Well, not really. Think about this *offset* for a while. It must be different across the whole skydome - stars all blinking the same way would look very unrealistic.

To make sure that the *offset* will be as diverse as possible, we take advantage of the UVs wrapped across the skydome (v0.xy) and the elapsed time from the constant buffer (cb0[0].x). If you are unfamiliar with this intimidating ishr/xor/and thing, take a look at the post about the lightning effect to learn more about integer noise.

As you can see, integer noise is called here four times, but it's different from the lightning version. To make the results even more random, the input integer for the noise is a sum (*iadd*), and bit reversal is performed (the reversebits intrinsic; bfrev instruction).

Okay, easy now. Let's start from the beginning.
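The disassembler prints the integer literals in lines 183-184 as floats, so here is how I would read them when porting one noise call to C++ for experiments. 0x0000ec4d is 60493, and the float 146956042240.0 has the bit pattern 0x5208DD0D, i.e. 1376312589. The middle addend printed as 0.000...0001 cannot be recovered from the listing: 19990303 below is an assumption, not a value read from the game, so check it against the raw bytecode before relying on it. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU port of one integer-noise call (asm lines 180-186),
// including the 1e-9 scale the shader applies afterwards (e.g. line 208).
float integerNoise(int32_t n)
{
    // ishr + xor (lines 180-181). Right-shifting a negative int32_t is
    // arithmetic on mainstream compilers (guaranteed since C++20), like ishr.
    n = (n >> 13) ^ n;

    // Do the multiply-adds in uint32_t so overflow wraps like GPU imul/imad.
    const uint32_t u = static_cast<uint32_t>(n);
    uint32_t nn = u * u;          // imul (line 182)
    nn = nn * 60493u + 19990303u; // imad (line 183): 0xec4d; addend is an ASSUMPTION
    nn = u * nn + 1376312589u;    // imad (line 184): 0x5208DD0D
    nn &= 0x7fffffffu;            // and (line 185): keep it non-negative

    // itof (line 186), then the 1e-9 scale: the result lies in [0, ~2.147].
    return static_cast<float>(static_cast<int32_t>(nn)) * 0.000000001f;
}
```

For n = 0 the uncertain addend is multiplied by zero, so integerNoise(0) is about 1.376 regardless of its exact value.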

We have 4 "iterations" of integer noise. I analyzed the assembly, and the calculation of all 4 iterations looks like this:

```
// * Inputs - UV and elapsed time in seconds
float2 starsUV;
starsUV.x = 100.0 * Input.TextureUV.x;
starsUV.y = 50.0 * Input.TextureUV.y + g_fTime;
// * Iteration 1
int iStars1_A = reversebits( asint( floor(starsUV.y) ) );
int iStars1_B = asint( floor(starsUV.x) );
float fStarsNoise1 = integerNoise( iStars1_A + iStars1_B );
// * Iteration 2
int iStars2_A = reversebits( asint( floor(starsUV.y) ) );
int iStars2_B = asint( floor( starsUV.x - 1.0 ) );
float fStarsNoise2 = integerNoise( iStars2_A + iStars2_B );
// * Iteration 3
int iStars3_A = reversebits( asint( floor( starsUV.y - 1.0 ) ) );
int iStars3_B = asint( floor(starsUV.x) );
float fStarsNoise3 = integerNoise( iStars3_A + iStars3_B );
// * Iteration 4
int iStars4_A = reversebits( asint( floor( starsUV.y - 1.0 ) ) );
int iStars4_B = asint( floor( starsUV.x - 1.0 ) );
float fStarsNoise4 = integerNoise( iStars4_A + iStars4_B );
```
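Note that the assembly never converts the floor() results to integers: round_ni produces a float, and bfrev/iadd then operate on its raw bit pattern (there is no ftoi). That is why the HLSL above uses asint(). A C++ sketch of the seed for one iteration, with my own implementations standing in for the HLSL intrinsics (asint via memcpy, reversebits via a loop); starsSeed is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// asint(): reinterpret the bits of a float as an int (no numeric conversion).
int32_t asint(float f)
{
    int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// reversebits(): bit 0 <-> bit 31, bit 1 <-> bit 30, ... (bfrev in the asm).
int32_t reversebits(int32_t v)
{
    uint32_t x = static_cast<uint32_t>(v), r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return static_cast<int32_t>(r);
}

// Seed for one noise iteration: float bit patterns added as integers (iadd).
int32_t starsSeed(float starsU, float starsV)
{
    const uint32_t a = static_cast<uint32_t>(reversebits(asint(std::floor(starsV))));
    const uint32_t b = static_cast<uint32_t>(asint(std::floor(starsU)));
    return static_cast<int32_t>(a + b); // unsigned add wraps like GPU iadd
}
```

For example, starsSeed(1.5f, 0.0f) is 0x3f800000: the bit pattern of 1.0f, not the integer 1.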

The final outputs of all 4 iterations are (follow the *itof* instructions to find them):

- Iteration 1 - r5.x,
- Iteration 2 - r4.w,
- Iteration 3 - r1.w,
- Iteration 4 - r5.y.

After the last *itof* (line 216) we have:

```
217: frc r0.w, r0.w
218: add r0.w, -r0.w, l(1.000000)
219: mul r5.z, r0.w, r0.w
220: mul r0.w, r0.w, r5.z
221: mul r5.xz, r5.xxzx, l(0.000000001, 0.000000, 3.000000, 0.000000)
222: mad r0.w, r0.w, l(-2.000000), r5.z
223: frc r2.w, r2.w
224: add r2.w, -r2.w, l(1.000000)
225: mul r5.z, r2.w, r2.w
226: mul r2.w, r2.w, r5.z
227: mul r5.z, r5.z, l(3.000000)
228: mad r2.w, r2.w, l(-2.000000), r5.z
```

These lines calculate S-curve weights from the fractional parts of the UVs, just like in the lightning effect. So:

```
float s_curve( float x )
{
float x2 = x * x;
float x3 = x2 * x;
// -2x^3 + 3x^2
return -2.0*x3 + 3.0*x2;
}
...
// lines 217-222
float weightX = 1.0 - frac( starsUV.x );
weightX = s_curve( weightX );
// lines 223-228
float weightY = 1.0 - frac( starsUV.y );
weightY = s_curve( weightY );
```

As you might expect, these factors are used to interpolate the noise smoothly and generate the final offset for the sampling coordinates:

```
// 229: mad r4.w, r4.w, l(0.000000001), -r5.x
// 230: mad r4.w, r0.w, r4.w, r5.x
float noise0 = lerp( fStarsNoise1, fStarsNoise2, weightX );
// 231: mad r5.x, r5.y, l(0.000000001), -r1.w
// 232: mad r0.w, r0.w, r5.x, r1.w
float noise1 = lerp( fStarsNoise3, fStarsNoise4, weightX );
// 233: add r0.w, -r4.w, r0.w
// 234: mad r0.w, r2.w, r0.w, r4.w
float offset = lerp( noise0, noise1, weightY );
// 235: mad r2.xyz, r0.wwww, l(0.000500, 0.000500, 0.000500, 0.000000), r2.xyzx
// 236: sample_indexable(texturecube)(float,float,float,float) r2.xyz, r2.xyzx, t0.xyzw, s0
float3 starsPerturbedDir = dirXYZ + offset * 0.0005;
float3 starsColorDisturbed = texNightStars.Sample( samplerAnisoWrap, starsPerturbedDir ).rgb;
```
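The interpolation itself is plain bilinear filtering with S-curve (smoothstep-like) weights. Here is a self-contained C++ sketch that takes the four corner noise values as inputs, so it doesn't depend on the exact hash; the names are hypothetical and mirror the HLSL above. Because the weights use 1 - frac(), at an integer coordinate the result equals the "minus one" corner (fStarsNoise4 in the HLSL).

```cpp
#include <cassert>
#include <cmath>

// -2x^3 + 3x^2, as in s_curve() above (lines 219-222 and 225-228).
float sCurve(float x)
{
    const float x2 = x * x;
    return -2.0f * x2 * x + 3.0f * x2;
}

float lerpScalar(float a, float b, float t) { return a + (b - a) * t; }

float frac(float x) { return x - std::floor(x); }

// Combines the four corner noises the same way as lines 217-234.
float starsOffset(float n1, float n2, float n3, float n4, float starsU, float starsV)
{
    const float weightX = sCurve(1.0f - frac(starsU));
    const float weightY = sCurve(1.0f - frac(starsV));
    const float noise0 = lerpScalar(n1, n2, weightX);
    const float noise1 = lerpScalar(n3, n4, weightX);
    return lerpScalar(noise0, noise1, weightY);
}
```

The S-curve has zero slope at 0 and 1, so the offset changes smoothly across cell borders instead of showing creases.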

Once we have *starsColorDisturbed*, the hardest part is over. Phew!

The next step is to perform gamma correction on both *starsColor* and *starsColorDisturbed* and multiply them:

```
starsColor = pow( starsColor, 2.2 );
starsColorDisturbed = pow( starsColorDisturbed, 2.2 );
float3 starsFinal = starsColor * starsColorDisturbed;
```

#### Stars - the final touches

We have *starsFinal* in r1.xyz. What happens at the end of processing the stars is this:

```
256: log r1.xyz, r1.xyzx
257: mul r1.xyz, r1.xyzx, l(2.500000, 2.500000, 2.500000, 0.000000)
258: exp r1.xyz, r1.xyzx
259: min r1.xyz, r1.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
260: add r0.w, -cb0[9].w, l(1.000000)
261: mul r1.xyz, r0.wwww, r1.xyzx
262: mul r1.xyz, r1.xyzx, l(10.000000, 10.000000, 10.000000, 0.000000)
```

This is much, much easier compared to blinking and moving stars.

We start by raising *starsFinal* to the power of 2.5, which allows controlling the density of stars. Pretty clever. Then, we make sure the maximum color of stars is float3(1, 1, 1).

cb0[9].w is used to control the general visibility of stars. In daytime, expect it to be set to 1.0 (which results in multiplying by zero), and to 0.0 at night.

At the end we boost the visibility of stars by a factor of 10. And this is over! :)
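The final touches map one-to-one onto a small per-channel function. A C++ sketch, where *starsHidden* is my hypothetical name for cb0[9].w:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Per-channel version of asm lines 256-262 (function name is hypothetical).
float starsFinalTouch(float starsFinal, float starsHidden)
{
    float c = std::exp2(std::log2(starsFinal) * 2.5f); // lines 256-258: pow(x, 2.5), density control
    c = std::min(c, 1.0f);                             // line 259: clamp to 1
    c *= 1.0f - starsHidden;                           // lines 260-261: cb0[9].w = 1 hides the stars
    return c * 10.0f;                                  // line 262: boost by 10
}
```

Raising to 2.5 pushes dim texels towards zero much faster than bright ones, which is why it thins out the faint stars.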

### Summary

In this post I presented some cool tricks I found while investigating the sky shaders from The Witcher 3. I hope you enjoyed it. Thanks for reading!

Take care,

M.
