Computing Morton Codes with a WebGPU Compute Shader

mortonzcurve.png

To implement collision detection on the GPU, I mostly followed the approach in the Thinking Parallel series from the Nvidia tech blog.

See my previous post for more details - I didn't follow the Nvidia series to the letter for various reasons.

In this post we will:

  1. Setup WebGPU.
  2. Generate some random 3-D points in JavaScript.
  3. Compute a Morton Code from each point's coordinates in a WebGPU compute shader.

WebGPU Browser Support

  • I'm using Chrome, which supports WebGPU out of the box.
  • Check here for other browser support.

General Purpose Computing: GPGPU

A compute shader allows you to write a function to compute something over your input data, and then you can write the output to a buffer.

So we are just using the GPU for general purpose computing, or GPGPU at this point.

We won't be rendering anything to the screen in this post.

Instead we will view the output of the compute shader in the developer console.


Code

  • I have a gist of the code here.
  • Eventually all of the code for this project will be in a repository, but I am still figuring out how to organize it.

Alright, let's go...


What is a Morton Code?

A Morton Code maps multidimensional data to a single value.

So we want to map each 3-D coordinate (x, y, z) to a single value.

We do this by interleaving the binary representation of the coordinate values.

interleavingCrop.png

One perk is that the locality of the points is preserved.


Coming Up...

  • In the next post, we will sort the Morton Codes, which will order the points in a z-order curve.
  • The sorted Morton Codes will form the basis for our acceleration structure to efficiently detect collisions between points: a bounding volume hierarchy.

WebGPU Setup

I'm just using the WebGPU JavaScript API - no frameworks for this one!

I am also omitting some error handling and stuff here - check the gist for the full code.

const adapter = await navigator.gpu.requestAdapter();
const device = await adapter?.requestDevice();

Here we have device which is a reference to the GPU connection, and we will be using it for various things below.


Generate Random Points

randompoints.png

This part is in JavaScript.

To start off, we are generating some random 3-D points.

Yes, the graphic is 2-D, but I wanted to show that the points are generated in the unit square.

const coordinateCount = 16;
const numCoordinateVertices = 3; // x,y,z
const numVerticesWithPadding = numCoordinateVertices + 1;

var testPoints = [];
for (let i = 0; i < numVerticesWithPadding * coordinateCount; i++) {
    testPoints.push(Math.random());
}

So we have 16 random (x, y, z) coordinates.

The basic idea with the compute shader is that we have an array of 16 point coordinates, and each Morton Code will be computed in its own thread, in parallel.

You might notice that we've generate an array of 64 numbers - 16 x 4 - why did we do that if we are working with 3-D coordinates?


vec3

I added the extra number as padding because I will be using a vec3f data structure in the shader.

Vec3f represents a vector with three floating point numbers.

  • Since vec3 requires padding in WebGPU, I am generating four numbers for each coordinate, and the last one isn't used.
  • Read more about WebGPU Memory Layout to find out more about the need for the extra padding.

Copy Points To the GPU

We generated the points in JavaScript.

...but everything else will be done on the GPU.

So first we need to copy the points data to the GPU.

We will do this by:

JavaScript Float32Array

Convert the array of generated points to a typed array in JavaScript.

const mortonCodeInput = new Float32Array(testPoints);

Create a GPU Buffer

Create a GPU buffer to store the data.

let mortonCodeInputBuffer = device.createBuffer({
    label: 'morton code points input buffer',
    size: mortonCodeInput.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});

Write Data to the GPU Buffer

Write the typed array to the GPU buffer.

device.queue.writeBuffer(mortonCodeInputBuffer, 0, mortonCodeInput);

Another GPU Buffer for the output

I'm also going to create a separate GPU buffer for the Morton Code output.

let mortonCodeOutputBuffer = device.createBuffer({
    label: 'morton code buffer',
    size: mortonOutputBufferSize,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});

The computed Morton Codes will be written to this buffer.


WebGPU Compute Pipeline

computepipeline.png

Before we talk about the actual shader code to compute Morton Codes, let's go over the structure of the compute pipeline.

The compute pipeline consists of everything we need to set up to run our compute shader.

There are three components in WebGPU:

  1. Shaders
  2. Pipelines
  3. Bind Groups

1. Shader Module

We will create a shader module, which contains the shader code to compute the Morton Codes.

const mortonCodeModule = device.createShaderModule({
    label: 'compute morton codes from input points',
    code: < SHADER_CODE_STRING_HERE > ,
});
  • I left out the actual shader code to save space, but you can view it here.

The shader code is a string, and you can use string substitution for any variables you need to add that need to be added before the program compiles.

We will come back to the actual shader code in a bit.

2. Pipeline

Then we create a WebGPU pipeline.

const mortonCodePipelineLabel = 'morton code pipeline';

const mortonCodePipeline = device.createComputePipeline({
    label: mortonCodePipelineLabel,
    layout: 'auto',
    compute: {
        module: mortonCodeModule,
        entryPoint: 'computeMortonCodes',
    },
});
  • Pass the shader module to it.
  • Along with the entry point, which is the specific shader function name to be run.

You can have more than one function in a shader module.

You just need to make sure to reference the correct function when you create a pipeline.

You can see in the gist that the function is named computeMortonCodes.

3. Bind Group

Lastly, we will create a bind group for the pipeline we just created.

const mortonBindGroupLabel = 'bind group for morton code compute';

const mortonBindGroupEntries = [{
        binding: 0,
        resource: {
            buffer: mortonCodeInputBuffer
        },
    },
    {
        binding: 1,
        resource: {
            buffer: mortonCodeOutputBuffer
        },
    }
];

const mortonCodeBindGroup = device.createBindGroup({
    label: mortonBindGroupLabel,
    layout: mortonCodePipeline.getBindGroupLayout(0),
    entries: mortonBindGroupEntries
});

This will tell the compute pipeline how to use the GPU buffers passed to it.

The bind group entries correspond with the storage arrays in the shader program that I mentioned earlier:

@group(0) @binding(0) var<storage, read> data: array<vec3f, coordinateCount>;
@group(0) @binding(1) var<storage, read_write> dataOut: array<u32>;

The binding numbers in the bind group and corresponding shader code need to match.

In WebGPU, you are mostly responsible for data/memory management to make sure things are aligned correctly - the vec3 issue from earlier is similar.

If things aren't aligned correctly, you will either get an error or other potentially difficult-to-debug problems in your shader.


Morton Code Compute Shader

As I mentioned earlier, we are taking the binary representation of the coordinates, and interleaving them together.

fn expandBits(v: u32) -> u32
    {
        var vv = (v * 0x00010001) & 0xFF0000FF;
        vv = (vv * 0x00000101) & 0x0F00F00F;
        vv = (vv * 0x00000011) & 0xC30C30C3;
        vv = (vv * 0x00000005) & 0x49249249;
        return vv;
    }

fn morton3D(x: f32, y: f32, z: f32) -> u32
    {
        var xx = min(max(x * 1024.0, 0.0), 1023.0);
        var yy = min(max(y * 1024.0, 0.0), 1023.0);
        var zz = min(max(z * 1024.0, 0.0), 1023.0);

        var xxE = expandBits(u32(xx));
        var yyE = expandBits(u32(yy));
        var zzE = expandBits(u32(zz));
        return xxE * 4 + yyE * 2 + zzE;
    }

The Morton Codes are computed for each coordinate set in parallel.

let i = id.x;
let mc = data[i];
dataOut[i] = u32(morton3D(mc.x, mc.y, mc.z));

The first line gets the global invocation id for the workgroup invocation.

  • data corresponds to the input buffer - the raw coordinates.
    • This is the vec3f from earlier - if we had not added the extra padding to the buffer, the shader would not read in the data correctly and would likely not cause an error, so you would just have wonky, unexpected results.
  • dataOut corresponds to the Morton Code output buffer.

In this example, the workgroup size is 16, corresponding to the array of 16 coordinate sets in data.

So we get the coordinate from each index, compute the Morton Code, and then write it out to the same index in the dataOut array.


Run the Compute Pipeline

Time to run the code!

We need to do a few things first:

const encoder = device.createCommandEncoder({
    label: 'morton encoder'
});

const pass = encoder.beginComputePass({
    label: 'morton compute pass'
});

Then we set the pipeline and bind group.

pass.setPipeline(mortonCodePipeline);
pass.setBindGroup(0, mortonCodeBindGroup);

And in this example we are dispatching one workgroup.

pass.dispatchWorkgroups(1);

A Note on Workgroups

  • In this post/example I've kept things pretty simple as far as workgroups go - we are dispatching a single workgroup.
  • We have a workgroup size of 16 in the compute shader code, which corresponds to the length of our array of coordinates.
  • So that will work out to running 16 threads in parallel, in this example.

I will talk more about workgroups in the coming posts - this is where I have run into issues with WebGPU, so lots more on that to come!

For now, WebGPUFundamentals has a great post that goes into detail on workgroups.


Now we can end the compute pass.

pass.end();

Print Results to the Console

We need a separate buffer which we will copy the Morton Code output results to, in order to print them to the console.

I would recommend reading the WebGPU fundamentals article - the part about compute shaders has a good explanation of why we need to create this separate buffer to print the output in JavaScript.

The main thing here to note is that the buffer has the GPUBufferUsage.MAP_READ usage flag.

let mortonCodeBufferResult = device.createBuffer({
    label: 'morton codes output result',
    size: mortonCodeOutputBuffer.size,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST
});

Copy the Morton Code output buffer to the result buffer.

encoder.copyBufferToBuffer(mortonCodeOutputBuffer, 0, mortonCodeBufferResult, 0, mortonCodeBufferResult.size);

Copying the buffer is just another command that we are adding to the compute pipeline, but we have not yet actually executed the pipeline yet.

So that is why we are adding it here.


Finish and Submit the Command Encoder

Now we've got all of our commands in the command buffer, and we can finish and submit it for WebGPU to execute.

let commandBuffer = encoder.finish();
device.queue.submit([commandBuffer]);

Map and Print results to console

await mortonCodeBufferResult.mapAsync(GPUMapMode.READ);
const mortonCodeResult = new Int32Array(mortonCodeBufferResult.getMappedRange());
console.log('morton codes: ', mortonCodeResult);

In the console you should see the integer representation of the Morton Codes.

Thanks for reading!

I hope this has been a helpful overview on compute shaders in WebGPU.

Like I mentioned earlier, this is a simplified example extracted from the full project.

I am still figuring out how to organize the full project code!

Let me know if you have questions or if anything doesn't make sense.

Tagged In
blog comments powered by Disqus

Recent Posts

mortonzcurve.png
Computing Morton Codes with a WebGPU Compute Shader
May 29, 2024

Starting out with general purpose computing on the GPU, we are going to write a WebGPU compute shader to compute Morton Codes from an array of 3-D coordinates. This is the first step to detecting collisions between pairs of points.

Read More
webgpuCollide.png
WebGPU: Building a Particle Simulation with Collision Detection
May 13, 2024

In this post, I am dipping my toes into the world of compute shaders in WebGPU. This is the first of a series on building a particle simulation with collision detection using the GPU.

Read More
abstract_tree.png
Solving the Lowest Common Ancestor Problem in Python
May 9, 2023

Finding the Lowest Common Ancestor of a pair of nodes in a tree can be helpful in a variety of problems in areas such as information retrieval, where it is used with suffix trees for string matching. Read on for the basics of this in Python.

Read More
Get the latest posts as soon as they come out!