Point rendering

September 1, 2024 · 5 min read

I've started working on adding rain to Snowscape. This will require improving a lot of the very, very fledgling engine internals.

WGPU isn't that easy

I'm liking WGPU but it rarely seems that things Just Work (tm)...

Adding point rendering - just drawing individual pixels - was trivial. Basically duplicate all the line rendering code and make a few minor tweaks to the primitive type data definitions (e.g. PointList, not LineList).

So on to finding the equivalent of glPointSize. And...there is no equivalent. This webgpufundamentals post, however, was very helpful. TL;DR: use instancing to render the points so that each point generates a quad. This was very interesting since I had barely used instancing before and had also been wondering what the WGPU equivalent of geometry shaders might be.

First try...

For debugging, I tried rendering a grid of points as quads using LineList primitives but that didn't work:

[Image: screenshot of the failed first attempt]

Vertex index vs. instance index

Turned out I had my index and vertex buffers backwards. The VertexStepMode docs clued me into the error: in that parlance, I had my "rows" and "columns" transposed so that I was generating the primitives based on instance index rather than vertex index.
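To make the "rows" and "columns" distinction concrete, here is a minimal CPU-side sketch of how the step mode decides which buffer element each vertex shader invocation reads. It is only a model: `StepMode`, `element_index`, and `elements_seen` are hypothetical names that mirror the two variants of `wgpu::VertexStepMode`, and the loop order is illustrative since the GPU does not guarantee any invocation order.

```rust
/// Which counter advances the read position in a vertex buffer.
/// Mirrors the two variants of `wgpu::VertexStepMode`; redefined here
/// (hypothetical type) so the sketch compiles without the wgpu crate.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum StepMode {
    Vertex,
    Instance,
}

/// Index of the buffer element that one vertex shader invocation receives.
pub fn element_index(mode: StepMode, vertex_index: u32, instance_index: u32) -> u32 {
    match mode {
        // The buffer is stepped once per vertex.
        StepMode::Vertex => vertex_index,
        // The buffer is stepped once per instance; every vertex of that
        // instance sees the same element.
        StepMode::Instance => instance_index,
    }
}

/// Lists the buffer element seen by every invocation of a draw shaped like
/// `draw(0..vertex_count, 0..instance_count)`, instance by instance.
pub fn elements_seen(mode: StepMode, vertex_count: u32, instance_count: u32) -> Vec<u32> {
    let mut seen = Vec::new();
    for instance_index in 0..instance_count {
        for vertex_index in 0..vertex_count {
            seen.push(element_index(mode, vertex_index, instance_index));
        }
    }
    seen
}
```

With the Instance step mode and six vertices per instance, all six invocations for a point read the same element, which is what lets the shader expand one point into a quad.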

Once I figured that out, the points rendered as quads (with debugging colors on each generated vertex):

[Image: the points rendered as quads, with debugging colors on each generated vertex]

The working code

Define the vs_main input as an Instance, not a Vertex. Yes, the shader generates vertices, so it is still a vertex shader, but its input is the instance information. The additional information we need is @builtin(vertex_index), which tells us which vertex we're generating for the Instance. I originally confused myself by still thinking of the vs_main input as a Vertex.

struct Globals {
    view            : mat4x4<f32>,
    proj            : mat4x4<f32>,
    view_proj       : mat4x4<f32>,
    camera_position : vec3<f32>,
};

@group(0) @binding(0)
var<uniform> globals : Globals;

struct Locals {
};
@group(1) @binding(0)
var<uniform> locals: Locals;


struct Instance {
    @location(0) position : vec3<f32>,
    @location(1) color    : vec3<f32>,
};

struct Fragment {
    @builtin(position) clip_position : vec4<f32>,
    @location(0)       color         : vec3<f32>,
};

// This is called with an "instance" step mode.
//
// There are N invocations of the vertex shader, one for each vertex
// of the instance.  In this case, we're rendering a quad as two triangles
// so N=6. (The render pipeline is set up for TriangleList, so a triangle
// is generated for each consecutive 3 vertex_index values).
//
// We use the Instance position to generate the center world coordinate,
// then generate 6 vertices in NDC space for the quad.
//
@vertex
fn vs_main(
    instance : Instance,
    @builtin(vertex_index) vertexIndex: u32,
) -> Fragment {

    var vertices = array(
          vec2f(-1, -1),
          vec2f( 1, -1),
          vec2f(-1,  1),

          vec2f( 1,  1),
          vec2f( 1, -1),
          vec2f(-1,  1),
    );
    var vertex_pos = vertices[vertexIndex];
    // TODO: remove hard-coded point size conversion
    var vertex_ndc = vec4<f32>(vertex_pos * 100.0/400.0, 0.0, 0.0);

    // Eye coordinates
    var ec = globals.view * vec4<f32>(instance.position, 1.0);

    var frag: Fragment;
    frag.color = instance.color;
    frag.clip_position = globals.proj * ec + vertex_ndc;
    return frag;
}

@fragment
fn fs_main(frag : Fragment) -> @location(0) vec4<f32> {
    var c = frag.color;
    return vec4<f32>(c, 1.0);
}
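As a sanity check on the corner math, the same calculation can be mirrored on the CPU. This is a rough sketch, not part of the engine: `QUAD_CORNERS`, `POINT_SCALE`, `corner_offset`, and `clip_position` are hypothetical names, and the center is assumed to be already projected (the `globals.proj * ec` term in the shader).

```rust
/// Corner table copied from the `vertices` array in `vs_main`:
/// two triangles, three corners each.
pub const QUAD_CORNERS: [[f32; 2]; 6] = [
    [-1.0, -1.0],
    [1.0, -1.0],
    [-1.0, 1.0],
    [1.0, 1.0],
    [1.0, -1.0],
    [-1.0, 1.0],
];

/// Same hard-coded factor as the shader: 100.0 / 400.0.
pub const POINT_SCALE: f32 = 100.0 / 400.0;

/// Offset added to the projected center for one vertex of the quad.
/// z and w are zero so the center's depth and w pass through unchanged.
pub fn corner_offset(vertex_index: usize) -> [f32; 4] {
    let [x, y] = QUAD_CORNERS[vertex_index];
    [x * POINT_SCALE, y * POINT_SCALE, 0.0, 0.0]
}

/// Mirrors `frag.clip_position = globals.proj * ec + vertex_ndc`, taking the
/// already-projected center as input.
pub fn clip_position(center_clip: [f32; 4], vertex_index: usize) -> [f32; 4] {
    let offset = corner_offset(vertex_index);
    [
        center_clip[0] + offset[0],
        center_clip[1] + offset[1],
        center_clip[2] + offset[2],
        center_clip[3] + offset[3],
    ]
}
```

Because the offset is added before the perspective divide, the on-screen size of the quad still depends on the center's w value.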

The shader is tightly coupled to the vertex layout and the draw call. The shader above assumes it will be invoked with exactly six vertices per instance, and the step_mode must be set to Instance. This doesn't feel very "reuse friendly", but it makes sense that layouts and shaders would be so tightly coupled.

    // Adds the commands to render the buffer to the queue
    //
    pub fn activate(&self, render_pass: &mut wgpu::RenderPass) {
        let bytes = self.instance_buffer.size() as u32;
        let size = std::mem::size_of::<PointInstance>() as u32;
        let count = bytes / size;

        // NOTE: the method naming is distracting since we're setting
        // the *instance buffer* here, not a vertex buffer.
        render_pass.set_vertex_buffer(0, self.instance_buffer.slice(..));

        // The shader is hard-coded to generate 6 vertices per instance
        render_pass.draw(0..6, 0..count);
    }
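The instance count above comes from dividing the buffer's byte size by the record size. A slightly more defensive version of that arithmetic might look like the sketch below; `instance_count` is a hypothetical helper, not something the engine or wgpu provides.

```rust
use std::mem::size_of;

/// Number of whole `T` records in a buffer of `buffer_bytes` bytes.
///
/// Returns `None` if `T` is zero-sized, if the byte size is not an exact
/// multiple of the record size, or if the count does not fit in a `u32`.
pub fn instance_count<T>(buffer_bytes: u64) -> Option<u32> {
    let stride = size_of::<T>() as u64;
    if stride == 0 || buffer_bytes % stride != 0 {
        return None;
    }
    u32::try_from(buffer_bytes / stride).ok()
}
```

The `None` cases make a mismatched buffer visible at the call site instead of silently drawing a truncated number of points.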

The WGPU naming can be misleading for those new to instancing. "Logically" the buffer holds instance data, but the WGPU structs and methods all refer to it as a vertex buffer. I'm sure this naming is appropriate given the holistic design of WGPU, but it was quite confusing when figuring out how instancing works for the first time!

impl PointInstance {
    pub fn desc() -> wgpu::VertexBufferLayout<'static> {
        wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<PointInstance>() as wgpu::BufferAddress,
            step_mode: wgpu::VertexStepMode::Instance, // <-- THIS
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    shader_location: 0,
                    format: wgpu::VertexFormat::Float32x3,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
                    shader_location: 1,
                    format: wgpu::VertexFormat::Float32x3,
                },
            ],
        }
    }
}
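Since the attribute offsets must match the struct's memory layout exactly, one option is to check them at compile time. The sketch below assumes `PointInstance` is a `#[repr(C)]` struct with `position` and `color` fields of type `[f32; 3]` (the struct definition isn't shown here, so the field names are a guess) and uses `std::mem::offset_of!`, which needs Rust 1.77 or newer.

```rust
use std::mem::{offset_of, size_of};

/// Assumed definition: field names and types are guesses based on the
/// two Float32x3 attributes in the layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct PointInstance {
    pub position: [f32; 3],
    pub color: [f32; 3],
}

/// Byte offsets and stride derived from the struct itself rather than
/// written by hand.
pub const POSITION_OFFSET: usize = offset_of!(PointInstance, position);
pub const COLOR_OFFSET: usize = offset_of!(PointInstance, color);
pub const STRIDE: usize = size_of::<PointInstance>();

// Compile-time checks: the build fails if the struct drifts away from
// the offsets the vertex attributes expect.
const _: () = assert!(POSITION_OFFSET == 0);
const _: () = assert!(COLOR_OFFSET == size_of::<[f32; 3]>());
const _: () = assert!(STRIDE == 2 * size_of::<[f32; 3]>());
```

Constants like these could be passed to the `offset` and `array_stride` fields (cast to `wgpu::BufferAddress`) so the layout and the struct cannot silently disagree.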

The shader is also tightly coupled to the render pipeline, as that needs its primitive topology set to TriangleList.

primitive: wgpu::PrimitiveState {
    cull_mode: None,
    topology: wgpu::PrimitiveTopology::TriangleList,  // <-- THIS
    strip_index_format: None,
    front_face: wgpu::FrontFace::Ccw,
    polygon_mode: wgpu::PolygonMode::Fill,
    unclipped_depth: false,
    conservative: false,
},
