Particles

· One min read

Got a "particle system" working. This is a simple point grid being mapped to a sine wave.

The core of the work was figuring out dynamic WGPU vertex updates and reworking some of the overly simple parts of the Snowfall architecture to account for changing geometry.
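
As a rough illustration of the idea (not the engine's actual code), the CPU side of such a "particle system" can be as small as recomputing the grid positions each frame and re-uploading them. The sketch below assumes a hypothetical `Point` type and `sine_grid` function; in a WGPU program the resulting data would typically be copied into the existing vertex buffer, for example with `Queue::write_buffer`.

```rust
/// One point of the grid. Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Builds an `n` x `n` grid on the XY plane with unit spacing and displaces
/// each point along Z by a sine wave that travels along X as `time` advances.
pub fn sine_grid(n: usize, time: f32, amplitude: f32, frequency: f32) -> Vec<Point> {
    let mut points = Vec::with_capacity(n * n);
    for iy in 0..n {
        for ix in 0..n {
            let x = ix as f32;
            let y = iy as f32;
            let z = amplitude * (frequency * x + time).sin();
            points.push(Point { x, y, z });
        }
    }
    points
}
```

Calling `sine_grid` once per frame with an increasing `time` value gives the changing geometry that the renderer then has to accept.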

Point rendering

· 5 min read

I've started working on adding rain to Snowscape. This will require improving a lot of the very, very fledgling engine internals.

WGPU isn't that easy

I'm liking WGPU but it rarely seems that things Just Work (tm)...

Adding point rendering - just drawing individual pixels - was trivial. Basically duplicate all the line rendering code and make a few minor tweaks to the primitive type data definitions (e.g. PointList, not LineList).

So on to finding the equivalent of glPointSize. And...there is no equivalent. This webgpufundamentals post however was very helpful. TL;DR: use instancing to render the points so that each point generates a quad. This was very interesting since I had hardly used instancing before in any context and had also been wondering what the WGPU equivalent of geometry shaders might be.

First try...

For debugging, I tried rendering a grid of points as quads using LineList primitives but that didn't work:

alt text

Vertex index vs. instance index

Turned out I had my index and vertex buffers backwards. The VertexStepMode docs clued me into the error: in that parlance, I had my "rows" and "columns" transposed so that I was generating the primitives based on instance index rather than vertex index.

Once I figured that out, the points rendered as quads (with debugging colors on each generated vertex):

alt text

The working code

Define the vs_main input as an Instance, not a Vertex. Yes, the shader generates vertices, so it is a vertex shader still, but the input is the instance information. The additional information we need is the builtin(vertex_index) to know which vertex we're generating for the Instance. I confused myself by thinking of the vs_main input still being a Vertex originally.

struct Globals {
view : mat4x4<f32>,
proj : mat4x4<f32>,
view_proj : mat4x4<f32>,
camera_position : vec3<f32>,
};

@group(0) @binding(0)
var<uniform> globals : Globals;

struct Locals {
};
@group(1) @binding(0)
var<uniform> locals: Locals;


struct Instance {
@location(0) position : vec3<f32>,
@location(1) color : vec3<f32>,
};

struct Fragment {
@builtin(position) clip_position : vec4<f32>,
@location(0) color : vec3<f32>,
};

// This is called in with an "instance" step mode.
//
// There are N invocations of the vertex shader, one for each vertex
// of the instance. In this case, we're rendering a quad as two triangles
// so N=6. (The render pipeline is set up to TriangleList, so a triangle
// is generated for each consecutive 3 vertex_index values).
//
// We use the Instance position to generate the center world coordinate,
// then generate 6 vertices in NDC space for the quad.
//
@vertex
fn vs_main(
instance : Instance,
@builtin(vertex_index) vertexIndex: u32,
) -> Fragment {

var vertices = array(
vec2f(-1, -1),
vec2f( 1, -1),
vec2f(-1, 1),

vec2f( 1, 1),
vec2f( 1, -1),
vec2f(-1, 1),
);
var vertex_pos = vertices[vertexIndex];
// TODO: remove hard-coded point size conversion
var vertex_ndc = vec4<f32>(vertex_pos * 100.0/400.0, 0.0, 0.0);

// Eye coordinates
var ec = globals.view * vec4<f32>(instance.position, 1.0);

var frag: Fragment;
frag.color = instance.color;
frag.clip_position = globals.proj * ec + vertex_ndc;
return frag;
}

@fragment
fn fs_main(frag : Fragment) -> @location(0) vec4<f32> {
var c = frag.color;
return vec4<f32>(c, 1.0);
}

The shader is tightly coupled to the vertex layout and draw call. The shader above assumes it will be called with a vertex count of exactly six for each instance. The step_mode must also be set correctly. This doesn't feel very "reuse friendly", but it makes sense that layouts and shaders would be so tightly coupled.

    // Adds the commands to render the buffer to the queue
//
pub fn activate(&self, render_pass: &mut wgpu::RenderPass) {
let bytes = self.instance_buffer.size() as u32;
let size = std::mem::size_of::<PointInstance>() as u32;
let count = bytes / size;

// NOTE: the method naming is distracting since we're setting
// the *instance buffer* here, not a vertex buffer.
render_pass.set_vertex_buffer(0, self.instance_buffer.slice(..));

// The shader is hard-coded to generate 6 vertices per instance
render_pass.draw(0..6, 0..count);
}
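
To make the six-vertices-per-instance contract concrete, here is a small CPU-side mirror of what the shader does with `vertex_index`. This is only a sketch for reasoning about the draw call: `QUAD_CORNERS`, `invocations`, `corner_offset`, and `doubled_signed_area` are hypothetical names, and the loop order is for readability (the GPU makes no promise about invocation order).

```rust
/// Corner offsets indexed by vertex_index, copied from the shader's array.
pub const QUAD_CORNERS: [[f32; 2]; 6] = [
    [-1.0, -1.0],
    [1.0, -1.0],
    [-1.0, 1.0],
    [1.0, 1.0],
    [1.0, -1.0],
    [-1.0, 1.0],
];

/// The (instance_index, vertex_index) pairs that a
/// `draw(0..6, 0..instance_count)` call runs the vertex shader for.
pub fn invocations(instance_count: u32) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    for instance in 0..instance_count {
        for vertex in 0..6 {
            out.push((instance, vertex));
        }
    }
    out
}

/// Offset added to the instance's clip position for one generated vertex,
/// using the same hard-coded 100/400 scale as the shader.
pub fn corner_offset(vertex_index: u32) -> [f32; 2] {
    let c = QUAD_CORNERS[vertex_index as usize];
    let scale = 100.0 / 400.0;
    [c[0] * scale, c[1] * scale]
}

/// Twice the signed area of a triangle: positive means counter-clockwise.
pub fn doubled_signed_area(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
}
```

One thing this makes visible: the two triangles in the table have opposite winding, so the quad only renders completely when back-face culling is disabled.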

The WGPU naming can be misleading for those new to instancing. It's also rather distracting that "logically" the vertex format is instance data, but the WGPU naming conventions for the structs and methods all refer to this as a vertex buffer. I'm sure this is appropriate naming looking at the holistic design of WGPU, but this was quite confusing in figuring out how instancing works for the first time!

impl PointInstance {
pub fn desc() -> wgpu::VertexBufferLayout<'static> {
wgpu::VertexBufferLayout {
array_stride: std::mem::size_of::<PointInstance>() as wgpu::BufferAddress,
step_mode: wgpu::VertexStepMode::Instance, // <-- THIS
attributes: &[
wgpu::VertexAttribute {
offset: 0,
shader_location: 0,
format: wgpu::VertexFormat::Float32x3,
},
wgpu::VertexAttribute {
offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
shader_location: 1,
format: wgpu::VertexFormat::Float32x3,
},
],
}
}
}
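
The post does not show the `PointInstance` struct itself, so the following is an assumed definition that would be consistent with the layout above: two tightly packed `[f32; 3]` fields, giving a 24-byte stride with the color at byte offset 12. The constants and the `instance_count` helper are hypothetical and only there to check the arithmetic used by the layout and the draw call.

```rust
/// Assumed CPU-side instance data matching the attribute layout above.
/// `#[repr(C)]` keeps the field order and packing predictable.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PointInstance {
    pub position: [f32; 3],
    pub color: [f32; 3],
}

/// Bytes between consecutive instances (the layout's `array_stride`).
pub const POINT_INSTANCE_STRIDE: usize = std::mem::size_of::<PointInstance>();

/// Byte offset of the color attribute (shader location 1).
pub const COLOR_OFFSET: usize = std::mem::size_of::<[f32; 3]>();

/// Number of instances stored in a buffer of `buffer_bytes` bytes.
pub fn instance_count(buffer_bytes: u64) -> u32 {
    (buffer_bytes / POINT_INSTANCE_STRIDE as u64) as u32
}
```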

The shader is also tightly coupled to the render pipeline, as that needs its primitive topology to be set to TriangleList.

primitive: wgpu::PrimitiveState {
cull_mode: None,
topology: wgpu::PrimitiveTopology::TriangleList, // <-- THIS
strip_index_format: None,
front_face: wgpu::FrontFace::Ccw,
polygon_mode: wgpu::PolygonMode::Fill,
unclipped_depth: false,
conservative: false,
},

Minimal React Frontend

· 5 min read

To accelerate UI development and debugging of Snowscape, I'm building a web-based front-end that can connect to the engine. I'm familiar with React and TypeScript, so being able to use the browser development environment will speed things along (versus trying to learn egui or another Rust-based framework at this point in development).

JavaScript tooling drives me a bit bonkers so here's a quick write-up on my "minimal" setup to get a front-end going (i.e. for simple, non-production purposes). Given the rate of change in the JavaScript ecosystem, who knows how long this post will be useful or relevant for!

The files

The "minimal" files needed here are:

src/
app.tsx
main.tsx
index.html
style.css
scripts/
open-browser.js

.gitignore
package.json
Makefile

The Makefile (Makefile)

I like Makefiles because I know make build is going to build my project and I don't have to worry about whether the project is using npm, npx, tsc, esbuild, cargo, etc. This is great for large complex monorepos using multiple languages as well as for coming back to old personal projects where I've long since forgotten all the details of how I build it.

I'm a big fan of language-agnostic command runners, and Make, while hardly perfect, is ubiquitously available -- which is a good fit for a command runner you're using to avoid having to remember specialized tools.

.PHONY: ensure build run dev

ensure:
npm i

build: ensure
mkdir -p dist/static
cp src/index.html dist
cp src/style.css dist
npx esbuild \
--preserve-symlinks \
--loader:.js=jsx \
--loader:.md=text \
--loader:.yaml=text \
--loader:.txt=text \
--sourcemap \
--bundle src/main.tsx \
--outfile=dist/main.bundle.js

run: build
(sleep 2 && node scripts/open-browser.js) &
npx http-server -c-1 dist

dev:
$(MAKE) run &
npx nodemon \
--watch src \
--ext ts,tsx,html,css,yaml,yml \
--exec "make build || exit 1"

Tools & dependencies (package.json)

The above requires some tools, so let's look at the package.json:

{
"devDependencies": {
"@types/react": "^18.3.5",
"esbuild": "^0.23.1",
"http-server": "^14.1.1",
"nodemon": "^3.1.4",
"react-dev-utils": "^12.0.1"
},
"dependencies": {
"react": "^18.3.1",
"react-dom": "^18.3.1"
}
}

open-browser.js

And we have one super simple script, scripts/open-browser.js, for opening a browser tab when the client is launched. It's basically just a wrapper to call a function in an external package:

const openBrowser = require('react-dev-utils/openBrowser');
openBrowser('http://localhost:8080');

Files to keep out of git (.gitignore)

And let's not forget about making a .gitignore so we don't accidentally commit built files to the repo that we don't want there:

/node_modules/
/dist/

Boilerplate minimal HTML (index.html)

We need an index.html to host the page:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="cache-control" content="no-cache" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link href="style.css" rel="stylesheet" />
<title></title>
</head>
<body>
<div id="root"></div>
<script src="main.bundle.js" type="application/javascript"></script>
</body>
</html>

Boilerplate CSS (style.css)

I generally use inline CSS for small projects, but it's nice to have a single CSS for global settings, normalization, etc.

body {
margin: 0;
padding: 0;

font-family: monospace;
}

React bootstrapping (main.tsx)

I like the convention of having a main() function that calls a Main component and, between those two, all the foundational "plumbing" of a React app is handled. In particular, for a simple development/debugging client like this, it adds very basic, minimal "hot reloading" that polls for changes to the underlying script bundle (no complicated dev servers: just a simple polling loop with a handful of lines of code).

import React, { JSX } from 'react';
import ReactDOM from 'react-dom/client';
import { App } from './app';

function Main(): JSX.Element {
return <App />;
}

async function main() {
console.log('--- snowscape client main ----');
pollForReload('/main.bundle.js');
pollForReload('/style.css');

const element = document.getElementById('root')!;
const root = ReactDOM.createRoot(element);
root.render(<Main />);
}

main();

function pollForReload(url: string) {
let previous: string | null = null;
const poll = async () => {
const resp = await fetch(url);
const text = await resp.text();
if (previous === null) {
previous = text;
} else if (previous !== text) {
window.location.reload();
}
setTimeout(poll, 800 + Math.random() * 800);
};
setTimeout(poll, 250);
}

The actual app (app.tsx)

With the plumbing out of the way, we now have a place to start developing the app in a file that is free of any of the bootstrapping logic:

import React, { JSX } from 'react';

export function App(): JSX.Element {
return (
<div>
<h1>Hello App</h1>
</div>
);
}

Where's all the other stuff?

What about eslint, hot reloading dev server that can load individual components, tailwind CSS integration, API generation, deployment scripts, etc. etc.?

I'm very wary of the "not built here" syndrome that leads to developers (including myself) building things themselves that have already been built by others in high-quality fashion. However, over the years, too many times JavaScript build systems have "locked" my projects into a certain way of developing that (1) makes upgrades to new libraries hard, (2) doesn't work with other libraries / tools, (3) breaks mysteriously after not being used for 6+ months, (4) etc. As such, I tend to try to keep JavaScript build systems pretty minimal and "unabstracted" so it's easier to debug when there's a build issue. That said, the above is "good enough" for most of the simple one-off apps I experiment with but certainly not the best and probably not what is desirable for a full, production web app being developed by a team!

Render pipeline improvements

· One min read
  • Cleaned up the line rendering a bit
  • Added a "built-in" primitive for an XY-grid
  • Modified the instance to allow it to stay locked to the camera position
  • Allow for pass filters so instances go to the right pass / shader

Render passes

· One min read

What's exciting about the below image?

It is rendering lines.

alt text

Mostly internal work to refactor code and conceptual work to understand the various components of the WGPU architecture, but the engine code now supports rendering lines. This involved setting up a separate shader, render pass, line geometry buffer, and a few other things.

Not too exciting, but progress!

Also: an engine debug view

Also in the not-that-exciting category, I added a debug web view. The engine starts a small HTTP server that serves up basic data about the engine state. There's not much there yet!

alt text

WIP: voxel chunking

· 4 min read

A very work-in-progress implementation of voxel chunking and generation.

Update 3: "Optimizations"

After thinking about what I need to optimize for a bit, I realized I was still running everything as a debug build! 🤦‍♂️

Update 2: slow, but headed in the right direction

Update with more progress

Pseudo-code

Cache
entries : Map<Cell position, Entry>

Entry
timestamp
instance_id

update_cells:
    compute a NxNxM bounds around the camera
    transform the corners to cell coordinates
    cells are 32x32x8

    scan each cell in the bounds
        if cache[cell coords] is empty
            add a new entry
        entry.timestamp = now

    for each cache entry
        if now - entry.timestamp > AGE
            remove entry.instance_id from scene
            drop entry
        else if entry has no model instance
            instance = call generator(cell bounds)
            entry.instance_id = instance.id
            add instance to scene
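
For readers who prefer real code, here is one way the pseudo-code could be written in Rust. It is a sketch under assumptions rather than the engine's implementation: `CellPos`, `SceneChanges`, and the `generate` callback are hypothetical, timestamps are plain integers, and an entry's timestamp is refreshed every time its cell falls inside the bounds (which is what makes age-based eviction work).

```rust
use std::collections::HashMap;

/// Integer cell coordinates (hypothetical representation).
pub type CellPos = (i32, i32, i32);

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub timestamp: u64,           // last time the cell was inside the bounds
    pub instance_id: Option<u32>, // None until the generator has run
}

#[derive(Default)]
pub struct Cache {
    pub entries: HashMap<CellPos, Entry>,
}

/// Instance ids the caller should add to / remove from the scene.
#[derive(Debug, Default, PartialEq)]
pub struct SceneChanges {
    pub added: Vec<u32>,
    pub removed: Vec<u32>,
}

impl Cache {
    /// `min`/`max` are inclusive cell bounds around the camera.
    pub fn update_cells(
        &mut self,
        min: CellPos,
        max: CellPos,
        now: u64,
        max_age: u64,
        mut generate: impl FnMut(CellPos) -> u32,
    ) -> SceneChanges {
        // Touch (or create) an entry for every cell in the bounds.
        for x in min.0..=max.0 {
            for y in min.1..=max.1 {
                for z in min.2..=max.2 {
                    let entry = self.entries.entry((x, y, z)).or_insert(Entry {
                        timestamp: now,
                        instance_id: None,
                    });
                    entry.timestamp = now;
                }
            }
        }

        // Evict stale entries and generate instances for new ones.
        let mut changes = SceneChanges::default();
        self.entries.retain(|pos, entry| {
            if now.saturating_sub(entry.timestamp) > max_age {
                if let Some(id) = entry.instance_id {
                    changes.removed.push(id);
                }
                false
            } else {
                if entry.instance_id.is_none() {
                    let id = generate(*pos);
                    entry.instance_id = Some(id);
                    changes.added.push(id);
                }
                true
            }
        });
        changes
    }
}
```

Returning the added and removed ids, instead of mutating the scene directly, keeps the cache free of references into the scene.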

Work-in-progress

The below gets the basic pseudo-code going, which was the goal. It has a number of issues, which were intentionally deferred to be addressed later:

  • Hook in a real terrain generator function
  • Improve the Potentially Visible Set (PVS) code - the fixed bounds around the camera seems a bit too brute force
  • Cell cache
    • Address stuttering - the load/discard adds and removes many tiles at once. Seems like it would benefit from spreading out the load and/or somehow being smarter about the load/discard process.
  • General performance - does a lot of work already and seems slow. Need to look into what kind of unnecessary work it is doing. Do that research before deciding what code changes are needed.
  • The generator simply clones a fixed tile instance. It's not a "real" generator
  • The load/discard of new tiles has a lot of stutter as many tiles get added / removed at once
  • The potential visible set bounds are intentionally small for debugging
  • General performance: this does a lot of work, which doesn't seem scalable without refinement

End-to-end functionality first

A bit about how I approach new work where I personally don't know exactly what needs to be done.

Note that step (2c) is (surprisingly?) often the easiest step to ignore, yet the most beneficial to do well in terms of its impact on understanding the problem space and implementing something both quickly and in a future-friendly way.

1. Define the result

2. Get it working
a. Define the high-level approach
b. Implement a rough end-to-end pipeline
c. Proxy as many details as possible (i.e. use the simplest
implementation I can that still is representative of what
I eventually want)

3. Rinse-and-repeat step 2 until it works

4. Make it better
a. Replace proxies with real implementations
b. Fill in details

5. Scope to "good enough"
a. Repeat 4 until it feels "complete" even if it's not what I
fully wanted in 1
b. Recognize

6. Code clean-up
a. Remember I won't remember what I had been thinking even a
week later, so code comments, naming, and general clean-up
are worth it

7. Write-up
a. Write a light summary of what I did. This highlights what I may
have missed especially around (1), (2a), and (5)
b. Write out the next todos to make it possible to pick up where
I left off later

In this particular case, looking at step (2c) the keys to getting something going were:

  1. Cloning a fixed instance rather than using a real, position dependent terrain generator
  2. Making that fixed instance something very simple
  3. Making the potentially visible set logic very simple (just a fixed bounds)
  4. Ignoring performance until I got it functionally working

Performance fix

Figured out part of the performance issues:

  1. I had incorrectly assumed dropping stale tiles would be fast; it wasn't
  2. Dropped instances could be anywhere in the instance list
  3. The re-sync code was written such that any mismatch implicitly invalidated everything after it in the list

The fix was to hash the existing instance IDs and check against that unordered table rather than continuing the very simple prior logic, which assumed indices always aligned.
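
A minimal sketch of that kind of fix, assuming instance IDs are plain integers: build hash sets of the IDs on both sides and diff them, so a dropped instance in the middle of the list no longer invalidates everything after it. `SyncDiff` and `diff_instances` are hypothetical names, not the engine's.

```rust
use std::collections::HashSet;

/// What has to change to bring the renderer in line with the scene.
#[derive(Debug, Default, PartialEq)]
pub struct SyncDiff {
    pub to_add: Vec<u32>,
    pub to_remove: Vec<u32>,
}

/// Compares by ID membership instead of by list position.
pub fn diff_instances(existing: &[u32], desired: &[u32]) -> SyncDiff {
    let existing_set: HashSet<u32> = existing.iter().copied().collect();
    let desired_set: HashSet<u32> = desired.iter().copied().collect();
    SyncDiff {
        to_add: desired
            .iter()
            .copied()
            .filter(|id| !existing_set.contains(id))
            .collect(),
        to_remove: existing
            .iter()
            .copied()
            .filter(|id| !desired_set.contains(id))
            .collect(),
    }
}
```

With an index-aligned comparison, removing ID 2 from `[1, 2, 3, 4, 5]` would make every later position mismatch; with the set-based diff only ID 2 is reported.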

Plugins as dylibs

· 2 min read

Nothing new in the image below! Under the hood though, the code changed...

From

Build and run Rust plugins as processes.

Pass parameters in as a JSON string to the command-line. Receive output as YAML on stdout.

Simple but not the most efficient encoding and limited to one fixed output on stdout.

To

Build and run Rust plugins as dynamic libraries.

Pass parameters via JSON still, but plugin functions return structs as bitcode-encoded byte arrays.

Still fairly simple, bitcode seems (not that surprisingly) more efficient. The dynamic library can have multiple entry points for more flexibility.

alt text

It's somewhat interesting to note this is my third "try" at plugins already. Maybe I should plan more in advance? Maybe this is part of the learning experience?

#3 Rust dynamic libraries: once I had a better understanding of what I wanted to do, this seems like a more natural fit. Leaves room for more generic plugins, including ones that stay in memory. May want a process/"service"-based mode for added decoupling in the future, but for now, this is a very easy approach for single-file, run-once plugins.

#2 Rust processes: moved to this to get the type checking. As I'm still learning Rust, more or less dove straight into hacking this together just to get something to work. Stuck with the JSON/YAML exchange formats since it let me postpone worrying about them.

#1 Deno-based node plug-ins: had some old JS code for generating terrains. Thought it might be nice to allow the imprecision of JS and just rely on JSON/YAML formats for data exchange. Moved on when I quickly realized I wanted the type-checking of Rust, including on the interchange formats.

Terrain generator

· One min read

Added the ability to write Rust "scripts" (basically crates that the engine will run to get model output from) and added a voxel terrain with some basic procedural noise. I think of it as a "script" since it uses a blanket import, use scriptlib::*;, which brings in a lot of utilities to keep the code itself concise.

Also switched the WGPU to prefer linear RGB color until I progress a bit further and want to handle sRGB correctly.

alt text

Hot reloading architecture

· 8 min read

TL;DR: I can now modify my sinscape.js script file and when I do the sine-wave based voxel landscape will automatically refresh in the engine without a restart.

I wanted to add "hot reloading" to the engine so that changes to data files are automatically reflected in the running engine. This is one of those small developer-ergonomics changes that, over time, I believe has huge benefits to productivity.

The primary challenge was to architect this such that the engine internals remain clean: i.e.

  1. Avoid scattering knowledge of source file to asset mappings throughout the engine
  2. Avoid introducing complex inter-object references within the engine (makes for a Rust lifetime management headache)
  3. Minimal runtime impact in a release build
  4. Keep file watching code isolated and independent as it's a development feature, not an engine feature

I expect to have to revisit this as the engine functionality increases and as I learn more about how to use Rust more effectively. 😄

Heads up

This article does not go into full depth on some of the changes discussed. If you'd like more detail added to any section, let me know! I wanted to be sure there was an audience for this before going into any more depth.

Architecture

  1. Build a list of files -> asset ids during loading
  2. Add a dev-only Actor that watches for file changes
  3. Trigger a reload for any assets that have been marked dirty
  4. Do the reload

Build the dependency graph during scene loading (DependencyList)

Record the dependencies

As the loader opens files, it maintains a mapping of each file to the list of asset ids that file impacted. Building the "graph" is simple as long as two rules are followed:

  1. Record direct dependencies: whenever a file is opened, ensure any assets created by that file add an entry mapping that file -> asset id
  2. Record transitive dependencies: whenever an asset relies on data from another asset, copy all the dependencies from the existing asset to the newly created asset.

Example: when loading a .vox file, we simply add that file name as a dependency on the model that's going to use that vox data.

dependency_list.add_model_entry(vox_file.to_str().unwrap(), &desc.header.id);

let vox_data: vox_format::VoxData = vox_format::from_file(vox_file).unwrap();

We record the dependencies as IDs rather than object references as it's far cleaner for managing lifetimes.

For a simple scene, we end up with a list like the following

   1.4 INFO  --- Dependency list ---
[Model] mmm-house3
data/dist/base/models/mmm-house3/mmm-house3.yaml
data/dist/base/models/mmm-house3/obj_house3.vox
[Model] sinscape
data/dist/base/generators/sinscape.js
data/dist/base/models/sinscape.yaml
[Model] unit_cube
data/dist/base/models/unit_cube.yaml
[Scene] main
data/dist/base/scenes/main.yaml
[Instance] house-000
data/dist/base/models/mmm-house3/mmm-house3.yaml
data/dist/base/models/mmm-house3/obj_house3.vox
data/dist/base/scenes/main.yaml
[Scene] main-001
data/dist/base/scenes/main-001.yaml

Intrusive tracking

This is an "intrusive" approach: the bookkeeping of dependency tracking must be inlined directly into the loading logic and cannot be plugged in as an optional feature. This however feels fine as a design choice since the cost of building a mapping table is relatively low and it is conceptually simple.

The loading code expects each asset load to have 1 or more calls to methods such as the below. Thus, we want an interface that makes recording dependencies simple, hard-to-get-wrong, and ideally self-descriptive one-liners.

impl DependencyList {
// ...

// Direct dependencies
pub fn add_scene_entry(&mut self, file_path: &str, id: &str) { ... }
pub fn add_model_entry(&mut self, file_path: &str, id: &str) { ... }
pub fn add_instance_entry(&mut self, file_path: &str, id: &str) { ... }

// Transitive dependencies
pub fn copy_entries(&mut self,
src_type: EntityType, src_id: &str,
dst_type: EntityType, dst_id: &str) { ... }

// ...

Design choice: a list not a graph

Transitive dependencies copy dependencies which flattens the dependency graph. This makes it a dependency list. This is done for simplicity's sake, though has a small trade-off (continue reading for more on this).
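
To illustrate the flattening, here is a small self-contained sketch of such a list. It is an assumption about how it could be stored, not the actual implementation: the method bodies are guesses, `add_entry` is a hypothetical stand-in for the per-type `add_*_entry` methods, and `entries_for_file` is assumed to return (type, id) pairs.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EntityType {
    Scene,
    Model,
    Instance,
}

pub type AssetKey = (EntityType, String);

#[derive(Default)]
pub struct DependencyList {
    // asset -> every file it depends on, already flattened
    files_by_asset: BTreeMap<AssetKey, BTreeSet<String>>,
}

impl DependencyList {
    /// Direct dependency: `file_path` contributed to asset `id`.
    pub fn add_entry(&mut self, entity_type: EntityType, file_path: &str, id: &str) {
        self.files_by_asset
            .entry((entity_type, id.to_string()))
            .or_default()
            .insert(file_path.to_string());
    }

    /// Transitive dependency: the destination inherits all of the source's files.
    pub fn copy_entries(
        &mut self,
        src_type: EntityType,
        src_id: &str,
        dst_type: EntityType,
        dst_id: &str,
    ) {
        let src_files = self
            .files_by_asset
            .get(&(src_type, src_id.to_string()))
            .cloned()
            .unwrap_or_default();
        self.files_by_asset
            .entry((dst_type, dst_id.to_string()))
            .or_default()
            .extend(src_files);
    }

    /// All assets that need a reload when `file_path` changes.
    pub fn entries_for_file(&self, file_path: &str) -> Vec<AssetKey> {
        self.files_by_asset
            .iter()
            .filter(|(_, files)| files.contains(file_path))
            .map(|(key, _)| key.clone())
            .collect()
    }
}
```

Because the copy happens at build time, looking up a changed .vox file returns both the model and every instance that copied the model's entries, with no graph traversal at lookup time.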

The alternative

The alternative would be to record asset -> asset dependencies as well as file -> asset dependencies. This would add only a little more complexity as the flattening would happen at use, not build, time for the list -- but per the below this didn't seem worth doing at this stage. 🤷

Design choice: an immutable list after initialization

The architecture builds this list at initial load only. It is effectively treated as an immutable/static list after startup.

✅ The benefit is this is very simple to reason about. The dependency list requires no dynamic update logic.

🚫 The downside is changes such as file renames or inter-asset dependency modifications will cause the dependency list to go stale.

The trade-off seems good, as the unsupported cases are not the common case and the workaround is trivial (restart the engine).

Watch the files for changes (FileWatcher)

I wanted to keep file watching logic out of the core engine. From an architectural perspective this should be a "pluggable" feature that has as little effect as possible on the runtime in a release build.

  • The overhead of building the DependencyList during loading seems fine to always include in the build
  • The notion of a DirtyList also seems fine in a release build as it is rather isolated
  • However, the file watching code should not be in the core code.

This was solved by adding an Actor to the Engine. This approach is quite simple and encapsulates the file watching code quite nicely. The FileWatcher itself only depends on a file list and file -> id mapping table: it doesn't really need to understand much more than doing that mapping.

Pseudo-code

on init:
for each file in the dependency list
set up a file watcher

on every Nth frame update:
check if the file watcher has reported any changes
if no, return

for each modified file
look up the asset ids dependent on that file
update the engine's dirty_list with those asset ids

Rust code

Details

This is certainly not the "best" code, but was good enough to get things working. I'm still learning Rust, so feedback on improving this code is very welcome.

use crate::engine;
use log::info;
use std::{
collections::HashSet,
sync::{Arc, Mutex},
};

use notify::{Config, PollWatcher, RecursiveMode, Watcher};

pub struct FileWatcher {
watcher: PollWatcher,
dirty_list: Arc<Mutex<HashSet<String>>>,
}

impl FileWatcher {
pub fn new(file_list: Vec<String>) -> Self {
let (tx, rx) = std::sync::mpsc::channel();
// use the PollWatcher and disable automatic polling
let mut watcher = PollWatcher::new(tx, Config::default().with_manual_polling()).unwrap();

// Sort simply for display / debugging purposes
let mut file_list = file_list;
file_list.sort();

for f in file_list {
info!("Watching: {:?}", f);
watcher
.watch(f.as_ref(), RecursiveMode::NonRecursive)
.unwrap();
}

// run event receiver on a different thread, we want this one for user input
let dirty_list = Arc::new(Mutex::new(HashSet::new()));
{
let dirty_list = dirty_list.clone();
std::thread::spawn(move || {
for res in rx {
match res {
Ok(event) => {
let mut v = dirty_list.lock().unwrap();
for p in event.paths {
v.insert(p.to_str().unwrap().to_string());
}
}
Err(e) => println!("watch error: {:?}", e),
}
}
});
}
Self {
watcher,
dirty_list,
}
}
}

impl engine::Actor for FileWatcher {
fn update(&mut self, frame_state: &engine::FrameState) {
if frame_state.frame % 60 != 37 {
return;
}
self.watcher.poll().unwrap();

let mut v = self.dirty_list.lock().unwrap();
if v.len() == 0 {
return;
}

let values = v.drain();
for file in values {
info!("File changed: {:?}", file);
let entries = frame_state.dependency_list.entries_for_file(&file);
for e in entries {
frame_state.dirty_list.borrow_mut().push(e.clone());
}
}
}
}

Communicate what's changed (DirtyList)

The Engine maintains a simple DirtyList to be notified about changes.

I wanted to avoid a complex event system, callbacks, object references, or anything of that sort. So it simply has a list of asset ids that are currently considered "dirty."

The FileWatcher, on its own file events, simply adds asset ids to this list.

On each frame, the Engine checks if the dirty list is non-empty. If so, it provides the hot reloader with the list of asset ids to reload (and the original DependencyList to do the back-mapping to files it may need to reload). It clears the list after telling the hot reloader to do its work.
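
The per-frame hand-off can be sketched roughly as follows. This is an illustration with hypothetical types: the `DirtyList` here is a plain vector of id strings, and `end_of_frame` takes the reloader as a callback and leaves out the DependencyList argument for brevity.

```rust
/// Asset ids currently considered dirty (hypothetical minimal version).
#[derive(Default)]
pub struct DirtyList {
    ids: Vec<String>,
}

impl DirtyList {
    /// Marks an asset dirty; repeated marks within a frame are ignored.
    pub fn push(&mut self, id: &str) {
        if !self.ids.iter().any(|existing| existing == id) {
            self.ids.push(id.to_string());
        }
    }

    /// Returns the current ids and leaves the list empty.
    pub fn take(&mut self) -> Vec<String> {
        std::mem::take(&mut self.ids)
    }
}

/// Called once per frame: hands any dirty ids to the reloader, then the
/// list is empty again. Returns true if a reload was triggered.
pub fn end_of_frame(dirty: &mut DirtyList, mut reload: impl FnMut(&[String])) -> bool {
    let ids = dirty.take();
    if ids.is_empty() {
        return false;
    }
    reload(&ids);
    true
}
```

The appeal of this shape is that producers such as the FileWatcher only ever push ids, and the engine is the only place that reads and clears them.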

Reload the asset (HotReloader)

The HotReloader uses a brute-force implementation (this likely will need to be revisited in the future).

If anything needs to be reloaded, the hot reloader loads the entire scene from disk again. This has the advantage of being simple: it's a "clean slate" that uses the exact same logic as engine startup.

It then loops over all active entities in the engine and checks if they are in the dirty list. If they are, it copies in the relevant data from the freshly loaded scene over the current data -- thus refreshing the asset.
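
In code, the brute-force approach might look something like the sketch below. It assumes a deliberately simplified model where the active entities are a map from asset id to an opaque data string and the scene loader is passed in as a closure; `hot_reload` and its signature are hypothetical.

```rust
use std::collections::HashMap;

/// Reloads the whole scene, then overwrites only the dirty entities.
/// Returns the number of entities that were refreshed.
pub fn hot_reload(
    active: &mut HashMap<String, String>,
    dirty_ids: &[String],
    load_scene: impl FnOnce() -> HashMap<String, String>,
) -> usize {
    if dirty_ids.is_empty() {
        return 0;
    }

    // "Clean slate": the same full load that engine startup would do.
    let fresh = load_scene();

    let mut refreshed = 0;
    for (id, data) in active.iter_mut() {
        if dirty_ids.contains(id) {
            if let Some(new_data) = fresh.get(id) {
                *data = new_data.clone();
                refreshed += 1;
            }
        }
    }
    refreshed
}
```

Entities that are not in the dirty list keep their current data even if the freshly loaded scene differs, which matches the description above but also shows why a stale dependency list would cause missed updates.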