Filip Hráček / text /

Making my 1970’s-style renderer multi-threaded

The following article describes a Flutter use case that is so niche that I wouldn’t be surprised if I’m literally the only person who has it at this point.

Still, I personally love to read deeply technical articles even when their usefulness for me is unclear at best — so I decided to write this anyway.

Use case

My game is mostly 2D, but it includes a retro 3D renderer. From the start of working on the game, I wanted a specific look which I’m going to call “a combination of 1970s sci-fi aesthetic and modern military UI”.

For this aesthetic, a “normal”, modern 3D renderer just wouldn’t work. So I decided to create a software (non-GPU) 3D renderer mostly from scratch. This allowed me to have complete control over every aspect of the final look of the 3D objects, and since we live in the 21st century, our contemporary computers are more than capable of running the renderer at high framerates.

Single-threaded beginnings

As a rule, I try to build every prototype feature single-threaded first before adding the complexity of concurrency.

Since we’re talking about 1970s graphics running on 2020s computers, I was able to keep the renderer single-threaded, on the main thread, for something like 2 years.

Then again, a 3D renderer is a 3D renderer, so I didn't completely ignore performance during that time.

One cool thing about Flutter is that you get access to some low level drawing APIs, including Canvas.drawVertices. This method allows you to send a list of triangles (with various colors and/or textures) basically straight to the GPU. Perfect for my use case.

Pretty soon after implementing the initial 3D renderer, I addressed the fact that I was creating a new list of triangles every frame. That means a lot of memory allocation and garbage collection. So I went with the slightly more low-level method of creating lists in Dart: TypedData.

Dart classes like Float32List (which is a subtype of TypedData) allow you to allocate a contiguous block of memory. In this case, a contiguous array of 32-bit floating point numbers. In contrast to how you would do it normally (with List<double>), you and the compiler have an understanding that this is really a contiguous block of numbers: it can’t be split, it can’t be appended to (without copying), and it contains only plain values (no boxed values). Sometimes, this is exactly what you want. It’s fast, simple, and already in a format that other parts of the computer (e.g., the GPU) understand.
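To make the allocation pattern concrete, here is a minimal sketch of a vertex buffer that is allocated once and then overwritten every frame. The class name `TriangleBuffer` and its methods are hypothetical and are not taken from the game's code; only `Float32List` and `Float32List.sublistView` come from `dart:typed_data`.

```dart
import 'dart:typed_data';

/// Hypothetical helper: a fixed-capacity buffer of 2D triangle vertices
/// that is allocated once and then overwritten every frame.
class TriangleBuffer {
  TriangleBuffer(int maxTriangles)
      : _positions = Float32List(maxTriangles * 6); // 3 vertices * (x, y)

  final Float32List _positions;
  int _count = 0; // Number of triangles written in the current frame.

  /// Starts a new frame. Nothing is allocated or cleared.
  void reset() => _count = 0;

  /// Writes one triangle into the next free slot.
  void add(double x0, double y0, double x1, double y1, double x2, double y2) {
    if ((_count + 1) * 6 > _positions.length) {
      throw StateError('TriangleBuffer is full');
    }
    final i = _count * 6;
    _positions[i] = x0;
    _positions[i + 1] = y0;
    _positions[i + 2] = x1;
    _positions[i + 3] = y1;
    _positions[i + 4] = x2;
    _positions[i + 5] = y2;
    _count++;
  }

  /// A view (not a copy) of the floats written so far in this frame.
  Float32List get used => Float32List.sublistView(_positions, 0, _count * 6);
}
```

Each frame would call `reset()`, then `add(...)` per visible triangle. In a Flutter painter, the `used` view is the kind of list that could be handed to `Vertices.raw` before calling `Canvas.drawVertices`.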

So far, so good. You can hear me talk about drawVertices and typed data buffers in this talk:

So, now we have a software 3D renderer written in Dart (a high level GC'd language) that blits triangles to the screen while allocating almost no memory from frame to frame. For a modern PC, the performance is more than acceptable.

Road to multi-threading

But then I realize I would sometimes like to show many independent 3D models at once. And I’m also adding more and more game logic, including marching squares computations for multiple heightmaps on every frame. And I never wanted the game to be exclusive to modern machines: I want the game to be playable on potatoes.

I do some profiling and, on Windows (which I expect to be my main audience), the renderers take something like 20-30% of CPU time. Even with the optimizations above. That’s a lot for something that’s an aesthetic choice and not crucial to the game’s simulation. And remember, this is all still happening on the main thread (Canvas.drawVertices can’t be called from anywhere else).

So, it’s time to move the 3D renderer to a separate thread.

Shared memory in Dart

Dart’s concurrency model is based on isolates. The following is a gross simplification, but Dart isolates are threads that don’t share any mutable memory (i.e., they’re isolated). The feature is inspired by Erlang’s processes and by the web platform’s Web Workers.

The isolation is a fantastic feature because shared mutable memory is a major source of really-hard-to-debug bugs. If you don’t believe me, the next time you’re with a C++ or Java programmer, casually mention the word thread or mutex and see their eyes twitch.

With isolate-based concurrency, you don’t need to deal with deadlocks and concurrent modification bugs because your “threads” (isolates) can only communicate by sending each other messages. By default, these messages are deeply copied, so you can’t accidentally trample on another isolate’s collection, for example. You get your own copy.
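The copying behavior can be observed with a small experiment. This is a sketch using `Isolate.run` from `dart:isolate`; the function names `mutateInWorker` and `copyDemo` are made up for illustration.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Mutates [buffer] inside a short-lived worker isolate and returns the
/// value the worker saw. The closure's captured state is copied into the
/// new isolate, so the worker operates on its own copy of the buffer.
Future<double> mutateInWorker(Float32List buffer) {
  return Isolate.run(() {
    buffer[0] = 99; // Changes the worker's copy only.
    return buffer[0];
  });
}

/// Hypothetical demo: shows that the caller's buffer is left untouched.
Future<void> copyDemo() async {
  final buffer = Float32List(4); // Starts out as all zeros.
  final seenByWorker = await mutateInWorker(buffer);
  print('worker saw $seenByWorker, main still sees ${buffer[0]}');
}
```

Calling `copyDemo()` from `main` should show the worker's write (99.0) while the original buffer still holds 0.0: safe, but every hand-off costs a copy.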

This works great for many use cases.

But then there’s my use case. I could prepare the typed data buffers inside an isolate, but then by sending them to the main isolate, I either have to copy them, or I have to transfer their ownership. So I’d have to start anew for every new frame, allocating a new block of memory. That’s something I want to avoid.
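For comparison, here is roughly what the ownership-transfer route looks like with `TransferableTypedData` from `dart:isolate`. It is a sketch under assumptions: the names `frameWorker` and `renderFrameInWorker` are invented, and the "rendering" is a placeholder loop. Note the fresh `Float32List` on every call, which is exactly the per-frame allocation described above.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Worker entry point (hypothetical): builds one frame of data and hands
/// it back to the spawning isolate as a TransferableTypedData.
void frameWorker((SendPort, int) args) {
  final (replyPort, floatCount) = args;
  final frame = Float32List(floatCount); // Fresh allocation for every frame.
  for (var i = 0; i < frame.length; i++) {
    frame[i] = i.toDouble(); // Placeholder for real vertex data.
  }
  replyPort.send(TransferableTypedData.fromList([frame]));
}

/// Spawns a worker, waits for its single reply, and unwraps the data.
Future<Float32List> renderFrameInWorker(int floatCount) async {
  final receivePort = ReceivePort();
  await Isolate.spawn(frameWorker, (receivePort.sendPort, floatCount));
  // Taking the first message also closes the port.
  final transferable = await receivePort.first as TransferableTypedData;
  // materialize() may only be called once per transferable.
  return transferable.materialize().asFloat32List();
}
```

Nothing here is reused between frames: the worker's buffer and the materialized buffer are both new objects each time.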

FFI to the rescue

Thankfully, while Dart itself tries to keep you away from the foot-gun that is shared memory, it also has to talk to other systems that don’t have such luxury. Notably, the C foreign function interface (FFI) — a standard way by which different programming languages communicate with each other using the lowest common denominator of C function calling.

Without leaving Dart, you can allocate memory on the native heap and ask Dart to free it once you don’t need it anymore:

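import 'dart:ffi';
import 'dart:typed_data';
import 'package:ffi/ffi.dart';

const n = 40;
// Room for n 64-bit integers on the native (non-Dart) heap.
final pointer = malloc.allocate<Int64>(n * Int64List.bytesPerElement);
// A typed-list view of that memory, freed natively once the view is collected.
final array = pointer.asTypedList(n, finalizer: malloc.nativeFree);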

I learned this API from mraleph (Slava Egorov) on the Flutter Forum. Forums, btw, are treasure troves of information and deep technical discussions. (Full disclosure: I’m a mod on the aforementioned forum and it was kind of my idea to start it. But I happily give all credit to the person who actually started it, Hillel Coren of It’s All Widgets fame, and to all the other people who are active there.) Seriously, if you want to keep your sanity, I can highly recommend limiting your time on social media and instead joining a focused forum.

Anyway, now we have a way to share mutable arrays across isolate boundaries:
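A minimal sketch of that sharing, assuming `package:ffi` is available. The function names `fillInWorker` and `sharedMemoryDemo` are hypothetical. The key assumption is that only the integer address is sent to the worker (a `Pointer` object itself cannot be sent between isolates), and that the memory is freed explicitly in exactly one place instead of through a finalizer.

```dart
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// Writes [count] floats at the native [address] from a worker isolate.
/// Only two integers cross the isolate boundary; no buffer is copied.
Future<void> fillInWorker(int address, int count) {
  return Isolate.run(() {
    final view = Pointer<Float>.fromAddress(address).asTypedList(count);
    for (var i = 0; i < count; i++) {
      view[i] = i * 0.5;
    }
  });
}

/// Allocates native memory, lets a worker fill it, and reads it back.
Future<List<double>> sharedMemoryDemo() async {
  const count = 8;
  final pointer = malloc.allocate<Float>(count * Float32List.bytesPerElement);
  try {
    await fillInWorker(pointer.address, count);
    // Same native memory, now viewed from the calling isolate.
    // toList() copies the values out before the memory is freed.
    return pointer.asTypedList(count).toList();
  } finally {
    malloc.free(pointer); // Freed once, by the isolate that allocated it.
  }
}
```

Unlike the copying experiment with plain Dart lists, the values written by the worker are visible to the caller, because both views point at the same native bytes. Nothing stops both isolates from writing at the same time, so the coordination is entirely up to the programmer.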

Putting shared mutable memory to work

So, now that we have a way to punch a hole (as Slava puts it) into Dart’s concurrency isolation model, how do we actually make it work?

Here’s how:
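One possible arrangement, sketched here as an assumption rather than as the game's actual code: a long-lived worker isolate owns a view of the shared buffer, the main isolate sends it a frame number, and the worker replies once the buffer is ready to be read. All names (`renderWorker`, `runFrames`, `floatsPerFrame`) and the message protocol are invented for this illustration, and `fillRange` stands in for real rendering.

```dart
import 'dart:async';
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

const floatsPerFrame = 6; // One 2D triangle: 3 vertices * (x, y).

/// Worker entry point: for each frame number received, writes into the
/// shared buffer and replies with the same number. `null` means shut down.
void renderWorker((SendPort, int) args) {
  final (toMain, address) = args;
  final shared =
      Pointer<Float>.fromAddress(address).asTypedList(floatsPerFrame);
  final commands = ReceivePort();
  toMain.send(commands.sendPort); // Tell main how to reach this worker.
  commands.listen((message) {
    if (message == null) {
      commands.close(); // No open ports left, so the isolate can exit.
      return;
    }
    final frame = message as int;
    shared.fillRange(0, floatsPerFrame, frame.toDouble());
    toMain.send(frame); // "The buffer is ready to be read."
  });
}

/// Runs [frames] request/reply rounds and records the first float of the
/// shared buffer after each one.
Future<List<double>> runFrames(int frames) async {
  final pointer =
      malloc.allocate<Float>(floatsPerFrame * Float32List.bytesPerElement);
  final fromWorker = ReceivePort();
  final events = StreamIterator<dynamic>(fromWorker);
  await Isolate.spawn(renderWorker, (fromWorker.sendPort, pointer.address));
  await events.moveNext();
  final toWorker = events.current as SendPort;

  final view = pointer.asTypedList(floatsPerFrame);
  final seen = <double>[];
  for (var frame = 1; frame <= frames; frame++) {
    toWorker.send(frame);
    await events.moveNext(); // Wait for the worker's reply.
    seen.add(view[0]); // The worker is idle here, so reading is safe.
  }

  toWorker.send(null); // The worker does not touch the buffer after this.
  await events.cancel(); // Cancelling the subscription closes the port.
  malloc.free(pointer);
  return seen;
}
```

In this sketch the main isolate simply waits for each reply. A real game loop would more plausibly keep doing other work and draw whatever the buffer last held, possibly alternating between two buffers so that one can be read while the other is being written.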

The result is noticeable, even when running on a very powerful device. On my M4 MacBook Pro, when showing three 3D renders at once, average time taken by a frame on the main thread goes from 3.7 ms to 2.9 ms. That’s roughly a 20% improvement on a thread that currently does a lot of other things (from physics simulation through marching squares all the way to AI).

Basically, the main thread’s cost of displaying all these 3D models went from 20% of its overall CPU time to close to zero.

The future of mutable shared state in Dart

The Dart team is experimenting with more direct support for shared mutable memory. The proposal is written by the aforementioned Slava Egorov (the Dart Tech Lead), and discusses things that go beyond simple arrays. For my current use cases, that would probably be overkill, but I’m very happy that the Dart team thinks about supporting advanced (and scary and unsafe) programming. It’s the only way Dart can truly become a powerhouse general-purpose programming language. Give people sane, safe tools to work with, but also allow them to bring out a chainsaw if they must.

Was AI useful here?

It’s late 2025, and so I feel the need to address the obvious question of “was any of this code written by an A.I.?” I initially hoped so, because it seemed like a good fit. The code is already there, just put it into an isolate and come up with a message passing scheme.

But the reality was quite dismal. Even when I asked it to simply implement some standard Isolate boilerplate based off the shared memory example I'd already written, it introduced a bug. After that, I got so nervous (the bug was not easy to spot) that I just implemented the whole thing myself. It took me a day, and some of it was spent chasing my own bugs (freeing memory twice, a classic) but most of it was iteratively crafting the code to my liking.

As a rule, I’m trying to force myself to use A.I.: to get out of my comfort zone, and to reap the benefits of this new technology. But increasingly, I fear that my opinion on LLMs is starting to set, and I see them not as independent coding agents I can trust, but more like fast code generators. There’s a huge gap between what current LLMs can dependably do, and the CHOP vision of “I’ll be a tech lead and the AIs will be my team”. Maybe this works in some scenarios, but I just haven’t been able to make it work.

Where current AI does help is, in my opinion:

This is already a huge help, don’t get me wrong. It’s just that I currently don’t see an easy path from here to “AI agent can do expert level programming on my behalf”.

Future work

I’m obviously not done with the game or its optimization. Choosing Flutter & Dart to implement a simulation-heavy real-time game has its upsides (like ease of development; portability) but also its downsides (something like C++ still slays in terms of performance, especially compared to GC'd languages like Dart; Unity & Godot come with so many more bells and whistles). I chose the trade-off knowingly: it makes perfect sense for me (a Flutter expert who actually enjoys building games solo from a text editor) and for the project at hand (a game full of complex interlocking systems and experimental UI). I’m far from suggesting that it makes sense for anyone else.

So this post isn’t some kind of a “here’s how you do it” explainer. It’s more of a “look at this obscure problem I had” kind of article.

If you do find any of the above useful or at least entertaining, I’m happy.

— Filip Hráček
November 2025