Filip Hráček / text /
The following article describes a Flutter use case that is so niche that I wouldn’t be surprised if I’m literally the only person who has it at this point.
Still, I personally love to read deeply technical articles even when their usefulness for me is unclear at best — so I decided to write this anyway.
My game is mostly 2D, but it includes a retro 3D renderer. From the start of working on the game, I wanted a specific look which I’m going to call “a combination of 1970s sci-fi aesthetic and modern military UI”.
For this aesthetic, a “normal”, modern 3D renderer just wouldn’t work. So I decided to create a software (non-GPU) 3D renderer mostly from scratch. This allowed me to have complete control over every aspect of the final look of the 3D objects, and since we live in the 21st century, our contemporary computers are more than capable of running the renderer at high framerates.
As a rule, I try to build every prototype feature single threaded first before adding the complexity of concurrency.
Since we’re talking about 1970s graphics running on 2020s computers, I was able to keep the renderer single-threaded, on the main thread, for something like 2 years.
Then again, a 3d renderer is a 3d renderer, so I didn't completely ignore performance during that time.
One cool thing about Flutter is that you get access to some low level drawing APIs, including Canvas.drawVertices. This method allows you to send a list of triangles (with various colors and/or textures) basically straight to the GPU. Perfect for my use case.
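As a rough illustration of what such a call can look like, here is a minimal sketch. The class name `TrianglePainter` and the hard-coded triangle are made up for this example and are not taken from the game's code.

```dart
import 'dart:typed_data';
import 'dart:ui' as ui;

import 'package:flutter/widgets.dart';

/// Hypothetical painter that draws a single flat-colored triangle.
class TrianglePainter extends CustomPainter {
  // Two floats (x, y) per vertex, three vertices per triangle.
  final Float32List positions = Float32List.fromList([
    10, 10, // first corner
    110, 10, // second corner
    60, 90, // third corner
  ]);

  // One 32-bit ARGB color per vertex.
  final Int32List colors = Int32List.fromList([
    0xFF00FF66,
    0xFF00FF66,
    0xFF00FF66,
  ]);

  @override
  void paint(Canvas canvas, Size size) {
    final vertices = ui.Vertices.raw(
      ui.VertexMode.triangles,
      positions,
      colors: colors,
    );
    // BlendMode.dst keeps the per-vertex colors and ignores the paint color.
    canvas.drawVertices(vertices, BlendMode.dst, Paint());
  }

  @override
  bool shouldRepaint(TrianglePainter oldDelegate) => false;
}
```

The painter can be handed to a `CustomPaint` widget. Both lists are plain typed data, which is what makes the later buffer-reuse tricks possible.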
Pretty soon after implementing the initial 3D renderer, I addressed the fact that I was creating a new list of triangles every frame. That means a lot of memory allocation and garbage collection. So I went with the slightly more low-level method of creating lists in Dart: TypedData.
Dart classes like Float32List (which is a subclass of TypedData) allow you to allocate a contiguous block of memory. In this case, a contiguous array of 32-bit floating point numbers. In contrast to how you would do it normally (with List<double>), you and the compiler have an understanding that this is really a contiguous block of numbers: it can’t be split, it can’t be appended to (without copying), and it contains only plain values (no boxed values). Sometimes, this is exactly what you want: it’s fast, simple, and already in a format that other parts of the computer (e.g., the GPU) understand.
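To make the "allocate once, overwrite every frame" idea concrete, here is a minimal sketch. `VertexBuffer` and its members are hypothetical names for this example, not the renderer's actual classes.

```dart
import 'dart:typed_data';

/// Hypothetical fixed-capacity buffer of 2D vertex positions.
class VertexBuffer {
  VertexBuffer(int maxVertices) : positions = Float32List(maxVertices * 2);

  /// One contiguous block laid out as x0, y0, x1, y1, ...
  final Float32List positions;

  /// Number of floats written during the current frame.
  int length = 0;

  /// Starts a new frame; the underlying memory is kept and reused.
  void reset() => length = 0;

  /// Writes one vertex into the next free slots.
  void addVertex(double x, double y) {
    if (length + 2 > positions.length) {
      throw StateError('VertexBuffer is full');
    }
    positions[length++] = x;
    positions[length++] = y;
  }

  /// A view of the used part of the buffer. It shares memory with
  /// [positions]; only a small view object is created, not a copy.
  Float32List get used => Float32List.sublistView(positions, 0, length);
}
```

Each frame would call `reset()`, then `addVertex()` for every visible vertex, and pass `used` on to the drawing code. The capacity has to be chosen up front, which is the price for avoiding per-frame allocation.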
So far, so good. You can hear me talk about drawVertices and typed data buffers in this talk:
So, now we have a software 3d renderer written in Dart (a high level GC'd language) that blits triangles to the screen while allocating almost no memory from frame to frame. For a modern PC, the performance is more than acceptable.
But then I realize I would sometimes like to show many independent 3D models at once. And I’m also adding more and more game logic, including marching squares computations for multiple heightmaps on every frame. And I never wanted the game to be exclusive to modern machines: I want the game to be playable on potatoes.
I do some profiling and, on Windows (where I expect most of my audience to be), the renderers take something like 20-30% of CPU time. Even with the optimizations above. That’s a lot for something that’s an aesthetic choice and not crucial to the game’s simulation. And remember, this is all still happening on the main thread (Canvas.drawVertices can’t be called from anywhere else).
So, it’s time to move the 3D renderer to a separate thread.
Dart’s concurrency model is based on isolates. The following is a gross simplification, but Dart isolates are threads that don’t share any mutable memory (i.e., they’re isolated). The feature is inspired by Erlang’s processes and by the web platform’s Web Workers.
The isolation is a fantastic feature because shared mutable memory is a major source of really-hard-to-debug bugs. If you don’t believe me, the next time you’re with a C++ or Java programmer, casually mention the word thread or mutex and see their eyes twitch.
With isolate-based concurrency, you don’t need to deal with deadlocks and concurrent modification bugs because your “threads” (isolates) can only communicate by sending each other messages. These messages are always deeply copied, so you can’t accidentally trample on another isolate’s collection, for example. You get your own copy.
This works great for many use cases.
- … String, parse it, find what you need, and send the information back packaged as a nice Dart object.
- … TransferableTypedData to basically transfer the ownership of the resulting bytes, with no copying involved.
But then there’s my use case. I could prepare the typed data buffers inside an isolate, but then by sending them to the main isolate, I either have to copy them, or I have to transfer their ownership. So I’d have to start anew for every new frame, allocating a new block of memory. That’s something I want to avoid.
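The two options described above (copying and transferring ownership) can be sketched with nothing but `dart:isolate`. The function names `appendInWorker` and `bytesFromWorker` are invented for this example.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Appends a value to [numbers] inside a short-lived worker isolate and
/// returns the worker's version of the list.
Future<List<int>> appendInWorker(List<int> numbers) {
  // The closure and the list it captures are copied into the new isolate,
  // so the worker mutates its own copy, not the caller's list.
  return Isolate.run(() {
    numbers.add(99);
    return numbers;
  });
}

/// Builds [count] bytes in a worker and hands them over as
/// TransferableTypedData, which the receiver turns back into a list.
Future<Uint8List> bytesFromWorker(int count) async {
  final transferable = await Isolate.run(() {
    final bytes = Uint8List(count)..fillRange(0, count, 7);
    return TransferableTypedData.fromList([bytes]);
  });
  // materialize() can only be called once; after that the bytes live here.
  return transferable.materialize().asUint8List();
}
```

Calling `appendInWorker` with a three-element list leaves the caller's list at three elements while the returned list has four. In both cases the receiving side ends up with a fresh object every time, which is exactly what a per-frame render buffer should avoid.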
Thankfully, while Dart itself tries to keep you away from the foot-gun that is shared memory, it also has to talk to other systems that don’t have such luxury. Notably, the C foreign function interface (FFI) — a standard way by which different programming languages communicate with each other using the lowest common denominator of C function calling.
Without leaving Dart, you can allocate memory on the native heap and ask Dart to free it once you don’t need it anymore:
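import 'dart:ffi';
import 'dart:typed_data';
import 'package:ffi/ffi.dart';
const n = 40;
// Allocate room for n 64-bit integers on the native heap (the size is in bytes).
final pointer = malloc.allocate<Int64>(n * Int64List.bytesPerElement);
// View the block as an Int64List; the finalizer frees it once the list is unreachable.
final array = pointer.asTypedList(n, finalizer: malloc.nativeFree);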
I learned this API from mraleph (Slava Egorov) on the Flutter Forum. Forums, btw, are treasure troves of information and deep technical discussions. (Full disclosure: I’m a mod on the aforementioned forum and it was kind of my idea to start it. But I happily give all credit to the person who actually started it, Hillel Coren of It’s All Widgets fame, and to all the other people who are active there.) Seriously, if you want to keep your sanity, I can highly recommend limiting your time on social media and instead joining a focused forum.
Anyway, now we have a way to share mutable arrays across isolate boundaries:
- The array object above is of type Int64List. Except it’s backed by memory outside the Dart heap.
- As long as array is reachable, the memory will stay allocated. Afterwards, Dart calls malloc.nativeFree to free the block of memory. (This means that you probably want to assign the array to a field of some long-living object; otherwise the memory will get freed as soon as array is no longer in scope.)
- You can send pointer to another isolate. The pointer is just a memory address, so copying it while sending is basically free.
- The other isolate calls pointer.asTypedList(n) (without the finalizer) to get its own buffer, backed by the same block of memory.
- You also send n to the other isolate, so it knows how large the buffer is.
So, now that we have a way to punch a hole (as Slava puts it) into Dart’s concurrency isolation model, how do we actually make it work?
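Put together, the points above could look roughly like the following sketch. The function names are hypothetical. As a cautious choice, the sketch sends the raw integer address (`pointer.address`) and rebuilds the pointer with `Pointer.fromAddress`, so it does not rely on `Pointer` objects themselves being sendable.

```dart
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// Spawns a short-lived worker isolate that wraps the native block at
/// [address] and writes into it. Both arguments are plain integers.
Future<void> fillInWorker(int address, int length) {
  return Isolate.run(() {
    // No finalizer here: the worker only borrows the memory.
    final shared = Pointer<Int64>.fromAddress(address).asTypedList(length);
    for (var i = 0; i < length; i++) {
      shared[i] = i * 10;
    }
  });
}

/// Allocates a native block, lets a worker fill it, and returns the
/// owning view so the caller can read what the worker wrote.
Future<Int64List> allocateAndFill(int n) async {
  final pointer = malloc.allocate<Int64>(n * Int64List.bytesPerElement);
  // Only this view carries the finalizer, so the block is freed once.
  final array = pointer.asTypedList(n, finalizer: malloc.nativeFree);
  await fillInWorker(pointer.address, n);
  return array;
}
```

Nothing in the language stops the two isolates from writing to the block at the same time, so any real use needs its own protocol for who may touch the memory when. The worker must also stop using its view before the owning list becomes unreachable.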
Here’s how:
1. When the widget (Retro3D) is added to the Flutter widget tree, it first loads the bytes of the 3D file from assets. (AssetBundle is not available outside the main isolate, AFAIK.)
2. It then creates an IsolateRenderer and asks it to initialize() with the file bytes and some initial, immutable information about the scene.
3. IsolateRenderer spawns the worker isolate and sends the data over with the initial message. It then starts waiting for messages from the worker isolate.
4. The worker isolate works out how large its buffers need to be. (For n vertices, you need a Float32List of n*2 elements: two coordinates, x and y, for each vertex. Similarly, for m triangles, you need an Int32List of m*3 elements: one ARGB color for each point on the triangle.)
5. The worker isolate sends a _BufferAllocationRequest message to the main isolate with the sizes it needs.
6. IsolateRenderer allocates the memory using malloc.allocate() and creates two sets of buffers, #1 and #2. It saves these buffers into a field (IsolateRenderer._renderBuffers) so that they aren’t freed prematurely. It then sends the pointers back to the worker isolate as a _UseTheseSharedBuffers message.
7. The worker isolate creates its own TypedData objects using the pointers it received.
8. It then sends an _IsolateIsReady message to the main isolate.
9. IsolateRenderer receives this message and completes its initialization.
10. The Retro3D widget listens to changes to a ChangeNotifier called SceneViewConfig, and every time there’s any change (e.g., moved camera, changed zoom), it calls a method on IsolateRenderer called requestNextFrame().
11. requestNextFrame() uses a CancelableOperation to debounce the calls, so that we don’t accidentally ask for five consecutive renders in quick succession (just because some other part of the code changed 5 different things about the SceneViewConfig, one after the other).
12. requestNextFrame() sends a RenderConfig message to the worker isolate. The RenderConfig object includes things like the camera position, zoom, but also the current viewport size and the RenderConfig.id (which exists mostly for debugging, to link requests to later renders).
13. The worker isolate receives the RenderConfig message and renders the scene into one of the shared buffers.
14. It then notifies the main isolate with a _RenderReady message. This message only contains the index of the used buffer (#1 or #2), which is a single integer. So there’s no copying of large amounts of data, nor any kind of re-allocation. The message also contains some additional data, such as the current polygon count (some polygons might be hidden and therefore not present in the render mesh) and the projection matrix (so that the main thread can compute where to put labels on the render).
15. IsolateRenderer receives the _RenderReady message and transforms it into a RenderResult which contains all the data needed for a Canvas.drawVertices() call. So, the index of the buffer received from the worker isolate is used to find the actual TypedData objects.
16. The RenderResult is assigned as the new value of a ValueNotifier.
17. That ValueNotifier is the repaint listenable for the CustomPainter that actually paints the 3D render on the screen.
In short, a change to SceneViewConfig leads to a requestNextFrame() call, which in turn sends a RenderConfig to the worker isolate, which renders it into one of the shared buffers, then notifies the main isolate, which repaints using the new data.
The result is noticeable, even when running on a very powerful device. On my M4 MacBook Pro, when showing three 3D renders at once, average time taken by a frame on the main thread goes from 3.7 ms to 2.9 ms. That’s roughly a 20% improvement on a thread that currently does a lot of other things (from physics simulation through marching squares all the way to AI).
Basically, the main thread’s cost of displaying all these 3D models went from 20% of its overall CPU time to close to zero.
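The double-buffered handshake from the steps above can be boiled down to a small sketch. This is a simplified stand-in, not the game's code: the message classes drop the leading underscore and most fields, there is only one kind of buffer, a plain integer plays the role of RenderConfig, and "rendering" just fills the buffer with that integer.

```dart
import 'dart:async';
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// Hypothetical setup message: where the two shared buffers live.
class UseTheseSharedBuffers {
  const UseTheseSharedBuffers(this.addresses, this.floatCount, this.replyTo);

  final List<int> addresses; // one native address per buffer
  final int floatCount; // length of each buffer, in floats
  final SendPort replyTo;
}

/// Hypothetical reply: the worker finished writing into buffer [index].
class RenderReady {
  const RenderReady(this.index, this.requestId);

  final int index;
  final int requestId;
}

/// Worker entry point. It wraps the shared memory once, then alternates
/// between the two buffers so it never writes into the one it just
/// handed over.
void workerMain(UseTheseSharedBuffers setup) {
  final buffers = [
    for (final address in setup.addresses)
      // No finalizer here: the main isolate owns the memory.
      Pointer<Float>.fromAddress(address).asTypedList(setup.floatCount),
  ];
  final requests = ReceivePort();
  setup.replyTo.send(requests.sendPort);
  var next = 0;
  requests.listen((message) {
    if (message is! int) {
      requests.close(); // any non-int message means "stop"
      return;
    }
    // Stand-in for real rendering: fill the buffer with the request id.
    buffers[next].fillRange(0, setup.floatCount, message.toDouble());
    setup.replyTo.send(RenderReady(next, message));
    next = 1 - next;
  });
}

/// Main-isolate side: requests [frames] renders and returns the first
/// float read from the shared buffer after each one.
Future<List<double>> renderFrames(int frames, {int floatCount = 6}) async {
  final pointers = [
    for (var i = 0; i < 2; i++)
      malloc.allocate<Float>(floatCount * Float32List.bytesPerElement),
  ];
  // Only these views carry the finalizer, so each block is freed once.
  final buffers = [
    for (final pointer in pointers)
      pointer.asTypedList(floatCount, finalizer: malloc.nativeFree),
  ];
  final replies = ReceivePort();
  final inbox = StreamIterator<dynamic>(replies);
  await Isolate.spawn(
    workerMain,
    UseTheseSharedBuffers(
      [for (final pointer in pointers) pointer.address],
      floatCount,
      replies.sendPort,
    ),
  );
  await inbox.moveNext();
  final toWorker = inbox.current as SendPort;

  final firstFloats = <double>[];
  for (var id = 1; id <= frames; id++) {
    toWorker.send(id);
    await inbox.moveNext();
    final ready = inbox.current as RenderReady;
    // Only a small message crossed the isolate boundary, not the floats.
    firstFloats.add(buffers[ready.index][0]);
  }
  toWorker.send(null); // ask the worker to stop listening
  await inbox.cancel();
  replies.close();
  return firstFloats;
}
```

The sketch waits for each reply before sending the next request, which keeps reader and writer off the same buffer. A real renderer that keeps painting from one buffer while the next frame is being produced relies on the alternation between buffers for the same guarantee.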
The Dart team is experimenting with a more direct support for shared mutable memory. The proposal is written by the aforementioned Slava Egorov (the Dart Tech Lead), and discusses things that go beyond simple arrays. For my current use cases, that would probably be overkill, but I’m very happy that the Dart team thinks about supporting advanced (and scary and unsafe) programming. It’s the only way Dart can truly become a powerhouse general-purpose programming language. Give people sane, safe tools to work with, but also allow them to bring out a chainsaw if they must.
It’s late 2025, and so I feel the need to address the obvious question of “was any of this code written by an A.I.”? I initially hoped so, because it seemed like a good fit. The code is already there, just put it into an isolate and come up with a message passing scheme.
But the reality was quite dismal. Even when I asked it to simply implement some standard Isolate boilerplate based off the shared memory example I've already written, it introduced a bug. After that, I got so nervous (the bug was not easy to spot) that I just implemented the whole thing myself. It took me a day, and some of it was spent chasing my own bugs (freeing memory twice, a classic) but most of it was iteratively crafting the code to my liking.
As a rule, I’m trying to force myself to use A.I.: to get out of my comfort zone, and to reap the benefits of this new technology. But increasingly, I fear that my opinion on LLMs is starting to set, and I see them not as independent coding agents I can trust, but more like fast code generators. There’s a huge gap between what current LLMs can dependably do, and the CHOP vision of “I’ll be a tech lead and the AIs will be my team”. Maybe this works in some scenarios, but I just haven’t been able to make it work.
Where current AI does help is, in my opinion:
- … <code> blocks)
This is already a huge help, don’t get me wrong. It’s just that I currently don’t see an easy path from here to “AI agent can do expert-level programming on my behalf”.
I’m obviously not done with the game or its optimization. Choosing Flutter & Dart to implement a simulation-heavy real-time game has its upsides (like ease of development; portability) but also its downsides (something like C++ still slays in terms of performance, especially compared to GC'd languages like Dart; Unity & Godot come with so many more bells and whistles). I chose the trade-off knowingly: it makes perfect sense for me (a Flutter expert who actually enjoys building games solo from a text editor) and for the project at hand (a game full of complex interlocking systems and experimental UI). I’m far from suggesting that it makes sense for anyone else.
So this post isn’t some kind of a “here’s how you do it” explainer. It’s more of a “look at this obscure problem I had” kind of article.
If you do find any of the above useful or at least entertaining, I’m happy.
— Filip Hráček
November 2025