C++ game engine built to explore high-performance architecture.
Currently under active development, it serves as both a learning platform and a research project.
Or it might just be a playground to test my sanity.
Important
My original Bachelor's Thesis version is archived in the thesis branch.
Honestly? I just really love this stuff.
It started with my Bachelor's Thesis. I built a "functional C++ engine" from scratch. It was single-threaded and used a very basic Vulkan implementation (no bindless resources, no complex concurrency), just a simple loop. It worked, and I had a blast building it!
Then I watched Christian Gyrling’s GDC talk on Parallelizing the Naughty Dog Engine Using Fibers.
It blew my mind. Seeing how they saturated every single CPU core made me realize that my "simple loop" was basically running with the parking brake on. I really wanted to understand the raw, complex machinery behind AAA engines.
So, I scrapped my old architecture and started Luth from scratch.
This project is my personal deep dive into fiber-based job systems, lock-free memory models, bindless Vulkan rendering... It is absolutely over-engineered for a solo project, but that’s the point. I am building this to learn, to fail, and to obsess over the details that turn good code into high-performance architecture.
Luth moves away from standard C++ patterns (RAII everywhere, heavy STL usage, single-threaded contexts) in favor of Data-Oriented Design and Fiber-Based Concurrency.
Instead of spawning OS threads for specific tasks (like a "Render Thread" or "Audio Thread"), Luth treats the CPU as a generic pool of workers.
- N:M Threading: The engine spawns one Worker Thread per CPU core. Logical tasks are wrapped in Fibers (lightweight user-mode stacks) that can migrate between workers.
- Zero Blocking: The engine is designed to never sleep. If a job needs to wait for a dependency (or the GPU), it yields execution to the scheduler, which immediately swaps in another fiber. This keeps CPU saturation near 100%.
- Naughty Dog Inspiration: The scheduler implementation utilizes Adaptive Mutexes (spinning briefly before yielding) and Atomic Counters for synchronization to avoid priority inversion.
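As a rough illustration of the "spin briefly, then yield" idea above, here is a minimal sketch of an adaptive mutex. It is not Luth's implementation: the class layout, the `spinLimit` parameter, and the use of loop iterations as a stand-in for CPU cycles are all assumptions, and `std::this_thread::yield()` stands in for the fiber yield that the real scheduler would perform.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sketch of an adaptive mutex: spin for a bounded number of
// attempts, then give up the CPU instead of blocking in the OS.
class AdaptiveMutex {
public:
    explicit AdaptiveMutex(int spinLimit = 2000) : spinLimit_(spinLimit) {
        assert(spinLimit > 0);
    }

    void lock() {
        int spins = 0;
        // exchange() returns the previous value: false means we took the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            if (++spins >= spinLimit_) {
                // Assumption: a fiber scheduler would switch fibers here.
                std::this_thread::yield();
                spins = 0;
            }
        }
    }

    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
    int spinLimit_;
};
```

Because it exposes `lock`, `try_lock`, and `unlock`, the sketch can be used with `std::lock_guard` like any other lockable type.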
To hide latency, the engine pipelines execution across three distinct stages running in parallel. At any given moment:
- Frame N (Game Logic): Physics, AI, and Transform updates run on the CPU.
- Frame N-1 (Render Logic): The results of the previous game frame are read to record Vulkan Command Buffers in parallel.
- Frame N-2 (GPU execution): The GPU executes the commands submitted for the frame prior.
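The three stages above can be sketched as a ring of three per-frame slots, so that the game, render, and GPU stages each touch a different slot at any moment. This is an illustrative sketch only: the `FrameParams` fields, `FrameRing`, and the slot helper names are assumptions, not the engine's actual types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-frame data; the real struct would carry more fields.
struct FrameParams {
    std::uint64_t frameIndex = 0;
    float deltaTime = 0.0f;
};

constexpr std::size_t kFramesInFlight = 3;

// Slot used by each stage while the game is simulating frame n.
// Adding kFramesInFlight before subtracting avoids unsigned underflow.
constexpr std::size_t GameSlot(std::uint64_t n) {
    return static_cast<std::size_t>(n % kFramesInFlight);
}
constexpr std::size_t RenderSlot(std::uint64_t n) {  // frame n-1
    return static_cast<std::size_t>((n + kFramesInFlight - 1) % kFramesInFlight);
}
constexpr std::size_t GpuSlot(std::uint64_t n) {     // frame n-2
    return static_cast<std::size_t>((n + kFramesInFlight - 2) % kFramesInFlight);
}

struct FrameRing {
    std::array<FrameParams, kFramesInFlight> slots{};

    // Overwrites the slot last used three frames ago with frame n's data.
    FrameParams& BeginGameFrame(std::uint64_t n, float dt) {
        FrameParams& params = slots[GameSlot(n)];
        params.frameIndex = n;
        params.deltaTime = dt;
        return params;
    }
};
```

With three slots, the slot being written by the game stage is never the one the render or GPU stage is reading, provided the GPU has finished frame N-3 before frame N begins.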
Standard new/delete calls are forbidden in the hot path. Luth uses a strict memory hierarchy to handle the complexity of fiber migration:
- Tagged Page Allocator: A Naughty Dog-style allocator using 2MB virtual pages. Memory is allocated with a specific "Tag" (e.g., `LevelGeometry`, `Frame_N`) and freed in bulk. It uses per-thread caches to allow lock-free allocations during gameplay.
- Frame Packets: Linear allocators that reset every frame. Used for transient data like command lists or UI state. This eliminates destructor overhead for 90% of runtime objects.
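To make the "Frame Packets" idea concrete, here is a minimal sketch of a linear allocator that hands out memory by bumping an offset and reclaims everything with a single reset. The class name `FrameAllocator` and its interface are assumptions for illustration. It is single-threaded and never runs destructors, so it only suits trivially destructible data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical linear (bump) allocator that is reset once per frame.
class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns nullptr when the frame budget is exhausted.
    // `alignment` must be a power of two.
    void* Allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        // Round the current address up to the requested alignment.
        const std::uintptr_t aligned =
            (current + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > buffer_.size()) return nullptr;
        offset_ = newOffset;
        return reinterpret_cast<void*>(aligned);
    }

    // Frees every allocation at once; no destructors are run.
    void Reset() { offset_ = 0; }

    std::size_t Used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Resetting is a single store, which is where the saving over per-object `delete` comes from.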
The renderer is built for modern hardware, focusing on reducing driver overhead.
- Bindless Descriptors: Uses `VK_EXT_descriptor_indexing` to bind all engine textures to a single global array (Set 0). Materials simply store an integer index, allowing any draw call to access any texture without rebinding sets.
- Dynamic Rendering: Eliminates legacy `VkRenderPass` and `VkFramebuffer` objects.
- Timeline Semaphores: Replaces `vkWaitForFences`. A dedicated Poller Job runs on the CPU, querying semaphore values and waking up dependent fibers only when the GPU has finished a specific workload.
Prerequisites:
- OS: Windows 10/11
- Compiler: MSVC (v143+) or Clang (C++20 compliant)
- SDK: Vulkan SDK 1.3+
Steps:
- Clone the repository + submodules
git clone --recursive https://github.com/Hekbas/Luth.git
- Generate Project Files:
Run the setup script, which invokes Premake:
scripts/setup_windows.bat
- Build:
Open the generated solution `Luth.sln` and build the project.
LUTH Engine is built on the shoulders of giants:
- Vulkan SDK: The core rendering backend.
- EnTT: Fast Entity-Component-System (ECS).
- ImGui: Immediate Mode GUI for the Editor.
- Jolt Physics: (Planned) High-performance rigid body physics.
- Tracy: Real-time remote frame profiling.
- SPIRV-Cross: Shader reflection and cross-compilation.
- GLFW: Windowing and Input management.
- GLM: Mathematics library.
- spdlog: Fast C++ logging.
- assimp: Asset importing (Models).
- stb_image: Image loading.
- nlohmann/json: JSON serialization.
Warning
The engine is currently undergoing a massive architectural reboot (Phase 5) to fully implement the Fiber/Vulkan integration described above.
Core platform abstraction and initial rendering capability.
- Platform Layer:
- Windowing: GLFW integration with window abstraction.
- Input System: Centralized event bus and polling (Mouse/Keyboard/Controller).
- Vulkan Bootstrap: Initial Instance/Device creation and VMA (Vulkan Memory Allocator) integration.
- ImGui Integration:
- Viewports & Docking: Multi-window support enabling a modern editor layout.
- Vulkan Hooks: Custom render backend for ImGui draw lists.
Foundation of the Job System (Pre-Reboot).
- Fiber Primitives:
- Task Scheduler: Fiber-based execution model using assembly context switching.
- Synchronization: Basic counters and wait primitives.
- Async Asset Loading: Job-based loading pipeline to offload disk I/O from the main thread.
Robust, metadata-driven asset management system.
- Artifact System:
- Asset Database: Integrity scanning, orphan cleanup, and UUID registry.
- Library Cache: "Source vs. Artifact" separation. Importers compile raw assets (PNG, OBJ) into efficient binary artifacts stored in `Library/`.
- Metadata: Generation of `.meta` files for import settings stability.
- Shader Reflection:
- SPIRV-Cross Integration: Automated reflection of shader binaries to generate Descriptor Set layouts and pipeline interfaces dynamically.
- Material System: Reflection-driven property exposure (uniforms matched to Inspector fields).
Scalable entity management and WYSIWYG tooling.
- ECS Optimization:
- Dirty Flags: Reactive transform updates to avoid redundant matrix multiplication.
- Hierarchy Propagation: Optimized O(N) parent-to-child world matrix calculation.
- POD Enforcement: Strict Plain-Old-Data layout for core components (`Transform`, `Camera`) to maximize cache locality.
- Editor Polish:
- Project Panel: Virtual filesystem view with Search, Zoom, and Drag & Drop support.
- Inspector: Async preview loading and weak reference management for textures.
- Gizmos: ImGuizmo integration for visual object manipulation.
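The "Dirty Flags" and "Hierarchy Propagation" items above can be illustrated with a single linear pass over nodes stored so that every parent precedes its children. This sketch is an assumption about how such a pass might look, not the engine's code: `Node`, `PropagateTransforms`, and the use of a single float offset in place of a full world matrix are simplifications chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical POD node; a float offset stands in for a 4x4 matrix.
struct Node {
    std::int32_t parent;  // -1 for roots; a parent always has a lower index
    float localX;
    float worldX;
    bool dirty;
};

// One O(N) pass. Because parents come first, a parent's world value is
// final by the time its children are visited. Returns nodes recomputed.
int PropagateTransforms(std::vector<Node>& nodes) {
    int updated = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        Node& node = nodes[i];
        const bool hasParent = node.parent >= 0;
        const bool parentChanged =
            hasParent && nodes[static_cast<std::size_t>(node.parent)].dirty;
        if (!node.dirty && !parentChanged) continue;  // skip clean subtrees
        const float parentWorld =
            hasParent ? nodes[static_cast<std::size_t>(node.parent)].worldX : 0.0f;
        node.worldX = parentWorld + node.localX;
        node.dirty = true;  // keep the flag set so children see the change
        ++updated;
    }
    for (Node& node : nodes) node.dirty = false;
    return updated;
}
```

Only nodes that changed, or whose ancestors changed, are recomputed; everything else is skipped with a flag check.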
Establish the memory and execution foundation. No Vulkan implementation in this phase.
5.1 Memory Architecture
- Tagged Page Allocator:
- Implement `TaggedPageAllocator` (Pool of 2MB `VirtualAlloc` pages).
- Implement `FreeTag(uint32_t tag)` for bulk reclamation.
- Implement `ThreadCache` (Index-based access via `JobContext`) to minimize global lock contention.
- Frame Architecture:
- Define `FrameParams` struct (Inputs, Time, Matrices, Viewport).
- Implement Triple Buffering for `FrameParams` (Game N, Render N-1, GPU N-2).
- Job Context:
- Define `struct JobContext`.
- Fields: `TaggedAllocator*`, `FrameParams*`, `ThreadIndex`, `FiberID`.
- Rule: All Jobs must accept `JobContext&` as their primary argument.
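The tagged allocation and `FreeTag` items above could look roughly like the sketch below. It is a simplified, single-threaded illustration under several assumptions: `std::malloc` stands in for `VirtualAlloc`, the per-thread cache is omitted, and the member names other than `TaggedPageAllocator` and `FreeTag` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: pages are owned by a tag and returned in bulk.
class TaggedPageAllocator {
public:
    static constexpr std::size_t kPageSize = 2 * 1024 * 1024;  // 2MB pages

    TaggedPageAllocator() = default;
    TaggedPageAllocator(const TaggedPageAllocator&) = delete;
    TaggedPageAllocator& operator=(const TaggedPageAllocator&) = delete;

    ~TaggedPageAllocator() {
        for (auto& entry : tags_)
            for (unsigned char* page : entry.second.pages) std::free(page);
        for (unsigned char* page : freePages_) std::free(page);
    }

    // Bump-allocates from the tag's current page, taking a new page if needed.
    void* Allocate(std::uint32_t tag, std::size_t size) {
        if (size == 0 || size > kPageSize) return nullptr;
        size = (size + 15) & ~static_cast<std::size_t>(15);  // 16-byte granularity
        TagState& state = tags_[tag];
        if (state.pages.empty() || state.offset + size > kPageSize) {
            state.pages.push_back(AcquirePage());
            state.offset = 0;
        }
        void* result = state.pages.back() + state.offset;
        state.offset += size;
        return result;
    }

    // Bulk reclamation: every page owned by the tag goes back to the pool.
    void FreeTag(std::uint32_t tag) {
        auto it = tags_.find(tag);
        if (it == tags_.end()) return;
        for (unsigned char* page : it->second.pages) freePages_.push_back(page);
        tags_.erase(it);
    }

    std::size_t FreePageCount() const { return freePages_.size(); }

private:
    struct TagState {
        std::vector<unsigned char*> pages;
        std::size_t offset = 0;
    };

    unsigned char* AcquirePage() {
        if (!freePages_.empty()) {
            unsigned char* page = freePages_.back();
            freePages_.pop_back();
            return page;
        }
        auto* page = static_cast<unsigned char*>(std::malloc(kPageSize));
        assert(page != nullptr);
        return page;
    }

    std::unordered_map<std::uint32_t, TagState> tags_;
    std::vector<unsigned char*> freePages_;
};
```

Freeing a tag costs one pass over its page list regardless of how many objects were allocated from those pages.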
5.2 Fiber Runtime
- Safety Audit:
- Remove all `thread_local` variables. Replace with `JobContext` lookups.
- Verify `Guard Page` implementation (protects against stack overflow).
- Synchronization Primitives:
- Implement `AdaptiveMutex` (Spin ~2000 cycles -> `Fiber::Yield()`).
- Implement `AtomicCounter` (Wait adds fiber to "Wait List", resume moves to "Ready List").
- Scheduler Upgrade:
- Implement Priority Queues (High, Normal, Low).
- Policy: High Priority (Game Logic) preempts Low Priority (Asset Loading).
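One possible shape for the priority queues in the scheduler upgrade is sketched below: one FIFO per priority level, with the scheduler always draining the highest non-empty level first. The names `Priority`, `Job`, and `ReadyQueues` are assumptions, and the sketch is single-threaded. Note that it only chooses the next job at scheduling points; it does not interrupt a job that is already running.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

enum class Priority : std::size_t { High = 0, Normal = 1, Low = 2 };

// Hypothetical job record; a real one would hold a function and context.
struct Job {
    std::string name;
};

class ReadyQueues {
public:
    void Push(Priority priority, Job job) {
        queues_[static_cast<std::size_t>(priority)].push_back(std::move(job));
    }

    // Returns the oldest job from the highest non-empty priority level.
    std::optional<Job> Pop() {
        for (auto& queue : queues_) {
            if (!queue.empty()) {
                Job job = std::move(queue.front());
                queue.pop_front();
                return job;
            }
        }
        return std::nullopt;
    }

private:
    std::array<std::deque<Job>, 3> queues_;
};
```

With this policy a steady stream of high-priority work can starve low-priority jobs, so a real scheduler may need an aging or budget rule on top.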
5.3 Engine Loop
- Main Loop Refactor:
- Implement explicit pipeline staging: `KickGame(N)` -> `SubmitRender(N-1)` -> `RecycleGPU(N-2)`.
- I/O Subsystem:
- Spawn dedicated `IO_Thread` (OS Thread).
- Implement `FileRequestQueue` (Lock-free ring buffer).
- Constraint: Blocking I/O (fread/fstream) forbidden in Fiber Workers.
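As an illustration of the lock-free ring buffer behind `FileRequestQueue`, here is a minimal single-producer/single-consumer sketch. The single-producer restriction is an assumption made to keep the example small; if several fiber workers push requests concurrently, a multi-producer design would be needed instead. `SpscRing` and the `FileRequest` fields are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical request payload.
struct FileRequest {
    int assetId;
};

// Lock-free ring buffer for exactly one producer and one consumer thread.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called by the producer only. Returns false when the ring is full.
    bool TryPush(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called by the consumer only. Returns nullopt when the ring is empty.
    std::optional<T> TryPop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // release the slot
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next index to write
    std::atomic<std::size_t> tail_{0};  // next index to read
};
```

The indices grow monotonically and are masked on access, which lets the full and empty states be told apart without a spare slot.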
Lock-free, explicit command generation utilizing the core memory model.
6.1 Synchronization
- Timeline Semaphores:
- Create `TimelineSemaphore` wrapper.
- Rule: All Queue submits must signal a Timeline value (Frame Index).
- The Poller Job:
- Implement `VulkanWaitJob`.
- Logic: `vkGetSemaphoreCounterValue` -> If < Target, Yield; Else, execute callback.
- Cleanup: Remove all `vkWaitForFences` calls.
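The poller logic above can be sketched without a GPU by letting an atomic counter play the role of the timeline semaphore. Everything here is a stand-in: `FakeTimeline` replaces the value that `vkGetSemaphoreCounterValue` would report, and `Poller`, `WaitFor`, and `Poll` are hypothetical names. In the engine, a wait that is not yet satisfied would yield the fiber rather than return to a caller.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a Vulkan timeline semaphore's monotonically increasing value.
struct FakeTimeline {
    std::atomic<std::uint64_t> value{0};
};

struct PendingWait {
    std::uint64_t target;
    std::function<void()> onReached;
};

class Poller {
public:
    explicit Poller(const FakeTimeline& timeline) : timeline_(timeline) {}

    void WaitFor(std::uint64_t target, std::function<void()> callback) {
        waits_.push_back(PendingWait{target, std::move(callback)});
    }

    // One polling step: fire callbacks whose target has been reached and
    // keep the rest for the next step. Returns the number fired.
    std::size_t Poll() {
        const std::uint64_t reached = timeline_.value.load(std::memory_order_acquire);
        std::vector<PendingWait> current;
        current.swap(waits_);  // callbacks may safely call WaitFor()
        std::size_t fired = 0;
        for (PendingWait& wait : current) {
            if (reached >= wait.target) {
                wait.onReached();
                ++fired;
            } else {
                waits_.push_back(std::move(wait));
            }
        }
        return fired;
    }

    std::size_t PendingCount() const { return waits_.size(); }

private:
    const FakeTimeline& timeline_;
    std::vector<PendingWait> waits_;
};
```

Because timeline values only increase, a single read per step is enough to resolve every wait whose target is at or below the current value.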
6.2 Command Buffer Management
- Command Allocator:
- Create `struct CommandAllocator` (owns `VkCommandPool` + `vector<VkCommandBuffer>`).
- Implement `GetBuffer()` (Lock-free, internal cache).
- Implement `Reset()` (Resets underlying pool).
- Global Pool:
- Create `CommandAllocatorPool` (Thread-safe `ConcurrentQueue`).
- Acquire: Pop allocator from queue.
- Release: Push allocator back to queue.
- Integration:
- Add `CommandAllocator*` to `JobContext`.
- Implement "Lazy Acquire" in Render Jobs.
- Implement "Frame End Release" (Recycle only when `Poller` confirms GPU done).
6.3 Render Graph & Execution
- Render Graph Compiler:
- Implement DAG topological sort.
- Implement `BarrierBuilder` (Inject `vkCmdPipelineBarrier2`).
- Parallel Recording:
- Refactor `RenderGraph::Execute`:
- Split Pass into N tasks (Secondary Buffers).
- Dispatch Jobs (concurrent with Game N).
- Coalesce Secondary Buffers into Primary for submission.
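For the DAG topological sort in the render graph compiler, one common choice is Kahn's algorithm, sketched below. The function name `SortPasses` and the representation of passes as indices with (producer, consumer) edges are assumptions for illustration; the planned compiler may use a different approach.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Orders passes so that every producer runs before its consumers.
// `edges` holds (producer, consumer) pairs over indices [0, passCount).
// Returns an empty vector if the graph contains a cycle.
std::vector<std::size_t> SortPasses(
    std::size_t passCount,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::vector<std::size_t>> consumers(passCount);
    std::vector<std::size_t> pendingInputs(passCount, 0);
    for (const auto& [from, to] : edges) {
        assert(from < passCount && to < passCount);
        consumers[from].push_back(to);
        ++pendingInputs[to];
    }

    // Start with passes that depend on nothing.
    std::queue<std::size_t> ready;
    for (std::size_t pass = 0; pass < passCount; ++pass)
        if (pendingInputs[pass] == 0) ready.push(pass);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t pass = ready.front();
        ready.pop();
        order.push_back(pass);
        // A consumer becomes ready once all of its producers are scheduled.
        for (std::size_t next : consumers[pass])
            if (--pendingInputs[next] == 0) ready.push(next);
    }

    if (order.size() != passCount) order.clear();  // cycle detected
    return order;
}
```

Passes that sit in the ready queue at the same time have no dependency on each other, which makes them natural candidates for parallel recording.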
High-throughput resource streaming.
7.1 Bindless Resources
- Global Descriptor Heap:
- Enable `VK_EXT_descriptor_indexing`.
- Create Layout: `binding=10, uniform texture2D globalTextures[]`.
- Material System:
- Refactor Materials to use `uint32_t textureID` instead of `VkDescriptorSet`.
- Upload Material Data to a global `SSBO`.
7.2 Asset Streaming
- Upload Context:
- Create dedicated `TransferQueue`.
- Implement `StagingRingBuffer` (Persistent mapped memory).
- Async Flow:
- `IO_Thread` reads binary -> Spawns `DecompressJob` -> Allocates Staging -> Records Copy -> Signals Timeline.
Graphical fidelity.
- PBR Implementation:
- Load .HDR Environment Maps.
- Implement IBL (Irradiance/Prefilter Compute Shaders).
- Shadows:
- Cascaded Shadow Maps (CSM).
- Parallel Shadow Render Pass.
- Post-Processing:
- Tone Mapping (ACES).
- Bloom (Compute Shader).
Engine usability.
- Scene Serialization: YAML Save/Load via EnTT.
- Physics: Jolt Physics Integration (Jobified via `JobSystem`).
- Scripting: C# Mono Integration (Game Logic stage).
- Editor: ImGui Docking, Gizmos, Asset Browser.

