A dual-mode llama.cpp integration project supporting both desktop (Electron) and distributed cluster (Inferno OS) deployments.
This project provides two ways to run llama.cpp inference:
- Desktop Mode: An Electron application with a Node.js native addon for local, single-user inference
- Distributed Mode: A pure Limbo implementation for Inferno OS, optimized for distributed cognition across thousands of tiny inference engines in load-balanced clusters
Desktop Mode:
- Load LLM models through a user-friendly interface
- Process text prompts asynchronously in a separate thread
- Built with Electron for cross-platform compatibility
- Direct integration with llama.cpp via a Node.js addon

Distributed Mode:
- Deploy thousands of modular isolates as Dis VM instances
- Load balancing across the cluster with multiple strategies
- Distributed cognition with collective inference capacity
- Auto-scaling based on load and resource availability
- Aggregate throughput of 10,000+ tokens/sec with 1,000+ nodes
- Limbot: AI chat assistant CLI with conversation history
- Dish Integration: Interactive distributed shell for cluster access
Choose your deployment mode:
- Desktop Mode Setup - For local single-user inference
- Distributed Mode Setup - For cluster deployment with thousands of nodes
For local, single-user inference with the Electron desktop app.

Prerequisites:
- Node.js (v16+)
- npm or yarn
- A C++ compiler (GCC, Clang, or MSVC)
- CMake (for building llama.cpp)
- Git
- Clone this repository:

      git clone https://github.com/aruntemme/llama.cpp-electron.git
      cd llama.cpp-electron

- Install dependencies:

      npm install

- Clone and build llama.cpp (required before building the Node.js addon):

      git clone https://github.com/ggerganov/llama.cpp.git
      cd llama.cpp
      mkdir build
      cd build
      cmake ..
      cmake --build . --config Release
      cd ../..

- Build the Node.js addon:

      npm run build

- Start the application:

      npm start
- Launch the application
- Click "Select Model" to choose a llama.cpp compatible model file (.bin or .gguf)
- Enter a prompt in the text area
- Click "Process Prompt" to analyze the text
- View the results in the results section
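Under the hood, "Process Prompt" goes from the renderer through the preload IPC bridge to the Electron main process, which calls into the native addon. The sketch below illustrates that flow; the channel names, the addon exports (loadModel, processPrompt), and the addon's build path are assumptions for illustration, not the project's exact API.

```js
// Sketch only: channel names, addon exports, and paths are assumptions.

// --- main.js (Electron main process) ---
const { ipcMain } = require('electron');
const llama = require('./addon/build/Release/llama_addon.node'); // hypothetical build output path

ipcMain.handle('load-model', (_event, modelPath) => {
  // Hand the selected .gguf/.bin file to the native addon
  return llama.loadModel(modelPath);
});

ipcMain.handle('process-prompt', (_event, prompt) => {
  // Inference runs off the UI thread, so the window stays responsive
  return llama.processPrompt(prompt);
});

// --- preload.js (IPC bridge) ---
const { contextBridge, ipcRenderer } = require('electron');

contextBridge.exposeInMainWorld('llamaAPI', {
  loadModel: (modelPath) => ipcRenderer.invoke('load-model', modelPath),
  processPrompt: (prompt) => ipcRenderer.invoke('process-prompt', prompt),
});

// --- renderer.js (frontend) ---
// const result = await window.llamaAPI.processPrompt(promptText);
```

Keeping the addon call in the main process and exposing only a narrow API through contextBridge keeps the renderer sandboxed.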
For distributed cluster deployment with thousands of tiny inference engines.

Prerequisites:
- Inferno OS installed (or the Inferno emulator)
- Limbo compiler
- llama.cpp compatible models
- Check Inferno installation:

      cd inferno
      ./deploy.sh check

- Compile Limbo modules:

      ./deploy.sh compile

- Deploy to cluster:

      ./deploy.sh deploy-local    # For local testing
      # or
      ./deploy.sh deploy-cluster  # For distributed cluster

- Initialize cluster:

      cd inferno
      ./llamboctl init

- Spawn inference nodes:

      ./llamboctl spawn --count 1000 --type tiny

- Check cluster status:

      ./llamboctl status

- Process inference requests:

      # Requests are automatically load-balanced across nodes
      ./deploy.sh test

- Monitor cluster:

      ./llamboctl health
      ./llamboctl metrics --export prometheus

- Use Limbot AI chat assistant:

      # Interactive chat mode
      ./llamboctl limbot

      # One-shot inference
      ./llamboctl limbot "What is distributed computing?"

- Use Dish distributed shell:

      # Launch interactive shell
      ./llamboctl dish
Edit inferno/cluster-config.yaml to configure:
- Node types (tiny: 128MB, medium: 1GB, large: 8GB)
- Node counts (100 to 10,000+)
- Load balancing strategy (round-robin, least-loaded, random)
- Auto-scaling parameters
- Network topology
See inferno/README.md for complete documentation.
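To make the load-balancing options above concrete, here is a small illustrative sketch of how a dispatcher could pick a node under each strategy. This is plain JavaScript for explanation only; the actual selection logic lives in the Limbo modules and is driven by cluster-config.yaml.

```js
// Illustrative only -- the real implementation is in the Limbo modules.
// A "node" here is just { id, activeRequests }.
function pickNode(nodes, strategy, state = { rrIndex: 0 }) {
  switch (strategy) {
    case 'round-robin': {
      // Cycle through nodes in order
      const node = nodes[state.rrIndex % nodes.length];
      state.rrIndex += 1;
      return node;
    }
    case 'least-loaded':
      // Choose the node currently handling the fewest requests
      return nodes.reduce((a, b) => (a.activeRequests <= b.activeRequests ? a : b));
    case 'random':
      // Uniform random choice
      return nodes[Math.floor(Math.random() * nodes.length)];
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

const nodes = [
  { id: 'tiny-0', activeRequests: 3 },
  { id: 'tiny-1', activeRequests: 1 },
  { id: 'tiny-2', activeRequests: 7 },
];
console.log(pickNode(nodes, 'least-loaded').id); // -> "tiny-1"
```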
You'll need to download LLM model files separately. Compatible models include:
- GGUF format models (recommended)
- Quantized models for better performance
- Other formats supported by llama.cpp
You can download models from Hugging Face or other repositories.
For Desktop Mode: Place models in a location accessible by the application.
For Distributed Mode: Place models in /models directory for cluster nodes to access.
See ARCHITECTURE.md for detailed documentation on both architectures.
Desktop Mode: Single Electron process with Node.js addon → llama.cpp C++ library
- Performance: ~10 tokens/sec
- Resource: Several GB of RAM required, depending on model size
- Use case: Single-user desktop application
Distributed Mode: Thousands of Inferno Dis VM instances with Limbo implementation
- Performance: ~10,000+ tokens/sec aggregate with 1,000 nodes (roughly the single-node rate multiplied across the cluster)
- Resource: 128MB per tiny node, scales horizontally
- Use case: Distributed cluster, massive parallel inference
Desktop Mode:
- Model loading errors: Ensure your model file is compatible with llama.cpp
- Addon building errors: Make sure llama.cpp is properly built before building the addon
- Performance issues: Large models may require more memory and processing power
- Cannot find llama.h: Make sure you've built llama.cpp using the steps above
- Loading model fails: Verify the model path is correct and the model is in a supported format
- Electron startup errors: Check the terminal output for detailed error messages
Distributed Mode:
- Compilation errors: Ensure the Inferno environment is properly configured
- Node spawn failures: Check resource limits (ulimit) and available ports
- Load balancing issues: Verify the cluster configuration in cluster-config.yaml
- Module loading errors: Ensure Limbo modules are compiled to .dis bytecode
See inferno/README.md for detailed troubleshooting.
llama.cpp-electron/
├── src/ # Desktop mode (Electron)
│ ├── addon/ # C++ Node.js addon
│ │ ├── llama_addon.cpp
│ │ └── binding.gyp
│ ├── main.js # Electron main process
│ ├── renderer.js # Frontend logic
│ ├── preload.js # IPC bridge
│ ├── index.html # UI
│ └── styles.css
├── inferno/ # Distributed mode (Inferno OS)
│ ├── llambo.m # Module definition
│ ├── llambo.b # Implementation
│ ├── llambotest.b # Test suite
│ ├── cluster-config.yaml # Cluster configuration
│ ├── deploy.sh # Deployment script
│ ├── llamboctl # Cluster control utility
│ └── README.md # Detailed documentation
├── llama.cpp/ # Submodule
├── ARCHITECTURE.md # Architecture documentation
├── README.md # This file
└── package.json
| Mode | Deployment | Throughput | Latency | Scalability |
|---|---|---|---|---|
| Desktop | Single machine | ~10 tok/s | 100ms | Limited by local resources |
| Distributed (100 nodes) | Cluster | ~1,000 tok/s | 50ms | Horizontal scaling |
| Distributed (1000 nodes) | Cluster | ~10,000 tok/s | 45ms | Thousands of nodes |
Desktop Mode:
- Personal AI assistant
- Local development and testing
- Single-user applications
- Privacy-focused deployments
Distributed Mode:
- Large-scale inference services
- Multi-tenant platforms
- Research clusters
- Edge computing networks
- Distributed AI systems
This project is licensed under the ISC License - see the LICENSE file for details.
- llama.cpp - Inference engine
- Electron - Desktop application framework
- Node.js - JavaScript runtime
- Inferno OS - Distributed operating system
- Limbo - Programming language for Inferno
- ARCHITECTURE.md - Detailed architecture for both modes
- inferno/README.md - Complete Inferno/Limbo documentation
- Desktop Mode: See the sections above
- Distributed Mode: See the inferno/ directory
Contributions are welcome! Areas of interest:
Desktop Mode:
- UI/UX improvements
- Additional llama.cpp features
- Performance optimizations
Distributed Mode:
- FFI bindings to llama.cpp C library
- Advanced load balancing algorithms
- Consensus and cognitive fusion strategies
- Monitoring and telemetry
- Production deployment tools