ReasonFlow is a framework designed to bring o1-style reasoning capabilities to large language models.
It uses Multi-Path Generation with Noise to generate and fuse multiple reasoning paths, robustly handling uncertainty.
This approach improves inference quality, scalability, and generalization for diverse NLP tasks.
ReasonFlow runs multiple parallel "thinkers", each generating a partial output (a "thought") at every step. The parameters below control this loop, and a sketch of a single step follows them.
• num_of_thinkers: number of parallel reasoning agents
• num_of_thoughts: tokens each thinker generates before the next iteration
• topk_thinkers: how many top-performing paths to keep at each step
• dynamic_temperature: enables adaptive temperature scheduling based on generation progress
• diversity_weight: controls the balance between quality and diversity in path selection
• uncertainty_threshold: adapts the acceptance threshold based on model confidence
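As an illustration, one generation step driven by these parameters might look like the sketch below. The helpers `sample_thought` and `score_path` are hypothetical stand-ins, not ReasonFlow's actual internals:

```python
def generation_step(paths, config, sample_thought, score_path):
    """One illustrative ReasonFlow-style step.

    paths:          list of (token_list, temperature) pairs, one per thinker
    sample_thought: hypothetical helper sampling `num_of_thoughts` tokens
    score_path:     hypothetical helper scoring a candidate path
    """
    candidates = []
    for tokens, temperature in paths:
        # Every thinker extends its own path by `num_of_thoughts` tokens.
        new_tokens = sample_thought(tokens, config.num_of_thoughts, temperature)
        extended = tokens + new_tokens
        candidates.append((score_path(extended), extended, temperature))

    # Only the `topk_thinkers` best-scoring paths survive to the next iteration.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [(tokens, temp) for _, tokens, temp in candidates[:config.topk_thinkers]]
```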
Temperature Scheduling: Automatically adjusts the sampling temperature based on the following signals; a short sketch follows the list:
- Generation progress
- Model confidence
- Exploration vs exploitation needs
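A minimal sketch of such a schedule, assuming the `min_temperature`/`max_temperature` bounds from the configuration reference below. The exact formula here is illustrative, not ReasonFlow's actual rule:

```python
def schedule_temperature(progress, confidence, config):
    """Illustrative adaptive temperature schedule.

    progress:   fraction of max_new_tokens generated so far (0.0 - 1.0)
    confidence: mean top-token probability of the current path (0.0 - 1.0)
    """
    # Explore early and when the model is unsure; exploit late and when confident.
    exploration = (1.0 - progress) * (1.0 - confidence)
    temperature = config.min_temperature + exploration * (
        config.max_temperature - config.min_temperature
    )
    return min(max(temperature, config.min_temperature), config.max_temperature)
```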
Uncertainty Handling: Adapts the generation strategy using the following signals; a short sketch follows the list:
- Token probability distribution analysis
- Path diversity measurements
- Acceptance threshold adjustment
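For example, uncertainty can be estimated from the next-token probability distribution and used to move the acceptance threshold between its configured bounds. This sketch only illustrates the distribution-analysis and threshold points, and is not the actual implementation:

```python
import torch

def adapt_acceptance_threshold(logits, config):
    """Map the entropy of the next-token distribution to an acceptance threshold.

    High entropy (uncertain model) -> lower threshold, accept more diverse paths.
    Low entropy (confident model)  -> higher threshold, keep only strong paths.
    """
    probs = torch.softmax(logits, dim=-1)                       # (vocab,)
    entropy = -(probs * torch.log(probs + 1e-12)).sum()
    max_entropy = torch.log(torch.tensor(float(probs.shape[-1])))
    normalized = (entropy / max_entropy).item()                  # 0.0 - 1.0

    span = config.max_acceptance_threshold - config.min_acceptance_threshold
    return config.max_acceptance_threshold - normalized * span
```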
By fusing the thought paths of the best thinkers and leveraging these adaptive mechanisms, ReasonFlow stabilizes its output: it explores multiple reasoning paths and selectively fuses the strongest ones, producing more coherent and robust results.
ReasonFlow is optimized for speed and efficiency with:
Efficient Tensor Operations (illustrated below):
- Minimal memory transfers using in-place operations
- Batch processing and operator fusion where possible
- GPU-resident tensors to avoid CPU-GPU transfers
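For instance, in-place updates and GPU-resident buffers avoid extra allocations and host-device copies. This is generic PyTorch, not ReasonFlow's actual code:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Keep working tensors on the GPU for the whole loop instead of round-tripping to CPU.
scores = torch.zeros(8, device=device)
logits = torch.randn(8, 32000, device=device)  # 32000 stands in for the vocab size

# In-place operations reuse existing storage (note the trailing underscore).
scores.add_(logits.max(dim=-1).values)   # accumulate without allocating a new tensor
scores.clamp_(min=0.0)                   # clip in place

# One batched/fused call instead of a Python loop over rows.
probs = torch.softmax(logits, dim=-1)
```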
Smart Caching & Prefetching (see the decoding sketch below):
- Optimized KV-cache handling for iterative generation
- Specialized fast paths for common generation scenarios
- LRU caching for repeated module lookups
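The KV-cache point can be illustrated with a standard Hugging Face incremental-decoding loop: the cache returned by the model is fed back in, so each iteration only processes the newly generated token. This is a generic sketch assuming a `transformers` causal LM, not ReasonFlow-specific code:

```python
import torch

@torch.no_grad()
def decode_with_cache(model, input_ids, steps):
    """Greedy decoding that reuses the KV cache between iterations."""
    past_key_values = None
    generated = input_ids
    next_input = input_ids
    for _ in range(steps):
        out = model(next_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values           # reuse the attention cache
        next_token = out.logits[:, -1:].argmax(dim=-1)  # (batch, 1)
        generated = torch.cat([generated, next_token], dim=-1)
        next_input = next_token                         # only feed the new token next step
    return generated
```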
Memory Optimization (see the pre-allocation sketch below):
- Tensor pre-allocation for predictable memory usage
- Reduced redundant copies of large embeddings
- Strategic tensor cleanup based on iteration count
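Pre-allocating output buffers makes memory usage predictable and avoids growing tensors inside the loop. Again, this is a generic sketch rather than the framework's actual code:

```python
import torch

max_steps, num_thinkers, vocab_size = 128, 4, 32000
device = "cuda" if torch.cuda.is_available() else "cpu"

# One buffer allocated up front; each step writes into its own slice.
token_buffer = torch.empty(num_thinkers, max_steps, dtype=torch.long, device=device)

for step in range(max_steps):
    # Stand-in for the real sampling step.
    new_tokens = torch.randint(0, vocab_size, (num_thinkers,), device=device)
    token_buffer[:, step] = new_tokens  # no reallocation, no torch.cat

    # Strategic cleanup based on iteration count.
    if device == "cuda" and step % 32 == 0:
        torch.cuda.empty_cache()
```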
Numerical Stability (see the fusion sketch below):
- Log-domain computations for better precision
- Weighted token averaging based on thinker quality
- Improved probability and similarity calculations
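As an example of the log-domain and weighted-averaging points, thinker-quality scores can weight each thinker's token distribution, with the combination done via `logsumexp` to avoid the underflow a naive product of probabilities would cause. Illustrative only:

```python
import torch

def fuse_token_distributions(logits_per_thinker, thinker_scores):
    """Weighted average of per-thinker token distributions, computed in log space.

    logits_per_thinker: (num_thinkers, vocab) raw logits from each thinker
    thinker_scores:     (num_thinkers,) quality scores used as fusion weights
    """
    log_probs = torch.log_softmax(logits_per_thinker, dim=-1)   # (T, V)
    log_weights = torch.log_softmax(thinker_scores, dim=-1)     # (T,) normalized weights
    # log( sum_t w_t * p_t(v) ), evaluated stably with logsumexp.
    fused_log_probs = torch.logsumexp(log_probs + log_weights[:, None], dim=0)
    return fused_log_probs  # (V,) log-probabilities of the fused distribution
```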
The framework's modular design enables efficient scaling to multiple thinkers and long generation sequences while maintaining low latency.
- Clone this repository.
- Install dependencies:
pip install -r requirements.txt
- Import and set up the ReasonFlow components:

```python
from reasonflow import ReasonFlow, ReasonFlowConfig

# Configure with advanced features
config = ReasonFlowConfig(
    num_of_thinkers=2,
    num_of_thoughts=4,
    topk_thinkers=1,
    dynamic_temperature=True,
    diversity_weight=0.3,
    uncertainty_threshold=0.7
)
reason_flow = ReasonFlow(config, model, tokenizer)
```
- Generate multiple reasoning paths:

```python
output, best_thinkers_summary = reason_flow.generate_thoughts(
    "Your input here",
    max_new_tokens=128,
    device="cuda"
)
```
```python
ReasonFlowConfig(
    num_of_thinkers=2,              # Number of parallel reasoning paths
    num_of_thoughts=4,              # Tokens per generation step
    topk_thinkers=1,                # Number of best paths to select
    acceptance_threshold=0.5,       # Base threshold for path acceptance
    temperatures=[0.7, 1.3],        # Temperature range for sampling
    diversity_weight=0.3,           # Weight for diversity vs. quality
    min_temperature=0.5,            # Minimum sampling temperature
    max_temperature=1.5,            # Maximum sampling temperature
    dynamic_temperature=True,       # Enable adaptive temperature
    uncertainty_threshold=0.7,      # Threshold for uncertainty handling
    min_acceptance_threshold=0.3,   # Minimum path acceptance threshold
    max_acceptance_threshold=0.8    # Maximum path acceptance threshold
)
```

For optimal performance:
- Thinker Count: For most applications, 2-6 thinkers provide a good balance between quality and speed
- Batch Size: Larger thought batches (4-8) can improve throughput on high-end GPUs
- Device Placement: Keep tensors on GPU for best performance, avoiding host-device transfers
- Token vs Hidden States: Using tokens (use_tokens=True) is generally faster than hidden states
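Putting these guidelines together, one plausible configuration for a high-end GPU (the specific numbers below are only an example, not a recommendation from the project) could be:

```python
config = ReasonFlowConfig(
    num_of_thinkers=4,        # within the recommended 2-6 range
    num_of_thoughts=8,        # larger thought batches for throughput on strong GPUs
    topk_thinkers=2,          # keep two strong paths per step
    dynamic_temperature=True,
    diversity_weight=0.3,
    uncertainty_threshold=0.7,
)
reason_flow = ReasonFlow(config, model, tokenizer)
output, best_thinkers_summary = reason_flow.generate_thoughts(
    "Your input here", max_new_tokens=128, device="cuda"  # keep everything on the GPU
)
```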
Refer to the provided Python files for more detailed implementation insights.
