LTX-2

Website | Model | Demo | Paper | Discord

LTX-2 is the first DiT-based audio-video foundation model that contains all core capabilities of modern video generation in one model: synchronized audio and video, high fidelity, multiple performance modes, production-ready outputs, API access, and open access.

(Demo video: ltx-2.mp4)

🚀 Quick Start

# Clone the repository
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# Set up the environment
uv sync --frozen
source .venv/bin/activate
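
After activating the environment, a quick sanity check can confirm that a CUDA-capable GPU is visible. This is a generic PyTorch check, not an LTX-2 API, and it assumes PyTorch was installed as part of the project's dependencies:

# Generic sanity check: confirm PyTorch is installed and sees a GPU
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible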

Required Models

Download the following models from the LTX-2 HuggingFace repository (a download sketch follows the list below):

LTX-2 Model Checkpoint (choose and download one of the following)

Spatial Upscaler - Required for current two-stage pipeline implementations in this repository

Temporal Upscaler - Supported by the model and will be required for future pipeline implementations

Distilled LoRA - Required for current two-stage pipeline implementations in this repository (except DistilledPipeline and ICLoraPipeline)

Gemma Text Encoder (download all assets from the repository)
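
One way to fetch these assets is with the huggingface_hub Python package. This is only a minimal sketch: the repository id and local directories are assumptions, so take the exact ids and file names from the LTX-2 HuggingFace repository linked above.

# Minimal download sketch using huggingface_hub (repo id and paths are assumptions)
from huggingface_hub import snapshot_download

# Fetch the LTX-2 model assets (checkpoint, upscalers, distilled LoRA)
snapshot_download(repo_id="Lightricks/LTX-2", local_dir="./checkpoints/ltx-2")

# Repeat with the Gemma text encoder repository id referenced in the README
# snapshot_download(repo_id="<gemma-text-encoder-repo>", local_dir="./checkpoints/gemma")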

LoRAs

Available Pipelines

⚡ Optimization Tips

  • Use DistilledPipeline - Fastest inference with only 8 predefined sigmas (8 steps in stage 1, 4 steps in stage 2)
  • Enable FP8 transformer - Lowers the memory footprint: --enable-fp8 (CLI) or fp8transformer=True (Python); see the sketch after this list
  • Install attention optimizations - Use xFormers (uv sync --extra xformers) or Flash Attention 3 on Hopper GPUs
  • Use gradient estimation - Reduce inference steps from 40 to 20-30 while maintaining quality (see the pipeline documentation)
  • Skip memory cleanup - If you have sufficient VRAM, disable automatic memory cleanup between stages for faster processing
  • Choose a single-stage pipeline - Use TI2VidOneStagePipeline for faster generation when high resolution isn't required
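
As an illustration of the FP8 option above, a minimal sketch might look like the following. Only the DistilledPipeline name and the fp8transformer parameter come from this README; the import path, constructor arguments, and call signature are assumptions, so consult the ltx-pipelines documentation for the real API.

# Hypothetical sketch only: import path and signatures are assumptions, not the documented API
from ltx_pipelines import DistilledPipeline  # actual module path may differ

pipeline = DistilledPipeline(
    checkpoint_path="./checkpoints/ltx-2",  # assumed argument: path to the downloaded checkpoint
    fp8transformer=True,                    # FP8 transformer for a lower memory footprint
)

video = pipeline(prompt="A red fox trots across a snowy meadow at dawn.")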

✍️ Prompting for LTX-2

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details, all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep prompts within 200 words. For best results, build your prompts using this structure (an illustrative sketch follows the list):

  • Start with main action in a single sentence
  • Add specific details about movements and gestures
  • Describe character/object appearances precisely
  • Include background and environment details
  • Specify camera angles and movements
  • Describe lighting and colors
  • Note any changes or sudden events

For additional guidance on writing prompts, please refer to https://ltx.video/blog/how-to-prompt-for-ltx-2.

Automatic Prompt Enhancement

LTX-2 pipelines support automatic prompt enhancement via the enhance_prompt parameter, as sketched below.
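
Only the enhance_prompt name comes from this README; the pipeline object and the rest of the call signature are assumptions (see the earlier FP8 sketch for how the pipeline might be constructed).

# Hypothetical sketch: only enhance_prompt is documented above, the call signature is assumed
video = pipeline(
    prompt="A lighthouse on a cliff during a storm, waves crashing below.",
    enhance_prompt=True,  # ask the pipeline to expand the prompt automatically
)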

🔌 ComfyUI Integration

To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.

📦 Packages

This repository is organized as a monorepo with three main packages:

  • ltx-core - Core model implementation, inference stack, and utilities
  • ltx-pipelines - High-level pipeline implementations for text-to-video, image-to-video, and other generation modes
  • ltx-trainer - Training and fine-tuning tools for LoRA, full fine-tuning, and IC-LoRA

Each package has its own README and documentation. See the Documentation section below.

📚 Documentation

Each package includes comprehensive documentation; see the README in each package's directory for details.
