EasyLocomo

πŸš€ A modern, production-ready refactor of the LoCoMo long-term memory benchmark.

EasyLocomo is a streamlined, easy-to-use version of the evaluation framework for the LoCoMo (Long-term Conversational Memory) benchmark.

This repository adapts the original logic and data from the paper "Evaluating Very Long-Term Conversational Memory of LLM Agents" (ACL 2024), so evaluation results stay consistent with the original authors' repository while offering a much simpler way to test any LLM through OpenAI-compatible APIs.

🌟 Key Features

  • Result Consistency: Uses the same data and evaluation logic as the original LoCoMo project. Consistency of results has been verified using GPT-4o-mini. See release 0.1.0 for details.
  • Simplified Setup: No complex bash scripts or environment setup. Optimized for uv and standard Python environments.
  • OpenAI API Compatibility: Call any LLM that supports the OpenAI API format (e.g., GPT-4o, GPT-4o-mini, Claude via proxy, DeepSeek, or local models via Ollama/vLLM).
  • Flexible Configuration: Easily set your API key, base URL, and model name.
  • Resumable Runs: Automatically saves progress after each sample/batch and skips samples that already have predictions, enabling reliable long-running evaluations.
  • JSON Mode & Robust Parsing: Utilizes OpenAI's JSON mode for structured outputs and includes cleaning logic (stripping reasoning traces and markdown blocks) to ensure high parsing success rates (see the sketch after this list).
  • Error Logging: Detailed parsing errors are logged to a separate *_errors.jsonl file for easy debugging and model output analysis.
  • Automatic Reporting: Generates performance statistics (Accuracy, BERTScore, etc.) and summaries of the results.
  • Token Estimation: Includes a utility script to estimate the token count of the evaluation dataset to help manage costs.
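
To illustrate the parsing step, here is a minimal sketch of the kind of cleanup involved. The function name and the <think> tag format are assumptions for illustration, not the repository's actual implementation:

import json
import re

def clean_model_output(raw: str) -> dict:
    # Remove <think>...</think> reasoning blocks emitted by some models (assumed tag format).
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Strip surrounding markdown code fences such as ```json ... ```.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip(), flags=re.MULTILINE)
    return json.loads(text)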

πŸš€ Quick Start

1. Installation

Clone the repository and install the dependencies. We recommend using uv for extremely fast setup:

# Using uv (Recommended)
uv sync

# Or using standard pip
pip install -r requirements.txt

2. Configuration

You can configure your API credentials by creating a .env file in the root directory:

OPENAI_API_KEY=your_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1

Or you can pass them directly in the run_evaluation.py script.
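
If you configure the client programmatically, a minimal sketch using the official openai package and python-dotenv (both assumed available; this is illustrative, not the script's exact code) looks like this:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY / OPENAI_API_BASE from the .env file
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
)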

3. Run Evaluation

Simply run the run_evaluation.py script:

# Using uv
uv run run_evaluation.py

# Or using standard python
python run_evaluation.py

By default, this evaluates the model on the data/locomo10.json dataset. Results, including predictions and statistical reports, are saved in the outputs/ directory. If a run is interrupted, re-running the same command resumes from the last saved progress and skips samples that already have predictions.


πŸ“Š Results and Statistics

After running the evaluation, you will find the following files in the outputs/ directory:

  • [model_name]_qa.json: The model's predictions.
  • [model_name]_qa_stats.json: Detailed accuracy metrics (Overall, Session-level, etc.).
  • [model_name]_qa_summary.json: A human-readable summary of the evaluation results.
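
To inspect the metrics programmatically, a minimal sketch (the file name depends on your model_name, and the exact keys depend on the generated report):

import json

# Load the stats report; the file name below assumes model_name="gpt-4o-mini".
with open("outputs/gpt-4o-mini_qa_stats.json") as f:
    stats = json.load(f)

for metric, value in stats.items():
    print(f"{metric}: {value}")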

πŸ“š Reference & Citation

This project is built upon the work by Maharana et al. (ACL 2024). Please cite the original paper if you use this benchmark:

@inproceedings{maharana2024locomo,
  title={Evaluating Very Long-Term Conversational Memory of LLM Agents},
  author={Maharana, Adyasha and Lee, Dong-Ho and Tulyakov, Sergey and Bansal, Mohit and Barbieri, Francesco and Fang, Yuwei},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2024}
}

Original Repository: snap-research/locomo


πŸ› οΈ Advanced Usage

You can customize the evaluation parameters in run_evaluation.py:

run_test(
    model_name="gpt-4o-mini",
    batch_size=15,
    max_context=65536,
    data_file="data/locomo10.json",
    category=1,
    overwrite=False,
)
  • model_name: The identifier of the model to test.
  • batch_size: Number of concurrent API calls.
  • max_context: Maximum context length (tokens) passed to the model.
  • category: (Optional) Filter evaluation for a specific category (1-5). Useful for re-testing specific subsets.
  • overwrite: Whether to re-run evaluations for already predicted samples.
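
For example, to re-run only a single category and overwrite its earlier predictions (assuming run_test can be imported from run_evaluation.py; adjust the import to how the script is actually structured):

from run_evaluation import run_test  # assumed to be importable

# Re-run only category 3, overwriting its earlier predictions.
run_test(
    model_name="gpt-4o-mini",
    batch_size=15,
    max_context=65536,
    data_file="data/locomo10.json",
    category=3,
    overwrite=True,
)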

License

This project follows the licensing of the original LoCoMo repository. See LICENSE for details.
