Scaling Long-Horizon LLM Agent via Context-Folding

Paper: https://arxiv.org/pdf/2510.11967


Training

Note: This is an open-source re-implementation based on agent_loop in verl. It may differ from the code used to train the models in our paper.

Key Files

FoldAgent/
├── verl/
│   ├── experimental/
│   │   └── agent_loop/            # Base agent loop implementations
│   └── trainer/
│       └── ppo/                   # PPO training with FoldGRPO algorithm
├── agents/
│   └── fold_agent.py              # Core agent logic (process_item)
├── envs/
│   └── local_search.py            # Local search environment
└── scripts/
    └── train_fold.py              # Training script
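
The core idea in fold_agent.py is context folding: the agent branches into a sub-task with its own working context, then folds the finished sub-trajectory back into the main context as a short summary, so the main context stays bounded over long horizons. Below is a minimal, hypothetical sketch of that pattern; the names here are illustrative only, not the repo's API (see process_item in agents/fold_agent.py for the actual logic).

from typing import Callable, List

def fold_branch(main_context: List[str],
                sub_task: str,
                run_steps: Callable[[List[str]], List[str]],
                summarize: Callable[[List[str]], str]) -> None:
    """Run a sub-task in a scratch context, then fold it into the main one."""
    # Branch: the sub-task starts from its task description,
    # not from the full main-context history.
    branch_context = [f"[branch] {sub_task}"]
    branch_context += run_steps(branch_context)  # tool calls, thoughts, ...
    # Fold: only a summary of the branch survives in the main context,
    # so main-context growth is roughly constant per sub-task.
    main_context.append(f"[folded:{sub_task}] {summarize(branch_context)}")

# Toy usage with stub step/summarize functions:
ctx: List[str] = ["Find the paper's venue."]
fold_branch(ctx, "search the web",
            run_steps=lambda c: ["searched...", "found candidate page"],
            summarize=lambda c: f"{len(c)} steps; best hit: candidate page")
print(ctx)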

1. Start Search Server

Start the search server on a separate machine. This will download the corpus (Tevatron/browsecomp-plus-corpus), pre-computed embeddings (miaolu3/browsecomp-plus), and load the Qwen3-Embedding-8B model on available GPUs.

cd envs && python search_server.py \
  --model Qwen/Qwen3-Embedding-8B \
  --corpus Tevatron/browsecomp-plus-corpus \
  --corpus-embedding-dataset miaolu3/browsecomp-plus \
  --host 0.0.0.0 \
  --port 8010

Set environment variables:

# URL of the local search server (for BrowseComp-Plus)
export LOCAL_SEARCH_URL="http://[IP-of-search-server]:8010"

# For LLM-based answer grading
export OPENAI_API_KEY="your-api-key"
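
Before training, it is worth verifying that the server is reachable from the training machine. A hypothetical smoke test follows; the route name and payload fields ("/search", "query", "top_k") are assumptions, and the actual API is defined in envs/search_server.py, so adjust to match.

import os
import requests

url = os.environ["LOCAL_SEARCH_URL"]  # e.g. http://<search-server-ip>:8010
# "/search", "query", and "top_k" are guesses at search_server.py's API.
resp = requests.post(f"{url}/search", json={"query": "test query", "top_k": 3})
resp.raise_for_status()
print(resp.json())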

2. Download Training Data

Download and decompress the BrowseComp dataset: https://drive.google.com/file/d/1aX5xXAN5R-gLKd8A0AY-troxXJRawyAM/view?usp=sharing
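
A quick way to sanity-check the download is to open one of the parquet files with pandas. The path below matches the one passed to eval_bc.py later; the columns are whatever the dataset ships with.

import pandas as pd

df = pd.read_parquet("data/bc_test.parquet")
print(df.shape)
print(df.columns.tolist())
print(df.head(2))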

3. Train on BrowseComp

Example script to train Qwen3-8B:

bash scripts/train_bc_qwen3_8b.sh

Evaluation

1. Start Search Server

cd envs && python search_server.py \
  --model Qwen/Qwen3-Embedding-8B \
  --corpus Tevatron/browsecomp-plus-corpus \
  --corpus-embedding-dataset miaolu3/browsecomp-plus \
  --host 0.0.0.0 \
  --port 8000

2. Evaluate on BrowseComp

export OPENAI_API_KEY='your-key'

python scripts/eval_bc.py \
  --data_path data/bc_test.parquet \
  --model_name gpt-5-nano \
  --num_workers 150 \
  --workflow search_branch \
  --prompt_length 16384 \
  --response_length 32768 \
  --max_turn 200 \
  --val_max_turn 200 \
  --max_session 10 \
  --val_max_session 10 \
  --local_search_url http://localhost:8000 \
  --output_dir results

Output:

Evaluating: 100%|█████████████| 150/150 [32:52<00:00, 13.15s/item, avg_score=0.407, id=122]

============================================================
Overall - Avg Score: 0.4067, Success: 150/150

By Data Source:
  bc_test_easy: 0.8200 (50 items)
  bc_test_hard: 0.0400 (50 items)
  bc_test_medium: 0.3600 (50 items)
Baseline agents:

  • ReAct Agent: workflow=search
    python scripts/eval_bc.py --workflow search [...]

  • Summary Agent: workflow=search with --enable_summary
    python scripts/eval_bc.py --workflow search --enable_summary [...]

3. Using Local LLMs (e.g., vLLM)

# Start vLLM server
vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct --port 8001 --max-model-len 131072

# Run evaluation
export OPENAI_API_KEY='dummy'
export OPENAI_BASE_URL='http://localhost:8001/v1'

python scripts/eval_bc.py \
  --model_name ByteDance-Seed/Seed-OSS-36B-Instruct \
  --workflow search_branch \
  --num_workers 32 \
  --prompt_length 16384 \
  --response_length 32768 \
  --max_turn 200 \
  --val_max_turn 200 \
  --max_session 10 \
  --val_max_session 10 \
  --output_dir results
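
As a connectivity check: the official openai Python client (v1+) reads both OPENAI_API_KEY and OPENAI_BASE_URL from the environment, so the exports above are all the redirection needed. A minimal probe against the vLLM endpoint:

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_BASE_URL and OPENAI_API_KEY from the env
resp = client.chat.completions.create(
    model="ByteDance-Seed/Seed-OSS-36B-Instruct",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8,
)
print(resp.choices[0].message.content)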

Evaluation and training on SWE-Bench Verified


Cite

@article{sun2025scaling,
  title   = {Scaling Long-Horizon LLM Agent via Context-Folding},
  author  = {Sun, Weiwei and Lu, Miao and Ling, Zhan and Liu, Kang and Yao, Xuesong and Yang, Yiming and Chen, Jiecao},
  journal = {arXiv preprint arXiv:2510.11967},
  year    = {2025},
}

Acknowledgements

This implementation is based on verl.
