This project implements a simulation and evaluation pipeline for sentimental agents that deliberate, update opinions, and make collective decisions across various evaluation scenarios. It includes tools for dialog generation, sentiment analysis, evaluation, and visualization using multiple language models through Ollama.
This project uses Ollama for local language model inference. Ollama must be running before executing any simulations.
- Install Ollama: Follow instructions at https://ollama.ai
- Start Ollama service:
ollama serve
- Pull required models (choose based on your hardware):
# Lightweight models (good for testing)
ollama pull deepseek-r1:1.5b
ollama pull llama3.2:1b

# Medium models (balanced performance)
ollama pull llama3.1:8b
ollama pull mistral:7b

# Larger models (better quality, requires more RAM)
ollama pull gpt-oss
ollama pull gemma3:27b
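Before launching a simulation, it can help to confirm that Ollama is reachable and that the required models have been pulled. The following is a minimal, standard-library-only sketch that queries Ollama's /api/tags endpoint (the same endpoint used in the troubleshooting section below); it assumes the default local port 11434.

```python
# check_ollama.py -- minimal sketch: verify Ollama is running and list pulled models.
# Assumes the default local endpoint http://localhost:11434.
import json
import urllib.request

def ollama_models(host="http://localhost:11434"):
    """Return the names of locally available models via Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

if __name__ == "__main__":
    try:
        names = ollama_models()
        print("Ollama is running. Pulled models:", ", ".join(names) or "(none)")
    except OSError as exc:
        print("Ollama does not appear to be running:", exc)
```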
We recommend using a Python 3.10+ conda environment.
conda create -n sentimental_agents python=3.10
conda activate sentimental_agents
pip install -r requirements.txt

The system simulates multi-agent evaluation scenarios where AI agents with different expertise collectively assess items for decision-making:
- Agent Generation: Creates evaluators with distinct roles and specialized criteria
- Item Evaluation: Agents discuss merits and drawbacks using their domain expertise
- Sentiment Tracking: Monitors sentiment dynamics during deliberation
- Multi-Modal Analysis: Tests different sentiment feedback modes (see the sketch after this list):
  - none: No sentiment awareness
  - own_sentiment: Agents aware of their own emotional state
  - others_sentiment: Agents aware of others' emotional states
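For reference, the three feedback modes can be thought of as a small enumeration. The sketch below is illustrative only: the string values mirror the condition folder names in the output, but the actual representation inside the codebase may differ.

```python
# Illustrative only: one way the three feedback conditions could be enumerated.
from enum import Enum

class FeedbackMode(str, Enum):
    NONE = "none"                          # no sentiment signal is shared
    OWN_SENTIMENT = "own_sentiment"        # each agent sees its own sentiment trajectory
    OTHERS_SENTIMENT = "others_sentiment"  # each agent sees the other agents' sentiment
```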
Each simulation runs across:
- Multiple models: Tests different LLMs for varied reasoning styles
- Multiple temperatures: Controls response randomness (0.0, 0.3, 0.7)
- Multiple seeds: Provides statistical robustness (30 random seeds)
- Multiple feedback modes: Compares sentiment-aware vs standard evaluation
The framework supports various evaluation contexts through configurable templates:
- Evaluators: e.g., CFO, VP of Engineering, Department Manager
- Input: Candidate resume and job description
- Output: Hiring recommendation with pros/cons analysis
- Evaluators: e.g., Senior Researcher, Associate Editor, Domain Expert
- Input: Paper abstract/content and venue standards
- Output: Publication recommendation with technical assessment
Additional domains can be configured by modifying the prompt templates and simulation setup.
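As a rough illustration of what a domain entry might contain, a hiring configuration could be sketched as the Python dictionary below. The keys and values here are hypothetical; the actual schema of data/input/simulation_setup_data.json is defined by the project and may differ.

```python
# Hypothetical sketch of a hiring-domain configuration. The real schema of
# data/input/simulation_setup_data.json may use different keys and structure.
hiring_setup = {
    "domain": "hiring",
    "evaluators": [
        {"role": "CFO", "criteria": "budget fit and compensation expectations"},
        {"role": "VP of Engineering", "criteria": "technical depth and system design"},
        {"role": "Department Manager", "criteria": "team fit and day-to-day collaboration"},
    ],
    "input": ["candidate resume", "job description"],
    "output": "hiring recommendation with pros/cons analysis",
}
```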
data/input/
├── simulation_setup_data.json # Domain configuration (job/venue/publication)
└── default_sample_1.csv # Item data (candidates/papers/articles)
output_files/
└── YYYYMMDD_HHMMSS_model_full_experiment/
└── Item_001/ # Individual item results
├── none_temp0.0_seed10/ # Condition-specific results
│ └── simulation_data.json
├── own_sentiment_temp0.3_seed20/
└── others_sentiment_temp0.7_seed30/
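Given the layout above, the per-condition simulation_data.json files can be gathered programmatically for downstream analysis. The sketch below relies only on the directory structure shown and makes no assumption about the JSON content itself.

```python
# Minimal sketch: collect every simulation_data.json produced by one experiment run.
# Relies only on the directory layout shown above, not on the JSON schema.
import json
from pathlib import Path

def collect_results(experiment_dir):
    """Yield (item, condition, data) for each condition folder in an experiment."""
    for path in sorted(Path(experiment_dir).glob("Item_*/*/simulation_data.json")):
        item = path.parent.parent.name   # e.g. "Item_001"
        condition = path.parent.name     # e.g. "none_temp0.0_seed10"
        with open(path) as f:
            yield item, condition, json.load(f)

# Example usage (the experiment folder name is illustrative):
# for item, condition, data in collect_results("output_files/<YYYYMMDD_HHMMSS_model_full_experiment>"):
#     print(item, condition, list(data)[:3])
```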
For a quick test with one item and one model:
python main.py \
--simulation_setup_data data/input/simulation_setup_data.json \
--candidate_csv data/input/candidate_sample_1.csv \
--num_processes 4

For comprehensive multi-model experiments:
./run.sh

Note: Each experiment covers one model at a time but tests all combinations of temperatures, seeds, and feedback modes. The script processes items sequentially to avoid overwhelming system resources.
The run.sh script automatically:
- Detects your platform (Linux/macOS/Windows)
- Sets optimal CPU affinity (Linux only)
- Processes all CSV files in data/input/
- Retries failed runs up to 3 times (see the sketch after this list)
- Runs comprehensive evaluation after simulation
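The retry behaviour essentially amounts to re-invoking main.py while it exits with a non-zero status. The Python sketch below is a rough equivalent for illustration; the actual logic in run.sh may differ in detail.

```python
# Rough Python equivalent of run.sh's retry behaviour (illustrative only).
import subprocess

def run_with_retries(cmd, max_retries=3):
    """Run a command, retrying up to max_retries times on a non-zero exit code."""
    for attempt in range(1, max_retries + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"Attempt {attempt}/{max_retries} failed (exit code {result.returncode}).")
    return False

run_with_retries([
    "python", "main.py",
    "--simulation_setup_data", "data/input/simulation_setup_data.json",
    "--candidate_csv", "data/input/candidate_sample_1.csv",
    "--num_processes", "4",
])
```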
Currently supported models (configured in utilities.py):
- deepseek-r1:1.5b - Fast, lightweight
- llama3.2:1b - Ultra-lightweight
- llama3.1:8b - Balanced performance
- mistral:7b - Strong reasoning
- gpt-oss - Open-source GPT variant
- gemma3:27b - High-quality, resource-intensive
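Adding or removing a model means editing this list in utilities.py. The variable name below is an assumption made for illustration; the entries are the same Ollama model tags used with ollama pull.

```python
# Hypothetical sketch of the model list in utilities.py; the real variable name
# and structure may differ. Entries are Ollama model tags.
SUPPORTED_MODELS = [
    "deepseek-r1:1.5b",  # fast, lightweight
    "llama3.2:1b",       # ultra-lightweight
    "llama3.1:8b",       # balanced performance
    "mistral:7b",        # strong reasoning
    "gpt-oss",           # open-source GPT variant
    "gemma3:27b",        # high quality, resource-intensive
]
```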
- Seeds: 30 random seeds (10-300) for statistical robustness (the full parameter grid is sketched after this list)
- Temperatures: 0.0 (deterministic), 0.3 (balanced), 0.7 (creative)
- Feedback modes: 3 sentiment awareness conditions
- Max rounds: 10 discussion rounds per simulation
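Taken together, the seeds, temperatures, and feedback modes define a 30 × 3 × 3 = 270-condition grid per model per item. The sketch below enumerates it; the seed spacing (multiples of 10) is an assumption consistent with the 10-300 range and the condition folder names shown above.

```python
# Minimal sketch of the experimental grid: 30 seeds x 3 temperatures x 3 feedback
# modes = 270 conditions per model per item. Seed spacing (multiples of 10) is an
# assumption consistent with the stated 10-300 range.
from itertools import product

SEEDS = range(10, 301, 10)                                     # 30 seeds
TEMPERATURES = [0.0, 0.3, 0.7]
FEEDBACK_MODES = ["none", "own_sentiment", "others_sentiment"]

grid = list(product(FEEDBACK_MODES, TEMPERATURES, SEEDS))
print(len(grid), "conditions per model per item")              # 270
print(grid[0])                                                 # ('none', 0.0, 10)
```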
After simulations complete, the system automatically generates:
- Confidence intervals (95% CI) for all metrics
- Statistical significance tests between conditions
- Sentiment evolution plots showing opinion dynamics
- Agent-level analysis of individual reviewer behavior
- Cross-condition comparisons of feedback mode effectiveness
Results include both raw per-seed data and aggregated statistics for publication-ready analysis.
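For reference, a 95% confidence interval over per-seed values of any metric can be computed as in the sketch below. This is a generic normal-approximation example using only the standard library, not necessarily the exact procedure used by the evaluation code.

```python
# Generic sketch: 95% confidence interval over per-seed metric values using a
# normal approximation. The project's evaluation code may use a different method.
import math
import statistics

def mean_ci95(values):
    """Return (mean, half_width) of a 95% confidence interval assuming normality."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * sem

per_seed_scores = [0.62, 0.71, 0.66, 0.69, 0.64]  # example per-seed metric values
mean, half = mean_ci95(per_seed_scores)
print(f"{mean:.3f} ± {half:.3f}")
```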
- "Connection refused": Ensure Ollama is running (
ollama serve) - Model not found: Pull the required model (
ollama pull model-name) - Memory issues: Use smaller models or reduce batch size
- Slow performance: Check CPU affinity settings in
run.sh
# Check Ollama status
curl http://localhost:11434/api/tags
# Monitor system resources
htop # Linux
top # macOS/Linux

The system provides detailed logging and progress indicators throughout execution.