Project Status: Minimum viable product with core functionality working, but many features are missing and bugs remain. You can also check out the command-line alternative, AISysRevCmdLine.
This web-application offers AI-based support for Systematic Literature Reviews. Currently, only one step is supported: title–abstract screening. Although the application runs in a web browser, all data is stored locally on your machine. LLMs are accessed through OpenRouter, and data for screening can be imported from Scopus. The application allows you to:
- Import a CSV file with paper titles and abstracts. You can also use our Demo CSV file
- Specify include/exclude criteria for paper screening
- Evaluate papers against the criteria using multiple LLMs
- Receive LLM evaluations as binary decisions (include/exclude), ordinal ratings (1–7), or inclusion probabilities (0–1)
- Perform manual evaluation of titles and abstracts alongside LLM evaluations
- Export evaluation results to CSV for further analysis in Microsoft Excel, Google Sheets, R, Python, etc.
The application is based on our research papers on this topic. Please consider citing [1, 2] if you use the application.

Main view shows LLM screening tasks.

Manual evaluation view, with LLM evaluations (binary, ordinal, probability) alongside manual review.

Manual evaluation list view, with papers sorted by inclusion probability according to all executed LLMs.
The tool has been tested with CSV data exported from Scopus. Support for Web of Science data can be achieved by editing the column headers to match those used by Scopus. The minimum required fields are: Document title, DOI, Abstract, Authors, and Source title.
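As an illustration, a minimal input file might look like the sketch below. Treat the header strings as placeholders: the exact names should match your Scopus export (or the Demo CSV file) rather than this example.

```csv
Document title,DOI,Abstract,Authors,Source title
"An example paper on LLM-assisted screening",10.1000/xyz123,"We study ...","Doe, J.; Smith, A.","Journal of Examples"
```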
The application is integrated with OpenRouter, which supports multiple LLMs ranging from very affordable models to top-tier models such as OpenAI's ChatGPT, Google's Gemini, Anthropic's Claude, Meta's Llama, and Mistral. To use the models, you need to provide an OpenRouter API key. You can set spending limits for each key directly on the OpenRouter website. New users also receive $5 in free credits when creating an account.

Note: Paper screening speed is currently about 4.5 s per paper. We are working on parallelizing screening, after which it should drop to roughly 0.2 s per paper. For example, screening 1,000 papers currently takes about 75 minutes; with parallelization it should take roughly 3–4 minutes.
- Docker, with the Compose and Buildx plugins installed
- uv (Python package and project manager): https://docs.astral.sh/uv/getting-started/installation/
- Enough RAM (at least 8 GB recommended)
- Network connection
See https://docs.docker.com/desktop/ for Docker installation instructions. Docker Desktop includes Docker Compose, Docker Buildx, Docker Engine and the Docker CLI.
If Docker Desktop did not include the Buildx plugin, see https://github.com/docker/buildx.
- Run `docker info` to verify that Docker is installed.
  - Docker 26.0.0 has been tested as working. For macOS computers with Colima, Docker 28.5.1 is confirmed to be working.
- Run `docker buildx version` to verify that Docker Buildx is installed.
  - For macOS computers, Buildx plugin version 0.29.1 is confirmed to be working.
- Run `docker compose version` to verify that Docker Compose is installed.
  - Version 2.33.1 has been tested as working; newer versions should also work. For macOS computers, Compose plugin version 2.40.3 is confirmed to be working.
  - Note: Older versions of Compose use `docker-compose` as the compose command. We don't provide support for legacy Compose versions.
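For convenience, the three checks above can also be run in one go; this is only a sketch, and the exact output will differ by platform and Docker version:

```bash
# Verify that Docker, Buildx, and Compose are all available
docker info
docker buildx version
docker compose version
```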
First, clone the repository to your local computer:

```
git clone https://github.com/EvoTestOps/AISysRev.git
```

Move to the correct directory:

```
cd AISysRev
```

Start the application in production mode:

```
make start-prod
```

If you want to develop the app, run:

```
make start-dev
```

The startup of the app may take a while due to the download of the corresponding Docker images and services, installation of application dependencies, and building of the application.
After startup, open the application:
- If you ran `make start-prod`, navigate to https://localhost (the Caddy server's root CA is untrusted by default, so you can bypass the browser warning).
- If you ran `make start-dev`, navigate to http://localhost:3001.
If you do not have Windows Subsystem for Linux (WSL), start the application with:

```
./start-prod.bat
```

Frontend: TypeScript, React, Tailwind CSS, Vite, Wouter, Zod, Redux
Backend: Python, FastAPI, PostgreSQL, SQLAlchemy, Alembic
- Node.js v22 LTS
- Python 3.14
- Docker, with Compose plugin installed
- uv: https://docs.astral.sh/uv/getting-started/installation/
Start the development environment:

```
make start-dev
```

or, on Windows without WSL:

```
./start-dev.bat
```
- Open up the client: http://localhost:3000
- `/api` is proxied to the backend container, e.g. http://localhost:3000/api/v1/health will be proxied to http://localhost:8080/api/v1/health (see the sketch below this list)
- API docs: http://localhost:3000/documentation
- Server: http://localhost:8080
- Adminer GUI: http://localhost:8081/?pgsql=postgres&username=your_username&db=your_database_dev&ns= (password: your_password)
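As a quick sanity check of the proxy, you can query the health endpoint mentioned above. This is a sketch that assumes the dev containers are running and that curl is available:

```bash
# Both requests should return the same response from the backend
curl http://localhost:3000/api/v1/health   # via the client dev-server proxy
curl http://localhost:8080/api/v1/health   # directly against the backend
```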
Mock data is located in the data/mock folder.
- Run `npm test` in `client/` for e2e tests
- Run `make backend-test` in the project root (`./backend-test.bat` for Windows without WSL) for backend tests
- Run `make backend-test-html` in the project root (`./backend-test-html.bat` for Windows without WSL) for backend tests with an HTML coverage report
The project includes a Makefile for common development and database operations:
| Command | Description |
|---|---|
| `make start-dev` | Start dev containers with live reloading and build on startup (default setup) |
| `make start-test` | Start test containers and rebuild images (isolated test environment) |
| `make start-prod` | Start production container and rebuild images |

Note: Run all commands from the project root.
Containers are isolated by environment using the Docker Compose `-p` flag.
| Command | Description |
|---|---|
| `make m-create m="Message"` | Create a new migration with an autogenerated diff (replace Message) |
| `make m-up` | Apply all pending migrations (upgrade to latest) |
| `make m-hist` | Show the full migration history with details |
| `make m-current` | Display the current migration version in the database |
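As an example, a typical schema-change workflow might look like the following; the migration message is just an illustrative placeholder:

```bash
# Create a migration with an autogenerated diff, apply it, then verify
make m-create m="add_screening_status_column"
make m-up
make m-current
```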
Currently, we support models provided via OpenRouter.
MIT
[1] Huotala, A., Kuutila, M., Ralph, P., & Mäntylä, M. (2024). The promise and challenges of using LLMs to accelerate the screening process of systematic reviews. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 262–271. https://doi.org/10.1145/3661167.3661172
[2] Huotala, A., Kuutila, M., & Mäntylä, M. (2025). SESR-Eval: Dataset for evaluating LLMs in the title-abstract screening of systematic reviews. Proceedings of the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 1–12. https://arxiv.org/abs/2507.19027