
SoupBoss is an intelligent job matching system that ingests job postings and does semantic matching to your resume


gitobic/SoupBoss


SoupBoss

Semi-Intelligent Job Matching System with Semantic Similarity Scoring



Overview

SoupBoss is a job matching system that uses AI-generated embeddings and semantic similarity to connect resumes with job opportunities. It offers both a Flask web interface and a CLI, ingests job postings from multiple sources, scores them against your resume(s), and produces reports of the results.

Foreword

Job titles and descriptions vary wildly across companies. Am I a DevOps Engineer? An SRE? A Product Owner? Searching job boards often feels like guessing keywords and scrolling endlessly.

SoupBoss solves this by semantically matching your resume against job postings. Instead of keyword searches, it uses embeddings (via Ollama) to find roles that best align with your background.

Why the Name SoupBoss?

Think of job matching like making soup:

  • The ingredients are skills, experience, and job requirements
  • The recipe is how they come together
  • SoupBoss is the chef: mixing, tasting, and serving up the best matches

Instead of forcing you to pick search terms, it stirs everything together to surface roles that “taste right” for your background.

What SoupBoss Is Not

  • ❌ An auto-downloader of job postings
  • ❌ An auto-submitter of applications
  • ❌ A tool that links back to postings

What SoupBoss Is

  1. Import (fetch) job postings
  2. Add your resume(s)
  3. Generate embeddings for both
  4. Run the matcher to surface top fits
  5. Apply directly via the company site

Prerequisites

Required Software

  • Python 3.13+ (required for modern language features)
  • uv package manager (critical - do not use pip)
  • Ollama installed and running locally

Ollama Setup

⚠️ CRITICAL: SoupBoss requires Ollama running locally on your machine.

SoupBoss connects directly to Ollama (default: localhost:11434) for all AI operations. It could be converted to using API's.

# Install Ollama from https://ollama.com
# Then pull an embedding model:
ollama pull nomic-embed-text
# or
ollama pull bge-large
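
As a rough illustration of what "connects directly to Ollama" involves, here is a minimal sketch of an embedding request against a local Ollama instance. It assumes Ollama's `/api/embeddings` endpoint and the default address above; the helper names `build_embedding_request` and `fetch_embedding` are hypothetical and are not taken from SoupBoss's own `embeddings.py`.

```python
import json
import urllib.request


def build_embedding_request(text, model="nomic-embed-text",
                            host="localhost", port=11434):
    """Build (but do not send) a POST request for Ollama's embeddings endpoint."""
    url = f"http://{host}:{port}/api/embeddings"
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, **kwargs):
    """Send the request and return the embedding as a list of floats.

    Requires Ollama to be running and the model to be pulled already.
    """
    request = build_embedding_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["embedding"]
```

Splitting request construction from sending keeps the network call in one place, so the URL and payload can be inspected without a running Ollama server.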

🚀 Quick Start

🌐 Web Interface (Recommended for Most Users)

Fastest way to get started:

# Start the web interface
./restart_webapp.sh

# Then open in any browser:
# http://localhost:5000

⚑ CLI Interface (Power Users & Automation)

For scripting and advanced features:

# Install dependencies (uv handles everything automatically)
uv sync

# Verify installation and check Ollama connection
uv run python main.py status

# Test embedding functionality
uv run python main.py test-embedding

🎯 Complete Workflows

🌐 Web Interface Workflow

Visual, step-by-step process:

  1. Start: Run ./restart_webapp.sh and open http://localhost:5000
  2. Upload: Drag-and-drop your resume files to the upload area
  3. Test Company: Enter company name (e.g., "spacex") and click "Test Company"
  4. Add Company: If test succeeds, click "Add Company"
  5. Fetch Jobs: Click "Fetch Jobs" to retrieve postings with real-time progress
  6. Generate Embeddings: Click "Generate Embeddings" (force regenerate if needed)
  7. Run Matching: Click "Run Matching" to execute AI similarity scoring
  8. View Results: Switch to "Results & Matches" tab to see professional job cards
  9. Sort & Filter: Use dropdown to sort by similarity, company, title, or date

⚑ CLI Workflow

Command-line power user guide:

# 1. Add a company to track
uv run python main.py companies add spacex --source greenhouse

# 2. Test if the company has an active job board
uv run python main.py companies test spacex --source greenhouse

# 3. Fetch jobs from the company
uv run python main.py jobs fetch --source greenhouse --company spacex

# 4. Verify jobs were imported
uv run python main.py jobs list --company spacex

# 5. Add your resume(s)
uv run python main.py resumes add /path/to/resume.pdf --name "My Resume"
uv run python main.py resumes add /path/to/resume.docx --name "Alternative Resume"

# 6. Check resume was processed
uv run python main.py resumes list

# 6a. Export resume list to PDF
uv run python main.py resumes list --pdf my_resumes.pdf

# 7. Generate AI embeddings (this takes time but runs once)
uv run python main.py match generate --time

# 8. Run similarity matching
uv run python main.py match run

# 9. View your top 20 matches
uv run python main.py match show 1 --limit 20

# 10. Generate a comprehensive HTML report
uv run python main.py report --format html --output my_matches.html

# 11. View detailed system statistics
uv run python main.py maintenance stats
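
For automation, the core steps above could be driven from a script. The following is a minimal sketch, assuming the commands behave as listed and are run from the repository root; `workflow_commands` and `run_workflow` are hypothetical helper names, not part of SoupBoss.

```python
import subprocess

# Common prefix for every SoupBoss CLI call shown above.
BASE = ["uv", "run", "python", "main.py"]


def workflow_commands(company, source, resume_path, resume_name="My Resume"):
    """Return the core CLI steps as argument lists, in execution order."""
    return [
        BASE + ["companies", "add", company, "--source", source],
        BASE + ["jobs", "fetch", "--source", source, "--company", company],
        BASE + ["resumes", "add", resume_path, "--name", resume_name],
        BASE + ["match", "generate"],
        BASE + ["match", "run"],
    ]


def run_workflow(commands):
    """Run each step in order; check=True stops at the first failing command."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Passing argument lists rather than a shell string avoids quoting problems with resume paths or names that contain spaces.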

Bulk Operations

For processing multiple companies or large datasets:

# Create a companies.txt file with one company per line
echo -e "spacex\ntesla\nopenai" > companies.txt

# Bulk fetch jobs
uv run python main.py jobs fetch --source greenhouse --companies-file companies.txt

# Add multiple resumes at once
uv run python main.py resumes add /resumes/*.pdf

# Export all matches to CSV
uv run python main.py match export --format csv --output all_matches.csv

Architecture

Core Components

soupboss/
├── cli.py                 # Complete CLI interface (40+ commands)
├── db.py                  # SQLite with vector search
├── embeddings.py          # Ollama client integration
├── ingestion.py           # Multi-API job fetching
├── matching.py            # AI similarity engine
├── export.py              # Professional reporting
├── maintenance.py         # System utilities
└── config.py              # Configuration management

Data Flow

graph TD
    A[**Job APIs**<br/>Greenhouse, Lever,<br>SmartRecruiters] --> B[Ingestion Engine]
    C[**Resume Files**<br/>PDF, DOCX, TXT, MD] --> D[**SQLite Database**<br/>with Vector Extensions]
    B --> D
    
    D --> E[**Ollama AI Service**<br/>Local Embedding Generation]
    E --> F[**Vector Storage**<br/>384 or 768-dim embeddings]
    F --> G[**Similarity Matching**<br/>Cosine similarity scoring]
    
    G --> H[**Match Results**<br/>Ranked by similarity]
    H --> I[**Reports & Analysis**<br/>CSV, JSON, HTML exports]
    H --> J[**Web Interface**<br/>Interactive job cards]
    H --> K[**CLI Tools**<br/>Automation & scripting]
    
    style A fill:#e1f5fe
    style C fill:#f3e5f5
    style E fill:#fff3e0
    style G fill:#e8f5e8
    style I fill:#fce4ec
    style J fill:#f1f8e9
    style K fill:#e3f2fd
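
The similarity-matching stage in the diagram can be sketched in a few lines. This is an illustrative pure-Python version of cosine similarity ranking, not SoupBoss's actual `matching.py`; the function names and the dictionary of job vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(resume_vec, job_vecs, limit=20):
    """Return (job_id, score) pairs sorted from most to least similar."""
    scored = [(job_id, cosine_similarity(resume_vec, vec))
              for job_id, vec in job_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

With toy two-dimensional vectors, a resume at `[1.0, 0.0]` ranks a job at `[1.0, 0.0]` first, one at `[1.0, 1.0]` second, and one at `[0.0, 1.0]` last. Real embeddings have hundreds of dimensions, but the ranking logic is the same.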

Command Groups

| Group       | Purpose             | Key Commands            | Examples                               |
|-------------|---------------------|-------------------------|----------------------------------------|
| jobs        | Job data management | fetch, list, import     | Ingest from APIs, bulk import          |
| companies   | Company tracking    | add, list, test         | Manage job sources                     |
| resumes     | Resume management   | add, list, show, remove | Process candidate files, export to PDF |
| match       | AI matching         | generate, run, show     | Create embeddings, find matches        |
| maintenance | System upkeep       | stats, clear-*, backup  | Database maintenance                   |
| config      | Configuration       | show, set, validate     | System settings                        |

Supported Integrations / Formats

Job Board APIs

  • Greenhouse: https://boards-api.greenhouse.io/v1/boards/{company}/jobs
  • Lever: https://api.lever.co/v0/postings/{company}
  • SmartRecruiters: https://api.smartrecruiters.com/v1/companies/{company}/postings
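
To show how the `{company}` placeholder in these endpoints is filled in, here is a small sketch that builds the request URL for a given source. The templates are copied from the list above; the `job_board_url` helper is hypothetical and not part of SoupBoss.

```python
from urllib.parse import quote

# URL templates copied from the list above.
JOB_BOARD_TEMPLATES = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{company}/jobs",
    "lever": "https://api.lever.co/v0/postings/{company}",
    "smartrecruiters": "https://api.smartrecruiters.com/v1/companies/{company}/postings",
}


def job_board_url(source, company):
    """Return the postings URL for a company on the given job board."""
    key = source.lower()
    if key not in JOB_BOARD_TEMPLATES:
        raise ValueError(f"unsupported source: {source!r}")
    # Percent-encode the company slug so unusual characters cannot alter the path.
    return JOB_BOARD_TEMPLATES[key].format(company=quote(company, safe=""))
```

For example, `job_board_url("greenhouse", "spacex")` yields the Greenhouse board URL for the `spacex` slug used throughout this README.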

Resume Formats

  • PDF (.pdf) - Full text extraction
  • Word Documents (.docx) - Microsoft Word format
  • Plain Text (.txt) - Direct text processing
  • Markdown (.md) - Formatted text with structure

Export & Reporting

  • CSV (.csv) - Structured data for Excel/analysis
  • JSON (.json) - Machine-readable data format
  • HTML (.html) - Rich formatted reports with styling
  • PDF (.pdf) - Professional printable reports (e.g., jobs list --pdf, resumes list --pdf, match show --pdf)

Configuration

SoupBoss can be configured through any of the following:

  • Environment variables (typically in a .env file)
  • A soupboss.config.json file
  • The CLI (uv run python main.py config set or config env)

# Environment variables (.env file)
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
OLLAMA_MODEL=nomic-embed-text
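
A minimal sketch of how these variables might be read, assuming the defaults match the values shown above (the localhost:11434 default is stated earlier; the default model is an assumption taken from the example). The `load_ollama_settings` helper is hypothetical and does not reflect how SoupBoss's `config.py` resolves precedence between the three sources.

```python
import os


def load_ollama_settings(env=None):
    """Read Ollama settings from an environment mapping, falling back to defaults."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "localhost")
    port = int(env.get("OLLAMA_PORT", "11434"))
    # Default model is an assumption based on the example .env above.
    model = env.get("OLLAMA_MODEL", "nomic-embed-text")
    return {
        "host": host,
        "port": port,
        "model": model,
        "base_url": f"http://{host}:{port}",
    }
```

Accepting an explicit mapping makes the function easy to exercise without touching the real process environment.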

Standalone Utilities

Direct API access without the main system:

# Test company job boards
uv run python greenhouse_fetch.py -test spacex
uv run python lever_fetch.py -test leverdemo

# Direct job fetching
uv run python greenhouse_fetch.py -fetch spacex
uv run python smartrecruiters_fetch.py -fetch dynatrace1

# Bulk operations
uv run python greenhouse_fetch.py -in companies.txt -out ./data/

# Import Disney Workday data
uv run python disney_data_importer.py -file disney_jobs.json

Advanced Usage

Model Comparison & Analysis

# Compare embedding models for quality and speed
uv run python main.py match compare-models --save comparison.json

# Switch between models
uv run python main.py match switch-model bge-large --generate

# View model statistics
uv run python main.py match list-models

System Maintenance

# Database optimization
uv run python main.py maintenance optimize

# Data validation
uv run python main.py maintenance validate

# Backup before major operations
uv run python main.py maintenance backup --output backup.db

# Clean start (removes everything)
uv run python main.py maintenance reset-system --force


Built for intelligent job matching with local AI and complete data control

🌐 Web Interface β€’ ⚑ CLI Power β€’ πŸ€– Local AI β€’ πŸ”’ Complete Privacy
