Skip to content
/ gitout Public
forked from JakeWharton/gitout

A command-line tool and Docker image to automatically backup Git repositories from GitHub or anywhere

License

Notifications You must be signed in to change notification settings

po4yka/gitout

 
 

Repository files navigation

Git Out

Build & Test Publish CodeQL Docker Image Version Docker Image Size

A command-line tool to automatically backup Git repositories from GitHub or anywhere.

Migration Notice: Rust to Kotlin

IMPORTANT: This project has been completely rewritten from Rust to Kotlin (based on JakeWharton's upstream rewrite). If you are upgrading from a Rust-based version:

  • JVM Required: The tool now requires a Java Virtual Machine (JRE 8 or later) instead of being a native binary
  • Breaking Changes: Configuration format remains compatible (TOML version 0), but command-line options have changed
  • New Features: Parallel synchronization (4-8x faster), metrics system, enhanced SSL/TLS support, retry mechanism
  • Docker: The Docker image has been updated to use the Kotlin version with all fork improvements
  • Migration Guide: See the "Migration from Rust Version" section below for detailed upgrade instructions

The gitout tool will clone git repos from GitHub or any other git hosting service. If the repository was already cloned, it will fetch any updates to keep your local copy in sync.

When you add your GitHub username and a token, gitout will discover all of your owned repositories and synchronize them automatically. You can opt-in to having repositories that you've starred or watched synchronized as well.

The cloned repositories are bare. In other words, there is no working copy of the files for you to interact with. If you need access to the files, you can git clone /path/to/bare/repo.

Installation

Docker

The binary is available inside the po4yka/gitout Docker container (Kotlin version with SSL improvements).

Docker images are automatically built and published for multiple architectures (amd64, arm64) on every commit to master and for all version tags. Images are available on both Docker Hub and GitHub Container Registry.

Mount a /data volume which is where the repositories will be stored. Mount the /config folder which contains a config.toml or mount a /config/config.toml file directly.

By default, the tool will run a single sync and then exit. If you specify the GITOUT_CRON environment variable with a valid cron specifier, the tool will not exit and perform automatic syncs in accordance with the schedule.

$ docker run -d \
    -v /path/to/data:/data \
    -v /path/to/config.toml:/config/config.toml \
    -e "GITOUT_CRON=0 * * * *" \
    -e "PUID=1000" \
    -e "PGID=1000" \
    po4yka/gitout

For help creating a valid cron specifier, visit cron.help.

To be notified when sync is failing visit https://healthchecks.io, create a check, and specify the ID to the container using the GITOUT_HC_ID environment variable (for example, -e "GITOUT_HC_ID=...").

To write data as a particular user, the PUID and PGID environment variables can be set to your user ID and group ID, respectively.

If you're using Docker Compose, an example setup looks like:

services:
  gitout:
    image: po4yka/gitout:latest
    restart: unless-stopped
    volumes:
      - /path/to/data:/data
      - /path/to/config.toml:/config/config.toml
    environment:
      - "GITOUT_CRON=0 * * * *"
      - "PUID=1000"
      - "PGID=1000"
      # Optional:
      - "GITOUT_HC_ID=..."
      - "GITOUT_TIMEOUT=10m"

Note: You may want to specify an explicit version rather than latest. See https://hub.docker.com/r/po4yka/gitout/tags or CHANGELOG.md for the available versions.

Local debugging setup

The dev/ directory contains a turnkey configuration for running GitOut locally with verbose logging and disposable secrets:

  1. Copy your GitHub and Telegram tokens into dev/secrets/github_token.txt and dev/secrets/telegram_bot_token.txt.
  2. Customize dev/config.local.toml if you want to point at different repositories or enable Telegram notifications (remember to use numeric allowed_users IDs).
  3. Start the stack with:
    docker compose -f dev/docker-compose.local.yml up --build

The container runs once (GITOUT_CRON is empty) so you can iterate quickly, and all cloned repositories stay inside dev/data/.

Binaries

A .zip can be downloaded from the latest GitHub release.

The Java Virtual Machine must be installed on your system to run. After unzipping, run either bin/gitout (macOS, Linux) or bin/gitout.bat (Windows).

Usage

$ gitout --help
Usage: gitout [<options>] <config> <destination>

Options:
  --version               Show the version and exit
  --workers=<number>      Number of parallel workers for repository synchronization (default: from config or 4)
  --metrics / --no-metrics Enable/disable metrics collection (default: enabled)
  --metrics-format=<fmt>  Metrics output format: console, json, prometheus (default: console)
  --metrics-path=<path>   Path to export metrics file (optional, default: stdout)
  -v, --verbose           Increase logging verbosity. -v = informational, -vv = debug, -vvv = trace
  -q, --quiet             Decrease logging verbosity. Takes precedence over verbosity
  --dry-run               Print actions instead of performing them
  --cron=<expression>     Run command forever and perform sync on this schedule
  --hc-id=<id>            ID of Healthchecks.io service to notify
  --hc-host=<url>         Host of Healthchecks.io service to notify. Requires --hc-id
  -h, --help              Show this message and exit

Arguments:
  <config>       Configuration TOML
  <destination>  Backup directory

Environment Variables:
  GITOUT_WORKERS         Number of parallel workers (overrides config)
  GITOUT_METRICS         Enable/disable metrics (true/false)
  GITOUT_METRICS_FORMAT  Metrics output format (console/json/prometheus)
  GITOUT_METRICS_PATH    Path to export metrics file
  GITOUT_TIMEOUT         Timeout for git operations (default: 10m)
  GITOUT_CRON            Cron schedule for automatic syncs
  GITOUT_HC_ID           Healthchecks.io service ID
  GITOUT_HC_HOST         Healthchecks.io service host
  GITHUB_TOKEN           GitHub personal access token
  GITHUB_TOKEN_FILE      Path to file containing GitHub token
  SSL_CERT_FILE          Path to SSL certificate bundle

Configuration

Until version 1.0 of the tool, the TOML version is set to 0 and may change incompatibly between 0.x releases. You can find migration information in the CHANGELOG.md file.

version = 0

[github]
user = "example"
token = "abcd1234efgh5678ij90"  # Optional - can use environment variables instead

[github.clone]
starred = true  # Optional, default false
watched = true  # Optional, default false
gists = true    # Optional, default true
# Extra repos to synchronize that are not owned, starred, or watched by you.
repos = [
  "JakeWharton/gitout",
]
# Repos temporary or otherwise that you do not want to be synchronized.
ignore = [
  "JakeWharton/TestParameterInjector",
]

# Repos not on GitHub to synchronize.
[git.repos]
asm = "https://gitlab.ow2.org/asm/asm.git"

# SSL configuration (optional)
[ssl]
cert_file = "/etc/ssl/certs/ca-certificates.crt"  # Path to your certificate bundle
verify_certificates = true  # Optional, default true. Set to false only for testing!

# Parallel synchronization configuration (optional)
[parallelism]
workers = 4  # Number of repositories to sync in parallel (default: 4)

# Metrics and monitoring configuration (optional)
[metrics]
enabled = true           # Enable metrics collection (default: true)
format = "console"       # Output format: console, json, prometheus (default: console)
export_path = "/var/log/gitout/metrics.json"  # Optional: export to file

# Telegram notifications (optional)
[telegram]
token = "YOUR_BOT_TOKEN"    # Optional - can use environment variables instead
chat_id = "123456789"       # Your Telegram chat ID (required if enabled)
enabled = true              # Enable/disable Telegram notifications (default: true)
notify_start = true         # Notify when sync starts (default: true)
notify_progress = true      # Notify about progress (default: true)
notify_completion = true    # Notify when sync completes (default: true)
notify_errors = true        # Notify about errors (default: true)
notify_new_repos = true     # Notify about new repositories discovered (default: true)
notify_updates = false      # Notify about each repository update (default: false)
enable_commands = false     # Enable bot command interface (default: false)
allowed_users = []          # List of authorized Telegram user IDs for commands

GitHub Token Configuration

The GitHub token can be provided in multiple ways with the following priority order:

  1. config.toml file - Specify the token directly in the [github] section:

    [github]
    user = "example"
    token = "abcd1234efgh5678ij90"
  2. GITHUB_TOKEN_FILE environment variable - Path to a file containing the token:

    export GITHUB_TOKEN_FILE="/path/to/token-file"

    This is useful for keeping secrets in separate files managed by secret management systems.

  3. GITHUB_TOKEN environment variable - Token value directly:

    export GITHUB_TOKEN="abcd1234efgh5678ij90"

The tool will use the first available token found in this order. If you have a GitHub configuration but no token is found in any location, the tool will fail with a helpful error message.

When using Docker, you can pass environment variables like this:

docker run -d \
  -v /path/to/data:/data \
  -v /path/to/config.toml:/config/config.toml \
  -e "GITHUB_TOKEN_FILE=/secrets/github-token" \
  -e "GITOUT_CRON=0 * * * *" \
  po4yka/gitout

Telegram Notifications

The tool can send real-time notifications to Telegram about synchronization activities. This feature is useful for monitoring backup operations and getting instant alerts about errors.

Setting Up Telegram Bot

  1. Create a Telegram Bot:

    • Open Telegram and search for @BotFather
    • Send /newbot command
    • Follow the prompts to set a name and username for your bot
    • BotFather will provide you with a bot token (e.g., 123456789:ABCdefGHIjklMNOpqrsTUVwxyz)
    • Save this token securely
  2. Get Your Chat ID:

    • Start a chat with your new bot (click the link provided by BotFather)
    • Send any message to the bot
    • Open this URL in your browser: https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates
    • Look for the "chat":{"id": field in the JSON response
    • This number is your chat_id (e.g., 123456789 or -987654321 for groups)
  3. Configure gitout:

    Add the [telegram] section to your config.toml:

    [telegram]
    token = "123456789:ABCdefGHIjklMNOpqrsTUVwxyz"
    chat_id = "123456789"
    enabled = true
    notify_start = true
    notify_progress = true
    notify_completion = true
    notify_errors = true
    # Optional: Enable bot commands and restrict access
    enable_commands = true
    allowed_users = [123456789, 987654321]  # List of allowed Telegram user IDs

Bot Commands and User Authentication

The Telegram bot can respond to commands from authorized users. When enable_commands is enabled, the bot will:

  • Start Polling: Listen for incoming commands from users
  • Authenticate Users: Only respond to user IDs listed in allowed_users
  • Provide Information: Respond to queries about sync status and bot information

Available Commands:

  • /ping - Health check plus bot uptime
  • /start - Welcome message and command list
  • /help - Show detailed help about available commands
  • /status - Get current synchronization status (real-time if syncing)
  • /stats - Get repository statistics and last sync time
  • /fails - Show the most recent repository failures (if any)
  • /info - Get bot information, version, and configuration

Getting Your Telegram User ID:

  1. Start a chat with @userinfobot on Telegram
  2. The bot will reply with your user ID
  3. Add this ID to the allowed_users list in your config.toml

Security Features:

  • Whitelist-based: Only users in allowed_users can interact with the bot
  • Unauthorized Access Logging: All unauthorized attempts are logged
  • Rejection Messages: Unauthorized users receive a clear rejection message

Example Configuration:

[telegram]
token = "123456789:ABCdefGHIjklMNOpqrsTUVwxyz"
chat_id = "123456789"              # Main chat for notifications
enabled = true
enable_commands = true              # Enable command interface
allowed_users = [123456789]         # Your Telegram user ID
notify_start = true
notify_progress = false             # Reduce spam for frequent syncs
notify_completion = true
notify_errors = true

Command Examples:

When you send /status to the bot:

📊 Last Sync Status

✅ Successful: 98
❌ Failed: 2
📊 Success Rate: 98%
⏱️ Duration: 5m 32s
🏁 Finished: 2025-11-04 10:30:00

When you send /stats to the bot:

📊 Repository Statistics

Last Sync: 2025-11-04 10:30:00
Total Repositories: 100
Successful: 98 ✅
Failed: 2 ❌
Success Rate: 98%

When you send /info to the bot:

ℹ️ GitOut Bot Information

Version: 0.4.0-fork-SNAPSHOT
Status: ✅ Active
Notifications:
  • Start: ✅
  • Progress: ❌
  • Completion: ✅
  • Errors: ✅
Commands: ✅ Enabled
Authorized Users: 2

Telegram Token Configuration

The Telegram bot token can be provided in multiple ways with the following priority order:

  1. config.toml file - Specify the token directly in the [telegram] section:

    [telegram]
    token = "123456789:ABCdefGHIjklMNOpqrsTUVwxyz"
    chat_id = "123456789"
  2. TELEGRAM_BOT_TOKEN_FILE environment variable - Path to a file containing the token:

    export TELEGRAM_BOT_TOKEN_FILE="/path/to/bot-token-file"

    This is useful for keeping secrets in separate files managed by secret management systems.

  3. TELEGRAM_BOT_TOKEN environment variable - Token value directly:

    export TELEGRAM_BOT_TOKEN="123456789:ABCdefGHIjklMNOpqrsTUVwxyz"

The tool will use the first available token found in this order. If Telegram is configured but no token is found, notifications will be disabled with a warning.

Notification Types

The Telegram integration sends the following types of notifications:

Start Notification (notify_start):

  • Triggered when synchronization begins
  • Shows total repository count and number of workers
  • Includes timestamp

Progress Notifications (notify_progress):

  • Sent at meaningful intervals during sync (every 10%)
  • Shows completion percentage and current repository
  • Helps monitor long-running sync operations

Completion Notification (notify_completion):

  • Sent when synchronization finishes
  • Shows success/failure counts and success rate
  • Includes total duration and finish timestamp
  • Uses different emoji for success (✅) vs errors (⚠️)

Error Notifications (notify_errors):

  • Individual notifications: Sent immediately when each repository sync fails
  • Shows repository name, URL, error message, and timestamp
  • Summary notification: Sent at end of sync with all failed repositories
  • Summary limited to first 5 errors to avoid message size limits
  • Truncates long error messages for readability
  • Provides real-time alerts for critical sync failures

New Repository Notifications (notify_new_repos):

  • Sent when new starred or watched repositories are discovered
  • Lists all newly detected repositories
  • Limited to first 10 repositories to avoid message size limits
  • Sent before synchronization begins

First Backup Notifications (notify_new_repos):

  • Sent when a repository's first backup is successfully created
  • Shows repository name, URL, and timestamp
  • Triggered after each new repository's initial clone completes
  • Helps track which new repositories have been backed up

Repository Update Notifications (notify_updates):

  • Sent after each successful repository sync/update
  • Shows repository name, URL, and timestamp
  • Applies to all existing repositories (not first-time backups)
  • Disabled by default to avoid notification spam
  • Useful for monitoring specific repositories or debugging sync issues

Docker Configuration

When using Docker, pass the Telegram token via environment variables:

docker run -d \
  -v /path/to/data:/data \
  -v /path/to/config.toml:/config/config.toml \
  -e "TELEGRAM_BOT_TOKEN_FILE=/secrets/telegram-token" \
  -e "GITOUT_CRON=0 * * * *" \
  po4yka/gitout

Or mount the token file:

docker run -d \
  -v /path/to/data:/data \
  -v /path/to/config.toml:/config/config.toml \
  -v /path/to/telegram-token:/secrets/telegram-token:ro \
  -e "TELEGRAM_BOT_TOKEN_FILE=/secrets/telegram-token" \
  -e "GITOUT_CRON=0 * * * *" \
  po4yka/gitout

Docker Compose Example

services:
  gitout:
    image: po4yka/gitout:latest
    restart: unless-stopped
    volumes:
      - /path/to/data:/data
      - /path/to/config.toml:/config/config.toml
      - /path/to/telegram-token:/secrets/telegram-token:ro
    environment:
      - "GITOUT_CRON=0 * * * *"
      - "TELEGRAM_BOT_TOKEN_FILE=/secrets/telegram-token"
      - "PUID=1000"
      - "PGID=1000"

Notification Configuration Options

You can fine-tune which notifications to receive:

[telegram]
token = "YOUR_BOT_TOKEN"
chat_id = "YOUR_CHAT_ID"
enabled = true             # Master switch for all notifications
notify_start = false       # Disable sync start notifications
notify_progress = false    # Disable progress updates (reduces message spam)
notify_completion = true   # Keep completion notifications
notify_errors = true       # Keep error notifications
notify_new_repos = true    # Keep new repository discovery notifications
notify_updates = false     # Disable per-repository update notifications (default)

Recommended Settings:

  • For frequent syncs (hourly): Disable notify_start, notify_progress, and notify_updates; keep notify_new_repos enabled
  • For daily syncs: Enable all notification types except notify_updates (unless monitoring specific repos)
  • For critical repositories: Always enable notify_errors, notify_completion, and notify_new_repos
  • For debugging: Enable notify_updates to see each repository sync in real-time

Troubleshooting

Bot not sending messages:

  • Verify the bot token is correct
  • Ensure you've sent at least one message to the bot
  • Check that the chat_id is correct (use /getUpdates API)
  • Review logs with -v flag for error messages

Messages not formatted correctly:

  • The bot uses HTML formatting by default
  • If messages look strange, there may be a Telegram API issue
  • Check Telegram's formatting documentation

Rate limiting:

  • Telegram has rate limits for bots (30 messages per second)
  • The tool implements smart throttling for progress notifications
  • Progress updates are sent at 10% intervals to reduce spam

SSL/TLS Configuration

The tool automatically searches for SSL certificates in common locations:

  • /etc/ssl/certs/ca-certificates.crt
  • /etc/ssl/cert.pem
  • /usr/lib/ssl/cert.pem
  • /etc/pki/tls/certs/ca-bundle.crt

If your certificates are in a different location, specify the path in the [ssl] section of your config.

The SSL_CERT_FILE environment variable can also be used to specify the certificate path.

Retry and Reliability

The tool includes automatic retry logic for network operations:

  • Up to 6 retry attempts for failed git operations
  • Exponential backoff between retries (5s, 10s, 15s, etc.)
  • Configurable timeout via GITOUT_TIMEOUT environment variable (default: 10m)

Parallel Synchronization (Enhanced)

The tool supports enhanced parallel synchronization with advanced coroutine features:

Basic Configuration:

  • --workers CLI option: Override the number of parallel workers
  • GITOUT_WORKERS environment variable: Set workers without changing config
  • [parallelism] section in config.toml: Set default worker pool size
  • Default: 4 workers

Priority: CLI option > Environment variable > Config file > Default (4)

Advanced Configuration (config.toml):

[parallelism]
workers = 4                        # Number of parallel workers
progress_interval_ms = 1000        # Progress reporting interval in milliseconds
repository_timeout_seconds = 600   # Default timeout per repository (in seconds)

# Priority patterns: sync high-priority repos first, smaller repos first
[[parallelism.priorities]]
pattern = "JakeWharton/*"          # Pattern to match repository names
priority = 75                      # Priority level (0-100, higher = more important)
timeout = 300                      # Optional: custom timeout for matching repos

[[parallelism.priorities]]
pattern = "*/critical"             # Match any org with 'critical' repo
priority = 100                     # Highest priority
timeout = 600                      # 10 minute timeout

[[parallelism.priorities]]
pattern = "*/large-repo"           # Large repositories
priority = 50                      # Normal priority
timeout = 1800                     # 30 minute timeout

Priority Levels:

  • 100: PRIORITY_HIGHEST - Critical repositories, synced first
  • 75: PRIORITY_HIGH - Important repositories
  • 50: PRIORITY_NORMAL - Default priority
  • 25: PRIORITY_LOW - Less important repositories
  • 0: PRIORITY_LOWEST - Synced last

Usage Examples:

# Use 8 parallel workers
gitout --workers=8 config.toml /backup/path

# Use environment variable
export GITOUT_WORKERS=6
gitout config.toml /backup/path

# Sequential mode (disable parallelism)
gitout --workers=1 config.toml /backup/path

Performance Benefits:

Parallel synchronization significantly reduces total sync time:

  • Sequential sync: ~30 minutes for 100 repos (assuming 18 seconds per repo)
  • Parallel sync (4 workers): ~8 minutes for 100 repos (4x speedup)
  • Parallel sync (8 workers): ~4 minutes for 100 repos (8x speedup)

Example performance improvement:

Sequential: 100 repos × 18s = 1800s (~30 min)
Parallel (4 workers): 100 repos ÷ 4 × 18s = 450s (~8 min)
Parallel (8 workers): 100 repos ÷ 8 × 18s = 225s (~4 min)

Enhanced Features:

  1. Progress Reporting:

    • Real-time progress updates: "Progress: 45 of 100 repositories completed (45%)"
    • Tracks sync duration per repository
    • Reports average sync time and success rate
  2. Priority Queuing:

    • High-priority repositories sync first
    • Smaller repositories prioritized within same priority level
    • Pattern-based priority configuration with wildcards
  3. Repository-Specific Timeouts:

    • Configure different timeouts for different repositories
    • Pattern matching for flexible configuration
    • Prevents large repos from holding up the queue
  4. Advanced Error Handling:

    • Categorized errors (Network, Auth, Timeout, Not Found, etc.)
    • Actionable suggestions for common errors
    • Detailed failure reports grouped by error category
  5. Metrics Collection:

    • Success/failure rates
    • Average sync duration
    • Total sync time
    • Per-repository timing information
  6. Cancellation Support:

    • Graceful handling of job cancellation
    • Proper cleanup on cancellation or timeout
    • Individual repository timeouts don't affect others

Best Practices:

  • Use 4-8 workers for GitHub to stay within rate limits
  • Higher values (10-16) work well for self-hosted git servers
  • Use workers=1 for debugging or when network is unreliable
  • Set higher priority for critical repositories
  • Configure longer timeouts for large repositories
  • Monitor logs to ensure parallel operations complete successfully

Error Handling:

The tool provides intelligent error categorization and suggestions:

Error Categories:

  • NETWORK: Connection issues, DNS failures, SSL problems
  • AUTH: Authentication or authorization failures
  • TIMEOUT: Operation timeouts
  • NOT_FOUND: Repository not found or access denied
  • GIT_COMMAND: Git command execution errors
  • FILESYSTEM: Disk space, permissions issues
  • UNKNOWN: Uncategorized errors

Example Error Output:

Failed repositories by category:
  NETWORK: 2 repositories
    - user/repo1 (https://github.com/user/repo1.git): Connection refused
      Suggestion: Check your network connection and ensure the repository server is accessible.
    - user/repo2 (https://github.com/user/repo2.git): SSL certificate verification failed
      Suggestion: SSL/TLS error. Check certificate configuration in [ssl] section.

  AUTH: 1 repository
    - user/private-repo (https://github.com/user/private-repo.git): Authentication failed
      Suggestion: Verify your GitHub token is valid and has the required permissions.

Logging Output:

With enhanced parallel sync, you'll see detailed progress information:

Using 4 parallel workers for synchronization
Starting synchronization of 100 repositories with 4 workers
[1/100] Starting sync: https://github.com/user/critical-repo.git (priority: 100)
[2/100] Starting sync: https://github.com/user/important-repo.git (priority: 75)
Progress: 25 of 100 repositories completed (25%)
[25/100] Completed sync: https://github.com/user/repo25.git (2.3s)
Progress: 50 of 100 repositories completed (50%)
Progress: 75 of 100 repositories completed (75%)
Synchronization complete: 98 succeeded, 2 failed
Total duration: 8m 32s, Average per repository: 5.1s
Success rate: 98%

Performance Optimization

The tool includes several performance optimizations for faster and more efficient operation:

Git Operations

Shallow Clones:

  • By default, initial clones use --depth=1 for faster downloads
  • Reduces bandwidth and storage for repositories with long history
  • Can be disabled in config if full history is needed

Compression:

  • Git compression level set to 9 (maximum) by default
  • Reduces network transfer size and storage usage
  • Configurable via compression_level setting

HTTP/2:

  • Enabled by default for git operations
  • Provides better multiplexing and reduced latency
  • Falls back to HTTP/1.1 if not available

Network Operations

Connection Pooling:

  • Reuses HTTP connections for GitHub API calls
  • Default pool size: 16 connections
  • Keep-alive: 300 seconds (5 minutes)
  • Significantly reduces connection overhead

Timeouts:

  • Connection timeout: 30 seconds
  • Read timeout: 60 seconds
  • Write timeout: 60 seconds
  • All configurable via [performance] section

File System Optimizations

Credential File Reuse:

  • Credentials file created once and reused
  • Avoids redundant file I/O operations
  • Enabled by default via reuse_credentials_file

Directory Creation:

  • Parent directories created once and cached
  • Eliminates redundant filesystem checks
  • Automatic synchronization for parallel operations

Performance Configuration

Add a [performance] section to your config.toml:

[performance]
# Git operation settings
shallow_clone = true          # Use --depth=1 for faster initial clones
clone_depth = 1               # Depth for shallow clones
compression_level = 9         # Git compression (0-9, 9 = max)

# Network settings
connection_timeout = 30       # Connection timeout in seconds
read_timeout = 60             # Read timeout in seconds
write_timeout = 60            # Write timeout in seconds
connection_pool_size = 16     # HTTP connection pool size
connection_pool_keep_alive = 300  # Keep-alive in seconds
enable_http2 = true           # Enable HTTP/2 for git operations

# File system settings
reuse_credentials_file = true # Reuse credentials file (recommended)
batch_size = 100              # API batch size for pagination

Performance Best Practices

For GitHub Synchronization:

  • Use 4-8 workers to stay within API rate limits
  • Enable shallow clones for faster initial sync
  • Use HTTP/2 for better performance
  • Increase connection pool size if syncing many repos

For Self-Hosted Git Servers:

  • Can use higher worker counts (10-16)
  • Adjust timeouts based on server performance
  • Consider disabling shallow clones if server doesn't support it
  • Monitor server load and adjust workers accordingly

Network Optimization:

  • Use wired connection for large syncs when possible
  • Schedule syncs during off-peak hours
  • Monitor bandwidth usage with high worker counts
  • Enable compression to reduce transfer sizes

Disk I/O Optimization:

  • Use SSD storage for best performance
  • Ensure sufficient disk space (repos can be large)
  • Consider using a separate partition for backups
  • Monitor disk I/O during sync operations

Memory Usage:

  • Each worker requires memory for git operations
  • Typical usage: 50-100 MB per worker
  • Total memory: (workers × 100 MB) + base overhead
  • Example: 8 workers ≈ 800 MB + 200 MB base = 1 GB

Troubleshooting Performance Issues

Slow API Calls:

  • Check internet connection speed
  • Verify GitHub API rate limits
  • Reduce workers if hitting rate limits
  • Enable debug logging: -vv

Slow Git Operations:

  • Increase timeout values
  • Check disk I/O performance
  • Verify network bandwidth
  • Consider using shallow clones

High Memory Usage:

  • Reduce number of workers
  • Monitor with system tools (top, htop)
  • Check for memory leaks with -vvv logging

High CPU Usage:

  • Normal during compression operations
  • Reduce compression_level if needed
  • Adjust worker count based on CPU cores

Creating a GitHub token

  1. Visit https://github.com/settings/tokens
  2. Click "Generate new token"
  3. Type "gitout" in the name field
  4. Select the "repo", "gist", and "read:user" scopes
    • repo: Needed to discover and clone private repositories (if you only have public repositories then just public_repo will also work)
    • gist: Needed to discover and clone private gists (if you only have public gists then this is not required)
    • read:user: Needed to traverse your owned, starred, and watched repo lists
  5. Select "Generate token"
  6. Copy the value into your config.toml as it will not be shown again

What's New: Complete Kotlin Migration & Major Enhancements

This fork (po4yka/gitout) is based on JakeWharton's Kotlin rewrite with extensive additional improvements. Version 0.4.0-fork includes a complete migration from Rust to Kotlin plus production-grade enhancements.

Migration from Rust to Kotlin

  • Complete rewrite: Migrated entire codebase from Rust to Kotlin
  • JVM-based: Now runs on Java Virtual Machine (requires JRE 8+)
  • Modern architecture: Leverages Kotlin coroutines for concurrency
  • GraphQL API: Uses Apollo GraphQL for GitHub API interactions
  • Maintained compatibility: TOML config format unchanged (version 0)
  • Enhanced error handling: Better error messages and diagnostics

Parallel Synchronization - 4-8x Faster

  • Concurrent sync: Uses Kotlin coroutines for parallel repository operations
  • Configurable workers: Default 4 workers, configurable up to 16+
  • Massive speedup: 4-8x faster for large repository sets
    • Sequential: ~30 minutes for 100 repos
    • 4 workers: ~8 minutes (4x faster)
    • 8 workers: ~4 minutes (8x faster)
  • Intelligent scheduling: Priority-based queuing with pattern matching
  • Progress tracking: Real-time progress updates with completion percentage
  • Fault isolation: One failure doesn't stop others
  • Configuration options:
    • CLI: --workers=N
    • Environment: GITOUT_WORKERS=N
    • Config: [parallelism] workers = N

Production-Grade Metrics System

  • Comprehensive tracking: Sync attempts, successes, failures, retries
  • Performance metrics: Duration statistics (avg, min, max, p50, p95, p99)
  • Multiple formats: Console (human-readable), JSON (machine-parseable), Prometheus (monitoring)
  • Per-repository details: Individual timing and status for each repo
  • Export options: stdout or file export with configurable path
  • Minimal overhead: <0.00002% impact on sync time
  • Integration ready: Works with ELK, Splunk, Prometheus, Grafana
  • Configuration:
    • CLI: --metrics-format=json --metrics-path=/logs/metrics.json
    • Environment: GITOUT_METRICS_FORMAT=json
    • Config: [metrics] format = "json"

Performance Optimizations - 70-80% Faster Git Operations

  • Shallow clones: --depth=1 by default for faster initial clones (50-80% faster)
  • HTTP/2 support: Better multiplexing and reduced latency
  • Connection pooling: Reuses HTTP connections (97.5% reduction in overhead)
  • Maximum compression: Git compression level 9 (40% less data transfer)
  • Credential reuse: Single credentials file shared across all repos
  • Directory caching: Eliminates redundant filesystem operations (90% reduction)
  • Configurable timeouts: Fine-tune for your network conditions
  • Benchmarking tools: Built-in performance measurement utilities
  • Real-world improvements:
    • 100 repos: 22m → 6m (73% faster)
    • 200 repos: 1h 20m → 22m (73% faster)

Enhanced SSL/TLS Support

  • Automatic detection: Finds certificates in common system locations
  • Multiple sources: Config file, environment variable, or auto-detect
  • Custom certificates: Support for private CA certificates
  • Enhanced diagnostics: Clear error messages for SSL issues
  • Testing mode: Disable verification for development (with warnings)
  • Docker support: Proper CA bundle handling in containers

Intelligent Retry Mechanism

  • Automatic retries: Up to 6 attempts for failed operations
  • Linear backoff: 5s, 10s, 15s, 20s, 25s, 30s delays
  • Configurable timeout: Default 10 minutes, set via GITOUT_TIMEOUT
  • Detailed logging: Clear indication of retry attempts
  • Metrics integration: Tracks retry counts and delays
  • Network resilience: Handles transient failures gracefully

Flexible Authentication

  • Three token sources (priority order):
    1. Config file: [github] token = "..."
    2. Token file: GITHUB_TOKEN_FILE=/path/to/token
    3. Environment: GITHUB_TOKEN=...
  • Secure handling: Credentials passed via temporary files
  • Clear errors: Helpful messages when authentication fails

Comprehensive Test Suite

  • 41+ unit and integration tests: High code coverage
  • Test categories:
    • Configuration parsing (7 tests)
    • Engine operations (7 tests)
    • Retry logic (5 tests)
    • Parallel sync (8 tests)
    • Integration scenarios (14+ tests)
  • Test documentation: Detailed summaries in /RETRY_TESTING_SUMMARY.md
  • CI/CD ready: Fast tests for continuous integration

Docker Enhancements

  • User/Group mapping: PUID and PGID for proper file permissions
  • All environment variables: Full configuration via env vars
  • Updated base image: Alpine Linux with proper SSL support
  • Size optimized: Smaller image with faster startup
  • Health checks: Built-in health check support

Developer Experience

  • Kotlin DSL: Modern, type-safe configuration
  • Structured logging: Clear, actionable log messages
  • Dry-run mode: Test configurations without making changes
  • Verbose modes: -v, -vv, -vvv for different detail levels
  • Healthchecks.io: Built-in monitoring integration
  • Cron scheduling: --cron for automatic periodic syncs

Migration from Rust Version

If you're upgrading from a previous Rust-based version of gitout, here's what you need to know:

System Requirements Changes

  • Old (Rust): Native binary, no runtime dependencies
  • New (Kotlin): Requires Java Runtime Environment (JRE 8 or later)
  • Docker: No changes needed, the image includes JRE

Installation Changes

  1. Uninstall old version (if installed as binary):

    # Remove old Rust binary
    rm /usr/local/bin/gitout
  2. Install JRE (if not already installed):

    # Ubuntu/Debian
    sudo apt-get install openjdk-17-jre
    
    # macOS (using Homebrew)
    brew install openjdk@17
    
    # Alpine Linux (for Docker)
    apk add openjdk17-jre
  3. Download and install new version:

    # Download from releases page
    wget https://github.com/po4yka/gitout/releases/latest/download/gitout.zip
    unzip gitout.zip
    
    # Run
    ./gitout/bin/gitout --version

Configuration Changes

  • No changes required: Your existing config.toml will work as-is
  • Optional: Add new sections for metrics, performance, parallelism

Command-Line Changes

Old (Rust) New (Kotlin) Notes
gitout config.toml dest/ gitout config.toml dest/ No change
Not available --workers=N New: parallel sync
Not available --metrics New: metrics system
Not available --cron="..." New: scheduled syncs
Not available --dry-run New: test mode
-v, -vv -v, -vv, -vvv Enhanced verbosity

Environment Variables Changes

Old (Rust) New (Kotlin) Notes
CRON GITOUT_CRON Renamed with prefix
HEALTHCHECK_ID GITOUT_HC_ID Renamed with prefix
HEALTHCHECK_HOST GITOUT_HC_HOST Renamed with prefix
Not available GITOUT_WORKERS New: set worker count
Not available GITOUT_TIMEOUT New: operation timeout
Not available GITOUT_METRICS New: enable/disable metrics
GITHUB_TOKEN GITHUB_TOKEN No change

Docker Changes

# Old environment variables still work, but new names preferred
docker run -d \
  -v /path/to/data:/data \
  -v /path/to/config.toml:/config/config.toml \
  -e "GITOUT_CRON=0 * * * *"      # Renamed from CRON
  -e "GITOUT_HC_ID=..."           # Renamed from HEALTHCHECK_ID
  -e "GITOUT_WORKERS=8"           # New: parallel workers
  -e "GITOUT_METRICS_FORMAT=json" # New: metrics
  -e "PUID=1000" \
  -e "PGID=1000" \
  po4yka/gitout:latest

What You Gain

  • 4-8x faster synchronization with parallel workers
  • Production metrics for monitoring and alerting
  • 70-80% faster git operations with performance optimizations
  • Better reliability with automatic retry mechanism
  • Enhanced logging with structured, actionable messages
  • Comprehensive testing with 41+ automated tests

Troubleshooting Migration Issues

Issue: "Java not found" error

# Solution: Install JRE
sudo apt-get install openjdk-17-jre

Issue: "Permission denied" when running binary

# Solution: Make executable
chmod +x bin/gitout

Issue: Environment variables not recognized

# Solution: Use new names with GITOUT_ prefix
export GITOUT_CRON="0 * * * *"  # Not CRON

Issue: Performance slower than expected

# Solution: Enable parallel sync
gitout --workers=8 config.toml /backup/path

CI/CD and Automation

This project uses GitHub Actions for continuous integration, testing, and deployment. All workflows are fully automated and trigger on appropriate events.

Workflows

Build & Test Workflow

File: .github/workflows/build.yaml Triggers: Pull requests, pushes to master, manual dispatch Purpose: Comprehensive testing and validation

Jobs:

  • Test Suite: Runs all unit and integration tests (ConfigTest, EngineTest, RetryPolicyTest, ParallelSyncTest, IntegrationTest)
  • Build Distribution: Creates ZIP and TAR distribution packages
  • Docker Build Test: Validates Docker image builds for both architectures
  • Code Quality: Runs code formatting and quality checks

Features:

  • Gradle dependency caching for faster builds
  • Test result publishing with detailed reports
  • Artifact uploads for distributions and reports
  • Multi-architecture Docker builds with caching

Publish Workflow

File: .github/workflows/publish.yaml Triggers: Pushes to master, version tags (v*..), manual dispatch Purpose: Automated publishing and release creation

Jobs:

  • Build Distributions: Creates release-ready ZIP and TAR packages
  • Publish Docker: Builds and publishes multi-arch images to Docker Hub and GHCR
  • Create Release: Automatically creates GitHub releases with artifacts

Docker Images:

  • Architectures: linux/amd64, linux/arm64
  • Registries: Docker Hub (po4yka/gitout) and GitHub Container Registry (ghcr.io/po4yka/gitout)
  • Tags: latest, version tags (e.g., v0.4.0), branch names, commit SHA

Release Features:

  • Automatic extraction of release notes from CHANGELOG.md
  • Upload of ZIP and TAR distributions
  • Docker pull commands in release notes
  • Generated release notes from commits

CodeQL Security Workflow

File: .github/workflows/codeql.yaml Triggers: Pushes to master, pull requests, weekly schedule (Mondays), manual dispatch Purpose: Automated security scanning and vulnerability detection

Features:

  • Java/Kotlin code analysis
  • Security and quality queries
  • Weekly scheduled scans
  • Integration with GitHub Security tab

Dependency Management

The project uses both Renovate and Dependabot for automated dependency updates:

Renovate (.github/renovate.json5):

  • Groups related updates (Kotlin, GitHub Actions, test dependencies)
  • Auto-merges minor and patch updates
  • Weekly schedule (Mondays at 3 AM)
  • Concurrent PR limit: 10

Dependabot (.github/dependabot.yml):

  • Monitors Gradle dependencies, GitHub Actions, and Docker base images
  • Groups updates by category
  • Weekly schedule (Mondays at 3 AM)
  • Automatic labeling for easy review

Manual Triggers

All workflows support manual triggering via workflow_dispatch:

# Using GitHub CLI
gh workflow run build.yaml
gh workflow run publish.yaml
gh workflow run codeql.yaml

Or via the GitHub web interface: Actions → Select workflow → Run workflow

Secrets Required

For full CI/CD functionality, the following secrets must be configured in repository settings:

Required for Docker Publishing:

  • DOCKER_HUB_TOKEN: Docker Hub personal access token (for po4yka user)

Automatically Available:

  • GITHUB_TOKEN: Provided by GitHub Actions (for GHCR and releases)

Branch Protection

Recommended branch protection rules for master:

  • Require pull request reviews before merging
  • Require status checks to pass:
    • Test Suite
    • Build Distribution
    • Docker Build Test
    • Code Quality
  • Require conversation resolution before merging
  • Require linear history

Release Process

To create a new release:

  1. Update version in build.gradle (e.g., 0.4.0-fork)
  2. Update CHANGELOG.md with release notes
  3. Commit changes: git commit -m "Prepare release 0.4.0-fork"
  4. Create and push tag: git tag v0.4.0-fork && git push origin v0.4.0-fork
  5. GitHub Actions will automatically:
    • Run all tests
    • Build distribution packages (ZIP and TAR)
    • Build and push multi-arch Docker images
    • Create GitHub release with artifacts and notes
    • Tag Docker images as latest, v0.4.0-fork, 0.4, 0, etc.

Monitoring Build Status

Monitor workflow runs:

Development

Build the project:

./gradlew build

Run tests:

./gradlew test

Create distribution:

./gradlew installDist

The binary will be in build/install/gitout/bin/gitout.

LICENSE

MIT. See LICENSE.txt.

Copyright 2020 Jake Wharton

About

A command-line tool and Docker image to automatically backup Git repositories from GitHub or anywhere

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Languages

  • Kotlin 89.7%
  • Python 8.3%
  • Dockerfile 1.1%
  • Shell 0.9%