Dockerfile implementation for easy deployment #18
omaralvarez wants to merge 1 commit into Anttwo:main
Pull request overview
Adds containerization support to make MAtCha easier to build and run in a CUDA-enabled environment.
Changes:
- Added a CUDA 11.8 + Miniforge-based Dockerfile that installs dependencies, builds the conda env, and downloads checkpoints.
- Added a .dockerignore that currently excludes the entire build context.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| Dockerfile | Builds a GPU-ready image, installs the conda environment, and configures container startup behavior |
| .dockerignore | Controls build context contents (currently excludes everything) |
| RUN chmod -R 777 /workspace/MAtCha |
| RUN git clone https://github.com/Anttwo/MAtCha.git . |
The image is built from a fresh git clone of https://github.com/Anttwo/MAtCha.git, while the build context is effectively unused. This means docker build from this repo/branch will not include the code being reviewed (and will always build whatever is currently on the remote default branch), which breaks reproducibility and makes PR changes ineffective. Prefer copying the local checkout into the image (e.g., COPY . .) and remove the git clone step (optionally allow an ARG for building a specific ref).
Suggested change:
| COPY . . |
| RUN chmod -R 777 /workspace/MAtCha |
| # Ignore all files |
| * |
This .dockerignore ignores the entire build context (*). That pattern prevents using COPY/ADD for the repo (and contributes to the Dockerfile needing to git clone), and it can be surprising to users who expect docker build to package the current working tree. Consider switching to a selective ignore list (e.g., outputs, caches, datasets) instead of ignoring everything.
Suggested change:
| # Ignore unnecessary files and directories in the Docker build context |
| # Version control |
| .git |
| .gitignore |
| .gitattributes |
| # Node / JavaScript dependencies and builds |
| node_modules |
| npm-debug.log* |
| yarn-debug.log* |
| yarn-error.log* |
| pnpm-lock.yaml |
| package-lock.json |
| # Python caches and environments |
| __pycache__/ |
| *.py[cod] |
| *.pyo |
| .pytest_cache/ |
| .mypy_cache/ |
| .venv/ |
| venv/ |
| # Build / distribution directories (common conventions) |
| dist/ |
| build/ |
| out/ |
| target/ |
| .next/ |
| .nuxt/ |
| # Logs and temporary files |
| *.log |
| *.tmp |
| *.swp |
| *.swo |
| # IDE / editor settings |
| .vscode/ |
| .idea/ |
| *.iml |
| # OS-specific junk files |
| .DS_Store |
| Thumbs.db |
| # Docker-related files that shouldn't be copied into images |
| docker-compose*.yml |
| Dockerfile* |
| # Add project-specific large or local-only directories below as needed |
| # e.g., data/, datasets/, .cache/, etc. |
| RUN chmod -R 777 /workspace/MAtCha |
| RUN git clone https://github.com/Anttwo/MAtCha.git . |
chmod -R 777 grants world-writable permissions inside the image and is generally unsafe/unnecessary (especially as the container runs as root by default). Prefer creating a dedicated user and using chown/more restrictive permissions for /workspace/MAtCha.
Suggested change:
| RUN git clone https://github.com/Anttwo/MAtCha.git . |
| RUN useradd -m -s /bin/bash matcha && chown -R matcha:matcha /workspace/MAtCha && chmod -R 755 /workspace/MAtCha |
| # Activate environment and install dependencies |
| RUN python install.py --env_name matcha |
| RUN python download_checkpoints.py |
install.py currently uses os.system(...) without checking return codes, so the script can exit 0 even if environment creation/build steps fail (which would allow docker build to succeed with a broken image). Consider updating install.py to fail fast on non-zero exit codes (e.g., subprocess.run(..., check=True)) or replace this with explicit shell commands in the Dockerfile that will fail the build on error.
Suggested change:
| # Activate environment, install dependencies, and download checkpoints (fail fast on error) |
| RUN set -euo pipefail && \ |
| python install.py --env_name matcha && \ |
| python download_checkpoints.py |
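The shell-level chaining above only stops the build if install.py itself exits non-zero, which is the point of the comment. A minimal sketch of the fail-fast pattern inside a Python installer follows. The helper names `run_step` and `main` and the demo commands are hypothetical, since the actual commands run by install.py are not shown in this PR.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one install step; raises CalledProcessError on a non-zero exit."""
    print("+", " ".join(cmd), flush=True)
    subprocess.run(cmd, check=True)


def main(steps):
    """Run steps in order and return the first failing exit code (0 if all pass)."""
    try:
        for cmd in steps:
            run_step(cmd)
    except subprocess.CalledProcessError as err:
        print(f"step failed with exit code {err.returncode}", file=sys.stderr)
        return err.returncode
    return 0


# Hypothetical stand-ins for real install commands (assumption, not from install.py).
demo_steps = [
    [sys.executable, "-c", "print('step 1 ok')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],  # simulated failure
    [sys.executable, "-c", "print('never reached')"],
]

exit_code = main(demo_steps)
print("installer exit code:", exit_code)
```

A real script would end with `sys.exit(main(steps))`, so that a `RUN python install.py ...` layer fails the build when any step fails.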
| RUN python download_checkpoints.py |
Downloading pretrained checkpoints during docker build makes the image much larger and forces network access at build time. Consider gating this behind a build ARG (default off) or moving checkpoint download to runtime with a volume-mounted cache, so users can choose between a lean image and a fully bundled one.
Suggested change:
| # Optionally download pretrained checkpoints at build time. |
| # To skip this step and keep the image lean, build with: |
| # docker build --build-arg DOWNLOAD_CHECKPOINTS=false ... |
| ARG DOWNLOAD_CHECKPOINTS=true |
| RUN if [ "$DOWNLOAD_CHECKPOINTS" = "true" ]; then python download_checkpoints.py; else echo "Skipping checkpoint download at build time."; fi |
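The other option the comment mentions, downloading at runtime into a volume-mounted cache, could look roughly like the sketch below. Everything here is an assumption for illustration: `ensure_checkpoints`, the `.complete` marker file, and `fake_download` are hypothetical, and the internals of download_checkpoints.py are not shown in this PR.

```python
import tempfile
from pathlib import Path


def ensure_checkpoints(cache_dir, download):
    """Call download(cache_dir) only when the cache has no completion marker.

    Returns True if a download was performed, False on a cache hit.
    """
    cache_dir = Path(cache_dir)
    marker = cache_dir / ".complete"  # hypothetical marker file name
    if marker.exists():
        return False
    cache_dir.mkdir(parents=True, exist_ok=True)
    download(cache_dir)
    marker.touch()  # written last, so an interrupted download is retried
    return True


calls = []


def fake_download(target):
    """Stand-in for the real checkpoint download (assumption)."""
    calls.append(target)
    (target / "model.pth").write_bytes(b"weights")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "checkpoints"  # would be a mounted volume in a container
    first = ensure_checkpoints(cache, fake_download)
    second = ensure_checkpoints(cache, fake_download)

print(first, second, len(calls))
```

With a mounted volume as the cache directory, the first container start pays the download cost and later starts reuse the files, which keeps the image itself lean.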
| RUN python install.py --env_name matcha |
| RUN python download_checkpoints.py |
| ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "matcha", "/bin/bash", "-c"] |
The ENTRYPOINT uses bash -c, which only works correctly when the user passes a single shell string. Common Docker usage like docker run image python train.py ... will be interpreted by bash as -c "python" (dropping the remaining args), so the command won’t run as intended. Prefer an entrypoint that executes "$@" (or set ENTRYPOINT to conda run ... -n matcha without bash -c) so arguments are preserved.
Suggested change:
| ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "matcha"] |
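The argument handling described in this comment can be reproduced outside Docker. The sketch below only illustrates how a shell treats the words after `-c`. It does not involve conda or the image, and it uses `echo` as a stand-in for `python train.py ...`.

```python
import shutil
import subprocess

# Use bash when available; POSIX sh handles -c the same way.
SHELL = shutil.which("bash") or "sh"


def run(argv):
    """Run argv and return its captured standard output."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


# Separate words after -c: only "echo" is the command string.
# "hello" becomes $0 and "world" becomes $1, so nothing is echoed.
split_args = run([SHELL, "-c", "echo", "hello", "world"])

# A single quoted string after -c: the whole command line is executed.
single_string = run([SHELL, "-c", "echo hello world"])

print(repr(split_args))
print(repr(single_string))
```

This is why the usage in the PR description has to quote the whole command, and why an entrypoint that passes arguments through unchanged accepts both styles.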
| python3 \ |
| python3-pip \ |
python3/python3-pip are installed via apt, but the image then installs and uses conda’s Python. If there isn’t a runtime need for the system Python, removing these packages would reduce image size and attack surface.
Suggested change: remove these two lines.
| RUN wget --quiet https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O ~/miniforge.sh && \ |
| /bin/bash ~/miniforge.sh -b -p $CONDA_DIR && \ |
| rm ~/miniforge.sh |
The wget command downloads and executes the Miniforge installer script from a mutable latest URL without any integrity or authenticity verification. If an attacker compromises the conda-forge/miniforge release channel or the network path, they can inject arbitrary code that will be executed at build time with full privileges. Pin the download to a specific, expected release artifact and verify its checksum or signature before execution to prevent supply-chain compromise.
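The verify-before-execute step can be sketched as follows. This is a minimal illustration of comparing a file against a pinned SHA-256 digest, not the Miniforge project's own procedure. The helper names `sha256_of` and `verify` are hypothetical, and the payload and digest are demo stand-ins rather than real release values.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True only if the file's digest matches the pinned value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Demo payload standing in for a downloaded installer (assumption).
payload = b"#!/bin/sh\necho installer\n"
# In practice this digest would be hard-coded next to a pinned release URL.
pinned = hashlib.sha256(payload).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    installer = Path(tmp) / "installer.sh"
    installer.write_bytes(payload)
    ok = verify(installer, pinned)

    installer.write_bytes(payload + b"echo injected\n")  # simulated tampering
    tampered_ok = verify(installer, pinned)

print(ok, tampered_ok)
```

The installer should run only when the check returns True. Pinning the URL to a specific release matters as much as the check, because a `latest` URL changes contents whenever a new release is published.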
Pretty much self-explanatory: a Dockerfile for easy deployment of MAtCha.
To build:
docker build -t matcha:latest .
Usage:
docker run -it \
  --shm-size=8g \
  --runtime=nvidia \
  --gpus all \
  matcha:latest \
  "python train.py --sfm_config unposed -s /path/to/images --output /path/to/output"