AESM Error while executing Confidential PyTorch Example in Docker #82

@asim29

Description

Hi, I am trying to run the end-to-end Confidential PyTorch example from this tutorial. The non-confidential part of the tutorial runs fine under gramine-sgx, but the confidential example fails with the following error:

root@xyz:pytorch-confidential# gramine-sgx ./pytorch pytorchexample.py
Gramine is starting. Parsing TOML manifest file, this may take some time...
error: Cannot connect to AESM service (tried sgx_aesm_socket_base and /var/run/aesmd/aesm.socket UNIX sockets).
Please check its status! (`service aesmd status` on Ubuntu)
error: load_enclave() failed with error: No such file or directory (ENOENT)

When I run service aesmd status, I get the following output:

root@xyz:pytorch-confidential# service aesmd status
aesmd: unrecognized service

I followed the tutorial, and I can see that the sgx-aesm-service package is installed. The Dockerfile I am using to run Gramine is:

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive
ENV LC_ALL=C.UTF-8 LANG=C.UTF-8

# Main Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        wget \
        gnupg \
        ca-certificates \
        software-properties-common \
        libnss-mdns \
        libnss-myhostname \
        git \
        curl \
        linux-headers-5.15.0-52-generic \
        openssh-client \
        screen \
        && apt-get clean && rm -rf /var/lib/apt/lists/*

# Gramine Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        autoconf \ 
        bison \ 
        gawk \
        nasm \
        ninja-build \
        pkg-config \
        python3 \
        python3-click \
        python3-jinja2 \
        python3-pip \
        python3-pyelftools 

RUN python3 -m pip install 'meson>=0.56' 'tomli>=1.1.0' 'tomli-w>=0.4.0'

# Intel SGX-related Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        libprotobuf-c-dev \
        protobuf-c-compiler \
        protobuf-compiler \
        python3-cryptography \
        python3-protobuf

# Intel SGX SDK/PSW
RUN ["/bin/bash", "-c", "set -o pipefail && echo 'deb [trusted=yes arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | tee /etc/apt/sources.list.d/intel-sgx.list"]
RUN ["/bin/bash", "-c", "set -o pipefail && wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | apt-key add -"]

RUN apt-get update && apt-get install -y --no-install-recommends \
        libsgx-epid \
        libsgx-quote-ex \
        libsgx-dcap-ql \
        libsgx-quote-ex \
        libsgx-quote-ex-dev \
        libsgx-qe3-logic \
        sgx-aesm-service 

# DCAP 
RUN curl -fsSLo /usr/share/keyrings/intel-sgx-deb.asc https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key
RUN apt-get update && apt-get install -y --no-install-recommends \
        libsgx-dcap-ql-dev \
        libsgx-dcap-quote-verify-dev \
        libsgx-dcap-default-qpl \
        libsgx-dcap-default-qpl-dev \
        && apt-get clean && rm -rf /var/lib/apt/lists/*

# Build and Install Gramine
ENV HOMEDIR=/home
ENV GRAMINEDIR=${HOMEDIR}/gramine

WORKDIR ${GRAMINEDIR}
RUN git clone https://github.com/gramineproject/gramine.git ${GRAMINEDIR} 

RUN meson setup build/ --buildtype=release -Ddirect=enabled -Dsgx=enabled -Ddcap=enabled 
RUN ninja -C build/ 
RUN ninja -C build/ install 
RUN gramine-sgx-gen-private-key

# Install PyTorch
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

The manifest template (edited as shown in the tutorial):

# SPDX-License-Identifier: LGPL-3.0-or-later

# PyTorch manifest template

loader.entrypoint = "file:{{ gramine.libos }}"
libos.entrypoint = "{{ entrypoint }}"

loader.log_level = "{{ log_level }}"

loader.env.LD_LIBRARY_PATH = "/lib:/usr/lib:{{ arch_libdir }}:/usr/{{ arch_libdir }}"
loader.env.HOME = "{{ env.HOME }}"

# Restrict the maximum number of threads to prevent insufficient memory
# issue, observed on CentOS/RHEL.
loader.env.OMP_NUM_THREADS = "8"

loader.insecure__use_cmdline_argv = true

fs.mounts = [
  { path = "{{ entrypoint }}", uri = "file:{{ entrypoint }}" },
  { path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
  { path = "/usr/lib", uri = "file:/usr/lib" },
  { path = "{{ arch_libdir }}", uri = "file:{{ arch_libdir }}" },
  { path = "/usr/{{ arch_libdir }}", uri = "file:/usr/{{ arch_libdir }}" },
{% for path in python.get_sys_path(entrypoint) %}
  { path = "{{ path }}", uri = "file:{{ path }}" },
{% endfor %}

  { type = "tmpfs", path = "/tmp" },

  { path = "/classes.txt", uri = "file:classes.txt", type = "encrypted" },
  { path = "/input.jpg", uri = "file:input.jpg", type = "encrypted" },
  { path = "/alexnet-pretrained.pt", uri = "file:alexnet-pretrained.pt", type = "encrypted" },
  
  { path = "/result.txt", uri = "file:result.txt", type = "encrypted" },
]

sgx.enclave_size = "4G"
sgx.max_threads = 32
sgx.edmm_enable = {{ 'true' if env.get('EDMM', '0') == '1' else 'false' }}

sgx.trusted_files = [
  "file:{{ entrypoint }}",
  "file:{{ gramine.libos }}",
  "file:{{ gramine.runtimedir() }}/",
  "file:/usr/lib/",
  "file:{{ arch_libdir }}/",
  "file:/usr/{{ arch_libdir }}/",
{% for path in python.get_sys_path(entrypoint) %}
  "file:{{ path }}{{ '/' if path.is_dir() else '' }}",
{% endfor %}

  "file:pytorchexample.py",

]

sgx.allowed_files = [
  "file:ssl/ca.crt",
]

sys.enable_extra_runtime_domain_names_conf = true

sgx.remote_attestation = "dcap"

loader.env.LD_PRELOAD = "libsecret_prov_attest.so"
loader.env.SECRET_PROVISION_CONSTRUCTOR = "1"
loader.env.SECRET_PROVISION_SET_KEY = "default"
loader.env.SECRET_PROVISION_CA_CHAIN_PATH = "ssl/ca.crt"
loader.env.SECRET_PROVISION_SERVERS = "localhost:4433"

# Gramine optionally provides patched OpenMP runtime library that runs faster inside SGX enclaves
# (add `-Dlibgomp=enabled` when configuring the build). Uncomment the line below to use the patched
# library. PyTorch's SGX perf overhead decreases on some workloads from 25% to 8% with this patched
# library. Note that we need to preload the library because PyTorch's distribution renames
# libgomp.so to smth like libgomp-7c85b1e2.so.1, so it's not just a matter of searching in the
# Gramine's Runtime path first, but a matter of intercepting OpenMP functions.
# loader.env.LD_PRELOAD = "/lib/libgomp.so.1"

I launch the provisioning server before running the Gramine commands, and I can see it running in the background with the top command.

I am unsure why the service command cannot find the aesmd service. I can see that the container does indeed contain the following files:

  • /lib/systemd/system/aesmd.service
  • /etc/aesmd.conf
  • /opt/intel/sgx-aesm-service/aesm/aesm_service
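
The error message names /var/run/aesmd/aesm.socket, so one way to narrow this down is to check which of the relevant paths are actually visible inside the container. The following is only a minimal diagnostic sketch: the helper name check_path is hypothetical, and the device nodes /dev/sgx_enclave and /dev/sgx_provision are an assumption based on the in-kernel SGX driver naming, not something stated in the tutorial.

```shell
#!/bin/sh
# Report whether a path exists with the expected type.
# $1 = test flag (-S socket, -c character device, -x executable file)
# $2 = path to check
check_path() {
    if [ "$1" "$2" ]; then
        echo "ok: $2"
    else
        echo "missing: $2"
    fi
}

# Socket that Gramine tries to reach (path taken from the error message).
check_path -S /var/run/aesmd/aesm.socket

# SGX device nodes (assumed names for the in-kernel driver).
check_path -c /dev/sgx_enclave
check_path -c /dev/sgx_provision

# AESM binary installed by the sgx-aesm-service package.
check_path -x /opt/intel/sgx-aesm-service/aesm/aesm_service
```

If the binary is reported as ok but the socket as missing, that would suggest the AESM daemon is installed but not running in the container, which would be consistent with the "Cannot connect to AESM service" error.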

The aesmd.conf file looks like this:

#Line with comments only

	  #empty line with comment
#proxy type    = direct #direct type means no proxy used
#proxy type    = default #system default proxy
#proxy type    = manual #aesm proxy should be specified for manual proxy type
#aesm proxy    = http://proxy_url:proxy_port
#whitelist url = http://sample_while_list_url/
#default quoting type = ecdsa_256
#default quoting type = epid_linkable
#default quoting type = epid_unlinkable
#qpl log level = error
#qpl log level = info

Have I done something wrong in the installation process, or is something extra required to make this work within a Docker container?

I appreciate any help you can provide.

Best,
Asim.
