
ipykernel/repl support #63

@jlewi

Description


We'd like to support ipykernel to provide a full REPL that can be reused across cells.

I think there are two key problems we need to solve on the backend:

  1. Managing ipykernel processes
  2. Creating a websockets -> ZeroMQ bridge

Managing ipykernel processes

We can already do this today with runme and our existing UI. We just need to create a cell and then execute:

python -m ipykernel write_connection_file
python -m ipykernel_launcher -f kernel-12345.json

A kernel is now defined by the tuple (host, kernel_file). So in the webapp we could do something like:

runners.addIPykernel(name, host, kernel_file)
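To make the (host, kernel_file) tuple concrete, here is a minimal sketch of how the server could resolve a kernel's ZeroMQ endpoints from its connection file. The field names follow the standard Jupyter connection-file format; the port values in the example are made up.

```python
import json

def kernel_endpoints(connection_file_text: str) -> dict:
    """Map each Jupyter channel to the ZeroMQ endpoint it listens on."""
    info = json.loads(connection_file_text)
    channels = ["shell", "iopub", "stdin", "control", "hb"]
    return {
        ch: f"{info['transport']}://{info['ip']}:{info[ch + '_port']}"
        for ch in channels
    }

# Example connection file contents (ports/key are illustrative).
example = """{
  "shell_port": 53794, "iopub_port": 53795, "stdin_port": 53796,
  "control_port": 53797, "hb_port": 53798,
  "ip": "127.0.0.1", "transport": "tcp",
  "key": "abc123", "signature_scheme": "hmac-sha256"
}"""
print(kernel_endpoints(example)["shell"])  # tcp://127.0.0.1:53794
```

Given the connection file, the bridge knows where to dial each of the five channels, so the webapp never needs to see ports directly.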

Websocket to ZeroMQ bridge

Runme already manages a websocket connection. The handler is here.

The payload of websocket requests and responses is defined in the websockets proto. The message basically consists of

  • Metadata identifying the type of request
  • Actual payload for the message

So we can add fields to that proto to

  • Indicate that it's an ipykernel message
  • Identify the target kernel by the tuple (host, kernel_file)
  • Carry the actual Jupyter/ipykernel message as an opaque payload
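As a rough illustration of that envelope (field names here are hypothetical, not the actual proto fields):

```python
from dataclasses import dataclass

@dataclass
class IPykernelEnvelope:
    # Hypothetical shape of the extended websocket payload.
    host: str          # which machine the kernel runs on
    kernel_file: str   # connection file identifying the kernel on that host
    payload: bytes     # opaque Jupyter/ipykernel message, forwarded as-is

    def routing_key(self) -> tuple:
        # (host, kernel_file) uniquely identifies the target kernel
        return (self.host, self.kernel_file)

env = IPykernelEnvelope("workstation-1", "kernel-12345.json", b"{}")
print(env.routing_key())  # ('workstation-1', 'kernel-12345.json')
```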

The kernel_file should be sufficient for the server to route the message to the right ipykernel.
So now the server just needs to use zmq to pass along the messages to the kernel.
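The "pass along" step is not a raw byte copy: per the Jupyter messaging spec, each message is a multipart ZMQ payload whose JSON frames are HMAC-signed with the key from the connection file. A minimal sketch of that framing in Python (the Go port would produce the same frames before handing them to the ZMQ library):

```python
import hashlib, hmac, json

DELIM = b"<IDS|MSG>"  # delimiter frame from the Jupyter wire protocol

def serialize(msg: dict, key: bytes) -> list[bytes]:
    """Frame a message dict into a signed multipart ZMQ payload."""
    # The four JSON frames, in the order the protocol requires.
    frames = [
        json.dumps(msg.get(part, {})).encode()
        for part in ("header", "parent_header", "metadata", "content")
    ]
    # The signature is an HMAC over the concatenated JSON frames.
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for f in frames:
        mac.update(f)
    return [DELIM, mac.hexdigest().encode(), *frames]
```

This covers only framing and signing; socket identities and the actual I/O are left to the ZMQ library on each side.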

I believe the Python version of that code is https://github.com/jupyter-server/jupyter_server/blob/main/jupyter_server/services/kernels/connection/channels.py#L445. So we just need an equivalent Go version.

My hunch is we can just point Codex at that file and get a Go version without much trouble.

ZeroMQ in Go

The biggest issue I see is ZeroMQ support in Go; the reference implementation, libzmq, is written in C++.

  • goczmq - Looks like the official Go bindings, but they haven't been updated in quite a while
  • zmq4 - Looks like a slightly more recent set of bindings for the C++ library
  • go-zeromq - Pure Go implementation of ZeroMQ

I think we should start with go-zeromq because it's generally better to avoid CGO if you can.

That said, one of the main challenges with CGO is that you have to distribute shared libraries, which is annoying. In our case, though, ipykernel also depends on libzmq, so I think we can assume libzmq is already installed on the user's system and offload that problem to python/pip or whatever the user is using to install ipython/ipykernel on their machine.

I still think we should start with go-zeromq.

Alternatives

Jupyter's kernel_gateway

Jupyter's kernel_gateway provides an HTTP server that handles

websockets/REST -> zeromq

It allows you to execute code snippets in a kernel over HTTP so you don't have to speak ZeroMQ directly.

It looks like the heavy lifting of websockets -> ZeroMQ is done in ZMQChannelsWebsocketConnection, which is imported from the jupyter-server repo. So kernel_gateway appears to be a slimmed-down version of the Jupyter server that supports the minimum API needed to execute code in a kernel.

In addition to handling websocket -> ZeroMQ, it looks like kernel_gateway handles the following gateway concerns

  • CORS
  • Auth/Identity

Although it looks like this, too, is functionality imported from jupyter_server.

We already have all of this functionality in the runme server. So the only missing piece is ZMQChannelsWebsocketConnection.

I think the existing runme server is a much better choice for a gateway than kernel_gateway. runme is a Go binary, which means we can easily create and distribute statically linked binaries for arbitrary platforms. This makes it super easy to distribute. I'd much rather take on the one-time challenge of porting ZMQChannelsWebsocketConnection to Go than the ongoing distribution challenges that Python creates.

  • Porting ZMQChannelsWebsocketConnection is a one-time challenge that we can probably solve with AI
  • Dealing with the myriad challenges of Python environment setups would be an ongoing problem

Our runme server already hosts other services written in Go.

If we introduce kernel_gateway our options are

  1. Rewrite everything in python
  2. Have two servers and distribute both

Right now we have a monolith architecture and can selectively enable/disable services. I think the benefits of this design outweigh the benefits we'd get from kernel_gateway.

Jupyter Server

Jupyter server architecture diagram

Based on that doc Jupyter server provides the following components

  • ServerApp
  • Config Manager
  • Custom Extensions
  • Gateway Server (I think this is the same as kernel_gateway)
  • Contents Manager and File Contents Manager
  • Session Manager
  • Mapping Kernel Manager

I don't think we need/want any of this.

We already have a server that handles basic serving concerns such as Auth and configuration.

We already have a way to manage kernels: runme can start/stop/monitor them by executing the relevant python/ps commands.

Contents Manager/File Contents Manager

  • For notebooks we want the webapp to talk directly to the storage system (e.g. Google Drive) or the local file system via File System API

Session Manager

  • To the extent we need to keep track of sessions I don't think Jupyter's session manager will work for us
  • I think we will need to figure out the right way to handle this in light of the way our notebook works
    • e.g. on the web app we keep track of runners and websocket connections
    • The webapp can manage kernels by executing relevant commands on the server
