Open
Labels
help wanted
Description
Gemini's master and nodes currently operate on a master-oriented model, in which each node individually calls the master to exchange information. Here is a non-exhaustive list of the information that will be transferred:
- Ping/heartbeat
- Jobs to execute
- CPU/memory and other metrics
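As a rough illustration of the master-oriented model, a node could bundle its heartbeat and metrics into one request to the master and receive any pending jobs in the reply. This is only a sketch using the Python standard library: the `/heartbeat` path, the payload field names, and the function names are assumptions made for the example, not Gemini's actual API.

```python
import json
import time
import urllib.request


def build_heartbeat(node_id, cpu_percent, memory_percent, now=None):
    """Build the body a node would send to the master (field names are hypothetical)."""
    return {
        "node_id": node_id,
        "timestamp": time.time() if now is None else now,
        "metrics": {
            "cpu_percent": cpu_percent,
            "memory_percent": memory_percent,
        },
    }


def send_heartbeat(master_url, heartbeat, timeout=5.0):
    """POST the heartbeat to a hypothetical /heartbeat endpoint on the master.

    Returns the master's JSON reply, which in this sketch could carry
    the jobs the node should execute.
    """
    request = urllib.request.Request(
        master_url + "/heartbeat",
        data=json.dumps(heartbeat).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the node opens the connection, only the master needs to be reachable, which is why this model needs no extra inbound firewall rules on the nodes.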
Alternatively, we could adopt a node-oriented model, in which the master initiates communication with the nodes. Some points about this that I have in mind:
- Nodes will need to run a Flask server for the masters to hit. The details needed to reach it would have to be exchanged on node startup. This also calls for additional firewall/routing rules.
- It is unclear how multiple masters would interact with each other and divide work.
- We will need to code retries into calls. For example, if a master calls the nodes when a job is submitted and that call fails, the master will have to schedule a retry.
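The retry point above could look something like the following. It is a minimal sketch of a blocking retry loop with exponential backoff, not a real scheduler: `call_with_retries` and its parameters are hypothetical names, and an actual master would more likely queue the retry than sleep inline.

```python
import time


def call_with_retries(call, max_attempts=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Run call() and retry on failure with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last error once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            # Out of attempts: let the caller see the failure.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The default `retry_on=(OSError,)` covers connection failures and timeouts raised by the standard library's networking code; the `sleep` parameter exists only so the delay can be replaced in tests.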
We could also use a hybrid model, where some information is pushed to the master and some is pushed to the nodes.
/cc @ncatelli