RFC-0003: media cluster topology with multi-zones #3
Open
giangndm wants to merge 3 commits into 8xFF:main from giangndm:media-global-cluster
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,101 @@ | ||
| - Feature Name: media-global-cluster | ||
| - Start Date: 2024-02-02 | ||
| - RFC PR: [8xff/rfcs#0003](https://github.com/8xff/rfcs/pull/0003) | ||
|
|
||
| # Summary | ||
| [summary]: #summary | ||
|
|
||
| This RFC proposes a way to orchestrate the cluster topology and to select the best node for each request. | ||
|
|
||
| # Motivation | ||
| [motivation]: #motivation | ||
|
|
||
| In a live-streaming application, the most important thing is stream quality: the stream must reach the client at the best possible quality. To achieve that, we need to select the best node to serve each request. The best node is the one that is closest to the client and has the best network connection to the source. | ||
|
|
||
| # User Benefit | ||
|
|
||
| Users can set up a cluster across multiple regions, and the system will automatically select the best node to serve each request. Most steps are automated: the user only needs to set up the cluster, and the system does the rest. | ||
|
|
||
| # Design Proposal | ||
|
|
||
| ## Abstract Design | ||
|
|
||
| ### Gateway | ||
|
|
||
|  | ||
|
|
||
| We propose a gateway node that routes each request to the best node. The process is described below: | ||
|
|
||
| Update info process: | ||
|
|
||
| - Each node broadcasts its information to all gateway nodes in the same zone. | ||
| - Each gateway node broadcasts its information to all other gateway nodes. | ||
|
|
||
| Routing process: | ||
|
|
||
| - If the gateway node that receives the request is the closest one to the client, it chooses the best node to serve the request and sends that node's information back to the client. | ||
| - If the gateway node that receives the request is not the closest one to the client, it forwards the request to the closest gateway node. | ||
|
|
||
| ### Topology | ||
|
|
||
|  | ||
|
|
||
| Each media server is connected to all gateway nodes in the same zone. Each gateway node is connected to all gateway nodes in the same zone and to all gateway nodes in other zones. | ||
|
|
||
| This way, data exchange between nodes inside a zone is handled by the gateway nodes in that zone, which keeps both data cost and latency low. | ||
|
|
||
| Data exchanged between nodes in different zones is sent from the media server to a gateway node in the source zone, then to a gateway node in the destination zone, and finally to the media server in the destination zone. With the help of atm0s-sdn, the data is sent over the best available network path. Note that the data can be relayed by other gateway nodes. | ||
|
|
||
| ## Implementation Details | ||
|
|
||
| ### Gateway | ||
|
|
||
| Each media server broadcasts its information to all gateway nodes in the same zone over the pubsub channel `gateway-zone-{zone-id}`. The information includes: | ||
|
|
||
| - Live count | ||
| - Max count | ||
| - Node usage | ||
| - Transport protocol | ||
|
|
||
| Each gateway node broadcasts its information to all gateway nodes over the pubsub channel `gateway-global`. The information includes: | ||
|
|
||
| - Zone id | ||
| - Lat, Long | ||
| - Summary of all media servers grouped by transport protocol (live count, max count, node usage) | ||
|
|
||
| To detect client locations, each gateway node has a geo-location database (for example, [GeoLite2](https://dev.maxmind.com/geoip/geoip2/geolite2/)). Each time a client connects to a gateway node, that node looks up the client's location and finds the gateway node closest to the client. If the closest gateway node is at its own location, it chooses the best node to serve the request and sends that node's information back to the client. Otherwise, it forwards the request to the closest gateway node. | ||
|
|
||
| ### Topology | ||
|
|
||
| We use the atm0s-sdn manual discovery service to build the topology. This is done by configuring local tags and connect tags, as described below: | ||
|
|
||
| | Server | Local Tags | Connect Tags | | ||
| | ------------- | ------------------------------ | ------------ | | ||
| | Gateway | gateway, gateway-{zone-id} | gateway | | ||
| | Media Server | media-{protocol}-{zone-id} | gateway-{zone-id} | | ||
|
|
||
| ## Potential Impact and Risks | ||
|
|
||
| This topology relies on the zone-id configured on each node; if that value is wrong, the cluster will not work correctly. | ||
| Another risk is the accuracy of the geo-location database: if the database is inaccurate, the gateway node may not choose the best node to serve the request. | ||
|
|
||
| # Rationale and alternatives | ||
| [rationale-and-alternatives]: #rationale-and-alternatives | ||
|
|
||
| We have considered two alternatives, but we think the proposed design is the best among the possible designs. | ||
|
|
||
| - Single zone: We can use a single zone with a single gateway node to serve all requests. This is the simplest approach, but it will not work well at large scale. | ||
| - Manual multi-zone: We can use multiple zones and manually configure the best node for each request. This is the most flexible approach, but it takes a lot of time to configure and maintain. | ||
|
|
||
| # Unresolved questions | ||
| [unresolved-questions]: #unresolved-questions | ||
|
|
||
| None at this time. | ||
|
|
||
| # Future possibilities | ||
| [future-possibilities]: #future-possibilities | ||
|
|
||
| We have some possibilities to improve the system: | ||
|
|
||
| - Node selection logic can be improved by using machine learning to predict the best node for each request. | ||
| - Node selection can be based on more metrics: CPU, RAM, disk, network, ... | ||
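The routing process described under "Implementation Details" can be illustrated with a small sketch. This is not code from the RFC or from atm0s-sdn: the names `Gateway`, `MediaNode`, `closest_gateway`, `best_node`, and `route` are hypothetical, the distance metric (haversine on the advertised lat/long) is an assumption, and "best node" is assumed to mean the lowest live/max ratio among nodes with spare capacity. The RFC itself does not fix either policy.

```python
import math
from dataclasses import dataclass


@dataclass
class Gateway:
    zone_id: str
    lat: float
    lon: float


@dataclass
class MediaNode:
    node_id: str
    live_count: int  # sessions currently served ("Live count")
    max_count: int   # capacity limit ("Max count")


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def closest_gateway(gateways, lat, lon):
    """Pick the gateway whose advertised position is nearest to the client."""
    return min(gateways, key=lambda g: haversine_km(lat, lon, g.lat, g.lon))


def best_node(nodes):
    """Assumed policy: least-loaded node that still has spare capacity."""
    free = [n for n in nodes if n.max_count > 0 and n.live_count < n.max_count]
    if not free:
        return None
    return min(free, key=lambda n: n.live_count / n.max_count)


def route(me, gateways, local_nodes, client_lat, client_lon):
    """Serve locally if this gateway's zone is closest, otherwise forward."""
    target = closest_gateway(gateways, client_lat, client_lon)
    if target.zone_id == me.zone_id:
        return ("serve", best_node(local_nodes))
    return ("forward", target.zone_id)
```

In this sketch, the client coordinates would come from the geo-location database lookup, the `gateways` list from the `gateway-global` channel, and `local_nodes` from the `gateway-zone-{zone-id}` channel. A gateway in zone `sg` receiving a request from a client that is nearer to a gateway in zone `us` would return `("forward", "us")`.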
For simplicity, each node should send its ping to only one gateway node; that gateway node will do the rest.