From a4c9764d001cd9a355b2c588f5dc73567021a43f Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Fri, 2 Feb 2024 17:52:39 +0700 Subject: [PATCH 01/15] media-webrtc-sdk --- text/0000-media-webrtc-sdk.md | 498 ++++++++++++++++++++++++++++++++++ 1 file changed, 498 insertions(+) create mode 100644 text/0000-media-webrtc-sdk.md diff --git a/text/0000-media-webrtc-sdk.md b/text/0000-media-webrtc-sdk.md new file mode 100644 index 0000000..868853a --- /dev/null +++ b/text/0000-media-webrtc-sdk.md @@ -0,0 +1,498 @@ +- Feature Name: media-webrtc-sdk +- Start Date: 2024-02-02 +- RFC PR: [8xff/rfcs#0000](https://github.com/8xff/rfcs/pull/0000) + +# Summary +[summary]: #summary + +This RFC describes how WebRTC client is connect to atm0s-media-server and how it works. + +# Motivation +[motivation]: #motivation + +We need to create a client sdk which can be used in any platform and can be customized to fit with any use case. This sdk should be easy to use and easy to customize. It also need to work-well with atm0s-media-server network topology and our aproach to handle media stream. + +# User Benefit + +User can use WebRTC to connect to media-server, and can create very complex media stream topology. + +# Design Proposal + +For simplicty we proposal a sdk protocol which only use with HTTP (not websocket) and WebRTC. + +- HTTP is only used for sending RPC request to cluster (typicaly is Connect and Retry phase). +- WebRTC is used for sending and receiving media stream and also rpc, event after connected. + +### HTTP Request/Response format + +All request and response will be encoded in JSON format. The format is described as below: + +Request: JSON + +Response: + +```json +{ + success: bool, + error_code: Option, + error_msg: Option, + data: Option, +} +``` + +### Connect request + +Client can prepare some senders or receivers before connect to server. When client connect to server, it will send a connect request to server. 
+ +Client must to prepare: + +- WebRTC connection with datachannel enabled. +- List of senders and receivers. +- WebRTC OfferSDP. + +Endpoint: POST `GATEWAY/webrtc/connect` + +Header: + Authorization: Bear {token} +Body: +```json +{ + version: Option, + room: String, + peer: String, + event: { + publish: "full" | "track", + subscribe: "full" | "track" | "manual", + }, + bitrate: { + ingress: "save" | "max", + }, + features: JSON, + tracks: { + receivers: [ + { + kind: "audio" | "video", + id: String, + state: Option<{ + remote: Option<{ + peer: String, + track: String, + }>, + limit: Option<{ + priority: u16, + min_spatial: Option, + max_spatial: u8, + min_temporal: Option, + max_temporal: u8, + }>, + }> + } + ], + senders: [ + { + kind: "audio" | "video", + id: String, + uuid: String, + label: String, + state: Option<{ + screen: bool, + pause: bool, + }>, + } + ], + }, + sdp: Option +} +``` + +In there: + +- version: is the version of client sdk. +- event: + - publish: full will publish both peer info and tracks info. track will only publish tracks info. + - subscribe: full will subscribe both remote peer info and tracks info. track will only subscribe remote tracks info. This feature is useful for client which want to use manual mode to subscribe remote tracks. Example in spatial room application, client will set to `manual` and only subscribe to peer which near to it. + +- bitrate: + - ingress is the bitrate mode for ingress stream. In `save` mode, media-server will limit bitrate based on network and consumers. In `max` mode, media-server will only limit bitrate by network and media-server config. + +- features: json object for containing some features which client want to use. For example: mix-minus, spatial room, etc. +- tracks: + - receivers: list of receivers which client want to create. Each receiver is described with: + - kind: is the kind of receiver, audio or video. + - id: is the id of receiver. 
+ - remote: is the remote source which client want to pin to. If it's none, the receiver will be created but not pin to any source. + - limit: is the limit of receiver. If it's none, the receiver will be created with default limit. + + - senders: list of senders which client want to create. Each sender is described with: + - kind: is the kind of sender, audio or video. + - id: is the id of sender. + - uuid: is the uuid of sender. It's used to identify the sender in client side. + - label: is the label of sender. It's used to identify the sender in client side. + - screen: is the flag to indicate that the sender is screen sharing. +- sdp: is the OfferSDP which client created. + +After that server will success response with data: + +```json +{ + sdp: Option, + conn_id: String, +} +``` + +In there: + +- sdp: is the AnswerSDP which server created, it should contain all ice-candidates from server. +- conn_id: global identify of WebRTC connection. This is used by control api like restart-ice, ice-trickle, kick, etc. + +Error list: + +| Error code | Description | +|------------|-------------| +| INVALID_TOKEN | The token is invalid. | +| SDP_ERROR | The sdp is invalid. | +| INVALID_REQUEST | The request is invalid. | +| INTERNAL_SERVER_ERROR | The server is error. | +| GATEWAY_ERROR | The gateway is error. | + +After that client need to wait for connected event from WebRTC connection and connected event from datachannel. +If after a period of time, client don't receive any event, it will set restart ice flag and retry connect to server with newest offer-sdp. +After some tries (configurable), client will stop retry and report error to user as CONNECTION_TIMEOUT. 
+ +### Restart-ice + +Endpoint: POST `GATEWAY/webrtc/:conn_id/restart-ice` +```json +BODY is same with connect request but the tracks should sending with current state +``` + +By that way, incase of network change, client can retry connect to server with newest offer-sdp and if the server is still alive, it will response with new answer-sdp. If the server is dead, client will retry connect to another server and can be restore the session state by using track state. + +### Ice-tricle + +Each time client WebRTC connection has a new ice-candidate, it should sending to gateway over: + +Endpoint: POST `GATEWAY/webrtc/conns/:conn_id/ice-remote` +Request data: candidate String +Response data: None + +### Datachannel Request/Response format + +All request and response sending over datachannel will be encoded in JSON format. The format is described as below: + +Request/Event: +```json +{ + type: "event" | "request", + seq: Number, + cmd: String, + data: Option, +} +``` + +Response: + +```json +{ + type: "answer", + seq: Number, + success: bool, + error_code: Option, + error_msg: Option, + data: Option, +} +``` + +The seq is incresemental value, which is generated in sender side. The seq is helped us to mapping between request and response and also detect data lost. + +The cmd is generate with rule: `identify.action`, for example: +- `peer.updateSdp`. +- `sender.{sender_id}.toggle`. +- `receiver.{receiver_id}.switch`. + +### In-session requests + +At current state, we will have only one WebRTC connection to server. So we don't need to send any request to server. All request will be send over datachannel. + +Typicaly, client will need some actions with media server: + +- Create/Release sender +- Create/Release receiver +- Sender action: pause, resume, switch stream +- Receiver action: pause, resume, switch remote source, update priority and layers + +All action which changed streams will be do at local-first, then calling updateSdp to server. 
+ +#### UpdateSDP + +Each time we changed something in WebRTC connection, we need to send updateSdp request to server over datachannel. The request will be described in below: + +request: `peer.updateSdp` +Request data: +```json +{ + sdp: String, + tracks: { + receivers: [ + { + kind: "audio" | "video", + id: String, + } + ], + senders: [ + { + kind: "audio" | "video", + id: String, + uuid: String, + label: String, + screen: Option, + } + ], + } +} +``` + +Response data: +```json +{ + sdp: String +} +``` + +#### Room actions, event + +We can subscribe to peers event (joined, leaved, track added, track removed) and also can unsubscribe from it. + +```json +Request: room.peers.subscribe +Request data: +{ + peer_id: String, +} + +Response: None +``` + +```json +Request: room.peers.unsubscribe +Request data: +{ + peer_id: String, +} + +Response: None +``` + +Room have some events: + +```json +``` + +#### Session actions, event + +```json +Request: session.disconnect +Request data: None +Response: None +``` + +#### Session Sender create/release, actions, events + +For create a sender we need to create a transiver with kind is audio or video. After that we need to create a track and add it to transiver. Then we need to sending updateSdp request to server. + +For destroy a sender, we need to remove track from transiver and remove transiver from connection. Then we need to sending updateSdp request to server. + +Each sender support bellow actions: + +```json +Request: session.sender.{id}.toggle +Request data: +{ + track: Option, + label: Option, +} +Response: None +``` + +Each sender also fire some event with cmd prefix: "sender_event" json template: + +```json +Event: session.sender.{id}.state +Event data: { + state: "new" | "live" | "paused" +} +``` + +#### Session Receiver create/release, actions + +For create a receiver we need to create a transiver with kind is audio or video. After that we need to create a track and add it to transiver. 
Then we need to sending updateSdp request to server. + +Each receiver support bellow actions: + +```json +Request: session.receiver.{id}.switch +Request data: +{ + priority: u16, + remote: Option<{ + peer: String, + track: String, + }>, +} +Response: None +``` + +If remote is none, the receiver will be paused. + + +```json +Request: session.receiver.{id}.limit +Request data: +{ + priority: u16, + min_spatial: Option, + max_spatial: u8, + min_temporal: Option, + max_temporal: u8, +} +Response: None +``` + +Each sender also fire some event with cmd prefix: "sender_event" json template: + +``` +Event: session.receiver.{id}.state +Event data: "no_source" | "live" | "key_only" | "inactive" +``` + +```json +Event: session.receiver.{id}.stats +Event data: +{ + codec: String, + ingress: { + scaling: "single" | "simulcast" | "svc", + spatials: Number, + temporals: Number, + bitrate: Number, + rtt: Number, + lost: Number, + jitter: Number, + }, + egress: { + spatial: Number, + temporal: Number, + bitrate: Number, + }, +} +``` + +Receiver state is explain below: + +- no_source: The receiver is created but don't pin to any source. +- live: The receiver is live. +- key_only: The receiver is live but only receive key frame, this maybe for speed limiter. +- inactive: The receiver is pinned but not enough bandwidth to receive. 
+ +#### Feature: mix-minus + +In connect request, we add field to features params: + +```json +{ + mix_minus: { + mode: "manual" | "auto", + sources: [ + { + peer: String, + track: String, + } + ] + } +} +``` + +Actions: + +```json +Request: session.features.mix-minus.add_source +Request data: +{ + peer: String, + track: String, +} +Response body: empty +``` + +```json +Request: session.features.mix-minus.remove_source +Request data: +{ + peer: String, + track: String, +} +Response body: empty +``` + +```json +Request: session.features.mix-minus.pause +Request data: +{ + peer: String, + track: String, +} +Response body: empty +``` + +```json +Request: session.features.mix-minus.resume +Request data: +{ + peer: String, + track: String, +} +Response body: empty +``` + +```json +Event: session.features.mix-minus.state +Event data: +{ + layers: [ + { + id: string, + remote: Option<{ + peer: String, + track: String, + }>, + audio_level: Number, + } + ] +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +No drawbacks. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +We have some alternatives: + +- Whip/Whep: but it not flexible and cannot be used to create complex media stream topology. +- Livekit protocol: the protocol don't have document and it's is designed for Livekit server topology. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +Not yet. + +# Future possibilities +[future-possibilities]: #future-possibilities + +We can have some improvements: + +- Dynamic event subscription. +- Binary event and request/response format. 
\ No newline at end of file From fcf23e035178b1e03ecfc173ee6687780483cb03 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Fri, 2 Feb 2024 17:55:26 +0700 Subject: [PATCH 02/15] assign media-webrtc-sdk rfc to id 0005 --- text/{0000-media-webrtc-sdk.md => 0005-media-webrtc-sdk.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-media-webrtc-sdk.md => 0005-media-webrtc-sdk.md} (100%) diff --git a/text/0000-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md similarity index 100% rename from text/0000-media-webrtc-sdk.md rename to text/0005-media-webrtc-sdk.md From f4d5f82632f3cc62c3e563ad726cd731ee71ea08 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Sat, 3 Feb 2024 09:13:40 +0700 Subject: [PATCH 03/15] make it look better --- text/0005-media-webrtc-sdk.md | 390 +++++++++++++++++++++------------- 1 file changed, 237 insertions(+), 153 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 868853a..783e4ab 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -2,34 +2,36 @@ - Start Date: 2024-02-02 - RFC PR: [8xff/rfcs#0000](https://github.com/8xff/rfcs/pull/0000) -# Summary +# 1. Summary + [summary]: #summary This RFC describes how WebRTC client is connect to atm0s-media-server and how it works. -# Motivation +# 2. Motivation + [motivation]: #motivation We need to create a client sdk which can be used in any platform and can be customized to fit with any use case. This sdk should be easy to use and easy to customize. It also need to work-well with atm0s-media-server network topology and our aproach to handle media stream. -# User Benefit +# 3. User Benefit User can use WebRTC to connect to media-server, and can create very complex media stream topology. -# Design Proposal +# 4. Design Proposal For simplicty we proposal a sdk protocol which only use with HTTP (not websocket) and WebRTC. - HTTP is only used for sending RPC request to cluster (typicaly is Connect and Retry phase). 
- WebRTC is used for sending and receiving media stream and also rpc, event after connected. -### HTTP Request/Response format +## 4.1 HTTP Request/Response format All request and response will be encoded in JSON format. The format is described as below: -Request: JSON +**Body:**: JSON -Response: +**Response:\*** ```json { @@ -40,7 +42,7 @@ Response: } ``` -### Connect request +## 4.2 Connect request Client can prepare some senders or receivers before connect to server. When client connect to server, it will send a connect request to server. @@ -50,11 +52,12 @@ Client must to prepare: - List of senders and receivers. - WebRTC OfferSDP. -Endpoint: POST `GATEWAY/webrtc/connect` +**_Endpoint_**: `POST GATEWAY/webrtc/connect` + +**_Headers_**: `Authorization: Bearer {token}` + +**_Body_**: -Header: - Authorization: Bear {token} -Body: ```json { version: Option, @@ -105,82 +108,93 @@ Body: } ``` -In there: +**_Response Data:_** + +```json +{ + sdp: Option, + conn_id: String, +} +``` + +In request: - version: is the version of client sdk. - event: - - publish: full will publish both peer info and tracks info. track will only publish tracks info. - - subscribe: full will subscribe both remote peer info and tracks info. track will only subscribe remote tracks info. This feature is useful for client which want to use manual mode to subscribe remote tracks. Example in spatial room application, client will set to `manual` and only subscribe to peer which near to it. + + - publish: `full` will publish both peer info and tracks info. `track` will only publish tracks info. + - subscribe: `full` will subscribe both remote peer info and tracks info. `track` will only subscribe remote tracks info. `manual` with not subscribe any source, client must do it manual. This feature is useful for client which want to use manual mode to subscribe remote tracks. Example in spatial room application, client will set to `manual` and only subscribe to peer which near to it. 
- bitrate: - - ingress is the bitrate mode for ingress stream. In `save` mode, media-server will limit bitrate based on network and consumers. In `max` mode, media-server will only limit bitrate by network and media-server config. + + - ingress is the bitrate mode for ingress stream. In `save` mode, media-server will limit bitrate based on network and consumers. In `max` mode, media-server will only limit bitrate by network and media-server config. - features: json object for containing some features which client want to use. For example: mix-minus, spatial room, etc. - tracks: - - receivers: list of receivers which client want to create. Each receiver is described with: - - kind: is the kind of receiver, audio or video. - - id: is the id of receiver. - - remote: is the remote source which client want to pin to. If it's none, the receiver will be created but not pin to any source. - - limit: is the limit of receiver. If it's none, the receiver will be created with default limit. - - - senders: list of senders which client want to create. Each sender is described with: - - kind: is the kind of sender, audio or video. - - id: is the id of sender. - - uuid: is the uuid of sender. It's used to identify the sender in client side. - - label: is the label of sender. It's used to identify the sender in client side. - - screen: is the flag to indicate that the sender is screen sharing. -- sdp: is the OfferSDP which client created. -After that server will success response with data: + - receivers: list of receivers which client want to create. Each receiver is described with: -```json -{ - sdp: Option, - conn_id: String, -} -``` + - kind: is the kind of receiver, audio or video. + - id: is the id of receiver. + - remote: is the remote source which client want to pin to. If it's none, the receiver will be created but not pin to any source. + - limit: is the limit of receiver. If it's none, the receiver will be created with default limit. 
-In there: + - senders: list of senders which client want to create. Each sender is described with: + - kind: is the kind of sender, audio or video. + - id: is the id of sender. + - uuid: is the uuid of sender. It's used to identify the sender in client side. + - label: is the label of sender. It's used to identify the sender in client side. + - screen: is the flag to indicate that the sender is screen sharing. + +- sdp: is the OfferSDP which client created. + +In response: - sdp: is the AnswerSDP which server created, it should contain all ice-candidates from server. - conn_id: global identify of WebRTC connection. This is used by control api like restart-ice, ice-trickle, kick, etc. Error list: -| Error code | Description | -|------------|-------------| -| INVALID_TOKEN | The token is invalid. | -| SDP_ERROR | The sdp is invalid. | -| INVALID_REQUEST | The request is invalid. | -| INTERNAL_SERVER_ERROR | The server is error. | -| GATEWAY_ERROR | The gateway is error. | +| Error code | Description | +| --------------------- | ----------------------- | +| INVALID_TOKEN | The token is invalid. | +| SDP_ERROR | The sdp is invalid. | +| INVALID_REQUEST | The request is invalid. | +| INTERNAL_SERVER_ERROR | The server is error. | +| GATEWAY_ERROR | The gateway is error. | After that client need to wait for connected event from WebRTC connection and connected event from datachannel. If after a period of time, client don't receive any event, it will set restart ice flag and retry connect to server with newest offer-sdp. After some tries (configurable), client will stop retry and report error to user as CONNECTION_TIMEOUT. 
-### Restart-ice +## 4.3 Restart-ice -Endpoint: POST `GATEWAY/webrtc/:conn_id/restart-ice` -```json -BODY is same with connect request but the tracks should sending with current state -``` +**_Endpoint_**: POST `GATEWAY/webrtc/:conn_id/restart-ice` + +**_Headers_**: `Authorization: Bearer {token}` + +**_Body_**: same with connect request but the tracks should sending with current state + +**_Response Data_**: same with connect response By that way, incase of network change, client can retry connect to server with newest offer-sdp and if the server is still alive, it will response with new answer-sdp. If the server is dead, client will retry connect to another server and can be restore the session state by using track state. -### Ice-tricle +## 4.4 Ice-trickle Each time client WebRTC connection has a new ice-candidate, it should sending to gateway over: -Endpoint: POST `GATEWAY/webrtc/conns/:conn_id/ice-remote` -Request data: candidate String -Response data: None +**_Endpoint_**: POST `GATEWAY/webrtc/conns/:conn_id/ice-remote` + +**_Body_**: candidate String -### Datachannel Request/Response format +**_Response Data_**: None + +## 4.5 Datachannel Request/Response format All request and response sending over datachannel will be encoded in JSON format. The format is described as below: Request/Event: + ```json { type: "event" | "request", @@ -206,11 +220,16 @@ Response: The seq is incresemental value, which is generated in sender side. The seq is helped us to mapping between request and response and also detect data lost. The cmd is generate with rule: `identify.action`, for example: + - `peer.updateSdp`. - `sender.{sender_id}.toggle`. - `receiver.{receiver_id}.switch`. +- `room.peers.subscribe`. +- `room.peers.unsubscribe`. +- `session.disconnect`. +- `session.features.mix-minus.add_source`. -### In-session requests +## 4.6 In-session requests At current state, we will have only one WebRTC connection to server. So we don't need to send any request to server. 
All request will be send over datachannel. @@ -223,12 +242,14 @@ Typicaly, client will need some actions with media server: All action which changed streams will be do at local-first, then calling updateSdp to server. -#### UpdateSDP +### 4.6.1 UpdateSDP Each time we changed something in WebRTC connection, we need to send updateSdp request to server over datachannel. The request will be described in below: -request: `peer.updateSdp` -Request data: +**_Cmd:_**: `peer.updateSdp` + +**_Data:_** + ```json { sdp: String, @@ -252,86 +273,105 @@ Request data: } ``` -Response data: +**_Response data_**: + ```json { sdp: String } ``` -#### Room actions, event +### 4.6.2 Room actions, event We can subscribe to peers event (joined, leaved, track added, track removed) and also can unsubscribe from it. +#### 4.6.2.1 Subscribe to other peers event + +**_Cmd:_** `room.peers.subscribe` + +**_Request Data:_** + ```json -Request: room.peers.subscribe -Request data: { peer_id: String, } - -Response: None ``` +**_Response Data:_** None + +#### 4.6.2.2 Unsubscribe to other peers event + +**_Cmd:_**: `room.peers.unsubscribe` + +**_Request Data:_** + ```json -Request: room.peers.unsubscribe -Request data: { peer_id: String, } -Response: None ``` -Room have some events: +**_Response Data:_**: None -```json -``` +### 4.6.3 Session actions, event -#### Session actions, event +#### 4.6.3.1 Disconnect -```json -Request: session.disconnect -Request data: None -Response: None -``` +**_Cmd:_**: `session.disconnect` + +**_Request data:_**: None -#### Session Sender create/release, actions, events +**_Response:_**: None -For create a sender we need to create a transiver with kind is audio or video. After that we need to create a track and add it to transiver. Then we need to sending updateSdp request to server. +### 4.6.4 Session Sender create/release, actions, events + +For create a sender we need to create a transiver with kind is audio or video. 
After that we need to create a track and add it to transiver. Then we need to sending updateSdp request to server. For destroy a sender, we need to remove track from transiver and remove transiver from connection. Then we need to sending updateSdp request to server. -Each sender support bellow actions: +Each sender has some actions and some event with rule: `session.sender.{id}.{action}` + +#### 4.6.4.1 Switch sender source + +**_Cmd:_**: `session.sender.{id}.toggle` + +**_Request data:_** ```json -Request: session.sender.{id}.toggle -Request data: { track: Option, label: Option, } -Response: None ``` -Each sender also fire some event with cmd prefix: "sender_event" json template: +**_Response Data:_**: None + +**_Cmd:_**: `sender_event.{id}.state` + +#### 4.6.4.2 State event + +**_Event data:_**: ```json -Event: session.sender.{id}.state -Event data: { +{ state: "new" | "live" | "paused" } ``` -#### Session Receiver create/release, actions +### 4.6.5 Session Receiver create/release, actions For create a receiver we need to create a transiver with kind is audio or video. After that we need to create a track and add it to transiver. Then we need to sending updateSdp request to server. -Each receiver support bellow actions: +Each receiver has some actions and some event with rule: `session.receiver.{id}.{action}` -```json -Request: session.receiver.{id}.switch -Request data: +### 4.6.5.1 Switch receiver source + +**_Cmd:_**: `session.receiver.{id}.switch` + +**_Request data:_** + +````json { priority: u16, remote: Option<{ @@ -339,15 +379,18 @@ Request data: track: String, }>, } -Response: None -``` + +***Response data:***: None If remote is none, the receiver will be paused. 
+### 4.6.5.2 Limit receiver bitrate + +***Cmd:***: `session.receiver.{id}.limit` + +***Request data:*** ```json -Request: session.receiver.{id}.limit -Request data: { priority: u16, min_spatial: Option, @@ -355,22 +398,43 @@ Request data: min_temporal: Option, max_temporal: u8, } -Response: None -``` +```` -Each sender also fire some event with cmd prefix: "sender_event" json template: +**_Response data:_**: None +### 4.6.5.3 Receiver state event + +**_Event:_**: `session.receiver.{id}.state` + +**_Event data:_**: + +```json +{ + state: "no_source" | "live" | "key_only" | "inactive", + source: Option<{ + peer: String, + track: String, + }>, +} ``` -Event: session.receiver.{id}.state -Event data: "no_source" | "live" | "key_only" | "inactive" -``` + +Receiver state is explain below: + +- no_source: The receiver is created but don't pin to any source. +- live: The receiver is live. +- key_only: The receiver is live but only receive key frame, this maybe for speed limiter. +- inactive: The receiver is pinned but not enough bandwidth to receive. + +### 4.6.5.3 Receiver stats event + +**_Event:_**: `session.receiver.{id}.stats` + +**_Event data:_**: ```json -Event: session.receiver.{id}.stats -Event data: { codec: String, - ingress: { + ingress: Option<{ scaling: "single" | "simulcast" | "svc", spatials: Number, temporals: Number, @@ -378,85 +442,101 @@ Event data: rtt: Number, lost: Number, jitter: Number, - }, - egress: { + }>, + egress: Option<{ spatial: Number, temporal: Number, bitrate: Number, - }, + }>, } ``` -Receiver state is explain below: +## 4.7 Features -- no_source: The receiver is created but don't pin to any source. -- live: The receiver is live. -- key_only: The receiver is live but only receive key frame, this maybe for speed limiter. -- inactive: The receiver is pinned but not enough bandwidth to receive. +### 4.7.1 Feature: mix-minus mixer + +Mix-minus feature has 2 modes: -#### Feature: mix-minus +- Manual: client can add or remove source to mixer. 
+- Auto: media-server will automatically add or remove all audio sources except the local source to mixer. + +#### 4.7.1.1 Connect request In connect request, we add field to features params: ```json { - mix_minus: { - mode: "manual" | "auto", - sources: [ - { - peer: String, - track: String, - } - ] + features: { + mix_minus: { + mode: "manual" | "auto", + sources: [ + { + peer: String, + track: String, + } + ] + } } } ``` -Actions: +#### 4.7.1.2 Add source to mixer -```json -Request: session.features.mix-minus.add_source -Request data: -{ - peer: String, - track: String, -} -Response body: empty -``` +Note that, this action only work with `manual` mode. -```json -Request: session.features.mix-minus.remove_source -Request data: -{ - peer: String, - track: String, -} -Response body: empty -``` +**_Cmd:_**: `session.features.mix-minus.add_source` + +**_Request data:_** ```json -Request: session.features.mix-minus.pause -Request data: { peer: String, track: String, } -Response body: empty ``` +**_Response data:_**: None + +#### 4.7.1.3 Remove source from mixer + +Note that, this action only work with `manual` mode. + +**_Cmd:_**: `session.features.mix-minus.remove_source` + +**_Request data:_** + ```json -Request: session.features.mix-minus.resume -Request data: { peer: String, track: String, } -Response body: empty ``` +**_Response data:_**: None + +#### 4.7.1.4 Pause mix-minus mixer + +**_Cmd:_**: `session.features.mix-minus.pause` + +**_Request data:_** None + +**_Response data:_**: None + +#### 4.7.1.5 Resume mix-minus mixer + +**_Cmd:_**: `session.features.mix-minus.resume` + +**_Request data:_**: None + +**_Response data:_**: None + +#### 4.7.1.6 State event + +**_Cmd:_**: `session.features.mix-minus.state` + +**_Event data:_** + ```json -Event: session.features.mix-minus.state -Event data: { layers: [ { @@ -464,19 +544,21 @@ Event data: remote: Option<{ peer: String, track: String, + audio_level: Number, }>, - audio_level: Number, } ] } ``` -# Drawbacks +# 5. 
Drawbacks + [drawbacks]: #drawbacks No drawbacks. -# Rationale and alternatives +# 6. Rationale and alternatives + [rationale-and-alternatives]: #rationale-and-alternatives We have some alternatives: @@ -484,15 +566,17 @@ We have some alternatives: - Whip/Whep: but it not flexible and cannot be used to create complex media stream topology. - Livekit protocol: the protocol don't have document and it's is designed for Livekit server topology. -# Unresolved questions +# 7. Unresolved questions + [unresolved-questions]: #unresolved-questions Not yet. -# Future possibilities +# 8. Future possibilities + [future-possibilities]: #future-possibilities We can have some improvements: - Dynamic event subscription. -- Binary event and request/response format. \ No newline at end of file +- Binary event and request/response format. From e2d5692691154307dec973e25b0c60beacefcb3c Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Sat, 3 Feb 2024 09:19:32 +0700 Subject: [PATCH 04/15] make it look better --- text/0005-media-webrtc-sdk.md | 38 +++++++++++++++++------------------ 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 783e4ab..a42d361 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -33,7 +33,7 @@ All request and response will be encoded in JSON format. 
The format is described **Response:\*** -```json +``` { success: bool, error_code: Option, @@ -58,7 +58,7 @@ Client must to prepare: **_Body_**: -```json +``` { version: Option, room: String, @@ -110,7 +110,7 @@ Client must to prepare: **_Response Data:_** -```json +``` { sdp: Option, conn_id: String, @@ -195,7 +195,7 @@ All request and response sending over datachannel will be encoded in JSON format Request/Event: -```json +``` { type: "event" | "request", seq: Number, @@ -206,7 +206,7 @@ Request/Event: Response: -```json +``` { type: "answer", seq: Number, @@ -250,7 +250,7 @@ Each time we changed something in WebRTC connection, we need to send updateSdp r **_Data:_** -```json +``` { sdp: String, tracks: { @@ -275,7 +275,7 @@ Each time we changed something in WebRTC connection, we need to send updateSdp r **_Response data_**: -```json +``` { sdp: String } @@ -291,7 +291,7 @@ We can subscribe to peers event (joined, leaved, track added, track removed) and **_Request Data:_** -```json +``` { peer_id: String, } @@ -305,7 +305,7 @@ We can subscribe to peers event (joined, leaved, track added, track removed) and **_Request Data:_** -```json +``` { peer_id: String, } @@ -338,7 +338,7 @@ Each sender has some actions and some event with rule: `session.sender.{id}.{act **_Request data:_** -```json +``` { track: Option, label: Option, @@ -353,7 +353,7 @@ Each sender has some actions and some event with rule: `session.sender.{id}.{act **_Event data:_**: -```json +``` { state: "new" | "live" | "paused" } @@ -371,7 +371,7 @@ Each receiver has some actions and some event with rule: `session.receiver.{id}. **_Request data:_** -````json +```` { priority: u16, remote: Option<{ @@ -390,7 +390,7 @@ If remote is none, the receiver will be paused. ***Request data:*** -```json +``` { priority: u16, min_spatial: Option, @@ -408,7 +408,7 @@ If remote is none, the receiver will be paused. 
**_Event data:_**: -```json +``` { state: "no_source" | "live" | "key_only" | "inactive", source: Option<{ @@ -431,7 +431,7 @@ Receiver state is explain below: **_Event data:_**: -```json +``` { codec: String, ingress: Option<{ @@ -464,7 +464,7 @@ Mix-minus feature has 2 modes: In connect request, we add field to features params: -```json +``` { features: { mix_minus: { @@ -488,7 +488,7 @@ Note that, this action only work with `manual` mode. **_Request data:_** -```json +``` { peer: String, track: String, @@ -505,7 +505,7 @@ Note that, this action only work with `manual` mode. **_Request data:_** -```json +``` { peer: String, track: String, @@ -536,7 +536,7 @@ Note that, this action only work with `manual` mode. **_Event data:_** -```json +``` { layers: [ { From 361c6a9bca786235ffc66fa7de692db4caae48c2 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Sat, 3 Feb 2024 09:46:56 +0700 Subject: [PATCH 05/15] fix grammar --- text/0005-media-webrtc-sdk.md | 133 ++++++++++++++++++---------------- 1 file changed, 71 insertions(+), 62 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index a42d361..943bbc0 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -6,28 +6,33 @@ [summary]: #summary -This RFC describes how WebRTC client is connect to atm0s-media-server and how it works. +This RFC provides an overview of how the WebRTC client connects to atm0s-media-server and its functionality. # 2. Motivation [motivation]: #motivation -We need to create a client sdk which can be used in any platform and can be customized to fit with any use case. This sdk should be easy to use and easy to customize. It also need to work-well with atm0s-media-server network topology and our aproach to handle media stream. +To enhance the user experience, we aim to develop a client SDK that is platform-agnostic and highly customizable. 
This SDK should provide a seamless integration with atm0s-media-server network topology and align with our approach to handle media streams. Our primary focus is to ensure that the SDK is user-friendly and easily adaptable to various use cases. # 3. User Benefit -User can use WebRTC to connect to media-server, and can create very complex media stream topology. +User can use WebRTC to connect to the media server and create complex media stream topologies. # 4. Design Proposal -For simplicty we proposal a sdk protocol which only use with HTTP (not websocket) and WebRTC. +To ensure simplicity, we propose a SDK protocol that exclusively uses HTTP (not WebSocket) and WebRTC. -- HTTP is only used for sending RPC request to cluster (typicaly is Connect and Retry phase). -- WebRTC is used for sending and receiving media stream and also rpc, event after connected. +- HTTP is used only for sending RPC requests to the cluster, typically during the Connect and Retry phases. +- WebRTC is utilized for sending and receiving media streams, as well as RPC and event communication after establishing a connection. ## 4.1 HTTP Request/Response format -All request and response will be encoded in JSON format. The format is described as below: + +**Request and Response Format** + +All requests and responses will be encoded in JSON format. The format is described as follows: + +```json **Body:**: JSON @@ -42,15 +47,16 @@ All request and response will be encoded in JSON format. The format is described } ``` -## 4.2 Connect request -Client can prepare some senders or receivers before connect to server. When client connect to server, it will send a connect request to server. +## 4.2 Connect Establishment + +Before connecting to the server, the client needs to prepare the following: -Client must to prepare: +- Enable WebRTC connection with data channel. +- Create a list of senders and receivers. +- Generate WebRTC OfferSDP. -- WebRTC connection with datachannel enabled. 
-- List of senders and receivers. -- WebRTC OfferSDP. +Once the client is ready, it can send a connect request to the server. **_Endpoint_**: `POST GATEWAY/webrtc/connect` @@ -117,41 +123,41 @@ Client must to prepare: } ``` -In request: +The explanation of each request parameter: -- version: is the version of client sdk. +- version: is the version of the client SDK. - event: - - publish: `full` will publish both peer info and tracks info. `track` will only publish tracks info. - - subscribe: `full` will subscribe both remote peer info and tracks info. `track` will only subscribe remote tracks info. `manual` with not subscribe any source, client must do it manual. This feature is useful for client which want to use manual mode to subscribe remote tracks. Example in spatial room application, client will set to `manual` and only subscribe to peer which near to it. + - publish: `full` will publish both peer info and track info. `track` will only publish track info. + - subscribe: `full` will subscribe to both remote peer info and track info. `track` will only subscribe to remote track info. `manual` will not subscribe to any source, the client must do it manually. This feature is useful for clients who want to use manual mode to subscribe to remote tracks. For example, in a spatial room application, the client will set it to `manual` and only subscribe to peers that are near to it. - bitrate: - - ingress is the bitrate mode for ingress stream. In `save` mode, media-server will limit bitrate based on network and consumers. In `max` mode, media-server will only limit bitrate by network and media-server config. + - ingress is the bitrate mode for the ingress stream. In `save` mode, the media server will limit the bitrate based on the network and consumers. In `max` mode, the media server will only limit the bitrate based on the network and media server configuration. -- features: json object for containing some features which client want to use. 
For example: mix-minus, spatial room, etc. +- features: a JSON object containing some features that the client wants to use. For example: mix-minus, spatial room, etc. - tracks: - - receivers: list of receivers which client want to create. Each receiver is described with: + - receivers: a list of receivers that the client wants to create. Each receiver is described with: - - kind: is the kind of receiver, audio or video. - - id: is the id of receiver. - - remote: is the remote source which client want to pin to. If it's none, the receiver will be created but not pin to any source. - - limit: is the limit of receiver. If it's none, the receiver will be created with default limit. + - kind: the kind of receiver, audio or video. + - id: the ID of the receiver. + - remote: the remote source that the client wants to pin to. If it's none, the receiver will be created but not pinned to any source. + - limit: the limit of the receiver. If it's none, the receiver will be created with the default limit. - - senders: list of senders which client want to create. Each sender is described with: - - kind: is the kind of sender, audio or video. - - id: is the id of sender. - - uuid: is the uuid of sender. It's used to identify the sender in client side. - - label: is the label of sender. It's used to identify the sender in client side. - - screen: is the flag to indicate that the sender is screen sharing. + - senders: a list of senders that the client wants to create. Each sender is described with: + - kind: the kind of sender, audio or video. + - id: the ID of the sender. + - uuid: the UUID of the sender. It's used to identify the sender on the client side. + - label: the label of the sender. It's used to identify the sender on the client side. + - screen: a flag to indicate whether the sender is screen sharing. -- sdp: is the OfferSDP which client created. +- sdp: the OfferSDP that the client created. 
-In response: +The explaination of each response parameter: - sdp: is the AnswerSDP which server created, it should contain all ice-candidates from server. -- conn_id: global identify of WebRTC connection. This is used by control api like restart-ice, ice-trickle, kick, etc. +- conn_id: global identifier of WebRTC connection. This is used by control api like restart-ice, ice-trickle, kick, etc. Error list: @@ -163,9 +169,9 @@ Error list: | INTERNAL_SERVER_ERROR | The server is error. | | GATEWAY_ERROR | The gateway is error. | -After that client need to wait for connected event from WebRTC connection and connected event from datachannel. -If after a period of time, client don't receive any event, it will set restart ice flag and retry connect to server with newest offer-sdp. -After some tries (configurable), client will stop retry and report error to user as CONNECTION_TIMEOUT. +After that, the client needs to wait for the connected event from the WebRTC connection and the connected event from the data channel. +If the client doesn't receive any event after a period of time, it will set the restart ice flag and retry connecting to the server with the newest offer SDP. +After several attempts (configurable), the client will stop retrying and report an error to the user as CONNECTION_TIMEOUT. ## 4.3 Restart-ice @@ -177,11 +183,12 @@ After some tries (configurable), client will stop retry and report error to user **_Response Data_**: same with connect response -By that way, incase of network change, client can retry connect to server with newest offer-sdp and if the server is still alive, it will response with new answer-sdp. If the server is dead, client will retry connect to another server and can be restore the session state by using track state. + +By doing this, in case of a network change, the client can retry connecting to the server with the newest offer SDP. If the server is still alive, it will respond with a new answer SDP. 
However, if the server is dead, the gateway will retry connecting to another server. The session state can be restored using the track state and each feature state. ## 4.4 Ice-tricle -Each time client WebRTC connection has a new ice-candidate, it should sending to gateway over: +Each time the client's WebRTC connection has a new ice-candidate, it should be sent to the gateway using the following endpoint: **_Endpoint_**: POST `GATEWAY/webrtc/conns/:conn_id/ice-remote` @@ -191,7 +198,7 @@ Each time client WebRTC connection has a new ice-candidate, it should sending to ## 4.5 Datachannel Request/Response format -All request and response sending over datachannel will be encoded in JSON format. The format is described as below: +The format for encoding all requests and responses sent over the data channel is JSON. The structure of the request and response objects is as follows: Request/Event: @@ -217,7 +224,7 @@ Response: } ``` -The seq is incresemental value, which is generated in sender side. The seq is helped us to mapping between request and response and also detect data lost. +The seq is an incremental value generated on the sender side. It helps us to map between requests and responses and also detect data loss. The cmd is generate with rule: `identify.action`, for example: @@ -231,20 +238,21 @@ The cmd is generate with rule: `identify.action`, for example: ## 4.6 In-session requests -At current state, we will have only one WebRTC connection to server. So we don't need to send any request to server. All request will be send over datachannel. -Typicaly, client will need some actions with media server: +At the current state, we only have one WebRTC connection to the server, so there is no need to send any requests over HTTP. All requests will be sent over the WebRTC datachannel. 
+ +Typically, the client will need to perform various actions with the media server, such as: -- Create/Release sender -- Create/Release receiver -- Sender action: pause, resume, switch stream -- Receiver action: pause, resume, switch remote source, update priority and layers +- Creating/Releasing senders +- Creating/Releasing receivers +- Sender actions: pause, resume, switch stream +- Receiver actions: pause, resume, switch remote source, update priority, and layers -All action which changed streams will be do at local-first, then calling updateSdp to server. +All actions that involve changing tracks will be performed locally first, and then the `updateSdp` command will be sent to the server. ### 4.6.1 UpdateSDP -Each time we changed something in WebRTC connection, we need to send updateSdp request to server over datachannel. The request will be described in below: +Each time we make changes to the WebRTC connection or negotiationneeded event fired, we need to send an `updateSdp` request to the server over the data channel. This request is described below: **_Cmd:_**: `peer.updateSdp` @@ -283,7 +291,7 @@ Each time we changed something in WebRTC connection, we need to send updateSdp r ### 4.6.2 Room actions, event -We can subscribe to peers event (joined, leaved, track added, track removed) and also can unsubscribe from it. +We can subscribe to peers event (joined, left, track added, track removed) and also can unsubscribe from it. #### 4.6.2.1 Subscribe to other peers event @@ -326,11 +334,11 @@ We can subscribe to peers event (joined, leaved, track added, track removed) and ### 4.6.4 Session Sender create/release, actions, events -For create a sender we need to create a transiver with kind is audio or video. After that we need to create a track and add it to transiver. Then we need to sending updateSdp request to server. +For creating a sender, we need to create a transceiver with kind as audio or video. 
After that, we need to create a track and add it to the transceiver. Then we need to send an updateSDP request to the server. -For destroy a sender, we need to remove track from transiver and remove transiver from connection. Then we need to sending updateSdp request to server. +For destroying a sender, we need to remove the track from the transceiver and remove the transceiver from the connection. Then we need to send an updateSDP request to the server. -Each sender has some actions and some event with rule: `session.sender.{id}.{action}` +Each sender has some actions and events with the following rule: `session.sender.{id}.{action}` #### 4.6.4.1 Switch sender source @@ -361,9 +369,9 @@ Each sender has some actions and some event with rule: `session.sender.{id}.{act ### 4.6.5 Session Receiver create/release, actions -For create a receiver we need to create a transiver with kind is audio or video. After that we need to create a track and add it to transiver. Then we need to sending updateSdp request to server. +To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an updateSdp request to the server. -Each receiver has some actions and some event with rule: `session.receiver.{id}.{action}` +Each receiver has some actions and events with the following rule: `session.receiver.{id}.{action}` ### 4.6.5.1 Switch receiver source @@ -418,12 +426,12 @@ If remote is none, the receiver will be paused. } ``` -Receiver state is explain below: +Receiver state is explained below: -- no_source: The receiver is created but don't pin to any source. -- live: The receiver is live. -- key_only: The receiver is live but only receive key frame, this maybe for speed limiter. -- inactive: The receiver is pinned but not enough bandwidth to receive. +- `no_source`: The receiver is created but not pinned to any source. +- `live`: The receiver is live. 
+- `key_only`: The receiver is live but only receives key frames, which may be for speed limiting purposes. +- `inactive`: The receiver is pinned but does not have enough bandwidth to receive. ### 4.6.5.3 Receiver stats event @@ -455,10 +463,11 @@ Receiver state is explain below: ### 4.7.1 Feature: mix-minus mixer -Mix-minus feature has 2 modes: -- Manual: client can add or remove source to mixer. -- Auto: media-server will automatically add or remove all audio sources except the local source to mixer. +The mix-minus feature has two modes: + +- Manual: In this mode, the client can manually add or remove sources to the mixer. +- Auto: In this mode, the media server will automatically add or remove all audio sources except the local source to the mixer. #### 4.7.1.1 Connect request From 22f2c0ab874c247e6455071761729e44a4663e57 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Sat, 3 Feb 2024 09:48:31 +0700 Subject: [PATCH 06/15] fix grammar --- text/0005-media-webrtc-sdk.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 943bbc0..f701ccf 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -32,11 +32,9 @@ To ensure simplicity, we propose a SDK protocol that exclusively uses HTTP (not All requests and responses will be encoded in JSON format. 
The format is described as follows: -```json +**_Body:_**: JSON -**Body:**: JSON - -**Response:\*** +**_Response:_** ``` { From 2fe60aaa78990392a8b28b556551ed7e8023f371 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Wed, 14 Feb 2024 23:43:40 +0700 Subject: [PATCH 07/15] fix some comments --- text/0005-media-webrtc-sdk.md | 86 ++++++++++++++++++++--------------- 1 file changed, 50 insertions(+), 36 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index f701ccf..5897476 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -127,7 +127,7 @@ The explanation of each request parameter: - event: - publish: `full` will publish both peer info and track info. `track` will only publish track info. - - subscribe: `full` will subscribe to both remote peer info and track info. `track` will only subscribe to remote track info. `manual` will not subscribe to any source, the client must do it manually. This feature is useful for clients who want to use manual mode to subscribe to remote tracks. For example, in a spatial room application, the client will set it to `manual` and only subscribe to peers that are near to it. + - subscribe: `full` will subscribe to both remote peer info and track info. `track` will only subscribe to remote track info. `manual` will not subscribe to any source, the client must do it manually. This feature is useful for clients who want to use manual mode to subscribe to remote tracks. For example, in a proximity based audio application like [Gather](https://www.gather.town/), the client will set it to `manual` and only subscribe to peers that are near to it. - bitrate: @@ -140,15 +140,18 @@ The explanation of each request parameter: - kind: the kind of receiver, audio or video. - id: the ID of the receiver. - - remote: the remote source that the client wants to pin to. If it's none, the receiver will be created but not pinned to any source. - - limit: the limit of the receiver. 
If it's none, the receiver will be created with the default limit. + - state: the state of the receiver. It's used to restore the receiver state when the client reconnects to the server. It contains: + - remote: the remote source that the client wants to pin to. If it's none, the receiver will be created but not pinned to any source. + - limit: the limit of the receiver. If it's none, the receiver will be created with the default limit. - senders: a list of senders that the client wants to create. Each sender is described with: - kind: the kind of sender, audio or video. - id: the ID of the sender. - uuid: the UUID of the sender. It's used to identify the sender on the client side. - label: the label of the sender. It's used to identify the sender on the client side. - - screen: a flag to indicate whether the sender is screen sharing. + - state: the state of the sender. It's used to restore the sender state when the client reconnects to the server. It contains: + - screen: a flag to indicate whether the sender is screen sharing. + - pause: a flag to indicate whether the sender is paused. - sdp: the OfferSDP that the client created. @@ -226,13 +229,13 @@ The seq is an incremental value generated on the sender side. It helps us to map The cmd is generate with rule: `identify.action`, for example: -- `peer.updateSdp`. +- `peer.update_sdp`. - `sender.{sender_id}.toggle`. - `receiver.{receiver_id}.switch`. - `room.peers.subscribe`. - `room.peers.unsubscribe`. - `session.disconnect`. -- `session.features.mix-minus.add_source`. +- `session.features.mix_minus.sources.add`. ## 4.6 In-session requests @@ -246,13 +249,13 @@ Typically, the client will need to perform various actions with the media server - Sender actions: pause, resume, switch stream - Receiver actions: pause, resume, switch remote source, update priority, and layers -All actions that involve changing tracks will be performed locally first, and then the `updateSdp` command will be sent to the server. 
+All actions that involve changing tracks will be performed locally first, and then the `update_sdp` command will be sent to the server. -### 4.6.1 UpdateSDP +### 4.6.1 Update SDP -Each time we make changes to the WebRTC connection or negotiationneeded event fired, we need to send an `updateSdp` request to the server over the data channel. This request is described below: +Each time we make changes to the WebRTC connection or negotiationneeded event fired, we need to send an `update_sdp` request to the server over the data channel. This request is described below: -**_Cmd:_**: `peer.updateSdp` +**_Cmd:_**: `peer.update_sdp` **_Data:_** @@ -293,13 +296,15 @@ We can subscribe to peers event (joined, left, track added, track removed) and a #### 4.6.2.1 Subscribe to other peers event +(Note that this action only works with `event.subscribe` manual mode.) + **_Cmd:_** `room.peers.subscribe` **_Request Data:_** ``` { - peer_id: String, + peer_ids: [String], } ``` @@ -307,13 +312,15 @@ We can subscribe to peers event (joined, left, track added, track removed) and a #### 4.6.2.2 Unsubscribe to other peers event +(Note that this action only works with `subscribe` manual mode.) + **_Cmd:_**: `room.peers.unsubscribe` **_Request Data:_** ``` { - peer_id: String, + peer_ids: [String], } ``` @@ -332,15 +339,15 @@ We can subscribe to peers event (joined, left, track added, track removed) and a ### 4.6.4 Session Sender create/release, actions, events -For creating a sender, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an updateSDP request to the server. +For creating a sender, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. 
-For destroying a sender, we need to remove the track from the transceiver and remove the transceiver from the connection. Then we need to send an updateSDP request to the server. +For destroying a sender, we need to remove the track from the transceiver and remove the transceiver from the connection. Then we need to send an `update_sdp` request to the server. Each sender has some actions and events with the following rule: `session.sender.{id}.{action}` #### 4.6.4.1 Switch sender source -**_Cmd:_**: `session.sender.{id}.toggle` +**_Cmd:_**: `session.senders.{id}.toggle` **_Request data:_** @@ -353,10 +360,10 @@ Each sender has some actions and events with the following rule: `session.sender **_Response Data:_**: None -**_Cmd:_**: `sender_event.{id}.state` - #### 4.6.4.2 State event +**_Cmd:_**: `session.senders.{id}.state` + **_Event data:_**: ``` @@ -367,13 +374,13 @@ Each sender has some actions and events with the following rule: `session.sender ### 4.6.5 Session Receiver create/release, actions -To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an updateSdp request to the server. +To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. Each receiver has some actions and events with the following rule: `session.receiver.{id}.{action}` ### 4.6.5.1 Switch receiver source -**_Cmd:_**: `session.receiver.{id}.switch` +**_Cmd:_**: `session.receivers.{id}.switch` **_Request data:_** @@ -410,7 +417,7 @@ If remote is none, the receiver will be paused. ### 4.6.5.3 Receiver state event -**_Event:_**: `session.receiver.{id}.state` +**_Event:_**: `session.receivers.{id}.state` **_Event data:_**: @@ -420,6 +427,10 @@ If remote is none, the receiver will be paused. 
source: Option<{ peer: String, track: String, + scaling: "single" | "simulcast" | "svc", + spatials: Number, + temporals: Number, + codec: "opus" | "vp8" | "vp9" | "h264" | "h265" | "av1", }>, } ``` @@ -433,17 +444,13 @@ Receiver state is explained below: ### 4.6.5.3 Receiver stats event -**_Event:_**: `session.receiver.{id}.stats` +**_Event:_**: `session.receivers.{id}.stats` **_Event data:_**: ``` { - codec: String, ingress: Option<{ - scaling: "single" | "simulcast" | "svc", - spatials: Number, - temporals: Number, bitrate: Number, rtt: Number, lost: Number, @@ -491,14 +498,18 @@ In connect request, we add field to features params: Note that, this action only work with `manual` mode. -**_Cmd:_**: `session.features.mix-minus.add_source` +**_Cmd:_**: `session.features.mix_minus.sources.add` **_Request data:_** ``` { - peer: String, - track: String, + sources: [ + { + peer: String, + track: String, + } + ] } ``` @@ -508,14 +519,18 @@ Note that, this action only work with `manual` mode. Note that, this action only work with `manual` mode. -**_Cmd:_**: `session.features.mix-minus.remove_source` +**_Cmd:_**: `session.features.mix_minus.sources.remove` **_Request data:_** ``` { - peer: String, - track: String, + sources: [ + { + peer: String, + track: String, + } + ] } ``` @@ -523,7 +538,7 @@ Note that, this action only work with `manual` mode. #### 4.7.1.4 Pause mix-minus mixer -**_Cmd:_**: `session.features.mix-minus.pause` +**_Cmd:_**: `session.features.mix_minus.pause` **_Request data:_** None @@ -531,7 +546,7 @@ Note that, this action only work with `manual` mode. #### 4.7.1.5 Resume mix-minus mixer -**_Cmd:_**: `session.features.mix-minus.resume` +**_Cmd:_**: `session.features.mix_minus.resume` **_Request data:_**: None @@ -539,15 +554,14 @@ Note that, this action only work with `manual` mode. 
#### 4.7.1.6 State event -**_Cmd:_**: `session.features.mix-minus.state` +**_Cmd:_**: `session.features.mix_minus.state` **_Event data:_** ``` { - layers: [ + slots: [ { - id: string, remote: Option<{ peer: String, track: String, From d34df051b8b93298df8e3557a1e6945d6ba2d882 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Thu, 15 Feb 2024 00:00:20 +0700 Subject: [PATCH 08/15] add terms and rename remote to source --- text/0005-media-webrtc-sdk.md | 64 +++++++++++++++++++---------------- 1 file changed, 35 insertions(+), 29 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 5897476..2b6063f 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -24,9 +24,19 @@ To ensure simplicity, we propose a SDK protocol that exclusively uses HTTP (not - HTTP is used only for sending RPC requests to the cluster, typically during the Connect and Retry phases. - WebRTC is utilized for sending and receiving media streams, as well as RPC and event communication after establishing a connection. +- Each client will have sender tracks and receiver tracks to handle media streams. Senders are responsible for sending media streams to the server, while receivers are used to receive media streams from the server. Both senders and receivers can be paused, resumed, or switched to another source. There are two types of senders: audio and video, and two types of receivers: audio and video. Each sender and receiver is assigned a unique ID by the client, such as 'audio_1', 'audio_2', 'video_1', 'video_2', and it does not need to be unique with other clients. -## 4.1 HTTP Request/Response format +We have some terms: + +- **_Sender_**: A sender is a track that sends media to the server. It can be an audio or video track. +- **_Receiver_**: A receiver is a track that receives media from the server. It can be an audio or video track. +- **_Source_**: A source that the receiver is pinned to. It can be a audio or video. 
Source is identify by a pair peer_id and track_id. +- **_Peer_**: A peer is a client that is connected to the server. +- **_Room_**: A room is a group of peers that are connected to the same server. +- **_Conn_**: A connection is a WebRTC connection between the client and the server. +- **_Feature_**: A feature is a set of advance functionalities that the client can use. For example, mix-minus, chat, ... +## 4.1 HTTP Request/Response format **Request and Response Format** @@ -45,7 +55,6 @@ All requests and responses will be encoded in JSON format. The format is describ } ``` - ## 4.2 Connect Establishment Before connecting to the server, the client needs to prepare the following: @@ -81,7 +90,7 @@ Once the client is ready, it can send a connect request to the server. kind: "audio" | "video", id: String, state: Option<{ - remote: Option<{ + source: Option<{ peer: String, track: String, }>, @@ -126,32 +135,32 @@ The explanation of each request parameter: - version: is the version of the client SDK. - event: - - publish: `full` will publish both peer info and track info. `track` will only publish track info. - - subscribe: `full` will subscribe to both remote peer info and track info. `track` will only subscribe to remote track info. `manual` will not subscribe to any source, the client must do it manually. This feature is useful for clients who want to use manual mode to subscribe to remote tracks. For example, in a proximity based audio application like [Gather](https://www.gather.town/), the client will set it to `manual` and only subscribe to peers that are near to it. + - publish: `full` will publish both peer info and track info. `track` will only publish track info. + - subscribe: `full` will subscribe to both remote peer info and track info. `track` will only subscribe to remote track info. `manual` will not subscribe to any source, the client must do it manually. This feature is useful for clients who want to use manual mode to subscribe to remote tracks. 
For example, in a proximity based audio application like [Gather](https://www.gather.town/), the client will set it to `manual` and only subscribe to peers that are near to it. - bitrate: - - ingress is the bitrate mode for the ingress stream. In `save` mode, the media server will limit the bitrate based on the network and consumers. In `max` mode, the media server will only limit the bitrate based on the network and media server configuration. + - ingress is the bitrate mode for the ingress stream. In `save` mode, the media server will limit the bitrate based on the network and consumers. In `max` mode, the media server will only limit the bitrate based on the network and media server configuration. - features: a JSON object containing some features that the client wants to use. For example: mix-minus, spatial room, etc. - tracks: - - receivers: a list of receivers that the client wants to create. Each receiver is described with: + - receivers: a list of receivers that the client wants to create. Each receiver is described with: - - kind: the kind of receiver, audio or video. - - id: the ID of the receiver. - - state: the state of the receiver. It's used to restore the receiver state when the client reconnects to the server. It contains: - - remote: the remote source that the client wants to pin to. If it's none, the receiver will be created but not pinned to any source. - - limit: the limit of the receiver. If it's none, the receiver will be created with the default limit. + - kind: the kind of receiver, audio or video. + - id: the ID of the receiver. + - state: the state of the receiver. It's used to restore the receiver state when the client reconnects to the server. It contains: + - source: the remote source that the client wants to pin to. If it's none, the receiver will be created but not pinned to any source. + - limit: the limit of the receiver. If it's none, the receiver will be created with the default limit. 
- - senders: a list of senders that the client wants to create. Each sender is described with: - - kind: the kind of sender, audio or video. - - id: the ID of the sender. - - uuid: the UUID of the sender. It's used to identify the sender on the client side. - - label: the label of the sender. It's used to identify the sender on the client side. - - state: the state of the sender. It's used to restore the sender state when the client reconnects to the server. It contains: - - screen: a flag to indicate whether the sender is screen sharing. - - pause: a flag to indicate whether the sender is paused. + - senders: a list of senders that the client wants to create. Each sender is described with: + - kind: the kind of sender, audio or video. + - id: the ID of the sender. + - uuid: the UUID of the sender. It's used to identify the sender on the client side. + - label: the label of the sender. It's used to identify the sender on the client side. + - state: the state of the sender. It's used to restore the sender state when the client reconnects to the server. It contains: + - screen: a flag to indicate whether the sender is screen sharing. + - pause: a flag to indicate whether the sender is paused. - sdp: the OfferSDP that the client created. @@ -184,14 +193,13 @@ After several attempts (configurable), the client will stop retrying and report **_Response Data_**: same with connect response - By doing this, in case of a network change, the client can retry connecting to the server with the newest offer SDP. If the server is still alive, it will respond with a new answer SDP. However, if the server is dead, the gateway will retry connecting to another server. The session state can be restored using the track state and each feature state. 
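The retry behaviour described above can be sketched as a bounded loop. This is only an illustration under stated assumptions: `retry_connect`, `create_offer`, and `send_restart_ice` are hypothetical names that are not defined by this RFC, and the transport call is injected as a plain callable so the sketch stays independent of any HTTP or WebRTC library.

```python
from typing import Callable, Optional


def retry_connect(
    create_offer: Callable[[], str],
    send_restart_ice: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the answer SDP, or None once every attempt has failed.

    `create_offer` and `send_restart_ice` are hypothetical hooks supplied by
    the caller; `max_attempts` stands in for the configurable retry limit.
    """
    for _ in range(max_attempts):
        # Always build a fresh offer so the request carries the newest SDP.
        offer_sdp = create_offer()
        answer_sdp = send_restart_ice(offer_sdp)
        if answer_sdp is not None:
            # The gateway reached a live server and returned a new answer.
            return answer_sdp
    # All attempts failed: the caller should report the error to the user.
    return None
```

A real client would probably wait between attempts; the delay policy is left out because the text does not specify one.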
## 4.4 Ice-tricle Each time the client's WebRTC connection has a new ice-candidate, it should be sent to the gateway using the following endpoint: -**_Endpoint_**: POST `GATEWAY/webrtc/conns/:conn_id/ice-remote` +**_Endpoint_**: POST `GATEWAY/webrtc/conns/:conn_id/ice-candidate` **_Body_**: candidate String @@ -239,7 +247,6 @@ The cmd is generate with rule: `identify.action`, for example: ## 4.6 In-session requests - At the current state, we only have one WebRTC connection to the server, so there is no need to send any requests over HTTP. All requests will be sent over the WebRTC datachannel. Typically, the client will need to perform various actions with the media server, such as: @@ -387,7 +394,7 @@ Each receiver has some actions and events with the following rule: `session.rece ```` { priority: u16, - remote: Option<{ + source: Option<{ peer: String, track: String, }>, @@ -395,7 +402,7 @@ Each receiver has some actions and events with the following rule: `session.rece ***Response data:***: None -If remote is none, the receiver will be paused. +If source is none, the receiver will be paused. ### 4.6.5.2 Limit receiver bitrate @@ -450,13 +457,13 @@ Receiver state is explained below: ``` { - ingress: Option<{ + source: Option<{ bitrate: Number, rtt: Number, lost: Number, jitter: Number, }>, - egress: Option<{ + transmit: Option<{ spatial: Number, temporal: Number, bitrate: Number, @@ -468,7 +475,6 @@ Receiver state is explained below: ### 4.7.1 Feature: mix-minus mixer - The mix-minus feature has two modes: - Manual: In this mode, the client can manually add or remove sources to the mixer. @@ -562,7 +568,7 @@ Note that, this action only work with `manual` mode. 
{ slots: [ { - remote: Option<{ + source: Option<{ peer: String, track: String, audio_level: Number, From 247560b6ca12beafbb7512be0e782d8469b0dc42 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Thu, 15 Feb 2024 00:06:16 +0700 Subject: [PATCH 09/15] fix typo --- text/0005-media-webrtc-sdk.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 2b6063f..115ce61 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -164,7 +164,7 @@ The explanation of each request parameter: - sdp: the OfferSDP that the client created. -The explaination of each response parameter: +The explanation of each response parameter: - sdp: is the AnswerSDP which server created, it should contain all ice-candidates from server. - conn_id: global identifier of WebRTC connection. This is used by control api like restart-ice, ice-trickle, kick, etc. From 8ba3bb6a87732c0092ea228e52c048253ddfd324 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Fri, 16 Feb 2024 12:00:55 +0700 Subject: [PATCH 10/15] update peer, track events --- text/0005-media-webrtc-sdk.md | 111 ++++++++++++++++++++++++++++++++-- 1 file changed, 105 insertions(+), 6 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 115ce61..2747c43 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -76,6 +76,7 @@ Once the client is ready, it can send a connect request to the server. version: Option, room: String, peer: String, + metadata: Option, event: { publish: "full" | "track", subscribe: "full" | "track" | "manual", @@ -109,10 +110,10 @@ Once the client is ready, it can send a connect request to the server. kind: "audio" | "video", id: String, uuid: String, - label: String, + metadata: Option, state: Option<{ + active: bool, screen: bool, - pause: bool, }>, } ], @@ -133,6 +134,9 @@ Once the client is ready, it can send a connect request to the server. 
The explanation of each request parameter: - version: is the version of the client SDK. +- room_id: is the room that the client wants to connect to. This is [a-z0-9-] string, maximum is 32 characters. +- peer_id: is the ID of the client. It's used to identify the client on the server side. It's unique in the room. This is [a-z0-9-] string, maximum is 32 characters +- metadata: is the metadata of the client. It can be used to store some information about the client, such as user name .... It's optional and should small than 512 characters. - event: - publish: `full` will publish both peer info and track info. `track` will only publish track info. @@ -157,7 +161,7 @@ The explanation of each request parameter: - kind: the kind of sender, audio or video. - id: the ID of the sender. - uuid: the UUID of the sender. It's used to identify the sender on the client side. - - label: the label of the sender. It's used to identify the sender on the client side. + - metadata: It can be used to store some information about the client's track, such as label name, device .... It's optional and should small than 512 characters. - state: the state of the sender. It's used to restore the sender state when the client reconnects to the server. It contains: - screen: a flag to indicate whether the sender is screen sharing. - pause: a flag to indicate whether the sender is paused. 
@@ -281,8 +285,11 @@ Each time we make changes to the WebRTC connection or negotiationneeded event fi kind: "audio" | "video", id: String, uuid: String, - label: String, - screen: Option, + metadata: Option, + state: Option<{ + active: bool, + screen: bool, + }>, } ], } @@ -332,6 +339,65 @@ We can subscribe to peers event (joined, left, track added, track removed) and a ``` +#### 4.6.2.3 Peer joined event + +**_Cmd:_**: `room.peers.{peer}.joined` + +**_Event data:_**: + +``` +{ + metadata: Option, +} +``` + +**_Response Data:_**: None + +#### 4.6.2.3 Peer left event + +**_Cmd:_**: `room.peers.{peer}.left` + +**_Event data:_**: None + +**_Response Data:_**: None + +#### 4.6.2.3 Track added event + +**_Cmd:_**: `room.peers.{peer}.tracks.{track}.added` + +**_Event data:_**: + +``` +{ + metadata: Option, + state: { + active: bool, + screen: bool, + simulcast: bool, + }, +} +``` + +#### 4.6.2.3 Track updated event + +**_Cmd:_**: `room.peers.{peer}.tracks.{track}.updated` + +**_Event data:_**: + +``` +{ + state: { + active: bool, + }, +} +``` + +#### 4.6.2.3 Track removed event + +**_Cmd:_**: `room.peers.{peer}.tracks.{track}.removed` + +**_Event data:_**: None + **_Response Data:_**: None ### 4.6.3 Session actions, event @@ -344,6 +410,34 @@ We can subscribe to peers event (joined, left, track added, track removed) and a **_Response:_**: None +#### 4.6.3.2 Goaway event + +Goaway event is sent by the server in some cases: + +- The server is going to shutdown or restart. +- The client lifetime is expired, this is useful in some video conference application where each client only has limited session time; example 1 hour. +- The client is kicked by the server. + +In case of a server shutdown or restart, the client should reconnect by sending restart-ice request. 
+ +**_Cmd:_**: `session.on_goaway` + +**_Event data:_**: + +``` +{ + reason: "shutdown" | "kick", + message: Option, + remain_seconds: Number, +} +``` + +**_Response Data:_**: None + +`remain_seconds` is the time that the client has to reconnect to the server. If it's 0, the client should reconnect immediately. + +In case of "shutdown", the client should reconnect by sending restart-ice request. + ### 4.6.4 Session Sender create/release, actions, events For creating a sender, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. @@ -361,12 +455,17 @@ Each sender has some actions and events with the following rule: `session.sender ``` { track: Option, - label: Option, + metadata: Option, + state: { + active: bool, + } } ``` **_Response Data:_**: None +If the track is none, the sender will be switched to an inactive state, and other clients will receive a track removed event. In case a client needs to deactivate the sender, it should set 'active' to false; this is useful for the mic mute feature. + #### 4.6.4.2 State event **_Cmd:_**: `session.senders.{id}.state` From 2957054b6a71909051a02bff4734e53055e08f1f Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Fri, 16 Feb 2024 14:57:20 +0700 Subject: [PATCH 11/15] fix some terms --- text/0005-media-webrtc-sdk.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 2747c43..49628d5 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -136,7 +136,7 @@ The explanation of each request parameter: - version: is the version of the client SDK. - room_id: is the room that the client wants to connect to. This is [a-z0-9-] string, maximum is 32 characters. - peer_id: is the ID of the client. It's used to identify the client on the server side. It's unique in the room. 
This is [a-z0-9-] string, maximum is 32 characters -- metadata: is the metadata of the client. It can be used to store some information about the client, such as user name .... It's optional and should small than 512 characters. +- metadata: is the metadata of the client. It can be used to store some information about the client, such as user name .... It's optional and should be smaller than 512 characters. - event: - publish: `full` will publish both peer info and track info. `track` will only publish track info. @@ -161,7 +161,7 @@ The explanation of each request parameter: - kind: the kind of sender, audio or video. - id: the ID of the sender. - uuid: the UUID of the sender. It's used to identify the sender on the client side. - - metadata: It can be used to store some information about the client's track, such as label name, device .... It's optional and should small than 512 characters. + - metadata: It can be used to store some information about the client's track, such as label name, device .... It's optional and should be smaller than 512 characters. - state: the state of the sender. It's used to restore the sender state when the client reconnects to the server. It contains: - screen: a flag to indicate whether the sender is screen sharing. - pause: a flag to indicate whether the sender is paused. @@ -251,7 +251,7 @@ The cmd is generate with rule: `identify.action`, for example: ## 4.6 In-session requests -At the current state, we only have one WebRTC connection to the server, so there is no need to send any requests over HTTP. All requests will be sent over the WebRTC datachannel. +At the current state, we already have one WebRTC connection to the server, so there is no need to send any requests over HTTP. All requests will be sent over the WebRTC datachannel. 
Typically, the client will need to perform various actions with the media server, such as: @@ -434,7 +434,7 @@ In case of a server shutdown or restart, the client should reconnect by sending **_Response Data:_**: None -`remain_seconds` is the time that the client has to reconnect to the server. If it's 0, the client should reconnect immediately. +`remain_seconds` represents the remaining time that the client will be served by the server. In the event that a client needs to reconnect, it should do so before the remain_seconds expire to avoid interrupting the client session. In case of "shutdown", the client should reconnect by sending restart-ice request. From f5b4c5faad00694014b7448e7eddf770f7d5ae86 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Thu, 22 Feb 2024 14:37:31 +0700 Subject: [PATCH 12/15] change event with cmds don't have resource id --- text/0005-media-webrtc-sdk.md | 295 +++++++++++++++++++++++++--------- 1 file changed, 220 insertions(+), 75 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 49628d5..840a38b 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -74,8 +74,8 @@ Once the client is ready, it can send a connect request to the server. ``` { version: Option, - room: String, - peer: String, + room_id: String, + peer_id: String, metadata: Option, event: { publish: "full" | "track", @@ -92,8 +92,8 @@ Once the client is ready, it can send a connect request to the server. id: String, state: Option<{ source: Option<{ - peer: String, - track: String, + peer_id: String, + track_id: String, }>, limit: Option<{ priority: u16, @@ -109,16 +109,18 @@ Once the client is ready, it can send a connect request to the server. 
{ kind: "audio" | "video", id: String, - uuid: String, + source: Option<{ + id: String, + screen: bool, + }>, metadata: Option, state: Option<{ active: bool, - screen: bool, }>, } ], }, - sdp: Option + sdp: String } ``` @@ -178,7 +180,7 @@ Error list: | Error code | Description | | --------------------- | ----------------------- | | INVALID_TOKEN | The token is invalid. | -| SDP_ERROR | The sdp is invalid. | +| INVALID_SDP | The sdp is invalid. | | INVALID_REQUEST | The request is invalid. | | INTERNAL_SERVER_ERROR | The server is error. | | GATEWAY_ERROR | The gateway is error. | @@ -205,10 +207,26 @@ Each time the client's WebRTC connection has a new ice-candidate, it should be s **_Endpoint_**: POST `GATEWAY/webrtc/conns/:conn_id/ice-candidate` -**_Body_**: candidate String +**_Body_**: + +``` +{ + candidate: String +} +``` **_Response Data_**: None +Error list: + +| Error code | Description | +| --------------------- | ----------------------- | +| INVALID_CONN | The conn_id is invalid. | +| INVALID_ICE | The ice is invalid. | +| INVALID_REQUEST | The request is invalid. | +| INTERNAL_SERVER_ERROR | The server is error. | +| GATEWAY_ERROR | The gateway is error. | + ## 4.5 Datachannel Request/Response format The format for encoding all requests and responses sent over the data channel is JSON. The structure of the request and response objects is as follows: @@ -241,9 +259,9 @@ The seq is an incremental value generated on the sender side. It helps us to map The cmd is generate with rule: `identify.action`, for example: -- `peer.update_sdp`. -- `sender.{sender_id}.toggle`. -- `receiver.{receiver_id}.switch`. +- `session.update_sdp`. +- `session.senders.toggle`. +- `sessions.receivers.switch`. - `room.peers.subscribe`. - `room.peers.unsubscribe`. - `session.disconnect`. 
@@ -266,7 +284,7 @@ All actions that involve changing tracks will be performed locally first, and th Each time we make changes to the WebRTC connection or negotiationneeded event fired, we need to send an `update_sdp` request to the server over the data channel. This request is described below: -**_Cmd:_**: `peer.update_sdp` +**_Cmd:_**: `session.update_sdp` **_Data:_** @@ -278,17 +296,32 @@ Each time we make changes to the WebRTC connection or negotiationneeded event fi { kind: "audio" | "video", id: String, + state: Option<{ + source: Option<{ + peer_id: String, + track_id: String, + }>, + limit: Option<{ + priority: u16, + min_spatial: Option, + max_spatial: u8, + min_temporal: Option, + max_temporal: u8, + }>, + }> } ], senders: [ { kind: "audio" | "video", id: String, - uuid: String, + source: Option<{ + id: String, + screen: bool, + }>, metadata: Option, state: Option<{ active: bool, - screen: bool, }>, } ], @@ -304,11 +337,13 @@ Each time we make changes to the WebRTC connection or negotiationneeded event fi } ``` +If state is defined, the server will update the state of the receiver/sender. If state is none, the server will not update the state of the receiver/sender. + ### 4.6.2 Room actions, event We can subscribe to peers event (joined, left, track added, track removed) and also can unsubscribe from it. -#### 4.6.2.1 Subscribe to other peers event +#### 4.6.2.1 Action: Subscribe to other peers event (Note that this action only works with `event.subscribe` manual mode.) @@ -324,7 +359,7 @@ We can subscribe to peers event (joined, left, track added, track removed) and a **_Response Data:_** None -#### 4.6.2.2 Unsubscribe to other peers event +#### 4.6.2.2 Action: Unsubscribe to other peers event (Note that this action only works with `subscribe` manual mode.) 
@@ -339,70 +374,100 @@ We can subscribe to peers event (joined, left, track added, track removed) and a ``` -#### 4.6.2.3 Peer joined event +#### 4.6.2.3 Event: Peer joined -**_Cmd:_**: `room.peers.{peer}.joined` +**_Cmd:_**: `room.peers.added` **_Event data:_**: ``` { + peer_id: String, metadata: Option, } ``` **_Response Data:_**: None -#### 4.6.2.3 Peer left event +#### 4.6.2.3 Event: Peer left -**_Cmd:_**: `room.peers.{peer}.left` +**_Cmd:_**: `room.peers.removed` -**_Event data:_**: None +**_Event data:_**: + +``` +{ + peer_id: String, +} +``` **_Response Data:_**: None -#### 4.6.2.3 Track added event +#### 4.6.2.3 Event: Track added -**_Cmd:_**: `room.peers.{peer}.tracks.{track}.added` +**_Cmd:_**: `room.tracks.added` **_Event data:_**: ``` { + kind: "audio" | "video", + peer_id: String, + track_id: String, + source: Option<{ + id: String, + screen: bool, + }>, metadata: Option, state: { active: bool, - screen: bool, - simulcast: bool, + scaling: Option<"simulcast" | "svc">, }, } ``` -#### 4.6.2.3 Track updated event +#### 4.6.2.3 Event: Track updated -**_Cmd:_**: `room.peers.{peer}.tracks.{track}.updated` +**_Cmd:_**: `room.peers.tracks.updated` **_Event data:_**: ``` { + kind: "audio" | "video", + peer_id: String, + track_id: String, + source: Option<{ + id: String, + screen: bool, + }>, + metadata: Option, state: { active: bool, + scaling: Option<"simulcast" | "svc">, }, } ``` -#### 4.6.2.3 Track removed event +#### 4.6.2.3 Event: Track removed -**_Cmd:_**: `room.peers.{peer}.tracks.{track}.removed` +**_Cmd:_**: `room.tracks.removed` -**_Event data:_**: None +**_Event data:_**: + +``` +{ + kind: "audio" | "video", + peer_id: String, + track_id: String, +} +``` **_Response Data:_**: None ### 4.6.3 Session actions, event -#### 4.6.3.1 Disconnect +#### 4.6.3.1 Action: Disconnect **_Cmd:_**: `session.disconnect` @@ -410,7 +475,7 @@ We can subscribe to peers event (joined, left, track added, track removed) and a **_Response:_**: None -#### 4.6.3.2 Goaway event 
+#### 4.6.3.2 Event: Goaway Goaway event is sent by the server in some cases: @@ -446,94 +511,145 @@ For destroying a sender, we need to remove the track from the transceiver and re Each sender has some actions and events with the following rule: `session.sender.{id}.{action}` -#### 4.6.4.1 Switch sender source +#### 4.6.4.1 Action: Switch sender source + +This action is used when the user changes the source, for example, when the user changes the camera or microphone. This can also be used when the user stops sharing the camera, in which case we will release the local stream and send a switch without the source param. -**_Cmd:_**: `session.senders.{id}.toggle` +**_Cmd:_**: `session.senders.switch` **_Request data:_** ``` { - track: Option, + id: String, + source: Option<{ + id: String, + screen: bool, + }>, metadata: Option, - state: { - active: bool, - } } ``` **_Response Data:_**: None -If the track is none, the sender will be switched to an inactive state, and other clients will receive a track removed event. In case a client needs to deactivate the sender, it should set 'active' to false; this is useful for the mic mute feature. +If source is none, this sender will be removed from the room, and the receiver that is pinned to this sender will receive an updated event with the source not set. The room also sends a `room.senders.removed` event to all subscribed peers. + +If the source is set and changed, this sender's active state will be reset to true. Note that screen and metadata only work with a changed source value. In the case of the source not being changed, the server will refuse the request. + +#### 4.6.4.1 Action: Toggle sender pause/resume -#### 4.6.4.2 State event +This action is used when the user mutes or unmutes the sender. This is useful for the microphone toggle button: we just stop the local source and send a toggle with the active param set to false.
-**_Cmd:_**: `session.senders.{id}.state` +**_Cmd:_**: `session.senders.toggle` + +**_Request data:_** + +``` +{ + id: String, + active: bool, +} +``` + +**_Response Data:_**: None + +#### 4.6.4.2 Event: State event + +This event is sent by the server when the state of the sender is changed. This is useful when the client implements loading animation when the user changes the source, or when the user mutes/unmutes the sender. + +**_Cmd:_**: `session.senders.state` **_Event data:_**: ``` { - state: "new" | "live" | "paused" + id: String, + state: "waiting" | "no-source" | "active" | "inactive" } ``` +- Waiting: The sender is pinned but the server has not received any media data. +- No-source: The sender is not pinned to any source. +- Active: The sender is active, and the server is receiving media data. +- Inactive: The sender is pinned but . + +```mermaid +graph LR + P1[create with source] + P2[create without source] + W[waiting] + NS[no-source] + A[active] + I[inactive] + P1 --> W + P2 --> NS + W -->|media data| A + A -->|toggle false| I + I -->|toggle true| W + NS -->|switch| W + W -->|switch none| NS + A -->|switch none| NS + I -->|switch none| NS +``` + ### 4.6.5 Session Receiver create/release, actions To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. Each receiver has some actions and events with the following rule: `session.receiver.{id}.{action}` -### 4.6.5.1 Switch receiver source +### 4.6.5.1 Action: Switch receiver source -**_Cmd:_**: `session.receivers.{id}.switch` +**_Cmd:_**: `session.receivers.switch` **_Request data:_** -```` +``` { + id: String, priority: u16, source: Option<{ - peer: String, - track: String, + peer_id: String, + track_id: String, }>, } +``` -***Response data:***: None +**_Response data:_**: None If source is none, the receiver will be paused.
-### 4.6.5.2 Limit receiver bitrate +### 4.6.5.2 Action: Limit receiver bitrate -***Cmd:***: `session.receiver.{id}.limit` +**_Cmd:_**: `session.receiver.limit` -***Request data:*** +**_Request data:_** ``` { + id: String, priority: u16, min_spatial: Option, max_spatial: u8, min_temporal: Option, max_temporal: u8, } -```` +``` **_Response data:_**: None -### 4.6.5.3 Receiver state event +### 4.6.5.3 Event: Receiver state -**_Event:_**: `session.receivers.{id}.state` +**_Event:_**: `session.receivers.state` **_Event data:_**: ``` { - state: "no_source" | "live" | "key_only" | "inactive", + id: String, + state: "no_source" | "waiting" | "live" | "key_only" | "inactive", source: Option<{ - peer: String, - track: String, - scaling: "single" | "simulcast" | "svc", + scaling: Option<"simulcast" | "svc">, spatials: Number, temporals: Number, codec: "opus" | "vp8" | "vp9" | "h264" | "h265" | "av1", @@ -544,18 +660,47 @@ If source is none, the receiver will be paused. Receiver state is explained below: - `no_source`: The receiver is created but not pinned to any source. +- `waiting`: The receiver is pinned but has not received media data yet. - `live`: The receiver is live. - `key_only`: The receiver is live but only receives key frames, which may be for speed limiting purposes. - `inactive`: The receiver is pinned but does not have enough bandwidth to receive.
-### 4.6.5.3 Receiver stats event - -**_Event:_**: `session.receivers.{id}.stats` +```mermaid +graph LR + NS[no-source] + W[waiting] + L[live] + KO[key-only] + I[inactive] + NS -->|switch| W + W -->|media data + bandwidth level 2| L + W -->|media data + bandwidth level 1| KO + L -->|speed limit| KO + W -->|media data + no bandwidth| I + L -->|no bandwidth| I + KO -->|no bandwidth| I + W -->|switch none| NS + L -->|switch none| NS + I -->|switch none| NS + KO -->|switch none| NS + I --> |bandwidth level 2| L + I --> |bandwidth level 1| KO + KO --> |bandwidth level 2| L + W --> |Source lost| NS + L --> |Source lost| NS + KO --> |Source lost| NS + I --> |Source lost| NS +``` + +### 4.6.5.3 Event: Receiver stats + +**_Event:_**: `session.receivers.stats` **_Event data:_**: ``` { + id: String, source: Option<{ bitrate: Number, rtt: Number, @@ -590,8 +735,8 @@ In connect request, we add field to features params: mode: "manual" | "auto", sources: [ { - peer: String, - track: String, + peer_id: String, + track_id: String, } ] } @@ -599,7 +744,7 @@ In connect request, we add field to features params: } ``` -#### 4.7.1.2 Add source to mixer +#### 4.7.1.2 Action: Add source to mixer Note that, this action only work with `manual` mode. @@ -611,8 +756,8 @@ Note that, this action only work with `manual` mode. { sources: [ { - peer: String, - track: String, + peer_id: String, + track_id: String, } ] } @@ -620,7 +765,7 @@ Note that, this action only work with `manual` mode. **_Response data:_**: None -#### 4.7.1.3 Remove source from mixer +#### 4.7.1.3 Action: Remove source from mixer Note that, this action only work with `manual` mode. @@ -632,8 +777,8 @@ Note that, this action only work with `manual` mode. { sources: [ { - peer: String, - track: String, + peer_id: String, + track_id: String, } ] } @@ -641,7 +786,7 @@ Note that, this action only work with `manual` mode. 
**_Response data:_**: None -#### 4.7.1.4 Pause mix-minus mixer +#### 4.7.1.4 Action: Pause mix-minus mixer **_Cmd:_**: `session.features.mix_minus.pause` @@ -649,7 +794,7 @@ Note that, this action only work with `manual` mode. **_Response data:_**: None -#### 4.7.1.5 Resume mix-minus mixer +#### 4.7.1.5 Action: Resume mix-minus mixer **_Cmd:_**: `session.features.mix_minus.resume` @@ -657,7 +802,7 @@ Note that, this action only work with `manual` mode. **_Response data:_**: None -#### 4.7.1.6 State event +#### 4.7.1.6 Event: State update **_Cmd:_**: `session.features.mix_minus.state` @@ -668,8 +813,8 @@ Note that, this action only work with `manual` mode. slots: [ { source: Option<{ - peer: String, - track: String, + peer_id: String, + track_id: String, audio_level: Number, }>, } From 9f2e96e7ecccaca08f8c495997235b2e81862302 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Wed, 15 May 2024 21:31:28 +0700 Subject: [PATCH 13/15] update with protobuf --- text/0005-media-webrtc-sdk.md | 901 +++++++++--------- text/0005-media-webrtc-sdk/conn.proto | 340 +++++++ text/0005-media-webrtc-sdk/features.proto | 27 + .../features_mix_minus.proto | 74 ++ text/0005-media-webrtc-sdk/gateway.proto | 28 + text/0005-media-webrtc-sdk/shared.proto | 88 ++ 6 files changed, 990 insertions(+), 468 deletions(-) create mode 100644 text/0005-media-webrtc-sdk/conn.proto create mode 100644 text/0005-media-webrtc-sdk/features.proto create mode 100644 text/0005-media-webrtc-sdk/features_mix_minus.proto create mode 100644 text/0005-media-webrtc-sdk/gateway.proto create mode 100644 text/0005-media-webrtc-sdk/shared.proto diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 840a38b..f9e21ae 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -1,6 +1,6 @@ - Feature Name: media-webrtc-sdk - Start Date: 2024-02-02 -- RFC PR: [8xff/rfcs#0000](https://github.com/8xff/rfcs/pull/0000) +- RFC PR: 
[8xff/rfcs#0005](https://github.com/8xff/rfcs/pull/0005) # 1. Summary @@ -40,21 +40,22 @@ We have some terms: **Request and Response Format** -All requests and responses will be encoded in JSON format. The format is described as follows: +All requests and responses will be encoded in Protobuf format. If an error occurs, the server will respond with the following content: -**_Body:_**: JSON - -**_Response:_** - -``` -{ - success: bool, - error_code: Option, - error_msg: Option, - data: Option, +```protobuf +message Error { + uint32 code = 1; + string message = 2; } ``` +The folder 0005-media-webrtc-sdk includes all protobuf schemas: + +- [shared.proto](./0005-media-webrtc-sdk/shared.proto): all data types +- [gateway.proto](./0005-media-webrtc-sdk/gateway.proto): for interacting with the gateway node +- [conn.proto](./0005-media-webrtc-sdk/conn.proto): for interacting with the media-server over the datachannel +- [features.proto](./0005-media-webrtc-sdk/features.proto): for feature extensions + ## 4.2 Connect Establishment Before connecting to the server, the client needs to prepare the following: @@ -71,103 +72,39 @@ Once the client is ready, it can send a connect request to the server.
 **_Body_**: -``` -{ - version: Option, - room_id: String, - peer_id: String, - metadata: Option, - event: { - publish: "full" | "track", - subscribe: "full" | "track" | "manual", - }, - bitrate: { - ingress: "save" | "max", - }, - features: JSON, - tracks: { - receivers: [ - { - kind: "audio" | "video", - id: String, - state: Option<{ - source: Option<{ - peer_id: String, - track_id: String, - }>, - limit: Option<{ - priority: u16, - min_spatial: Option, - max_spatial: u8, - min_temporal: Option, - max_temporal: u8, - }>, - }> - } - ], - senders: [ - { - kind: "audio" | "video", - id: String, - source: Option<{ - id: String, - screen: bool, - }>, - metadata: Option, - state: Option<{ - active: bool, - }>, - } - ], - }, - sdp: String +```proto +message ConnectRequest { + string version = 2; + optional shared.RoomJoin join = 3; + features.Features features = 4; + shared.Tracks tracks = 5; + string sdp = 6; } ``` +In there: + +```protobuf + +``` + **_Response Data:_** ``` -{ - sdp: Option, - conn_id: String, +message ConnectResponse { + string conn_id = 1; + string sdp = 2; } ``` The explanation of each request parameter: +- token: the authorization token, used for connectivity only; it is not the room token. - version: is the version of the client SDK. -- room_id: is the room that the client wants to connect to. This is [a-z0-9-] string, maximum is 32 characters. -- peer_id: is the ID of the client. It's used to identify the client on the server side. It's unique in the room. This is [a-z0-9-] string, maximum is 32 characters -- metadata: is the metadata of the client. It can be used to store some information about the client, such as user name .... It's optional and should be smaller than 512 characters. -- event: - - - publish: `full` will publish both peer info and track info. `track` will only publish track info. - - subscribe: `full` will subscribe to both remote peer info and track info. `track` will only subscribe to remote track info.
`manual` will not subscribe to any source, the client must do it manually. This feature is useful for clients who want to use manual mode to subscribe to remote tracks. For example, in a proximity based audio application like [Gather](https://www.gather.town/), the client will set it to `manual` and only subscribe to peers that are near to it. - -- bitrate: - - - ingress is the bitrate mode for the ingress stream. In `save` mode, the media server will limit the bitrate based on the network and consumers. In `max` mode, the media server will only limit the bitrate based on the network and media server configuration. +- join: the information for joining to room, if not provide, it will connect in non-joined state and client must to call .join after have this information - features: a JSON object containing some features that the client wants to use. For example: mix-minus, spatial room, etc. -- tracks: - - - receivers: a list of receivers that the client wants to create. Each receiver is described with: - - - kind: the kind of receiver, audio or video. - - id: the ID of the receiver. - - state: the state of the receiver. It's used to restore the receiver state when the client reconnects to the server. It contains: - - source: the remote source that the client wants to pin to. If it's none, the receiver will be created but not pinned to any source. - - limit: the limit of the receiver. If it's none, the receiver will be created with the default limit. - - - senders: a list of senders that the client wants to create. Each sender is described with: - - kind: the kind of sender, audio or video. - - id: the ID of the sender. - - uuid: the UUID of the sender. It's used to identify the sender on the client side. - - metadata: It can be used to store some information about the client's track, such as label name, device .... It's optional and should be smaller than 512 characters. - - state: the state of the sender. 
It's used to restore the sender state when the client reconnects to the server. It contains: - - screen: a flag to indicate whether the sender is screen sharing. - - pause: a flag to indicate whether the sender is paused. - +- tracks: list of senders and receivers, which include state of it for fast initializing - sdp: the OfferSDP that the client created. The explanation of each response parameter: @@ -177,13 +114,13 @@ The explanation of each response parameter: Error list: -| Error code | Description | -| --------------------- | ----------------------- | -| INVALID_TOKEN | The token is invalid. | -| INVALID_SDP | The sdp is invalid. | -| INVALID_REQUEST | The request is invalid. | -| INTERNAL_SERVER_ERROR | The server is error. | -| GATEWAY_ERROR | The gateway is error. | +| Code | Error | Description | +| ------ | --------------------- | ----------------------- | +| TODO | INVALID_TOKEN | The token is invalid. | +| 0x2000 | INVALID_SDP | The sdp is invalid. | +| TODO | INVALID_REQUEST | The request is invalid. | +| 0x2001 | INTERNAL_SERVER_ERROR | The server is error. | +| TODO | GATEWAY_ERROR | The gateway is error. | After that, the client needs to wait for the connected event from the WebRTC connection and the connected event from the data channel. If the client doesn't receive any event after a period of time, it will set the restart ice flag and retry connecting to the server with the newest offer SDP. @@ -219,53 +156,60 @@ Each time the client's WebRTC connection has a new ice-candidate, it should be s Error list: -| Error code | Description | -| --------------------- | ----------------------- | -| INVALID_CONN | The conn_id is invalid. | -| INVALID_ICE | The ice is invalid. | -| INVALID_REQUEST | The request is invalid. | -| INTERNAL_SERVER_ERROR | The server is error. | -| GATEWAY_ERROR | The gateway is error. | +| Code | Error | Description | +| ---- | --------------------- | ----------------------- | +| | INVALID_CONN | The conn_id is invalid. 
| +| | INVALID_ICE | The ice is invalid. | +| | INVALID_REQUEST | The request is invalid. | +| | INTERNAL_SERVER_ERROR | The server is error. | +| | GATEWAY_ERROR | The gateway is error. | ## 4.5 Datachannel Request/Response format -The format for encoding all requests and responses sent over the data channel is JSON. The structure of the request and response objects is as follows: +The format for encoding all requests and responses sent over the data channel is Protobuf. The structure of the request and response objects is as follows: -Request/Event: +```proto +message Request { + uint32 req_id = 1; + oneof request { + Session session = 2; + Sender sender = 3; + Receiver receiver = 4; + } +} -``` -{ - type: "event" | "request", - seq: Number, - cmd: String, - data: Option, +message ClientEvent { + uint32 seq = 1; + oneof event { + Request request = 2; + } } ``` -Response: +```proto +message Response { + uint32 req_id = 1; + oneof response { + shared.Error error = 2; + Session session = 3; + Sender sender = 4; + Receiver receiver = 5; + } +} -``` -{ - type: "answer", - seq: Number, - success: bool, - error_code: Option, - error_msg: Option, - data: Option, +message ServerEvent { + uint32 seq = 1; + oneof event { + Session session = 2; + Room room = 3; + Sender sender = 4; + Receiver receiver = 5; + Response response = 6; + } } ``` -The seq is an incremental value generated on the sender side. It helps us to map between requests and responses and also detect data loss. - -The cmd is generate with rule: `identify.action`, for example: - -- `session.update_sdp`. -- `session.senders.toggle`. -- `sessions.receivers.switch`. -- `room.peers.subscribe`. -- `room.peers.unsubscribe`. -- `session.disconnect`. -- `session.features.mix_minus.sources.add`. +The seq is an incremental value generated on the sender side. It helps us to detect data loss. 
The req_id is used to map between requests and responses. ## 4.6 In-session requests @@ -280,293 +224,309 @@ Typically, the client will need to perform various actions with the media server All actions that involve changing tracks will be performed locally first, and then the `update_sdp` command will be sent to the server. -### 4.6.1 Update SDP +### 4.6.1 Session requests, events -Each time we make changes to the WebRTC connection or negotiationneeded event fired, we need to send an `update_sdp` request to the server over the data channel. This request is described below: +``` +message Request { + message Session { + message RoomJoin { + shared.RoomJoin info = 1; + string token = 2; + } -**_Cmd:_**: `session.update_sdp` + message RoomLeave { -**_Data:_** + } -``` -{ - sdp: String, - tracks: { - receivers: [ - { - kind: "audio" | "video", - id: String, - state: Option<{ - source: Option<{ - peer_id: String, - track_id: String, - }>, - limit: Option<{ - priority: u16, - min_spatial: Option, - max_spatial: u8, - min_temporal: Option, - max_temporal: u8, - }>, - }> - } - ], - senders: [ - { - kind: "audio" | "video", - id: String, - source: Option<{ - id: String, - screen: bool, - }>, - metadata: Option, - state: Option<{ - active: bool, - }>, - } - ], - } -} -``` + message UpdateSdp { + shared.Tracks tracks = 1; + string sdp = 2; + } -**_Response data_**: + message Disconnect { -``` -{ - sdp: String + } + + oneof request { + RoomJoin join = 1; + RoomLeave leave = 2; + UpdateSdp sdp = 3; + Disconnect disconnect = 4; + } + } } ``` -If state is defined, the server will update the state of the receiver/sender. If state is none, the server will not update the state of the receiver/sender. +``` +message Response { + message Session { + message RoomJoin { -### 4.6.2 Room actions, event + } -We can subscribe to peers event (joined, left, track added, track removed) and also can unsubscribe from it. 
+ message RoomLeave { -#### 4.6.2.1 Action: Subscribe to other peers event + } -(Note that this action only works with `event.subscribe` manual mode.) + message UpdateSdp { + string sdp = 1; + } -**_Cmd:_** `room.peers.subscribe` + message Disconnect { -**_Request Data:_** + } -``` -{ - peer_ids: [String], + oneof response { + RoomJoin join = 1; + RoomLeave leave = 2; + UpdateSdp sdp = 3; + Disconnect disconnect = 4; + } + } } ``` -**_Response Data:_** None - -#### 4.6.2.2 Action: Unsubscribe to other peers event - -(Note that this action only works with `subscribe` manual mode.) - -**_Cmd:_**: `room.peers.unsubscribe` - -**_Request Data:_** - ``` -{ - peer_ids: [String], -} +message ServerEvent { + message Session { + message Connected { -``` + } -#### 4.6.2.3 Event: Peer joined + message JoinedRoom { + string room = 1; + string peer = 2; + } -**_Cmd:_**: `room.peers.added` + message LeavedRoom { + string room = 1; + string peer = 2; + } -**_Event data:_**: + message Disconnected { + string reason = 1; + } -``` -{ - peer_id: String, - metadata: Option, + oneof event { + Connected connected = 1; + JoinedRoom joined = 2; + LeavedRoom leaved = 3; + Disconnected disconnected = 4; + } + } } ``` -**_Response Data:_**: None +#### 4.6.1.1 Update SDP action -#### 4.6.2.3 Event: Peer left +Each time we make changes to the WebRTC connection or negotiationneeded event fired, we need to send an `update_sdp` request to the server over the data channel. The update sdp request must to include tracks state. -**_Cmd:_**: `room.peers.removed` +#### 4.6.1.2 Room join/leave action -**_Event data:_**: +Client can dynamic join to other room with Join/Leave request. This is useful when we don't have information about room at connecting state, just connect and join after have. This feature also useful will video conference application, where user can join to other child rooms without recreate connection. 
-``` -{ - peer_id: String, -} -``` - -**_Response Data:_**: None +When user successful joined or leaved a room, server also send event JoinedRoom or LeavedRoom to client. -#### 4.6.2.3 Event: Track added +#### 4.6.1.3 GoAway event -**_Cmd:_**: `room.tracks.added` +Goaway event is sent by the server in some cases: -**_Event data:_**: +- The server is going to shutdown or restart. +- The client lifetime is expired, this is useful in some video conference application where each client only has limited session time; example 1 hour. +- The client is kicked by the server. -``` -{ - kind: "audio" | "video", - peer_id: String, - track_id: String, - source: Option<{ - id: String, - screen: bool, - }>, - metadata: Option, - state: { - active: bool, - scaling: Option<"simulcast" | "svc">, - }, -} -``` +`remain_seconds` represents the remaining time that the client will be served by the server. In the event that a client needs to reconnect, it should do so before the remain_seconds expire to avoid interrupting the client session. -#### 4.6.2.3 Event: Track updated +In case of "shutdown", the client should reconnect by sending restart-ice request. -**_Cmd:_**: `room.peers.tracks.updated` +### 4.6.2 Room requests, events -**_Event data:_**: +We can subscribe to peers event (joined, left, track added, track removed) and also can unsubscribe from it. 
``` -{ - kind: "audio" | "video", - peer_id: String, - track_id: String, - source: Option<{ - id: String, - screen: bool, - }>, - metadata: Option, - state: { - active: bool, - scaling: Option<"simulcast" | "svc">, - }, -} -``` - -#### 4.6.2.3 Event: Track removed - -**_Cmd:_**: `room.tracks.removed` +message Request { + message Rooom { + message SubscribePeer { + string peer = 1; + } -**_Event data:_**: + message UnsubscribePeer { + string peer = 1; + } -``` -{ - kind: "audio" | "video", - peer_id: String, - track_id: String, + oneof request { + SubscribePeer subscribe; + UnsubscribePeer unsubscribe; + } + } } ``` -**_Response Data:_**: None - -### 4.6.3 Session actions, event +``` +message Response { + message Rooom { + message SubscribePeer { -#### 4.6.3.1 Action: Disconnect + } -**_Cmd:_**: `session.disconnect` + message UnsubscribePeer { -**_Request data:_**: None + } -**_Response:_**: None + oneof response { + SubscribePeer subscribe; + UnsubscribePeer unsubscribe; + } + } +} +``` -#### 4.6.3.2 Event: Goaway +``` +message ServerEvent { + message Room { + message PeerJoined { + string peer = 1; + optional string metadata = 2; + } -Goaway event is sent by the server in some cases: + message PeerUpdated { + string peer = 1; + optional string metadata = 2; + } -- The server is going to shutdown or restart. -- The client lifetime is expired, this is useful in some video conference application where each client only has limited session time; example 1 hour. -- The client is kicked by the server. + message PeerLeaved { + string peer = 1; + } -In case of a server shutdown or restart, the client should reconnect by sending restart-ice request. 
+ message TrackStarted { + string peer = 1; + string track = 2; + shared.Kind kind = 3; + optional string metadata = 4; + } -**_Cmd:_**: `session.on_goaway` + message TrackUpdated { + string peer = 1; + string track = 2; + shared.Kind kind = 3; + optional string metadata = 4; + } -**_Event data:_**: + message TrackStopped { + string peer = 1; + string track = 2; + shared.Kind kind = 3; + } -``` -{ - reason: "shutdown" | "kick", - message: Option, - remain_seconds: Number, + oneof event { + PeerJoined peer_joined = 1; + PeerUpdated peer_updated = 2; + PeerLeaved peer_leaved = 3; + TrackStarted track_started = 4; + TrackUpdated track_updated = 5; + TrackStopped track_stopped = 6; + } + } } ``` -**_Response Data:_**: None +#### 4.6.2.1 Action: Subscribe/Unsubscribe to other peers event -`remain_seconds` represents the remaining time that the client will be served by the server. In the event that a client needs to reconnect, it should do so before the remain_seconds expire to avoid interrupting the client session. +(Note that this action only works with `subscribe.tracks` is false.) -In case of "shutdown", the client should reconnect by sending restart-ice request. +This feature for allowing client free to select what peer it interested in, it is useful in spatial-chat application like `Gather.town`. By subscribe, server will send peer's tracks event like: started, stopped or updated. -### 4.6.4 Session Sender create/release, actions, events +#### 4.6.2.3 Event: Peer joined / updated / leaved -For creating a sender, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. +If client connect with `subscribe.peers` is true, server will send event about joined and leaved event of each peers which joined with `publish.peer` is true. 
-For destroying a sender, we need to remove the track from the transceiver and remove the transceiver from the connection. Then we need to send an `update_sdp` request to the server. +#### 4.6.2.3 Event: Track started / updated / stopped -Each sender has some actions and events with the following rule: `session.sender.{id}.{action}` +If client connect with `subscribe.tracks` is true or client subscribed to the peer, server will send event about peer's track started or stopped event of each peers which joined with `publish.tracks` is true. -#### 4.6.4.1 Action: Switch sender source +### 4.6.3 Sender, actions, events -This action is used when the user changes the source, for example, when the user changes the camera or microphone. This can also be used when the user stops sharing the camera, in which case we will release the local stream and send a switch without the source param. +For creating a sender, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. -**_Cmd:_**: `session.senders.switch` +``` +message Request { + message Sender { + message Attach { + shared.Sender.Source source = 1; + shared.Sender.Config config = 2; + } -**_Request data:_** + message Detach { -``` -{ - id: String, - source: Option<{ - id: String, - screen: bool, - }>, - metadata: Option, + } + + string name = 1; + oneof request { + Attach attach = 2; + Detach detach = 3; + shared.Sender.Config config = 4; + } + } } ``` -**_Response Data:_**: None +``` +message Response { + message Sender { + message Attach { -If source is none, this sender will be removed from the room, and the receiver that is pinned to this sender will receive an updated event with the source not set. The room also sends a `room.senders.removed` event to all subscribed peers. + } -If the source is set and changed, this sender's active state will be reset to true. 
Note that screen and metadata only work with a changed source value. In the case of the source not being changed, the server will refuse the request. + message Detach { -#### 4.6.4.1 Action: Toggle sender pause/resume + } -This action is used when user mute/unmute the sender, this is useful when toggle the microphone button, we just stop local source and sending toggle with active false param. + message Config { -**_Cmd:_**: `session.senders.toggle` + } -**_Request data:_** + oneof response { + Attach attach = 1; + Detach detach = 2; + Config config = 3; + } + } +} +``` ``` -{ - id: String, - active: bool, +message ServerEvent { + message Sender { + message State { + enum StateType { + WAITING = 0; + NO_SOURCE = 1; + ACTIVE = 2; + INACTIVE = 3; + } + + StateType state = 1; + } + + string name = 1; + oneof event { + State state = 2; + } + } } ``` -**_Response Data:_**: None +#### 4.6.3.1 Attach or detach sender source action -#### 4.6.4.2 Event: State event +This action is used when the user changes the source, for example, when the user changes the camera or microphone. This can also be used when the user stops sharing the camera, in which case we will release the local stream and send a switch without the source param. -This event is sent by the server when the state of the sender is changed. This is useful when the client implements loading animation when the user changes the source, or when the user mutes/unmutes the sender. +If source is none, this sender will be removed from the room, and the receiver that is pinned to this sender will receive an updated event with the source not set. The room also sends a TrackStopped event to all subscribed peers. -**_Cmd:_**: `session.senders.state` +#### 4.6.3.2 State updated event -**_Event data:_**: +This event is sent by the server when the state of the sender is changed. This is useful when the client implements loading animation when the user changes the source, or when the user mutes/unmutes the sender. 
-``` -{ - id: String, - state: "waiting" | "no-source" | "active" | "inactive" -} -``` +Sender states is explained below: - Waiting: The sender is pinned but server dont received any media data. - No-source: The sender is not pinned to any source. @@ -592,72 +552,108 @@ graph LR I -->|switch none| NS ``` -### 4.6.5 Session Receiver create/release, actions +### 4.6.4 Receiver create/release, actions To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. -Each receiver has some actions and events with the following rule: `session.receiver.{id}.{action}` - -### 4.6.5.1 Action: Switch receiver source +``` +message Request { + message Receiver { + message Attach { + shared.Receiver.Source source = 1; + shared.Receiver.Config config = 2; + } -**_Cmd:_**: `session.receivers.switch` + message Detach { -**_Request data:_** + } -``` -{ - id: string, - priority: u16, - source: Option<{ - peer_id: String, - track_id: String, - }>, + string name = 1; + oneof request { + Attach attach = 2; + Detach detach = 3; + shared.Receiver.Config config = 4; + } + } } ``` -**_Response data:_**: None +``` +message Response { + message Receiver { + message Attach { -If source is none, the receiver will be paused. 
+ } + + message Detach { -### 4.6.5.2 Action: Limit receiver bitrate + } -**_Cmd:_**: `session.receiver.limit` + message Config { -**_Request data:_** + } -``` -{ - id: String, - priority: u16, - min_spatial: Option, - max_spatial: u8, - min_temporal: Option, - max_temporal: u8, + oneof response { + Attach attach = 1; + Detach detach = 2; + Config config = 3; + } + } } ``` -**_Response data:_**: None +``` +message ServerEvent { + message Receiver { + message State { + enum StateType { + NO_SOURCE = 0; + WAITING = 1; + LIVE = 2; + INACTIVE = 3; + } -### 4.6.5.3 Event: Receiver state + StateType state = 1; + } -**_Event:_**: `session.receivers.state` + message Stats { + message Source { + uint32 bitrate_kbps = 1; + float rtt = 2; + float lost = 3; + float jitter = 4; + } -**_Event data:_**: + message Transmit { + uint32 spatial = 1; + uint32 temporal = 2; + uint32 bitrate_kbps = 3; + } -``` -{ - id: String, - state: "no_source" | "waiting" | "live" | "key_only" | "inactive", - source: Option<{ - scaling: Option<"simulcast" | "svc">, - spatials: Number, - temporals: Number, - codec: "opus" | "vp8" | "vp9" | "h264" | "h265" | "av1", - }>, + optional Source source = 1; + optional Transmit transmit = 2; + } + + string name = 1; + oneof event { + State state = 2; + Stats stats = 3; + } + } } ``` -Receiver state is explained below: +### 4.6.4.1 Attach or detach receiver source action + +If source is none, the receiver will be paused. + +### 4.6.4.2 Config action + +We can provide new config when UI changed for updating priority and limit of quality for better bitrate usage. + +### 4.6.5.3 State updated event + +Receiver states is explained below: - `no_source`: The receiver is created but not pinned to any source. - `waiting`: The receiver is pinned but does not received media data. 
@@ -692,28 +688,9 @@ graph LR I --> |Source lost| NS ``` -### 4.6.5.3 Event: Receiver stats +### 4.6.5.3 Receiver stats event -**_Event:_**: `session.receivers.stats` - -**_Event data:_**: - -``` -{ - id: String, - source: Option<{ - bitrate: Number, - rtt: Number, - lost: Number, - jitter: Number, - }>, - transmit: Option<{ - spatial: Number, - temporal: Number, - bitrate: Number, - }>, -} -``` +Stats information can be used for show current issues for viewer, which can provide more useful about source of the issues. ## 4.7 Features @@ -724,103 +701,91 @@ The mix-minus feature has two modes: - Manual: In this mode, the client can manually add or remove sources to the mixer. - Auto: In this mode, the media server will automatically add or remove all audio sources except the local source to the mixer. -#### 4.7.1.1 Connect request +``` +message Request { + message Attach { + repeated Source sources = 1; + } -In connect request, we add field to features params: + message Detach { + repeated Source sources = 1; + } -``` -{ - features: { - mix_minus: { - mode: "manual" | "auto", - sources: [ - { - peer_id: String, - track_id: String, - } - ] - } + oneof request { + Attach attach = 1; + Detach detach = 2; } } ``` -#### 4.7.1.2 Action: Add source to mixer +``` +message Response { + message Attach { -Note that, this action only work with `manual` mode. + } -**_Cmd:_**: `session.features.mix_minus.sources.add` + message Detach { -**_Request data:_** + } -``` -{ - sources: [ - { - peer_id: String, - track_id: String, - } - ] + oneof response { + Attach attach = 1; + Detach detach = 2; + } } ``` -**_Response data:_**: None - -#### 4.7.1.3 Action: Remove source from mixer +``` +message ServerEvent { + message MappingSlotSet { + uint32 slot = 1; + Source source = 2; + } -Note that, this action only work with `manual` mode. 
+ message MappingSlotDel { + uint32 slot = 1; + } -**_Cmd:_**: `session.features.mix_minus.sources.remove` + message SlotAudioLevel { + uint32 slot = 1; + int32 audio_level = 2; + } -**_Request data:_** + message MappingSlotsAudioLevel { + repeated SlotAudioLevel slots = 1; + } -``` -{ - sources: [ - { - peer_id: String, - track_id: String, - } - ] + oneof event { + MappingSlotSet slot_set = 1; + MappingSlotDel slot_del = 2; + MappingSlotsAudioLevel slots_audio_level = 3; + } } ``` -**_Response data:_**: None - -#### 4.7.1.4 Action: Pause mix-minus mixer - -**_Cmd:_**: `session.features.mix_minus.pause` - -**_Request data:_** None - -**_Response data:_**: None - -#### 4.7.1.5 Action: Resume mix-minus mixer +#### 4.7.1.1 Connect request -**_Cmd:_**: `session.features.mix_minus.resume` +In connect request, we add field to features params: -**_Request data:_**: None +``` +message Source { + string peer = 1; + string track = 2; +} -**_Response data:_**: None +message Config { + Mode mode = 1; + repeated Source sources = 2; +} +``` -#### 4.7.1.6 Event: State update +#### 4.7.1.2 Action: Add/remove source to mixer -**_Cmd:_**: `session.features.mix_minus.state` +Note that, this action only work with `manual` mode. -**_Event data:_** +#### 4.7.1.3 State update event -``` -{ - slots: [ - { - source: Option<{ - peer_id: String, - track_id: String, - audio_level: Number, - }>, - } - ] -} -``` +We have 2 types of event, slot bind changed and slot audio level. # 5. Drawbacks @@ -835,7 +800,7 @@ No drawbacks. We have some alternatives: - Whip/Whep: but it not flexible and cannot be used to create complex media stream topology. -- Livekit protocol: the protocol don't have document and it's is designed for Livekit server topology. +- Livekit protocol: the protocol is designed for Livekit server topology. # 7. 
Unresolved questions diff --git a/text/0005-media-webrtc-sdk/conn.proto b/text/0005-media-webrtc-sdk/conn.proto new file mode 100644 index 0000000..6959b07 --- /dev/null +++ b/text/0005-media-webrtc-sdk/conn.proto @@ -0,0 +1,340 @@ +syntax = "proto3"; + +import "shared.proto"; +import "features.proto"; + +package conn; + +message Request { + message Session { + message RoomJoin { + shared.RoomJoin info = 1; + string token = 2; + } + + message RoomLeave { + + } + + message UpdateSdp { + shared.Tracks tracks = 1; + string sdp = 2; + } + + message Disconnect { + + } + + oneof request { + RoomJoin join = 1; + RoomLeave leave = 2; + UpdateSdp sdp = 3; + Disconnect disconnect = 4; + } + } + + message Rooom { + message SubscribePeer { + string peer = 1; + } + + message UnsubscribePeer { + string peer = 1; + } + + oneof request { + SubscribePeer subscribe = 1; + UnsubscribePeer unsubscribe = 2; + } + } + + message Sender { + message Attach { + shared.Sender.Source source = 1; + shared.Sender.Config config = 2; + } + + message Detach { + + } + + string name = 1; + oneof request { + Attach attach = 2; + Detach detach = 3; + shared.Sender.Config config = 4; + } + } + + message Receiver { + message Attach { + shared.Receiver.Source source = 1; + shared.Receiver.Config config = 2; + } + + message Detach { + + } + + string name = 1; + oneof request { + Attach attach = 2; + Detach detach = 3; + shared.Receiver.Config config = 4; + } + } + + uint32 req_id = 1; + oneof request { + Session session = 2; + Rooom room = 3; + Sender sender = 4; + Receiver receiver = 5; + features.Request features = 6; + } +} + +message Response { + message Session { + message RoomJoin { + + } + + message RoomLeave { + + } + + message UpdateSdp { + string sdp = 1; + } + + message Disconnect { + + } + + oneof response { + RoomJoin join = 1; + RoomLeave leave = 2; + UpdateSdp sdp = 3; + Disconnect disconnect = 4; + } + } + + message Room { + message SubscribePeer { + + } + + message UnsubscribePeer { + + } 
+ + oneof response { + SubscribePeer subscribe = 1; + UnsubscribePeer unsubscribe = 2; + } + } + + message Sender { + message Attach { + + } + + message Detach { + + } + + message Config { + + } + + oneof response { + Attach attach = 1; + Detach detach = 2; + Config config = 3; + } + } + + message Receiver { + message Attach { + + } + + message Detach { + + } + + message Config { + + } + + oneof response { + Attach attach = 1; + Detach detach = 2; + Config config = 3; + } + } + + uint32 req_id = 1; + oneof response { + shared.Error error = 2; + Session session = 3; + Room room = 4; + Sender sender = 5; + Receiver receiver = 6; + features.Request features = 7; + } +} + +message ServerEvent { + message Session { + message Connected { + + } + + message JoinedRoom { + string room = 1; + string peer = 2; + } + + message LeavedRoom { + string room = 1; + string peer = 2; + } + + message Disconnected { + string reason = 1; + } + + message GoAway { + string reason = 1; + uint32 remain_seconds = 2; + } + + oneof event { + Connected connected = 1; + JoinedRoom joined = 2; + LeavedRoom leaved = 3; + Disconnected disconnected = 4; + GoAway goway = 5; + } + } + + message Room { + message PeerJoined { + string peer = 1; + optional string metadata = 2; + } + + message PeerUpdated { + string peer = 1; + optional string metadata = 2; + } + + message PeerLeaved { + string peer = 1; + } + + message TrackStarted { + string peer = 1; + string track = 2; + shared.Kind kind = 3; + optional string metadata = 4; + } + + message TrackUpdated { + string peer = 1; + string track = 2; + shared.Kind kind = 3; + optional string metadata = 4; + } + + message TrackStopped { + string peer = 1; + string track = 2; + shared.Kind kind = 3; + } + + oneof event { + PeerJoined peer_joined = 1; + PeerUpdated peer_updated = 2; + PeerLeaved peer_leaved = 3; + TrackStarted track_started = 4; + TrackUpdated track_updated = 5; + TrackStopped track_stopped = 6; + } + } + + message Sender { + message State { + 
enum StateType { + WAITING = 0; + NO_SOURCE = 1; + ACTIVE = 2; + INACTIVE = 3; + } + + StateType state = 1; + } + + string name = 1; + oneof event { + State state = 2; + } + } + + message Receiver { + message State { + enum StateType { + NO_SOURCE = 0; + WAITING = 1; + LIVE = 2; + KEY_ONLY = 3; + INACTIVE = 4; + } + + StateType state = 1; + } + + message Stats { + message Source { + uint32 bitrate_kbps = 1; + float rtt = 2; + float lost = 3; + float jitter = 4; + } + + message Transmit { + uint32 spatial = 1; + uint32 temporal = 2; + uint32 bitrate_kbps = 3; + } + + optional Source source = 1; + optional Transmit transmit = 2; + } + + string name = 1; + oneof event { + State state = 2; + Stats stats = 3; + } + } + + uint32 seq = 1; + oneof event { + Session session = 2; + Room room = 3; + Sender sender = 4; + Receiver receiver = 5; + Response response = 6; + features.ServerEvent features = 7; + } +} + +message ClientEvent { + uint32 seq = 1; + oneof event { + Request request = 2; + } +} diff --git a/text/0005-media-webrtc-sdk/features.proto b/text/0005-media-webrtc-sdk/features.proto new file mode 100644 index 0000000..d050461 --- /dev/null +++ b/text/0005-media-webrtc-sdk/features.proto @@ -0,0 +1,27 @@ +syntax = "proto3"; + +import "features_mix_minus.proto"; + +package features; + +message Config { + optional mix_minus.Config mix_minus = 1; +} + +message Request { + oneof request { + mix_minus.Request mix_minus = 1; + } +} + +message Response { + oneof response { + mix_minus.Response mix_minus = 1; + } +} + +message ServerEvent { + oneof event { + mix_minus.ServerEvent mix_minus = 1; + } +} diff --git a/text/0005-media-webrtc-sdk/features_mix_minus.proto b/text/0005-media-webrtc-sdk/features_mix_minus.proto new file mode 100644 index 0000000..c447e68 --- /dev/null +++ b/text/0005-media-webrtc-sdk/features_mix_minus.proto @@ -0,0 +1,74 @@ +syntax = "proto3"; + +package mix_minus; + +enum Mode { + AUTO = 0; + MANUAL = 1; +} + +message Source { + string peer = 1; + 
string track = 2; +} + +message Config { + Mode mode = 1; + repeated Source sources = 2; +} + +message Request { + message Attach { + repeated Source sources = 1; + } + + message Detach { + repeated Source sources = 1; + } + + oneof request { + Attach attach = 1; + Detach detach = 2; + } +} + +message Response { + message Attach { + + } + + message Detach { + + } + + oneof response { + Attach attach = 1; + Detach detach = 2; + } +} + +message ServerEvent { + message MappingSlotSet { + uint32 slot = 1; + Source source = 2; + } + + message MappingSlotDel { + uint32 slot = 1; + } + + message SlotAudioLevel { + uint32 slot = 1; + int32 audio_level = 2; + } + + message MappingSlotsAudioLevel { + repeated SlotAudioLevel slots = 1; + } + + oneof event { + MappingSlotSet slot_set = 1; + MappingSlotDel slot_del = 2; + MappingSlotsAudioLevel slots_audio_level = 3; + } +} diff --git a/text/0005-media-webrtc-sdk/gateway.proto b/text/0005-media-webrtc-sdk/gateway.proto new file mode 100644 index 0000000..62c0b9d --- /dev/null +++ b/text/0005-media-webrtc-sdk/gateway.proto @@ -0,0 +1,28 @@ +syntax = "proto3"; + +import "shared.proto"; +import "features.proto"; + +package gateway; + +message ConnectRequest { + string version = 2; + optional shared.RoomJoin join = 3; + features.Config features = 4; + shared.Tracks tracks = 5; + string sdp = 6; +} + +message ConnectResponse { + string conn_id = 1; + string sdp = 2; +} + +message RemoteIceRequest { + string conn_id = 1; + string candidate = 2; +} + +message RemoteIceResponse { + +} diff --git a/text/0005-media-webrtc-sdk/shared.proto b/text/0005-media-webrtc-sdk/shared.proto new file mode 100644 index 0000000..68f48ce --- /dev/null +++ b/text/0005-media-webrtc-sdk/shared.proto @@ -0,0 +1,88 @@ +syntax = "proto3"; + +package shared; + +enum Kind { + AUDIO = 0; + VIDEO = 1; +} + +message Receiver { + message Source { + string peer = 1; + string track = 2; + } + + message Config { + uint32 priority = 1; + uint32 max_spatial = 2; + 
uint32 max_temporal = 3; + optional uint32 min_spatial = 4; + optional uint32 min_temporal = 5; + } + + message State { + Config config = 1; + optional Source source = 2; + } + + Kind kind = 1; + string name = 2; + State state = 3; +} + +message Sender { + message Source { + string id = 1; + bool screen = 2; + optional string metadata = 3; + } + + message Config { + uint32 priority = 1; + BitrateControlMode bitrate = 2; + } + + message State { + Config config = 1; + optional Source source = 2; + } + + Kind kind = 1; + string name = 2; + State state = 3; +} + + +message Tracks { + repeated Receiver receivers = 1; + repeated Sender senders = 2; +} + +message RoomInfoPublish { + bool peer = 1; + bool tracks = 2; +} + +message RoomInfoSubscribe { + bool peers = 1; + bool tracks = 2; +} + +message RoomJoin { + string room = 1; + string peer = 2; + RoomInfoPublish publish = 3; + RoomInfoSubscribe subscribe = 4; + optional string metadata = 5; +} + +enum BitrateControlMode { + DYNAMIC_CONSUMERS = 0; + MAX_BITRATE = 1; +} + +message Error { + uint32 code = 1; + string message = 2; +} From 846c90859a9b2593f0a198a89f4a2bf0f5d86615 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Thu, 16 May 2024 10:30:22 +0700 Subject: [PATCH 14/15] update with remote ice --- text/0005-media-webrtc-sdk.md | 59 ++++++++++++------------ text/0005-media-webrtc-sdk/gateway.proto | 6 +-- 2 files changed, 33 insertions(+), 32 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index f9e21ae..25dd227 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -82,18 +82,13 @@ message ConnectRequest { } ``` -In there: - -```protobuf - -``` - **_Response Data:_** -``` +```proto message ConnectResponse { string conn_id = 1; string sdp = 2; + bool ice_lite = 3; } ``` @@ -106,6 +101,7 @@ The explanation of each request parameter: - features: a JSON object containing some features that the client wants to use. 
For example: mix-minus, spatial room, etc. - tracks: list of senders and receivers, which include state of it for fast initializing - sdp: the OfferSDP that the client created. +- ice_lite: if true, the server is running without ice-trickle, so the client does not need to send ice candidates to the server. The explanation of each response parameter: @@ -144,22 +140,27 @@ Each time the client's WebRTC connection has a new ice-candidate, it should be s **_Endpoint_**: POST `GATEWAY/webrtc/conns/:conn_id/ice-candidate` -**_Body_**: +Request: -``` -{ - candidate: String +```proto +message RemoteIceRequest { + repeated string candidates = 1; } ``` -**_Response Data_**: None +Response: + +```proto +message RemoteIceResponse { + uint32 added = 1; +} +``` Error list: | Code | Error | Description | | ---- | --------------------- | ----------------------- | | | INVALID_CONN | The conn_id is invalid. | -| | INVALID_ICE | The ice is invalid. | | | INVALID_REQUEST | The request is invalid. | | | INTERNAL_SERVER_ERROR | The server is error. | | | GATEWAY_ERROR | The gateway is error. | @@ -226,7 +227,7 @@ All actions that involve changing tracks will be performed locally first, and th ### 4.6.1 Session requests, events -``` +```proto message Request { message Session { message RoomJoin { @@ -257,7 +258,7 @@ message Request { } ``` -``` +```proto message Response { message Session { message RoomJoin { @@ -286,7 +287,7 @@ message Response { } ``` -``` +```proto message ServerEvent { message Session { message Connected { @@ -343,7 +344,7 @@ In case of "shutdown", the client should reconnect by sending restart-ice reques We can subscribe to peers event (joined, left, track added, track removed) and also can unsubscribe from it. 
-``` +```proto message Request { message Rooom { message SubscribePeer { @@ -362,7 +363,7 @@ message Request { } ``` -``` +```proto message Response { message Rooom { message SubscribePeer { @@ -381,7 +382,7 @@ message Response { } ``` -``` +```proto message ServerEvent { message Room { message PeerJoined { @@ -448,7 +449,7 @@ If client connect with `subscribe.tracks` is true or client subscribed to the pe For creating a sender, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. -``` +```proto message Request { message Sender { message Attach { @@ -470,7 +471,7 @@ message Request { } ``` -``` +```proto message Response { message Sender { message Attach { @@ -494,7 +495,7 @@ message Response { } ``` -``` +```proto message ServerEvent { message Sender { message State { @@ -556,7 +557,7 @@ graph LR To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. -``` +```proto message Request { message Receiver { message Attach { @@ -578,7 +579,7 @@ message Request { } ``` -``` +```proto message Response { message Receiver { message Attach { @@ -602,7 +603,7 @@ message Response { } ``` -``` +```proto message ServerEvent { message Receiver { message State { @@ -701,7 +702,7 @@ The mix-minus feature has two modes: - Manual: In this mode, the client can manually add or remove sources to the mixer. - Auto: In this mode, the media server will automatically add or remove all audio sources except the local source to the mixer. 
-``` +```proto message Request { message Attach { repeated Source sources = 1; @@ -718,7 +719,7 @@ message Request { } ``` -``` +```proto message Response { message Attach { @@ -735,7 +736,7 @@ message Response { } ``` -``` +```proto message ServerEvent { message MappingSlotSet { uint32 slot = 1; @@ -767,7 +768,7 @@ message ServerEvent { In connect request, we add field to features params: -``` +```proto message Source { string peer = 1; string track = 2; diff --git a/text/0005-media-webrtc-sdk/gateway.proto b/text/0005-media-webrtc-sdk/gateway.proto index 62c0b9d..8efb92d 100644 --- a/text/0005-media-webrtc-sdk/gateway.proto +++ b/text/0005-media-webrtc-sdk/gateway.proto @@ -16,13 +16,13 @@ message ConnectRequest { message ConnectResponse { string conn_id = 1; string sdp = 2; + bool ice_lite = 3; } message RemoteIceRequest { - string conn_id = 1; - string candidate = 2; + repeated string candidates = 1; } message RemoteIceResponse { - + uint32 added = 1; } From f15739b1e162530af47216ab081f6afd40168069 Mon Sep 17 00:00:00 2001 From: Giang Minh Date: Tue, 21 May 2024 09:28:46 +0700 Subject: [PATCH 15/15] reduce complexity of sender and receiver event --- text/0005-media-webrtc-sdk.md | 110 ++++-------------------- text/0005-media-webrtc-sdk/conn.proto | 19 +--- text/0005-media-webrtc-sdk/shared.proto | 11 +++ 3 files changed, 29 insertions(+), 111 deletions(-) diff --git a/text/0005-media-webrtc-sdk.md b/text/0005-media-webrtc-sdk.md index 25dd227..c7ba07d 100644 --- a/text/0005-media-webrtc-sdk.md +++ b/text/0005-media-webrtc-sdk.md @@ -496,17 +496,19 @@ message Response { ``` ```proto +module shared { + message Sender { + enum Status { + ACTIVE = 0; + INACTIVE = 1; + } + } +} + message ServerEvent { message Sender { message State { - enum StateType { - WAITING = 0; - NO_SOURCE = 1; - ACTIVE = 2; - INACTIVE = 3; - } - - StateType state = 1; + shared.Sender.Status status = 1; } string name = 1; @@ -529,29 +531,8 @@ This event is sent by the server when the state of 
the sender is changed. This i Sender states is explained below: -- Waiting: The sender is pinned but server dont received any media data. -- No-source: The sender is not pinned to any source. - Active: The sender is active, and server is receiving media data. -- Inactive: The sender is pinned but . - -```mermaid -graph LR - P1[create with source] - P2[create without source] - W[waiting] - NS[no-source] - A[active] - I[inactive] - P1 --> W - P2 --> NS - W -->|media data| A - A -->|toggle false| I - I -->|toggle true| W - NS -->|switch| W - W -->|switch none| NS - A -->|switch none| NS - I -->|switch none| NS -``` +- Inactive: The server has not received media data for a set period, currently 2 seconds. ### 4.6.4 Receiver create/release, actions @@ -556,7 +557,7 @@ graph LR To create a receiver, we need to create a transceiver with kind as audio or video. After that, we need to create a track and add it to the transceiver. Then we need to send an `update_sdp` request to the server. -``` +```proto message Request { message Receiver { message Attach { @@ -578,7 +579,7 @@ message Request { } ``` -``` +```proto message Response { message Receiver { message Attach { @@ -602,7 +603,7 @@ message Response { } ``` -``` +```proto message ServerEvent { message Receiver { message State { @@ -607,38 +588,12 @@ message Response { message ServerEvent { message Receiver { message State { - enum StateType { - NO_SOURCE = 0; - WAITING = 1; - LIVE = 2; - INACTIVE = 3; - } - - StateType state = 1; - } - - message Stats { - message Source { - uint32 bitrate_kbps = 1; - float rtt = 2; - float lost = 3; - float jitter = 4; - } - - message Transmit { - uint32 spatial = 1; - uint32 temporal = 2; - uint32 bitrate_kbps = 3; - } - - optional Source source = 1; - optional Transmit transmit = 2; + shared.Receiver.Status status = 1; } string name = 1; oneof event { State state = 2; - Stats stats = 3; } } } @@ -654,44 +609,11 @@ We can provide new config when UI changed for updating priority and limit of qua ### 4.6.5.3 State updated event -Receiver states is explained below: - -- `no_source`: The receiver is created but not pinned to any source. -- `waiting`: The receiver is pinned but does not received media data. -- `live`: The receiver is live. -- `key_only`: The receiver is live but only receives key frames, which may be for speed limiting purposes. -- `inactive`: The receiver is pinned but does not have enough bandwidth to receive. 
- +-```mermaid -graph LR - NS[no-source] - W[waiting] - L[live] - KO[key-only] - I[inactive] - NS -->|switch| W - W -->|media data + bandwidth level 2| L - W -->|media data + bandwidth level 1| KO - L -->|speed limit| KO - W -->|media data + no bandwidth| I - L -->|no bandwidth| I - KO -->|no bandwidth| I - W -->|switch none| NS - L -->|switch none| NS - I -->|switch none| NS - KO -->|switch none| NS - I --> |bandwidth level 2| L - I --> |bandwidth level 1| KO - KO --> |bandwidth level 2| L - W --> |Source lost| NS - L --> |Source lost| NS - KO --> |Source lost| NS - I --> |Source lost| NS -``` - -### 4.6.5.3 Receiver stats event +Receiver statuses are explained below: -Stats information can be used for show current issues for viewer, which can provide more useful about source of the issues. +- `waiting`: The receiver is pinned and waiting for media data. +- `active`: The receiver is live. +- `inactive`: The receiver has not received media data for a set period, currently 2 seconds. ## 4.7 Features diff --git a/text/0005-media-webrtc-sdk/conn.proto b/text/0005-media-webrtc-sdk/conn.proto index 6959b07..b49e219 100644 --- a/text/0005-media-webrtc-sdk/conn.proto +++ b/text/0005-media-webrtc-sdk/conn.proto @@ -267,14 +267,7 @@ message ServerEvent { message Sender { message State { - enum StateType { - WAITING = 0; - NO_SOURCE = 1; - ACTIVE = 2; - INACTIVE = 3; - } - - StateType state = 1; + shared.Sender.Status status = 1; } string name = 1; @@ -285,15 +278,7 @@ message ServerEvent { message Receiver { message State { - enum StateType { - NO_SOURCE = 0; - WAITING = 1; - LIVE = 2; - KEY_ONLY = 3; - INACTIVE = 4; - } - - StateType state = 1; + shared.Receiver.Status status = 1; } message Stats { diff --git a/text/0005-media-webrtc-sdk/shared.proto b/text/0005-media-webrtc-sdk/shared.proto index 68f48ce..edb70f7 100644 --- a/text/0005-media-webrtc-sdk/shared.proto +++ b/text/0005-media-webrtc-sdk/shared.proto @@ -8,6 +8,12 @@ enum Kind { } message Receiver { + enum Status { + 
WAITING = 0; + ACTIVE = 1; + INACTIVE = 2; + } + message Source { string peer = 1; string track = 2; @@ -32,6 +38,11 @@ message Receiver { } message Sender { + enum Status { + ACTIVE = 0; + INACTIVE = 1; + } + message Source { string id = 1; bool screen = 2;