From 5b3bf8d6bf1c7f125cded466bccec6722f2b4488 Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Thu, 31 Aug 2023 23:38:44 -0400 Subject: [PATCH 01/17] network: fully specifies packet format --- UIPS/UIP-0113.md | 111 +++++++++++++++++++++++++++++++---------------- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index a82d723..a2cbf78 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -55,44 +55,79 @@ A %poke message can be resolved by dereferencing the request path via arvo's `+p Packets recapitulate the structure of messages exactly, specifying their precise serialization and fragmentation, with sufficient metadata (and sized for) straightforward, maximal deliverabity over current networks. -Every packet has a maximum size of 1.472 bytes, and begins with a 4 byte header. The header must encode (in no particular order): - -- the protocol version (3 bits) -- the packet type (2 bits) -- the publisher rank (2 bits, only for requests) -- a truncated mug checksum (20 bits) -- hopcount (5-7 bits available) - - The remaining 1.468 bytes are allocated as follows: - -- %peek: { publisher[<=16], tag-byte[1], path[<=328] } -- %poke: { publisher[<=16], path[<=328], total[4], authenticator[96], fragment[<=1024] } -- %page: { next-hop[6], path[<=328], total[4], authenticator[96], fragment[<=1024] } - -These values are as follows: - -- publisher: ship from the root of the request type, variable length, based on rank in header -- %peek tag-byte: future-proofing for signed requests, possibly hash-based exclusions -- path: length-prefixed (2 bytes) request path, with 4-byte fragment number - - in the case of %poke: the request and payload paths are concatenated -- total: number of fragments in %page response (max size: 4TiB) -- authenticator: see discussion below -- fragment: bloq 13 slice of the serialized message - -outstanding questions: - -- precise order and interpretation of header bits - - hopcount semantics (actual + max; 
saturating counter?) -- authenticator details -- fixed length limits - - %page - - max path length could increase by ten - - or next-hop could increase by 12 bytes and prepare for ipv6 (offsetting path) - - %poke structure - - currently, request and payload paths are concatenated to simplify max-length calculation - - if specified as variable length - - could be two separate paths - - or simply two concatenated packets (%peek and %page) +Every packet has a 4-byte header: + +- 2 bits reserved +- 2 bits next-hop + - 0b00 - no next-hop + - 0b01 - one, 6-byte next-hop at the end + - 0b10 - one, single-byte-length-prefixed next-hop at the end + - 0b11 - multiple single-byte-length-prefixed next-hops at the end +- 3 bits for the protocol version + - 0-7 +- 2 bits for the packet type + - 0b00 - reserved (for pine, or open-ended packet-type-tag in body?) + - 0b01 - response %page + - 0b10 - request %peek + - 0b11 - request %poke +- 3 bits for saturating hopcount + - 0-6 precise, 7 is >= +- 20-bit truncated mug + - least-significant bits + +Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes (1500-byte de-facto MTU due to ethernet frame size). 
In practice, the remaining 1.468 bytes are allocated as follows: + +- %peek: { encoded-path, optional-peek-attributes } +- %page: { encoded-response, optional-page-attributes } +- %poke: { peek, page } + +- encoded-path: { meta-byte, ship, path-length, path, bloq, fragment-number } + - meta-byte: { reserved[2], rank[2], path-length-length[1], bloq-length[1], fragment-length[2] } + - reserved: 2 bits + - rank: 2 bits + - `(dec (met 0 (met 4 ship)))` + - path-length-length: 1 bit + - bloq length: 1 bit + - fragment number length: 2 bits + - `(met 3 fragment-number)` + - ship: encoded in `(bex +(rank))` bytes + - path-length: encoded in `(bex path-length-length)` bytes + - path: serialized without leading slash, path-length bytes + - bloq: + - bloq-length 0b0: implicit bloq size 13, 0 bytes used + - bloq-length 0b1: bloq size encoded in 1 byte + - fragment-number: 1-4 bytes +- optional-peek-attributes: + - undefined +- encoded-response: { meta-byte, total-bits, authentication-length, authentication, fragment-length, fragment } + - meta-byte: { total-bits-length[2], authentication-length-length[1], fragment-length[5] } + - total-bits-length: 2 bits + - `(dec (met 0 (met 3 total-bits)))` + - authentication-length-length: 1 bit + - fragment-length: 5 bits + - total-bits: message level bit-length + - encoded in `(bex total-bits-length)` bytes + - authentication-length: 0-1 bytes + - authentication: { tag[1], value[0-254] } + - tag: + - ed25519 signature + - digest (blake3) + - hmac (blake3) + - intermediate blake3 parent nodes + - combination signature + root hash + - ... + - fragment-length: 0-32 bytes + - fragment: 0-2^2^32 bytes (???) 
+- optional-page-attributes: next-hop +- next-hop: + - dependant on header bits + - 0b00: not-present + - 0b01: 6 bytes, no length tag + - 0b10: { length[1], next-hop[length] } + - 0b11: multiple length-prefixed next-hops + - XX length-prefixed values should have tag bytes + +XX: revise length limits sane relative to each other #### routing topology and mechanisms From fb3fb7ed458191b045e788c05458fcf097cff456 Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Tue, 19 Sep 2023 15:47:12 -0400 Subject: [PATCH 02/17] network: more packet format TODOs --- UIPS/UIP-0113.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index a2cbf78..afd981e 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -128,6 +128,8 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - XX length-prefixed values should have tag bytes XX: revise length limits sane relative to each other +XX: next-hop only on response +XX: total-bits only on first response #### routing topology and mechanisms From 26bbee65885db1ce365e9f5d0264a3827098500a Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Tue, 19 Sep 2023 15:47:27 -0400 Subject: [PATCH 03/17] network: link to wip --- UIPS/UIP-0113.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index afd981e..ac29191 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -322,6 +322,9 @@ These features and changes may well be developed and released in the opposite or Existing %fine packets can be routed statefully in this model, but can only be multicast with each other (unless relays can convert bedirectionally between old and new structures). Existing %ames packets will continue be routed stateless, as (mutual) requests. +## Reference Implementation + +Initial types, packet en/de-coding, and associated prototypes are on [`jb/dire`](https://github.com/urbit/urbit/compare/f58fc8b4628...jb/dire). 
## Security Considerations From 3edfe80694acab79ae9c4b1f5c4bbb9f33332db9 Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Tue, 19 Sep 2023 15:47:27 -0400 Subject: [PATCH 04/17] network: revise total fragments and fragment max-length --- UIPS/UIP-0113.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index ac29191..61d2720 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -99,14 +99,14 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - fragment-number: 1-4 bytes - optional-peek-attributes: - undefined -- encoded-response: { meta-byte, total-bits, authentication-length, authentication, fragment-length, fragment } - - meta-byte: { total-bits-length[2], authentication-length-length[1], fragment-length[5] } - - total-bits-length: 2 bits - - `(dec (met 0 (met 3 total-bits)))` +- encoded-response: { meta-byte, total-fragments, authentication-length, authentication, fragment-length, fragment } + - meta-byte: { total-fragments-length[2], authentication-length-length[1], fragment-length[5] } + - total-fragments-length: 2 bits + - `(dec (met 3 total-fragments))` - authentication-length-length: 1 bit - fragment-length: 5 bits - - total-bits: message level bit-length - - encoded in `(bex total-bits-length)` bytes + - total-fragments: message level fragment-length + - encoded in `1-4` bytes - authentication-length: 0-1 bytes - authentication: { tag[1], value[0-254] } - tag: @@ -117,7 +117,7 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - combination signature + root hash - ... - fragment-length: 0-32 bytes - - fragment: 0-2^2^32 bytes (???) 
+ - fragment: 0 - 2^252 bytes - optional-page-attributes: next-hop - next-hop: - dependant on header bits @@ -127,7 +127,7 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - 0b11: multiple length-prefixed next-hops - XX length-prefixed values should have tag bytes -XX: revise length limits sane relative to each other XX: next-hop only on response XX: total-bits only on first response From a5d2ff03c0d03e1db72db4845330b2ea98c8b7ec Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Fri, 15 Dec 2023 15:55:23 -0500 Subject: [PATCH 05/17] network: add cookie --- UIPS/UIP-0113.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 61d2720..6cc2ada 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -55,7 +55,7 @@ A %poke message can be resolved by dereferencing the request path via arvo's `+p Packets recapitulate the structure of messages exactly, specifying their precise serialization and fragmentation, with sufficient metadata (and sized for) straightforward, maximal deliverabity over current networks. -Every packet has a 4-byte header: +Every packet has an 8-byte header. The first 4 bytes are used as follows: - 2 bits reserved - 2 bits next-hop @@ -75,6 +75,11 @@ Every packet has a 4-byte header: - 20-bit truncated mug - least-significant bits +The next 4 bytes are a constant token (or "cookie") to identify the ames protocol suite: + +- 4-byte cookie + - `~tasfyn-partyv`, i.e. `0x51ad.1d5e`, serialized little-endian as `{ 0x5e, 0x1d, 0xad, 0x51 }`. + Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes (1500-byte de-facto MTU due to ethernet frame size).
In practice, the remaining 1.468 bytes are allocated as follows: - %peek: { encoded-path, optional-peek-attributes } From c8562db4f57381f3715680181bb78a4aaa4c0023 Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Fri, 5 Jan 2024 14:29:17 -0500 Subject: [PATCH 06/17] copy motivation section from roadmap.urbit.org --- UIPS/UIP-0113.md | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 6cc2ada..682d03f 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -11,8 +11,44 @@ created: ~2023.8.9 ## Abstract -Imposing a request/response discipline on all %ames messages and packets provides legibility at every layer, making the entire network easier to reason about, extending the urbit network to new use-cases and enabling a system that is more stable, reliable, and scalable. +The "directed messaging" project consists of a full-stack rewrite of Urbit's networking intended for much higher (100-1000x) throughput and increased connection stability. The "directed" term refers to the directedness of the connection between a request and a response, i.e. the request goes in one direction and the response goes in the opposite direction. +By imposing a request/response discipline on all %ames messages and packets, this design provides legibility at every layer, making the entire network stack easier to reason about, extending the Urbit network to new use-cases and enabling a system that is more stable, reliable, and scalable. + +At a high level: + +- Stateful relays (i.e., galaxies and stars) relieve publishers of any routing responsibility and make peer-discovery reliable. (This works particularly well with multi-hop forwarding through the sponsorship hierarchy.) +- Packet legibility enables trivial request/response and packet/message correlation.
(A fully-qualified, global, referentially-transparent namespace path uniquely identifies each response packet, and is trivially upgraded to the path of a semantic, complete network message.) +- The request/response discipline locates all congestion control and retransmission at the client edge, where congestion is most likely. +- Trivial packet-to-message correlation enables alternate transport and off-loop "de-packetization" (message assembly), which are our largest performance wins. +- The entirety of the publisher-side implementation is a layering of small, stateless functions. +- Existing semantics and interfaces for both %ames and fine can be straightforwardly rebased onto this model. + +## Motivation + +As of 2023, Urbit has two communication protocols: Ames, for sending commands and receiving acknowledgments; and Fine, a "remote scry protocol" for reading data out of other ships -- specifically, out of their scry namespaces. Both protocols are forms of one-to-one communication between two Urbit nodes. They both use Urbit's "galaxy" supernodes for peer discovery, and for packet relaying in case the nodes are behind firewalls. + +The designs and implementations of these protocols impose enormous performance overhead, leaving both with quite low throughput: on the order of a megabit per second, even on a gigabit internet connection. The directed messaging project removes these overheads almost entirely, bringing the throughput much closer to the maximum supported by the underlying hardware. + +The first change this new design makes is to tag every packet as either a request or a response. For reads, a request contains a "scry path" (Urbit's version of a URL), and a response contains the data that path refers to. For writes (commands), a request contains the contents of the command, and a response contains an acknowledgment. Tagging each packet this way enables "directed routing".
+ +Directed routing uses a simplification of the routing design from Van Jacobson's [Named Data Networking](https://named-data.net/project/) project, in which a request is routed to the next relay by looking at its request path (in Urbit's case, a scry path), and a response is routed back to wherever the request came from. This means a response retraces, in reverse, the exact path through the network taken by the request it satisfies. + +Directed routing has a number of advantages over Ames's current routing, which has proven complex and finicky. The biggest advantage is that the request/response constraint makes the whole routing problem simpler, which in turn makes the routing semantics easier to reason about and to implement correctly. For example, unlike in the current design, publisher nodes never need to persist routes to subscriber nodes: each subscriber remembers the route to the publisher, and the publisher only needs to hold, temporarily, the immediate transport address (IP+port) from which it heard a request packet, in order to route the response back; the other relays will handle returning the response to the original requesting node. + +The second big change in this new design is to assign a scry path to every packet and message -- this explicitly lays out all packets and messages in Urbit's scry namespace. One reason to do this is that it allows all large pieces of data to be "pulled" by the node that will receive them, rather than "pushed" by the sending node, as is the case now for Ames commands. With everything as pull rather than push, packet-level operations can be abstracted away from senders, which means there can be a single state machine for managing downloads, no matter whether the datum being downloaded is an Ames command to be performed (a write) or another kind of data to be read from another node's namespace (a read).
+ +This state machine for managing downloads uses a congestion control algorithm to determine how many request packets to send at what rate, to maximize the throughput without overloading the underlying hardware. In all previous protocols, this state machine has been part of the Ames vane (kernel module) inside Urbit's Arvo kernel. Arvo is a transactional system, so every time an Urbit node receives a response packet and wants to figure out how many new request packets it should send to max out the connection, it must first write the incoming response packet to disk. Since each packet contains a data payload of a kilobyte, this means the system needs to do a disk write for every kilobyte -- absurdly high overhead for a production system. + +Because every transfer is now a pull, the state machine can be moved out of Urbit's kernel into its runtime -- the processor-native code that runs Urbit. In addition to getting rid of the per-packet disk writes, this means Urbit's packet handling will no longer need to engage in Unix interprocess communication, which has some overhead. Nor will it need to run any Nock: the packet processing state machine can be written in a hot loop in C, sidestepping the biggest slowdowns of the current system. + +The last remaining source of slowness in current Urbit packet processing lies in packet authentication: when a node receives a packet from another node, how does it verify that the packet was actually sent by the claimed sender, and not forged by a malicious actor to deny service or otherwise interfere with healthy network operation? + +In the current Fine protocol, for performing reads, every packet contains a digital signature. This prevents forgery, but at the cost of multiple milliseconds for each packet -- another slapstick-level performance disaster.
+ +In directed messaging, a novel packet authentication scheme called "LockStep" reduces packet authentication time to a single Blake3 hash operation, which is orders of magnitude faster. + +These changes combine to ensure bulletproof peer-to-peer routing and low performance overhead, getting Urbit out of the way and letting application programmers make effective use of the networking hardware. ## Specification From 5c65068ecb915b15a5208895dde8490d72f0d730 Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Fri, 5 Jan 2024 14:32:21 -0500 Subject: [PATCH 07/17] copy motivation section from roadmap.urbit.org --- UIPS/UIP-0113.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 682d03f..fefa5fa 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -85,7 +85,7 @@ A %poke messages pushes a %page message from one requester to requestee, specify A %poke message can be resolved by dereferencing the request path via arvo's `+peek` arm on the publish ship. If the path can be resolved, the message has been processed and the result is the response %page. If not, the %poke must be injected as an event. A valid %poke message produces exactly one %page response -- crashing requests are converted to negative acknowledgments. -%poke generalizes %ames' %plea and %boon messages, simultaneously rendering "nacksplanations" superfluous. +%poke generalizes %ames' %plea (command) and %boon (response) messages. Making the acknowledgment a scry response instead of a bespoke packet type allows for multi-packet error messages, rendering Ames's current "nacksplanations" machinery superfluous. 
#### packet structure and semantics From 9c01a7868e913e1ae95aa4faf91837cefd271530 Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Fri, 5 Jan 2024 15:10:51 -0500 Subject: [PATCH 08/17] directed messaging: expand poke paragraph --- UIPS/UIP-0113.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index fefa5fa..f537034 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -65,7 +65,7 @@ A new message layer is introduced, unifying %ames and fine. %ames' existing flow ##### %peek: `path` -A %peek message is a request for a %page at a path in the global namespace. It is unauthenticated and anonymous. In the future, request authentication could be used to gate access to computational resources and enable QoS, but never for access control to data itself. +A %peek message is a read request: a request for a %page at a path in the global namespace. It is unauthenticated and anonymous. In the future, request authentication could be used to gate access to computational resources and enable QoS, but never for access control to data itself. %peek can be injected as an event, but must not change formal state. It should be handled by dereferencing the request path via arvo's `+peek` arm on the publishing ship. A %peek message produces at-most-one %page response -- blocking/crashing requests are dropped. @@ -73,7 +73,7 @@ A %peek message is a request for a %page at a path in the global namespace. It i ##### %page: `[oath path page]` -A %page message is a public, authenticated binding of marked data in the urbit namespace -- in practical terms, a response to a %peek or %poke request. The binding between path and data must never change, violation of this principle should be shared widely and have severe reputational consequences for the offending party. 
+A %page message is a read response: a public, authenticated binding of marked data in the urbit namespace -- in practical terms, a response to a %peek or %poke request. The binding between path and data must never change, violation of this principle should be shared widely and have severe reputational consequences for the offending party. A %page message is injected as an event. If it correlates to an outstanding request -- via an exact match of its path -- it is processed with arbitrary stateful semantics. If it does not correlate, it is silently dropped. @@ -81,12 +81,19 @@ A %page message is injected as an event. If it correlates to an outstanding requ ##### %poke: `[path oath path page]` -A %poke messages pushes a %page message from one requester to requestee, specifying the path by which the requestee will acknowledge the %poke. It must be straightforward to unambiguously correlate the payload path to the prefix of the request path, such that the authentication of the payload is sufficient to authenticate the request (as is trivially the case for %ames' flows). +A %poke message is a write request: it is used to implement a command+acknowledgment communication protocol, where the requesting ship sends a command and the receiving ship performs the command and replies with an acknowledgment. + +The poke protocol is initiated by the requesting ship sending a %peek to the receiving ship to try to read the acknowledgment out of the receiver's namespace. The receiving ship recognizes this request as a poke and injects it as a stateful Arvo event rather than trying to read it out of Arvo the way it would usually do for a %peek packet. To process the request, the receiving Arvo emits a new %peek request back to the sending ship, to fetch the command datum out of the sender's namespace. Once the full datum has been downloaded, the receiving ship attempts to perform the request. 
If it succeeds, it sends a positive acknowledgment; if it fails, it sends a "negative" acknowledgment containing an error message. + +An optimization for small messages is also included in the protocol: the first packet of the initiating peek request also includes the first fragment of the command datum. If the command is under a kilobyte, then the entire command is included in the first packet, recovering the long-standing property of Ames protocols that small commands are processed and acknowledged in a single network roundtrip. + +A %poke pushes a %page message from one requester to requestee, specifying the path by which the requestee will acknowledge the %poke. It must be straightforward to unambiguously correlate the payload path to the prefix of the request path, such that the authentication of the payload is sufficient to authenticate the request (as is trivially the case for %ames' flows). A %poke message can be resolved by dereferencing the request path via arvo's `+peek` arm on the publish ship. If the path can be resolved, the message has been processed and the result is the response %page. If not, the %poke must be injected as an event. A valid %poke message produces exactly one %page response -- crashing requests are converted to negative acknowledgments. %poke generalizes %ames' %plea (command) and %boon (response) messages. Making the acknowledgment a scry response instead of a bespoke packet type allows for multi-packet error messages, rendering Ames's current "nacksplanations" machinery superfluous. + #### packet structure and semantics Packets recapitulate the structure of messages exactly, specifying their precise serialization and fragmentation, with sufficient metadata (and sized for) straightforward, maximal deliverabity over current networks. 
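The header layout specified in patches 01 and 05 can be sketched in Python. It consists of 4 bytes of bit fields followed by the 4-byte cookie. This is a hypothetical illustration, not the reference implementation on `jb/dire`. It rests on three assumptions:

- Fields are packed least-significant-bit first, in the order the patch lists them.
- The 32-bit field word is little-endian, like the cookie.
- The mug is supplied by the caller, since computing Urbit's mug is out of scope here.

`pack_header` and `unpack_header` are illustrative names.

```python
import struct

# Assumption: fields are packed LSB-first in the order the patch lists them,
# and the 32-bit field word is little-endian, matching the cookie's byte order.
COOKIE = 0x51AD1D5E  # ~tasfyn-partyv; little-endian bytes 5e 1d ad 51
PACKET_TYPES = {"page": 0b01, "peek": 0b10, "poke": 0b11}  # 0b00 is reserved
TYPE_NAMES = {code: name for name, code in PACKET_TYPES.items()}


def pack_header(next_hop: int, version: int, ptype: str, hopcount: int, mug: int) -> bytes:
    """Build the 8-byte header: 4 bytes of bit fields, then the cookie."""
    if not (0 <= next_hop <= 0b11 and 0 <= version <= 0b111 and hopcount >= 0):
        raise ValueError("header field out of range")
    word = 0                          # bits 0-1: reserved, left zero
    word |= next_hop << 2             # bits 2-3: next-hop encoding
    word |= version << 4              # bits 4-6: protocol version
    word |= PACKET_TYPES[ptype] << 7  # bits 7-8: packet type
    word |= min(hopcount, 7) << 9     # bits 9-11: saturating hopcount (7 means >= 7)
    word |= (mug & 0xFFFFF) << 12     # bits 12-31: least-significant 20 bits of the mug
    return struct.pack("<II", word, COOKIE)


def unpack_header(data: bytes) -> dict:
    """Parse the 8-byte header, rejecting packets that lack the cookie."""
    word, cookie = struct.unpack_from("<II", data)
    if cookie != COOKIE:
        raise ValueError("not a directed-messaging packet")
    return {
        "next_hop": (word >> 2) & 0b11,
        "version": (word >> 4) & 0b111,
        "type": TYPE_NAMES.get((word >> 7) & 0b11, "reserved"),
        "hopcount": (word >> 9) & 0b111,
        "mug": word >> 12,
    }
```

For example, `pack_header(0b01, 1, "peek", 9, mug)` records a hopcount of 7 (saturated), and its last four bytes are always the cookie. This makes a cheap first-pass filter for stray UDP traffic.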
From e7e7016f4599b6746ccc0f1e178883ec8a0a090d Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Fri, 5 Jan 2024 17:02:32 -0500 Subject: [PATCH 09/17] directed messaging: add client state machine --- UIPS/UIP-0113.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index f537034..45f45ae 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -94,6 +94,55 @@ A %poke message can be resolved by dereferencing the request path via arvo's `+p %poke generalizes %ames' %plea (command) and %boon (response) messages. Making the acknowledgment a scry response instead of a bespoke packet type allows for multi-packet error messages, rendering Ames's current "nacksplanations" machinery superfluous. +## Modules + +The major logical modules involved in the system are the following: +- packet state machine (client, vane + driver) +- message state machine (client, vane) +- flow state machine (both client and server, vane) +- publisher namespace (server, vane) +- relay (driver, vane possible) + +## Client State Machine + +This module is responsible for performing a request for data at a scry path on another ship. Its caller supplies the path to be resolved, and when the module has finished resolving that path, it delivers the data bound to that path back to the caller. + +To perform the request, the module issues as many request packets as needed to retrieve all the fragments of the response message. It is responsible for re-sending each packet on a timer until a response is received. The formal state machine in the Ames vane is specified with minimal timing details, just enough to ensure packet-by-packet progress toward a complete message. A production-level implementation of the Ames I/O driver in the runtime will need to implement a congestion control algorithm to achieve high bandwidth. 
+ +The general pattern of interaction between the Ames vane and a production-level I/O driver is that Ames will initiate a request by encoding and emitting the first packet of a request to the driver, and setting a coarse packet re-send timer. This coarse timer will be something like a one-second initial re-send backoff, doubling on each timeout up to a max of two minutes. The driver, when it hears this first packet, will implicitly assume responsibility for congestion control, i.e. setting packet re-send timers, tracking statistics about the communication channel, and using the statistics to determine how many further request packets to emit at what times. + +There is some minor extra overhead from running redundant timers in Ames and the driver -- Ames's coarse timer might fire a few times during the download of a large message. We do not expect serious performance issues from this, however. + +This module sends request packets, receives response packets and messages, sets timeouts for request packets, responds to packets timing out by re-sending the packets, and implements congestion control. It also delivers fully assembled response messages back to its clients.
+State Transitions: +- on-poke + - if our ack-path + - create request for payload-path (at message level) and begin processing as if a %page + - XX inline auth packet if required, ensure synchronous validation of first +- on-page + - check for pending request (peek|poke) + - if none, drop + - if first fragment + - authenticate + - initialize hash-tree + - request auth-packet if necessary + - else if auth-packet + - finish initializing hash-tree + - else + - if out-of-order, stash to process later or just drop + - incorporate incremental hash update into hash-tree + - validate fragment via hash-tree + - save fragment + - if incomplete, request next fragment + - else produce completed message +- send requests + + +## Next Module (TODO) + +TODO + #### packet structure and semantics Packets recapitulate the structure of messages exactly, specifying their precise serialization and fragmentation, with sufficient metadata (and sized for) straightforward, maximal deliverabity over current networks.
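The `on-page` transitions and the coarse re-send backoff from patch 09 can be sketched as a small Python model. This is a hypothetical sketch with the following simplifications:

- Authentication and hash-tree validation are stubbed out.
- Out-of-order fragments are dropped rather than stashed; the patch permits either.
- Effects are returned as plain tuples. `Client`, `Request`, and `resend_delay` are illustrative names.

Patch 11 later replaces this doubling backoff with a single two-minute global timer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def resend_delay(timeouts: int) -> float:
    """Coarse backoff from patch 09: 1 s initially, doubling, capped at 120 s."""
    return float(min(1 << min(timeouts, 7), 120))


@dataclass
class Request:
    total: Optional[int] = None                      # learned from the first fragment
    fragments: Dict[int, bytes] = field(default_factory=dict)
    next_wanted: int = 0


class Client:
    def __init__(self) -> None:
        self.pending: Dict[str, Request] = {}        # keyed by exact request path

    def request(self, path: str) -> tuple:
        self.pending[path] = Request()
        return ("send-peek", path, 0)

    def on_page(self, path: str, num: int, total: int, data: bytes) -> tuple:
        req = self.pending.get(path)
        if req is None:
            return ("drop",)                         # no pending request: drop
        if num != req.next_wanted:
            return ("drop",)                         # out of order: drop (could stash)
        if num == 0:
            # Stub: a real client authenticates here and initializes the
            # hash tree before trusting `total`.
            req.total = total
        req.fragments[num] = data                    # stub: no hash-tree validation
        req.next_wanted += 1
        if req.next_wanted < req.total:
            return ("send-peek", path, req.next_wanted)
        del self.pending[path]                       # complete: clear request state
        message = b"".join(req.fragments[i] for i in range(req.total))
        return ("done", path, message)
```

Because a response is correlated only by an exact path match, a page for an unknown path is silently dropped. This mirrors the %page semantics earlier in the document.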
From bb69508eb28211cfbe1f9ab4fe8fa59030ba08fc Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Fri, 5 Jan 2024 17:04:25 -0500 Subject: [PATCH 10/17] adds rift to encoded path --- UIPS/UIP-0113.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 45f45ae..2581dcb 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -178,16 +178,19 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - %page: { encoded-response, optional-page-attributes } - %poke: { peek, page } -- encoded-path: { meta-byte, ship, path-length, path, bloq, fragment-number } - - meta-byte: { reserved[2], rank[2], path-length-length[1], bloq-length[1], fragment-length[2] } +- encoded-path: { meta-byte, ship, rift, path-length, path, bloq, fragment-number } + - meta-byte: { rank[2], rift-length[2], path-length-length[1], bloq-length[1], fragment-length[2] } - reserved: 2 bits - rank: 2 bits - `(dec (met 0 (met 4 ship)))` + - rift-length: 2 bits + - `(dec (met 3 rift))` - path-length-length: 1 bit - bloq length: 1 bit - fragment number length: 2 bits - `(met 3 fragment-number)` - ship: encoded in `(bex +(rank))` bytes + - rift: encoded in `1-4` bytes - path-length: encoded in `(bex path-length-length)` bytes - path: serialized without leading slash, path-length bytes - bloq: From ae990180d26977e1bae37635fd35db4fb2af0340 Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Sun, 7 Jan 2024 15:12:46 -0500 Subject: [PATCH 11/17] directed messaging: flesh out client state machine --- UIPS/UIP-0113.md | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 2581dcb..5cecad2 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -1,4 +1,4 @@ ---- +-a-- uip: 0113 title: "%ames: Directed Messaging" description: a request/response discipline for the network. 
@@ -91,7 +91,7 @@ A %poke pushes a %page message from one requester to requestee, specifying the p A %poke message can be resolved by dereferencing the request path via arvo's `+peek` arm on the publish ship. If the path can be resolved, the message has been processed and the result is the response %page. If not, the %poke must be injected as an event. A valid %poke message produces exactly one %page response -- crashing requests are converted to negative acknowledgments. -%poke generalizes %ames' %plea (command) and %boon (response) messages. Making the acknowledgment a scry response instead of a bespoke packet type allows for multi-packet error messages, rendering Ames's current "nacksplanations" machinery superfluous. +%poke generalizes %ames' %plea (command) and %boon (response) messages, without changing the interface presented to other vanes. Making the acknowledgment a scry response instead of a bespoke packet type allows for multi-packet error messages, rendering Ames's current "nacksplanations" machinery superfluous. ## Modules The major logical modules involved in the system are the following: @@ -103,17 +103,35 @@ The major logical modules involved in the system are the following: - publisher namespace (server, vane) - relay (driver, vane possible) +The client is broken up into three logical modules: +- message-level state machine +- formal packet-level state machine +- I/O driver packet accelerator (congestion control) + + + ## Client State Machine This module is responsible for performing a request for data at a scry path on another ship. Its caller supplies the path to be resolved, and when the module has finished resolving that path, it delivers the data bound to that path back to the caller. To perform the request, the module issues as many request packets as needed to retrieve all the fragments of the response message. It is responsible for re-sending each packet on a timer until a response is received.
The formal state machine in the Ames vane is specified with minimal timing details, just enough to ensure packet-by-packet progress toward a complete message. A production-level implementation of the Ames I/O driver in the runtime will need to implement a congestion control algorithm to achieve high bandwidth. -The general pattern of interaction between the Ames vane and a production-level I/O driver is that Ames will initiate a request by encoding and emitting the first packet of a request to the driver, and setting a coarse packet re-send timer. This coarse timer will be something like a one-second initial re-send backoff, doubling on each timeout up to a max of two minutes. The driver, when it hears this first packet, will implicitly assume responsibility for congestion control, i.e. setting packet re-send timers, tracking statistics about the communication channel, and using the statistics to determine how many further request packets to emit at what times. +The general pattern of interaction between the Ames vane and a production-level I/O driver is that Ames will initiate a request by encoding and emitting the first packet of a request to the driver, then it will send one packet at a time, re-sending on a timer, until it receives the corresponding response packet. + +Ames will re-send any unacknowledged packets on a single repeating global timer that re-sends all unacknowledged packets at every two-minute interval. This uses the same design as current Ames's "dead flow" timer, where a dead flow is a flow that has not received any response packets for 30 seconds. All flows will be formally treated the same as dead flows in this new design -- live flows will be recognized and accelerated by the runtime using congestion control. + +The driver, when it hears a response packet over the wire, will recognize the flow as live and implicitly assume responsibility for congestion control, i.e. 
setting fine-grained packet re-send timers, tracking statistics about the communication channel, and using the statistics to determine how many further request packets to emit at what times. Once the driver has received a complete message, it injects the message as a single Arvo event. This clears any packet-level state in Ames for that message. + +Note that because a production I/O driver intercepts them, Arvo will not hear response packets for a message until the message has been completed. This means that if Vere is shut down and restarted, it will not remember the message and it will restart the download from scratch. + +When the I/O driver receives the first response packet for a message, it could inject the packet into Arvo as an event. A production-level I/O driver will likely opt instead for scrying into Arvo to ask Arvo to validate any HMAC on the first packet. The I/O driver does not contain any private keys in its state, as a security boundary, so it needs to ask Arvo to use its keys. This IPC request to scry into Arvo can be fired off synchronously at the same time as congestion control sends the next few request packets, to prevent unneeded latency. + +Later the driver will hear the scry response from Arvo. If the packet validates, then the driver will update its state on this message to reflect the confirmation. If it fails to validate, the driver will cancel its attempt to download this message, removing all state regarding it. Any duplicate responses received after cancellation are dropped. The flow is now considered dead, so Ames will take over re-sending once every two minutes. + +Keeping congestion control in the driver and out of the formal model allows improvement in congestion control design without modifying the Ames protocol or any code in the kernel. Even at Kelvin zero, a clever new congestion control algorithm could emerge and gain prominence. 
-There is some minor extra overhead from doubling the timers -- Ames's coarse timer might fire a few times during the download of a large message. We do not expect serious performance issues from this, however. +There is potentially minor extra overhead from duplicate timers: Ames's global dead flow re-send timer might fire a few times during the download of a very large message. We do not expect noticeable performance issues from this, and notably, the Ames vane will only ever have one timer set in this design, as compared to thousands on some ships with current Ames. -This module sends request packets, receives response packets and messages, sets timeouts for request packets, responds to packets timing out by re-sending the packets, and implements congestion control. It also delivers fully assembled response messages back to its clients. State Transitions: - on-poke From 59b51612a7a1333bb6bfc794f16b71697086b91b Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Sun, 7 Jan 2024 15:13:42 -0500 Subject: [PATCH 12/17] directed messaging: minor typos --- UIPS/UIP-0113.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 5cecad2..dc77aad 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -1,4 +1,4 @@ --a-- +--- uip: 0113 title: "%ames: Directed Messaging" description: a request/response discipline for the network. @@ -81,7 +81,7 @@ A %page message is injected as an event. If it correlates to an outstanding requ ##### %poke: `[path oath path page]` -A %poke message is a write request: it is used to implement a command+acknowledgment communication protocol, where the requesting ship sends a command and the receiving ship performs the command and replies with an acknowledgment. +A %poke message is a write request: it is used to implement a command+acknowledgment communication protocol, where the requesting ship sends a command and the receiving ship performs the command and replies with an acknowledgment. 
The poke protocol is initiated by the requesting ship sending a %peek to the receiving ship to try to read the acknowledgment out of the receiver's namespace. The receiving ship recognizes this request as a poke and injects it as a stateful Arvo event rather than trying to read it out of Arvo the way it would usually do for a %peek packet. To process the request, the receiving Arvo emits a new %peek request back to the sending ship, to fetch the command datum out of the sender's namespace. Once the full datum has been downloaded, the receiving ship attempts to perform the request. If it succeeds, it sends a positive acknowledgment; if it fails, it sends a "negative" acknowledgment containing an error message. @@ -120,7 +120,7 @@ The general pattern of interaction between the Ames vane and a production-level Ames will re-send any unacknowledged packets on a single repeating global timer that re-sends all unacknowledged packets at every two-minute interval. This uses the same design as current Ames's "dead flow" timer, where a dead flow is a flow that has not received any response packets for 30 seconds. All flows will be formally treated the same as dead flows in this new design -- live flows will be recognized and accelerated by the runtime using congestion control. -The driver, when it hears a response packet over the wire, will recognize the flow as live and implicitly assume responsibility for congestion control, i.e. setting fine-grained packet re-send timers, tracking statistics about the communication channel, and using the statistics to determine how many further request packets to emit at what times. Once the driver has received a complete message, it injects the message as a single Arvo event. This clears any packet-level state in Ames for that message. +The driver, when it hears a response packet over the wire, will recognize the flow as live and implicitly assume responsibility for congestion control, i.e. 
setting fine-grained packet re-send timers, tracking statistics about the communication channel, and using the statistics to determine how many further request packets to emit at what times. Once the driver has received a complete message, it injects the message as a single Arvo event. This clears any packet-level state in Ames for that message. Note that because a production I/O driver intercepts them, Arvo will not hear response packets for a message until the message has been completed. This means that if Vere is shut down and restarted, it will not remember the message and it will restart the download from scratch. From 0f1bbda160dfab554b06bb2b9e685ba2fd4673e8 Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Sun, 7 Jan 2024 15:14:44 -0500 Subject: [PATCH 13/17] directed messaging: minor edit --- UIPS/UIP-0113.md | 7 ------- 1 file changed, 7 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index dc77aad..19bc02a 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -103,13 +103,6 @@ The major logical modules involved in the system are the following: - publisher namespace (server, vane) - relay (driver, vane possible) - client is broken up into three logical modules: -- message-level state machine -- formal packet-level state machine -- I/O driver packet accelerator (congestion control) - - - ## Client State Machine This module is responsible for performing a request for data at a scry path on another ship. Its caller supplies the path to be resolved, and when the module has finished resolving that path, it delivers the data bound to that path back to the caller. 
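This is a minimal sketch of the vane-side (formal) half of the client state machine described in the patches above. It is written in Python purely for illustration. The class and method names (`VaneSketch`, `request`, `on_response`, `on_message`, `on_timer`) are hypothetical and do not correspond to Ames code.

The sketch assumes the simplest reading of the text:

- Each message has exactly one outstanding request packet.
- Fragments are requested in order.
- A packet is re-sent only when the single global two-minute timer fires.

Congestion control, fine-grained re-send timers, and HMAC validation belong to the I/O driver and are deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Packet = Tuple[str, int]  # (scry path, fragment number) of one request packet


@dataclass
class PendingRequest:
    """Formal state for one message download: only the next fragment is outstanding."""
    path: str
    next_fragment: int = 0
    num_fragments: Optional[int] = None  # learned from the first response packet


@dataclass
class VaneSketch:
    """Hypothetical vane-side client state machine; not the real Ames implementation."""
    pending: Dict[str, PendingRequest] = field(default_factory=dict)
    timer_set: bool = False  # at most one (global, repeating) timer, however many flows

    def request(self, path: str) -> Packet:
        """Begin a download: emit the first request packet and arm the global timer."""
        self.pending[path] = PendingRequest(path)
        self.timer_set = True
        return (path, 0)

    def on_response(self, path: str, fragment: int, num_fragments: int) -> Optional[Packet]:
        """Handle one response packet; return the next request packet to send, if any."""
        req = self.pending.get(path)
        if req is None or fragment != req.next_fragment:
            return None  # cancelled, duplicate, or out-of-order response: drop it
        req.num_fragments = num_fragments
        req.next_fragment += 1
        if req.next_fragment >= num_fragments:
            self._clear(path)  # all fragments received: message complete
            return None
        return (path, req.next_fragment)

    def on_message(self, path: str) -> None:
        """The driver injected the complete message as one event; drop packet-level state."""
        self._clear(path)

    def on_timer(self) -> List[Packet]:
        """Global two-minute timer fired: re-send the one outstanding packet per message."""
        return [(r.path, r.next_fragment) for r in self.pending.values()]

    def _clear(self, path: str) -> None:
        self.pending.pop(path, None)
        self.timer_set = bool(self.pending)
```

Each pending message contributes exactly one packet to each firing of the global timer. The vane therefore holds a single timer regardless of how many flows are open, which is the property the text contrasts with the thousands of timers some ships carry under current Ames.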
From 52379c4119b4f8fd3f71df10d5d07d9d86063a2a Mon Sep 17 00:00:00 2001 From: Ted Blackman Date: Sun, 7 Jan 2024 15:15:20 -0500 Subject: [PATCH 14/17] directed messaging: another minor edit --- UIPS/UIP-0113.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 19bc02a..357cbb0 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -97,8 +97,7 @@ A %poke message can be resolved by dereferencing the request path via arvo's `+p ## Modules The major logical modules involved in the system are the following: -- packet state machine (client, vane + driver) -- message state machine (client, vane) +- client state machine (client, vane + driver) - flow state machine (both client and server, vane) - publisher namespace (server, vane) - relay (driver, vane possible) From 7666e6edb5ccb529f4b7f2a63999ab5443a07713 Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Wed, 10 Jan 2024 10:59:33 -0500 Subject: [PATCH 15/17] dire: correct %page packet-spec (include encoded-path) --- UIPS/UIP-0113.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 357cbb0..5a6f847 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -185,7 +185,7 @@ The next 4 bytes are a constant token (or "cookie") to identify the ames protoco Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes (1500-byte de-facto MTU due to ethernet frame size). 
In practice, the remaining 1.468 bytes are allocated as follows: - %peek: { encoded-path, optional-peek-attributes } -- %page: { encoded-response, optional-page-attributes } +- %page: { encoded-path, encoded-response, optional-page-attributes } - %poke: { peek, page } - encoded-path: { meta-byte, ship, rift, path-length, path, bloq, fragment-number } From e245f005af1cc8e3c12e4f9ef2a14c777be6f4c1 Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Wed, 10 Jan 2024 11:01:21 -0500 Subject: [PATCH 16/17] dire: remove reserved bits from encoded-path meta-byte --- UIPS/UIP-0113.md | 1 - 1 file changed, 1 deletion(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 5a6f847..5c101f6 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -190,7 +190,6 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - encoded-path: { meta-byte, ship, rift, path-length, path, bloq, fragment-number } - meta-byte: { rank[2], rift-length[2], path-length-length[1], bloq-length[1], fragment-length[2] } - - reserved: 2 bits - rank: 2 bits - `(dec (met 0 (met 4 ship)))` - rift-length: 2 bits From dac5e8036109bc9e059e03db906620a89e092079 Mon Sep 17 00:00:00 2001 From: Joe Bryan Date: Wed, 10 Jan 2024 11:13:57 -0500 Subject: [PATCH 17/17] dire: correct fragment number length-bits --- UIPS/UIP-0113.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/UIPS/UIP-0113.md b/UIPS/UIP-0113.md index 5c101f6..90eef1a 100644 --- a/UIPS/UIP-0113.md +++ b/UIPS/UIP-0113.md @@ -197,7 +197,7 @@ Every packet has a theoretical limit of XX, but a practical limit of 1.472 bytes - path-length-length: 1 bit - bloq length: 1 bit - fragment number length: 2 bits - - `(met 3 fragment-number)` + - `(dec (met 3 fragment-number))` - ship: encoded in `(bex +(rank))` bytes - rift: encoded in `1-4` bytes - path-length: encoded in `(bex path-length-length)` bytes
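As a worked illustration of the `encoded-path` meta-byte as it stands after the final patch, the following Python sketch computes the meta-byte fields and the byte widths of the variable-length fields they imply. All function names are hypothetical. The following choices are assumptions, not part of the spec:

- **Bit order.** Fields are packed least-significant-bit first, in the order listed.
- **Zero values.** Zero-valued atoms (for example ~zod, whose `@p` is 0, or fragment 0) are treated as occupying one unit, since the literal `(dec (met ...))` expressions would underflow on 0.
- **Bloq-length bit.** This bit is taken as an opaque input, because the bloq field's definition is cut off in this excerpt.
- **Ship rank.** The rank is computed as the smallest value whose width, `(bex +(rank))` bytes, holds the ship. This agrees with `(dec (met 0 (met 4 ship)))` whenever `(met 4 ship)` is a power of two. For a 40-bit atom, however, the literal expression gives rank 1 (4 bytes), which is too narrow to hold it.

```python
def met(bloq: int, a: int) -> int:
    """Hoon's (met bloq a): count of 2^bloq-bit blocks needed to hold atom a (0 if a == 0)."""
    width = 1 << bloq
    return (a.bit_length() + width - 1) // width


def ship_rank(ship: int) -> int:
    """Smallest rank r such that (bex +(r)) bytes hold the ship (ASSUMED covering rule)."""
    words = max(1, met(4, ship))        # clamp so atom 0 (~zod) still gets the smallest size
    return (words - 1).bit_length()     # ceil(log2(words)), i.e. (met 0 (dec words))


def byte_len_code(a: int, bits: int) -> int:
    """(dec (met 3 a)), clamped so that 0 takes one byte; must fit in `bits` bits."""
    code = max(1, met(3, a)) - 1
    if code >= 1 << bits:
        raise ValueError("value too large for its length field")
    return code


def encode_meta_byte(ship: int, rift: int, path: bytes, fragment: int, bloq_length_bit: int):
    """Return (meta_byte, field byte widths) for a hypothetical encoded-path."""
    rank = ship_rank(ship)
    if rank > 3:
        raise ValueError("ship wider than 128 bits")
    rift_len = byte_len_code(rift, 2)          # rift occupies 1-4 bytes
    if len(path) >= 1 << 16:
        raise ValueError("path too long for a 2-byte path-length")
    path_len_len = 0 if len(path) < 256 else 1  # path-length is 1 or 2 bytes
    frag_len = byte_len_code(fragment, 2)      # fragment number occupies 1-4 bytes
    # ASSUMED layout: rank[2] | rift-length[2] | path-length-length[1]
    #                 | bloq-length[1] | fragment-length[2], LSB first.
    meta = (rank
            | rift_len << 2
            | path_len_len << 4
            | (bloq_length_bit & 1) << 5
            | frag_len << 6)
    widths = {
        "ship": 1 << (rank + 1),
        "rift": rift_len + 1,
        "path-length": 1 << path_len_len,
        "path": len(path),
        "fragment-number": frag_len + 1,
    }
    return meta, widths
```

Under these assumptions, consider a star-sized ship (e.g. atom 256) with rift 1, the 7-byte path `b"foo/bar"`, and fragment 300. This yields meta-byte 64: only the fragment-length code (1, meaning a 2-byte fragment number) is nonzero, in the top two bits.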