
Comparing Protocols

Thomas Themel edited this page May 18, 2015 · 2 revisions

Comparison of PubSubHubbub to light-pinging protocols

People want a comparison of the concrete differences between fat pinging (PubSubHubbub, XMPP pubsub) and light pinging (rssCloud, XML-RPC pings, changes.xml, SUP, SLAP). This document aims to construct and convey an evaluation of these protocols that's easy to understand.

The core difference is how new information from feeds is delivered from a publisher to a subscriber:

  • Light pings: Send the URL of the feed that has updated to the subscriber.
  • Fat pings: Send the updated content of the feed to the subscriber.

There are also further criteria to consider for each protocol. In the table below, (+) is good and (-) is bad.

| Consideration | XML-RPC ping | changes.xml | SUP | SLAP | XMPP pubsub | rssCloud | PubSubHubbub |
|---|---|---|---|---|---|---|---|
| Transport | (+) HTTP | (+) HTTP | (+) HTTP/HTTPS | (-) UDP | TCP/XMPP | (+) HTTP | (+) HTTP/HTTPS |
| Distribution style | Ping/Poll | (-) Polling | (-) Polling | Ping/Poll | (+) Push | Ping/Poll | (+) Push |
| Latency | Low | (-) High | (-) High | Low | (+) Minimum possible | Low | (+) Minimum possible |
| Thundering herd | (-) Yes | (-) Yes | (-) Yes | (-) Yes | (+) No | (-) Yes | (+) No |
| Spamable (no topics) | (-) Yes | (-) Yes | (+) No | (+) No | (+) No | (+) No | (+) No |
| DoSes Publishers | Preventable | (+) No | (+) No | Preventable | Preventable | Preventable | Preventable |
| DoS Relay attacks | (-) Yes | (+) No | (+) No | (+) No | (+) No | (-) Yes | (+) No |
| Possible to implement on $5/month hosting | (-) No | (-) No | (-) No | (-) No | (-) No | Maybe | (+) Yes |
| Message format | XML schema | XML schema | JSON | (-) Binary packet | (-) Complex XMPP | XML schema | (+) Original RSS or Atom content |
| Secure notifications | (-) No | (-) No | Somewhat | (-) No | (+) Yes | (-) No | (+) Yes |
| Publisher complexity | XML-RPC client | XML-RPC client | SUP IDs | (-) UDP send | (-) XMPP send | XML-RPC/(+) REST ping | (+) REST ping |
| Subscriber complexity | (-) Crawl pipeline | (-) Crawl pipeline | (-) Crawl pipeline | (-) Crawl pipeline | XMPP client | (-) Crawl pipeline | (+) Simple webapp |

The rest of this document compares light and fat pinging by these metrics: latency, bandwidth, CPU usage, publisher complexity, subscriber complexity, and hub complexity.


Latency

To simplify this explanation, latency is represented as network "hops": the time it takes on average for data to propagate between two Internet nodes.

Naively, light pings and fat pings look the same:

Naive protocol flow for light pings

Naive protocol flow for PuSH

There are four network hops required to deliver new content to a subscriber.

However, this leaves light pinging hubs open to relay denial of service attacks, so the Hub must verify there is new content:

Light ping protocol flow with proxying

  • Result:
    • Light pings require at least six network hops.
    • Fat pings require at most four network hops.
    • Fat pinging takes 33% fewer hops than light pinging with verification (four instead of six).

Often publishers will be combined with their own hub (for better integration with their application, better statistics gathering, optimizations) yielding:

  • Light ping: 3 hops.

Light pings with publisher-owned hub

  • Fat ping: 1 hop.

Fat pings with publisher-owned hub

  • Result: Fat ping is 66% faster (one hop instead of three).
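
The hop arithmetic behind these percentages can be checked with a short sketch. It assumes, as the simplification above does, that every hop costs the same; the function name `latency_saving` is hypothetical and only serves the illustration.

```python
def latency_saving(light_hops: int, fat_hops: int) -> float:
    """Fraction of the light-ping latency that fat pings avoid,
    treating every network hop as an equal-cost unit."""
    return (light_hops - fat_hops) / light_hops


# Hop counts are the ones quoted in the text above.
print(f"third-party hub:     {latency_saving(6, 4):.0%}")
print(f"publisher-owned hub: {latency_saving(3, 1):.0%}")
```

The second case is exactly two thirds, which this page rounds down to 66%.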

With popular sites, a feed will be served from multiple datacenters:

  • Light ping case:
    • The publisher must wait until the update has propagated to all caches before sending pings, so the whole system is only as fast as its slowest node
    • All caches represent a dependent point of failure, meaning more waiting and retries

Protocol flow for light pings with multiple datacenters

  • Fat ping case:
    • Integrated hub may immediately send fat pings before feeds are updated externally

Protocol flow for fat pings in multiple datacenters

  • Result:
    • Latency incurred by caching/replication delays can be zero with fat pings.
    • In the light pinging case, it is always non-zero.
    • This is irrelevant for single-host sites, but it gets worse and worse the bigger a site is.
    • Mitigating this problem for light pings requires specialized knowledge of datacenter topology, which violates the abstraction of DNS.

Bandwidth

Assume an average feed is 100KB consisting of fifty 2KB posts.

Take the case of a single new item with 2KB of data and 100 subscribers to the feed:

  • Light ping: 10MB served by publisher

Light ping bandwidth calculation

  • Fat ping: 200KB served by publisher

PSHB bandwidth calculation

  • Result:
    • Fat ping requires 98% less data (i.e., light ping requires 50x more).
    • Light pinging requires publisher to serve 100x more HTTP requests (the "thundering herd").
    • This result remains the same even with 1,000,000 subscribers to a feed.
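
A minimal sketch of this calculation follows, using the feed and post sizes assumed above. The function names are hypothetical, sizes use decimal units (1MB = 1,000KB), and HTTP overhead and the hub's own fetch of the feed are left out.

```python
FEED_KB = 100  # whole feed: fifty 2KB posts (assumption from the text)
ITEM_KB = 2    # the single new post


def light_ping_kb(subscribers: int) -> int:
    # Every subscriber re-downloads the entire feed after the ping.
    return subscribers * FEED_KB


def fat_ping_kb(subscribers: int) -> int:
    # Every subscriber is sent only the new item.
    return subscribers * ITEM_KB


for n in (100, 1_000_000):
    light, fat = light_ping_kb(n), fat_ping_kb(n)
    print(f"{n} subscribers: light={light}KB fat={fat}KB "
          f"saving={1 - fat / light:.0%}")
```

Both totals scale linearly with the number of subscribers, which is why the percentage saving does not depend on the audience size.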

To prevent denial of service attacks, light pinging Hubs must verify there is new content:

Light ping verification bandwidth

  • Result:
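    • Even worse than the naive light pinging case, because the Hub's verification fetch is added on top.
    • Subscribers incur the same bandwidth overhead as with naive light pings.
    • Fat pinging is 33% faster than light pinging (four hops instead of six).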

Light-ping advocates suggest that the Hub should re-serve only the new content on behalf of the publisher:

Light pings with proxying bandwidth calculation

  • Result:
    • Same bandwidth as fat pings.
    • Still 100x as many incoming HTTP requests as fat pings.
    • Naive fat pings still have 33% less latency (four hops instead of six).
    • Combined publisher/hub fat pings have 66% less latency (one hop instead of three).
    • Trust/security model for proxied feed on behalf of publisher unclear.

CPU Usage

Assume parsing a feed takes 10ms per item on average. For this section, assume an average feed has 25 items, so parsing a whole feed takes 250ms.

Take the case of a single new item being sent to 100 subscribers to the feed with naive pings:

  • Light ping: 25 seconds of CPU time consumed by subscribers (250ms each).

Light ping CPU usage calculation

  • Fat ping: 1.25 seconds of CPU time consumed total; 250ms by the hub, 10ms by each subscriber.

Naive PSHB CPU calculation

  • Result:
    • Fat pings require 95% less CPU (i.e., light pinging requires 20x).
    • Fat pings are 96% cheaper for subscribers: 10ms instead of 250ms each (i.e., light pinging requires 25x).
    • Light pinging requires publisher to serve 100x more HTTP requests (the "thundering herd") which have overhead.
    • This result remains the same even with 1,000,000 subscribers to a feed.

However, the Hub must verify the feed is new to make this safe:

Light pings with verification CPU usage calculation

  • Result:
    • Light pings require 25.25 CPU seconds consumed; 250ms by the hub, 250ms for each consumer
    • Even worse than the naive case.
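
The CPU figures for the naive and verified cases can be reproduced with a small sketch. The constants come from the assumptions stated in this section; the function names are hypothetical, and the per-request HTTP overhead mentioned above is not modelled.

```python
PARSE_MS_PER_ITEM = 10
FEED_ITEMS = 25
FULL_PARSE_MS = PARSE_MS_PER_ITEM * FEED_ITEMS  # 250ms for a whole feed


def light_ping_cpu_ms(subscribers: int, hub_verifies: bool = False) -> int:
    # Each subscriber fetches and parses the whole feed; a verifying hub
    # spends one more full parse, as in the figures quoted above.
    hub_ms = FULL_PARSE_MS if hub_verifies else 0
    return hub_ms + subscribers * FULL_PARSE_MS


def fat_ping_cpu_ms(subscribers: int) -> int:
    # The hub parses the whole feed once; each subscriber parses one item.
    return FULL_PARSE_MS + subscribers * PARSE_MS_PER_ITEM


print(light_ping_cpu_ms(100) / 1000)                     # naive light ping, seconds
print(light_ping_cpu_ms(100, hub_verifies=True) / 1000)  # verified light ping, seconds
print(fat_ping_cpu_ms(100) / 1000)                       # fat ping, seconds
```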

And when, for light pings, Hubs re-serve only the new content on behalf of the publisher:

Light pings with verification and proxying CPU usage calculation

  • Result:
    • Same CPU usage as fat pings.
    • Still 100x as many incoming HTTP requests as fat pings.
    • Naive fat pings still have 33% less latency (four hops instead of six).
    • Combined publisher/hub fat pings have 66% less latency (one hop instead of three).
    • Trust/security model for proxied feed on behalf of publisher unclear.

Publisher complexity

Assume the publisher tells hubs the feed URLs.

  • Light ping: Send a feed URL in some format.
  • Fat ping: Send a feed URL in some format.
  • Result: More or less equivalent, though some interop issues (e.g., SOAP) could exist.

Subscriber complexity

  • Light ping

    • Subscription protocol code
    • Feed fetching pipeline
    • Feed parsing code
  • Fat ping

    • Subscription protocol code
    • Parsing code
  • Result

    • Fat pinging does not require the complexity of a feed fetching pipeline.
    • This is significant because fetching feeds efficiently and asynchronously can be hard, if not impossible, on simple hosting providers.
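
To illustrate how little a fat-ping subscriber needs, here is a minimal sketch of a callback handler written as a plain function, so it can sit behind any web framework. It assumes a PubSubHubbub-style flow in which the hub verifies a subscription with a GET carrying a `hub.challenge` parameter and delivers content with a POST. The names `handle_callback` and `received_entries` are hypothetical, and a real subscriber would also check the topic and any signature the hub provides.

```python
from urllib.parse import parse_qs

received_entries = []  # stand-in for the application's own storage


def handle_callback(method: str, query_string: str, body: bytes):
    """Return (status_code, response_body) for one request to the callback URL."""
    if method == "GET":
        # Subscription verification: echo the hub's challenge back.
        challenge = parse_qs(query_string).get("hub.challenge")
        if not challenge:
            return 400, b"missing hub.challenge"
        return 200, challenge[0].encode("utf-8")
    if method == "POST":
        # Fat ping: the body already carries the new feed content,
        # so there is nothing to fetch, only something to parse and store.
        received_entries.append(body)
        return 204, b""
    return 405, b""
```

There is no fetch queue, retry logic, or polling schedule anywhere in this sketch, which is the point made in the list above.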

Hub complexity

Assume that Hubs must verify that the original feed has changed; otherwise they are just an open relay for DoS attacks.

  • Light ping:

    • Receive ping
    • Verify feed document has changed (one hash of the text contents)
    • List subscribers
    • Send ping to subscribers
  • Fat ping:

    • Receive ping
    • Parse feed document, determine if individual entries have changed (multiple hashes, one for each item)
    • List subscribers
    • Send new content to subscribers
  • Result:

    • Roughly the same.
    • Fat pinging requires a reparse of the feed on each publish notification. Light pinging does not; a light-pinging Hub only needs to check whether a hash of the whole content has changed.
    • Light pinging requires only one hash per feed, instead of one per item for fat pinging. Fat-pinging Hubs can limit storage usage by capping the total storage allowed per feed.
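
The difference in change detection can be sketched as follows. This is an illustration under stated assumptions, not how any particular hub is implemented: the class names are hypothetical, SHA-256 is an arbitrary choice of hash, and the feed is assumed to be already parsed into one string per item for the fat-ping case.

```python
import hashlib


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LightPingHub:
    """Stores one hash per feed and only answers 'did anything change?'."""

    def __init__(self):
        self.feed_hash = {}  # feed URL -> hash of the whole document

    def changed(self, url: str, document: str) -> bool:
        new_hash = digest(document)
        if self.feed_hash.get(url) == new_hash:
            return False
        self.feed_hash[url] = new_hash
        return True


class FatPingHub:
    """Stores one hash per item so it can pick out the new entries."""

    def __init__(self):
        self.item_hashes = {}  # feed URL -> set of item hashes seen so far

    def new_items(self, url: str, items):
        seen = self.item_hashes.setdefault(url, set())
        fresh = [item for item in items if digest(item) not in seen]
        seen.update(digest(item) for item in fresh)
        return fresh
```

The per-feed set in `FatPingHub` is where a storage cap would apply, for example by discarding the oldest hashes once a feed exceeds its allowance.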
