Comparing Protocols

Comparison of PubSubHubbub to light-pinging protocols

This document compares the concrete differences between fat pinging (PubSubHubbub, XMPP pubsub) and light pinging (rssCloud, XML-RPC pings, changes.xml, SUP, SLAP). It aims to present an evaluation of these protocols that is easy to understand.

The core difference is how new information from feeds is delivered from a publisher to a subscriber:

Light pings: Send the URL of the feed that has updated to the subscriber.
Fat pings: Send the updated content of the feed to the subscriber.

Each protocol can also be judged on a further set of criteria. In the table below, (+) marks a strength and (-) marks a weakness.

| Consideration | XML-RPC ping | changes.xml | SUP | SLAP | XMPP pubsub | rssCloud | PubSubHubbub |
|---|---|---|---|---|---|---|---|
| Transport | (+) HTTP | (+) HTTP | (+) HTTP/HTTPS | (-) UDP | TCP/XMPP | (+) HTTP | (+) HTTP/HTTPS |
| Distribution style | Ping/Poll | (-) Polling | (-) Polling | Ping/Poll | (+) Push | Ping/Poll | (+) Push |
| Latency | Low | (-) High | (-) High | Low | (+) Minimum possible | Low | (+) Minimum possible |
| Thundering herd | (-) Yes | (-) Yes | (-) Yes | (-) Yes | (+) No | (-) Yes | (+) No |
| Spammable (no topics) | (-) Yes | (-) Yes | (+) No | (+) No | (+) No | (+) No | (+) No |
| DoSes publishers | Preventable | (+) No | (+) No | Preventable | Preventable | Preventable | Preventable |
| DoS relay attacks | (-) Yes | (+) No | (+) No | (+) No | (+) No | (-) Yes | (+) No |
| Possible to implement on $5/month hosting | (-) No | (-) No | (-) No | (-) No | (-) No | Maybe | (+) Yes |
| Message format | XML schema | XML schema | JSON | (-) Binary packet | (-) Complex XMPP | XML schema | (+) Original RSS or Atom content |
| Secure notifications | (-) No | (-) No | Somewhat | (-) No | (+) Yes | (-) No | (+) Yes |
| Publisher complexity | XML-RPC client | XML-RPC client | SUP IDs | (-) UDP send | (-) XMPP send | XML-RPC/(+) REST ping | (+) REST ping |
| Subscriber complexity | (-) Crawl pipeline | (-) Crawl pipeline | (-) Crawl pipeline | (-) Crawl pipeline | XMPP client | (-) Crawl pipeline | (+) Simple webapp |

The rest of this document will compare light and fat pinging by these metrics:

Latency
Bandwidth
CPU Usage
Publisher complexity
Subscriber complexity
Hub complexity

Latency

To simplify this explanation, latency is represented as network "hops": the time it takes on average for data to propagate between two Internet nodes.

Naively, light pings and fat pings look the same:

Naive protocol flow for light pings

Naive protocol flow for PuSH

There are four network hops required to deliver new content to a subscriber.

However, this leaves light pinging hubs open to relay denial of service attacks, so the Hub must verify there is new content:

Light ping protocol flow with proxying

Result:
- Light pings require at least six network hops.
- Fat pings require at most four network hops.
- Fat pinging takes 33% fewer hops than light pinging with verification (four versus six).

Often publishers will be combined with their own hub (for better integration with their application, better statistics gathering, optimizations) yielding:

Light ping: 3 hops.

Light pings with publisher-owned hub

Fat ping: 1 hop.

Fat pings with publisher-owned hub

Result: Fat ping takes about 67% fewer hops (one versus three).
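
The hop comparisons above can be checked with a little arithmetic. The sketch below is only an illustration: the function name `hop_reduction` is made up for this example, and the hop counts are the ones quoted in the text (six versus four with a third-party hub, three versus one with a publisher-owned hub).

```python
def hop_reduction(light_hops: int, fat_hops: int) -> float:
    """Percentage of network hops saved by fat pings relative to light pings."""
    return 100.0 * (light_hops - fat_hops) / light_hops


# Hop counts taken from the text above.
third_party_hub = hop_reduction(light_hops=6, fat_hops=4)
publisher_hub = hop_reduction(light_hops=3, fat_hops=1)

print(f"third-party hub: {third_party_hub:.0f}% fewer hops")
print(f"publisher-owned hub: {publisher_hub:.0f}% fewer hops")
```

Treating every hop as equally slow is the same simplification the text makes; real latency depends on the actual network paths.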

With popular sites, a feed will be served from multiple datacenters:

Light ping case:
- Wait for propagation delay to fill all caches before sending pings, meaning the whole system operates as fast as the slowest node
- All caches represent a dependent point of failure, meaning more waiting and retries

Protocol flow for light pings with multiple datacenters

Fat ping case:
- Integrated hub may immediately send fat pings before feeds are updated externally

Protocol flow for fat pings in multiple datacenters

Result:
- Latency incurred by caching/replication delays can be zero with fat pings.
- In the light pinging case, it is always non-zero.
- This is irrelevant for single-host sites, but the problem grows as a site gets larger.
- Mitigating this problem for light pings requires specialized knowledge of datacenter topology, which violates the abstraction of DNS.

Bandwidth

Assume an average feed is 100KB consisting of fifty 2KB posts.

Take the case of a single new item with 2KB of data and 100 subscribers to the feed:

Light ping: 10MB served by publisher

Light ping bandwidth calculation

Fat ping: 200KB served by publisher

PSHB bandwidth calculation

Result:
- Fat ping requires 98% less data (i.e., light ping requires 50x more).
- Light pinging requires publisher to serve 100x more HTTP requests (the "thundering herd").
- The 50x bandwidth ratio remains the same even with 1,000,000 subscribers to a feed.
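
The bandwidth figures above follow from simple multiplication. This is a minimal sketch using the numbers assumed in the text (a 100KB feed, one new 2KB item, 100 subscribers); the function names are illustrative only, and 1MB is taken as 1,000KB.

```python
FEED_KB = 100      # size of the whole feed, from the text
ITEM_KB = 2        # size of the single new item, from the text
SUBSCRIBERS = 100


def light_ping_kb(subscribers: int) -> int:
    # Every subscriber re-fetches the whole feed after being pinged.
    return subscribers * FEED_KB


def fat_ping_kb(subscribers: int) -> int:
    # Every subscriber receives only the new item.
    return subscribers * ITEM_KB


light_kb = light_ping_kb(SUBSCRIBERS)
fat_kb = fat_ping_kb(SUBSCRIBERS)
ratio = light_kb / fat_kb
saving_percent = 100 * (1 - fat_kb / light_kb)

print(f"light ping: {light_kb} KB, fat ping: {fat_kb} KB")
print(f"ratio: {ratio:.0f}x, saving: {saving_percent:.0f}%")
```

Because both totals scale linearly with the subscriber count, the ratio depends only on the feed size and the item size.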

To prevent denial of service attacks, light pinging Hubs must verify there is new content:

Light ping verification bandwidth

Result:
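- Even worse than the naive light pinging case, because the hub's verification step is added on top.
- Subscribers still incur the same bandwidth overhead as with naive light pings.
- Fat pinging takes 33% fewer hops than light pinging with verification (four versus six).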

Light-ping advocates suggest that the Hub should re-serve only the new content on behalf of the publisher:

Light pings with proxying bandwidth calculation

Result:
- Bandwidth equal to fat pings.
- Still 100x as many incoming HTTP requests as fat pings.
- Naive fat pings still have 33% lower latency (four hops versus six).
- Combined publisher/hub fat pings still have about 67% lower latency.
- The trust/security model for a feed proxied on behalf of the publisher is unclear.

CPU Usage

Assume parsing a feed takes 10ms per item on average. For this section, assume an average feed has 25 items, so parsing a whole feed takes 250ms.

Take the case of a single new item being sent to 100 subscribers to the feed with naive pings:

Light ping: 25 seconds of CPU time consumed by subscribers (250ms each).

Light ping CPU usage calculation

Fat ping: 1.25 seconds of CPU time consumed total; 250ms by the hub, 10ms by each subscriber.

Naive PSHB CPU calculation

Result:
- Fat pings require 95% less CPU (i.e., light pinging requires 20x).
- Fat pings are 96% cheaper for subscribers (i.e., light pinging requires 25x).
- Light pinging requires publisher to serve 100x more HTTP requests (the "thundering herd") which have overhead.
- The advantage holds even with 1,000,000 subscribers to a feed, where the total CPU ratio approaches 25x.
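
The CPU totals for the naive case can be reproduced the same way. This sketch assumes the figures stated in the text (10ms per item, 25 items per feed, 100 subscribers); the function names are hypothetical.

```python
MS_PER_ITEM = 10      # parse cost per item, from the text
ITEMS_PER_FEED = 25   # items in an average feed, from the text
SUBSCRIBERS = 100


def light_ping_cpu_ms(subscribers: int) -> int:
    # Every subscriber fetches and parses the whole feed.
    return subscribers * ITEMS_PER_FEED * MS_PER_ITEM


def fat_ping_cpu_ms(subscribers: int) -> int:
    # The hub parses the whole feed once; each subscriber parses one new item.
    return ITEMS_PER_FEED * MS_PER_ITEM + subscribers * MS_PER_ITEM


light_ms = light_ping_cpu_ms(SUBSCRIBERS)
fat_ms = fat_ping_cpu_ms(SUBSCRIBERS)
total_ratio = light_ms / fat_ms
subscriber_saving = 100 * (1 - MS_PER_ITEM / (ITEMS_PER_FEED * MS_PER_ITEM))

print(f"light ping: {light_ms / 1000} s, fat ping: {fat_ms / 1000} s")
print(f"total ratio: {total_ratio:.0f}x, per-subscriber saving: {subscriber_saving:.0f}%")
```

The hub's one-off parse is a fixed cost, so its share of the fat ping total shrinks as the number of subscribers grows.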

However, the Hub must verify the feed is new to make this safe:

Light pings with verification CPU usage calculation

Result:
- Light pings consume 25.25 CPU seconds: 250ms by the hub and 250ms by each subscriber.
- Even worse than the naive case.

And when, for light pings, Hubs re-serve only the new content on behalf of the publisher:

Light pings with verification and proxying CPU usage calculation

Result:
- CPU usage equal to fat pings.
- Still 100x as many incoming HTTP requests as fat pings.
- Naive fat pings still have 33% lower latency (four hops versus six).
- Combined publisher/hub fat pings still have about 67% lower latency.
- The trust/security model for a feed proxied on behalf of the publisher is unclear.

Publisher complexity

Assume the publisher tells hubs the feed URLs.

Light ping: Send a feed URL in some format.
Fat ping: Send a feed URL in some format.
Result: More or less equivalent, though some interop issues (e.g., SOAP) could exist.
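
As an illustration of how small the publisher side can be, here is a sketch of a PubSubHubbub-style publish ping: a form-encoded POST whose `hub.mode` is `publish` and whose `hub.url` names the updated feed. The hub and feed URLs are hypothetical placeholders, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) a publish notification for one feed URL."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical hub and feed URLs, for illustration only.
ping = build_publish_ping("https://hub.example.com/", "https://blog.example.com/feed.xml")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
```

Passing the request to `urllib.request.urlopen` would send it. A light ping would look much the same, with only the parameter names or wire format differing by protocol.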

Subscriber complexity

Light ping:
- Subscription protocol code
- Feed fetching pipeline
- Feed parsing code
Fat ping:
- Subscription protocol code
- Parsing code
Result:
- Fat pinging does not require the complexity of a feed fetching pipeline.
- This matters because fetching feeds efficiently and asynchronously can be hard, if not impossible, on simple hosting providers.
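
To make the fat ping list above concrete, here is a rough sketch of the two pieces a PubSubHubbub-style subscriber needs: answering the hub's verification request by echoing `hub.challenge`, and parsing the pushed Atom content. The function names, the topic URL, and the sample payload are assumptions made for this example; a real subscriber would wrap these in whatever web framework its host provides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

ATOM = "{http://www.w3.org/2005/Atom}"


def handle_verification(query_string: str, expected_topic: str):
    """Return (status, body) for the hub's subscription verification GET."""
    params = parse_qs(query_string)
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if topic != expected_topic or not challenge:
        return 404, ""       # not a subscription we asked for
    return 200, challenge    # echoing the challenge confirms the subscription


def handle_notification(body: bytes):
    """Return the titles of the Atom entries pushed by the hub."""
    root = ET.fromstring(body)
    return [entry.findtext(ATOM + "title", default="") for entry in root.iter(ATOM + "entry")]


# Hypothetical topic URL and payload, for illustration only.
TOPIC = "https://blog.example.com/feed.xml"
query = "hub.mode=subscribe&hub.topic=https%3A%2F%2Fblog.example.com%2Ffeed.xml&hub.challenge=abc123"
pushed = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>New post</title></entry>
</feed>"""

print(handle_verification(query, TOPIC))
print(handle_notification(pushed))
```

Nothing here fetches a feed: the subscriber only reacts to incoming requests, which is why no crawl pipeline is needed.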

Hub complexity

Assume that hubs must verify that the original feed has changed; otherwise they are just open relays for DoS attacks.

Light ping:
- Receive ping
- Verify feed document has changed (one hash of the text contents)
- List subscribers
- Send ping to subscribers
Fat ping:
- Receive ping
- Parse feed document, determine if individual entries have changed (multiple hashes, one for each item)
- List subscribers
- Send new content to subscribers
Result:
- Roughly the same.
- Fat pinging requires the hub to reparse the feed on each publish notification. Light pinging does not: the hub only needs to check whether a hash of the whole content has changed.
- Light pinging requires only one hash per feed, whereas fat pinging requires one per item. A fat pinging hub can limit its storage usage by capping the total storage allowed per feed.
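
The difference in hub bookkeeping can be sketched as follows. This is an illustrative toy under stated assumptions, not how any particular hub is implemented: the class names are made up, SHA-256 is an arbitrary choice of hash, state is kept in memory, and the fat ping hub is assumed to receive items that have already been parsed out of the feed.

```python
import hashlib


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LightPingHub:
    """Keeps one hash per feed and only answers 'did anything change?'."""

    def __init__(self):
        self.feed_hash = {}

    def has_changed(self, feed_url, feed_text):
        new_hash = digest(feed_text)
        changed = self.feed_hash.get(feed_url) != new_hash
        self.feed_hash[feed_url] = new_hash
        return changed


class FatPingHub:
    """Keeps one hash per item so it can pick out exactly the new entries."""

    def __init__(self):
        self.item_hashes = {}

    def new_items(self, feed_url, items):
        seen = self.item_hashes.setdefault(feed_url, set())
        fresh = [item for item in items if digest(item) not in seen]
        seen.update(digest(item) for item in items)
        return fresh


light = LightPingHub()
fat = FatPingHub()
url = "https://blog.example.com/feed.xml"  # hypothetical feed URL

print(light.has_changed(url, "post1 post2"))       # first sighting counts as a change
print(fat.new_items(url, ["post1", "post2"]))
print(fat.new_items(url, ["post1", "post2", "post3"]))
```

The per-item set in `FatPingHub` grows with every entry ever seen, which is where a cap on storage per feed would apply.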
