nodeset_node_associate_pubsub_chanhead: Assertion `ch->redis.slist.in_disconnected_pubsub_list == 0' failed in certain cases #700

@mkdewidar

Hi,

We run a production system with OpenResty 1.25.3.2 and Nchan 1.3.7. We run hundreds of containers per day, each serving tens of thousands of connections, with Redis as the Nchan backend.

We see failures every one to two hours, every day, always with the following sequence of errors:

nchan: Redis slave node <redis node> channel /<channel name> is SUBSCRIBING, but status was set to UNSUBSCRIBED
nchan: Redis slave node <redis node> expected previous pubsub_status for channel <memory address> (id: /<channel name>) to be REDIS_PUBSUB_SUBSCRIBING (0), was 2
nchan-src/src/store/redis/redis_nodeset.c:3638: nodeset_node_associate_pubsub_chanhead: Assertion `ch->redis.slist.in_disconnected_pubsub_list == 0' failed.
<container terminates>

We cannot see any errors in the Redis logs around this time, nor any errors indicating that Nchan had connection drops before this sequence of logs.

This looks like a very hard-to-reproduce bug: if a reconnect happens at the exact moment the connection is being reaped, Nchan seems to end up in a bad state and exits.
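
To make the suspected sequence concrete, here is a toy model in Python. It is not Nchan's code: `ToyChannel`, `begin_subscribe`, `node_disconnected`, `subscribe_reply` and `replay_race` are hypothetical names, and the interleaving is only an assumption drawn from the three log lines above. The status values 0 and 2 are taken from the log output, and the value 1 for SUBSCRIBED is a guess.

```python
from dataclasses import dataclass
from enum import Enum


class PubsubStatus(Enum):
    # 0 and 2 mirror the numbers printed in the log lines;
    # 1 for SUBSCRIBED is a guess.
    SUBSCRIBING = 0
    SUBSCRIBED = 1
    UNSUBSCRIBED = 2


@dataclass
class ToyChannel:
    """Hypothetical stand-in for a channel head; not Nchan's real struct."""
    name: str
    status: PubsubStatus = PubsubStatus.UNSUBSCRIBED
    in_disconnected_list: bool = False


def begin_subscribe(ch):
    # A SUBSCRIBE is sent to a node; the reply is still in flight.
    ch.status = PubsubStatus.SUBSCRIBING


def node_disconnected(ch):
    # Assumed cleanup path: the channel is detached from the node and
    # queued on a "disconnected" list for later resubscription.
    ch.status = PubsubStatus.UNSUBSCRIBED
    ch.in_disconnected_list = True


def subscribe_reply(ch):
    # Late reply to the earlier SUBSCRIBE: the handler expects the channel
    # to still be SUBSCRIBING, then associates it with the node.
    warnings = []
    if ch.status is not PubsubStatus.SUBSCRIBING:
        warnings.append(f"expected SUBSCRIBING (0), was {ch.status.value}")
    # Invariant analogous to the failing assertion in the report.
    assert not ch.in_disconnected_list, "channel still in disconnected list"
    ch.status = PubsubStatus.SUBSCRIBED
    return warnings


def replay_race():
    """Run the assumed interleaving; return True if the invariant trips."""
    ch = ToyChannel("/example")
    begin_subscribe(ch)      # 1. subscribe in flight
    node_disconnected(ch)    # 2. connection reaped in between
    try:
        subscribe_reply(ch)  # 3. reply handled after the cleanup
    except AssertionError:
        return True
    return False
```

In this model `replay_race()` returns `True`: the invariant trips whenever the disconnect handling runs between the SUBSCRIBE being sent and its reply being processed. Whether Nchan can actually interleave these steps this way is the open question.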

Unfortunately I'm unable to replicate this locally, and cannot enable debug logs either due to the volume of traffic our system handles.

Do you have any thoughts on this, or ideas about what could be causing it?

Thanks,
Mohamed
