Timers seem to get clobbered in Linux multi-threaded application #164

@TheRealZago

I've taken inspiration from the Linux port in issue #140, and quickly turned it into a delta-timer, using OS-level timers in nanosecond(!) precision (timer_create and friends), where I got back to <2% timer variation.
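For readers unfamiliar with the POSIX calls mentioned above, here is a minimal sketch of a one-shot, nanosecond-resolution timer built on `timer_create` and `timer_settime`. It is not the code from the linked project: the names `oneshot_from_ns`, `hal_timer_create` and `hal_timer_reload` are hypothetical, and `SIGEV_NONE` is used only to keep the sketch free of signals and threads, whereas a real port would presumably request a signal or thread notification.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: build a one-shot itimerspec from a delay in ns. */
static struct itimerspec oneshot_from_ns(int64_t delay_ns)
{
    struct itimerspec its;
    memset(&its, 0, sizeof its);    /* it_interval == 0 means one-shot */
    its.it_value.tv_sec  = (time_t)(delay_ns / SKETCH_NSEC_PER_SEC);
    its.it_value.tv_nsec = (long)(delay_ns % SKETCH_NSEC_PER_SEC);
    return its;
}

/* Create a timer on the monotonic clock.
 * Assumption: SIGEV_NONE (no notification) is enough for this sketch. */
static int hal_timer_create(timer_t *out)
{
    struct sigevent sev;
    assert(out != NULL);
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_NONE;
    return timer_create(CLOCK_MONOTONIC, &sev, out);
}

/* Re-arm the timer for the next delta.
 * An it_value of zero disarms a POSIX timer instead of firing
 * immediately, so non-positive deltas are clamped to 1 ns here. */
static int hal_timer_reload(timer_t t, int64_t delay_ns)
{
    struct itimerspec its;
    if (delay_ns <= 0) {
        delay_ns = 1;
    }
    its = oneshot_from_ns(delay_ns);
    return timer_settime(t, 0, &its, NULL);
}
```

The clamp in `hal_timer_reload` is a design choice of this sketch, not something taken from the stack: it only guards against the documented behavior that a zero `it_value` disarms the timer.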

The result looked pretty good, until I realized that after ~2 hours of free-running the PDOs were no longer triggering. The observed behavior alternates between two cases: (1) no TPDO ever transmits again until the application is restarted, or (2) one of the 2 active TPDOs stops while the other keeps going normally, and a quick jump between Op-PreOp-Op might restore it temporarily.
In short, the soft-timers seem to get "corrupted" and the HAL timer never gets rearmed by the stack.
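One way to check the "never gets rearmed" part from inside the running application is to poll the OS timer with `timer_gettime`, which reports a zero `it_value` for a disarmed timer. A minimal sketch, assuming the port keeps its `timer_t` handle reachable; `hal_timer_remaining_ns` is a hypothetical name, not part of the stack or of the linked project:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Query how long the timer has left before it expires.
 * Returns false if the query itself failed. On success,
 * *remaining_ns == 0 means the timer is currently disarmed. */
static bool hal_timer_remaining_ns(timer_t t, int64_t *remaining_ns)
{
    struct itimerspec cur;
    assert(remaining_ns != NULL);
    if (timer_gettime(t, &cur) != 0) {
        return false;
    }
    *remaining_ns = (int64_t)cur.it_value.tv_sec * 1000000000LL
                  + (int64_t)cur.it_value.tv_nsec;
    return true;
}
```

Logging this value from the main loop whenever the TPDOs go silent would show whether the OS timer is really disarmed or is armed but no longer serviced.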

I can't provide the real application due to NDA, but I've reproduced the problem in this reduced project, which behaves pretty much the same: https://github.com/TheRealZago/canopen-timers. I've left some debugging notes I've acquired over the last 3 weeks of analyzing this problem, but it's extremely annoying to reproduce and debug.

If anyone has deployed this stack in a Linux environment, did you ever encounter this issue?
Otherwise, what interaction should I be tracking more in detail in the stack for figuring out why the timers seem to get corrupted?

Before getting lapidated, I'm not expecting an "I HAZ CODES" solution, but I'd be very happy to get input from "experts" who've been working with this project for longer than I have... 😄
