nvexec.maxwell_gpu_m has seg fault due to multi_gpu_stream_scheduler::query bug #1722

@dkolsen-pgi

Description

PR #1683 (I think) breaks the example nvexec.maxwell_gpu_m, causing a segfault at runtime (or a libstdc++ assertion failure when built with the hardened standard library in GCC 15).

The buggy code is in the classes stream_scheduler and multi_gpu_stream_scheduler and in their common base class stream_scheduler_env. The base class has this code:

    struct stream_scheduler_env {
      // ...
      auto query(get_completion_scheduler_t<set_value_t>) const noexcept -> stream_scheduler;
      // ....
    };
    // ...
    inline auto stream_scheduler_env::query(get_completion_scheduler_t<set_value_t>) const noexcept
      -> stream_scheduler {
      return (const stream_scheduler&) *this;
    }

The query function downcasts *this to const stream_scheduler& with a C-style cast, assuming without checking that the object actually is a stream_scheduler.

The two derived classes contain this code:

    struct stream_scheduler : private stream_scheduler_env {
      // ...
      using stream_scheduler_env::query;
      // ...
      // non-static data members:
      context_state_t context_state_;
    };
    struct multi_gpu_stream_scheduler : private stream_scheduler_env {
      // ...
      using stream_scheduler_env::query;
      // ...
      // non-static data members:
      int num_devices_{};
      context_state_t context_state_;
    };

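In __read_query_t::operator(), the statement return __attrs.query(_GetComplSch{}); can call this query function on a multi_gpu_stream_scheduler. That amounts to a bit cast of a multi_gpu_stream_scheduler to a stream_scheduler, which doesn't work because the data members aren't layout-compatible: multi_gpu_stream_scheduler stores num_devices_ ahead of context_state_, so the context_state_ copied into the returned stream_scheduler comes from the wrong bytes.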

This was found in NVC++ testing, where the example nvexec.maxwell_gpu_m consistently fails with a segfault at runtime. The problem wasn't noticed earlier because other problems with our stdexec tests (on our side, not in stdexec) were masking the failures. NVHPC tracking: FS#38096
