nvexec.maxwell_gpu_m has seg fault due to multi_gpu_stream_scheduler::query bug #1722

PR #1683 (I think) breaks example `nvexec.maxwell_gpu_m`, causing a seg fault at runtime (or a libstdc++ assertion failure if using the hardened standard library in GCC 15).

The buggy code is in classes `stream_scheduler` and `multi_gpu_stream_scheduler` and in their common base class `stream_scheduler_env`.  The base class has this code:
```c++
    struct stream_scheduler_env {
      // ...
      auto query(get_completion_scheduler_t<set_value_t>) const noexcept -> stream_scheduler;
      // ....
    };
    // ...
    inline auto stream_scheduler_env::query(get_completion_scheduler_t<set_value_t>) const noexcept
      -> stream_scheduler {
      return (const stream_scheduler&) *this;
    }
```
The `query` function assumes without checking that the derived class is a `stream_scheduler`.
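One possible way to remove that unchecked assumption is to make the base class a template on the derived type (CRTP), so that `query` returns whichever scheduler it is actually part of. The following is only a minimal sketch, not the stdexec fix: `set_value_tag`, `scheduler_env_base`, `single_sched`, and `multi_sched` are hypothetical stand-ins, and public inheritance is used here to keep the `static_cast` simple.

```c++
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for get_completion_scheduler_t<set_value_t>.
struct set_value_tag {};

// Hypothetical CRTP base: the derived type is a template parameter,
// so the downcast target is never guessed.
template <class Derived>
struct scheduler_env_base {
  // Returns a copy of the scheduler this base subobject belongs to.
  auto query(set_value_tag) const noexcept -> Derived {
    return static_cast<const Derived&>(*this);
  }
};

// Hypothetical simplified schedulers with different data members.
struct single_sched : scheduler_env_base<single_sched> {
  int context_ = 1;
};

struct multi_sched : scheduler_env_base<multi_sched> {
  int num_devices_ = 2;
  int context_ = 3;
};

// query() on a multi_sched yields a multi_sched, not a single_sched.
static_assert(
  std::is_same<decltype(multi_sched{}.query(set_value_tag{})), multi_sched>::value,
  "query must return the actual derived scheduler type");
```

With this shape, each derived class gets its own `query` overload with a matching return type, at the cost of the two schedulers no longer sharing a single non-template base.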

The two derived classes contain this code:
```c++
    struct stream_scheduler : private stream_scheduler_env {
      // ...
      using stream_scheduler_env::query;
      // ...
      // non-static data members:
      context_state_t context_state_;
    };
```
```c++
    struct multi_gpu_stream_scheduler : private stream_scheduler_env {
      // ...
      using stream_scheduler_env::query;
      // ...
      // non-static data members:
      int num_devices_{};
      context_state_t context_state_;
    };
```

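`__read_query_t::operator()` has `return __attrs.query(_GetComplSch{});`, which can call the `query` function in question on a `multi_gpu_stream_scheduler`. The cast in `query` then effectively reinterprets the `multi_gpu_stream_scheduler` object as a `stream_scheduler`, which doesn't work because the data members aren't compatible: as listed above, `multi_gpu_stream_scheduler` has `num_devices_` ahead of `context_state_`, while `stream_scheduler` has only `context_state_`.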
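The layout mismatch can be illustrated without invoking the undefined behavior itself. The sketch below uses hypothetical stand-ins (`context_state_t` here is a made-up two-member struct, and the empty private base is omitted) and only compares where `context_state_` sits in each type:

```c++
#include <cassert>
#include <cstddef>

// Hypothetical stand-in; the real context_state_t has different members.
struct context_state_t {
  void* resource = nullptr;
  int priority = 0;
};

// Same member order as stream_scheduler in the report.
struct single_layout {
  context_state_t context_state_;
};

// Same member order as multi_gpu_stream_scheduler in the report.
struct multi_layout {
  int num_devices_{};
  context_state_t context_state_;
};

// Distance in bytes between the two positions of context_state_.
constexpr std::size_t context_offset_gap() {
  return offsetof(multi_layout, context_state_)
       - offsetof(single_layout, context_state_);
}

// context_state_ starts the single-GPU type but follows num_devices_
// in the multi-GPU type, so the two offsets cannot match.
static_assert(offsetof(single_layout, context_state_) == 0,
              "context_state_ is the first member");
static_assert(context_offset_gap() > 0,
              "context_state_ is displaced by num_devices_");
```

Reading a `multi_layout` object through a `single_layout` reference would therefore interpret `num_devices_` and any padding as the start of `context_state_`, which is consistent with the crash described here.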

This was found with NVC++ testing, where example `nvexec.maxwell_gpu_m` fails consistently with a runtime seg fault. The problem wasn't noticed earlier because other problems with our stdexec tests (on our side, not in stdexec itself) were masking the failures. NVHPC tracking: FS#38096

