Feature: Improve robustness of MCAP Parser #6

simmsa · 2025-10-22T22:22:06Z

This PR improves the modaq_toolkit MCAP parser's robustness when handling edge cases in ROS2 message processing, adds flexible topic filtering, and enhances type conversion from ROS to numpy data types.

New Features

1. Topic Filtering with `topics_to_skip`

The MCAPParser now accepts an optional topics_to_skip parameter that allows you to exclude specific topics from processing:

from modaq_toolkit import MCAPParser
from pathlib import Path

# Skip specific topics during parsing
parser = MCAPParser(
    mcap_path=Path("my_bag.mcap"),
    topics_to_skip=["/diagnostics", "/rosout", "/parameter_events"]
)

# Process the file - skipped topics won't appear in output
dataframes = parser.get_dataframes()

2. Automatic ROS Constants Filtering

The parser now automatically skips ROS message constant definitions (which are not actual message fields). These include common logging level constants like:

DEBUG=10
INFO=20
WARN=30
ERROR=40
FATAL=50

These constants appear in ROS message schemas but aren't data fields. Previously, the parser would try to extract these as fields and fail. Now they're automatically detected and skipped during schema parsing.
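
As a rough illustration of the idea (not the parser's actual implementation), constants can be told apart from fields because a constant line has the form `type NAME=value`. The helper name `split_fields_and_constants`, the regular expression, and the sample definition below are all assumptions made for this sketch:

```python
import re

# Assumed pattern for a ROS constant declaration such as "byte DEBUG=10":
# a type, an upper-case name, then "=" and a value.
CONSTANT_RE = re.compile(r"^\s*[\w/]+\s+[A-Z][A-Z0-9_]*\s*=\s*\S+")


def split_fields_and_constants(definition: str):
    """Return (fields, constants) parsed from a message definition string."""
    fields, constants = [], []
    for raw_line in definition.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if CONSTANT_RE.match(line):
            type_name, rest = line.split(None, 1)
            name, value = (part.strip() for part in rest.split("=", 1))
            constants.append((type_name, name, value))
        else:
            type_name, name = line.split()[:2]
            fields.append((type_name, name))
    return fields, constants


# Illustrative definition modeled on a logging message (hypothetical sample).
LOG_DEFINITION = """\
byte DEBUG=10
byte INFO=20
byte WARN=30
byte ERROR=40
byte FATAL=50
builtin_interfaces/Time stamp
uint8 level
string name
string msg
"""

fields, constants = split_fields_and_constants(LOG_DEFINITION)
field_names = [name for _, name in fields]
```

Only the entries in `fields` would then be treated as data columns; the entries in `constants` are ignored.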

3. ROS to Numpy Type Conversion

The MessageProcessor now includes a type mapping system (ros_type_to_numpy_type_map) that converts ROS primitive types to their corresponding numpy dtypes:

import numpy as np

ros_type_to_numpy_type_map = {
    "float64": np.float64,
    # Additional mappings can be added for int8, int16, int32, float32, etc.
}

When processing messages, fields with recognized ROS types are automatically converted to numpy arrays with the correct dtype:

Precision is set explicitly from the MCAP schema (e.g., uint64 remains uint64 rather than being automatically converted to int64).
`mcap list schemas <path_to_mcap>` prints the schema in the CLI for reference.

Initially only float64 and uint64 are converted. Other data types may work, but we need to verify that this doesn't break any object/string parsing.
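
A minimal sketch of why the explicit dtype matters, assuming a lookup table like the one above. The names `ROS_TO_NUMPY` and `to_typed_array` are hypothetical and only cover the two types named here; unrecognized types fall back to `object` so strings are left alone:

```python
import numpy as np

# Hypothetical mapping limited to the two types mentioned in this PR.
ROS_TO_NUMPY = {
    "float64": np.float64,
    "uint64": np.uint64,
}


def to_typed_array(values, ros_type):
    """Build an array typed per the schema; unknown types stay as objects."""
    dtype = ROS_TO_NUMPY.get(ros_type)
    if dtype is None:
        return np.asarray(values, dtype=object)
    return np.asarray(values, dtype=dtype)


big = 2**63 + 5  # larger than the int64 maximum, valid as uint64
counts = to_typed_array([big, 1], "uint64")
names = to_typed_array(["a", "b"], "string")
```

Here `counts` keeps the full unsigned value, which would not be representable if the column were stored as int64.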

This catches an edge case where an MCAP file may have both a scalar and an array that need to be concatenated into a single array. Scalars are converted to arrays prior to concatenation.
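
The normalization step can be sketched with plain numpy as follows. `normalize_to_arrays` is a hypothetical helper for illustration, not the function added in this PR; it relies on `np.atleast_1d` because `np.concatenate` rejects zero-dimensional inputs:

```python
import numpy as np


def normalize_to_arrays(chunks):
    """Promote scalars to 1-D arrays so every chunk can be concatenated."""
    return [np.atleast_1d(np.asarray(chunk)) for chunk in chunks]


# A mix of scalar and array values, as might be collected across messages.
mixed = [np.float64(1.5), np.array([2.5, 3.5]), 4.5]
combined = np.concatenate(normalize_to_arrays(mixed))
```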

This catches and handles edge cases in array expansion and makes every effort to output valid data.

Constants can be seen by using `mcap list schemas <files>`

simmsa · 2025-10-22T22:23:24Z

@cnicho35, this is ready for review. This adds support for the M2 output from the comparison test.

cnicho35 · 2025-10-23T14:06:40Z

Hi Andrew,

I did a quick test on the HPC and it doesn't appear to be working for the m1-m2-comparison mcap files.

import sys
from pathlib import Path

# --- Add your local toolkit folder to the Python path ---
# For example, if modaq_toolkit is in /projects/m2_ws/src/m2_core/
toolkit_path = Path(r"/projects/wpfacilities/m1_m2_comparison/MODAQ_toolkit/src/")  # <-- change as needed
sys.path.append(str(toolkit_path))

from modaq_toolkit import MCAPParser

# Skip specific topics during parsing
parser = MCAPParser(
    mcap_path=Path("/projects/wpfacilities/m1_m2_comparison/MODAQ_toolkit/tests/Bag_2025_07_23_17_10_38_184.mcap"),
    topics_to_skip=["/diagnostics", "/rosout", "/parameter_events"]
)

# Process the file - skipped topics won't appear in output
dataframes = parser.get_dataframes()

print("Parsed DataFrames with skipped topics:")
for df in dataframes:
    print(df)

This returns an empty df with no error message.

Also, can you please add the topics_to_skip input to the process_mcap_files function?

cnicho35 · 2025-10-23T14:10:05Z

Also, can you please remove this print function:
https://github.com/simmsa/MODAQ_toolkit/blob/44ac8ab9da63c6ca34c16d1aa3805a9758017e89/src/modaq_toolkit/message_processing.py#L270


simmsa · 2025-10-23T15:22:59Z

@cnicho35 I think I fixed all of the above bugs. On Kestrel with the latest version of this code this yields:

python SingleFileTest.py
INFO:modaq_toolkit.parser:Processing MCAP file: /projects/wpfacilities/m1_m2_comparison/MODAQ_toolkit/tests/Bag_2025_07_23_17_10_38_184.mcap
INFO:modaq_toolkit.parser:Found 7 topics: /ain_1_fast, /rosout, /ain_2_fast, /events/write_split, /system_messenger, /parameter_events, /time_status
INFO:modaq_toolkit.parser:Topic /ain_1_fast DataFrame shape: (6003, 14)
INFO:modaq_toolkit.parser:Topic /ain_2_fast DataFrame shape: (14649, 14)
INFO:modaq_toolkit.parser:Topic /events/write_split DataFrame shape: (1, 2)
INFO:modaq_toolkit.parser:Topic /system_messenger DataFrame shape: (2, 10)
INFO:modaq_toolkit.parser:Topic /time_status DataFrame shape: (2, 10)
Parsed DataFrames with skipped topics:
/ain_1_fast
/ain_2_fast
/events/write_split
/system_messenger
/time_status

I'm going to run this on the SURF-WEC dataset to verify it works there as well.

This "should" work for all the m2_comparison files, but you may hit one-off edge cases.

I'm going to bump this to 0.4, and update the PR notes as this includes some larger changes than just a patch.

LMK if you see any other areas of improvement.

simmsa added 12 commits October 22, 2025 15:55

MCAP: Parse "stamp" to a datetime 'time' column

bc4e2df

MCAP: Add typical schema dict entry as an example

a08be81

MCAP: Correctly type MCAP data per the schema

7aa2747

Initially only change float64 and uint64. Other data types may work, but we need to verify that this doesn't break any object/string parsing

MCAP: Add array "normalization"

e6d04c7

This catches an edge case where an MCAP file may have both a scalar and an array that need to be concatenated into a single array. Scalars are converted to arrays prior to concatenation

MCAP: Add a more complete array expansion function

6655011

This catches and handles edge cases in array expansion and makes every effort to output valid data

MCAP: Skip known ROS_CONSTANTS

f2eac84

Constants can be seen by using `mcap list schemas <files>`

MCAP: Add topics_to_skip to MCAPParser

810e691

MCAP: Add more guards against empty dataframes

9b4717c

MCAP: Allow output of dataframes without a datetime index

90fe8f1

MCAP: Add additional debug output for parsing loop

9404c4a

Dev: Fix init.py module order

f15b9bc

Release: Bump to 0.3.1

44ac8ab

simmsa requested a review from cnicho35 October 22, 2025 22:22

simmsa marked this pull request as draft October 22, 2025 22:45

simmsa added 8 commits October 22, 2025 17:04

MCAP: Handle numpy edge case in array conversion

d40bf47

MCAP: Warn if the topic is empty after stage 2

df862e7

MCAP: Add array data length check

36d44d3

MCAP: Add docstring for parse_ros_message_definition

abb2375

MCAP: Use recursion to process nested message definitions

5a84b09

MCAP: Enable type conversion for all np types

5434cfe

MCAP: Use recursive types and np types to parse messages to ROS types

727ea81

MCAP: Add options to standardize time in stage 2 data

720adf4

MCAP: Fix, don't use .item()

f8a5df0

simmsa added 4 commits October 23, 2025 08:15

MCAP: Preserve dtypes between multiple arrays

f160514

MCAP: Use private MessageSpecification class to extract schema types

9d98e26

MCAP: Add topics_to_skip to process_mcap_and_get_dataframes

7f778b1

MCAP: Add topics_to_skip to process_mcap_and_get_dataframes

a2bc5f7

simmsa added 3 commits October 23, 2025 08:53

MCAP: Add topics_to_skip and stage1, stage 2 args to public api functions

c991ce1

MCAP: Add topics to skip and stage 1, stage 2 args to cli

067d246

MCAP: Automatically call read_mcap() if it hasn't been called

83666f9

simmsa added 2 commits November 19, 2025 16:31

Time: By default keep any original time columns

3e64e3e

Time: By default keep original time columns in output data

519627e
