@simmsa simmsa commented Oct 22, 2025

This PR improves the modaq_toolkit MCAP parser's robustness when handling edge cases in ROS2 message processing, adds flexible topic filtering, and enhances type conversion from ROS to numpy data types.

New Features

1. Topic Filtering with topics_to_skip

The MCAPParser now accepts an optional topics_to_skip parameter that allows you to exclude specific topics from processing:

from modaq_toolkit import MCAPParser
from pathlib import Path

# Skip specific topics during parsing
parser = MCAPParser(
    mcap_path=Path("my_bag.mcap"),
    topics_to_skip=["/diagnostics", "/rosout", "/parameter_events"]
)

# Process the file - skipped topics won't appear in output
dataframes = parser.get_dataframes()
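
As a rough illustration of the filtering behaviour described above, the sketch below drops skipped topics from a list of discovered topic names. The helper name `filter_topics` is hypothetical and is not part of modaq_toolkit; the parser's internal implementation may differ.

```python
from typing import Iterable, List, Optional


def filter_topics(
    available: Iterable[str],
    topics_to_skip: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the topics to process, preserving their original order.

    Hypothetical helper for illustration only; not part of modaq_toolkit.
    """
    skip = set(topics_to_skip or [])
    return [topic for topic in available if topic not in skip]


# Topic names as they might be discovered in an MCAP file.
found = [
    "/ain_1_fast",
    "/rosout",
    "/ain_2_fast",
    "/events/write_split",
    "/system_messenger",
    "/parameter_events",
    "/time_status",
]

# Skipping a topic that is not present ("/diagnostics") is harmless.
kept = filter_topics(found, ["/diagnostics", "/rosout", "/parameter_events"])
print(kept)
```

Using a set for the skip list keeps the membership check cheap, and passing `None` processes every topic.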

2. Automatic ROS Constants Filtering

The parser now automatically skips ROS message constant definitions (which are not actual message fields). These include common logging level constants like:

  • DEBUG=10
  • INFO=20
  • WARN=30
  • ERROR=40
  • FATAL=50

These constants appear in ROS message schemas but aren't data fields. Previously, the parser would try to extract these as fields and fail. Now they're automatically detected and skipped during schema parsing.
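
A minimal sketch of how constant definitions could be told apart from fields in ROS message definition text, assuming constants are written as `type NAME=value` with an uppercase name. The function names and the sample schema are illustrative assumptions, not the parser's actual code.

```python
import re

# After the type token, a constant looks like "NAME=value" with an
# uppercase name. Assumption: fields never use "=" after their name.
_CONSTANT_RE = re.compile(r"^[A-Z][A-Z0-9_]*\s*=")


def is_constant_line(line: str) -> bool:
    """Return True if a schema line defines a constant rather than a field."""
    body = line.partition("#")[0].strip()  # drop trailing comments
    parts = body.split(None, 1)            # [type, remainder]
    if len(parts) != 2:
        return False
    return bool(_CONSTANT_RE.match(parts[1]))


def field_lines(schema_text: str) -> list:
    """Keep only the lines that describe real message fields."""
    kept = []
    for line in schema_text.splitlines():
        body = line.partition("#")[0].strip()
        if not body or is_constant_line(body):
            continue
        kept.append(body)
    return kept


# Illustrative schema text resembling a ROS 2 log message definition.
schema = """\
byte DEBUG=10  # verbose output
byte INFO=20
byte WARN=30
byte ERROR=40
byte FATAL=50
builtin_interfaces/Time stamp
uint8 level
string name
string msg
"""

print(field_lines(schema))
```

Splitting off the type token first means a bounded type such as `string<=10 name` is not mistaken for a constant, even though it contains an equals sign.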

3. ROS to Numpy Type Conversion

The MessageProcessor now includes a type mapping system (ros_type_to_numpy_type_map) that converts ROS primitive types to their corresponding numpy dtypes:

ros_type_to_numpy_type_map = {
    "float64": np.float64,
    # Additional mappings can be added for int8, int16, int32, float32, etc.
}

When processing messages, fields with recognized ROS types are automatically converted to numpy arrays with the correct dtype:

  • Explicitly specifies precision based on MCAP schema (e.g., uint64 remains uint64, not automatically converted to int64).
  • mcap list schemas <path_to_mcap> prints the schemas on the command line for reference.
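
The sketch below shows why an explicit dtype matters for `uint64` fields. It assumes the map also holds a `uint64` entry, as the commit notes suggest, and `to_typed_array` is a hypothetical helper rather than the MessageProcessor's actual method.

```python
import numpy as np

# "float64" comes from the PR description; "uint64" is assumed from the
# commit note "Initially only change float64 and uint64".
ros_type_to_numpy_type_map = {
    "float64": np.float64,
    "uint64": np.uint64,
}


def to_typed_array(values, ros_type):
    """Convert values to a numpy array using the mapped dtype, if any.

    Hypothetical helper for illustration. Unrecognized ROS types fall
    back to numpy's own dtype inference.
    """
    dtype = ros_type_to_numpy_type_map.get(ros_type)
    if dtype is None:
        return np.asarray(values)
    return np.asarray(values, dtype=dtype)


# A value above the int64 maximum (2**63 - 1), which float64 cannot
# represent exactly either.
big = 2**63 + 1

as_uint64 = to_typed_array([big], "uint64")
as_float64 = to_typed_array([big], "float64")

print(as_uint64.dtype, int(as_uint64[0]) == big)    # exact value kept
print(as_float64.dtype, int(as_float64[0]) == big)  # precision lost
```

Taking the dtype from the schema avoids depending on numpy's inference, which can silently pick a type that cannot hold every value in the field.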

simmsa added 12 commits October 22, 2025 15:55
Initially only change float64 and uint64

Other data types may work, but we need to verify that this doesn't break
any object/string parsing
This catches an edge case where an MCAP file may have both a scalar and
an array and we need to concatenate both into an array. This basically
converts scalars into arrays prior to concatenation
This catches and handles edge cases in array expansion and gives every
effort to output valid data
Constants can be seen by using `mcap list schemas <files>`
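
The scalar-and-array edge case mentioned in the commit notes above can be sketched as follows. This is a minimal illustration using `np.atleast_1d`, not the code from the commits, and `concat_mixed` is a hypothetical name.

```python
import numpy as np


def concat_mixed(values):
    """Concatenate a mix of scalars and 1-D arrays into a single 1-D array.

    Hypothetical helper for illustration. Each scalar is promoted to a
    one-element array first, because np.concatenate rejects 0-D inputs.
    """
    return np.concatenate([np.atleast_1d(v) for v in values])


# One field that arrives as a scalar in some messages and as an array
# in others.
merged = concat_mixed([1.0, np.array([2.0, 3.0]), 4.0])
print(merged)
```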
@simmsa simmsa requested a review from cnicho35 October 22, 2025 22:22
simmsa commented Oct 22, 2025

@cnicho35, this is ready for review. It adds support for the M2 output from the comparison test.

@simmsa simmsa marked this pull request as draft October 22, 2025 22:45
@cnicho35

Hi Andrew,

I did a quick test on the HPC and it doesn't appear to be working for the m1-m2-comparison mcap files.

import sys
from pathlib import Path

# --- Add your local toolkit folder to the Python path ---
# For example, if modaq_toolkit is in /projects/m2_ws/src/m2_core/
toolkit_path = Path(r"/projects/wpfacilities/m1_m2_comparison/MODAQ_toolkit/src/")  # <-- change as needed
sys.path.append(str(toolkit_path))

# Import after the path update above so the local toolkit is found
# (Path is already imported at the top of the script)
from modaq_toolkit import MCAPParser

# Skip specific topics during parsing
parser = MCAPParser(
    mcap_path=Path("/projects/wpfacilities/m1_m2_comparison/MODAQ_toolkit/tests/Bag_2025_07_23_17_10_38_184.mcap"),
    topics_to_skip=["/diagnostics", "/rosout", "/parameter_events"]
)

# Process the file - skipped topics won't appear in output
dataframes = parser.get_dataframes()

print("Parsed DataFrames with skipped topics:")
for df in dataframes:
    print(df)

This returns an empty df with no error message.

Also, can you please add the topics_to_skip input to the process_mcap_files function?

simmsa commented Oct 23, 2025

@cnicho35 I think I fixed all of the bugs above. On Kestrel, the latest version of this code yields:

python SingleFileTest.py
INFO:modaq_toolkit.parser:Processing MCAP file: /projects/wpfacilities/m1_m2_comparison/MODAQ_toolkit/tests/Bag_2025_07_23_17_10_38_184.mcap
INFO:modaq_toolkit.parser:Found 7 topics: /ain_1_fast, /rosout, /ain_2_fast, /events/write_split, /system_messenger, /parameter_events, /time_status
INFO:modaq_toolkit.parser:Topic /ain_1_fast DataFrame shape: (6003, 14)
INFO:modaq_toolkit.parser:Topic /ain_2_fast DataFrame shape: (14649, 14)
INFO:modaq_toolkit.parser:Topic /events/write_split DataFrame shape: (1, 2)
INFO:modaq_toolkit.parser:Topic /system_messenger DataFrame shape: (2, 10)
INFO:modaq_toolkit.parser:Topic /time_status DataFrame shape: (2, 10)
Parsed DataFrames with skipped topics:
/ain_1_fast
/ain_2_fast
/events/write_split
/system_messenger
/time_status

I'm going to run this on the SURF-WEC dataset to verify that it works there as well.

This "should" work for all the m2_comparison files, but you may hit one-off edge cases.

I'm going to bump this to 0.4 and update the PR notes, since this includes larger changes than a patch release.

LMK if you see any other areas of improvement.
