Skip to content

Conversation

@andreas
Copy link
Contributor

@andreas andreas commented Aug 8, 2025

This PR adds a fuzzer based on libfuzzer for roaring_buffer_reader. You can try it as follows (the fuzzer will run up to 60s):

cd fuzz
make quick_fuzz

The intent of the fuzzer is to cover whether the custom deserialization and operations on roaring bitmaps implemented in roaring_buffer_reader.c is safe and correct, in the sense that they compute the same results as croaring.

The fuzzer works as follows:

  1. Deserializes the input with roaring_bitmap_portable_deserialize_safe. Stop if deserialization fails.
  2. Create a roaring buffer with roaring_buffer_create.
  3. Compare return values for the following functions with their equivalent from croaring:
    • roaring_buffer_get_cardinality
    • roaring_buffer_contains
    • roaring_buffer_is_subset
    • roaring_buffer_and
    • roaring_buffer_andnot
    • roaring_buffer_and_cardinality
    • roaring_buffer_or_cardinality
    • roaring_buffer_andnot_cardinality
    • roaring_buffer_xor_cardinality
    • roaring_buffer_jaccard_index
    • roaring_buffer_intersect
    • roaring_buffer_is_empty
    • roaring_buffer_equals
    • roaring_buffer_rank
    • roaring_buffer_minimum
    • roaring_buffer_maximum

If you run the fuzzer, it pretty quickly finds that roaring_buffer_t relies on the offset header from the serialized bitmap, which is not validated by roaring_bitmap_portable_deserialize_safe. This can cause the roaring buffer functions to return wrong results.

Here's an example of that found by the fuzzer replicated in psql:

postgres=# select rb_to_array('\x3a300000010000000000000000000000000000000000000000000000000000000000000000000000000000000008'::roaringbitmap);
 rb_to_array
-------------
 {0}
(1 row)

postgres=# select rb_min('\x3a300000010000000000000000000000000000000000000000000000000000000000000000000000000000000008'::roaringbitmap);
 rb_min
--------
  12346
(1 row)

A proposed fix could be to not rely on the offset reader for untrusted input, but rather parse the containers instead (the croaring library always does this).

If there's support for this change, as well as #40, then we can set up fuzzing as a Github Action (see https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/).

@ChenHuajun
Copy link
Owner

How about rename Makefile.fuzz to Makefile, while it's the only build file in the directory​​

@andreas
Copy link
Contributor Author

andreas commented Aug 18, 2025

How about rename Makefile.fuzz to Makefile, while it's the only build file in the directory​​

Makes sense — fixed in 8f5cf6c 🙂

@ChenHuajun
Copy link
Owner

As described in #45, the roaring_buffer_reader relies on offsets for deserialization. When the offsets in the input serialized data are incorrect, it cannot produce the same output as roaring_bitmap_portable_deserialize_safe(which might skip offset parsing).

If it were also modified to not rely on offsets for deserialization, there would be a significant performance degradation, so this discrepancy can only be ignored.

@andreas
Copy link
Contributor Author

andreas commented Sep 8, 2025

After including the fix from #45, as well as a call to roaring_bitmap_internal_validate, then the fuzzer can run for 10 minutes with no findings 🥳

Note that the issue around offsets also apply for cardinalities: roaring_buffer_*-functions rely on keyscards from the serialized buffer, but malicious input can exploit this. The change in #45 fixes this too though 🙂

I'll open a separate PR to add roaring_bitmap_internal_validate in roaringbitmap_in and rb_from_bytea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants