@eramirem commented Aug 8, 2024

Problem

To build a scipy.sparse array from the output of BM25Encoder._encode_single_document, the mmh3 hashes it returns cannot be used as indices. What is needed instead are the positions of the non-null keywords within the document frequency object.

Solution

Add a flag to _encode_single_document that specifies whether it should return index positions instead of the mmh3 hashes.
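
To illustrate the idea, here is a minimal sketch of the hash-to-position mapping described above. It is not the code in this PR: it assumes the document frequency object behaves like a dict keyed by token hash, and the helper names (hashes_to_positions, to_csr_row) and the sample hash values are hypothetical.

```python
from typing import Dict, List, Tuple


def hashes_to_positions(
    hashes: List[int], values: List[float], doc_freq: Dict[int, float]
) -> Tuple[List[int], List[float]]:
    """Map token hashes to their positions in doc_freq.

    Assumption: doc_freq is a dict keyed by token hash, and a token's
    "position" is the insertion-order index of its key. Hashes that are
    absent from doc_freq are dropped.
    """
    position_of = {h: pos for pos, h in enumerate(doc_freq)}
    cols: List[int] = []
    vals: List[float] = []
    for h, v in zip(hashes, values):
        pos = position_of.get(h)
        if pos is not None:
            cols.append(pos)
            vals.append(v)
    return cols, vals


def to_csr_row(cols: List[int], vals: List[float], n_cols: int):
    """Build a 1 x n_cols scipy.sparse row from positions and values."""
    # Imported lazily so the mapping above works without scipy installed.
    from scipy.sparse import csr_matrix

    rows = [0] * len(cols)
    return csr_matrix((vals, (rows, cols)), shape=(1, n_cols))


# Hypothetical data: the hash values below are made up for illustration.
doc_freq = {3045234234: 2.0, 17: 5.0, 998877: 1.0}
hashes = [998877, 3045234234]
values = [0.4, 0.6]

cols, vals = hashes_to_positions(hashes, values, doc_freq)
```

With scipy installed, to_csr_row(cols, vals, len(doc_freq)) would then give a single sparse row whose width equals the vocabulary size rather than the range of the hash function.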

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.
