Conversation

Collaborator

@em-baggie em-baggie commented Jun 15, 2025

I tried a few different ways of doing this and thought this was the best option, but please let me know if there's a better way!

Changes:

  • Added a custom regex to CodelistOptions as an optional String
  • Added a check that the string parses as a regex when a new codelist is created with the new() method
  • Added a new CustomCodeValidator whose method is called by the Validate trait method when the custom regex is set. If the custom regex is set, validate_codes() uses custom validation; if it is None, validation depends on the codelist type. I assumed there wouldn't be a case where you needed to do both on one codelist.

I initially wanted to store the compiled Regex in the Codelist struct (so it is only compiled once, when the codelist is made), but this made it difficult to implement the required traits (e.g. serialisation, PartialEq). Using a String avoids this, and still allows the regex to be checked when the codelist is made so that a custom error can be returned to the user. The regex is currently compiled once per call to validate_codes().
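
To illustrate the trade-off in Python terms (an analogy only, not the Rust code in this PR): keeping the pattern as a plain string leaves the options comparable and serialisable, while still allowing an early validity check at construction. `Options` and its field are hypothetical names used just for this sketch.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Options:
    """Hypothetical stand-in for CodelistOptions; not the real struct."""

    custom_regex: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail early: reject an invalid pattern when the options are created.
        if self.custom_regex is not None:
            try:
                re.compile(self.custom_regex)
            except re.error as exc:
                raise ValueError(f"invalid custom regex: {exc}") from exc


opts = Options(custom_regex=r"^[A-Z][0-9]{2}$")

# Plain-string fields compare by value and serialise without extra work.
same = opts == Options(custom_regex=r"^[A-Z][0-9]{2}$")
as_json = json.dumps(asdict(opts))
```

The cost noted above still applies to the Rust version: the pattern has to be compiled again whenever validate_codes() is called.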

I noticed the Python bindings currently don't allow custom CodelistOptions to be supplied when creating a codelist or codelist factory, as the default options are always used, and there is no functionality in the Rust for updating the options either. Should we cover this in another ticket?

Closes #93

@em-baggie em-baggie requested review from CarolineMorton and oylenshpeegul and removed request for CarolineMorton June 15, 2025 21:27
@em-baggie em-baggie force-pushed the add_custom_validation branch from fc64876 to 44c94d3 Compare June 16, 2025 19:25
Collaborator

@CarolineMorton CarolineMorton left a comment

Hi @em-baggie

Let me know what you think of this approach.

Comment on lines +83 to +84
let codelist = CodeList::new(name, codelist_type, metadata, Some(codelist_options))
.map_err(|e| PyValueError::new_err(e.to_string()))?;
Collaborator

Thanks for adding this in.

Comment on lines +56 to +64
) -> Result<Self, CodeListError> {
let options = options.unwrap_or_default();

// Validate custom regex if it has been set
if let Some(regex_str) = &options.custom_regex {
Regex::new(regex_str)?;
}

Ok(CodeList {
Collaborator

I'm not sure about this. I'd be interested to get @oylenshpeegul's opinion on this as well.

If we go back to the user experience, I think that will guide us. In Python, I think we want to be able to do the following:

# Create a new codelist object
c = CodeList("Pneumonia", "ICD10", "Manually created")

# Add a code to the list
c.add_entry("A119", "pneumonia")

# validate the codelist
c.validate_codes() # validates ICD

# validate custom
c.validate_codes(regex_str=r"^[A-Z][0-9]{2}\.[0-9]$")

Users would need to use regex in Python, so we could choose whether it accepts a string or a regular expression. Let's maybe start with a regular expression and see what that might mean for typing.

I think we want something like a match statement for using the regex or not.

pub trait Validator {
    fn validate_codes(&self, custom_regex: Option<&Regex>) -> Result<(), CodeListValidatorError>;
}

impl Validator for CodeList {
    fn validate_codes(&self, custom_regex: Option<&Regex>) -> Result<(), CodeListValidatorError> {
        match custom_regex {
            Some(regex) => {...actual implementation},
            None => {
                match self.codelist_type {
                    CodeListType::ICD10 => IcdValidator(self).validate_all_code(),
                    CodeListType::SNOMED => SnomedValidator(self).validate_all_code(),
                    CodeListType::OPCS => OpcsValidator(self).validate_all_code(),
                    CodeListType::CTV3 => Ctv3Validator(self).validate_all_code(),
                }
            }
        }
    }
}

With this in mind, putting the regex into CodelistOptions might not be the best place for this.

What do you think?
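
For reference, the Python-side behaviour described above could be sketched roughly as follows. This is a pure-Python model under stated assumptions, not the binding itself: the `CodeList` class here is hypothetical, and the built-in ICD10 pattern is a placeholder rather than the library's real rule.

```python
import re
from typing import Dict, List, Optional, Tuple

# Placeholder built-in checks keyed by codelist type. These patterns are
# illustrative assumptions, not the library's real validation rules.
BUILTIN_PATTERNS: Dict[str, str] = {
    "ICD10": r"^[A-Z][0-9]{2,4}$",
}


class CodeList:
    """Hypothetical Python-side model of the proposed API."""

    def __init__(self, name: str, codelist_type: str) -> None:
        self.name = name
        self.codelist_type = codelist_type
        self.entries: List[Tuple[str, str]] = []

    def add_entry(self, code: str, term: str) -> None:
        self.entries.append((code, term))

    def validate_codes(self, regex_str: Optional[str] = None) -> None:
        # A custom pattern wins when given; otherwise dispatch on the type.
        if regex_str is not None:
            pattern_str = regex_str
        else:
            pattern_str = BUILTIN_PATTERNS[self.codelist_type]
        try:
            pattern = re.compile(pattern_str)
        except re.error as exc:
            raise ValueError(f"invalid regex: {exc}") from exc
        bad = [code for code, _ in self.entries if pattern.fullmatch(code) is None]
        if bad:
            raise ValueError(f"invalid codes: {bad}")


c = CodeList("Pneumonia", "ICD10")
c.add_entry("A119", "pneumonia")
c.validate_codes()                          # built-in check for the type
c.validate_codes(regex_str=r"^A[0-9]{3}$")  # custom check, same codelist
```

With the regex as a parameter, the same codelist can be checked both ways without changing its options.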

Collaborator

That's a good point! If we want to serialize CodelistOptions, we can't have an Option<Regex> as @em-baggie said. But if we yank it out of there and make it a parameter instead, then why not?

I don't think I understand the Python part of your comment. Does PyO3 even translate regexes between Python and Rust? Python regexes are the fancy backtracking kind, whereas in Rust they are the more restricted finite automata kind.
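
A quick sketch of the difference, using only Python's standard `re` module: backreferences are accepted there, whereas the Rust `regex` crate documents that it does not support backreferences or look-around. So a pattern that is valid on the Python side is not guaranteed to compile on the Rust side. The pattern below is an arbitrary example, not one taken from this PR.

```python
import re

# A backreference (\1) needs a backtracking engine; Python's `re` has one.
doubled_letter = re.compile(r"^([A-Z])\1[0-9]{2}$")

matches_doubled = doubled_letter.fullmatch("AA12") is not None
matches_mixed = doubled_letter.fullmatch("AB12") is not None
```

If the binding takes the pattern as a string and compiles it in Rust, a pattern like this one would presumably be rejected with an error rather than matched.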

Collaborator Author

@em-baggie em-baggie Jun 17, 2025

My thinking was that the regex could be included in CodelistOptions, which can be customised, and this would link the regex to the particular codelist. But I realise the way I've implemented it means that if the regex is set, you can only validate with the regex and not with the normal validation that depends on the codelist type. Using it as a parameter is a lot more flexible, but the regex will not be 'saved' in the codelist struct. I guess validation notes can be added to the metadata to document whether different validation methods have been used.

I think with PyO3, Python can just pass a string and, under the hood, Rust can convert it to a regex within the method. Or maybe Python can pass a regex directly, though I'm not exactly sure how that would work. To be honest, I think Caroline's method is a lot simpler and I probably overcomplicated this!

Collaborator Author

Is CodelistOptions something we want to expose to the Python API so it can be altered by the user? Maybe I'm misunderstanding the purpose of these options 🤔

Collaborator Author

I'll have a go at implementing the other approach in a separate PR, then we can see them both.

Collaborator

We will want to expose CodelistOptions at some point in the Python API, I think.


  #[test]
- fn test_get_metadata() {
+ fn test_get_metadata() -> Result<(), CodeListError> {
Collaborator

Good catch

Development

Successfully merging this pull request may close these issues.

Add the ability to pass a regular expression into validate() for custom validation