Testing alternative normalizations #51

@globbestael

Should AuthorVariantsExperimentsTest and PublicationExperiment be refactored?

Reason

  • AuthorVariantsExperimentsTest tests the performance of several alternative normalizations of the Author input fields.
    • It uses a PublicationExperiment (extends Publication), which contains the alternative normalization methods. Publication has since been refactored to a POJO, and its normalization has been moved to IOService and NormalizationService. PublicationExperiment should follow the same pattern, or be removed once its methods have been moved (?).

The original normalization in Publication (e.g. for Authors) is now split between:

  • IOService::addNormalizedAuthor
  • NormalizationService::normalizeInputAuthors

Some logic in the IOService::addNormalized... methods cannot be moved to NormalizationService: the fields of Publication are set only in IOService, and NormalizationService is not aware of Publication.
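
To make the current split concrete, here is a minimal sketch of the division of responsibility. Only the class and method names come from the project; the signatures, the single `authors` field, and the whitespace rule are assumptions for illustration and are not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the POJO; the real Publication has more fields.
class Publication {
    private final List<String> authors = new ArrayList<>();

    List<String> getAuthors() {
        return authors;
    }
}

// Knows nothing about Publication: raw string in, normalized string out.
class NormalizationService {
    // Assumed signature and rule (trim, then collapse runs of whitespace).
    String normalizeInputAuthors(String rawAuthor) {
        return rawAuthor.trim().replaceAll("\\s+", " ");
    }
}

class IOService {
    private final NormalizationService normalizationService = new NormalizationService();

    // Assumed signature: the Publication field is set here, which is why
    // this part of the logic cannot move into NormalizationService.
    void addNormalizedAuthor(String rawAuthor, Publication publication) {
        String normalized = normalizationService.normalizeInputAuthors(rawAuthor);
        if (!normalized.isEmpty()) {
            publication.getAuthors().add(normalized);
        }
    }
}
```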

However, the alternatives to be tested belong in NormalizationService. For this test, that would entail:

  • an alternative IOService can be injected into the DeduplicationService (which requires an IOService interface). Alternatively, the DeduplicationService gets a getter and setter for its IOService field (?).
  • there should be a NormalizationService interface
  • the alternative NormalizationService implementations can be injected into this IOService.
  • @DirtiesContext should be applied to this test class (?)
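
The injection chain proposed above could look roughly like the following sketch, written as plain Java with constructor injection and without Spring. The interface and service names come from this issue; `DefaultNormalizationService`, `LowercaseNormalizationService`, `DefaultIOService`, `readPublication`, all signatures, and the normalization rules are hypothetical and only illustrate the wiring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for the POJO; the real Publication has more fields.
class Publication {
    private final List<String> authors = new ArrayList<>();

    List<String> getAuthors() {
        return authors;
    }
}

interface NormalizationService {
    String normalizeInputAuthors(String rawAuthor); // assumed signature
}

// Hypothetical production rule: trim and collapse whitespace.
class DefaultNormalizationService implements NormalizationService {
    @Override
    public String normalizeInputAuthors(String rawAuthor) {
        return rawAuthor.trim().replaceAll("\\s+", " ");
    }
}

// Hypothetical experimental variant: same rule, plus lowercasing.
class LowercaseNormalizationService implements NormalizationService {
    @Override
    public String normalizeInputAuthors(String rawAuthor) {
        return rawAuthor.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}

interface IOService {
    void addNormalizedAuthor(String rawAuthor, Publication publication); // assumed signature
}

class DefaultIOService implements IOService {
    private final NormalizationService normalizationService;

    // The normalization variant is injected instead of hard-wired.
    DefaultIOService(NormalizationService normalizationService) {
        this.normalizationService = normalizationService;
    }

    @Override
    public void addNormalizedAuthor(String rawAuthor, Publication publication) {
        publication.getAuthors().add(normalizationService.normalizeInputAuthors(rawAuthor));
    }
}

class DeduplicationService {
    private final IOService ioService;

    DeduplicationService(IOService ioService) {
        this.ioService = ioService;
    }

    // Hypothetical helper standing in for reading a record from an input file.
    Publication readPublication(List<String> rawAuthors) {
        Publication publication = new Publication();
        for (String rawAuthor : rawAuthors) {
            ioService.addNormalizedAuthor(rawAuthor, publication);
        }
        return publication;
    }
}
```

An experiment could then build `new DeduplicationService(new DefaultIOService(new LowercaseNormalizationService()))` and run it against the same input as the default wiring. If the test constructs its own object graph this way rather than modifying beans in the shared Spring context, @DirtiesContext may turn out to be unnecessary; that depends on how the services are wired in the real project.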

Evaluation

  • GOOD: production code and test code use same structure / principles
  • GOOD: the AuthorComparatorService already uses this format
  • GOOD: extending this to testing alternative implementations of normalization of journal, pages and title will be easier
  • VERY GOOD: the current AuthorVariantsExperimentsTest only shows the performance of the comparison for this one field. The refactored version would allow a full comparison of (1) all fields on (2) real test files.
  • BAD: a lot of refactoring work

Labels: enhancement
