Releases: pageseeder/diffx

Release 1.2.4

14 Jul 07:58

This release includes a bugfix for whitespace handling, improvements to the configuration handling, and several code quality enhancements.

Bugfixes

  • Whitespace Handling: Fixed an issue where leading spaces were not being stripped correctly in certain contexts within the component. This ensures proper handling of mixed content with inline elements.
    • Improved the whitespace context management with a more robust implementation of the method replaceByTrailing
    • Added additional test cases to validate the fix for mixed content scenarios

Improvements

  • Configuration Handling: Enhanced the DiffConfig class with proper equality and hashcode methods
    • Fixed the equals() method to correctly compare all configuration properties
    • Improved the hashCode() implementation for better performance and correctness
    • Added support for the allowDoctypeDeclaration property in equality checks
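
As a rough illustration of the equals()/hashCode() contract these changes concern, here is a minimal sketch. SketchConfig and its namespaceAware field are hypothetical stand-ins and not the actual DiffConfig class; only the allowDoctypeDeclaration property name is taken from these notes.

```java
import java.util.Objects;

/** Hypothetical stand-in for a configuration class; not the real DiffConfig. */
final class SketchConfig {
  private final boolean allowDoctypeDeclaration;
  private final boolean namespaceAware; // hypothetical second property

  SketchConfig(boolean allowDoctypeDeclaration, boolean namespaceAware) {
    this.allowDoctypeDeclaration = allowDoctypeDeclaration;
    this.namespaceAware = namespaceAware;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SketchConfig)) return false;
    SketchConfig other = (SketchConfig) o;
    // Every property takes part in the comparison, including allowDoctypeDeclaration.
    return allowDoctypeDeclaration == other.allowDoctypeDeclaration
        && namespaceAware == other.namespaceAware;
  }

  @Override
  public int hashCode() {
    // Built from the same properties as equals(), so equal objects share a hash code.
    return Objects.hash(allowDoctypeDeclaration, namespaceAware);
  }
}
```

Keeping both methods in step matters when configurations are used as map keys or stored in hash-based sets.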

Full Changelog: v1.2.3...v1.2.4

Release 1.2.2

14 Jul 07:49

New Features

XML Balance Checking

  • XMLBalanceCheckFilter: Added a new filter to ensure that XML tokens are properly balanced in the DiffHandler processing pipeline
    • Tracks start and end element pairs to verify they match correctly
    • Detects and reports XML structure imbalances including extra or missing elements
    • Provides diagnostics through the isBalanced() method and detailed error reporting
    • Useful for validating XML integrity during difference operations
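
The pairing check described above can be sketched with a stack of open element names. BalanceSketch and its startElement/endElement methods are hypothetical names for illustration; only isBalanced() is mentioned in these notes, and its exact semantics in XMLBalanceCheckFilter are assumed here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a start/end pairing check; not the real XMLBalanceCheckFilter. */
final class BalanceSketch {
  private final Deque<String> open = new ArrayDeque<>();
  private boolean mismatch = false;

  void startElement(String name) {
    open.push(name);
  }

  void endElement(String name) {
    // An end element must close the most recently opened element.
    if (open.isEmpty() || !open.peek().equals(name)) {
      mismatch = true;
    } else {
      open.pop();
    }
  }

  /** True when no mismatch was seen and every opened element has been closed. */
  boolean isBalanced() {
    return !mismatch && open.isEmpty();
  }
}
```

Feeding it start a, start b, end b, end a leaves it balanced, while an unexpected end element or a missing one does not.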

API Improvements

Enhanced XMLToken Interface

  • Improved Null Safety: Added @NotNull annotations throughout the API to prevent null pointer exceptions
  • Standardized Documentation: Enhanced Javadoc clarity with consistent descriptions and improved parameter/return value documentation
  • New Default Methods: Added the isWhitespace() method to simplify whitespace detection across token types
  • Clarified Contract: Improved documentation on equals/hashCode implementation requirements for better performance
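
To show why a default method helps here, the sketch below lets callers ask any token whether it is whitespace without checking its type first. All type names are hypothetical, and the choice of false as the default is an assumption rather than the documented behaviour of XMLToken.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical token interface; not the real XMLToken. */
interface SketchToken {
  String getValue();

  /** Assumed default: tokens are not whitespace unless they say otherwise. */
  default boolean isWhitespace() {
    return false;
  }
}

/** Element tokens simply inherit the default. */
final class SketchElementToken implements SketchToken {
  private final String name;

  SketchElementToken(String name) {
    this.name = name;
  }

  @Override
  public String getValue() {
    return name;
  }
}

/** Text tokens override the default and inspect their content. */
final class SketchTextToken implements SketchToken {
  private final String text;

  SketchTextToken(String text) {
    this.text = text;
  }

  @Override
  public String getValue() {
    return text;
  }

  @Override
  public boolean isWhitespace() {
    return !text.isEmpty() && text.chars().allMatch(Character::isWhitespace);
  }
}

final class WhitespaceSketch {
  /** Drops whitespace tokens using only the shared interface method. */
  static List<SketchToken> strip(List<SketchToken> tokens) {
    return tokens.stream().filter(t -> !t.isWhitespace()).collect(Collectors.toList());
  }
}
```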

Whitespace Handling

  • Simplified Whitespace Detection: Replaced custom isWhiteSpace methods with the standardized token.isWhitespace() approach in both WhitespaceStripper and ExtendedWhitespaceStripper
  • Enhanced Processing Logic: Improved the whitespace processing algorithm for more consistent results
  • Better Edge Case Handling: Added test cases for complex mixed content scenarios
  • Fixed Context Management: Implemented more robust context tracking for accurate whitespace preservation

Full Changelog: v1.2.1...v1.2.2

Release 1.2.1

27 Jun 02:19

New Features

XML Processing Improvements

  • XMLEventBalancer (Beta): Added new implementation to ensure balanced XML structure in DiffHandler operations. This experimental component ensures well-formed XML during diff operations by maintaining properly paired start and end elements.

Core Functionality

  • NoOpFilter: Added implementation for transparent operation forwarding in DiffHandler. This filter passes operations through without modifications, providing a clean way to chain handlers.
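
A pass-through filter of this kind can be sketched as follows. SketchHandler, PassThroughFilter and RecordingHandler are hypothetical types using plain strings; the real DiffHandler interface and NoOpFilter class are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified handler interface; not the real DiffHandler. */
interface SketchHandler {
  void handle(String operator, String token);
}

/** Forwards every operation to its target unchanged. */
final class PassThroughFilter implements SketchHandler {
  private final SketchHandler target;

  PassThroughFilter(SketchHandler target) {
    this.target = target;
  }

  @Override
  public void handle(String operator, String token) {
    target.handle(operator, token);
  }
}

/** Terminal handler that records what it receives, to make the forwarding visible. */
final class RecordingHandler implements SketchHandler {
  final List<String> seen = new ArrayList<>();

  @Override
  public void handle(String operator, String token) {
    seen.add(operator + token);
  }
}
```

Because the filter adds no behaviour of its own, it can serve as a neutral link or a base class when handlers are chained.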

Maintenance and Improvements

Build System Enhancements

  • Replaced Maven publishing scripts with JReleaser configuration
  • Refactored build scripts to use centralized dependencies management
  • Updated wrapper scripts for compliance and robustness
  • Added SPDX license headers to script files
  • Improved JAVA_HOME validation in scripts

Documentation

  • Enhanced Javadoc for key methods and classes across the project
  • Added detailed parameter annotations to improve code clarity

Testing

  • Added unit tests for ExtendedWhitespaceStripper to verify various whitespace handling scenarios

Code Quality

  • Improved Maven publishing configuration to use assignment syntax for task descriptions and credentials
  • Removed unused import statements

Compatibility

This release maintains compatibility with Java 11 and later versions. The library continues to provide efficient differencing algorithms specifically optimized for XML structures.

Notes

The XMLEventBalancer is currently marked as experimental (beta) and subject to change in future releases.

Release 1.2.0

27 Jun 04:56

Breaking changes

Now requires Java 11

New Features

  • Document Tokens: Added StartDocumentToken and EndDocumentToken classes to represent XML document boundaries
  • Sequence Processing: Introduced SequenceProcessor interface with ExtendedWhitespaceStripper implementation for configurable whitespace handling in XML sequences
  • Similarity Metrics: Added new similarity measurement capabilities:
    • Implemented XMLElementSimilarity class with length-based boosting and child stream similarity
    • Added StreamSimilarity interface with Edit, Jaccard, and Cosine similarity implementations
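
For readers unfamiliar with the Jaccard measure named above, it is the size of the intersection of two sets divided by the size of their union. The sketch below works on plain strings; JaccardSketch is a hypothetical class and does not reflect the StreamSimilarity API, and treating two empty inputs as fully similar is an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of Jaccard similarity over strings. */
final class JaccardSketch {
  static double similarity(Collection<String> a, Collection<String> b) {
    Set<String> union = new HashSet<>(a);
    union.addAll(b);
    if (union.isEmpty()) {
      return 1.0; // assumption: two empty inputs count as identical
    }
    Set<String> intersection = new HashSet<>(a);
    intersection.retainAll(b);
    return (double) intersection.size() / union.size();
  }
}
```

For example, {a, b, c} and {b, c, d} share two of four distinct items, giving a similarity of 0.5.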

Code Improvements

  • Dependency Updates:
    • Upgraded to Java 11 and configured toolchain for compatibility
    • Updated to Gradle 8.13 with improved distribution validation
    • Updated JUnit dependencies to use BOM (Bill of Materials) for version alignment
  • Refactoring:
    • Refactored SAXLoader for improved XML reader handling
    • Refactored XMLElement for better content handling
    • Renamed XMLElementSimilarity to ElementSimilarity with improved method names
    • Replaced SimilarityFunction with Similarity interface (old interface deprecated)
  • API Changes:
    • Deprecated getChildren method in XMLElement in favor of getContent
    • Deprecated setXMLReaderClass method
    • Removed debug flags for cleaner codebase
  • Null Safety:
    • Added @NotNull annotations to toXML method parameters
    • Added @NotNull annotations to NamespaceSet.add method parameters
    • Enhanced exception handling throughout the codebase

v1.1.2

13 Jun 22:55

Release 1.1.2

New Features

  • Wagner-Fischer Algorithm: Added similarity-based diffing using the Wagner-Fischer algorithm for improved text comparison capabilities
  • Whitespace Handling Utility: Added a new utility class to strip whitespace from a specified list of elements
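
As background, the classic Wagner-Fischer algorithm fills a matrix of edit distances between all prefixes of two sequences. The sketch below is the plain character-based form with unit costs; it is not the similarity-based variant added in this release, and WagnerFischerSketch is a hypothetical class name.

```java
/** Hypothetical sketch of the textbook Wagner-Fischer edit distance. */
final class WagnerFischerSketch {
  static int distance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    // Turning a prefix into the empty string takes one deletion per character.
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    // Building a prefix from the empty string takes one insertion per character.
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int deletion = d[i - 1][j] + 1;
        int insertion = d[i][j - 1] + 1;
        int substitution = d[i - 1][j - 1] + cost;
        d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
      }
    }
    return d[a.length()][b.length()];
  }
}
```

The matrix has (n + 1) × (m + 1) cells, which is why matrix-based approaches become memory hungry on large inputs.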

Bug Fixes

  • Fixed constructor to properly respect the namespace-aware parameter
  • Fixed potential bug in KumarRanganAlgorithm implementation

Code Improvements

  • Refactored NilToken to use a singleton pattern for better memory efficiency
  • Improved ElementToken and its default implementation
  • Added constructor and enhanced annotations in Sequence class
  • Refactored stack usage with Deque for better performance in isWellFormed method
  • Standardized static final field declarations across the codebase
  • Renamed "open" to "start" in XML element handling logic for improved clarity
  • Added private constructor to Actions utility class to prevent instantiation
  • Simplified empty checks by utilizing the isEmpty method

Documentation and Testing

  • Improved overall documentation with better comments and explanations
  • Added @version and @since tags for better version tracking
  • Added comprehensive unit tests for SimilarityWagnerFischerAlgorithm
  • Enhanced code with @Override and @NotNull annotations for better type safety

Release 1.1.1

01 Mar 05:28

  • Fixed issues in PostXMLFixer where some elements could be left unclosed.
  • Improved support for Unicode characters in TokenizerBySpaceWord

Release 1.1.0

01 Mar 05:22

The focus of this release was to address a number of security vulnerabilities, in particular XML eXternal Entity injection (XXE) in the code.

Although XXE issues could easily be mitigated by filtering the XML input before submitting it to diffx, we changed the default configuration to be secure by default and disabled loading of external entities and DTDs, as outlined in https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html

If, for some reason, you need to use the DTD, you can set the allowDoctypeDeclaration boolean option to true in the DiffConfig.
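
For context, the sketch below shows the kind of parser hardening the OWASP cheat sheet recommends, using only standard JAXP calls. It is not taken from the diffx source: SecureSaxSketch is a hypothetical class, and its allowDoctypeDeclaration parameter merely mirrors the option name above. The feature URIs are those supported by the Xerces-based parser bundled with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a SAX factory that is secure by default. */
final class SecureSaxSketch {

  static SAXParserFactory newFactory(boolean allowDoctypeDeclaration)
      throws ParserConfigurationException, SAXException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    // Reject any DOCTYPE declaration unless explicitly allowed.
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", !allowDoctypeDeclaration);
    // Never resolve external entities or load external DTDs.
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    factory.setXIncludeAware(false);
    return factory;
  }

  /** Returns true if the XML parses under the given setting, false if it is rejected. */
  static boolean parses(String xml, boolean allowDoctypeDeclaration) {
    try {
      newFactory(allowDoctypeDeclaration)
          .newSAXParser()
          .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
      return true;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      return false;
    }
  }
}
```

With this setup, a document that carries a DOCTYPE is rejected by default and accepted only when the flag is set, while external entities stay disabled in both cases.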

Release 1.0.0

01 Mar 05:16

This release is functionally identical to 0.9.0 except that all deprecated code has been removed.

If you need to transition from the old API, use 0.9.0 instead.

Release 0.9.0

01 Mar 05:14

This version is a complete review of Diffx with a new API, architecture and algorithms.

For backward compatibility, some old APIs have been kept as deprecated code.

The following algorithms are provided:

  • MyersGreedyXMLAlgorithm, an implementation of Myers' greedy algorithm adjusted for XML.
  • MyersGreedyAlgorithm, an implementation of the greedy algorithm as outlined in Eugene Myers' paper "An O(ND) Difference Algorithm and its Variations".
  • MyersLinearAlgorithm, an implementation of the linear algorithm as outlined in Eugene Myers' paper.
  • MatrixXMLAlgorithm, an XML-aware algorithm based on the Wagner-Fischer algorithm.
  • WagnerFischerAlgorithm, an implementation of the Wagner-Fischer algorithm with no optimisation.
  • HirschbergAlgorithm, an implementation of the Hirschberg algorithm to find the longest common subsequence.
  • KumarRanganAlgorithm, a legacy implementation of the S. Kiran Kumar and C. Pandu Rangan algorithm to find the longest
    common subsequence (LCS). Several bugs affecting previous releases have been fixed.
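
To give a feel for the greedy approach from Myers' paper, the sketch below computes only the length of the shortest edit script (insertions plus deletions) between two strings. MyersGreedySketch is a hypothetical class; it does not build the script itself, has no XML awareness, and is not the code of MyersGreedyAlgorithm.

```java
/** Hypothetical sketch of the forward greedy search from Myers' O(ND) paper. */
final class MyersGreedySketch {
  static int editDistance(String a, String b) {
    int n = a.length();
    int m = b.length();
    int max = n + m;
    int offset = max;
    // v[offset + k] holds the furthest x reached so far on diagonal k = x - y.
    int[] v = new int[2 * max + 2];
    v[offset + 1] = 0;
    for (int d = 0; d <= max; d++) {
      for (int k = -d; k <= d; k += 2) {
        int x;
        if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1])) {
          x = v[offset + k + 1];     // step down: insertion from b
        } else {
          x = v[offset + k - 1] + 1; // step right: deletion from a
        }
        int y = x - k;
        // Follow the diagonal while the characters match (no edit needed).
        while (x < n && y < m && a.charAt(x) == b.charAt(y)) {
          x++;
          y++;
        }
        v[offset + k] = x;
        if (x >= n && y >= m) {
          return d;
        }
      }
    }
    return max; // not reached: the loop always returns by d = n + m
  }
}
```

The search explores edit counts d = 0, 1, 2, and so on, which is why this family of algorithms is fast when the two inputs are similar.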

The most efficient algorithm is Myers' greedy algorithm. The XML version has been adjusted to reorder the XML events in the diff in order to produce well-formed XML output. In some cases, it may be unable to produce well-formed output, in which case you may need to fall back on less efficient algorithms.

The matrix XML algorithm requires a large matrix. While this makes it easier to compute a diff path that produces well-formed XML, it is also excessively memory hungry.

Release 0.8.1

01 Dec 23:37

  • Fixed a bug where namespace declarations were missing from the second root when the diffx output has more than one root element (for example, when comparing two XML fragments with different roots).