Releases: pageseeder/diffx
Release 1.2.4
This release includes a bugfix for whitespace handling, improvements to the configuration handling, and several code quality enhancements.
Bugfixes
- Whitespace Handling: Fixed an issue where leading spaces were not being stripped correctly in certain contexts within the component. This ensures proper handling of mixed content with inline elements.
  - Improved the whitespace context management with a more robust implementation of the replaceByTrailing method
  - Added additional test cases to validate the fix for mixed content scenarios
Improvements
- Configuration Handling: Enhanced the DiffConfig class with proper equality and hashcode methods
  - Fixed the equals() method to correctly compare all configuration properties
  - Improved the hashCode() implementation for better performance and correctness
  - Added support for the allowDoctypeDeclaration property in equality checks
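The equality and hashcode contract described above can be sketched as follows. This is a minimal illustration, not the DiffConfig source: the class name ExampleConfig and the namespaceAware field are hypothetical, and only allowDoctypeDeclaration is a property named in these notes.

```java
import java.util.Objects;

// Hypothetical configuration class used for illustration only;
// this is NOT the actual DiffConfig implementation.
final class ExampleConfig {
  private final boolean namespaceAware;          // hypothetical property
  private final boolean allowDoctypeDeclaration; // property named in the release notes

  ExampleConfig(boolean namespaceAware, boolean allowDoctypeDeclaration) {
    this.namespaceAware = namespaceAware;
    this.allowDoctypeDeclaration = allowDoctypeDeclaration;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ExampleConfig)) return false;
    ExampleConfig other = (ExampleConfig) o;
    // Every property that affects behaviour takes part in the comparison.
    return namespaceAware == other.namespaceAware
        && allowDoctypeDeclaration == other.allowDoctypeDeclaration;
  }

  @Override
  public int hashCode() {
    // Uses the same fields as equals(), so equal objects share a hash code.
    return Objects.hash(namespaceAware, allowDoctypeDeclaration);
  }
}
```

Keeping the two methods in step matters when configurations are used as map keys or stored in sets: a field compared in equals() but left out of hashCode() is harmless, while the reverse breaks lookups.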
Full Changelog: v1.2.3...v1.2.4
Release 1.2.2
New Features
XML Balance Checking
- XMLBalanceCheckFilter: Added a new filter to ensure that XML tokens are properly balanced in the DiffHandler processing pipeline
  - Tracks start and end element pairs to verify they match correctly
  - Detects and reports XML structure imbalances including extra or missing elements
  - Provides diagnostics through the isBalanced() method and detailed error reporting
  - Useful for validating XML integrity during difference operations
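The pairing check described above can be illustrated with a small stack-based sketch. The BalanceChecker class and its startElement/endElement methods are hypothetical and do not reproduce the XMLBalanceCheckFilter API; only the isBalanced() method name comes from these notes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical checker illustrating the idea of balance tracking;
// this is NOT the XMLBalanceCheckFilter source.
final class BalanceChecker {
  private final Deque<String> open = new ArrayDeque<>();
  private boolean mismatch = false;

  // Record an element that has been opened.
  void startElement(String name) {
    open.push(name);
  }

  // An end element must match the most recently opened element.
  void endElement(String name) {
    if (open.isEmpty() || !open.pop().equals(name)) {
      mismatch = true;
    }
  }

  // Balanced means no mismatched or extra end element was seen
  // and no element is still open.
  boolean isBalanced() {
    return !mismatch && open.isEmpty();
  }
}
```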
API Improvements
Enhanced XMLToken Interface
- Improved Null Safety: Added @NotNull annotations throughout the API to prevent null pointer exceptions
- Standardized Documentation: Enhanced Javadoc clarity with consistent descriptions and improved parameter/return value documentation
- New Default Methods: Added the isWhitespace() method to simplify whitespace detection across token types
- Clarified Contract: Improved documentation on equals/hashCode implementation requirements for better performance
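A default interface method is one way to offer whitespace detection across token types without changing every implementation. The sketch below assumes a simplified Token interface with hypothetical TextToken and SpaceToken classes; it is not the XMLToken interface, and the default return value shown is an assumption.

```java
// Hypothetical token types for illustration only; NOT the diffx XMLToken API.
interface Token {
  String getValue();

  // Default implementation (assumed here to return false):
  // only whitespace token types need to override it.
  default boolean isWhitespace() {
    return false;
  }
}

final class TextToken implements Token {
  private final String value;
  TextToken(String value) { this.value = value; }
  @Override public String getValue() { return value; }
}

final class SpaceToken implements Token {
  private final String value;
  SpaceToken(String value) { this.value = value; }
  @Override public String getValue() { return value; }
  @Override public boolean isWhitespace() { return true; }
}
```

Callers can then test token.isWhitespace() directly instead of relying on per-class helper methods or instanceof checks.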
Whitespace Handling
- Simplified Whitespace Detection: Replaced custom isWhiteSpace methods with the standardized token.isWhitespace() approach in both WhitespaceStripper and ExtendedWhitespaceStripper
- Enhanced Processing Logic: Improved the whitespace processing algorithm for more consistent results
- Better Edge Case Handling: Added test cases for complex mixed content scenarios
- Fixed Context Management: Implemented more robust context tracking for accurate whitespace preservation
Full Changelog: v1.2.1...v1.2.2
Release 1.2.1
New Features
XML Processing Improvements
- XMLEventBalancer (Beta): Added new implementation to ensure balanced XML structure in DiffHandler operations. This experimental component ensures well-formed XML during diff operations by maintaining properly paired start and end elements.
Core Functionality
- NoOpFilter: Added implementation for transparent operation forwarding in DiffHandler. This filter passes operations through without modifications, providing a clean way to chain handlers.
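The pass-through behaviour can be sketched as a filter that forwards every operation unchanged to the next handler. The OperationHandler interface, its handle signature, and the class names below are hypothetical simplifications; the real DiffHandler interface has its own methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified handler interface; NOT the diffx DiffHandler API.
interface OperationHandler {
  void handle(String operator, String token);
}

// Forwards each operation to the target without modifying it.
final class PassThroughFilter implements OperationHandler {
  private final OperationHandler target;

  PassThroughFilter(OperationHandler target) {
    this.target = target;
  }

  @Override
  public void handle(String operator, String token) {
    target.handle(operator, token);
  }
}

// Terminal handler that records what it receives, to show the chain works.
final class RecordingHandler implements OperationHandler {
  final List<String> log = new ArrayList<>();

  @Override
  public void handle(String operator, String token) {
    log.add(operator + token);
  }
}
```

A filter like this is useful as a neutral default in a chain, or as a base class whose subclasses override only the operations they want to change.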
Maintenance and Improvements
Build System Enhancements
- Replaced Maven publishing scripts with JReleaser configuration
- Refactored build scripts to use centralized dependencies management
- Updated wrapper scripts for compliance and robustness
- Added SPDX license headers to script files
- Improved JAVA_HOME validation in scripts
Documentation
- Enhanced Javadoc for key methods and classes across the project
- Added detailed parameter annotations to improve code clarity
Testing
- Added unit tests for ExtendedWhitespaceStripper to verify various whitespace handling scenarios
Code Quality
- Improved Maven publishing configuration to use assignment syntax for task descriptions and credentials
- Removed unused import statements
Compatibility
This release maintains compatibility with Java 11 and later versions. The library continues to provide efficient differencing algorithms specifically optimized for XML structures.
Notes
The XMLEventBalancer is currently marked as experimental (beta) and subject to change in future releases.
Release 1.2.0
Breaking changes
Now requires Java 11
New Features
- Document Tokens: Added StartDocumentToken and EndDocumentToken classes to represent XML document boundaries
- Sequence Processing: Introduced SequenceProcessor interface with ExtendedWhitespaceStripper implementation for configurable whitespace handling in XML sequences
- Similarity Metrics: Added new similarity measurement capabilities:
  - Implemented XMLElementSimilarity class with length-based boosting and child stream similarity
  - Added StreamSimilarity interface with Edit, Jaccard, and Cosine similarity implementations
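As an illustration of one of the similarity measures named above, the Jaccard similarity of two token streams is the size of the intersection of their token sets divided by the size of their union. The sketch below works on plain strings; the JaccardExample class is hypothetical and is not the StreamSimilarity API, and treating two empty streams as fully similar is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; NOT the diffx StreamSimilarity interface.
final class JaccardExample {

  // Returns |A ∩ B| / |A ∪ B| over the distinct tokens of each list.
  static double jaccard(List<String> a, List<String> b) {
    Set<String> setA = new HashSet<>(a);
    Set<String> setB = new HashSet<>(b);
    // Assumption: two empty streams are considered identical.
    if (setA.isEmpty() && setB.isEmpty()) return 1.0;
    Set<String> intersection = new HashSet<>(setA);
    intersection.retainAll(setB);
    int union = setA.size() + setB.size() - intersection.size();
    return (double) intersection.size() / union;
  }
}
```

For example, the streams [a, b, c] and [b, c, d] share two of four distinct tokens, giving a similarity of 0.5.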
Code Improvements
- Dependency Updates:
  - Upgraded to Java 11 and configured toolchain for compatibility
  - Updated to Gradle 8.13 with improved distribution validation
  - Updated JUnit dependencies to use BOM (Bill of Materials) for version alignment
- Refactoring:
  - Refactored SAXLoader for improved XML reader handling
  - Refactored XMLElement for better content handling
  - Renamed XMLElementSimilarity to ElementSimilarity with improved method names
  - Replaced SimilarityFunction with Similarity interface (old interface deprecated)
- API Changes:
  - Deprecated getChildren method in XMLElement in favor of getContent
  - Deprecated setXMLReaderClass method
  - Removed debug flags for cleaner codebase
- Null Safety:
  - Added @NotNull annotations to toXML method parameters
  - Added @NotNull annotations to NamespaceSet.add method parameters
  - Enhanced exception handling throughout the codebase
Release 1.1.2
New Features
- Wagner-Fischer Algorithm: Added similarity-based diffing using the Wagner-Fischer algorithm for improved text comparison capabilities
- Whitespace Handling Utility: Added a new utility class to strip whitespace from specified list of elements
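For readers unfamiliar with the Wagner-Fischer algorithm mentioned above, the sketch below shows its textbook form: a dynamic-programming matrix in which each cell holds the edit distance between two prefixes. It operates on plain strings with unit costs for insertion, deletion, and substitution; the class name is hypothetical, and the costs and token types used by diffx may differ.

```java
// Textbook Wagner-Fischer edit distance on strings, for illustration only;
// this is NOT the diffx implementation.
final class WagnerFischerExample {

  static int editDistance(String a, String b) {
    int n = a.length();
    int m = b.length();
    // d[i][j] = edit distance between the first i chars of a and the first j chars of b
    int[][] d = new int[n + 1][m + 1];
    for (int i = 0; i <= n; i++) d[i][0] = i; // delete all i characters
    for (int j = 0; j <= m; j++) d[0][j] = j; // insert all j characters
    for (int i = 1; i <= n; i++) {
      for (int j = 1; j <= m; j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int deletion = d[i - 1][j] + 1;
        int insertion = d[i][j - 1] + 1;
        int substitution = d[i - 1][j - 1] + cost;
        d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
      }
    }
    return d[n][m];
  }
}
```

With these unit costs, editDistance("kitten", "sitting") returns 3. The full matrix needs (n + 1) × (m + 1) cells, which is why matrix-based approaches become expensive on long inputs.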
Bug Fixes
- Fixed constructor to properly respect the namespace-aware parameter
- Fixed potential bug in KumarRanganAlgorithm implementation
Code Improvements
- Refactored NilToken to use a singleton pattern for better memory efficiency
- Improved ElementToken and its default implementation
- Added constructor and enhanced annotations in Sequence class
- Refactored stack usage with Deque for better performance in isWellFormed method
- Standardized static final field declarations across the codebase
- Renamed "open" to "start" in XML element handling logic for improved clarity
- Added private constructor to Actions utility class to prevent instantiation
- Simplified empty checks by utilizing the isEmpty method
Documentation and Testing
Release 1.1.1
- Fixed issues in PostXMLFixer where some elements could be left unclosed.
- Improved support for Unicode characters in TokenizerBySpaceWord
Release 1.1.0
The focus of this release was to address a number of security vulnerabilities, in particular XML eXternal Entity injection (XXE) in the code.
Although XXE issues can easily be mitigated by filtering the XML input before submitting it to diffx, we changed the default configuration to be secure by default, disabling the loading of external entities and DTDs as outlined in https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
If for some reason you need to use a DTD, you can set the allowDoctypeDeclaration boolean option to true in the DiffConfig.
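To show what a secure-by-default XML parser setup generally looks like, here is a minimal sketch using the standard JAXP API and the feature URIs listed in the OWASP cheat sheet. It is not the diffx loader code; the SecureParserExample class is hypothetical, and the exact features diffx sets internally are not stated in these notes.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

// Illustrative JAXP configuration; NOT the diffx loader implementation.
final class SecureParserExample {

  static SAXParserFactory newSecureFactory() throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    // Reject any document that contains a DOCTYPE declaration.
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    // Do not resolve external general or parameter entities.
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    // Enable the JAXP secure processing limits.
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    return factory;
  }
}
```

With this configuration, parsing a document that declares a DOCTYPE fails with a SAX exception instead of loading the DTD, which is the behaviour an option such as allowDoctypeDeclaration would relax.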
Release 1.0.0
This release is functionally identical to 0.9.0 except that all deprecated code has been removed.
If you need to transition from the old API, use 0.9.0 instead.
Release 0.9.0
This version is a complete review of Diffx with a new API, architecture and algorithms.
For backward compatibility, some old APIs have been kept as deprecated code.
The following algorithms are provided:
- MyersGreedyXMLAlgorithm, an implementation of Myers' greedy algorithm adjusted for XML.
- MyersGreedyAlgorithm, an implementation of the greedy algorithm as outlined in Eugene Myers' paper "An O(ND) Difference Algorithm and its Variations".
- MyersLinearAlgorithm, an implementation of the linear algorithm as outlined in Eugene Myers' paper.
- MatrixXMLAlgorithm, an XML-aware algorithm based on the Wagner-Fischer algorithm.
- WagnerFischerAlgorithm, an implementation of the Wagner-Fischer algorithm with no optimisation.
- HirschbergAlgorithm, an implementation of the Hirschberg algorithm to find the longest common subsequence.
- KumarRanganAlgorithm, a legacy implementation of the S. Kiran Kumar and C. Pandu Rangan algorithm to find the longest common subsequence (LCS). Several bugs affecting previous releases have been fixed.
The most efficient algorithm is Myers' greedy algorithm. The XML version has been adjusted to reorder the XML events in the diff to produce well-formed XML output. In some cases, it may be unable to produce well-formed output, in which case you may need to fall back on less efficient algorithms.
The matrix XML algorithm requires a large matrix: while this makes it easier to compute a diff path that produces well-formed XML, it is also excessively memory hungry.
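To give a sense of why the greedy approach is cheaper than a full matrix, the sketch below computes the length of the shortest edit script (insertions and deletions only) following the greedy algorithm in Myers' paper. It works on plain strings and keeps only one array of furthest-reaching x positions per diagonal; the class name is hypothetical and this is not the MyersGreedyAlgorithm source.

```java
// Greedy shortest-edit-script length after Myers' O(ND) paper, on strings;
// illustration only, NOT the diffx MyersGreedyAlgorithm implementation.
final class MyersGreedyExample {

  static int shortestEdit(String a, String b) {
    int n = a.length();
    int m = b.length();
    int max = n + m;
    int offset = max;                // shifts diagonal k into a non-negative index
    int[] v = new int[2 * max + 2];  // v[offset + k] = furthest x reached on diagonal k
    for (int d = 0; d <= max; d++) {
      for (int k = -d; k <= d; k += 2) {
        int x;
        if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1])) {
          x = v[offset + k + 1];     // step down: insertion from b
        } else {
          x = v[offset + k - 1] + 1; // step right: deletion from a
        }
        int y = x - k;
        // Follow the diagonal while characters match (no edit cost).
        while (x < n && y < m && a.charAt(x) == b.charAt(y)) {
          x++;
          y++;
        }
        v[offset + k] = x;
        if (x >= n && y >= m) return d; // reached the end of both inputs
      }
    }
    return max;
  }
}
```

For the example used in Myers' paper, shortestEdit("ABCABBA", "CBABAC") returns 5. Memory grows with the combined input length rather than with the product of the two lengths, which is the trade-off against the matrix-based algorithms.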
Release 0.8.1
- Fixed a bug where namespace declarations were missing from the second root when the diffx output has more than one root element (for example, when comparing two XML fragments with different roots).