Contributing to JPlag

## Clustering

All clustering-related classes are contained in the `de.jplag.clustering(.*)` packages.

The clustering code is structured for ease of use: calling code should only ever interact with the `ClusteringOptions`, `ClusteringFactory`, and `ClusteringResult` classes:

[![](https://mermaid.ink/img/pako:eNq1VFFLwzAQ_ish-LDBth9QxkCmgqIo-tqXM7nVQJqUXArKdL_d29qts53dJpiXHLnvu_vuLslSKq9RJlJZILoykAXIUyd4zW1JEYNx2Q2o6MOHmE4mYg7W8tGcWW3YYxGNd9SGiUSogBCR2oRLDQVbFaGdrQO2mQ8mvuUt-DZIIkLp6FTWtqSdNmEcRXCqq_IpYBE8O8iHc6M0XNR_EPVrh8fjsxr2yfgjUnrYx8msWKPFbNOC6Hs7OJn407RsbmS3zGXlXa8MY-Omwb2huLp7ssDXLi8gGPJuJdTOptGBVvpqH17s-Z6RShtXL-VrbojYvapyfh0WVoda9oKaXu0VMJ0ax-4FKJzNmmNVsQYPEIN5F2RyY7mEaJCGXZm3HCLD0K_xxwiOKyh28MMiqsP-gqt3uZer46snVvw-sWHD1r7hDw71FbbW8NxRHrmOy_-YTPM_cvgaIEcyx5CD0fwhb7KmMr5hjqlM2NS4AA6bytStoWWh-b1da8OvQiYLsIQjCWX0Lx9OySSGEreg-l-vUV_fDLwfoA)](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNq1VFFLwzAQ_ish-LDBth9QxkCmgqIo-tqXM7nVQJqUXArKdL_d29qts53dJpiXHLnvu_vuLslSKq9RJlJZILoykAXIUyd4zW1JEYNx2Q2o6MOHmE4mYg7W8tGcWW3YYxGNd9SGiUSogBCR2oRLDQVbFaGdrQO2mQ8mvuUt-DZIIkLp6FTWtqSdNmEcRXCqq_IpYBE8O8iHc6M0XNR_EPVrh8fjsxr2yfgjUnrYx8msWKPFbNOC6Hs7OJn407RsbmS3zGXlXa8MY-Omwb2huLp7ssDXLi8gGPJuJdTOptGBVvpqH17s-Z6RShtXL-VrbojYvapyfh0WVoda9oKaXu0VMJ0ax-4FKJzNmmNVsQYPEIN5F2RyY7mEaJCGXZm3HCLD0K_xxwiOKyh28MMiqsP-gqt3uZer46snVvw-sWHD1r7hDw71FbbW8NxRHrmOy_-YTPM_cvgaIEcyx5CD0fwhb7KmMr5hjqlM2NS4AA6bytStoWWh-b1da8OvQiYLsIQjCWX0Lx9OySSGEreg-l-vUV_fDLwfoA)

New clustering algorithms and preprocessors can be implemented using the `GenericClusteringAlgorithm` and `ClusteringPreprocessor` interfaces which operate on similarity matrices only. `ClusteringAdapter` handles the conversion between `de.jplag` classes and matrices. `PreprocessedClusteringAlgorithm` adds a preprocessor onto another `ClusteringAlgorithm`.
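
The split between matrix-only algorithms, preprocessors, and the wrapper that chains them can be illustrated with a minimal, self-contained sketch. The interfaces `MatrixClustering` and `MatrixPreprocessor` below are hypothetical stand-ins for the roles of `GenericClusteringAlgorithm` and `ClusteringPreprocessor`; the actual signatures in `de.jplag.clustering` may differ (for example, a real preprocessor might also change the matrix size and map indices back). The threshold preprocessor and the connected-components algorithm are chosen only for illustration and are not claimed to be part of JPlag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; the interfaces are stand-ins, not JPlag's real signatures. */
public class ClusteringSketch {

    /** Hypothetical stand-in for the role of GenericClusteringAlgorithm: matrix in, index clusters out. */
    interface MatrixClustering {
        List<List<Integer>> cluster(double[][] similarity);
    }

    /** Hypothetical stand-in for the role of ClusteringPreprocessor: matrix in, matrix out. */
    interface MatrixPreprocessor {
        double[][] preprocess(double[][] similarity);
    }

    /** Example preprocessor: sets every similarity below the cutoff to zero. */
    static MatrixPreprocessor threshold(double cutoff) {
        return similarity -> {
            int n = similarity.length;
            double[][] result = new double[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    result[i][j] = similarity[i][j] >= cutoff ? similarity[i][j] : 0.0;
                }
            }
            return result;
        };
    }

    /** Example algorithm: groups indices connected through non-zero similarities. */
    static MatrixClustering connectedComponents() {
        return similarity -> {
            int n = similarity.length;
            boolean[] visited = new boolean[n];
            List<List<Integer>> clusters = new ArrayList<>();
            for (int start = 0; start < n; start++) {
                if (visited[start]) {
                    continue;
                }
                List<Integer> cluster = new ArrayList<>();
                ArrayDeque<Integer> stack = new ArrayDeque<>();
                stack.push(start);
                visited[start] = true;
                while (!stack.isEmpty()) {
                    int current = stack.pop();
                    cluster.add(current);
                    for (int next = 0; next < n; next++) {
                        if (!visited[next] && similarity[current][next] > 0.0) {
                            visited[next] = true;
                            stack.push(next);
                        }
                    }
                }
                Collections.sort(cluster);
                clusters.add(cluster);
            }
            return clusters;
        };
    }

    /** Chains a preprocessor in front of an algorithm, similar in spirit to PreprocessedClusteringAlgorithm. */
    static MatrixClustering preprocessed(MatrixPreprocessor preprocessor, MatrixClustering algorithm) {
        return similarity -> algorithm.cluster(preprocessor.preprocess(similarity));
    }

    public static void main(String[] args) {
        double[][] similarity = {
            {1.0, 0.9, 0.1, 0.0},
            {0.9, 1.0, 0.2, 0.1},
            {0.1, 0.2, 1.0, 0.8},
            {0.0, 0.1, 0.8, 1.0},
        };
        MatrixClustering algorithm = preprocessed(threshold(0.5), connectedComponents());
        System.out.println(algorithm.cluster(similarity)); // prints [[0, 1], [2, 3]]
    }
}
```

Because both stand-ins only see a `double[][]`, neither needs to know about submissions or comparisons; in JPlag that translation is the job of `ClusteringAdapter`.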

### Remarks on Spectral Clustering

 - based on [On Spectral Clustering: Analysis and an algorithm (Ng, Jordan & Weiss, 2001)](https://proceedings.neurips.cc/paper/2001/file/801272ee79cfde7fa5960571fee36b9b-Paper.pdf)
 - automatic hyper-parameter search using Bayesian Optimization with a Gaussian Process as the surrogate model and L-BFGS for optimization on the surrogate
 - the L-BFGS implementation is a pit of technical debt (see [this review discussion](https://github.com/jplag/JPlag/pull/281#discussion_r810171986))
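
For orientation, the core steps of the algorithm in the cited paper can be summarized as follows. This is a summary of the paper, not of JPlag's code: how JPlag derives the affinity matrix $A$ from its similarity values, and which hyper-parameters the search tunes, are not specified here. Given a symmetric affinity matrix $A \in \mathbb{R}^{n \times n}$ with $A_{ii} = 0$ and a cluster count $k$:

```math
\begin{aligned}
D_{ii} &= \sum_{j=1}^{n} A_{ij} \quad (D \text{ diagonal}), \\
L &= D^{-1/2} A D^{-1/2}, \\
X &= [x_1, \dots, x_k] \in \mathbb{R}^{n \times k}, \\
Y_{ij} &= \frac{X_{ij}}{\sqrt{\sum_{l=1}^{k} X_{il}^{2}}}
\end{aligned}
```

Here $x_1, \dots, x_k$ are the eigenvectors of $L$ belonging to its $k$ largest eigenvalues. The rows of $Y$ are then clustered with k-means, and item $i$ is assigned to the cluster that row $i$ of $Y$ ends up in.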

### Integration Tests

There are integration tests for spectral clustering that verify that, at least for two known sets of similarities, the groups known to be colluders are found. However, these datasets are considered sensitive data. They are not available to the public, and the tests can only be run by maintainers with access.

To run these tests, the contents of the [PseudonymizedReports](https://github.com/jplag/PseudonymizedReports) repository must be added to the folder `jplag/src/test/resources/de/jplag/PseudonymizedReports`.
