misc ideas and thoughts

### techniques for larger datasets

* rectangular binning (plus scaling counts); see the density heatmap options in Vega (https://vega.github.io/vega/examples/density-heatmaps/). I think this could be used to replicate the results in Cook and Miller, 2006.
* continuous hexagonal binning (plus scaling counts). We can think of binning counts as a streaming algorithm (e.g. use something like a count-min sketch data structure)
* random sampling (or stratified random sampling if interest is in clustering)
* transparency
* displaying largish categorical data (e.g. cluster labels) via hextris (more for static displays or linking)
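
To make the streaming view of binning concrete, here is a minimal sketch in plain Python. It assumes rectangular bins of a fixed cell size and log scaling of counts; the names `CountMinSketch`, `bin_index` and `scaled_count` are hypothetical and not part of any existing package. A hexagonal variant would only need a different `bin_index`.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One hash per row, obtained by salting the hash with the row number.
        data = repr(key).encode("utf-8")
        for row in range(self.depth):
            salt = row.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The smallest cell across rows is the least contaminated by collisions.
        return min(self.table[row][col] for row, col in self._cells(key))


def bin_index(x, y, cell=0.25):
    """Map a point to the integer (column, row) of its rectangular bin."""
    return (math.floor(x / cell), math.floor(y / cell))


def scaled_count(count, max_count):
    """Log-scale a count into [0, 1] so dense bins do not swamp sparse ones."""
    return math.log1p(count) / math.log1p(max_count) if max_count > 0 else 0.0


# Feed points one at a time, as they would arrive from a stream.
sketch = CountMinSketch()
stream = [(0.10, 0.10), (0.15, 0.12), (0.90, 0.90)]
for x, y in stream:
    sketch.add(bin_index(x, y))

dense = sketch.estimate(bin_index(0.12, 0.11))
```

Memory is fixed by `width * depth` regardless of how many points arrive, at the cost of occasionally overestimating a bin's count.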


### preprocessing transformations

* add kernel PCA decomposition as a preprocessing step
* scaled distances based on kNN
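
One possible reading of "scaled distances based on kNN" is local scaling: divide each pairwise distance by the geometric mean of the two points' k-th nearest neighbour distances, so that tight and loose clusters end up on a comparable scale. The sketch below assumes that reading and Euclidean distance; `knn_scale` and `scaled_distance_matrix` are hypothetical names, and the brute-force search is only suitable for small data.

```python
import math


def knn_scale(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(dists[k - 1])
    return scales


def scaled_distance_matrix(points, k, eps=1e-12):
    """Pairwise distances divided by sqrt(sigma_i * sigma_j)."""
    sigma = knn_scale(points, k)
    n = len(points)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # eps guards against duplicate points, whose kNN distance is zero.
            denom = max(math.sqrt(sigma[i] * sigma[j]), eps)
            out[i][j] = out[j][i] = math.dist(points[i], points[j]) / denom
    return out


# A tight cluster and a loose cluster with the same shape.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 10.0), (12.0, 10.0), (10.0, 12.0)]
scaled = scaled_distance_matrix(points, k=1)
```

After scaling, nearest-neighbour distances inside both clusters are close to 1, even though the raw distances differ by a factor of 20.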

### spatial linking

or manipulating subgraphs of the kNN graph (in the data space and the embedding space) via brushing operations

* local continuity (LC) meta-criterion, mean relative rank error, neighbourhood loss
* use the kNN indices to form a tangent map on the data space / embedding space. Rescale the data by the average neighbourhood mean and then compute the SVD again (the left singular vectors will be an estimate of the tangent)
* can use the aforementioned singular vectors to produce new display axes
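
As a rough illustration of the first diagnostic, the LC meta-criterion for neighbourhood size k can be written as the average overlap between each point's kNN set in the data space and in the embedding, minus the overlap expected by chance, k/(n-1). The sketch below is brute force and assumes Euclidean distance in both spaces; `knn_sets` and `lcmc` are hypothetical names.

```python
import math


def knn_sets(points, k):
    """Index set of the k nearest neighbours of each point (itself excluded)."""
    n = len(points)
    sets = []
    for i in range(n):
        others = (j for j in range(n) if j != i)
        order = sorted(others, key=lambda j: math.dist(points[i], points[j]))
        sets.append(set(order[:k]))
    return sets


def lcmc(data, embedding, k):
    """Mean kNN overlap between the two spaces, adjusted for chance agreement."""
    n = len(data)
    data_nn = knn_sets(data, k)
    embed_nn = knn_sets(embedding, k)
    overlap = sum(len(a & b) for a, b in zip(data_nn, embed_nn))
    return overlap / (n * k) - k / (n - 1)


# An embedding that only rescales the data preserves every neighbourhood.
data = [(0.0,), (1.0,), (2.5,), (4.5,), (7.0,)]
embedding = [(2.0 * x,) for (x,) in data]
score = lcmc(data, embedding, k=2)  # 1 - 2/4 = 0.5, the maximum for n=5, k=2
```

The per-point overlaps `len(a & b)` could also be kept rather than summed, which would give a value to highlight when a point is brushed.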

### manual clustering

via persistent brushes, stability of k-means etc.


