
misc ideas and thoughts #8

@sa-lee


techniques for larger datasets

  • rectangular binning (plus scaling of counts); see the options in Vega: https://vega.github.io/vega/examples/density-heatmaps/. I think this could be used to replicate the results in Cook and Miller (2006).
  • continuous hexagonal binning (plus scaling of counts). We can think of binning counts as a streaming algorithm (e.g. using a data structure such as a count-min sketch).
  • random sampling (or stratified random sampling if the interest is in clustering)
  • transparency
  • displaying largish categorical data (e.g. cluster labels) via hextris (more for static displays or linking)
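A minimal sketch of rectangular binning with scaled counts, assuming NumPy. `rect_bin` is a hypothetical helper, and the log and square-root scalings are illustrative choices, not necessarily the options offered in the Vega density-heatmap example.

```python
import numpy as np


def rect_bin(x, y, nx=20, ny=20, scale="log"):
    """Count points on an nx-by-ny rectangular grid and return raw and scaled counts."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=[nx, ny])
    if scale == "log":
        scaled = np.log1p(counts)      # log(1 + count) keeps empty bins at 0
    elif scale == "sqrt":
        scaled = np.sqrt(counts)
    elif scale == "none":
        scaled = counts
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    return counts, scaled, xedges, yedges


rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 100_000))
counts, scaled, xedges, yedges = rect_bin(x, y, nx=40, ny=40)
print(counts.sum(), scaled.max())
```

A renderer would then map `scaled` (rather than the raw `counts`) to colour or glyph size, so that dense bins do not swamp the rest of the display.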
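One way to read the streaming idea: map each incoming point to a hexagonal cell and feed the cell id into a count-min sketch, so memory stays fixed however many points arrive. The sketch below assumes pointy-top hexagons in axial coordinates (with cube rounding) and BLAKE2b-derived row hashes; `hex_cell` and `CountMinSketch` are hypothetical names. A count-min sketch never underestimates a count, but hash collisions can inflate it.

```python
import hashlib
import math
import random


def hex_cell(x, y, size):
    """Axial (q, r) coords of the pointy-top hexagon of circumradius `size` containing (x, y)."""
    # Fractional axial coordinates.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that x + y + z == 0 still holds.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


class CountMinSketch:
    """Fixed-memory approximate counter: `depth` hash rows of `width` counters each."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # A different hash per row, derived by prefixing the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Each row can only over-count (collisions), so the minimum is the tightest bound.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


# Usage: stream 10,000 Gaussian points into hexagons of radius 0.25.
rng = random.Random(0)
cms = CountMinSketch()
for _ in range(10_000):
    cms.add(hex_cell(rng.gauss(0, 1), rng.gauss(0, 1), 0.25))
print(cms.estimate((0, 0)))
```

The sketch only answers "how many points fell in this cell?"; to draw the hexagons one still needs the set of occupied cell ids, which could be kept separately or recovered by querying the cells covering the plot extent.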

preprocessing transformations

  • add kernel PCA decomposition as a preprocessing step
  • scaled distances based on kNN
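A minimal sketch of kernel PCA as a preprocessing step, assuming NumPy and an RBF (Gaussian) kernel; the kernel choice, `gamma`, and the helper name `rbf_kernel_pca` are assumptions for illustration. The kernel matrix is double-centred and the training-point scores are the leading eigenvectors scaled by the square roots of their eigenvalues.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components (RBF kernel)."""
    # Pairwise squared Euclidean distances, clipped at 0 to absorb round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    # Double-centre the kernel matrix (centring in feature space).
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J
    # eigh returns ascending eigenvalues; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[order], eigvecs[:, order]
    # Training-point scores are sqrt(lambda_j) * v_j.
    return V * np.sqrt(np.maximum(lam, 0.0))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = rbf_kernel_pca(X, n_components=2, gamma=0.1)
print(Z.shape)  # (200, 2)
```

The dense n-by-n kernel makes this quadratic in memory, which sits awkwardly with the "larger datasets" goal; a subsample (or a low-rank kernel approximation) would likely be needed before feeding the result to a tour or embedding.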
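"Scaled distances based on kNN" could mean several things; one common reading, assumed here, is local scaling as used in self-tuning spectral clustering: divide each distance d_ij by sqrt(σ_i σ_j), where σ_i is the distance from point i to its k-th nearest neighbour. `knn_scaled_distances` is a hypothetical helper, and duplicate points (σ_i = 0) are assumed absent.

```python
import numpy as np


def knn_scaled_distances(X, k=5):
    """Pairwise distances rescaled by each point's distance to its k-th nearest neighbour."""
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    # After sorting a row, column 0 is the point itself (distance 0),
    # so column k is the k-th nearest neighbour. Assumes no duplicate points.
    sigma = np.sort(D, axis=1)[:, k]
    return D / np.sqrt(np.outer(sigma, sigma))


# Two clusters with very different spreads.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(50, 2)), rng.normal(5, 2.0, size=(50, 2))])
S = knn_scaled_distances(X, k=5)
```

Because σ_i adapts to local density, the tight and the diffuse cluster end up with comparable within-cluster scaled distances, and the result is invariant to a global rescaling of the data.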

spatial linking

or manipulating subgraphs of the kNN graph (in the data and embedding spaces) via brushing operations

  • local continuity (LC) meta-criterion, mean relative rank error, neighbourhood loss
  • use the kNN indices to form a tangent map on the data space / embedding space: centre each neighbourhood by subtracting its mean and then compute its SVD (the leading singular vectors give an estimate of the tangent space)
  • the aforementioned singular vectors can be used to produce new display axes
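The local tangent estimate can be sketched with NumPy as below, assuming brute-force kNN and points stored as rows. With rows as points the tangent directions are the leading right singular vectors (rows of `Vt`); with points stored as columns they would be the left singular vectors instead. `knn_indices` and `local_tangents` are hypothetical helpers.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (excluding the row itself)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Column 0 of the argsort is the point itself (assumes no duplicate points).
    return np.argsort(d2, axis=1)[:, 1:k + 1]


def local_tangents(X, k=10, d=1):
    """Estimate a d-dimensional tangent basis at every point from its kNN neighbourhood."""
    idx = knn_indices(X, k)
    bases = []
    for i in range(X.shape[0]):
        nbhd = X[np.append(idx[i], i)]         # the point plus its k neighbours
        centred = nbhd - nbhd.mean(axis=0)     # subtract the neighbourhood mean
        # Rows are points, so tangent directions are the leading right singular vectors.
        _, _, Vt = np.linalg.svd(centred, full_matrices=False)
        bases.append(Vt[:d].T)
    return np.stack(bases)                     # shape (n_points, n_features, d)


# Points on the unit circle: the estimated tangent at angle t should be ±(-sin t, cos t).
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
T = local_tangents(X, k=6, d=1)
```

For the display-axes idea, projecting a brushed neighbourhood `X[idx[i]]` onto the columns of `T[i]` (with `d=2`) would give one candidate pair of local axes; running the same function on the embedding gives the matching map on the embedding side.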
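A sketch of the LC meta-criterion under its usual definition: for neighbourhood size K, average the overlap between each point's K nearest neighbours in the data space and in the embedding, divide by K, and subtract the chance baseline K/(N − 1). Brute-force kNN and the helper names are assumptions; the other two criteria in the list are not implemented here.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (excluding the row itself)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]


def lcmc(X, Y, k):
    """LC meta-criterion: mean kNN overlap between data X and embedding Y, minus chance."""
    n = X.shape[0]
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    overlap = sum(len(set(a) & set(b)) for a, b in zip(nx, ny))
    return overlap / (n * k) - k / (n - 1)


rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
# Compare a 2-d PCA projection against an unrelated random layout.
Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2]
Y_pca = X @ Vt[:2].T
Y_rand = rng.normal(size=(300, 2))
print(lcmc(X, Y_pca, k=10), lcmc(X, Y_rand, k=10))
```

Keeping the per-point overlaps (instead of summing them) would give a local quality score that could be coloured or brushed in the linked views.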

manual clustering

via persistent brushes, assessing the stability of k-means, etc.
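One common way to quantify k-means stability, assumed here rather than taken from the issue, is to rerun it from different random starts and average the pairwise Rand index between the resulting labellings. The sketch uses a plain NumPy Lloyd's algorithm; `kmeans`, `rand_index`, and `kmeans_stability` are hypothetical helpers, and persistent brushing is not covered.

```python
import numpy as np


def kmeans(X, k, seed, n_iter=100):
    """Plain Lloyd's algorithm with centres initialised from k random data points."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        # Keep an empty cluster's centre where it was.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labellings agree (same cluster vs different)."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


def kmeans_stability(X, k, seeds=range(10)):
    """Mean pairwise Rand index across k-means runs that differ only in their random start."""
    runs = [kmeans(X, k, s) for s in seeds]
    scores = [rand_index(runs[i], runs[j])
              for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return float(np.mean(scores))


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in (0.0, 3.0, 6.0)])
for k in (2, 3, 4, 5):
    print(k, kmeans_stability(X, k))
```

The Rand index ignores how clusters are numbered, so runs that find the same partition under permuted labels still score 1; values well below 1 flag a k (or a region of the data) where a manual, brush-based clustering may be worth comparing against.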
