
pearson.dist with flat spectra #344

@cbeleites


When calculating pearson.dist() (essentially a rescaled correlation between rows, i.e. spectra) on data that contain a perfectly flat spectrum, every distance involving that spectrum is NaN.

This is caused by the standardization of the data matrix: after centering, the flat spectrum is all zeros and its variance is 0, so the scaling step divides 0 by 0.
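To see where the NaN comes from, here is a minimal base-R sketch. It assumes the standardization is the usual centering followed by division by the standard deviation; pearson.dist() may scale by a different quantity (e.g. the row norm), but for a flat spectrum that quantity is 0 as well.

```r
# a perfectly flat "spectrum": constant intensity 200 at 5 wavelengths
flat <- rep(200, 5)

centered <- flat - mean(flat)   # all deviations are exactly 0
s <- sd(flat)                   # so the standard deviation is exactly 0

standardized <- centered / s    # 0 / 0 in every channel
print(standardized)             # [1] NaN NaN NaN NaN NaN
```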

  • The behaviour is consistent with cor(x, y), which returns NA (with a warning) in this case.
  • On the other hand, one could argue that since the covariance with a flat spectrum is always 0, the correlation should also be 0 (and the Pearson distance 0.5).
    Besides letting pearson.dist() work smoothly with flat spectra, this would allow users to distinguish the Pearson distance to a flat spectrum from situations where, e.g., NAs in the spectra cause the distance to be NA.
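A minimal base-R sketch of the proposed convention. `pearson_dist_flat0()` is a hypothetical name, not part of hyperSpec, and the centering/scaling steps are an assumption about how a Pearson distance is typically computed, not a copy of pearson.dist()'s code. Flat rows are left as all-zero vectors instead of being divided by 0, so their correlation with every other spectrum is 0 and the distance is 0.5, while rows containing NA still produce NA:

```r
# Hypothetical sketch (not the hyperSpec implementation):
# Pearson distance between the rows of a matrix, treating a flat
# (zero-variance) row as uncorrelated (r = 0) with every other row.
pearson_dist_flat0 <- function(x) {
  x <- as.matrix(x)
  x <- x - rowMeans(x)                 # center each spectrum (row)
  norms <- sqrt(rowSums(x^2))          # 0 for flat rows, NA if a row has NAs
  flat <- !is.na(norms) & norms == 0
  x <- x / ifelse(flat, 1, norms)      # unit-length rows; flat rows stay all 0
  r <- tcrossprod(x)                   # pairwise Pearson correlations
  as.dist(0.5 - r / 2)                 # Pearson distance (1 - r) / 2
}

m <- rbind(a      = c(1, 2, 3, 4),
           b      = c(2, 4, 6, 8),
           flat   = c(200, 200, 200, 200),
           has_na = c(1, NA, 3, 4))
d <- as.matrix(pearson_dist_flat0(m))
d["a", "flat"]     # 0.5: the flat spectrum counts as uncorrelated
d["a", "b"]        # numerically 0: perfectly correlated spectra
d["a", "has_na"]   # NA: missing values still propagate
```

With this convention, a distance of exactly 0.5 to a flat spectrum and an NA caused by missing values remain distinguishable.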

Opinions?

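```r
library(hyperSpec)

# subtracting spectrum 3 from all spectra and adding an offset
# turns spectrum 3 into a perfectly flat line at 200
x <- flu - flu [3] + 200
plot(x)

# every distance involving the flat spectrum 3 is NaN
pearson.dist (x)
#>              1            2            3            4            5
#> 2 0.0008858704
#> 3          NaN          NaN
#> 4 0.9967988590 0.9950559547          NaN
#> 5 0.9984690049 0.9968275493          NaN 0.0014374616
#> 6 0.9990662021 0.9977563018          NaN 0.0016757621 0.0006331176

# cor() also returns NA for the flat spectrum ...
cor (t(x[[1]]), t(x[[3]]))
#> Warning in cor(t(x[[1]]), t(x[[3]])): the standard deviation is zero
#>      [,1]
#> [1,]   NA
# ... although the covariance is well defined (0)
cov (t(x[[1]]), t(x[[3]]))
#>      [,1]
#> [1,]    0
```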
