Description
After a lot of troubleshooting, it appears that this warning: Error in n.x * n.y : (converted from warning) NAs produced by integer overflow, raised when gbm.step is run, is due to the internal function .roc, which appears twice in gbm.step. The first time is fine, but the second time is where the warning is thrown.
.roc <- function (obsdat, preddat) {
    if (length(obsdat) != length(preddat)) {
        stop("obs and preds must be equal lengths")
    }
    n.x <- length(obsdat[obsdat == 0])
    n.y <- length(obsdat[obsdat == 1])
    xy <- c(preddat[obsdat == 0], preddat[obsdat == 1])
    rnk <- rank(xy)
    wilc <- ((n.x * n.y) + ((n.x * (n.x + 1))/2) - sum(rnk[1:n.x]))/(n.x * n.y)
    return(round(wilc, 4))
}
The warning arises when n.x * n.y is evaluated. Both values are integers (they come from length()), so with a large dataset and a relatively balanced presence/absence ratio the product can easily exceed the 32-bit integer limit (2^31 - 1, or ~2.1 billion). About 46,341 records in each class is enough; for example, 50,000 presence records and 50,000 absence records give a product of 2.5 billion. This specifically occurs when ROC is calculated for the training data, which provides the value for discrimination under self.statistics in the final model output. When the warning occurs, an NA is reported instead of a value. Although this metric is not often reported, I'm surprised this issue hasn't arisen before or hasn't been fixed, since large datasets are not uncommon when running SDMs. A simple fix is to use as.numeric when calculating wilc.
wilc <- ((as.numeric(n.x) * as.numeric(n.y)) + ((as.numeric(n.x) * (n.x + 1))/2) - sum(rnk[1:n.x]))/(as.numeric(n.x) * as.numeric(n.y))
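For anyone who wants to confirm the overflow without running a full model, here is a minimal sketch of the arithmetic on its own. The counts of 50,000 are just the example figures from above, and the variable names prod_int and prod_num are only for illustration.

```r
# Counts as integers, the same type that length() returns
n.x <- 50000L
n.y <- 50000L

# Integer multiplication overflows and yields NA (with a warning,
# which becomes an error if warnings are converted to errors)
prod_int <- suppressWarnings(n.x * n.y)
is.na(prod_int)

# Converting to double first avoids the overflow
prod_num <- as.numeric(n.x) * as.numeric(n.y)
prod_num

# The largest value an R integer can hold, for comparison
.Machine$integer.max
```

The integer product comes back as NA, while the double product is 2.5e9, which is above .Machine$integer.max (2147483647).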
This fixes the issue. The troubling part is that the user would have no idea what originally caused this warning, or whether the reported results are reliable. Hope this saves someone else days of troubleshooting. Is there a way this could be fixed in the actual gbm.step function?
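In case it helps with a patch, below is a rough sketch of what a corrected helper might look like. It converts the two counts to double once, so every later product is done in floating point. The name roc_safe is hypothetical and not part of the package, and the switch from 1:n.x to seq_len(n.x) is my own small change, not something from the original function.

```r
# Hypothetical replacement for .roc; the name roc_safe is made up for this sketch
roc_safe <- function(obsdat, preddat) {
  if (length(obsdat) != length(preddat)) {
    stop("obs and preds must be equal lengths")
  }
  # Convert the counts to double up front so n.x * n.y cannot overflow
  n.x <- as.numeric(length(obsdat[obsdat == 0]))
  n.y <- as.numeric(length(obsdat[obsdat == 1]))
  xy <- c(preddat[obsdat == 0], preddat[obsdat == 1])
  rnk <- rank(xy)
  # Same Wilcoxon-based formula as the original, now in double arithmetic
  wilc <- ((n.x * n.y) + ((n.x * (n.x + 1)) / 2) - sum(rnk[seq_len(n.x)])) / (n.x * n.y)
  round(wilc, 4)
}

# 50,000 absences and 50,000 presences, perfectly separated predictions
obs <- rep(c(0, 1), each = 50000)
pred <- seq_along(obs) / length(obs)
roc_safe(obs, pred)
```

With this input the original .roc would hit the integer overflow, while the sketch returns 1, as expected for perfectly separated predictions.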