Error in n.x * n.y : (converted from warning) NAs produced by integer overflow - from gbm.step function #55

@dancrear

After a lot of troubleshooting, it appears that this error, raised when gbm.step is run, comes from the internal function .roc: Error in n.x * n.y : (converted from warning) NAs produced by integer overflow. The function .roc is called twice in gbm.step. The first call is fine, but the second call is where the warning is thrown.

.roc <- function (obsdat, preddat) {
    if (length(obsdat) != length(preddat)) {
        stop("obs and preds must be equal lengths")
    }
    n.x <- length(obsdat[obsdat == 0])
    n.y <- length(obsdat[obsdat == 1])
    xy <- c(preddat[obsdat == 0], preddat[obsdat == 1])
    rnk <- rank(xy)
    wilc <- ((n.x * n.y) + ((n.x * (n.x + 1))/2) - sum(rnk[1:n.x]))/(n.x * n.y)
    return(round(wilc, 4))
}

The warning arises at n.x * n.y because both values are integers (they come from length()), so the product is computed in integer arithmetic. With a large dataset and a reasonably balanced presence/absence ratio, it is very easy to exceed the 32-bit integer limit (2^31 - 1, about 2.1 billion). Roughly 50,000 presence records and 50,000 absence records are enough: 50,000 * 50,000 = 2.5 billion. This occurs specifically when ROC is calculated for the training data, which provides the discrimination value under self.statistics in the final model output. When the warning occurs, an NA is reported instead of a value. Although this metric is not often reported, I'm surprised this issue hasn't arisen before or hasn't been fixed, since large datasets are not uncommon when running SDMs. A simple fix is to use as.numeric when calculating wilc.

wilc <- ((as.numeric(n.x) * as.numeric(n.y)) + ((as.numeric(n.x) * (n.x + 1))/2) - sum(rnk[1:n.x]))/(as.numeric(n.x) * as.numeric(n.y))
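The overflow, and the effect of the as.numeric conversion, can be reproduced in isolation. This is a minimal sketch using the 50,000 / 50,000 counts from the example above. The variable names overflowed, safe and msg are only illustrative. The "(converted from warning)" prefix in the error message suggests that warnings are being promoted to errors, as happens with options(warn = 2), which is what the last part imitates.

```r
# Counts as length() would return them: integer type
n.x <- 50000L
n.y <- 50000L

# Largest representable integer in R (2^31 - 1)
.Machine$integer.max

# integer * integer overflows: R returns NA and signals a warning
overflowed <- suppressWarnings(n.x * n.y)
is.na(overflowed)

# Converting one operand to double moves the product to double arithmetic
safe <- as.numeric(n.x) * n.y
safe

# With warnings promoted to errors, the same product stops with an error
old <- options(warn = 2)
msg <- tryCatch(n.x * n.y, error = function(e) conditionMessage(e))
options(old)
msg
```

Note that n.x * (n.x + 1) does not overflow even without the conversion, because the literal 1 is a double and so n.x + 1 is already a double. Only the integer-by-integer product n.x * n.y is affected.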

This fixes the issue. The troubling part is that the user would have no idea what originally caused the warning, or whether the reported results are reliable. Hope this saves someone else days of troubleshooting. Is there a way this could be fixed in the actual gbm.step function?
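For reference, here is a sketch of what a patched version of the function might look like, with the counts converted to double once at the top instead of at each use. The name roc_safe is hypothetical and not part of any package. This is a standalone function for checking the statistic, not a change to gbm.step itself.

```r
# Hypothetical patched copy of .roc: identical logic, but the two counts
# are stored as doubles so that n.x * n.y cannot overflow.
roc_safe <- function(obsdat, preddat) {
    if (length(obsdat) != length(preddat)) {
        stop("obs and preds must be equal lengths")
    }
    n.x <- as.numeric(length(obsdat[obsdat == 0]))  # number of absences
    n.y <- as.numeric(length(obsdat[obsdat == 1]))  # number of presences
    xy <- c(preddat[obsdat == 0], preddat[obsdat == 1])
    rnk <- rank(xy)
    wilc <- ((n.x * n.y) + ((n.x * (n.x + 1)) / 2) - sum(rnk[seq_len(n.x)])) /
        (n.x * n.y)
    round(wilc, 4)
}

# Small check: 3 of the 4 absence/presence pairs are ranked correctly
small <- roc_safe(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))

# Large check: 50,000 absences and 50,000 presences with random predictions
set.seed(1)
obs_big <- rep(c(0L, 1L), each = 50000)
pred_big <- runif(length(obs_big))
big <- roc_safe(obs_big, pred_big)
```

With these inputs, small should be 0.75, and big should be a finite value rather than NA.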
