gg_survival with 'by' handles factors with NA incorrectly when occurring before other levels #28

@daniellemccool

I don't have a good minimal working example or anything, but I'm going to try my best to describe what's going on.

After calling gg_survival with type "kaplan" or type "nelson" and supplying a factor for 'by', survfit is called with strata on 'by'. strata defaults to na.group = FALSE, so it stratifies only on the non-missing levels of the factor.

A little further down in the code, we have this bit:

  if(!is.null(by)){
    tm_splits <- which(c(FALSE,sapply(2:nrow(tbl), function(ind){tbl$time[ind] < tbl$time[ind - 1]})))

    lbls <- unique(data[,by])
    tbl$groups <- lbls[1]

    for(ind in 2:(length(tm_splits) + 1)){
      tbl$groups[tm_splits[ind - 1]:nrow(tbl)] <- lbls[ind]
    }
  }

unique() also returns NA as one of its values, but NA was not included as a stratum level. So if NA occurs in the data before the first occurrence of at least one of your levels, NA takes that level's place in lbls, the labels after it shift by one, and you drop a factor level you potentially cared about.
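To make the mismatch concrete, here is a small base-R sketch. The column `by_col` and its values are made up for illustration and are not from the package or my data; it only shows how unique() orders its result and which labels the loop above would end up using for three strata.

```r
# Hypothetical grouping column; NA appears before the first "b".
by_col <- factor(c("a", NA, "b", "a", "c", "b"))

# unique() keeps NA and returns values in order of first appearance.
lbls <- unique(by_col)
print(lbls)

# With na.group = FALSE there are only three strata (a, b, c),
# so the loop would use just the first three labels.
assigned <- lbls[1:3]
print(as.character(assigned))

# The second stratum is labelled NA, the third is labelled "b",
# and "c" is never assigned at all.
```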

I solved it myself by adding na.group = TRUE to the strata call in the kaplan and nelson functions, because I wanted that information anyway, but I guess others might run into this as well!
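For anyone who does not want NA as its own group, an alternative (not what I did, and untested against the package itself) might be to drop NA from the labels so they line up with the strata built with na.group = FALSE. A minimal sketch, again with a made-up `by_col`:

```r
# Hypothetical grouping column; NA appears before the first "b".
by_col <- factor(c("a", NA, "b", "a", "c", "b"))

# Drop missing values before taking unique(), so the labels
# contain only the levels that actually became strata.
lbls <- unique(by_col[!is.na(by_col)])
print(as.character(lbls))
```

This only addresses the NA problem described here; it still assumes the order of first appearance matches the order of the strata.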
