Useful code snippets #34

pdavidsonFIA · 2021-01-17T13:35:29Z

pdavidsonFIA
Jan 17, 2021

I start a thread for code snippets people find useful, but probably not meaningful enough to justify a dedicated place in a repo.
I will start the ball rolling with something that I use every day and saves me a ton of time. I write to most of my model output to pickle files, which I think are great since they preserve data types and structure (particularly a date- or period-indexed data frame). Often it is nice to inspect the content of the .pkl without opening a IDE and loading it. I discovered context_menu, which seems to work very well on Windows 10.
The following code, when run from a script, adds an 'unpickle' option to .pkl files. So when you right click the file you can choose to unpickle it, to an xlsx file in the same location with the same name. Double-click also seems to work since this is the only option associated with .pkl on my machine.

def unpickle_to_excel(filenames, params):
    import os
    import pandas as pd
    filename_pkl= os.path.splitext(filenames[0])[0] + '.pkl'
    filename_xl=os.path.splitext(filenames[0])[0] + '.xlsx'
    df = pd.read_pickle(filename_pkl)
    df.reset_index().to_excel(filename_xl, index=False)

if __name__ == '__main__':
    from context_menu import menus
    cm = menus.FastCommand('unpickle', type='.pkl', python=unpickle_to_excel)
    cm.compile()

#To remove the menu:
# from context_menu import  menus
# menus.removeMenu('unpickle','.pkl')

fumitoh · 2021-01-18T14:42:39Z

fumitoh
Jan 18, 2021

This isn't a snippet, but in case you need to draw waterfall charts in Python!

https://github.com/actuarialopensource/modelx-samples/blob/master/waterfall.ipynb

draw_waterfall(df, title="Cashflows")

0 replies

pdavidsonFIA · 2021-01-18T14:45:39Z

pdavidsonFIA
Jan 18, 2021
Author

thank you very much for this. I already briefly googled this topic earlier today but didn't find one with nice colours.

0 replies

pdavidsonFIA · 2021-01-24T11:44:34Z

pdavidsonFIA
Jan 24, 2021
Author

This is a simplified version of waterfalls, just using series data and fewer colours, but with labels to select base, and data labels.
https://gist.github.com/pdavidsonFIA/770ddb4b06dc61bdd3e9c67d484030a9

df = pd.Series({ "Premiums": 20,
           "Claims": -10,
           "Expenses": -5})
df['Total']=df.sum()
draw_waterfall(df, title='demo', stocks=['Total'], show=True)

0 replies

jimbrig · 2022-11-03T21:44:09Z

jimbrig
Nov 3, 2022

Here's an R script I used in my lossrx R Package (WIP) that quickly simulates transactional claims data for P&C reserving - GitHub.com | jimbrig/lossrx > claims_transactional.R (Note that the original version of this script was a GitHub Gist.

Link to Script:

View Script

#  ------------------------------------------------------------------------
#
# Title : Transactional Loss Data Simulation
#    By : Jimmy Briggs
#  Date : 2020-11-06
#
#  ------------------------------------------------------------------------

# simulate transactions on insurance claims
# transactions will include claim closures, payments and changes in case reserves
# this is all completely made up and does not accurately resemble actual claims
library(lubridate)
library(dplyr)
library(tibble)
library(randomNames)
library(tidyr)

# source("R/utils.R")
devtools::load_all()

# number of claims
n_claims <- 10000

# start/end dates & accident range
beg_date <- as.Date("2015-01-01")
end_date <- end_of_month(start_of_month(Sys.Date()) - 1)
accident_range <- as.numeric(end_date - beg_date)

set.seed(1234)

accident_date <- sample(0:accident_range, size = n_claims, replace = TRUE)

payment_fun <- function(n) rlnorm(n, 7.5, 1.5)

claims <- tibble::tibble(
  claim_num = 1:n_claims,
  claim_id = paste0("claim-", 1:n_claims),
  accident_date = beg_date + lubridate::days(accident_date),
  state = sample(c("TX", "CA", "GA", "FL"), size = n_claims, replace = TRUE),
  claimant = randomNames::randomNames(n_claims),
  report_date = rnbinom(n_claims, 5, .25),
  # 0 if claim closed when reported
  status = rbinom(n_claims, 1, 0.96),
  # initial payment amount
  payment =  payment_fun(n_claims)) %>%
  dplyr::mutate(report_date = accident_date + report_date,
                # set payment to zero if closed when reported
                payment = ifelse(status == 0, 0, payment),
                case = payment * runif(n_claims, 0.25, 8.0),
                transaction_date = report_date) %>%
  dplyr::arrange(accident_date)

## simulate transaction dates
# simulate number of transactions for each claim
n_trans <- rnbinom(n_claims, 3, 0.25)

# simulate lag to each transaction
trans_lag <- lapply(n_trans, function(x) rnbinom(x, 7, 0.1))
trans_lag <- lapply(trans_lag, function(x) {
  if (length(x) == 0) 0 else x
})

for (i in seq_len(n_claims)) {
  trans_lag[[i]] <- data_frame(
    "trans_lag" = trans_lag[[i]],
    "claim_id" = paste0("claim-", i),
    "claim_num" = i
  )
}

trans_tbl <- bind_rows(trans_lag) %>%
  group_by(claim_id, claim_num) %>%
  # switch from incremental to cumulative lag
  mutate(trans_lag = cumsum(trans_lag)) %>%
  ungroup()

# separate all zero claims from the claims that have payments
zero_claims <- dplyr::filter(claims, status == 0)
first_trans <- dplyr::filter(claims, status == 1)

subsequent_trans <- left_join(trans_tbl, first_trans, by = c("claim_num", "claim_id")) %>%
  filter(!is.na(accident_date))

n_trans <- nrow(subsequent_trans)

# simulate subsequent transaction payments
subsequent_trans <- subsequent_trans %>%
  mutate(payment = payment_fun(n_trans),
         case = pmax(case * rnorm(n_trans, 1.5, 0.1) - payment, 500),
         transaction_date = report_date + trans_lag) %>%
  select(-trans_lag)

trans <- bind_rows(zero_claims, first_trans, subsequent_trans) %>%
  arrange(transaction_date)

# add in a transaction number
trans$trans_num <- 1:nrow(trans)

# set final trans status to closed and case to 0
trans <- trans %>%
  arrange(trans_num) %>%
  group_by(claim_num, claim_id) %>%
  mutate(final_trans = ifelse(trans_num == max(trans_num), TRUE, FALSE),
         status = ifelse(final_trans, 0, 1),
         case = ifelse(final_trans, 0, case),
         status = ifelse(status == 0, "Closed", "Open"),
         paid = round(cumsum(payment), 0),
         case = round(case, 0),
         payment = round(payment, 0)) %>%
  select(-final_trans) %>%
  arrange(accident_date) %>%
  ungroup() %>%
  # filter out any transactions past latest eval
  filter(.data$transaction_date <= end_date)

fs::dir_create("data-raw/cache")
qs::qsave(trans, file = paste0("data-raw/cache/", Sys.Date(), "-transactional-claims.qs"))

claims_transactional <- trans
usethis::use_data(claims_transactional, overwrite = TRUE)

Further, I functionalized this workflow into an R function simulate_claims in the same package: Function Reference Documentation - lossrx - simulate_claims.R:

actuary-simulate_claims.R:

View Function

#' simulate_claims
#'
#' @description A function to simulate *transactional* actuarial claims/loss
#'   data for Property Casualty Insurance.
#'
#' @param n_claims Numeric - Number of claims to be simulated.
#' @param start_date,end_date Character/Date - Start and End dates for simulation to create claims within (experience_period).
#' @param seed Numeric - the seed is used to isolate randomness during statistical simulations.
#' @param loss_distribution Character - must be one of the distributions mentioned in the details below. Defaults to lognormal.
#' @param params Parameters associated with the specified `loss_distribution`
#'  in a list (i.e. `list(mean_log = 7.5, sd_log = 1.5)` for lognormal distribution).
#' @param status_prob_open Numeric - must be within `0 < x < 1` and represents
#'  probability a claim is open when running binomial simulations for claims'
#'  status.
#' @param cache Boolean/Logical - enable caching?
#' @param ... If needed
#'
#' @details
#'
#' Severity/Loss Distributions:
#'  - Normal: `norm`
#'  - Lognormal: `lnorm`
#'  - Gamma: `gamma`
#'  - LogGamma: `lgamma`
#'  - Pareto: `pareto`
#'  - Weibull: `weibull`
#'  - Generalized Beta: `genbeta`
#'
#' @return The return value, if any, from executing the function.
#'
#' @importFrom dplyr mutate arrange bind_rows group_by ungroup filter left_join select
#' @importFrom lubridate days
#' @importFrom randomNames randomNames
#' @importFrom stats rlnorm rnbinom rbinom runif rnorm
#' @importFrom tibble tibble
#' @importFrom rlang .data .env
simulate_claims <- function(n_claims = 1000,
                            start_date = "2015-01-01",
                            end_date = Sys.Date(),
                            seed = 12345,
                            loss_distribution = "lnorm",
                            params = list(mean_log = 7.5, sd_log = 1.5),
                            status_prob_open = 0.96,
                            cache = FALSE,
                            ...) {

  # loss_distribution <- match.arg("loss_distribution")

  stopifnot(
    is.numeric(n_claims) && n_claims > 0,
    class(as.Date(start_date)) == "Date",
    class(as.Date(end_date)) == "Date" &&
      as.Date(end_date) > as.Date(start_date),
    is.numeric(seed),
    loss_distribution %in% c(
      "lnorm",
      "lognormal",
      "normal",
      "gamma",
      "lgamma",
      "pareto",
      "weibull",
      "genbeta"
    ),
    is.numeric(status_prob_open),
    status_prob_open > 0 && status_prob_open < 1
  )

  beg_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  accident_range <- as.numeric(end_date - beg_date)
  set.seed(seed)
  accident_date <- sample(0:accident_range, size = n_claims, replace = TRUE)

  # mean_log <- 7.5
  # sd_log <- 1.5

  payment_fun <- function(n) stats::rlnorm(n, params$mean_log, params$sd_log)

  claims <- tibble::tibble(
    claim_num = paste0("claim-", 1:n_claims),
    accident_date = beg_date + lubridate::days(accident_date),
    state = sample(c("TX", "CA", "GA", "FL"), size = n_claims, replace = TRUE),
    claimant = randomNames::randomNames(n_claims),
    report_lag = stats::rnbinom(n_claims, 5, .25), # 0 if claim closed when reported
    status = stats::rbinom(n_claims, 1, 0.96), # initial payment amount
    payment =  payment_fun(n_claims)
  ) %>%
    dplyr::mutate(
      report_date = .data$accident_date + .data$report_lag,
      payment = ifelse(.data$status == 0, 0, .data$payment),
      case = .data$payment + stats::runif(.env$n_claims, 0.25, 8.0),
      transaction_date = .data$report_date
    ) %>%
    dplyr::arrange(.data$accident_date)

  n_trans <- stats::rnbinom(n_claims, 3, 0.25)
  trans_lag <- lapply(n_trans, function(x) stats::rnbinom(x, 7, 0.1)) %>%
    lapply(function(x) { if (length(x) == 0) 0 else x })

  for (i in seq_len(n_claims)) {
    trans_lag[[i]] <- tibble::tibble(
      "trans_lag" = trans_lag[[i]],
      "claim_num" = paste0("claim-", i)
    )
  }

  trans_tbl <- dplyr::bind_rows(trans_lag) %>%
    dplyr::group_by(.data$claim_num) %>%
    dplyr::mutate(trans_lag = cumsum(.data$trans_lag)) %>%
    dplyr::ungroup()

  # separate all zero claims from the claims that have payments
  zero_claims <- dplyr::filter(claims, .data$status == 0)
  first_trans <- dplyr::filter(claims, .data$status == 1)

  subsequent_trans <- dplyr::left_join(trans_tbl, first_trans, by = "claim_num") %>%
    dplyr::filter(!is.na(.data$accident_date))

  n_trans <- nrow(subsequent_trans)

  subsequent_trans <- subsequent_trans %>%
    dplyr::mutate(payment = payment_fun(.env$n_trans),
                  case = pmax(.data$case * stats::rnorm(.env$n_trans, 1.5, 0.1) - .data$payment, 500),
                  transaction_date = .data$report_date + .data$trans_lag) %>%
    dplyr::select(-.data$trans_lag)

  trans <- dplyr::bind_rows(zero_claims, first_trans, subsequent_trans) %>%
    dplyr::arrange(.data$transaction_date)

  # add in a transaction number
  trans$trans_num <- 1:nrow(trans)

  # set final trans status to closed and case to 0
  trans <- trans %>%
    dplyr::arrange(.data$trans_num) %>%
    dplyr::group_by(.data$claim_num) %>%
    dplyr::mutate(final_trans = ifelse(.data$trans_num == max(.data$trans_num), TRUE, FALSE),
                  status = ifelse(.data$final_trans, 0, 1),
                  case = ifelse(.data$final_trans, 0, .data$case),
                  status = ifelse(.data$status == 0, "Closed", "Open"),
                  paid = round(cumsum(.data$payment), 0),
                  case = round(.data$case, 0),
                  payment = round(.data$payment, 0)) %>%
    dplyr::select(-.data$final_trans) %>%
    dplyr::arrange(.data$accident_date) %>%
    dplyr::ungroup() %>%
    dplyr::arrange(.data$claim_num, dplyr::desc(.data$transaction_date))

  if (cache) { saveRDS(trans, file = "trans.RDS") }

  trans

}

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Useful code snippets #34

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 4 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Useful code snippets #34

Uh oh!

Uh oh!

pdavidsonFIA Jan 17, 2021

Replies: 4 comments

Uh oh!

fumitoh Jan 18, 2021

Uh oh!

pdavidsonFIA Jan 18, 2021 Author

Uh oh!

pdavidsonFIA Jan 24, 2021 Author

Uh oh!

jimbrig Nov 3, 2022

pdavidsonFIA
Jan 17, 2021

fumitoh
Jan 18, 2021

pdavidsonFIA
Jan 18, 2021
Author

pdavidsonFIA
Jan 24, 2021
Author

jimbrig
Nov 3, 2022