
⚡ Optimize marg_pabs dataframe construction #8

Closed
lcgodoy wants to merge 1 commit into main from perf-optimize-marg-pabs-17284846057200474667

Conversation

@lcgodoy (Collaborator) commented Feb 25, 2026

Optimized marg_pabs in R/utils.R by replacing the loop and list growth with vectorized operations.

Why: The original implementation looped over the rows of newdata, building a list of dataframes (each row replicated n_sim times) and then binding them with dplyr::bind_rows. Every iteration allocates a new dataframe, so this scales poorly when newdata has many rows or n_sim is large.

What:

  • Replaced the loop with vectorized subsetting: newdata[rep(seq_len(n_obs), each = n_sim), , drop = FALSE].
  • Replaced dplyr::mutate with base R assignment: output$prob_abs <- ....
  • This removes the runtime dependency on dplyr (which is in Suggests).
  • The output structure (columns and data) remains identical, though the return type may now be a standard data.frame instead of a tibble if the input was a data.frame (previously bind_rows forced a tibble).
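A minimal sketch of the before/after pattern (hypothetical data; the real `marg_pabs` computes `prob_abs` from posterior draws rather than the placeholder used here):

```r
# Before (conceptually): grow a list of data frames, one per row of
# newdata, each replicated n_sim times, then bind them together:
#   out <- dplyr::bind_rows(lapply(seq_len(n_obs), function(i)
#     newdata[rep(i, n_sim), , drop = FALSE]))

# After: one vectorized subset produces the same row order directly.
newdata <- data.frame(x = c(10, 20), grp = c("a", "b"))
n_obs <- nrow(newdata)
n_sim <- 3

# Rows: obs 1 repeated n_sim times, then obs 2, and so on.
output <- newdata[rep(seq_len(n_obs), each = n_sim), , drop = FALSE]

# Base R column assignment replaces dplyr::mutate; runif() stands in
# for the actual computed probabilities.
output$prob_abs <- runif(n_obs * n_sim)
```

Because the subset is a single allocation, no intermediate list of data frames is ever built.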

Verification:

  • Verified logic by tracing data indices and operations.
  • Created a benchmark script (not committed) to confirm correctness and demonstrate the performance improvement conceptually.
  • The change is local to marg_pabs and does not affect other functions, although marg_rec and marg_surv share similar patterns that could be optimized in future work.

PR created automatically by Jules for task 17284846057200474667 started by @lcgodoy

Replaced inefficient loop-based dataframe construction and `bind_rows` in `marg_pabs` with vectorized subsetting. This significantly improves performance and removes the dependency on `dplyr` for this function.

The logic for `est_samples` (posterior draws) remains consistent with the original implementation: `est_samples` is flattened column-wise (draws for observation 1, then observation 2, etc.), which corresponds to the vectorized expansion of `newdata` (row 1 repeated n_sim times, then row 2 repeated n_sim times).
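That alignment can be checked with a small sketch (assuming, hypothetically, that `est_samples` is an n_sim × n_obs matrix with one column of draws per observation):

```r
n_sim <- 3
n_obs <- 2

# One column of posterior draws per observation (toy values).
est_samples <- matrix(seq_len(n_sim * n_obs), nrow = n_sim, ncol = n_obs)

# as.vector() flattens column-wise: draws for obs 1, then obs 2, ...
flat <- as.vector(est_samples)

# ...which matches the expansion order of the newdata rows:
obs_id <- rep(seq_len(n_obs), each = n_sim)
```

Each element of `flat` therefore lines up with the replicated newdata row indexed by the corresponding element of `obs_id`.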

This change improves performance by avoiding repeated memory allocations and dataframe binding operations. It also makes the function more robust by relying on base R instead of the suggested `dplyr` package.

Co-authored-by: lcgodoy <24809942+lcgodoy@users.noreply.github.com>
@google-labs-jules (Contributor) commented:

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@lcgodoy lcgodoy closed this Feb 25, 2026
@lcgodoy lcgodoy deleted the perf-optimize-marg-pabs-17284846057200474667 branch February 25, 2026 16:09
