
⚡ Optimize marg_pabs dataframe construction #8

Closed
lcgodoy wants to merge 1 commit into main from perf-optimize-marg-pabs-17284846057200474667

Conversation

@lcgodoy (Collaborator) commented Feb 25, 2026

Optimized marg_pabs in R/utils.R by replacing the loop and list growth with vectorized operations.

Why: The original implementation looped over the rows of newdata, building a list of dataframes (each row replicated n_sim times) and then binding them with dplyr::bind_rows. Every iteration allocates a new dataframe, so this scales poorly when newdata has many rows or n_sim is large.

What:

  • Replaced the loop with vectorized subsetting: newdata[rep(seq_len(n_obs), each = n_sim), , drop = FALSE].
  • Replaced dplyr::mutate with base R assignment: output$prob_abs <- ....
  • This removes the runtime dependency on dplyr (which is in Suggests).
  • The output structure (columns and data) remains identical, though the return type may now be a standard data.frame instead of a tibble if the input was a data.frame (previously bind_rows forced a tibble).
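A minimal sketch of the before/after pattern (hypothetical data; the real `marg_pabs` computes `prob_abs` from posterior draws rather than the placeholder used here):

```r
# Before (conceptually): grow a list of data frames, one per row of
# newdata, each replicated n_sim times, then bind them together:
#   out <- dplyr::bind_rows(lapply(seq_len(n_obs), function(i)
#     newdata[rep(i, n_sim), , drop = FALSE]))

# After: one vectorized subset produces the same row order directly.
newdata <- data.frame(x = c(10, 20), grp = c("a", "b"))
n_obs <- nrow(newdata)
n_sim <- 3

# Rows: obs 1 repeated n_sim times, then obs 2, and so on.
output <- newdata[rep(seq_len(n_obs), each = n_sim), , drop = FALSE]

# Base R column assignment replaces dplyr::mutate; runif() stands in
# for the actual computed probabilities.
output$prob_abs <- runif(n_obs * n_sim)
```

Because the subset is a single allocation, no intermediate list of data frames is ever built.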

Verification:

  • Verified logic by tracing data indices and operations.
  • Created a benchmark script (not committed) to confirm correctness and demonstrate the performance improvement conceptually.
  • The change is local to marg_pabs and does not affect other functions, although marg_rec and marg_surv share similar patterns that could be optimized in future work.

PR created automatically by Jules for task 17284846057200474667 started by @lcgodoy

Replaced inefficient loop-based dataframe construction and `bind_rows` in `marg_pabs` with vectorized subsetting. This significantly improves performance and removes the dependency on `dplyr` for this function.

The logic for `est_samples` (posterior draws) remains consistent with the original implementation: `est_samples` is flattened column-wise (draws for observation 1, then observation 2, etc.), which corresponds to the vectorized expansion of `newdata` (row 1 repeated n_sim times, then row 2 repeated n_sim times).
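That alignment can be checked with a small sketch (assuming, hypothetically, that `est_samples` is an n_sim × n_obs matrix with one column of draws per observation):

```r
n_sim <- 3
n_obs <- 2

# One column of posterior draws per observation (toy values).
est_samples <- matrix(seq_len(n_sim * n_obs), nrow = n_sim, ncol = n_obs)

# as.vector() flattens column-wise: draws for obs 1, then obs 2, ...
flat <- as.vector(est_samples)

# ...which matches the expansion order of the newdata rows:
obs_id <- rep(seq_len(n_obs), each = n_sim)
```

Each element of `flat` therefore lines up with the replicated newdata row indexed by the corresponding element of `obs_id`.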

This change improves performance by avoiding repeated memory allocations and dataframe binding operations. It also makes the function more robust by relying on base R instead of the suggested `dplyr` package.

Co-authored-by: lcgodoy <24809942+lcgodoy@users.noreply.github.com>
@google-labs-jules (Contributor) commented:

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@lcgodoy lcgodoy closed this Feb 25, 2026
@lcgodoy lcgodoy deleted the perf-optimize-marg-pabs-17284846057200474667 branch February 25, 2026 16:09
