197 commits
647f065
Update
Sep 9, 2025
1827055
update
Sep 9, 2025
d7708e8
edit
Sep 9, 2025
af18d62
update
Sep 9, 2025
26b0067
edits and update
Sep 9, 2025
5d40e7d
edits
Sep 9, 2025
6fabe11
update
Sep 9, 2025
321b19d
edit
Sep 9, 2025
374c70e
edit
Sep 9, 2025
54db7c2
Edit
Sep 9, 2025
f0847d6
edits made
Sep 9, 2025
2609339
edit
Sep 9, 2025
d0dd4c3
edit
Sep 9, 2025
8b35f47
update
Sep 9, 2025
d4e1386
edit
Sep 10, 2025
01d1e81
Edit
Sep 10, 2025
8a8b362
Edit
Sep 10, 2025
396b065
test
Sep 10, 2025
be9849a
python
Sep 10, 2025
3df9556
Update
Sep 10, 2025
b5d32cd
update
Sep 10, 2025
9f7608e
update
Sep 14, 2025
a07dcb7
update
Sep 14, 2025
a82a4eb
update
Sep 14, 2025
4270488
update
Sep 17, 2025
89d5d89
update
Sep 17, 2025
3974185
update
Sep 18, 2025
4faacd1
updated
Sep 19, 2025
1d19404
edit
Sep 19, 2025
b3d6b7a
edit
Sep 19, 2025
eaf5487
edits and updates
Sep 22, 2025
2fc6e2e
update
Sep 22, 2025
f84f115
update
Sep 23, 2025
08a5f0c
update
Sep 23, 2025
21aa5ee
update
Sep 23, 2025
2309d9b
update
Sep 23, 2025
4213eaa
edit
Sep 23, 2025
4e24575
update
Sep 24, 2025
dda313d
update
Sep 24, 2025
e0ae392
update
Sep 25, 2025
5c651b3
update content
Sep 29, 2025
c8d51bd
edit
Sep 29, 2025
9b3a3da
update
Sep 29, 2025
cc34e53
update
Sep 29, 2025
7e72588
update
Sep 30, 2025
3287006
updated
Sep 30, 2025
679ea3d
updated
Sep 30, 2025
5ce5955
update
Oct 3, 2025
2962e87
update
Oct 3, 2025
b4abe98
updated
Oct 3, 2025
f4c094e
Update authors in index.qmd
ecteodoro Oct 5, 2025
5832055
Update index.html
ecteodoro Oct 5, 2025
1309b49
slide and file update
Oct 8, 2025
d9de29a
csv file
Oct 8, 2025
f751dd2
paper update
Oct 8, 2025
664fefd
merge
Oct 8, 2025
bdb9f68
Create README.md
drnamita Oct 8, 2025
955ed89
Update README.md
drnamita Oct 8, 2025
b2ce95e
Update README.md
drnamita Oct 8, 2025
91227c5
Update README.md
drnamita Oct 8, 2025
9856ee2
Update README.md
drnamita Oct 8, 2025
4bbe041
Update README.md
drnamita Oct 8, 2025
ccf25c9
Update README.md
drnamita Oct 8, 2025
95a13be
Update README.md
drnamita Oct 8, 2025
335a54f
Update README.md
drnamita Oct 8, 2025
abab90b
Update README.md
drnamita Oct 8, 2025
6056066
file updates
Oct 8, 2025
07780b4
Merge branch 'main' of https://github.com/drnamita/IDC6940_NamitaMishra
Oct 8, 2025
d9b5f80
files update
Oct 9, 2025
e92c281
update
Oct 10, 2025
25e5d09
update
Oct 10, 2025
7033250
revised
Oct 11, 2025
e81cefb
update files
Oct 11, 2025
fa53f0a
updates
Oct 11, 2025
1dd3e39
edit author
Oct 11, 2025
c5ea521
plots updated
Oct 15, 2025
20c1a0e
update on visulaization
Oct 15, 2025
b795ee3
files updated
Oct 15, 2025
2c762f5
files updated
Oct 15, 2025
6b0adca
updated
Oct 15, 2025
0250697
edits
Oct 15, 2025
33d59b6
edits made
Oct 16, 2025
809a374
edited
Oct 21, 2025
022f818
edited prior details
Oct 21, 2025
2aa7c1f
test-training calculations and results edited
Oct 21, 2025
b12a123
edits on description
Oct 21, 2025
266341a
alignment edits
Oct 21, 2025
64f4c7d
edit tabs
Oct 23, 2025
c934c46
edits
Oct 23, 2025
e10ea1f
files added
Oct 23, 2025
7a04359
files edited
Oct 23, 2025
0d0b5f0
files
Oct 23, 2025
313966b
updated as is
Oct 23, 2025
3f466d6
update
Oct 23, 2025
86c7896
edited plot
Oct 23, 2025
baf8fdb
minor edits
Oct 23, 2025
7e4b9cf
edits
Oct 23, 2025
021c0ec
edits
Oct 23, 2025
de60544
update
Oct 23, 2025
dd4499d
file updates
Oct 23, 2025
074912d
files updated
Oct 23, 2025
e30700d
slides file updated
Oct 23, 2025
6e18ecf
All files updates
Oct 24, 2025
fe5ac37
files updated
Oct 24, 2025
df52003
edits in code
Oct 24, 2025
c337ea1
edits
Oct 24, 2025
e7e263a
file updates
Oct 24, 2025
4293e03
edits
Oct 28, 2025
55d8376
updates
Oct 30, 2025
05a8a7e
edits
Oct 30, 2025
c89771a
edit
Oct 30, 2025
38fba37
edits
Oct 30, 2025
bbae2a9
edits
Oct 30, 2025
b887419
edits
Oct 30, 2025
eaa7e43
edit files
Oct 30, 2025
d752284
edits
Oct 30, 2025
299e048
edits and plots
Oct 30, 2025
afaa66b
edits
Oct 30, 2025
fa58a8f
edits
Oct 30, 2025
eed20b8
edits
Oct 30, 2025
8d5fb68
edit
Oct 30, 2025
24e9ca4
edits
Oct 30, 2025
30a38a3
edits
Oct 30, 2025
c27ea57
edits and updates
Oct 31, 2025
4271ff9
edit files
Oct 31, 2025
9d92a4b
edit
Oct 31, 2025
811a18c
files updated
Oct 31, 2025
0fec105
files edited
Oct 31, 2025
f462d4c
edits
Oct 31, 2025
0531aef
edits
Oct 31, 2025
d2f2ad0
edit files
Oct 31, 2025
8e3e460
edits
Oct 31, 2025
39c8ecb
edits
Oct 31, 2025
8fca95e
updates alignments
Nov 2, 2025
20f28fa
update
Nov 2, 2025
ffd7dbe
edits
Nov 2, 2025
1b3e408
edits
Nov 2, 2025
9291c26
small edits
Nov 2, 2025
35251f6
edits
Nov 5, 2025
e527cd5
slide font edit
Nov 5, 2025
06bdeb7
edit font
Nov 5, 2025
1213601
font size edit
Nov 5, 2025
141c24f
slide edits
Nov 5, 2025
ed15381
edits
Nov 5, 2025
a5123de
slide edits
Nov 5, 2025
39035cd
slide edits
Nov 5, 2025
8f31b94
edit slides
Nov 5, 2025
cb26a58
edits slides
Nov 5, 2025
68ae340
edit slides
Nov 5, 2025
d29d420
edit
Nov 5, 2025
2e702d7
edits slides
Nov 6, 2025
645aa05
addition corr plot
Nov 11, 2025
80737ce
edit corr interpretation
Nov 11, 2025
1c3db61
addition corr
Nov 11, 2025
b08a7e5
update
Nov 11, 2025
4176d9a
slide edits
Nov 12, 2025
d4cba66
edits
Nov 14, 2025
895e368
refined
Nov 14, 2025
381b01c
refined
Nov 14, 2025
d0adc58
update refined
Nov 15, 2025
74a40a3
update
Nov 15, 2025
f1d2583
update slide and index
Nov 15, 2025
34b2c66
updated
Nov 15, 2025
a3ada3f
file updates
Nov 15, 2025
ee0f086
update
Nov 15, 2025
c28cafd
edits
Nov 16, 2025
c352fe6
edits
Nov 17, 2025
75256fd
edits
Nov 17, 2025
8071202
edited slides
Nov 17, 2025
1aa3e21
slides
Nov 17, 2025
0e2df96
Edit slides
Nov 17, 2025
4f746de
slides edited
Nov 17, 2025
a4200f7
edits
Nov 17, 2025
9f6b4c1
edits
Nov 17, 2025
9ff43ba
to final edits
Nov 17, 2025
e1737d7
edit
Nov 17, 2025
21c0017
Edits in slides
Nov 17, 2025
ea750d6
slide edits
Nov 17, 2025
731e8f3
edits corrections
Nov 17, 2025
08a9ebe
edits in slides
Nov 18, 2025
f505a89
edits paper and slides
Nov 18, 2025
80ee51e
edits
Nov 18, 2025
5b5ea8d
minor edit
Nov 18, 2025
3e4bab2
edits slides
Nov 18, 2025
5d6892a
edits
Nov 18, 2025
b1d3996
edit
Nov 18, 2025
fbc6fa4
edits to update
Nov 18, 2025
5948abc
edited and corrections
Nov 19, 2025
aeb28b2
fine edits - upto 11 slides
Nov 19, 2025
50fc8bf
12 - 14 edited
Nov 20, 2025
57c90a2
slides till 22 - edits finally
Nov 20, 2025
5e531af
slide 12
Nov 20, 2025
36d5ace
slide 12
Nov 20, 2025
4358c9a
all files updated
Nov 20, 2025
381734d
updated
Nov 20, 2025
759472c
Changes in slides 10-11
Nov 20, 2025
6a7fd9a
Comparative table added
Dec 5, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,3 +2,5 @@
.Rhistory
.RData
.Ruserdata

/.quarto/
27 changes: 27 additions & 0 deletions About.qmd
@@ -0,0 +1,27 @@
---
title: "About"
format: html
---

# Contributors

- Namita Mishra – analytic coding, content draft, developed project plan, collaborated via GitHub.
- Autumn Wilcox – analytic coding, content draft, structured project workflow, collaborated via GitHub.

![](images/clipboard-3361740677.png){width="212"}

Dr. Namita Mishra is a physician, head and neck surgeon, and public health researcher with a strong foundation in medicine, epidemiology, and data science. She is a graduate student in Data Science (Health Analytics).

Her work focuses on the early detection and prevention of non-communicable diseases (cancer, obesity) and on health disparities at the community level. She has researched salivary gland tumors and cardiac implants, and has conducted community-based research on healthy food access. Drawing on her data science training, she integrates statistical modeling and Bayesian methods into her analyses. Her bioinformatics work uses geodata visualization tools (3D Maps and GIS) for presentations. Passionate about bridging clinical insight with data-driven approaches, she is dedicated to advancing sustainable, evidence-based solutions in epidemiology and community health.

Outside of work, she enjoys gardening, cooking, singing, and sewing.

📧 Contact: nmishra\@uwf.edu

![](images/clipboard-1601697641.jpeg){width="208"}

Autumn S. Wilcox is a U.S. Navy veteran and Data Science graduate student at the University of West Florida, specializing in Analytics and Modeling. She has over nine years of experience in Network Operations and Technical Writing, including her current role at Navy Federal Credit Union, where she supports enterprise technology and process documentation initiatives. Autumn also holds certification in Clinical Research Quality Management (CRQM) and has contributed to quality oversight and compliance efforts in clinical research settings.

Her background bridges technology, analytics, and healthcare, with a focus on applying data-driven approaches to improve communication and systems reliability. Outside of work, Autumn enjoys traveling, photography, and finding creative inspiration through music.

📧 Contact: awr12\@students.uwf.edu
Binary file added BMI_Distribution_by_Diabetes_Status.png
Binary file added Figure1_MergedDataset.png
Binary file added Figure2_MergedDataset.png
27 changes: 27 additions & 0 deletions Introduction.qmd
@@ -0,0 +1,27 @@
---
title: "Introduction"
format: html
bibliography: references.bib
execute:
warning: false
message: false
echo: false
---

# Literature Review

Diabetes mellitus (DM) is a major public health concern closely associated with factors such as obesity, age, race, and gender. Identifying these risk factors is essential for targeted interventions @DAngelo2025. Traditional **logistic regression**, which estimates the association between risk factors and outcomes, is insufficient for analyzing complex healthcare data (DNA sequences, imaging, patient-reported outcomes, electronic health records (EHRs), longitudinal health measurements, diagnoses, and treatments) @Zeger2020. Classical maximum likelihood estimation (MLE) yields unstable results in samples that are small, have missing data, or exhibit quasi-complete or complete separation.
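
To make the separation problem concrete, here is a minimal R sketch on toy data (the values and the names `x`, `y`, and `fit_sep` are illustrative and not taken from the NHANES analysis). When a predictor perfectly separates the outcome, the likelihood keeps increasing as the slope grows, so `glm()` returns a very large coefficient with an even larger standard error.

```r
# Toy data (hypothetical): every y = 0 has x <= 4 and every y = 1 has x >= 5,
# so x separates the outcome completely and the MLE does not exist.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

# glm() warns that fitted probabilities of 0 or 1 occurred; the warning is
# suppressed here only to keep the output short.
fit_sep <- suppressWarnings(glm(y ~ x, family = binomial()))

# Inspect the estimate and its standard error for the slope
coef(summary(fit_sep))
```

A prior that keeps coefficients away from extreme values is one common way to stabilize such a fit.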

Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) allow analysis of multivariate longitudinal healthcare data with repeated measures within individuals and individuals nested in a population. By integrating prior knowledge and including exogenous (e.g., age, clinical history) and endogenous (e.g., current treatment) covariates, Bayesian models provide posterior distributions and risk predictions for conditions such as pneumonia, prostate cancer, and mental disorders. Parametric assumptions remain a limitation of these models.
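
As a rough illustration of how MCMC produces a posterior distribution, the sketch below runs a random-walk Metropolis sampler for a single-level Bayesian logistic regression. It is deliberately simpler than the hierarchical models discussed above: the data are simulated, the Normal(0, 2.5) priors and the proposal scale of 0.2 are assumptions chosen for the example, and all object names are hypothetical.

```r
set.seed(1)

# Simulated data (hypothetical): one standardized predictor, true slope = 1
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1 * x))

# Log-posterior: Bernoulli-logit likelihood plus independent Normal(0, 2.5) priors
log_post <- function(beta) {
  eta <- beta[1] + beta[2] * x
  sum(dbinom(y, 1, plogis(eta), log = TRUE)) +
    sum(dnorm(beta, mean = 0, sd = 2.5, log = TRUE))
}

# Random-walk Metropolis: propose a small move, accept with the usual ratio
n_iter <- 5000
draws <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                dimnames = list(NULL, c("intercept", "slope")))
current <- c(0, 0)
current_lp <- log_post(current)
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(2, sd = 0.2)
  proposal_lp <- log_post(proposal)
  if (log(runif(1)) < proposal_lp - current_lp) {
    current <- proposal
    current_lp <- proposal_lp
  }
  draws[i, ] <- current
}

# Discard the first 1000 iterations as burn-in, then summarize the posterior
post <- draws[-(1:1000), ]
colMeans(post)
apply(post, 2, quantile, probs = c(0.025, 0.975))
```

In practice, packages such as brms use more efficient samplers, but the output has the same form: a set of draws that is summarized by means and credible intervals.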

@Chatzimichail2023 show that, in Bayesian inference, parametric models (which assume a fixed distributional form) often underperform nonparametric models, which make no such assumption. Posterior probabilities from Bayesian approaches improve disease classification and better capture heterogeneity in skewed, bimodal, or multimodal data distributions. Bayesian nonparametric models are flexible and robust, integrating multiple diagnostic tests and priors to enhance accuracy and precision, though reliance on prior information and restricted access to resources can limit applicability. Combining Bayesian methods with other statistical or computational approaches helps address systemic biases, incomplete data, and non-representative datasets.
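
The role of posterior probabilities in disease classification can be sketched with Bayes' theorem for a single diagnostic test. The function name `posterior_positive` and the prevalence, sensitivity, and specificity values are hypothetical and chosen only for illustration.

```r
# Posterior probability of disease given a positive test result.
# prevalence acts as the prior; sensitivity and specificity describe the test.
posterior_positive <- function(prevalence, sensitivity, specificity) {
  true_pos  <- sensitivity * prevalence
  false_pos <- (1 - specificity) * (1 - prevalence)
  true_pos / (true_pos + false_pos)
}

# Hypothetical inputs: 10% prevalence, 90% sensitivity, 80% specificity
posterior_positive(prevalence = 0.10, sensitivity = 0.90, specificity = 0.80)
```

With these inputs the posterior probability is one third, which shows how a low prior prevalence tempers even a fairly sensitive test.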

The Bayesian framework described by @VandeSchoot2021 highlights the role of priors, data modeling, inference, posterior sampling, variational inference, and variable selection. Proper variable selection mitigates multicollinearity, overfitting, and limited sampling, improving predictive performance. Priors can be informative, weakly informative, or diffuse, and can be elicited from expert opinion, generic knowledge, or data-driven methods. Sensitivity analysis evaluates the alignment of priors with likelihoods, while MCMC simulations (e.g., brms, blavaan in R) empirically estimate posterior distributions. Spatial and temporal Bayesian models have applications in large-scale cancer genomics, identifying molecular interactions, mutational signatures, patient stratification, and cancer evolution, though temporal autocorrelation and subjective prior elicitation can be limiting.
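
A prior sensitivity analysis can be sketched with a conjugate Beta-binomial model, where the posterior is available in closed form. The data (12 cases among 100 people), the three Beta priors, and the names `prior_list` and `prior_sensitivity` are assumptions made for this example.

```r
# Hypothetical data: 12 cases among 100 people
cases <- 12
n <- 100

# Beta(a, b) priors standing in for diffuse, weakly informative, and
# informative choices (illustrative values only)
prior_list <- list(
  diffuse            = c(a = 1,  b = 1),
  weakly_informative = c(a = 2,  b = 8),
  informative        = c(a = 20, b = 80)
)

# Conjugate update: the posterior is Beta(a + cases, b + n - cases)
prior_sensitivity <- t(sapply(prior_list, function(p) {
  a_post <- p[["a"]] + cases
  b_post <- p[["b"]] + n - cases
  c(post_mean = a_post / (a_post + b_post),
    lower = qbeta(0.025, a_post, b_post),
    upper = qbeta(0.975, a_post, b_post))
}))
round(prior_sensitivity, 3)
```

If the posterior summaries change little across rows, the conclusions are driven mainly by the data rather than by the prior.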

Bayesian normal linear regression has been applied in metrology for instrument calibration using conjugate Normal–Inverse-Gamma priors @Klauenberg2015. Hierarchical priors add flexibility by modeling uncertainty across multiple levels, improving robustness and interpretability. Bayesian hierarchical/meta-analytic linear regression incorporates both exchangeable and unexchangeable prior information, addressing multiple testing challenges, small sample sizes, and complex relationships among regression parameters across studies @DeLeeuw2012a.
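
The conjugate Normal–Inverse-Gamma update has a closed form, sketched below in base R under the standard parameterization beta | sigma^2 ~ N(m0, sigma^2 V0) and sigma^2 ~ Inverse-Gamma(a0, b0). The simulated data and the prior values are assumptions for illustration and are not taken from the cited calibration study.

```r
set.seed(42)

# Simulated calibration-style data (hypothetical): y = 1 + 2 * x + noise
n <- 50
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
X <- cbind(1, x)

# Prior: beta | sigma^2 ~ N(m0, sigma^2 * V0), sigma^2 ~ Inverse-Gamma(a0, b0)
m0 <- c(0, 0)
V0 <- diag(100, 2)
a0 <- 2
b0 <- 1

# Closed-form posterior parameters
V0_inv <- solve(V0)
Vn_inv <- V0_inv + crossprod(X)
Vn <- solve(Vn_inv)
mn <- Vn %*% (V0_inv %*% m0 + crossprod(X, y))
an <- a0 + n / 2
bn <- b0 + 0.5 * (sum(y^2) + t(m0) %*% V0_inv %*% m0 - t(mn) %*% Vn_inv %*% mn)

drop(mn)             # posterior mean of (intercept, slope)
drop(bn) / (an - 1)  # posterior mean of sigma^2
```

Because the prior here is diffuse, the posterior mean of the coefficients lands close to the ordinary least-squares fit.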

**A sequential clinical reasoning model** @Liu2013 performs screening by adding predictors stepwise: (1) demographics, (2) metabolic components, and (3) conventional risk factors, incorporating priors and mimicking clinical evaluation. This approach captures ecological heterogeneity and improves baseline risk estimation, though interactions between predictors and external cross-validation remain limitations.
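
The stepwise idea can be sketched with nested logistic regressions fitted by maximum likelihood. This is a simplified, non-Bayesian stand-in for the cited approach: the simulated data frame `dat`, its columns, and the effect sizes are all hypothetical.

```r
set.seed(7)

# Simulated cohort (hypothetical predictors and effect sizes)
n <- 500
dat <- data.frame(
  age    = rnorm(n),
  bmi    = rnorm(n),
  smoker = rbinom(n, 1, 0.3)
)
dat$outcome <- rbinom(
  n, 1, plogis(-1 + 0.8 * dat$age + 0.6 * dat$bmi + 0.4 * dat$smoker)
)

# Add predictor blocks one step at a time
step1 <- glm(outcome ~ age, data = dat, family = binomial())  # demographics
step2 <- update(step1, . ~ . + bmi)                           # metabolic component
step3 <- update(step2, . ~ . + smoker)                        # conventional risk factor

# Compare the steps by AIC and by likelihood-ratio tests
sapply(list(step1 = step1, step2 = step2, step3 = step3), AIC)
anova(step1, step2, step3, test = "Chisq")
```

Each step shows how much the added block improves fit over the information already in the model, which mirrors the order of a clinical evaluation.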

**Bayesian multiple imputation with logistic regression** addresses missing data in clinical research @Austin2021 by classifying missing values (e.g., patient refusal, loss to follow-up, mechanical errors) as missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Multiple imputation generates plausible values across datasets and pools results for reliable classification of patient health status and mortality.
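
The pooling step follows Rubin's rules, sketched here in base R. The five log-odds-ratio estimates and their squared standard errors are made-up numbers standing in for results from five imputed datasets.

```r
# Hypothetical log-odds-ratio estimates and squared standard errors
# from m = 5 imputed datasets
est <- c(0.52, 0.47, 0.55, 0.50, 0.49)
se2 <- c(0.010, 0.012, 0.011, 0.009, 0.010)
m <- length(est)

pooled_est  <- mean(est)   # pooled point estimate
within_var  <- mean(se2)   # average within-imputation variance
between_var <- var(est)    # between-imputation variance
total_var   <- within_var + (1 + 1 / m) * between_var

c(estimate = pooled_est, se = sqrt(total_var), OR = exp(pooled_est))
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values. In R, `mice::pool()` applies the same rules to fitted models.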

## References
Binary file added Line plot_bmi_obs_pred.png
180 changes: 180 additions & 0 deletions Nhanes/Nhanes script.R
@@ -0,0 +1,180 @@
# ---------------------- Load packages ----------------------
options(repos = c(CRAN = "https://cloud.r-project.org"))
library(nhanesA)
library(tidyverse)
library(forcats)
library(survey)
library(mice)
library(brms)
library(posterior)
library(knitr)

# ---------------------- Import NHANES data ----------------------


bmx_h <- nhanes("BMX_H")
demo_h <- nhanes("DEMO_H")
diq_h <- nhanes("DIQ_H")

# ---------------------- Select variables ----------------------
exam_sub <- bmx_h %>% select(SEQN, BMXBMI)
need_demo <- c("SEQN","RIDAGEYR","RIAGENDR","RIDRETH1","SDMVPSU","SDMVSTRA","WTMEC2YR")
demo_sub <- demo_h %>% select(all_of(need_demo))
diq_sub <- diq_h %>% select(SEQN, DIQ010, dplyr::any_of("DIQ050"))

# ---------------------- Merge datasets ----------------------
merged_data <- demo_sub %>%
  left_join(exam_sub, by = "SEQN") %>%
  left_join(diq_sub, by = "SEQN")

# ---------------------- Convert/clean variables ----------------------
# Coerce a column to numeric NHANES codes. Numeric input is returned unchanged.
# Text is parsed for numbers; if more than 80% of values fail to parse,
# common response labels are mapped to their numeric codes instead.
to_num <- function(x) {
  if (is.numeric(x)) return(x)
  xc <- as.character(x)
  n <- suppressWarnings(readr::parse_number(xc))
  if (mean(is.na(n)) > 0.8) {
    xlow <- tolower(trimws(xc))
    n <- dplyr::case_when(
      xlow %in% c("1","yes","yes, told") ~ 1,
      xlow %in% c("2","no","no, not told") ~ 2,
      xlow %in% c("3","borderline") ~ 3,
      xlow %in% c("7","refused") ~ 7,
      xlow %in% c("9","don't know","dont know","unknown") ~ 9,
      TRUE ~ NA_real_
    )
  }
  as.numeric(n)
}

merged_data <- merged_data %>%
  mutate(
    DIQ010 = to_num(DIQ010),
    DIQ050 = to_num(if (!"DIQ050" %in% names(.)) NA_real_ else DIQ050),
    BMXBMI = as.numeric(BMXBMI),
    RIDAGEYR = as.numeric(RIDAGEYR),
    RIAGENDR = as.numeric(RIAGENDR),
    RIDRETH1 = as.numeric(RIDRETH1),
    SDMVPSU = as.numeric(SDMVPSU),
    SDMVSTRA = as.numeric(SDMVSTRA),
    WTMEC2YR = as.numeric(WTMEC2YR)
  )

# ---------------------- Filter adults and create analysis variables ----------------------
adult <- merged_data %>%
  filter(RIDAGEYR >= 20) %>%
  transmute(
    SDMVPSU, SDMVSTRA, WTMEC2YR,
    # Outcome: 1 = yes, 0 = no; borderline, refused, and don't know become NA
    diabetes_dx = case_when(
      DIQ010 == 1 ~ 1,
      DIQ010 == 2 ~ 0,
      DIQ010 %in% c(3,7,9) ~ NA_real_
    ),
    bmi = BMXBMI,
    age = RIDAGEYR,
    sex = fct_recode(factor(RIAGENDR), Male="1", Female="2"),
    race = fct_recode(factor(RIDRETH1),
                      "Mexican American"="1",
                      "Other Hispanic"="2",
                      "NH White"="3",
                      "NH Black"="4",
                      "Other/Multi"="5"),
    DIQ050 = DIQ050
  ) %>%
  mutate(
    # Standardize continuous predictors (mean 0, SD 1)
    age_c = as.numeric(scale(age)),
    bmi_c = as.numeric(scale(bmi)),
    bmi_cat = cut(bmi, breaks = c(-Inf,18.5,25,30,35,40,Inf),
                  labels=c("<18.5","18.5–<25","25–<30","30–<35","35–<40","≥40"), right=FALSE),
    # Females with DIQ050 == 1 are recoded as non-diabetic
    diabetes_dx = ifelse(sex=="Female" & !is.na(DIQ050) & DIQ050==1, 0, diabetes_dx)
  ) %>%
  # Use NH White as the reference level for race
  mutate(race = fct_relevel(race, "NH White"))

# ---------------------- Survey design ----------------------
nhanes_design_adult <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = adult
)

# ---------------------- Survey-weighted logistic regression ----------------------
keep_cc <- with(adult, !is.na(diabetes_dx) & !is.na(age_c) & !is.na(bmi_c) &
                  !is.na(sex) & !is.na(race))
des_cc <- subset(nhanes_design_adult, keep_cc)
form_cc <- diabetes_dx ~ age_c + bmi_c + sex + race
svy_fit <- svyglm(formula = form_cc, design = des_cc, family = quasibinomial())
svy_or <- broom::tidy(svy_fit, conf.int=TRUE) %>%
  mutate(OR=exp(estimate), LCL=exp(conf.low), UCL=exp(conf.high)) %>%
  filter(term != "(Intercept)") %>%
  select(term, OR, LCL, UCL, p.value)

# ---------------------- Multiple Imputation ----------------------
mi_dat <- adult %>% select(diabetes_dx, age, bmi, sex, race, WTMEC2YR, SDMVPSU, SDMVSTRA)
meth <- make.method(mi_dat)
pred <- make.predictorMatrix(mi_dat)
meth["diabetes_dx"] <- ""
pred["diabetes_dx", ] <- 0
meth["age"] <- "norm"
meth["bmi"] <- "pmm"
meth["sex"] <- "polyreg"
meth["race"] <- "polyreg"
imp <- mice(mi_dat, m=5, method=meth, predictorMatrix=pred, seed=123)
fit_mi <- with(imp, {
  age_c <- scale(age)
  bmi_c <- scale(bmi)
  glm(diabetes_dx ~ age_c + bmi_c + sex + race, family=binomial())
})
pool_mi <- pool(fit_mi)
mi_or <- summary(pool_mi, conf.int=TRUE, exponentiate=TRUE) %>%
  filter(term != "(Intercept)")

# ---------------------- Bayesian Logistic Regression ----------------------
adult_imp1 <- complete(imp, 1) %>%
  mutate(
    # scale() returns a one-column matrix; as.numeric() keeps these as plain vectors
    age_c = as.numeric(scale(age)),
    bmi_c = as.numeric(scale(bmi)),
    wt_norm = WTMEC2YR / mean(WTMEC2YR, na.rm=TRUE),
    race = fct_relevel(race, "NH White"),
    sex = fct_relevel(sex, "Male")
  ) %>%
  filter(!is.na(diabetes_dx), !is.na(age_c), !is.na(bmi_c),
         !is.na(sex), !is.na(race)) %>% droplevels()

priors <- c(
  set_prior("normal(0, 2.5)", class="b"),
  set_prior("student_t(3, 0, 10)", class="Intercept")
)
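
# Illustrative sketch, not part of the original analysis: the normal(0, 2.5)
# prior above is placed on the log-odds scale, so exponentiating its central
# 95% interval shows the range of odds ratios the prior treats as plausible.
# `prior_sd` and `prior_or_95` are hypothetical helper names.
prior_sd <- 2.5
prior_or_95 <- exp(qnorm(c(0.025, 0.975), mean = 0, sd = prior_sd))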

bayes_fit <- brm(
  formula = diabetes_dx | weights(wt_norm) ~ age_c + bmi_c + sex + race,
  data = adult_imp1,
  family = bernoulli(link="logit"),
  prior = priors,
  chains = 4, iter = 2000, seed = 123,
  control = list(adapt_delta=0.95),
  refresh = 0
)

bayes_or <- posterior_summary(bayes_fit, pars="^b_") %>%
  as.data.frame() %>%
  tibble::rownames_to_column("raw") %>%
  mutate(
    term = gsub("^b_", "", raw),
    term = gsub("race", "race:", term),
    term = gsub("sex", "sex:", term),
    OR = exp(Estimate),
    LCL = exp(Q2.5),
    UCL = exp(Q97.5)
  ) %>%
  filter(term != "Intercept") %>%
  select(term, OR, LCL, UCL)

# ---------------------- Save results ----------------------
dir.create("outputs", showWarnings = FALSE)
saveRDS(svy_fit, "outputs/svy_fit.rds")
saveRDS(pool_mi, "outputs/pool_mi.rds")
saveRDS(bayes_fit, "outputs/bayes_fit.rds")
saveRDS(svy_or, "outputs/survey_OR_table.rds")
saveRDS(mi_or, "outputs/mi_OR_table.rds")
saveRDS(bayes_or, "outputs/bayes_OR_table.rds")
Binary file added Predicted Probability of Diabetes vs BMI.png
Binary file added Prior vs Posterior Distributions_bmi_age.png