#Mice Analysis
##Descriptive Statistics
###Script Overview
The script (descriptive_statistics.py) processes behavioral data from a CSV file and generates four types of visualizations to analyze performance across subjects (mice) in a reward-based task. The main functions are:
####plot_performance_comparison
- Purpose: Compares performance between good and poor performers using a boxplot for good performers and scattered points for poor performers.
- Inputs:
  - `df_data`: DataFrame with columns `subject` (subject identifier) and `perf` (performance values, 0-1).
  - `chance_level`: Threshold to classify good/poor performers (default: 0.55).
- Features:
  - Boxplot for good performers (performance ≥ `chance_level`) with styled whiskers, caps, and medians.
  - Jittered scatter points for both good and poor performers, with distinct colors (blue for good, red for poor).
  - Reference line at the chance level.
  - Custom legend and statistical annotation (Mann-Whitney U test p-value) if both groups exist.
  - Large fonts and high contrast for presentation readability.
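The good/poor split described above can be sketched as follows. This is a minimal illustration, not the script's code: it assumes that each subject is classified by its mean `perf`, and the function name `split_performers` is hypothetical. Plain tuples are used instead of a DataFrame so the snippet runs without dependencies.

```python
from collections import defaultdict
from statistics import mean


def split_performers(rows, chance_level=0.55):
    """Split subjects into good and poor performers.

    rows: iterable of (subject, perf) pairs, with perf in [0, 1].
    Assumption: a subject is classified by its mean performance.
    """
    by_subject = defaultdict(list)
    for subject, perf in rows:
        by_subject[subject].append(perf)
    subject_means = {s: mean(values) for s, values in by_subject.items()}
    good = {s: m for s, m in subject_means.items() if m >= chance_level}
    poor = {s: m for s, m in subject_means.items() if m < chance_level}
    return good, poor


rows = [("B1", 1), ("B1", 1), ("B1", 0), ("B2", 0), ("B2", 1), ("B2", 0)]
good, poor = split_performers(rows)
print("good:", sorted(good), "poor:", sorted(poor))
```

The two returned groups are what a boxplot and a Mann-Whitney U test would then be computed on.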
####plot_performance_psychometric
- Purpose: Plots psychometric curves showing performance as a function of reward probability for each subject and the group average.
- Inputs:
  - `combined_data`: DataFrame with columns `subject`, `perf`, and `probability_r` (probability of reward for right choice).
- Features:
  - Individual subject curves with performance plotted against reward probability, excluding the 0.5 probability condition.
  - Group average curve in black for emphasis.
  - Ideal performance line (y=x) and chance level references (horizontal and vertical at 0.5).
  - `viridis` color palette for subjects, with a custom legend placed outside the plot.
  - Handles invalid performance values (`-1`) and transforms left-side probabilities to right-equivalent for consistency.
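The cleaning steps listed above might look like the sketch below. It assumes that "right-equivalent" means mirroring probabilities below 0.5 to `1 - p`; the source does not spell out the transform, so treat that as a guess. The function name `to_right_equivalent` and the output key `probability_equiv` are hypothetical.

```python
def to_right_equivalent(rows):
    """Drop invalid rows and mirror left-side probabilities.

    rows: list of dicts with keys 'subject', 'perf', 'probability_r'.
    Assumption: p < 0.5 is mapped to 1 - p (not confirmed by the script).
    """
    cleaned = []
    for row in rows:
        p = row["probability_r"]
        if row["perf"] == -1 or p == 0.5:
            continue  # skip invalid performance and the 0.5 condition
        p_equiv = round(1 - p, 10) if p < 0.5 else p
        cleaned.append({**row, "probability_equiv": p_equiv})
    return cleaned


rows = [
    {"subject": "B1", "perf": 1, "probability_r": 0.2},
    {"subject": "B1", "perf": -1, "probability_r": 0.8},
    {"subject": "B1", "perf": 0, "probability_r": 0.5},
    {"subject": "B1", "perf": 1, "probability_r": 0.9},
]
for row in to_right_equivalent(rows):
    print(row)
```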
####outcome_block_change
- Purpose: Analyzes and visualizes performance around block transitions (changes in reward probability) or choice switches, showing trial-by-trial correctness.
- Inputs:
  - `df`: DataFrame with columns `subject`, `choice` (0=left, 1=right), and `probability_r`.
  - `window`: Number of trials to show before/after the event.
  - `switch_type`: 0 for block transitions, 1 for choice switches.
- Features:
  - Plots performance (fraction correct) aligned to switch events for each subject, with SEM error bars.
  - Identifies switch points based on changes in probability (`switch_type=0`) or choice (`switch_type=1`).
  - Includes reference lines for the event (vertical at lag 0) and chance level (horizontal at 0.5).
  - Uses the `viridis` colormap for subjects, with a legend placed outside the plot.
  - Returns an aggregated DataFrame for further analysis.
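The event alignment described above can be illustrated for the choice-switch case (`switch_type=1`). This is a sketch for a single subject using plain lists; the function name `aligned_performance` is hypothetical and the script's own aggregation may differ.

```python
from math import sqrt
from statistics import mean, stdev


def aligned_performance(choices, correct, window=3):
    """Mean correctness and SEM at each lag around choice switches.

    choices, correct: equal-length lists of 0/1 values for one subject.
    Returns {lag: (mean, sem)} for lags in [-window, window].
    """
    # A switch happens at trial t when the choice differs from trial t-1.
    switch_idx = [t for t in range(1, len(choices)) if choices[t] != choices[t - 1]]
    by_lag = {lag: [] for lag in range(-window, window + 1)}
    for t in switch_idx:
        for lag in by_lag:
            if 0 <= t + lag < len(correct):
                by_lag[lag].append(correct[t + lag])
    summary = {}
    for lag, values in by_lag.items():
        if not values:
            continue
        sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
        summary[lag] = (mean(values), sem)
    return summary


choices = [0, 0, 0, 1, 1, 1, 0, 0]
correct = [1, 1, 0, 1, 1, 0, 1, 1]
for lag, (avg, sem) in aligned_performance(choices, correct, window=2).items():
    print(lag, avg, sem)
```

For block transitions (`switch_type=0`), the same alignment would apply with switch points taken from changes in `probability_r` instead of `choice`.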
####plot_blocks
- Purpose: Visualizes the reward probability for the right choice across trials for a specific session and subject.
- Inputs:
  - `df`: DataFrame with columns `session`, `subject`, `trial`, and `probability_r`.
- Features:
  - Plots reward probability dynamics for a single session (hardcoded: session 129, subject 'B1').
  - Includes a chance level reference line at 0.5.
  - Optimized for slides with a clean design: Arial font, thicker lines, visible grid, and no top/right spines.
  - High-contrast colors and large fonts for readability.
####Main Execution
- Data Loading:
  - Reads a CSV file (`global_trials1.csv`) with behavioral data, using `;` as the separator and specifying `iti_duration` as float.
  - Applies a custom `parsing` function to filter data (e.g., for trained mice, `trained=1`).
- Data Preprocessing:
  - Creates a `choice` column based on `outcome` and `side` (correct/incorrect and left/right).
  - Computes a `perf` column (1 for correct choices based on `probability_r`, 0 otherwise).
- Visualization:
  - Calls all four plotting functions in sequence: `plot_blocks`, `plot_performance_comparison`, `plot_performance_psychometric`, and `outcome_block_change` (with `switch_type=1` and `window=10`).
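The preprocessing step can be sketched under explicit assumptions, since the script's exact encoding is not shown here. The sketch assumes `outcome` takes the strings `"correct"`/`"incorrect"`, `side` takes `"left"`/`"right"` and names the side that would have been correct, and a choice counts as good performance when it targets the side with the higher reward probability. The helper names `derive_choice` and `derive_perf` are hypothetical.

```python
def derive_choice(outcome, side):
    """Return 0 for left, 1 for right, None for any other outcome.

    Assumption: a correct trial means the animal chose `side`;
    an incorrect trial means it chose the opposite side.
    """
    if outcome == "correct":
        return 1 if side == "right" else 0
    if outcome == "incorrect":
        return 0 if side == "right" else 1
    return None


def derive_perf(choice, probability_r):
    """Return 1 if the choice targets the higher-probability side, else 0.

    Assumption: at probability_r == 0.5 the left side is treated as the
    reference, which is an arbitrary tie-break for this sketch.
    """
    if choice is None:
        return 0
    better_side = 1 if probability_r > 0.5 else 0
    return int(choice == better_side)


trials = [("correct", "right", 0.8), ("incorrect", "right", 0.8), ("miss", "left", 0.2)]
for outcome, side, p in trials:
    choice = derive_choice(outcome, side)
    print(outcome, side, p, "->", choice, derive_perf(choice, p))
```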
###Usage
- Ensure the required CSV file (`global_trials1.csv`) is available at the specified path.
- Verify that the `extra_plotting`, `model_avaluation`, and `parsing` modules are accessible.
- Update the `data_path` variable if needed.
- The script will generate and display four plots:
  - Reward probability dynamics for a specific session/subject.
  - Performance comparison between good and poor performers.
  - Psychometric curves for all subjects.
  - Performance around choice switches.
###Notes
- The script assumes a specific CSV structure with columns like `subject`, `session`, `outcome`, `side`, `probability_r`, and `iti_duration`. Ensure your data matches this format.
- The `plot_blocks` function is hardcoded for session 129 and subject 'B1'; modify the filtering condition (`df[(df['session'] == 129) & (df['subject'] == 'B1')]`) for other sessions/subjects.
- The `parsing` function is not provided; it must handle data filtering for trained/untrained mice.
- Visualization parameters (e.g., font sizes, figure dimensions) are optimized for presentations but can be adjusted in the `params` dictionaries.
##Model Performance Visualization
###Script overview
The script (metrics_plots.py) is designed to load previously stored model performance data from CSV files and create comparative visualizations grouped by models. It includes two main functions:
####plot_metrics_comparison_grouped_by_models
- Purpose: Generates five separate plots to compare model performance across different metrics: Log Likelihood, Number of Trials, BIC (Bayesian Information Criterion), Log Likelihood per Observation, and Accuracy.
- Inputs:
  - `blocks`: A list of experimental conditions (e.g., probability blocks like `[[0.2,0.8], [0.3,0.7]]`).
  - `metrics_data`: A nested dictionary containing metric values for each model (e.g., `{model_name: {metric_name: [values_across_blocks]}}`).
  - `model_names`: A list of model identifiers (e.g., `['glm_prob_switch', 'glm_prob_r']`).
- Features:
  - Uses a consistent color scheme (`viridis` palette) across all plots.
  - Employs large, readable fonts suitable for presentations.
  - Automatically adjusts layout to prevent overlap.
  - Each plot visualizes one metric, grouping data by models and coloring by experimental conditions.
- Purpose: Creates a single grouped comparison plot for a specific metric, using boxplots and stripplots to show distribution and individual data points.
- Inputs:
  - `blocks`: Same as above.
  - `metrics_data`: Same as above.
  - `model_names`: Same as above.
  - `metric`: The specific metric to plot (e.g., `'log_likelihood'`).
  - `ylabel`: Y-axis label for the plot.
  - `palette`: Color palette for different blocks.
  - Font size parameters for title, labels, legend, and ticks.
- Features:
  - Boxplots show the distribution of metric values per model and block.
  - Stripplots overlay individual data points with slight jitter for clarity.
  - Custom legend placement outside the plot to avoid overlap.
  - Special handling for model-specific block labeling (e.g., different block indices for `inference_based` vs. other models).
  - Grid and despined axes for clean visualization.
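Grouped boxplots and stripplots are usually drawn from long-form records, so the nested `metrics_data` dictionary has to be flattened first. The sketch below shows one way to do that. It assumes each entry of `values_across_blocks` is itself a list of per-subject values, and the function name `to_long_rows` is hypothetical.

```python
def to_long_rows(blocks, metrics_data, model_names, metric):
    """Flatten {model: {metric: [values_per_block]}} into one record per value.

    Assumption: values_per_block[i] is a list of values for blocks[i].
    """
    rows = []
    for model in model_names:
        values_per_block = metrics_data.get(model, {}).get(metric, [])
        for block, values in zip(blocks, values_per_block):
            for value in values:
                rows.append({"model": model, "block": str(block), "value": value})
    return rows


blocks = [[0.2, 0.8], [0.3, 0.7]]
metrics_data = {
    "glm_prob_r": {"BIC": [[410.0, 395.5], [402.1]]},
    "glm_prob_switch": {"BIC": [[388.2], []]},
}
for row in to_long_rows(blocks, metrics_data, ["glm_prob_r", "glm_prob_switch"], "BIC"):
    print(row)
```

The resulting records have one grouping column for the x-axis (`model`) and one for the color (`block`), which matches the grouping described above.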
####Main Execution
- Data Loading:
  - Reads CSV files containing model metrics (e.g., `all_subjects_glm_metrics_{model}_{block}.csv`).
  - Aggregates data by taking the median across subjects for each metric.
  - Handles missing files or errors by appending empty arrays.
- Model Configuration:
  - Supports multiple models (e.g., `glm_prob_switch`, `glm_prob_r`, `inference_based`).
  - Uses different block configurations for standard and special models.
- Visualization:
  - Calls `plot_metrics_comparison_grouped_by_models` to generate plots for selected models.
  - Includes error handling to report issues during plot generation.
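The loading behavior described above (per-subject medians, with an empty result when a file is missing) can be sketched with the standard library. This is an illustration only: it assumes comma-separated metric files with a `subject` column, and the function names `per_subject_medians` and `load_metric` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path
from statistics import median


def per_subject_medians(handle, metric):
    """Return the median of `metric` for each subject in a CSV handle."""
    by_subject = defaultdict(list)
    for row in csv.DictReader(handle):
        by_subject[row["subject"]].append(float(row[metric]))
    return [median(values) for values in by_subject.values()]


def load_metric(path, metric):
    """Return per-subject medians, or an empty list if loading fails."""
    try:
        with Path(path).open(newline="") as handle:
            return per_subject_medians(handle, metric)
    except (OSError, KeyError, ValueError) as err:
        print(f"Skipping {path}: {err}")
        return []


demo = io.StringIO("subject,BIC\nB1,10\nB1,14\nB2,20\n")
print(per_subject_medians(demo, "BIC"))
print(load_metric("does_not_exist.csv", "BIC"))
```

Returning an empty list keeps the block positions aligned across models even when one file is absent.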
###Usage
- Place the script in a directory containing the required CSV files with model metrics.
- Update the `main_folder` and `results_base` paths in the script to match your file locations.
- Modify `models_to_analyze` and `models_to_plot` to include the desired models.
- The script will generate and display five plots comparing the specified metrics across models.
###Notes
- Ensure CSV files follow the expected format with columns for `subject`, `log_likelihood`, `log_likelihood_per_obs`, `BIC`, `AIC`, and `accuracy`.
- The script assumes median aggregation per subject; modify the `df.groupby('subject').median()` line to use all data points if needed.
- Font sizes and figure dimensions are optimized for presentations but can be adjusted in the function parameters.
##GLM-Right
The script (glm_right.py) processes behavioral data from a CSV file and includes functions to compute choice-related regressors, fit GLMs, and generate plots to visualize model coefficients. The main functions are:
- Purpose: Processes trial-by-trial data to compute regressors for the GLM and generates the regression formula.
- Inputs:
  - `df`: DataFrame with columns like `session`, `outcome`, `side`, `iti_duration`, `probability_r`.
  - `n`: Number of previous trials to consider (`n_back`).
- Features:
  - Filters sessions with >50 trials.
  - Encodes choices (`left`, `right`, `other`) based on `outcome` and `side`.
  - Creates `choice_num` (1=right, 0=left) for the dependent variable.
  - Computes regressors:
    - `r_plus`: 1 for rewarded right choice, -1 for rewarded left choice, 0 otherwise.
    - `r_minus`: 1 for unrewarded right choice, -1 for unrewarded left choice, 0 otherwise.
  - Builds regressors for past trials (1 to `n-1`): `r_plus_{i}`, `r_minus_{i}`.
  - Constructs the GLM formula (e.g., `r_plus_1 + r_minus_1 + r_plus_2 + r_minus_2`).
- Returns:
  - Processed DataFrame with regressors.
  - Regression formula string.
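The regressor construction above can be sketched on plain lists. This is a simplified stand-in for the script's function: it handles a single session, skips the session-length filter, and the name `build_regressors` is hypothetical.

```python
def build_regressors(choices, rewards, n_back=3):
    """Build lagged r_plus / r_minus regressors and a formula string.

    choices: 1=right, 0=left; rewards: 1=rewarded, 0=unrewarded.
    Lags run from 1 to n_back-1, following the description above.
    """
    signed = [1 if c == 1 else -1 for c in choices]
    r_plus = [s if r == 1 else 0 for s, r in zip(signed, rewards)]
    r_minus = [s if r == 0 else 0 for s, r in zip(signed, rewards)]
    lags = range(1, n_back)
    rows = []
    # Start at the first trial that has a full history of n_back-1 trials.
    for t in range(n_back - 1, len(choices)):
        row = {"choice_num": choices[t]}
        for i in lags:
            row[f"r_plus_{i}"] = r_plus[t - i]
            row[f"r_minus_{i}"] = r_minus[t - i]
        rows.append(row)
    terms = [f"{name}_{i}" for i in lags for name in ("r_plus", "r_minus")]
    formula = "choice_num ~ " + " + ".join(terms)
    return rows, formula


rows, formula = build_regressors([1, 0, 1, 1], [1, 0, 1, 0], n_back=3)
print(formula)
for row in rows:
    print(row)
```

Coding left choices as -1 makes a positive coefficient on `r_plus_{i}` mean "repeat the side that was rewarded `i` trials ago", for either side.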
- Purpose: Creates a single plot showing GLM coefficients for all mice, optimized for A0 poster presentation.
- Inputs:
  - `df`: DataFrame with behavioral data.
  - `n_back`: Number of previous trials to consider.
  - `figsize`: Figure size (default: A0 poster size, 46.8x33.1 inches).
- Features:
  - Excludes mouse 'A10'.
  - Uses 5-fold cross-validation to fit logistic regression models (`choice_num ~ regressors`).
  - Computes coefficients, p-values, and evaluation metrics (log-likelihood, AIC, BIC, pseudo R-squared, accuracy, precision, recall, F1, ROC AUC, Brier score).
  - Saves metrics to a CSV file (`all_subjects_glm_metrics_glm_prob_r_{n_back}.csv`).
  - Plots coefficients for:
    - `r_plus` (red, solid line with circles).
    - `r_minus` (blue, dashed line with squares).
    - `Intercept` (green, circle at x=0).
  - Uses an alpha gradient to differentiate mice.
  - Adds significance markers (`***`, `**`, `*`, `ns`) based on Fisher’s combined p-values across mice.
  - Includes a custom legend and grid for readability.
  - Optimized for posters with large fonts and thick lines.
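Fisher's method, used above to combine p-values across mice, can be written with the standard library alone. The sketch below uses the closed-form chi-square survival function for an even number of degrees of freedom. The marker thresholds (0.001, 0.01, 0.05) are the conventional ones and are an assumption here, as are the function names.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # statistic divided by 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))


def significance_marker(p):
    """Map a p-value to a marker (assumed conventional thresholds)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


per_mouse_p = [0.04, 0.20, 0.01]
combined = fisher_combined_p(per_mouse_p)
print(combined, significance_marker(combined))
```

Fisher's method assumes the per-mouse p-values are independent; a p-value of exactly 0 would need to be clipped before taking the logarithm.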
- Purpose: Plots GLM coefficients separately for each mouse in a subplot grid, predicting right-side choices.
- Inputs:
  - `df`: DataFrame with behavioral data.
- Features:
  - Creates a subplot grid (2 rows, dynamic columns) for individual mice.
  - Filters sessions with >50 trials.
  - Uses `obt_regressors` with `n_back=10` to compute regressors.
  - Fits logistic regression (`choice_num ~ regressors`) for each mouse using 5-fold cross-validation.
  - Plots median coefficients for:
    - `Intercept` (green).
    - `r_plus` (red).
    - `r_minus` (blue).
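The cross-validation splits come from a helper that is not part of this document. As a stand-in, the sketch below shows one plausible way to build 5 folds at the session level, so that trials from the same session never appear in both train and test. The name `kfold_sessions` is hypothetical, and splitting by session is an assumption about what the script does.

```python
def kfold_sessions(session_ids, n_splits=5):
    """Yield (train_sessions, test_sessions) pairs for session-level folds.

    Sessions are assigned to folds round-robin after sorting. With fewer
    sessions than folds, some test sets will be empty.
    """
    unique = sorted(set(session_ids))
    folds = [unique[i::n_splits] for i in range(n_splits)]
    for held_out in folds:
        train = [s for s in unique if s not in held_out]
        yield train, held_out


sessions = [101, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
for train, test in kfold_sessions(sessions):
    print("train:", train, "test:", test)
```

A model would be fitted on each training set, and the per-fold coefficients summarized by their median, as in the plots described above.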
####Main Execution
- Data Loading:
  - Reads `global_trials1.csv` with `;` separator, specifying `iti_duration` as float.
  - Prints unique tasks for reference.
- Data Preprocessing:
  - Applies a custom `parsing` function for trained mice (`trained=1`, `opto_yes=0`).
- Visualization:
  - Defaults to separate plots for each mouse (`separate_mice=True`) by calling `glm`.
  - Optionally generates combined plots for all mice with varying `n_back` values (`[2,3,4,7,10]`) if `separate_mice=False`.
###Usage
- Ensure the required CSV file (`global_trials1.csv`) is available at the specified path.
- Verify that the `extra_plotting` and `parsing` modules are accessible.
- Update the `data_path` variable if needed.
- The script will generate and display:
  - A subplot grid with GLM coefficients for each mouse (default, `separate_mice=True`).
  - Optionally, A0-sized combined plots for all mice with different `n_back` values (if `separate_mice=False`).
###Notes
- The script assumes a specific CSV structure with columns like `subject`, `session`, `outcome`, `side`, `iti_duration`, `probability_r`, and `task`.
- The `parsing` and `select_train_sessions` functions are not provided; they must handle data filtering and cross-validation split creation.
- The `n_back=10` is hardcoded in `glm`; adjust as needed.
- Visualization parameters (e.g., font sizes, figure dimensions) are optimized for presentations/posters but can be modified in `plt.rcParams` or function arguments.
- The script excludes mouse 'A10' and sessions with ≤50 trials.
- Metrics are saved to a CSV file, overwriting any existing file with the same name.
##GLM-Switch
###Script Overview
The script (glm_switch.py) processes behavioral data from a CSV file and includes functions to compute switch-related regressors, fit GLMs, and generate plots to visualize model coefficients. The main functions are:
- Purpose: Processes trial-by-trial behavioral data to compute regressors for the GLM and generates the regression formula.
- Inputs:
  - `df`: DataFrame with columns like `session`, `outcome`, `side`, `iti_duration`, `probability_r`.
  - `n`: Number of previous trials to consider (`n_back`).
- Features:
  - Filters sessions with >50 trials.
  - Encodes choices (`left`, `right`, `other`) based on `outcome` and `side`.
  - Computes `switch_num` (0=same choice as previous trial, 1=different).
  - Creates `last_trial` regressor (reward outcome of the previous trial).
  - Builds regressors for past trials (2 to `n`):
    - `rss_plus{i}`: 1 if same choice as previous trial and rewarded at lag `i`.
    - `rss_minus{i}`: 1 if same choice as previous trial and unrewarded at lag `i`.
    - `rds_plus{i}`: 1 if different choice from previous trial and rewarded at lag `i`.
  - Constructs the GLM formula string (e.g., `rss_plus2 + rss_minus2 + rds_plus2 + last_trial`).
- Returns:
  - Processed DataFrame with regressors.
  - Regression formula string.
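The dependent variable and the formula string described above can be sketched as follows. The snippet only builds `switch_num`, `last_trial`, and the formula; the lagged `rss_*`/`rds_*` columns are left out because their exact definition depends on details of the script not shown here. The function name `switch_targets` is hypothetical.

```python
def switch_targets(choices, rewards, n_back=2):
    """Return (switch_num, last_trial, formula) for one session.

    choices: 0=left, 1=right; rewards: 1=rewarded, 0=unrewarded.
    The first trial has no predecessor, so its entries are None.
    """
    switch_num = [None] + [int(cur != prev) for prev, cur in zip(choices, choices[1:])]
    last_trial = [None] + list(rewards[:-1])
    terms = [
        f"{name}{i}"
        for i in range(2, n_back + 1)  # lags 2 .. n_back
        for name in ("rss_plus", "rss_minus", "rds_plus")
    ]
    formula = "switch_num ~ " + " + ".join(terms + ["last_trial"])
    return switch_num, last_trial, formula


switch_num, last_trial, formula = switch_targets([0, 0, 1, 1, 0], [1, 0, 1, 1, 0])
print(switch_num)
print(last_trial)
print(formula)
```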
- Purpose: Creates a single plot showing GLM coefficients for all mice, optimized for A0 poster presentation.
- Inputs:
  - `df`: DataFrame with behavioral data.
  - `n_back`: Number of previous trials to consider.
  - `figsize`: Figure size (default: A0 poster size, 46.8x33.1 inches).
- Features:
  - Excludes mouse 'A10'.
  - Uses 5-fold cross-validation to fit logistic regression models (`switch_num ~ regressors`).
  - Computes coefficients, p-values, and evaluation metrics (log-likelihood, AIC, BIC, pseudo R-squared, accuracy, precision, recall, F1, ROC AUC, Brier score).
  - Saves metrics to a CSV file (`all_subjects_glm_metrics_glm_prob_switch_{n_back}.csv`).
  - Plots coefficients for:
    - `rss_plus` (red, solid line with circles).
    - `rss_minus` (blue, dashed line with squares).
    - `rds_plus` (orange, dashed line with circles).
    - `last_trial` (gray, circle).
    - `Intercept` (green, circle).
  - Uses an alpha gradient to differentiate mice.
  - Adds significance markers (`***`, `**`, `*`, `ns`) based on Fisher’s combined p-values across mice.
  - Includes a custom legend with regressor types and a grid for readability.
  - Optimized for posters with large fonts and thick lines.
- Purpose: Plots GLM coefficients either combined across all mice or separately for each mouse, focusing on switch behavior.
- Inputs:
  - `df`: DataFrame with behavioral data.
- Features:
  - Creates a subplot grid (2 rows, dynamic columns) for individual mice.
  - Filters sessions with >50 trials.
  - Uses `obt_regressors` with `n_back=10` to compute regressors.
  - Fits logistic regression (`switch_num ~ regressors`) for each mouse using 5-fold cross-validation.
  - Plots median coefficients for:
    - `Intercept` (green).
    - `rss_plus` (red).
    - `rss_minus` (blue).
    - `rds_plus` (orange).
    - Other regressors (gray).
  - Adds significance markers (`***`, `**`, `*`, `ns`) based on median p-values.
  - Adjusts x-tick positions and labels for clarity (e.g., special handling for lag 10).
  - Shares y-axis across subplots for consistency.
  - Optimized for readability with dynamic subplot sizing.
####Main Execution
- Data Loading:
  - Reads `global_trials1.csv` with `;` separator, specifying `iti_duration` as float.
  - Prints unique tasks for reference.
- Data Preprocessing:
  - Applies a custom `parsing` function for trained mice (`trained=1`, `opto_yes=0`).
- Visualization:
  - Defaults to separate plots for each mouse (`separate_mice=True`) by calling `glm`.
  - Optionally generates combined plots for all mice with varying `n_back` values (`[2,3,4,7,10]`) if `separate_mice=False`.
###Usage
- Ensure the required CSV file (`global_trials1.csv`) is available at the specified path.
- Verify that the `extra_plotting`, `model_avaluation`, and `parsing` modules are accessible.
- Update the `data_path` variable if needed.
- The script will generate and display:
  - A subplot grid with GLM coefficients for each mouse (default, `separate_mice=True`).
  - Optionally, A0-sized combined plots for all mice with different `n_back` values (if `separate_mice=False`).
###Notes
- The script assumes a specific CSV structure with columns like `subject`, `session`, `outcome`, `side`, `iti_duration`, `probability_r`, and `task`.
- The `parsing` and `select_train_sessions` functions are not provided; they must handle data filtering and cross-validation split creation.
- The `n_back=10` is hardcoded in `glm`; adjust as needed.
- Visualization parameters (e.g., font sizes, figure dimensions) are optimized for presentations/posters but can be modified in `plt.rcParams` or function arguments.
- The script excludes mouse 'A10' and sessions with ≤50 trials.
- Metrics are saved to a CSV file, overwriting any existing file with the same name.
##Inference-Based Script
The script (inference-based.py) processes behavioral data from a CSV file and includes functions to compute inference-based features (V_t) and visualize logistic regression coefficients for predicting mouse choices. The main functions are:
####manual_computation
- Purpose: Processes trial-by-trial behavioral data to compute value differences (`V_t`) and choice-reward sequences based on past trials.
- Inputs:
  - `df`: DataFrame with columns like `subject`, `session`, `outcome`, `side`, `probability_r`.
  - `n_back`: Number of previous trials to consider for sequence patterns.
  - `hist`: Boolean flag to plot a histogram of computed `V_t` values.
- Features:
  - Encodes choices (0=left, 1=right) based on `outcome` and `side`.
  - Creates choice-reward codes (`00`, `01`, `10`, `11`) combining choice and reward outcome.
  - Builds sequences of `n_back` previous choice-reward pairs.
  - Computes `V_t` as the difference between the mean probability of the right side being active (`prob_right`) and the left side being active (`prob_left`) for each sequence.
  - Optionally plots a histogram of `V_t` values.
  - Prepares next-trial choice data for modeling.
- Returns: Processed DataFrame with computed features (`choice`, `sequence`, `V_t`, etc.).
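The sequence-based value computation above can be sketched on plain lists. Several details are assumptions for the sake of a runnable example: `prob_left` is taken as `1 - prob_right`, the sequence for trial `t` is built from the `n_back` trials before it, and the function name `sequence_values` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def sequence_values(choices, rewards, prob_right, n_back=2):
    """Return (sequences, v_t) for one subject.

    choices: 0=left, 1=right; rewards: 0/1; prob_right: floats in [0, 1].
    Assumption: prob_left = 1 - prob_right.
    """
    codes = [f"{c}{r}" for c, r in zip(choices, rewards)]
    sequences = [None] * len(codes)
    for t in range(n_back, len(codes)):
        sequences[t] = "".join(codes[t - n_back:t])
    # Collect the right-side probabilities observed after each sequence.
    by_sequence = defaultdict(list)
    for seq, p in zip(sequences, prob_right):
        if seq is not None:
            by_sequence[seq].append(p)
    value = {
        seq: mean(ps) - mean([1 - p for p in ps])
        for seq, ps in by_sequence.items()
    }
    v_t = [value.get(seq) for seq in sequences]
    return sequences, v_t


sequences, v_t = sequence_values(
    choices=[1, 1, 0, 1, 1, 0],
    rewards=[1, 1, 0, 1, 1, 0],
    prob_right=[0.8, 0.8, 0.8, 0.8, 0.2, 0.2],
)
for seq, v in zip(sequences, v_t):
    print(seq, v)
```

Trials without a full history get `None` for both the sequence and `V_t`, so they can be dropped before model fitting.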
####manual_computation_v2
- Purpose: Computes `V_t` using recursive equations based on reward history and given probabilities, suitable for an alternative inference model.
- Inputs:
  - `df`: DataFrame with trial data.
  - `p_SW`: Probability of switching from active to inactive state.
  - `p_RWD`: Probability of reward in the active state.
  - `hist`: Boolean flag to plot a histogram of `V_t` values.
- Features:
  - Encodes choices similarly to `manual_computation`.
  - Tracks same-site choices across trials.
  - Computes `R_t` (reward history) recursively:
    - Resets to 0 on reward.
    - Updates using `rho = 1 / ((1 - p_SW) * (1 - p_RWD))` for same-site unrewarded trials.
    - Maintains previous `R_t` for exploratory switches.
  - Computes `V_t` based on `R_t`, `side_num`, and `p_RWD`, with warnings for values exceeding 1.
  - Optionally plots a histogram of `V_t` values.
- Returns: DataFrame with computed `R_t`, `V_t`, and other intermediate columns.
####plot_all_mice_correct_inf_combined
- Purpose: Creates a single plot showing logistic regression coefficients for all mice, optimized for A0 poster presentation.
- Inputs:
  - `df`: DataFrame with behavioral data for all mice.
  - `n_back`: Number of previous trials for sequence patterns.
  - `figsize`: Figure size (default: A0 poster size, 46.8x33.1 inches).
- Features:
  - Excludes mouse 'A10' from analysis.
  - Uses 5-fold cross-validation to fit logistic regression models (`choice ~ V_t + side_num`, or `choice ~ side_num` for `v2=2`).
  - Computes coefficients, standard errors, p-values, and confidence intervals.
  - Calculates comprehensive metrics (log-likelihood, AIC, BIC, pseudo R-squared, accuracy, precision, recall, F1, ROC AUC, Brier score).
  - Saves metrics to a CSV file (path based on `v2` and `n_back`).
  - Plots coefficients (`β^V`, `β^S`, Intercept) with distinct colors and significance markers (`***`, `**`, `*`, `ns`) based on Stouffer's combined p-values.
  - Uses an alpha gradient to differentiate mice in the legend.
  - Optimized for poster presentation with large fonts and thick lines.
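Stouffer's method, mentioned above for combining p-values, can be sketched with the standard library. This is the unweighted, one-sided form; whether the script weights mice or uses two-sided p-values is not stated, so treat those choices as assumptions. The function name `stouffer_combined_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combined_p(p_values):
    """Combine one-sided p-values (each strictly between 0 and 1).

    Each p-value is converted to a z-score, the z-scores are summed and
    rescaled by sqrt(k), and the result is converted back to a p-value.
    """
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1 - p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1 - std_normal.cdf(z_combined)


per_mouse_p = [0.04, 0.20, 0.01]
print(stouffer_combined_p(per_mouse_p))
```

Unlike Fisher's method, this form treats evidence symmetrically, so p-values near 1 pull the combined result toward non-significance.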
####inference_plot
- Purpose: Plots logistic regression coefficients either combined across all mice or separately for each mouse.
- Inputs:
  - `df`: DataFrame with behavioral data.
- Features:
  - Supports two modes:
    - Combined: Calls `plot_all_mice_correct_inf_combined` with `n_back=3` (hardcoded).
    - Separate: Creates a subplot grid (2 rows, dynamic columns) for individual mice.
  - For separate plots:
    - Filters sessions with >50 trials.
    - Uses `manual_computation` with `n_back=3` to compute `V_t`.
    - Fits logistic regression (`choice ~ V_t + side_num`) for each mouse using 5-fold cross-validation.
    - Plots median coefficients (Intercept, `side_num`, `V_t`) with significance markers.
    - Uses distinct colors for regressors and rotates x-labels if needed.
    - Shares y-axis across subplots for consistency.
    - Optimized for readability with dynamic subplot sizing.
####Main Execution
- Data Loading:
  - Reads `global_trials1.csv` with `;` separator, specifying `iti_duration` as float.
  - Shifts `iti_duration` to the next trial and excludes `subject='manual'`.
- Data Preprocessing:
  - Selects relevant columns (`subject`, `session`, `outcome`, `side`, `iti_duration`, `probability_r`, `task`, `date`).
  - Encodes `outcome_bool` (1 for correct, 0 otherwise).
  - Applies a custom `parsing` function for trained mice (`trained=1`, `opto_yes=0`).
- Visualization:
  - Calls `inference_plot` to generate plots (default: separate plots for each mouse).
###Usage
- Ensure the required CSV file (`global_trials1.csv`) is available at the specified path.
- Verify that the `extra_plotting`, `model_avaluation`, and `parsing` modules are accessible.
- Update the `data_path` variable if needed.
- The script will generate and display either:
  - A single A0-sized plot with combined coefficients for all mice (if `separate_mice=False`).
  - A subplot grid with coefficients for each mouse (if `separate_mice=True`).
###Notes
- The script assumes a specific CSV structure with columns like `subject`, `session`, `outcome`, `side`, `probability_r`, `iti_duration`, `task`, and `date`.
- The `parsing` and `select_train_sessions` functions are not provided; they must handle data filtering and cross-validation split creation.
- The `v2` flag in `plot_all_mice_correct_inf_combined` controls the computation method (`manual_computation` or `manual_computation_v2`) and model formula.
- The `n_back=3` is hardcoded in `inference_plot` for combined plots; adjust as needed.
- Visualization parameters (e.g., font sizes, figure dimensions) are optimized for presentations/posters but can be modified in `plt.rcParams` or function arguments.
- The script excludes mouse 'A10' and sessions with ≤50 trials in `inference_plot`.