Closed
132 commits
e6940e9
make diff of time series
May 4, 2025
8337e9a
`AliasDataFrame` is a small utility that extends `pandas.DataFrame` f…
May 29, 2025
8ddfbf7
adding perfmonitor
May 31, 2025
350f786
adding PerfromanceLogger extracted from calibration code
May 31, 2025
1ba0686
supressing linter warning
May 31, 2025
4a7d520
Add dtype support and alias dependency graph to AliasDataFrame
Jun 1, 2025
54de3fd
Add support for dtype persistence and alias filtering in save/load
Jun 1, 2025
b8e241e
Save aliases directly to pyarrow metadata
Jun 1, 2025
fcb9bb9
add FormulaLinearModel.py used for the dEdx and distortion calibration
Jun 2, 2025
cfe72d4
add FormulaLinearModel.py used for the dEdx and distortion calibration
Jun 2, 2025
9087f54
special treatment for constants - should be enver materialized but used
Jun 2, 2025
60e26cb
special treatment for constants
Jun 2, 2025
b188456
special treatment for constants
Jun 2, 2025
f77f57c
`Add ROOT SetAlias export and Python-to-ROOT AST translation for alia…
Jun 3, 2025
664db50
Add dependency-aware alias materialization with optional cleanup and …
Jun 4, 2025
679141b
Extended usnit test for the sub_frames
Jun 9, 2025
6561696
Add extended unit tests for AliasDataFrame including lazy join and er…
Jun 9, 2025
3aae8ee
fixed - Circular dependency detection
Jun 9, 2025
6759c26
fixing all unit test - except oth the automatic materialization
Jun 9, 2025
071a860
fixing automatic materialization test + working in the distrtion cali…
Jun 9, 2025
ea7c0d6
fixing circular depndency test - all test are OK now
Jun 9, 2025
7389cda
adding unit test for the export_import tree
Jun 10, 2025
da90789
add failing test for export/import of the subframes
Jun 10, 2025
64b27cb
make test_export_tree_read_tree_with_subframe already OK
Jun 10, 2025
2a6bd71
adding metadata to all trees
Jun 11, 2025
9b7a038
Updated documentation
Jun 11, 2025
c2e7ca6
AliasDataFrame: add index-based subframe join and robust error handling
Jun 11, 2025
3753500
AliasDataFrame: Add __getattr__ support for subframes + docstring/typ…
Jun 12, 2025
718259a
AliasDataFrame: Add support for __getattr__ access to subframes and c…
Jun 12, 2025
d55b796
Refactor GroupByRegressor with robust fit logic, dtype casting, and u…
Jun 12, 2025
c45e5d0
Fix: ensure regression outputs are preserved for underpopulated groups
Jun 12, 2025
4f4f425
Fix NaN handling in robust regression and enable predictor-specific m…
Jun 12, 2025
22ce23c
Add NaN filtering and robust fit fallback logic to GroupByRegressor
Jun 12, 2025
2785bc4
Add flexible regression model selection via `fitter` parameter
Jun 12, 2025
c3d3617
* removing pylint warning
Jun 13, 2025
67e3699
* adding __init__.py
Jun 13, 2025
27c9fbe
* adding protection for infinite recursion
Jun 13, 2025
e9da107
pylint fix
Jun 13, 2025
d4d20e6
adding test for the logger
Jun 23, 2025
4d44bb2
adding conversions to the function list
Jun 25, 2025
cb4b5d1
adding chunksize and compression as argument
Jun 27, 2025
87fa521
adding chunksize and compression as argument
Jun 27, 2025
4ef6973
adding df drawing interface similar to the tree::Draw
Aug 14, 2025
512323d
docs(quantile_fit_nd): add v3.1 Δq-centered ND quantile fitting spec
Oct 11, 2025
257d2ea
Commit latest working version of AliasDataFrame
Oct 11, 2025
fc54430
Commit latest working version of perfoemance_logger.py
Oct 11, 2025
161f0f0
Commit latest working version of groupby_regression.py
Oct 11, 2025
53db0b8
feat(DataFrameUtils): Enhance docstrings and error handling for scatt…
Oct 11, 2025
0ae7eac
feat(dfextensions): add ND quantile fitting (Δq-centered) + tests & b…
Oct 11, 2025
273d6f8
test(dfextensions): fix quantile ND tests vs synthetic truth; add rob…
Oct 11, 2025
6d65a12
fix(quantile_fit_nd): exclude q_center from nuisance axes; silence si…
Oct 11, 2025
b4b5b41
fix(dfextensions/quantile_fit_nd): evaluator axis bug + window-local …
Oct 11, 2025
a578c17
tests(quantile_fit_nd): snapshot pre-fix state with rich diagnostics …
Oct 11, 2025
5d9cacd
fix(quantile_fit_nd): do not floor degenerate Δq windows; keep NaN an…
Oct 11, 2025
30b7ee7
tests(quantile_fit_nd): handle Poisson via randomized PIT pre-processing
Oct 11, 2025
12d5fe4
docs(quantile_fit_nd): add Discrete Inputs policy and utilities
Oct 11, 2025
1b2ed00
bench(quantile_fit_nd): correct scaling assertions - α_b≈−0.5, α_rt≈0.0
Oct 11, 2025
8625857
docs(quantile_fit_nd): add contextLLM.md (cold-start guide + policies)
Oct 11, 2025
2b27e47
docs(quantile_fit_nd): add contextLLM.md (cold-start guide + policies)
Oct 11, 2025
ec9f424
Forgottend commit of refernce test and bench log
Oct 11, 2025
cd63f42
feat(bench): add single-file GroupBy regression benchmark + reports
Oct 22, 2025
57b3293
docs(groupby_regression): add Performance & Benchmarking section + fi…
Oct 22, 2025
7d215d3
docs(bench): set default to 5k groups; document 30% outlier scenario
Oct 22, 2025
bb51bc0
docs(restartContext): update with 5k/5 default, 30% outliers, and lev…
Oct 22, 2025
5c9d14b
feat(groupby_regression): add optional per-group diagnostics (diag, d…
Oct 22, 2025
aa024b0
feat(bench): integrate class-level diagnostics summary into benchmark…
Oct 23, 2025
a71cc4d
docs(restartContext): record diagnostics integration and real-data va…
Oct 23, 2025
cc1ecb4
docs(restartContext): record diagnostics integration and real-data va…
Oct 23, 2025
5cf7431
use faster compression by default
Oct 23, 2025
de11f93
removed accidentally comitted code
Oct 23, 2025
3b3171c
feat(groupby_regression): add make_parallel_fit_fast (Phase 3 NumPy b…
Oct 23, 2025
225437c
feat(groupby): Phase 3 v4 (Numba) - 33–36× faster than v2 on tiny groups
Oct 23, 2025
8af2860
feat(groupby): Phase 3 v4 (Numba) - 33–36× faster than v2 on tiny groups
Oct 24, 2025
0c4961c
feat(groupby_regression): finalize v4 diagnostics + 200× speedup
Oct 24, 2025
4b7fa67
test(v2/v3/v4): add verbose diagnostics for multi-target layout
Oct 24, 2025
718ed1d
remove junk files
Oct 25, 2025
823c60a
feat: create groupby_regression package structure
Oct 25, 2025
e43f332
refactor: move files to package structure (preserve history)
Oct 25, 2025
4d90f73
refactor: update imports for package structure
Oct 25, 2025
2997c0b
test: add cross-validation tests with realistic tolerances
Oct 25, 2025
0898280
fix: handle suffix in benchmark column names
Oct 25, 2025
a7bdc58
bench: limit quick mode to 2k groups, add performance warnings
Oct 25, 2025
7701367
fix: remove relative import in fallback path (bench_groupby_regressio…
Oct 25, 2025
d4e54b8
benchmarks: add optimized-only benchmark for GroupByRegressor (v2/v3/…
Oct 25, 2025
ba768f6
bench: add optimized-only benchmark (v2/v3/v4)
Oct 25, 2025
94386df
bench: add optimized-only benchmark (v2/v3/v4)
Oct 25, 2025
59702d7
bench: add optimized benchmark + visualization
Oct 25, 2025
4489253
cleanup: remove .bak files from repository
Oct 25, 2025
0be66c9
add: .gitignore for Python project
Oct 25, 2025
0f28d18
cleanup: remove diff.txt and prevent re-adding
Oct 25, 2025
c33b995
cleanup: remove diff.txt and prevent re-adding
Oct 25, 2025
d0aa5e9
cleanup: remove diff.txt and prevent re-adding
Oct 25, 2025
2c3e180
refactor: remove obsolete optimized/ directory
Oct 25, 2025
7871783
refactor: complete migration to new package structure
Oct 25, 2025
36b1043
rm diff.txt
Oct 25, 2025
4290510
cleanup: finalize package structure cleanup
Oct 25, 2025
f0d552d
docs: finalize README.md for v2.0.0 release
Oct 26, 2025
c21812d
docs: finalize README.md for v2.0.0 release
Oct 26, 2025
eef2087
docs: apply Section 1 corrections (scope, measurements)
Oct 27, 2025
5b4ddb5
docs: Section 2 cleanup and focus
Oct 27, 2025
6a797c7
docs: Section 5 accuracy corrections
Oct 27, 2025
5fd11a5
git add docs/SLIDING_WINDOW_SPEC_DRAFT.md
Oct 27, 2025
9ae2976
Complete Section 6 specification with all reviewer feedback
Oct 27, 2025
9271574
fixing bencmark path
Oct 28, 2025
34cab7d
docs: Add Phase 7 implementation plan with synthetic data spec
Oct 28, 2025
18bdeb3
test version 1
Oct 28, 2025
194142d
cd ~/alicesw/O2DPG/UTILS/dfextensions/groupby_regression
Oct 28, 2025
87724b7
feat: Add realistic TPC distortion synthetic data and validation
Oct 28, 2025
97498ea
cd ~/alicesw/O2DPG/UTILS/dfextensions/groupby_regression
Nov 8, 2025
f2e537f
Add column compression support to AliasDataFrame
Nov 9, 2025
6ebf223
Fix precision measurement in AliasDataFrame compression
Nov 9, 2025
687c548
Add .pylintrc to suppress non-functional pylint warningsRetry
Nov 9, 2025
39bb762
Add bidirectional atan2/arctan2 support for ROOT compatibility
Nov 9, 2025
6d19bf9
Remove IDE and system files from tracking, update .gitignore
Nov 9, 2025
8acda25
Remove IDE and system files from tracking, update .gitignore
Nov 9, 2025
5ef0f6b
Remove .idea/ directory from git tracking
Nov 9, 2025
ea5965e
Remove .idea/ from git tracking and add to .gitignore
Nov 9, 2025
cc02d74
Add selective compression mode (Pattern 2) to AliasDataFrame
Nov 10, 2025
6359214
Refactor: Move AliasDataFrame to subdirectory
Nov 10, 2025
70a2c3b
"Refactor: Move AliasDataFrame to subdirectory
Nov 10, 2025
5893349
Docs: Organize documentation structure
Nov 10, 2025
309969a
Style: Fix pylint issues
Nov 10, 2025
6c0dc8b
Style: Fix pylint issues in AliasDataFrame
Nov 10, 2025
b41160d
Style: Fix pylint issues in groupby_regression
Nov 10, 2025
cdff407
Style: Verify pylint scores in quantile_fit_nd All files already ≥9.0 ✅
Nov 10, 2025
cbbb57b
Refactor: Reorganize root utilities into subdirectories
Nov 10, 2025
733e5dc
Style: Fix pylint issues in perfmonitor
Nov 10, 2025
0c098be
Fix: Update imports after reorganization
Nov 10, 2025
8ca5e2d
fix: Resolve 4 pylint errors
Nov 10, 2025
9186aec
fix: Resolve 4 pylint CI errors
Nov 10, 2025
4cb2571
fix: Skip aspirational formula validation test
Nov 10, 2025
cb0fa0a
fix: Skip formula validation tests and resolve pylint errors
Nov 10, 2025
20 changes: 20 additions & 0 deletions .gitignore
@@ -2,3 +2,23 @@
.vscode
*.pyc
o2dpg_tests/**

# IDE settings
.idea/
*.iml

# macOS metadata
.DS_Store
**/.DS_Store

# Python
__pycache__/
*.pyc
*.pyo
*.egg-info/
.pytest_cache/

# Virtual environments
venv/
env/
.idea/
50 changes: 50 additions & 0 deletions UTILS/AO2DQuery/AO2Dquery_utils.py
@@ -0,0 +1,50 @@
import sys
#import shutil
#import os
#from pathlib import Path
import ROOT

"""
python $O2DPG/UTILS/AO2DQuery/AO2Dquery_utils.py AO2D_Derived_Merged.root $(find /lustre/alice/users/rverma/NOTESData/alice-tpc-notes/Downsampled -iname AO2D_Derived.root| head -n 10 )
"""

def merge_root_directories_with_suffix(output_file, input_files):
    fout = ROOT.TFile(output_file, "RECREATE")

    for i, fname in enumerate(input_files):
        fin = ROOT.TFile.Open(fname)
        if not fin or fin.IsZombie():
            print(f"Warning: Could not open {fname}")
            continue

        for key in fin.GetListOfKeys():
            dname = key.GetName()
            if not dname.startswith("DF"):
                continue

            src_dir = fin.Get(dname)
            new_dname = f"{dname}__{i}"  # Add suffix

            fout.cd()
            fout.mkdir(new_dname)
            fout.cd(new_dname)

            for subkey in src_dir.GetListOfKeys():
                obj_name = subkey.GetName()
                obj = src_dir.Get(obj_name)

                # Clone tree properly
                if obj.InheritsFrom("TTree"):
                    cloned = obj.CloneTree(-1)  # deep copy all entries
                    cloned.SetName(obj_name)
                    cloned.Write()
                else:
                    obj.Write()

        fin.Close()
    fout.Close()

if __name__ == "__main__":
    output = sys.argv[1]
    inputs = sys.argv[2:]
    merge_root_directories_with_suffix(output, inputs)
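Review note: the renaming rule in `merge_root_directories_with_suffix` can be checked without ROOT. The sketch below is an illustration only, not part of the PR. The function name `suffixed_directory_names` and its list-of-lists input are hypothetical stand-ins for the per-file key listings. It assumes, as in the code above, that only top-level keys starting with `DF` are copied and that the suffix is the position of the input file on the command line.

```python
from typing import List


def suffixed_directory_names(dirs_per_file: List[List[str]]) -> List[str]:
    """Return the output directory names produced by the DF-suffix rule.

    Hypothetical helper for illustration: dirs_per_file[i] lists the
    top-level key names found in the i-th input file.
    """
    names = []
    for i, dirs in enumerate(dirs_per_file):
        for dname in dirs:
            # Only data-frame directories ("DF...") are merged.
            if not dname.startswith("DF"):
                continue
            # The file index keeps equally named directories apart.
            names.append(f"{dname}__{i}")
    return names


if __name__ == "__main__":
    print(suffixed_directory_names([["DF_1", "metaData"], ["DF_1", "DF_2"]]))
```

Because the index comes from `enumerate(input_files)`, an input file that fails to open still consumes its index, so the suffixes in the output always map back to positions in the argument list.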
164 changes: 164 additions & 0 deletions UTILS/TimeSeries/timeseries_diff.py
@@ -0,0 +1,164 @@
"""timeseries_diff.py

Usage:
    import sys, os; sys.path.insert(1, os.environ["O2DPG"] + "/UTILS/TimeSeries")
    from timeseries_diff import *

Utility helpers for time-series comparison scripts: open many ROOT
time-series trees while keeping their ROOT files alive, and plot ratios.
"""

import os
import pathlib
from typing import List, Tuple, Optional

import ROOT # PyROOT

# ---------------------------------------------------------------------------
# Helper: open many ROOT files and keep them alive
# ---------------------------------------------------------------------------

def read_time_series(
    listfile: str = "o2_timeseries_tpc.list",
    treename: str = "timeSeries",
) -> List[Tuple[Optional[ROOT.TFile], Optional[ROOT.TTree]]]:
    """Read *listfile* containing one ROOT path per line and return a list
    of ``(TFile | None, TTree | None)`` tuples.

    The TFile objects are **kept open** (and returned) so the TTrees remain
    valid for the caller. Blank lines and lines starting with "#" are
    ignored. Environment variables in paths are expanded.

    Parameters
    ----------
    listfile : str
        Text file with ROOT filenames.
    treename : str, default "timeSeries"
        Name of the tree to retrieve from each file.

    Returns
    -------
    list of tuples
        ``[(f1, tree1), (f2, tree2), ...]``. A missing or unreadable file
        yields ``(None, None)``; a missing tree yields ``(file, None)``.
    """
    files_and_trees: List[Tuple[Optional[ROOT.TFile], Optional[ROOT.TTree]]] = []

    with open(listfile, "r") as fh:
        # strip before testing for "#" so indented comment lines are skipped too
        paths = [ln.strip() for ln in fh if ln.strip() and not ln.lstrip().startswith("#")]

    for raw_path in paths:
        path = os.path.expandvars(raw_path)
        if not pathlib.Path(path).is_file():
            print(f"[read_time_series] warning: file not found -> {path}")
            files_and_trees.append((None, None))
            continue
        try:
            froot = ROOT.TFile.Open(path)
            if not froot or froot.IsZombie():
                raise RuntimeError("file could not be opened")
            tree = froot.Get(treename)
            if not tree:
                print(f"[read_time_series] warning: tree '{treename}' missing in {path}")
            # store None (not a null proxy) for a missing tree, as documented
            files_and_trees.append((froot, tree if tree else None))
        except Exception as e:
            print(f"[read_time_series] error: cannot open {path}: {e}")
            files_and_trees.append((None, None))

    return files_and_trees

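def makeAliases(trees):
    """Attach the reference tree (first entry) to every valid tree as friend "F"."""
    ref = trees[0][1]
    for _, tree in trees:
        if tree and ref:  # skip entries whose file or tree could not be opened
            tree.AddFriend(ref, "F")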


def setStyle():
    ROOT.gStyle.SetOptStat(0)
    ROOT.gStyle.SetOptTitle(0)
    ROOT.gStyle.SetPalette(ROOT.kRainBow)
    ROOT.gStyle.SetPaintTextFormat(".2f")
    ROOT.gStyle.SetTextFont(42)
    ROOT.gStyle.SetTextSize(0.04)
    ROOT.gROOT.ForceStyle()
    ROOT.gROOT.SetBatch(True)






# ---------------------------------------------------------------------------
# make_ratios ----------------------------------------------------------------
# ---------------------------------------------------------------------------

def make_ratios(trees: list, outdir: str = "fig", pdf_name: str = "ratios.pdf") -> ROOT.TCanvas:
    """Create ratio plots *log(var/F.var) vs Iteration$* for each input tree.

    * A PNG for every variable / tree is saved to *outdir*
    * All canvases are also appended to a multi-page PDF *pdf_name*
    * Vertical guide-lines mark the logical regions (isector, itgl, iqpt, occu)
    """
    outdir = pathlib.Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    pdf_path = outdir / pdf_name

    # ------- style / helpers ----------------------------------------------
    ROOT.gStyle.SetOptTitle(1)
    canvas = ROOT.TCanvas("c_ratio", "ratio plots", 1200, 600)
    lab = ROOT.TLatex()
    lab.SetTextSize(0.04)

    # vertical guides in **user** x-coordinates (Iteration$ axis: 0–128);
    # zip() below pairs the first four positions with names, so 127 gets no guide
    vlines = [0, 54, 84, 104, 127]
    vnames = ["isector", "itgl", "iqpt", "occupancy"]
    vcolors = [ROOT.kRed+1, ROOT.kBlue+1, ROOT.kGreen+2, ROOT.kMagenta+1]
    setups = ["ref", "apass2_closure-test-zAcc.GausSmooth_test3_streamer", "apass2_closure-test-zAcc.GausSmooth_test4_streamer", "apass2_closure-test-zAcc.GausSmooth_test2_streamer"]
    # variables to compare ---------------------------------------------------
    vars_ = [
        "mTSITSTPC.mTPCChi2A", "mTSITSTPC.mTPCChi2C",
        "mTSTPC.mDCAr_A_NTracks", "mTSTPC.mDCAr_C_NTracks",
        "mTSTPC.mTPCNClA", "mTSTPC.mTPCNClC",
        "mITSTPCAll.mITSTPC_A_MatchEff", "mITSTPCAll.mITSTPC_C_MatchEff",
        "mdEdxQMax.mLogdEdx_A_RMS", "mdEdxQMax.mLogdEdx_C_RMS",
        "mdEdxQMax.mLogdEdx_A_IROC_RMS", "mdEdxQMax.mLogdEdx_C_IROC_RMS"
    ]
    cut = "mTSITSTPC.mDCAr_A_NTracks > 200"

    # open PDF ---------------------------------------------------------------
    canvas.Print(f"{pdf_path}[")  # begin multipage

    for setup_index, (_, tree) in enumerate(trees[1:], start=1):
        if not tree:
            continue
        # fall back to the index when more trees than setup names are given
        setup = setups[setup_index] if setup_index < len(setups) else str(setup_index)
        for var in vars_:
            expr = f"log({var}/F.{var}):Iteration$"
            # 2-D density histogram
            tree.Draw(f"{expr}>>his(128,0,128,50,-0.05,0.05)", cut, "colz")
            # profile overlay
            tree.Draw(f"{expr}>>hp(128,0,128)", cut, "profsame")
            pad = ROOT.gPad
            ymin, ymax = -0.05, 0.05
            # keep references so ROOT does not garbage-collect the guides
            guides: list[ROOT.TLine] = []
            for x, txt, col in zip(vlines, vnames, vcolors):
                # skip lines outside current x-range (safety when reusing canvas)
                if x < 0 or x > 128:
                    continue
                # 1) vertical line in **user** coordinates
                ln = ROOT.TLine(x, ymin, x, ymax)
                ln.SetLineColor(col)
                ln.SetLineStyle(2)
                ln.SetLineWidth(5)
                ln.Draw()
                guides.append(ln)
                # 2) label in user coordinates, just right of the guide
                lab.SetTextColor(col)
                lab.DrawLatex(x + 0.02, 0.03, txt)

            # label of the setup on top-left (user coordinates)
            lab.SetTextColor(ROOT.kMagenta+2)
            lab.DrawLatex(0.15, 0.05, f"Setup {setup}")
            canvas.Modified()
            canvas.Update()

            # ----------------------------------------------------------------
            tag = var.split('.')[-1]
            canvas.SaveAs(str(outdir / f"ratio_{setup_index}_{tag}.png"))
            canvas.Print(str(pdf_path))  # add page

            # detach the guides from the pad before the next Draw()
            for ln in guides:
                pad.GetListOfPrimitives().Remove(ln)

    canvas.Print(f"{pdf_path}]")  # close multipage
    return canvas
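Review note: the list-file handling described in the `read_time_series` docstring (skip blank lines and `#` comments, expand environment variables) can be exercised without ROOT. The following is a minimal sketch, not the PR's code. The helper name `parse_list_lines` and the variable `TS_DEMO_DIR` are hypothetical. The sketch strips each line before testing for `#`, which is an assumption about the intended behaviour for indented comment lines.

```python
import os
from typing import Iterable, List


def parse_list_lines(lines: Iterable[str]) -> List[str]:
    """Return expanded paths from list-file lines (hypothetical helper)."""
    paths = []
    for ln in lines:
        entry = ln.strip()
        # Ignore blank lines and comment lines.
        if not entry or entry.startswith("#"):
            continue
        # Expand $VAR / ${VAR} references; unknown variables are left as-is.
        paths.append(os.path.expandvars(entry))
    return paths


if __name__ == "__main__":
    os.environ["TS_DEMO_DIR"] = "/data"  # illustrative variable, not from the PR
    demo = ["# reference first\n", "\n", "$TS_DEMO_DIR/ref.root\n", "  test3.root  \n"]
    print(parse_list_lines(demo))
```

Keeping the parsing separate from the `ROOT.TFile.Open` calls would also make the first-entry-is-reference convention used by `makeAliases` and `make_ratios` easy to unit-test.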
8 changes: 8 additions & 0 deletions UTILS/dfextensions/.gitignore
@@ -0,0 +1,8 @@

# Transitional artifacts
diff.txt

# Transitional artifacts
diff.txt
*.log
groupby_regression_git.log
27 changes: 27 additions & 0 deletions UTILS/dfextensions/.pylintrc
@@ -0,0 +1,27 @@
[MESSAGES CONTROL]
# Disable style warnings that don't affect functionality
disable=
    line-too-long,
    bad-indentation,
    fixme,
    logging-fstring-interpolation,
    too-many-arguments,
    too-many-positional-arguments,
    too-many-locals,
    too-many-branches,
    too-many-statements,
    broad-exception-caught,
    invalid-name,
    missing-module-docstring,
    missing-class-docstring,
    missing-function-docstring,
    reimported,
    import-outside-toplevel,
    redefined-outer-name,
    superfluous-parens

[FORMAT]
max-line-length=120

[BASIC]
good-names=i,j,k,ex,Run,_,X,y,df,np,dfGB
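Review note: the multi-line `disable=` value only parses as one option when its continuation lines are indented, which is standard INI behaviour. The sketch below shows this with Python's `configparser` on a shortened sample. The `SAMPLE` text and the helper `disabled_checks` are illustrative assumptions, and the sketch does not claim to reproduce how pylint itself loads the file.

```python
import configparser
from typing import List

# Shortened, hypothetical sample in the same layout as the .pylintrc above.
SAMPLE = """\
[MESSAGES CONTROL]
disable=
    line-too-long,
    fixme

[FORMAT]
max-line-length=120
"""


def disabled_checks(text: str) -> List[str]:
    """Return the message names listed under [MESSAGES CONTROL] disable=."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser["MESSAGES CONTROL"]["disable"]
    # Indented continuation lines are joined with newlines; split on both
    # newlines and commas and drop empty fragments.
    return [item.strip() for item in raw.replace("\n", ",").split(",") if item.strip()]


if __name__ == "__main__":
    print(disabled_checks(SAMPLE))
```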