From cb07f5f4b1dbbe0576c933e21487562633f6151b Mon Sep 17 00:00:00 2001 From: Boyce-pj Date: Mon, 22 Dec 2025 11:55:53 +0000 Subject: [PATCH] Adding di.cache module --- di/cache/cache.md | 131 ++++++++++++++++++++++++++++++++++++++++++++++ di/cache/cache.q | 94 +++++++++++++++++++++++++++++++++ di/cache/init.q | 9 ++++ di/cache/test.csv | 40 ++++++++++++++ 4 files changed, 274 insertions(+) create mode 100644 di/cache/cache.md create mode 100644 di/cache/cache.q create mode 100644 di/cache/init.q create mode 100644 di/cache/test.csv diff --git a/di/cache/cache.md b/di/cache/cache.md new file mode 100644 index 0000000..1166f55 --- /dev/null +++ b/di/cache/cache.md @@ -0,0 +1,131 @@ +# Cache + +`cache.q` provides an in-memory, parameterized caching mechanism for storing and reusing function results, reducing computation time for repeat calls. + +## Configuration Variables + +- **`.cache.maxsize`**: Maximum total cache size in MB. +- **`.cache.maxindividual`**: Maximum size in MB allowed for a single cache entry; capped at `maxsize`. +- **`MB`**: Number of bytes in one megabyte, defined as `2 xexp 20`. + +## Core Structures + +- **`cache`** (table): tracks cache entries with columns: + - `id` (long) + - `lastrun`, `lastaccess` (timestamps) + - `size` (bytes) +- **`funcs`** (dict): maps `id` to the cached function. +- **`results`** (dict): maps `id` to the resulting data. +- **`perf`** (table): logs cache performance with columns: + - `time` (timestamp) + - `id` (long) + - `status` (symbol: `add`, `hit`, `fail`, `evict`, `rerun`) + +## Main Functions + +### `getid` +Generates unique IDs for new cache entries by incrementing a global counter. + +### `add` +Takes parameters `[function; id; status]` and: +1. Executes `function` via `value`; errors are not trapped and propagate to the caller. +2. If the result size is below `.cache.maxindividual * MB`, ensures enough space: + - Calculates required space and evicts the least recently accessed entries as needed. +3. Inserts or updates the cache table, `funcs` and `results`, then logs the event in `perf`. +4. 
Otherwise, logs a `fail` and returns the result without caching. + +### `drop` +Removes specific cache entries by `id`, updating both the cache table and results dict. + +### `evict` +Evicts least-recently-accessed items until required space is freed: +- Sorts by `lastaccess`, takes a running sum of sizes, and drops the entries needed to free the space. +- Logs `evict` in `perf`. + +### `trackperf` +Logs performance events (`add`, `hit`, `fail`, `evict`, `rerun`) with timestamps into `perf`. + +### `execute` +Parameters: `[func; age]`. +1. Looks for a matching cache entry by function identity. +2. If found and still fresh (`now - lastrun < age`): + - Updates `lastaccess`, logs a `hit`, returns cached result. +3. If found but stale: + - Drops entry, logs `rerun`, re-executes via `add` under the same `id`. +4. If not present: + - Adds a new cache entry via `add`. + +### `getperf` +Returns `perf` table with function mappings added for each event entry. + +# Cache Example Usage + +This example demonstrates how `.cache.execute` works with caching and stale time logic, along with performance tracking using `.cache.getperf[]`. + +## Example Steps + +### 1. First Execution +The function is run and the result placed in the cache: + +```q +q) \t r:execute[({system"sleep 2"; x+y};1;2);0D00:01] +2023 +q)r +3 +``` + +### 2. Second Execution (Cache Hit) +On the second call, the result is returned immediately from the cache because the time since the last run is within the stale time: + +```q +q) \t r1:execute[({system"sleep 2"; x+y};1;2);0D00:01] +0 +q)r1 +3 +``` + +### 3. Execution After Stale Time (Re-run) +If the time since the last execution is greater than the required stale time, the function is re-run, the cached result is updated, and the result returned: + +```q +q) \t r2:execute[({system"sleep 2"; x+y};1;2);0D00:00] +2008 +q)r2 +3 +``` + +### 4. 
Cache Performance Tracking +The cache performance is tracked using `.cache.getperf[]`: + +```q +q).cache.getperf[] +time id status function +------------------------------------------------------------------ +2013.11.06D12:41:53.103508000 2 add {system"sleep 2"; x+y} 1 2 +2013.11.06D12:42:01.647731000 2 hit {system"sleep 2"; x+y} 1 2 +2013.11.06D12:42:53.930404000 2 rerun {system"sleep 2"; x+y} 1 2 +``` + + +--- + +## Cache Table Schema + +| Column | Type | Description | +|--------------|------------|--------------------------------------------------| +| `id` | `long` | Unique identifier for cached entry | +| `lastrun` | `timestamp`| When the function was last executed (add or rerun) | +| `lastaccess` | `timestamp`| When the entry was last added or served from cache | +| `size` | `long` | Byte size of the cached result | + +## perf Table Schema + +| Column | Type | Description | +|---------|-------------|--------------------------------------------| +| `time` | `timestamp` | When the cache event occurred | +| `id` | `long` | Corresponding cache entry ID | +| `status`| `symbol` | Event type (`add`, `hit`, `fail`, `evict`, `rerun`) | +| `function` (added via `getperf`) | `function` | Cached function and its arguments for the event | + +--- + diff --git a/di/cache/cache.q b/di/cache/cache.q new file mode 100644 index 0000000..fc0b485 --- /dev/null +++ b/di/cache/cache.q @@ -0,0 +1,94 @@ +/ Library to provide a mechanism for storing function results in a cache and returning them from the cache if they are available and non-stale. 
+ +/ return timestamp function +cp:{.z.p}; + +/ the maximum size of the cache in MB +maxsize:10; + +/ the maximum size of any individual result set in MB +maxindividual:50; + +/ make sure the maxindividual isn't bigger than maxsize +maxindividual:maxsize&maxindividual; + +MB:2 xexp 20; + +/ a table to store the cache values in memory +cache:([id:`u#`long$()] lastrun:`timestamp$();lastaccess:`timestamp$();size:`long$()); + +/ a dictionary of the functions +.z.M.funcs set (`u#`long$())!(); +/ the results of the functions +results:(`u#`long$())!(); + +/ table to track the cache performance +perf:([]time:`timestamp$();id:`long$();status:`symbol$()); + +id:0j; +getid:{:id+::1}; + +/ add to cache +add:{[function;id;status] + / Don't trap the error here - if it throws an error, we want it to be propagated out + res:value function; + $[(maxindividual*MB)>size:-22!res; + / check if we need more space to store this item + [now:cp[]; + if[0>requiredsize:(maxsize*MB) - size+sum exec size from cache; evict[neg requiredsize;now]]; + / Insert to the cache table + .z.M.cache upsert (id;now;now;size); + / and insert to the function and results dictionary + funcs[id]:enlist function; + results[id]:enlist res; + / Update the performance + trackperf[id;status;now]]; + / Otherwise just log it as an addfail - the result set is too big + trackperf[id;`fail;cp[]]]; + / Return the result + res}; + +// Drop some ids from the cache +drop:{[ids] + ids,:(); + delete from .z.M.cache where id in ids; + results:: ids _ results; + } + +// evict some items from the cache - need to clear enough space for the new item +// evict the least recently accessed items which make up the total size +// feel free to write a more intelligent cache eviction policy ! 
+evict:{[reqsize;currenttime] + r:select from + (update totalsize:sums size from `lastaccess xasc select lastaccess,id,size from cache) + where prev[totalsize] (now:.z.p) - r`lastrun; + // update the cache stats, return the cached result + [update lastaccess:now from .z.M.cache where id=r`id; + trackperf[r`id;`hit;now]; + first results[r`id]]; + // value found, but too old - re-run it under the same id + [drop[r`id]; + add[func;r`id;`rerun]]]]; + // it's not in the cache, so add it + add[func;getid[];`add]]} + +// get the cache performance +getperf:{update function:funcs[id] from perf} diff --git a/di/cache/init.q b/di/cache/init.q new file mode 100644 index 0000000..b825ed3 --- /dev/null +++ b/di/cache/init.q @@ -0,0 +1,9 @@ +/ Load core functionality into root module namespace +\l ::cache.q + +export:([ + getperf:getperf; + add:add; + drop:drop; + execute:execute + ]) diff --git a/di/cache/test.csv b/di/cache/test.csv new file mode 100644 index 0000000..a1a1673 --- /dev/null +++ b/di/cache/test.csv @@ -0,0 +1,40 @@ +action,ms,bytes,lang,code,repeat,minver,comment +before,0,0,q,cac:use`di.cache,1,1,load package into session + +/ Test 1: Baseline Addition with Sleep +run,0,0,q,system "t cac.execute[({system\"sleep 1\"; x+y};10;20);0D00:01]",1,1,Initial run with 1s sleep +true,0,0,q,0~ system "t cac.execute[({system\"sleep 1\"; x+y};10;20);0D00:01]",1,1,Speedup: Cached result returns in <1ms + +/ Test 2: String Manipulation Speed +run,0,0,q,system "t cac.execute[({system\"sleep 1\"; upper x};\"hello world\");0D00:01]",1,1,Initial run with string conversion +true,0,0,q,0~ system "t cac.execute[({system\"sleep 1\"; upper x};\"hello world\");0D00:01]",1,1,Speedup: String conversion cached + +/ Test 3: Matrix Multiplication (Mock) +run,0,0,q,system "t cac.execute[({system\"sleep 1\"; (reverse x) * y};500;1000);0D00:01]",1,1,Initial run with numeric calc +true,0,0,q,0~ system "t cac.execute[({system\"sleep 1\"; (reverse x) * y};500;1000);0D00:01]",1,1,Speedup: Numeric 
calc cached + +/ Test 4: Join Operations +run,0,0,q,system "t cac.execute[({system\"sleep 1\"; x uj y};([]a:1 2);([]a:3 4));0D00:01]",1,1,Initial run with table join +true,0,0,q,0~ system "t cac.execute[({system\"sleep 1\"; x uj y};([]a:1 2);([]a:3 4));0D00:01]",1,1,Speedup: Table join cached + +/ Test 5: Type Checking & Casting +run,0,0,q,system "t cac.execute[({system\"sleep 1\"; `int$x};123.456);0D00:01]",1,1,Initial run with casting +true,0,0,q,0~ system "t cac.execute[({system\"sleep 1\"; `int$x};123.456);0D00:01]",1,1,Speedup: Casting cached + +/ Test 6: Nested List Flattening +run,0,0,q,system "t cac.execute[({system\"sleep 1\"; raze x};(1 2;3 4;5 6));0D00:01]",1,1,Initial run with raze +true,0,0,q,0~ system "t cac.execute[({system\"sleep 1\"; raze x};(1 2;3 4;5 6));0D00:01]",1,1,Speedup: Raze cached + +/ Test 7: Check if output is correct +run,0,0,q,cac.execute[({x+1};1);0D00:01],1,1,Initial run: cache the result of x+1 +true,0,0,q,2~cac.execute[({x+1};1);0D00:01],1,1,Check: Value retrieved from cache matches expected result + +/ Test 8: Table Result Correctness +run,0,0,q,res::([]a:1 2 3;b:`x`y`z);cac.execute[({x};res);0D00:01],1,1,Initial run: cache a table +true,0,0,q,res ~ cac.execute[({x};res);0D00:01],1,1,Check: Table retrieved from cache matches original + +/ Test 9: Complex Dictionary Result +run,0,0,q,dict::`stats`data!((avg;med);til 10);cac.execute[({x};dict);0D00:01],1,1,Initial run: cache a dictionary of functions and lists +true,0,0,q,dict ~ cac.execute[({x};dict);0D00:01],1,1,Check: Dictionary retrieved from cache matches original + +