213 changes: 213 additions & 0 deletions di/analytics/analytics.md
@@ -0,0 +1,213 @@
# Analytics Functions Library

A set of analytical utilities that streamline common data manipulation operations in kdb+/q.

The library provides specialized functions for typical analytical workflows: forward filling missing values, creating custom time intervals, pivoting tables, and generating cross-product expansions. Each function takes a dictionary of parameters (`ffill` also accepts a table directly) and validates its input, signalling informative error messages.

---

## Overview

- **`ffill`** – Forward fill missing values within columns (optionally by group).
- **`ffillzero`** – Treat zeros as missing and forward fill with the last non-zero value.
- **`intervals`** – Generate custom time/value intervals with configurable step and rounding.
- **`pivot`** – Transform tables into cross-tab (wide) format using a pivot column.
- **`rack`** – Build cross products of key columns, optionally with time intervals and base tables.

---

## Functions

### ⚙️ `ffill`

**Description**

Forward fills null values in specified columns with the most recent non-null observation. Supports both table-level operations and granular control through dictionary parameters.

**Parameters**
- Input can be either a table or a dictionary.
- When the argument is a table, the whole table is forward filled (same as calling `fills`).
- When using dictionary format:
  - `table`: The table to process (**required**)
  - `keycols`: Column(s) to fill (optional – defaults to all columns)
  - `by`: Grouping column(s) for segmented filling (optional)

**Behaviour**
- Processes columns independently, preserving data types.
- Handles both typed columns and mixed-type columns.
- When `by` is specified, filling occurs within each group.
- Combines `by` and `keycols` for targeted group-wise filling.

**Examples**
```q
// Fill all columns in a table
filledTable: ffill[table]

// Fill specific columns
ffill[`table`keycols!(myTable; `ask`bid)]

// Group-wise filling of all columns by symbol
ffill[`table`by!(myTable; `sym)]

// Combined grouping and column selection
ffill[`table`by`keycols!(myTable; `sym; `ask`bid)]
```
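
The difference between whole-column and group-wise filling can be seen with q's built-in `fills`. This is a minimal sketch that does not call the library; the table `t` and its values are hypothetical.

```q
// Hypothetical sample: two symbols with gaps in price
t:([] sym:`a`b`a`b`a; price:10 0n 0n 20 0n)

// Whole-column fill: each null takes the previous row's value, whatever its sym
whole:update fills price from t
whole`price     / 10 10 10 20 20f

// Group-wise fill: each null takes the previous value for the same sym;
// the first b row has no earlier b value, so it stays null
grouped:update fills price by sym from t
grouped`price   / 10 0n 10 20 10
```

Grouping matters whenever rows for different keys are interleaved, since an ungrouped fill would carry one symbol's price into another symbol's rows.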

---

<br>


### ⚙️ `ffillzero`

**Description**

Extends forward-fill functionality to handle zero values by treating them as missing data points before applying the fill operation.

**Parameters**
- Dictionary with:
  - `table`: Source table (**required**)
  - `keycols`: Columns where zeros should be filled (**required**)
  - `by`: Optional grouping column(s)

**Behaviour**
1. Converts zero values to null in specified columns.
2. Applies `ffill` logic.
3. Returns a table with zeros replaced by previous non-zero values.

**Examples**
```q
// Replace zeros with last non-zero value
ffillzero[`table`keycols!(priceData; `bid`ask)]

// Group-wise zero filling
ffillzero[`table`by`keycols!(priceData; `sym; `price)]
```
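
The first two steps above can be sketched on a single column with q primitives. This is an illustration under assumed data (the `bid` values are hypothetical), not the library's own code.

```q
// Hypothetical bid column in which 0 marks a missing quote
bid:1.5 0 0 1.7 0

// Step 1: swap zeros for a null of the column's own type (bid 0N is 0n here)
nulled:?[0=bid;bid 0N;bid]    / 1.5 0n 0n 1.7 0n

// Step 2: forward fill the nulls
filled:fills nulled           / 1.5 1.5 1.5 1.7 1.7
```

Indexing with `0N` yields a null of the vector's own type, which keeps the vector conditional type-consistent for non-float columns as well.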

---
<br>

### ⚙️ `intervals`

**Description**

Generates custom time or numeric interval sequences with configurable start, end, and increment parameters. Supports multiple temporal data types with optional rounding to interval boundaries.

**Parameters**
- Dictionary containing:
  - `start`: Beginning of interval range (**required**)
  - `end`: End of interval range (**required**)
  - `interval`: Step size between successive intervals (**required**)
  - `round`: Boolean flag; when true, `start` is rounded down to a multiple of `interval` (optional, default: `1b`)

**Behaviour**
- Supports multiple data types: `minute`, `second`, `time`, `timespan`, `timestamp`, `month`, `date`, `int`, `long`, `short`.
- `start` and `end` must have matching data types.
- When `round` is true or omitted, `start` is rounded down to the nearest multiple of `interval`; when `round` is `0b`, `start` is used unchanged.
- The sequence excludes any final interval that would exceed `end`.
- Date/month intervals: `interval` must be int or long (fractional dates/months are not permitted).
- Timestamp intervals: `interval` accepts minute, second, timespan, int, or long. Numeric values (int/long) are interpreted as nanoseconds, so use caution to prevent overflow.

**Examples**
```q
// Generate 15-minute intervals for trading day
intervals[`start`end`interval!(09:30:00.000; 16:00:00.000; 00:15:00.000)]

// Daily intervals without rounding
intervals[`start`end`interval`round!(2024.01.01; 2024.12.31; 1; 0b)]

// Hourly timestamps with automatic rounding
intervals[`start`end`interval!(2024.01.01D09:00:00; 2024.01.01D17:00:00; 01:00:00)]
```
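
The rounding and end-trimming rules can be traced by hand on plain longs. The sketch below mirrors the arithmetic in the accompanying `analytics.q` source; the input values are hypothetical.

```q
// Hypothetical long inputs
istart:7; iend:32; istep:5

// Round start down to a multiple of the step (the default behaviour)
adj:istep*istart div istep                        / 5

// Candidate points, then drop a final point that overshoots the end
seq:adj+istep*til 1+ceiling(iend-adj)%istep       / 5 10 15 20 25 30 35
res:$[iend<last seq;-1_seq;seq]                   / 5 10 15 20 25 30
```

With rounding disabled, `adj` would simply be `istart`, so the sequence would begin at 7 instead of 5.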

---
<br>

### ⚙️ `pivot`

**Description**

Reorganizes tabular data by transforming unique values from a pivot column into individual columns, with aggregated values at intersections. Creates a cross-tabular representation suitable for reporting and analysis.

**Parameters**
- Dictionary with:
  - `table`: Source table (**required**)
  - `by`: Row grouping column(s) (**required**)
  - `piv`: Column whose distinct values become new columns (**required**)
  - `var`: Value column(s) to aggregate (**required**)
  - `f`: Column naming function (optional – defaults to concatenation with underscore)
  - `g`: Column ordering function (optional – defaults to keeping `by` columns followed by sorted pivot columns)

**Behaviour**
- Groups data by `by` columns to form rows.
- Groups by `piv` columns to determine new column structure.
- Takes the first `var` value found at each intersection of row group and pivot value; intersections with no data are null.
- Applies naming and ordering functions (`f`, `g`) to the final result.

**Examples**
```q
// Basic pivot: levels become columns
pivot[`table`by`piv`var!(quotes; `date`sym`time; `level; `price)]

// Multiple aggregation columns
pivot[`table`by`piv`var!(trades; `date`sym; `exchange; `price`volume)]

// Custom column naming: pivot value first, e.g. A_value instead of value_A
pivot[`table`by`piv`var`f!(data; `date; `category; `value; {[v;P] `$"_" sv' string reverse each (v,()) cross P})]
```
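
For intuition about the reshaping itself, here is a minimal sketch of the general pivot idiom in q using `exec ... by`. It is not the library's implementation, and the table `t` and its column names are hypothetical.

```q
// Hypothetical long-format quotes: one row per sym and side
t:([] sym:`a`a`b`b; side:`bid`ask`bid`ask; price:10 11 20 21f)

// Distinct pivot values become the new column names
P:asc exec distinct side from t          / `ask`bid

// For each sym, build a side!price dictionary and take the P entries from it
r:exec P#(side!price) by sym:sym from t
// r is keyed on sym with one column per side:
//   a -> ask 11, bid 10
//   b -> ask 21, bid 20
```

Taking `P#` from each per-group dictionary gives every row the same set of columns, with nulls where a group lacks a pivot value.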

---
<br>

### ⚙️ `rack`

**Description**

Constructs a cross product of distinct column values, creating all possible combinations. Optionally integrates time series intervals and/or base table expansion for comprehensive data frameworks.

**Parameters**
- Dictionary containing:
  - `table`: Source table (**required**)
  - `keycols`: Columns to cross-product (**required**)
  - `base`: Additional table to cross with result (optional)
  - `timeseries`: Dictionary for interval generation (optional, uses `intervals` function)
  - `fullexpansion`: Boolean for complete Cartesian product of key columns (optional, default: `0b`)

**Behaviour**
- Standard mode preserves existing row-wise combinations in `keycols`.
- Full expansion mode (`fullexpansion` = `1b`) generates all possible combinations across `keycols`.
- Can integrate with time series intervals for temporal expansion.
- Supports base table cross-product for additional dimensionality.

**Examples**
```q
// Generate every sym/exchange combination, including pairs absent from the table
rack[`table`keycols`fullexpansion!(trades; `sym`exchange; 1b)]

// Rack with time intervals
rack[`table`keycols`timeseries!(trades; `sym; `start`end`interval!(09:30; 16:00; 00:15))]

// Combine base table with rack and intervals
rack[`table`keycols`base`timeseries!(trades; `sym; baseData; intervalDict)]

// Preserve existing combinations without expansion
rack[`table`keycols!(quotes; `sym`exchange)]
```
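
The two modes and the time-series expansion can be sketched with q's `cross` operator. This is an illustration on hypothetical data (`t`, `times`, and the column names are assumptions) and does not call the library.

```q
// Hypothetical trades: the pair (b;L) never occurs in the data
t:([] sym:`a`a`b; ex:`N`L`N)

// Standard mode: only the sym/ex pairs that already appear row by row
existing:select sym,ex from t                                   / 3 rows

// Full expansion: every distinct sym crossed with every distinct ex
full:flip `sym`ex!flip (distinct t`sym) cross distinct t`ex     / 4 rows

// Temporal expansion: cross the rack with a table of interval start points
times:([] interval:09:30 09:45)
racked:full cross times                                         / 8 rows
```

A rack built this way is typically joined back to the source data so that missing key and time combinations show up as explicit null rows.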

---

## Error Handling

The functions implement comprehensive validation with descriptive error messages:
- **Type validation** – Ensures input parameters match expected types.
- **Structure validation** – Verifies the dictionary contains required keys.
- **Column validation** – Confirms specified columns exist in target tables.
- **Data type consistency** – Validates matching types across related parameters.

**Example error messages**
```q
'Input parameter must be a dictionary with keys-(table, keycols, by), or a table to fill
'Input parameter must be a dictionary with at least three keys (an optional key round):-start-end-interval
'some columns provided do not exist in the table
'interval start and end data type mismatch
```
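
Callers that prefer to handle these errors rather than let them propagate can use q's protected evaluation. The sketch below uses a hypothetical stand-in validator, `check`, in place of a library function.

```q
// Hypothetical validator mirroring the "required keys" check
check:{[d] if[not all `start`end`interval in key d;'"missing required keys"]; d}

// Protected evaluation: the handler receives the error text as a string
r:@[check;`start`end!(1;10);{"caught: ",x}]
r    / "caught: missing required keys"
```

The same `@[f;arg;handler]` pattern applies to any single-argument function, which fits the dictionary-based interface used throughout the library.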
101 changes: 101 additions & 0 deletions di/analytics/analytics.q
@@ -0,0 +1,101 @@
ffill:{[arg]
/ forward fills null values in specified columns (or all columns) with the last non-null value, optionally grouped by key columns. The function is the single point of entry for different input types: dictionary or table.
:$[.Q.qt arg;filltable[arg];
99h=type arg;filldict[arg];
'`$"Input parameter must be a dictionary with keys-(table, keycols, by), or a table to fill"];
}

/ forward fill a column in a table, handle both typed and mixed columns
fillcol: {$[0h=type x; x maxs (til count x)*(0<any each not null each x); fills x]}

/ forward fill all columns in a table
filltable:{[t] ![t;();0b;((),cols[t])!(.z.M.fillcol),/: cols[t],()]}

filldict:{[d]
/ fill with dictionary argument
if[not `table in fkey:key d;'`$"Input table is missing"];
if[(`keycols in fkey) & `by in fkey;
:![d`table;();((),d`by)!((),d`by);((),d`keycols)!(.z.M.fillcol),/:((),d`keycols)]];
if[`keycols in fkey;
:![d`table;();0b;((),d`keycols)!(.z.M.fillcol),/: ((),d`keycols)]];
if[`by in fkey;
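/ use (),d`by so that several grouping columns work, as in the by+keycols branch above
:![d`table;();((),d`by)!((),d`by);(cols d`table)!(.z.M.fillcol),/: cols d`table]];
/ explicit return: a trailing semicolon would otherwise make the function return (::)
:filltable[d`table];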
}

ffillzero:{[d]
/ forward fills zero values in the specified key columns with the last non-zero value, optionally grouped by the by columns.
if[any not `table`keycols in key d;'`$"Input table or key columns are missing"];
/ replace zeros with a null of each column's own type (x 0N), so non-float columns do not raise a type error
d[`table]:@[d`table;d`keycols;{?[0=x;x 0N;x]}];
:filldict[d];
}

intervals:{[d]
/ create time intervals with bespoke increments
$[99h<> type d; '`$"input should be a dictionary";
not all `start`end`interval in fkey:key[d];'`$"Input parameter must be a dictionary with at least three keys (an optional key round):\n\t-",sv["\n\t-";string `start`end`interval];
any not (itype:.Q.ty'[d`start`end`interval`round]) in ("MmuUiIjJhHNnVvDdPptTB");'`$("One or more of inputs are of an invalid type.");
1<count distinct 2#itype;'`$"interval start and end data type mismatch";
(not (itype 2) in ("iIjJ")) & (itype 0) in ("MmDd");'`$"interval types should be int/long for date/month intervals"];

istart:d`start;
iend:d`end;
istep:d`interval;

if[(itype 0) in "Pp";
if[(itype 2) in "Uu";istep:(`long$istep)*60*1000000000];
if[(itype 2) in "Vv";istep:(`long$istep)*1000000000]];

adjStart:$[(`round in fkey) & not d`round;
istart;
istep*`long$istart div istep];
interval:abs[type istart]$adjStart+istep*til 1+ceiling(iend-adjStart)%istep;
:$[iend<last interval;-1_interval;interval];
}

pivot:{[d]
/ Reorganizes table data by pivoting specified columns into a cross-tabular format with aggregated values
$[99h<> type d; '`$"input should be a dictionary";
not all `table`by`piv`var in fkey:key[d];'`$"Input parameter must be a dictionary with at least four keys (with optional keys f and g):\n\t-",sv["\n\t-";string `table`by`piv`var];
any not itype:.Q.ty'[d`table`by`piv`var] in (" sS");'`$("One or more of inputs are of an invalid type.")];

if[(any/) not d[`by`piv`var] in cols [d`table];'`$"some columns provided do not exist in the table"];

t:d`table;
k:(),d`by;
p:(),d`piv;
v:(),d`var;
f:$[`f in fkey;d`f;{[v;P] `$"_" sv' string (v,()) cross P}];
g:$[`g in fkey;d`g;{[k;c] k,asc c}];
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;

count[k]!g[k;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]@'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}

rack:{[d]
/ Creates a cross product (rack) of distinct column values, optionally with time series intervals and/or base table expansion
$[99h<> type d; '`$"input should be a dictionary";
not all `table`keycols in fkey:key[d];'`$"Input parameter must be a dictionary with at least two keys (with optional keys base, timeseries, fullexpansion):\n\t-",sv["\n\t-";string `table`keycols]];
if[any not d[`keycols] in cols [d`table];'`$"some of the key columns provided do not exist in the table"];

tab:d`table;
/ force a list so that a single key column given as an atom still builds a table
keycol:(),d`keycols;
fullexp:$[`fullexpansion in fkey;d`fullexpansion;0b];
rackkeycol:$[fullexp;flip keycol!flip (cross/) distinct@/:(0!tab)[keycol];flip keycol!(0!tab)[keycol]];
if[`timeseries in fkey;
timeinterval:flip (enlist `interval)!enlist intervals[d`timeseries];
:$[`base in fkey; (cross/)(d`base;rackkeycol;timeinterval); (cross/)(rackkeycol;timeinterval)]];
:$[`base in fkey; (cross/)(d`base;rackkeycol); rackkeycol];
}




3 changes: 3 additions & 0 deletions di/analytics/init.q
@@ -0,0 +1,3 @@
\l ::analytics.q

export:([ffill:ffill;ffillzero:ffillzero;intervals:intervals;pivot:pivot;rack:rack])