hybridarray

A NumPy-inspired array library for Go, providing efficient columnar data structures with zero-copy views.

Overview

hybridarray combines three key data structures:

  • Map: O(1) column name lookups (like NumPy's structured array field access)
  • Linked list: Preserves insertion order for column iteration
  • Columnar arrays: Cache-friendly data access patterns (like NumPy's contiguous storage)

This hybrid approach enables NumPy-like operations with Go's type safety and performance.
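As a rough illustration of how a map and a linked list can cooperate, here is a minimal sketch. It is not the library's actual implementation: the names `hybrid`, `column`, `add`, `get`, and `names` are hypothetical, and the standard `container/list` package stands in for whatever list the library uses internally.

```go
package main

import (
	"container/list"
	"fmt"
)

// column is a hypothetical stand-in for one named column of values.
type column struct {
	name string
	data []any
}

// hybrid pairs a map (lookup by name) with a linked list (insertion order).
// Each list element holds a *column; the map points at the same elements.
type hybrid struct {
	index map[string]*list.Element
	order *list.List
}

func newHybrid() *hybrid {
	return &hybrid{index: make(map[string]*list.Element), order: list.New()}
}

// add appends a column at the end of the list and rejects duplicate names.
func (h *hybrid) add(name string, data []any) error {
	if _, exists := h.index[name]; exists {
		return fmt.Errorf("column %q already exists", name)
	}
	h.index[name] = h.order.PushBack(&column{name: name, data: data})
	return nil
}

// get finds a column by name through the map.
func (h *hybrid) get(name string) (*column, bool) {
	el, ok := h.index[name]
	if !ok {
		return nil, false
	}
	return el.Value.(*column), true
}

// names walks the linked list, so names come back in insertion order.
func (h *hybrid) names() []string {
	out := make([]string, 0, h.order.Len())
	for el := h.order.Front(); el != nil; el = el.Next() {
		out = append(out, el.Value.(*column).name)
	}
	return out
}

func main() {
	h := newHybrid()
	_ = h.add("y", []any{10.0, 20.0})
	_ = h.add("x", []any{1.0, 2.0})
	fmt.Println(h.names()) // [y x]: insertion order, not sorted
	if c, ok := h.get("x"); ok {
		fmt.Println(c.data) // [1 2]
	}
}
```

A plain Go map has no defined iteration order, which is why the sketch keeps the list alongside it.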

Features

  • Zero-copy views: Slicing and column selection without data duplication
  • Type-aware columns: Runtime type information with DType
  • Ordered iteration: Preserves column insertion order
  • View composition: Views of views work correctly
  • Minimal API: Small, focused surface area for research workflows

Installation

go get github.com/alexshd/hybridarray

Quick Start

package main

import (
    "fmt"
    "github.com/alexshd/hybridarray"
)

func main() {
    // Create from map
    arr, _ := hybridarray.FromMap(map[string][]any{
        "x": {1.0, 2.0, 3.0, 4.0, 5.0},
        "y": {10.0, 20.0, 30.0, 40.0, 50.0},
    })

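    fmt.Println(arr.Shape()) // prints "5 2": 5 rows, 2 columns

    // Zero-copy slice (rows 1-3)
    view := arr.Slice(1, 4)

    // Select columns
    xy := arr.Select("x", "y")

    // Access values
    val, _ := arr.At(2, "x") // 3.0

    // Go rejects unused variables, so reference the results
    _, _ = view, xy
    fmt.Println(val)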

    // Iterate columns
    for col := range arr.Columns() {
        fmt.Printf("%s: %v\n", col.Name, col.Data)
    }
}

API Reference

Creating Arrays

// New array with specified rows
arr := hybridarray.New(100)

// From map of columns
arr, err := hybridarray.FromMap(map[string][]any{
    "temperature": {20.5, 21.0, 19.8},
    "humidity":    {65.0, 68.0, 62.0},
})

Adding Columns

data := []any{1.0, 2.0, 3.0}
err := arr.AddColumn("sensor", data, hybridarray.DTypeFloat64)

Accessing Data

// Single value
val, err := arr.At(row, "column")

// Full row as map
row, err := arr.Row(5)

// Column lookup
col := arr.GetColumn("temperature")

// Shape
nrows, ncols := arr.Shape()

// Column names
names := arr.ColumnNames()

Zero-Copy Operations

// Row slicing [start:end)
view, err := arr.Slice(10, 100)

// Column selection
view, err := arr.Select("x", "y", "z")

// Combine operations
filtered := arr.Slice(50, 150).Select("sensor", "value")
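One way views of views can stay zero-copy is to keep a single reference to the shared data and compose row offsets. The sketch below illustrates that idea only. The `rowView` type and its `slice` and `at` methods are hypothetical and are not the library's API, and a single `[]float64` stands in for a column.

```go
package main

import (
	"errors"
	"fmt"
)

// rowView is a hypothetical view: a window [start, end) over shared data.
type rowView struct {
	data       []float64 // shared with the parent, never copied
	start, end int
}

// slice returns a sub-window; start and end are relative to the receiver.
func (v rowView) slice(start, end int) (rowView, error) {
	if start < 0 || end < start || end > v.end-v.start {
		return rowView{}, errors.New("slice bounds out of range")
	}
	// Offsets compose: the child's window is expressed in base coordinates.
	return rowView{data: v.data, start: v.start + start, end: v.start + end}, nil
}

// at reads row i, counted from the start of this view's window.
func (v rowView) at(i int) (float64, error) {
	if i < 0 || i >= v.end-v.start {
		return 0, errors.New("row index out of range")
	}
	return v.data[v.start+i], nil
}

func main() {
	base := rowView{data: []float64{0, 10, 20, 30, 40, 50}, start: 0, end: 6}
	outer, _ := base.slice(1, 5)  // base rows 1..4
	inner, _ := outer.slice(1, 3) // base rows 2..3
	v, _ := inner.at(0)
	fmt.Println(v) // 20
}
```

Because every view resolves to offsets into the same backing data, nesting views adds only a couple of integer additions per access.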

Iteration

// Range over columns (Go 1.23+ iter.Seq)
for col := range arr.Columns() {
    fmt.Printf("%s (%s): %d values\n",
        col.Name, col.DType, len(col.Data))
}

Data Types

const (
    DTypeFloat64  // float64, float32
    DTypeInt64    // int, int64, int32, int16, int8
    DTypeString   // string
    DTypeBool     // bool
    DTypeAny      // any (type-erased)
)

Types are inferred automatically in FromMap or can be specified in AddColumn.
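To make the idea of inference concrete, the sketch below classifies a `[]any` column with a type switch, following the Go-type groupings listed in the comments above. It is an assumption-laden illustration rather than the library's logic: the `dtype` type, its constants, and the `kindOf` and `infer` helpers are hypothetical, and falling back to "any" for empty or mixed columns is a choice made here, not something the README specifies.

```go
package main

import "fmt"

// dtype is a hypothetical stand-in for the library's DType.
type dtype string

const (
	dtFloat64 dtype = "float64"
	dtInt64   dtype = "int64"
	dtString  dtype = "string"
	dtBool    dtype = "bool"
	dtAny     dtype = "any"
)

// kindOf maps one value to a dtype using the groupings from the list above.
func kindOf(v any) dtype {
	switch v.(type) {
	case float64, float32:
		return dtFloat64
	case int, int64, int32, int16, int8:
		return dtInt64
	case string:
		return dtString
	case bool:
		return dtBool
	default:
		return dtAny
	}
}

// infer returns the common dtype of a column.
// Assumption: empty or mixed columns fall back to dtAny.
func infer(values []any) dtype {
	if len(values) == 0 {
		return dtAny
	}
	first := kindOf(values[0])
	for _, v := range values[1:] {
		if kindOf(v) != first {
			return dtAny
		}
	}
	return first
}

func main() {
	fmt.Println(infer([]any{20.5, 21.0, 19.8})) // float64
	fmt.Println(infer([]any{1, "a"}))           // any
}
```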

Zero-Copy Semantics

Views created with Slice() and Select() share underlying data:

arr, _ := hybridarray.FromMap(map[string][]any{
    "x": {1.0, 2.0, 3.0, 4.0, 5.0},
})

view := arr.Slice(1, 4) // Rows 1-3

// Modifying original data affects view
arr.GetColumn("x").Data[2] = 99.0

val, _ := view.At(1, "x") // 99.0 (sees the change)

This enables efficient data pipelines without copying large arrays.
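The sharing behaviour described above matches how plain Go slices work: reslicing creates a new slice header over the same backing array, while an explicit clone allocates a new one. The sketch below uses only the standard library. The helper names `window` and `snapshot` are hypothetical and do not appear in hybridarray.

```go
package main

import (
	"fmt"
	"slices"
)

// window returns rows [start, end) of data without copying:
// the result shares data's backing array.
func window(data []any, start, end int) []any {
	return data[start:end]
}

// snapshot returns an independent copy of view.
func snapshot(view []any) []any {
	return slices.Clone(view)
}

func main() {
	data := []any{1.0, 2.0, 3.0, 4.0, 5.0}

	view := window(data, 1, 4) // rows 1-3, shared storage
	copied := snapshot(view)   // rows 1-3, separate storage

	data[2] = 99.0 // mutate the original

	fmt.Println(view[1])   // 99: the view sees the change
	fmt.Println(copied[1]) // 3: the copy does not
}
```

The trade-off is the usual one for views: no allocation or copying, but writes through any alias are visible to all of them, so take a copy when an isolated snapshot is needed.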

Performance

Example benchmark results on an M1 MacBook Pro:

BenchmarkFromMap-8              50000    25000 ns/op
BenchmarkSlice-8              5000000      250 ns/op   (zero-copy)
BenchmarkSelect-8             1000000     1500 ns/op   (zero-copy)
BenchmarkAt-8                20000000       65 ns/op
BenchmarkGetColumn-8        100000000       12 ns/op   (map lookup)

Run benchmarks:

go test -bench=. -benchmem

Testing

# Unit tests
go test -v

# Fuzz tests (Go 1.18+)
go test -fuzz=FuzzFromMap -fuzztime=30s
go test -fuzz=FuzzSlice -fuzztime=30s
go test -fuzz=FuzzSelect -fuzztime=30s

# Race detection
go test -race

# Coverage
go test -cover

Design Philosophy

hybridarray is designed as a minimal reference implementation for scientific computing workflows. It prioritizes:

  1. Simplicity: Small API surface, easy to understand
  2. Zero-copy: Memory-efficient view semantics
  3. Type awareness: Runtime type info without generics overhead
  4. Research-friendly: Quick iteration on data transformations

It is not designed for:

  • Production databases (no ACID guarantees)
  • Distributed computing (single-machine only)
  • Complex query optimization (no query planner)

NumPy Comparison

Similar to NumPy's ndarray:

  • Zero-copy slicing (like NumPy views)
  • Typed columns (analogous to structured arrays with dtype)
  • Efficient iteration
  • Field-based access (like structured arrays: arr['field'])

Different from NumPy:

  • Columnar storage instead of row-major (more like pandas DataFrame)
  • Map-based field lookup by column name (O(1) on average, comparable in cost to field access on NumPy's structured arrays)
  • No multi-dimensional indexing yet (1D + columns only)
  • No broadcasting or vectorized operations (yet)

Direct NumPy API Equivalents:

# NumPy structured arrays
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('x', 'i4'), ('y', 'f8')])
view = arr[10:20]  # Zero-copy slice
x_col = arr['x']   # Field access

# hybridarray (Go)
arr, _ := hybridarray.FromMap(map[string][]any{"x": {1, 2}, "y": {2.5, 3.5}})
view := arr.Slice(10, 20)   // Zero-copy slice
xCol := arr.GetColumn("x")  // Field access

Go 1.25.3 Features

The library relies on recent Go features:

  • iter.Seq for range-over-func column iteration (Go 1.23+)
  • Improved generic type inference
  • Enhanced fuzzing support

Future Enhancements

Potential NumPy-inspired additions (not implemented):

  • Vectorized operations: Add(), Mul(), Apply() (like NumPy ufuncs)
  • Aggregations: Sum(), Mean(), Std() (like NumPy reductions)
  • Boolean indexing: Where(predicate) (like NumPy boolean mask indexing)
  • Sorting: Sort(), Argsort() (like NumPy sorting)
  • Set operations: Unique(), Intersect() (like NumPy set routines)
  • Broadcasting: Automatic shape alignment (NumPy's killer feature)
  • Multi-dimensional: True ndarray with arbitrary dimensions

Contributing

This is a minimal reference implementation inspired by NumPy. For production scientific computing in Go, consider:

License

Apache License 2.0 - See LICENSE file for details.

Copyright 2025 Alex Shadrin

Credits

Primary inspiration: NumPy's ndarray architecture

Additional influences:

  • NumPy structured arrays (field access, zero-copy views, dtype system)
  • pandas DataFrame API (columnar storage, named columns)
  • Apache Arrow columnar format (memory layout)

This is a learning/research implementation to understand NumPy's design principles in Go.
