CSVX

A dependency-free tool that lets you control how files with character(s)-separated values are tokenized, transformed and handled.

Works in Clojure, ClojureScript (Node.js and browser), and Babashka.

Usage

I'm not sure yet whether I'll publish this to Clojars, but if you use tools.deps you can add the following to deps.edn to pull it into your project.

{:deps
  {github-oneness/csvx
    {:git/url "https://github.com/oneness/csvx"
     :sha "LATEST_COMMIT_SHA"}}}
;; Then require from your repl like this:
(require '[csvx.core :as csvx])
;; You can just use the one public fn `readx` without any options to parse csv:
(csvx/readx "resources/100-sales-records.csv")
;; Note that csvx/readx takes an optional second argument with the following
;; options. The values listed here are the defaults used when an option is
;; omitted; see src/csvx/core.cljc for details.
{:encoding "UTF-8"
 :max-lines-to-read Integer/MAX_VALUE  ;; or Number.MAX_SAFE_INTEGER in CLJS
 :line-tokenizer #(.split ^String (str %) ",")
 :line-transformer #(map-indexed hash-map %)}

**Clojure (CLJ) Usage:**

(require '[csvx.core :as csvx])

;; Read a CSV file - returns data directly
(csvx/readx "data.csv")
;; => [[{0 "name"} {1 "age"} {2 "gender"}]
;;     [{0 "John"} {1 "32"} {2 "M"}]
;;     [{0 "Susan"} {1 "28"} {2 "F"}]]

;; Read with options
(csvx/readx "data.csv"
            {:encoding "UTF-8"
             :max-lines-to-read 100
             :line-tokenizer #(.split ^String (str %) ",")
             :line-transformer #(map-indexed hash-map %)})
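
The default output wraps each field in its own single-entry map. If one map per row is more convenient, the result can be post-processed with plain Clojure. The sketch below works on a literal copy of the sample output shown above instead of calling `readx`, so it runs without csvx on the classpath. The helper names `row->fields` and `rows->records` are hypothetical and not part of csvx.

```clojure
;; Literal copy of the sample result shown for data.csv.
(def rows
  [[{0 "name"} {1 "age"} {2 "gender"}]
   [{0 "John"} {1 "32"} {2 "M"}]
   [{0 "Susan"} {1 "28"} {2 "F"}]])

(defn row->fields
  "Collapses one row of {index value} maps into a vector of values ordered by index."
  [row]
  (let [merged (apply merge row)]
    (mapv merged (sort (keys merged)))))

(defn rows->records
  "Treats the first row as a header and returns one keyword-keyed map per remaining row."
  [rows]
  (let [[header & body] (map row->fields rows)
        ks (map keyword header)]
    (mapv #(zipmap ks %) body)))

(rows->records rows)
;; => [{:name "John", :age "32", :gender "M"} {:name "Susan", :age "28", :gender "F"}]
```

Values stay strings here; converting `:age` to a number would be a further step left to the caller.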

**ClojureScript (CLJS) Usage:**

CLJS usage returns a Promise and supports multiple input types:

**Node.js:**

(require '[csvx.core :as csvx])

;; Read from file path - returns Promise
(-> (csvx/readx "data.csv")
    (.then (fn [data]
             (js/console.log data)))
    (.catch (fn [err]
              (js/console.error err))))

**Browser:**

;; From File object (e.g., <input type="file">)
(-> (csvx/readx file-obj)
    (.then (fn [data]
             (js/console.log data))))

;; From URL
(-> (csvx/readx "https://example.com/data.csv")
    (.then (fn [data]
             (js/console.log data))))

**Babashka Usage:**

Babashka is a native Clojure interpreter for scripting with fast startup. csvx works with Babashka out of the box with no modifications needed.

Example script (script.bb):

#!/usr/bin/env bb

(require '[csvx.core :as csvx])

;; Read a CSV file - returns data directly
(def data (csvx/readx "data.csv"))

;; Process the data
(prn (count data) "rows read")
(prn (first data))

To use csvx in a Babashka script, place src/csvx/core.cljc on your classpath:

bb --classpath src script.bb
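
As an alternative to passing `--classpath` on every run, Babashka also reads a `bb.edn` file from the directory it is run in. This is general Babashka behavior, not something specific to csvx. A minimal sketch, assuming the script is run from a checkout of this repository so that `src` contains `csvx/core.cljc`:

```clojure
;; bb.edn (placed next to script.bb; the location is an assumption for this example)
{:paths ["src"]}
```

With this file in place, `bb script.bb` should find `csvx.core` without extra flags.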

**Custom Tokenizers and Transformers:**

The following example shows how to pass `:line-tokenizer` and `:line-transformer` to parse a JSON file into Clojure/ClojureScript maps:

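(require '[clojure.string :as str])

;; Handles only flat, single-line JSON objects whose keys and values
;; contain no commas or colons.
(defn decode-json [^String file-path]
  (csvx/readx file-path
              {:max-lines-to-read 1
               ;; Strip the braces, split into "key":value pairs, then split each pair.
               :line-tokenizer (fn [line]
                                 (map #(.split ^String % ":")
                                      (-> (str/replace line #"\{|\}" "")
                                          (.split ","))))
               ;; Build a map with keyword keys and parsed values.
               :line-transformer (fn [pairs]
                                   (reduce (fn [acc [k v]]
                                             (merge acc
                                                    {(-> k read-string keyword) (read-string v)}))
                                           {}
                                           pairs))}))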
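
The same two hooks cover simpler delimiter-separated formats. The sketch below defines a tokenizer and a transformer for a semicolon-separated line and applies them by hand to a sample string, so it runs without csvx on the classpath. The function names, the column keys and the sample line are hypothetical. The commented call at the end assumes `readx` passes each line through `:line-tokenizer` and then `:line-transformer`, as the option names suggest.

```clojure
(require '[clojure.string :as str])

(defn tokenize-semicolons
  "Splits a line on semicolons and trims whitespace around each field."
  [line]
  (mapv str/trim (str/split (str line) #";")))

(defn ->person
  "Pairs the tokens with fixed column keys (hypothetical schema)."
  [tokens]
  (zipmap [:name :age :gender] tokens))

(->person (tokenize-semicolons "John; 32; M"))
;; => {:name "John", :age "32", :gender "M"}

;; Assumed usage with csvx (file name is illustrative):
;; (csvx/readx "people.csv" {:line-tokenizer tokenize-semicolons
;;                           :line-transformer ->person})
```

Note that a header row, if the file has one, would be transformed like any other line under this scheme.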

Options

  • :encoding - File encoding (default: "UTF-8")
  • :max-lines-to-read - Maximum number of lines to read (default: Integer/MAX_VALUE in CLJ, Number.MAX_SAFE_INTEGER in CLJS)
  • :line-tokenizer - Function to split line into fields (default: comma split)
  • :line-transformer - Function to transform tokenized line (default: map-indexed hash-map)

Return Values

  • **CLJ**: Returns data directly (vector of lines)
  • **CLJS**: Returns a Promise that resolves to the data
  • **Babashka**: Returns data directly (same as CLJ)

Develop

git clone https://github.com/oneness/csvx
cd csvx

# Run CLJ tests
clojure -X:test

# Run CLJS Node tests
clojure -X:test-node

# Run CLJS Browser tests (compiles and opens browser)
clojure -X:test-browser

# Run Babashka tests
bb test

Features

  • Works in Clojure, ClojureScript (Node.js and browser), and Babashka
  • No dependencies (plain JS interop for CLJS)
  • Custom tokenizers for any delimiter-separated format
  • Custom transformers for flexible output formats
  • Browser support for File objects and URLs
  • Configurable line limits for memory-efficient processing
  • Babashka compatible - use in scripts for fast CSV processing
  • Comprehensive test coverage across all platforms (CLJ, Node, Browser, Babashka)

Performance: CLJ vs Babashka

Babashka starts much faster than JVM Clojure, which matters most for scripting and CLI usage. It ships as a native binary, so there is no JVM startup cost:

| Platform | Cold Start | Warm Start |
|---|---|---|
| CLJ (JVM) | ~3.9s | ~1.0s |
| Babashka | ~0.14s | ~0.025s |

  • **Cold start:** Babashka is ~27x faster
  • **Warm start:** Babashka is ~40x faster
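
The speedup factors are the ratio of the JVM time to the Babashka time. A small sketch that recomputes them from the rounded figures in the table (`timings` and `speedup` are illustrative names); because the inputs are rounded, the results may differ slightly from the quoted factors:

```clojure
;; Timings in seconds, copied from the table above.
(def timings
  {:clj {:cold 3.9  :warm 1.0}
   :bb  {:cold 0.14 :warm 0.025}})

(defn speedup
  "Ratio of JVM Clojure time to Babashka time for the given phase."
  [phase]
  (/ (get-in timings [:clj phase])
     (get-in timings [:bb phase])))

(speedup :cold) ; roughly 28
(speedup :warm) ; roughly 40
```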

**Important caveat:** These benchmarks test a small workload (2 tests, simple CSV parsing). Results should be taken with a grain of salt:

  • JVM Clojure’s JIT compiler can outperform Babashka for CPU-intensive, long-running tasks
  • Babashka’s advantage is primarily startup time, not execution speed
  • Actual performance depends on workload size, I/O vs CPU operations, and use case
  • For large-scale data processing or long-running applications, JVM Clojure may be more suitable
