Persistent Vector Tree

Contents

Definition
Structure
Performance
Approach
Concurrency
Requirements

Persistent Vector Tree

1. Definition

The basic dynamic array data structure (std::vector) is in the most cases the most efficient data structure as stated by Stroustrup.

Accept for the fundamental issues, the most important feature of the standard vector is the contiguous memory allocation (i.e. the cache efficiency). On the other hand, the standard vector has two disadvantages:

Copy operation is linear (O(N)): although the contiguous allocation provides some efficiency (e.g. bitwise copy with memcopy for trivially copyables)
Iterators may become invalid due to the reallocation/shrinking (e.g. push_back)

The basic data structure in functional programming (FP) is the linked list as it addresses solutions to the above two issues. However, the linked list is cumbersome and it has a very weak performance as the iteration is based on the pointer indirection. Hence, in FP, we need a persistent data structure like a linked list which, as well, provides a contiguous memory allocation like a vector.

Ivan Cukic, in his famous book about FP, Functional Programming in C++, describes a data structure invented by Rick Hickey for the Clojure language (Bitmapped Vector Trie, BVT). Rick Hickey has also been inspired by Phil Bagwell's famous paper, Ideal Hash Trees. Phill Bagwell's trie structure is Hash Array Mapped Trie (HAMT). Thus, in summary, the data structure in this header, VectorTree.h, is a persistent replacement for the standard vector and HAMT is a persistent replacement for the associative data structures (std::unordered_set).

Note

VectorTree is not a trie but a tree data structure.
VectorTree aggrees with the definitions, invariants, algorithms and runtime/space complexities defined by Ivan Cukic in Functional Programming in C++.

Obviously, all operations listed in VectorTree interface are const-qualified as its a functionally persistent data structure. Each operation returns a new VectorTree keeping the persistent history of the data.

2. Structure

VectorTree is actually a tree data structure formed by the composite and the leaf nodes. The leaf nodes allocate a small buffer (usually 32 elements) using std::vector<T>. On the other hand, the composite nodes are the source of the shared ownership on the data simulated by std::shared_ptr applied on the children of the composite nodes. This is how the shared ownership of the data is simulated.

3. Performance

A primitive solution to define a persistent vector is a wrapper class for the standard vector which:

copies the original std::vector into a new one
applies the request on the given element
return the modified copy as the new std::vector

The primitive persistent vector would have a crucial performance problem when:

the size of the vector is large or
the contained type has a large size.

VectorTree is the solution to the above performance problem. It distributes the data with the help of a tree structure and perform the requested operations locally. The unrelated data is shared with the original VectorTree (i.e. the shared ownership defined in Structure section).

For the operations applied at the end (e.g. push_back, emplace_back or pop_back), the efficiency of the VectorTree data structure:

is superior to the efficiency of the primitive persistent vector and
converges to the efficiency of the standard vector.

However, unlike a linked list, the actions applied not at the end are expensive even more expensive than a standard vector. We have basically two such operations: insert and erase. For the erase function, VectorTree applies a work-around based on the swap-and-pop idiom. However, this solution invalidates the pointer to the last element, which is an obvious confliction for a persistent data structure. Nevertheles, the order is not preserved as the last item is swapped with the item to be deleted. However, this is neither a standard library (STL) nor a public open source library (e.g. boost). Hence, as an in-house data structure, this confliction can be acceptable. The insert operation is excluded as VectorTree does not preserve the order of the contained elements.

The member functions and the iterators of VectorTree are tested hardly using the gtest library. See test source directory for the test source files. See test results to review the results.

VectorTree is tested against std::vector and the primitive persistent vector described above using the google benchmark library. See benchmark source directory for the benchmark source files. See benchmark results to review the results.

4. Approach

The below flowchart is followed for the operations at the end:

Find the path from the root node to the active leaf node (i.e. the active path)
Copy (not clone!) all the nodes in the active path.
Modify the active leaf node for the requested operation together with the copied composite nodes
Create a new VectorTree which increments the shared references of the non-modified nodes and replaces the nodes in the active path with the modified nodes. The above algorithm provides logN time complexity which yields to a constant time as the base of the log is large (32 usually).

The algorithm may require a few modifications if the active leaf node cannot handle the requested operation. For example, the next leaf node must be determined if the active leaf node is full and the request is a push. Considering these exceptional cases, the worst case time complexity becomes logkN which again yields to O(1).

Erase operation applies the swap-and-pop idiom. I will not go through the details of this idiom. However, its worth to note that the erase operation consists of two modifications:

the path to the requested element and
the path to the last element which will be swapped with the requested element.

VectorTree has its own STL style random access iterators (const and non-const). The non-const iterator is private and used internally. An STL style for_each algorithm is implemented as a member function which:

clones the whole data structure
applies the input function on each element using the internal non-const iterator
returns the modified clone as the new VectorTree.

5. Concurrency

VectorTree provides an exception and thread safe solution by default as a result of the persistency.

6. Requirements

The contained type T must satisfy the following interface:

std::copy_constructible<T>
std::is_move_assignable<T>

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.vscode		.vscode
benchmark		benchmark
build		build
config		config
inc		inc
src		src
test		test
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Persistent Vector Tree

1. Definition

2. Structure

3. Performance

4. Approach

5. Concurrency

6. Requirements

About

Uh oh!

Releases

Packages

Languages

BarisAlbayrakIEEE/VectorTree

Folders and files

Latest commit

History

Repository files navigation

Persistent Vector Tree

1. Definition

2. Structure

3. Performance

4. Approach

5. Concurrency

6. Requirements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages