A lightweight single-header C++17 dataframe implementation inspired by pandas.DataFrame.
Think of a dataframe as a spreadsheet in C++:
- Columns are named labels
- Rows are indexed records
Columns accept int, float, double, or string data, and a single column can mix numbers and strings, so you can load CSVs, manipulate data, and run analytics in a few lines of code, bringing pandas-like convenience to C++.
Requirements:
- C++17 or newer (uses features such as std::variant, std::optional, and std::from_chars)
- Windows console (only needed for colored logging via c_logger)
Note for Linux/Mac:
The dataframe core works on any platform, but c_logger uses Windows-specific APIs for colored output. To run on Linux/Mac, either:
- replace c_logger with a simple printf/std::cout logger,
- use ANSI escape codes for colors, or
- stub out logging completely.
Installation:
- Add the single header dataframe.h to your project.
- Include it in your code:
```cpp
#include "dataframe.h"
```
- Compile with a C++17-compatible compiler, for example MSVC, g++ on Windows (e.g. MinGW), or g++ on Linux/Mac:
```
cl /std:c++17 /O2 main.cpp
g++ -std=gnu++17 -O2 main.cpp -o demo.exe
g++ -std=c++17 -O2 main.cpp -o demo
```

Features:
- Mixed types per column:
  using value_t = std::variant<double, std::string>;
- CSV I/O: from_csv(path, header) and to_csv(path, header)
- Column operations:
  - Add: add_column(name, values)
  - Rename: rename_column(old_name, new_name)
  - Remove: remove_column(name)
  - Shape: shape() -> {rows, columns}
- Row operations:
  - drop(index)
  - dropf() – drop the first row
  - dropb() – drop the last row
- Cleaning:
  - dropna() – remove rows containing NaN
  - dropinf() – remove rows containing Inf
  - dropemp() – remove rows containing empty string cells
- Statistics: sum, prod, mean, var, std, min, max
- Relationships:
  - cov – covariance
  - corr – correlation
- Higher moments:
  - skew – skewness
  - kurt – excess kurtosis
- Transforms:
  - diff – first difference
  - pct_change – percentage change
  - log_change – log returns
  - cumsum – cumulative sum
  - cumprod – cumulative product
- Selection:
  - at("col") -> std::vector<value_t>& – mutable reference to a column
  - at<T>("col") -> std::vector<T> – typed extraction with automatic parsing
  - at<T>({col1, col2, ...}) -> std::vector<std::vector<T>> – multiple columns
- Display:
  - print(n) – pretty console output
  - head(n) – first n rows
  - tail(n) – last n rows
Example:

```cpp
#include "dataframe.h"
#include <iostream>
#include <string>
#include <vector>

int main() {
    c_dataframe df;

    // Add some columns
    df.add_column("Name", std::vector<std::string>{"Alice", "Bob", "Charlie"});
    df.add_column("Age", std::vector<int>{25, 30, 28});
    df.add_column("Salary", std::vector<double>{50000, 60000, 55000});

    // Print the dataframe
    df.print();

    // Access statistics
    double avg_salary = df.mean("Salary");
    double salary_std = df.std("Salary");
    std::cout << "Average salary: " << avg_salary << std::endl;
    std::cout << "Salary std dev: " << salary_std << std::endl;

    // Save to CSV, including the header row
    df.to_csv("employees.csv", true);

    // Load from CSV, treating the first line as the header
    c_dataframe df2("employees.csv", true);
    df2.print();

    return 0;
}
```

Example console output:
```
Name Age Salary
-----------------------------------------------------------------------
0 Alice 25 50000
1 Bob 30 60000
2 Charlie 28 55000
```
Licensed under the MIT License.