A C++ neural network implementation supporting both CPU and GPU execution through CUDA. The framework provides a flexible architecture for deep learning with automatic differentiation and CUDA-accelerated tensor operations.
The framework is built around an efficient tensor implementation supporting:
- CPU/GPU memory management with automatic device switching
- Basic operations: transpose, dot product, element-wise multiplication
- CUDA-optimized kernels with shared memory utilization
- Memory-aligned storage with proper striding
- Template support for different numeric types (float, double, int)
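As a rough illustration of the templated, stride-aware storage described above, here is a minimal free-standing sketch (this is not the framework's actual `Tensor` class, whose interface is not shown here):

```cpp
#include <cstddef>
#include <vector>

// Minimal stride-aware 2-D tensor; a sketch of the idea, not the real class.
template <typename T>
class Tensor2D {
public:
    Tensor2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), stride_(cols), data_(rows * cols) {}

    // Row-major element access through the row stride.
    T& at(std::size_t r, std::size_t c) { return data_[r * stride_ + c]; }

    // Simple CPU transpose into a new tensor.
    Tensor2D transpose() const {
        Tensor2D t(cols_, rows_);
        for (std::size_t r = 0; r < rows_; ++r)
            for (std::size_t c = 0; c < cols_; ++c)
                t.data_[c * t.stride_ + r] = data_[r * stride_ + c];
        return t;
    }

private:
    std::size_t rows_, cols_, stride_;
    std::vector<T> data_;
};

// Instantiates for any numeric type: Tensor2D<float>, Tensor2D<double>, Tensor2D<int>.
```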
| Operation Type | Implementation Details |
|---|---|
| Matrix Multiplication | Tiled algorithm with shared memory (TILE_SIZE: 32); see the kernel sketch below |
| Memory Management | Pitched allocation for optimal memory access |
| Device Handling | Automatic CPU/GPU data transfer |
| Batch Processing | Vectorized operations for training efficiency |
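For reference, a textbook version of the tiled shared-memory multiply the table describes looks like the sketch below. It is illustrative only: it uses plain row-major indexing rather than pitched pointers, and it is not the framework's actual kernel.

```cpp
#define TILE_SIZE 32

// C = A * B where A is MxK, B is KxN, C is MxN (row-major).
// Each block computes one TILE_SIZE x TILE_SIZE tile of C, staging tiles of
// A and B through shared memory to cut global-memory traffic.
__global__ void matmulTiled(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    __shared__ float As[TILE_SIZE][TILE_SIZE];
    __shared__ float Bs[TILE_SIZE][TILE_SIZE];

    int row = blockIdx.y * TILE_SIZE + threadIdx.y;
    int col = blockIdx.x * TILE_SIZE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE_SIZE - 1) / TILE_SIZE; ++t) {
        int aCol = t * TILE_SIZE + threadIdx.x;
        int bRow = t * TILE_SIZE + threadIdx.y;
        // Guarded loads: out-of-range threads stage zeros.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE_SIZE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```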
| Layer | Features |
|---|---|
| Linear | • Xavier initialization • Configurable input/output dimensions • Forward / backward pass support • Batch size handling |
| ReLU | • Zero-memory activation • Optimized backward pass |
| Sigmoid | • Numerically stable implementation • Binary cross-entropy integration |
| Softmax | • Stable computation with max subtraction (sketched below) • Cross-entropy integration |
| Dropout | • Training/eval mode switching • Configurable drop rate |
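The max-subtraction trick noted for the Softmax layer rests on the identity softmax(x) = softmax(x - c) for any constant c; subtracting the row maximum bounds every exponent at zero, so `exp` cannot overflow. A free-standing CPU sketch (not the layer's actual code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax over one row of logits.
std::vector<float> softmaxRow(const std::vector<float>& logits) {
    float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> out(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - maxLogit);  // exponent <= 0, no overflow
        sum += out[i];
    }
    for (float& v : out) v /= sum;                // normalize to probabilities
    return out;
}
```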
| Component | Implementation |
|---|---|
| Optimizers | • SGD with momentum • Configurable weight decay • Gradient clipping (see the update sketch below) |
| Scheduler | • ReduceLROnPlateau • Configurable patience & factor |
| Loss Functions | • MSE • Binary Cross-Entropy • Categorical Cross-Entropy |
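As a sketch, the three optimizer features above combine into an update along these lines (parameter names and the exact ordering of clipping and decay are assumptions, not the framework's confirmed behavior):

```cpp
#include <cmath>
#include <vector>

// One SGD step with global-norm gradient clipping, L2 weight decay folded
// into the gradient, and classical momentum. Illustrative only.
void sgdStep(std::vector<float>& w, std::vector<float>& grad,
             std::vector<float>& velocity,
             float lr, float momentum, float weightDecay, float clipNorm) {
    // Clip the gradient in place by its global L2 norm.
    float norm = 0.0f;
    for (float g : grad) norm += g * g;
    norm = std::sqrt(norm);
    if (norm > clipNorm)
        for (float& g : grad) g *= clipNorm / norm;

    for (std::size_t i = 0; i < w.size(); ++i) {
        float g = grad[i] + weightDecay * w[i];   // L2 penalty
        velocity[i] = momentum * velocity[i] + g; // classical momentum
        w[i] -= lr * velocity[i];
    }
}
```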
- MNIST dataset loader with normalization options
- Tabular data loader for CSV files (minimal sketch below)
- ONNX model import functionality (weights / biases / activation functions)
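A minimal sketch of what the tabular loader does, assuming a header-less, all-numeric CSV (the real loader's interface and its normalization options are not shown here):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a comma-separated file of floats into rows. Throws std::invalid_argument
// on non-numeric cells, so header rows must be stripped first.
std::vector<std::vector<float>> loadCsv(const std::string& path) {
    std::vector<std::vector<float>> rows;
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        std::vector<float> row;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ','))
            row.push_back(std::stof(cell));
        rows.push_back(std::move(row));
    }
    return rows;
}
```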
Putting these pieces together in a typical training setup:

```cpp
// Initialize model
Model model;
model.setOptimizer(SGD(0.01f, 0.9f, 0.0001f)); // lr, momentum, weight_decay
// Add layers
model.addLayer(std::make_unique<Linear>(784, 128, true)); // GPU enabled
model.addLayer(std::make_unique<ReLU>());
model.addLayer(std::make_unique<Dropout>(0.2f));
model.addLayer(std::make_unique<Linear>(128, 10, true));
model.addLayer(std::make_unique<Softmax>(true));
// Training loop
Tensor<float> predictions = model.forward(input);
auto [loss, gradients] = CategoricalCrossEntropyLoss(predictions, targets);
model.backward(gradients);
model.step();
```

| Application | Description |
|---|---|
| MNIST Classification | Digit recognition with dropout and learning rate scheduling (scheduler sketched below) |
| Iris Classification | Multi-class flower classification |
| Breast Cancer Classification | Binary classification with regularization (logistic regression) |
| California Housing | Regression with multi-layer architecture |
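The learning-rate scheduling in the MNIST example follows the ReduceLROnPlateau policy from the components table: when the monitored loss fails to improve for `patience` consecutive epochs, the learning rate is multiplied by `factor`. A self-contained sketch of that policy (class and member names are illustrative, not the framework's API):

```cpp
#include <limits>

// Reduce-on-plateau: track the best loss seen and cut the learning rate
// once it has not improved for `patience` epochs in a row.
class PlateauScheduler {
public:
    PlateauScheduler(float factor, int patience)
        : factor_(factor), patience_(patience) {}

    // Call once per epoch; returns the learning rate for the next epoch.
    float step(float lr, float loss) {
        if (loss < best_) {
            best_ = loss;
            badEpochs_ = 0;
        } else if (++badEpochs_ >= patience_) {
            badEpochs_ = 0;
            lr *= factor_;  // e.g. factor = 0.1 cuts the rate tenfold
        }
        return lr;
    }

private:
    float factor_;
    int patience_;
    float best_ = std::numeric_limits<float>::max();
    int badEpochs_ = 0;
};
```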
- CUDA Toolkit
- C++17-compatible compiler
- ONNX Runtime libraries
```bash
# Compile with GPU support
./script.sh
# Run with GPU
./output --gpu
# Available logging levels
./output --infer # Inference logging
./output --back # Backprop logging
./output --loss # Loss computation logging
./output --debug # Detailed debug information
./output --all    # All logging enabled
```