Real-time bitstream malware detection using an ML model. Includes feature extraction, classification, and embedded inference on PYNQ-supported FPGA boards.

BLADEI: Bitstream-Level Abnormality Detection for Embedded Inference

Copyright (c) 2025, Rye Stahle-Smith


📌 Project Overview

This repository contains an embedded deployment pipeline for detecting malicious FPGA bitstreams using a trained machine learning (ML) model. Bitstreams are configuration files that can be weaponized to introduce hardware Trojans, posing serious risks in shared or cloud-hosted reconfigurable systems. This project uses a lightweight, byte-level classification approach to enable on-device malware detection for PYNQ-supported FPGA boards, without requiring reverse engineering or access to the original source code or netlists. Benchmark designs, including AES-128 and RS232 variants, were obtained from Trust-Hub, then synthesized, implemented, and categorized as benign, malicious, or empty .bit files.


⚙️ Features

  • 🔍 Byte-frequency analysis of binary .bit files
  • 🧩 Lightweight byte-level + statistical feature extraction
  • 📊 Real-time inference using a trained Random Forest with a custom, dependency-light predictor
  • ⚡ Deployment-ready for ARMv7 (e.g., PYNQ-Z1/Z2, Zynq-7000 SoC) and ARMv8 (e.g., Zynq UltraScale+ MPSoC, RFSoC, Kria) boards
  • 🧪 Verified with state-of-the-art (SOTA) bitstreams derived from Trust-Hub benchmarks
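
To illustrate what a dependency-light Random Forest predictor might look like, the sketch below walks flattened decision trees stored as plain lists and averages their leaf class distributions. The array layout (`feature`, `threshold`, `left`, `right`, `value`) is an assumption modeled on how scikit-learn exposes fitted trees; it is not necessarily the format stored in `model_components/`, and the toy forest and function names are hypothetical.

```python
# Hypothetical flattened-tree layout; the real model_components/ format may differ.
LEAF = -1  # marker for "no child", as in scikit-learn's tree arrays


def predict_tree(tree, x):
    """Walk one flattened tree and return the class distribution at the leaf."""
    node = 0
    while tree["left"][node] != LEAF:
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["left"][node]
        else:
            node = tree["right"][node]
    return tree["value"][node]


def predict_forest(trees, x):
    """Average leaf distributions over all trees; return (class, confidence)."""
    n_classes = len(trees[0]["value"][0])
    totals = [0.0] * n_classes
    for tree in trees:
        for i, p in enumerate(predict_tree(tree, x)):
            totals[i] += p
    probs = [t / len(trees) for t in totals]
    best = max(range(n_classes), key=probs.__getitem__)
    return best, probs[best]


# Toy two-tree forest over two features and two classes (illustrative only).
toy_forest = [
    {
        "feature": [0, -2, -2],
        "threshold": [0.5, 0.0, 0.0],
        "left": [1, LEAF, LEAF],
        "right": [2, LEAF, LEAF],
        "value": [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]],
    },
    {
        "feature": [1, -2, -2],
        "threshold": [0.3, 0.0, 0.0],
        "left": [1, LEAF, LEAF],
        "right": [2, LEAF, LEAF],
        "value": [[0.5, 0.5], [0.7, 0.3], [0.1, 0.9]],
    },
]

label, confidence = predict_forest(toy_forest, [0.8, 0.6])
print(label, round(confidence, 2))
```

Because inference reduces to list indexing and comparisons, a predictor of this shape needs no scikit-learn installation on the board, which is one plausible reason to export the model as plain arrays.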

📂 Repository Structure

pynq-maldetect/
├── trusthub_bitstreams/ # Sample .bit files (Benign, Malicious, Empty)
├── model_components/ # Quantized ML model components
├── train_model.py # Model training and export for PYNQ
├── deploy_model.py # Model deployment for on-device inference
├── requirements.txt # Python dependencies
├── LICENSE.md
└── README.md


🚀 Setup Instructions

This project is divided into two parts:

  • 🧠 Model Training and Export
  • ⚙️ On-Device Inference

🧠 train_model.py — Model Training and Export

Requirements:

  • Python 3.8+
  • Python Packages: scikit-learn, numpy, scipy

⚠️ Note: Training should be performed on a general-purpose machine (laptop, workstation, or server) for both ARMv7 and ARMv8 targets. While some ARMv8 boards may be capable of training, it is not the recommended workflow. Training is heavier, package availability can be inconsistent, and it’s typically slower and less reproducible than running on a PC.

  1. Clone the Repository:

    git clone https://github.com/Bread2002/PYNQ_BLADEI.git
    cd PYNQ_BLADEI
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Run the Training Script:

    python train_model.py

Features:

  • Byte-frequency feature extraction from .bit files (256-dimensional normalized histogram)
  • Statistical augmentation features (e.g., mean, std, skew, kurtosis, entropy, density metrics)
  • Training and evaluation using k-Fold Cross-Validation
  • Best-performing model exported as compact artifacts (JSON + NumPy arrays) and bundled into a .tar.gz archive for PYNQ deployment
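
The feature extraction described above can be approximated with the standard library alone. The sketch below builds the 256-bin normalized byte histogram and a few of the statistical features (mean, standard deviation, Shannon entropy, and a nonzero-byte density). The function name `extract_features`, the choice and ordering of statistics, and the definition of density are assumptions for illustration; `train_model.py` may compute them differently, for example with NumPy and SciPy and with skew and kurtosis included.

```python
import math
import statistics
from collections import Counter


def extract_features(data: bytes) -> list:
    """Return a 256-bin normalized histogram followed by four summary statistics."""
    if not data:
        raise ValueError("empty input")

    # Byte-frequency histogram, normalized so the bins sum to 1.
    counter = Counter(data)
    n = len(data)
    hist = [counter[b] / n for b in range(256)]

    # Summary statistics over the raw byte values.
    values = list(data)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Shannon entropy in bits per byte (0 for constant data, 8 for uniform).
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)
    # Assumed "density" metric: fraction of bytes that are nonzero.
    density = 1.0 - hist[0]

    return hist + [mean, std, entropy, density]


# Illustrative input; a real call would read a .bit file with open(path, "rb").
sample = bytes([0, 0, 255, 255])
features = extract_features(sample)
print(len(features), features[-4:])
```

Normalizing the histogram makes the first 256 features independent of file size, so bitstreams for different devices remain comparable on the same scale.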

⚙️ deploy_model.py — On-Device Inference

Requirements:

  • A supported FPGA board with PYNQ v3.1
  • Quantized model components (via on-board training or exported archive)

  1. Import the Archive to your PYNQ board via Jupyter Notebook or SSH/SFTP

  2. Decompress the Archive:

    mkdir PYNQ_BLADEI
    tar -xvzf PYNQ_BLADEI.tar.gz -C ./PYNQ_BLADEI
    rm PYNQ_BLADEI.tar.gz
    cd PYNQ_BLADEI
  3. Run the Deployment Script:

    python deploy_model.py

Features:

  • Loads .bit files from local storage
  • Extracts byte-frequency + statistical features
  • Predicts class (Benign, Malicious, or Empty) using the trained model
  • Displays prediction result with latency breakdown:
    • Load time
    • Feature extraction time
    • Inference time
  • Quarantines suspicious bitstreams
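
The vetting flow listed above (load, extract, predict, then pass or quarantine, with per-stage latency) could be organized roughly as follows. This is a sketch, not the logic of `deploy_model.py`: the `vet_bitstream` function, the `extract` and `predict` callables, the set of blocked labels, and the quarantine directory are all hypothetical placeholders.

```python
import shutil
import tempfile
import time
from pathlib import Path


def vet_bitstream(path, extract, predict, blocked_labels, quarantine_dir):
    """Time each stage and move the file to quarantine if its label is blocked."""
    path = Path(path)
    timings = {}

    start = time.perf_counter()
    data = path.read_bytes()
    timings["load_ms"] = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    features = extract(data)
    timings["extract_ms"] = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    label = predict(features)
    timings["predict_ms"] = (time.perf_counter() - start) * 1000

    quarantined_to = None
    if label in blocked_labels:
        quarantine_dir = Path(quarantine_dir)
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        quarantined_to = quarantine_dir / path.name
        shutil.move(str(path), str(quarantined_to))

    return label, timings, quarantined_to


# Demo with stand-in callables; a real pipeline would plug in its own
# feature extractor and trained model here.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bit"
    sample.write_bytes(b"\x00" * 64)
    label, timings, dest = vet_bitstream(
        sample,
        extract=lambda data: [len(data)],
        predict=lambda feats: "Empty",
        blocked_labels={"Malicious", "Empty"},
        quarantine_dir=Path(tmp) / "Quarantine",
    )
    print(label, dest.name, sorted(timings))
```

Timing each stage separately with `time.perf_counter` shows where the latency budget goes, and moving a blocked file out of the load path keeps it from being deployed by accident.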

📈 Sample Output of Mock Deployment Pipeline

Benign Bitstream

======= BLADEI Vetting: =======
Processing bitstream: AES-T2100_TjFree_20251218_085702.bit

Actual Class: Benign AES (Class 1)
Predicted Class: Benign AES (Class 1) [94.67% Confidence]

ACTION: Bitstream passed vetting. Proceed to deployment.

======= Latency Summary: =======
Load Bitstream: 24.14 ms
Feature Extraction: 6124.15 ms
Prediction: 69.33 ms

Total Latency: 6.22 s

======= System Information: =======
System: Linux
Node Name: pynq
Release: 6.6.10-xilinx-v2024.1-g08e597ec1786
Version: #1 SMP PREEMPT Sat Apr 27 05:22:24 UTC 2024
Machine: armv7l
Processor: armv7l

======= CPU Information: =======
CPU Cores: 2
Logical Processors: 2
CPU Usage per Core: [0.4, 99.9]
Total RAM: 491.6640625 MB

Malicious Bitstream

======= BLADEI Vetting: =======
Processing bitstream: AES-T500_TjIn_20251218_163136.bit

Actual Class: Malicious AES (Class 3)
Predicted Class: Malicious AES (Class 3) [90.00% Confidence]

ACTION: Bitstream quarantined -> ./mock_deployment/Quarantine/AES-T500_TjIn_20251218_163136.bit
ACTION: Deployment blocked.

======= Latency Summary: =======
Load Bitstream: 105.69 ms
Feature Extraction: 6105.57 ms
Prediction: 68.96 ms

Total Latency: 6.28 s

======= System Information: =======
System: Linux
Node Name: pynq
Release: 6.6.10-xilinx-v2024.1-g08e597ec1786
Version: #1 SMP PREEMPT Sat Apr 27 05:22:24 UTC 2024
Machine: armv7l
Processor: armv7l

======= CPU Information: =======
CPU Cores: 2
Logical Processors: 2
CPU Usage per Core: [0.5, 98.8]
Total RAM: 491.6640625 MB

Empty Bitstream

======= BLADEI Vetting: =======
Processing bitstream: empty2_Empty_20251219_013937.bit

Actual Class: Empty (Class 0)
Predicted Class: Empty (Class 0) [70.67% Confidence]

ACTION: Bitstream quarantined -> ./mock_deployment/Quarantine/empty2_Empty_20251219_013937.bit
ACTION: Deployment blocked.

======= Latency Summary: =======
Load Bitstream: 178.80 ms
Feature Extraction: 6140.21 ms
Prediction: 58.70 ms

Total Latency: 6.38 s

======= System Information: =======
System: Linux
Node Name: pynq
Release: 6.6.10-xilinx-v2024.1-g08e597ec1786
Version: #1 SMP PREEMPT Sat Apr 27 05:22:24 UTC 2024
Machine: armv7l
Processor: armv7l

======= CPU Information: =======
CPU Cores: 2
Logical Processors: 2
CPU Usage per Core: [0.7, 98.1]
Total RAM: 491.6640625 MB


🤝 Acknowledgments

The authors were pleased to have this work accepted for presentation at the 37th annual ACM/IEEE Supercomputing Conference. This work was supported by the McNair Junior Fellowship and Office of Undergraduate Research at the University of South Carolina. OpenAI's ChatGPT assisted with language and grammar correction. While this project utilizes benchmark designs from Trust-Hub, a resource sponsored by the National Science Foundation (NSF), all technical content and analysis were independently developed by the authors. This research also utilized PYNQ, provided by AMD and Xilinx, whose tools and hardware facilitated the synthesis and deployment stages of this study. Access to the FPGA devices was made possible through the AMD University Program.


🛠️ Future Work

  • Expand the current dataset with more SOTA benchmarks (ISCAS'85, ISCAS'89, ITC'02, and ITC'99)
  • Add a CNN-based image classification model to authenticate ML predictions
  • Implement a mock cloud-to-edge bitstream deployment pipeline
  • Improve detection latency with quantized models
  • Expand support for additional FPGA boards

🖊️ References

  • AMD. (2024). PYNQ: Python Productivity for Zynq. Retrieved from https://www.pynq.io
  • Benz, F., Seffrin, A., & Huss, S. A. (2012). BIL: A Tool-Chain for Bitstream Reverse-Engineering. Proceedings of the IEEE International Conference on Field Programmable Logic and Applications (FPL), 735–738. IEEE.
  • Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, W. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.
  • Elnaggar, R., & Chakrabarty, K. (2018). Machine Learning for Hardware Security: Opportunities and Risks. Journal of Electronic Testing, 34(2), 183–201.
  • Elnaggar, R., Chaudhuri, J., Karri, R., & Chakrabarty, K. (2023). Learning Malicious Circuits in FPGA Bitstreams. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(3), 726–739. Retrieved from https://ieeexplore.ieee.org/document/9828544/
  • Hayashi, V. T., & Ruggiero, W. V. (2025). Hardware Trojan Detection in Open-Source Hardware Designs Using Machine Learning. IEEE Access. Retrieved from https://ieeexplore.ieee.org/document/10904479/
  • Imbalanced-learn Developers. (2024). SMOTE. Retrieved from https://bit.ly/3IXc0l7
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011). scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. Retrieved from https://dl.acm.org/doi/10.5555/1953048.2078195
  • Salmani, H., Tehranipoor, M., & Karri, R. (2013). On Design Vulnerability Analysis and Trust Benchmark Development. Proceedings of the IEEE International Conference on Computer Design (ICCD). IEEE.
  • Scikit-learn Developers. (2025a). Cross Validation. Retrieved from https://bit.ly/3gct8QG
  • Scikit-learn Developers. (2025b). Truncated SVD. Retrieved from https://bit.ly/4mmi4BT
  • Seo, Y., Yoon, J., Jang, J., Cho, M., Kim, H.-K., & Kwon, T. (2018). Poster: Towards Reverse Engineering FPGA Bitstreams for Hardware Trojan Detection. Proceedings of the Network and Distributed System Security Symposium (NDSS), 18–21. Internet Society.
  • Shakya, B., He, T., Salmani, H., Forte, D., Bhunia, S., & Tehranipoor, M. (2017). Benchmarking of Hardware Trojans and Maliciously Affected Circuits. Journal of Hardware and Systems Security.
  • Yoon, J., Seo, Y., Jang, J., Cho, M., Kim, J., Kim, H., & Kwon, T. (2018). A Bitstream Reverse Engineering Tool for FPGA Hardware Trojan Detection. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2318–2320. Presented at Toronto, Canada. doi:10.1145/3243734.3278487
