Disclaimer: This application is for informational purposes only. It does not provide financial advice, and any analysis or recommendations should not be considered a basis for making investment decisions. Investing in financial markets involves risk, and you should consult with a qualified professional before making any investment choices.
This project is an advanced Large Language Model (LLM) chatbot designed to provide detailed insights into stock market data. The application operates in two primary modes:
- Data Query Mode: Users can ask specific questions about stock data, and the LLM will provide accurate and concise answers based on the underlying information.
- Investor Analysis Mode: Users can request an analysis of a given stock through the lens of a famous investor. The chatbot can simulate the investment styles of:
- Benjamin Graham
- Warren Buffett
- Aswath Damodaran
In this mode, the LLM provides a comprehensive response that includes a buy/hold/sell signal, a confidence level (from 0 to 100), and detailed reasoning that reflects the chosen investor's philosophy.
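As a concrete illustration, the structured response in this mode could be modeled with a Pydantic schema along these lines (a minimal sketch; the class and field names are hypothetical, not the project's actual models):

```python
# Illustrative schema for an Investor Analysis Mode response.
# Class and field names are hypothetical, not the project's actual models.
from enum import Enum
from pydantic import BaseModel, Field

class Signal(str, Enum):
    BUY = "buy"
    HOLD = "hold"
    SELL = "sell"

class InvestorAnalysis(BaseModel):
    signal: Signal                         # buy / hold / sell recommendation
    confidence: int = Field(ge=0, le=100)  # confidence level from 0 to 100
    reasoning: str                         # rationale in the investor's style
```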
The application is built on a modular microservices architecture to ensure scalability and maintainability.
- Frontend: The user interface is developed with Streamlit, providing a clean and interactive chat experience.
- Backend: A FastAPI backend serves as the central hub, managing user requests and orchestrating communication between the other components (a hypothetical endpoint sketch follows this list).
- LLM Agents: The core logic is handled by specialized LLM agents built with Pydantic-AI.
- In Investor Analysis Mode, a single LLM agent generates the investment signal, confidence score, and reasoning based on the chosen investor's philosophy.
- In Data Query Mode, two LLM agents collaborate to fulfill the user's request: one translates the natural-language query into a SQL query, and the other analyzes the database results and presents them to the user in a readable format (see the agent sketch after this list).
- Communication: All agents communicate with each other and the backend using gRPC, a high-performance framework for remote procedure calls, which ensures fast and efficient data exchange.
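The sketch below shows how such a FastAPI hub might route a chat request to the appropriate mode. The `/chat` path, payload fields, and placeholder replies are assumptions for illustration only:

```python
# Hypothetical FastAPI endpoint routing a chat request to one of the two
# modes; the path, payload fields, and replies are illustrative assumptions.
from enum import Enum
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Mode(str, Enum):
    DATA_QUERY = "data_query"
    INVESTOR_ANALYSIS = "investor_analysis"

class ChatRequest(BaseModel):
    mode: Mode
    message: str
    investor: str | None = None  # only used in Investor Analysis Mode

@app.post("/chat")
async def chat(req: ChatRequest) -> dict:
    if req.mode is Mode.INVESTOR_ANALYSIS:
        # forward to the investor-analysis agent (e.g. over gRPC)
        return {"reply": f"analysis of {req.message} as {req.investor}"}
    # otherwise run the SQL-translation + analysis agent pipeline
    return {"reply": f"data answer for: {req.message}"}
```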
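The following is a minimal Pydantic-AI sketch of the first Data Query Mode agent, which turns a natural-language question into SQL. The model name, table schema, and `SqlQuery` type are assumptions; note that older Pydantic-AI releases use `result_type` and `.data` in place of `output_type` and `.output`:

```python
# Minimal sketch of a natural-language-to-SQL agent with Pydantic-AI.
# Model name, table schema, and the SqlQuery type are assumptions.
from pydantic import BaseModel
from pydantic_ai import Agent

class SqlQuery(BaseModel):
    sql: str  # the generated SQL statement

sql_agent = Agent(
    "openai:gpt-4o",        # any model supported by Pydantic-AI
    output_type=SqlQuery,   # `result_type` in older Pydantic-AI releases
    system_prompt=(
        "Translate the user's question into a single SQL query against the "
        "table stock_prices(ticker, date, open, high, low, close, volume)."
    ),
)

result = sql_agent.run_sync("What was AAPL's highest close in 2024?")
print(result.output.sql)    # `.data` in older releases
```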
The core of this application's data processing is a robust Extract, Transform, Load (ETL) pipeline built within the Azure Databricks environment. This pipeline is responsible for ingesting, cleaning, and structuring vast amounts of stock market data to make it accessible for the LLM.
- Apache Kafka: Used for real-time streaming ingestion of stock data (see the ingestion sketch after this list).
- Azure Blob Storage: Serves as the raw and processed data lake, providing a scalable and cost-effective storage solution.
- APIs: External APIs are leveraged to pull up-to-date stock data.
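A minimal sketch of the Kafka-to-Bronze ingestion using Spark Structured Streaming on Databricks is shown below; the broker address, topic name, and storage paths are illustrative assumptions:

```python
# Illustrative Kafka-to-Bronze streaming ingestion on Databricks.
# Broker, topic, and paths are assumptions; `spark` is the SparkSession
# provided by the Databricks runtime.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "stock-ticks")
    .option("startingOffsets", "latest")
    .load()
)

(
    raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/stock_ticks")
    .outputMode("append")
    .start("/mnt/bronze/stock_ticks")  # raw data lands in the Bronze layer
)
```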
The data within Databricks is structured using the Medallion Architecture, a multi-layered approach that ensures data quality and reusability.
- Bronze Layer: Raw, untransformed data ingested directly from Kafka and APIs.
- Silver Layer: Cleaned and refined data, where basic transformations and quality checks have been applied (see the refinement sketch after this list).
- Gold Layer: Aggregated and feature-engineered data, ready for consumption by the LLM and other downstream applications.
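As an illustration of the layer-to-layer flow, a Bronze-to-Silver refinement step might look like the following sketch (table names, columns, and quality rules are assumptions, not the project's actual ones):

```python
# Illustrative Bronze-to-Silver refinement step; table names, columns, and
# quality rules are assumptions. `spark` is the Databricks SparkSession.
from pyspark.sql import functions as F

bronze = spark.read.table("bronze.stock_ticks")  # raw Kafka/API ingests

silver = (
    bronze
    .dropDuplicates(["ticker", "event_time"])                   # drop replayed events
    .filter(F.col("close").isNotNull() & (F.col("close") > 0))  # basic quality checks
    .withColumn("event_date", F.to_date("event_time"))          # derive partition column
)

silver.write.mode("append").partitionBy("event_date").saveAsTable("silver.stock_ticks")
```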
Current project status:
- End-to-end data pipelines in Azure & Databricks
- Backend in FastAPI (in progress)
- Secure frontend in Streamlit (in progress)
- SQL Translation Agent (in progress)
- Investor Analysis Agent (in progress)
- Analysis Agent (in progress)
- Monitoring in Langfuse