This system uses Large Language Models (LLMs) and vector embeddings to organize and query large document collections. It automatically categorizes and analyzes documents, identifying relevant topics and themes and attaching explainable confidence scores. Users can interact with the documents through a chatbot whose responses adapt to their background and desired level of detail, from general inquiries to highly specific research needs. A dedicated search component supports precise retrieval of technical documents. Together, these features make information discovery across large repositories more efficient.
| Index | Description |
|---|---|
| High Level Architecture | High level overview illustrating component interactions |
| Deployment | How to deploy the project |
| User Guide | The working solution |
| Directories | General project directory structure |
| API Documentation | Documentation on the API the project uses |
| Changelog | Any changes post publish |
| Credits | Meet the team behind the solution |
| License | License details |
The following architecture diagram illustrates the AWS components used to deliver the solution. For an in-depth explanation of the frontend and backend stacks, see the Architecture Guide.
To deploy this solution, please follow the steps laid out in the Deployment Guide.
Please refer to the Web App User Guide for instructions on navigating the web app interface.
├── cdk/                          # AWS CDK infrastructure code
│   ├── bin/                      # CDK app entry point
│   ├── glue/                     # AWS Glue jobs and scripts
│   │   ├── scripts/              # Glue job scripts
│   │   └── custom_modules/       # Shared Python modules
│   ├── lambda/                   # AWS Lambda functions
│   ├── layers/                   # Lambda layers
│   ├── lib/                      # CDK stack definitions
│   ├── sql_schema/               # Database schema definitions
│   └── text_generation/          # Text generation components
├── docs/                         # Project documentation
├── export/                       # Export files
├── frontend/                     # Public-facing web application
│   ├── public/                   # Static assets
│   └── src/                      # Source code
│       ├── app/                  # Next.js app router
│       └── components/           # React components
│           ├── analytics/        # Analytics components
│           ├── chat/             # Chat interface components
│           ├── document-detail/  # Document detail view
│           ├── document-search/  # Document search interface
│           ├── home/             # Home page components
│           └── ui/               # Shared UI components
└── frontendAdmin/                # Admin web application
    ├── public/                   # Static assets
    └── src/                      # Source code
        ├── app/                  # Next.js app router
        └── components/           # React components
            ├── analytics/        # Analytics dashboard
            ├── auth/             # Authentication components
            ├── feedback/         # User feedback components
            ├── history/          # History tracking
            ├── prompt/           # Prompt management
            └── ui/               # Shared UI components
- /cdk: Contains the AWS CDK infrastructure code
  - /bin: CDK app entry point and stack instantiation
  - /glue: AWS Glue jobs for data processing
    - /scripts: Glue job Python scripts
    - /custom_modules: Shared Python modules
  - /lambda: AWS Lambda functions
  - /layers: Lambda layers for shared dependencies
  - /lib: CDK stack definitions and infrastructure code
  - /sql_schema: Database schema and migration files
  - /text_generation: Text generation and processing components
- /docs: Project documentation and guides
- /frontend: Public-facing web application
  - /public: Static assets and public files
  - /src: Application source code
    - /app: Next.js app router and pages
    - /components: React components and UI elements
      - /analytics: Analytics and reporting components
      - /chat: Chat interface and messaging components
      - /document-detail: Document viewing and details
      - /document-search: Search interface and results
      - /home: Home page and landing components
      - /ui: Shared UI components and styles
- /frontendAdmin: Administrative web application
  - /public: Static assets and public files
  - /src: Application source code
    - /app: Next.js app router and pages
    - /components: React components and UI elements
      - /analytics: Analytics dashboard and metrics
      - /auth: Authentication and authorization
      - /feedback: User feedback management
      - /history: History tracking and logs
      - /prompt: Prompt management and configuration
      - /ui: Shared UI components and styles
Here you can learn about the API the project uses: API Documentation.
Steps to implement optional modifications such as changing the colours of the application can be found here
N/A
This application was architected and developed by Daniel Long, Tien Nguyen, Nikhil Sinclair, and Zayan Sheikh, with project assistance by Amy Cao and Harleen Chahal. Thanks to the UBC Cloud Innovation Centre Technical and Project Management teams for their guidance and support.
This project is distributed under the MIT License.
Licenses of libraries and tools used by the system are listed below:
- PostgreSQL License
  - For PostgreSQL and pgvector
  - "a liberal Open Source license, similar to the BSD or MIT licenses."
- Llama 3.3 Community License Agreement
  - For Llama 3.3 70B Instruct model
