UBC-CIC/Document-Smart-Search

Document Smart Search

This system uses Large Language Models (LLMs) and vector embeddings to improve how users interact with and organize documents. It automatically categorizes and analyzes large document collections, identifying relevant topics and themes and attaching explainable confidence scores to them. Users can query the documents through a chatbot whose responses adapt to their background and preferred level of detail, from general questions to highly specific research needs. A dedicated search component supports precise retrieval of technical documents. Together, these features make document management more efficient and help users discover information and insights across large repositories.
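The README does not describe the retrieval or topic-scoring code itself. As a rough illustration of the general idea behind embedding-based search and topic tagging, the following TypeScript sketch ranks documents by cosine similarity to a query embedding and keeps each topic's similarity as a confidence score. Everything in it is an assumption: the `Doc` and `Topic` types, the functions `searchDocuments` and `tagTopics`, the 0.5 threshold, the toy vectors, and the choice of cosine similarity are all hypothetical, and the vectors live in memory rather than in the PostgreSQL/pgvector database listed in the License section.

```typescript
// Hypothetical in-memory types for illustration only; the actual system's
// data model and storage are not shown here.
interface Doc {
  id: string;
  embedding: number[];
}

interface Topic {
  name: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by similarity to a query embedding and keep the top k.
function searchDocuments(
  query: number[],
  docs: Doc[],
  k: number,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Tag a document with every topic whose similarity meets the threshold,
// exposing the similarity itself as a confidence score.
function tagTopics(
  doc: Doc,
  topics: Topic[],
  threshold: number,
): { topic: string; confidence: number }[] {
  return topics
    .map((t) => ({ topic: t.name, confidence: cosineSimilarity(doc.embedding, t.embedding) }))
    .filter((r) => r.confidence >= threshold)
    .sort((x, y) => y.confidence - x.confidence);
}

// Toy 3-dimensional embeddings; real text embeddings typically have
// hundreds or thousands of dimensions.
const docs: Doc[] = [
  { id: "doc-a", embedding: [0.9, 0.1, 0.0] },
  { id: "doc-b", embedding: [0.1, 0.9, 0.2] },
];
const topics: Topic[] = [
  { name: "topic-x", embedding: [1, 0, 0] },
  { name: "topic-y", embedding: [0, 1, 0] },
];

// Top result for a query along the first axis is "doc-a".
console.log(searchDocuments([1, 0, 0], docs, 1));
// Only "topic-x" passes the 0.5 threshold for doc-a (≈ 0.99 vs ≈ 0.11).
console.log(tagTopics(docs[0], topics, 0.5));
```

Cosine similarity is used here only because it is a common choice for comparing text embeddings; pgvector provides a cosine distance operator (`<=>`) that lets the database perform this kind of ranking directly. This README does not say which similarity measure or scoring method the project actually uses.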

Index                      Description
High-Level Architecture    High-level overview illustrating component interactions
Deployment Guide           How to deploy the project
User Guide                 How to use the working solution
Directories                General project directory structure
API Documentation          Documentation on the API the project uses
Modification Guide         Optional modifications, such as changing the application's colours
Changelog                  Any changes made after publication
Credits                    Meet the team behind the solution
License                    License details

High-Level Architecture

The following architecture diagram illustrates the AWS components used to deliver the solution. For an in-depth explanation of the frontend and backend stacks, see the Architecture Guide.

[High-level architecture diagram]

Deployment Guide

To deploy this solution, follow the steps laid out in the Deployment Guide.

User Guide

Please refer to the Web App User Guide for instructions on navigating the web app interface.

Directories

├── cdk/                          # AWS CDK infrastructure code
│   ├── bin/                      # CDK app entry point
│   ├── glue/                     # AWS Glue jobs and scripts
│   │   ├── scripts/              # Glue job scripts
│   │   └── custom_modules/       # Shared Python modules
│   ├── lambda/                   # AWS Lambda functions
│   ├── layers/                   # Lambda layers
│   ├── lib/                      # CDK stack definitions
│   ├── sql_schema/               # Database schema definitions
│   └── text_generation/          # Text generation components
├── docs/                         # Project documentation
├── export/                       # Export files
├── frontend/                     # Public-facing web application
│   ├── public/                   # Static assets
│   └── src/                      # Source code
│       ├── app/                  # Next.js app router
│       └── components/           # React components
│           ├── analytics/        # Analytics components
│           ├── chat/             # Chat interface components
│           ├── document-detail/  # Document detail view
│           ├── document-search/  # Document search interface
│           ├── home/             # Home page components
│           └── ui/               # Shared UI components
└── frontendAdmin/                # Admin web application
    ├── public/                   # Static assets
    └── src/                      # Source code
        ├── app/                  # Next.js app router
        └── components/           # React components
            ├── analytics/        # Analytics dashboard
            ├── auth/             # Authentication components
            ├── feedback/         # User feedback components
            ├── history/          # History tracking
            ├── prompt/           # Prompt management
            └── ui/               # Shared UI components

  1. /cdk: Contains the AWS CDK infrastructure code

    • /bin: CDK app entry point and stack instantiation
    • /glue: AWS Glue jobs for data processing
      • /scripts: Glue job Python scripts
      • /custom_modules: Shared Python modules
    • /lambda: AWS Lambda functions
    • /layers: Lambda layers for shared dependencies
    • /lib: CDK stack definitions and infrastructure code
    • /sql_schema: Database schema and migration files
    • /text_generation: Text generation and processing components
  2. /docs: Project documentation and guides

  3. /export: Export files

  4. /frontend: Public-facing web application

    • /public: Static assets and public files
    • /src: Application source code
      • /app: Next.js app router and pages
      • /components: React components and UI elements
        • /analytics: Analytics and reporting components
        • /chat: Chat interface and messaging components
        • /document-detail: Document viewing and details
        • /document-search: Search interface and results
        • /home: Home page and landing components
        • /ui: Shared UI components and styles
  5. /frontendAdmin: Administrative web application

    • /public: Static assets and public files
    • /src: Application source code
      • /app: Next.js app router and pages
      • /components: React components and UI elements
        • /analytics: Analytics dashboard and metrics
        • /auth: Authentication and authorization
        • /feedback: User feedback management
        • /history: History tracking and logs
        • /prompt: Prompt management and configuration
        • /ui: Shared UI components and styles

API Documentation

Here you can learn about the API the project uses: API Documentation.

Modification Guide

Steps for optional modifications, such as changing the colours of the application, can be found in the Modification Guide.

Changelog

N/A

Credits

This application was architected and developed by Daniel Long, Tien Nguyen, Nikhil Sinclair, and Zayan Sheikh, with project assistance by Amy Cao and Harleen Chahal. Thanks to the UBC Cloud Innovation Centre Technical and Project Management teams for their guidance and support.

License

This project is distributed under the MIT License.

Licenses of libraries and tools used by the system are listed below:

PostgreSQL license

  • For PostgreSQL and pgvector
  • "a liberal Open Source license, similar to the BSD or MIT licenses."

Llama 3.3 Community License Agreement

  • For Llama 3.3 70B Instruct model
