
GPU Not Utilized When Using llm-rs with CUDA Version #27

@andri-jpg

Description


I have installed the CUDA build of the llm-rs library. However, even though I have set use_gpu=True in the SessionConfig, the GPU is not utilized when running the code; instead, CPU usage remains at 100% during execution.

Additional Information:
I am using the "RedPajama Chat 3B" model from Rustformers. The model can be found at the following link: RedPajama Chat 3B Model.

Terminal output:

PS C:\Users\andri\Downloads\chatwaifu> python main.py
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA P106-100, compute capability 6.1

Code:

import json
from llm_rs.langchain import RustformersLLM
from llm_rs import SessionConfig, GenerationConfig, ContainerType, QuantizationType, Precision
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from pathlib import Path

class ChainingModel:
    def __init__(self, model, name, assistant_name):
        with open('config.json') as config_file:
            self.user_config = json.load(config_file)
        with open('template.json') as template_file:
            self.user_template = json.load(template_file)
        model = f"{model}.bin"
        self.model = model

        self.name = name
        self.assistant_name = assistant_name
        self.names = f"<{name}>"
        self.assistant_names = f"<{assistant_name}>"
        
        self.stop_word = ['\n<human>:', '<human>', '<bot>', '\n<bot>:']
        self.stop_words = self.change_stop_words(self.stop_word, self.name, self.assistant_name)
        session_config = SessionConfig(
            threads=self.user_config['threads'],
            context_length=self.user_config['context_length'],
            prefer_mmap=False,
            use_gpu=True
        )

        generation_config = GenerationConfig(
            top_p=self.user_config['top_p'],
            top_k=self.user_config['top_k'],
            temperature=self.user_config['temperature'],
            max_new_tokens=self.user_config['max_new_tokens'],
            repetition_penalty=self.user_config['repetition_penalty'],
            stop_words=self.stop_words
        )

        template = self.user_template['template']

        self.template = self.change_names(template, self.assistant_name, self.name)
        self.prompt = PromptTemplate(
            input_variables=["chat_history", "instruction"],
            template=self.template
        )
        self.memory = ConversationBufferMemory(memory_key="chat_history")

        self.llm = RustformersLLM(
            model_path_or_repo_id=self.model,
            session_config=session_config,
            generation_config=generation_config,
            callbacks=[StreamingStdOutCallbackHandler()]
        )

        self.chain = LLMChain(llm=self.llm, prompt=self.prompt, memory=self.memory)
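
To rule LangChain out, a reduced repro with llm-rs alone is worth trying (a sketch: the AutoModel usage follows the llm-rs README, the model path, thread count, and context length are placeholders, and make_prompt is a helper I made up for this snippet):

```python
def make_prompt(name, assistant_name, instruction):
    # Mirrors the <name>/<assistant_name> tag scheme used in the template.
    return f"<{name}>: {instruction}\n<{assistant_name}>:"

def main():
    # Imported lazily so the helper above is usable without llm-rs installed.
    from llm_rs import AutoModel, SessionConfig

    session_config = SessionConfig(
        threads=8,            # placeholder values
        context_length=2048,
        prefer_mmap=False,
        use_gpu=True,
    )
    model = AutoModel.from_pretrained(
        "redpajama-chat-3b-q4_0.bin",  # placeholder path to the GGML file
        session_config=session_config,
    )
    print(model.generate(make_prompt("human", "bot", "Hello!")))

if __name__ == "__main__":
    main()
```

If this still pegs the CPU, the problem is in llm-rs (or in which wheel is actually installed) rather than in the LangChain wrapper.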

Metadata

Labels: documentation (Improvements or additions to documentation), question (Further information is requested)
