Labels: documentation (Improvements or additions to documentation), question (Further information is requested)
Description
I have installed the llm-rs library with the CUDA version. However, even though I have set use_gpu=True in the SessionConfig, the GPU is not utilized when running the code; instead, CPU usage stays at 100% during execution.
Additional Information:
I am using the "RedPajama Chat 3B" model from Rustformers. The model can be found at the following link: RedPajama Chat 3B Model.
Terminal output:

```
PS C:\Users\andri\Downloads\chatwaifu> python main.py
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA P106-100, compute capability 6.1
```
Code:

```python
import json
from llm_rs.langchain import RustformersLLM
from llm_rs import SessionConfig, GenerationConfig, ContainerType, QuantizationType, Precision
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from pathlib import Path


class ChainingModel:
    def __init__(self, model, name, assistant_name):
        with open('config.json') as self.configuration:
            self.user_config = json.load(self.configuration)
        with open('template.json') as self.prompt_template:
            self.user_template = json.load(self.prompt_template)

        model = f"{model}.bin"
        self.model = model
        self.name = name
        self.assistant_name = assistant_name
        self.names = f"<{name}>"
        self.assistant_names = f"<{assistant_name}>"

        self.stop_word = ['\n<human>:', '<human>', '<bot>', '\n<bot>:']
        self.stop_words = self.change_stop_words(self.stop_word, self.name, self.assistant_name)

        session_config = SessionConfig(
            threads=self.user_config['threads'],
            context_length=self.user_config['context_length'],
            prefer_mmap=False,
            use_gpu=True
        )
        generation_config = GenerationConfig(
            top_p=self.user_config['top_p'],
            top_k=self.user_config['top_k'],
            temperature=self.user_config['temperature'],
            max_new_tokens=self.user_config['max_new_tokens'],
            repetition_penalty=self.user_config['repetition_penalty'],
            stop_words=self.stop_words
        )

        template = self.user_template['template']
        self.template = self.change_names(template, self.assistant_name, self.name)
        self.prompt = PromptTemplate(
            input_variables=["chat_history", "instruction"],
            template=self.template
        )
        self.memory = ConversationBufferMemory(memory_key="chat_history")
        self.llm = RustformersLLM(
            model_path_or_repo_id=self.model,
            session_config=session_config,
            generation_config=generation_config,
            callbacks=[StreamingStdOutCallbackHandler()]
        )
        self.chain = LLMChain(llm=self.llm, prompt=self.prompt, memory=self.memory)
```
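To rule out the LangChain chain/memory layers as the cause, the `use_gpu` setting can be exercised on its own. Below is a minimal repro sketch using only the classes already imported above; the model filename and the numeric values are placeholders, not taken from the report, and it assumes llm-rs was installed with its CUDA feature enabled:

```python
from llm_rs.langchain import RustformersLLM
from llm_rs import SessionConfig, GenerationConfig
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# use_gpu=True exactly as in the report; threads/context_length are placeholders.
session_config = SessionConfig(
    threads=8,
    context_length=2048,
    prefer_mmap=False,
    use_gpu=True,
)
generation_config = GenerationConfig(max_new_tokens=32)

llm = RustformersLLM(
    model_path_or_repo_id="path/to/redpajama-chat-3b.bin",  # placeholder path
    session_config=session_config,
    generation_config=generation_config,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# Watch nvidia-smi while this runs: if offloading works, GPU utilization and
# memory usage should rise during generation instead of CPU pegging at 100%.
llm("<human>: Hello\n<bot>:")
```

If this stripped-down version shows the same 100% CPU behavior, the problem is in llm-rs's GPU offload rather than in the LangChain wiring above.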
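The snippet calls two helpers, `change_stop_words` and `change_names`, whose implementations are not included in the report. For anyone trying to reproduce the issue, here is a plausible sketch of what they might do; the function bodies below are assumptions (only the names and call sites come from the report), based on the `<human>`/`<bot>` placeholders in the stop words and template:

```python
# Hypothetical sketches of the two helpers referenced but not shown in the
# report; the real implementations may differ. They appear to substitute the
# generic <human>/<bot> placeholders with the configured user and assistant
# names.

def change_stop_words(stop_words, name, assistant_name):
    """Replace the 'human' and 'bot' placeholders in each stop word."""
    return [
        w.replace('human', name).replace('bot', assistant_name)
        for w in stop_words
    ]

def change_names(template, assistant_name, name):
    """Replace the placeholder names in the prompt template."""
    return template.replace('bot', assistant_name).replace('human', name)

print(change_stop_words(['\n<human>:', '<bot>'], 'andri', 'waifu'))
# -> ['\n<andri>:', '<waifu>']
```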