Conversation

@carll99 carll99 (Member) commented Dec 10, 2025

This is an updated PR with just the sign-off added. The goal was to keep the code changes minimal while providing very basic logging info for the initial release. More detailed, code-invasive timings will be worked on in a future release/PR.

Signed-off-by: Carl Love <cel@linux.ibm.com>
@mkumatag mkumatag added this to the .Next milestone Dec 11, 2025
start_time = time.time()
vllm_stream = query_vllm_stream(prompt, docs, llm_endpoint, llm_model, stop_words, max_tokens, temperature, stream, dynamic_chunk_truncation=TRUNCATION)
request_time = time.time() - start_time
logger.info(f"Perf data: rag answer time = {request_time}")
@Niharika0306 Niharika0306 (Contributor) commented Dec 11, 2025

As discussed in the previous PR:

This rag answer time doesn't reflect the actual time, since it is logged as soon as the stream object is returned, not after the streamed answer has finished generating.

For the actual timing, we can refer to the LLM inferencing time in the llm_utils file.
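
A minimal sketch of the point (assuming query_vllm_stream returns a generator, as in the snippet above; the timed_stream wrapper below is hypothetical and not code from this PR): timing around the call only measures stream setup, so the answer time would have to be logged after the stream is exhausted, e.g. by wrapping the generator:

import time
import logging

logger = logging.getLogger(__name__)

def timed_stream(stream, label="rag answer time"):
    # Hypothetical wrapper: re-yields chunks from `stream` and logs the
    # elapsed wall-clock time only once the caller has consumed it all,
    # so the number reflects full generation, not just stream setup.
    start_time = time.time()
    try:
        for chunk in stream:
            yield chunk
    finally:
        logger.info(f"Perf data: {label} = {time.time() - start_time}")

# Usage sketch:
# vllm_stream = timed_stream(query_vllm_stream(prompt, docs, ...))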

(Follow-up comment: image attachment)
