Adding summarization endpoint #292
base: main
Conversation
Signed-off-by: Manali Latkar <manali.latkar@ibm.com>
Force-pushed from d9e3136 to 3a0bc7e
MAX_WORDS_BEST_PERFORMANCE = 2000
MAX_WORDS_DEGRADED_PERFORMANCE = 21500
mkumatag left a comment
Let's push the design doc somewhere in this repo itself so that it's handy while reviewing the PR.
default_max_input_length = 6000
default_prompt_template_token_count = 250
default_max_summary_length = 1000
default_max_file_size_mb = 10
10 MB is a really huge file; have you tested with a 10 MB file?
We are handling file size separately for PDF and text files; more details are added in the proposal PR.
default_temperature = 0.0
default_max_input_length = 6000
default_prompt_template_token_count = 250
default_max_summary_length = 1000
Any logic behind this number?
We wanted a number big enough for our maximum input size, which is 21,500 words. 21,500 words -> 1,000 words is a good summary; anything longer than 1,000 words won't really be a summary.
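A minimal sketch of enforcing that cap; the helper name is hypothetical, with the 1000 default taken from default_max_summary_length above:

```python
def effective_summary_length(requested: int, max_length: int = 1000) -> int:
    # Clamp the user-supplied summary length to the configured maximum
    # (default_max_summary_length above); hypothetical helper, not in the PR.
    return max(1, min(requested, max_length))
```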
default_prompt_template_token_count = 250
default_max_summary_length = 1000
default_max_file_size_mb = 10
default_summarization_temperature = 0.3
This can be 0.2.
default_max_summary_length = 1000
default_max_file_size_mb = 10
default_summarization_temperature = 0.3
default_summarization_stop_words = "\n\n,Note,Word Count,Revised Summary"
What is the purpose of these stop words?
@mkumatag When I tested this, the LLM was responding with extra data that started with some of these stop words. Once I started sending the stop words, the response was always clean.
This could be because of the prompt: "Summarize the following text in approximately {summary_length} words".
We are supposed to take the summary length as input from the user, so we have to include it in the prompt for the output to stick to that limit.
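Putting the two points together, a hedged sketch of how the length-hinting prompt and the stop words could flow into vLLM's OpenAI-compatible /v1/completions API; the function shape is an assumption, while the prompt wording and stop list are quoted from this thread:

```python
import requests

def summarize_via_vllm(endpoint, model, text, summary_length,
                       max_tokens, temperature, stop_words):
    # The length hint keeps the output close to the user-requested size.
    prompt = f"Summarize the following text in approximately {summary_length} words:\n\n{text}"
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        # Stop sequences cut generation before trailers such as "Note" or
        # "Word Count" that the model tended to append after the summary.
        "stop": stop_words,
    }
    resp = requests.post(f"{endpoint}/v1/completions", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"].strip()
```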
if __name__ == "__main__":
    initialize_models()
    uvicorn.run(app, host="0.0.0.0", port=8000)
How does one override this port?
And also, how will a user run this service alone vs. with the RAG service?
Currently this needs to be run along with the RAG service, as per Sebastian's slide.
To run it alone would require vLLM as a prerequisite; once we figure out the custom use case/template problem, we can address it, I believe.
I'm talking about an individual summarization service itself; maybe we will explore that after merging this.
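One way the port override could work; SUMMARIZER_HOST and SUMMARIZER_PORT are hypothetical environment variable names, not something the PR defines:

```python
import os
import uvicorn

if __name__ == "__main__":
    initialize_models()  # app and initialize_models come from the diff above
    # Read host/port from env so the defaults are overridable at deploy time.
    host = os.getenv("SUMMARIZER_HOST", "0.0.0.0")
    port = int(os.getenv("SUMMARIZER_PORT", "8000"))
    uvicorn.run(app, host=host, port=port)
```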
@@ -0,0 +1,200 @@
import os
Add a README in this directory.
# Log warning if text exceeds the best performance threshold
if word_count > MAX_WORDS_BEST_PERFORMANCE:
    logger.info(f"Input text exceeds maximum word limit of {MAX_WORDS_BEST_PERFORMANCE} words. Performance may be degraded.")
I'm not sure how helpful this logging message alone, without displaying anything else, will be for the admin who is running this solution.
@mkumatag Should we send a "degraded_performance": True in the response?
I think we should log other details to identify the request, like the type of content the user passed and the word count.
Q: What will the admin/user action be on this? IMO we should consider collecting this via a perf counter for monitoring purposes; otherwise, exposing such information to end users won't be very helpful. If necessary, we should limit the input character count so that we can be efficient, and return an error if it is exceeded.
When we say degraded_performance, do we mean qualitatively? Shouldn't max_model_len = 32k take care of this?
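Pulling this thread's suggestions together, a sketch of richer logging plus an optional response flag; the helper name, the content_type argument, and the degraded_performance field are all assumptions:

```python
import logging

logger = logging.getLogger(__name__)
MAX_WORDS_BEST_PERFORMANCE = 2000  # from the constants quoted earlier

def annotate_degraded_performance(response: dict, word_count: int,
                                  content_type: str) -> dict:
    # Log enough context (content type, word count) to identify the request,
    # and surface the condition to the caller via a response field.
    if word_count > MAX_WORDS_BEST_PERFORMANCE:
        logger.warning(
            "Input of %d words (content type: %s) exceeds the best-performance "
            "threshold of %d words; summary quality may degrade.",
            word_count, content_type, MAX_WORDS_BEST_PERFORMANCE,
        )
        response["degraded_performance"] = True
    return response
```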
| detail="Server busy. Try again shortly." | ||
| ) | ||
| try: | ||
| summary = query_vllm_completions(llm_endpoint, prompt, llm_model, settings.llm_max_tokens, settings.summarization_temperature) |
llm_max_tokens is set to 512 tokens, which is too low if you are targeting 1000 output words.
@mkumatag you are right, I had changed the max_tokens while testing larger inputs, but missed adding the change here. Will probably come up with a max_tokens value based on the summary length given by the user.
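A sketch of deriving max_tokens from the requested summary length rather than a fixed 512; the tokens-per-word ratio is a rough heuristic for BPE tokenizers, not a measured value:

```python
def max_tokens_for_summary(summary_length_words: int,
                           tokens_per_word: float = 1.5) -> int:
    # English prose commonly runs around 1.3-1.5 tokens per word with BPE
    # tokenizers; add headroom so the summary is not cut off mid-sentence.
    return int(summary_length_words * tokens_per_word) + 64
```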
file_content = file.file.read()

# Validate file size
if len(file_content) > MAX_FILE_SIZE_BYTES:
I think the file size limit should vary based on file type; .txt would consume less space compared to PDF.
Please do some exercise to come up with valid file sizes.
I have addressed the separate max file sizes in my proposal document.
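A sketch of per-type limits along the lines discussed here; the specific numbers are placeholders pending the sizing exercise, and content_type follows FastAPI's UploadFile:

```python
from fastapi import HTTPException, UploadFile

# Placeholder limits pending the sizing exercise: PDFs carry layout overhead,
# so the same word count needs more bytes than plain text.
MAX_FILE_SIZE_BYTES_BY_TYPE = {
    "text/plain": 1 * 1024 * 1024,
    "application/pdf": 10 * 1024 * 1024,
}

def validate_file_size(file: UploadFile, file_content: bytes) -> None:
    limit = MAX_FILE_SIZE_BYTES_BY_TYPE.get(file.content_type)
    if limit is not None and len(file_content) > limit:
        raise HTTPException(status_code=413,
                            detail="File too large for its content type")
```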
# Log warning if text exceeds the best performance threshold
if word_count > MAX_WORDS_BEST_PERFORMANCE:
    logger.info(f"Input text exceeds maximum word limit of {MAX_WORDS_BEST_PERFORMANCE} words. Performance may be degraded.")
I think we should log other details to identify the request, like the type of content the user passed and the word count.
except requests.exceptions.RequestException as e:
    concurrency_limiter.release()
    error_details = str(e)
    if e.response is not None:
        error_details += f", Response Text: {e.response.text}"
    logger.error(f"Error calling vLLM API: {error_details}")

    raise HTTPException(
        status_code=500,
        detail=error_details
    )
except Exception as e:
    concurrency_limiter.release()
    logger.error(f"Error calling vLLM API: {e}")
    raise HTTPException(
        status_code=500,
        detail=str(e)
    )
Can we move this block into llm_utils itself, like how it's handled in the other query methods?
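A sketch of what that refactor could look like in llm_utils; the wrapper name is an assumption, and the body mirrors the block above with the limiter release moved out to the call site:

```python
# llm_utils.py (sketch)
import logging
import requests
from fastapi import HTTPException

logger = logging.getLogger(__name__)

def run_vllm_query(query_fn, *args, **kwargs):
    # Centralize vLLM error handling so each endpoint does not repeat
    # the except blocks shown in the diff above.
    try:
        return query_fn(*args, **kwargs)
    except requests.exceptions.RequestException as e:
        error_details = str(e)
        if e.response is not None:
            error_details += f", Response Text: {e.response.text}"
        logger.error(f"Error calling vLLM API: {error_details}")
        raise HTTPException(status_code=500, detail=error_details)
    except Exception as e:
        logger.error(f"Error calling vLLM API: {e}")
        raise HTTPException(status_code=500, detail=str(e))
```

The caller could then release concurrency_limiter in a single try/finally instead of repeating it in every except branch.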

This PR adds a separate microservice for summarization. The input to be summarized can be given as a file or as plaintext. A summary length must be given as well.
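For reference, a hypothetical client call; the /summarize path and the form field names are assumptions, not confirmed by the PR:

```python
import requests

# Plaintext input with a requested summary length.
resp = requests.post(
    "http://localhost:8000/summarize",
    data={"text": "Long document body goes here...", "summary_length": 200},
)
print(resp.json())

# File input (.pdf or .txt) instead of plaintext.
with open("report.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/summarize",
        files={"file": f},
        data={"summary_length": 200},
    )
print(resp.json())
```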