I don't plan on shelling out money for inference at the moment, so the initial plan is to have users bring their own "inference back-end" with them - likely Colab for now. Some points about this though:
- Who will be responsible for creating the prompt, sending it off to the inference backend, and parsing the resulting generation?
  - Initial plan is to implement that here: the front-end will simply `POST` user messages to an endpoint and receive responses (maybe via WebSockets? Not sure holding on to a connection for 10+ seconds is a good idea). See the sketch after this list.
  - Pros:
    - We'll have real-world data on inference requests, which we can use to calculate how much $ it would cost to actually run inference ourselves. (Many users have suggested I open a Patreon to cover hosting expenses; I'm unsure how well that would pan out, but with data this decision could be made a little more clearly.)
    - We can automatically push new prompting code by just updating the server.
  - Cons:
    - Increased server load, since we'll be acting as a proxy for inference requests.
- How will inference work for group chats? How do we decide which characters should speak and when?
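
To make the proxy idea from the first bullet concrete, here's a minimal sketch of what the server-side endpoint could look like, assuming a plain HTTP `POST` rather than WebSockets. Everything here is illustrative, not settled API: the Express framework, the `/api/chat` path, the `INFERENCE_URL` env var, the `buildPrompt` helper, and the JSON shape the inference backend accepts and returns are all assumptions.

```ts
// Sketch of the proposed proxy flow, assuming an Express server (Node 18+ for global fetch).
// Endpoint path, env var name, prompt format, and backend JSON shape are hypothetical.
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical: the user-supplied inference backend URL (e.g. a Colab-hosted server).
const INFERENCE_URL = process.env.INFERENCE_URL ?? "http://localhost:5000/generate";

type Message = { author: string; text: string };

// Hypothetical prompt builder; the real prompt format is exactly the open question above.
function buildPrompt(history: Message[]): string {
  return history.map((m) => `${m.author}: ${m.text}`).join("\n") + "\nBot:";
}

app.post("/api/chat", async (req, res) => {
  const { history } = req.body as { history: Message[] };
  const started = Date.now();
  try {
    // Act as a proxy: forward the assembled prompt to the user's backend and wait.
    const backendRes = await fetch(INFERENCE_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: buildPrompt(history) }),
    });
    const { text } = (await backendRes.json()) as { text: string };
    // Log latency per request: this is the real-world data for estimating inference costs.
    console.log(`inference took ${Date.now() - started}ms`);
    res.json({ reply: text.trim() });
  } catch {
    res.status(502).json({ error: "inference backend unreachable" });
  }
});

app.listen(3000);
```

One possible answer to the WebSockets question: a plain request/response with a generous server-side timeout avoids holding a socket open for 10+ seconds, and WebSockets (or SSE) would mainly earn their keep if we later want to stream tokens back to the front-end as they're generated.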