Figure out how inference will work #7

@0x000011b

Description

I don't plan on shelling out money for inference at the moment, so the initial plan is to have users bring their own "inference back-end" - most likely Colab for now. Some open points about this, though:

  • Who will be responsible for creating the prompt, sending it off to the inference backend, and parsing the resulting generation?
    • The initial plan is to implement that here: the front-end will simply POST user messages to an endpoint and receive responses. (Maybe via WebSockets? I'm not sure holding a connection open for 10+ seconds is a good idea.) A rough sketch of this follows the list below.
    • Pros:
      • We'll have real-world data on inference requests, which we can use to estimate how much $ it would cost to actually run inference ourselves. (I've gotten many users suggesting I open a Patreon to cover hosting expenses; I'm unsure how well that'd pan out, but with data that decision could be made a little more clearly.)
      • We can automatically push new prompting code by just updating the server
    • Cons:
      • Increased server load, since we'll be acting as a proxy for inference requests.
  • How will inference work for group chats? How do we decide which characters should speak, and when? (One naive idea is sketched below as well.)
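
To make the first point more concrete, here's a minimal sketch of what the proxy endpoint could look like. Everything in it is an assumption rather than a decision: the `/chat` route, the request/response shapes, and the Colab backend's API (a `backend_url` that accepts `{"prompt": ...}` and returns `{"text": ...}`) are all made up for illustration.

```python
# Rough sketch only -- route name, payload shapes, and the Colab
# backend's API are all assumptions, not settled decisions.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    backend_url: str        # user-provided inference backend (e.g. a Colab tunnel)
    character_persona: str
    history: list[str]      # prior chat turns, oldest first
    user_message: str

def build_prompt(req: ChatRequest) -> str:
    # Prompt construction lives server-side, so pushing new prompting
    # code is just a redeploy of this service.
    turns = "\n".join(req.history + [f"You: {req.user_message}"])
    return f"{req.character_persona}\n{turns}\nCharacter:"

@app.post("/chat")
async def chat(req: ChatRequest):
    prompt = build_prompt(req)
    async with httpx.AsyncClient(timeout=60.0) as client:
        # Forward to the user's backend; its response schema is assumed here.
        resp = await client.post(req.backend_url, json={"prompt": prompt})
        resp.raise_for_status()
        generated = resp.json()["text"]
    # Trim the raw generation before handing it back to the front-end,
    # e.g. cut it off where the model starts speaking as the user.
    reply = generated.split("\nYou:")[0].strip()
    return {"reply": reply}
```

If WebSockets turn out to be a better fit than plain POSTs, only the transport layer above changes; the prompt building and generation parsing stay the same, which is the part we'd want to control server-side anyway.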
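
For the group chat question, here is one naive option, purely illustrative and not a proposal: characters who are explicitly mentioned respond first, and otherwise the choice is weighted toward whoever has been quiet the longest.

```python
# Naive speaker selection for group chats -- illustrative only.
import random

def pick_next_speaker(characters: list[str], user_message: str,
                      turns_since_spoke: dict[str, int]) -> str:
    # Direct mentions take priority.
    mentioned = [c for c in characters if c.lower() in user_message.lower()]
    if mentioned:
        return random.choice(mentioned)
    # Otherwise, weight by how long each character has been silent.
    weights = [turns_since_spoke.get(c, 0) + 1 for c in characters]
    return random.choices(characters, weights=weights, k=1)[0]
```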
