Add Pythonx.install_env/0 and Pythonx.install_paths/0 to enable FLAME #35

Conversation
---

Actually, I'm not so sure about the initializing-once part. Especially with FLAME, it can be that you reuse the FLAME runner but want to initialize a different pyproject.toml / venv than the last user of the runner. Currently, if I understand correctly, that is not possible. Initializing Python and initializing uv seem to be very entangled, right?
---

Hey, sorry for the late reply. Another approach that could work is for Livebook to set an application env with the pyproject.toml, then when Pythonx boots on another node and the env is set, it would do the initialization. We already support such an application env, but it is compile-time (if we don't set the env, that code path doesn't exist at all): see `pythonx/lib/pythonx/application.ex`, lines 30 to 34 at 57b5398.
cc @josevalim
Initializing Pythonx itself means starting the interpreter and setting up code paths. uv is a way to get Python+dependencies. So those are distinct, but since currently we only support initializing Pythonx with uv-provided Python+dependencies, they are entangled. Within a single node, Pythonx can only be initialized once. Theoretically there is a way to uninitialize the interpreter, but in practice this doesn't work, because many libraries with native code would break (e.g. numpy).
---

Thanks for the reply and don't worry about the delay!

Okay, yes, this could also be built into Pythonx of course. Either way, in the case of FLAME, passing the env variable would have to be a manual configuration on the FLAME pool...

Okay, if I understand correctly, one FLAME runner / BEAM node can only be used for a single Python project config (i.e. venv, i.e. Python+dependencies)? I mean... this limitation is probably okay if we can make sure Pythonx is actually initialized on the FLAME. The thing that's bothering me a bit is: env variables are configured on a FLAME pool, but the limitation about Pythonx actually applies to a runner, not a pool. Each fresh runner of the same pool could technically be initialized with a different pyproject.toml. But... if we have to pass pyproject.toml as an env var, we have to configure it on the pool... Then again... maybe that's only a theoretical problem and in practice, when using it with Livebook, you'd use one FLAME pool for one pyproject.toml...
---

@mruoss to be clear, I mean the pythonx application env, not an env var. The application envs are automatically copied by FLAME to the runner :)
---

Oooh I see! Yes this sounds like a good approach! I can play around with this a bit.
---

Hmm... maybe I'm still not getting it. Aren't we talking about the following?

```elixir
Mix.install([{:pythonx, "~> 0.4.2"}, {:flame, "~> 0.1.5"}, {:flame_k8s_backend, "~> 0.5"}])

# ... initialize FLAME pool ...

Application.put_env(:pythonx, :pyproject_toml, "just a test")

FLAME.call(:runner, fn -> Application.get_env(:pythonx, :pyproject_toml) end)
# returns nil, was expecting "just a test"
```
---

I think we copy the .app files but not the runtime values (but I may be misremembering). So anything dynamic won't work indeed.
---

Gah, sorry, I thought the persistent config is copied too but I misremembered.
---

But I think we do need to pass the information for boot in some way. @josevalim perhaps we can be explicit and pass extra config on the FLAME pool to set extra application env?
---

@jonatanklosko another option is to pass Pythonx as additional paths to be copied to FLAME and, because we should cache everything, then it just works on the FLAME?
---

@josevalim that's a separate point. Ideally we want to copy it to effectively get a cache hit, but the question is how pythonx on the new node knows that it should initialize the interpreter.
---

It would be great if we could store it in the directory we copy, but I am afraid it is not straightforward. We could also look at system env vars, but I don't think they are copied either?
---

Or we could copy to priv, which may have other side effects, so perhaps yeah, we need new capabilities in FLAME.

Env vars that we set dynamically are not copied. We could pass it explicitly to the pool via …

You mean the cache is still somewhere global, but FLAME copies it to the Pythonx priv on the new node? It's even more implicit, since Pythonx would need to infer if/how it should initialize based on the priv contents. We also already use … So currently I am leaning towards adding application env config to FLAME, and perhaps a function like …
---

Right, so either app env or system env is fine. FLAME already supports system env, but adding app env should be trivial.
---

@josevalim actually, in both cases we can make it opaque if we don't expose the key name:

```elixir
[
  env: ... |> Map.merge(Pythonx.flame_env())
]

# vs

[
  config: [
    pythonx: Pythonx.flame_config()
  ]
]
```

Then we can have a very verbose env var name with encoded state and it's all private API. Then I am fine with using env vars.
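A minimal sketch of what such an opaque env var could look like. Everything here is hypothetical, not the actual Pythonx API: the module, function names, and env var name are made up, and the "encoded state" is assumed to be just the pyproject.toml content, Base64-encoded so it travels as a single env var:

```elixir
defmodule PythonxFlameSketch do
  # Hypothetical: a single verbose, private env var name that carries
  # the encoded init state from the parent node to the FLAME runner.
  @env_name "PYTHONX_FLAME_INIT_STATE"

  # Called on the parent node; returns a map suitable for merging
  # into the FLAME pool's :env option.
  def flame_env(pyproject_toml) do
    %{@env_name => Base.encode64(pyproject_toml)}
  end

  # Called when Pythonx boots on the runner node: if the env var is
  # set, decode it and signal that initialization should happen.
  def maybe_init_from_env do
    case System.get_env(@env_name) do
      nil -> :noop
      encoded -> {:init, Base.decode64!(encoded)}
    end
  end
end
```

Since the env var name never appears in user code, it stays a private implementation detail shared between the two sides.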
---

As you prefer!
---

@mruoss so in case you want to try the alternative approach, the idea is to have …
---

That makes sense and I can definitely try this approach. But where would Pythonx keep this state? I.e. where would …
---
We can store in persistent term upon initialization :) |
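One way to sketch "store in persistent term upon initialization". The module name and key are hypothetical (not the actual Pythonx implementation), and the stored state is assumed to be just the pyproject.toml content:

```elixir
defmodule PythonxInitStateSketch do
  # Hypothetical sketch: record what Pythonx was initialized with in a
  # persistent term, so later calls can read it back cheaply.
  @key {:pythonx, :init_state}

  def record(pyproject_toml) do
    :persistent_term.put(@key, %{pyproject_toml: pyproject_toml})
  end

  def fetch do
    :persistent_term.get(@key, nil)
  end

  def initialized?, do: fetch() != nil
end
```

`:persistent_term` is a good fit because the state is written once at initialization and read many times afterwards, and reads are cheap and copy-free.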
---

The approach in this pull request defines init_state by the paths rather than the … I pushed some changes to explain the approach. There are 2 exported functions on this branch now: … Does that make sense?
---

What if we call it …?
---

I will leave the decision about function naming up to you guys. With the current version of this PR, I can successfully launch a FLAME and initialize Pythonx on it. Note that the wildcard part is only needed until a new version of FLAME is released with phoenixframework/flame@6d62b8a.

Once we're aligned about the naming(s), the next step would be to adapt Livebook's … Once that is done, users still need to manually pass the paths and env to FLAME. Using the k8s backend, this would look something like the following in a Livebook:

```elixir
import YamlElixir.Sigil

pythonx_env = Pythonx.install_env()

pod_template = ~y"""
apiVersion: v1
kind: Pod
metadata:
  generateName: livebook-flame-runner-
  namespace: livebook
spec:
  containers:
    - name: livebook-runtime
      env:
        - name: LIVEBOOK_COOKIE
          value: #{Node.get_cookie()}
        - name: #{pythonx_env.name}
          value: #{pythonx_env.value}
"""

Kino.start_child(
  {FLAME.Pool,
   name: :runner,
   code_sync: [
     start_apps: true,
     copy_apps: true,
     sync_beams: Kino.beam_paths(),
     # <= this is where we copy the paths / cache over to the FLAME
     copy_paths: Pythonx.install_paths()
   ],
   backend: {FLAMEK8sBackend, runner_pod_tpl: pod_template}}
  # other options
)
```
@jonatanklosko Oh of course, this makes sense! I've pushed some changes now and it works like a charm. Now I don't have to init Pythonx inside the `FLAME.call()` callback anymore and I can re-use the same runner for multiple consecutive calls (as long as the pyproject.toml doesn't change). 🎉

@josevalim I agree, it would be nice if this didn't have to be configured on the backend (k8s pod template in my example). But I'm not sure I fully get you. Are you saying you'd like both the path and env sync to happen without the need for the user to specify anything on either the pool or the backend?
---

I am saying it should only be specified at the FLAME level. But we need a good name for it, because the variable is only set after the Erlang runtime boots, unless we add a new responsibility to the FLAME backends. We will have to play with it from the FLAME side!
---

Okay yes, I see your point. Maybe we're abusing system env here. This is FLAME-specific, so maybe it can be a bit more "custom tailored", no? What if FLAME provides an API similar to Agent... something built on top of …
---

Honestly, those are all the same. I think we will only know the answer for sure once we play with the FLAME API. It may be that the system environment makes the most sense, as long as we allow the backend to set it before the machine starts; then it can be used for other use cases. But we need to consult with the backends. I assume for k8s that would be straightforward?
---

Haha, yeah, I had the same thought after I wrote it (all of them being the same). So let me play with FLAME a bit when I get to it.

You mean passing all (or a set of) system env vars? That's pretty straightforward, yes.
---

@mruoss an … So we just need the same for k8s?
---

Oh well, if this is what you're talking about: any FLAME backend that wants to support Livebook already needs support for defining env variables on the runner "machine". Otherwise you couldn't pass … The k8s backend supports 2 ways of defining the Pod manifest for the FLAME runner (see the detailed docs on hexdocs): …
So I think it is all good. Worst case scenario we will need to pass the manifest and …

---

Again, I'm not sure I understand. With the k8s backend it's an either/or. You either use the abstraction and pass …
---

@mruoss yes, I understand that. All I am saying is that it would be nice to do: … So we are not polluting the pod template with Pythonx or Livebook vars, and the user can focus on their stuff. That's what I meant by a unified env var. It is not the end of the world though; if we can't have both, that's fine, and …
---

jonatanklosko left a comment:
Awesome! I dropped some comments regarding the API, and then we can ship it.
---

@josevalim I understand now. And I'm torn between "yes that would be nice indeed" and "but then we define env vars in 2 places, we need to clarify precedence, ...". But I'll have to give it some more thought.

@jonatanklosko Thanks for the code review. All makes sense. Not 100% sure about the root_dir. In my test it now takes quite long to create the FLAME pool because the list of files is very long.
---

@mruoss interesting, …
---

Sorry, something got messed up in my previous commit. Now it should reflect the correct changes.

Hmm, yes, that's true. Still, do we really need to copy the entire …

The headers presumably not, since if anything compiles on the host, it is then reused on the nodes. But the headers are small compared to lib/, which is most of the space; it has dynamic libraries and those we should likely keep as is in the general case.
---

jonatanklosko left a comment:
Mostly docs updates, but otherwise looks good to me!
---

jonatanklosko left a comment:
Fantastic, looks good to me!!
---

Flame v0.5.3 is out :)
---

This is a proposal / food for discussion at this point, coming over from the discussion here: https://elixirforum.com/t/is-there-a-way-to-access-pyproject-toml/73129.

In order to support Livebook+Pythonx+FLAME, something like this would be a first step. We'd still have to somehow get that struct/data over to the FLAME and initialize Pythonx in Livebook's `/rel/server/overlays/bin/start_flame.exs`. Since this can only be done once, we can't do it inside the `FLAME.call` function as FLAMEs can be reused... WDYT?