Every codebase eventually winds up with factory registries, instance maps, and the like. Within the ecosystem the following registries exist, even if people may not call them registries. I'm sure there are more; these are just the ones I knew of and checked, and they still exist.
- snakeoil:
- compression. It doesn't look like one, but it is: the base class is not an ABC, yet its __init_subclass__ actually registers every subclass, which isn't right (creating a class doesn't mean it should be registered; tests, for example).
- chksums. Fortunately Python provides basically what we need, but the implementation of this registry mechanism is arse due to the API it was built for: the old data source shim for py2k/py3k. Most of that has to be rewritten to constrain the API. The rebuild should sit on a common registry design, if only to move it to a simple "register this implementation" flow.
- pkgcore: many pseudo registries. Anything that used the original plugs framework is exactly this. pconfig, for example, does a full import of the namespace and then walks all subclasses of Config to find what it needs; that's a registry, and one that should be cached. There are some registries in pkgcore that need to be removed, but as a hard rule they exist for a reason: that indirection supports the other parts of the architecture.
- Concrete example: the config indirection was there to allow 3rd party repository and cache implementations. EAPI restrictions finally caught up and removed the core blocker, which was unrestricted ebuild file access into a tree, which in turn mandated that the tree be fully real on disk. The internals of pkgcore have likely broken the design contract in a few spots, but the architecture itself enforces this support. TL;DR: a crappy slow rpi asking a big boy x86_64 to build its packages for it. That's the modern example of this.
- Carrying any such 3rd party implementation in mainline pkgcore is very much not on the todo until it is stabilized. Even then it's preferable that it be an external codebase.
- pkgcheck:
- pkgcheck.objects is entirely registries that also serialize to disk. They're quite fragile to deal with within py_build.py steps because the code's implementation is "load the cache if available, else scan". There is no way to ask those registries what they would write as a cache, since the cache has already been merged in a way that can't be tracked. TL;DR: cached and scanned state need to be separated and tracked internally.
- That implementation also forces full namespace loads of all items in the listed cache. The K has to be reified, but the V does not, not until it's asked for.
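As an illustration of the pkgcheck.objects point, here is a minimal sketch of a registry that keeps cache-loaded and scan-discovered entries in separate maps, so it can always report what it would write as a cache. The class name, method names, and the "module:attr" path format are all hypothetical; none of this is existing pkgcheck API.

```python
class TrackedRegistry:
    """Hypothetical registry that never merges cache and scan results."""

    def __init__(self):
        self._cached = {}   # key -> import path, as read from a cache file
        self._scanned = {}  # key -> import path, as found by a live scan

    def load_cache(self, entries):
        # Remember what came from disk, but keep it apart from scan results.
        self._cached.update(entries)

    def record_scan(self, key, import_path):
        self._scanned[key] = import_path

    def lookup(self, key):
        # Assumption: a live scan result wins over a cached entry.
        if key in self._scanned:
            return self._scanned[key]
        return self._cached[key]

    def cache_payload(self):
        # What would be written to disk: only what the scan actually found.
        return dict(sorted(self._scanned.items()))


registry = TrackedRegistry()
registry.load_cache({"OldCheck": "vendor.old:OldCheck"})
registry.record_scan("NewCheck", "project.new:NewCheck")
```

Because the two maps stay separate, a build step can call cache_payload() without caring whether a cache was loaded earlier.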
For the above, they lack things the original plugin framework was designed for, which pkgcheck has made more relevant:
- The ability to register multiple implementations of something. For checksums, for example, multiple fallbacks are possible, and they should be encoded and used.
- ArComp, same thing
- pkgcheck is currently the only source of any possible check in Gentoo. I would expect there are third party checks that do not meet the standard of inclusion into pkgcheck (too niche, or in house, assuming Google is still using Gentoo for ChromeOS). Either they have to maintain a fork, or they need a way to register their own checks into the pkgcheck system.
- Repositories themselves should carry metadata stating what checks they want run (in addition to pkgcheck defaults), or suppressions of pkgcheck defaults. They may also want to carry their own "you must have this installed since it provides checks" requirement: the ChromeOS scenario I mentioned, even if we were doing our work in public.
- Providing for this isn't a "boil the ocean" effort. The cache format of things like pkgcheck.objects should be textual: key -> import path. Putting it within the Python source tree isn't a hard requirement. Putting it into a general directory which 3rd parties can also add to opens up the 3rd party usage.
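A rough sketch of what a textual "key -> import path" cache could look like, with values imported only when asked for. The line syntax, the "module:attr" path form, the ".registry" file suffix, and every function and class name here are assumptions made for illustration, not an existing format.

```python
import importlib
from pathlib import Path


def parse_registry_text(text):
    """Parse lines of the (assumed) form 'key -> module.path:attr'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, target = line.partition("->")
        entries[key.strip()] = target.strip()
    return entries


def load_registry_dir(directory):
    """Merge every '*.registry' file (hypothetical suffix) in a directory."""
    entries = {}
    for path in sorted(Path(directory).glob("*.registry")):
        entries.update(parse_registry_text(path.read_text()))
    return entries


class LazyRegistry:
    """Keys are known up front; values are imported on first access."""

    def __init__(self, entries):
        self._paths = dict(entries)
        self._loaded = {}

    def keys(self):
        return self._paths.keys()

    def __getitem__(self, key):
        if key not in self._loaded:
            module_name, _, attr = self._paths[key].partition(":")
            module = importlib.import_module(module_name)
            self._loaded[key] = getattr(module, attr)
        return self._loaded[key]


# Stdlib targets stand in for real check classes.
SAMPLE = """
# key -> import path
ordered -> collections:OrderedDict
decoder -> json:JSONDecoder
"""
registry = LazyRegistry(parse_registry_text(SAMPLE))
```

Listing registry.keys() imports nothing; only registry["decoder"] triggers the import of json. A third party would participate by dropping its own file into the shared directory.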
Those are the use cases I can see, and the functionality that should be designed for. Even if it's not implemented up front, the design should not preclude these things.
Basically, there are these scenarios:
- K -> V: a registry of "key" to "thing". pkgcheck.objects is basically a map of keys to classes.
- K -> list[V]. Checksums, for example. This scenario requires some way to order list[V]: priorities. It also strongly implies the need for V to be able to self disable. "If bsdtar exists, this is how to use it. If it doesn't, I may be highest priority, but ignore me and try the next one in line."
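The K -> list[V] scenario might be sketched like this: explicit registration through a decorator, ordering by priority, and an is_available() hook that lets an implementation disable itself. PriorityRegistry, the priority numbers, and the two implementation classes are hypothetical names for illustration, not snakeoil code.

```python
import shutil


class PriorityRegistry:
    """Hypothetical K -> list[V] registry with priorities and self-disable."""

    def __init__(self):
        self._impls = {}  # key -> list of (priority, implementation)

    def register(self, key, priority):
        # Registration is explicit: defining a class alone registers nothing.
        def decorator(impl):
            bucket = self._impls.setdefault(key, [])
            bucket.append((priority, impl))
            # Highest priority first.
            bucket.sort(key=lambda pair: pair[0], reverse=True)
            return impl
        return decorator

    def resolve(self, key):
        # Walk the list in priority order, skipping self-disabled entries.
        for _priority, impl in self._impls.get(key, ()):
            if impl.is_available():
                return impl
        raise LookupError(f"no usable implementation for {key!r}")


archivers = PriorityRegistry()


@archivers.register("tar", priority=20)
class BsdTar:
    @staticmethod
    def is_available():
        # Highest priority, but only usable if the binary is on PATH.
        return shutil.which("bsdtar") is not None


@archivers.register("tar", priority=10)
class PythonTarfile:
    @staticmethod
    def is_available():
        return True  # the stdlib tarfile module is always there


chosen = archivers.resolve("tar")
```

Which class chosen ends up being depends on the machine: BsdTar where bsdtar is installed, PythonTarfile otherwise.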
Orthogonal to that is:
- Is this K->V | K->list[V] invariant, always a given for this project's code? Is it something that should be serialized for performance reasons?
- If it's invariant to the code, it can be serialized in its full state. If it's not, how can something register "I exist, load me and I'll tell you what I have"? This isn't a hard requirement as much as a way to think about this. All serializing registries have to have some way to self load if they're authoritative for their domain. If they have that, is there a reason not to support "load this registry at this import path" for registries that are variant, but known to exist?
The above is a dump of what I see as the design goals and constraints. Regardless of how far capabilities are taken, it's implicit in those constraints that we should have registries of registries. Using pkgcheck as an example: all pkgcheck code knows of one common registry for where to get the global list of checks. That registry is composed of the builtin pkgcheck registry and whatever third party registries are loaded from disk.
Why the separation? Because we need to be able to take that pkgcheck internal registry, the one that is just pkgcheck code, and be able to serialize it to disk. Thus registries of registries.
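One way to picture a registry of registries, as a sketch only: the builtin registry stays its own object so it can be serialized alone, while consumers see a combined view. CompositeRegistry, the "key -> path" output format, and the rule that builtin entries win on key clashes are all assumptions, not a decided design.

```python
from collections import ChainMap


class CompositeRegistry:
    """Hypothetical registry of registries."""

    def __init__(self, builtin):
        self.builtin = builtin  # the project's own entries, serializable alone
        self.third_party = []   # registries loaded from elsewhere

    def add(self, registry):
        self.third_party.append(registry)

    def view(self):
        # ChainMap returns the first match, so builtin entries win on clashes
        # (an assumption; the opposite policy is just as easy to express).
        return ChainMap(self.builtin, *self.third_party)

    def serialize_builtin(self):
        # Only the builtin registry is written; third party entries never leak in.
        return "\n".join(f"{k} -> {v}" for k, v in sorted(self.builtin.items()))


checks = CompositeRegistry({"CoreCheck": "project.checks:CoreCheck"})
checks.add({"VendorCheck": "vendor.checks:VendorCheck"})
```

Code that wants every check reads checks.view(); the build step that writes the cache calls checks.serialize_builtin() and is unaffected by whatever third party registries happen to be loaded.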