Alternatives to direct use of Anaconda infrastructure #422

@tsibley

The new-ish Anaconda license terms are untenable for many organizations, including Fred Hutch. Use of licensed resources can be avoided through client configuration (e.g. of conda, mamba, etc.), network-level blocks (e.g. blocking https://conda.anaconda.org), or both. Fred Hutch, for example, has chosen the latter with a two-pronged solution: local mirroring of the conda-forge and bioconda channels, combined with proxied (i.e. controlled) access to https://conda.anaconda.org via https://conda-forge.fredhutch.org. We've been discussing this change internally on Slack.

While Nextstrain CLI does not use licensed resources (e.g. the official Anaconda installer or the official channels), our Conda runner does access the following URLs during nextstrain setup conda and nextstrain update conda:

https://api.anaconda.org/release/{channel}/{package}/{version}
https://api.anaconda.org/package/{channel}/{package}/files
https://api.anaconda.org/download/conda-forge/micromamba/{version}/{subdir}/{filename}
https://conda.anaconda.org/{channel}/{subdir}/…

where by default channel is one of conda-forge, bioconda (only for last URL), or nextstrain and package is one of micromamba or nextstrain-base. The Nextstrain channel and package name can be overridden with the NEXTSTRAIN_CONDA_CHANNEL and NEXTSTRAIN_CONDA_BASE_PACKAGE env vars.
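To make the URL templates above concrete, here is a minimal sketch of how the two API URLs could be assembled from the documented defaults and env var overrides. The function names are hypothetical and are not taken from the Nextstrain CLI source; only the URL shapes, the env var names, and the default values come from the description above.

```python
import os

# Base of the Anaconda API endpoints listed above.
API_BASE = "https://api.anaconda.org"


def nextstrain_channel(environ=None):
    """Return the Nextstrain channel, honoring NEXTSTRAIN_CONDA_CHANNEL."""
    environ = os.environ if environ is None else environ
    return environ.get("NEXTSTRAIN_CONDA_CHANNEL") or "nextstrain"


def nextstrain_base_package(environ=None):
    """Return the base package name, honoring NEXTSTRAIN_CONDA_BASE_PACKAGE."""
    environ = os.environ if environ is None else environ
    return environ.get("NEXTSTRAIN_CONDA_BASE_PACKAGE") or "nextstrain-base"


def release_url(channel, package, version):
    """Hypothetical helper: URL for one release of a package."""
    return f"{API_BASE}/release/{channel}/{package}/{version}"


def files_url(channel, package):
    """Hypothetical helper: URL listing the files of a package."""
    return f"{API_BASE}/package/{channel}/{package}/files"


if __name__ == "__main__":
    # The printed URL depends on the current environment.
    print(files_url(nextstrain_channel(), nextstrain_base_package()))
```

Note that the host is hard-coded in this sketch, which is exactly the part a blocked or proxied network cannot change today.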

The runner also matches against a https://conda.anaconda.org URL to report the channel of some installed packages:

anaconda_channel = re.search(r'^https://conda[.]anaconda[.]org/(?P<repo>.+?)/(?:linux|osx)-64$', channel)
if anaconda_channel:
    channel = anaconda_channel["repo"]
return f"{name} {version} ({build}, {channel})"
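As an illustration of what this match does and does not cover, the excerpt can be wrapped in a small standalone function. The wrapper name `describe`, the package values, and the path layout of the mirror URL are assumptions made for this example; only the regex and the format string come from the excerpt.

```python
import re

# Same pattern as in the excerpt above.
ANACONDA_CHANNEL = re.compile(
    r'^https://conda[.]anaconda[.]org/(?P<repo>.+?)/(?:linux|osx)-64$'
)


def describe(name, version, build, channel):
    """Hypothetical wrapper around the excerpt's channel-shortening logic."""
    anaconda_channel = ANACONDA_CHANNEL.search(channel)
    if anaconda_channel:
        # Shorten a conda.anaconda.org URL to just the channel name.
        channel = anaconda_channel["repo"]
    return f"{name} {version} ({build}, {channel})"


# Illustrative values only, not real package metadata.
print(describe("example", "1.0", "0", "https://conda.anaconda.org/bioconda/linux-64"))
print(describe("example", "1.0", "0", "https://conda-forge.fredhutch.org/bioconda/linux-64"))
```

The first call reports the channel as `bioconda`. The second, which uses an assumed mirror layout, falls through and reports the full URL, so any mirror or alias support would need to loosen this pattern too.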

The api.anaconda.org endpoints save us from downloading large repodata.json files to extract information about single packages.
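For comparison, the repodata.json route would look roughly like the sketch below. It assumes the usual repodata layout, with `packages` and `packages.conda` mappings from filename to a record containing `name`, `version`, and `build`. The function name and the sample data are made up for illustration.

```python
def find_records(repodata, name):
    """Return all records for one package name from a parsed repodata.json.

    Assumes the usual layout: "packages" (.tar.bz2 files) and
    "packages.conda" (.conda files), each mapping filename -> record.
    """
    records = []
    for section in ("packages", "packages.conda"):
        for filename, record in repodata.get(section, {}).items():
            if record.get("name") == name:
                records.append({"filename": filename, **record})
    return records


# Tiny made-up stand-in for a parsed repodata.json.
sample = {
    "info": {"subdir": "linux-64"},
    "packages": {
        "other-2.0-0.tar.bz2": {"name": "other", "version": "2.0", "build": "0"},
    },
    "packages.conda": {
        "example-1.0-0.conda": {"name": "example", "version": "1.0", "build": "0"},
        "example-1.1-0.conda": {"name": "example", "version": "1.1", "build": "0"},
    },
}

for record in find_records(sample, "example"):
    print(record["filename"], record["version"])
```

The lookup itself is simple. The cost is that the whole file for each subdir has to be downloaded and parsed first, which is what the API endpoints let us avoid.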

We should extend the Conda runner to support some configuration/overriding of direct use of Anaconda infrastructure, e.g. an equivalent to Conda's channel_alias. This could be done via env vars or config or both. We'll also need to sort out an equivalent for Anaconda API access, or some other solution there.
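One possible shape for a channel_alias equivalent, as a rough sketch only: resolve bare channel names against an overridable base URL and pass full URLs through untouched. The env var name `NEXTSTRAIN_CONDA_CHANNEL_ALIAS` and the function `channel_url` are hypothetical and do not exist in Nextstrain CLI today.

```python
import os

# Current hard-coded host for channel access.
DEFAULT_CHANNEL_ALIAS = "https://conda.anaconda.org"


def channel_url(channel, subdir, environ=None):
    """Resolve a channel name or URL to a subdir URL.

    NEXTSTRAIN_CONDA_CHANNEL_ALIAS is a hypothetical env var used here
    only to illustrate the idea; it is not an existing setting.
    """
    environ = os.environ if environ is None else environ
    alias = environ.get("NEXTSTRAIN_CONDA_CHANNEL_ALIAS") or DEFAULT_CHANNEL_ALIAS

    if "://" in channel:
        # Already a full URL: use it as-is rather than applying the alias.
        return f"{channel.rstrip('/')}/{subdir}"

    return f"{alias.rstrip('/')}/{channel}/{subdir}"


if __name__ == "__main__":
    mirror = {"NEXTSTRAIN_CONDA_CHANNEL_ALIAS": "https://conda-forge.fredhutch.org"}
    print(channel_url("conda-forge", "linux-64", {}))
    print(channel_url("conda-forge", "linux-64", mirror))
```

This only covers the https://conda.anaconda.org/{channel}/{subdir}/… URLs. The api.anaconda.org lookups would still need their own override or a replacement.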
