@AngieHinrichs

The UCSC Genome Browser code base includes a network library with an interface analogous to knetfile.c's, but with a few extra features (HTTPS, username/password authentication, sparse-file caching of complete URL paths). To use UCSC's network library within samtools, I have added a layer of indirection to the knetfile functions. An alternate implementation of the knetfile functions can be registered by passing in function pointers. Then, when a knetfile function is called and an alternate function pointer has been registered, the knetfile function calls the alternate function and returns its result. The new code is all inside "#ifdef KNETFILE_HOOKS", so if -DKNETFILE_HOOKS is omitted, samtools is compiled without the new layer.
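To illustrate the indirection described above, here is a minimal, self-contained sketch of the pattern. It is not the code from the patch: every name in it (`demo_file`, `demo_hooks`, `demo_set_hooks`, `demo_open`, `demo_read`, `alt_open`, `DEMO_HOOKS`) is hypothetical, and in-memory strings stand in for real network I/O.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the patch's actual API.
 * DEMO_HOOKS would normally be supplied on the command line (-DDEMO_HOOKS). */
#define DEMO_HOOKS 1

typedef struct { const char *data; size_t len, pos; } demo_file;

/* Table of alternate implementations; a NULL entry means "use the built-in". */
typedef struct {
    demo_file *(*open)(const char *url);
    size_t (*read)(demo_file *fp, void *buf, size_t len);
} demo_hooks;

static demo_hooks registered;  /* all NULL until demo_set_hooks() is called */
static demo_file slot;         /* a single handle is enough for a sketch */

/* Register alternate functions; passing NULL clears them again. */
void demo_set_hooks(const demo_hooks *h) {
    registered = h ? *h : (demo_hooks){0};
}

/* Point the single handle at an in-memory string (stands in for a URL fetch). */
static demo_file *open_string(const char *s) {
    slot.data = s;
    slot.len = strlen(s);
    slot.pos = 0;
    return &slot;
}

/* Stand-in for another library's open routine. */
demo_file *alt_open(const char *url) {
    (void)url;
    return open_string("alternate");
}

demo_file *demo_open(const char *url) {
#ifdef DEMO_HOOKS
    if (registered.open) return registered.open(url);  /* delegate and return */
#endif
    (void)url;
    return open_string("builtin");
}

size_t demo_read(demo_file *fp, void *buf, size_t len) {
#ifdef DEMO_HOOKS
    if (registered.read) return registered.read(fp, buf, len);
#endif
    /* Built-in path: copy up to len bytes and advance the position. */
    size_t n = fp->len - fp->pos;
    if (n > len) n = len;
    memcpy(buf, fp->data + fp->pos, n);
    fp->pos += n;
    return n;
}
```

In this sketch, calling `demo_set_hooks` with a table whose `open` member is `alt_open` makes every later `demo_open` call go to the alternate routine, while a NULL `read` member leaves the built-in read path in place. Without the `DEMO_HOOKS` define, the registration table is simply never consulted.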

Adding the hooks required a little extra abstraction in a couple of other places in the code. For example, when KNETFILE_HOOKS is defined, bam_index_load_core accesses the index file through knetfile instead of saving it to the local directory and reading it with fread.

This patch also #ifdef 0's out the EOF check in bam.c, because I considered it costly & unnecessary for the Genome Browser's purposes -- I'd understand if you want to exclude that part. A separate #ifdef for it would be nice, though.
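As a rough idea of what a separate switch for the EOF check could look like, here is a sketch that operates on a local `FILE *`. The macro `DEMO_SKIP_EOF_CHECK` and the function `demo_check_eof` are hypothetical names, not part of samtools or of this patch. The 28-byte marker is the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker.

```c
#include <stdio.h>
#include <string.h>

/* The 28-byte empty BGZF block that marks the end of a complete BAM file. */
static const unsigned char BGZF_EOF_MARKER[28] = {
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff,
    0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Hypothetical helper: returns 1 if the marker is present, 0 if it is
 * absent, and -1 if the stream cannot be positioned for the check.
 * Building with -DDEMO_SKIP_EOF_CHECK (a made-up macro) compiles the
 * check out, so no extra access to the file is made. */
int demo_check_eof(FILE *fp) {
#ifdef DEMO_SKIP_EOF_CHECK
    (void)fp;
    return 1;  /* check disabled: assume the file is complete */
#else
    unsigned char tail[28];
    long here = ftell(fp);  /* remember the caller's position */
    if (here < 0 || fseek(fp, -28L, SEEK_END) != 0)
        return -1;
    size_t n = fread(tail, 1, sizeof tail, fp);
    fseek(fp, here, SEEK_SET);  /* restore the caller's position */
    return n == sizeof tail &&
           memcmp(tail, BGZF_EOF_MARKER, sizeof tail) == 0;
#endif
}
```

With a dedicated macro like this, the extra seek and read could be turned off independently of the hooks, which is the cost the Genome Browser wants to avoid for remote files.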

This patch has been in use for over a year in the UCSC Genome Browser and several of our mirror sites. I have provided versions of the patch with different line numbers for different samtools releases, along with instructions for applying the patch, here: http://genomewiki.ucsc.edu/index.php/KNETFILE_HOOKS . Some mirror maintainers have asked that I submit this, in hopes they won't have to manually patch every time they get the latest version of samtools.

I hope you'll consider incorporating this into samtools. Thanks!

1. knetfile can become a wrapper on another network library, for example
   the UCSC Genome Browser's network code which also supports https and
   sparse-file local caching of data.
2. The EOF check is #ifdef'd out because it costs another access,
   prints to stderr when it can't seek, and I don't consider it
   necessary.  Hmmm, I should have added an option for it.