@jfinkhaeuser
Contributor

…llow extending the XML_RESOURCE_ORG_MAP. This permits extending the list of sources, and would also allow for additional reference schemas.

The motivation is, for example, to better reference documents that are specified with identifiers other than DOI, or according to schemata from specific standards organizations. One example would be supporting the ARK identifier scheme.

One issue with ARK is that it's a federated system, and not every ARK resolver will yield bibxml-compatible metadata -- and different implementations might do so via different inflections (their term for e.g. a ?bibxml or similar suffix). As a result, hardcoding support would hardcode assumptions about the resolver, which is unlikely to be a smart move.

This mechanism would leave room for experimentation, or for adjusting reference sources to more local needs.

Example implementation of an extension.rb:

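module XmlResourceOrgMapExtension
  # Adds one custom source to the map and returns the updated map.
  # TTL_CONSTANT and BASE_URL are assumed to be defined elsewhere.
  def update_map(map)
    map["my_addition"] = [
      "bibxml", TTL_CONSTANT, false,
      ->(fn, n) {
        [
          name = "naming-scheme.#{"%04d" % n.to_i}.xml",
          "#{BASE_URL}/#{name}"
        ]
      },
      true
    ]
    map
  end
  module_function :update_map
end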
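
For illustration, here is a rough sketch of how such a hook could be applied to a base map. The `apply_extensions` helper, the `base_map` contents, and the values of `TTL_CONSTANT` and `BASE_URL` are all made up for this example; none of them are taken from kramdown-rfc.

```ruby
# Hypothetical stand-ins; the real values would come from kramdown-rfc.
TTL_CONSTANT = 86_400
BASE_URL = "https://example.org/refs"

module XmlResourceOrgMapExtension
  def update_map(map)
    map["my_addition"] = [
      "bibxml", TTL_CONSTANT, false,
      ->(fn, n) {
        [
          name = "naming-scheme.#{"%04d" % n.to_i}.xml",
          "#{BASE_URL}/#{name}"
        ]
      },
      true
    ]
    map
  end
  module_function :update_map
end

# Hypothetical helper: pass a copy of the base map through each extension,
# so the original map is left untouched.
def apply_extensions(base_map, extensions)
  extensions.reduce(base_map.dup) { |map, ext| ext.update_map(map) }
end

base_map = { "RFC" => ["bibxml", TTL_CONSTANT, false, nil, true] } # placeholder entry
extended = apply_extensions(base_map, [XmlResourceOrgMapExtension])

# Call the lambda stored in the new entry to get filename and URL.
name, url = extended["my_addition"][3].call("ignored", "42")
```

The extension only ever sees the map it is handed, which is the point of calling a function rather than patching the constant in place.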

@jfinkhaeuser
Contributor Author

Do you have any thoughts on this? Should I use a different approach?

I wanted to have something of an interface (i.e. call a function rather than monkey patch stuff), because that's a cleaner approach. But it's not as if I'm married to this particular patch. I just would like to be able to add more sources for references.

FWIW, I can also add a test case, docs, etc. I just prefer to start small to make an initial review easier.

@cabo
Owner

cabo commented Mar 27, 2025

I haven't had much time to work on this, but of course your PR triggered some background processing in my brain.
You can easily do another gem that simply requires the existing kramdown-rfc* gems, does some monkey patching, and then runs the actual work.
So my (those background processes') attention went to what might be a good interface to provide to make this more usable (and less brittle!).
I certainly don't want to have a sea of monkey patches out there that break each time I change a minute implementation detail...

@cabo
Owner

cabo commented Mar 27, 2025

BTW, you may want to look at the way cbor-diag handles application-extensions for what those background processes had in mind...

@jfinkhaeuser
Contributor Author

Well...

No, exactly. A sea of monkey patches is exactly what I'd like to avoid.

Ruby gives a few options (though it's been some years since I used it!), most of which do seem to involve some kind of monkey patching magic that is super useful for getting started, and will be rubbish to maintain over time.

For this particular use case, my personal preferred solution would actually be more data-driven. I'd like to be able to provide a config file that basically takes identifier prefixes as keys, and maps them to URL, filename, TTL and so forth, where the URL/filename can contain placeholders for the full identifier (and/or the prefix, but there'll be little else usable).

There are two downsides. The first is that it's a little more limited than what you're doing in the lambdas for the I-D, STD and BCP bits. But it's fine for those to stay hardcoded as-is, and have the config only add to the map (as long as prefixes don't clash). The extension mechanism doesn't have to be as powerful as what's there already.

The other downside is that suddenly having configuration outside of the YAML metadata section of markdown files has implications. Should this go into that metadata section, then? That's not very reusable if the same resource patterns are used across multiple documents.
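
To make the data-driven idea a bit more concrete, here is a minimal sketch of prefix-keyed entries with placeholder expansion. The `SOURCES` hash, the `EX` prefix, the `#{prefix}`/`#{bibref}` placeholder names, the default TTL and the default filename pattern are assumptions for illustration only.

```ruby
# Hypothetical configuration: identifier prefix => source settings.
# Single quotes keep the placeholders literal instead of interpolating them.
SOURCES = {
  "EX" => {
    "url" => 'https://resolver.example/#{bibref}?bibxml',
    "ttl" => 3600
  }
}.freeze

DEFAULT_TTL = 86_400                                    # assumed default, in seconds
DEFAULT_FILENAME = 'reference.#{prefix}.#{bibref}.xml'  # assumed default pattern

# Replace the literal placeholders; block form avoids backslash
# surprises in the replacement text.
def expand(template, prefix, bibref)
  template.gsub('#{prefix}') { prefix }.gsub('#{bibref}') { bibref }
end

# Returns URL, filename and TTL for a prefix, or nil if the prefix is
# unknown (the caller would then fall back to the hardcoded map).
def lookup_source(sources, prefix, bibref)
  entry = sources[prefix]
  return nil unless entry

  {
    url: expand(entry.fetch("url"), prefix, bibref),
    filename: expand(entry.fetch("filename", DEFAULT_FILENAME), prefix, bibref),
    ttl: entry.fetch("ttl", DEFAULT_TTL)
  }
end

result = lookup_source(SOURCES, "EX", "1234")
```

This is deliberately less powerful than the existing lambdas: it can substitute values, but not reformat them.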

TL;DR, I wanted to start a conversation here, not commit time to a path that you end up not liking.

@jfinkhaeuser
Contributor Author

So, current thinking:

  • In the initial metadata section, support configuration of citation sources.
  • One configuration option would be to refer to an external file. This then becomes a choice by the document author rather than a choice the tooling imposes.
  • In-document specifications override the external file (going from least to most specific).

e.g.

---
coding: utf-8
title: "stuff"
category: info
submissiontype: independent
...
bibtags:
  from:
  - path/to/definition.yaml
  - path/to/refcache-like/directory/
  sources:
    PREFIX:
      url: https://something/that/can/reference/#{bibref}
      filename: pattern-with-#{bibref} # optional, defaults to see below
      ttl: 123 # optional, defaults to built-in 
      rewrite_anchor: # optional, defaults to - what do you prefer?

If the from entry is a file, it is treated as YAML that needs to contain the same bibtags structure. I'd disallow further from fields there to avoid having to handle recursive includes, but I can deal with that as well.
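
A minimal sketch of that merge order, assuming the external file repeats the top-level `bibtags` key and that an in-document entry replaces a whole per-prefix entry rather than individual settings. `collect_sources` and the example URLs are hypothetical.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper: read sources from 'from' files first, then let the
# in-document sources override them (least to most specific).
def collect_sources(bibtags, base_dir: ".")
  sources = {}
  Array(bibtags["from"]).each do |path|
    full = File.expand_path(path, base_dir)
    next unless File.file?(full) # directories would be handled elsewhere

    external = YAML.safe_load(File.read(full)) || {}
    # Assumption: the file repeats the top-level 'bibtags' key.
    # Nested 'from' entries are ignored, so there is no recursion.
    sources.merge!(external.dig("bibtags", "sources") || {})
  end
  sources.merge(bibtags["sources"] || {})
end

merged = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "definition.yaml"), <<~YAML)
    bibtags:
      sources:
        FOO:
          url: https://example.org/foo/from-file
          ttl: 10
        BAR:
          url: https://example.org/bar/from-file
  YAML
  in_document = {
    "from" => ["definition.yaml"],
    "sources" => { "FOO" => { "url" => "https://example.org/foo/in-document" } }
  }
  collect_sources(in_document, base_dir: dir)
end
```

With a shallow merge like this, the in-document `FOO` entry drops the `ttl` from the file; a per-setting merge would be a possible alternative.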

The second type of path would be a directory. It would be a minor addition. When you're looking into the map in bibtagsys, you could also check this directory for a file that contains the bib reference in its name. One could additionally require it to end in .xml, but it doesn't hurt to glob IMHO. It'd be up to someone using such directories to ensure file names are sufficiently unique with respect to their reference tags; more than one match would be treated the same as no match (albeit with a better error message).
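
The directory lookup could be sketched roughly as follows. `local_reference` is a hypothetical name, and the sketch uses a substring match on `Dir.children` in place of a glob, so that characters such as `*` or `[` in a tag are not interpreted as patterns.

```ruby
# Hypothetical helper: find a local reference file whose name contains the
# bib reference. Exactly one match wins; anything else counts as no match.
def local_reference(dir, bibref)
  matches = Dir.children(dir)
               .select { |name| name.include?(bibref) }
               .map { |name| File.join(dir, name) }
               .select { |path| File.file?(path) }
               .sort

  case matches.size
  when 1
    matches.first
  when 0
    nil
  else
    # Same outcome as no match, but say why.
    warn "*** #{bibref}: ambiguous local reference: #{matches.join(', ')}"
    nil
  end
end
```

Calling `local_reference("path/to/refcache-like/directory/", "FOO.1")` would then return a path only if exactly one file name contains `FOO.1`.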

Anyway, that addition came to mind because it's become necessary to stop relying on some external reference sources and make local copies of those reference files pointing e.g. to archive.org or some such.

IMHO that is the better approach. It'd be contained in bibtagsys (pretty much), about as extensible as this patch, and doesn't involve a bunch of monkey-patched code.

The only question mark I have is that you also look in the map in convert_img. It would make sense to encapsulate access to the map in a function that's used in both places, though I do not think it's strictly necessary here.

Thoughts?

  both command.rb and kramdown-rfc2629.rb into resources.rb
- Modify both sources to use resources.rb

One problem is that it's easy to add this functionality to the
converter, and to (partially) duplicate it as before, but it creates
diverging code paths.

The approach in resources.rb moves the code paths back together:
- Functionality is in a module. The module also keeps some shared state.
- A class is offered that imports the module. This is used in
  command.rb, which is more of a script - the additional object being
  passed around does not hurt much.
- The module is also included in the converter class - if shared state has
  been initialized by a previous class instance in command.rb, the
  converter can make use of it. Otherwise, it just goes through the
  previous code paths.
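
A stripped-down sketch of that arrangement, with made-up names (`Resources`, `ResourceConfig`, `Converter`, `resource_sources`); the actual resources.rb may look quite different.

```ruby
# Hypothetical module: holds the functionality plus some shared state.
module Resources
  @shared = nil

  class << self
    attr_accessor :shared
  end

  # Falls back to an empty hash when nothing has been initialized,
  # i.e. callers simply continue on the previous code paths.
  def resource_sources
    Resources.shared || {}
  end
end

# Used from the script side: initializes the shared state.
class ResourceConfig
  include Resources

  def initialize(sources)
    Resources.shared = sources
  end
end

# Stand-in for the converter class: only reads the shared state.
class Converter
  include Resources
end

converter = Converter.new
before = converter.resource_sources  # nothing initialized yet
ResourceConfig.new({ "FOO" => { "url" => "https://example.org/foo" } })
after = converter.resource_sources   # now sees the configured sources
```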

- Forgot to merge the in-document sources into the source hash

'from' entries completely override other source configurations, i.e.
also for e.g. TTL settings. Now we return a local path a) if no source
is found, using the default TTL etc., and b) if both a source and a
path are found, in which case the source settings *are* applied, but
the local path is used.
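
As a rough sketch of that rule, with a hypothetical `resolve_reference` helper and an assumed default TTL (the case with a source but no local path is assumed to use the source URL as before):

```ruby
DEFAULT_SETTINGS = { ttl: 86_400 }.freeze # assumed default, in seconds

# source:     settings hash for the matching prefix, or nil if none is found
# local_path: path found via a 'from' directory, or nil if none is found
def resolve_reference(source, local_path)
  return nil if source.nil? && local_path.nil?

  # Source settings (if any) are applied on top of the defaults ...
  settings = DEFAULT_SETTINGS.merge(source || {})
  # ... but a local path, when present, wins over the source URL.
  settings.merge(location: local_path || settings[:url])
end

only_local = resolve_reference(nil, "refs/foo.xml")
both = resolve_reference({ url: "https://example.org/foo", ttl: 60 }, "refs/foo.xml")
```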
@jfinkhaeuser
Contributor Author

Alright, I just made those changes. It moves a lot of code, which I'm not sure you'll be happy with - but IMHO that separates concerns better, and avoids some duplication. It seems to work well for the examples, plus the example I added.

@cabo
Owner

cabo commented May 6, 2025
