Rather than override the XML_RESOURCE_ORG_HOST and/or various TTLs, a… #255

jfinkhaeuser · 2025-03-21T09:57:24Z

…llow extending the XML_RESOURCE_ORG_MAP. This permits extending the list of sources, and would also allow for additional reference schemas.

The motivation is, for example, to better reference documents that might be specified with identifiers other than DOI, or according to schemata from specific standards organizations. One example would be supporting the ARK identifier scheme.

One issue with ARK is that it's a federated system, and not every ARK resolver will yield bibxml-compatible metadata -- and different implementations might do so via different inflections (their term for e.g. a ?bibxml or similar suffix). As a result, hardcoding support would hardcode assumptions about the resolver, which is unlikely to be a smart move.

This mechanism would leave room for experimentation, or for adjusting reference sources to more local needs.

Example implementation of an extension.rb:

module XmlResourceOrgMapExtension

  # Receives the resource map, adds one entry, and returns the map.
  def update_map(map)
    map["my_addition"] = [
      "bibxml", TTL_CONSTANT, false,
      # Maps a reference number to [file name, URL].
      ->(fn, n) {
        name = "naming-scheme.#{"%04d" % n.to_i}.xml"
        [name, "#{BASE_URL}/#{name}"]
      },
      true
    ]
    map
  end
  module_function :update_map

end
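
To make the shape of such a hook a little more concrete, here is a minimal, self-contained sketch of how a caller might apply an extension to a map and then use the new entry. Everything in it is a stand-in: `DemoExtension`, `DEMO_TTL`, `DEMO_BASE_URL` and `DEMO_MAP` are hypothetical names, and the entry layout simply mirrors the example above rather than claiming to be the real map format.

```ruby
# Hypothetical stand-ins; these are not kramdown-rfc's actual constants.
DEMO_TTL = 7 * 24 * 3600
DEMO_BASE_URL = "https://refs.example.org"

module DemoExtension
  # Same shape as the extension.rb example: add one entry, return the map.
  def update_map(map)
    map["my_addition"] = [
      "bibxml", DEMO_TTL, false,
      ->(fn, n) {
        name = "naming-scheme.#{"%04d" % n.to_i}.xml"
        [name, "#{DEMO_BASE_URL}/#{name}"]
      },
      true
    ]
    map
  end
  module_function :update_map
end

# A caller would pass its built-in map; an empty hash stands in for it here.
DEMO_MAP = DemoExtension.update_map({}).freeze

# The lambda sits at index 3 of the entry in this sketch.
locate = DEMO_MAP.fetch("my_addition")[3]
name, url = locate.call(nil, "42")
puts name # naming-scheme.0042.xml
puts url  # https://refs.example.org/naming-scheme.0042.xml
```

The point of the function-call interface is that the extension only ever sees the map it is handed, so the caller stays free to change where and how that map is stored.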

jfinkhaeuser · 2025-03-27T07:13:32Z

Do you have any thoughts on this? Should I use a different approach?

I wanted to have something of an interface (i.e. call a function rather than monkey patch stuff), because that's a cleaner approach. But it's not as if I'm married to this particular patch. I just would like to be able to add more sources for references.

FWIW, I can also add a test case, docs, etc. I just prefer to start small to make an initial review easier.

cabo · 2025-03-27T11:46:16Z

I didn't have too much time working on this, but of course your PR triggered some background processing in my brain.
You can easily do another gem that simply requires the existing kramdown-rfc* gems, does some monkey patching, and then runs the actual work.
So my (those background processes') attention went to what might be a good interface to provide to make this more usable (and less brittle!).
I certainly don't want to have a sea of monkey patches out there that break each time I change a minute implementation detail...

cabo · 2025-03-27T11:47:38Z

BTW, you may want to look at the way cbor-diag handles application-extensions for what those background processes had in mind...

jfinkhaeuser · 2025-03-28T19:19:43Z

Well...

No, exactly. A sea of monkey patches is exactly what I'd like to avoid.

Ruby gives a few options (though it's been some years since I used it!), most of which do seem to involve some kind of monkey patching magic that is super useful for getting started, and will be rubbish to maintain over time.

For this particular use case, my personal preferred solution would actually be more data-driven. I'd like to be able to provide a config file that basically takes identifier prefixes as keys, and maps them to URL, filename, TTL and so forth, where the URL/filename can contain placeholders for the full identifier (and/or the prefix, but there'll be little else usable).

One downside is that this is a little more limited than what you're doing in the lambdas for the I-D, STD and BCP bits. But it's fine for those to stay hardcoded as-is, and have the config only add to the map (as long as prefixes don't clash). The extension mechanism doesn't have to be as powerful as what's there already.

The other downside is that suddenly having configuration outside of the YAML metadata section of markdown files has implications. Should this go into that metadata section, then? That's not very reusable if the same resource patterns are used across multiple documents.

TL;DR, I wanted to start a conversation here, not commit time to a path that you end up not liking.

jfinkhaeuser · 2025-04-08T10:28:52Z

So, current thinking:

- In the initial metadata section, support configuration of citation sources.
- One configuration option would be to refer to an external file. This then becomes a choice by the document author rather than a choice the tooling imposes.
- In-document specifications override the external file (going from least to most specific).

e.g.

---
coding: utf-8
title: "stuff"
category: info
submissiontype: independent
...
bibtags:
  from:
  - path/to/definition.yaml
  - path/to/refcache-like/directory/
  sources:
    PREFIX:
      url: https://something/that/can/reference/#{bibref}
      filename: pattern-with-#{bibref} # optional, defaults to see below
      ttl: 123 # optional, defaults to built-in 
      rewrite_anchor: # optional, defaults to - what do you prefer?
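
As a rough illustration of how the `url` and `filename` templates could be expanded, here is a small sketch. It assumes the placeholder is the literal text `#{bibref}` and that the bibref part of a reference tag is already known; `DEMO_CONFIG` and `expand_template` are hypothetical names, not part of kramdown-rfc.

```ruby
require "yaml"

# Same shape as the bibtags example above. The heredoc is single-quoted so
# that Ruby does not try to interpolate the placeholder itself.
DEMO_CONFIG = YAML.safe_load(<<~'YAML')
  bibtags:
    sources:
      PREFIX:
        url: https://something/that/can/reference/#{bibref}
        filename: pattern-with-#{bibref}
        ttl: 123
YAML

# Hypothetical helper: replace the literal placeholder with the reference.
def expand_template(template, bibref)
  template.gsub('#{bibref}') { bibref }
end

source = DEMO_CONFIG.dig("bibtags", "sources", "PREFIX")
puts expand_template(source["url"], "1234")      # https://something/that/can/reference/1234
puts expand_template(source["filename"], "1234") # pattern-with-1234
puts source["ttl"]                               # 123
```

Note that the `#` in these values is not read as a YAML comment because it is not preceded by whitespace; a template that starts with the placeholder would need quoting.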

If the from entry is a file, it is treated as YAML which needs to contain the same bibtags structure. I'd disallow further from fields in such files to avoid having to handle recursive includes, but I can deal with that as well.
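
The file case could be sketched roughly as follows. This is only an illustration under assumptions: `merged_sources` is a hypothetical helper, nested `from` keys are rejected outright, and an in-document entry replaces an external entry with the same prefix wholesale (a per-setting merge would be an equally plausible reading).

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper: external files first, in-document sources last,
# so the most specific definition wins.
def merged_sources(bibtags)
  sources = {}
  Array(bibtags["from"]).each do |path|
    next unless File.file?(path) # directories are a separate lookup
    external = (YAML.safe_load(File.read(path)) || {}).fetch("bibtags", {})
    raise ArgumentError, "#{path}: nested 'from' is not allowed" if external.key?("from")
    sources.merge!(external.fetch("sources", {}))
  end
  sources.merge(bibtags.fetch("sources", {}))
end

# Demo with a throwaway external definition file.
def demo_merge
  Dir.mktmpdir do |dir|
    file = File.join(dir, "definition.yaml")
    File.write(file, YAML.dump({
      "bibtags" => { "sources" => {
        "ARK" => { "url" => "https://external.example/ark", "ttl" => 10 },
        "FOO" => { "url" => "https://external.example/foo" }
      } }
    }))
    merged_sources(
      "from" => [file],
      "sources" => { "ARK" => { "url" => "https://local.example/ark" } }
    )
  end
end

p demo_merge
```

Here the in-document `ARK` entry wins over the external one, while `FOO` is taken from the file unchanged.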

The second type of path would be a directory. It would be a minor addition. When bibtagsys looks into the map, it could also check this directory for a file that contains the bib reference in its name. One could additionally require it to end in .xml, but it doesn't hurt to glob IMHO. It'd be up to someone using such directories to ensure file names are sufficiently unique with respect to their reference tags; more than one match would be treated the same as no match (albeit with a better error message).
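
A directory lookup along those lines might look like this sketch. `local_reference` is a hypothetical helper; it uses a plain substring match on file names instead of `Dir.glob`, which is an assumption made here to sidestep escaping glob metacharacters in reference tags.

```ruby
require "tmpdir"

# Hypothetical helper: return the single file in dir whose name contains
# bibref. No match and an ambiguous match both yield nil, but the
# ambiguous case says why.
def local_reference(dir, bibref)
  matches = Dir.children(dir).select { |name| name.include?(bibref) }.sort
  case matches.size
  when 1 then File.join(dir, matches.first)
  when 0 then nil
  else
    warn "#{bibref}: ambiguous local reference (#{matches.join(', ')})"
    nil
  end
end

# Demo with a throwaway directory; returns base names (or nil) per lookup.
def demo_lookup
  Dir.mktmpdir do |dir|
    %w[reference.RFC.9000.xml reference.ARK.12345.xml reference.ARK.12345.bak].each do |name|
      File.write(File.join(dir, name), "<reference/>")
    end
    %w[RFC.9000 ARK.12345 DOI.none].map do |bibref|
      path = local_reference(dir, bibref)
      path && File.basename(path)
    end
  end
end

p demo_lookup # ["reference.RFC.9000.xml", nil, nil]
```

The second lookup fails because two files match, which is exactly the case where requiring an .xml suffix would have helped.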

Anyway, that addition came to mind because it's become necessary to stop relying on some external reference sources and make local copies of those reference files pointing e.g. to archive.org or some such.

IMHO that is the better approach. It'd be contained in bibtagsys (pretty much), about as extensible as this patch, and doesn't involve a bunch of monkey-patched code.

The only question mark I have is that you also look in the map in convert_img. It would make sense to encapsulate access to the map in a function that's used in both places, though I do not think it's strictly necessary here.

Thoughts?

both command.rb and kramdown-rfc2629.rb into resources.rb
- Modify both sources to use resources.rb

One problem is that it's easy to add this functionality to the converter, and to (partially) duplicate it as before, but it creates diverging code paths. The approach in resources.rb moves the code paths back together:

- Functionality is in a module. The module also keeps some shared state.
- A class is offered that imports the module. This is used in command.rb, which is more of a script - the additional object being passed around does not hurt much.
- The module is also included in the converter class - if shared state has been initialized by a previous class instance in command.rb, the converter can make use of it. Otherwise, it just goes through the previous code paths.
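
The module-with-shared-state arrangement described in this commit message can be pictured with the following sketch. It is not the code from resources.rb; `DemoResources`, `DemoResourceLoader` and `DemoConverter` are hypothetical names chosen only to show the pattern.

```ruby
# Hypothetical module: holds the behaviour plus one piece of shared state.
module DemoResources
  @shared = nil
  class << self
    attr_accessor :shared
  end

  # Called from the script side to initialise the shared state.
  def configure_sources(sources)
    DemoResources.shared = sources
  end

  # Used from both sides; returns nil when nothing was configured.
  def source_for(prefix)
    (DemoResources.shared || {})[prefix]
  end
end

# Thin class for script-style callers that need an object to pass around.
class DemoResourceLoader
  include DemoResources
end

# Stand-in for the converter: uses shared state when present and falls
# back to the previous (built-in) path otherwise.
class DemoConverter
  include DemoResources

  def resolve(prefix)
    source_for(prefix) || :builtin
  end
end

converter = DemoConverter.new
p converter.resolve("ARK") # :builtin, nothing configured yet

DemoResourceLoader.new.configure_sources("ARK" => { "ttl" => 60 })
p converter.resolve("ARK") # the configured source hash
```

Keeping the state on the module rather than on either class is what lets a converter instance pick up configuration set by an earlier, unrelated object.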

- Forgot to merge the in-document sources into the source hash

'from' entries completely override other source configurations, including e.g. TTL settings. Now we return a local path a) if no source is found, in which case the default TTL etc. are used, and b) if both a source and a path are found, in which case settings *are* applied, but the local path is used.
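
The two cases in this commit message can be written down as a small decision function. This is a sketch of the described behaviour only, with hypothetical names (`resolve_reference`, `DEMO_DEFAULT_TTL`) and a made-up default TTL value.

```ruby
DEMO_DEFAULT_TTL = 3600 # hypothetical default, in seconds

# source:     configured settings for the prefix, or nil
# local_path: file found via a 'from' directory, or nil
def resolve_reference(source, local_path)
  ttl = source ? source.fetch("ttl", DEMO_DEFAULT_TTL) : DEMO_DEFAULT_TTL
  if local_path
    # Cases a) and b): the local path is used; settings apply only if a
    # source was found as well.
    { location: local_path, ttl: ttl }
  elsif source
    # No local file: fetch from the configured URL.
    { location: source["url"], ttl: ttl }
  end
  # Neither found: nil, so the caller can fall through to built-in handling.
end

src = { "url" => "https://refs.example.org/x.xml", "ttl" => 60 }
p resolve_reference(nil, "refs/x.xml") # local path, default TTL
p resolve_reference(src, "refs/x.xml") # local path, TTL from the source
p resolve_reference(src, nil)          # URL and TTL from the source
p resolve_reference(nil, nil)          # nil
```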

jfinkhaeuser · 2025-04-09T15:25:15Z

Alright, I just made those changes. It moves a lot of code, which I'm not sure you'll be happy with - but IMHO that separates concerns better, and avoids some duplication. It seems to work well for the examples, plus the example I added.

cabo · 2025-05-06T18:36:37Z

Clearly, something is needed here:

https://mailarchive.ietf.org/arch/msg/rfc-markdown/V44TuK4YRRzhFu3ewoZM2g9qLIc

Rather than override the XML_RESOURCE_ORG_HOST and/or various TTLs, a…

e99f67c

…llow extending the XML_RESOURCE_ORG_MAP. This permits extending the list of sources, and would also allow for additional reference schemas.

jfinkhaeuser added 8 commits April 9, 2025 16:07

Examples for configuring citation sources

929999f

- DRY the filename and url templating

3a777cf

- Forgot to merge the in-document sources into the source hash

Intentionally short TTL

dfaf3b7

Merge branch 'reference-sources' of github.com:jfinkhaeuser/kramdown-…

2d8f5a9

…rfc into reference-sources

Merge branch 'cabo:master' into reference-sources

0149786

Merge branch 'reference-sources' of github.com:jfinkhaeuser/kramdown-…

3d2f60c

…rfc into reference-sources

We should use our own module, since the method moved here.

66edad2
