Skip to content

Conversation

@PeterKraus
Copy link
Contributor

@PeterKraus PeterKraus commented Mar 26, 2025

Addresses #10 by implementing

  • datatractor beam input_type input_path - install & extract from file
  • datatractor probe input_type - return registered Extractors for that input_type
  • datatractor yard extractor_id - return extractor definition from registry
  • datatractor install extractor_id - install Extractor (optionally into venv)

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will dig more into this later, just one comment for now!

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of minor comments!

Comment on lines 215 to 220
try:
ret = func(**vars(args))
if ret is not None and len(ret) > 0:
print(ret)
except RuntimeError as e:
print(e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the CLI, we should try to return valid JSON rather than printing Python dictionaries so it can be piped into other tools, e.g.,

Suggested change
try:
ret = func(**vars(args))
if ret is not None and len(ret) > 0:
print(ret)
except RuntimeError as e:
print(e)
try:
ret = func(**vars(args))
if ret is not None and len(ret) > 0:
print(json.dumps(ret))
except RuntimeError as e:
print(e)

This might need some better error handling though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's true, I forgot a printed python dict is not json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a guarded JSON dump, also when using probe you now get e.g. {"id": "biologic-mpr", "registered_extractors": ["yadg", "galvani"]} back rather than just the list so its valid JSON.

Copy link
Contributor Author

@PeterKraus PeterKraus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (Cannot approve my own PR)

@ml-evs ml-evs changed the title A new cli: A new datatractor CLI Apr 9, 2025
@ml-evs ml-evs merged commit 11c5af1 into main Apr 9, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants