Skip to content

Comments

Litcoin pipeline integration including helm chart for k8s deployment#291

Merged
hyi merged 12 commits intomasterfrom
litcoin-pipeline-integration
May 20, 2025
Merged

Litcoin pipeline integration including helm chart for k8s deployment#291
hyi merged 12 commits intomasterfrom
litcoin-pipeline-integration

Conversation

@hyi
Copy link
Contributor

@hyi hyi commented Mar 11, 2025

@EvanDietzMorris I am creating this PR for your perusal to get your inputs sooner rather than later - no need to get this PR merged until the graph building hanging issue specific to k8s deployment gets resolved.

@hyi hyi requested a review from EvanDietzMorris March 11, 2025 17:43
@hyi
Copy link
Contributor Author

hyi commented Mar 12, 2025

@EvanDietzMorris I have addressed the status inconsistency in the get-task-status API endpoint by improving the interaction between the subprocess run and celery worker, and also addressed the hanging issue due to insufficient memory for the worker. So I think this PR is ready to be merged as the first pass implementation dependent upon your code review and approval.

@hyi hyi changed the title Litcoin pipeline integration helm chart for k8s deployment Litcoin pipeline integration including helm chart for k8s deployment May 16, 2025
@hyi
Copy link
Contributor Author

hyi commented May 16, 2025

@EvanDietzMorris This PR is now ready for review. It includes all the work needed for the LitCoin pipeline integration, specifically, predicate mapping now integrated in the loadLitCoin parser, adding an environmental variable to indicate loading LitCoin input data from a shared directory or a shared PVC path rather than the default stars server URL, and the helm chart for Sterling k8s deployment. I tested the whole pipeline in Sterling with a small subset of litcoin input data and everything works as expected with the neo4j graph created. However, when I test the whole LitCoin input data, the parser failed in the middle due to predicate mapping getting blocked by security rules (403 response) for some inputs. I have created a help ticket trying to get this fixed/unblocked, but otherwise, it seems the whole pipeline works, so I am asking you to do a code review when you get a chance. The pipeline delivery deadline is 5/27 including readme update to have step-by-step installation instructions for people to follow to get the whole pipeline deployed. So it'd be great if this PR can be merged by 5/27 at the latest. Thanks!

Copy link
Contributor

@EvanDietzMorris EvanDietzMorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested the helm charts or celery part yet but this all looks good and shouldn't break anything else, so I'm ok with merging it quickly.

My only concern is that users/developers might get confused about the predicate mapping parts, and think they refer to something more general, when it's really just for litcoin. Are you ok with renaming some of the terminology to make it clear that it's litcoin specific? ie PRED_MAPPING_URL to LITCOIN_PRED_MAPPING_URL.

@hyi
Copy link
Contributor Author

hyi commented May 19, 2025

@EvanDietzMorris Good point! I have just done renaming as you suggested to avoid potential confusion. Please take a look when you get a chance and let me know if you have any other concerns that need to be addressed. Thanks

@EvanDietzMorris
Copy link
Contributor

I moved LITCOIN_PRED_MAPPING_URL to the optional section of the set_up_test_env.sh script. FYI we are also in the process of redesigning where the example env vars are set. #314 would move towards relying on .env files more, with a script to help populate them instead of setting them in local environments. Your opinion on that would be welcome if you want to look.

Other than that, I think we should double check whether we're handling the openAI credentials correctly, especially if we're handing this off to other people to deploy. We might want to move them over to actual secrets in the helm charts for production deployments. If you think we should change that, it's up to you whether to do it on this PR or start a new one.

@EvanDietzMorris EvanDietzMorris self-requested a review May 19, 2025 23:09
@hyi
Copy link
Contributor Author

hyi commented May 20, 2025

@EvanDietzMorris Agreed relying on .env files more is a good move. Yeah, thanks for bringing up this issue with OpenAI credentials. I'll bring this up tomorrow at the scrum meeting to get input on this. If we want to keep this OpenAI dependency, I'll create a new PR to have its credentials set up as secrets in helm chart. The new PR will be most likely created after the 5/27 deadline. So I'll merge this PR for now and create new PR to address this if needed.

@hyi hyi merged commit cebe4d5 into master May 20, 2025
4 checks passed
@hyi hyi deleted the litcoin-pipeline-integration branch May 20, 2025 01:39
EvanDietzMorris added a commit that referenced this pull request May 29, 2025
commit 320cbb8
Merge: 59751e9 95c9b42
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 22 14:40:03 2025 -0400

    Merge pull request #329 from RobokopU24/ccidb

    CCIDB

commit 59751e9
Merge: cebe4d5 a6782e4
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 22 14:39:23 2025 -0400

    Merge pull request #331 from RobokopU24/tmkp

    removing biolink from tmkp_confidence_score for consistency

commit cebe4d5
Merge: 7e2704c 63b4a60
Author: Hong Yi <hongyi@renci.org>
Date:   Mon May 19 21:39:21 2025 -0400

    Merge pull request #291 from RobokopU24/litcoin-pipeline-integration

    Litcoin pipeline integration including helm chart for k8s deployment

commit 63b4a60
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon May 19 18:51:33 2025 -0400

    moving litcoin var to the optional section

commit 8ed3ac1
Author: hyi <hongyi@renci.org>
Date:   Mon May 19 17:38:57 2025 -0400

    add LITCOIN prefix to predicate mapping related environment variables and constants to avoid potential confusion

commit a6782e4
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon May 19 15:23:47 2025 -0400

    removing biolink from tmkp_confidence_score for consistency

commit 95c9b42
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon May 19 15:22:56 2025 -0400

    adding more mappings

commit 16feffc
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Mon May 19 15:22:43 2025 -0400

    no more StrEnum

    - StrEnum isn't available for python 3.9
    - apply strip() to all values even from inside lists

commit 9f72549
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Fri May 16 15:18:28 2025 -0400

    removing edge properties with no values more generally

commit 508dccf
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Fri May 16 15:14:26 2025 -0400

    adding a comment about the name

commit 3c55bfd
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Fri May 16 14:39:07 2025 -0400

    resolving misc modeling issues

    - made effectors, phenotypes, mode of actions lowercase
    - clean up usage of disease context qualifier and anatomical context qualifier

commit 0dddcd5
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Fri May 16 12:52:29 2025 -0400

    first pass at CCIDB parser

commit 7e2704c
Merge: 6206134 58e2f02
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 21:29:00 2025 -0400

    Merge pull request #326 from Vibhorgupta31/re-import-csv

    Put `import csv` back in Common/extractor.py

commit 58e2f02
Author: Bradford Powell <bpow@drpowell.org>
Date:   Thu May 15 21:09:48 2025 -0400

    A previous commit removed the csv import, which is pretty important

commit a1599e2
Author: hyi <hongyi@renci.org>
Date:   Thu May 15 20:41:49 2025 -0400

    updated helm chart for orion worker deployment

commit 6206134
Merge: 6fba516 f44a359
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 13:43:27 2025 -0400

    Merge pull request #323 from RobokopU24/hmdb_directions_review

    reviewed and modified gene/metabolite edges

commit f44a359
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 13:42:30 2025 -0400

    bumping parser version

commit 6fba516
Merge: f2712ae a017712
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 13:09:05 2025 -0400

    Merge pull request #321 from Vibhorgupta31/extractor_consistency

    Extractor syntax updates

commit 7d6c88f
Author: Chris Bizon <bizon@renci.org>
Date:   Thu May 15 10:53:40 2025 -0400

    reviewed and modified gene/metabolite edges

commit f2712ae
Merge: 55dd740 2caff08
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 10:35:26 2025 -0400

    Merge pull request #322 from RobokopU24/issue-316

    Removing independent run sections from parsers that have them

commit 55dd740
Merge: 944a271 b2ddf5a
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 10:31:24 2025 -0400

    Merge pull request #318 from RobokopU24/original

    Initial attempt to keep original subject/object on edges

commit b2ddf5a
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 10:28:29 2025 -0400

    adding original_ ids to edge inversion test

commit 2caff08
Author: Hina Shah <hinashah@renci.org>
Date:   Thu May 15 10:27:01 2025 -0400

    Removing independent run sections from parsers that have them

commit 944a271
Merge: 0b68f16 e9f44cd
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 10:16:59 2025 -0400

    Merge pull request #320 from ITSskaraden/readme-removal

    Remove legacy README.md files from parser subdirectories (fixes #315)

commit e9f44cd
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 10:13:42 2025 -0400

    adding VP readme back

commit 0b68f16
Merge: 98e042b 125e902
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 09:54:34 2025 -0400

    Merge pull request #317 from RobokopU24/TIGA_edges

    Update loadPHAROS.py to remove TIGA edges

commit 125e902
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Thu May 15 09:53:16 2025 -0400

    bumping parser version

commit 084cdff
Author: ITSskaraden <SkaldBjorn@yandex.ru>
Date:   Thu May 15 01:47:39 2025 +0300

    Remove legacy README.md files from parser subdirectories (fixes #315)

commit a017712
Author: Bradford Powell <bpow@drpowell.org>
Date:   Wed Apr 9 14:23:29 2025 -0400

    Add some defaults to json_extract

    This allows easier parsing if someone just so happens to have an array
    of objects with just the right key=>value pairs. Or if they are using
    a generator that modifies those items on the way in...

    This is designed to be fully backwards compatible.

commit 3eedc43
Author: Bradford Powell <bpow@drpowell.org>
Date:   Wed May 14 17:38:34 2025 -0400

    Add exclude_unconnected_nodes argument to sql_extract and json_extract

    This matches the behavior of csv_extract, allowing edges to be ignored
    by providing an extractor function that returns `None`.

    A default of `False` is provided to make this completely backwards
    compatible.
    This provides consistence with csv_extract, which of the three

commit 1b9c987
Author: Chris Bizon <bizon@renci.org>
Date:   Wed May 14 15:52:53 2025 -0400

    Initial attempt to keep original subject/object on edges

commit 9b063b3
Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com>
Date:   Wed May 14 14:59:36 2025 -0400

    Update loadPHAROS.py

    removed __name__ == '__main__' section
    removed TIGA edges

commit 98e042b
Merge: ffacc7e bf90b53
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Wed May 14 14:13:41 2025 -0400

    Merge pull request #308 from RobokopU24/add-readme

    Update README

commit bf90b53
Author: Hina Shah <hinashah@renci.org>
Date:   Wed May 14 14:05:40 2025 -0400

    ENH: Update the README file with documentation for the two python scripts run through docker

commit ffacc7e
Merge: fa3979a 3307ff7
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Wed May 14 14:01:11 2025 -0400

    Merge pull request #309 from RobokopU24/issue-302

    Fixing typo for "PREDICATION"

commit 3a06922
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Wed May 14 13:43:19 2025 -0400

    moving the developer section to it's own file

commit fe7a9dc
Author: Evan Morris <evandietzmorris@gmail.com>
Date:   Wed May 14 12:08:34 2025 -0400

    Removed log in comment and fixed typo

commit 3307ff7
Author: Hina Shah <hinashah@renci.org>
Date:   Wed May 14 11:57:37 2025 -0400

    Fixing type predication

commit c058727
Author: Hina Shah <hinashah@renci.org>
Date:   Wed May 14 11:36:28 2025 -0400

    Update README

commit 1958c86
Author: hyi <hongyi@renci.org>
Date:   Fri May 9 17:04:35 2025 -0400

    get predicate mapping integrated

commit ade4945
Author: hyi <hongyi@renci.org>
Date:   Thu May 8 22:40:22 2025 -0400

    first pass to add predicate mapping

commit 9a7645c
Author: hyi <hongyi@renci.org>
Date:   Wed Apr 23 19:59:05 2025 -0400

    changed logging level from debug to info

commit 7ba0a36
Author: hyi <hongyi@renci.org>
Date:   Wed Apr 23 16:59:18 2025 -0400

    fixed celery worker queue name typo issue

commit 890e20c
Author: hyi <hongyi@renci.org>
Date:   Tue Apr 22 22:56:48 2025 -0400

    update ORION to ingest data generated from the LitCoin pipeline

commit 9d88b3b
Merge: caf4601 fa3979a
Author: hyi <hongyi@renci.org>
Date:   Wed Apr 9 11:02:26 2025 -0400

    Merge branch 'master' into litcoin-pipeline-integration

commit caf4601
Author: hyi <hongyi@renci.org>
Date:   Wed Mar 12 10:35:00 2025 -0400

    increase memory to 10GB to address OOM when running example graph ingestion

commit c083e73
Author: hyi <hongyi@renci.org>
Date:   Tue Mar 11 13:35:57 2025 -0400

    get orion task run exception propogate to celery without catching it

commit a150c14
Author: hyi <hongyi@renci.org>
Date:   Sun Mar 9 17:27:10 2025 -0400

    added ORION worker deployment helm chart
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants