Litcoin pipeline integration including helm chart for k8s deployment#291
Litcoin pipeline integration including helm chart for k8s deployment#291
Conversation
|
@EvanDietzMorris I have addressed the status inconsistency in the get-task-status API endpoint by improving the interaction between the subprocess run and celery worker, and also addressed the hanging issue due to insufficient memory for the worker. So I think this PR is ready to be merged as the first pass implementation dependent upon your code review and approval. |
|
@EvanDietzMorris This PR is now ready for review. It includes all the work needed for the LitCoin pipeline integration, specifically, predicate mapping now integrated in the loadLitCoin parser, adding an environmental variable to indicate loading LitCoin input data from a shared directory or a shared PVC path rather than the default stars server URL, and the helm chart for Sterling k8s deployment. I tested the whole pipeline in Sterling with a small subset of litcoin input data and everything works as expected with the neo4j graph created. However, when I test the whole LitCoin input data, the parser failed in the middle due to predicate mapping getting blocked by security rules (403 response) for some inputs. I have created a help ticket trying to get this fixed/unblocked, but otherwise, it seems the whole pipeline works, so I am asking you to do a code review when you get a chance. The pipeline delivery deadline is 5/27 including readme update to have step-by-step installation instructions for people to follow to get the whole pipeline deployed. So it'd be great if this PR can be merged by 5/27 at the latest. Thanks! |
EvanDietzMorris
left a comment
There was a problem hiding this comment.
I haven't tested the helm charts or celery part yet but this all looks good and shouldn't break anything else, so I'm ok with merging it quickly.
My only concern is that users/developers might get confused about the predicate mapping parts, and think they refer to something more general, when it's really just for litcoin. Are you ok with renaming some of the terminology to make it clear that it's litcoin specific? ie PRED_MAPPING_URL to LITCOIN_PRED_MAPPING_URL.
… and constants to avoid potential confusion
|
@EvanDietzMorris Good point! I have just done renaming as you suggested to avoid potential confusion. Please take a look when you get a chance and let me know if you have any other concerns that need to be addressed. Thanks |
|
I moved LITCOIN_PRED_MAPPING_URL to the optional section of the set_up_test_env.sh script. FYI we are also in the process of redesigning where the example env vars are set. #314 would move towards relying on .env files more, with a script to help populate them instead of setting them in local environments. Your opinion on that would be welcome if you want to look. Other than that, I think we should double check whether we're handling the openAI credentials correctly, especially if we're handing this off to other people to deploy. We might want to move them over to actual secrets in the helm charts for production deployments. If you think we should change that, it's up to you whether to do it on this PR or start a new one. |
|
@EvanDietzMorris Agreed relying on .env files more is a good move. Yeah, thanks for bringing up this issue with OpenAI credentials. I'll bring this up tomorrow at the scrum meeting to get input on this. If we want to keep this OpenAI dependency, I'll create a new PR to have its credentials set up as secrets in helm chart. The new PR will be most likely created after the 5/27 deadline. So I'll merge this PR for now and create new PR to address this if needed. |
commit 320cbb8 Merge: 59751e9 95c9b42 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 22 14:40:03 2025 -0400 Merge pull request #329 from RobokopU24/ccidb CCIDB commit 59751e9 Merge: cebe4d5 a6782e4 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 22 14:39:23 2025 -0400 Merge pull request #331 from RobokopU24/tmkp removing biolink from tmkp_confidence_score for consistency commit cebe4d5 Merge: 7e2704c 63b4a60 Author: Hong Yi <hongyi@renci.org> Date: Mon May 19 21:39:21 2025 -0400 Merge pull request #291 from RobokopU24/litcoin-pipeline-integration Litcoin pipeline integration including helm chart for k8s deployment commit 63b4a60 Author: Evan Morris <evandietzmorris@gmail.com> Date: Mon May 19 18:51:33 2025 -0400 moving litcoin var to the optional section commit 8ed3ac1 Author: hyi <hongyi@renci.org> Date: Mon May 19 17:38:57 2025 -0400 add LITCOIN prefix to predicate mapping related environment variables and constants to avoid potential confusion commit a6782e4 Author: Evan Morris <evandietzmorris@gmail.com> Date: Mon May 19 15:23:47 2025 -0400 removing biolink from tmkp_confidence_score for consistency commit 95c9b42 Author: Evan Morris <evandietzmorris@gmail.com> Date: Mon May 19 15:22:56 2025 -0400 adding more mappings commit 16feffc Author: Evan Morris <evandietzmorris@gmail.com> Date: Mon May 19 15:22:43 2025 -0400 no more StrEnum - StrEnum isn't available for python 3.9 - apply strip() to all values even from inside lists commit 9f72549 Author: Evan Morris <evandietzmorris@gmail.com> Date: Fri May 16 15:18:28 2025 -0400 removing edge properties with no values more generally commit 508dccf Author: Evan Morris <evandietzmorris@gmail.com> Date: Fri May 16 15:14:26 2025 -0400 adding a comment about the name commit 3c55bfd Author: Evan Morris <evandietzmorris@gmail.com> Date: Fri May 16 14:39:07 2025 -0400 resolving misc modeling issues - made effectors, phenotypes, mode of actions lowercase - clean up usage of disease context qualifier and anatomical context qualifier commit 0dddcd5 Author: Evan Morris <evandietzmorris@gmail.com> Date: Fri May 16 12:52:29 2025 -0400 first pass at CCIDB parser commit 7e2704c Merge: 6206134 58e2f02 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 21:29:00 2025 -0400 Merge pull request #326 from Vibhorgupta31/re-import-csv Put `import csv` back in Common/extractor.py commit 58e2f02 Author: Bradford Powell <bpow@drpowell.org> Date: Thu May 15 21:09:48 2025 -0400 A previous commit removed the csv import, which is pretty important commit a1599e2 Author: hyi <hongyi@renci.org> Date: Thu May 15 20:41:49 2025 -0400 updated helm chart for orion worker deployment commit 6206134 Merge: 6fba516 f44a359 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 13:43:27 2025 -0400 Merge pull request #323 from RobokopU24/hmdb_directions_review reviewed and modified gene/metabolite edges commit f44a359 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 13:42:30 2025 -0400 bumping parser version commit 6fba516 Merge: f2712ae a017712 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 13:09:05 2025 -0400 Merge pull request #321 from Vibhorgupta31/extractor_consistency Extractor syntax updates commit 7d6c88f Author: Chris Bizon <bizon@renci.org> Date: Thu May 15 10:53:40 2025 -0400 reviewed and modified gene/metabolite edges commit f2712ae Merge: 55dd740 2caff08 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 10:35:26 2025 -0400 Merge pull request #322 from RobokopU24/issue-316 Removing independent run sections from parsers that have them commit 55dd740 Merge: 944a271 b2ddf5a Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 10:31:24 2025 -0400 Merge pull request #318 from RobokopU24/original Initial attempt to keep original subject/object on edges commit b2ddf5a Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 10:28:29 2025 -0400 adding original_ ids to edge inversion test commit 2caff08 Author: Hina Shah <hinashah@renci.org> Date: Thu May 15 10:27:01 2025 -0400 Removing independent run sections from parsers that have them commit 944a271 Merge: 0b68f16 e9f44cd Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 10:16:59 2025 -0400 Merge pull request #320 from ITSskaraden/readme-removal Remove legacy README.md files from parser subdirectories (fixes #315) commit e9f44cd Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 10:13:42 2025 -0400 adding VP readme back commit 0b68f16 Merge: 98e042b 125e902 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 09:54:34 2025 -0400 Merge pull request #317 from RobokopU24/TIGA_edges Update loadPHAROS.py to remove TIGA edges commit 125e902 Author: Evan Morris <evandietzmorris@gmail.com> Date: Thu May 15 09:53:16 2025 -0400 bumping parser version commit 084cdff Author: ITSskaraden <SkaldBjorn@yandex.ru> Date: Thu May 15 01:47:39 2025 +0300 Remove legacy README.md files from parser subdirectories (fixes #315) commit a017712 Author: Bradford Powell <bpow@drpowell.org> Date: Wed Apr 9 14:23:29 2025 -0400 Add some defaults to json_extract This allows easier parsing if someone just so happens to have an array of objects with just the right key=>value pairs. Or if they are using a generator that modifies those items on the way in... This is designed to be fully backwards compatible. commit 3eedc43 Author: Bradford Powell <bpow@drpowell.org> Date: Wed May 14 17:38:34 2025 -0400 Add exclude_unconnected_nodes argument to sql_extract and json_extract This matches the behavior of csv_extract, allowing edges to be ignored by providing an extractor function that returns `None`. A default of `False` is provided to make this completely backwards compatible. This provides consistence with csv_extract, which of the three commit 1b9c987 Author: Chris Bizon <bizon@renci.org> Date: Wed May 14 15:52:53 2025 -0400 Initial attempt to keep original subject/object on edges commit 9b063b3 Author: Kathleen Carter <163005214+eKathleenCarter@users.noreply.github.com> Date: Wed May 14 14:59:36 2025 -0400 Update loadPHAROS.py removed __name__ == '__main__' section removed TIGA edges commit 98e042b Merge: ffacc7e bf90b53 Author: Evan Morris <evandietzmorris@gmail.com> Date: Wed May 14 14:13:41 2025 -0400 Merge pull request #308 from RobokopU24/add-readme Update README commit bf90b53 Author: Hina Shah <hinashah@renci.org> Date: Wed May 14 14:05:40 2025 -0400 ENH: Update the README file with documentation for the two python scripts run through docker commit ffacc7e Merge: fa3979a 3307ff7 Author: Evan Morris <evandietzmorris@gmail.com> Date: Wed May 14 14:01:11 2025 -0400 Merge pull request #309 from RobokopU24/issue-302 Fixing typo for "PREDICATION" commit 3a06922 Author: Evan Morris <evandietzmorris@gmail.com> Date: Wed May 14 13:43:19 2025 -0400 moving the developer section to it's own file commit fe7a9dc Author: Evan Morris <evandietzmorris@gmail.com> Date: Wed May 14 12:08:34 2025 -0400 Removed log in comment and fixed typo commit 3307ff7 Author: Hina Shah <hinashah@renci.org> Date: Wed May 14 11:57:37 2025 -0400 Fixing type predication commit c058727 Author: Hina Shah <hinashah@renci.org> Date: Wed May 14 11:36:28 2025 -0400 Update README commit 1958c86 Author: hyi <hongyi@renci.org> Date: Fri May 9 17:04:35 2025 -0400 get predicate mapping integrated commit ade4945 Author: hyi <hongyi@renci.org> Date: Thu May 8 22:40:22 2025 -0400 first pass to add predicate mapping commit 9a7645c Author: hyi <hongyi@renci.org> Date: Wed Apr 23 19:59:05 2025 -0400 changed logging level from debug to info commit 7ba0a36 Author: hyi <hongyi@renci.org> Date: Wed Apr 23 16:59:18 2025 -0400 fixed celery worker queue name typo issue commit 890e20c Author: hyi <hongyi@renci.org> Date: Tue Apr 22 22:56:48 2025 -0400 update ORION to ingest data generated from the LitCoin pipeline commit 9d88b3b Merge: caf4601 fa3979a Author: hyi <hongyi@renci.org> Date: Wed Apr 9 11:02:26 2025 -0400 Merge branch 'master' into litcoin-pipeline-integration commit caf4601 Author: hyi <hongyi@renci.org> Date: Wed Mar 12 10:35:00 2025 -0400 increase memory to 10GB to address OOM when running example graph ingestion commit c083e73 Author: hyi <hongyi@renci.org> Date: Tue Mar 11 13:35:57 2025 -0400 get orion task run exception propogate to celery without catching it commit a150c14 Author: hyi <hongyi@renci.org> Date: Sun Mar 9 17:27:10 2025 -0400 added ORION worker deployment helm chart
@EvanDietzMorris I am creating this PR for your perusal to get your inputs sooner rather than later - no need to get this PR merged until the graph building hanging issue specific to k8s deployment gets resolved.