Refseq annotation #71
base: develop
    if identifier.startswith("GCF_"):
        return f"insdc.gcf:{identifier}"
let's also add in

    if identifier.startswith("GCA_"):
        return f"insdc.gca:{identifier}"

@@ -0,0 +1,710 @@
    import json
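The prefix handling discussed in the first review thread (the existing `GCF_` check plus the suggested `GCA_` check) could be collapsed into a single lookup. The following is only a sketch: the names `PREFIX_TO_NAMESPACE` and `to_curie` are hypothetical, and returning unknown identifiers unchanged is an assumption, since the PR excerpt does not show what the real function does in that case.

```python
# Hypothetical helper; names and fallback behaviour are assumptions,
# only the two prefix/namespace pairs come from the review thread.
PREFIX_TO_NAMESPACE = {
    "GCF_": "insdc.gcf",  # RefSeq assembly accessions
    "GCA_": "insdc.gca",  # GenBank assembly accessions
}


def to_curie(identifier: str) -> str:
    """Prefix a known assembly accession with its namespace."""
    for prefix, namespace in PREFIX_TO_NAMESPACE.items():
        if identifier.startswith(prefix):
            return f"{namespace}:{identifier}"
    # Assumption: identifiers with an unrecognised prefix pass through untouched.
    return identifier
```

With a table like this, supporting another accession prefix would mean adding one dictionary entry rather than another `if` branch.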
Before you start doing any refactoring, can you add in an integration test that checks the results of parsing the JSON data into all 8 CDM tables? Let me know when you have done that so I can take a look.
16aa4cf to eab27c9
06c5508 to a84db46
    expected_tables = [
        "contig",
        "contig_x_contigcollection",
        "contigcollection_x_feature",
        "contigcollection_x_protein",
        "feature",
        "feature_x_protein",
        "identifier",
        "name",
    ]
You also need the protein table -- it looks like the parser is not capturing the protein information any more.
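One way to make a missing table such as `protein` fail loudly is to compare the set of tables the parser produced against the full expected set. This is a minimal sketch, not the PR's test: `EXPECTED_TABLES` and `missing_tables` are hypothetical names, and the table list is the one from the diff plus the `protein` table requested in the comment above.

```python
# Hypothetical check; the table names come from the diff and the review comment.
EXPECTED_TABLES = {
    "contig",
    "contig_x_contigcollection",
    "contigcollection_x_feature",
    "contigcollection_x_protein",
    "feature",
    "feature_x_protein",
    "identifier",
    "name",
    "protein",
}


def missing_tables(produced):
    """Return the expected table names that are absent from `produced`, sorted."""
    return sorted(EXPECTED_TABLES - set(produced))
```

In a Spark-based test, `produced` might be built from something like `[t.name for t in spark.catalog.listTables(namespace)]`, assuming the parser writes its tables into that namespace; the test would then assert that `missing_tables(produced)` is empty.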
5d4a64c to 0301f1b
    # Load NCBI dataset from NCBI API
    sample_api_response = test_data_dir / "refseq" / "annotation_report.json"
    dataset = json.load(sample_api_response.open())

    # Run parse function
    parse_annotation_data(spark, [dataset], TEST_NS)
You need to load the annotation_report.parsed.json file here and use that to populate expected_df.
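The comparison requested above could look roughly like the sketch below. It assumes, without confirmation from the PR, that `annotation_report.parsed.json` holds a mapping from table name to a list of row dictionaries; `load_expected`, `normalise_rows`, and `assert_table_matches` are hypothetical helper names. Rows are compared without regard to order, because Spark does not guarantee row order.

```python
import json
from pathlib import Path


def load_expected(path):
    """Load the expected output; assumed shape is {table_name: [row_dict, ...]}."""
    with Path(path).open() as fh:
        return json.load(fh)


def normalise_rows(rows):
    """Serialise each row with sorted keys so that row and column order do not matter."""
    return sorted(json.dumps(row, sort_keys=True) for row in rows)


def assert_table_matches(actual_rows, expected_rows, table):
    """Raise AssertionError if the two row collections differ."""
    assert normalise_rows(actual_rows) == normalise_rows(expected_rows), (
        f"rows differ for table {table!r}"
    )
```

In the Spark test, `actual_rows` might come from `[row.asDict() for row in df.collect()]` for each table, with `expected_rows` taken from the loaded file under the same table name.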
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##           develop      #71       +/-   ##
===========================================
- Coverage    52.69%    6.43%   -46.27%
===========================================
  Files           63       64        +1
  Lines         3241     3403      +162
===========================================
- Hits          1708      219     -1489
- Misses        1533     3184     +1651

... and 36 files with indirect coverage changes
9b5e4d0 to 6039dc6
No description provided.