Refseq annotation #71

alinakbase · 2026-01-21T18:19:55Z

No description provided.

src/cdm_data_loader_utils/parsers/annotation_parse.py

src/cdm_data_loader_utils/parsers/uniprot.py

tests/parsers/test_annotation_parse.py

ialarmedalien · 2026-01-21T20:30:56Z

src/cdm_data_loader_utils/parsers/annotation_parse.py

+    if identifier.startswith("GCF_"):
+        return f"insdc.gcf:{identifier}"


Let's also add:

    if identifier.startswith("GCA_"):
        return f"insdc.gca:{identifier}"
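A minimal sketch of the prefix handling once the GCA_ case is added, assuming assembly accessions are turned into CURIEs by a simple prefix lookup. The name `assembly_curie` and the pass-through behaviour for unrecognised identifiers are hypothetical; the actual helper in `annotation_parse.py` is not shown in this thread.

```python
# Hypothetical helper: maps NCBI assembly accessions to CURIEs.
# Only the GCF_ -> insdc.gcf and GCA_ -> insdc.gca mappings come from the review;
# returning other identifiers unchanged is an assumption.
ASSEMBLY_PREFIXES = {
    "GCF_": "insdc.gcf",  # RefSeq assembly accessions
    "GCA_": "insdc.gca",  # GenBank assembly accessions
}


def assembly_curie(identifier: str) -> str:
    """Prefix an assembly accession with its namespace, if recognised."""
    for accession_prefix, namespace in ASSEMBLY_PREFIXES.items():
        if identifier.startswith(accession_prefix):
            return f"{namespace}:{identifier}"
    # Assumption: identifiers with no known prefix pass through unchanged.
    return identifier


print(assembly_curie("GCF_000005845.2"))  # insdc.gcf:GCF_000005845.2
print(assembly_curie("GCA_000005845.2"))  # insdc.gca:GCA_000005845.2
```

Keeping the prefixes in a dict means further accession types become one-line additions rather than new `if` branches.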

ialarmedalien · 2026-01-22T21:58:35Z

tests/parsers/test_annotation_parse.py

@@ -0,0 +1,710 @@
+import json


Before you start doing any refactoring, can you add in an integration test that checks the results of parsing the JSON data into all 8 CDM tables? Let me know when you have done that so I can take a look.

tests/validation/assertions.py

ialarmedalien · 2026-01-28T21:50:29Z

tests/parsers/test_annotation_parse.py

+    expected_tables = [
+        "contig",
+        "contig_x_contigcollection",
+        "contigcollection_x_feature",
+        "contigcollection_x_protein",
+        "feature",
+        "feature_x_protein",
+        "identifier",
+        "name",
+    ]


You also need the protein table; it looks like the parser is no longer capturing the protein information.
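One way to make a missing table fail loudly is to compare the tables the parser produces against the expected set, here the eight listed above plus `protein`. This is a sketch assuming the parser output can be reduced to a collection of table names; `missing_tables` is a hypothetical helper, not part of the existing test suite.

```python
# Expected CDM tables: the eight from the review plus "protein".
EXPECTED_TABLES = frozenset(
    {
        "contig",
        "contig_x_contigcollection",
        "contigcollection_x_feature",
        "contigcollection_x_protein",
        "feature",
        "feature_x_protein",
        "identifier",
        "name",
        "protein",
    }
)


def missing_tables(produced_tables) -> list[str]:
    """Return the expected table names absent from the parser output, sorted."""
    return sorted(EXPECTED_TABLES - set(produced_tables))


# Example: output that lacks the protein table, as described above.
produced = EXPECTED_TABLES - {"protein"}
print(missing_tables(produced))  # ['protein']
```

Asserting `missing_tables(...) == []` in the test reports exactly which tables are absent instead of a bare length mismatch.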

tests/parsers/test_annotation_parse.py

ialarmedalien · 2026-01-29T21:30:57Z

tests/parsers/refseq/api/test_annotation_report.py

+    # Load NCBI dataset from NCBI API
+    sample_api_response = test_data_dir / "refseq" / "annotation_report.json"
+    dataset = json.load(sample_api_response.open())
+
+    # Run parse function
+    parse_annotation_data(spark, [dataset], TEST_NS)


You need to load the annotation_report.parsed.json file here and use it to populate expected_df.

codecov · 2026-01-29T21:35:03Z

Codecov Report

❌ Patch coverage is 74.71264% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 6.43%. Comparing base (8f9a3d4) to head (9b5e4d0).
⚠️ Report is 1 commit behind head on develop.

Files with missing lines	Patch %	Lines
...ader_utils/parsers/refseq/api/annotation_report.py	74.41%	44 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #71       +/-   ##
===========================================
- Coverage    52.69%   6.43%   -46.27%     
===========================================
  Files           63      64        +1     
  Lines         3241    3403      +162     
===========================================
- Hits          1708     219     -1489     
- Misses        1533    3184     +1651
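The project-level percentages follow from the hit and line counts above: coverage is hits divided by lines. A small worked check, assuming Codecov's default reporting of two decimal places rounded down (floored). It reproduces 52.69%, 6.43% and the -46.27% change; the change is floored from the unrounded ratios, which is why subtracting the displayed figures (6.43 - 52.69 = -46.26) differs by one in the last place.

```python
import math
from decimal import Decimal
from fractions import Fraction


def floor_percent(ratio: Fraction, places: int = 2) -> Decimal:
    """Express a ratio as a percentage floored to `places` decimals."""
    scale = 10**places
    # Exact rational arithmetic avoids float error at the rounding boundary.
    return Decimal(math.floor(ratio * 100 * scale)) / scale


base = Fraction(1708, 3241)  # develop: hits / lines
head = Fraction(219, 3403)   # #71: hits / lines

print(floor_percent(base))         # 52.69
print(floor_percent(head))         # 6.43
print(floor_percent(head - base))  # -46.27
```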

Files with missing lines	Coverage Δ
...rc/cdm_data_loader_utils/model/kbase_cdm_schema.py	`100.00% <100.00%> (ø)`
...ader_utils/parsers/refseq/api/annotation_report.py	`74.41% <74.41%> (ø)`

... and 36 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0301f1b...9b5e4d0.


tests/parsers/refseq/api/test_annotation_report.py

github-code-quality bot found potential problems Jan 21, 2026

src/cdm_data_loader_utils/parsers/annotation_parse.py: Fixed

src/cdm_data_loader_utils/parsers/uniprot.py: Fixed

github-code-quality bot found potential problems Jan 21, 2026

tests/parsers/test_annotation_parse.py: Fixed

ialarmedalien changed the base branch from develop to uniprot-refactor-v2 January 21, 2026 19:07

ialarmedalien reviewed Jan 21, 2026

ialarmedalien reviewed Jan 22, 2026

ialarmedalien force-pushed the refseq-annotation branch from 16aa4cf to eab27c9 on January 28, 2026 01:13

ialarmedalien force-pushed the uniprot-refactor-v2 branch 2 times, most recently from 06c5508 to a84db46 on January 28, 2026 17:21

ialarmedalien reviewed Jan 28, 2026

tests/validation/assertions.py: Outdated, resolved

ialarmedalien reviewed Jan 28, 2026

ialarmedalien reviewed Jan 29, 2026

tests/parsers/test_annotation_parse.py: Outdated, resolved

ialarmedalien force-pushed the refseq-annotation branch from 5d4a64c to 0301f1b on January 29, 2026 21:21

First pass at RefSeq API annotation report endpoint parser

48f9e3f

ialarmedalien changed the base branch from uniprot-refactor-v2 to develop January 29, 2026 21:22

ialarmedalien reviewed Jan 29, 2026

github-code-quality bot found potential problems Jan 30, 2026

tests/parsers/refseq/api/test_annotation_report.py: Fixed

Restoring deleted test

6039dc6

ialarmedalien force-pushed the refseq-annotation branch from 9b5e4d0 to 6039dc6 on January 30, 2026 16:55

3 participants
