@alinakbase
Collaborator

No description provided.

@ialarmedalien ialarmedalien changed the base branch from develop to uniprot-refactor-v2 January 21, 2026 19:07
Comment on lines 72 to 69
if identifier.startswith("GCF_"):
    return f"insdc.gcf:{identifier}"
Collaborator

Let's also add a matching branch for GCA_ identifiers:

if identifier.startswith("GCA_"):
    return f"insdc.gca:{identifier}"
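
A minimal sketch of how the existing GCF_ branch and the suggested GCA_ branch might sit together. The function name `assembly_curie` and the pass-through fallback are assumptions for illustration; the real function and its handling of other identifiers are not shown in this thread.

```python
def assembly_curie(identifier: str) -> str:
    """Hypothetical helper: prefix an assembly accession with a namespace."""
    # RefSeq assembly accessions (the branch already in the PR).
    if identifier.startswith("GCF_"):
        return f"insdc.gcf:{identifier}"
    # GenBank assembly accessions (the branch suggested in this comment).
    if identifier.startswith("GCA_"):
        return f"insdc.gca:{identifier}"
    # Assumption: any other identifier is returned unchanged.
    return identifier
```

For example, `assembly_curie("GCA_000001405.29")` would take the second branch, while an identifier with neither prefix falls through to the final return.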

@@ -0,0 +1,710 @@
import json
Collaborator

Before you start doing any refactoring, can you add in an integration test that checks the results of parsing the JSON data into all 8 CDM tables? Let me know when you have done that so I can take a look.

@ialarmedalien ialarmedalien force-pushed the uniprot-refactor-v2 branch 2 times, most recently from 06c5508 to a84db46 Compare January 28, 2026 17:21
Comment on lines 736 to 738
expected_tables = [
    "contig",
    "contig_x_contigcollection",
    "contigcollection_x_feature",
    "contigcollection_x_protein",
    "feature",
    "feature_x_protein",
    "identifier",
    "name",
]
Collaborator

You also need the protein table -- it looks like the parser is not capturing the protein information any more.
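
One way to catch a dropped table like this early is to compare the set of table names the parser produced against the full expected set, `protein` included. This is a sketch only: `EXPECTED_TABLES` and `missing_tables` are hypothetical names, and how the test obtains the list of produced table names is left open.

```python
# Table names quoted in the test above, plus the protein table requested here.
EXPECTED_TABLES = {
    "contig",
    "contig_x_contigcollection",
    "contigcollection_x_feature",
    "contigcollection_x_protein",
    "feature",
    "feature_x_protein",
    "identifier",
    "name",
    "protein",
}


def missing_tables(produced):
    """Return the expected table names absent from `produced`, sorted."""
    return sorted(EXPECTED_TABLES - set(produced))
```

A test could then assert `missing_tables(produced_names) == []`, so the failure message names exactly which tables were not written.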

@ialarmedalien ialarmedalien changed the base branch from uniprot-refactor-v2 to develop January 29, 2026 21:22
Comment on lines 721 to 726
# Load NCBI dataset from NCBI API
sample_api_response = test_data_dir / "refseq" / "annotation_report.json"
dataset = json.load(sample_api_response.open())

# Run parse function
parse_annotation_data(spark, [dataset], TEST_NS)
Collaborator

You need to load the annotation_report.parsed.json file here and use that to populate expected_df.
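
A rough sketch of that comparison in plain Python, without Spark. It assumes, purely for illustration, that the parsed JSON file maps each table name to a list of row dictionaries; the real layout of annotation_report.parsed.json is not shown in this thread. `load_expected_rows` and `diff_rows` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any


def load_expected_rows(parsed_path: Path) -> dict[str, list[dict[str, Any]]]:
    """Read the expected output; assumed layout is {table_name: [row, ...]}."""
    with parsed_path.open() as fh:
        return json.load(fh)


def _row_key(row: dict[str, Any]) -> str:
    # Serialise with sorted keys so column order does not affect equality.
    return json.dumps(row, sort_keys=True)


def diff_rows(expected, actual):
    """Return (missing, unexpected) rows as JSON strings, ignoring row order.

    Rows are compared as sets, so duplicate rows are not detected.
    """
    expected_keys = {_row_key(r) for r in expected}
    actual_keys = {_row_key(r) for r in actual}
    return sorted(expected_keys - actual_keys), sorted(actual_keys - expected_keys)
```

In a Spark test, the `actual` rows for each table could be obtained with something like `[row.asDict() for row in df.collect()]`, and the test would assert that `diff_rows` returns two empty lists for every table.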

@codecov

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 74.71264% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 6.43%. Comparing base (8f9a3d4) to head (9b5e4d0).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...ader_utils/parsers/refseq/api/annotation_report.py 74.41% 44 Missing ⚠️

@@             Coverage Diff             @@
##           develop     #71       +/-   ##
===========================================
- Coverage    52.69%   6.43%   -46.27%     
===========================================
  Files           63      64        +1     
  Lines         3241    3403      +162     
===========================================
- Hits          1708     219     -1489     
- Misses        1533    3184     +1651     
Files with missing lines Coverage Δ
...rc/cdm_data_loader_utils/model/kbase_cdm_schema.py 100.00% <100.00%> (ø)
...ader_utils/parsers/refseq/api/annotation_report.py 74.41% <74.41%> (ø)

... and 36 files with indirect coverage changes



3 participants