Open
Description
The tap doesn't respect existing replication state: it does not filter out data older than the replication key value stored in the state.
How to reproduce
GitHub tap configuration:
- name: tap-github-repos
  inherit_from: tap-github
  pip_url: git+https://github.com/MeltanoLabs/tap-github.git
  config:
    user_agent: ''
    start_date: '2023-01-01T00:00:00Z'
    searches:
    - name: All repos
      query: apache/*
  variant: meltanolabs
  select:
  - repositories.*
  metadata:
    repositories:
      replication-method: INCREMENTAL
Run a sync that produces 1000 records (the limit for the 'repositories' stream) and a state record:
meltano run tap-github-repos target-jsonl
Run the same sync one more time:
meltano run tap-github-repos target-jsonl
Result: the target JSONL file contains 2000 records, and every record from the first run is fully duplicated.
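One way to confirm the duplication is to count how often each record appears in the target output. This is only a rough sketch: it assumes each record carries a unique `id` field, and the helper name `count_duplicates` is hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def count_duplicates(lines: Iterable[str], key: str = "id") -> Dict[object, int]:
    """Return {key value: occurrences} for every key value seen more than once.

    `key` is an assumption: it should name a field that uniquely
    identifies a record in the stream being checked.
    """
    counts = Counter(
        json.loads(line)[key]
        for line in lines
        if line.strip()  # skip blank lines
    )
    return {value: n for value, n in counts.items() if n > 1}
```

To use it, open the JSONL file written by target-jsonl and pass the file object in, for example `count_duplicates(open(path))`, where `path` is wherever your target writes the repositories stream. After the two runs described above, every `id` would be expected to show a count of 2.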
The issue can be reproduced on the repositories stream.
I couldn't reproduce this on the issues stream.
I haven't tested other streams.
If the GitHub APIs do not allow fetching data from a specific replication point (at least for the repositories stream), then the tap should filter out those records itself instead of sending them down the pipeline.
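The kind of client-side filtering suggested here could look roughly like the sketch below. It is not the tap's actual code: the function names are hypothetical, and it assumes the replication key (for example `updated_at`) holds an ISO 8601 timestamp and that the bookmark is the replication key value saved in the state.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def parse_ts(value: str) -> datetime:
    # Timestamps like '2023-01-01T00:00:00Z' end in "Z", which
    # datetime.fromisoformat() rejects before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_by_bookmark(
    records: Iterable[dict],
    replication_key: str,       # assumed name of the replication key field
    bookmark: Optional[str],    # assumed: replication key value from state
) -> Iterator[dict]:
    """Yield only records at or after the bookmark; older ones are dropped."""
    if bookmark is None:
        # No prior state: emit everything.
        yield from records
        return
    threshold = parse_ts(bookmark)
    for record in records:
        if parse_ts(record[replication_key]) >= threshold:
            yield record
```

The comparison is `>=` rather than `>`, so a record whose replication key equals the bookmark is emitted again. That trades a possible single duplicate at the boundary for not missing records that share the bookmark's timestamp.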