feat: add Bluesky CAR file import support #5
Merged
- Add src/sources/bluesky.ts adapter for AT Protocol CAR files
- Add @atproto/repo and @atproto/api dependencies
- Add 'json' to default output formats (markdown, oai, json)
- Update writers.ts with Bluesky permalink support
- Add bluesky:post to SELF_POST_SOURCES in transforms
- Update tests for multi-source support
- Add .splice/ to gitignore
inferRole() was using a Twitter-specific heuristic (checking for full_text in raw) that didn't apply to Bluesky. All Bluesky posts were classified as 'user', causing conversations_oai.jsonl to be empty. Now bluesky:post source items are recognized as assistant messages.
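The role fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from the PR: the `Item` shape, the `twitter:tweet` source name, and the reading that the presence of `full_text` maps to 'assistant' are all assumptions made for the example.

```typescript
type Role = "user" | "assistant";

// Hypothetical item shape; the real type in the repository may differ.
interface Item {
  source: string;
  raw?: Record<string, unknown>;
}

function inferRole(item: Item): Role {
  // Posts from the imported Bluesky repo belong to the archive owner,
  // so they are treated as assistant messages.
  if (item.source === "bluesky:post") return "assistant";

  // Twitter-specific heuristic mentioned in the commit message:
  // assumed here to mean "has full_text in raw => the owner's own tweet".
  if (item.raw !== undefined && "full_text" in item.raw) return "assistant";

  // Everything else falls back to 'user'.
  return "user";
}
```

Checking the source tag before the Twitter heuristic keeps Bluesky posts from falling through to 'user', which is the failure mode that left conversations_oai.jsonl empty.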
Adds the ability to fetch parent posts from Bluesky's public API when processing CAR exports. This enables proper multi-turn conversation format in OAI JSONL output.

Changes:
- Add --enrich flag to CLI options
- Add enrichBlueskyPosts() with rate-limited batch fetching
- Preserve bluesky:fetched posts through filtering
- Skip fetched posts as chain starters in grouping
- Mark fetched posts as 'user' role in OAI output

Results for Berduck (16.9K posts):
- Fetched 12,498 parent posts successfully
- 12,890 conversations now have user→assistant turns
- 4,013 single-post conversations (parent unavailable)
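For readers unfamiliar with the pattern, rate-limited batch fetching might look something like the sketch below. It is not the body of enrichBlueskyPosts(): the helper names (`chunk`, `sleep`, `fetchInBatches`), the batch size, the delay, and the choice to skip failed batches are all assumptions, and the network call is injected as a callback so the example stays self-contained.

```typescript
// Hypothetical helper: split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolve after roughly `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch batches one at a time, pausing between requests.
// `fetchBatch` stands in for whatever call actually hits the API.
async function fetchInBatches<T>(
  uris: string[],
  batchSize: number,
  delayMs: number,
  fetchBatch: (batch: string[]) => Promise<T[]>,
): Promise<T[]> {
  const results: T[] = [];
  const batches = chunk(uris, batchSize);
  let done = 0;
  for (const batch of batches) {
    try {
      const fetched = await fetchBatch(batch);
      results.push(...fetched);
    } catch {
      // Assumed policy: a failed batch is skipped, so its posts
      // simply end up without a parent.
    }
    done += 1;
    if (done < batches.length) await sleep(delayMs);
  }
  return results;
}
```

Skipping failed batches rather than aborting would be consistent with the "parent unavailable" conversations in the results, though the PR does not say how errors are actually handled.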
Changed fetchPostChain to use parentHeight=50 and walk the full parent chain. This enables multi-turn conversations with complete thread history.

Results for Berduck:
- 13,886 unique context posts fetched (vs 12,498 before)
- 8,939 conversations with 4+ messages
- 3,976 conversations with 3 messages
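Walking the parent chain can be illustrated with a small sketch. It assumes each fetched thread node carries an optional nested `parent` reference; the `ThreadNode` type and `collectAncestors` name are hypothetical and not taken from the PR.

```typescript
// Hypothetical, simplified thread node: only the fields this sketch needs.
interface ThreadNode {
  uri: string;
  text: string;
  parent?: ThreadNode;
}

// Follow parent links from the leaf upward and return the ancestors
// ordered oldest first, ready to be placed before the leaf in a conversation.
function collectAncestors(leaf: ThreadNode): ThreadNode[] {
  const chain: ThreadNode[] = [];
  let current = leaf.parent;
  while (current !== undefined) {
    chain.push(current);
    current = current.parent;
  }
  return chain.reverse();
}

const root: ThreadNode = { uri: "at://example/1", text: "first" };
const middle: ThreadNode = { uri: "at://example/2", text: "second", parent: root };
const reply: ThreadNode = { uri: "at://example/3", text: "third", parent: middle };

const ancestors = collectAncestors(reply).map((n) => n.text);
// ancestors is ["first", "second"]
```

Collecting every ancestor instead of only the immediate parent is what allows conversations with three, four, or more messages.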
Updated @mention regex from /@\w+/g to /@[\w.-]+/g to properly match Bluesky handles like @berduck.deepfates.com instead of leaving orphaned .deepfates.com fragments.
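The difference between the two patterns is easy to see on a sample string. The sample text and the constant names below are made up for illustration, and stripping mentions with `replace` is an assumption about how the pattern is used.

```typescript
const OLD_MENTION = /@\w+/g;
const NEW_MENTION = /@[\w.-]+/g;

const sample = "thanks @berduck.deepfates.com for the reply";

// The old pattern stops at the first dot.
const oldMatches = sample.match(OLD_MENTION); // ["@berduck"]
// The new pattern also consumes dots and hyphens, so it covers the full handle.
const newMatches = sample.match(NEW_MENTION); // ["@berduck.deepfates.com"]

// Stripping with the old pattern leaves the domain fragment behind.
const strippedOld = sample.replace(OLD_MENTION, "");
// "thanks .deepfates.com for the reply"
const strippedNew = sample.replace(NEW_MENTION, "");
// "thanks  for the reply"
```

One trade-off: because `.` is now part of the character class, a mention at the end of a sentence such as "@alice." will match with the trailing period included.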
Summary
Adds support for importing Bluesky/AT Protocol repository exports (CAR files) alongside existing Twitter archive support.
Changes
- Add src/sources/bluesky.ts for parsing AT Protocol CAR files
- Add @atproto/repo and @atproto/api dependencies
- Add json to default output formats (now: markdown, oai, json)
- Add bluesky:post for thread detection
Usage
Testing