feat: concurrent embedding, GitHub ZIP download, read offset/limit #267

Merged
MaojiaSheng merged 1 commit into volcengine:main from yangxinxin-7:feat/code
Feb 24, 2026

@yangxinxin-7
Collaborator

Performance improvements:

  • upload_directory: three-phase approach (collect → pre-create dirs →
    concurrent upload via asyncio.Semaphore, limit 8). Memoized mkdir
    eliminates redundant AGFS calls.
  • tree_builder: replace the recursive file-by-file move with a single
    agfs.mv() call wrapped in asyncio.to_thread
  • TextEmbeddingHandler: offload the blocking embed() call to a thread pool
  • EmbeddingQueue: configurable max_concurrent workers via
    EmbeddingConfig.max_concurrent (default 1)
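
The three-phase flow in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FakeStore` and its `mkdir`/`write` methods are hypothetical stand-ins for the AGFS client, and the signature of `upload_directory` shown here is assumed. Only the phase order, the use of `asyncio.Semaphore`, and the limit of 8 come from the description above.

```python
import asyncio
from pathlib import PurePosixPath

MAX_CONCURRENT_UPLOADS = 8  # limit named in the PR description


class FakeStore:
    """Hypothetical in-memory stand-in for the AGFS client."""

    def __init__(self):
        self.mkdir_calls = []
        self.files = {}

    async def mkdir(self, path):
        self.mkdir_calls.append(path)

    async def write(self, path, data):
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.files[path] = data


async def upload_directory(store, files, dest_root):
    """Upload `files` (relative POSIX path -> bytes) under `dest_root`."""
    root = PurePosixPath(dest_root)

    # Phase 1: collect upload targets and every directory they require.
    targets = {}
    dirs = set()  # the set doubles as the mkdir memo: each dir appears once
    for rel, data in files.items():
        target = root / rel
        targets[str(target)] = data
        dirs.update(p for p in target.parents if p.is_relative_to(root))

    # Phase 2: pre-create directories, parents before children.
    for d in sorted(dirs, key=lambda p: (len(p.parts), str(p))):
        await store.mkdir(str(d))

    # Phase 3: upload concurrently, at most MAX_CONCURRENT_UPLOADS in flight.
    sem = asyncio.Semaphore(MAX_CONCURRENT_UPLOADS)

    async def upload_one(path, data):
        async with sem:
            await store.write(path, data)

    await asyncio.gather(*(upload_one(p, d) for p, d in targets.items()))
    return len(targets)
```

Creating directories up front means the concurrent phase never races on a shared parent, which is presumably why the directory step is kept sequential.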

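The last two bullets combine naturally: a fixed number of queue workers, each handing the blocking call to a thread. The sketch below shows that pattern under stated assumptions. The `EmbeddingConfig` dataclass, the `embed` function, and `embed_all` are simplified, hypothetical stand-ins; only the `max_concurrent` field and its default of 1 are taken from the description above.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class EmbeddingConfig:
    """Simplified stand-in; only max_concurrent mirrors the PR description."""
    max_concurrent: int = 1


def embed(text):
    """Hypothetical blocking embedder; a real one would call a model."""
    time.sleep(0.01)
    return [float(len(text))]


async def worker(queue, results):
    while True:
        text = await queue.get()
        try:
            # Run the blocking call in a thread so the event loop stays free.
            results[text] = await asyncio.to_thread(embed, text)
        finally:
            queue.task_done()


async def embed_all(texts, config):
    queue = asyncio.Queue()
    results = {}
    for text in texts:
        queue.put_nowait(text)

    # One worker task per allowed concurrent embedding.
    workers = [
        asyncio.create_task(worker(queue, results))
        for _ in range(config.max_concurrent)
    ]
    await queue.join()  # wait until every queued item has been processed

    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

With `max_concurrent` left at 1 the queue behaves sequentially, so raising the value is an opt-in change.
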
New features:

  • CodeRepositoryParser: use GitHub archive ZIP API instead of git clone
    for GitHub URLs without a specific commit (faster, no git history).
    Includes Zip Slip validation.
  • read()/read_file(): add offset/limit line-slicing, propagated through
    all client/service/HTTP/CLI layers
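
For readers unfamiliar with Zip Slip: an archive entry such as `../evil.txt` can escape the extraction directory if paths are joined without checking. One common way to validate entries is sketched below. The `safe_extract` helper is a hypothetical name, and the check actually used in `CodeRepositoryParser` may differ.

```python
import os
import zipfile


def safe_extract(zf, dest_dir):
    """Extract `zf` into `dest_dir`, rejecting entries that would escape it."""
    dest_root = os.path.realpath(dest_dir)

    # Validate every entry before writing anything to disk.
    for member in zf.infolist():
        target = os.path.realpath(os.path.join(dest_root, member.filename))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"unsafe path in archive: {member.filename!r}")

    zf.extractall(dest_root)
```

Checking all entries first means a malicious archive is rejected as a whole rather than partially extracted.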

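The offset/limit line-slicing can be illustrated with a small helper. The semantics here are assumptions, since the description does not spell them out: `offset` is taken as a zero-based line index and `limit=None` as "read to the end". `read_lines` is a hypothetical name, not the PR's `read()` or `read_file()`.

```python
def read_lines(text, offset=0, limit=None):
    """Return up to `limit` lines of `text`, starting at line `offset`."""
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if limit is not None and limit < 0:
        raise ValueError("limit must be >= 0 or None")

    lines = text.splitlines(keepends=True)  # keep newlines so output round-trips
    end = None if limit is None else offset + limit
    return "".join(lines[offset:end])
```

An offset past the end of the content simply yields an empty string in this sketch, which keeps paging loops straightforward.
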
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@MaojiaSheng merged commit 7557f5d into volcengine:main on Feb 24, 2026
5 of 6 checks passed