[v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) #1511

Open

nico-martin wants to merge 33 commits into main from v4-cache-handler

Conversation

@nico-martin
Collaborator

Improved Download Progress Tracking

Problem

Transformers.js couldn't reliably track total download progress because:

  • File lists weren't known before downloads started
  • File sizes were inconsistent (compressed vs uncompressed)
  • No cache awareness before initiating downloads

Solution

New Exported Functions

  • get_files(): Determines required files before downloading
  • get_model_files() / get_tokenizer_files() / get_processor_files(): Helper functions to identify files for each component
  • get_file_metadata(): Fetches file metadata using Range requests without downloading full content
    • Returns fromCache boolean to identify cached files
    • Ensures consistent uncompressed file sizes
  • is_cached(): Checks whether all of a model's files are already in the cache
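As a sketch of how these pieces could compose (the metadata store and the get_file_metadata stub below are purely illustrative; the real function in this PR fetches metadata over HTTP with Range requests), is_cached() reduces to "every expected file reports fromCache: true":

```javascript
// Illustrative stand-ins: metadataStore and this get_file_metadata stub are
// hypothetical; the real get_file_metadata issues HTTP Range requests and
// returns consistent uncompressed sizes plus a fromCache flag.
const metadataStore = {
  "onnx/model.onnx": { size: 1024, fromCache: true },
  "tokenizer.json": { size: 256, fromCache: false },
};

// Stubbed get_file_metadata(): returns size + cache status for a file.
async function get_file_metadata(file) {
  return metadataStore[file];
}

// is_cached() is true only when every required file reports fromCache: true.
async function is_cached(files) {
  const metas = await Promise.all(files.map(get_file_metadata));
  return metas.every((m) => m.fromCache);
}

const files = ["onnx/model.onnx", "tokenizer.json"];
is_cached(files).then((cached) => console.log(cached)); // false: tokenizer.json is not cached yet
```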

Enhanced Progress Tracking

  • readResponse() with expectedSize: Falls back to the size from file metadata when the content-length header is missing
  • total_progress callback: Provides aggregate progress across all files
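A minimal sketch (assumed shapes, not the library's internals) of how an aggregate total-progress value can be derived from per-file progress events once every file's expected size is known up front:

```javascript
// Expected sizes for all files, known before downloads start (e.g. from
// file metadata); the file names and byte counts here are illustrative.
const expectedSizes = {
  "onnx/model.onnx": 4000,
  "tokenizer.json": 1000,
};

const loadedBytes = {};
const totalBytes = Object.values(expectedSizes).reduce((a, b) => a + b, 0);

// Feed per-file progress events into a single aggregate percentage.
function onFileProgress(file, bytesLoaded) {
  loadedBytes[file] = bytesLoaded;
  const loaded = Object.values(loadedBytes).reduce((a, b) => a + b, 0);
  return (100 * loaded) / totalBytes;
}

console.log(onFileProgress("tokenizer.json", 1000)); // 20
console.log(onFileProgress("onnx/model.onnx", 2000)); // 60
```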

Review

One thing I am not super confident about is the get_model_files function. I tried to test it with different model architectures, but I may have missed some that load files not covered by that function. @xenova, could you smoke-test some models and send me the ones that fail?

The easiest way to do that is:

import {
  get_files,
  pipeline,
} from "@huggingface/transformers";

const expectedFiles = await get_files(
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
  }
);
const loadedFiles = new Set();
const pipe = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
    progress_callback: (e) => {
      if (e.file) loadedFiles.add(e.file);
    },
  }
);

console.log(
  "SAME FILES:",
  expectedFiles.sort().join(",") === Array.from(loadedFiles).sort().join(",")
);

@nico-martin nico-martin requested a review from xenova February 3, 2026 15:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@xenova xenova left a comment


Very exciting PR! 🙌 Just a quick review from scanning the PR briefly.

});
/** @typedef {keyof typeof DATA_TYPES} DataType */

export const DEFAULT_DEVICE_DTYPE = DATA_TYPES.fp32;
Collaborator


Currently, we do a bit of a funny thing when loading models:

  1. if wasm, and no dtype set in the config, we use the q8 (8-bit) model, as it's pretty fast on CPU
  2. if node cpu, and no dtype set in the config, we use the fp32 model
  3. if webgpu, and no dtype set in the config, we use the fp32 model

The main reason is that many models on WASM can hit an out-of-memory issue when using fp32 on CPU in the browser. One thing we could do is scan the models on the Hub and specify the default dtype there, especially on a per-device basis. We can check in a separate PR whether we have a good way to support per-device dtypes based on configs.
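The device-dependent defaulting described above could be sketched roughly as follows (illustrative only; pickDefaultDtype is a hypothetical name, not the library's API, and the real selection logic may differ):

```javascript
// Hypothetical helper illustrating the defaulting rules described above.
// The actual Transformers.js logic lives inside the library.
function pickDefaultDtype(device, configuredDtype) {
  if (configuredDtype) return configuredDtype; // an explicit config always wins
  switch (device) {
    case "wasm":
      return "q8"; // 8-bit is fast on CPU and avoids browser OOM with fp32
    case "cpu": // node CPU
    case "webgpu":
    default:
      return "fp32";
  }
}

console.log(pickDefaultDtype("wasm", undefined)); // "q8"
console.log(pickDefaultDtype("webgpu", "q4")); // "q4" (explicit dtype wins)
```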

@xenova xenova changed the base branch from v4 to main February 13, 2026 17:03
@xenova xenova self-requested a review February 18, 2026 17:08
Collaborator

@xenova xenova left a comment


Solid progress! Thanks 🔥

Comment on lines +14 to +26
/**
* @typedef {Object} FileClearStatus
* @property {string} file - The file path
* @property {boolean} deleted - Whether the file was successfully deleted
* @property {boolean} wasCached - Whether the file was cached before deletion
*/

/**
* @typedef {Object} CacheClearResult
* @property {number} filesDeleted - Number of files successfully deleted
* @property {number} filesCached - Number of files that were in cache
* @property {FileClearStatus[]} files - Array of files with their deletion status
*/
Collaborator


Are these exposed externally to the user, or just for development internally?

Also, is this implementation (mainly, the names and structure of output) inspired by other similar cache implementations? (e.g., browser APIs or other libraries?) Just curious :D

Collaborator Author

@nico-martin nico-martin Feb 19, 2026


They are not exported from @huggingface/transformers directly, but if a user uses ModelRegistry.clear_cache they should see correct type hints. Also, I like to write the type system first (what's the expected input/output) and then the implementation (a TypeScript habit), so even for internal-only methods I mostly have typedefs.
No, the structure is basically just what I thought would be useful information to have.
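For illustration, a CacheClearResult matching the typedefs quoted above might look like this (the file names and counts are invented for the example):

```javascript
// Illustrative CacheClearResult object; the shape follows the typedefs in
// the diff, but the values here are made up.
const result = {
  filesDeleted: 2,
  filesCached: 2,
  files: [
    { file: "onnx/model.onnx", deleted: true, wasCached: true },
    { file: "tokenizer.json", deleted: true, wasCached: true },
  ],
};

// filesDeleted should agree with the per-file deletion flags.
console.log(result.filesDeleted === result.files.filter((f) => f.deleted).length); // true
```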

Member


Heads up, if you post @huggingface/transformers, you'll ping everyone in that team on GitHub:

Probably best to avoid!

Collaborator Author


Sorry, I should have used backticks!

Member


All good!

@xenova xenova changed the title V4 cache handler [v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) Feb 19, 2026