[v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines)#1511
nico-martin wants to merge 33 commits into main from v4-cache-handler
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
xenova
left a comment
Very exciting PR! 🙌 Just a quick review from scanning the PR briefly.
```js
});
/** @typedef {keyof typeof DATA_TYPES} DataType */

export const DEFAULT_DEVICE_DTYPE = DATA_TYPES.fp32;
```
currently, we do a bit of a funny thing when loading models:
- if wasm, and no dtype set in config, we use q8 (8-bit) model as it's pretty fast on CPU
- if node cpu, and no dtype set in config, we use fp32 model
- if webgpu, and no dtype set in config, we use fp32 model
The main reason is that many models on WASM can hit an out-of-memory issue when using fp32 on CPU in the browser. One thing we could do is scan the models on the Hub and specify the default dtype there, especially on a per-device basis. We can check in a separate PR whether we have a good way to support per-device dtypes based on configs.
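The per-device defaults described above could be sketched roughly like this. Note that `DEFAULT_DTYPES` and `resolveDtype` are hypothetical names for illustration, not the library's actual API:

```javascript
// Hypothetical sketch of the device-dependent default-dtype logic described
// in the comment above; names and structure are assumptions.
const DEFAULT_DTYPES = {
  wasm: 'q8',     // 8-bit is fast on CPU and avoids WASM out-of-memory issues
  cpu: 'fp32',    // Node.js CPU backend defaults to full precision
  webgpu: 'fp32', // WebGPU also defaults to full precision
};

function resolveDtype(device, configDtype) {
  // An explicit dtype in the config always wins.
  if (configDtype) return configDtype;
  return DEFAULT_DTYPES[device] ?? 'fp32';
}
```

A per-device scan of Hub models could then populate `DEFAULT_DTYPES` from model configs instead of hard-coding it.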
…tion that does not check for tokenizer files or processor files if the task does not use them
Co-authored-by: Joshua Lochner <admin@xenova.com>
xenova
left a comment
Solid progress! Thanks 🔥
```js
/**
 * @typedef {Object} FileClearStatus
 * @property {string} file - The file path
 * @property {boolean} deleted - Whether the file was successfully deleted
 * @property {boolean} wasCached - Whether the file was cached before deletion
 */

/**
 * @typedef {Object} CacheClearResult
 * @property {number} filesDeleted - Number of files successfully deleted
 * @property {number} filesCached - Number of files that were in cache
 * @property {FileClearStatus[]} files - Array of files with their deletion status
 */
```
Are these exposed externally to the user, or just for development internally?
Also, is this implementation (mainly, the names and structure of output) inspired by other similar cache implementations? (e.g., browser APIs or other libraries?) Just curious :D
They are not exported from `@huggingface/transformers` directly, but if a user calls `ModelRegistry.clear_cache` they should see correct type hints. Also, I like to write the type system first (what's the expected input/output) and then the implementation (a TypeScript habit), so even for internal-only methods I mostly have typedefs.
No, the structure is basically just what I thought would be useful information to have.
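As an illustration of how the typedefs above surface to a caller, here is a small consumer of a `CacheClearResult`. The `summarizeClearResult` helper is hypothetical; only the field names come from the typedefs:

```javascript
// Consumes the CacheClearResult / FileClearStatus shapes defined above,
// e.g. the return value of ModelRegistry.clear_cache. This helper is a
// sketch for illustration, not part of the library.
function summarizeClearResult(result) {
  // Files that were cached but could not be deleted.
  const failed = result.files.filter((f) => f.wasCached && !f.deleted);
  return `${result.filesDeleted}/${result.filesCached} cached files deleted` +
    (failed.length ? `, failed: ${failed.map((f) => f.file).join(', ')}` : '');
}
```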
Sorry, I should have used backticks!
Co-authored-by: Joshua Lochner <admin@xenova.com>
…s.js into v4-cache-handler

Improved Download Progress Tracking
Problem
Transformers.js couldn't reliably track total download progress because the full set of required files wasn't known before downloading started, and file sizes were unavailable when the `content-length` header was missing.
Solution
New Exported Functions
- `get_files()`: Determines required files before downloading
- `get_model_files()` / `get_tokenizer_files()` / `get_processor_files()`: Helper functions to identify files for each component
- `get_file_metadata()`: Fetches file metadata using Range requests without downloading the full content; includes a `fromCache` boolean to identify cached files
- `is_cached()`: Checks if all files from a model are already in cache

Enhanced Progress Tracking
- `readResponse()` with `expectedSize`: Falls back to metadata when the `content-length` header is missing
- `total_progress` callback: Provides aggregate progress across all files

Review
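Since the file list (and each file's size) is known up front, aggregate progress can be computed by summing per-file byte counts. A minimal sketch, assuming a callback that receives `{ loaded, total, progress }` (the exact callback shape and the `makeTotalProgressTracker` helper are assumptions, not the PR's actual implementation):

```javascript
// Hypothetical sketch of a total_progress aggregator: given the sizes of all
// files to download (e.g. from get_file_metadata), it reports combined
// progress as individual files advance.
function makeTotalProgressTracker(fileSizes, total_progress) {
  const loaded = new Map(); // bytes loaded so far, per file
  const totalBytes = Object.values(fileSizes).reduce((a, b) => a + b, 0);

  return function onFileProgress(file, bytesLoaded) {
    loaded.set(file, bytesLoaded);
    let loadedBytes = 0;
    for (const n of loaded.values()) loadedBytes += n;
    total_progress({
      loaded: loadedBytes,
      total: totalBytes,
      progress: totalBytes ? loadedBytes / totalBytes : 1,
    });
  };
}
```

Each per-file `progress` event would feed `onFileProgress`, so the caller sees one monotonically increasing figure across all files instead of separate per-file percentages.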
One thing I am not super confident about is the `get_model_files` function. I tried to test it with different model architectures, but maybe I missed some that load files not covered by that function. @xenova, could you smoke-test some models and send me the ones that fail?
Easiest way to do that is: