Description
Images are currently being forced to 352x352 during preprocessing, even when the input images are explicitly resized to 224x224. This creates a size mismatch and prevents using the model with a vision backbone that expects 224x224 images.
Current Behavior
When processing an image that has been resized to 224x224:
- The input image is correctly resized to 224x224
- The processor resizes it again to 352x352 during preprocessing
- The model receives the 352x352 pixel values and raises:
ValueError: Input image size (352*352) doesn't match model (224*224)
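For reference, a minimal check of where the mismatch comes from (a sketch assuming `processor` and `model` are the CLIPSeg processor and model from transformers, and `image` is the 224x224 PIL image):

inputs = processor(text=["ball"], images=[image], padding="max_length", return_tensors="pt")
print(inputs["pixel_values"].shape)           # e.g. torch.Size([1, 3, 352, 352]) -- the processor's default size
print(model.config.vision_config.image_size)  # e.g. 224 -- what the vision encoder expects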
Expected Behavior
The model should maintain the input image dimensions (224x224) throughout processing, or provide a configuration option to specify desired output dimensions.
Example
print("Image size:", image.size) #print (224, 224)
prompts = ["ball"]
import torch
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
# predict
with torch.no_grad():
outputs = model(**inputs)
preds = outputs.logits.unsqueeze(1)
This raises the following error:
ValueError                                Traceback (most recent call last)
<ipython-input-26-5c026bcdb696> in <cell line: 7>()
      6 # predict
      7 with torch.no_grad():
----> 8     outputs = model(**inputs)
      9 preds = outputs.logits.unsqueeze(1)

(8 intermediate frames omitted)

/usr/local/lib/python3.10/dist-packages/transformers/models/clipseg/modeling_clipseg.py in forward(self, pixel_values, interpolate_pos_encoding)
    209     batch_size, _, height, width = pixel_values.shape
    210     if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
--> 211         raise ValueError(
    212             f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})."
    213         )

ValueError: Input image size (352*352) doesn't match model (224*224).
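A possible workaround until the output size is configurable end to end, sketched under the assumption that the processor wraps a ViT-style image processor whose `size` dict can be overridden before preprocessing (on older transformers versions it may be exposed as `processor.feature_extractor`, with `size` as a plain int):

processor.image_processor.size = {"height": 224, "width": 224}  # assumption: size is a {"height", "width"} dict
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # pixel_values are now 224x224, matching the vision encoder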