Description
Images are currently being forced to 352x352 during preprocessing, even when the input images are explicitly resized to 224x224. This creates a size mismatch and prevents using the model with a vision backbone that expects 224x224 images.
Current Behavior
When processing an image that has been resized to 224x224:
- The input image is correctly resized to 224x224
- The processor resizes it again to 352x352 during preprocessing
- The model receives the 352x352 pixel values and raises:
ValueError: Input image size (352*352) doesn't match model (224*224)
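For reference, a minimal check of where the mismatch comes from (a sketch assuming `processor` and `model` are the CLIPSeg processor and model from transformers, and `image` is the 224x224 PIL image):

inputs = processor(text=["ball"], images=[image], padding="max_length", return_tensors="pt")
print(inputs["pixel_values"].shape)           # e.g. torch.Size([1, 3, 352, 352]) -- the processor's default size
print(model.config.vision_config.image_size)  # e.g. 224 -- what the vision encoder expects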
Expected Behavior
The model should maintain the input image dimensions (224x224) throughout processing, or provide a configuration option to specify desired output dimensions.
Example
print("Image size:", image.size) #print (224, 224)
prompts = ["ball"]
import torch
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
# predict
with torch.no_grad():
outputs = model(**inputs)
preds = outputs.logits.unsqueeze(1)
This raises the following error:
ValueError                                Traceback (most recent call last)
<ipython-input-26-5c026bcdb696> in <cell line: 7>()
      6 # predict
      7 with torch.no_grad():
----> 8     outputs = model(**inputs)
      9 preds = outputs.logits.unsqueeze(1)

(8 intermediate frames omitted)

/usr/local/lib/python3.10/dist-packages/transformers/models/clipseg/modeling_clipseg.py in forward(self, pixel_values, interpolate_pos_encoding)
    209     batch_size, _, height, width = pixel_values.shape
    210     if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
--> 211         raise ValueError(
    212             f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})."
    213         )

ValueError: Input image size (352*352) doesn't match model (224*224).
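A possible workaround until the output size is configurable end to end, sketched under the assumption that the processor wraps a ViT-style image processor whose `size` dict can be overridden before preprocessing (on older transformers versions it may be exposed as `processor.feature_extractor`, with `size` as a plain int):

processor.image_processor.size = {"height": 224, "width": 224}  # assumption: size is a {"height", "width"} dict
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # pixel_values are now 224x224, matching the vision encoder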