Description
I am currently evaluating Qwen3-VL-235B-A22B-Instruct as a replacement for GPT-5. Could you please share the specific experimental parameters you used when sampling from GPT-5?
My current configuration is as follows:
Args: Namespace(task='shopping_admin', task_ids=None, exp='qwen235B-VL-Instruct', rerun=True, retry=False, model_name='Qwen/Qwen3-VL-235B-A22B-Instruct', visual_effects=True, use_html=False, use_axtree=False, use_screenshot=True, use_som=True, mode='bid', tips=False, headless=True, use_full_action_history=True)
Are these key parameters consistent with those used in your experiments? Also, did you use vision-based or text-based input for GPT-5? Regarding the observation space, does combining use_screenshot with use_som (Set-of-Mark) tend to yield better results than use_axtree?
Finally, are there any other recommended models besides GPT-5? For instance, would a combination of DeepSeek-V3.1 and use_axtree be a viable alternative?
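To make the comparison concrete, here is a minimal sketch of the two observation-space setups I have in mind, written as plain `argparse.Namespace` objects. The first copies the values printed above. The second is only my guess at what a text-only DeepSeek-V3.1 run would look like: the `exp` label and the `model_name` string are placeholders, and `with_overrides` and `diff` are hypothetical helpers of my own, not functions from this repository.

```python
from argparse import Namespace

# Vision-based configuration, copied from the Namespace printed in this issue.
vision_cfg = Namespace(
    task='shopping_admin', task_ids=None, exp='qwen235B-VL-Instruct',
    rerun=True, retry=False, model_name='Qwen/Qwen3-VL-235B-A22B-Instruct',
    visual_effects=True, use_html=False, use_axtree=False,
    use_screenshot=True, use_som=True, mode='bid', tips=False,
    headless=True, use_full_action_history=True,
)


def with_overrides(cfg, **overrides):
    """Return a copy of cfg with the given fields replaced (hypothetical helper)."""
    return Namespace(**{**vars(cfg), **overrides})


def diff(a, b):
    """Map each field whose value differs to an (a_value, b_value) pair."""
    da, db = vars(a), vars(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


# Assumed text-only variant: accessibility tree instead of screenshot + SoM.
# The exp label and model_name below are placeholders, not confirmed values.
text_cfg = with_overrides(
    vision_cfg,
    exp='deepseek-v3.1-axtree',
    model_name='deepseek-ai/DeepSeek-V3.1',
    use_axtree=True, use_screenshot=False, use_som=False,
)

for field, (old, new) in diff(vision_cfg, text_cfg).items():
    print(f"{field}: {old!r} -> {new!r}")
```

Only the model, the experiment label, and the three observation flags differ between the two; everything else (`mode='bid'`, `use_full_action_history=True`, and so on) is held fixed so the runs stay comparable.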