Allow combination of ESS and automatic scaling #2032
Conversation
Generalize spectral scale creation beyond OmnigenousField to handle any Optimizable objects. Equilibrium instances create exponential scales, while non-Equilibrium instances default to ones. Results are concatenated to preserve proper ordering.
Ensure x_scale has full state vector size (eq.dim_x) for compatibility with all problem types, including those with linear constraints or ProximalProjection objectives. Assign scaling selectively using x_idx keys (e.g., Rb_lmn), preserving generality across optimization scenarios beyond fixed-boundary cases.
…n objectives
Address dimensional mismatch when x_scale is used with objectives like ProximalProjection by ensuring x_scale is initialized with eq.dim_x and then projected appropriately. This resolves issues where the reduced optimization space omits excluded parameters (e.g., R_lmn, Z_lmn, L_lmn) and avoids errors during eq.solve(x_scale='ess'). Also improved the docstring to clarify size and ordering requirements for custom x_scale inputs.
…alSurface
- Add validation to prevent ESS in eq.solve() (optimization only)
- Improve documentation on ESS parameters and defaults
- Fix ProximalProjection x_scale handling for multiple things
- Add basic tests for ESS scaling scenarios
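To make the logic described in the first commit concrete, here is a minimal sketch (not the actual DESC implementation; `exponential_spectral_scale`, `build_x_scale`, and the dict layout of `things` are hypothetical) of building a per-object scale and concatenating the results in state-vector order:

```python
import numpy as np

def exponential_spectral_scale(mode_numbers, alpha=4.0, order=1, min_value=1e-6):
    # Hypothetical ESS: down-weight higher spectral modes, floored at min_value
    return np.maximum(np.exp(-alpha * np.abs(mode_numbers) ** order), min_value)

def build_x_scale(things):
    # Equilibrium-like things get an exponential spectral scale per parameter,
    # everything else defaults to ones; concatenation preserves the ordering
    # of the full state vector.
    scales = []
    for thing in things:
        if thing["is_equilibrium"]:
            scl = np.concatenate(
                [exponential_spectral_scale(m) for m in thing["mode_numbers"].values()]
            )
        else:
            scl = np.ones(thing["dim_x"])
        scales.append(scl)
    return np.concatenate(scales)

# e.g. one "equilibrium" with two parameter blocks and one coil-like object
things = [
    {"is_equilibrium": True,
     "mode_numbers": {"R_lmn": np.array([0, 1, 2]), "Z_lmn": np.array([0, 1])}},
    {"is_equilibrium": False, "dim_x": 2},
]
print(build_x_scale(things))  # ESS values for 5 modes, then [1., 1.]
```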
Proposed solution:
```python
options.setdefault("initial_trust_radius", 1e-3)
options.setdefault("max_trust_radius", 1.0)
elif options.get("initial_trust_radius", "scipy") == "scipy":
if options.get("initial_trust_radius", "scipy") == "scipy":
```
Why do we have these here and not in the optimization function itself? I remember getting confused because this default is different than the one set inside the function. If there is no specific reason, I would vote for having these defaults set in the same place.
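For reference, the pattern this comment argues for is to resolve the defaults once, inside the optimizer function itself; a minimal sketch (function and option values here are illustrative, not the DESC API):

```python
def trust_region_optimizer(fun, x0, options=None):
    # Single source of truth for trust-region defaults, so wrappers don't
    # need to duplicate (and possibly contradict) them.
    options = {} if options is None else dict(options)
    options.setdefault("initial_trust_radius", "scipy")
    options.setdefault("max_trust_radius", 1.0)
    # ... run the actual optimization using `options` ...
    return options
```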
```python
    scl = tng._get_ess_scale(ess_alpha, ess_order, ess_min_value, ess_default)
    all_scales.append(scl)
elif isinstance(xsc, str) and xsc == "auto":
    scl = tree_map(jnp.zeros_like, tng.params_dict)
```
Whether #2041 is merged before or after this, we need to add special logic to SGD type optimizers to deal with these 0 values. SGD only looks at the gradient and norm scaling doesn't work there.
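Since SGD never forms a Jacobian, the zeros that mean "auto" elsewhere would need explicit handling there. A rough sketch of one option (fall back to 1), assuming the scaling convention x = D·z; names and the fallback choice are illustrative:

```python
import numpy as np

def scaled_sgd_step(x, grad, x_scale, lr=1e-2):
    # x_scale == 0 means "automatic" in this PR, but SGD has no Jacobian
    # column norms to fall back on, so fall back to 1 instead; otherwise
    # the zero-scaled variables would simply never move.
    D = np.where(x_scale == 0, 1.0, x_scale)
    # a gradient step in the scaled variables z = x / D maps to dx = -lr * D**2 * grad
    return x - lr * D**2 * grad
```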
```python
return 1 / scale_inv, scale_inv
scale = 1 / scale_inv
if user_scale is not None:
    scale = jnp.where(user_scale == 0, scale, user_scale)
```
I am not sure if this will work as intended. The reason Jacobian column scaling works (to some extent) is that when you look at all the norms, their relative relation gives a sense of scaling. If we mix the column norms with the arbitrary scaling given by the user, we lose this relationship. For example, say the norms are [1000, 2000, 1000] and the user-given x_scale is [1, 0, 1]; this will result in [1, 2000, 1]. I don't think this will give good results in most cases due to the inconsistent scaling.
Maybe we can first normalize the norms based on the maximum of the user given x_scale, then use that? Something like:
```python
if user_scale is not None:
    user_scale_max = jnp.max(user_scale)
    user_scale_max_id = jnp.where(user_scale == user_scale_max)[0][0]
    scale_max_at_id = scale[user_scale_max_id]
    scale = scale * user_scale_max / scale_max_at_id
    scale = jnp.where(user_scale == 0, scale, user_scale)
```

This first normalizes the scales such that the norm of the column corresponding to the maximum user scale equals that maximum user scale, which removes the order-of-magnitude difference between the user scale and the Jacobian column norms. We can probably find a better solution.
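Plugging in the numbers from the comment above (column norms [1000, 2000, 1000], user x_scale [1, 0, 1]) shows the difference between the naive mix and the proposed renormalization; a small standalone check, using NumPy in place of jnp:

```python
import numpy as np

scale = np.array([1000.0, 2000.0, 1000.0])   # Jacobian-based column scales
user_scale = np.array([1.0, 0.0, 1.0])       # 0 means "use automatic"

# Naive mix from the diff: inconsistent orders of magnitude
naive = np.where(user_scale == 0, scale, user_scale)
print(naive)  # [   1. 2000.    1.]

# Proposed fix: renormalize the automatic scales so that the column
# corresponding to the largest user scale matches that user scale
user_scale_max = np.max(user_scale)
idx = np.argmax(user_scale)
rescaled = scale * user_scale_max / scale[idx]
mixed = np.where(user_scale == 0, rescaled, user_scale)
print(mixed)  # [1. 2. 1.]
```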
| + f"Got size {x_scale.size} for state vector of size {xp.size}.", | ||
| ) | ||
| D = np.where(np.abs(x_scale) < 1e2, 1, np.abs(x_scale)) | ||
| # x_scale==0 means use auto scale, otherwise use user scale |
Sort of confused here: which "auto" is being referred to? Are we overloading "auto" to mean both "use inverse Jacobian norm scales, iteratively updated during optimization", and also "the overall scale of each variable used in factorize_linear_constraints to make sure the particular solution does not carry the entire magnitude of the variable"?
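As context for the second meaning raised in this question, here is a generic sketch (not the actual DESC routine; function names are illustrative) of factorizing linear constraints A x = b in scaled variables x = D z, which is what keeps the particular solution from carrying the full magnitude of a large variable:

```python
import numpy as np

def factorize_scaled_constraints(A, b, D):
    # Factorize A @ x = b in scaled variables x = D * z, so the particular
    # solution stays O(1) even when some variables are orders of magnitude
    # larger than others.
    AD = A * D  # same as A @ np.diag(D)
    z_particular, *_ = np.linalg.lstsq(AD, b, rcond=None)
    # orthonormal basis for the nullspace of A @ D via SVD
    _, s, vt = np.linalg.svd(AD)
    tol = max(AD.shape) * np.finfo(float).eps * (s[0] if s.size else 0.0)
    Z = vt[np.sum(s > tol):].T
    return z_particular, Z

def recover_full_x(z_particular, Z, y, D):
    # map the reduced variables y back to the full, unscaled state vector
    return D * (z_particular + Z @ y)

# tiny check: x0 + 1e6 * x1 = 1, with D undoing the 1e6 disparity
A = np.array([[1.0, 1e6]])
b = np.array([1.0])
D = np.array([1.0, 1e-6])
zp, Z = factorize_scaled_constraints(A, b, D)
x = recover_full_x(zp, Z, np.array([0.3]), D)
print(A @ x - b)  # ~[0.]
```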
```python
if isinstance(x_scale, str) and x_scale == "auto":
    x_scale = auto_x_scale

self._D = jnp.where(x_scale == 0, auto_x_scale, x_scale)
```
This is not necessary if x_scale is not "auto", right? But I guess it does not hurt to have. Or is this going to change if x_scale is passed in or is "ess" now?
```python
if np.all(x_scale == 0):
    x_scale = "jac"  # automatic scaling
else:
    # we can't combine adaptive scaling with user specified scale, but we
```
Which user-specified scale is this here? Like, if one used "ess", then what is happening here? Is ESS still being used?
This allows you to use both ESS and automatic jacobian scaling in the same problem, by using different scales for different objects/variables. Note this still does not allow using automatic scaling "on top of" ESS, since in that case I think the automatic scaling would undo the effect of ESS.
Basically, any element of `x_scale` set to 0 means "use the automatic Jacobian scaling", and now `x_scale=="auto"` basically just sets `x_scale=np.zeros()`.

A few questions/concerns:

- … `R_lmn` but "auto" for `Rb_lmn` and have a `BoundaryRSelfConsistency` constraint, the results may be somewhat undefined
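A minimal usage sketch of the convention described above (the `x_idx` layout, sizes, and parameter names here are illustrative): zeros request automatic Jacobian scaling, while selected blocks are overwritten with an ESS-style scale.

```python
import numpy as np

dim_x = 10
# hypothetical mapping from parameter names to indices into the full state vector
x_idx = {"Rb_lmn": np.arange(0, 3), "Zb_lmn": np.arange(3, 6), "p_l": np.arange(6, 10)}

# start from all zeros, i.e. "auto" (automatic Jacobian scaling) everywhere ...
x_scale = np.zeros(dim_x)

# ... then overwrite only the boundary modes with an ESS-style exponential scale,
# leaving pressure (and anything else) to the automatic scaling
x_scale[x_idx["Rb_lmn"]] = np.exp(-4.0 * np.arange(3))
x_scale[x_idx["Zb_lmn"]] = np.exp(-4.0 * np.arange(3))
print(x_scale)
```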