Allow combination of ESS and automatic scaling #2032

Open
f0uriest wants to merge 48 commits into master from rc/auto_scale

Conversation


f0uriest (Member) commented Dec 11, 2025

This allows you to use both ESS and automatic Jacobian scaling in the same problem, by using different scales for different objects/variables. Note that this still does not allow using automatic scaling "on top of" ESS, since in that case I think the automatic scaling would undo the effect of ESS.

Basically, any element of x_scale set to 0 means "use the automatic Jacobian scaling for that variable", and x_scale="auto" is now just shorthand for an all-zero x_scale.
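A minimal sketch of the combination rule, mirroring the jnp.where logic that shows up in the diffs below (the array values here are illustrative):

import jax.numpy as jnp

def combine_scales(user_scale, auto_scale):
    # use the automatic (Jacobian-based) scale wherever the user left a 0,
    # and the user's value everywhere else
    return jnp.where(user_scale == 0, auto_scale, user_scale)

auto_scale = jnp.array([1e-3, 5e-4, 1e-3, 2e-3])  # e.g. inverse column norms
mixed = jnp.array([0.0, 2.5, 0.0, 1.0])           # 0 -> auto for entries 0, 2
print(combine_scales(mixed, auto_scale))          # [0.001 2.5   0.001 1.   ]
print(combine_scales(jnp.zeros(4), auto_scale))   # "auto" == all zeros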

A few questions/concerns:

  • I'm not totally sure how this works with LinearConstraintProjection if you have a constraint that couples variables with different scaling types, i.e., if you used "ess" for R_lmn but "auto" for Rb_lmn and have a BoundaryRSelfConsistency constraint, the results may be somewhat undefined.
  • Right now for ESS we use a default scale of 1 for any extra variables, but maybe we could use a scale of 0 so that non-ESS variables fall back to auto scaling?

Chris J and others added 30 commits May 23, 2025 09:40
Generalize spectral scale creation beyond OmnigenousField to handle any
Optimizable objects. Equilibrium instances create exponential scales,
while non-Equilibrium instances default to ones. Results are concatenated
to preserve proper ordering.
Ensure x_scale has full state vector size (eq.dim_x) for compatibility
with all problem types, including those with linear constraints or
ProximalProjection objectives. Assign scaling selectively using
x_idx keys (e.g., Rb_lmn), preserving generality across optimization
scenarios beyond fixed-boundary cases.
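A sketch of the selective assignment this commit describes, assuming eq.dim_x and eq.x_idx behave as the message says (DESC-style Optimizable attributes); build_x_scale and ess_scales are hypothetical names, not code from this PR:

import numpy as np

def build_x_scale(eq, ess_scales):
    # full-size scale vector: default of 1 everywhere, then overwrite only
    # the variables that have precomputed ESS scales (e.g. "Rb_lmn")
    x_scale = np.ones(eq.dim_x)
    for name, scl in ess_scales.items():
        x_scale[eq.x_idx[name]] = scl
    return x_scale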
…n objectives

Address dimensional mismatch when x_scale is used with objectives like
ProximalProjection by ensuring x_scale is initialized with eq.dim_x and
then projected appropriately. This resolves issues where the reduced
optimization space omits excluded parameters (e.g., R_lmn, Z_lmn, L_lmn)
and avoids errors during eq.solve(x_scale='ess').

Also improved docstring to clarify size and ordering requirements for
custom x_scale inputs.
…alSurface

- Add validation to prevent ESS in eq.solve() (optimization only)
- Improve documentation on ESS parameters and defaults
- Fix ProximalProjection x_scale handling for multiple things
- Add basic tests for ESS scaling scenarios

ddudt commented Dec 15, 2025

Proposed solution:

  • The user has only a single xscale input option.
  • If xscale="auto" for all variables: the linear constraint projection D matrix scales based on initial values, and the optimizer x_scale uses the norms of the Jacobian columns. (Same as the existing defaults.)
  • If xscale="ess" for all variables: ESS scaling is applied in the linear constraint projection D matrix to make the optimizer variables order unity, and the optimizer does no additional scaling.
  • If xscale is not the same for all variables: TBD, we need to think of a good solution (one possible dispatch is sketched after this list).
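One way the dispatch could look; this is a placeholder sketch of the three cases above, not code from the PR, and resolve_x_scale is a hypothetical helper:

import numpy as np

def resolve_x_scale(xscale, dim_x):
    # map the single user-facing xscale option to a (D-matrix mode,
    # optimizer x_scale) pair, following the three cases listed above
    if isinstance(xscale, str) and xscale == "auto":
        # D built from initial values; optimizer uses Jacobian column norms
        return "initial-values", "jac"
    if isinstance(xscale, str) and xscale == "ess":
        # ESS absorbed into D so optimizer variables are O(1); no extra scaling
        return "ess", np.ones(dim_x)
    # mixed per-variable scaling: still TBD per the last bullet
    raise NotImplementedError("mixed per-variable scaling is TBD")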

@dpanici dpanici requested review from a team, YigitElma, ddudt, dpanici, rahulgaur104 and unalmis and removed request for a team December 17, 2025 20:11
options.setdefault("initial_trust_radius", 1e-3)
options.setdefault("max_trust_radius", 1.0)
elif options.get("initial_trust_radius", "scipy") == "scipy":
if options.get("initial_trust_radius", "scipy") == "scipy":

Why do we have these here and not in the optimization function itself? I remember getting confused because this default is different from the one set inside the function. If there is no specific reason, I would vote for setting these defaults in the same place.

    scl = tng._get_ess_scale(ess_alpha, ess_order, ess_min_value, ess_default)
    all_scales.append(scl)
elif isinstance(xsc, str) and xsc == "auto":
    scl = tree_map(jnp.zeros_like, tng.params_dict)

Whether #2041 is merged before or after this, we need to add special logic to SGD-type optimizers to deal with these 0 values: SGD only looks at the gradient, and norm scaling doesn't work there.
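A sketch of one possible guard for a gradient-only optimizer, assuming x_scale may contain the 0 sentinels introduced in this PR (illustrative only, not code from this PR or #2041):

import jax.numpy as jnp

def sgd_step(x, grad, lr, x_scale):
    # with no Jacobian there are no column norms to substitute for the
    # 0 ("use auto") entries, so fall back to a scale of 1 there
    safe_scale = jnp.where(x_scale == 0, 1.0, x_scale)
    # gradient descent in the scaled variables y = x / D is equivalent to
    # x_new = x - lr * D**2 * grad in the original variables
    return x - lr * safe_scale**2 * grad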

- return 1 / scale_inv, scale_inv
+ scale = 1 / scale_inv
+ if user_scale is not None:
+     scale = jnp.where(user_scale == 0, scale, user_scale)

I am not sure this will work as intended. The reason Jacobian column scaling works (to some extent) is that the relative relation between all the column norms gives a sense of scaling. If we mix the column norms with arbitrary values given by the user, we lose this relationship. For example, say the column norms are [1000, 2000, 1000] and the user-given x_scale is [1, 0, 1]; this results in a combined scale of [1, 1/2000, 1]. I don't think this will give good results in most cases due to the inconsistent scaling.

Maybe we can first normalize the scales based on the maximum of the user-given x_scale, then use that? Something like:

if user_scale is not None:
    user_scale_max = jnp.max(user_scale)
    user_scale_max_id = jnp.argmax(user_scale)  # first index of the max
    scale_max_at_id = scale[user_scale_max_id]
    # rescale so the auto scale equals the user scale at that index
    scale = scale * user_scale_max / scale_max_at_id
    scale = jnp.where(user_scale == 0, scale, user_scale)

This first normalizes the scales so that the scale entry at the index of the maximum user scale equals that maximum; this removes the order-of-magnitude difference between the user scale and the inverse Jacobian column norms. We can probably find a better solution.
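Worked through on the numbers above (same logic, using jnp.argmax for the first index of the max):

import jax.numpy as jnp

scale = 1 / jnp.array([1000.0, 2000.0, 1000.0])       # inverse column norms
user_scale = jnp.array([1.0, 0.0, 1.0])

idx = jnp.argmax(user_scale)                          # index 0, the user max
scale = scale * jnp.max(user_scale) / scale[idx]      # -> [1.0, 0.5, 1.0]
print(jnp.where(user_scale == 0, scale, user_scale))  # [1.  0.5 1. ]

The auto-scaled entry now sits at the same order of magnitude as the user entries, instead of at 1/2000.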

- return 1 / scale_inv, scale_inv
+ scale = 1 / scale_inv
+ if user_scale is not None:
+     scale = jnp.where(user_scale == 0, scale, user_scale)

Same as above

@dpanici dpanici requested review from ddudt and dpanici and removed request for ddudt and dpanici December 22, 2025 19:42
@dpanici dpanici marked this pull request as ready for review January 28, 2026 21:13
+ f"Got size {x_scale.size} for state vector of size {xp.size}.",
)
D = np.where(np.abs(x_scale) < 1e2, 1, np.abs(x_scale))
# x_scale==0 means use auto scale, otherwise use user scale

Sort of confused here: which "auto" is being referred to? Are we overloading "auto" to mean both the inverse Jacobian column norm scales that are iteratively updated during optimization, and also the overall per-variable scale used in factorize_linear_constraints to make sure the particular solution does not carry the entire magnitude of the variable?

if isinstance(x_scale, str) and x_scale == "auto":
    x_scale = auto_x_scale

self._D = jnp.where(x_scale == 0, auto_x_scale, x_scale)

This is not necessary if x_scale is not "auto", right? But I guess it does not hurt to have. Or is this going to change if x_scale is passed in explicitly or is "ess" now?

if np.all(x_scale == 0):
    x_scale = "jac"  # automatic scaling
else:
    # we can't combine adaptive scaling with user specified scale, but we

Which user-specified scale is this here? Like, if one used "ess", then what is happening here? Is ESS still being used?
