Skip to content

Conversation

@headius
Copy link
Contributor

@headius headius commented Dec 11, 2025

This PR is meant to be applied on top of #763. It could be reformulated to skip that commit altogether.

In this PR I remove the following features:

  • Dumping an anonymous Data instance

The previous logic supported this case by defining a new Data class every time an anonymous instance was encountered in the YAML stream. As this is clearly not round-tripping the object back to its original class, and would not even deserialize two of the same anonymous data objects as the same class, it is not valid.

This change has precedent: we do not support dumping singleton objects or anonymous Structs, since there's no way to restore the reference to the original class.

  • Dumping a Data instance with instance variables

The previous logic attempted to support this case by allocating an uninitialized Data object and later calling initialize via internal mechanisms (either a C call to rb_struct_initialize or a reflective call via __send__. Because these hacks violate the contract of Data (initialized once and immediately frozen) and because a custom Data subclass with its own initialize and instance variable setup can't be generically constructed, such Datas cannot be dumped.

Removal of these two features greatly simplifies Data dumping and loading:

  • No anonymous Data classes being defined during load.
  • No instance variable logic.
  • No registering the Data instance before it is constructed, since it cannot be self-referential without instance variables.

I have removed the related tests and this passes on all implementations.

JRuby does not expose the `initialize` method for a given Data
instance to native code, so there's no equivalent to calling
`rb_struct_initialize`. This change moves that initialization to
a Ruby `__send__` and removes the native stub method `init_data`.

Fixes broken JRuby `Data` logic after ruby#692.
@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

cc @nevans @tenderlove @eregon for discussion

@headius headius mentioned this pull request Dec 11, 2025
@eregon
Copy link
Member

eregon commented Dec 11, 2025

This is actually quite similar to the first approach & commit in #692: 788b844.

That was refined in later commits, I believe due to trying to support more cases of Data classes.

  1. Data classes can redefine initialize:
    Foo = Data.define(:width, :height, :area) do
      def initialize(width:, height:)
        super(width: width, height: height, area: width * height)
      end
    end

This wouldn't work when deserializing with .new or initialize, Foo.new(width: 2, height: 3, area: 6) would be ArgumentError.

  1. I don't know how common it is but it's also possible set ivars in initialize before the super call. Just ignoring the ivars and letting initialize set them again might not be quite right as noted in Data object encoding #692 (comment) because e.g. the ivars might be mutable and then their state will be lost. Also if these ivars reference another object serialized to YAML after deserialization they would be separate objects and no longer connected.

@nevans I believe you have more background on those and maybe other issues, could you share them here?

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

we do not support dumping singleton objects or anonymous Structs

But we do support loading them: see test_anon_strcut in test_to_ruby.c. If the yaml doesn't contain a class name, then round-tripping wouldn't be a concern.

This may not be the best use case, but I've personally used this as a strategy to quickly import old or external data into a different context, by simply scrubbing the class names from the tags. Obviously there are other approaches, with other tradeoffs, but this works.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

But we do support loading them

This is only for legacy support. We should not use this to justify adding support for dumping anonymous Data now.

quickly import old or external data

Such data should be imported with an older, compatible version that supports it, and then re-dumped as the new supported form. We can't endlessly support old data formats when they become fully incompatible with modern dump output.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

I don't have much to add to @eregon's comments (and the comment he linked to). Obviously there are always some caveats and issues when dealing with serialization and deserialization of any custom objects. But, IMO, deserialization should at least be able to cope with custom #initialize method like the one in @eregon's last comment.

I doubt that setting ivars on data objects is very common. But it should be supported (just as it is for Struct or any other object).

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

I doubt that setting ivars on data objects is very common. But it should be supported (just as it is for Struct or any other object).

No other type of object is born frozen, so this argument is moot. Since Data must be initialized once and immediately frozen, and there's no protocol to hook into this and set up instance variables, Data with instance variables cannot be properly dumped and loaded.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

This is only for legacy support

There's no deprecation warning in the code. Changing something like this deserves to have a deprecation period, IMO.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Data classes can redefine initialize

Since there is no way to generically invoke initialize for such cases, and since Data is supposed to be initialized once and then immediately frozen, this cannot be supported.

it's also possible set ivars in initialize before the super call

Same argument applies here. Custom initialize logic that hooks into Data's initialize chain cannot be generically re-invoked during loading, so such objects cannot be dumped.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

There's no deprecation warning in the code.

If you mean deprecation warning for Data with instance variables, I would argue it should never have been supported or released.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

Data with instance variables cannot be properly dumped and loaded.

Maybe it can, maybe it can't. Psych will dump any regular old ruby object you give it. And it'll handle ivars on them in almost the exact same way it does with all the core object types: by using instance_variable_set. Is that valid for every single object you throw at it? Of course not. But its valid for many types of objects. If someone sets some instance variables in their data class's initialize, can those be safely serialized and deserialized? Maybe. Why couldn't that be possible?

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

If someone sets some instance variables in their data class's initialize, can those be safely serialized and deserialized? Maybe. Why couldn't that be possible?

Because we cannot hook into the initialize sequence from the Psych loading code without invoking that sequence during allocation, and custom initialize methods that do more than just pass through Data fields cannot be generically invoked for all cases.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

Is there some particular philosophical reason why YAML shouldn't be able to support what Marshal does (this includes ivars and bypassing #initialize)? Or should Marshal not fully support Data objects either?

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Is there some particular philosophical reason why YAML shouldn't be able to support what Marshal does

Marshal has custom protocols built into each runtime for accessing and initializing the internal state of objects during allocation. Psych does not. Until that changes, Psych should only support "simple" Data types.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Here's another way to look at this:

The contract of Data is that it is constructed, initialized, and frozen in one step when calling new. Custom subclasses can add to this sequence while the object is being initialized by defining their own initialize methods that manipulate incoming data or set instance variables, but once the object is finished "newing" it is fully initialized and frozen.

Psych cannot hook into this sequence because there's no protocol for it to do so. The previous logic relied on lifecycle hacks, primarily from native code, to manipulate an uninitialized Data object before it could actually initialize itself. This breaks the contract of Data.

We must not:

  • Manipulate a Data object that has not been initialized, because the contract of Data is that it is born frozen.
  • Invoke custom initialize methods during deserialization, because we cannot do so generically and therefore cannot reproduce logic in those initialize methods during load.

A possible workaround for this would be to implement a Psych protocol for custom Data subclasses that want to be able to dump and load, similar to what Marshal provides for other custom Ruby types.

class MyData << Data.define(:foo, :bar)
  def initialize(foo, baz)
    @baz = baz
    @this = self
    super(foo, foo + 1)
  end

  def psych_dump
    {foo: foo, baz: @baz}
  end

  def self.psych_load(hash)
    MyData.new(hash[:foo], hash[:baz])
  end
end

(Edit: added more logic to show incoming ivars as part of the protocol and move the ivar initialization above the super call. JRuby freezes the object in new after initialize has finished, which is perhaps how it should be in CRuby)

This would be equivalent to how many other classes use custom Marshal logic when their custom initialize logic cannot be replicated in a simple dump/load sequence.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Also, just so my biases are laid bare:

  • I think it was a terrible decision to allow Data to have instance variables in the first place, since it leads to all these problems with the initialization sequence and serialization.
  • Custom initialize methods are fine, but the contract of Data should have always been that any state initialized during construction must be trivially copyable, to support serialization mechanisms.

Data should have had the same constraints as a Record in Java, and then we wouldn't have to debate how to support those weird edge cases.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

I'm not sure how or why Data's "contract" is any different from any other object in this regard? It's not normal to bypass Class.new or Object#initialize on any object, and yet that's what Psych (and Marshal) both do.

irb(main):001* D = Data.define do
irb(main):002*   attr_reader :a
irb(main):003> end
=> D
irb(main):004> d = D.allocate
=> #<data D>
irb(main):005> d.instance_variable_set :@a, "a"
=> "a"
irb(main):006> d.a
=> "a"
irb(main):007>

Now I have the problem that I can't update any data members with any existing ruby method... but there is a method that's part of Ruby's public C API that can do it.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

I'm not sure how or why Data's "contract" is any different from any other object in this regard?

Because Data is supposed to be born frozen, while no other object has this as part of their contract (except for objects that don't allow instance variable state, which also supports my argument).

Your example never invokes the D object's initialize, so it is not valid. And because you cannot invoke D's initialize in a generic way, there's no way to proceed from here.

Hacking around in the C code to call CRuby-specific internal methods that initialize D without calling initialize is not an acceptable alternative, because the contract of Data disallows that state. The comments around the freeze operation in Data's current implementation even make this clear:

    rb_hash_foreach(argv[0], struct_hash_set_i, (VALUE)&arg);
    // Freeze early before potentially raising, so that we don't leave an
    // unfrozen copy on the heap, which could get exposed via ObjectSpace.
    OBJ_FREEZE(self);
    if (arg.unknown_keywords != Qnil) {
        rb_exc_raise(rb_keyword_error_new("unknown", arg.unknown_keywords));
    }

An unfrozen Data object should never be referenced or manipulated from Ruby, and therefore there's no way to support initializing instance variables in a generic way during loading.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Additional argument against using rb_struct_initialize: this is for Struct, not Data. The fact that Data and Struct are implemented with many of the same native functions in CRuby is irrelevant; Data does not extend Struct and Data is not kind_of? Struct.

$ cx ruby-master ruby -e "D = Data.define(:foo); p D.new(1).kind_of?(Struct)"
false

In JRuby, Data and Struct are completely disjoint types, including at the native level, and there is no equivalent hack to early-initialize a Data object's fields, because that violates the contract of Data.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Further justification: the reason you cannot initialize a Data yourself is because it was designed that way.

static VALUE
rb_data_initialize_m(int argc, const VALUE *argv, VALUE self)

The initialization sequence of Data was never designed to be invoked from outside of this C code, and therefore there's no way to initialize a Data object other than to follow the existing protocol. That protocol does not support initializing instance variables on an already-visibile-but-not-yet-initialized Data object unless you are inside a custom initialize. Since we can't generically invoke all possible custom initialize methods, and can't consistently pass through instance variables to allow such initializaiton, those use cases cannot be supported.

CRuby could certainly change the Data initialization protocol to support generic object serialization frameworks, but as it exists today we cannot do this without violating the explicitly defined lifecycle of Data.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

Because Data is supposed to be born frozen, while no other object has this as part of their contract

class C
   attr_reader :a
   def initialize
     @a = "a"
     freeze
   end
end

This class has it as part of its contract, in the exact same way as Data: it's the last thing done in the initializer.

irb(main):008> YAML.unsafe_load C.new.to_yaml
=> #<C:0x000078497ce20458 @a="a">

I agree that using rb_struct_initialize is an abuse of that CRuby implementation detail. There should be a public method for invoking Data#initialize. I would've prefered that Data.instance_method(:initialize).bind_call(object, **members) serve that purpose, but ...

Wait a minute... I could've sworn I'd tried this before.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

irb(main):002> D = Data.define(:foo)
=> D
irb(main):003> d = D.allocate
=> #<data D foo=nil>
irb(main):004> D.instance_method(:initialize).bind_call(d, foo: "bar")
=> nil
irb(main):005> d
=> #<data D foo="bar">

Nothing suprising there. Just calling Class.allocate and then object.class.initialize, the same way new normally does.

irb(main):006> d = D.allocate
=> #<data D foo=nil>
irb(main):007> Data.instance_method(:initialize).bind_call(d, foo: "bar")
=> nil
irb(main):008> d
=> #<data D foo="bar">

I was pretty certain I'd tried that when first implementing #692, and got exceptions.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

First off, I think it's a bug that a Data class has a visible allocate method, and I would argue that should be fixed. That explicitly violates the contract mentioned in the C code I pasted above. An unfrozen Data should never be allowed "into the wild" because it can leak out to other consumers and end up with state it was never designed to have.

If we assume that it's not a bug and won't be removed, then your hack of using the Data#initialize method object is probably the only acceptable way to handle this feature. We are not executing custom logic from a subclass's initialize, but that is as you say pretty much the case for all other types of objects, and users should not have an expectation that custom initialize methods are executed during deserialization.

In the interest of getting this working on JRuby, I'd be fine with you pushing a PR to remove all custom C code related to this feature and instead using the Method trick. I don't feel great about it, with regards to the Data contract, but until there's a formal way to support serialization of Data (other than Marshal) it's perhaps good enough.

So:

  • Disallow anonymous Data dumping and loading.

I still believe we should not introduce new problems by allowing anonymous Data to be dumped (or loaded) because that clearly is not valid; the resulting obj.class from one anonymous Data instance in the yaml stream will never be == to another instance from that stream even though they should be the same.

  • Use Data.instance_method(:initialize) to lazily initialize Data objects that have custom initialize methods.

This is the gross case and should only be used when an object has a custom initialize method (since that's the only way they could set instance variables). So you need to compare Data.instance_method(:initialize) with D.instance_method(:initialize) as well. Best of luck making the performance not suck. 🙂

  • Use only normal initialization for Data objects that do not have custom initialize methods.

Plain old Data objects that do not have custom initializers and do not set instance variables should not be penalized by those that do. Just construct them normally, without registering during loading.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

I justed tested on 3.2.6, so I must've misremembered some experiment or made some other mistake when I first implemented #692.

With this data class definition, demonstrating ivars and different kwargs from super:

irb(main):001* D = Data.define(:foo, :bar, :baz) do
irb(main):002*   attr_reader :ivar
irb(main):003*   def initialize(foo:, bar: foo * 2)
irb(main):004*     @ivar = "wat"
irb(main):005*     super(foo:, bar:, baz: "etc")
irb(main):006*   end
irb(main):007> end

To demonstrate the "normal" way of using it:

irb(main):010> d = D["foo"]
=> #<data D foo="foo", bar="foofoo", baz="etc">
irb(main):011> d.ivar
=> "wat"

And to demonstrate that explicitly calling allocate, followed by instance_variable_set, followed by Data#initialize works:

irb(main):012> d = D.allocate
=> #<data D foo=nil, bar=nil, baz=nil>
irb(main):013> d.instance_variable_set :@ivar, "wat"
=> "wat"
irb(main):015> Data.instance_method(:initialize).bind_call(d, foo: "foo", bar: "foofoo", baz: "etc")
=> nil
irb(main):016> d
=> #<data D foo="foo", bar="foofoo", baz="etc">
irb(main):017> d.frozen?
=> true

As for any special "contracts" may have that set it apart from other classes and objects, I still don't see why this should be any different from e.g other core classes like Array or String with instance variables, or any other custom class that calls "freeze" at the end of their initialize method.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

should only be used when an object has a custom initialize method (since that's the only way they could set instance variables). So you need to compare Data.instance_method(:initialize) with D.instance_method(:initialize) as well

Why make the comparison when DATA_INITIALIZE.bind_call(o, **members) would always do what we want?

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

FWIW the hack doesn't work on JRuby either, since we never expected users to be able to call Data#initialize from Ruby. Perhaps this is the error you remember (though Data didn't even exist in JRuby when you started your original PR).

I will look into fixing this on the JRuby side.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Why make the comparison when DATA_INITIALIZE.bind_call(o, **members) would always do what we want?

The performance of bind_call is very poor compared to simply constructing the object the normal way, and it is totally unnecessary if there's no custom initialize method to skip.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

I suppose the question is how much is the performance for this:

data = klass.allocate
ivars&.each do |ivar, v|
  data.instance_variable_set ivar, v
end
DATA_INITIALIZE.bind_call(o, **members)

vs this:

if (ivars.nil? || ivars.empty?) && DATA_INITIALIZE.equal?(klass.instance_method(:initialize))
  klass.new(**members)
else 
  data = klass.allocate
  ivars.each do |ivar, v|
    data.instance_variable_set ivar, v
  end
  DATA_INITIALIZE.bind_call(data, **members)
  data
end

And how significant is that performance difference?

Without running benchmarks, I'm inclined to create the simpler PR. And I need to punt on the performance benchmarks for now (because I have other things I was supposed to be working on today!)

@eregon
Copy link
Member

eregon commented Dec 11, 2025

Data#initialize is a no-go, it doesn't work on TruffleRuby either, and I don't think it makes sense to add it, it would be terribly slow and be inconsistent, the only method for Data is Data.define and that returns a subclass with the expected methods.

Maybe the initialize method on the Data direct subclass could be used, in my example that'd be Foo.ancestors[1].instance_method(:initialize). That can be monkey-patched but typically it's going to be an override in a subclass instead like that example.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

DATA_INITIALIZE.equal?(klass.instance_method(:initialize))

Actually I think you're right, assuming the method hack will never be used for "simple" Data types.

Data types without instance variables dump as data not data_with_ivars and only the latter cases should use the method hack.

Data types that have instance variables should get dumped as data_with_ivars, so I think that's sufficient indication.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

@eregon Right. Maybe the TruffleRuby case was what I was remembering? I've definitely looked at TruffleRuby's Data implementation.

Foo.superclass.instance_method(:initialize) doesn't work if there's another custom superclass between Data and Foo. Nor does would it work with Data.define(...) do def initialize.

IMO, TruffleRuby should have a (de-optimized) fallback for Data#initialize (and any other instance_methods that are defined on Data in CRuby). That would probably clear some of the other compatibility issues I'd previously encountered, too.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Data#initialize is a no-go, it doesn't work on TruffleRuby either

I don't particularly like the hack but I'd be willing to add this if necessary.

Maybe the initialize method on the Data direct subclass could be used

You can't generically invoke a subclass's custom initialize because the arity may be different and there may be side-effects. The only way this feature lands (safely) is if it's possible to bypass the custom initialize and set up the instance variables directly.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

@headius Actually, I think that calling Data#initialize directly is more important than even including support for ivars. I've both encountered and written code like @eregon's example above, where initialize is overridden in a way that won't allow it to be used for deserialization. I've never written any data objects with ivars, and I can't recall seeing any non-toy code use them. I do think they should be supported, but that's more of a abstract philosophical stance.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

Maybe the TruffleRuby case was what I was remembering?

the hack doesn't work on JRuby either

Actually this wasn't quite accurate. JRuby does define an initialize on Data subtypes, but it only takes a Hash (I believe this is the same in CRuby). The new logic is what massages any incoming Array into a Hash for it. When I tested, I tried to pass positional args, but that only works through new.

So your example works fine on JRuby if I only use the members hash:

irb(main):001> D = Data.define(:foo)
=> D
irb(main):002> d = D.allocate
=> #<data D foo=nil>
irb(main):003> Data.instance_method(:initialize).bind_call(d, {foo: 1})
=> nil
irb(main):004> d.foo
=> 1
irb(main):005> 

This commit removes support for the following features:

* Dumping an anonymous Data. This is unsupported for the same
  reasons as dumping anonymous modules or classes: there's no way
  to reference the proper Data class, and defining a new anonymous
  Data for every such element would not round-trip properly.
* Dumping a Data with instance variables. Instance variables on a
  Data can only be set during the initialize chain when first
  constructed, and such a chain is custom for each such Data class.
  As there is no way to modify a Data after constructed and no way
  to accurately invoke the custom initialize method, all Data with
  instance variables are rejected.

Data is also no longer registered while loading, since it's not
possible to have a self-referential Data instance without a custom
initialize that sets an instance variable to self. The combination
of these removed features simplifies Data dumping and loading and
eliminates all allocate+initialize hacks.
@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

I repushed with the removed tests I forgot to include before, but it sounds like this PR may be rejected in favor of a less hacky version of the old logic. @nevans I can put that together if you are busy with other things today.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

JRuby doesn't actually even support Data subclasses with instance variables right now:

$ jruby -e "class Foo < Data.define(:foo); def initialize(**); @bar = 1; super; end; end; p Foo[2]"
Unhandled Java exception: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1

This is a bug.

I have the rest of the implementation passing on CRuby, though.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

FWIW I believe we still need buy-in from @eregon to go forward with any solution, so I'm not closing this PR just yet.

I will push a separate PR that uses the method hack.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

Sorry, was in a meeting. Just pushed my own branch with very minimal change. It's maybe not much different from yours. But we'd need some way to make it work for TruffleRuby too.

@eregon
Copy link
Member

eregon commented Dec 11, 2025

Because these hacks violate the contract of Data (initialized once and immediately frozen) and because a custom Data subclass with its own initialize and instance variable setup can't be generically constructed, such Datas cannot be dumped.

@headius I think this contract is something you would like, but I don't think any such contract is established.
Data is not even immutable, it's just shallow-frozen.

Data.allocate is currently supported (de facto), and it is supported and documented to override initialize, so Ruby code does have access to an unfrozen Data instance, at least during the custom initialize.
In fact I can even return an unfrozen Data instance easily:

> Data.define(:a,:b) { def initialize(**); @a = 42; end }.new.frozen?
=> false

(and that could use allocate if new is force-freezing, but new isn't force-freezing on CRuby)

In many comments you say there is no way to initialize a Data instance fields, but there is, it's rb_struct_initialize(). It's also clear that rb_struct_initialize() supports Data because there is code handling Data specifically there:
https://github.com/ruby/ruby/blob/d02c97157476bbd9774f2bf6425a69166b99da1b/struct.c#L798 (it freezes it after setting fields, which does give one guarantee you want).
I agree the name is bad and the semantics are not great (so a new function say rb_data_initialize would be a lot better), but nevertheless this is the way to set Data fields for deserialization, and the proof is Marshal is using it.
Having it as a C API kinda make sense to me because this is a low-level thing that shouldn't be used by most Ruby code.

You also mention that the Data subclass initialize must run, but this is something that almost all deserializers do not respect, on purpose, because that breaks (among others) references shared between fields of that object and other objects, it loses mutations to such objects, it's not possible to get the original arguments passed to the constructor, etc, as described in item 2 of #764 (comment). Note it also applies with no ivars, e.g.

array = [1, 2]
d = MyData.new(x: array)
array << 3 # perfectly valid, and it would be surprising if that state is lost
reloaded = Psych.load Psych.dump([array, d])
reloaded[0].should.equal?(reloaded[1].x)
reloaded[0].should == [1, 2, 3]

As @nevans says, for regular objects, whether they freeze in constructor or not, it's the same, it's not OK to call initialize to deserialize, instead fields are set directly, as in most (all?) deserialization libraries for sufficiently-generic objects where the constructor might be overridden.

TLDR: I see your POV, but I don't think it's good to limit ourselves to such a small subset of Data classes. The current behavior (+ #761) works for AFAIK all Data classes, and that certainly seems nice from the POV of a Psych user.
I think JRuby should have some API to set the fields of a Data instance.
And we should have a more proper C API to set fields of a Data instance, but functionally it would achieve the same result as rb_struct_initialize() does currently.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

@eregon I think you missed all of the discussion above where I concede several of these points.

@eregon
Copy link
Member

eregon commented Dec 11, 2025

Data.instance_method(:initialize).bind_call(d, {foo: 1})

Aside: interesting that that works, it probably shouldn't because it should only accept keyword arguments.

@nevans
Copy link
Contributor

nevans commented Dec 11, 2025

FWIW: #765 is passing CI for CRuby, TruffleRuby, and JRuby (although maybe I needed to remove some pends in the tests?). But, to get TruffleRuby passing, I added a hack which is probably worse than rb_struct_initialize from @eregon's POV.

@eregon
Copy link
Member

eregon commented Dec 11, 2025

Mmh, OK, I'll add Data#initialize in TruffleRuby.
It's not ideal as it is a megamoprhic call to get the members of a Data class, but OTOH rb_struct_initialize() already has the same megamoprhic call so it's not worse.

For context there is truffleruby/truffleruby#3892 which I think is too bad of a performance a trade-off. But Data#initialize alone is more reasonable and doesn't slow down the normal case.

I think perf-wise it is likely quite a bit worse than #761 which does not use a Hash and does not do Hash lookups but just 2 arrays (members & values).
But I agree Data#initialize is a cleaner API than rb_struct_initialize().

@eregon
Copy link
Member

eregon commented Dec 11, 2025

Let's merge #765 instead.

@eregon eregon closed this Dec 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants