Replace computeLength with inflate #8
inflate lazily doubles the buffer's size on demand. While memory does suffer, speed improves.
Thank you for measuring the performance impact of the buffer size computation! How about, instead of inflating, we just create a buffer that is the same size as the input string? The memory overhead in this context is practically meaningless; what matters is the output size.
I presume you are saying: "We can speed up computeLength instead by approximating the length". I checked it out; json-size from json-joy tackles this issue in ts. Borrowing some of its approaches let me speed up computeLength by 1-4%, but the table section remained mostly the same. Sadly, the speedup wasn't enough to outweigh the cost of a larger buffer size. In the end it causes a very slight slowdown in some cases and a very slight speedup in others, so overall performance is about the same. The code can be found here.
Never mind the previous remark, I forgot that encode takes a Luau structure and produces a msgpack-encoded string; it does not convert json to msgpack. I will see how to incorporate these findings. But first I'd like to improve the test and benchmark harness to use the standalone Luau interpreter. That way tests and benchmarks can be run automatically, without the need for an environment that supports Roblox Studio and the benchmarking plugin.
The benchmarks themselves needed fleshing out. There was not enough variety between the datasets, so I added more.
On another note, when profiling I observed that computeLength takes up less of the overall time when serializing large datasets. I thought this would mean larger datasets would perform better using the computeLength approach, but to my surprise the inflate approach performed better.
Inflate lazily doubles the buffer's size on demand instead of calculating the length of the buffer immediately.
The initial buffer size and multiplier of the buffer size can be honed further. I do not have enough datasets to gauge the optimal values.
I initialized the buffer size to 64 since I expect most MessagePack users encode tables and not individual values.
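To make the growth strategy concrete, here is a minimal sketch of on-demand doubling using Luau's `buffer` library. The names `INITIAL_SIZE`, `GROWTH_FACTOR`, `inflate`, and `writeBytes` are illustrative assumptions and are not taken from the PR's actual code; only the initial size of 64 and the doubling come from the description above.

```luau
-- Hypothetical sketch: names and structure are assumptions, not the PR's code.
local INITIAL_SIZE = 64 -- initial capacity mentioned in the description
local GROWTH_FACTOR = 2 -- "doubles the buffer's size"

-- Returns a buffer with room for `needed` more bytes at `offset`.
-- Grows geometrically and copies over the bytes written so far.
local function inflate(buf: buffer, offset: number, needed: number): buffer
	local capacity = buffer.len(buf)
	if offset + needed <= capacity then
		return buf -- fast path: enough room, no allocation
	end
	repeat
		capacity *= GROWTH_FACTOR
	until offset + needed <= capacity
	local grown = buffer.create(capacity)
	buffer.copy(grown, 0, buf, 0, offset)
	return grown
end

-- Appends raw bytes; returns the (possibly reallocated) buffer and new offset.
local function writeBytes(buf: buffer, offset: number, bytes: string): (buffer, number)
	buf = inflate(buf, offset, #bytes)
	buffer.writestring(buf, offset, bytes)
	return buf, offset + #bytes
end

local buf, offset = buffer.create(INITIAL_SIZE), 0
buf, offset = writeBytes(buf, offset, string.rep("a", 100))
print(buffer.len(buf), offset) --> 128 100

-- Trim the unused capacity when producing the final string.
local encoded = buffer.readstring(buf, 0, offset)
print(#encoded) --> 100
```

In this sketch, once the buffer has grown at least once its capacity stays below twice the bytes required, which is the memory cost traded for skipping the sizing pass. Changing `INITIAL_SIZE` or `GROWTH_FACTOR` is where the tuning mentioned above would happen.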
Reasoning:
computeLength needs to do redundant work. It needs to obtain the data's type and to check the table's size, both moderately expensive.
inflate removes the aforementioned redundancy, albeit at a cost since there is less information about the data when encoding.
Pros:
Cons:
Future Work:
Benchmarks:
For benchmarking, the following datasets were used:
If the websites hosting the datasets go down, the datasets (aside from the one provided here, msgpack-default, and others...) can be found at awesome-json-datasets.
old
new