@glebm glebm commented Jan 9, 2023

Enables more optimizations for the rs90.

Signed-off-by: Gleb Mazovetskiy <glex.spb@gmail.com>
@pcercuei
Member

@glebm got any perf numbers?

@glebm
Author

glebm commented Jan 18, 2023

I'm not sure which built-in application to use to measure the impact of this, so I'm closing it for now.
If someone has one in mind, feel free to test and reopen.

@glebm glebm closed this Jan 18, 2023
@SnDream

SnDream commented Jan 23, 2023

For me personally, this configuration works well.
I originally bought the rs90 to play GB games in Gambatte, so I care about how fast Gambatte runs on it.
I typically test at 456 MHz using a demoscene production called Mental Respirator. Before this configuration, exactly one scene could not be emulated at full speed. With this configuration, all scenes finally run at full speed (although the worst scenes can only output to the screen at around 11 fps). Thanks a lot!
However, this version does not display the battery icon properly when running on my device, and checking jz-battery always shows 1472000.

@glebm
Author

glebm commented Jan 23, 2023

Thanks for testing!
That is very odd. This shouldn't have any impact on the battery icon, but perhaps something else broke it recently.

Which version did you compare this one with?

@SnDream

SnDream commented Jan 23, 2023

I tested the latest master version and it has the same battery icon issue, so maybe some other commit introduced it, or my device is damaged.
I only remember that it started after I flashed this version. Let me test an earlier version to see.

@SnDream

SnDream commented Jan 23, 2023

The battery icon issue is definitely not related to this commit.
Also, I tested with Final Fantasy V (360 MHz, frameskip 0) and found no difference in performance (47-48 fps on the title screen). The initial amount of memory available on the device is also the same.
If there are other, more suitable test scenarios, I can try them. For me this commit gives a good boost in Gambatte, but there seems to be no difference in ReGBA.

@glebm glebm reopened this Jan 23, 2023
@pcercuei
Member

It sounds really strange to me that a root filesystem compiled differently makes Gambatte faster.
What is Gambatte doing, that it needs to spend so much time in the system libs?

As for the change itself - I don't think it's a good idea to add all three flags at once, one of these might actually be a regression. I'd suggest enabling LTO first, since it should be the most influential.

@glebm
Author

glebm commented Jan 23, 2023

> What is Gambatte doing, that it needs to spend so much time in the system libs?

I've just realized that one of the system libs is SDL, and that one could have quite a bit of an impact due to blitting operations.

@pcercuei
Member

And why would Gambatte use blitting operations? Does it render to a 16-bit (or 32-bit) buffer, which then gets blitted to the frame buffer? If so, that's very inefficient. It should render to the frame buffer directly, in a double/triple buffering fashion.

@glebm
Author

glebm commented Jan 23, 2023

> I'd suggest enabling LTO first, since it should be the most influential.

BR2_ENABLE_LTO will have no impact currently because of the way it is implemented.
LTO is only enabled for certain packages that explicitly support it. These packages are:

$ rg 'BR2_ENABLE_LTO\),y' --files-with-matches
package/wireshark/wireshark.mk
package/fastd/fastd.mk
package/rocksdb/rocksdb.mk
package/valgrind/valgrind.mk
package/unbound/unbound.mk
package/log4cplus/log4cplus.mk
package/netdata/netdata.mk
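
For trees where `rg` is not installed, roughly the same list can be produced with plain `grep`. This is a minimal sketch: `lto_packages` is a hypothetical helper name, and it assumes it is run from the top of the Buildroot tree (or given the package directory as its argument).

```shell
# Hypothetical helper: list .mk files that gate behaviour on BR2_ENABLE_LTO.
# -r recurses, -l prints only file names, -F treats the pattern as a fixed
# string, so the ")" needs no escaping as it does in the rg regex above.
lto_packages() {
    grep -rlF 'BR2_ENABLE_LTO),y' "${1:-package}" | sort
}
```

Calling `lto_packages` with no argument searches `package/`; unlike `rg`, it does not honour `.gitignore`.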

Perhaps we could simply add -flto to the BR2_TARGET_OPTIMIZATION instead.
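
A minimal sketch of that idea, assuming the flag is set through the board's defconfig. `add_lto_flag` is a hypothetical helper, the defconfig path is whatever file the board actually uses, and `sed -i` without a suffix assumes GNU sed.

```shell
# Hypothetical helper: make sure BR2_TARGET_OPTIMIZATION in a defconfig
# contains -flto, without touching flags that are already there.
add_lto_flag() {
    defconfig=$1
    if grep -q '^BR2_TARGET_OPTIMIZATION=' "$defconfig"; then
        # Option already set: append -flto inside the quotes unless present.
        grep -q '^BR2_TARGET_OPTIMIZATION=.*-flto' "$defconfig" ||
            sed -i 's/^\(BR2_TARGET_OPTIMIZATION="[^"]*\)"/\1 -flto"/' "$defconfig"
    else
        # Option not set yet: add it with -flto as the only extra flag.
        echo 'BR2_TARGET_OPTIMIZATION="-flto"' >> "$defconfig"
    fi
}
```

Because this option applies to every target package, any package that does not build cleanly with `-flto` would fail, which is why a full build is the real test.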

@glebm
Author

glebm commented Jan 23, 2023

Sent #113 with LTO, let's see if it builds

@pcercuei
Member

I highly doubt it :)
The Linux kernel won't like the -flto flag. Right now doing LTO on the kernel is only possible with Clang.

@SnDream

SnDream commented Jan 23, 2023

> And why would Gambatte use blitting operations? Does it render to a 16-bit (or 32-bit) buffer, which then gets blitted to the frame buffer? If so, that's very inefficient. It should render to the frame buffer directly, in a double/triple buffering fashion.

Just to be clear, I used a build of Gambatte with some additional modifications intended to make it run faster on the rs90. One of the modifications turns on the yuv option. In my tests this is faster than the default, and it may change the way graphics are output.
I'm sorry, but I'm hardly familiar with C++, so I can't do a useful performance analysis of it.

@glebm
Author

glebm commented Jan 23, 2023

> The Linux kernel won't like the -flto flag. Right now doing LTO on the kernel is only possible with Clang.

AFAIK these flags are not used to build the kernel. The kernel uses its own set of flags, controlled only via linux_defconfig.
We'll see in an hour :)

@glebm
Author

glebm commented Jan 25, 2023

I've sent 2 separate PRs:

  1. -O3: "od: use -O3" (#114)
  2. -fipa-pta: "od: rs90: use -fipa-pta" (#115)

Again, I would appreciate testing.

@glebm glebm mentioned this pull request Feb 10, 2023
@glebm
Author

glebm commented Mar 18, 2023

Closing this now, as part of it has been merged and another part has been extracted to #114.

@glebm glebm closed this Mar 18, 2023