
@Serial-ATA (Contributor) commented Nov 25, 2025

I used jsdom for the scraping, but can switch to something else if you have a preferred library. I'm not sure what the best option is.

Unfortunately, there's no:

Also, there are technically URLs for tracks. For example, the track of https://ototoy.jp/_/default/p/3016055 is available at https://ototoy.jp/opus/index.php/19511684, but it just redirects back to the album page, so I didn't figure it'd be worth adding.
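
For illustration, the two URL shapes mentioned above could be told apart with a small helper like the one below. This is a minimal sketch, not the provider code from this PR: the name `parseOtotoyUrl` and the `release`/`track` labels are hypothetical, and the path patterns are inferred only from the two example URLs.

```typescript
// Hypothetical helper (not Harmony's provider code) that classifies OTOTOY URLs
// using the two path shapes seen in the example URLs.
type OtotoyEntity =
  | { type: "release"; id: string }
  | { type: "track"; id: string };

function parseOtotoyUrl(input: string): OtotoyEntity | undefined {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return undefined; // not a valid absolute URL
  }
  if (url.hostname !== "ototoy.jp") return undefined;

  // Album pages, e.g. /_/default/p/3016055
  const album = url.pathname.match(/^\/_\/default\/p\/(\d+)\/?$/);
  if (album) return { type: "release", id: album[1] };

  // Track URLs, e.g. /opus/index.php/19511684 (these only redirect to the album page)
  const track = url.pathname.match(/^\/opus\/index\.php\/(\d+)\/?$/);
  if (track) return { type: "track", id: track[1] };

  return undefined;
}
```

Since the track URL only redirects to the album page, a provider could simply leave the `track` shape unsupported, which matches the reasoning above.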

closes #34

@kellnerd added the feature (New feature or request) and provider (Metadata provider) labels Nov 26, 2025
@kellnerd (Owner)

I haven't looked closely yet, but for the DOM parser I had my eye on https://jsr.io/@b-fuze/deno-dom. It is TS/Deno first, almost dependency-free and should be more lightweight, but I haven't used it for anything so far.
Just letting you know in case you are interested in comparing or benchmarking both; otherwise I would do that eventually and we can start with jsdom.

@Serial-ATA (Contributor, Author)

Yeah, I also saw deno-dom in comparison lists. Looks like it'd almost be a drop-in replacement, so I'll try it out tomorrow, probably.

@Serial-ATA (Contributor, Author)

Seems to work fine. The only difference is that it doesn't have specific element type definitions, so some field accesses had to be replaced with getAttribute().
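
A minimal sketch of that kind of change, assuming only that the parsed elements expose the standard `getAttribute()` method: the hypothetical helper `getLinkUrl` stands in for a typed `anchor.href` access. One subtlety is that `HTMLAnchorElement.href` returns an absolute URL, whereas `getAttribute("href")` returns the raw (possibly relative) attribute value, so the sketch resolves it against the page URL.

```typescript
// Structural type: the only member relied on is getAttribute(), which any DOM
// Element provides, including deno-dom's generic Element.
interface AttributeSource {
  getAttribute(name: string): string | null;
}

// Hypothetical replacement for a typed `anchor.href` field access.
// getAttribute("href") yields the raw value, so resolve it against the page URL
// to get the same absolute URL that `.href` would have returned.
function getLinkUrl(el: AttributeSource, baseUrl: string): URL | undefined {
  const raw = el.getAttribute("href");
  if (raw === null) return undefined; // element has no href attribute
  try {
    return new URL(raw, baseUrl);
  } catch {
    return undefined; // malformed href
  }
}
```

With deno-dom, `new DOMParser().parseFromString(html, "text/html")` returns a document whose `querySelector()` results can be passed to this helper; which selectors to use depends on OTOTOY's markup and is not shown here.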

@Serial-ATA marked this pull request as ready for review November 30, 2025 17:55
@kellnerd self-requested a review December 3, 2025 20:07
@kellnerd (Owner) commented Dec 5, 2025

I'm sorry for the delay, I had really hoped to review the provider this week, but it will probably take me until next week.

@Serial-ATA (Contributor, Author)

All good, take your time :)

@kellnerd (Owner) left a comment

Thank you for doing the first steps with DOM parsing in Harmony!
I was worried that it might be very slow, but in my few measurements (using performance.now()) it turned out that fetching the HTML is an order of magnitude slower than parsing and scraping it.
(Example for the Beatles Anthology test case: 323 ms parsing / 375 ms parsing + scraping, out of 4751 ms total provider time. Smaller releases were more like 100 ms out of 1500 ms.)
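
Measurements like these can be collected with a pair of small wrappers around `performance.now()`, which returns a high-resolution timestamp in milliseconds. This is a hypothetical sketch (the `Timings` map and the `timed`/`timedAsync` names are not part of Harmony) that separates synchronous steps such as parsing and scraping from asynchronous ones such as fetching.

```typescript
// Hypothetical helpers (not part of Harmony) that record how long a step takes.
type Timings = Map<string, number>;

// For synchronous steps such as DOM parsing or scraping.
function timed<T>(timings: Timings, label: string, step: () => T): T {
  const start = performance.now();
  try {
    return step();
  } finally {
    // Recorded even if the step throws.
    timings.set(label, performance.now() - start);
  }
}

// For asynchronous steps such as fetching the HTML.
async function timedAsync<T>(
  timings: Timings,
  label: string,
  step: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await step();
  } finally {
    timings.set(label, performance.now() - start);
  }
}
```

Wrapping the fetch in `timedAsync` and the parse and scrape steps in `timed` makes it easy to compare each step against the total provider time.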

@Serial-ATA requested a review from kellnerd December 15, 2025 02:45
@kellnerd (Owner) left a comment

One last thing and we are good to go.


releaseElements.forEach((el) => {
const text = el.textContent.trim();
console.log(text);

@kellnerd (Owner)

;)

@Serial-ATA (Contributor, Author)

Good catch. I actually had that there because I noticed that every release used in the tests (except for The Beatles - Anthology Collection) only has an original release date. I wonder if that's the most common pattern.

@kellnerd merged commit 16361b6 into kellnerd:main Dec 16, 2025
2 checks passed
@Serial-ATA deleted the ototoy branch December 16, 2025 19:16

Labels: feature (New feature or request), provider (Metadata provider)

Projects: None yet

Development: successfully merging this pull request may close the issue OTOTOY.

2 participants