Loading adblock cache takes 2-4s with huge cache file #62

@The-Compiler

Before I explain the issue, let me note that I'm not sure if this is the right place - chances are, the answer is just "don't do that...", or perhaps this is something that can be improved somehow in the adblocking library rather than this wrapper. However, I lack the Rust knowledge to properly report it there, and I'd like to hear your opinion on this first.

Apparently, there have been posts suggesting to add many more filter lists to qutebrowser, namely:

c.content.blocking.adblock.lists = [
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/annoyances.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badlists.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badware.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters-2020.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters-2021.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/privacy.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/resource-abuse.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/thirdparties/easylist-downloads.adblockplus.org/easyprivacy.txt',
    'https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/thirdparties/pgl.yoyo.org/as/serverlist',
    'https://raw.githubusercontent.com/StevenBlack/hosts/master/alternates/fakenews-gambling/hosts',
    'https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt',
    'https://fanboy.co.nz/fanboy-problematic-sites.txt',
    'https://easylist.to/easylist/easylist.txt',
    'https://raw.githubusercontent.com/bogachenko/fuckfuckadblock/master/fuckfuckadblock.txt'
]

It looks like there are people who blindly copy that, because more clearly must be better or something...

However, running :adblock-update with those lists results in an adblock-cache.dat of around 130 MB, and qutebrowser hangs for about 2-4s at startup with it (when calling self._engine.deserialize_from_file). Some questions/ideas:

  • Is it really supposed to be that large (i.e. are there really so many unique filters in those lists)?
  • Do we also save things like cosmetic filters and such, despite not supporting them yet? If so, would it help to filter (hah) them out?
  • Should it really take 2-4s to deserialize? Can the Brave library maybe improve its performance somehow there?
  • Maybe qutebrowser should just detect when deserialization takes more than 1s or so and warn about it?
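
The last idea could look roughly like the sketch below. It is only an illustration, not qutebrowser's actual code: `load_with_timing`, the logger name, and the 1s default are made up for this example, and `load` stands in for whatever callable does the real work (for instance `self._engine.deserialize_from_file`).

```python
import logging
import time

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("adblock-timing")


def load_with_timing(load, path, threshold=1.0, clock=time.perf_counter):
    """Call load(path) and warn if it took longer than `threshold` seconds.

    `load` is any callable taking a path; `clock` is injectable so the
    behaviour can be checked without a real 130 MB cache file.
    Returns the elapsed time in seconds.
    """
    start = clock()
    load(path)
    elapsed = clock() - start
    if elapsed > threshold:
        log.warning(
            "Loading adblock cache %s took %.1fs (threshold %.1fs); "
            "fewer filter lists may speed up startup",
            path, elapsed, threshold,
        )
    return elapsed
```

Assuming the engine object from above, a call would be something like `load_with_timing(self._engine.deserialize_from_file, str(cache_path))`, where `cache_path` is a placeholder for wherever the cache file lives.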

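For the first two questions, a rough way to get a feel for the numbers would be to count unique network and cosmetic rules across the downloaded list files. The sketch below is a heuristic only: it assumes cosmetic rules can be recognised by the `##`, `#@#` and `#?#` separators, treats `!` lines, bracketed headers and `#` hosts-file comments as non-rules, and says nothing about what the library actually serializes. `classify` and `summarize` are hypothetical helper names.

```python
import re

# A line starting with '#' is treated as a generic cosmetic rule only if it
# looks like '##selector', '#@#selector' or '#?#selector'; otherwise it is
# assumed to be a hosts-file comment.
_LEADING_COSMETIC = re.compile(r"#(?:@|\?)?#[^\s#]")
_COSMETIC_SEPARATORS = ("##", "#@#", "#?#")


def classify(line):
    """Return 'network', 'cosmetic', or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None  # blank line or Adblock-style comment
    if line.startswith("[") and line.endswith("]"):
        return None  # header such as "[Adblock Plus 2.0]"
    if line.startswith("#"):
        return "cosmetic" if _LEADING_COSMETIC.match(line) else None
    if any(sep in line for sep in _COSMETIC_SEPARATORS):
        return "cosmetic"
    return "network"


def summarize(lines):
    """Count unique rules per kind across an iterable of list lines."""
    seen = {"network": set(), "cosmetic": set()}
    for raw in lines:
        kind = classify(raw)
        if kind is not None:
            seen[kind].add(raw.strip())
    return {kind: len(rules) for kind, rules in seen.items()}
```

Feeding it the concatenated lines of all downloaded lists would show how many rules are duplicates and how large the cosmetic share is, which might hint at whether filtering them out before serializing is worth trying.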