This repository was archived by the owner on Apr 3, 2024. It is now read-only.

Invalid regex in Wappalyzer/data/technologies.json: Symfony: html #81

@arielf

Description

The following code runs without complaint on Python 3.9, but on Python 3.11 it (correctly) warns about a bad regex:

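   from Wappalyzer import Wappalyzer, WebPage

   url = "http://yahoo.com"  # the site used in the report below
   WPL = Wappalyzer.latest()  # build a Wappalyzer from the bundled technology data
   webpage = WebPage.new_from_url(url)  # fetch the page to analyze
   web_record = WPL.analyze_with_versions_and_categories(webpage)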

Running this with Python 3.11 on "http://yahoo.com", I get:

.../python3.11/site-packages/Wappalyzer/Wappalyzer.py:226: UserWarning: Caught 'unbalanced parenthesis at position 119' compiling regex:

['(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\\d.])+|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)', 'version:\\1']
----------------------------------^^^ invalid?

The 'position 119' is only a delayed symptom of the core issue: index 119 is the final ")" of the pattern.

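Indeed, the sub-regex [^]+ marked above is the culprit. In Python's re module, a "]" immediately after "[^" is read as a literal member of the set rather than as its end, so the set runs on until the "]" of "[\d.]" and swallows the "(" of the version group. (In JavaScript regexes, [^] is a valid idiom for "any character", but Python's re does not read it that way.)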
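One way to confirm this diagnosis is to compile the regex part of the rule directly with Python's re module. This is only an illustrative sketch; BROKEN and compile_error are names introduced here, not part of python-Wappalyzer.

```python
import re

# Regex part of the Symfony "html" rule, copied from the warning above
# (the trailing "version:\1" tag is not part of the regex).
BROKEN = (
    r'(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\d.])+'
    r'|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)'
)


def compile_error(pattern):
    """Return the re.error raised while compiling pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc
    return None


err = compile_error(BROKEN)
print(err)                    # unbalanced parenthesis at position 119
print(repr(BROKEN[err.pos]))  # ')', the very last character of the pattern

# A "]" right after "[^" is a literal: this set means "anything except ] or x".
print(re.fullmatch(r"[^]x]", "]") is None)  # True
```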

The problem is in the data file:
Wappalyzer/data/technologies.json (near the end; technologies are sorted alphabetically)
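Because the data file holds many rules, a small checker could flag every pattern that Python refuses to compile, not just this one. The sketch below assumes each technology maps field names such as "html" or "headers" to a string, a list of strings, or a dict of strings, and that anything after "\;" (like "version:\1") is a tag rather than part of the regex. PATTERN_FIELDS, iter_patterns, find_bad_regexes and the sample entries are all hypothetical.

```python
import re

# Hypothetical subset of fields assumed to hold regex patterns; the real data
# file has more fields, and not every string field in it is a regex.
PATTERN_FIELDS = ("html", "scriptSrc", "url", "headers", "meta")


def iter_patterns(value):
    """Yield pattern strings from a str, a list of str, or a dict of str values."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_patterns(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_patterns(item)


def find_bad_regexes(technologies):
    """Return (technology, field, regex, error) for every pattern that fails to compile."""
    bad = []
    for name, rules in technologies.items():
        for field in PATTERN_FIELDS:
            for raw in iter_patterns(rules.get(field, [])):
                # Keep only the part before "\;"; the rest (e.g. "version:\1") is a tag.
                regex = raw.split("\\;")[0]
                try:
                    re.compile(regex)
                except re.error as exc:
                    bad.append((name, field, regex, str(exc)))
    return bad


# Hypothetical in-memory stand-in for the technologies section of the data file.
SAMPLE = {
    "BrokenExample": {"html": '<div class="sf-toolbar[^>]+?>[^]+<span>([\\d.])+\\;version:\\1'},
    "FixedExample": {"html": '<div class="sf-toolbar[^>]+?>[^<]+<span>([\\d.])+\\;version:\\1'},
    "HeaderExample": {"headers": {"X-Powered-By": "Example(?:/([\\d.]+))?\\;version:\\1"}},
}

for entry in find_bad_regexes(SAMPLE):
    print(entry)
```

To run it against the real file, load it with json.load and pass in the mapping of technology names to rules; depending on the copy, that mapping may sit under a top-level "technologies" key, so check the file's layout first.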

The "html" rule for "Symfony" should be (a one-character change):

"html": "(?:<div class=\"sf-toolbar[^>]+?>[^<]+<span class=\"sf-toolbar-value\">([\\d.])+|<div id=\"sfwdt[^\"]+\" class=\"[^\"]*sf-toolbar)\\;version:\\1",
------------------------------------------^^^^ the fix
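As a sanity check on the fix, the corrected regex compiles and matches toolbar-like markup. The HTML snippet below is made up for illustration. A side observation, separate from the one-character fix: because the quantifier sits outside the group, ([\d.])+ keeps only the last repeated character, so if the version tag is filled from \1 it would receive a single character; ([\d.]+) would capture the whole version string.

```python
import re

# Regex part of the corrected Symfony "html" rule ("[^]+" replaced by "[^<]+").
FIXED = (
    r'(?:<div class="sf-toolbar[^>]+?>[^<]+<span class="sf-toolbar-value">([\d.])+'
    r'|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)'
)

pattern = re.compile(FIXED)  # compiles without error now

# Made-up HTML shaped like the toolbar markup the rule targets.
html = '<div class="sf-toolbar-block" data-x="1">Symfony <span class="sf-toolbar-value">5.4.21</span></div>'

match = pattern.search(html)
print(match.group(1))  # 1 : "([\d.])+" keeps only the last repetition

# Moving the quantifier inside the group captures the whole version string.
whole = re.compile(FIXED.replace(r"([\d.])+", r"([\d.]+)"))
print(whole.search(html).group(1))  # 5.4.21
```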

Fixed in this PR:
#80
