Open
73 commits
b01758e
Fillout forms
paul-at Sep 14, 2024
7cca514
Merge pull request #194 from TitovDigital/fillout
enthec-opensource Sep 16, 2024
ecbe9b8
oops
enthec-opensource Sep 16, 2024
99792c1
fix outdated links
RignonNoel Nov 3, 2024
048817f
Merge pull request #196 from RignonNoel/patch-1
enthec-opensource Nov 4, 2024
7d88b01
some new technologies
enthec-opensource Nov 4, 2024
583e672
icons fix
enthec-opensource Nov 4, 2024
fdfeb1c
oops
enthec-opensource Nov 4, 2024
55c1eb1
Merge pull request #197 from enthec/sync
enthec-opensource Nov 4, 2024
e22e25d
structure auto fix script
enthec-opensource Nov 9, 2024
f07f9e1
Merge pull request #199 from enthec/scripts
enthec-opensource Nov 9, 2024
ac6df33
Rename icon to avoid conflicts
thc202 Nov 11, 2024
79e7569
Merge pull request #200 from thc202/correct-icon
enthec-opensource Nov 11, 2024
e7b1274
Merge branch 'main' into sync
enthec-opensource Dec 27, 2024
81a024a
tech resync
enthec-opensource Dec 27, 2024
f476bbc
Merge branch 'main' into sync
enthec-opensource Dec 27, 2024
aaf3709
Merge pull request #201 from enthec/sync
enthec-opensource Dec 27, 2024
150e169
tech resync
enthec-opensource Jan 22, 2025
2024451
Merge pull request #202 from enthec/sync
enthec-opensource Jan 22, 2025
f64f0e3
dns issues
enthec-opensource Jan 30, 2025
a55153d
cleaner readme
enthec-opensource Jan 30, 2025
c5235cd
dns doc
enthec-opensource Jan 31, 2025
b8dab3c
note
enthec-opensource Jan 31, 2025
5c0ddae
note format
enthec-opensource Jan 31, 2025
a93aef8
html
enthec-opensource Jan 31, 2025
67419cd
Merge pull request #203 from enthec/readme
enthec-opensource Jan 31, 2025
7281529
Update README.md
kingthorin Feb 16, 2025
424878a
Merge pull request #204 from kingthorin/patch-1
enthec-opensource Feb 17, 2025
11d1f7b
Erlang: remove OTP prefix from version string
Mar 28, 2025
fbd9b60
Merge pull request #205 from maxime-huyghe/erlang
enthec-opensource Apr 2, 2025
a268dcf
Add jQuery Popup overlay
geoff-rmtech Apr 28, 2025
395bb2e
Enclose the scriptSrc in brackets
geoff-rmtech Apr 29, 2025
c5077a4
Remove trailing comma
geoff-rmtech Apr 29, 2025
b3744d6
Merge pull request #206 from geoff-rmtech/jQueryPopupOverlay
enthec-opensource Apr 29, 2025
cd920f0
fix: remove leading greedy regex operator on scripts pattern
Zalaxx May 14, 2025
341a477
Merge pull request #207 from Zalaxx/main
enthec-opensource Jul 7, 2025
e527960
Merge branch 'main' into sync
enthec-opensource Jul 7, 2025
ed35b9e
resync
enthec-opensource Jul 7, 2025
3ddbc26
Merge pull request #210 from enthec/sync
enthec-opensource Jul 7, 2025
4460d8c
Added the zenbasket platform under technologies
BharatPrakas Aug 12, 2025
d5fcc78
Add content adapted from SQLMap
kingthorin Aug 13, 2025
e89eedc
Tweak
kingthorin Aug 13, 2025
40bbed3
Add websphere icon svg
kingthorin Aug 13, 2025
a188936
Actually reference the icon
kingthorin Aug 13, 2025
9ea4442
Tweak white space
kingthorin Aug 13, 2025
eea1455
Remove unused HTML meta tag for zenbasket platform
BharatPrakas Aug 18, 2025
f3ec763
Merge pull request #211 from kingthorin/add-cookies
enthec-opensource Aug 30, 2025
16e3a49
Convert CRLF to LF, establish .gitattributes for consistent handling
kingthorin Sep 18, 2025
fa27f8d
Merge pull request #214 from kingthorin/eol-consistency
enthec-opensource Sep 23, 2025
d93629b
Fix jquery definition misidentifying popupoverlay
geoff-rmtech Oct 13, 2025
c953ae5
Merge pull request #215 from geoff-rmtech/fix-jquery-definition-misid…
enthec-opensource Oct 13, 2025
9201e8f
Fix regex performance
Zalaxx Oct 28, 2025
afb21dc
Merge pull request #216 from Zalaxx/main
enthec-opensource Oct 28, 2025
f71b226
Merge pull request #212 from BharatPrakas/main
enthec-opensource Nov 18, 2025
24f9aa8
Additions from HTTPArchive
kingthorin Nov 18, 2025
a1bafd4
Fix Google Font API website URL
nafiul09 Nov 20, 2025
0016640
Merge pull request #217 from kingthorin/http-arch-additions
enthec-opensource Nov 24, 2025
e3cdd2e
Merge pull request #219 from nafiul09/fix/google-font-api-website-url
enthec-opensource Nov 24, 2025
185bfa2
random fixes & cleanup
enthec-opensource Nov 24, 2025
5503dc4
Merge pull request #220 from enthec/fixes
enthec-opensource Nov 24, 2025
97fe28c
Merge remote-tracking branch 'origin/main' into sync
enthec-opensource Nov 24, 2025
7316fe9
tech resync
enthec-opensource Nov 24, 2025
5d6c85a
fixes
enthec-opensource Nov 24, 2025
7b569be
replaced icons
enthec-opensource Nov 24, 2025
ffbe655
Merge pull request #221 from enthec/sync
enthec-opensource Nov 26, 2025
de3fe03
Update next.js vendor
geoff-rmtech Dec 8, 2025
cd3eec4
Merge pull request #222 from geoff-rmtech/update-next-js-vendor
enthec-opensource Dec 8, 2025
cfd5a3c
Correct confidence and version tags (zenbasket and pubtech)
kingthorin Dec 17, 2025
4361051
Merge pull request #224 from kingthorin/conf-vers-fix
enthec-opensource Dec 17, 2025
4e9537e
schema: Set dialect, lint and validate data files
kingthorin Dec 29, 2025
efefd1e
Merge pull request #226 from kingthorin/schema-dialect
enthec-opensource Dec 30, 2025
be9c66d
more validators
enthec-opensource Dec 30, 2025
b84c993
Merge pull request #227 from enthec/validations
enthec-opensource Dec 30, 2025
11 changes: 11 additions & 0 deletions .gitattributes
@@ -0,0 +1,11 @@
+* text=auto eol=lf
+
+*.json text
+*.md text
+*.py text
+*.svg text
+*.txt text
+*.yml text
+*.yaml text
+
+*.png binary
16 changes: 16 additions & 0 deletions .github/workflows/scripts/icon_path_validator.py
@@ -2,13 +2,19 @@
 import pathlib
 import string
 from typing import Final
+from xml.etree import ElementTree


 class InvalidStructureException(Exception):
     def __init__(self, msg: str):
         super().__init__(msg)


+class InvalidSVGException(Exception):
+    def __init__(self, msg: str):
+        super().__init__(msg)
+
+
 class IconValidator:
     def __init__(self):
         self._SOURCE_DIR: Final[str] = "src"
@@ -23,6 +29,16 @@ def validate(self) -> None:
         for file in self._FULL_IMAGES_DIR.iterdir():
             if file.name not in json_icons:
                 raise InvalidStructureException(f"{file.name} must be used, {file} isn't used!")
+            if file.name.lower().endswith(".svg"):
+                self._validate_svg(file)
+
+    def _validate_svg(self, file: pathlib.Path) -> None:
+        try:
+            with file.open("r", encoding="utf8") as f:
+                content: str = f.read()
+            ElementTree.fromstring(content)
+        except ElementTree.ParseError as e:
+            raise InvalidSVGException(f"Invalid SVG '{file.name}': {e}")

     def get_json_icons(self) -> set[str]:
         letters: list[str] = list(string.ascii_lowercase)
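The SVG check added in this hunk only asks whether the file parses as XML; it does not inspect the root element or any SVG-specific structure. A minimal standalone sketch of that behaviour, using only the standard library (`is_well_formed_svg` and both sample strings are hypothetical and not part of the PR):

```python
from xml.etree import ElementTree


def is_well_formed_svg(content: str) -> bool:
    """Return True when the text parses as XML, mirroring the try/except in _validate_svg."""
    try:
        ElementTree.fromstring(content)
    except ElementTree.ParseError:
        return False
    return True


# Hypothetical samples: the second one never closes its <rect> element.
good = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="1" height="1"/></svg>'
bad = '<svg xmlns="http://www.w3.org/2000/svg"><rect></svg>'

print(is_well_formed_svg(good))
print(is_well_formed_svg(bad))
```

Any well-formed XML document would pass this check, so it catches truncated or corrupted icons rather than files that are valid XML but not SVG.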
38 changes: 38 additions & 0 deletions .github/workflows/scripts/schema_validator.py
@@ -0,0 +1,38 @@
+import json
+import pathlib
+from typing import Final
+
+from jsonschema import validate, ValidationError
+
+
+class SchemaValidationException(Exception):
+    def __init__(self, msg: str):
+        super().__init__(msg)
+
+
+class SchemaValidator:
+    def __init__(self):
+        self._SOURCE_DIR: Final[str] = "src"
+        self._TECH_DIR: Final[str] = "technologies"
+        self._FULL_TECH_DIR: Final[pathlib.Path] = pathlib.Path(self._SOURCE_DIR).joinpath(self._TECH_DIR)
+        self._SCHEMA_FILE: Final[pathlib.Path] = pathlib.Path("schema.json")
+
+    def validate(self) -> None:
+        if not self._SCHEMA_FILE.is_file():
+            raise FileNotFoundError(f"Schema file '{self._SCHEMA_FILE}' not found!")
+        with self._SCHEMA_FILE.open("r", encoding="utf8") as f:
+            schema: dict = json.load(f)
+        for tech_file in sorted(self._FULL_TECH_DIR.iterdir()):
+            if not tech_file.name.endswith(".json"):
+                continue
+            with tech_file.open("r", encoding="utf8") as f:
+                technologies: dict = json.load(f)
+            try:
+                validate(instance=technologies, schema=schema)
+            except ValidationError as e:
+                path: str = " -> ".join(str(p) for p in e.absolute_path) if e.absolute_path else "root"
+                raise SchemaValidationException(f"{tech_file.name}: {e.message} (at {path})")
+
+
+if __name__ == '__main__':
+    SchemaValidator().validate()
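The error message in `SchemaValidator.validate` joins the elements of `e.absolute_path` with arrows and falls back to `root` when the path is empty. The sketch below isolates just that formatting step so it can be run without `jsonschema` installed; it assumes the path is a `collections.deque` of keys and indices, and `format_error_path` plus the sample technology name are hypothetical:

```python
from collections import deque


def format_error_path(absolute_path: deque) -> str:
    """Render a validation path the same way the message in schema_validator.py does."""
    return " -> ".join(str(p) for p in absolute_path) if absolute_path else "root"


# Hypothetical failure: the first entry of "cats" under a technology called "Example".
print(format_error_path(deque(["Example", "cats", 0])))

# An empty path means the failure is at the top level of the file.
print(format_error_path(deque()))
```

Because the validator raises on the first `ValidationError`, only one such path is reported per run.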
90 changes: 84 additions & 6 deletions .github/workflows/scripts/technology_validator.py
@@ -74,6 +74,16 @@ def __init__(self, msg: str):
         super().__init__(msg)


+class TechNotFoundException(Exception):
+    def __init__(self, msg: str):
+        super().__init__(msg)
+
+
+class InvalidURLException(Exception):
+    def __init__(self, msg: str):
+        super().__init__(msg)
+
+
 class AbstractValidator:
     def __init__(self, required: bool = False):
         self._required = required
@@ -185,6 +195,27 @@ def get_type(self) -> list[Type]:
         return [str]


+class URLValidator(StringValidator):
+    def __init__(self, required: bool = False):
+        super().__init__(required)
+        self._url_pattern: Final[re.Pattern] = re.compile(
+            r"^https?://"
+            r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"
+            r"[A-Za-z0-9-]{2,}"
+            r"(?::\d+)?"
+            r"(?:/[^\s]*)?"
+            r"$"
+        )
+
+    def _validate(self, tech_name: str, data: Any) -> bool:
+        if not super()._validate(tech_name, data):
+            return False
+        if not self._url_pattern.match(data):
+            self._set_custom_error(InvalidURLException(f"Tech '{tech_name}' has invalid URL: '{data}'"))
+            return False
+        return True
+
+
 class BoolValidator(AbstractValidator):
     def get_type(self) -> list[Type]:
         return [bool]
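To see what the `URLValidator` pattern accepts, the sketch below copies the regular expression from the hunk above and runs it against a few sample URLs. The `is_valid_website` helper and the sample values are hypothetical; only the pattern itself comes from the diff.

```python
import re

# Pattern copied verbatim from URLValidator in this PR.
URL_PATTERN = re.compile(
    r"^https?://"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"
    r"[A-Za-z0-9-]{2,}"
    r"(?::\d+)?"
    r"(?:/[^\s]*)?"
    r"$"
)


def is_valid_website(url: str) -> bool:
    """Hypothetical helper: True when the URL matches the validator's pattern."""
    return URL_PATTERN.match(url) is not None


samples = [
    "https://example.com",
    "http://sub.example.co.uk:8080/path?q=1",
    "ftp://example.com",         # scheme other than http/https
    "https://localhost",         # no dot-separated label before the last one
    "https://example.com?x=1",   # query string without a leading path slash
    "https://example.com/a b",   # whitespace in the path
]

for url in samples:
    print(f"{url!r}: {is_valid_website(url)}")
```

Only the first two samples match. The pattern requires at least one dot in the host and allows a query string only after a `/`, so a `website` value such as `https://example.com?x=1` would be rejected.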
@@ -200,6 +231,26 @@ def get_type(self) -> list[Type]:
         return [dict]


+class DNSValidator(DictValidator):
+    def _validate(self, tech_name: str, data: Any) -> bool:
+        if not super()._validate(tech_name, data):
+            return False
+        for k, v in data.items():
+            if not isinstance(k, str):
+                self._set_custom_error(InvalidKeyException(f"key in DNS for tech '{tech_name}' has an invalid type. 'str' is required, got type '{type(k).__name__}' -> '{k}'"))
+                return False
+            if not isinstance(v, list):
+                self._set_custom_error(InvalidKeyException(f"value in DNS for tech '{tech_name}' has an invalid type. 'list' is required, got type '{type(v).__name__}' -> '{v}'"))
+                return False
+            for record in v:
+                if not isinstance(record, str):
+                    self._set_custom_error(InvalidTypeForFieldException(f"Invalid type for dns in tech '{tech_name}', selector '{v}' '{record}' key must be string!"))
+                    return False
+                if not self._validate_regex(tech_name, record):
+                    return False
+        return True
+
+
 class CategoryValidator(ArrayValidator):
     def __init__(self, categories: list[int], required: bool = False):
         super().__init__(required)
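`DNSValidator` expects the `dns` field to map string keys to lists of string patterns. The sketch below reproduces only those type checks as a standalone function; it leaves out the regex check, because `_validate_regex` is not shown in this diff. The function name, messages, and sample records are hypothetical.

```python
from typing import Any, Optional


def dns_shape_error(data: Any) -> Optional[str]:
    """Return a description of the first shape problem, or None when the mapping looks valid."""
    if not isinstance(data, dict):
        return "dns must be a dict"
    for key, records in data.items():
        if not isinstance(key, str):
            return f"key {key!r} must be a str"
        if not isinstance(records, list):
            return f"value for {key!r} must be a list"
        for record in records:
            if not isinstance(record, str):
                return f"record {record!r} under {key!r} must be a str"
    return None


# Hypothetical entries: one valid, one with a bare string, one with a non-string record.
print(dns_shape_error({"TXT": ["google-site-verification"]}))
print(dns_shape_error({"MX": "aspmx\\.l\\.google\\.com"}))
print(dns_shape_error({"TXT": ["ok", 42]}))
```

Like the validator in the PR, the sketch stops at the first problem it finds instead of collecting all of them.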
@@ -297,6 +348,22 @@ def _validate(self, tech_name: str, data: Any) -> bool:
         return True


+class ReferenceValidator(ArrayValidator):
+    def __init__(self, all_techs: set[str]):
+        super().__init__()
+        self._all_techs: Final[set[str]] = all_techs
+
+    def _validate(self, tech_name: str, data: Any) -> bool:
+        if not super()._validate(tech_name, data):
+            return False
+        for ref in data:
+            clean_ref: str = ref.split(r"\;")[0]
+            if clean_ref not in self._all_techs:
+                self._set_custom_error(TechNotFoundException(f"Tech '{tech_name}' references '{clean_ref}' but it doesn't exist!"))
+                return False
+        return True
+
+
 class TechnologiesValidator:
     def __init__(self, file_name: str):
         self._SOURCE_DIR: Final[str] = "src"
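`ReferenceValidator` strips everything from the first literal `\;` onward before looking the name up, so that suffixes attached to a reference do not count as part of the technology name. A small sketch of that lookup follows; `missing_references`, the technology names, and the suffixes are hypothetical, and unlike the validator it collects every missing name instead of stopping at the first.

```python
def missing_references(refs: list[str], known: set[str]) -> list[str]:
    """Return referenced technology names that are not in the known set."""
    missing: list[str] = []
    for ref in refs:
        # Same split as ReferenceValidator: keep the part before a literal backslash-semicolon.
        clean_ref = ref.split(r"\;")[0]
        if clean_ref not in known:
            missing.append(clean_ref)
    return missing


known_techs = {"PHP", "MySQL"}
implies = ["PHP", r"MySQL\;confidence:50", r"Nginx\;version:\1"]

print(missing_references(implies, known_techs))
```

The comparison is an exact, case-sensitive match, so a reference written as `php` would be reported as missing even though `PHP` exists.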
@@ -308,22 +375,23 @@ def __init__(self, file_name: str):
         self._IMAGES_DIR: Final[str] = "images"
         self._ICONS_DIR: Final[str] = "icons"
         self._ICONS: Final[list[str]] = [icon.name for icon in pathlib.Path(self._SOURCE_DIR).joinpath(self._IMAGES_DIR).joinpath(self._ICONS_DIR).iterdir()]
+        self._ALL_TECHS: Final[set[str]] = self._get_all_tech_names()
         self._validators: dict[str, AbstractValidator] = { # TODO confidence and version validator
             "cats": CategoryValidator(self._CATEGORIES, True),
-            "website": StringValidator(True),
+            "website": URLValidator(True),
             "description": StringValidator(),
             "icon": IconValidator(self._ICONS),
             "cpe": CPEValidator(),
             "saas": BoolValidator(),
             "oss": BoolValidator(),
             "pricing": PricingValidator(),
-            "implies": ArrayValidator(), # TODO cat validation
-            "requires": ArrayValidator(), # TODO ^
-            "excludes": ArrayValidator(), # TODO ^
+            "implies": ReferenceValidator(self._ALL_TECHS),
+            "requires": ReferenceValidator(self._ALL_TECHS),
+            "excludes": ReferenceValidator(self._ALL_TECHS),
             "requiresCategory": CategoryValidator(self._CATEGORIES),
             "cookies": DictValidator(contains_regex=True),
             "dom": DomValidator(),
-            "dns": DictValidator(contains_regex=True),
+            "dns": DNSValidator(contains_regex=True),
             "js": DictValidator(contains_regex=True),
             "headers": DictValidator(contains_regex=True),
             "text": ArrayValidator(contains_regex=True),
@@ -365,6 +433,16 @@ def _duplicate_key_validator(cls, pairs: list[tuple[str, Any]]) -> dict[str, Any
             result[key] = value
         return result

+    def _get_all_tech_names(self) -> set[str]:
+        all_techs: set[str] = set()
+        for letter in list(string.ascii_lowercase) + ["_"]:
+            tech_file: pathlib.Path = self._FULL_TECH_DIR.joinpath(f"{letter}.json")
+            if tech_file.exists():
+                with tech_file.open("r", encoding="utf8") as f:
+                    technologies: dict = json.load(f)
+                all_techs.update(technologies.keys())
+        return all_techs
+

 class TechnologyProcessor:
     def __init__(self, tech_name: str, tech_data: dict, validators: dict[str, AbstractValidator]):
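`_get_all_tech_names` builds the lookup set from the top-level keys of `a.json` through `z.json` plus `_.json`, skipping files that do not exist. The sketch below runs the same loop against a temporary directory; `collect_tech_names`, the directory contents, and the technology names are hypothetical stand-ins for the real `src/technologies` data.

```python
import json
import pathlib
import string
import tempfile


def collect_tech_names(tech_dir: pathlib.Path) -> set[str]:
    """Gather top-level keys from a.json .. z.json and _.json, as _get_all_tech_names does."""
    names: set[str] = set()
    for letter in list(string.ascii_lowercase) + ["_"]:
        tech_file = tech_dir.joinpath(f"{letter}.json")
        if tech_file.exists():
            with tech_file.open("r", encoding="utf8") as f:
                names.update(json.load(f).keys())
    return names


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Hypothetical data files; notes.json is not a single-letter file and is never opened.
    root.joinpath("a.json").write_text(json.dumps({"Apache": {}, "Angular": {}}), encoding="utf8")
    root.joinpath("_.json").write_text(json.dumps({"1C-Bitrix": {}}), encoding="utf8")
    root.joinpath("notes.json").write_text(json.dumps({"Ignored": {}}), encoding="utf8")
    found = collect_tech_names(root)

print(sorted(found))
```

Since the set is rebuilt from every data file each time a `TechnologiesValidator` is constructed, a reference can point at a technology defined in a different letter file than the one being validated.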
@@ -389,4 +467,4 @@ def process(self) -> None:
 if __name__ == '__main__':
     # for letter in string.ascii_lowercase + "_":
     #     TechnologiesValidator(os.getenv("TECH_FILE_NAME", f"{letter}.json")).validate()
-    TechnologiesValidator(os.getenv("TECH_FILE_NAME", f"a.json")).validate()
+    TechnologiesValidator(os.getenv("TECH_FILE_NAME")).validate()
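With the `a.json` default removed, `os.getenv("TECH_FILE_NAME")` returns `None` whenever the variable is unset, and the diff does not show how `TechnologiesValidator` treats a `None` file name. The sketch below shows one way a caller could fail early with a clear message. It is an illustration under that assumption, not something the PR does, and `resolve_tech_file` is a hypothetical helper.

```python
from typing import Mapping, Optional


def resolve_tech_file(env: Mapping[str, str]) -> str:
    """Return the configured file name, or raise if TECH_FILE_NAME is missing or empty."""
    name: Optional[str] = env.get("TECH_FILE_NAME")
    if not name:
        raise ValueError("TECH_FILE_NAME must be set, for example to 'a.json'")
    return name


# A plain dict stands in for os.environ so the example does not depend on the real environment.
print(resolve_tech_file({"TECH_FILE_NAME": "a.json"}))

try:
    resolve_tech_file({})
except ValueError as error:
    print(f"rejected: {error}")
```

In the workflow itself the variable is presumably supplied per matrix entry, in which case the guard would never trigger.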
21 changes: 21 additions & 0 deletions .github/workflows/validate.yml
@@ -18,6 +18,27 @@ jobs:
       - name: run structure validator
         run: python3 .github/workflows/scripts/structure_validator.py

+  validate_schema:
+    runs-on: ubuntu-22.04
+    needs: validate_structure
+    strategy:
+      matrix:
+        python-version: [ "3.12" ]
+    steps:
+      - name: checkout repository
+        uses: actions/checkout@v4
+
+      - name: set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: install dependencies
+        run: python3 -m pip install jsonschema
+
+      - name: run schema validator
+        run: python3 .github/workflows/scripts/schema_validator.py
+
   validate_categories:
     runs-on: ubuntu-22.04
     needs: validate_structure
10 changes: 6 additions & 4 deletions .gitignore
@@ -1,4 +1,6 @@
-.idea/
-
-.project
-.settings
+.idea/
+
+.project
+.settings
+
+.sync/
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,10 +1,10 @@
 # Contributing

-WebAppAnalyzer is an [GPLv3 licensed](https://github.com/wappalyzer/wappalyzer/blob/master/LICENSE), open source project written in JavaScript. Anyone is welcome to contribute.
+WebAppAnalyzer is an [GPLv3 licensed](https://github.com/enthec/webappanalyzer/blob/master/LICENSE), open source project written in JavaScript. Anyone is welcome to contribute.

 ## Getting started

-To get started, see the [README](https://github.com/wappalyzer/wappalyzer/blob/master/README.md).
+To get started, see the [README](https://github.com/enthec/webappanalyzer/blob/master/README.md).

 ## Adding a new technology
