Open
Description
While testing, I found that some whois query responses are formatted differently. In general, we parse name: value pairs on a single line, but in these responses there may be a line break between a field name and its value. Here's an example output:
Domain: google.de
Status: connect
Domain name:
bbc.co.uk
Data validation:
Nominet was able to match the registrant's name and address against a 3rd party data source on 12-Jun-2014
Registrar:
British Broadcasting Corporation [Tag = BBC]
URL: https://www.bbc.co.uk
Relevant dates:
Registered on: before Aug-1996
Expiry date: 13-Dec-2025
Last updated: 10-Dec-2020
Registration status:
Registered until expiry date.
Name servers:
ddns0.bbc.co.uk 148.163.199.1 2607:f740:e04e::1
ddns0.bbc.com
ddns1.bbc.co.uk 148.163.199.65 2607:f740:e04e:4::1
ddns1.bbc.com
dns0.bbc.co.uk 198.51.44.9 2620:4d:4000:6259:7:9:0:1
dns0.bbc.com
dns1.bbc.co.uk 198.51.45.9 2a00:edc0:6259:7:9::2
dns1.bbc.com
WHOIS lookup made at 06:22:57 18-Oct-2024
The tokenizer needs to be updated to detect this. From what I can tell:
- The first line is marked as "name:".
- The second line is the data.
- A blank line indicates a new item.
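A minimal sketch of the rule set above follows. It is written in Python, which is an assumption because the project's language isn't stated in this issue, and the function name tokenize_block_fields is hypothetical. It assumes the raw response still contains the blank lines between items (the pasted output above has lost them). Nested lines such as "URL: https://www.bbc.co.uk" or "Registered on: before Aug-1996" are kept verbatim as data of the enclosing field rather than being split further.

```python
from typing import List, Optional, Tuple

# (field name, data lines); the name is None for free text outside any field.
Token = Tuple[Optional[str], List[str]]


def tokenize_block_fields(text: str) -> List[Token]:
    """Hypothetical tokenizer for the "name:" / data-on-next-line layout (.uk style).

    Assumed rules, taken from this issue:
      * a line ending with ':' opens a new field;
      * the following non-empty lines are that field's data;
      * a blank line closes the current field.
    """
    tokens: List[Token] = []
    current: Optional[Token] = None  # field whose data lines are being collected

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank line: the current item is finished.
            current = None
        elif line.endswith(":"):
            # "Domain name:" style header: the data follows on later lines.
            current = (line[:-1].strip(), [])
            tokens.append(current)
        elif current is not None:
            # Data line of the open field, kept verbatim (it may itself contain ": ").
            current[1].append(line)
        elif ": " in line:
            # Ordinary single-line pair such as "Domain: google.de".
            name, _, value = line.partition(": ")
            tokens.append((name.strip(), [value.strip()]))
        else:
            # Free text such as the "WHOIS lookup made at ..." footer.
            tokens.append((None, [line]))
    return tokens
```

Lines outside any block fall back to the usual single-line name: value handling, so a response like the google.de one above should tokenize the same way as before.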
There are other situations that do not comply with that rule set:
Domain: google.it
Status: ok
Signed: no
Created: 1999-12-10 00:00:00
Last Update: 2024-09-27 00:50:20
Expire Date: 2025-04-21
Registrant
Organization: Google Ireland Holdings Unlimited Company
Address: 70 Sir John Rogerson's Quay
Dublin
2
Dublin
IE
Created: 2018-03-02 19:04:02
Last Update: 2018-03-02 19:04:02
Admin Contact
Name: Colm Buckley
Organization: Google LLC
Address: 1600 Amphitheatre Parkway
Mountain View
94043
CA
US
Created: 2024-09-27 00:44:25
Last Update: 2024-09-27 00:44:25
Technical Contacts
Name: Domain Administrator
Organization: Google LLC
Address: 1600 Amphitheatre Parkway
Mountain View
94043
CA
US
Created: 2017-12-21 19:54:04
Last Update: 2017-12-21 19:54:04
Registrar
Organization: MarkMonitor International Limited
Name: MARKMONITOR-REG
Web: https://www.markmonitor.com/
DNSSEC: no
Nameservers
ns1.google.com
ns2.google.com
ns3.google.com
ns4.google.com
This is similar to the case above, but here each section has a header (such as Technical Contacts), and some fields, such as Address, span multiple lines.
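A minimal sketch for the .it layout follows, in Python (an assumption, since the project language isn't stated here). The names Entry and tokenize_sectioned are hypothetical. It assumes the raw response separates sections with blank lines, which the pasted output above has lost. Under that assumption, a line without ": " right after a blank line is treated as a section header. A line without ": " that follows a name: value field is treated as a continuation of that field (as with Address). Bare lines directly under a header (the name servers) become unnamed entries of that section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    section: Optional[str]  # e.g. "Registrant"; None before the first header
    name: Optional[str]     # e.g. "Address"; None for bare lines such as name servers
    values: List[str] = field(default_factory=list)


def tokenize_sectioned(text: str) -> List[Entry]:
    """Hypothetical tokenizer for section headers plus multi-line values (.it style)."""
    entries: List[Entry] = []
    section: Optional[str] = None
    current: Optional[Entry] = None  # last "name: value" entry, for continuation lines
    block_start = True               # True at the top and right after a blank line

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            block_start = True
            current = None
            continue
        if ": " in line:
            # Regular pair, e.g. "Organization: Google LLC".
            name, _, value = line.partition(": ")
            current = Entry(section, name.strip(), [value.strip()])
            entries.append(current)
        elif block_start:
            # Bare line after a blank line: a section header such as "Admin Contact".
            section = line
            current = None
        elif current is not None:
            # Bare line after a field: continuation, e.g. the rest of an Address.
            current.values.append(line)
        else:
            # Bare line directly under a header, e.g. "ns1.google.com".
            entries.append(Entry(section, None, [line]))
        block_start = False
    return entries
```

Relying on blank lines rather than indentation is a design choice. If the raw responses keep their indentation, that could serve as a second, possibly more robust, signal for telling headers apart from continuation lines.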
We need to build cases to handle these layouts in the tokenizer. Attached is a list of whois queries for 100 different TLDs.
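One way to build the cases is to detect the layout first and then dispatch to a matching tokenizer. The heuristic below is only a sketch. The function detect_layout and its three labels are hypothetical, and it encodes only the three layouts seen in this issue (.de inline, .uk block, .it sectioned). It would need checking against the attached 100-TLD list, since comment banners or unusual values (for example an IPv6 address ending in "::") could misclassify a response.

```python
from typing import List


def detect_layout(text: str) -> str:
    """Guess the layout of a whois response (hypothetical heuristic).

    Returns "block" (name on its own line, data below, .uk style),
    "sectioned" (bare section headers then name: value pairs, .it style),
    or "inline" (plain name: value lines, .de style).
    """
    lines: List[str] = [raw.strip() for raw in text.splitlines()]

    # A field name with nothing after the colon means the data is on later lines.
    if any(line.endswith(":") for line in lines):
        return "block"

    # A bare line right after a blank line (or at the top), followed by a
    # "name: value" line, looks like a section header such as "Registrant".
    for prev, line, nxt in zip([""] + lines, lines, lines[1:] + [""]):
        if prev == "" and line and ": " not in line and ": " in nxt:
            return "sectioned"

    return "inline"
```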
Metadata
Labels: bug (Something isn't working)
Status: In Progress