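Parsing is currently done with regular expressions. We should use Python's built-in HTML parser instead: the `HTMLParser` module in Python 2, which was renamed to `html.parser` in Python 3. Documentation for Python 2: https://docs.python.org/2/library/htmlparser.html and for Python 3: https://docs.python.org/3/library/html.parser.html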
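
As a rough illustration of what the switch could look like, here is a minimal sketch that works on both Python 2 and Python 3. The issue does not say which data is currently pulled out with regex, so collecting `href` values from `<a>` tags is only an assumed example; `LinkExtractor` and `extract_links` are hypothetical names, not existing code in this project.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkExtractor(HTMLParser):
    """Hypothetical parser: collects href values from <a> tags."""

    def __init__(self):
        # Explicit base call, since HTMLParser is an old-style class on Python 2.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Return the href values found in markup, in document order."""
    parser = LinkExtractor()
    parser.feed(markup)
    parser.close()
    return parser.links
```

The same pattern (subclass `HTMLParser`, override `handle_starttag`, `handle_data`, and so on) should carry over to whatever fields the regex currently extracts, and it tolerates differences in case, quoting, and attribute order that tend to break regex-based matching.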