Data fields containing ampersands (matching the regex '&.+$') are not escaped and break XML output #118

@johannespostler

Description

Any data field in the database whose value matches the regex '&.+$' (an ampersand followed by further characters) breaks the output of all XSAMS files. This regex also matches every HTML entity, e.g. '&auml;' (ä). When such a value is written to the output, the ampersand is not escaped, which breaks most browsers and the validator (unless the text happens to form a valid entity). Parsers treat the ampersand as the start of an entity reference and expect a terminating semicolon after the entity name.
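
To see which stored values are affected, the regex from the description can be applied directly. This is a minimal Python sketch; the function name `needs_escaping` and the sample values are illustrative assumptions, not taken from the code base or the database.

```python
import re

# Regex quoted in the description: an ampersand followed by at least one more character.
AMPERSAND_PATTERN = re.compile(r"&.+$")


def needs_escaping(value: str) -> bool:
    """Return True if value contains an ampersand followed by further text."""
    return AMPERSAND_PATTERN.search(value) is not None


# Illustrative sample values (hypothetical, not from the database).
samples = ["plain text", "a=1&b=2", "&auml;", "trailing &"]
flagged = [s for s in samples if needs_escaping(s)]
```

Note that a lone trailing ampersand is not matched by this regex, although it is just as invalid in XML, so a check for a bare '&' may be the safer test.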

Testcase:
http://ideadb.uibk.ac.at/view/107/

The url field of this scan contains the following characters (within the link):
52fed736-74fc-11e2-9a8e-00000aacb35f&acdnat=1360663964_abbc8fd43c6ff547c477bb7648e5250d

Since ampersand-separated query parameters are a rather common pattern in URLs, this is a problem.
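
One possible fix is to escape field values before they are written into the XML. The sketch below uses Python's standard `xml.sax.saxutils.escape` on the string from the test case. The element name `URL` and the helpers `render_field` and `is_well_formed` are assumptions made for illustration; they are not the names used by the XSAMS generator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Value taken from the test case above (the part of the url field shown there).
RAW_URL_PART = (
    "52fed736-74fc-11e2-9a8e-00000aacb35f"
    "&acdnat=1360663964_abbc8fd43c6ff547c477bb7648e5250d"
)


def render_field(tag: str, value: str) -> str:
    """Wrap value in an element (hypothetical helper), escaping &, < and >."""
    return f"<{tag}>{escape(value)}</{tag}>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Writing the raw value reproduces the problem; escaping it avoids it.
unescaped = f"<URL>{RAW_URL_PART}</URL>"
escaped = render_field("URL", RAW_URL_PART)

print(is_well_formed(unescaped))
print(is_well_formed(escaped))
```

After escaping, the ampersand is emitted as `&amp;`, and a parser reading the element back recovers the original string unchanged.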
