Skip to content

metafor's "sed-fu" seems overly complicated. Triggers bug in busybox with musl. Can it be simplified? #38

@jakebman

Description

@jakebman

The buggy behavior, which only happens in a busybox sed which uses musl's libc (take a look in alpine:latest, with apk add sed), is that the leading quotation marks are not removed (this is counter to the POSIX regex spec):

→ keyword=hello
→ LINE="$keyword 'foobar'" # a trivial use with quotesecho $LINE | busybox sed -n "/$keyword / s/['\";]*\$//;s/^[ 	]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p" # musl bug
'foobar
→ echo $LINE | sed -n "/$keyword / s/['\";]*\$//;s/^[ 	]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p"
foobar

I'd like to suggest an edit to this sed-fu, but I'm not certain which parts of this functionality are intentional, and which are accidental. The guiding principle seems to be "this should work a little like echo, but we definitely do not want to eval"

The behavior of the current sed-fu appears to be:

  • Search for the keyword (we will only print lines that match it)
  • The keyword must be the first word on the line, but it can be : _keyword because of Allow functions with metadata to run before load
  • The keyword and the whitespace preceding it are removed
  • All leading and trailing quotes on the argument are removed (even if they are of different lengths [0], or are not symmetrical [1], though we'd expect the outer pair to match in normal usage)
  • A trailing semicolon is removed, whether or not [2] it is within the quotes
  • ([^([].*\)* looks like it's trying to do something like exclude open braces, but all that does is start the match earlier [3]
  • Attempting to quote two different arguments is interpreted as allowed internal quotes [4]
  • The character immediately following the keyword must be exactly a space. Tabs are disallowed. (typeset probably normalizes this so it wouldn't matter)
→ LINE="$keyword 'foobar;\"''\""; !echo # Allowed: uneven[0], mismatched[1] quotes and a semicolon within the quotes [2]
foobar
→ LINE="$keyword '(foobar;'"; !echo # Odd: A leading paren in the quotes [3], semicolon within the quotes [2]
'(foobar
  → LINE="$keyword 'foo' 'bar'"; !echo # Unsupported: two different quoted arguments [4]
foo' 'bar

I'm treading on Chesterton's fence, which prohibits me from removing these unless I know what they're for. But several of these don't look like they do anything for us. If they don't, I'd propose a shorter sed script:

sed -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p"

This sed invocation does not break in musl, but it also makes several opinionated changes. I don't want to pass over them without calling them out:

  • Only one quote is removed from the start and end - quotes are still allowed to mismatch in kind and number, but that number is only zero or one.
  • The trailing semicolon is not removed if it's within the quotation marks
  • Whatever ([^([].*\)* is trying to do; it doesn't
  • Whitespace is more flexible - no constraints on spaces vs. tabs or how many of them there are (beyond syntax requirements). This can harm readability where tabs are never expected,
→ keyword=hello
→ LINE="$keyword 'foobar'" # a trivial use with quotesecho $LINE | busybox sed -n "/$keyword / s/['\";]*\$//;s/^[ 	]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p" # musl bug
'foobar
→ echo $LINE | busybox sed -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p" # worked around
foobar

Do these changes seem like improvements, or do they miss the mark?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions