Description
The buggy behavior, which only happens with a busybox sed linked against musl libc (take a look in alpine:latest, with apk add sed), is that the leading quotation marks are not removed, which runs counter to the POSIX regex spec:
→ keyword=hello
→ LINE="$keyword 'foobar'" # a trivial use with quotes
→ echo $LINE | busybox sed -n "/$keyword / s/['\";]*\$//;s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p" # musl bug
'foobar
→ echo $LINE | sed -n "/$keyword / s/['\";]*\$//;s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p"
foobar
I'd like to suggest an edit to this sed-fu, but I'm not certain which parts of this functionality are intentional and which are accidental. The guiding principle seems to be "this should work a little like echo, but we definitely do not want to eval".
The behavior of the current sed-fu appears to be:
- Search for the keyword (we will only print lines that match it)
- The keyword must be the first word on the line, but it can be : _keyword because of "Allow functions with metadata to run before load"
- The keyword and the whitespace preceding it are removed
- All leading and trailing quotes on the argument are removed (even if they are of different lengths [0], or are not symmetrical [1], though we'd expect the outer pair to match in normal usage)
- A trailing semicolon is removed, whether or not [2] it is within the quotes
- \([^([].*\)* looks like it's trying to exclude an opening parenthesis or bracket, but all it does is start the match earlier [3]
- Attempting to quote two different arguments is interpreted as allowed internal quotes [4]
- The character immediately following the keyword must be exactly a space. Tabs are disallowed. (typeset probably normalizes this, so it wouldn't matter)
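To make the cases in this list easier to re-run, here is a minimal sketch that wraps the current script in a function. The name extract_current and the SED override variable are hypothetical conveniences of mine, not part of the project; the sed script itself is unchanged, only split into two -e parts so each step can carry a comment.

```shell
#!/bin/sh
# Hypothetical harness (not project code): runs the CURRENT sed script on one line.
# Assumption: SED may be set to "busybox sed" to compare implementations.
keyword=hello

extract_current() {
  # 1st -e: on lines containing "$keyword ", strip a trailing run of quotes/semicolons.
  # 2nd -e: drop leading spaces, an optional ": _", the keyword, exactly one space,
  #         and any leading quotes; print what group 2 captured.
  printf '%s\n' "$1" | ${SED:-sed} -n \
    -e "/$keyword / s/['\";]*\$//" \
    -e "s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p"
}

extract_current "$keyword 'foobar'"        # the trivial case
extract_current "$keyword '(foobar;'"      # leading paren inside the quotes [3]
extract_current "$keyword 'foo' 'bar'"     # two quoted arguments [4]
```

Running it once with the default sed and once with SED="busybox sed" on Alpine should show where the two implementations diverge.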
→ LINE="$keyword 'foobar;\"''\""; !echo # Allowed: uneven[0], mismatched[1] quotes and a semicolon within the quotes [2]
foobar
→ LINE="$keyword '(foobar;'"; !echo # Odd: A leading paren in the quotes [3], semicolon within the quotes [2]
'(foobar
→ LINE="$keyword 'foo' 'bar'"; !echo # Unsupported: two different quoted arguments [4]
foo' 'bar
I'm treading on Chesterton's fence, which prohibits me from removing these unless I know what they're for. But several of these don't look like they do anything for us. If they don't, I'd propose a shorter sed script:
sed -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p"
This sed invocation does not break in musl, but it also makes several opinionated changes. I don't want to pass over them without calling them out:
- At most one quote is removed from the start and from the end. Quotes are still allowed to mismatch in kind and number, but that number is only zero or one.
- The trailing semicolon is not removed if it's within the quotation marks
- Whatever \([^([].*\)* is trying to do, the new script doesn't do it
- Whitespace is more flexible: no constraints on spaces vs. tabs or how many of them there are (beyond syntax requirements). This can harm readability where tabs are never expected.
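The changes listed above can be checked with a small sketch like the following. The function name extract_proposed and the SED override variable are hypothetical and only there for illustration; the sed script is the proposed one, verbatim. It relies on \s being accepted under -E, which is an extension rather than POSIX, so treat the expected results in the comments as what I'd expect from GNU sed.

```shell
#!/bin/sh
# Hypothetical harness (not project code): runs the PROPOSED sed script on one line.
# Assumption: SED may be set to "busybox sed" to compare implementations.
keyword=hello

extract_proposed() {
  printf '%s\n' "$1" | ${SED:-sed} -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p"
}

extract_proposed "$keyword ''foobar''"   # one quote stripped per side  -> 'foobar'
extract_proposed "$keyword 'foobar;'"    # semicolon inside quotes kept -> foobar;
extract_proposed "$keyword 'foobar';"    # semicolon after the quote    -> foobar
# A tab between keyword and argument is now accepted -> foobar
extract_proposed "$(printf '%s\t%s' "$keyword" '"foobar"')"
```

Lines that do not start with the keyword still print nothing, because only the second substitution carries the p flag.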
→ keyword=hello
→ LINE="$keyword 'foobar'" # a trivial use with quotes
→ echo $LINE | busybox sed -n "/$keyword / s/['\";]*\$//;s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p" # musl bug
'foobar
→ echo $LINE | busybox sed -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p" # worked around
foobar
Do these changes seem like improvements, or do they miss the mark?