-
Notifications
You must be signed in to change notification settings - Fork 81
feat(evasive-transform): add meaning-preserving evasive transform for imports in strings #3026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| @@ -0,0 +1,171 @@ | |||
| const evadeRegexp = /import\s*\(|<!--|-->/g; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this now provides multiple evasions in one go
| const multilineimport =\`␊ | ||
| console.log(${a})␊ | ||
| await im${""}port('some-module');␊ | ||
| console.log(${b})␊ | ||
| \`;␊ | ||
| ␊ | ||
| const taggedtemplate = String.dedent\`␊ | ||
| await import('some-module');␊ | ||
| \`;␊ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there this difference between tagged vs. untagged template literal contents?
And as an aside, I think this test would be better if the input and expected output appeared in the same file (i.e., not using t.snapshot). It's valuable for high volume, but this use seems to come down on the wrong side of the tradeoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the tag function would see an extra split and empty string value in it. Where without a tag the effects of having an extra hole filled with an empty string are unobservable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's more code to get right, but if we're making this change, we should be thorough about it, e.g.:
const taggedtemplate = makeEvadedTemplateApplier(String.dedent)`
await ${IMPORT}('some-module');
`;where IMPORT is a unique global sentinel such as Symbol('import') and makeEvadedTemplateApplier is a global function like
const makeEvadedTemplateApplier =
fn =>
(T0, ...A0) => {
const argCount = A0.length;
const T1 = getTemplateMapping(T0) || [];
const args = [T1];
if (!T1.length) {
const raw0 = T0.raw;
const raw1 = [];
for (let i = 0, j = 0; i <= argCount; ++i, ++j) {
T1[j] = T0[i];
raw1[j] = raw0[i];
while (i < argCount && A0[i] === IMPORT) {
i++;
T1[j] += `import${T0[i]}`;
raw1[j] += `import${raw0[i]}`;
}
}
defineProperty(T1, 'raw', {
...{ writable: false, enumerable: false, configurable: false },
value: freeze(raw1),
});
setTemplateMapping(T0, freeze(T1));
}
for (let i = 0, j = 0; i < argCount; ++i) {
if (A0[i] !== IMPORT) args[++j] = A0[i];
}
return apply(fn, undefined, args);
};(which AFAIK is unobservable, until/unless https://github.com/tc39/proposal-array-is-template-object advances to Stage 4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would have to put all that runtime in a single line of code, and it’s a lot. Might have to do some hard trade-off calculus. I’m in favor of making monotonic progress until we hit a cost cliff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know to what extent minification is on the table for injected helpers like the above, but if we are comfortable with it then the cost need be no more than 482 bytes, and only necessary when the source text being transformed has one or more tagged template literals with static contents that would be affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's about the potential issues with sourcemaps or having to find a place to put it. Also, the level of complexity I'm not comfortable with introducing.
It's a complex fix for a very unlikely problem. And if a module contains a tagged template string with an import statement in it, it's likely problematic in many other ways.
HTML in tagged template strings is somewhat common, but as a virtual dom component, where HTML comments don't make sense.
I fail to find an example of tagged template strings containing hardcoded javascript with imports in it.
I'd like to avoid shipping a tagged template string transform for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sure would like to avoid that complexity as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tentatively volunteered to co-champion it. Feed me your concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary concern is taking away the ability to locally virtualize tagged template literals, which is a steep price to pay for what seems to be negligible benefit.
This comment was marked as outdated.
This comment was marked as outdated.
8609251 to
6939389
Compare
… imports in strings
55d4269 to
5466fc4
Compare
gibson042
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR should update the evasive-transform README to mention that it also affects the representation of strings, regular expressions, and template literals (with caveats for last of those, particularly describing the gaps around cooked vs. raw and total disregard of tagged template literals).
| /* import comment ...IMPORT('some-module');*/␊ | ||
| const result = eval("...im"+"port('some-module'); await im"+"port(\\"other\\");");␊ | ||
| const result2 = eval("...im"+"port('some-module'); await im"+"port(\\"other\\");");␊ | ||
| const multilineimport =\`␊ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's strange loss of the space between = and \` here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just babel. Even in whitespace-preserving mode this space is at the end of a line in some meaning, so it's probably not represented anywhere.
Are we worried?
Also, impressive you noticed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not worried, just confused.
| /** | ||
| * Prevents `-->` from appearing in output by adding | ||
| * an empty block comment to force spacing. | ||
| * | ||
| * @param {import('@babel/traverse').NodePath} p | ||
| * @returns {void} | ||
| */ | ||
| export const evadeDecrementGreater = p => { | ||
| if ( | ||
| p.node.type === 'BinaryExpression' && | ||
| p.node.operator === '>' && | ||
| p.node.left.type === 'UpdateExpression' && | ||
| p.node.left.operator === '--' && | ||
| !p.node.left.trailingComments?.length | ||
| ) { | ||
| // Add an empty block comment to force a space between -- and > | ||
| p.node.left.trailingComments = [{ type: 'CommentBlock', value: '' }]; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments can be stripped by downstream tools, recreating the problem—$x-->$y → [$x--][0]>$y or (0,$x--)>$y is better IMO (cf. #1837 (comment) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But evasive transforms are applied to code last by design, before feeding code to SES. I don't know of a usecase for evasive transforms where the resulting string gets modified again.
We could still switch to it.
The only argument against is it could impact optimizations/inlining by pushing code across a complexity threshold or not matching a pattern someone hardcoded as optimizable. Is that a concern?
Should I go to the issue and comment there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But evasive transforms are applied to code last by design, before feeding code to SES. I don't know of a usecase for evasive transforms where the resulting string gets modified again.
I think we had one at Agoric, but my memory is fuzzy.
We could still switch to it. The only argument against is it could impact optimizations/inlining by pushing code across a complexity threshold or not matching a pattern someone hardcoded as optimizable. Is that a concern?
I certainly don't consider that a concern.
Should I go to the issue and comment there?
Feel free, but we're all aware that [$x--][0]>$y is slightly more expensive than $x--/**/>$y (it's just unlikely for that extra cost to matter in realistic code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had a conversation at the Endo meeting that concluded in "if we doublecheck that there's no config to define a transform that runs after this one in Endo it's fine to use a comment"
Since you were not present, it's definitely not final.
I am concerned about the optimization and inlining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I'm in a microbenchmarking mood, I ran the numbers—[$x--][0]>$y seems to take 25% to 70% longer than $x--/**/>$y in browser engines, but both forms are about as fast as anything can be so performance is not the relevant consideration here. However, there appears to be negligible overhead (if any) for (0,$x--)>$y:
#### V8
comment sync 33.9e+3 ops/ms (29.5e+0 ns/op) after 414 245760-count observations
comma sync 33.7e+3 ops/ms (29.6e+0 ns/op) after 412 245760-count observations
array sync 24.7e+3 ops/ms (40.4e+0 ns/op) after 278 267429-count observations (1 restart)
#### SpiderMonkey
comment sync 6977.19 ops/ms (143e+0 ns/op) after 137 153600-count observations
comma sync 6927.05 ops/ms (144e+0 ns/op) after 64 329143-count observations
array sync 5445.73 ops/ms (184e+0 ns/op) after 107 153600-count observations
#### JavaScriptCore
comment sync 7629.27 ops/ms (131e+0 ns/op) after 94 245760-count observations
comma sync 7685.11 ops/ms (130e+0 ns/op) after 94 245760-count observations
array sync 4557.14 ops/ms (219e+0 ns/op) after 112 122880-count observations
Details
for engine in V8 SpiderMonkey JavaScriptCore; do
esbench.mjs -h "$engine" -b3 \
-s '
const length = 10000, numbers = Array.from({ length }, (_, i) => i);
let i = length - 1;
' \
comment:'result = numbers[++i % length]; result = result--/**/>result;' \
comma:'result = numbers[++i % length]; result = (0,result--)>result;' \
array:'result = numbers[++i % length]; result = [result--][0]>result;'
doneSo I'd advocate for the comma expression transformation, although this is not a hill worth dying on and I don't have a strong objection to comment-injection either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the comma transform. What are the advantages of comment injection over comma transform?
|
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
…ansform prior to refactor
09e78c0 to
a32be3c
Compare
Description
With compartment-mapper introducing support for dynamic imports, the old regexp-based ways in which we evaded import censorship in SES are no longer viable and AST based transform can be the only meaning-preserving one.
Open questions
import (- with whitespace? We could add complexity to the transform or assume they're rare enough and let them hit censorship.splitby regexp unless somebody stops meimport(that are left need not be preserved? Could we use a simple regexp-replace there for a cheaper way to get the same result?eval('var a = import("somemodule")')or intend to ?Security Considerations
Scaling Considerations
AST transform is slower than regexp, but for cases where
import(doesn't exist in a string, it could be faster than a regexp search o the entire string.Documentation Considerations
Testing Considerations
Compatibility Considerations
with the transform being meaning-preserving, this should not be breaking for anyone. Also, considering opt-in