-
Notifications
You must be signed in to change notification settings - Fork 22
Add analyzer and fix to escape non-ASCII literal values #450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| var newDoc = doc.WithSyntaxRoot( newRoot ); | ||
|
|
||
| return Task.FromResult( newDoc ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string foo = "αβγ";-->
string foo = /* unencoded: "αβγ" */ "\u03B1\u03B2\u03B3";The comment could get out of sync. Options:
- Don't do this
- Write an analyzer to verify they are in sync (not hard, but probably not worth it.)
- It's fine, it's not likely to get through code review
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no chance that I'll notice an out-of-sync comment in a review. I'd lean towards #1 unless there lots of instances where the specific value is user-facing (seems unlikely since user-facing values should be langTerms, not string constants).
Alternately, is there a visualizer that would work here? E.g. it'd be handy if VS could overlay the "rendered" value of the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove it.
| description: "The parameter {0} has a default value of {1} here, but {2} in its original definition in {3}. This causes inconsistent behaviour. Please use the same defualt value everywhere." | ||
| ); | ||
|
|
||
| public static readonly DiagnosticDescriptor EscapeNonAsciiCharsInLiteral = new DiagnosticDescriptor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider providing a description of (or link to) what can go wrong without this.
| var literalExpr = (LiteralExpressionSyntax)ctx.Node; | ||
|
|
||
| // We can't handle verbatim strings for the same reason as these | ||
| // folks: https://github.com/dotnet/codeformatter/issues/39 (TODO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this quite interesting.
| var copyStartIdx = 0; | ||
|
|
||
| // invariant: copyStartIdx < idx | ||
| // Note: the enclosing quotes are included in val; don't bother looking at them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is val?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, variable rename
| // invariant: copyStartIdx < idx | ||
| // Note: the enclosing quotes are included in val; don't bother looking at them. | ||
| for( int idx = 1; idx < token.Length - 1; idx++ ) { | ||
| if( token[idx] < 0x80 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol - so simple
|
|
||
| // copy all the ascii chars we've seen between the last copy | ||
| // and now (not inclusive) into sb | ||
| sb.Append( token, copyStartIdx, idx - copyStartIdx ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems more complicated than copying char-at-a-time, and I'm not sure I see the advantage given that you're using StringBuffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's "basically a no-op" when the string is ASCII. Microsoft did it with two passes (one to detect nonASCIIness, one to do the copying) which did char-by-char copying in the second step. It's technically more work to do that (two passes, and char by char is still worse than chunks at a time) but its so neglible and only happens during errors anyway so really not interesting.
Honestly I just wrote it this way because it came out of my head this way. I'll look at it again.
| return false; | ||
| } | ||
|
|
||
| private static bool IsSurrogatePair( string str, int idx ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use char.IsSurrogatePair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! Thanks
| /* EscapeNonAsciiCharsInLiteral(string,"\u284C\u2801\u2827\u2811 \u283C\u2801\u2812 \u284D\u281C\u2807\u2811\u2839\u2830\u280E \u2863\u2815\u280C") */ "⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌" /**/; | ||
|
|
||
| // This one hits the branch for surrogate pairs | ||
| const string MormonTwinkleTwinkleLittleStar = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha!
No description provided.