Add analyzer and fix to escape non-ASCII literal values #450

j3parker · 2018-11-16T22:58:31Z

No description provided.

j3parker · 2018-11-16T22:58:57Z

Inspired by https://github.com/dotnet/codeformatter/blob/master/src/Microsoft.DotNet.CodeFormatting/Rules/NonAsciiCharactersAreEscapedInLiteralsRule.cs

j3parker · 2018-11-16T23:00:17Z

src/D2L.CodeStyle.Analyzers/Language/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+
+			var newDoc = doc.WithSyntaxRoot( newRoot );
+
+			return Task.FromResult( newDoc );


string foo = "αβγ";

-->

string foo = /* unencoded: "αβγ" */ "\u03B1\u03B2\u03B3";

The comment could get out of sync. Options:

1. Don't do this
2. Write an analyzer to verify they are in sync (not hard, but probably not worth it.)
3. It's fine, it's not likely to get through code review

Thoughts?

There is no chance that I'll notice an out-of-sync comment in a review. I'd lean towards #1 unless there are lots of instances where the specific value is user-facing (which seems unlikely, since user-facing values should be langTerms, not string constants).

Alternately, is there a visualizer that would work here? E.g. it'd be handy if VS could overlay the "rendered" value of the string.

I'll just remove it.

cpacey · 2018-11-19T14:25:25Z

src/D2L.CodeStyle.Analyzers/Diagnostics.cs

 			description: "The parameter {0} has a default value of {1} here, but {2} in its original definition in {3}. This causes inconsistent behaviour. Please use the same default value everywhere."
 		);
+
+		public static readonly DiagnosticDescriptor EscapeNonAsciiCharsInLiteral = new DiagnosticDescriptor(


Consider providing a description of (or link to) what can go wrong without this.

cpacey · 2018-11-19T14:27:54Z

src/D2L.CodeStyle.Analyzers/Language/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+			var literalExpr = (LiteralExpressionSyntax)ctx.Node;
+
+			// We can't handle verbatim strings for the same reason as these
+			// folks: https://github.com/dotnet/codeformatter/issues/39 (TODO)


I found this quite interesting.

cpacey · 2018-11-19T14:30:45Z

src/D2L.CodeStyle.Analyzers/Language/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+			var copyStartIdx = 0;
+
+			// invariant: copyStartIdx < idx
+			// Note: the enclosing quotes are included in val; don't bother looking at them.


What is val?

Thanks, that's a leftover from a variable rename.

cpacey · 2018-11-19T14:31:04Z

src/D2L.CodeStyle.Analyzers/Language/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+			// invariant: copyStartIdx < idx
+			// Note: the enclosing quotes are included in val; don't bother looking at them.
+			for( int idx = 1; idx < token.Length - 1; idx++ ) {
+				if( token[idx] < 0x80 ) {


lol - so simple

cpacey · 2018-11-19T15:21:39Z

src/D2L.CodeStyle.Analyzers/Language/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+
+				// copy all the ascii chars we've seen between the last copy
+				// and now (not inclusive) into sb
+				sb.Append( token, copyStartIdx, idx - copyStartIdx );


This seems more complicated than copying char-at-a-time, and I'm not sure I see the advantage given that you're using StringBuilder.

It's "basically a no-op" when the string is ASCII. Microsoft did it with two passes (one to detect non-ASCII characters, one to do the copying), with char-by-char copying in the second pass. That's technically more work (two passes, and char-by-char is still worse than copying chunks at a time), but it's so negligible, and only happens when there's an error anyway, that it's really not interesting.

Honestly I just wrote it this way because it came out of my head this way. I'll look at it again.
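For readers following the thread, here is a minimal sketch of the chunk-at-a-time approach described above. It is not the PR's code: `EscapeSketch` and `EscapeNonAscii` are hypothetical names, the loop scans the whole token (the enclosing quotes are ASCII, so they pass through untouched), and every non-ASCII UTF-16 code unit is written as a `\uXXXX` escape, which may differ from how the PR treats surrogate pairs.

```csharp
using System;
using System.Text;

static class EscapeSketch {
	// Hypothetical helper: returns the same string instance when the
	// input is pure ASCII, so the common case allocates nothing.
	public static string EscapeNonAscii( string token ) {
		StringBuilder sb = null; // created lazily on the first non-ASCII char
		int copyStartIdx = 0;

		for( int idx = 0; idx < token.Length; idx++ ) {
			if( token[idx] < 0x80 ) {
				continue;
			}

			if( sb == null ) {
				sb = new StringBuilder( token.Length + 5 );
			}

			// Copy the run of ASCII chars seen since the last copy
			// (not including the char at idx).
			sb.Append( token, copyStartIdx, idx - copyStartIdx );

			// Write this UTF-16 code unit as a \uXXXX escape.
			sb.Append( "\\u" ).Append( ( (int)token[idx] ).ToString( "X4" ) );

			copyStartIdx = idx + 1;
		}

		if( sb == null ) {
			return token;
		}

		// Copy whatever ASCII tail remains after the last escape.
		sb.Append( token, copyStartIdx, token.Length - copyStartIdx );
		return sb.ToString();
	}
}
```

With the example from earlier in the thread, `EscapeSketch.EscapeNonAscii( "\"αβγ\"" )` yields the literal text `"\u03B1\u03B2\u03B3"`.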

cpacey · 2018-11-19T15:23:44Z

src/D2L.CodeStyle.Analyzers/Language/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+			return false;
+		}
+
+		private static bool IsSurrogatePair( string str, int idx ) {


Why not use char.IsSurrogatePair?

D'oh! Thanks
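As a rough illustration of the suggestion, the built-in `char.IsSurrogatePair( string, int )` overload can replace a hand-written check, and `char.ConvertToUtf32( string, int )` then gives the full code point. `SurrogateSketch` and `EscapeAt` are hypothetical names, and emitting a `\UXXXXXXXX` escape for pairs is an assumption made for this sketch, not something the diff shown here confirms.

```csharp
using System;

static class SurrogateSketch {
	// Hypothetical helper: escapes the character starting at idx.
	// A surrogate pair becomes one \UXXXXXXXX escape; anything else
	// becomes a \uXXXX escape of the single UTF-16 code unit.
	public static string EscapeAt( string s, int idx ) {
		// Returns false (rather than throwing) when idx is the last
		// char in s, so no separate bounds check is needed here.
		if( char.IsSurrogatePair( s, idx ) ) {
			int codePoint = char.ConvertToUtf32( s, idx );
			return "\\U" + codePoint.ToString( "X8" );
		}

		return "\\u" + ( (int)s[idx] ).ToString( "X4" );
	}
}
```

A caller that gets a pair back would also need to advance its index by two, since the pair occupies two chars in the string.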

cpacey · 2018-11-19T16:08:19Z

tests/D2L.CodeStyle.Analyzers.Test/Specs/EscapeNonAsciiCharsInLiteralsAnalyzer.cs

+			/* EscapeNonAsciiCharsInLiteral(string,"\u284C\u2801\u2827\u2811 \u283C\u2801\u2812 \u284D\u281C\u2807\u2811\u2839\u2830\u280E \u2863\u2815\u280C") */ "⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌" /**/;
+
+		// This one hits the branch for surrogate pairs
+		const string MormonTwinkleTwinkleLittleStar =


Add analyzer and fix to escape non-ASCII literal values

5548205

j3parker requested a review from cpacey November 16, 2018 22:58


cpacey approved these changes Nov 19, 2018


j3parker requested review from mthjones and omsmith as code owners November 20, 2020 00:06

mthjones removed their request for review March 10, 2021 15:33

