Unescape unnecessarily escaped characters in strings #1575

josephfrazier · 2017-05-10T13:30:06Z


In prettier@0b6d19d, I noticed that `makeString` doesn't unescape unnecessarily escaped non-quote characters. This change simply adds a test for that.

Unfortunately, this breaks a couple of other tests...

lydell · 2017-05-10T13:58:04Z

Prettier actually used to use jsesc, but it is too aggressive. Sometimes people want the actual Unicode characters (such as Chinese), and sometimes people want Unicode escapes (such as the non-breaking space \u00a0). It was decided that it is better to leave all escapes alone (except quotes). See #229 and #355 and the issues they link to.

(Another reason for not touching escapes is because it can't be done in tagged template literals and regexes. See #574.)

Example:

> require('jsesc')('\u00a0', { quotes: "double", wrap: true, minimal: true })
'" "'
> // Oh no, my escape was turned into a unicode character!


josephfrazier · 2017-05-10T14:25:51Z

Thanks for the explanation/context about jsesc, I see why it's not ideal for this use case. I still think there's a case for removing unnecessary escapes in strings, given that makeString is used specifically to unescape unnecessarily escaped quotes (here), so I'll look into alternative ways of fixing this.

vjeux · 2017-05-10T17:09:49Z

src/printer.js

@@ -3859,6 +3859,10 @@ function makeString(rawContent, enclosingQuote) {
  // Matches _any_ escape and unescaped quotes (both single and double).
  const regex = /\\([\s\S])|(['"])/g;

+  // Matches any unnecessarily escaped character.
+  // Adapted from https://github.com/eslint/eslint/blob/de0b4ad7bd820ade41b1f606008bea68683dc11a/lib/rules/no-useless-escape.js#L27
+  const regexUnnecessaryStringEscapes = /(^|[^\\])\\([^\\nrvtbfux\r\n\u2028\u2029"'0])/;


@lydell or @bakkot, could you review this piece of code? I'm very scared that this may introduce bugs.

This is missing some cases from Annex B, which extends the definition of EscapeSequence to include octals in sloppy mode. So the latter character class needs to end with 0-7, not 0. (In strict mode, octal escapes other than \0 are a syntax error, so no further accounting for them is necessary.)

Otherwise I believe this captures all of the characters for which putting a backslash before them does something other than just giving you the character.
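To make the character class above easier to read, here is a small sketch of it as a predicate. The names `MEANINGFUL_AFTER_BACKSLASH` and `isUnnecessaryEscape` are made up for illustration and are not part of the PR; the class itself is the one from the diff, extended with 0-7 as suggested.

```javascript
// Characters for which a preceding backslash does something other than
// yield the character itself (per the discussion above, including the
// sloppy-mode octal digits 0-7).
const MEANINGFUL_AFTER_BACKSLASH = /^[\\nrvtbfux\r\n\u2028\u2029"'0-7]$/;

// Hypothetical helper: expects a single character (the one right after
// the backslash) and reports whether escaping it is unnecessary.
function isUnnecessaryEscape(ch) {
  return !MEANINGFUL_AFTER_BACKSLASH.test(ch);
}

console.log(isUnnecessaryEscape("a")); // true:  \a is just a
console.log(isUnnecessaryEscape("n")); // false: \n is a newline
console.log(isUnnecessaryEscape("7")); // false: \7 is an octal escape in sloppy mode
```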

However, this still isn't right, since it won't match, e.g., "\\\a": the initial "not preceded by a backslash" logic prevents it. That won't cause bugs, but it will cause some unnecessary escapes to be preserved.

I think you want /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/, which is "an odd number of backslashes followed by a character other than one which is meaningful to escape".

Not tested thoroughly, but at least:

let r = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/;
String.raw`a\\\\\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\\\\a`; // true
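For anyone following along, a few more inputs run through the same single-match regex may help show what "an odd number of backslashes" means in practice. This is only a sketch; the `strip` helper name is an assumption for illustration.

```javascript
// Same regex as suggested above: prefix (start or non-backslash, then any
// number of backslash pairs), one more backslash, then a character that
// does not need escaping.
const r = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/;

// Hypothetical helper: drops the first unnecessary backslash it finds.
const strip = (raw) =>
  raw.replace(r, (match, prev, escaped) => prev + escaped);

console.log(strip(String.raw`\a`));    // a
console.log(strip(String.raw`a\\\a`)); // a\\a  (three backslashes: one pair kept)
console.log(strip(String.raw`a\\a`));  // a\\a  (two backslashes: nothing to remove)
console.log(strip(String.raw`\n`));    // \n    (meaningful escape, left alone)
```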

Thanks! I had gotten as far as including the (?:\\\\)*, but I forgot about checking the previous character to make sure it's not a backslash. I've incorporated the suggested changes as of 20dc94b, ~~but haven't quite gotten one test to pass... https://travis-ci.org/prettier/prettier/jobs/230873975#L1082~~ EDIT: never mind, it works. See eab837b

Just noticed my solution doesn't quite work either:

String.raw`\a\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\a`

You need lookbehinds, but they're not in the language yet.

I fear this might not be doable with a regex, at least in a single pass.

(A hack is to just keep applying the replacement until things stop changing, but that's not ideal.)
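A rough sketch of that hack, assuming the global version of the regex and a cap on the number of passes so it cannot spin forever. The function name `stripUntilStable` and the choice of the input length as the cap are illustrative assumptions, not the code from the PR.

```javascript
const r = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/g;

// Hypothetical helper: re-applies the replacement until the string stops
// changing. Each productive pass removes at least one backslash, so the
// input length is a safe upper bound on the number of passes.
function stripUntilStable(raw) {
  let current = raw;
  let remainingPasses = raw.length;
  while (remainingPasses-- > 0) {
    const next = current.replace(r, (match, prev, escaped) => prev + escaped);
    if (next === current) {
      break; // fixed point reached
    }
    current = next;
  }
  return current;
}

// A single pass leaves the second escape behind, because the character
// before it was already consumed by the first match.
console.log(String.raw`\a\a`.replace(r, (m, prev, esc) => prev + esc)); // a\a
console.log(stripUntilStable(String.raw`\a\a`)); // aa
```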


vjeux · 2017-05-10T17:11:12Z

tests/quotes/__snapshots__/jsfmt.spec.js.snap

+"\\a"
+'\\a'
+"hol\\a"
+'hol\\a'


Could you beef up your test plan quite dramatically? You are introducing a very error-prone piece of logic in this pull request; it would give me confidence if you added tests for all the possible combinations you are adding.

Yeah, this definitely needs to be thoroughly tested. I added some more tests in f04b200, let me know if you spot any holes.

EDIT: oops, make that 7fc929c (I had a 'wip' commit in there...)


It turns out the test wasn't broken, I had just flubbed the escaping in the snapshot. The easiest way to see that this actually works is:

```bash
$ cat | prettier --stdin
"hol\\a (the a is not escaped)" // press Control-D after the newline
"hol\\a (the a is not escaped)"; // press Control-D after the newline
```

bakkot · 2017-05-10T18:41:13Z

tests/quotes/strings.js

@@ -31,6 +31,22 @@
 // Unnecessary escapes.


This file needs to start with something other than a string - as written, these are all directives, not strings.

Also, you need to make sure this substitution isn't applied to directives; see #1555.
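As a concrete illustration of why directives have to be left alone: a string in the directive prologue only counts as a Use Strict Directive if it contains no escapes, so removing an "unnecessary" backslash there can change how the code runs. The sketch below uses the `Function` constructor purely as a convenient way to evaluate two function bodies; the variable names are made up.

```javascript
// In strict mode, `this` is undefined in a plain function call;
// in sloppy mode it is the global object.
const exactDirective = new Function(
  "'use strict'; return this === undefined;"
);

// Body text is: 'use\ strict'; return this === undefined;
// The backslash-space is an unnecessary escape, but because the directive
// contains an escape it is NOT treated as "use strict".
const escapedDirective = new Function(
  "'use\\ strict'; return this === undefined;"
);

console.log(exactDirective());   // true  (strict)
console.log(escapedDirective()); // false (sloppy)
```

If a formatter unescaped the second directive, the function would silently become strict.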

Good call, I cherry-picked the relevant change over from #1571 (29bc75a). Once this PR has stabilized, I can port the tests over to #1571, since that's where we're actually making sure that directives don't get transformed.


vjeux · 2017-05-10T19:12:14Z

src/printer.js

+  let minimallyEscapedContent = newContent;
+  let minimallyEscapedContentPrev;
+
+  while (minimallyEscapedContent !== minimallyEscapedContentPrev) {


Could you add a maximum iteration count as a safeguard? I don't want it to go into an infinite loop if there's an edge case that is not well supported.

That makes sense. See 7a4cfa8


bakkot · 2017-05-10T19:45:17Z

src/printer.js

+  const regexUnnecessaryStringEscapes = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/g;
+
+  let minimallyEscapedContent = newContent;
+  let minimallyEscapedContentPrev;


This isn't used, yes?

Edit: also, not sure you need minimallyEscapedContent at all; presumably you could just mutate newContent.

Heh, oops. Times like these make me wish this project had automated linting :P

Fixed in 8cb1e1a


vjeux · 2017-05-10T19:55:01Z

src/printer.js

+
+  let maxIterations = newContent.length;
+  let touched = true;
+  while (touched && maxIterations > 0) {


I'm curious: why do you need a loop here? You're only going to remove at most one \ per entity, no?

Due to the limitations of the JS regex engine, there are edge cases that aren't replaced all at once. See #1575 (comment)

EDIT: Hmm, that link isn't taking me to the comment. Anyway, it was from @bakkot, saying the following:

Just noticed my solution doesn't quite work either:

String.raw`\a\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\a`

You need lookbehinds, but they're not in the language yet.

I fear this might not be doable with a regex, at least in a single pass.

(A hack is to just keep applying the replacement until things stop changing, but that's not ideal.)


lydell · 2017-05-10T21:45:54Z

src/printer.js

@@ -3880,7 +3880,29 @@ function makeString(rawContent, enclosingQuote) {
    return match;


Isn't it enough to simply change the above lines to this and not do anything else?

if (quote) {
  return quote;
}
return /^[^\\nrvtbfux\r\n\u2028\u2029"'0-7]$/.test(escaped) ? escaped : '\\' + escaped;

(Add comments and extract variables as needed.)

Huh, how about that! Fixed in 8934399

I don't yet completely understand how this works without using lookbehind, but it passes the tests.

> I don't yet completely understand how this works without using lookbehind

It solves the "odd number of backslashes" problem by incrementally consuming pairs of backslashes. A much nicer solution, really.
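To spell out the trick: the existing regex already matches every backslash together with the character after it, so an escaped backslash (`\\`) is consumed as one unit and can never be mistaken for the start of the next escape. A simplified sketch follows. It leaves out the real quote handling that `makeString` does, and the names `unescapeUnnecessary` and `unnecessary` are assumptions for illustration.

```javascript
// Matches _any_ escape (backslash + one character) or an unescaped quote.
const regex = /\\([\s\S])|(['"])/g;

// True when the escaped character would mean the same without the backslash.
const unnecessary = /^[^\\nrvtbfux\r\n\u2028\u2029"'0-7]$/;

// Simplified stand-in for the replacer in makeString: quotes are passed
// through untouched here, which is NOT what the real function does.
function unescapeUnnecessary(rawContent) {
  return rawContent.replace(regex, (match, escaped, quote) => {
    if (quote) {
      return quote;
    }
    // Keep meaningful escapes (including `\\`), drop the rest.
    return unnecessary.test(escaped) ? escaped : "\\" + escaped;
  });
}

console.log(unescapeUnnecessary(String.raw`\a\a`));  // aa   (no loop needed)
console.log(unescapeUnnecessary(String.raw`a\\\a`)); // a\\a (pair consumed first)
console.log(unescapeUnnecessary(String.raw`\\a`));   // \\a  (unchanged)
```

Because the scan moves left to right one escape at a time, neither lookbehind nor repeated passes are needed.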

Yep, that's a very useful regex trick that has helped me many times :)

Thanks for the explanation, @bakkot.


vjeux · 2017-05-10T22:38:19Z

Is this ready to be merged?

josephfrazier · 2017-05-10T22:41:24Z

I don't have anything to add, assuming the tests are comprehensive enough 👍

josephfrazier · 2017-05-10T22:42:19Z

Oh wait, we still need to test that directives don't get unescaped.


josephfrazier · 2017-05-10T23:02:28Z

Alright, the directive tests look good (note the difference in the snapshots). I think this is ready, @vjeux.

vjeux · 2017-05-10T23:02:55Z

Awesome, thanks a lot!!

josephfrazier added 2 commits May 10, 2017 09:27

Add test with unnecessarily escaped non-quote character

ebe3ef4

In prettier@0b6d19d, I noticed that `makeString` doesn't unescape unnecessarily escaped non-quote characters. This change simply adds a test for that.

Fix test with unnecessarily escaped non-quote character

d056525

Unfortunately, this breaks a couple of other tests...

Revert "Fix test with unnecessarily escaped non-quote character"

4979055

See prettier#1575 (comment) This reverts commit d056525.

Unescape unnecessarily escaped characters in strings

8d15aa9

josephfrazier changed the title ~~(WIP) Add test with unnecessarily escaped non-quote character~~ Unescape unnecessarily escaped characters in strings May 10, 2017

josephfrazier added 2 commits May 10, 2017 10:55

Add test for unnecessarily escaped character at not-beginning of string

3c935b7

Fix test for unnecessarily escaped character at not-beginning of string

7151246

vjeux reviewed May 10, 2017


josephfrazier added 9 commits May 10, 2017 13:28

Merge branch 'master' into unnecessarily-escaped

f81f20b

Add test for multiple unnecessary escapes in strings

6aacd39

Pass test for multiple unnecessary escapes in strings

b7615e8

Add test for octal escapes in strings

c534057

See prettier#1575 (comment)

Pass test for octal escapes in strings

4c9be10

See prettier#1575 (comment)

Add test for unnecessarily escaped character preced by escaped backslash

5324d88

See prettier#1575 (comment)

Pass test for unnecessarily escaped character preced by escaped backs…

b5cbe6a

…lash This just allows an even number of backslashes to precede the unnecessary one. See prettier#1575 (comment)

Add test for unescaped character after escaped backslash in strings

94b3248

Add test for unescaped character preceded by two escaped backslashes …

021dd17

…in string See prettier#1575 (comment)

josephfrazier added a commit to josephfrazier/prettier that referenced this pull request May 10, 2017

Add test for unescaped character preceded by two escaped backslashes …

e2387ef

…in string This breaks another test though... See prettier#1575 (comment)

josephfrazier force-pushed the unnecessarily-escaped branch from e2387ef to 20dc94b Compare May 10, 2017 18:26

josephfrazier added 2 commits May 10, 2017 14:26

Pass test for unescaped character preceded by two escaped backslashes…

20dc94b

… in string This breaks another test though... See prettier#1575 (comment)

bakkot reviewed May 10, 2017


josephfrazier added 3 commits May 10, 2017 14:48

Prevent test strings from being parsed as directives

29bc75a

See prettier#1575 (comment) (cherry picked from commit 126e56a)

Add test for consecutive unnecessarily escaped characters in strings

2e25f4b

See prettier#1575 (comment)

Pass test for consecutive unnecessarily escaped characters in strings

2323c8c

See prettier#1575 (comment) This looping is hacky. We might be able to emulate lookbehind instead.

vjeux reviewed May 10, 2017


josephfrazier added 2 commits May 10, 2017 15:32

Optimize (maybe?) string unescaping loop

5725d48

Not sure how expensive string comparison is here... See prettier@2323c8c#commitcomment-22092267

Safeguard against string unescaping loop hanging

7a4cfa8

See prettier#1575 (comment)

bakkot reviewed May 10, 2017


josephfrazier added a commit to josephfrazier/prettier that referenced this pull request May 10, 2017

Add more comprehensive tests for unnecessary string escapes

f04b200

See prettier#1575 (comment)

vjeux reviewed May 10, 2017


josephfrazier force-pushed the unnecessarily-escaped branch from f04b200 to 7fc929c Compare May 10, 2017 19:55

josephfrazier added 3 commits May 10, 2017 15:55

Add more comprehensive tests for unnecessary string escapes

7fc929c

See prettier#1575 (comment)

Remove superfluous variables from makeString()

8cb1e1a

See prettier#1575 (comment)

Unescape unnecessary strings escapes without looping

dc27b98

lydell reviewed May 10, 2017


Unescape unnecessary string escapes while handling quotes

8934399

Kudos to @lydell for figuring this out! See https://github.com/prettier/prettier/pull/1575/files#r115860741

josephfrazier added 2 commits May 10, 2017 18:44

Merge branch 'master' into unnecessarily-escaped

e57895d

Test that unnecessary escapes remain in directive literals

36bd8a2

See: * prettier#1575 (comment) * prettier#1575 (comment)

vjeux merged commit d4217f5 into prettier:master May 10, 2017

josephfrazier deleted the unnecessarily-escaped branch May 10, 2017 23:26

lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 20, 2019

lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
