New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unescape unnecessarily escaped characters in strings #1575
Conversation
In prettier@0b6d19d, I noticed that `makeString` doesn't unescape unnecessarily escaped non-quote characters. This change simply adds a test for that.
Unfortunately, this breaks a couple of other tests...
Prettier actually used to use jsesc once, but it is too aggressive. Sometimes, people want the actual unicode characters for stuff (such as Chinese), and sometimes people want unicode escapes (such as the non-breaking space (Another reason for not touching escapes is because it can't be done in tagged template literals and regexes. See #574.) Example:
|
See prettier#1575 (comment) This reverts commit d056525.
Thanks for the explanation/context about jsesc, I see why it's not ideal for this use case. I still think there's a case for removing unnecessary escapes in strings, given that |
src/printer.js
Outdated
@@ -3859,6 +3859,10 @@ function makeString(rawContent, enclosingQuote) { | |||
// Matches _any_ escape and unescaped quotes (both single and double). | |||
const regex = /\\([\s\S])|(['"])/g; | |||
|
|||
// Matches any unnecessarily escaped character. | |||
// Adapted from https://github.com/eslint/eslint/blob/de0b4ad7bd820ade41b1f606008bea68683dc11a/lib/rules/no-useless-escape.js#L27 | |||
const regexUnnecessaryStringEscapes = /(^|[^\\])\\([^\\nrvtbfux\r\n\u2028\u2029"'0])/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing some cases from Annex B, which extends the definition of EscapeSequence
to include octals in sloppy mode. So the latter character class needs to end with 0-7
, not 0
. (In strict mode, octal escapes other than \0
are a syntax error, so no further accounting for them is necessary.)
Otherwise I believe this captures all of the characters for which putting a backslash before them does something other than just giving you the character.
However, this still isn't right, since it won't match, e.g., "\\\a"
: the initial "not preceded by a backslash" logic prevents it. That won't cause bugs, but it will cause some unnecessary escapes to be preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/
, which is "an odd number of backslashes followed by a character other than one which is meaningful to escape".
Not tested thoroughly, but at least:
let r = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/;
String.raw`a\\\\\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\\\\a`; // true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I had gotten as including the (?:\\\\)*
, but I forgot about checking the previous character to make sure it's not a slash. I've incorporated the suggested changes as of 20dc94b, but haven't quite gotten one test to pass... https://travis-ci.org/prettier/prettier/jobs/230873975#L1082 EDIT: nevermind, it works. See eab837b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed my solution doesn't quite work either:
String.raw`\a\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\a`
You need lookbehinds, but they're not in the language yet.
I fear this might not be doable with a regex, at least in a single pass.
(A hack is to just keep applying the replacement until things stop changing, but that's not ideal.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"\\a" | ||
'\\a' | ||
"hol\\a" | ||
'hol\\a' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you beef up your test plan quite dramatically? You are introducing a very error prone piece of logic in this pull request, it would give me confidence if you added tests for all the possible combinations you are adding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…lash This just allows an even number of backslashes to precede the unnecessary one. See prettier#1575 (comment)
…in string This breaks another test though... See prettier#1575 (comment)
e2387ef
to
20dc94b
Compare
… in string This breaks another test though... See prettier#1575 (comment)
It turns out the test wasn't broken, I had just flubbed the escaping in the snapshot. The easiest way to see that this actually works is ```bash $ cat | prettier --stdin "hol\\a (the a is not escaped)" // press Control-D after the newline "hol\\a (the a is not escaped)"; // press Control-D after the newline ```
@@ -31,6 +31,22 @@ | |||
// Unnecessary escapes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file needs to start with something other than a string - as written, these are all directives, not strings.
Also, you need to make sure this substitution isn't applied to directives; see #1555.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See prettier#1575 (comment) (cherry picked from commit 126e56a)
See prettier#1575 (comment) This looping is hacky. We might be able to emulate lookbehind instead.
src/printer.js
Outdated
let minimallyEscapedContent = newContent; | ||
let minimallyEscapedContentPrev; | ||
|
||
while (minimallyEscapedContent !== minimallyEscapedContentPrev) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a max iteration length as a safe guard. I don't want it to go into an infinite loop if there's an edge case that is not well supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. See 7a4cfa8
Not sure how expensive string comparison is here... See prettier@2323c8c#commitcomment-22092267
src/printer.js
Outdated
const regexUnnecessaryStringEscapes = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/g; | ||
|
||
let minimallyEscapedContent = newContent; | ||
let minimallyEscapedContentPrev; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used, yes?
Edit: also, not sure you need minimallyEscapedContent
at all; presumably you could just mutate newContent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, oops. Times like these make me wish this project had automated linting :P
Fixed in 8cb1e1a
src/printer.js
Outdated
|
||
let maxIterations = newContent.length; | ||
let touched = true; | ||
while (touched && maxIterations > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, why do you need a loop here, you're only going to remove at most one \
per entity no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the limitations of the JS regex engine, there are edge cases that aren't replaced all at once. See #1575 (comment)
EDIT: Hmm, that link isn't taking me to the comment. Anyway, it was from @bakkot, saying the following:
Just noticed my solution doesn't quite work either:
String.raw`\a\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\a`
You need lookbehinds, but they're not in the language yet.
I fear this might not be doable with a regex, at least in a single pass.
(A hack is to just keep applying the replacement until things stop changing, but that's not ideal.)
f04b200
to
7fc929c
Compare
src/printer.js
Outdated
@@ -3880,7 +3880,29 @@ function makeString(rawContent, enclosingQuote) { | |||
return match; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it enough to simply change the above lines to this and not do anything else?
if (quote) {
return quote;
}
return /^[^\\nrvtbfux\r\n\u2028\u2029"'0-7]$/.test(escaped)
? escaped
: '\\' + escaped
(Add comments and extract variables as needed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, how about that! Fixed in 8934399
I don't yet completely understand how this works without using lookbehind, but it passes the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't yet completely understand how this works without using lookbehind
It solves the "odd number of backslashes" problem by incrementally consuming pairs of backslashes. A much nicer solution, really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's a very useful regex trick that has helped me many times :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, @bakkot.
Kudos to @lydell for figuring this out! See https://github.com/prettier/prettier/pull/1575/files#r115860741
Is this ready to be merged? |
I don't have anything to add, assuming the tests are comprehensive enough 👍 |
Oh wait, we still need to test that directives don't get unescaped. |
Alright, the directive tests look good (note the difference in the snapshots). I think this is ready, @vjeux. |
Awesome, thanks a lot!! |
In 0b6d19d,
I noticed that
makeString
doesn't unescape unnecessarily escapednon-quote characters. This change simply adds a test for that.