Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unescape unnecessarily escaped characters in strings #1575

Merged
merged 28 commits into from May 10, 2017

Conversation

josephfrazier
Copy link
Collaborator

In 0b6d19d,
I noticed that makeString doesn't unescape unnecessarily escaped
non-quote characters. This change simply adds a test for that.

In prettier@0b6d19d,
I noticed that `makeString` doesn't unescape unnecessarily escaped
non-quote characters. This change simply adds a test for that.
Unfortunately, this breaks a couple of other tests...
@lydell
Copy link
Member

lydell commented May 10, 2017

Prettier actually used to use jsesc once, but it is too aggressive. Sometimes, people want the actual unicode characters for stuff (such as Chinese), and sometimes people want unicode escapes (such as the non-breaking space \u00a0). It was decided that it is better to leave all escapes alone (except quotes). See #229 and #355 and the issues the link to.

(Another reason for not touching escapes is because it can't be done in tagged template literals and regexes. See #574.)

Example:

> require('jsesc')('\u00a0', { quotes: "double", wrap: true, minimal: true })
'" "'
> // Oh no, my escape was turned into a unicode character!

@josephfrazier
Copy link
Collaborator Author

josephfrazier commented May 10, 2017

Thanks for the explanation/context about jsesc, I see why it's not ideal for this use case. I still think there's a case for removing unnecessary escapes in strings, given that makeString is used specifically to unescape unnecessarily escaped quotes (here), so I'll look into alternative ways of fixing this.

@josephfrazier josephfrazier changed the title (WIP) Add test with unnecessarily escaped non-quote character Unescape unnecessarily escaped characters in strings May 10, 2017
src/printer.js Outdated
@@ -3859,6 +3859,10 @@ function makeString(rawContent, enclosingQuote) {
// Matches _any_ escape and unescaped quotes (both single and double).
const regex = /\\([\s\S])|(['"])/g;

// Matches any unnecessarily escaped character.
// Adapted from https://github.com/eslint/eslint/blob/de0b4ad7bd820ade41b1f606008bea68683dc11a/lib/rules/no-useless-escape.js#L27
const regexUnnecessaryStringEscapes = /(^|[^\\])\\([^\\nrvtbfux\r\n\u2028\u2029"'0])/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lydell or @bakkot, could you review this piece of code? I'm very scared that this may introduce bugs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing some cases from Annex B, which extends the definition of EscapeSequence to include octals in sloppy mode. So the latter character class needs to end with 0-7, not 0. (In strict mode, octal escapes other than \0 are a syntax error, so no further accounting for them is necessary.)

Otherwise I believe this captures all of the characters for which putting a backslash before them does something other than just giving you the character.

However, this still isn't right, since it won't match, e.g., "\\\a": the initial "not preceded by a backslash" logic prevents it. That won't cause bugs, but it will cause some unnecessary escapes to be preserved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/, which is "an odd number of backslashes followed by a character other than one which is meaningful to escape".

Not tested thoroughly, but at least:

let r = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/;
String.raw`a\\\\\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\\\\a`; // true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I had gotten as including the (?:\\\\)*, but I forgot about checking the previous character to make sure it's not a slash. I've incorporated the suggested changes as of 20dc94b, but haven't quite gotten one test to pass... https://travis-ci.org/prettier/prettier/jobs/230873975#L1082 EDIT: nevermind, it works. See eab837b

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed my solution doesn't quite work either:

String.raw`\a\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\a`

You need lookbehinds, but they're not in the language yet.

I fear this might not be doable with a regex, at least in a single pass.

(A hack is to just keep applying the replacement until things stop changing, but that's not ideal.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"\\a"
'\\a'
"hol\\a"
'hol\\a'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you beef up your test plan quite dramatically? You are introducing a very error prone piece of logic in this pull request, it would give me confidence if you added tests for all the possible combinations you are adding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this definitely needs to be thoroughly tested. I added some more tests in f04b200, let me know if you spot any holes.

EDIT: oops, make that 7fc929c (I had a 'wip' commit in there...)

josephfrazier added a commit to josephfrazier/prettier that referenced this pull request May 10, 2017
It turns out the test wasn't broken, I had just flubbed the escaping in
the snapshot. The easiest way to see that this actually works is

```bash
$ cat | prettier --stdin
"hol\\a (the a is not escaped)" // press Control-D after the newline
"hol\\a (the a is not escaped)"; // press Control-D after the newline
```
@@ -31,6 +31,22 @@
// Unnecessary escapes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs to start with something other than a string - as written, these are all directives, not strings.

Also, you need to make sure this substitution isn't applied to directives; see #1555.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I cherry-picked the relevant change over from #1571 (29bc75a). Once this PR has stabilized, I can port the tests over to #1571, since that's where we're actually making sure that directives don't get transformed.

src/printer.js Outdated
let minimallyEscapedContent = newContent;
let minimallyEscapedContentPrev;

while (minimallyEscapedContent !== minimallyEscapedContentPrev) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a max iteration length as a safe guard. I don't want it to go into an infinite loop if there's an edge case that is not well supported.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. See 7a4cfa8

src/printer.js Outdated
const regexUnnecessaryStringEscapes = /((?:^|[^\\])(?:\\\\)*)\\([^\\nrvtbfux\r\n\u2028\u2029"'0-7])/g;

let minimallyEscapedContent = newContent;
let minimallyEscapedContentPrev;
Copy link
Collaborator

@bakkot bakkot May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used, yes?

Edit: also, not sure you need minimallyEscapedContent at all; presumably you could just mutate newContent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, oops. Times like these make me wish this project had automated linting :P

Fixed in 8cb1e1a

josephfrazier added a commit to josephfrazier/prettier that referenced this pull request May 10, 2017
src/printer.js Outdated

let maxIterations = newContent.length;
let touched = true;
while (touched && maxIterations > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, why do you need a loop here, you're only going to remove at most one \ per entity no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the limitations of the JS regex engine, there are edge cases that aren't replaced all at once. See #1575 (comment)

EDIT: Hmm, that link isn't taking me to the comment. Anyway, it was from @bakkot, saying the following:


Just noticed my solution doesn't quite work either:

String.raw`\a\a`.replace(r, (match, prev, escaped) => prev + escaped) === String.raw`a\a`

You need lookbehinds, but they're not in the language yet.

I fear this might not be doable with a regex, at least in a single pass.

(A hack is to just keep applying the replacement until things stop changing, but that's not ideal.)

src/printer.js Outdated
@@ -3880,7 +3880,29 @@ function makeString(rawContent, enclosingQuote) {
return match;
Copy link
Member

@lydell lydell May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it enough to simply change the above lines to this and not do anything else?

if (quote) {
  return quote;
}

return /^[^\\nrvtbfux\r\n\u2028\u2029"'0-7]$/.test(escaped)
  ? escaped
  : '\\' + escaped

(Add comments and extract variables as needed.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, how about that! Fixed in 8934399

I don't yet completely understand how this works without using lookbehind, but it passes the tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't yet completely understand how this works without using lookbehind

It solves the "odd number of backslashes" problem by incrementally consuming pairs of backslashes. A much nicer solution, really.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's a very useful regex trick that has helped me many times :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, @bakkot.

@vjeux
Copy link
Contributor

vjeux commented May 10, 2017

Is this ready to be merged?

@josephfrazier
Copy link
Collaborator Author

I don't have anything to add, assuming the tests are comprehensive enough 👍

@josephfrazier
Copy link
Collaborator Author

Oh wait, we still need to test that directives don't get unescaped.

@josephfrazier
Copy link
Collaborator Author

Alright, the directive tests look good (note the difference in the snapshots). I think this is ready, @vjeux.

@vjeux vjeux merged commit d4217f5 into prettier:master May 10, 2017
@vjeux
Copy link
Contributor

vjeux commented May 10, 2017

Awesome, thanks a lot!!

@josephfrazier josephfrazier deleted the unnecessarily-escaped branch May 10, 2017 23:26
@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 20, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants