New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify logic to find and replace line terminators #7949
Unify logic to find and replace line terminators #7949
Comments
Thanks for looking into this! This analysis is very helpful.
It looks like this is only used for removing newlines from text while autofixing, which seems separate from the other two cases. However, it might be a good idea to have a helper function that removes newlines from a string.
I suspect we can unify most of these cases to depend on
It looks like this is only used in two places, and I think the code in Unfortunately, this can't be merged into the If nothing else, it seems like it might be useful to have a |
@not-an-aardvark I really like the idea of using |
Currently, rules that deal with line-splitting need to include a list of JS linebreaks in some form (e.g. in a regex character class). As a result, splitting a string into JS lines is implemented in 22 different places in the codebase. As described in #7949, not all of these implementations are correct (many of them forget to account for \r, \u2028, and \u2029. This commit updates ast-utils.js with the following properties: * `astUtils.LINEBREAKS`: a Set containing all JS linebreaks * `astUtils.LINEBREAK_MATCHER`: a regular expression to match JS linebreaks * `astUtils.createGlobalLinebreakMatcher`: a function that returns a new global regex to match JS linebreaks Additionally, it updates the other line-splitting code to use these constants. Observable changes: * The error message for `newline-per-chained-call` will handle property names with \u2028 and \u2029 the same way it handles names with \n. * `no-multi-str` will now report an error for strings that contain \r, \u2028, or \u2029. (Previously, it only reported strings containing \n.) * `no-unused-vars` will now report the correct location if a `/* global foo */` comment contains a linebreak other than \n. (Previously, it would report an invalid location.) * `spaced-comment` will no longer report an error if a comment contains tokens in the `exceptions` option followed by \u2028 or \u2029.
Currently, rules that deal with line-splitting need to include a list of JS linebreaks in some form (e.g. in a regex character class). As a result, splitting a string into JS lines is implemented in 22 different places in the codebase. As described in #7949, not all of these implementations are correct (many of them forget to account for \r, \u2028, and \u2029. This commit updates ast-utils.js with the following properties: * `astUtils.LINEBREAKS`: a Set containing all JS linebreaks * `astUtils.LINEBREAK_MATCHER`: a regular expression to match JS linebreaks * `astUtils.createGlobalLinebreakMatcher`: a function that returns a new global regex to match JS linebreaks Additionally, it updates the other line-splitting code to use these constants. Observable changes: * The error message for `newline-per-chained-call` will handle property names with \u2028 and \u2029 the same way it handles names with \n. * `no-multi-str` will now report an error for strings that contain \r, \u2028, or \u2029. (Previously, it only reported strings containing \n.) * `no-unused-vars` will now report the correct location if a `/* global foo */` comment contains a linebreak other than \n. (Previously, it would report an invalid location.) * `spaced-comment` will no longer report an error if a comment contains tokens in the `exceptions` option followed by \u2028 or \u2029.
Currently, rules that deal with line-splitting need to include a list of JS linebreaks in some form (e.g. in a regex character class). As a result, splitting a string into JS lines is implemented in 22 different places in the codebase. As described in #7949, not all of these implementations are correct (many of them forget to account for \r, \u2028, and \u2029. This commit updates ast-utils.js with the following properties: * `astUtils.LINEBREAKS`: a Set containing all JS linebreaks * `astUtils.LINEBREAK_MATCHER`: a regular expression to match JS linebreaks * `astUtils.createGlobalLinebreakMatcher`: a function that returns a new global regex to match JS linebreaks Additionally, it updates the other line-splitting code to use these constants. Observable changes: * The error message for `newline-per-chained-call` will handle property names with \u2028 and \u2029 the same way it handles names with \n. * `no-multi-str` will now report an error for strings that contain \r, \u2028, or \u2029. (Previously, it only reported strings containing \n.) * `no-unused-vars` will now report the correct location if a `/* global foo */` comment contains a linebreak other than \n. (Previously, it would report an invalid location.) * `spaced-comment` will no longer report an error if a comment contains tokens in the `exceptions` option followed by \u2028 or \u2029.
The purpose of this proposal is to unify the logic used to find and replace the line terminator sequences
"\r\n"
,"\n"
,"\r"
,"\u2028"
and"\u2029"
throughout the code, following the suggestions in #7924 (review).A recursive text search for the most common patterns
"\n"
and"\r"
in the lib folder reveals anumber of similar regular expressions used in similar ways in several rules and other modules.
Here is an overview of the code in question based on the current master snapshot.
I added comments to explain what the code apparently does.
Find index and length of all LT sequences in a string
Comment out shebang
Strip all LTs from a string
Strip trailing newlines from a string
Test if a string contains a LT
Test if a string contains a LT
Find index and length of all LT sequences in a string
Split lines
Split lines (misses
\u2028
and\u2029
)Find index of all LT sequences in a string (assumes length of sequence is always 1)
Not sure about this one
Test if a string contains a LT (misses
\r
,\u2028
and\u2029
)Find length of all LT sequences in a string
Count LT sequences in a string (misses
\r
,\u2028
and\u2029
)Find index of last LT sequences (misses
\r
,\u2028
and\u2029
)Strip all LT sequences from a string
\r\n?|\n|.
should match any character but misses\u2028
and\u2029
Match line continuations
Match end of line (misses
\u2028
and\u2029
)Split lines
The comments in bold are what I suspect to be problems in the current implementation. Please, don't kill me if I'm wrong. I just added those comments for reference, so we don't happen to change the current behavior accidentally. Each of those should be discussed in a different issue if necessary.
Besides that, the analysis above shows at least four common use cases.
Strip all LTs from a string:
str.replace(/\n|\r|\u2028|\u2029/g, "")
, the regular expression here can be safely stored and reused, so that only one object is created.Split lines:
str.split(/\r\n|\r|\n|\u2028|\u2029/g)
, this regular expression can be also storedin a static context and reused across function calls. As suggested by @not-an-aardvark in the comment below, to split the contents of a node in its lines, it is be better to use
sourceCode.lines
withnode.loc.start
andnode.loc.end
as start and ending positions respectively.Find index and length of all LT sequences in a string: This is basically something like
The regexp literal here is identical to the one used to split lines, but its value cannot be stored statically because it is not stateless: it uses the regular property
lastIndex
to keep track of the progress in scanning the current string.Test if a string contains a LT:
/\n|\r|\u2028|\u2029/.test(str)
, or equivalently!/^.*$/.test(str)
.The text was updated successfully, but these errors were encountered: