New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update: rewrite TokenStore (fixes #7810) #7936
Conversation
LGTM |
@mysticatea, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @vitorbal and @nzakas to be potential reviewers. |
lib/token-store.js
Outdated
*/ | ||
constructor(tokens, comments) { | ||
this.tokens = tokens; | ||
this.comments = comments.slice(0); // Needs cloning for backward compatibility! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed shebang's comment (///usr/bin/env node
) remains in token store currently. I think this is a bug, but I keep it to avoid breaking changes.
※
comments
needs cloning for backward compatibility.
After this initialization, ESLint removes a shebang's comment fromcomments
.
However, so far we had been concatenating 'tokens' and 'comments',
so the shebang's comment had remained in the concatenated array.
As the result, bothgetTokenOrCommentAfter
andgetTokenOrCommentBefore
methods had been returning the shebang's comment.
And some rules depend on this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't have time to do a full review at the moment, but this PR changes how we handle shebangs. It essentially creates a comment token type ("Shebang") and then treats it just like any other comment. So I think we can leave the behavior as is until that PR lands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we can leave the behavior as is until that PR lands?
Yes, I think so.
Shebang
type comments sound nice to me.
@not-an-aardvark Could you post perf number before and after this change please? |
@ilyavolodin I'm confused, did you mean to ping @mysticatea instead? |
Oh, sorry, my bad. I mean to ping @mysticatea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks like a very good improvement.
Looking at the implementation, I'm wondering if Cursor
could be a bit simpler. As far as I can tell, a single instance of cursor
never needs to move both forwards and backwards (each instance either moves forwards or backwards, but not both). It might be more precise to use iterators instead of cursors. Then each subclass would only need to implement Symbol.iterator
:
class FilteredTokenIterator extends TokenIterator {
constructor(options, predicate) {
super(options);
this.predicate = predicate;
}
*[Symbol.iterator]() {
for (const token of super[Symbol.iterator]()) {
if (this.predicate(token)) {
yield token;
}
}
}
}
What do you think about this suggestion?
lib/ast-utils.js
Outdated
* @param {Token} token - The token to be check. | ||
* @returns {boolean} `false` if the token is a comma token. | ||
*/ | ||
function isNotCommaToken(token) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having functions like isNotCommaToken
, maybe it would be better to have a general negate
function.
function negate(func) {
return (...args) => !func(...args);
}
Then we can replace this:
tokens.filter(isNotCommaToken)
with this:
tokens.filter(negate(isCommaToken));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility would be maintaining the current API but using the helper to pre-negate the positive definitions.
function isCommaToken(token) {
return token.value === "," && token.type === "Punctuator";
}
const isNotCommaToken = negate(isCommaToken);
With either solution, we get to avoid maintaining a set of implementations as well as their inverses separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try it later.
I'm worried performance reducing by the dynamic function call since those helpers are called in a lot of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there ends up being a noticeable performance difference, I would agree leaving this as-is 👍
lib/ast-utils.js
Outdated
* @returns {boolean} `true` if the token is a comment token. | ||
*/ | ||
function isCommentToken(token) { | ||
return token.type === "Line" || token.type === "Block"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: When #7516 is merged, this will also need || token.type === "Shebang"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice catch!
lib/rules/keyword-spacing.js
Outdated
* @returns {boolean} `true` if the token is an `of` token. | ||
*/ | ||
function isOfToken(token) { | ||
return token.value === "of"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is supposed to check the of
token in for (foo of bar)
, but it will report a false positive for
/*of*/
var of = 1;
`of`;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, for (const x of /*of*/ iterable) {}
triggers false positive. Wow, it's a bug in the current master as well. I will fix it.
lib/ast-utils.js
Outdated
/** | ||
* Checks if the given token is a semicolon token or not. | ||
* | ||
* @param {Token} token - The token to be check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I think this should say either "the token to be checked" or "the token to check".
lib/token-store.js
Outdated
cursor = new FilterCursor(cursor, filter); | ||
} | ||
if (skip >= 1) { | ||
cursor = new SkipCursor(cursor, skip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, this will not count filtered tokens as "skipped":
foo
bar
baz
sourceCode.getFirstToken(programNode, { skip: 1, filter: token => token.value !== "foo" });
// => the "baz" token
Is this intended behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I would find that unintuitive and would prefer that skipping "happens first" (i.e., skipping happens regardless of filtering, and then the filter takes over). Makes sense when you want to get a token using a predicate but first skip over a known bracket/paren/brace, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I thought this is natural, for example, "I want get the 2nd comma" → sourceCode.getFirstToken(node, {skip: 1, filter: isComma})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would be nice to be able to go either way on this. But I'm not sure I can provide any use case to back up my preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, I think we can throw errors if both skip
and filter
are given. I didn't find cases which use both skip
and filter
options in the refactoring of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think it would be better to just pick one behavior (e.g. don't count filtered tokens as skipped) and document the behavior, but not throw an error. We might as well allow an easy way to get the second comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the note to documents.
lib/token-store.js
Outdated
* Initializes this cursor. | ||
*/ | ||
constructor() { | ||
this.current = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more readable if this.current
was called this.currentToken
(or maybe this.currentValue
) instead. At first, I couldn't tell whether this.current
was referring to an index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, IEnumerator was in my head 😃
lib/token-store.js
Outdated
/** @inheritdoc */ | ||
moveNext() { | ||
while (super.moveNext()) { | ||
if (this.predicate(this.current)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only a minor issue, but predicate
will get called with a this
value of the current instance of FilterCursor
, which would be observable to external code. This could make it difficult to change the implementation in the future without making breaking changes to rules.
Maybe we should call the predicate as a pure function:
const predicate = this.predicate;
while (super.moveNext()) {
if (predicate(this.current)) {
// ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
lib/token-store.js
Outdated
movePrev() { | ||
while (this.count > 0) { | ||
this.count -= 1; | ||
if (!super.movePrev()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the skip
behavior depends on whether moveNext
or movePrev
is called first. If moveNext
is called first, then the first count
tokens are skipped, but if movePrev
is called first, then the last count
tokens are skipped.
I'm not sure if there's an easy workaround for this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional.
lib/token-store.js
Outdated
moveNext() { | ||
if (this.count > 0) { | ||
this.count -= 1; | ||
return super.moveNext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like after calling moveNext
enough times, movePrev
won't work at all.
for (let i = 0; i < count; i++) {
this.moveNext();
}
this.movePrev(); // => false
I would expect that after using movePrev
, the cursor would be at index count - 2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's never used in a mix of moveNext
and movePrev
.
getTokenAfter
uses moveNext
only, getTokenBefore
uses movePrev
only.
So this is no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed movePrev()
as it causes confusing.
lib/token-store.js
Outdated
* @protected | ||
*/ | ||
get firstIndex() { | ||
if (typeof this.imap[this.start] !== "undefined") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this.imap
is created from Object.create(null)
, this could also be
if (this.start in this.imap) {
However, I'm fine with either method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it.
I thought in
operator is slow since I had seen nodejs/node#10443. But this seems a different situation.
https://jsperf.com/undefined-check-777
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to use a Map
to avoid prototype lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I had used Map
at first, but it was 16% slower than this for getting value. 😢
Thank you for the review! @ilyavolodin Sure. I was careful about performance. Open raw log
|
@not-an-aardvark I love Iterators/Generators! Actually, I had written |
LGTM |
1 similar comment
LGTM |
- Refactors TokenStore with ES2015 classes. - Adds object-style options to every method of TokenStore. E.g. `{includeComments: true, skip: 0, filter: null}` - Adds `getFirstTokenBetween` and `getLastTokenBetween` methods. - Adds `getTokenOrCommentAfter` and `getTokenOrCommentBefore` methods to TokenStore for backward compatibility. - Removes the TokenStore instance for the mix of tokens and comments.
To change `Block` to `BlockComment` in future.
LGTM |
My comments have been addressed. I haven't looked at the updated implementation thoroughly yet (planning to do that soon), but I don't want to block this if someone reviews/approves it before I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did one more pass at this - some small nits and a few more questions (sorry I didn't catch these the first time around!)
tests/lib/token-store.js
Outdated
@@ -143,6 +225,52 @@ describe("TokenStore", () => { | |||
); | |||
}); | |||
|
|||
it("should skip a given number of tokens with option {skip: 1}", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with skip option"
?
tests/lib/token-store.js
Outdated
@@ -197,6 +367,56 @@ describe("TokenStore", () => { | |||
); | |||
}); | |||
|
|||
it("should skip a given number of tokens with skip options", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with skip option"
?
tests/lib/token-store.js
Outdated
); | ||
}); | ||
|
||
it("should retrieve matched token with filter options", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with filter option"
?
lib/token-store.js
Outdated
let filter = null; | ||
|
||
if (typeof opts === "number") { | ||
skip = opts | 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've actually never personally used the bitwise OR operator. Just for my own understanding, is this being used for performance reasons (as opposed to Math.floor()
, for instance)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
V8 can optimize small integer (not double) in high level. Bitwise operator would tell V8 that this is small integer.
.... Probably 😃
* `getNodeByRangeIndex(index)` - returns the deepest node in the AST containing the given source index. | ||
|
||
> `skipOptions` is an object which has 3 properties; `skip`, `includeComments`, and `filter`. Default is `{skip: 0, includeComments: false, filter: null}`. | ||
> - The `skip` is a positive integer, the number of skipping tokens. If `filter` option is given at the same time, it doesn't count filtered tokens as skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had one one more thought on my second pass through this - do you think it's worth throwing an error with a helpful message when skip
or count
are not positive integers? I'm a fan of failing fast and showing a useful error message, since it can help people avoid frustrating debugging.
I will slowly review this PR. My first suggestion would be to create a folder |
LGTM |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just one nitpick on a pair of short variable names.
lib/token-store/index.js
Outdated
function createIndexMap(tokens, comments) { | ||
const map = Object.create(null); | ||
let it = 0; | ||
let ic = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these variables be renamed (e.g., tokenIndex, commentIndex)? (I think this came up earlier)
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent changes LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great work on this, @mysticatea!
I checked this out and played with a couple ideas for performance improvements that might be low-hanging fruit, but nothing I tried achieved a statistically significant difference. At some point, we might be able to do something like cursor memoization. This one would be more complex, so I haven't tried it, and we should merge this PR without it. Running ESLint on its own codebase involves calling the Cursor
constructor almost two million times, which is what makes me think there might be an opportunity for performance improvement there.
I'm going to go ahead an merge this in so we can test it out before the next release. |
What is the purpose of this pull request? (put an "X" next to item)
[X] Add something to the core
Fixes #7810
What changes did you make? (Give an overview)
I rewrote
TokenStore
with few extensions.TokenStore
instances. One is to access tokens. Another is to access a mix of tokens and comments. I added{includeComments: true}
option to every methods ofTokenStore
. As the result, we no longer need the store of a mix of tokens and comments. This would reduce memory usage of linting.TokenStore
. I think those enhancements are useful to skip parentheses or something like.getFirstTokenBetween(left, right, options)
getLastTokenBetween(left, right, options)
filter
option to skip specific tokens.TokenStore
.Is there anything you'd like reviewers to focus on?