Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: rewrite TokenStore (fixes #7810) #7936

Merged
merged 22 commits into from Feb 9, 2017
Merged

Conversation

mysticatea
Copy link
Member

What is the purpose of this pull request? (put an "X" next to item)

[X] Add something to the core

Fixes #7810

What changes did you make? (Give an overview)

I rewrote TokenStore with few extensions.

  • We had been having 2 TokenStore instances. One is to access tokens. Another is to access a mix of tokens and comments. I added {includeComments: true} option to every methods of TokenStore. As the result, we no longer need the store of a mix of tokens and comments. This would reduce memory usage of linting.
  • I added some enhancements to TokenStore. I think those enhancements are useful to skip parentheses or something like.
    • getFirstTokenBetween(left, right, options)
    • getLastTokenBetween(left, right, options)
    • filter option to skip specific tokens.
    • token-store.d.ts is the interface of TokenStore.
  • I refactored our codebase with the enhancements. For examples:
// lib/rules/arrow-body-style.js
-                            const lastTokenBeforeBody = sourceCode.getTokensBetween(sourceCode.getFirstToken(node), arrowBody)
-                                .reverse()
-                                .find(token => token.value !== "(");
+                            const lastTokenBeforeBody = sourceCode.getLastTokenBetween(sourceCode.getFirstToken(node), arrowBody, astUtils.isNotOpeningParenToken);
// lib/rules/arrow-spacing.js
-            let arrow = sourceCode.getTokenBefore(node.body);
-
-            // skip '(' tokens.
-            while (arrow.value !== "=>") {
-                arrow = sourceCode.getTokenBefore(arrow);
-            }
+            const arrow = sourceCode.getTokenBefore(node.body, astUtils.isArrowToken);
// lib/rules/func-call-spacing.js
+            const lastToken = sourceCode.getLastToken(node);
             const lastCalleeToken = sourceCode.getLastToken(node.callee);
-            let prevToken = lastCalleeToken;
-            let parenToken = sourceCode.getTokenAfter(lastCalleeToken);
-
-            // advances to an open parenthesis.
-            while (
-                parenToken &&
-                parenToken.range[1] < node.range[1] &&
-                parenToken.value !== "("
-            ) {
-                prevToken = parenToken;
-                parenToken = sourceCode.getTokenAfter(parenToken);
-            }
+            const parenToken = sourceCode.getFirstTokenBetween(lastCalleeToken, lastToken, astUtils.isOpeningParenToken);
+            const prevToken = parenToken && sourceCode.getTokenBefore(parenToken);

Is there anything you'd like reviewers to focus on?

@mysticatea mysticatea added core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Jan 18, 2017
@mysticatea mysticatea self-assigned this Jan 18, 2017
@eslintbot
Copy link

LGTM

@mention-bot
Copy link

@mysticatea, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @vitorbal and @nzakas to be potential reviewers.

*/
constructor(tokens, comments) {
this.tokens = tokens;
this.comments = comments.slice(0); // Needs cloning for backward compatibility!
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed shebang's comment (///usr/bin/env node) remains in token store currently. I think this is a bug, but I keep it to avoid breaking changes.

comments needs cloning for backward compatibility.
After this initialization, ESLint removes a shebang's comment from comments.
However, so far we had been concatenating 'tokens' and 'comments',
so the shebang's comment had remained in the concatenated array.
As the result, both getTokenOrCommentAfter and getTokenOrCommentBefore
methods had been returning the shebang's comment.
And some rules depend on this behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't have time to do a full review at the moment, but this PR changes how we handle shebangs. It essentially creates a comment token type ("Shebang") and then treats it just like any other comment. So I think we can leave the behavior as is until that PR lands?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we can leave the behavior as is until that PR lands?

Yes, I think so.
Shebang type comments sound nice to me.

@ilyavolodin
Copy link
Member

@not-an-aardvark Could you post perf number before and after this change please?

@not-an-aardvark
Copy link
Member

@ilyavolodin I'm confused, did you mean to ping @mysticatea instead?

@ilyavolodin
Copy link
Member

Oh, sorry, my bad. I mean to ping @mysticatea

Copy link
Member

@not-an-aardvark not-an-aardvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks like a very good improvement.

Looking at the implementation, I'm wondering if Cursor could be a bit simpler. As far as I can tell, a single instance of cursor never needs to move both forwards and backwards (each instance either moves forwards or backwards, but not both). It might be more precise to use iterators instead of cursors. Then each subclass would only need to implement Symbol.iterator:

class FilteredTokenIterator extends TokenIterator {
    constructor(options, predicate) {
        super(options);
        this.predicate = predicate;
    }

    *[Symbol.iterator]() {
        for (const token of super[Symbol.iterator]()) {
            if (this.predicate(token)) {
                yield token;
            }
        }
    }
}

What do you think about this suggestion?

lib/ast-utils.js Outdated
* @param {Token} token - The token to be check.
* @returns {boolean} `false` if the token is a comma token.
*/
function isNotCommaToken(token) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having functions like isNotCommaToken, maybe it would be better to have a general negate function.

function negate(func) {
    return (...args) => !func(...args);
}

Then we can replace this:

tokens.filter(isNotCommaToken)

with this:

tokens.filter(negate(isCommaToken));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility would be maintaining the current API but using the helper to pre-negate the positive definitions.

function isCommaToken(token) {
    return token.value === "," && token.type === "Punctuator";
}

const isNotCommaToken = negate(isCommaToken);

With either solution, we get to avoid maintaining a set of implementations as well as their inverses separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try it later.
I'm worried performance reducing by the dynamic function call since those helpers are called in a lot of time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there ends up being a noticeable performance difference, I would agree leaving this as-is 👍

lib/ast-utils.js Outdated
* @returns {boolean} `true` if the token is a comment token.
*/
function isCommentToken(token) {
return token.type === "Line" || token.type === "Block";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: When #7516 is merged, this will also need || token.type === "Shebang".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice catch!

* @returns {boolean} `true` if the token is an `of` token.
*/
function isOfToken(token) {
return token.value === "of";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is supposed to check the of token in for (foo of bar), but it will report a false positive for

/*of*/

var of = 1;

`of`;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for (const x of /*of*/ iterable) {} triggers false positive. Wow, it's a bug in the current master as well. I will fix it.

lib/ast-utils.js Outdated
/**
* Checks if the given token is a semicolon token or not.
*
* @param {Token} token - The token to be check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think this should say either "the token to be checked" or "the token to check".

cursor = new FilterCursor(cursor, filter);
}
if (skip >= 1) {
cursor = new SkipCursor(cursor, skip);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, this will not count filtered tokens as "skipped":

foo
bar
baz

sourceCode.getFirstToken(programNode, { skip: 1, filter: token => token.value !== "foo" });
// => the "baz" token

Is this intended behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I would find that unintuitive and would prefer that skipping "happens first" (i.e., skipping happens regardless of filtering, and then the filter takes over). Makes sense when you want to get a token using a predicate but first skip over a known bracket/paren/brace, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought this is natural, for example, "I want get the 2nd comma" → sourceCode.getFirstToken(node, {skip: 1, filter: isComma})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be nice to be able to go either way on this. But I'm not sure I can provide any use case to back up my preference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, I think we can throw errors if both skip and filter are given. I didn't find cases which use both skip and filter options in the refactoring of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think it would be better to just pick one behavior (e.g. don't count filtered tokens as skipped) and document the behavior, but not throw an error. We might as well allow an easy way to get the second comma.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the note to documents.

* Initializes this cursor.
*/
constructor() {
this.current = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more readable if this.current was called this.currentToken (or maybe this.currentValue) instead. At first, I couldn't tell whether this.current was referring to an index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, IEnumerator was in my head 😃

/** @inheritdoc */
moveNext() {
while (super.moveNext()) {
if (this.predicate(this.current)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only a minor issue, but predicate will get called with a this value of the current instance of FilterCursor, which would be observable to external code. This could make it difficult to change the implementation in the future without making breaking changes to rules.

Maybe we should call the predicate as a pure function:

const predicate = this.predicate;

while (super.moveNext()) {
    if (predicate(this.current)) {
        // ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

movePrev() {
while (this.count > 0) {
this.count -= 1;
if (!super.movePrev()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the skip behavior depends on whether moveNext or movePrev is called first. If moveNext is called first, then the first count tokens are skipped, but if movePrev is called first, then the last count tokens are skipped.

I'm not sure if there's an easy workaround for this, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional.

moveNext() {
if (this.count > 0) {
this.count -= 1;
return super.moveNext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like after calling moveNext enough times, movePrev won't work at all.

for (let i = 0; i < count; i++) {
    this.moveNext();
}
this.movePrev(); // => false

I would expect that after using movePrev, the cursor would be at index count - 2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's never used in a mix of moveNext and movePrev.
getTokenAfter uses moveNext only, getTokenBefore uses movePrev only.
So this is no problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed movePrev() as it causes confusing.

* @protected
*/
get firstIndex() {
if (typeof this.imap[this.start] !== "undefined") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this.imap is created from Object.create(null), this could also be

if (this.start in this.imap) {

However, I'm fine with either method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it.
I thought in operator is slow since I had seen nodejs/node#10443. But this seems a different situation.
https://jsperf.com/undefined-check-777

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to use a Map to avoid prototype lookups.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I had used Map at first, but it was 16% slower than this for getting value. 😢

https://jsperf.com/int-map

@mysticatea
Copy link
Member Author

Thank you for the review!
I will check those.


@ilyavolodin Sure.

I was careful about performance.
But sadly, the new implementation is about 1% slower than the original. Also after the refactoring of 8f85577, it got 3% slower, I guess it was caused by that FilterCursor increases function calls of each iteration.

Open raw log
C:\Users\t-nagashima.AD\Documents\GitHub\eslint [master ≡]> npm run perf

> eslint@3.13.1 perf C:\Users\t-nagashima.AD\Documents\GitHub\eslint
> node Makefile.js perf


Loading:
  Load performance Run #1:  237.425139ms
  Load performance Run #2:  239.108839ms
  Load performance Run #3:  232.480586ms
  Load performance Run #4:  237.605341ms
  Load performance Run #5:  235.970236ms

  Load Performance median:  237.425139ms


Single File:
  CPU Speed is 3392 with multiplier 13000000
  Performance Run #1:  3946.4067999999997ms
  Performance Run #2:  3938.73959ms
  Performance Run #3:  3994.417041ms
  Performance Run #4:  3911.2673059999997ms
  Performance Run #5:  3938.744118ms

  Performance budget exceeded: 3938.744118ms (limit: 3832.5471698113206ms)


Multi Files (0 files):
  CPU Speed is 3392 with multiplier 39000000
  Performance Run #1:  12499.78252ms
  Performance Run #2:  12556.26077ms
  Performance Run #3:  12439.634717ms
  Performance Run #4:  12459.592524ms
  Performance Run #5:  12317.274179ms

  Performance budget exceeded: 12459.592524ms (limit: 11497.641509433963ms)

C:\Users\t-nagashima.AD\Documents\GitHub\eslint [master ≡]> git checkout rewrite-token-store
Switched to branch 'rewrite-token-store'
Your branch is up-to-date with 'origin/rewrite-token-store'.
C:\Users\t-nagashima.AD\Documents\GitHub\eslint [rewrite-token-store ≡]> npm run perf

> eslint@3.13.1 perf C:\Users\t-nagashima.AD\Documents\GitHub\eslint
> node Makefile.js perf


Loading:
  Load performance Run #1:  236.285969ms
  Load performance Run #2:  235.878777ms
  Load performance Run #3:  239.786486ms
  Load performance Run #4:  237.061413ms
  Load performance Run #5:  241.826668ms

  Load Performance median:  237.061413ms


Single File:
  CPU Speed is 3392 with multiplier 13000000
  Performance Run #1:  4109.42097ms
  Performance Run #2:  4072.425119ms
  Performance Run #3:  4087.458598ms
  Performance Run #4:  4223.57671ms
  Performance Run #5:  4177.791984ms

  Performance budget exceeded: 4109.42097ms (limit: 3832.5471698113206ms)


Multi Files (0 files):
  CPU Speed is 3392 with multiplier 39000000
  Performance Run #1:  13025.187907ms
  Performance Run #2:  13013.918458ms
  Performance Run #3:  12832.759673ms
  Performance Run #4:  12840.481519ms
  Performance Run #5:  12834.366706ms

  Performance budget exceeded: 12840.481519ms (limit: 11497.641509433963ms)

@mysticatea
Copy link
Member Author

@not-an-aardvark I love Iterators/Generators! Actually, I had written TokenStore with generators: https://gist.github.com/mysticatea/ff83097cdebb0aad79fe6cccc29d4f50. But unfortunately, it was super slow. npm run perf was 15% slower than this implementation.

@eslintbot
Copy link

LGTM

1 similar comment
@eslintbot
Copy link

LGTM

- Refactors TokenStore with ES2015 classes.
- Adds object-style options to every method of TokenStore.
  E.g. `{includeComments: true, skip: 0, filter: null}`
- Adds `getFirstTokenBetween` and `getLastTokenBetween` methods.
- Adds `getTokenOrCommentAfter` and `getTokenOrCommentBefore` methods
  to TokenStore for backward compatibility.
- Removes the TokenStore instance for the mix of tokens and comments.
@eslintbot
Copy link

LGTM

@kaicataldo kaicataldo added accepted There is consensus among the team that this change meets the criteria for inclusion and removed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Feb 3, 2017
@not-an-aardvark not-an-aardvark dismissed their stale review February 3, 2017 21:42

My comments have been addressed. I haven't looked at the updated implementation thoroughly yet (planning to do that soon), but I don't want to block this if someone reviews/approves it before I do.

Copy link
Member

@kaicataldo kaicataldo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did one more pass at this - some small nits and a few more questions (sorry I didn't catch these the first time around!)

@@ -143,6 +225,52 @@ describe("TokenStore", () => {
);
});

it("should skip a given number of tokens with option {skip: 1}", () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with skip option"?

@@ -197,6 +367,56 @@ describe("TokenStore", () => {
);
});

it("should skip a given number of tokens with skip options", () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with skip option"?

);
});

it("should retrieve matched token with filter options", () => {
Copy link
Member

@kaicataldo kaicataldo Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with filter option"?

let filter = null;

if (typeof opts === "number") {
skip = opts | 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually never personally used the bitwise OR operator. Just for my own understanding, is this being used for performance reasons (as opposed to Math.floor(), for instance)?

Copy link
Member Author

@mysticatea mysticatea Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
V8 can optimize small integer (not double) in high level. Bitwise operator would tell V8 that this is small integer.

.... Probably 😃

* `getNodeByRangeIndex(index)` - returns the deepest node in the AST containing the given source index.

> `skipOptions` is an object which has 3 properties; `skip`, `includeComments`, and `filter`. Default is `{skip: 0, includeComments: false, filter: null}`.
> - The `skip` is a positive integer, the number of skipping tokens. If `filter` option is given at the same time, it doesn't count filtered tokens as skipped.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had one one more thought on my second pass through this - do you think it's worth throwing an error with a helpful message when skip or count are not positive integers? I'm a fan of failing fast and showing a useful error message, since it can help people avoid frustrating debugging.

@gyandeeps
Copy link
Member

I will slowly review this PR. My first suggestion would be to create a folder tokenStore and then spplit the single 1000 line file into multiple files. Then have index.js file which exposes the main stuff from the folder.
This is for readability. Thoughts @mysticatea ?

@eslintbot
Copy link

LGTM

@mysticatea
Copy link
Member Author

  • I split token-store.js to token-store/*.js files. Entry point is token-store/index.js.
  • I added assertions for options.skip and options.count.
  • I fixed plural in test descriptions.

Copy link
Member

@platinumazure platinumazure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just one nitpick on a pair of short variable names.

function createIndexMap(tokens, comments) {
const map = Object.create(null);
let it = 0;
let ic = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these variables be renamed (e.g., tokenIndex, commentIndex)? (I think this came up earlier)

@eslintbot
Copy link

LGTM

Copy link
Member

@platinumazure platinumazure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recent changes LGTM, thanks!

Copy link
Member

@btmills btmills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great work on this, @mysticatea!

I checked this out and played with a couple ideas for performance improvements that might be low-hanging fruit, but nothing I tried achieved a statistically significant difference. At some point, we might be able to do something like cursor memoization. This one would be more complex, so I haven't tried it, and we should merge this PR without it. Running ESLint on its own codebase involves calling the Cursor constructor almost two million times, which is what makes me think there might be an opportunity for performance improvement there.

@ilyavolodin
Copy link
Member

I'm going to go ahead an merge this in so we can test it out before the next release.

@ilyavolodin ilyavolodin merged commit 834f45d into master Feb 9, 2017
@mysticatea mysticatea deleted the rewrite-token-store branch February 12, 2017 21:00
@eslint-deprecated eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018
@eslint-deprecated eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Feb 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants