Update: rewrite TokenStore (fixes #7810) #7936

mysticatea · 2017-01-18T07:59:35Z

What is the purpose of this pull request? (put an "X" next to item)

[X] Add something to the core

What changes did you make? (Give an overview)

I rewrote TokenStore with few extensions.

We had been having 2 TokenStore instances. One is to access tokens. Another is to access a mix of tokens and comments. I added {includeComments: true} option to every methods of TokenStore. As the result, we no longer need the store of a mix of tokens and comments. This would reduce memory usage of linting.
I added some enhancements to TokenStore. I think those enhancements are useful to skip parentheses or something like.
- getFirstTokenBetween(left, right, options)
- getLastTokenBetween(left, right, options)
- filter option to skip specific tokens.
- token-store.d.ts is the interface of TokenStore.
I refactored our codebase with the enhancements. For examples:

// lib/rules/arrow-body-style.js
-                            const lastTokenBeforeBody = sourceCode.getTokensBetween(sourceCode.getFirstToken(node), arrowBody)
-                                .reverse()
-                                .find(token => token.value !== "(");
+                            const lastTokenBeforeBody = sourceCode.getLastTokenBetween(sourceCode.getFirstToken(node), arrowBody, astUtils.isNotOpeningParenToken);

// lib/rules/arrow-spacing.js
-            let arrow = sourceCode.getTokenBefore(node.body);
-
-            // skip '(' tokens.
-            while (arrow.value !== "=>") {
-                arrow = sourceCode.getTokenBefore(arrow);
-            }
+            const arrow = sourceCode.getTokenBefore(node.body, astUtils.isArrowToken);

// lib/rules/func-call-spacing.js
+            const lastToken = sourceCode.getLastToken(node);
             const lastCalleeToken = sourceCode.getLastToken(node.callee);
-            let prevToken = lastCalleeToken;
-            let parenToken = sourceCode.getTokenAfter(lastCalleeToken);
-
-            // advances to an open parenthesis.
-            while (
-                parenToken &&
-                parenToken.range[1] < node.range[1] &&
-                parenToken.value !== "("
-            ) {
-                prevToken = parenToken;
-                parenToken = sourceCode.getTokenAfter(parenToken);
-            }
+            const parenToken = sourceCode.getFirstTokenBetween(lastCalleeToken, lastToken, astUtils.isOpeningParenToken);
+            const prevToken = parenToken && sourceCode.getTokenBefore(parenToken);

Is there anything you'd like reviewers to focus on?

Please check if the purpose of Proposal: add option to include comments in token store methods #7810 is satisfied or not.
Please check the new interface.
Please check the English of documents.

eslintbot · 2017-01-18T07:59:36Z

LGTM

mention-bot · 2017-01-18T08:01:43Z

@mysticatea, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @vitorbal and @nzakas to be potential reviewers.

mysticatea · 2017-01-18T08:05:34Z

lib/token-store.js

+     */
+    constructor(tokens, comments) {
+        this.tokens = tokens;
+        this.comments = comments.slice(0); // Needs cloning for backward compatibility!


I noticed shebang's comment (///usr/bin/env node) remains in token store currently. I think this is a bug, but I keep it to avoid breaking changes.

※ comments needs cloning for backward compatibility.
After this initialization, ESLint removes a shebang's comment from comments.
However, so far we had been concatenating 'tokens' and 'comments',
so the shebang's comment had remained in the concatenated array.
As the result, both getTokenOrCommentAfter and getTokenOrCommentBefore
methods had been returning the shebang's comment.
And some rules depend on this behavior.

Sorry, I don't have time to do a full review at the moment, but this PR changes how we handle shebangs. It essentially creates a comment token type ("Shebang") and then treats it just like any other comment. So I think we can leave the behavior as is until that PR lands?

So I think we can leave the behavior as is until that PR lands?

Yes, I think so.
Shebang type comments sound nice to me.

ilyavolodin · 2017-01-18T15:27:16Z

@not-an-aardvark Could you post perf number before and after this change please?

not-an-aardvark · 2017-01-18T15:33:19Z

@ilyavolodin I'm confused, did you mean to ping @mysticatea instead?

ilyavolodin · 2017-01-18T15:45:03Z

Oh, sorry, my bad. I mean to ping @mysticatea

not-an-aardvark

Thanks! This looks like a very good improvement.

Looking at the implementation, I'm wondering if Cursor could be a bit simpler. As far as I can tell, a single instance of cursor never needs to move both forwards and backwards (each instance either moves forwards or backwards, but not both). It might be more precise to use iterators instead of cursors. Then each subclass would only need to implement Symbol.iterator:

class FilteredTokenIterator extends TokenIterator {
    constructor(options, predicate) {
        super(options);
        this.predicate = predicate;
    }

    *[Symbol.iterator]() {
        for (const token of super[Symbol.iterator]()) {
            if (this.predicate(token)) {
                yield token;
            }
        }
    }
}

What do you think about this suggestion?

not-an-aardvark · 2017-01-18T15:15:06Z

lib/ast-utils.js

+ * @param {Token} token - The token to be check.
+ * @returns {boolean} `false` if the token is a comma token.
+ */
+function isNotCommaToken(token) {


Instead of having functions like isNotCommaToken, maybe it would be better to have a general negate function.

function negate(func) { return (...args) => !func(...args); }

Then we can replace this:

tokens.filter(isNotCommaToken)

with this:

tokens.filter(negate(isCommaToken));

Another possibility would be maintaining the current API but using the helper to pre-negate the positive definitions.

function isCommaToken(token) { return token.value === "," && token.type === "Punctuator"; } const isNotCommaToken = negate(isCommaToken);

With either solution, we get to avoid maintaining a set of implementations as well as their inverses separately.

I will try it later.
I'm worried performance reducing by the dynamic function call since those helpers are called in a lot of time.

If there ends up being a noticeable performance difference, I would agree leaving this as-is 👍

not-an-aardvark · 2017-01-18T15:22:09Z

lib/ast-utils.js

+ * @returns {boolean} `true` if the token is a comment token.
+ */
+function isCommentToken(token) {
+    return token.type === "Line" || token.type === "Block";


Note: When #7516 is merged, this will also need || token.type === "Shebang".

Oh, nice catch!

not-an-aardvark · 2017-01-18T15:42:21Z

lib/rules/keyword-spacing.js

+ * @returns {boolean} `true` if the token is an `of` token.
+ */
+function isOfToken(token) {
+    return token.value === "of";


I think this is supposed to check the of token in for (foo of bar), but it will report a false positive for

/*of*/ var of = 1; `of`;

So, for (const x of /*of*/ iterable) {} triggers false positive. Wow, it's a bug in the current master as well. I will fix it.

not-an-aardvark · 2017-01-18T15:52:06Z

lib/ast-utils.js

+/**
+ * Checks if the given token is a semicolon token or not.
+ *
+ * @param {Token} token - The token to be check.


Nitpick: I think this should say either "the token to be checked" or "the token to check".

not-an-aardvark · 2017-01-18T16:08:42Z

lib/token-store.js

+        cursor = new FilterCursor(cursor, filter);
+    }
+    if (skip >= 1) {
+        cursor = new SkipCursor(cursor, skip);


If I'm understanding correctly, this will not count filtered tokens as "skipped":

foo bar baz sourceCode.getFirstToken(programNode, { skip: 1, filter: token => token.value !== "foo" }); // => the "baz" token

Is this intended behavior?

Yes, this is intentional.

For what it's worth, I would find that unintuitive and would prefer that skipping "happens first" (i.e., skipping happens regardless of filtering, and then the filter takes over). Makes sense when you want to get a token using a predicate but first skip over a known bracket/paren/brace, etc.

Hmm, I thought this is natural, for example, "I want get the 2nd comma" → sourceCode.getFirstToken(node, {skip: 1, filter: isComma})

I guess it would be nice to be able to go either way on this. But I'm not sure I can provide any use case to back up my preference.

Or, I think we can throw errors if both skip and filter are given. I didn't find cases which use both skip and filter options in the refactoring of this PR.

Personally, I think it would be better to just pick one behavior (e.g. don't count filtered tokens as skipped) and document the behavior, but not throw an error. We might as well allow an easy way to get the second comma.

I added the note to documents.

not-an-aardvark · 2017-01-18T17:03:32Z

lib/token-store.js

+     * Initializes this cursor.
+     */
+    constructor() {
+        this.current = null;


I think it might be more readable if this.current was called this.currentToken (or maybe this.currentValue) instead. At first, I couldn't tell whether this.current was referring to an index.

Actually, IEnumerator was in my head 😃

not-an-aardvark · 2017-01-18T17:15:39Z

lib/token-store.js

+    /** @inheritdoc */
+    moveNext() {
+        while (super.moveNext()) {
+            if (this.predicate(this.current)) {


It's only a minor issue, but predicate will get called with a this value of the current instance of FilterCursor, which would be observable to external code. This could make it difficult to change the implementation in the future without making breaking changes to rules.

Maybe we should call the predicate as a pure function:

const predicate = this.predicate; while (super.moveNext()) { if (predicate(this.current)) { // ...

Nice catch!

not-an-aardvark · 2017-01-18T17:23:22Z

lib/token-store.js

+    movePrev() {
+        while (this.count > 0) {
+            this.count -= 1;
+            if (!super.movePrev()) {


It looks like the skip behavior depends on whether moveNext or movePrev is called first. If moveNext is called first, then the first count tokens are skipped, but if movePrev is called first, then the last count tokens are skipped.

I'm not sure if there's an easy workaround for this, though.

This is intentional.

not-an-aardvark · 2017-01-18T17:31:06Z

lib/token-store.js

+    moveNext() {
+        if (this.count > 0) {
+            this.count -= 1;
+            return super.moveNext();


It looks like after calling moveNext enough times, movePrev won't work at all.

for (let i = 0; i < count; i++) { this.moveNext(); } this.movePrev(); // => false

I would expect that after using movePrev, the cursor would be at index count - 2.

It's never used in a mix of moveNext and movePrev.
getTokenAfter uses moveNext only, getTokenBefore uses movePrev only.
So this is no problem.

I removed movePrev() as it causes confusing.

not-an-aardvark · 2017-01-18T17:35:28Z

lib/token-store.js

+     * @protected
+     */
+    get firstIndex() {
+        if (typeof this.imap[this.start] !== "undefined") {


Since this.imap is created from Object.create(null), this could also be

if (this.start in this.imap) {

However, I'm fine with either method.

I will fix it.
I thought in operator is slow since I had seen nodejs/node#10443. But this seems a different situation.
https://jsperf.com/undefined-check-777

Another option would be to use a Map to avoid prototype lookups.

Actually, I had used Map at first, but it was 16% slower than this for getting value. 😢

https://jsperf.com/int-map

mysticatea · 2017-01-19T00:37:00Z

Thank you for the review!
I will check those.

@ilyavolodin Sure.

I was careful about performance.
But sadly, the new implementation is about 1% slower than the original. Also after the refactoring of 8f85577, it got 3% slower, I guess it was caused by that FilterCursor increases function calls of each iteration.

Open raw log

C:\Users\t-nagashima.AD\Documents\GitHub\eslint [master ≡]> npm run perf

> eslint@3.13.1 perf C:\Users\t-nagashima.AD\Documents\GitHub\eslint
> node Makefile.js perf


Loading:
  Load performance Run #1:  237.425139ms
  Load performance Run #2:  239.108839ms
  Load performance Run #3:  232.480586ms
  Load performance Run #4:  237.605341ms
  Load performance Run #5:  235.970236ms

  Load Performance median:  237.425139ms


Single File:
  CPU Speed is 3392 with multiplier 13000000
  Performance Run #1:  3946.4067999999997ms
  Performance Run #2:  3938.73959ms
  Performance Run #3:  3994.417041ms
  Performance Run #4:  3911.2673059999997ms
  Performance Run #5:  3938.744118ms

  Performance budget exceeded: 3938.744118ms (limit: 3832.5471698113206ms)


Multi Files (0 files):
  CPU Speed is 3392 with multiplier 39000000
  Performance Run #1:  12499.78252ms
  Performance Run #2:  12556.26077ms
  Performance Run #3:  12439.634717ms
  Performance Run #4:  12459.592524ms
  Performance Run #5:  12317.274179ms

  Performance budget exceeded: 12459.592524ms (limit: 11497.641509433963ms)

C:\Users\t-nagashima.AD\Documents\GitHub\eslint [master ≡]> git checkout rewrite-token-store
Switched to branch 'rewrite-token-store'
Your branch is up-to-date with 'origin/rewrite-token-store'.
C:\Users\t-nagashima.AD\Documents\GitHub\eslint [rewrite-token-store ≡]> npm run perf

> eslint@3.13.1 perf C:\Users\t-nagashima.AD\Documents\GitHub\eslint
> node Makefile.js perf


Loading:
  Load performance Run #1:  236.285969ms
  Load performance Run #2:  235.878777ms
  Load performance Run #3:  239.786486ms
  Load performance Run #4:  237.061413ms
  Load performance Run #5:  241.826668ms

  Load Performance median:  237.061413ms


Single File:
  CPU Speed is 3392 with multiplier 13000000
  Performance Run #1:  4109.42097ms
  Performance Run #2:  4072.425119ms
  Performance Run #3:  4087.458598ms
  Performance Run #4:  4223.57671ms
  Performance Run #5:  4177.791984ms

  Performance budget exceeded: 4109.42097ms (limit: 3832.5471698113206ms)


Multi Files (0 files):
  CPU Speed is 3392 with multiplier 39000000
  Performance Run #1:  13025.187907ms
  Performance Run #2:  13013.918458ms
  Performance Run #3:  12832.759673ms
  Performance Run #4:  12840.481519ms
  Performance Run #5:  12834.366706ms

  Performance budget exceeded: 12840.481519ms (limit: 11497.641509433963ms)

mysticatea · 2017-01-19T00:39:19Z

@not-an-aardvark I love Iterators/Generators! Actually, I had written TokenStore with generators: https://gist.github.com/mysticatea/ff83097cdebb0aad79fe6cccc29d4f50. But unfortunately, it was super slow. npm run perf was 15% slower than this implementation.

eslintbot · 2017-01-19T08:28:46Z

LGTM

eslintbot · 2017-01-19T08:50:05Z

LGTM

- Refactors TokenStore with ES2015 classes. - Adds object-style options to every method of TokenStore. E.g. `{includeComments: true, skip: 0, filter: null}` - Adds `getFirstTokenBetween` and `getLastTokenBetween` methods. - Adds `getTokenOrCommentAfter` and `getTokenOrCommentBefore` methods to TokenStore for backward compatibility. - Removes the TokenStore instance for the mix of tokens and comments.

To change `Block` to `BlockComment` in future.

eslintbot · 2017-02-03T11:44:49Z

LGTM

My comments have been addressed. I haven't looked at the updated implementation thoroughly yet (planning to do that soon), but I don't want to block this if someone reviews/approves it before I do.

kaicataldo

Did one more pass at this - some small nits and a few more questions (sorry I didn't catch these the first time around!)

kaicataldo · 2017-02-07T03:20:42Z

tests/lib/token-store.js

@@ -143,6 +225,52 @@ describe("TokenStore", () => {
            );
        });

+        it("should skip a given number of tokens with option {skip: 1}", () => {


Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with skip option"?

kaicataldo · 2017-02-07T03:23:46Z

tests/lib/token-store.js

@@ -197,6 +367,56 @@ describe("TokenStore", () => {
            );
        });

+        it("should skip a given number of tokens with skip options", () => {


Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with skip option"?

kaicataldo · 2017-02-07T03:25:28Z

tests/lib/token-store.js

+            );
+        });
+
+        it("should retrieve matched token with filter options", () => {


Really minor nit: Should this maybe match this line and say "should skip a given number of tokens with filter option"?

kaicataldo · 2017-02-07T04:07:07Z

lib/token-store.js

+    let filter = null;
+
+    if (typeof opts === "number") {
+        skip = opts | 0;


I've actually never personally used the bitwise OR operator. Just for my own understanding, is this being used for performance reasons (as opposed to Math.floor(), for instance)?

Yes.
V8 can optimize small integer (not double) in high level. Bitwise operator would tell V8 that this is small integer.

.... Probably 😃

kaicataldo · 2017-02-07T04:10:17Z

docs/developer-guide/working-with-rules.md

+* `getNodeByRangeIndex(index)` - returns the deepest node in the AST containing the given source index.
+
+> `skipOptions` is an object which has 3 properties; `skip`, `includeComments`, and `filter`. Default is `{skip: 0, includeComments: false, filter: null}`.
+> - The `skip` is a positive integer, the number of skipping tokens. If `filter` option is given at the same time, it doesn't count filtered tokens as skipped.


I had one one more thought on my second pass through this - do you think it's worth throwing an error with a helpful message when skip or count are not positive integers? I'm a fan of failing fast and showing a useful error message, since it can help people avoid frustrating debugging.

gyandeeps · 2017-02-07T04:33:54Z

I will slowly review this PR. My first suggestion would be to create a folder tokenStore and then spplit the single 1000 line file into multiple files. Then have index.js file which exposes the main stuff from the folder.
This is for readability. Thoughts @mysticatea ?

eslintbot · 2017-02-07T10:15:42Z

LGTM

mysticatea · 2017-02-07T10:19:28Z

I split token-store.js to token-store/*.js files. Entry point is token-store/index.js.
I added assertions for options.skip and options.count.
I fixed plural in test descriptions.

platinumazure

Sorry, just one nitpick on a pair of short variable names.

platinumazure · 2017-02-07T14:34:02Z

lib/token-store/index.js

+function createIndexMap(tokens, comments) {
+    const map = Object.create(null);
+    let it = 0;
+    let ic = 0;


Could these variables be renamed (e.g., tokenIndex, commentIndex)? (I think this came up earlier)

eslintbot · 2017-02-08T22:36:58Z

LGTM

platinumazure

Recent changes LGTM, thanks!

btmills

LGTM. Great work on this, @mysticatea!

I checked this out and played with a couple ideas for performance improvements that might be low-hanging fruit, but nothing I tried achieved a statistically significant difference. At some point, we might be able to do something like cursor memoization. This one would be more complex, so I haven't tried it, and we should merge this PR without it. Running ESLint on its own codebase involves calling the Cursor constructor almost two million times, which is what makes me think there might be an opportunity for performance improvement there.

ilyavolodin · 2017-02-09T01:41:31Z

I'm going to go ahead an merge this in so we can test it out before the next release.

mysticatea added core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Jan 18, 2017

mysticatea self-assigned this Jan 18, 2017

mysticatea commented Jan 18, 2017

View reviewed changes

not-an-aardvark previously requested changes Jan 18, 2017

View reviewed changes

mysticatea requested a review from kaicataldo January 20, 2017 09:17

kaicataldo mentioned this pull request Jan 25, 2017

Breaking: Calculate leading/trailing comments in core #7516

Merged

mysticatea added 13 commits January 30, 2017 20:41

Chore: refactor with new API

e306942

Docs: update working-with-rules.md

c031e0d

Fix: improve tests

3061734

Chore: simplify isNot* functions

4913c93

Chore: fix jsdoc comments.

7c81030

Fix: isComment returns true for shebang

7f51920

Chore: remove unnecessary isOfToken function

fa213b5

Chore: refactor TokenStore; remove movePrev methods

24505a8

Chore: rename variables: start, end, imap

e75803f

Fix: this.predicate() → predicate()

ec714db

Chore: typeof → in

5beeb9d

Chore: use lodash.sortedIndexBy

5d05c23

mysticatea added 3 commits February 3, 2017 20:19

Chore: fix typo

587ef0d

Chore: fix "Block" to .startsWith("Block")

1133196

To change `Block` to `BlockComment` in future.

Docs: add compatibility notes to JSDoc

66ee5bf

kaicataldo added accepted There is consensus among the team that this change meets the criteria for inclusion and removed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Feb 3, 2017

kaicataldo mentioned this pull request Feb 3, 2017

Breaking: no-multi-spaces check around inline comments (fixes #7693) #7696

Merged

kaicataldo reviewed Feb 7, 2017

View reviewed changes

mysticatea added 3 commits February 7, 2017 18:29

Chore: modularize TokenStore

4d36c8a

Chore: fix plural in test descriptions

720d702

Fix: add assertions for options.skip and options.count

da1db34

platinumazure requested changes Feb 7, 2017

View reviewed changes

mysticatea mentioned this pull request Feb 8, 2017

Transferring eslint-canary to the eslint GitHub organization #8032

Closed

Chore: it/ic → tokenIndex/commentIndex

91226e2

platinumazure approved these changes Feb 8, 2017

View reviewed changes

btmills approved these changes Feb 8, 2017

View reviewed changes

ilyavolodin approved these changes Feb 9, 2017

View reviewed changes

ilyavolodin merged commit 834f45d into master Feb 9, 2017

mysticatea deleted the rewrite-token-store branch February 12, 2017 21:00

This was referenced Feb 20, 2017

Update dependency eslint to version 3.19.0 singapore/lint-condo#244

Merged

Update dependency eslint to version 3.16.0 renovatebot/renovate#111

Merged

platinumazure mentioned this pull request Feb 24, 2017

Enhancement request: Add token value filtering to token-store methods #4563

Closed

eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018

eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Feb 6, 2018

Update: rewrite TokenStore (fixes #7810) #7936

Update: rewrite TokenStore (fixes #7810) #7936

Conversation

mysticatea commented Jan 18, 2017

eslintbot commented Jan 18, 2017

mention-bot commented Jan 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ilyavolodin commented Jan 18, 2017

not-an-aardvark commented Jan 18, 2017

ilyavolodin commented Jan 18, 2017

not-an-aardvark left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mysticatea Jan 19, 2017 • edited

Choose a reason for hiding this comment

mysticatea commented Jan 19, 2017

mysticatea commented Jan 19, 2017

eslintbot commented Jan 19, 2017

eslintbot commented Jan 19, 2017

eslintbot commented Feb 3, 2017

kaicataldo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaicataldo Feb 7, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mysticatea Feb 7, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gyandeeps commented Feb 7, 2017

eslintbot commented Feb 7, 2017

mysticatea commented Feb 7, 2017

platinumazure left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eslintbot commented Feb 8, 2017

platinumazure left a comment

Choose a reason for hiding this comment

btmills left a comment

Choose a reason for hiding this comment

ilyavolodin commented Feb 9, 2017

mysticatea Jan 19, 2017 •

edited

kaicataldo Feb 7, 2017 •

edited

mysticatea Feb 7, 2017 •

edited