New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for /*#__PURE__*/ comments. #2429
Conversation
src/ast/nodes/CallExpression.ts
Outdated
@@ -135,6 +135,8 @@ export default class CallExpression extends NodeBase implements DeoptimizableEnt | |||
for (const argument of this.arguments) { | |||
if (argument.hasEffects(options)) return true; | |||
} | |||
debugger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugger statement here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer. Any thoughts on what kind of version bump would be needed to release this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say this could go either into the 0.66 or the 1.0. But give us a little time, got a lot of stuff around my head at the moment and I want to do a thorough review of this and also do some performance tests on large code-bases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'm optimistic about the perf aspect, but I also want to see what happens when it's used on a large codebase.
Thank you @conartist6 ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @conartist6,
I hope you do not mind I added a commit to your PR. This looks like much, but it is all cosmetics.
As you predicted, my performance tests showed there was hardly any regression even for a larger code-base with a fair amount of comments.
I discovered one edge-case issue, otherwise this is mostly changing names and moving functionality around. In the greater scheme of things, it might be interesting to add the comment parsing to the step where rollup copies the acorn nodes into its own data structures but this would have been a greater change for dubious improvements.
One thing to note is that you modify the acorn nodes, not Rollup's data structures. But after some consideration, I see no issue with this as this is no transient state.
Please have a look, obviously from my side this is good to go.
@@ -35,6 +36,7 @@ import getCodeFrame from './utils/getCodeFrame'; | |||
import { getOriginalLocation } from './utils/getOriginalLocation'; | |||
import { makeLegal } from './utils/identifierHelpers'; | |||
import { basename, extname } from './utils/path'; | |||
import { markPureCallExpressions } from './utils/pureComments'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have extracted the code into a separate file to keep Module smaller.
src/ast/nodes/CallExpression.ts
Outdated
@@ -23,6 +23,7 @@ export default class CallExpression extends NodeBase implements DeoptimizableEnt | |||
type: NodeType.tCallExpression; | |||
callee: ExpressionNode; | |||
arguments: (ExpressionNode | SpreadElement)[]; | |||
pure?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think markedPure
is a little verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have said maybe not verbose enough. annotatedPure
is less ambiguous. The argument for being unambiguous is that we're doing something surprising here, adding properties to a syntax tree that aren't a part of javascript syntax. I generally try to be explicit when I'm doing something surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. annotatedPure
sounds good.
src/utils/pureComments.ts
Outdated
function forEachNodeAfterComment( | ||
ast: ESTree.Program, | ||
commentNodes: CommentDescription[], | ||
handleNode: (node: ESTree.Node) => void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the handler into the loop to avoid the extra loop at the end
src/utils/pureComments.ts
Outdated
|
||
checkCommentsBeforeNode(<any>ast); | ||
|
||
function checkCommentsBeforeNode(node: ESTree.Node & { start: number; end: number }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typings were missing. Unfortunately, the outdated rollup-plugin-typescript does not properly flag errors at the moment. My IDE does, though.
Looking a little through acorn, it seems that there is no reason to pass state around so I switched to local variables to avoid the property access. By inlining the handler, there was no reason to construct an array of nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It hate/love types so much. I hate them when they cause you to have to do things like you've had to do on this line. The declaration adds no value and almost no information and looks confusing like destructuring syntax, which can already be pretty much the most complicated syntax in the language to read.
I'm not saying there's any better way, just... ugh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: given the level of scrutiny you're giving perf, the biggest perf change that could be made would be pulling this inner function definition outside of forEachNodeAfterComment. It should be possible, since the unused second parameter to this function is a state object, which could be used to store all the parameters currently captured by closing over the outer function. That way everything only needs to get hot once even if a multitude of files are checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. About the types: The big issue is that the external acorn types are wrong. They tend to work a little better and are actually quite useful when we have everything under control. Here the important information is that this accepts ESTree nodes instead of the rollup derivatives we use everywhere else. But I see you extracted it nicely
src/utils/pureComments.ts
Outdated
function checkCommentsBeforeNode(node: ESTree.Node & { start: number; end: number }) { | ||
if (commentIdx === commentNodes.length || commentNodes[commentIdx].end > node.end) return; | ||
let isFound = false; | ||
while (commentIdx < commentNodes.length && node.start >= commentNodes[commentIdx].end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a (rare) issue where multiple comments belonging to the same node were actually attributed to subsequent node. This loop fixes this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
function markPureNode(node: ESTree.Node) { | ||
while (node.type === 'ExpressionStatement') { | ||
node = node.expression; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced the recursion with a flat loop for (probably only marginally) better performance.
@@ -0,0 +1,2 @@ | |||
/*#__PURE__*//*#__PURE__*//*#__PURE__*//*#__PURE__*//*#__PURE__*/ removed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests for the multiple comment issue (I know, it is probably not very realistic, but still).
require(id) { | ||
if (id === 'socks') { | ||
return () => { | ||
throw new Error('Not all socks were removed.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The emoji is lovely but since this is supposed to be a functional test, I would want it to throw an error when things happen that are not supposed to happen. With this assertion, the source-code check might probably be moot.
src/utils/pureComments.ts
Outdated
|
||
function checkCommentsBeforeNode( | ||
node: ESTreeNodeWithLocation, | ||
st: { handleNode: NodeHandler; commentIdx: number; commentNodes: CommentDescription[] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these types are correct, but I don't really know. I guess you'll have to look at them in your IDE? I'll get mine set up, but I don't have time right now.
Although there is no pure annotation specification, per se, Rollup should strive to be compatible with the defacto There are over a hundred pure annotation test scenarios here covering a large variety of edge cases including parentheses, the use of https://github.com/mishoo/UglifyJS2/search?utf8=✓&q=__PURE__&type= If Rollup is able to produce the same output as these pure annotation tests, then it would be considered to be a highly compatible implementation. |
@kzc I was actually wondering about this one. As the annotation is placed before a call expression, my impression was that side-effects in call arguments should be ignored as well as otherwise the annotation would be useless. Does uglify handle this differently? If so, why? |
I'll look into establishing exact parity with uglify's semantics in terms of the Regex and consideration of expressions. Any pointers would be much appreciated. I'm doing this not because I'm an expert with any of these tools or domains, I'm just someone who needs the functionality. |
In any case, we should probably have a test with a side-effect in the call argument. My guess would be that this should be pure as well (which I think it is now), but I have no idea what Uglify does. e.g. /*__PURE__*/someFunction(globalVariable++); could be removed. In terms of the RegExp, too strict is certainly better than too loose. |
I had assumed that a known side effect in the argument would cause the whole expression to be impure and that's what I was trying to code for. I would not want to remove the call in your example above. My expectation was that eventually annotations would be supported on some nodes other than call expressions. If you had an impure function and an impure argument that you wanted to be pruned, I'd expect to need to write:
I think we follow uglify semantics either way though. |
When in doubt do what Due to consistency after various code transforms and simplification, the arguments to the pure annotated call can have side effects and are not considered to be part of the call. Consider:
This pure annotation logic has been battle tested in the wild for a couple of years now and is basically set in stone. |
When in doubt...
|
@kzc Huh, so it will actually strip the function call and leave the argument expressions. Right now my code would leave the call and its arguments in place. Also all the cases above seem pretty weird to me. Why would you write a function call, not use the result, then mark it pure? It doesn't make sense, why not just omit the function call. Just trying to figure out if the use cases for Rollup and terser are really quite the same. Also thanks for pointing out such a simple way to test terser's behavior and interesting common case. |
Are there other instances where rollup changes code instead of just choosing to include or omit it? Rollup doesn't and has no desire to be an uglification engine as I understand so it's conceivable that uglify or terser might be run on rollup results -- should we then include the annotation comments in rollup output? |
Although the rationale may not seem obvious upon first glance, there's a reason for it. It's widely used in I'd also suggest that this PR offer a way to optionally opt-out of the Rollup pure annotation logic in the event that it deviates from uglify's behavior. Perhaps something like |
@kcz Could you clarify whether implementing a subset of the complete terser functionality with semantics which match terser's is acceptable? Regex aside, can you think of any situation in which the current semantics are unsafe? The scope of this diff is intentionally limited to recognizing the annotation and its meaning. I think any attempt to do more vigorous kinds of dead code elimination should be considered future work. |
Oh, and I'm happy to add a flag for it. |
@conartist6 In that case I would say: Do not go for any inlining but rather leave the whole expression in case the argument has a side-effect. Which is in fact what the current implementation does as I just checked. This is not what Uglify does but it should be compatible unless the annotation is wrong. We should definitely add a test for this. As for a CLI flag, my suggestion would be an input option |
Unless we do inlining, I do not think there is a lot of combinatorics to check like sequence expressions etc. But admittedly we should also have a test where the call expression is not an expression statement but part of another expression which is or is not used. |
@conartist6 If a safe subset of uglify's defacto pure annotation logic were implemented that would probably be okay. Use the numerous |
I don't like their regex, it isn't anchored to the beginning of the comment string. It means that you can't write comments that describe pure annotations without accidentally changing the meaning of your program. Consider: /**
* Blah blah blah long documentation comment lots of work for the regex.
* Normally I would have marked the call to foo on line 22 as #__PURE__
* but in this case it isn't safe.
**/
(function myModule() {
// ...
})() // oops, sorry What I do think it's good to allow is some comment after the marker flag, possibly explaining why the annotation is there: // #__PURE__ We know this call is safe to remove because some reason.
foo(); Definitely looking at their tests makes me realize that there is a bug in my code -- the annotation comment must be dropped if the rollup drops the function, otherwise if rollup's output is passed to terser, terser will assume that some different call has been annotated as pure. Terser, bizarrely, tests its own support for multiple pure annotations appearing in a single comment. This means there's screwy edge cases if I do anchor my regex, like we see one annotation, use it to remove a function, and discard that part of the comment, but then a second annotation appears later in the same comment string and isn't discarded. That would end up causing a different function to be marked pure if the output is passed to terser. I think the best logic is: if a pure annotation comment was recognized and the function was discarded as pure, eliminate all instances of the |
This should actually not happen. If Rollup drops code, all leading comments and trailing comments on the same line are removed together with the statement. Look here: If this is not the case, this should be fixed in the white-space handling logic, not in any pure-comment specific logic. There may be some edge-cases involving sequence expressions and conditionals, I will take a look at those. The only situation where this will not work for statements is this: previsousStatement; //#__PURE__
pureFn(); Here, the logic will associate the trailing comment with the previous statement. If all statements are on the same line, this will work again, though.
This would certainly be a safe approach but I tend to disagree. Consider this example: In this example, Why not instead limit to the most basic pure comments: |
I am definitely against making pure comments a central part of Rollup and very much hope we can limit any pure comment specific logic to the necessary minimum. |
@lukastaegert You didn't include the right URL for you second example I think. Cool that rollup already knows how to drop comments. It looks like we should definitely make sure to avoid the case you mention in your inline example above. I certainly don't mind a less permissive regex. I think by and large it's tools that should be putting these things in not people. I'll update the diff tonight, and I have a lot better idea what the important test cases are now - and am now sure that we should not be seeking full parity with terser |
Dammit, the rollup website update broke the share links :( Here is the off-line example for the second case: import {pureFn} from 'external';
export const myFn = () => /*#__PURE__*/pureFn(); |
As I am not sure when I will have time again to work on this one, here some pointers to get you going:
Once all those are covered, we should be good to go. |
Seems Rich has fixed the REPL! Now my links above are actually valid :) |
Ok, I think I have now addressed everything that was brought forth! For the RegExp, I went with Uglify's approach as suggested by @kzc . The rationale is that
To make sure we are as close to Uglify as possible, I basically copied most of Uglify's tests into two Rollup tests. At the moment, we tree-shake almost everything that Uglify does except when transformations are involved that we do not support (yet), in which case everything is left untouched. It is however guaranteed at least by Uglify's test that everything that Uglify keeps is also kept by Rollup. Then there was the issue when pure annotations were not addressed by Rollup's automatic comment removal, i.e. situations like this console.log('remains'); // @__PURE__ this comment technically belongs to the previous line
console.log('is pure and therefore removed');
console.log('must not receive the comment'); Now the approach is to attach all annotation comments to the annotated AST nodes and when a node is removed, it is made sure that all annotations are removed as well regardless of their position. I checked this also for quite a few other scenarios (sequence expressions etc.). This is done in such a general way that it should be easy to handle other annotation types as well in the future (though I do not expect this to happen any time soon). In contrast to Uglify, pure annotations will be retained if the annotated node is retained. This is done so that those annotations are left to be processed by consumers of our library, which I think is very important for Rollup to support: // needs to be retained in case value is used but should not be regarded a side-effect if not
export const value = /*@__PURE__*/calculateValue(); Please have another look and see if you can find any more issues, otherwise I would aim for releasing this next week. |
I trust that all test cases were reviewed to match uglify's pure annotation behavior. I won't comment on that beyond suggesting to double check the annotations work correctly with respect to expression parentheses - these cases were particularly tricky to get right. Briefly looking at the patch I didn't see a flag to disable rollup's pure annotation processing in the event of an issue as mentioned in #2429 (comment). The flag could default to enabled. It would also be handy to debug situations where the original source input is incorrectly annotated. |
Heya, I think I can provide a comprehensive e2e test for this. At Angular we rely heavily on rollup/webpack+uglify/terser with PURE comments to remove as much code as possible. We also run Build Optimizer on packages that we know has no toplevel side effects to mark function calls with I made a package that checks how much can be removed using this setup: https://github.com/filipesilva/check-side-effects You can test it following these steps:
The output should be:
That's the total of code that could not be removed out of around 1MB of source files. This is to say that, if this implementation of If you want to extract the rollup configuration I use, you can find it here: https://github.com/filipesilva/check-side-effects/blob/master/src/checker.ts |
@kzc Added an option to deactivate it and documented it. Even if annotation hints are ignored, Rollup will still make sure that annotations are always properly removed together with their associated expressions. At least according to Uglify's tests, parentheses did not turn out to be a big issue as acorn seems to handle them flawlessly. |
@filipesilva Thanks for posting this, I played around with your package and it seems to remove perfectly even when I completely remove the minifier. In that case I still get a lot of module header comments but apart from that, the only remaining code is import 'tslib';
import 'rxjs';
import 'rxjs/operators'; which is basically the imports we could not remove because of potential side-effects. If you also specify |
if this is not covered by normal comment removal.
…terate more than once through all comments even if other annotation types are added
@lukastaegert that sounds about right, thanks for taking a look! |
@lukastaegert It appears that pure annotations cannot be disabled for NewExpresssions:
Here's a fix that adds the same annotation check present in CallExpression: --- a/src/ast/nodes/NewExpression.ts
+++ b/src/ast/nodes/NewExpression.ts
@@ -24,7 +24,7 @@ export default class NewExpression extends NodeBase {
for (const argument of this.arguments) {
if (argument.hasEffects(options)) return true;
}
- if (this.annotatedPure) return false;
+ if (this.context.annotations && this.annotatedPure) return false;
return this.callee.hasEffectsWhenCalledAtPath(
EMPTY_PATH,
this.callOptions, |
Thanks, I will add this to #3153 |
This PR contains:
Are tests included?
Breaking Changes?
List any relevent issue numbers:
Fixes #2423, fixes #1293
Description
CallExpressions are not considered to
haveEffects
if they are preceded by a comment whose text is#__PURE__
I do not know what breakages this change could potentially cause. There are certainly plenty of
/*#__PURE__*/
comments out in the wild, but they should hopefully all have been created to express the exact concept for which they are also used here.Also this PR contains an emoji, hopefully all the tooling supports UTF8 properly.