Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor IRFactory #1188

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Conversation

tuchida
Copy link
Contributor

@tuchida tuchida commented Feb 23, 2022

ref. #967

  1. IRFactory no longer inherits from Parser. This eliminates Code reuse via inheritance.
  2. Decompiler has been removed. This make that Function.prototype.toString behaves as per the ECMA262 specification.
  3. It is now possible to throw a SyntaxError in IRFactory. It will no longer terminate at 1=1.

I believe that this fix is necessary if we are going to add the latest ECMA262 features to Rhino in the future. When I actually tried to add Computed Property Names and default parameters, I had trouble with the IRFactory and Decompiler code. However, I also understand that it is a big risk for Rhino, with its old history, to accept such a big modification. I can also split the Pull Request.

I would like to discuss with you what kind of pull requests you can accept.

@tuchida tuchida changed the title Refactor ir factory Refactor IRFactory Feb 23, 2022
@tuchida
Copy link
Contributor Author

tuchida commented Feb 23, 2022

  1. Decompiler has been removed. This make that Function.prototype.toString behaves as per the ECMA262 specification.

After getting harness to work in this way, some of the tests for Function.prototype.toString passed.

@@ -458,7 +458,7 @@
     name.js
     private-identifiers-not-empty.js {unsupported: [class-fields-private]}
 
-built-ins/Function 194/505 (38.42%)
+built-ins/Function 188/505 (37.23%)
     internals/Call 2/2 (100.0%)
     internals/Construct 6/6 (100.0%)
     length/S15.3.5.1_A1_T3.js strict
@@ -528,7 +528,6 @@
     prototype/Symbol.hasInstance/value-negative.js
     prototype/Symbol.hasInstance/value-non-obj.js
     prototype/Symbol.hasInstance/value-positive.js
-    prototype/toString/arrow-function.js
     prototype/toString/async-arrow-function.js {unsupported: [async-functions]}
     prototype/toString/async-function-declaration.js {unsupported: [async-functions]}
     prototype/toString/async-function-expression.js {unsupported: [async-functions]}
@@ -554,9 +553,7 @@
     prototype/toString/class-expression-explicit-ctor.js
     prototype/toString/class-expression-implicit-ctor.js
     prototype/toString/Function.js
-    prototype/toString/function-declaration.js
     prototype/toString/function-declaration-non-simple-parameter-list.js
-    prototype/toString/function-expression.js
     prototype/toString/generator-function-declaration.js
     prototype/toString/generator-function-expression.js
     prototype/toString/generator-method.js
@@ -566,9 +563,6 @@
     prototype/toString/getter-class-statement.js
     prototype/toString/getter-class-statement-static.js
     prototype/toString/getter-object.js
-    prototype/toString/line-terminator-normalisation-CR.js
-    prototype/toString/line-terminator-normalisation-CR-LF.js
-    prototype/toString/line-terminator-normalisation-LF.js
     prototype/toString/method-class-expression.js
     prototype/toString/method-class-expression-static.js
     prototype/toString/method-class-statement.js

@rbri
Copy link
Collaborator

rbri commented Feb 23, 2022

Personally i really like this stuff to bring Rhino into the future.
I think it is time to start a branch for a backward compatible version that we maintain for some time (years?). All the old projects not able to update are from my point of view only interested in bugfixes and not really in the new features.
But for all the folks out there still using the recent stuff it might be a real help do go forward.

So far my point of view

@p-bakker
Copy link
Collaborator

I'm also all for moving forward, the only question is how exactly.

Personally, i prefer working on a 1.7.15 and a 2.0.0 simultaneously, where we try to put as much as we can into v1.7.15 as possible, and only put stuff into v2.0.0 if it entails a breaking change.

Examples of breaking changes: changing the Callable interface to accommodate for implementing strict-mode properly and possibly breaking changes to properly implement block-level scoping of const/let/functions.

Whether this PR contains breaking changes that we cannot do without I'm not sure (haven't had the chance to look it over in detail). I did see some public api changes (getEncodedSource, using enum instead of int for flags param for example), but on the surface it looked like those aren't absolutely necessary (but I might be wrong about that), they seemed to just clean up the code a bit

@tuchida
Copy link
Contributor Author

tuchida commented Feb 24, 2022

getEncodedSource

This is necessary. Currently, Decompiler is reverting to the original source code from AST. In this PR, the original source code is retained without going through the AST.

using enum instead of int for flags param

This is not absolutely necessary.

@p-bakker
Copy link
Collaborator

This is necessary

But technically you could've kept using the 'getEncodedSource' name, no? Both return a string

Anyway: with any change, we could be breaking something. Even the fixing of the Regex propetly types @rbri is working on could be a breaking change for some. The question is how likely and does it warrant a bump of the major version.

I do think that when we bump the major version, we must include all breaking changes we can identify at once, which means no release can be made until we've resolved them all

@p-bakker
Copy link
Collaborator

Maybe we should move this discussion to a discussion and plan that meeting that @gbrail proposed a while ago

@rbri
Copy link
Collaborator

rbri commented Feb 24, 2022

https://www.hyrumslaw.com/
and check out the XKCD link at the bottom.

@gbrail
Copy link
Collaborator

gbrail commented Mar 2, 2022

I'm happy to discuss new versions elsewhere. Right now however we only have a few people contributing to Rhino -- only really three people right now are regularly contributing to new feature development, plus various people who drop in to fix a bug (and those drop-ins are really appreciated). But in addition to that:

  • I don't see how we have enough people working in Rhino to maintain two versions.
  • I also don't think that we have enough people for a Big Rewrite. (That's what the Nashorn team tried to do and now people are back to using Rhino in their Java projects.)
  • But I don't want to close the door too hard on people who currently embed Rhino so that they have a hope of upgrading.

So I don't personally see how we have any choice other than to keep moving forward with coming up to date with newer standards, and make pragmatic decisions about backward compatibility without necessarily promising that nothing would ever change.

Relevant to this particular PR, does anyone know how commonly the "Decompiler" is used externally? I do know that a number of projects, including some popular ones, use Rhino as a JavaScript parser, but I think that they mostly use the Parser and various AST classes. Perhaps this particular PR is one that breaks compatibility but not in what we'd consider to be a "core" part of the API.

@tuchida
Copy link
Contributor Author

tuchida commented Mar 2, 2022

Another way to convert AST back to source code is to use AstNode#toSource.
I tried to read this comment to see why Decompiler is needed, but I couldn't understand it.

* The following class save decompilation information about the source. Source information is
* returned from the parser as a String associated with function nodes and with the toplevel script.
* When saved in the constant pool of a class, this string will be UTF-8 encoded, and token values
* will occupy a single byte.
*
* <p>Source is saved (mostly) as token numbers. The tokens saved pretty much correspond to the
* token stream of a 'canonical' representation of the input program, as directed by the parser.
* (There were a few cases where tokens could have been left out where decompiler could easily
* reconstruct them, but I left them in for clarity). (I also looked adding source collection to
* TokenStream instead, where I could have limited the changes to a few lines in getToken... but
* this wouldn't have saved any space in the resulting source representation, and would have meant
* that I'd have to duplicate parser logic in the decompiler to disambiguate situations where
* newlines are important.) The function decompile expands the tokens back into their string
* representations, using simple lookahead to correct spacing and indentation.
*
* <p>Assignments are saved as two-token pairs (Token.ASSIGN, op). Number tokens are stored inline,
* as a NUMBER token, a character representing the type, and either 1 or 4 characters representing
* the bit-encoding of the number. String types NAME, STRING and OBJECT are currently stored as a
* token type, followed by a character giving the length of the string (assumed to be less than
* 2^16), followed by the characters of the string inlined into the source string. Changing this to
* some reference to to the string in the compiled class' constant pool would probably save a lot of
* space... but would require some method of deriving the final constant pool entry from information
* available at parse time.

@p-bakker
Copy link
Collaborator

p-bakker commented Mar 2, 2022

I tried to read this comment to see why Decompiler is needed, but I couldn't understand it

By all means I'm just guessing as well, but I think that the Decompiler approach may have been taken to save memory, by storing tokenized source instead of the original string value?

Judging by the history of the Decompiler file, it's contents was specifically made public so it could be called from outside org.mozilla.javascript, see cecdd47

Further exposing public stuff happened in 66cf82e

Having said that: I've not found any open source code that actually interacts with the Decompiler directly

@p-bakker
Copy link
Collaborator

p-bakker commented Mar 2, 2022

After getting harness to work in this way, some of the tests for Function.prototype.toString passed.

FYI I've managed to get the latest version of test262 to execute against Rhino, after implementing Regex unicode flag support and a work-around in the Test262TestSuite class when loading harness code that contains for/of loops with consts (which Rhino doesn't support yet). So once I've wrapped up the unicode flag support and it gets merged, we can start running against the latest test262 version

@gbrail
Copy link
Collaborator

gbrail commented Mar 12, 2022

I'd like to merge this but I'm still worried about removing Decompiler. I searched GitHub for "import org.mozilla.javascript.Decompiler" and I got about 4000 results, so at least some older code uses it.

It looks like Decompiler has only one public function -- what if you were to replace the whole class with a new class that only implements that one function using the new and improved toString method, or AstNode.toSource as you mentioned? It's OK with me if some of the flags no longer work, or if the output is different, but at least that doesn't take away a whole top-level public class.

If you can do that, then I think we should merge this.

@rbri
Copy link
Collaborator

rbri commented Mar 18, 2022

@tuchida i really like to try this with HtmlUnit - can you please remove the draft flag from this

@p-bakker
Copy link
Collaborator

@tuchida are you going to follow-up on the comments of @gbrail and @rbri, so this PR can move forward?

@p-bakker
Copy link
Collaborator

p-bakker commented May 17, 2022

@tuchida,

Its been a while, are you still on this? In #1188 (comment) @gbrail detailed what he thinks still needs to happen to this PR for it to become mergable

@tuchida
Copy link
Contributor Author

tuchida commented May 17, 2022

I am not making progress now. Sorry.

@p-bakker
Copy link
Collaborator

ok, understood. I am presuming that eventually you'll get back to this?

@tuchida
Copy link
Contributor Author

tuchida commented May 18, 2022

Yes.

@p-bakker
Copy link
Collaborator

p-bakker commented Aug 2, 2023

Hi @tuchida

I looked at the changes in this PR and it looks like they are two-fold:

  • there's the actual changes to change the inheritance and get rid of the decompiler
  • and there are code cleanup changes, that mostly get rid of code duplication

Any change you could split the two, so we can at least merge the later and then have a more focused discussion about the former?

@gbrail
Copy link
Collaborator

gbrail commented Aug 4, 2023

I'm still willing to address this, as it only has a small affect on compatibility. We can do one of two things:

  • If we can put back the "Decompiler" class with just one public method to call the refactored decompiler code, then we can update this PR and merge it for the next release, which should be 1.7.15.
  • If no one has time for that, then we can pull this in after that, as I expect that the next release could be 2.0 and this is a minor enough incompatibility that we could just do it.

@tuchida tuchida force-pushed the refactor-IRFactory branch 3 times, most recently from 16d3041 to dcfaa89 Compare August 8, 2023 14:51
@tuchida
Copy link
Contributor Author

tuchida commented Aug 8, 2023

Sorry for the late reply. This is a Pull Request from a long time ago, so my memory may be faulty.

  • and there are code cleanup changes, that mostly get rid of code duplication

@p-bakker As far as I can remember, there were no such changes.

  • If we can put back the "Decompiler" class with just one public method to call the refactored decompiler code, then we can update this PR and merge it for the next release, which should be 1.7.15.

@gbrail The public method of Decompiler.decompile, decodes encoded source code. A source code is encoded in IRFactory using Decompiler (ref. 166b42b). Since this Pull Request removed the encoding process and ast.getEncodedSource, leaving only Decompiler.decompile will not help maintain compatibility.

@rbri
Copy link
Collaborator

rbri commented Aug 8, 2023

Decompiler has been removed. This make that Function.prototype.toString behaves as per the ECMA262 specification.

Just for the records - did something similar.
HtmlUnit@f3e3667

Is in use in HtmlUnit for some weeks now. But it might introduce problems with the debugger....

@tuchida
Copy link
Contributor Author

tuchida commented Aug 13, 2023

This Pull Request can be split into these two.

  1. IRFactory no longer inherits from Parser. This eliminates Code reuse via inheritance.
  2. Decompiler has been removed. This make that Function.prototype.toString behaves as per the ECMA262 specification.

@tuchida
Copy link
Contributor Author

tuchida commented Aug 20, 2023

  1. IRFactory no longer inherits from Parser. This eliminates Code reuse via inheritance.

I have isolated this as a separate issue.

  1. Decompiler has been removed. This make that Function.prototype.toString behaves as per the ECMA262 specification.

This will not pass the test262 test if Regex u flag is not implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants