Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autofixes in processors #7510

Closed
ilyavolodin opened this issue Nov 1, 2016 · 32 comments
Closed

Autofixes in processors #7510

ilyavolodin opened this issue Nov 1, 2016 · 32 comments
Assignees
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint

Comments

@ilyavolodin
Copy link
Member

I'd like to reopen discussion about support for autofixing in processors. Now that autofixing is stabilized and we have more rules that can be fixed automatically, it feels like it would be helpful to support it for processors. This is continuation of the discussion from #5121
My proposal is to create an additional function called autoFix in the processor definition. It will take two arguments, an array of strings with fixed text and filename. It's job would be to combine passed in fixed text and original file text and return complete text of the modified file. ESLint will, at that point, write that file to disk.
I'll open WIP PR with the changes. My only concern is using autofix in processors through API. Since API just gets back a list of messages, and not fixed text, location of those messages will be off. I'm not sure what would be a good approach to fix this issue, we can either strip autofix information from the messages, or require yet another function from the processor to fix location information for messages (or maybe require that postprocess fix not only message locations, but also autofix locations as well?).

@ilyavolodin ilyavolodin added enhancement This change enhances an existing feature of ESLint core Relates to ESLint's core APIs and features needs bikeshedding Minor details about this change need to be discussed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Nov 1, 2016
@platinumazure
Copy link
Member

platinumazure commented Nov 1, 2016

I don't think postprocess can fix message locations and auto-fix locations. Basically, auto-fix locations would need to be fixed first, then fixes will be applied (and as we know, not all fixes are actually applied), and then the other message indexes would need to be adjusted before being sent to the normal postprocess flow.

My original proposal simply assumed that range information would need to be modified in fix objects (possibly by a processFixLocations or similar method in processors), and then fixes would be applied (and other message locations changed), and then postprocess would be called. Looking forward to seeing what you're proposing in your PR.

Previous proposal (in case anyone wants to see the history): #5121

@nzakas
Copy link
Member

nzakas commented Nov 1, 2016

Before modifying processors to do more, I'd actually like to rethink them completely. The way they work now is a bit confusing, both in the way they are defined and how they are automatically applied as soon as the plugin is loaded. I know this is tangential to the topic at hand, I'd just prefer not to build on top of what seems like a slightly broken paradigm. If we're going to open the hood, let's see how we can make things better and that may, in turn, make it easier to do autofixing (which I do think we should figure out how to do).

@ilyavolodin
Copy link
Member Author

OK. Fair enough. Although I don't think those two issues intersect much. It's not hard to come up with the proposal (or even implementation) of a new format for processors. My concern is mostly around backwards-compatibility. If we will have to support both formats for a while, that would be tough...

@nzakas
Copy link
Member

nzakas commented Nov 2, 2016

I don't think we'd preserve backwards compatibility if we redo processors.

Just so I can get a better sense for what you're thinking, can you show an example of how a processor would use your proposed API in this issue?

@ilyavolodin
Copy link
Member Author

ilyavolodin commented Nov 2, 2016

@nzakas Sure. Instead of providing an object with two properties (preprocess and postprocess), it will now have to provide three properties (preprocess, postprocess and autoFix). First two stay exactly the same as they where before. A simple example of autoFix function is listed below:

// suppose that array below is generated by preprocess function and describes locations of each extract block.
var codeBlocksMap = [{
    start: 152,
    end: 835
   }, { ... }
];

function autoFix(fixedCode, filename) {
  var output = originalFileContent;
  fixedCode.forEach(function (codeBlock, index) {
    output = output.substr(0, codeBlocksMap[index].start) +
      codeBlock + output.substr(codeBlocsMap[index].end + 1);
  }
  return output;
}

Which basically means, autoFix function is going to take an array of fixed string, restore the content of the file by substituting original parts of the file with the fixed parts, and return complete content of the file back. At that point, ESLint is going to take over and save the file back to the disk.

@zeptonaut
Copy link

For what it's worth, this has been a huge problem for adoption of eslint for Google's Catapult project (the UI for developer tracing w/in Google Chrome). The project uses Polymer heavily and thus requires the html plugin/processor. Turning on the "1 true brace style" resulted in something like 5000 style violations in the project, which would require a massive effort to fix.

@nzakas, is there any way that we could add the ability for --fix to work with processors in a way that's consistent with how you want processors to work in the future? I'm concerned that if we couple a non-backwards-compatible remake of processors with the ability to --fix, that means that we'd have to:

  1. Do the remake of processors for eslint
  2. Add support for --fix to the processor engine
  3. Wait for processor-creators to add support for --fix to their processors

all before --fix was available to processor-users, which could ultimately take years. If we just do #2 in a way that's compatible with doing #1 in the future, this process could move a lot faster.

@zeptonaut
Copy link

(Also adding @platinumazure, who I've discussed this briefly with before and who expressed interest in a possible fix.)

@ilyavolodin
Copy link
Member Author

@zeptonaut The way I suggested above to implement --fix for processors will require processor developer to modify their package (needs an extra function that can take fixed output and merge it back with original code). So there are some work involved even if enable autofixing for processors as is (without redesign of processors architecture).
But even if we change architecture, I don't see it as a huge change, and for the most part, code of the existing processors can be reused (just needs to be reshuffled a bit).

@platinumazure
Copy link
Member

I want to throw one more thing out for possible discussion-- it's not critical to processor auto-fix directly, but it could help with plugin development and I think it's worth considering if we're talking about redesigning processors.

I really think it would be awesome if ESLint (or a new repo in the organization) provided APIs that basically represented string transforms. Basically, the APIs would be very similar to using either String.prototype.replace or String.prototype.split, but the returned objects would have some extra properties tracking line/column/range of the source text as well as of the transformed text sent to ESLint. And then these objects would also expose methods for transforming line/column or range back and forth. In that sense, plugin developers would be able to use basically the same postprocess logic, as well as fix (or whatever API we create for handling auto-fixing), without having to think too hard about it for their plugin.

If we're already thinking about rewriting processors, can we do anything to support processor developers to make their task easier?

@ilyavolodin
Copy link
Member Author

@platinumazure I think that's a good idea for a stand-alone package. I don't think it should be part of ESLint, however. But it would definitely be helpful to processor developers.

@zeptonaut
Copy link

@platinumazure that definitely seems like it'd be useful, but I agree with @ilyavolodin (with my limited knowledge) that it's probably an issue that is (and should be) decoupled from this one.

@zeptonaut
Copy link

zeptonaut commented Nov 18, 2016

The main reason that I'm concerned with @ilyavolodin's proposal is that it forces the responsibility for accurately applying the fixes onto each plugin owner, whereas the logic for each of those is likely to be very similar.

I have a slightly different straw man proposal:

Are there any known plugins where the line given to eslint doesn't map directly to a line in a source file, plus some configurable offset? It seems like this is by far the most common case. For example, consider this HTML file:

<!-- index.html -->
0| <html>
1|   <head><title>My title</title></head>
2|   <body>
3|     <script>
4|       let my_var = 0;
5|     </script>
6|   </body>
7| </html>

The HTML plugin will just give the following to eslint:

{ 
  // The code that's returned from the processor.
  source: ["let my_var = 0;"], 
  // Lets eslint known that line 0, character 0 in `source` translates to 
  // line 4, character 6 in index.html.
  source_map = ["0,0:4:6"] 
}

If you're familiar with Javascript source maps that are used to help, for example, Chrome devtools debug compiled code, the concept is very similar. If no source_map is passed to eslint, then eslint can just say that --fix isn't supported for that plugin. In fact, if we find that the source map itself gets too big (what happened with real Javascript source maps), we could provide a library like the one that @platinumazure suggested that generates the compact source maps mentioned in the article above for plugin developers.

@ilyavolodin
Copy link
Member Author

@zeptonaut This is really limiting approach which will basically disable any ability to fix large number of possible processors. Consider two examples below:

index.aspx
<html>
  <head><title>My title</title></head>
  <body>
    <script>
      let my_var = '<%= myServerSideVariable %>';
    </script>
  </body>
</html>
<html>
  <body>
    <script>
      let a = 22;
    </script>
    <script>
      a += 1;
      console.log(a);
    </script>
  </body>
</html>

Neither of the two examples would be fixable with your proposal, because in the first example we need to substitute `<%= myServerSideVar %> with something parsable in JavaScript. And in the second example, we want to combine two blocks of code and serve it as a single input into ESLint.

@zeptonaut
Copy link

@ilyavolodin Thanks - I appreciate the feedback on this idea.

For your first example, what sort of transformation would a .aspx plugin normally do before handing this off to eslint? As you mentioned, the part within the script tag right now isn't parsable by Javascript or eslint, so I'm not familiar what type of work the plugin would normally do before handing this off to eslint.

As for the second block, I think that my approach would handle that well. In terms of the object returned in my previous example, it would look something like:

{ 
  // The code that's returned from the processor.
  source: [
    "let a = 22;",
    "a += 1;",
    "console.log(a);"
  ], 
  source_map = [
    "0,0:3,6" // "let a = 22;" begins at line 3, character 6 of the HTML file
    "0,0:6,6" // "a += 1" begins at line 6, character 6 of the HTML file
    "0,0:7,6" // "console.log(a);" begins at line 7, character 6 of the HTML file
  ] 
}

Maybe what you meant was that if, instead of having three separate strings in the source array, you just had:

source: ["let a = 22;\na += 1;\nconsole.log(a);"]

then it wouldn't work. I think that the way to handle this would be to allow some way in the source map to say "different parts of this source come from different parts of the raw file", which is how Javascript source maps handle this problem. This could be done trivially with something like:

{ 
  // The code that's returned from the processor.
  source: ["let a = 22;\na += 1;\nconsole.log(a);"],
  source_map = [
    // "let a = 22;" begins at line 3, character 6 of the HTML file
    // "a += 1" begins at line 6, character 6 of the HTML file
    // "console.log(a);" begins at line 7, character 6 of the HTML file
    "0,0:3,6 | 1,0:6,6 | 2,0:7,6"
  ] 
}

In this format, each element in source_map corresponds to the element in source with the same index. So if:

source_map = [
  "0,0:3,6 | 1,0:6,6",
  "0,0:2,4"
]

then it means:

  • Line 0, character 0 of source_map[0] corresponds to line 3, character 6 of the raw file.
  • Line 1, character 0 of source_map[0] corresponds to line 6, character 6 of the raw file.
  • Line 0, character 0 of source_map[1] corresponds to line 2, character 4 of the raw file.

This should allow you to have a single source entry that comes from multiple places in the raw file.

Does that make sense?

@platinumazure
Copy link
Member

@zeptonaut Your basic idea makes sense and I like it. I just want to point out that in the current implementation, returning an array of sources implies that each source is independent, so in plugins like eslint-plugin-html and eslint-plugin-microtemplates (where a value could be declared in global scope in one block, and then used in another block), those have to export one source.

That said, if we want to allow a "source" to be an array of strings, such that a multiple-block processor could send an array of arrays of strings to ESLint, then I think source maps could be supported in the way you propose. I would certainly love to be able to emit multiple "source parts" from my microtemplates plugin rather than having to concatenate everything together and remember the mappings.

@zeptonaut
Copy link

Ah, okay. I was thinking that emitting multiple sources per plugin per source file was already supported. Maybe we should focus now on making a solution that works for the current situation but could easily be extended to allow emission of multiple source strings in the future? (I think this solution might fit that bill.)

@ilyavolodin
Copy link
Member Author

@zeptonaut What I was trying to show with the examples above is that there might be many different things that processors would do, and we can't predict them all. Somebody even managed to write a processor that under the hood executes JSHint just so they could lint JSON files. While we provide facilities to make processors happen, we don't want to either do too much, or limit possible processors in the future (those two are almost always have direct correlation).
I think you might've slightly misunderstand what I was suggesting above. ESLint will still apply all of the fixes, it will just be up to processor to take fixed text and convert it back to the full original file. That could be done in many ways. JavaScript Source Maps that you are suggesting is just one way of doing it, and it will be up to the processor creator to figure out if they want to use that method or some other way of doing things. Basically, in my proposal above, ESLint will continue to do all the normal things you would expect from the linter/fixer, and leave the in-between to the processor. To clarify the process that I'm suggesting here's the list of things happening and who is going to be doing them:

  1. Read a file (ESLint)
  2. Convert content of the file and return either an array of a string with parsable JavaScript code (processor - preprocess function)
  3. Lint provided code according to the local configuration (ESLint)
  4. Provide fix information for the code (ESLint)
  5. Apply fixes to the provided code (ESLint)
  6. Take output of the fix code and insert/mutate it to the original format, return an single string with full content of the file (processor - autoFix function)
  7. Write file back to the disk (ESLint)
  8. Map errors that were not fixed to the right positions (processor - postprocess function)
  9. Output error messages (ESLint)

In the step number 6 is where you can user JavaScript source maps to restore the full content of the file based on the output of ESLint autofix. But it will be up to the processor creator to do that. Somebody could write a reusable library (like @platinumazure suggested) that would do all of the processor steps automatically (or almost automatically).

@zeptonaut
Copy link

zeptonaut commented Nov 18, 2016

Fair enough: I guess if an easy-to-use library can be written that handles the source mapping, I'd be all for deferring that logic to such a library rather than sticking it in eslint proper.

With the proposed contract, though, I think it'd be exceedingly hard to write a library that manages this. Suppose that you have an HTML file:

<html>
  <script>
    var a = 0;
    a++;
    console.log(a);
    var b = 0;
  </script>
  <body><h1>Foo</h1></body>
  <script>
    var c = 4;
    console.log(c);
  </script>
</html>

and you give eslint:

var a = 0;
a++;
console.log(a);
var b = 0;
var c = 4;
console.log(c);

and eslint gives autofix back:

let a = 0;
a++;
console.log(a);
let c = 4;
console.log(c);

how do you go about merging that without additional metadata given to you by eslint about where that code came from? It seems like trying to use diffs to apply these changes is likely to be exceedingly difficult for plugin authors to get right.

@ilyavolodin
Copy link
Member Author

@zeptonaut And that example is exactly why I don't want to bring mapping functionality into ESLint. There are a lot of different ways to handle the code above. You might want to combine it into a single <script> block, and remove all other script tags. You might want to put in comments in between blocks of code and split back based on those comments. Or you might choose to send it as multiple code blocks into ESLint.
ESLint is not aware of anything other than pure JavaScript (and JSX), so it shouldn't be it's job to try to figure this out. It should be up to the processor to decide what to do. It's also processors job to keep all necessary metadata to be able to reconstruct original format.

@zeptonaut
Copy link

I'm not sure that I understand. Given the above situation, how would a plugin author even go about reconstructing the code correctly?

@platinumazure
Copy link
Member

@zeptonaut It's not pretty, but you can check out platinumazure/eslint-plugin-microtemplates to see how I'm trying to manage it. It can definitely be improved and I think ESLint (or at least a project in the ESLint org) could help, but I also agree that ESLint shouldn't try too hard to get involved with that tracking because of the sheer diversity of input files that might need to be processed by ESLint.

@zeptonaut
Copy link

zeptonaut commented Nov 21, 2016

@platinumazure I now see the advantage of having a single file being able to emit an array of strings, each of which aren't independent.

How does eslint handle this problem when one file uses globals in another? Do you know which rules are most affected by knowing the contents of other parts of the file? It seems like all of the problems associated with emitting multiple strings per file would also be just as applicable to importing javascript from other files, which eslint obviously already handles.

@platinumazure
Copy link
Member

platinumazure commented Nov 21, 2016

@zeptonaut ESLint actually does not know how to use the contents of other files when linting one file. Instead, we have some constructs that just tell the linter how to behave in the face of uncertainty.

Examples:

  • no-undef will read a /* global */ comment to identify a variable as a global which was defined somewhere else, and thus not complain. (It also uses globals defined in your ESLint configuration.)
  • no-unused-vars will read an /* exported */ comment to identify a variable that is only defined in one file (and, thanks to the comment, assumed to be used elsewhere) and thus not complain.
  • Require and (named) import both define variables, so no special handling is needed there.

In any case, an HTML file (whether processed by eslint-plugin-html, eslint-plugin-microtemplates, or something else) would still need /* global */ and similar directives when a variable is actually defined or used completely outside of the original HTML file. So ESLint doesn't handle those cases any differently than it does with non-preprocessed files.

@zeptonaut
Copy link

Got it. Given all of this, I think that @ilyavolodin's original proposition makes sense: basically allow plugin authors figure out how to merge different pieces of the same HTML file (like they do now), and have eslint hand them a --fixed version of the file back and have them be responsible for piecing it back together.

I think that the difficulty of this will likely discourage some plugin authors from actually implementing --fix, but I'm convinced now that there's no better way that really aligns with eslint plugins' goals.

@platinumazure
Copy link
Member

platinumazure commented Nov 21, 2016

@zeptonaut

I think that the difficulty of this will likely discourage some plugin authors from actually implementing --fix, but I'm convinced now that there's no better way that really aligns with eslint plugins' goals.

Well, this was my motivation for suggesting a stateful transform library earlier-- even adjusting line and column numbers in the current postprocess method is annoying right now, and it seems to me like the technique should be basically the same for all plugins that are just extracting JS snippets from their input files.

@ilyavolodin
Copy link
Member Author

@zeptonaut @platinumazure I think you guys I mostly focusing on simple processors (HTML, Vue, etc.) where you can just extract JavaScript directly as a subset of a file. But processors can be a lot more complicated. For example, some format that has variable substitution or a template that gets compiled into JavaScript code, or maybe somebody would be crazy enough to write a processor that converts ActionScript to JavaScript. While the library that you are talking about would be helpful to cover simple cases of extraction, you can't predict what other things people might want to do... (To be clear, I'm not opposed to a library like that. Although you could probably just use JavaScript source mapping library for that. I also don't think this library should be part of ESLint org. But it will most likely be pretty helpful to some processor creators out there).

@zeptonaut
Copy link

I'm fine with that: I see no problem with eslint requiring folks that need the library to include it from an external source.

@BenoitZugmeyer
Copy link
Contributor

BenoitZugmeyer commented Jan 29, 2017

Author of eslint-plugin-html here. I recently released a new version of my plugin with --fix support, and I'd like to have a word here.

I am not very proud of the way I rewrote the plugin to support the --fix option: basically, it doesn't follow the plugin ESLint API conventions at all. Instead of exposing pre and postprocessor functions, it patches some internal ESLint functions, so ESLint won't consider it as an actual plugin and will run its source code fixer logic instead. I realise this approach is quite risky as it may break in any future version of ESLint but it worked correctly for a long time now.

Anyway, let's focus on the topic of the issue. My vision is quite simple: plugin developers currently have to remap message lines and columns in the postprocess method. A solution would be to remap fix ranges in as well, so the fixing process can be applied to the source file content exactly like it currently works. How would they do that is up to them: they could use a sourcemap library, or anything else. For example, converting ActionScript to JavaScript would require a library, and this library is likely to provide an "original location" support that could be used in the postprocess method.

This behavior could be "opt-in": a plugin with fix range remap support could export a fixable: true property.

Maybe using locations (line/column) everywhere (message start/end and fix range) instead of offsets or conversely would be simpler for plugin developers. I initially tried to use source-map to get original message locations, but it doesn't provide a way to retrieve original offset (for fix ranges). That's why I wrote a small library similar to magic-string that fits my needs.

@ilyavolodin
Copy link
Member Author

@BenoitZugmeyer If I'm understanding it correctly, what you are proposing is very close to what I suggested above. Except I separated autofixing into it's own function (autoFix), instead of using postprocess for two purposes.

@BenoitZugmeyer
Copy link
Contributor

Not really, no. The postprocess function wouldn't change its return value, it's still an array of message objects. I'll try to illustrate by using your steps:

  1. Read the original code from a file (ESLint)
  2. Convert the original code and return an array of a string with parsable JavaScript code (processor - preprocess function)
  3. Lint provided code according to the local configuration (ESLint)
  4. Provide fix information for the code (ESLint)
  5. Map messages position (line/column/endLine/endColumn) and fix ranges (fix.range) (processor - postprocess function)
  6. Apply fixes to the original code (ESLint)
  7. Write file back to the disk (ESLint)
  8. Output error messages (ESLint)

I can provide a prototype PR if it makes things easier to understand.

@ilyavolodin
Copy link
Member Author

Ahh... I see. So basically you want postprocess to map all of the locations in the context.report. Hmm... That's not a bad idea. I would need to look at the code for autofixing and see if it's possible, due to the fact that preprocess provides only subset of the file's content to ESLint, and I'm pretty sure that's what will be used to do autofixing (i.e. ESLint doesn't re-read content of the file before applying autofixes, it uses text that was passed in to it initially, which in the case of processes was sanitized by preprocess function).

@not-an-aardvark
Copy link
Member

This was accepted in today's TSC meeting.

@not-an-aardvark not-an-aardvark added accepted There is consensus among the team that this change meets the criteria for inclusion and removed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion needs bikeshedding Minor details about this change need to be discussed labels Sep 14, 2017
@not-an-aardvark not-an-aardvark self-assigned this Sep 14, 2017
@eslint-deprecated eslint-deprecated bot locked and limited conversation to collaborators Mar 15, 2018
@eslint-deprecated eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Mar 15, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint
Projects
None yet
Development

No branches or pull requests

6 participants