Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support markdown #2943

Merged
merged 90 commits into from Oct 11, 2017
Merged

feat: support markdown #2943

merged 90 commits into from Oct 11, 2017

Conversation

ikatyang
Copy link
Member

@ikatyang ikatyang commented Sep 30, 2017

Main:

  • Use remark-parse with commonmark option enabled (GFM is based on commonmark)
  • Commonmark Spec
  • enforce
    • header: #-style
    • emphasis: _
    • strong: **
    • listItem: -(always), +(switch between two style for continuous lists)
    • listItem (ordered): 1. (always), 1)(switch between two style for continuous lists)
    • thematicBreak: - - -(always), * * * (in list)
    • indented code block: 4 whitespaces
    • fenced code block: ```(always), ~~~(in js template)
    • table: leading/trailing| and align table cell separator and respect alignment
  • fill word by word (whitespace-based).
  • See the formatted part of the snapshot file.
    export[xxx] = `
    
    unformatted
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    formatted
    
    `;

Side:

  • extend align's n to be string available. (use for quoteblock)

Question:

  • Are the styles above ok?
  • Should name this parser markdown (current) or remark? Ans: markdown

NOTE:

  • the content of code/html/yaml still remains the same.
  • does not support break (trailing two spaces).
  • remark is available on astexplorer.

TODO:

  • <!-- prettier-ignore -->
  • format the content of code
  • escape html entities (<, >, &) html entities remain the same (&lt;, etc.).
  • pass the test case: https://raw.githubusercontent.com/adamschwartz/github-markdown-kitchen-sink/master/TEST.md
  • add a special case for <!-- prettier-ignore --> not to print additional line break
  • use ~~~-style code block for markdown in template string
  • pass all the examples in the spec using AST_COMPARE (607/616)
  • figure out a better way to escape necessary characters
  • anything else?

Closes #2444

@ikatyang
Copy link
Member Author

ikatyang commented Oct 7, 2017

@Graham42

That's a test snapshot file, the top part is unformatted, the bottom part is formatted. 😅

Looks like:

export[xxx] = `

unformatted
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
formatted

`;

@Graham42
Copy link

Graham42 commented Oct 7, 2017

O! well that explains it. That looks much prettier 💯

@azu
Copy link
Contributor

azu commented Oct 11, 2017

I have a question.
Does this markdown formatter support multi-bye language like Japanese or emoji?

It seem that current implementaion has two problems.

First, current implementation will be broken in Chinese-Japanese-Korean(CJK) and emoji.
Because, some printing algorithm depened on string.length.

For example, "❤️".length is 2. Following algorithm jolt out of alignment.

This issue is known as East Asian Width or Unicode problem.

I know that East Asian Width problem is very difficult.( I don't know perfect solution...)
Some library get unicode length.

Second, Following tokenizer can not tokenize non-English text.
Because, non-English text like cheniese doesn't put a space between words.

    .split(/(\s+)/g)

It will be resolved by using tokenizer like nlcst-parse-japanese, parse-english, and rakutenma.
But, It is not realistic and is not perfect. because toknizer is heavy weight(File size is large and Parse speed is slow)

Toknizer example is Text Tokenizer · Hivemall User Manual

Thanks.

@ikatyang
Copy link
Member Author

@azu

Thanks for the suggestion, I've already thought about it and just merged #3003 to fix the printer first, the CJK support is working in progress now, but it should be in a separate PR I think, this PR is somehow too large.

I think there's no need to use tokenizer for CJK, since AFAIK they can be broke in any place, and that's how CJK books printed, e.g.

一串很長很長很長很長很長很長很長很長很長很長很長的中文字,一串很長很長很長很長很長很
長很長很長很長很長很長的中文字,一串很長很長很長很長很長很長很長很長很長很長很長的中
文字,一串很長很長很長很長很長很長很長很長很長很長很長的中文字。

For the string width thing, use string-width should be enough, but we have to disable the stripAnsi feature.


@azz

Do we have any concern to merge this PR? I'd like to send some followed up PRs instead of pushing to the current branch since this PR is somehow too large to review easily.

@azz azz merged commit 9f6f3e7 into prettier:master Oct 11, 2017
@azz
Copy link
Member

azz commented Oct 11, 2017

I've been incrementally reviewing the changes so we're fine to merge!

@ikatyang ikatyang deleted the feat/markdown branch October 11, 2017 23:04
@vjeux
Copy link
Contributor

vjeux commented Oct 12, 2017

So cool!

@nhoizey
Copy link

nhoizey commented Oct 12, 2017

Amazing!

Is it possible to also support Mardown dialects, such as Kramdown, used by Jekyll?

@ikatyang
Copy link
Member Author

@nhoizey

It seems kramdown does not follow the commonmark spec so that it should have its own printer instead of merging into this one.

For new language support, see #3017 (comment).

@nhoizey
Copy link

nhoizey commented Oct 13, 2017

@ikatyang thanks, I'll see what I can do!

@lipis
Copy link
Member

lipis commented Oct 13, 2017

Now we should format all *.md files in this project! :)

@lydell
Copy link
Member

lydell commented Oct 13, 2017

@lipis Good idea!

@lipis
Copy link
Member

lipis commented Oct 13, 2017

#3022 quite a few things to take care of though

@azz
Copy link
Member

azz commented Oct 14, 2017

Big Data research part two: https://bigquery.cloud.google.com/savedquery/652929483875:69be015faec8444c8a2e7d055eeb32f8

* item - item + item
642,106 413,627 10,954
type regex
* item /^\s*\*\s/gm
- item /^\s*-\s/gm
+ item /^\s*\+\s/gm

* item wins!

@ikatyang
Copy link
Member Author

ikatyang commented Oct 14, 2017

There's also the +-style list item, we should do research for it too, and take the top two results to be used in:

listItem: -(always), +(switch between two style for continuous lists)

(current)

- 123
- 123

+ 456
+ 456
<ul>
  <li>123</li>
  <li>123</li>
</ul>
<ul>
  <li>456</li>
  <li>456</li>
</ul>

@azz
Copy link
Member

azz commented Oct 14, 2017

+ is not very popular. Edited my previous comment.

contents.push(rowContents);
}, "children");

const columnMaxWidths = contents.reduce(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also limit the max length of the columns. For example, All-Contributors tables contain HTML code and table dash columns processed by Prettier get very very long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a new issue so we can track this better?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done #4195

@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jul 5, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jul 5, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet