Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated inline grammar regexes for strong and em #1315

Merged
merged 4 commits into from Aug 15, 2018

Conversation

barrywoolgar
Copy link
Contributor

@barrywoolgar barrywoolgar commented Aug 2, 2018

Description

Contributor

  • Test(s) exist to ensure functionality and minimize regression; or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

  • Draft GitHub release notes have been updated.
  • CI is green (no forced merge required).
  • Merge PR

Attempt to match special case single character matches before the more permissive standard regexes
@barrywoolgar
Copy link
Contributor Author

barrywoolgar commented Aug 2, 2018

I am aware I need to add/update tests to this PR before it can be merged, but I haven't the time right now to figure them out so I'm creating this PR as a placeholder for me.

If there's somewhere that describes the dev setup and testing folders (which look many and varied!) please let me know, otherwise I'll figure it over the weekend or next week.

@styfle
Copy link
Member

styfle commented Aug 2, 2018

Hi @barrywoolgar thanks for the PR!

The contibuting doc has some information (it assumes you have node.js installed) so that should get you going with running the test suit.

The good news is, you didn't break any existing tests so that means all that we need is a test case for each issue you linked to 👍

@UziTech
Copy link
Member

UziTech commented Aug 2, 2018

If you fix a current spec test that is failing then all you need to do is remove it from shouldPassButFails

describe('CommonMark 0.28 Emphasis and strong emphasis', function() {
var section = 'Emphasis and strong emphasis';
// var shouldPassButFails = [];
var shouldPassButFails = [333, 334, 342, 348, 349, 352, 353, 354, 355, 356, 360, 368, 369, 371, 372, 378, 380, 381, 382, 387, 388, 392, 393, 394, 395, 396, 402, 403, 409, 416, 419, 420, 421, 422, 423, 424, 428, 431, 432, 433, 434, 435, 436, 443, 444, 445, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458];
var willNotBeAttemptedByCoreTeam = [];
var ignore = shouldPassButFails.concat(willNotBeAttemptedByCoreTeam);
cmSpec.forEach(function(spec) {
messenger.test(spec, section, ignore);
});
});

https://spec.commonmark.org/0.28/#example-333

@barrywoolgar
Copy link
Contributor Author

barrywoolgar commented Aug 2, 2018 via email

@barrywoolgar
Copy link
Contributor Author

Hi, thanks for the pointers. I've got the test suite running and I can't find an existing shouldPassButFails example that covers this case (all 57 existing ones continue to fail with and without the patch).

I assume I shouldn't be making changes to the commonmark spec, as that seems to reference external examples. InsteadI've added some examples (matching those reported in the issues) to /test/new/cm_strong_and_em.*.

Is this sufficient? Thanks

@styfle
Copy link
Member

styfle commented Aug 3, 2018

I confirmed this does fix the two issues and the new test looks good.

I'm a little surprised that it didn't pass test 333 but I tried it myself and it's still failing on this branch.

CommonMark 0.28 Emphasis and strong emphasis should pass example 333
  Message:
    CommonMark (Emphasis and strong emphasis):
    a*"foo"*
    
    ------
    
    Expected:
    <p>a*&quot;foo&quot;*</p>
    
    ------
    
    Marked:
    <p>a<em>&quot;foo&quot;</em></p>

Given that this fixes some bugs, I think it's still worth merging as-is.
@UziTech Any more thoughts on this one?

@@ -540,8 +540,8 @@ var inline = {
link: /^!?\[(label)\]\(href(?:\s+(title))?\s*\)/,
reflink: /^!?\[(label)\]\[(?!\s*\])((?:\\[\[\]]?|[^\[\]\\])+)\]/,
nolink: /^!?\[(?!\s*\])((?:\[[^\[\]]*\]|\\[\[\]]|[^\[\]])*)\](?:\[\])?/,
strong: /^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)|^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)/,
em: /^_([^\s][\s\S]*?[^\s_])_(?!_)|^_([^\s_][\s\S]*?[^\s])_(?!_)|^\*([^\s][\s\S]*?[^\s*])\*(?!\*)|^\*([^\s*][\s\S]*?[^\s])\*(?!\*)|^_([^\s_])_(?!_)|^\*([^\s*])\*(?!\*)/,
strong: /^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)|^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe

lib/marked.js Outdated
strong: /^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)|^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)/,
em: /^_([^\s][\s\S]*?[^\s_])_(?!_)|^_([^\s_][\s\S]*?[^\s])_(?!_)|^\*([^\s][\s\S]*?[^\s*])\*(?!\*)|^\*([^\s*][\s\S]*?[^\s])\*(?!\*)|^_([^\s_])_(?!_)|^\*([^\s*])\*(?!\*)/,
strong: /^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)|^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)/,
em: /^_([^\s_])_(?!_)|^\*([^\s*])\*(?!\*)|^_([^\s][\s\S]*?[^\s_])_(?!_)|^_([^\s_][\s\S]*?[^\s])_(?!_)|^\*([^\s][\s\S]*?[^\s*])\*(?!\*)|^\*([^\s*][\s\S]*?[^\s])\*(?!\*)/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe

@davisjam
Copy link
Contributor

davisjam commented Aug 3, 2018

The regexes are safe. If the tests don't break then I'm happy.

@barrywoolgar
Copy link
Contributor Author

I've been rather taken by investigating the shouldPassButFails arrays, so if I've made some of them pass with further changes to the above lines would you like me to add more commits here or is there some way to submit a new PR without this one being merged first?

@UziTech
Copy link
Member

UziTech commented Aug 3, 2018

If you are willing to fix some of the tests we can wait on merging the PR (best solution IMO).

Or if you want we can merge this PR and you can submit a new PR with the improvements

@UziTech
Copy link
Member

UziTech commented Aug 3, 2018

Adding [WIP] to the beginning of the title of this PR will let us know to wait on merging it.

@barrywoolgar barrywoolgar changed the title Updated inline grammer regexes for strong and em [WIP] Updated inline grammer regexes for strong and em Aug 3, 2018
@barrywoolgar
Copy link
Contributor Author

Thanks, apologies to the reviewers who will presumably need to do it all again when I'm done.

@Feder1co5oave
Copy link
Contributor

+1

lib/marked.js Outdated
strong: /^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)|^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)/,
em: /^_([^\s][\s\S]*?[^\s_])_(?!_)|^_([^\s_][\s\S]*?[^\s])_(?!_)|^\*([^\s][\s\S]*?[^\s*])\*(?!\*)|^\*([^\s*][\s\S]*?[^\s])\*(?!\*)|^_([^\s_])_(?!_)|^\*([^\s*])\*(?!\*)/,
strong: /^__([^\s])__(?!_)|^\*\*([^\s])\*\*(?!\*)|^__([^\s][\s\S]*?[^\s])__(?!_)|^\*\*([^\s][\s\S]*?[^\s])\*\*(?!\*)/,
// em: /^_([^\s_])_(?!_)|^\*([^\s*])\*(?!\*)|^_([^\s][\s\S]*?[^\s_])_(?!_)|^_([^\s_][\s\S]*?[^\s])_(?!_)|^\*([^\s][\s\S]*?[^\s*])\*(?!\*)|^\*([^\s*][\s\S]*?[^\s])\*(?!\*)/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this before acceptance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, sorry about that. Suddenly very busy at work but will tidy up and push my progress asap.

@styfle
Copy link
Member

styfle commented Aug 15, 2018

Are we just waiting on removing a comment in this PR?
If so, we can merge and create a new PR to remove the comment 😄

@joshbruce I think this should be in 0.5.0 too

@styfle styfle added this to the 0.5.0 - Commonmark Compliance milestone Aug 15, 2018
@barrywoolgar
Copy link
Contributor Author

Sorry for the delay, work and personal life have got in the way a bit here. I've fixed the commented line, and pushed another commit that fixes the disabled tests for 333, 450, 452, 505, and 535.

I was making some progress on some others but I'll have to submit those under a new PR as I doubt they'll be done anytime soon now.

I have found that trying to read/debug a single regex that covers both the __ and ** formats is quite difficult, so I fell down a bit of a hole trying to separate those before continuing development.

This was referenced Apr 6, 2020
zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this pull request Nov 8, 2021
Updated inline grammar regexes for strong and em
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants