JoshyPHP I'll look into it this week. The superscript behaviour (especially abc) is mostly based off Reddit's behaviour at the time it was implemented. I might make the implementation aware of word boundaries based on ASCII punctuation but I want to think about it carefully. Many Markdown implementations establish word boundaries based on ASCII letters, which can cause issues with non-Latin languages.
I mentioned my post from two years ago only because you asked for other similar discussions and that one I remembered. I'm absolutely fine with the behaviour that we have right now. Superscripts can be closed as expected.
matteocontrini Markdown does not have underlined text in general
JoshyPHP I wish that was the case as well but almost every Markdown implementation out there [...] already uses _ for emphasis.
95% of my users have never heard of Markdown; the other 5% have heard of it but don't care one iota whether it's implemented properly, improperly or not at all (with some other ruleset instead). I know that other communities are different, I just wanted to explain where I'm coming from.
I would love to have the option of defining my own rules. As this doesn't seem like a realistic prospect, I will probably try to write my own markup extension by forking the existing Markdown extension. I have no idea how difficult this will be, though.
To explain my ideal solution in more detail: as my highest priority I would like to have bold, italic, underline, subscript and superscript with single delimiting characters, as well as quotes, links and embedded images. Second highest priority would be ordered and unordered lists, and probably tables. Third highest priority would be headings, horizontal rules and code; they are hardly ever used by anyone except me, and I use code tags only for troubleshooting the current Markdown.
JoshyPHP Yes but I don't remember it exactly. In the original Markdown they behaved identically but over time it became apparent that users almost never meant to emphasize words that way. Most often, users would type some_random_words and didn't want the middle part to be emphasized. On the other hand, when they used * it was almost always to signify emphasis.
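A rough sketch of that rule, in Python for illustration only. This is not how any particular implementation does it, and `emphasize_underscores` is a made-up name. The idea is to treat _ as an emphasis delimiter only when it isn't glued to a word character on its outer side. Python's `\w` is Unicode-aware for str patterns, so non-Latin words get the same treatment as ASCII ones.

```python
import re

# An underscore opens emphasis only if it is NOT preceded by a word character,
# and closes it only if it is NOT followed by one. The span must also start
# and end with a non-space character.
_UNDERSCORE_EM = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def emphasize_underscores(text: str) -> str:
    """Wrap _word_ in <em>, leaving intraword underscores untouched."""
    return _UNDERSCORE_EM.sub(r"<em>\1</em>", text)


print(emphasize_underscores("some_random_words"))
print(emphasize_underscores("this is _important_ here"))
```

With this rule some_random_words comes out unchanged, while _important_ gets wrapped in <em>.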
I understand, so this problem would arise as well if underscores denoted underlined text.
JoshyPHP That could be a bug. I'll look into it. What's the expected output? Is that from another implementation? Edit: not a bug, it's just a case of undefined behaviour. The first two tildes are paired with the next two tildes so you end up with something like <s><sub>m</s>m~.
Well, the expected outcome would be to close tags in the reverse order of their opening: <sub><s>m</s>m</sub>. But I'm aware that this would require regex lookbehinds, probably somewhat costly.
On the other hand, in the case of bold and italics it works as expected:
***m**m* >> mm, ***m*m** >> mm
The first sequence becomes <em><strong>m</strong>m</em>, the second <strong><em>m</em>m</strong>.
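For what it's worth, here is a small Python sketch of how the same reverse-order closing could work for tildes. It is only an illustration under my own assumptions, not the library's parser, and `render_tildes` is a hypothetical name. Instead of lookbehinds it keeps a stack of open delimiters, and when it sees an opening run of three tildes it peeks at the next tilde run to decide which tag has to be the inner one.

```python
import re

TAGS = {"~": "sub", "~~": "s"}  # delimiter -> HTML tag (assumed mapping)


def render_tildes(text: str) -> str:
    """Render ~sub~ and ~~strike~~ so that tags close in reverse order."""
    tokens = re.findall(r"~+|[^~]+", text)  # tilde runs and plain text
    out, stack = [], []  # stack holds open delimiters, innermost last
    for i, tok in enumerate(tokens):
        if tok[0] != "~":
            out.append(tok)
            continue
        run = tok
        # Close open tags, innermost first, for as long as the run matches.
        while run and stack and run.startswith(stack[-1]):
            delim = stack.pop()
            out.append(f"</{TAGS[delim]}>")
            run = run[len(delim):]
        if run == "~~~":
            # Whichever delimiter closes first must be the inner tag.
            nxt = next((t for t in tokens[i + 1:] if t[0] == "~"), "")
            inner = "~~" if nxt == "~~" else "~"
            outer = "~" if inner == "~~" else "~~"
            opening = [outer, inner]
        elif run in TAGS:
            opening = [run]
        else:
            out.append(run)  # leftover or overlong run: keep it literally
            opening = []
        for delim in opening:
            stack.append(delim)
            out.append(f"<{TAGS[delim]}>")
    # Simplification: auto-close whatever is still open at the end.
    while stack:
        out.append(f"</{TAGS[stack.pop()]}>")
    return "".join(out)


print(render_tildes("~~~m~~m~"))
print(render_tildes("~~~m~m~~"))
```

The peek only looks at the very next tilde run, so more tangled input is still undefined behaviour here, and a real parser would probably leave unmatched tildes as literal text rather than auto-closing them.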