---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: '0.8'
    jupytext_version: 1.4.2
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---
# Using `markdown_it`

> This document can be opened to execute with [Jupytext](https://jupytext.readthedocs.io)!

markdown-it-py may be used as an API *via* the [`markdown-it-py`](https://pypi.org/project/markdown-it-py/) package.

The raw text is first parsed into syntax 'tokens',
then these are converted to other formats using 'renderers'.
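The two stages can also be run separately. In this minimal sketch (assuming a recent markdown-it-py release), `md.parse` returns the list of tokens and `md.renderer.render` turns those tokens into HTML, which is essentially what `md.render` does in a single call:

```python
from markdown_it import MarkdownIt

md = MarkdownIt()
env = {}  # shared environment passed to both stages

# Stage 1: parse the source text into a flat list of tokens
tokens = md.parse("some *text*", env)

# Stage 2: render the tokens to HTML with the parser's renderer
html = md.renderer.render(tokens, md.options, env)

# The one-step call produces the same result
assert html == md.render("some *text*")
print(html)
```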
+++
## Quick-Start

The simplest way to understand how text will be parsed is using:

```{code-cell} python
from pprint import pprint
from markdown_it import MarkdownIt
```

```{code-cell} python
md = MarkdownIt()
md.render("some *text*")
```

```{code-cell} python
for token in md.parse("some *text*"):
    print(token)
    print()
```
## The Parser

+++

The `MarkdownIt` class is instantiated with parsing configuration options,
dictating the syntax rules and additional options for the parser and renderer.
You can define this configuration by directly supplying a dictionary, or by using one of the following preset names:

- `zero`: This configures the minimum components needed to parse text (i.e. just paragraphs and text).
- `commonmark` (default): This configures the parser to strictly comply with the [CommonMark specification](http://spec.commonmark.org/).
- `js-default`: This is the default in the JavaScript version.
  Compared to `commonmark`, it disables HTML parsing and enables the table and strikethrough components.
- `gfm-like`: This configures the parser to approximately comply with the [GitHub Flavored Markdown specification](https://github.github.com/gfm/).
  Compared to `commonmark`, it enables the table, strikethrough and linkify components.
  **Important**: to use this configuration you must have `linkify-it-py` installed.

```{code-cell} python
from markdown_it.presets import zero
zero.make()
```

```{code-cell} python
md = MarkdownIt("zero")
md.options
```
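To see how the presets listed above differ in practice, this sketch renders the same kinds of input with three of them. It is only an illustration of the preset descriptions (no emphasis in `zero`, no strikethrough in `commonmark`, strikethrough enabled and raw HTML escaped in `js-default`), not an exhaustive comparison:

```python
from markdown_it import MarkdownIt

zero_md = MarkdownIt("zero")
commonmark_md = MarkdownIt("commonmark")
js_md = MarkdownIt("js-default")

# `zero` only parses paragraphs and text, so emphasis markers stay literal
print(zero_md.render("*a*"))

# `commonmark` parses emphasis, but strikethrough is not enabled
print(commonmark_md.render("*a* ~~b~~"))

# `js-default` enables strikethrough...
print(js_md.render("~~b~~"))

# ...and disables raw HTML, which is escaped instead of passed through
print(js_md.render("<b>x</b>"))
```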
You can also override specific options:

```{code-cell} python
md = MarkdownIt("zero", {"maxNesting": 99})
md.options
```

```{code-cell} python
pprint(md.get_active_rules())
```

You can find all the parsing rules in the source code:
`parser_core.py`, `parser_block.py`,
`parser_inline.py`.

```{code-cell} python
pprint(md.get_all_rules())
```
Any of the parsing rules can be enabled/disabled, and these methods are "chainable":

```{code-cell} python
md.render("- __*emphasise this*__")
```

```{code-cell} python
md.enable(["list", "emphasis"]).render("- __*emphasise this*__")
```
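Chaining works because `enable` and `disable` return the `MarkdownIt` instance itself. A minimal sketch, using the same rule names (`emphasis`, `list`) as above; both methods accept either a single name or a list of names:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# enable/disable return the same instance, so calls can be chained
assert md.disable("emphasis") is md

# Re-enable emphasis and disable lists in one chained expression:
# the leading "- " is then no longer a list marker, just text
html = md.enable("emphasis").disable("list").render("- *a*")
print(html)
```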
You can temporarily modify rules with the `reset_rules` context manager.

```{code-cell} python
with md.reset_rules():
    md.disable("emphasis")
    print(md.render("__*emphasise this*__"))
md.render("__*emphasise this*__")
```

Additionally `renderInline` runs the parser with all block syntax rules disabled.

```{code-cell} python
md.renderInline("__*emphasise this*__")
```
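The difference is easiest to see side by side. In this sketch, `render` runs the block rules (so a leading `#` becomes a heading), whereas `renderInline` skips block parsing entirely, so there is no wrapping element and the `#` stays literal:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# Full (block + inline) rendering: a heading element is produced
full = md.render("# *title*")
print(full)

# Inline-only rendering: no <h1>/<p> wrapper, '#' is plain text
inline_only = md.renderInline("# *title*")
print(inline_only)
```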
### Typographic components

The `smartquotes` and `replacements` components are intended to improve typography:

`smartquotes` will convert basic quote marks to their opening and closing variants:

- 'single quotes' → ‘single quotes’
- "double quotes" → “double quotes”

`replacements` will replace particular text constructs:

- ``(c)``, ``(C)`` → ©
- ``(tm)``, ``(TM)`` → ™
- ``(r)``, ``(R)`` → ®
- ``(p)``, ``(P)`` → §
- ``+-`` → ±
- ``...`` → …
- ``?....`` → ?..
- ``!....`` → !..
- ``????????`` → ???
- ``!!!!!`` → !!!
- ``,,,`` → ,
- ``--`` → &ndash;
- ``---`` → &mdash;

Both of these components require the `typographer` option to be turned on, as well as the components themselves to be enabled:

```{code-cell} python
md = MarkdownIt("commonmark", {"typographer": True})
md.enable(["replacements", "smartquotes"])
md.render("'single quotes' (c)")
```
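To illustrate that both switches are needed, this sketch enables the `replacements` rule with and without the `typographer` option; only the second instance converts `(c)`:

```python
from markdown_it import MarkdownIt

# Rule enabled, but the typographer option left at its default (False)
plain = MarkdownIt("commonmark").enable("replacements")
print(plain.render("(c) 2020"))

# Rule enabled *and* typographer turned on
typo = MarkdownIt("commonmark", {"typographer": True}).enable("replacements")
print(typo.render("(c) 2020"))
```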
### Linkify

The `linkify` component requires that [linkify-it-py](https://github.com/tsutsu3/linkify-it-py) be installed (e.g. *via* `pip install markdown-it-py[linkify]`).
This allows URI autolinks to be identified without the need to enclose them in `<>` brackets:

```{code-cell} python
md = MarkdownIt("commonmark", {"linkify": True})
md.enable(["linkify"])
md.render("github.com")
```
### Plugins load

Plugins load collections of additional syntax rules and render methods into the parser.
A number of useful plugins are available in [`mdit_py_plugins`](https://github.com/executablebooks/mdit-py-plugins) (see [the plugin list](./plugins.md)),
or you can create your own (following the [markdown-it design principles](./architecture.md)).

```{code-cell} python
from markdown_it import MarkdownIt
from mdit_py_plugins.front_matter import front_matter_plugin
from mdit_py_plugins.footnote import footnote_plugin

md = (
    MarkdownIt()
    .use(front_matter_plugin)
    .use(footnote_plugin)
    .enable('table')
)
# front matter is only recognised at the very start of the text,
# so the string must begin directly with "---" (no leading newline)
text = """---
a: 1
---

a | b
- | -
1 | 2

A footnote [^1]

[^1]: some details
"""
md.render(text)
```
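A plugin is simply a callable that receives the `MarkdownIt` instance (plus any extra arguments given to `use`) and modifies it in place. The following `em_class_plugin` is a hypothetical plugin written only for illustration: it takes a `cls` keyword argument and adds that class to every `<em>` element through a render rule.

```python
from markdown_it import MarkdownIt


def em_class_plugin(md, cls="highlight"):
    """Hypothetical example plugin: add a CSS class to <em> tags."""

    def render_em_open(self, tokens, idx, options, env):
        tokens[idx].attrSet("class", cls)
        # fall back to the generic tag renderer, which now includes the class
        return self.renderToken(tokens, idx, options, env)

    md.add_render_rule("em_open", render_em_open)


# Extra keyword arguments given to `use` are forwarded to the plugin
md = MarkdownIt("commonmark").use(em_class_plugin, cls="note")
print(md.render("*a*"))
```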
## The Token Stream

+++

Before rendering, the text is parsed to a flat token stream of block-level syntax elements, with nesting defined by each token's `nesting` attribute: opening (1), closing (-1) or self-contained (0):

```{code-cell} python
md = MarkdownIt("commonmark")
tokens = md.parse("""
Here's some *text*

1. a list

> a *quote*""")
[(t.type, t.nesting) for t in tokens]
```

Naturally all openings should eventually be closed,
such that:

```{code-cell} python
sum([t.nesting for t in tokens]) == 0
```
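Because the stream is flat, the depth of a token follows from the running total of `nesting` values. This sketch recomputes that depth by hand and checks it against the `level` attribute that the block parser stores on each block-level token:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
tokens = md.parse("1. a list\n\n> a *quote*\n")

depth = 0
depths = []
for t in tokens:
    if t.nesting == -1:  # a closing token sits at its opener's depth
        depth -= 1
    depths.append(depth)
    if t.nesting == 1:  # tokens after an opener are one level deeper
        depth += 1

# Print an indented outline of the block structure
for t, d in zip(tokens, depths):
    print("  " * d + t.type)

assert depths == [t.level for t in tokens]
```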
All tokens are instances of the same `Token` class, which can also be created outside the parser:

```{code-cell} python
tokens[0]
```

```{code-cell} python
from markdown_it.token import Token
token = Token("paragraph_open", "p", 1, block=True, map=[1, 2])
token == tokens[0]
```

The `'inline'` type token contains the inline tokens as children:

```{code-cell} python
tokens[1]
```
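Since inline content lives in `children`, code that needs every token usually walks both levels. A minimal sketch, assuming the `commonmark` preset (so the typographer does not touch the apostrophe):

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")
tokens = md.parse("Here's some *text*")

for token in tokens:
    print(token.type)
    # only 'inline' tokens carry children; the others have children=None
    for child in token.children or []:
        print("   ", child.type, repr(child.content))

child_types = [child.type for child in tokens[1].children]
```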
You can serialize a token (and its children) to a JSONable dictionary using:

```{code-cell} python
print(tokens[1].as_dict())
```

This dictionary can also be deserialized:

```{code-cell} python
Token.from_dict(tokens[1].as_dict())
```
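Because the dictionary only contains JSON-compatible values, it can be stored as JSON text and restored later. A sketch of that round trip; it compares a few fields rather than whole objects, so it does not depend on how token equality is defined in a given version:

```python
import json

from markdown_it import MarkdownIt
from markdown_it.token import Token

md = MarkdownIt("commonmark")
inline = md.parse("some *text*")[1]

# Serialize the inline token, including its children, to a JSON string
payload = json.dumps(inline.as_dict())

# ...and rebuild an equivalent Token tree from that string
restored = Token.from_dict(json.loads(payload))

print(restored.type, [child.type for child in restored.children])
```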
### Creating a syntax tree

```{versionchanged} 0.7.0
`nest_tokens` and `NestedTokens` are deprecated and replaced by `SyntaxTreeNode`.
```

In some use cases it may be useful to convert the token stream into a syntax tree,
with opening/closing tokens collapsed into a single node that contains children.

```{code-cell} python
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
tokens = md.parse("""
# Header

Here's some text and an image 

1. a **list**

> a *quote*
""")

node = SyntaxTreeNode(tokens)
print(node.pretty(indent=2, show_text=True))
```

You can then use methods to traverse the tree:

```{code-cell} python
node.children
```

```{code-cell} python
print(node[0])
node[0].next_sibling
```
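For whole-tree processing, `SyntaxTreeNode.walk()` yields every node depth-first, starting with the root itself. As a small illustration (the input text is arbitrary), this counts the node types present in a document:

```python
from collections import Counter

from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")
tokens = md.parse("# Header\n\n1. a **list**\n2. another item\n")
root = SyntaxTreeNode(tokens)

# walk() includes the root node by default; opening/closing pairs
# appear as single nodes named without the "_open" suffix
type_counts = Counter(node.type for node in root.walk())
print(type_counts)
```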
## Renderers

+++

After the token stream is generated, it's passed to a [renderer](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/renderer.py).
The renderer then iterates over all the tokens, passing each to the rule with the same name as its token type.

Renderer rules are located in `md.renderer.rules` and are simple functions
with the same signature:

```python
def function(renderer, tokens, idx, options, env):
    return htmlResult
```
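Not every token type has an entry in `md.renderer.rules`. With the default `RendererHTML`, tokens without a dedicated rule (such as `em_open`) fall back to the generic `renderToken` method, which simply emits the opening or closing tag. A quick way to check, assuming the default HTML renderer:

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# Token types that have a dedicated rendering rule
print(sorted(md.renderer.rules))

# `em_open` has no dedicated rule, so the generic renderToken is used
inline_children = md.parse("*a*")[1].children
print(inline_children[0].type)
print(md.renderer.renderToken(inline_children, 0, md.options, {}))
```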
+++

You can inject render methods into the instantiated renderer.

```{code-cell} python
md = MarkdownIt("commonmark")

def render_em_open(self, tokens, idx, options, env):
    return '<em class="myclass">'

md.add_render_rule("em_open", render_em_open)
md.render("*a*")
```

This differs slightly from the JS version, where the renderer argument comes last.
Also, the `add_render_rule` method is specific to Python: rather than adding the function directly to `md.renderer.rules`, it ensures the method is bound to the renderer.
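A common pattern is to wrap an existing rule rather than replace it. Since the entries in `md.renderer.rules` are bound methods, the original can be saved and called from the new rule. A sketch for inline code (the `"inline"` class name is arbitrary):

```python
from markdown_it import MarkdownIt

md = MarkdownIt("commonmark")

# Keep a reference to the default (already bound) rule
default_code_inline = md.renderer.rules["code_inline"]


def code_inline_with_class(self, tokens, idx, options, env):
    # add an attribute, then delegate the actual rendering
    tokens[idx].attrSet("class", "inline")
    return default_code_inline(tokens, idx, options, env)


md.add_render_rule("code_inline", code_inline_with_class)
print(md.render("use `x`"))
```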
+++

You can also subclass a renderer and add the method there:

```{code-cell} python
from markdown_it.renderer import RendererHTML

class MyRenderer(RendererHTML):
    def em_open(self, tokens, idx, options, env):
        return '<em class="myclass">'

md = MarkdownIt("commonmark", renderer_cls=MyRenderer)
md.render("*a*")
```

Plugins can support multiple render types, using the `__output__` attribute (this is currently a Python-only feature).

```{code-cell} python
from markdown_it.renderer import RendererHTML

class MyRenderer1(RendererHTML):
    __output__ = "html1"

class MyRenderer2(RendererHTML):
    __output__ = "html2"

def plugin(md):
    def render_em_open1(self, tokens, idx, options, env):
        return '<em class="myclass1">'
    def render_em_open2(self, tokens, idx, options, env):
        return '<em class="myclass2">'
    md.add_render_rule("em_open", render_em_open1, fmt="html1")
    md.add_render_rule("em_open", render_em_open2, fmt="html2")

md = MarkdownIt("commonmark", renderer_cls=MyRenderer1).use(plugin)
print(md.render("*a*"))

md = MarkdownIt("commonmark", renderer_cls=MyRenderer2).use(plugin)
print(md.render("*a*"))
```
Here's a more concrete example; let's replace images that link to Vimeo videos with the player's iframe:

```{code-cell} python
import re
from markdown_it import MarkdownIt

vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo\.com\/(\d+)($|\/)')

def render_vimeo(self, tokens, idx, options, env):
    token = tokens[idx]
    match = vimeoRE.match(token.attrs["src"])

    if match:
        ident = match[2]
        return ('<div class="embed-responsive embed-responsive-16by9">\n' +
                '  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
                ident + '"></iframe>\n' +
                '</div>\n')
    # not a Vimeo link: fall back to the default image rule
    return self.image(tokens, idx, options, env)

md = MarkdownIt("commonmark")
md.add_render_rule("image", render_vimeo)
print(md.render(""))
```
Here is another example, showing how to add `target="_blank"` to all links:

```{code-cell} python
from markdown_it import MarkdownIt

def render_blank_link(self, tokens, idx, options, env):
    tokens[idx].attrSet("target", "_blank")

    # pass token to default renderer.
    return self.renderToken(tokens, idx, options, env)

md = MarkdownIt("commonmark")
md.add_render_rule("link_open", render_blank_link)
print(md.render("[a]\n\n[a]: b"))
```
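A variation on the same idea, sketched under the assumption that only absolute `http(s)` URLs count as external: inspect the `href` attribute with `attrGet` and add `target="_blank"` only to those links, leaving relative links untouched.

```python
from markdown_it import MarkdownIt


def render_external_link(self, tokens, idx, options, env):
    href = tokens[idx].attrGet("href") or ""
    # treat absolute http(s) URLs as external; leave relative links alone
    if href.startswith(("http://", "https://")):
        tokens[idx].attrSet("target", "_blank")
    return self.renderToken(tokens, idx, options, env)


md = MarkdownIt("commonmark")
md.add_render_rule("link_open", render_external_link)
html = md.render("[site](https://example.com) and [page](/local)")
print(html)
```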