add README.markdown, rm README.peg-markdown
parent e54246e086
commit a6edc52d48
102
README.markdown
Normal file
@@ -0,0 +1,102 @@
This is an implementation of John Gruber's [markdown][] in
[Go][]. It is a translation of [peg-markdown][], written by
John MacFarlane in C, into Go. It uses a modified version
of Andrew J Snodgrass' PEG parser [peg][] (now supporting
LEG grammars), which is itself based on the parser used
by peg-markdown.

[markdown]: http://daringfireball.net/projects/markdown/
[peg-markdown]: https://github.com/jgm/peg-markdown
[peg]: https://github.com/pointlander/peg
[Go]: http://golang.org/

Support for HTML output is implemented, but Groff and LaTeX
output have not been ported. The output should be identical
to that of peg-markdown.

The Go version is around 5x slower than the original C
version. A marked speed improvement has been achieved by
converting function `preformat` from concatenating strings
to using `bytes.Buffer`. In other places where this kind of
modification was tried, performance did not improve. Also,
pre-allocating a large buffer for `element`s didn't show a
significant difference from allocating `element`s one at a time.
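
The difference between the two approaches can be sketched as follows. This is only an illustration written for a current Go toolchain, not the package's actual `preformat` code; the function names `joinConcat` and `joinBuffer` are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// joinConcat builds the result by repeated string concatenation.
// Every += allocates a new string and copies everything built so far.
func joinConcat(lines []string) string {
	s := ""
	for _, l := range lines {
		s += l + "\n"
	}
	return s
}

// joinBuffer appends to a bytes.Buffer instead. The buffer grows its
// backing array as needed, so earlier output is not re-copied on
// every append.
func joinBuffer(lines []string) string {
	var buf bytes.Buffer
	for _, l := range lines {
		buf.WriteString(l)
		buf.WriteByte('\n')
	}
	return buf.String()
}

func main() {
	lines := strings.Split("first,second,third", ",")
	// Both functions produce the same text; only the allocation
	// behaviour differs.
	fmt.Println(joinConcat(lines) == joinBuffer(lines))
}
```

Whether the change pays off depends on how much text passes through the function, which may be why it helped in `preformat` but not elsewhere.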

## Installation

Provided you have a recent copy of Go, and git is available,

    goinstall github.com/knieriem/markdown

should install the package into
`$GOROOT/src/pkg/github.com/knieriem/markdown`, and build
it. During the build, a copy of [knieriem/peg][] will be
downloaded from GitHub and compiled (`make peg` if done
manually).

**NOTE:** At the moment, goinstall will most likely fail,
because it does not use the package's Makefile but generates
its own, which does not know how to build `parser.leg.go`
from `parser.leg`. As a workaround, after the failed goinstall,
run the following steps to finish the installation:

    cd $GOROOT/src/pkg/github.com/knieriem/markdown
    gomake install

See doc.go for an example of how to use the package.

To create the command line program *markdown*, run

    cd $GOROOT/src/pkg/github.com/knieriem/markdown
    gomake cmd

The binary should then be available in the subdirectory *cmd*.

To run the Markdown 1.0.3 test suite, type

    make mdtest

This will download peg-markdown (if you have `git`
available), build cmd/markdown, and run the test suite.

The test suite will fail on one test, for the same reason that
applies to peg-markdown, because the grammar is the same.
See the [original README][] for details.

[original README]: https://github.com/jgm/peg-markdown/blob/master/README.markdown
[knieriem/peg]: https://github.com/knieriem/peg

## Known issues

Emphasis and strong markup within items of lists, as in input
like `./PHP Markdown Extra.mdtest/Emphasis.text` from Michel
Fortin's [MDTest][] package,

    1. ***test test***
    2. ___test test___
    3. *test **test***
    4. **test *test***
    ...

seem to present a problem for the LEG parser, which needs
(on my system) around four minutes to process that file.

[MDTest]: http://git.michelf.com/mdtest/

## Todo

*   Implement definition lists (work in progress), and perhaps tables

*   Rename element key identifiers, so that they are not public

*   Where appropriate, use more idiomatic Go code

## Subdirectory Index

*   peg – PEG parser generator (modified) from Andrew J Snodgrass

*   peg/leg – LEG parser generator, based on PEG

*   cmd – command line program `markdown`
@@ -1,214 +0,0 @@
What is this?
=============

This is an implementation of John Gruber's [markdown][] in C. It uses a
[parsing expression grammar (PEG)][] to define the syntax. This should
allow easy modification and extension. It currently supports output in
HTML, LaTeX, or groff_mm formats, and adding new formats is relatively
easy.

[parsing expression grammar (PEG)]: http://en.wikipedia.org/wiki/Parsing_expression_grammar
[markdown]: http://daringfireball.net/projects/markdown/

It is pretty fast. A 179K text file that takes 5.7 seconds for
Markdown.pl (v. 1.0.1) to parse takes less than 0.2 seconds for this
markdown. It does, however, use a lot of memory (up to 4M of heap space
while parsing the 179K file, and up to 80K for a 4K file). (Note that
the memory leaks in earlier versions of this program have now been
plugged.)

Both a library and a standalone program are provided.

peg-markdown is written and maintained by John MacFarlane (jgm on
github), with significant contributions by Ryan Tomayko (rtomayko).
It is released under both the GPL and the MIT license; see LICENSE for
details.

Installing
==========

On a linux or unix-based system
-------------------------------

This program is written in portable ANSI C. It requires
[glib2](http://www.gtk.org/download.html). Most *nix systems will have
this installed already. The build system requires GNU make.

The other required dependency, [Ian Piumarta's peg/leg PEG parser
generator](http://piumarta.com/software/peg/), is included in the source
directory. It will be built automatically. (However, it is not as portable
as peg-markdown itself, and seems to require gcc.)

To make the 'markdown' executable:

    make

(Or, on some systems, `gmake`.) Then, for usage instructions:

    ./markdown --help

To run John Gruber's Markdown 1.0.3 test suite:

    make test

The test suite will fail on one of the list tests. Here's why.
Markdown.pl encloses "item one" in the following list in `<p>` tags:

    1.  item one
        * subitem
        * subitem

    2.  item two

    3.  item three

peg-markdown does not enclose "item one" in `<p>` tags unless it has a
following blank line. This is consistent with the official markdown
syntax description, and lets the author of the document choose whether
`<p>` tags are desired.

Cross-compiling for Windows with MinGW on a linux box
-----------------------------------------------------

Prerequisites:

*   Linux system with MinGW cross compiler. For Ubuntu:

        sudo apt-get install mingw32

*   [Windows glib-2.0 binary & development files](http://www.gtk.org/download-windows.html).
    Unzip files into cross-compiler directory tree (e.g., `/usr/i586-mingw32msvc`).

Steps:

1.  Create the markdown parser using Linux-compiled `leg` from peg-0.1.4:

        ./peg-0.1.4/leg markdown_parser.leg >markdown_parser.c

    (Note: The same thing could be accomplished by cross-compiling leg,
    executing it on Windows, and copying the resulting C file to the Linux
    cross-compiler host.)

2.  Run the cross compiler with include flag for the Windows glib-2.0 headers:
    for example,

        /usr/bin/i586-mingw32msvc-cc -c \
        -I/usr/i586-mingw32msvc/include/glib-2.0 \
        -I/usr/i586-mingw32msvc/lib/glib-2.0/include -Wall -O3 -ansi markdown*.c

3.  Link against the Windows glib-2.0 libraries: for example,

        /usr/bin/i586-mingw32msvc-cc markdown*.o \
        -Wl,-L/usr/i586-mingw32msvc/lib/glib,--dy,--warn-unresolved-symbols,-lglib-2.0 \
        -o markdown.exe

The resulting executable depends on the glib dll file, so be sure to
load the glib binary on the Windows host.

Compiling with MinGW on Windows
-------------------------------

These directions assume that MinGW is installed in `c:\MinGW` and glib-2.0
is installed in the MinGW directory hierarchy (with the mingw bin directory
in the system path).

Unzip peg-markdown in a temp directory. From the directory with the
peg-markdown source, execute:

    cd peg-0.1.4
    for %i in (*.c) do @gcc -g -Wall -O3 -DNDEBUG -c -o %~ni.o %i
    gcc -o leg.exe leg.o tree.o compile.o
    cd ..
    peg-0.1.4\leg.exe markdown_parser.leg >markdown_parser.c
    @for %i in (markdown*.c) do @gcc -mms-bitfields -Ic:/MinGW/include/glib-2.0 -Ic:/MinGW/lib/glib-2.0/include -c -o %~ni.o %i
    gcc -O3 -Lc:/MinGW/lib/glib-2.0 -lglib-2.0 -lintl markdown.o markdown_lib.o markdown_output.o markdown_parser.o -o markdown.exe -Wl,--dy,--warn-unresolved-symbols,-lglib-2.0,-Lc:/MinGW/lib/glib-2.0,-lglib-2.0,-lintl

(Windows instructions courtesy of Matt Wolf.)

Extensions
==========

peg-markdown supports extensions to standard markdown syntax.
These can be turned on using the command line flag `-x` or
`--extensions`. `-x` by itself turns on all extensions. Extensions
can also be turned on selectively, using individual command-line
options. To see the available extensions:

    ./markdown --help-extensions

The `--smart` extension provides "smart quotes", dashes, and ellipses.

The `--notes` extension provides a footnote syntax like that of
Pandoc or PHP Markdown Extra.

Using the library
=================

The library exports two functions:

    GString * markdown_to_g_string(char *text, int extensions, int output_format);
    char * markdown_to_string(char *text, int extensions, int output_format);

The only difference between these is that `markdown_to_g_string` returns a
`GString` (glib's automatically resizable string), while `markdown_to_string`
returns a regular character pointer. The memory allocated for these must be
freed by the calling program, using `g_string_free()` or `free()`.

`text` is the markdown-formatted text to be converted. Note that tabs will
be converted to spaces, using a four-space tab stop. Character encodings are
ignored.
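
Tab expansion to a four-space tab stop can be sketched as below, in Go (the language of the port) rather than C. This is an illustration only, not the library's code: `expandTabs` is a hypothetical helper, and it counts columns in Unicode characters, whereas a library that ignores character encodings presumably counts bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// expandTabs replaces each tab with enough spaces to reach the next
// multiple of tabStop. The column counter restarts after every newline.
func expandTabs(s string, tabStop int) string {
	var buf bytes.Buffer
	col := 0
	for _, r := range s {
		switch r {
		case '\t':
			n := tabStop - col%tabStop // spaces needed to reach the next stop
			buf.WriteString(strings.Repeat(" ", n))
			col += n
		case '\n':
			buf.WriteRune(r)
			col = 0
		default:
			buf.WriteRune(r)
			col++
		}
	}
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", expandTabs("a\tb", 4)) // "a   b"
	fmt.Printf("%q\n", expandTabs("\tcode", 4)) // "    code"
}
```

A tab therefore expands to between one and four spaces depending on the column it starts in, not to a fixed run of four.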

`extensions` is a bit-field specifying which syntax extensions should be used.
If `extensions` is 0, no extensions will be used. If it is `0xFFFFFF`,
all extensions will be used. To set extensions selectively, combine the
following constants with the bitwise `|` operator:

- `EXT_SMART` turns on smart quotes, dashes, and ellipses.
- `EXT_NOTES` turns on footnote syntax. [Pandoc's footnote syntax][] is used here.
- `EXT_FILTER_HTML` filters out raw HTML (except for styles).
- `EXT_FILTER_STYLES` filters out styles in HTML.

[Pandoc's footnote syntax]: http://johnmacfarlane.net/pandoc/README.html#footnotes
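
How such a bit-field is built and queried can be sketched as below, in Go rather than C. The constant names and values are hypothetical mirrors of the C constants (the real values are defined in `markdown_lib.h` and may differ), and `hasExtension` is a made-up helper: flags are combined with bitwise OR and tested with bitwise AND.

```go
package main

import "fmt"

// Hypothetical stand-ins for EXT_SMART, EXT_NOTES, EXT_FILTER_HTML and
// EXT_FILTER_STYLES. Each constant occupies its own bit.
const (
	ExtSmart = 1 << iota
	ExtNotes
	ExtFilterHTML
	ExtFilterStyles
)

// hasExtension reports whether flag is set in the extensions bit-field.
func hasExtension(extensions, flag int) bool {
	return extensions&flag != 0
}

func main() {
	// Select two extensions by OR-ing their bits together.
	exts := ExtSmart | ExtNotes

	fmt.Println(hasExtension(exts, ExtSmart))      // true
	fmt.Println(hasExtension(exts, ExtFilterHTML)) // false
}
```

A value with every bit set, such as `0xFFFFFF`, passes every such test, which is why it enables all extensions.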

`output_format` is either `HTML_FORMAT`, `LATEX_FORMAT`, or `GROFF_MM_FORMAT`.

To use the library, include `markdown_lib.h`. See `markdown.c` for an example.

Hacking
=======

It should be pretty easy to modify the program to produce other formats
than HTML or LaTeX, and to parse syntax extensions. A quick guide:

*   `markdown_parser.leg` contains the grammar itself.

*   `markdown_output.c` contains functions for printing the `Element`
    structure in various output formats.

*   To add an output format, add the format to `markdown_formats` in
    `markdown_lib.h`. Then modify `print_element` in `markdown_output.c`,
    and add functions `print_XXXX_string`, `print_XXXX_element`, and
    `print_XXXX_element_list`. Also add an option in the main program
    that selects the new format. Don't forget to add it to the list of
    formats in the usage message.

*   To add syntax extensions, define them in the PEG grammar
    (`markdown_parser.leg`), using existing extensions as a guide. New
    inline elements will need to be added to `Inline =`; new block
    elements will need to be added to `Block =`. (Note: the order
    of the alternatives does matter in PEG grammars.)

*   If you need to add new types of elements, modify the `keys`
    enum in `markdown_peg.h`.

*   By using `&{ }` rules one can selectively disable extensions
    depending on command-line options. For example,
    `&{ extension(EXT_SMART) }` succeeds only if the `EXT_SMART` bit
    of the global `syntax_extensions` is set. Add your option to
    `markdown_extensions` in `markdown_lib.h`, and add an option in
    `markdown.c` to turn on your extension.

*   Note: Avoid using `[^abc]` character classes in the grammar, because
    they cause problems with non-ascii input. Instead, use: `( !'a' !'b'
    !'c' . )`
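
What the replacement expression `( !'a' !'b' !'c' . )` matches can be sketched by hand in Go. This is not generated parser code: `matchNoneOf` is a hypothetical helper, and treating `.` as "one UTF-8 encoded character" is an assumption about the parser rather than something the grammar states.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// matchNoneOf mimics the PEG expression ( !'a' !'b' !'c' . ).
// Each !'x' lookahead succeeds only if the input does not start with x
// and consumes nothing; '.' then consumes exactly one character, taken
// here to be one UTF-8 encoded rune (an assumption).
// It returns the remaining input and whether the match succeeded.
func matchNoneOf(input string, excluded ...string) (string, bool) {
	if input == "" {
		return input, false // '.' cannot match at end of input
	}
	for _, lit := range excluded {
		if strings.HasPrefix(input, lit) {
			return input, false // a negative lookahead rejected the input
		}
	}
	_, size := utf8.DecodeRuneInString(input)
	return input[size:], true
}

func main() {
	for _, s := range []string{"xyz", "abc", "éa"} {
		rest, ok := matchNoneOf(s, "a", "b", "c")
		fmt.Printf("%q -> rest %q, matched %v\n", s, rest, ok)
	}
}
```

Because the lookaheads consume no input, the expression as a whole consumes exactly one character when it succeeds, like the character class it replaces.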