add README.markdown, rm README.peg-markdown

Michael Teichgräber 2010-12-01 22:29:50 +01:00
parent e54246e086
commit a6edc52d48
2 changed files with 102 additions and 214 deletions

README.markdown

@@ -0,0 +1,102 @@
This is an implementation of John Gruber's [markdown][] in
[Go][]. It is a translation of [peg-markdown][], written by
John MacFarlane in C, into Go. It uses a modified version
of Andrew J Snodgrass' PEG parser [peg][], which now supports
LEG grammars and is itself based on the parser used
by peg-markdown.

[markdown]: http://daringfireball.net/projects/markdown/
[peg-markdown]: https://github.com/jgm/peg-markdown
[peg]: https://github.com/pointlander/peg
[Go]: http://golang.org/

Support for HTML output is implemented, but Groff and LaTeX
output have not been ported. The output should be identical
to that of peg-markdown.

The Go version is around 5x slower than the original C
version. A marked speed improvement was achieved by
changing the function `preformat` from concatenating strings
to using a bytes.Buffer. In other places where this kind of
modification was tried, performance did not improve. Also,
pre-allocating a large buffer for `element`s did not make a
significant difference compared with allocating `element`s one at a time.

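
The effect of that change can be sketched in Go. The two helpers below, `joinConcat` and `joinBuffer`, are hypothetical and are not the package's actual `preformat` code. The first builds its result with `+=`, which allocates a new string and copies everything accumulated so far on each step. The second appends to a single `bytes.Buffer`, whose backing array grows geometrically.

```go
package main

import (
	"bytes"
	"fmt"
)

// joinConcat builds the result with repeated string concatenation.
// Each += allocates a new string and copies everything accumulated so
// far, so the total copying grows quadratically with the number of pieces.
func joinConcat(pieces []string) string {
	s := ""
	for _, p := range pieces {
		s += p
	}
	return s
}

// joinBuffer appends every piece to a single bytes.Buffer. Its backing
// array grows geometrically, so the total copying stays roughly linear.
func joinBuffer(pieces []string) string {
	var buf bytes.Buffer
	for _, p := range pieces {
		buf.WriteString(p)
	}
	return buf.String()
}

func main() {
	pieces := []string{"<pre><code>", "x := 1\n", "</code></pre>\n"}
	// Both strategies produce identical output; only the cost differs.
	fmt.Println(joinConcat(pieces) == joinBuffer(pieces)) // true
}
```

Both helpers return the same string. Only the amount of copying differs, and that gap widens as the number of pieces grows.
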
## Installation

Provided you have a recent copy of Go, and git is available,

    goinstall github.com/knieriem/markdown

should install the package into
`$GOROOT/src/pkg/github.com/knieriem/markdown` and build
it. During the build, a copy of [knieriem/peg][] will be
downloaded from github and compiled (`make peg` if done
manually).

**NOTE:** At the moment, goinstall will most likely fail,
because it does not use the package's Makefile but generates
its own, which does not know how to build parser.leg.go
from parser.leg. As a workaround, after the failed goinstall,
run the following commands to finish the installation:

    cd $GOROOT/src/pkg/github.com/knieriem/markdown
    gomake install

See doc.go for an example of how to use the package.

To create the command line program *markdown*, run

    cd $GOROOT/src/pkg/github.com/knieriem/markdown
    gomake cmd

The binary should then be available in the subdirectory *cmd*.

To run the Markdown 1.0.3 test suite, type

    make mdtest

This will download peg-markdown (provided `git` is available),
build cmd/markdown, and run the test suite.

One test will fail, for the same reason it fails with
peg-markdown: the grammar is the same.
See the [original README][] for details.

[original README]: https://github.com/jgm/peg-markdown/blob/master/README.markdown
[knieriem/peg]: https://github.com/knieriem/peg

## Known issues

Emphasis and strong markup within list items seem to present a
problem for the LEG parser. For input like
./PHP Markdown Extra.mdtest/Emphasis.text from Michel
Fortin's [MDTest][] package,

    1. ***test test***
    2. ___test test___
    3. *test **test***
    4. **test *test***
    ...

the parser needs around four minutes (on my system) to process
the file.

[MDTest]: http://git.michelf.com/mdtest/

## Todo
* Implement definition lists (work in progress), and perhaps tables
* Rename element key identifiers so that they are not public
* Where appropriate, use more idiomatic Go code
## Subdirectory Index
* `peg`: PEG parser generator (modified) from Andrew J Snodgrass
* `peg/leg`: LEG parser generator, based on PEG
* `cmd`: command line program `markdown`

README.peg-markdown

@@ -1,214 +0,0 @@
What is this?
=============
This is an implementation of John Gruber's [markdown][] in C. It uses a
[parsing expression grammar (PEG)][] to define the syntax. This should
allow easy modification and extension. It currently supports output in
HTML, LaTeX, or groff_mm formats, and adding new formats is relatively
easy.

[parsing expression grammar (PEG)]: http://en.wikipedia.org/wiki/Parsing_expression_grammar
[markdown]: http://daringfireball.net/projects/markdown/

It is pretty fast. A 179K text file that takes 5.7 seconds for
Markdown.pl (v. 1.0.1) to parse takes less than 0.2 seconds for
peg-markdown. It does, however, use a lot of memory (up to 4M of heap space
while parsing the 179K file, and up to 80K for a 4K file). (Note that
the memory leaks in earlier versions of this program have now been
plugged.)

Both a library and a standalone program are provided.

peg-markdown is written and maintained by John MacFarlane (jgm on
github), with significant contributions by Ryan Tomayko (rtomayko).
It is released under both the GPL and the MIT license; see LICENSE for
details.

Installing
==========
On a linux or unix-based system
-------------------------------
This program is written in portable ANSI C. It requires
[glib2](http://www.gtk.org/download.html). Most *nix systems will have
this installed already. The build system requires GNU make.

The other required dependency, [Ian Piumarta's peg/leg PEG parser
generator](http://piumarta.com/software/peg/), is included in the source
directory. It will be built automatically. (However, it is not as portable
as peg-markdown itself, and seems to require gcc.)

To make the 'markdown' executable:

    make

(Or, on some systems, `gmake`.) Then, for usage instructions:

    ./markdown --help

To run John Gruber's Markdown 1.0.3 test suite:

    make test

The test suite will fail on one of the list tests. Here's why.
Markdown.pl encloses "item one" in the following list in `<p>` tags:

    1.  item one
        * subitem
        * subitem

    2.  item two

    3.  item three

peg-markdown does not enclose "item one" in `<p>` tags unless it has a
following blank line. This is consistent with the official markdown
syntax description, and lets the author of the document choose whether
`<p>` tags are desired.


Cross-compiling for Windows with MinGW on a linux box
-----------------------------------------------------

Prerequisites:

* Linux system with MinGW cross compiler. For Ubuntu:

        sudo apt-get install mingw32

* [Windows glib-2.0 binary & development files](http://www.gtk.org/download-windows.html).
  Unzip the files into the cross-compiler directory tree (e.g., `/usr/i586-mingw32msvc`).

Steps:

1.  Create the markdown parser using the Linux-compiled `leg` from peg-0.1.4:

        ./peg-0.1.4/leg markdown_parser.leg >markdown_parser.c

    (Note: The same thing could be accomplished by cross-compiling leg,
    executing it on Windows, and copying the resulting C file to the Linux
    cross-compiler host.)

2.  Run the cross compiler with include flags for the Windows glib-2.0 headers,
    for example:

        /usr/bin/i586-mingw32msvc-cc -c \
          -I/usr/i586-mingw32msvc/include/glib-2.0 \
          -I/usr/i586-mingw32msvc/lib/glib-2.0/include -Wall -O3 -ansi markdown*.c

3.  Link against the Windows glib-2.0 library, for example:

        /usr/bin/i586-mingw32msvc-cc markdown*.o \
          -Wl,-L/usr/i586-mingw32msvc/lib/glib,--dy,--warn-unresolved-symbols,-lglib-2.0 \
          -o markdown.exe

The resulting executable depends on the glib DLL, so be sure to
install the glib binary on the Windows host.


Compiling with MinGW on Windows
-------------------------------

These directions assume that MinGW is installed in `c:\MinGW` and glib-2.0
is installed in the MinGW directory hierarchy (with the mingw bin directory
in the system path).

Unzip peg-markdown in a temp directory. From the directory with the
peg-markdown source, execute:

    cd peg-0.1.4
    for %i in (*.c) do @gcc -g -Wall -O3 -DNDEBUG -c -o %~ni.o %i
    gcc -o leg.exe leg.o tree.o compile.o
    cd ..
    peg-0.1.4\leg.exe markdown_parser.leg >markdown_parser.c
    @for %i in (markdown*.c) do @gcc -mms-bitfields -Ic:/MinGW/include/glib-2.0 -Ic:/MinGW/lib/glib-2.0/include -c -o %~ni.o %i
    gcc -O3 -Lc:/MinGW/lib/glib-2.0 -lglib-2.0 -lintl markdown.o markdown_lib.o markdown_output.o markdown_parser.o -o markdown.exe -Wl,--dy,--warn-unresolved-symbols,-lglib-2.0,-Lc:/MinGW/lib/glib-2.0,-lglib-2.0,-lintl

(Windows instructions courtesy of Matt Wolf.)


Extensions
==========

peg-markdown supports extensions to standard markdown syntax.
These can be turned on using the command line flag `-x` or
`--extensions`. `-x` by itself turns on all extensions. Extensions
can also be turned on selectively, using individual command-line
options. To see the available extensions:

    ./markdown --help-extensions

The `--smart` extension provides "smart quotes", dashes, and ellipses.

The `--notes` extension provides a footnote syntax like that of
Pandoc or PHP Markdown Extra.


Using the library
=================

The library exports two functions:

    GString * markdown_to_g_string(char *text, int extensions, int output_format);
    char * markdown_to_string(char *text, int extensions, int output_format);

The only difference between these is that `markdown_to_g_string` returns a
`GString` (glib's automatically resizable string), while `markdown_to_string`
returns a regular character pointer. The memory allocated for these must be
freed by the calling program, using `g_string_free()` or `free()`.

`text` is the markdown-formatted text to be converted. Note that tabs will
be converted to spaces, using a four-space tab stop. Character encodings are
ignored.

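
Here is a minimal sketch of the tab conversion described above, written in Go for illustration. `expandTabs` is a hypothetical helper, not peg-markdown's own preprocessing routine. It assumes that columns are counted per byte and reset at each newline.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// expandTabs replaces each tab with enough spaces to reach the next
// multiple of tabStop. The column counter is reset at every newline and
// advances by one per byte (an assumption of this sketch: multi-byte
// UTF-8 characters therefore count as several columns).
func expandTabs(text string, tabStop int) string {
	var out bytes.Buffer
	col := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		switch c {
		case '\t':
			n := tabStop - col%tabStop // spaces needed to reach the next stop
			out.WriteString(strings.Repeat(" ", n))
			col += n
		case '\n':
			out.WriteByte(c)
			col = 0
		default:
			out.WriteByte(c)
			col++
		}
	}
	return out.String()
}

func main() {
	fmt.Printf("%q\n", expandTabs("a\tb\n\tc", 4)) // "a   b\n    c"
}
```

A tab after one character becomes three spaces, while a tab at the start of a line becomes a full four, so text lines up on the same stops.
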
`extensions` is a bit-field specifying which syntax extensions should be used.
If `extensions` is 0, no extensions will be used. If it is `0xFFFFFF`,
all extensions will be used. To set extensions selectively, combine
the following constants with the bitwise `|` operator:

- `EXT_SMART` turns on smart quotes, dashes, and ellipses.
- `EXT_NOTES` turns on footnote syntax. [Pandoc's footnote syntax][] is used here.
- `EXT_FILTER_HTML` filters out raw HTML (except for styles).
- `EXT_FILTER_STYLES` filters out styles in HTML.

[Pandoc's footnote syntax]: http://johnmacfarlane.net/pandoc/README.html#footnotes

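
The following Go sketch shows how such a bit-field is typically built with bitwise OR and tested with bitwise AND. The constant names echo the C ones above, but their values here (consecutive powers of two) are an assumption, since the real values are defined in `markdown_lib.h`. `enabled` is a hypothetical helper that plays the role of a check such as `extension(EXT_SMART)`.

```go
package main

import "fmt"

// Hypothetical Go counterparts of the C extension constants. Each flag
// occupies its own bit; these particular values are assumed for
// illustration, the authoritative ones live in markdown_lib.h.
const (
	ExtSmart        = 1 << iota // smart quotes, dashes, and ellipses
	ExtNotes                    // footnote syntax
	ExtFilterHTML               // filter out raw HTML (except styles)
	ExtFilterStyles             // filter out styles in HTML
)

// enabled reports whether all bits of ext are set in flags.
func enabled(flags, ext int) bool {
	return flags&ext == ext
}

func main() {
	flags := ExtSmart | ExtNotes // select two extensions with bitwise OR
	fmt.Println(enabled(flags, ExtSmart))           // true
	fmt.Println(enabled(flags, ExtFilterHTML))      // false
	fmt.Println(enabled(0xFFFFFF, ExtFilterStyles)) // true: all bits set
}
```

Using AND to combine two distinct flags would yield 0, which selects no extensions. That is why OR is used to build the set and AND only to test membership.
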
`output_format` is one of `HTML_FORMAT`, `LATEX_FORMAT`, or `GROFF_MM_FORMAT`.

To use the library, include `markdown_lib.h`. See `markdown.c` for an example.


Hacking
=======

It should be pretty easy to modify the program to produce formats
other than HTML or LaTeX, and to parse syntax extensions. A quick guide:

* `markdown_parser.leg` contains the grammar itself.
* `markdown_output.c` contains functions for printing the `Element`
structure in various output formats.
* To add an output format, add the format to `markdown_formats` in
`markdown_lib.h`. Then modify `print_element` in `markdown_output.c`,
and add functions `print_XXXX_string`, `print_XXXX_element`, and
`print_XXXX_element_list`. Also add an option in the main program
that selects the new format. Don't forget to add it to the list of
formats in the usage message.
* To add syntax extensions, define them in the PEG grammar
(`markdown_parser.leg`), using existing extensions as a guide. New
inline elements will need to be added to `Inline =`; new block
elements will need to be added to `Block =`. (Note: the order
of the alternatives does matter in PEG grammars.)
* If you need to add new types of elements, modify the `keys`
enum in `markdown_peg.h`.
* By using `&{ }` rules one can selectively disable extensions
depending on command-line options. For example,
`&{ extension(EXT_SMART) }` succeeds only if the `EXT_SMART` bit
of the global `syntax_extensions` is set. Add your option to
`markdown_extensions` in `markdown_lib.h`, and add an option in
`markdown.c` to turn on your extension.
* Note: Avoid using `[^abc]` character classes in the grammar, because
  they cause problems with non-ASCII input. Instead, use:
  `( !'a' !'b' !'c' . )`
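
To illustrate the last point, here is a hand-written Go sketch of how a byte-oriented matcher evaluates `( !'a' !'b' !'c' . )`. Each `!'x'` is a negative lookahead that consumes no input, and the final `.` consumes one byte, which may be part of a multi-byte UTF-8 character. `notABC` and `spanNotABC` are hypothetical names. leg generates its own C code, and this sketch does not attempt to reproduce the `[^abc]` problem itself.

```go
package main

import "fmt"

// notABC mimics the PEG sequence ( !'a' !'b' !'c' . ) at input[pos].
// Each !'x' is a negative lookahead: it consumes nothing and fails if
// the next byte is x. The final '.' consumes exactly one byte, whatever
// its value, including bytes of a multi-byte UTF-8 character.
// It returns the number of bytes consumed (1), or -1 if the match fails.
func notABC(input []byte, pos int) int {
	if pos >= len(input) {
		return -1 // '.' fails at end of input
	}
	for _, excluded := range []byte{'a', 'b', 'c'} {
		if input[pos] == excluded {
			return -1 // one of the negative lookaheads failed
		}
	}
	return 1
}

// spanNotABC applies the sequence repeatedly, like ( ... )* in a
// grammar, and returns how many bytes were consumed starting at pos.
func spanNotABC(input []byte, pos int) int {
	start := pos
	for {
		n := notABC(input, pos)
		if n < 0 {
			return pos - start
		}
		pos += n
	}
}

func main() {
	in := []byte("h\u00e9llo, abc")
	// "h\u00e9llo, " is 8 bytes because U+00E9 is two bytes in UTF-8.
	fmt.Println(spanNotABC(in, 0)) // 8
}
```
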