
What is this?
=============

This is an implementation of John Gruber's [markdown][] in C. It uses a
[parsing expression grammar (PEG)][] to define the syntax. This should
allow easy modification and extension. It currently supports output in
HTML, LaTeX, or groff_mm formats, and adding new formats is relatively
easy.

[parsing expression grammar (PEG)]: http://en.wikipedia.org/wiki/Parsing_expression_grammar 
[markdown]: http://daringfireball.net/projects/markdown/

It is pretty fast. A 179K text file that takes 5.7 seconds for
Markdown.pl (v. 1.0.1) to parse takes less than 0.2 seconds for this
markdown. It does, however, use a lot of memory (up to 4M of heap space
while parsing the 179K file, and up to 80K for a 4K file). (Note that
the memory leaks in earlier versions of this program have now been
plugged.)

Both a library and a standalone program are provided.

peg-markdown is written and maintained by John MacFarlane (jgm on
github), with significant contributions by Ryan Tomayko (rtomayko).
It is released under both the GPL and the MIT license; see LICENSE for
details.

Installing
==========

On a Linux or Unix-based system
-------------------------------

This program is written in portable ANSI C. It requires
[glib2](http://www.gtk.org/download.html). Most *nix systems will have
this installed already. The build system requires GNU make.

The other required dependency, [Ian Piumarta's peg/leg PEG parser
generator](http://piumarta.com/software/peg/), is included in the source
directory. It will be built automatically. (However, it is not as portable
as peg-markdown itself, and seems to require gcc.)

To make the 'markdown' executable:

    make

(Or, on some systems, `gmake`.) Then, for usage instructions:

    ./markdown --help

To run John Gruber's Markdown 1.0.3 test suite:

    make test

The test suite will fail on one of the list tests.  Here's why.
Markdown.pl encloses "item one" in the following list in `<p>` tags:

    1.  item one
        * subitem
        * subitem
    
    2.  item two

    3.  item three

peg-markdown does not enclose "item one" in `<p>` tags unless it has a
following blank line. This is consistent with the official markdown
syntax description, and lets the author of the document choose whether
`<p>` tags are desired.

Cross-compiling for Windows with MinGW on a Linux box
-----------------------------------------------------

Prerequisites:

*   A Linux system with the MinGW cross compiler. For Ubuntu:

        sudo apt-get install mingw32

*   [Windows glib-2.0 binary & development files](http://www.gtk.org/download-windows.html).
    Unzip files into cross-compiler directory tree (e.g., `/usr/i586-mingw32msvc`).

Steps:

1.  Create the markdown parser using Linux-compiled `leg` from peg-0.1.4:

        ./peg-0.1.4/leg markdown_parser.leg >markdown_parser.c

    (Note: The same thing could be accomplished by cross-compiling leg,
    executing it on Windows, and copying the resulting C file to the Linux
    cross-compiler host.)

2.  Run the cross compiler with include flags for the Windows glib-2.0 headers:
    for example,

        /usr/bin/i586-mingw32msvc-cc -c \
        -I/usr/i586-mingw32msvc/include/glib-2.0 \
        -I/usr/i586-mingw32msvc/lib/glib-2.0/include -Wall -O3 -ansi markdown*.c

3.  Link against the Windows glib-2.0 library: for example,

        /usr/bin/i586-mingw32msvc-cc markdown*.o \
        -Wl,-L/usr/i586-mingw32msvc/lib/glib,--dy,--warn-unresolved-symbols,-lglib-2.0 \
        -o markdown.exe

The resulting executable depends on the glib DLL, so be sure to
install the glib binary on the Windows host.

Compiling with MinGW on Windows
-------------------------------

These directions assume that MinGW is installed in `c:\MinGW` and glib-2.0
is installed in the MinGW directory hierarchy (with the mingw bin directory
in the system path).

Unzip peg-markdown in a temp directory. From the directory with the
peg-markdown source, execute:

    cd peg-0.1.4
    for %i in (*.c) do @gcc -g -Wall -O3 -DNDEBUG -c -o %~ni.o %i
    gcc -o leg.exe leg.o tree.o compile.o
    cd ..
    peg-0.1.4\leg.exe markdown_parser.leg >markdown_parser.c
    @for %i in (markdown*.c) do @gcc -mms-bitfields -Ic:/MinGW/include/glib-2.0 -Ic:/MinGW/lib/glib-2.0/include -c -o %~ni.o %i
    gcc -O3 -Lc:/MinGW/lib/glib-2.0 -lglib-2.0 -lintl markdown.o markdown_lib.o markdown_output.o markdown_parser.o -o markdown.exe -Wl,--dy,--warn-unresolved-symbols,-lglib-2.0,-Lc:/MinGW/lib/glib-2.0,-lglib-2.0,-lintl

(Windows instructions courtesy of Matt Wolf.)

Extensions
==========

peg-markdown supports extensions to standard markdown syntax.
These can be turned on using the command line flag `-x` or
`--extensions`.  `-x` by itself turns on all extensions.  Extensions
can also be turned on selectively, using individual command-line
options. To see the available extensions:

    ./markdown --help-extensions
 
The `--smart` extension provides "smart quotes", dashes, and ellipses.

The `--notes` extension provides a footnote syntax like that of
Pandoc or PHP Markdown Extra.

Using the library
=================

The library exports two functions:

    GString * markdown_to_g_string(char *text, int extensions, int output_format);
    char * markdown_to_string(char *text, int extensions, int output_format);

The only difference between these is that `markdown_to_g_string` returns a
`GString` (glib's automatically resizable string), while `markdown_to_string`
returns a regular character pointer.  The memory allocated for these must be
freed by the calling program, using `g_string_free()` or `free()`.

`text` is the markdown-formatted text to be converted.  Note that tabs will
be converted to spaces, using a four-space tab stop.  Character encodings are
ignored.
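
Assuming the conventional meaning of a tab stop, a tab advances to the next column that is a multiple of four rather than always becoming exactly four spaces. The sketch below illustrates that rule; `expand_tabs` is a hypothetical helper, not part of the peg-markdown API. Like the library, it ignores encodings and counts bytes as columns.

```c
#include <stdlib.h>
#include <string.h>

#define TAB_STOP 4

/* Hypothetical helper: returns a newly malloc'd copy of `text` in which each
 * tab is replaced by enough spaces to reach the next multiple of TAB_STOP.
 * Columns are counted in bytes and restart at 0 after every newline.
 * Returns NULL on allocation failure; the caller must free() the result. */
char *expand_tabs(const char *text)
{
    size_t len = strlen(text);
    size_t tabs = 0;
    for (size_t i = 0; i < len; i++)
        if (text[i] == '\t')
            tabs++;

    /* A tab expands to at most TAB_STOP spaces, i.e. TAB_STOP - 1 extra bytes. */
    char *out = malloc(len + tabs * (TAB_STOP - 1) + 1);
    if (out == NULL)
        return NULL;

    size_t col = 0, j = 0;
    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (c == '\t') {
            /* Emit at least one space, then pad up to the next tab stop. */
            do {
                out[j++] = ' ';
                col++;
            } while (col % TAB_STOP != 0);
        } else {
            out[j++] = c;
            col = (c == '\n') ? 0 : col + 1;
        }
    }
    out[j] = '\0';
    return out;
}
```

For example, a tab at the start of a line becomes four spaces, while the tab in `"ab\tc"` becomes two, because the text before it already occupies two columns.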

`extensions` is a bit-field specifying which syntax extensions should be used.
If `extensions` is 0, no extensions will be used.  If it is `0xFFFFFF`,
all extensions will be used.  To set extensions selectively, combine
the following constants with the bitwise `|` operator:

 - `EXT_SMART` turns on smart quotes, dashes, and ellipses.
 - `EXT_NOTES` turns on footnote syntax.  [Pandoc's footnote syntax][] is used here.
 - `EXT_FILTER_HTML` filters out raw HTML (except for styles).
 - `EXT_FILTER_STYLES` filters out styles in HTML.

  [Pandoc's footnote syntax]: http://johnmacfarlane.net/pandoc/README.html#footnotes
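
Because `extensions` is a bit-field, flags are combined with bitwise OR and tested with bitwise AND. The sketch below assumes each constant is a distinct single bit; the numeric values are placeholders standing in for the real definitions in `markdown_lib.h`, and the helper functions are hypothetical.

```c
/* Placeholder definitions (assumption: one distinct bit per extension).
 * Real code should include markdown_lib.h instead of defining these. */
enum {
    EXT_SMART         = 1 << 0,
    EXT_NOTES         = 1 << 1,
    EXT_FILTER_HTML   = 1 << 2,
    EXT_FILTER_STYLES = 1 << 3
};

/* Turn a flag on: bitwise OR sets its bit and leaves the others alone. */
int enable_extension(int extensions, int flag)
{
    return extensions | flag;
}

/* Turn a flag off: AND with the complement clears only that bit. */
int disable_extension(int extensions, int flag)
{
    return extensions & ~flag;
}

/* Test a flag: nonzero result of AND means the bit is set. */
int has_extension(int extensions, int flag)
{
    return (extensions & flag) != 0;
}
```

So `EXT_SMART | EXT_NOTES` would enable only smart punctuation and footnotes. By contrast, `EXT_SMART & EXT_NOTES` evaluates to 0 when the flags occupy different bits, which would turn every extension off.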

`output_format` is either `HTML_FORMAT`, `LATEX_FORMAT`, or `GROFF_MM_FORMAT`.

To use the library, include `markdown_lib.h`.  See `markdown.c` for an example.

Hacking
=======

It should be pretty easy to modify the program to produce formats other
than HTML or LaTeX, and to parse syntax extensions.  A quick guide:

  * `markdown_parser.leg` contains the grammar itself.

  * `markdown_output.c` contains functions for printing the `Element`
    structure in various output formats.

  * To add an output format, add the format to `markdown_formats` in
    `markdown_lib.h`.  Then modify `print_element` in `markdown_output.c`,
    and add functions `print_XXXX_string`, `print_XXXX_element`, and
    `print_XXXX_element_list`. Also add an option in the main program
    that selects the new format. Don't forget to add it to the list of
    formats in the usage message.

  * To add syntax extensions, define them in the PEG grammar
    (`markdown_parser.leg`), using existing extensions as a guide. New
    inline elements will need to be added to `Inline =`; new block
    elements will need to be added to `Block =`. (Note: PEG alternatives
    are tried in order and the first one that matches wins, so their
    order matters.)

  * If you need to add new types of elements, modify the `keys`
    enum in `markdown_peg.h`.

  * By using `&{ }` rules one can selectively disable extensions
    depending on command-line options. For example,
    `&{ extension(EXT_SMART) }` succeeds only if the `EXT_SMART` bit
    of the global `syntax_extensions` is set. Add your option to
    `markdown_extensions` in `markdown_lib.h`, and add an option in
    `markdown.c` to turn on your extension.

  * Note: Avoid using `[^abc]` character classes in the grammar, because
    they cause problems with non-ASCII input. Instead, use
    `( !'a' !'b' !'c' . )`.
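
Two of the notes above, that PEG alternatives are tried in order and that `( !'a' !'b' !'c' . )` can stand in for `[^abc]`, can be illustrated with a tiny hand-written matcher. This is not the code leg generates; it is a sketch of PEG semantics in plain C, and every function name in it is hypothetical. Each function takes an input position and returns the position after a successful match, or -1 on failure. The README does not say why `[^abc]` misbehaves, so the sketch only shows that the replacement idiom handles every byte uniformly, including the bytes of a UTF-8 sequence.

```c
#include <string.h>

/* Match the literal `word` at `pos` in `s`.
 * Returns the position just past the match, or -1 on failure. */
static int match_lit(const char *s, int pos, const char *word)
{
    size_t n = strlen(word);
    return strncmp(s + pos, word, n) == 0 ? pos + (int)n : -1;
}

/* Ordered choice ( "*" / "**" ): the first alternative that succeeds wins,
 * so on input "**" only one character is consumed. */
int star_first(const char *s, int pos)
{
    int r = match_lit(s, pos, "*");
    return r >= 0 ? r : match_lit(s, pos, "**");
}

/* Ordered choice ( "**" / "*" ): trying the longer alternative first. */
int double_first(const char *s, int pos)
{
    int r = match_lit(s, pos, "**");
    return r >= 0 ? r : match_lit(s, pos, "*");
}

/* ( !'a' !'b' !'c' . ): each !'x' tries a match but never advances, and
 * any success makes the whole sequence fail. '.' then consumes one byte,
 * failing only at end of input. */
int not_abc(const char *s, int pos)
{
    if (match_lit(s, pos, "a") >= 0 || match_lit(s, pos, "b") >= 0 ||
        match_lit(s, pos, "c") >= 0)
        return -1;
    return s[pos] != '\0' ? pos + 1 : -1;
}

/* ( !'a' !'b' !'c' . )*: repeat until the sequence fails; always succeeds. */
int span_not_abc(const char *s, int pos)
{
    int next;
    while ((next = not_abc(s, pos)) >= 0)
        pos = next;
    return pos;
}
```

With `star_first`, the input `**` is matched by the one-character alternative, and the second star is left for whatever rule comes next. This is why a longer token generally needs to be listed before a shorter token that is its prefix.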