# Introduction

Over the past decade of my academic and professional career I have spent a fair bit of time documenting the things I was working on. In graduate school this was primarily in Latex and ran to thousands of pages, some of which are available in my grad school notes. For the past few years my work has been less academic, less mathematical, and more focused on programming. It was also not often distributed or published. As such, my documentation lately has been primarily in Markdown. I initially used Atom with the markdown-pdf plugin to generate PDFs when necessary. This approach was fine, but it provided no real support for equations, and Atom was not my preferred editor.

I was looking for a better solution than this that provided the following:

• Short syntax that is easily human readable (e.g. markdown headings as # instead of Latex \section{} or HTML <h1></h1>). This was to make my writing quicker and more efficient and the document source easier to read. From this perspective, markdown was an attractive option. For example, bullets in markdown source look like bullets, unlike those in Latex or HTML.
  • This could be facilitated further by good syntax highlighting in an editor, e.g. bolding headings # and **bold text**.
• Latex equation support. While equations are expected to be infrequent and generally simple, they needed to be supported.
• Cross-platform deployment. The source document should be easy to deploy across many platforms. I don’t anticipate needing to duplicate documents across these mediums; rather, I wanted to adopt an efficient and standard way of writing without thinking about the destination format. For example:
  • Github wikis
  • Web via a static site generator like Hugo, used to create this site
  • PDFs
• Flexible styling across the above platforms. A solution that facilitated uniform styling across mediums would be ideal, whereby a single .css or .sty file could be used to consistently format both a generated PDF and content for the web.
• Minimal tooling required.
  • No dedicated app or IDE required to take notes effectively.
• Fast to compile or view. However, the need for this was inversely proportional to the complexity of the document syntax. That is, if the syntax were simple enough to read easily in source form, there would be less need to compile and view the generated output while working.
• Citation support for referencing entries from a .bib file.

# Solution: Markdown and Pandoc

Given the above requirement for a simple, easy-to-read syntax, markdown was a good choice. Furthermore, it is widely supported on and offline, with easy tooling to generate PDFs, including Pandoc, and good support online at Github, Gitlab, and with Hugo, Jekyll, and more. Markdown supports embedded HTML, and Latex through tools like KaTeX and MathJax.

For the generation of PDFs, Pandoc is easy to use and supports Latex. Latex is not supported in Github, but seems to be in Gitlab. Markdown is easy to read, meaning I can efficiently view, edit, and write the source document, which reduces the need for real-time rendering or frequent, quick compilation. References can be included from a .bib file. The Sublime Text markdown syntax highlighting works well for markdown, although it does not do anything for Latex. But as the primary goal was a simple, easy syntax that allowed for the inclusion of equations rather than a focus on highly mathematical documents, this was not a big deal; equations should be relatively simple and infrequent. And admittedly I’ve made no attempt yet to see if there are any syntax highlighting options that may work better for markdown with Latex.

Latex was an alternative source format considered, with Pandoc providing tools for conversion to markdown or HTML for use on the web, and easy generation of PDFs. However, even with templates, Latex source is more verbose and cumbersome to use for relatively simple note taking and documentation, which is most of what I do now. Furthermore, Latex is not a format well supported on the web. Jupyter was considered as well but seemed much more heavyweight than desired, required more tooling, and is not as widely or easily supported as markdown (although it is supported in Github, for example).

Styling flexibility with this solution is nearly unlimited, although consistency between the web (e.g. via Hugo) and generated PDFs (e.g. via Pandoc and Latex) may not be easy to maintain. CSS used for the web could not necessarily be applied to PDFs and vice versa, but of the above requirements styling is not the most important. Once the desired styling is set for each of these outputs, it will likely not change often. There may even be options to use CSS with Pandoc and Latex, or tools to generate a .sty file from CSS, although this is another thing I’ve not yet looked into at the time of this writing.

## Using Pandoc

The basic Pandoc command for generating doc.pdf from doc.md is:

    $ pandoc doc.md -o doc.pdf

For more about Pandoc check out the Pandoc User’s Guide.

## Bibliography

A bibliography, in the form of a .bib file, can easily be included with Pandoc. To format the references, a Citation Style Language (CSL) file can be specified. Thousands of CSL files are available online. The following Pandoc options can be used to include the bibliography.

    --filter pandoc-citeproc \
    --bibliography=test.bib \
    --csl ieee.csl \

Citing the bib entry my-citation is accomplished in markdown with [@my-citation].

## Styling

To style the Pandoc-generated output, several templates are available that can be used with the --template option. The Eisvogel Pandoc Latex template was one of the simplest and easiest. Just download it, put it in the default Pandoc template location ~/.pandoc/templates/, and use it with --template eisvogel. The result out of the box is quite good; additional styling options will be described below.

Pandoc’s different flavors of markdown are described in the docs: Pandoc’s Markdown. The post Customizing pandoc to generate beautiful pdfs from markdown says: "GitHub style markdown is recommended if you wish to use the same source (or with minor changes) in multiple places." I chose to use markdown instead, as yaml_metadata_block is not supported by gfm, nor does gfm work with the Eisvogel template.

## Bash Script

With the Pandoc options above (using filters, including a bibliography, specifying a CSL file, and applying a template) the command to run Pandoc was becoming quite long. As was well described in the blog post Customizing pandoc to generate beautiful pdfs from markdown, using a simple script to call Pandoc was an obvious solution. Converting a markdown file to PDF then required the following command:

    $ ~/.pandoc/md2pdf.sh doc.md ~/Desktop/doc.pdf


The contents of the script md2pdf.sh with all of the final options will be listed below.
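As a rough illustration of the idea (not the author's actual md2pdf.sh), a wrapper could collect the options from the sections above into one function. The sketch below assumes the placeholder files test.bib and ieee.csl sit in the working directory and that the Eisvogel template is installed; the function name md2pdf and the DRY_RUN switch are hypothetical additions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a wrapper in the spirit of md2pdf.sh (not the original script).
# test.bib and ieee.csl are placeholder file names taken from the examples above.

# md2pdf INPUT.md OUTPUT.pdf
# With DRY_RUN set to a non-empty value, print the command instead of running it.
md2pdf() {
  # Replace the positional parameters with the full pandoc command line.
  set -- pandoc "$1" \
    --from markdown \
    --template eisvogel \
    --filter pandoc-citeproc \
    --bibliography=test.bib \
    --csl ieee.csl \
    -o "$2"

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 md2pdf doc.md doc.pdf` prints the assembled command, which is handy for checking the options before invoking Pandoc. Newer Pandoc releases replace the pandoc-citeproc filter with a built-in --citeproc option, so the filter line may need adjusting depending on the installed version.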

# Markdown Processing in Pandoc and Hugo

The first problem encountered when generating a PDF from the source of this site was the difference between the markdown processor of Hugo and that of Pandoc. Hugo's markdown processor allows extra arguments to be passed to a code block, for example specifying that line numbers should be turned off, as shown below. However, when such code is used in the markdown file and Pandoc is called, Pandoc does not know how to interpret these extra arguments and ends up rendering the code incorrectly. Again, generating PDFs from the posts on this site was not necessarily a primary use case, but I wanted to have the flexibility to do so.

    ```js {linenos=false}
    var your = "code here"
    ```

Here is what it looks like:

    var your = "code here"


It looks just fine in Hugo: a nicely formatted code block without line numbers. This is not the case when generating a PDF with Pandoc, as Pandoc cannot interpret the {linenos=false}. In the PDF the code is formatted as inline code rather than in a block, all of the statements are on a single line, and {linenos=false} appears in the output as if it were inside the code block. To address this problem, this additional code needed to be either interpreted by Pandoc or simply ignored, with its intended effect on the web achieved in the PDF via Latex styling instead.
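One quick way to see which code blocks are affected is to search the source for fence lines that carry Hugo-style attributes in braces. The sketch below is only an assumption about how such a check could look; the function name is hypothetical, and the pattern only covers backtick fences that start at the beginning of a line.

```shell
# List opening code fences (three or more backticks) that carry {...} attributes,
# such as a js fence followed by {linenos=false}, with their line numbers.
# Reads the named files, or stdin when none are given.
find_hugo_fence_attrs() {
  grep -n '^`\{3,\}[[:alnum:]]* *{[^}]*}' "$@"
}

# Usage (placeholder file name):
#   find_hugo_fence_attrs doc.md
```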

## Solution Option 1: Don’t use Pandoc

This is the obvious solution. For markdown source that is used to generate content only for the web, a separate way to generate a PDF document may not be necessary. In the rare cases when converting web content to PDF is necessary, printing from a browser to PDF is a primitive but rather effective option. This was important to acknowledge, but it is not a viable solution to the underlying problem of how to handle markdown flavors or features that are supported by other processors but not by Pandoc.

## Solution Option 2: HTML/CSS Tricks and Pandoc Arguments

Another solution is to segregate the markdown to be processed by Pandoc from that to be processed by Hugo and its markdown engine. This could be accomplished using the following code together with the Pandoc option --from markdown-markdown_in_html_blocks-native_spans. This tells Pandoc to process the contents of the HTML span with class="hide-me", thus showing them in the PDF generated by Pandoc. At the same time, when processed by Hugo for the web, the span with class="hide-me" is hidden using the CSS display: none;. The p element, in turn, shows naturally in Hugo but is hidden from Pandoc.

    <span class="hide-me">

    ```js
    var your = "code here"
    ```

    </span>

    <p>

    ```js {linenos=false}
    var your = "code here"
    ```

    </p>



The results of this solution are as desired. With some cumbersome HTML and CSS, parts of the source can be defined that show up either in the Pandoc PDF output or on the web, and thus each of these parts can be written differently depending on how each markdown processor will use them.

This solution is at best horribly inelegant, requiring extra HTML elements and significant duplicated source just because the markdown processor of Pandoc is different than that of Hugo.

## Solution Option 3: Pandoc Filters

There is a lot of information online about Pandoc filters written in Python, PHP, Lua, etc., and this solution also seemed to be the most elegant and flexible. It would require creating a filter called, for example, filter.lua that would handle the offending code, and passing it to Pandoc with the option --lua-filter=filter.lua.

I spent a few minutes looking at filters, but realized this might be a bit involved, so I set the option aside to come back to after checking for simpler options. The Pandoc Filters docs were a useful reference. I was particularly interested in Pandoc Lua Filters, as they seemed to be frequently used with success and I’d enjoyed working with Lua before. Filters can also be written in Python and used with Panflute, as described in the blog post Technical Writing with Pandoc and Panflute.

## Solution Option 4: Replace Offending Argument with sed

As the current problem was limited to one particular case (the occurrence of {linenos=false}), the stream editor sed could easily be used to scan the markdown source, replace the offending code, and pipe the output into Pandoc. This approach was very fast to understand and implement and more elegant than the HTML/CSS hacking above; it seemed somewhat flexible, although still far less elegant than ideal. To ensure that only occurrences in code were removed (and not occurrences in the text, like {linenos=false}), a &nbsp was tacked on to those in the code. sed and Pandoc can then be run together, with the output of sed piped into Pandoc.
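A minimal sketch of this approach is given here. It assumes the marker appears at the end of a fence line exactly as ` {linenos=false}&nbsp` (optionally followed by a semicolon); the function name and the file names in the usage comment are placeholders, and the author's actual command may differ.

```shell
# Remove the Hugo-only marker " {linenos=false}&nbsp" from the end of any line,
# leaving plain occurrences of {linenos=false} in the text untouched.
# Reads the named files, or stdin when none are given.
strip_hugo_attrs() {
  sed 's/ {linenos=false}&nbsp;*$//' "$@"
}

# Usage (placeholder file names): pipe the cleaned source into Pandoc.
#   strip_hugo_attrs doc.md | pandoc --from markdown -o doc.pdf
```

Because the pattern requires the trailing &nbsp, a mention of {linenos=false} in running text passes through unchanged, which is the distinction the marker was added for.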

# Resources

• The PDF generated exactly from the markdown source used to create this page is available: documentation.pdf
• My .pandoc directory, including all Pandoc options, Latex headers, and Lua filters is available: https://github.com/dpwiese/.pandoc

2020-01-20 19:00 -0500