Typically, marketing assets like whitepapers, datasheets, and case studies are maintained by the graphic design team.

This can be frustrating for everyone who’s not on that team, because when you want to make a fix or update, even if it’s just fixing a typo, you have to get graphic design involved. And it can be frustrating for graphic design, too, because who wants to spend all day fixing typos?

There are other problems as well:

  • If you decide to change the template for your documents, because, for example, your branding has changed, it has to be changed manually everywhere
  • It’s hard to maintain documents in multiple formats (for example, RGB vs CMYK, or HTML vs PDF), because, again, changes made in one place have to be manually replicated elsewhere
  • Making even minor changes to documents requires the involvement of graphic design, and then an extra review cycle by the requester to make sure the change is made correctly
  • Tables of contents are a pain to manage generally, and have to be updated and manually cross-checked if the document structure or headings change

PrinceXML was recommended to me as a solution to these problems. As input, PrinceXML takes standard CSS and HTML, just as it’s used on the web. As output, it produces PDFs that are indistinguishable from something created using software like InDesign.

However, PrinceXML by itself doesn’t solve any of the problems above.

Why? Its interface is the command line, and the command has to be written to correctly import all of the necessary HTML and CSS for multiple versions, together with all images, boilerplate text, fonts, graphics, and so on.

This is not straightforward for users. Imagine giving everyone a car instead of forcing them to ride the bus. Much easier for them to get to where they want to be, but only if they understand how to use the turn signals, the gear shift, how to add fuel, etc. If they don’t know how to do these things, they’re worse off than they were before.

So, we needed an interface for PrinceXML. This interface had to make it easy to:

  • Send a document to PrinceXML for parsing
  • Include all the necessary files (images, fonts, multiple CSS versions, etc.)

As a bonus, the interface also parses Markdown, which is far easier to read and write than HTML but still allows you to include HTML where necessary.

I wrote a script that provides this interface, and the code (not including, of course, PrinceXML, which has to be purchased separately) is included in this GitHub repository: https://github.com/riboflavin/marquess.

The script is fairly straightforward bash, and it takes one parameter, which is the folder containing the document you want to use.

You can actually just drag the script, and then drag the target folder, straight to your command line if you’re using OS X. Or run the script by itself and it will prompt you for the folder.
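The "one parameter, or prompt for it" behavior can be sketched roughly as follows. This is a minimal illustration rather than the code in the repository: the function name resolve_docfolder and the prompt text are assumptions of mine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the Marquess repository): take the
# document folder from the first argument, or prompt when none is given.
resolve_docfolder() {
    local folder="${1:-}"
    if [ -z "$folder" ]; then
        # No argument supplied: read one line from the user.
        read -r -p "Document folder: " folder
    fi
    # Drop a single trailing slash so later path joins stay tidy,
    # but leave the root folder "/" alone.
    if [ "$folder" != "/" ]; then
        folder="${folder%/}"
    fi
    if [ ! -d "$folder" ]; then
        echo "Not a folder: $folder" >&2
        return 1
    fi
    printf '%s\n' "$folder"
}
```

A script could then start with docfolder=$(resolve_docfolder "$@") || exit 1 and use "$docfolder" from there on.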

Quick guide to the script

Here’s a quick guide to the functioning of the script. The first thing it’ll do is create some output folders for your document.

mkdir -p "$docfolder/Output/PDF"
mkdir -p "$docfolder/Output/PDF/img"

Next, it’ll look for three lines at the beginning of the file that give the title, subtitle, and date of the document. These are marked with #, ##, and ###, respectively. The script will then remove those lines from the document and write the remaining contents to the temporary final document.

#get the title out of content.md
TITLE=$(head -n 1 "$docfolder/Input/content.md")
#bash substring replacement syntax

#remove three lines from the top of the content.md file
#(the title, subtitle, date)
sed '1,3d' "$docfolder/Input/content.md" >> "$docfolder/temp/content.md"
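One way to pull all three header lines out and strip their # markers is sketched below. The helper name heading_line is hypothetical, and it uses sed rather than bash substring replacement, so treat it as an illustration of the idea and not as the script's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print line N of a file with its leading '#'
# markers, and any whitespace right after them, removed.
heading_line() {
    local file="$1" lineno="$2"
    # First sed selects the requested line; second strips the markers.
    sed -n "${lineno}p" "$file" | sed -E 's/^#+[[:space:]]*//'
}
```

With this in place, TITLE=$(heading_line "$docfolder/Input/content.md" 1) would yield the title, and lines 2 and 3 would yield the values used later as $SUBTITLE and $TITLEDATE.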

Next, convert any existing markdown to HTML, using John Gruber’s Markdown perl script:

for i in "$docfolder"/temp/*.md; do perl "$DIR/template/Markdown.pl" --html4tags "$i" >> "${i%.*}.html"; done
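The ${i%.*} part of that loop is bash parameter expansion: it removes the shortest trailing match of ".*", which for these files is the ".md" extension. Here is a small sketch of the same mapping on its own; the function name html_name is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Markdown path to the HTML path the
# conversion loop writes to, by swapping the final extension.
html_name() {
    local src="$1"
    # ${src%.*} drops the last dot and everything after it.
    printf '%s.html\n' "${src%.*}"
}
```

For example, html_name "temp/content.md" prints temp/content.html. Since the loop appends with >>, an HTML file left over from an earlier run would be extended rather than replaced, so it may be worth clearing the temp folder between runs.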

After that, the script looks for a table of contents file, toc.md. If it doesn’t exist, the script looks for h1, h2, and h3 headings, parses them, and creates a new table of contents:

if [ ! -f "$docfolder/Input/toc.md" ]; then
    #keep only the h1-h3 lines and add an empty span before each closing tag
    grep -e '^<h[123]' "$docfolder/temp/content.html" | sed 's/<\/h[123]>/<span><\/span>& /g' >> "$docfolder/Input/toc.md"
fi
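To make the table-of-contents step easier to try in isolation, here is a rough sketch of it as a function. The name toc_entries is hypothetical, and the heading pattern is my assumption about what "h1, h2, and h3 headings" looks like in the HTML that Markdown.pl emits (one heading per line, starting at the first column).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the h1-h3 lines of an HTML file, with an
# empty <span></span> inserted before each closing heading tag.
toc_entries() {
    local html="$1"
    # Keep only lines that open with <h1>, <h2> or <h3> (with or
    # without attributes), then rewrite each closing heading tag.
    grep -E '^<h[123][ >]' "$html" \
        | sed -E 's#</h[123]>#<span></span>& #g'
}
```

Running toc_entries "$docfolder/temp/content.html" on a file containing <h1>Intro</h1> would print that line as <h1>Intro<span></span></h1> followed by a space, and skip paragraphs and deeper headings such as h4.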

Work then continues on the final document file, content.md. Marquess produces this file through a series of concatenations, inside one large loop that runs once for each format you want. The format name appears both in the file names Marquess looks for and in the file names it generates.

The front page, for example, is generated like this:

for fmt in cmyk rgb; do

#preface. on the cmyk loop, for example, use example_cmyk_front.html
cat "$DIR/template/doc/example_${fmt}_front.html" >> "$docfolder/Output/PDF/$fmt.html"
cat "$DIR/template/doc/example_common_front.html" >> "$docfolder/Output/PDF/$fmt.html"

echo "<div id=\"cover\">" >> "$docfolder/Output/PDF/$fmt.html"
echo "<h1>$TITLE</h1>" >> "$docfolder/Output/PDF/$fmt.html"
echo "<h2>$SUBTITLE</h2>" >> "$docfolder/Output/PDF/$fmt.html"
echo "<h3>$TITLEDATE</h3>" >> "$docfolder/Output/PDF/$fmt.html"
echo "<div id=\"frontlogo\"></div>" >> "$docfolder/Output/PDF/$fmt.html"
echo "</div>" >> "$docfolder/Output/PDF/$fmt.html"
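As an aside, the same cover markup could be emitted with a single here-document, which avoids escaping the quotes around each id. This is only a sketch of an alternative, not what the script does: the function name cover_html is hypothetical, and this version closes each heading tag explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cover block for one document.
# Arguments: title, subtitle, date (inserted as-is, not HTML-escaped).
cover_html() {
    local title="$1" subtitle="$2" titledate="$3"
    # Unquoted EOF so the variables expand inside the here-document.
    cat <<EOF
<div id="cover">
<h1>$title</h1>
<h2>$subtitle</h2>
<h3>$titledate</h3>
<div id="frontlogo"></div>
</div>
EOF
}
```

Inside the format loop this would be called as cover_html "$TITLE" "$SUBTITLE" "$TITLEDATE" >> "$docfolder/Output/PDF/$fmt.html".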

After adding lots of other stuff to the final output, at the very end, the script runs Prince on the HTML. You must update this line with the path to your PrinceXML binary.

#replace with path to your PrinceXML binary
/path_to_prince_binary/prince/prince "$docfolder/Output/PDF/$fmt.html" -o "$docfolder/Output/PDF/${docname}_${fmt}.pdf" -v
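If you would rather not edit that line by hand, one option is to read the binary's location from a variable. The sketch below assumes a variable named PRINCE_BIN and a helper named render_pdf, neither of which exists in the script; the Prince arguments are the same ones used in the line above.

```shell
#!/usr/bin/env bash
# PRINCE_BIN is a hypothetical variable: set it to the full path of
# your Prince binary, or leave it unset to use "prince" from PATH.
PRINCE_BIN="${PRINCE_BIN:-prince}"

# Hypothetical helper: render one HTML file to one PDF.
render_pdf() {
    local html="$1" pdf="$2"
    # Fail early with a clear message if the binary cannot be found.
    if ! command -v "$PRINCE_BIN" >/dev/null 2>&1; then
        echo "Prince not found: $PRINCE_BIN" >&2
        return 127
    fi
    "$PRINCE_BIN" "$html" -o "$pdf" -v
}
```

Inside the format loop, the call would be render_pdf "$docfolder/Output/PDF/$fmt.html" "$docfolder/Output/PDF/${docname}_${fmt}.pdf".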

That’s it! The next step would be to make this an online app, with configurable (and saveable) options to generate documents on the fly. That said, I’ve found the command line really useful, since we often want to make lots of small iterations on a document and then see what each one looks like.