This project is a pipeline designed to support the easy creation of code-centric, technical books like Reactive Spring, whose original publication pipeline was mostly an automated, AsciidoctorJ-based Spring Boot application.

Motivations

To understand what this project provides, let’s examine the first cut at the publication pipeline, and then we can see what might be improved.

The original flow looked like this:

  • `git clone` all the code included in the book. For my Reactive Spring book, these repositories came from the reactive-spring-book organization. I kept them open source, Apache 2 licensed, and accessible for everyone to consume. The code isn’t the most exciting bit, after all.

  • Then there’s the book itself. So, the next step `git clone`s the book’s prose - its .adoc documents - as well.

  • Then I ran the Spring Boot Asciidoctor autoconfiguration. The autoconfiguration handles a ton of the struggle involved in using Asciidoctor to convert .adoc files into five different output formats: a prepress-ready .pdf, a screen-ready .pdf, .html, .epub, and .mobi.

  • The next step took the output artifacts and pushed them somewhere I could collect them - in this case, a branch in a Git repository.

This process worked - I finished the book! That said, there were some issues.

  • There are multiple code repositories - more than a dozen discrete Git repositories that the pipeline needs to clone before the book can successfully build. I did this in a Bash for-loop, one clone after the other. This kind of work is what we call embarrassingly parallel; there’s no reason we couldn’t do it all concurrently. As it was, this stage in the process took a minute or more.

  • The autoconfiguration was also serialized: I invoked one DocumentProducer after the other. The serialization was very valuable when I was trying to figure out AsciidoctorJ and iron out the kinks in the pipeline because it made everything easier to debug. But these five conversions could also be concurrent.

  • The last step in the process - stashing the output documents somewhere so that I could download them and then upload them to Amazon KDP, Leanpub.com, etc. - should be a bit more flexible. I stashed them in a branch on a private Git repository using some Bash scripting. There’s no reason they couldn’t have been sent to me by email or uploaded to an Amazon S3 bucket.

  • The whole pipeline lived in Travis CI. I’ve moved virtually everything on which I’m continually working to Github Actions - except the book. I was too worried about recreating the spaghetti-code pipeline in a new environment, particularly when I could be spending that time shipping the final edition of a book that ended up taking me two years to finish.

  • Also, the flow was a combination of Bash scripting and Java code interleaved to produce the final result, which made it harder and harder to reproduce on my local machine.

  • The whole thing - because it lacked concurrency - took forever! It took something like eight minutes on Travis CI, and that’s in addition to Travis CI’s inherent slowness. Have you noticed how fast out of the gate Github Actions is? It’s crazy! I type git push, refresh the Github Actions browser page, and it’s already processing! Thank you, Github.

  • Storing multi-megabyte .pdf documents and output artifacts in a Git repository isn’t a great approach. We can and should do better.

This project aims to fix those many issues and provide a foundation for new features and new horizons.

  • Improved flow control with Spring Batch: it uses Spring Batch to describe the pipeline’s flow. Spring Batch gives us many benefits, including auditability and a powerful DSL to describe concurrent and serial stages (there’s a sketch of what that DSL looks like after this list). It’s also Spring, so we can leverage all of Java in the usual ways. I suppose I could’ve used a workflow engine like Flowable instead. Either way, the result is way faster, even though - technically, under the hood - there’s a lot more going on.

  • Smarter Git integration: the old flow used Bash for all the Git interactions - cloning and pushing. This new project extracts those into individual stages and leverages the wonderful JGit project.

  • Easier configuration: the last stage - where we publish the documents to some terminus - is an excellent opportunity for strategies that can be activated or deactivated with Spring Boot configuration. The entire pipeline, come to think of it, benefits from Spring Boot configuration. There are a ton of things to specify in the pipeline: SSH or HTTP Git authentication credentials, Amazon S3 credentials, the workspace in which the pipeline will generate the documents, and so on. It’s nice to have Spring Boot’s eminently flexible support here.

  • More dynamic control flow: I feel like this merely restates the points about improved flow control and more straightforward configuration, but it’s crucial: it’s much easier to use loops, conditionals, and recursion from Java and Spring than it is from Bash. We can dynamically add functionality to stages, or add stages to the pipeline, all in response to external configuration or changing conditions.

  • Convention over configuration: Asciidoctor and Spring are vast, infinite in their possibilities. The process of publishing a technical book, on the other hand, is a known quantity. This new pipeline adopts some conventions - well-known attributes to refer to the code repositories, assumptions about the book’s styling, etc. - to reduce what would otherwise be a dozen configuration options to a handful. Additionally, this project takes advantage of Spring Boot autoconfiguration to respond to classes, properties, etc. You should be able to get a new book pipeline up and running in a matter of minutes.

  • Integration and Isolation: the meat of this pipeline is AsciidoctorJ, which is the Ruby Asciidoctor gem turned into a Java .jar with JRuby. The Asciidoctor gem, in turn, loads other gems like asciidoctor-pdf and asciidoctor-epub3. The combination of these things makes life interesting from a classloader, Maven, and Java class compatibility perspective. This project handles as much of that as possible for you and documents the rest.
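
To make the flow-control point concrete, here’s a rough sketch - not this project’s actual job definition - of how Spring Batch’s flow DSL (the Spring Batch 4 builder APIs) can fan embarrassingly parallel steps out across threads and then join them. The step and job names here are made up:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
class FlowSketch {

    // a do-nothing placeholder; the real pipeline's steps clone repositories,
    // invoke DocumentProducers, and publish the results
    private Step step(StepBuilderFactory steps, String name) {
        return steps.get(name)
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
    }

    private Flow flow(StepBuilderFactory steps, String name) {
        return new FlowBuilder<SimpleFlow>(name).start(step(steps, name + "Step")).build();
    }

    @Bean
    Job sketchJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
        // run the format conversions on separate threads
        Flow produceDocuments = new FlowBuilder<SimpleFlow>("produceDocuments")
                .split(new SimpleAsyncTaskExecutor())
                .add(flow(steps, "html"), flow(steps, "epub"), flow(steps, "pdf"))
                .build();
        return jobs.get("sketchJob")
                .start(flow(steps, "cloneRepositories")) // a serial stage...
                .next(produceDocuments)                  // ...then the concurrent split...
                .next(step(steps, "publishArtifacts"))   // ...then join and publish
                .build()  // builds the FlowJobBuilder
                .build(); // builds the Job
    }
}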

Usage

Let’s look at a working sample that I’ve built called sample-pipeline. You can clone it for a working example; I’ll break down the relevant bits here so you can see what differentiates it from any other stock-standard Spring Boot build. The sample-pipeline in turn builds a book based on the documents in the sample-book repository.

The Maven Build

First, look at the pom.xml. You’ll notice that most of the configuration is in the build itself. This sample is a stock-standard Spring Boot application with nothing else in it. There’s no reason you couldn’t add to it, of course, and that’s part of the charm! This whole pipeline is an autoconfiguration. The sky’s the limit! There are a few things of note.

First, the build sits on top of Spring Batch, and Spring Batch assumes a SQL DataSource is present somewhere in the context so that it can persist metadata tables associated with the versioning and execution state of the job instances. The goal is that if something goes wrong, you’ll be able to inspect the table metadata and see what happened, or even intervene. If you don’t want to configure a standalone SQL database, that’s fine: just add an in-memory, embedded SQL database like H2 to your build, and Spring Boot will configure a DataSource bean for you:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>

Second, the build includes the Spring Batch Job itself, via the following Maven dependency:

<dependency>
    <groupId>bootiful.asciidoctor</groupId>
    <artifactId>asciidoctor-publication-job</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

This is not on Maven Central, so you can either build the code yourself or use my Artifactory repository, as I did. Here’s the relevant Maven configuration; there is an equivalent configuration for Gradle.

<repositories>
    <repository>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <id>central</id>
        <name>libs-release</name>
        <url>
            https://cloudnativejava.jfrog.io/cloudnativejava/libs-release
        </url>
    </repository>
    <repository>
        <snapshots/>
        <id>snapshots</id>
        <name>libs-snapshot</name>
        <url>
            https://cloudnativejava.jfrog.io/cloudnativejava/libs-snapshot
        </url>
    </repository>
</repositories>
<pluginRepositories>
    <pluginRepository>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <id>central</id>
        <name>plugins-release</name>
        <url>
            https://cloudnativejava.jfrog.io/cloudnativejava/plugins-release
        </url>
    </pluginRepository>
    <pluginRepository>
        <snapshots/>
        <id>snapshots</id>
        <name>plugins-snapshot</name>
        <url>
            https://cloudnativejava.jfrog.io/cloudnativejava/plugins-snapshot
        </url>
    </pluginRepository>
</pluginRepositories>

Also, there is some weirdness associated with the interaction between JRuby, AsciidoctorJ, the way JRuby loads gems, and the way the Spring Boot Maven plugin packages .jar artifacts within other .jar artifacts. I had to tell Spring Boot’s Maven plugin not to package a few .jar artifacts in the same way it does everything else.

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <requiresUnpack>
            <dependency>
                <groupId>org.jruby</groupId>
                <artifactId>jruby-complete</artifactId>
            </dependency>
            <dependency>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctorj</artifactId>
            </dependency>
            <dependency>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctorj-epub3</artifactId>
            </dependency>
            <dependency>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctorj-pdf</artifactId>
            </dependency>
        </requiresUnpack>
    </configuration>
</plugin>

Alright, that’s most of the weirdness. From this point on, it’s just like using any other Spring Boot autoconfiguration. You have two extensibility planes: configuration properties, such as those in application.properties, and Spring itself.

Configuration Properties

You can get a working pipeline with a very small amount of configuration.

# (1)
pipeline.job.root=${HOME}/Desktop/root

# (2)
pipeline.job.target=${HOME}/Desktop/target

# (3)
pipeline.job.book-name=My Book

# (4)
pipeline.job.document-repository=https://github.com/your-org/your-docs.git

# (5)
pipeline.job.include-repositories=\
  https://github.com/your-org/code-repo-1.git,\
  https://github.com/your-org/code-repo-2.git
  1. This tells the pipeline where it should do its work. It has to make a mess somewhere. Where should it be?

  2. This tells the pipeline where it should dump out its produced files.

  3. This is an alias for publication.book-name.

  4. This tells the pipeline where to find the .adoc files for your book itself. I usually keep index.adoc at the root of this repository. You can see this sample repository for something to clone. It includes a sample Asciidoctor book with some interesting samples, including a cover, code inclusions, a table-of-contents, styling for EPub and PDF, etc.

  5. This tells the pipeline which repositories should be cloned before the book is produced so that the documents in the document-repository can reference files in the cloned repositories for includes.

The pipeline sets up some common attributes, including one called code, which you can use to reference the root of all the cloned Git repositories from the document-repository property. So, assuming you want to reference a bit of configuration or code from your-org/code-repo-1 - let’s say a file called src/main/java/Main.java - you can reference {code}/code-repo-1/src/main/java/Main.java in your Asciidoctor book chapters.
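
For instance, using Asciidoctor’s include directive (the repository and path here are just the hypothetical example from above):

include::{code}/code-repo-1/src/main/java/Main.java[]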

If you want to disable the pipeline as a whole, set pipeline.job.enabled=false.

There are five DocumentProducer beans registered by default as part of the underlying asciidoctor-autoconfiguration. One of them, the MobiProducer, will fail when running anywhere but Linux, as it relies on a Linux binary for kindlegen. If you have the macOS-compatible binary, great - use that. Otherwise, you may want to disable that particular DocumentProducer when running the pipeline on your local macOS or Windows machine. Indeed, you may want to disable any or all of the DocumentProducer beans! There are five properties you can use to toggle them on or off.

Here are the five properties. Set any of them to true or false based on your particular use case. You could also mix and match these properties with Spring profiles to conditionally activate producers in your CI environment; there’s a sketch of that approach after this list.

  • publication.epub.enabled

  • publication.mobi.enabled

  • publication.html.enabled

  • publication.pdf.prepress.enabled

  • publication.pdf.screen.enabled
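
For example - just a sketch, with a made-up profile name - you could park an override in a profile-specific properties file so that the .mobi conversion only runs in CI:

# application-local.properties - hypothetical profile for your development machine
publication.mobi.enabled=false

Running with spring.profiles.active=local picks up the override on your laptop and leaves your CI configuration untouched.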

All of these are enabled by default on Linux. The MobiDocumentProducer only runs on Linux; it automatically disables itself on any other operating system.

Remember, you could specify all of these properties through any mechanism Spring Boot provides, including environment variables.

You might, for example, have the following environment variable before you run the pipeline:

export PUBLICATION_MOBI_ENABLED=false

Then run the pipeline. That will override any value specified in your local application.properties or application.yml.

PDF Compression

You may want to compress your PDF files. You can specify publication.pdf.screen.optimize=true to optimize the screen-ready PDF and publication.pdf.prepress.optimize=true to optimize the prepress-ready PDF. Specify either, and the respective DocumentProducer will emit one regular and one optimized PDF.
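
In application.properties, turning both on looks like this:

publication.pdf.screen.optimize=true
publication.pdf.prepress.optimize=true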

The pipeline shells out to the asciidoctor-pdf-optimize script, which must be installed before it can be used. The sample-pipeline is a good example of how to get everything working on an Ubuntu machine before running the build. Here’s the command to install everything required using Ubuntu’s apt package management system.

sudo apt install ghostscript \
  && sudo gem install asciidoctor \
  && sudo gem install asciidoctor-pdf \
  && sudo gem install rghost

Spring Boot Overrides and Events

Let’s look at a sample Spring Boot application that configures a few things beyond what we’ve looked at:

package com.example.samplepipeline;

import bootiful.asciidoctor.DocumentsPublishedEvent;
import lombok.extern.log4j.Log4j2;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.batch.JobExecutionEvent;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.context.annotation.Bean;
import org.springframework.core.env.Environment;

@Log4j2
@SpringBootApplication
public class SamplePipelineApplication {

    public static void main(String[] args) {
        SpringApplication.run(SamplePipelineApplication.class, args);
    }

	//(1)
    @Bean
    ApplicationListener<DocumentsPublishedEvent> documentsPublishedListener() {
        return event -> {
            log.info("Ding! The files are ready!");
            for (var e : event.getSource().entrySet())
                log.info(e.getKey() + '=' + e.getValue());
        };
    }

	//(2)
    @Bean
    ApplicationListener<JobExecutionEvent> batchJobListener() {
        return event -> {
            var jobExecution = event.getJobExecution();
            var createTime = jobExecution.getCreateTime();
            var endTime = jobExecution.getEndTime();
            var jobName = jobExecution.getJobInstance().getJobName();
            log.info("job (" + jobName + ") start time: " + createTime.toString());
            log.info("job (" + jobName + ") stop time: " + endTime.toString());
        };
    }
}
  1. The pipeline publishes an ApplicationEvent after it has produced all the documents. The source of the event is a Map<String, Collection<File>> that maps each document type to its output documents. For example, the HTML producer might produce two files: index.html and an images directory, and the pipeline produces two .pdf files, one for the screen and one for prepress. The map’s keys are how you tell which file is which.

  2. Spring Batch, on top of which this pipeline builds, also publishes some useful information through an event. You can ask the job how long it took to run, its exit status, etc.

You don’t need to provide either of these ApplicationListener beans, however. A public static void main and voilà: a pipeline! Run the main class in your project, give it a few seconds or minutes, and then inspect the output directory. The application configures a thread pool that may keep the Java process running a little longer than the job itself; your pipeline might finish many seconds before the Java process exits.

Repository Clones

The pipeline delegates to instances of GitCloneCallback to handle cloning Git repositories. The default implementation assumes that the Git repository is wide open and unauthenticated, automatically configuring an instance of PublicGitCloneCallback.

If you want to authenticate using a username and password, then define a bean of type CredentialsProvider in the context.

    @Bean
    UsernamePasswordCredentialsProvider usernamePasswordCredentialsProvider(@Value("${GIT_USERNAME}") String user,//(1)
            @Value("${GIT_PASSWORD}") String pw) {//(2)
        return new UsernamePasswordCredentialsProvider(user, pw);
    }
  1. The GIT_USERNAME environment variable might be, for example, your Github username.

  2. The GIT_PASSWORD environment variable might be, for example, your Github personal access token.

Alternatively, if you want to authenticate using SSH, you’ll need to define a bean of type TransportConfigCallback. There are convenience methods - such as com.joshlong.git.GitUtils.createSshTransportConfigCallback - that you can use to make shorter work of building a new instance of this type.
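
If you’d rather build one by hand with plain JGit - a minimal sketch, assuming the JSch-backed SSH transport that JGit ships with by default - it might look something like this:

import com.jcraft.jsch.Session;
import org.eclipse.jgit.api.TransportConfigCallback;
import org.eclipse.jgit.transport.JschConfigSessionFactory;
import org.eclipse.jgit.transport.OpenSshConfig;
import org.eclipse.jgit.transport.SshSessionFactory;
import org.eclipse.jgit.transport.SshTransport;

    // in any @Configuration class
    @Bean
    TransportConfigCallback sshTransportConfigCallback() {
        SshSessionFactory sessionFactory = new JschConfigSessionFactory() {
            @Override
            protected void configure(OpenSshConfig.Host host, Session session) {
                // relax host-key checking for ephemeral CI machines; tighten to taste
                session.setConfig("StrictHostKeyChecking", "no");
            }
        };
        return transport -> {
            if (transport instanceof SshTransport)
                ((SshTransport) transport).setSshSessionFactory(sessionFactory);
        };
    }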

Document Publication

We’ve just looked at the flow, and we assumed you have access to the directory where the pipeline dumped the files - whatever directory you specified in pipeline.job.target. This assumption’s invalid in most CI environments, so you’ll want to have those artifacts uploaded somewhere.

DocumentPublisher implementations help with this, taking the build pipeline’s output and publishing it somewhere for you to collect and inspect.

Git Branch Publication

The GitBranchDocumentPublisher is the most accessible, so you might want to start with it. It clones a specified Git repository, checks out a particular branch, and adds a directory for each output document type. It then copies the output artifacts into those directories, commits everything, and pushes the branch - new artifacts and all - back to the Git repository. You’ll need to configure a few things - the Git repository and the branch - for this to work.

pipeline.job.publishers.git.enabled=true
pipeline.job.publishers.git.artifact-branch=artifacts
pipeline.job.publishers.git.repository=https://github.com/your-org/your-artifact-repo.git

Authentication works exactly as it does for repository clones: to authenticate with a username and password, define a bean of type CredentialsProvider in the context; to authenticate over SSH, define a bean of type TransportConfigCallback. See the Repository Clones section above for examples.

Amazon S3 Bucket Publication

This DocumentPublisher uploads an archive containing all the documents to an Amazon S3 bucket.

pipeline.job.publishers.s3.enabled=true
pipeline.job.publishers.s3.access-key-id=${AWS_ACCESS_KEY_ID}
pipeline.job.publishers.s3.region=${AWS_REGION}
pipeline.job.publishers.s3.secret-access-key=${AWS_SECRET_ACCESS_KEY}
pipeline.job.publishers.s3.bucket-name=bootiful-asciidoctor

These properties configure an AmazonS3 client from the official AWS S3 client SDK. There are, as always, many ways to authenticate with Amazon. If you want to use a service principal or something else, feel free to provide an appropriately configured bean of type AmazonS3 in the application context, and the DocumentPublisher Spring Boot autoconfiguration will defer to that one instead.
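
Here’s a minimal sketch of such an override using the v1 AWS SDK for Java; the region and the environment-variable credentials provider are just examples:

import com.amazonaws.auth.EnvironmentVariableCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    // in any @Configuration class
    @Bean
    AmazonS3 amazonS3() {
        // reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment
        return AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .withCredentials(new EnvironmentVariableCredentialsProvider())
                .build();
    }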

Other Publishers

I’d like to expand the assortment of publishers. It’s not hard to see the opportunities:

  • Artifactory or Nexus

  • attachments in an email

  • Github Packages

  • an FTP service

  • Dropbox

You get the idea. The potential is practically infinite; it’s just a matter of time and will.

Running the build when you update the book

So, you’ve got everything installed and configured. You have the pipeline set up in one repository, you have the .adoc files living in some other repository, and now you want the book to rebuild each time you change the .adoc repository. Unfortunately, Github Actions doesn’t yet have a built-in way to trigger one build whenever another repository changes, but we can trigger our pipeline using a repository_dispatch event.

Triggering the Pipeline Whenever the Book Repository Changes

You’ll need a Github Personal Access Token - these are easy to create from the Developer Settings page in the Settings section of your account. Then configure a Github Actions secret called GH_PAT in your repository’s Secrets section that contains the token.

Then, configure a Github Action for the Github repository that contains your .adoc files. You can use the Github Action workflow in the sample-book repository as a foundation:

name: CI

env:
  PIPELINE_ORG_NAME: bootiful-asciidoctor
  PIPELINE_REPO_NAME: sample-pipeline
  GH_PAT: ${{ secrets.GH_PAT }}

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:

  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ${GITHUB_WORKSPACE}/.github/workflows/trigger_book_build.sh

This workflow runs whenever you git push changes to your .adoc files. The Github Action couldn’t be more trivial: it checks out your .adoc repository and then triggers the book-publication pipeline - in this case, http://github.com/bootiful-asciidoctor/sample-pipeline. Make sure to adjust the environment variables to reflect your Github repository.

The workflow in turn invokes a shell script in the same directory - trigger_book_build.sh - that looks like this:

#!/usr/bin/env bash

# (1)
curl -H "Accept: application/vnd.github.everest-preview+json" -H "Authorization: token ${GH_PAT}" --request POST  --data '{"event_type": "update-event"}' https://api.github.com/repos/${PIPELINE_ORG_NAME}/${PIPELINE_REPO_NAME}/dispatches
  1. Note that we’re sending a payload in the body of the HTTP POST in which we specify the name of the event we’d like to trigger. The name is arbitrary: here, we say that the event_type is update-event, but it could just as easily have been publish-please-event or anything else, so long as we remember the event we’ve specified when implementing the next part…

That curl command invokes the Github API and triggers the pipeline workflow, which in turn pulls in this very sample-book repository and all the dependent code repositories and then, a few minutes later, produces all the various documents.

Enabling the Pipeline repository to listen for changes

You don’t need to do much to make this work beyond ensuring that your workflow file explicitly supports our arbitrary event. You can refer to the sample-pipeline Github Actions workflow for a more thorough example, but here’s the important bit:

on:
  repository_dispatch:
    types: update-event
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

That tells Github Actions to run the pipeline whenever anybody updates the code in the pipeline repository itself or whenever it receives a valid update-event, like the one we publish from the sample-book repository’s Github Actions workflow.