This project is a pipeline that supports the easy creation of code-centric, technical books like Reactive Spring, whose original pipeline was a mostly automated AsciidoctorJ-based Spring Boot application.
Motivations
To understand what this project provides, let's examine the first cut at the publication pipeline, and then we can see what might be improved.
The original flow looked like this:
- `git clone` all the code included in the book. For my Reactive Spring book, these repositories came from the `reactive-spring-book` organization. I kept them open source, Apache 2-licensed, and accessible for everyone to consume. The code isn't the most exciting bit, after all.
- `git clone` the book's docs as well - that would be the book itself.
- Run the Spring Boot Asciidoctor autoconfiguration. The autoconfiguration handles a ton of the grunt work involved in using Asciidoctor to convert `.adoc` files into five different output formats: prepress-ready `.pdf`, screen-ready `.pdf`, `.html`, `.epub`, and `.mobi`.
- Take the output artifacts and push them to a place where I could collect them. The thing I used was a branch in a Git repository.
This process worked - I finished the book! That said, there were some issues.
- There are multiple code repositories - more than a dozen discrete Git repositories that the pipeline needs to clone before the book can build. I did this in a Bash for-loop, one clone after the other. This kind of work is what we call embarrassingly parallel: there's no reason we couldn't do it all concurrently. As it was, this stage in the process took a minute or more.
- The autoconfiguration was also serialized: I invoked one `DocumentProducer` after the other. The serialization was very valuable while I was figuring out AsciidoctorJ and ironing out the kinks in the pipeline, because it made everything easier to debug. But these five conversions could also be concurrent.
- The last step in the process - stashing the output documents somewhere so that I could download them and then upload them to Amazon KDP, Leanpub.com, etc. - should be a bit more flexible. I stashed them in a branch on a private Git repository using some Bash scripting. There's no reason they couldn't have been sent to me by email or uploaded to an Amazon S3 bucket.
- The whole pipeline was in Travis CI. I'd moved virtually everything I continually work on to Github Actions except the book: I was worried about recreating the spaghetti-code pipeline in a new environment, particularly when I could be spending that time shipping the final edition of a book that ended up taking me two years to finish.
- The result was a combination of Bash scripting and Java code interleaved to produce the final artifacts. It became harder and harder to reproduce the flow on my local machine.
- The whole thing - because it lacked concurrency - took forever! Something like eight minutes on Travis CI, on top of Travis CI's inherent slowness. Have you noticed how fast out of the gate Github Actions is? It's crazy! I type `git push` and refresh the Github Actions browser page, and it's already processing! Thank you, Github.
- Storing multi-megabyte `.pdf` documents and other output artifacts in a Git repository isn't a great approach. We can and should do better.
This project aims to fix those many issues and provide a foundation for new features and new horizons.
- Improved flow control with Spring Batch: the pipeline uses Spring Batch to describe its flow. Spring Batch gives us many benefits, including auditability and a powerful DSL for describing concurrent and serial stages (see the sketch just after this list). It's also Spring, so we can leverage all of Java in the usual ways. I suppose I could've used a workflow engine like Flowable here instead. Either way, the result is way faster, even though - technically, under the hood - there's a lot more going on.
- Smarter Git integration: the old flow used Bash for all the Git interactions - cloning and pushing. This new project extracts those into individual stages and leverages the wonderful JGit project.
- Easier configuration: the last stage - where we publish the documents to some terminus - is an excellent opportunity for strategies that can be activated or deactivated with Spring Boot configuration. The entire pipeline, come to think of it, benefits from Spring Boot configuration. There are a ton of things to specify: SSH or HTTP Git authentication credentials, Amazon S3 credentials, the workspace in which the pipeline will generate the documents, and so on. It's nice to have Spring Boot's eminently flexible support here.
- More dynamic control flow: this may merely restate the points about improved flow control and easier configuration, but it's crucial: it's much easier to use if/else branches, loops, and recursion from Java and Spring than it is from Bash. We can dynamically add functionality to stages, or stages to the pipeline, in response to external configuration or changing circumstances.
- Convention over configuration: Asciidoctor and Spring are vast, infinite in their possibilities. The process of publishing a technical book, on the other hand, is a known quantity. This new pipeline adopts some conventions - well-known attributes to refer to the code repositories, assumptions about the book's styling, etc. - to reduce what would otherwise be dozens of configuration options to a handful. Additionally, this project takes advantage of Spring Boot autoconfiguration to respond to classes, properties, etc. You should be able to get a new book pipeline up and running in a matter of minutes.
- Integration and isolation: the meat of this pipeline is AsciidoctorJ, which is the Ruby Asciidoctor gem turned into a Java `.jar` with JRuby. The Asciidoctor gem, in turn, loads other gems like `asciidoctor-pdf` and `asciidoctor-epub3`. The combination of these things makes life interesting from a classloader, Maven, and Java class-compatibility perspective. This project handles as much of that as possible for you and documents the rest.
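Here's the promised sketch: a minimal, hypothetical Spring Batch job - not this project's actual job definition, and with made-up step names - showing how the flow DSL runs two clone steps concurrently and then continues serially with document production:

package com.example.samplepipeline;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
@EnableBatchProcessing
class FlowSketch {

    // a placeholder step; the real pipeline's steps clone repositories,
    // run the DocumentProducers, and publish the results
    private Step step(StepBuilderFactory steps, String name) {
        return steps.get(name)
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
    }

    @Bean
    Job pipelineJobSketch(JobBuilderFactory jobs, StepBuilderFactory steps) {
        Step cloneDocuments = step(steps, "clone-documents");
        Step cloneCode = step(steps, "clone-code");
        Step produceDocuments = step(steps, "produce-documents");

        // run the two clone steps concurrently on a task executor...
        Flow clones = new FlowBuilder<Flow>("clones")
                .split(new SimpleAsyncTaskExecutor())
                .add(new FlowBuilder<Flow>("docs").start(cloneDocuments).build(),
                        new FlowBuilder<Flow>("code").start(cloneCode).build())
                .build();

        // ...then continue serially with document production
        return jobs.get("pipeline-sketch")
                .start(clones)
                .next(produceDocuments)
                .end()
                .build();
    }
}

The `split` fans the clone flows out across a task executor - exactly the shape of the "embarrassingly parallel" clone work described in the motivations above.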
Usage
Let's look at a sample that I've built called `sample-pipeline`. You can clone it for a working example. I'll break down the relevant bits here so you can see what differentiates this from any other stock-standard Spring Boot build. The `sample-pipeline` in turn builds a book based on the documents in the `sample-book` repository.
The Maven Build
First, look at the `pom.xml`. You'll notice that most of the configuration is in the build itself. This sample is a stock-standard Spring Boot application with nothing else in it. There's no reason you couldn't add to it, of course, and that's part of the charm! This whole pipeline is an autoconfiguration. The sky's the limit! There are a few things of note.
First, the build sits on top of Spring Batch, and Spring Batch assumes a SQL `DataSource` is present somewhere in the context so that it can persist metadata tables tracking the versioning and execution state of the job instances. The goal is that if something goes wrong, you'll be able to inspect the table metadata and see what happened, or even intervene. If you don't want to configure a SQL database, that's fine: just add an in-memory, embedded SQL database like H2 to your build, and Spring Boot will configure a `DataSource` bean for you:
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
Second, the build includes the Spring Batch `Job` via the following Maven dependency:
<dependency>
    <groupId>bootiful.asciidoctor</groupId>
    <artifactId>asciidoctor-publication-job</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>
This is not on Maven Central, so you can either build the code yourself or use my Artifactory repository, as I did. Here's the relevant Maven configuration; there is an equivalent configuration for Gradle.
<repositories>
    <repository>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <id>central</id>
        <name>libs-release</name>
        <url>https://cloudnativejava.jfrog.io/cloudnativejava/libs-release</url>
    </repository>
    <repository>
        <snapshots/>
        <id>snapshots</id>
        <name>libs-snapshot</name>
        <url>https://cloudnativejava.jfrog.io/cloudnativejava/libs-snapshot</url>
    </repository>
</repositories>

<pluginRepositories>
    <pluginRepository>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <id>central</id>
        <name>plugins-release</name>
        <url>https://cloudnativejava.jfrog.io/cloudnativejava/plugins-release</url>
    </pluginRepository>
    <pluginRepository>
        <snapshots/>
        <id>snapshots</id>
        <name>plugins-snapshot</name>
        <url>https://cloudnativejava.jfrog.io/cloudnativejava/plugins-snapshot</url>
    </pluginRepository>
</pluginRepositories>
Also, there is some weirdness in the interaction between JRuby, AsciidoctorJ, JRuby loading Ruby gems, and the way the Spring Boot Maven plugin packages `.jar` artifacts within other `.jar` artifacts. I had to tell the plugin not to pack a few `.jar` artifacts the same way it does everything else:
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <requiresUnpack>
            <dependency>
                <groupId>org.jruby</groupId>
                <artifactId>jruby-complete</artifactId>
            </dependency>
            <dependency>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctorj</artifactId>
            </dependency>
            <dependency>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctorj-epub3</artifactId>
            </dependency>
            <dependency>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctorj-pdf</artifactId>
            </dependency>
        </requiresUnpack>
    </configuration>
</plugin>
Alright, that's most of the weirdness. At this point, it's just like using any other Spring Boot autoconfiguration. You have two extensibility planes: configuration properties, such as those in `application.properties`, and Spring itself.
Configuration Properties
You can get a working pipeline with a very small amount of configuration.
# (1)
pipeline.job.root=${HOME}/Desktop/root
# (2)
pipeline.job.target=${HOME}/Desktop/target
# (3)
pipeline.job.book-name=My Book
# (4)
pipeline.job.document-repository=https://github.com/your-org/your-docs.git
# (5)
pipeline.job.include-repositories=\
https://github.com/your-org/code-repo-1.git,\
https://github.com/your-org/code-repo-2.git
1. This tells the pipeline where it should do its work. It has to make a mess somewhere - where should that be?
2. This tells the pipeline where it should dump its produced files.
3. This is an alias for `publication.book-name`.
4. This tells the pipeline where to find the `.adoc` files for the book itself. I usually keep `index.adoc` at the root of this repository. You can see the `sample-book` repository for something to clone. It includes a sample Asciidoctor book with some interesting samples, including a cover, code inclusions, a table of contents, and styling for EPUB and PDF.
5. This tells the pipeline which repositories should be cloned before the book is produced, so that the documents in the `document-repository` can reference files in the cloned repositories for includes.
The pipeline sets up some common attributes, including one called `code`, which you can use from within your book's documents to reference the root of all the cloned Git repositories. So, assuming you want to reference a file called `src/main/java/Main.java` from `your-org/code-repo-1`, you can include `{code}/code-repo-1/src/main/java/Main.java` in your Asciidoctor book chapters.
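For example (using the hypothetical paths above), an include directive in one of your chapters might look like this:

include::{code}/code-repo-1/src/main/java/Main.java[]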
If you want to disable the pipeline as a whole, set `pipeline.job.enabled=false`.
There are five `DocumentProducer` beans registered by default as part of the underlying `asciidoctor-autoconfiguration`. One of them, the `MobiProducer`, will fail when running anywhere but Linux, as it relies on a Linux binary for `kindlegen`. If you have the macOS-compatible binary, great - use that. Otherwise, you may want to disable that particular `DocumentProducer` when running the pipeline on your local macOS or Windows machine. Indeed, you may want to disable any or all of the `DocumentProducer` beans! There are five properties you can use to toggle them on or off; set any of them to `true` or `false` based on your particular use case. You can also mix and match these properties with Spring profiles to conditionally activate them when running in your CI environment (see the profile example below).
- `publication.epub.enabled`
- `publication.mobi.enabled`
- `publication.html.enabled`
- `publication.pdf.prepress.enabled`
- `publication.pdf.screen.enabled`
All of these are enabled by default on Linux. The `MobiDocumentProducer` does not run unless it is on Linux; it automatically disables itself on any other operating system.
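To mix these with profiles, you might keep a hypothetical `application-dev.properties` that turns off the Linux-only `.mobi` producer for local work:

# application-dev.properties (hypothetical): skip the Linux-only .mobi producer locally
publication.mobi.enabled=false

Run the pipeline with `--spring.profiles.active=dev` and Spring Boot will layer that file over your `application.properties`.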
Remember, you can specify all of these properties through any mechanism Spring Boot supports, including environment variables. You might, for example, set the following environment variable before you run the pipeline:

export PUBLICATION_MOBI_ENABLED=false

Then run the pipeline. That value will override anything specified in your local `application.properties` or `application.yml`.
PDF Compression
You may want to compress your PDF files. Specify `publication.pdf.screen.optimize=true` to optimize the screen-ready PDF, and `publication.pdf.prepress.optimize=true` to optimize the prepress-ready PDF. Set either and the respective `DocumentProducer` will emit one regular and one optimized PDF.
The pipeline shells out to the `asciidoctor-pdf-optimize` script, which must be installed before it can be used. The `sample-pipeline` is a good example of how to get everything working on an Ubuntu machine before running the build. Here's the command to install everything required using Ubuntu's `apt` package manager:
sudo apt install ghostscript \
  && sudo gem install asciidoctor \
  && sudo gem install asciidoctor-pdf \
  && sudo gem install rghost
Spring Boot Overrides and Events
Let's look at a sample Spring Boot application that configures a few things beyond what we've seen so far:
package com.example.samplepipeline;

import bootiful.asciidoctor.DocumentsPublishedEvent;
import lombok.extern.log4j.Log4j2;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.batch.JobExecutionEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.context.annotation.Bean;

@Log4j2
@SpringBootApplication
public class SamplePipelineApplication {

    public static void main(String[] args) {
        SpringApplication.run(SamplePipelineApplication.class, args);
    }

    // (1)
    @Bean
    ApplicationListener<DocumentsPublishedEvent> documentsPublishedListener() {
        return event -> {
            log.info("Ding! The files are ready!");
            for (var e : event.getSource().entrySet())
                log.info(e.getKey() + '=' + e.getValue());
        };
    }

    // (2)
    @Bean
    ApplicationListener<JobExecutionEvent> batchJobListener() {
        return event -> {
            var jobExecution = event.getJobExecution();
            var createTime = jobExecution.getCreateTime();
            var endTime = jobExecution.getEndTime();
            var jobName = jobExecution.getJobInstance().getJobName();
            log.info("job (" + jobName + ") start time: " + createTime.toString());
            log.info("job (" + jobName + ") stop time: " + endTime.toString());
        };
    }
}
1. The pipeline publishes an `ApplicationEvent` after it has produced all the documents. The `source` of the event is a `Map<String, Collection<File>>` that maps each document type to its output documents. For example, the HTML producer might produce two things: `index.html` and an `images` directory. The map's keys let you distinguish which file is which; the pipeline produces two `.pdf` files, for example - one for the screen and one for prepress.
2. Spring Batch, on top of which this pipeline builds, also publishes useful information through an event. You can ask the job how long it took to run, what its exit status was, etc.
You don't need to provide either of these `ApplicationListener` beans, however. A `public static void main` and voilĂ : a pipeline! Run the main class in your project, give it a few seconds or minutes, and then inspect the output directory. Note that the application configures a thread pool that keeps the Java process alive a little longer than the job itself; your pipeline might finish many seconds before the Java process exits.
Repository Clones
The pipeline delegates to instances of `GitCloneCallback` to handle cloning Git repositories. The default implementation assumes that the Git repository is wide open and unauthenticated, automatically configuring an instance of `PublicGitCloneCallback`.
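Cloning itself is a one-liner with JGit. Here's a minimal sketch of the kind of unauthenticated clone the default callback performs; the class and method here are illustrative only, not the pipeline's actual callback:

import java.io.File;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;

class CloneSketch {

    // a minimal JGit sketch, assuming a public, unauthenticated repository;
    // the pipeline's real callbacks add authentication and error handling
    static void cloneRepository(String uri, File into) throws GitAPIException {
        try (Git git = Git.cloneRepository()
                .setURI(uri)
                .setDirectory(into)
                .call()) {
            // the clone is now on disk, ready for Asciidoctor includes
        }
    }
}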
If you want to authenticate using a username and password, then define a bean of type `CredentialsProvider` in the context:
@Bean
UsernamePasswordCredentialsProvider usernamePasswordCredentialsProvider(
        @Value("${GIT_USERNAME}") String user, // (1)
        @Value("${GIT_PASSWORD}") String pw) { // (2)
    return new UsernamePasswordCredentialsProvider(user, pw);
}
1. The `GIT_USERNAME` environment variable might be, for example, your Github username.
2. The `GIT_PASSWORD` environment variable might be, for example, your Github personal access token.
Alternatively, if you want to authenticate using SSH, you'll need to define a bean of type `TransportConfigCallback`. There are convenience methods - `com.joshlong.git.GitUtils.createSshTransportConfigCallback` - that you can use to make shorter work of building a new instance of this type.
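If you'd rather wire one up by hand, here's a sketch using JGit's JSch-backed transport (the transport JGit shipped at the time of writing); it relies on your local SSH keys and known_hosts, and the project's `GitUtils` helper exists to make this shorter:

import com.jcraft.jsch.Session;
import org.eclipse.jgit.api.TransportConfigCallback;
import org.eclipse.jgit.transport.JschConfigSessionFactory;
import org.eclipse.jgit.transport.OpenSshConfig;
import org.eclipse.jgit.transport.SshSessionFactory;
import org.eclipse.jgit.transport.SshTransport;

@Bean
TransportConfigCallback sshTransportConfigCallback() {
    // a sketch: relies on the default ~/.ssh configuration; customize the
    // session here if you need a passphrase or a non-standard key location
    SshSessionFactory sshSessionFactory = new JschConfigSessionFactory() {
        @Override
        protected void configure(OpenSshConfig.Host host, Session session) {
        }
    };
    return transport -> ((SshTransport) transport).setSshSessionFactory(sshSessionFactory);
}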
Document Publication
We've just looked at the flow, and we assumed you have access to the directory where the pipeline dumps the files - whatever directory you specified in `pipeline.job.target`. This assumption is invalid in most CI environments, so you'll want those artifacts uploaded somewhere. `DocumentPublisher` implementations help with this, taking the pipeline's output and publishing it somewhere for you to collect and inspect.
Git Branch Publication
The `GitBranchDocumentPublisher` is the most accessible, so you might want to start with it. It clones a specified Git repository, checks out a particular branch, and adds a directory for each output document type. It then adds the output artifacts to those directories, commits them, and pushes the branch - new artifacts and all - back to the Git repository. You'll need to configure a few things - the Git repository and the branch - for this to work:
pipeline.job.publishers.git.enabled=true
pipeline.job.publishers.git.artifact-branch=artifacts
pipeline.job.publishers.git.repository=https://github.com/your-org/your-artifact-repo.git
Authentication works exactly as it does for repository clones: define a `CredentialsProvider` bean for username-and-password authentication, or a `TransportConfigCallback` bean for SSH, as described in the Repository Clones section above.
Amazon S3 Bucket Publication
This `DocumentPublisher` uploads an archive containing all the documents to an Amazon S3 bucket:
pipeline.job.publishers.s3.enabled=true
pipeline.job.publishers.s3.access-key-id=${AWS_ACCESS_KEY_ID}
pipeline.job.publishers.s3.region=${AWS_REGION}
pipeline.job.publishers.s3.secret-access-key=${AWS_SECRET_ACCESS_KEY}
pipeline.job.publishers.s3.bucket-name=bootiful-asciidoctor
These properties configure an `AmazonS3` client from the official AWS S3 SDK. There are, as always, many ways to authenticate with Amazon. If you want to use a service principal or something else, feel free to provide an appropriately configured bean of type `AmazonS3` in the application context, and the `DocumentPublisher` Spring Boot autoconfiguration will defer to that one instead.
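For example, here's a sketch of such a bean that uses the AWS SDK's default credential provider chain instead of the explicit keys above; the region is a placeholder:

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class S3Config {

    // a sketch: the default provider chain picks up credentials from the
    // environment, ~/.aws/credentials, or an instance profile
    @Bean
    AmazonS3 amazonS3() {
        return AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1) // hypothetical region
                .build();
    }
}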
Other Publishers
I’d like to expand the assortment of publishers. It’s not hard to see the opportunities:
- Artifactory or Nexus
- attachments in an email
- Github Packages
- an FTP service
- Dropbox

You get the idea - the potential is practically infinite. It's just a matter of time and will.
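If you want to contribute one, a custom publisher might look roughly like the sketch below. Note that this is hypothetical: the actual `DocumentPublisher` contract lives in this project's sources and may declare different method names and parameters, so check it first.

import java.io.File;
import java.util.Collection;
import java.util.Map;

import org.springframework.stereotype.Component;

// hypothetical sketch: the method name and signature here are assumptions,
// not the project's actual DocumentPublisher interface
@Component
class LoggingDocumentPublisher implements DocumentPublisher {

    @Override
    public void publish(Map<String, Collection<File>> documents) {
        documents.forEach((type, files) -> files
                .forEach(file -> System.out.println(type + " -> " + file.getAbsolutePath())));
    }
}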
Running the build when you update the book
So, you've got everything installed and configured. You have the pipeline set up in one repository. You have the `.adoc` files living in some other repository, and now you want the book to rebuild each time you change the `.adoc` repository. Unfortunately, Github Actions doesn't yet have a built-in way to trigger one repository's build whenever another changes, but we can trigger our pipeline using a `repository_dispatch` event.
Triggering the Pipeline Whenever the Book Repository Changes
You'll need a Github Personal Access Token - this is easy to create from the `Developer Settings` page in the `Settings` section of your account. Then configure a Github Actions secret called `GH_PAT` in your repository's `Secrets` section that contains that token.
Then, configure a Github Action for the Github repository that contains your `.adoc` files. You can use the Github Actions workflow in the `sample-book` repository as a foundation:
name: CI

env:
  PIPELINE_ORG_NAME: bootiful-asciidoctor
  PIPELINE_REPO_NAME: sample-pipeline
  GH_PAT: ${{ secrets.GH_PAT }}

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ${GITHUB_WORKSPACE}/.github/workflows/trigger_book_build.sh
This workflow runs whenever you `git push` changes to your `.adoc` files. The Github Action couldn't be more trivial: it checks out your `.adoc` files and then triggers the book publication pipeline - in this case, http://github.com/bootiful-asciidoctor/sample-pipeline. Make sure to adjust the environment variables to reflect your Github repository.
The workflow in turn invokes a shell script in the same directory - `trigger_book_build.sh` - that looks like this:
#!/usr/bin/env bash
# (1)
curl \
  -H "Accept: application/vnd.github.everest-preview+json" \
  -H "Authorization: token ${GH_PAT}" \
  --request POST \
  --data '{"event_type": "update-event"}' \
  https://api.github.com/repos/${PIPELINE_ORG_NAME}/${PIPELINE_REPO_NAME}/dispatches
1. Note that we're sending a payload in the body of the HTTP `POST` in which we specify the name of the event we'd like to trigger. The name is arbitrary: here the `event_type` is `update-event`, but it could just as easily have been `publish-please-event` or anything else, so long as we remember the event we've specified when implementing the next part…
That `curl` command invokes the Github API and triggers the pipeline workflow, which in turn pulls in this very `sample-book` repository and all the dependent code repositories and then, a few minutes later, produces all the various documents.
Enabling the Pipeline repository to listen for changes
You don't need to do much to make this work besides ensuring that your workflow file explicitly supports our arbitrary event. You can refer to the `sample-pipeline` Github Actions workflow for a more thorough example, but here's the important bit:
on:
  repository_dispatch:
    types: [ update-event ]
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
That tells Github Actions to run the pipeline whenever anybody updates the code in the pipeline repository itself or whenever it receives a valid `update-event`, like the one we're publishing from the `sample-book` repository's Github Actions workflow.