15.6.2015

Cross-language benchmarking - Part III

Git submodules and the single-command cross language benchmark

In my recent blog posts (part 1, part 2) I described in detail how to do micro benchmarking for Java and C/C++ with JMH and Hayai, and presented a common execution approach based on Gradle.

Today I want to improve the overall project structure. Last time I already mentioned that the structure of the Gradle projects is not optimal. First I will briefly recap the main goal and the steps from the past articles, then introduce some new requirements, and finally present a more flexible module structure that splits production code and benchmarks and is then embedded in a cross-language super-project.

1.) What we covered so far

To keep track of our to-do list, here is an excerpt from part 1:

We have already achieved all struck-out items. Today, we will focus on the blue items.

2.) The messy things

For the sake of completeness, here is the Gradle build script where we want to identify new requirements today:

github:0cec60a93b72d01debcb

Baseline:

  • This build script is responsible for building the artifact under test AND for executing the micro benchmarks.
  • The main project has dependencies on plugins and libraries that are not needed by the actual output artifact.
  • Benchmark definitions need to reside in the same project source root as the main program code.
  • The build script contains many setup details that you normally do not want to see.

If you want a second look at the project setup, here is the link to the branch where we start off today: Chroma@github, Branch: crolabefra_starting_point.

Requirements:

  • Benchmarking code should be in a separate project, which depends on the artifact to benchmark, but the artifact itself should be independent of any benchmarking libraries or setup
  • The automation tool should be called once to execute all benchmarks over all attached projects
  • As a cross-language micro benchmark developer, I want to apply a framework plugin that handles all the setup needed to obtain result data (in the right format).

These new requirements lead to a complete restructuring of the current setup. In the following sections, I will show you

  • how to use git and submodules to split up product code and benchmarks
  • how the whole picture looks when tied together with the Hayai integration in a cross-language benchmarking project for Java and C/C++ code
  • how to tie everything together in a neat Gradle plugin (future post)

3.) Separate out benchmarks project

This is the status quo from part 1:

github:b0b2bb8b0f17fde74c1f

We aim for the following structure:

github:08bfd53dee0a3c6fca5a

Advantages are obvious:

  • We have two different gradle modules that decouple build and library dependencies: Separation of dependencies, build steps and custom Gradle code.
  • JMH code is not part of the product source: Separation of responsibility, visibility. Cleaner project setup. The code to benchmark can either be a dependency to fetch from a repository or library or whatever comes to your mind!
  • The two directories can be managed in separate git repositories, which means that benchmarking code and production code versions are no longer tightly coupled! A set of benchmarks can easily be executed on different SCM versions of the production code just by checking out another commit!

The only disadvantage, which comes with the first bullet, is that we end up with a multi-module project. But the setup is very easy in Gradle. Basically only a settings.gradle file is needed in the ‘chromarenderer-java-benchmarks’ directory:

github:5281cf017b54c388185c

After changing the directory structure, we can go on and remove the parts we no longer need from both build.gradle files. For the benchmarking project first:

github:aae39f85c117201020fe

OK, we no longer need the JUnit dependency, but we now need the dependency on the project to benchmark. Well, we did not gain that much. But wait, what happens to the production code project? Look:

github:f165c90ae6f5036faec7

That is what we wanted to see! Not a single line of evidence that JMH is running benchmarks on the project! It looks like a vanilla, hassle-free Gradle project. For the working examples, take a look at the example project repositories I prepared for this article:

4.) Git submodules

Now for all of you who have never used Git submodules before, here is a short summary. Roughly speaking, Git submodules come in handy when you want to

  • have repositories inside repositories
  • let one repository participate in several other repositories
  • import 3rd party code from other repositories which you do not want to manage in your own SCM.

4.1) Concept of submodules

Use case example: Assume you have two projects that both use a common set of files:

github:27bdea5331971877133f

Now instead of adding the same set of files to both git repositories (which will obviously cause a version mess as soon as both projects want to change the common files, which then need to be synced manually), it is smarter to move all files in ‘common_assets’ to their own git repository:

github:56257004015587557251

Now you can tell your project repository to clone another repository into the directory and mark that directory as a submodule.

github:9439b8358919d768db3a

Git will then clone that submodule repository which will stay completely independent from the surrounding repository.
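In plain git, this step is typically done with `git submodule add`. The following is a minimal sketch under stated assumptions, not the exact commands from the original setup: it builds two throwaway local repositories in a temporary directory, and the names `project_a` and `logo.txt` are hypothetical stand-ins. Only `common_assets` is taken from the example above.

```shell
#!/bin/sh
# Sketch only: every path and file name below is a hypothetical stand-in.
set -eu
work=$(mktemp -d)
cd "$work"

# Helper: create an empty repository with a throwaway commit identity.
new_repo() {
    git init -q "$1"
    git -C "$1" config user.email "demo@example.com"
    git -C "$1" config user.name "Demo"
}

# 1. The shared files live in their own repository.
new_repo common_assets
echo "shared asset" > common_assets/logo.txt
git -C common_assets add logo.txt
git -C common_assets commit -q -m "Add shared asset"

# 2. A project repository pulls that repository in as a submodule.
#    protocol.file.allow is only set because this demo clones from a local
#    path; it should not be needed for a normal remote URL.
new_repo project_a
git -C project_a -c protocol.file.allow=always \
    submodule add "$work/common_assets" common_assets

# 3. Show the resulting change set and the generated submodule description.
git -C project_a status --short
cat project_a/.gitmodules
```

With a real remote you would pass the repository URL instead of the local path, and the `-c protocol.file.allow=always` part can be dropped.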

The thing which can now cause headaches is the fact that the parent repository only keeps track of the current HEAD revision. Stages and change-sets on the files inside the submodule are not visible to the parent, only changes of the local HEAD revision are recognized. Initially, after the command above, the change set should look like this:

github:4b4f6dc1e418f48b1f7c

.gitmodules contains the details about the newly created submodule. The new directory common_assets is now shown as if it were a simple file. The background is that git created a link to a directory inside the .git folder (.git/modules/common_assets to be precise). Once you create a commit from this change set, git stores the currently checked-out revision of the submodule. To make this clearer, step into the submodule and change the HEAD by checking out another commit. This operation will produce a change set which looks like this:

github:eeb536c533f8a244dc6c

If you return to the previous commit, the change is gone. Those submodule HEAD changes can be added to a commit like any other file change. The only thing that will be stored internally in the parent repository is the commit id of the currently checked out HEAD. What happens behind the scenes in the parent directory can be observed easily with the ‘diff’ command after changing the submodule HEAD revision again:

github:7c5b7091c2be39d6b555

That again looks like a normal file change, but in fact you changed the tracked HEAD revision of a submodule :).
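To see this behaviour without touching a real project, here is a small self-contained sketch. It is an illustration under assumptions: the repository `parent`, the file `asset.txt` and both commits are made up for the demo, and everything happens in a temporary directory.

```shell
#!/bin/sh
# Sketch only: "parent" and "asset.txt" are hypothetical names for this demo.
set -eu
work=$(mktemp -d)
cd "$work"

new_repo() {
    git init -q "$1"
    git -C "$1" config user.email "demo@example.com"
    git -C "$1" config user.name "Demo"
}

# A submodule source with two commits, so there is a HEAD to move.
new_repo common_assets
echo "v1" > common_assets/asset.txt
git -C common_assets add asset.txt
git -C common_assets commit -q -m "v1"
echo "v2" > common_assets/asset.txt
git -C common_assets commit -q -am "v2"

# The parent records the submodule at its current HEAD (the "v2" commit).
new_repo parent
git -C parent -c protocol.file.allow=always \
    submodule add "$work/common_assets" common_assets
git -C parent commit -q -m "Add common_assets as submodule"

# Move the submodule HEAD one commit back, then look at the parent.
git -C parent/common_assets checkout -q HEAD~1
git -C parent status --short

# --submodule=short forces the plain "Subproject commit <id>" form,
# independent of any local diff.submodule setting.
diff_output=$(git -C parent diff --submodule=short)
echo "$diff_output"
```

The diff lists only two commit ids, one removed and one added, even though the file content inside the submodule changed as well. Running `git -C parent add common_assets` followed by a commit would record the new HEAD in the parent.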

I hope I brought some clarity to the concept. In any case I recommend the git manual for further reading.

General benefits

  • A submodule allows you to combine different repositories in one parent without really adding the source files to a second SCM.
  • Changes to submodule content can be done at any time because it is a regular git repository on its own.
  • You can decide when to pull new content into your submodules.
  • A project that consists of different submodules can keep track of the different combinations that were committed in history.
  • Submodule HEAD revisions can be easily changed at any time to an arbitrary point in the history without causing major file-set changes.

4.2) Git submodules and benchmarks?

Let’s get back to our new project structure. Because the benchmarks now ‘surround’ the production code, we can put the two into independent repositories. The project to benchmark is then checked out as a submodule into the benchmark project:

github:e591c20f881b44d4b4a6

Important note regarding cloning git repositories that contain submodules:

After cloning a git repository, the contained submodules are NOT initialized and cloned automatically! You have two options. Either clone the repository with the recursive parameter:

github:6e4193a233f76c49f214

or, after a normal clone, execute:

github:425568579f9516fd6f64

All submodules you have in the repository will be fetched and the currently valid HEAD revision will be checked out.
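Both options can be tried out locally. The sketch below is a hedged illustration: it uses local stand-in repositories that merely borrow the names `chromarenderer-java` and `chromarenderer-java-benchmarks`, the file `README.txt` is made up, and the exact commands in the original setup may differ.

```shell
#!/bin/sh
# Sketch only: local stand-ins for the real repositories, built in a temp dir.
set -eu
work=$(mktemp -d)
cd "$work"

new_repo() {
    git init -q "$1"
    git -C "$1" config user.email "demo@example.com"
    git -C "$1" config user.name "Demo"
}

# Stand-in for the production code repository.
new_repo chromarenderer-java
echo "production code" > chromarenderer-java/README.txt
git -C chromarenderer-java add README.txt
git -C chromarenderer-java commit -q -m "Production code"

# Stand-in for the benchmark repository that embeds it as a submodule.
# protocol.file.allow is only needed because the demo uses local paths.
new_repo chromarenderer-java-benchmarks
git -C chromarenderer-java-benchmarks -c protocol.file.allow=always \
    submodule add "$work/chromarenderer-java" chromarenderer-java
git -C chromarenderer-java-benchmarks commit -q -m "Embed production code"

# Option 1: clone and fetch all submodules in one go.
git -c protocol.file.allow=always clone -q --recursive \
    "$work/chromarenderer-java-benchmarks" clone_recursive

# Option 2: normal clone first; the submodule directory is still empty ...
git clone -q "$work/chromarenderer-java-benchmarks" clone_plain
ls -A clone_plain/chromarenderer-java

# ... until the submodules are initialized and updated explicitly.
git -C clone_plain -c protocol.file.allow=always \
    submodule update --init --recursive
ls -A clone_plain/chromarenderer-java
```

After either option the submodule sits on the commit recorded in the parent repository, usually as a detached HEAD rather than on a branch.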

5.) The missing ‘cross’ in cross language benchmarking

We have only spoken about the Java part so far. The concepts can, of course, be applied directly to the C++ project we started in Part 2 as well. To avoid bothering you with the same things twice, I will just refer to the git repository I prepared for the Hayai benchmarks project: chromarenderer-cpp-benchmarks@github. You will recognize exactly the same changes:

  • All benchmark-related things moved one directory up
  • The core project ‘chromarenderer-cpp’ got rid of all dependencies on benchmarking infrastructure
  • Surrounding benchmarking module defines dependency on core module

Cool, looks like that can be generalized :).

This means we now have two different benchmarking projects that can be run in isolation:

5.1) Yet another super Project

We could now check out both repositories and execute the benchmarks with a single command each. But no, we are even lazier. We want one cross-language repository with one cross-language build. So let’s add another level of super project:

github:934f661560d5a2b3d9af

5.2) Small Gradle pitfall

In theory, the proposed directory model looks promising. But as soon as you have moved everything and give it a try, you will be surprised that this setup doesn’t work. In the settings.gradle of the CroLaBeFra super project, you would probably try something like:

github:ddfdc790248bc84e45bb

What strikes you now is that Gradle is unable to detect multiple ‘settings.gradle’ files in one project. Consequently, the includes in the sub-projects are ignored. But it wouldn’t be Gradle if there wasn’t a workaround ;). Because the super project should know that its includes bring more sub-projects into the build, you can include the sub-projects’ ‘settings.gradle’ files in your super project as well:

github:252a9ba34809d3509519

Now Gradle will read the sub-project ‘settings.gradle’ files as if they were part of the super-project ‘settings.gradle’ file. The downside is that the assumed directory structure becomes inconsistent. Background:

github:e20db166ad5acec68291

basically means the same as

github:66838a20dcdf01e69b50

because the content is simply evaluated in the super project context. But the directory for the sub-project ‘chromarenderer-java’ does not exist at the super project directory level! But again, it wouldn’t be Gradle if there wasn’t a solution.

Fortunately, Gradle first collects all includes from the projects before actually accessing the directories. This means we can correct the project directory afterwards by adding the following lines:

github:013d2594d0f819a28eb9

It feels a little bit dirty, but as long as Gradle does not support multiple settings.gradle files natively, I see no other way. As we do all the dirty things in the super project, we do not have to touch our sub-projects, and they stay free of such workarounds.

Ultimate result of the day with all changes and proposals applied: CroLaBeFra-POC@github

Clone (recursively) and simply run

github:f313b89ab08fecdba701

Mission accomplished :)

6.) Conclusion and ongoing work

Today we solved topics 7 and 5 from the list. Our core product project is independent of any benchmarking infrastructure and/or source code. A straightforward directory structure combined with the power of Git submodules allows for easy management of the Gradle multi-module setup. With some Gradle magic, we added another project level on top, which allows us to access and execute all benchmarking tasks with a single command. Looking at the list, we are almost done.

But I have one more topic I would like to add to the list:

  • Extract all benchmarking config from the ‘build.gradle’ files into easy-to-use Gradle plugins, in order to offer them to all of you :). The overall goal is to have a set of plugins for different languages that can be applied to a benchmarking project. Ideally the only thing you then need to do is
github:0dbd0f708ab47ed47d4b

instead of having thirty lines of Gradle script code to copy. Sounds good? Stay tuned for my next article on that!

So long!

Cheers Benni

Benjamin
Software Engineer & Fellow