In my recent blog posts (part 1, part 2) I have described in detail how to do micro benchmarking for Java and C/C++ with JMH and Hayai. I have presented a common execution approach based on Gradle.
Today I want to improve the overall project structure. Last time I already mentioned that the structure of the Gradle projects is not optimal. First, I will briefly recap the main goal and the steps from the past articles. Second, I will introduce some new requirements. Finally, I will present a more flexible module structure that splits production code from benchmarks and is then embedded in a cross-language super-project.
1.) What we covered so far
To keep track of our to-do list, here is an excerpt from part 1:
We have already achieved all struck-out items. Today, we will focus on the blue items.
2.) The messy things
For the sake of completeness, here is the Gradle build script where we want to identify new requirements today:
- This build script is responsible for building the artifact under test AND for executing the micro benchmarks.
- The main project depends on plugins and libraries that are not needed by the actual output artifact.
- Benchmark definitions need to reside in the same project source root as the main program code.
- The build script contains many setup details that you normally do not want to see.
To have a second look at the project setup here is the link to the branch where we start off today: Chroma@github, Branch: crolabefra_starting_point.
- Benchmarking code should be in a separate project, which depends on the artifact to benchmark, but the artifact itself should be independent of any benchmarking libraries or setup
- The automation tool should be called once to execute all benchmarks over all attached projects
- As a cross-language micro benchmark developer, I want to apply a framework plugin that handles all the setup needed to obtain result data (in the right format).
These new requirements lead to restructuring the current setup completely. In the following sections, I will show you how
- to use git and submodules to split up product code and benchmarks
- the whole picture looks, tied up together with the Hayai integration in a cross language benchmarking project for Java and C/C++ code.
- to tie everything together in a neat Gradle Plugin (future post)
3.) Separate out benchmarks project
This is the status quo from part 1:
We aim for the following structure:
Advantages are obvious:
- We have two different Gradle modules that decouple build and library dependencies: separation of dependencies, build steps and custom Gradle code.
- JMH code is not part of the product source: separation of responsibility and visibility, and a cleaner project setup. The code to benchmark can even be a dependency fetched from a repository, a library, or whatever comes to your mind!
- The two directories can be managed in separate git repositories, which means that benchmarking code and production code versions are no longer tightly coupled! A set of benchmarks can easily be executed against different SCM versions of the production code by just checking out another commit!
The only disadvantage, which comes with the first bullet, is that we end up with a multi-module project. But the setup is very easy in Gradle. Basically, only a settings.gradle file is needed in the ‘chromarenderer-java-benchmarks’ directory:
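As a minimal sketch of such a file: the snippet below generates a scratch copy of the layout and writes a one-line settings.gradle into it. The include line is an assumption derived from the directory names used in this article, not a copy of the original file, and the temporary directory is purely illustrative.

```shell
#!/bin/sh
set -eu

# Scratch copy of the target layout: the benchmarks project is the root,
# the production code lives in a directory below it.
root="$(mktemp -d)/chromarenderer-java-benchmarks"
mkdir -p "$root/chromarenderer-java"

# Assumed content of settings.gradle: register the nested production
# project as a module of the benchmarks build.
cat > "$root/settings.gradle" <<'EOF'
include 'chromarenderer-java'
EOF

cat "$root/settings.gradle"
```

With this in place, Gradle treats ‘chromarenderer-java’ as the sub-project ‘:chromarenderer-java’ of the benchmarks build.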
After changing the directory structure, we can go on and remove the parts we no longer need from both build.gradle files. For the benchmarking project first:
OK, we no longer need the JUnit dependency, but we now need the dependency on the project to benchmark. Well, we did not gain that much. But wait, what happens to the production code project? Look:
That is what we wanted to see! Not a single line of evidence that JMH is running benchmarks on the project! Just like a vanilla, hassle-free Gradle project. For the actual examples, take a look at the example project repositories I prepared for this article:
- chromarenderer-java-benchmarks@github benchmarks repository
- chromarenderer-java@github production code repository
4.) Git submodules
Now, for all of you who have never used Git submodules before, here is a short summary. Roughly speaking, Git submodules come in handy when you want to have
- repositories in repositories
- one repository that is part of several other repositories
- third-party code imported from other repositories, which you obviously do not want to manage in your own SCM.
4.1) Concept of submodules
Use case example: Assume you have two projects that both use a common set of files:
Now, instead of adding the same set of files to both git repositories (which will obviously cause a version mess as soon as both projects want to change the common files, which then need to be synced manually), it is smarter to move all files in ‘common_assets’ to their own git repository:
Now you can tell your project repository to clone another repository into the directory and mark that directory as a submodule.
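The step can be tried out locally with the following sketch. The repository names (common_assets_repo, project_a) and the demo identity are made up for illustration. A local path stands in for the remote URL, which is why the sketch passes protocol.file.allow=always; recent git versions otherwise refuse file-based submodule clones.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Per-command settings only, nothing global is changed.
# protocol.file.allow is needed solely because the submodule URL is a local path here.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# 1. The shared files live in their own repository.
g init -q "$work/common_assets_repo"
echo "logo" > "$work/common_assets_repo/logo.txt"
g -C "$work/common_assets_repo" add logo.txt
g -C "$work/common_assets_repo" commit -q -m "Add shared logo"

# 2. A project repository that wants to use them.
g init -q "$work/project_a"
echo "Project A" > "$work/project_a/README"
g -C "$work/project_a" add README
g -C "$work/project_a" commit -q -m "Initial commit"

# 3. Clone the shared repository into 'common_assets' and mark it as a submodule.
g -C "$work/project_a" submodule --quiet add "$work/common_assets_repo" common_assets

# 4. Show what the parent repository sees after the command.
status=$(g -C "$work/project_a" status --short)
echo "$status"
```

The status output should list exactly two staged additions: the .gitmodules file and the common_assets entry.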
Git will then clone that submodule repository which will stay completely independent from the surrounding repository.
The thing which can now cause headaches is the fact that the parent repository only keeps track of the current HEAD revision. Stages and change-sets on the files inside the submodule are not visible to the parent, only changes of the local HEAD revision are recognized. Initially, after the command above, the change set should look like this:
.gitmodules contains the details about the newly created submodule. The new directory common_assets is now shown as if it were a simple file. The background is that git created a link to a directory inside the .git folder (.git/modules/common_assets to be precise). When you create a commit from this change set, git stores the revision currently checked out in the submodule. To make this clearer, step into the submodule and change the HEAD by checking out another commit. This operation will produce a change set which looks like this:
If you return to the previous commit, the change is gone. Those submodule HEAD changes can be added to a commit like any other file change. The only thing that will be stored internally in the parent repository is the commit id of the currently checked out HEAD. What happens behind the scenes in the parent directory can be observed easily with the ‘diff’ command after changing the submodule HEAD revision again:
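To reproduce this locally, here is a small sketch with made-up repository names (assets, parent) and a demo identity. It records a submodule in a parent repository, moves the submodule HEAD one commit back and then asks the parent for its diff. The local path as submodule URL is again only a stand-in for a real remote.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Shared repository with two commits, so there is an older revision to go back to.
g init -q "$work/assets"
echo v1 > "$work/assets/logo.txt"
g -C "$work/assets" add logo.txt
g -C "$work/assets" commit -q -m "v1"
echo v2 > "$work/assets/logo.txt"
g -C "$work/assets" commit -q -am "v2"

# Parent repository that records the submodule at its newest commit.
g init -q "$work/parent"
g -C "$work/parent" commit -q --allow-empty -m "Initial commit"
g -C "$work/parent" submodule --quiet add "$work/assets" common_assets
g -C "$work/parent" commit -q -m "Add common_assets submodule"

# Move the submodule HEAD one commit back ...
g -C "$work/parent/common_assets" checkout -q HEAD~1

# ... and ask the parent what changed.
diff_out=$(g -C "$work/parent" diff --submodule=short)
echo "$diff_out"
```

The diff contains one removed and one added "Subproject commit" line carrying the old and the new commit id. No file content of the submodule shows up in the parent.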
That again looks like a normal file change, but in fact you changed the tracked HEAD revision of a submodule :).
I hope I brought some clarity to the concept. In any case, I recommend the git manual for further reading.
- A submodule allows you to combine different repositories in one parent without really adding the source files to a second SCM.
- Changes to submodule content can be done at any time because it is a regular git repository on its own.
- You can decide when to pull new content into your submodules.
- A project that consists of different submodules can keep track of the different combinations that were committed in history.
- Submodule HEAD revisions can be easily changed at any time to an arbitrary point in the history without causing major file-set changes.
4.2) Git submodules and benchmarks?
Let’s get back to our new project structure. Combining the new structure of having benchmarks ‘surrounding’ the production code allows us to put them in independent repositories. The project to benchmark is then checked out as submodule into the benchmark project:
Important note on cloning git repositories that contain submodules:
After cloning a git repository, the contained submodules are NOT initialized and cloned automatically! You have two options. Either clone the repository with the recursive parameter:
or, after a normal clone, execute:
All submodules you have in the repository will be fetched and the currently valid HEAD revision will be checked out.
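Both options can be compared side by side with the following sketch. It builds two throwaway local repositories as stand-ins for the benchmarks and production repositories. The paths, the file Main.java and the demo identity are illustrative assumptions. With a real remote URL, the protocol.file.allow setting is not needed.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in for the production code repository.
g init -q "$work/production"
echo "class Main {}" > "$work/production/Main.java"
g -C "$work/production" add Main.java
g -C "$work/production" commit -q -m "Production code"

# Stand-in for the benchmarks repository that tracks it as a submodule.
g init -q "$work/benchmarks"
g -C "$work/benchmarks" commit -q --allow-empty -m "Initial commit"
g -C "$work/benchmarks" submodule --quiet add "$work/production" chromarenderer-java
g -C "$work/benchmarks" commit -q -m "Track production code as submodule"

# Option 1: clone and fetch all submodules in one go.
g clone -q --recursive "$work/benchmarks" "$work/clone_recursive"

# Option 2: plain clone first. The submodule directory exists but is empty ...
g clone -q "$work/benchmarks" "$work/clone_plain"
plain_before=$(ls -A "$work/clone_plain/chromarenderer-java")

# ... until the submodules are initialized and checked out explicitly.
g -C "$work/clone_plain" submodule --quiet update --init --recursive
ls "$work/clone_recursive/chromarenderer-java" "$work/clone_plain/chromarenderer-java"
```

After either option, the submodule directory contains the revision that the parent repository recorded.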
5.) The missing ‘cross’ in cross language benchmarking
We spoke about the Java part so far. The concepts can, of course, be applied directly to the C++ project we started in Part 2 as well. To avoid bothering you with the same things twice, I will just refer to the git repository I prepared for the Hayai benchmarks project: chromarenderer-cpp-benchmarks@github. You will recognize exactly the same changes:
- All benchmark-related things moved one directory up
- The core project ‘chromarenderer-cpp’ got rid of all dependencies on benchmarking infrastructure
- Surrounding benchmarking module defines dependency on core module
Cool, looks like that can be generalized :).
This means we now have two different benchmarking projects that can be run in isolation:
5.1) Yet another super Project
We could now check out both repositories and execute the benchmarks with a single command each. But no, we are even lazier. We want one cross-language repository with one cross-language build. So let’s add another level of super project:
5.2) Small Gradle pitfall
In theory, the proposed directory model looks promising. But as soon as you have moved everything and give it a try, you will be surprised that this setup doesn’t work. In your settings.gradle in the CroLaBeFra super project, you would probably try something like:
What strikes you now is that Gradle does not pick up multiple ‘settings.gradle’ files in one build. Consequently, the includes in the sub-projects are ignored. But it wouldn’t be Gradle if there wasn’t a workaround ;). Because the super project should know that its includes bring more sub-projects into the build, you could apply the ‘settings.gradle’ files of the sub-projects in your super project as well:
Now Gradle will read the sub-project ‘settings.gradle’ files as if they were part of the super-project ‘settings.gradle’ file. The downside is that the assumed directory structure becomes inconsistent. Background:
basically means the same as
because the content will simply be evaluated in the super project context. But the sub-project directory ‘chromarenderer-java’ does not exist on the super project directory level! But again, it wouldn’t be Gradle if there wasn’t a solution.
Fortunately, Gradle first collects all includes before actually accessing the project directories. This means we can change the directories afterwards by adding the following lines:
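Put together, the workaround could look like the following sketch. It generates a scratch super project and writes an assumed settings.gradle into it. The super-project directory name (CroLaBeFra) and the C++ sub-project names are inferred from the repository names mentioned in this article, so treat the exact paths as assumptions rather than as the content of the original file.

```shell
#!/bin/sh
set -eu
super="$(mktemp -d)/CroLaBeFra"
mkdir -p "$super/chromarenderer-java-benchmarks/chromarenderer-java" \
         "$super/chromarenderer-cpp-benchmarks/chromarenderer-cpp"

# Sub-project settings files stay exactly as they are in their own repositories.
echo "include 'chromarenderer-java'" > "$super/chromarenderer-java-benchmarks/settings.gradle"
echo "include 'chromarenderer-cpp'" > "$super/chromarenderer-cpp-benchmarks/settings.gradle"

# Assumed super-project settings.gradle with all three steps of the workaround.
cat > "$super/settings.gradle" <<'EOF'
// 1. The two benchmark projects sit directly below the super project.
include 'chromarenderer-java-benchmarks'
include 'chromarenderer-cpp-benchmarks'

// 2. Evaluate the nested settings files in the super-project context.
//    This registers ':chromarenderer-java' and ':chromarenderer-cpp'
//    as if they were located on the super-project level.
apply from: 'chromarenderer-java-benchmarks/settings.gradle'
apply from: 'chromarenderer-cpp-benchmarks/settings.gradle'

// 3. Point the registered projects at the directories where they really live.
project(':chromarenderer-java').projectDir =
        new File(settingsDir, 'chromarenderer-java-benchmarks/chromarenderer-java')
project(':chromarenderer-cpp').projectDir =
        new File(settingsDir, 'chromarenderer-cpp-benchmarks/chromarenderer-cpp')
EOF

cat "$super/settings.gradle"
```

The order matters in this sketch: a project has to be registered through an include before its projectDir can be reassigned.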
It feels a little bit dirty, but in the end, as long as Gradle does not support multiple settings.gradle files natively, there is no other way. As we do all the dirty things in the super project, we do not have to touch our sub-projects and they stay free of such workarounds.
Ultimate result of the day with all changes and proposals applied: CroLaBeFra-POC@github
Clone (recursively) and simply run
Mission accomplished :)
6.) Conclusion and ongoing work
Today we solved topics 5 and 7 from the list. Our core product project is independent of any benchmarking infrastructure and/or source code. A straightforward directory structure combined with the power of Git submodules allows for easy management of the Gradle multi-module setup. With some Gradle magic, we added another project level on top, which allows us to access and execute all benchmarking tasks with a single command. Looking at the list, we are almost done.
But I have one more topic I would like to add to the list:
- Extract all benchmarking config from the ‘build.gradle’ files into easy-to-use Gradle plugins, in order to offer them to all of you :). The overall goal is to have a set of plugins for different languages that can be applied to a benchmarking project. Ideally, the only thing you need to do then is
instead of having thirty lines of Gradle script code to copy. Sounds good? Stay tuned for my next article on that!