Branching Strategies

Branching can either be embraced or avoided. Chris Oldwood documents the pros and cons of three main branching strategies.

One area in software development that appears to have suffered from the malaise of the Cargo Cult [ Wikipedia-1 ] is the use of branching within the version control system. The decision to use, or avoid, branches during the development of a software product sometimes seems to be made based on what the ‘cool companies’ are doing rather than what is suitable for the project and team itself.

What is often misunderstood about the whole affair is that it is not necessarily the branching strategy that allows these cool companies to deliver reliable software more frequently, but the other practices they use to support their entire process, such as automated testing, pair programming, code reviews, etc. These, along with a supportive organisational structure, mean that less reliance needs to be placed on code branches to mitigate the risks that would otherwise exist.

This article describes the three main types of branching strategy and the forces that commonly dictate their use. From there it should be possible to understand how the problems inherent with branches themselves might be avoided and what it takes to live without them in some circumstances.

Codeline policies

Branches are lines, in the genealogy sense, of product development that reflect an evolution of the codebase in a way that is consistent for a given set of constraints. In essence each branch has a policy [ Berczuk02 ] associated with it that dictates what types of change (commit) are acceptable into that codeline. When there is an ‘impedance mismatch’ [ c2-1 ] between the code change and the policy, a branch may then be created to form a new codeline with a compatible policy.

All this talk of ‘forces’ and ‘policies’ is just posh speak for the various risks and mitigating techniques that we use when developing software. For example a common risk is making a change that breaks the product in a serious way thereby causing disruption to the entire team. One way of reducing the likelihood of that occurring is to ensure the code change is formally reviewed before being integrated. That in turn implies that the change must either be left hanging around in the developer’s working copy until that process occurs or committed to a separate branch for review later. In the former case the developer is then blocked, whilst in the latter you start accruing features that aren’t properly integrated. Neither of these options should sound very appealing and so perhaps it’s the development process that needs reviewing instead.

Merging can be expensive

Branching is generally cheap, both in terms of version control system (VCS) resources and time spent by the developer in its creation. This is due to the use of immutability within the VCS storage engine which allows it to model a branch as a set of deltas on top of a fixed baseline.
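To make the 'deltas on top of a fixed baseline' idea concrete, here is a toy model. It is only a sketch under stated assumptions: the names Commit, commit, branch_from and read_file are invented for illustration, and a real storage engine is far more elaborate. What it shows is that when history is never modified, creating a branch amounts to taking another reference to an existing commit.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy model only; all names here are hypothetical.
struct Commit;
using CommitPtr = std::shared_ptr<const Commit>;

struct Commit {
    CommitPtr parent;                          // the fixed baseline, never modified
    std::map<std::string, std::string> delta;  // path -> content changed by this commit
};

// Committing never alters history; it layers a new delta on top of it.
CommitPtr commit(CommitPtr parent, std::map<std::string, std::string> delta) {
    return std::make_shared<Commit>(Commit{std::move(parent), std::move(delta)});
}

// Creating a branch is just taking another reference to an existing commit.
CommitPtr branch_from(const CommitPtr& head) {
    assert(head != nullptr);  // a branch needs a baseline to start from
    return head;
}

// Read a file by walking back through the deltas until one mentions the path.
std::string read_file(const CommitPtr& head, const std::string& path) {
    for (const Commit* c = head.get(); c != nullptr; c = c->parent.get()) {
        auto it = c->delta.find(path);
        if (it != c->delta.end())
            return it->second;
    }
    return std::string();  // the path does not exist on this codeline
}
```

Committing to the new branch leaves the original codeline untouched, because the two only share the immutable history below the branch point.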

Whilst the branch creation is easy, keeping it up-to-date (forward integration) and/or integrating our changes later (reverse integration) can be far more expensive. It’s somewhat ironic that we talk about ‘branching’ strategies and not ‘merging’ strategies because it’s the latter aspect we’re usually most interested in optimising. Merge Debt is a term that has sprung up in recent times to describe the ever increasing cost that can result from working in isolation on a branch without synchronising yourself with your surroundings.

Improvements in tooling have certainly made merging two text-based files slightly easier and there are new tools that try and understand the ‘meaning’ of a code change at a language level to further reduce the need for manual intervention. Of course even these cannot help when there are semantic conflicts (syntactically correct changes that just do the wrong thing) [ Fowler ]. And sadly binary files are still a handful. Refactoring can also be a major source of merge headaches when the physical structure of the codebase changes underneath you; this is compounded by the common practice of giving the folders and files the same names as the namespaces and classes.
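A contrived illustration of a semantic conflict may help. The function names and the timeout scenario are invented for this sketch. Assume one branch changed a function to take milliseconds instead of seconds, while another branch added a new caller written against the old contract. The two edits touch different lines, so they merge without any textual conflict and the result compiles, yet it does the wrong thing.

```cpp
#include <cassert>

// After the change made on one branch: the argument is now in milliseconds.
// (Before that change it took seconds and multiplied by 1000.)
long to_timeout_ms(long timeout_ms) {
    return timeout_ms;
}

// Added on another branch at the same time, against the old contract:
// the author meant 30 seconds.
long connect_timeout_ms() {
    return to_timeout_ms(30);
}

// A check of the kind that would expose the problem after merging.
// Calling it against the merged code above trips the assertion.
void smoke_test_connect_timeout() {
    assert(connect_timeout_ms() == 30000);
}
```

No merge tool working at the textual level can flag this; only running something that exercises the merged behaviour will.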

As we shall see in the following sections, branches are essentially graded by their level of stability, or degree of risk. Consequently the preferable direction for any merging is from the more stable into the more volatile on the assumption that tried-and-tested is less risky. The reason ‘cherry pick’ merges [ c2-2 ] get such a bad name is because they usually go against this advice – they are often used to pull a single feature ‘up’ from a more volatile branch. This carries with it the risk of having to drag in dependent changes or to try and divorce the desired change from its dependants without breaking anything else.

Integration branches

Before embarking on a full discussion of the main branching strategies we need to clear up some terminology differences that often come up as a result of the different naming conventions used by various VCS products.

Although there are three basic strategies, there are only two real types of branch – integration and private. Either you share the branch with others and collaborate or you own the branch and are solely responsible for its upkeep. It’s when you share the branch with others that the sparks really start to fly and so these tend to be minimised.

For small teams there is usually only a single major integration branch and this often goes by the name of main, trunk or master. Sometimes this is known as the development branch to distinguish it from one of the other more specialised kinds. Either way it’s expected that this will be the default branch where the majority of the integration will finally occur.

In larger organisations with much bigger teams there might be many integration branches for the same product, with perhaps one integration branch per project. At this scale the integration branch provides a point of isolation for the entire project and may spin off its own child branches. Multiple integration branches come with additional overhead, but that may well be less than the contention generated by a large team trying to share a single integration branch. If the project itself carries a large degree of uncertainty or cannot be delivered piecemeal then this project-level isolation can be more beneficial in the long run.

Release branch

Back in the days before VCS products supported the ability to branch you essentially had only one branch where all change took place. As the development process reached a point where the product was readying itself for a formal release, a code freeze was often put in place to reduce the changes to only those directly required to get the product out of the door. For those developers not directly working on ‘finishing’ the product they had to find other work to do, or find other ways to manage any code changes destined for later releases.

Once branching became available a common answer to this problem was to branch the codebase at a suitable moment so that work could continue on the next version of the product in parallel with the efforts to stabilise the impending release. The codeline policy for a release branch is therefore based around making very few, well-reviewed, well-tested changes that should resolve outstanding issues without creating any further mess. As the release date approaches, finding the time to continually test and re-test the entire product after every change can become much harder and therefore more time is often spent up front attempting to decide whether further change is even really desirable.

The branch is often not ‘cut’ from the development line at an arbitrary point in time – there will probably have been a reduction in high-risk changes leading up to the branch point so as to minimise the need to try and revert a complex feature at the last minute. By the time the release branch is ready to be created it should be anticipated that future additional changes will be kept to a bare minimum. This implies that during project planning the high-risk items are front-loaded to ensure they are given the longest time to ‘bed in’, e.g. you don’t upgrade compilers the day before a release.

The most extreme form of a product release is probably a patch, or hotfix. Time is usually the most critical aspect and so it demands that any change be completed in total isolation as this allows it to be done with the highest degree of confidence that there are no other untoward side-effects. This kind of release branch is usually created directly from a revision label as that should be the most direct way to identify the part of the product’s entire history that corresponds to the product version needing remediation. Whereas a branch is an evolving codeline, a label (or tag) is a snapshot that annotates a single set of revisions as a specific milestone.

What should be apparent about this particular strategy is that it’s mostly about compensating for a lack of stability in the main development process. If you never have to worry about supporting multiple product versions, then in theory you can change your development process to avoid the need for formal release branches. By ensuring you have adequate automated feature and performance testing and a streamlined development pipeline you should be able to deliver directly from the main integration branch.

However, despite the development team’s best efforts to minimise the delays in getting a feature into production, there can still be other organisational problems that get in the way of delivery. Maybe there needs to be formal sign-off of each release, e.g. for regulatory purposes, or the QA cycle is out of your hands. In these cases the release branch acts more like a quarantine zone while the corporate cogs slowly turn.

From a merging perspective release branches are generally a low-maintenance affair. As already stated, the most desirable merge direction is from the more stable codeline into the more volatile, and release branch changes should be about the most carefully crafted of them all. Because each one is usually an isolated change of high importance, it can be merged into any ongoing integration branches the moment it becomes practical instead of waiting until the end.

Feature/task branch

If you think of the main development branch as the equator then a feature branch is the polar opposite of a release branch. Where the codeline policy for a release branch is aimed at providing maximum stability through low-risk changes, a feature branch has a policy aimed at volatile, high-risk changes. Instead of protecting the release from unwanted side-effects we’re now protecting the main development pipeline from stalling for similar reasons.

The definition of ‘feature’ could be as small as a simple bug fix made by a single developer right up to an entire project involving many developers (the aforementioned project-level integration branch). Other terms that are synonymous are ‘task branch’ and ‘private branch’. One suggests a narrower focus for the changes whilst the other promotes the notion of a single developer working in isolation. Either way the separation allows the contributor(s) to make changes in a more ad hoc fashion that suits their goal. As such they need not worry about breaking the build or even checking in code that doesn't compile, if that’s how they need to work to be effective.

One common use for a feature branch is to investigate changes that are considered experimental in nature, sometimes called a spike [ ExtremeProgramming ]. This type of feature may well be discarded at the end of the investigation with the knowledge gained being the point of the exercise. Rather than pollute the integration branch with a load of code changes that have little value, it’s easier to just throw the feature branch away and then develop the feature again in a ‘cleaner’ manner. Many version control systems don’t handle file and folder renames very well and so this makes tracing the history across them hard. For example, during a period of heavy refactoring, files (i.e. classes) may get renamed and moved around which causes their history to become detached. Even if the changes are reverted and the files return to their original names the history can still remain divorced as the VCS just sees some files deleted and others added.

In some cases the changes themselves may be inherently risky, but it may also be that the person making the changes is the major source of risk. New team members always need some time to get up to speed with a new codebase no matter how experienced they are. However, junior programmers will likely carry more risk than their more senior counterparts, so it might be preferable to keep their work at arm’s length until the level of confidence in their abilities has grown (or the process itself has adapted) enough to empower them to decide for themselves how a change should best be made.

Once again it should be fairly apparent that what can mitigate some uses of feature branches is having a better development process in the first place. With a good automated test suite, pair programming, code reviews, etc. the feedback loop is much shorter, so a change which could destabilise the team will be detected far sooner and can be headed off before it escalates.

What makes feature branches distasteful to many, though, is the continual need to refresh them by merging up from the main integration branch. The longer you leave it before refreshing, the more chance you have that the world has changed underneath you and you’ll have the merge from hell to attend to. If the team culture is to refactor relentlessly then this will likely have a significant bearing on how long you leave it before bringing your own branch back in sync.

Frequently merging up from the main integration branch is not just about resolving the textual conflicts in the source code though. It’s also about ensuring that your modifications are tested within the context of any surrounding changes to avoid the semantic conflicts described earlier. Whilst it might technically be possible to integrate your changes by just fixing any compiler warnings that occur in the final merge, you need to run the full set of smoke tests too (at a minimum) so that when you publish you have a high degree of confidence that your changes are sound.

Shelving

There is a special term for the degenerate case of a single-commit feature branch – shelving. If there is a need to suddenly switch focus and there are already changes in flight that you aren’t ready to publish yet, some VCSs allow you to easily put them to one side until you’re ready to continue. This is usually implemented by creating a branch based on the revision of the working copy and then committing any outstanding changes. When it’s time to resume, the changes can be un-shelved by merging the temporary branch back into the working copy (assuming the ancestry allows it).

One alternative to shelving is to have multiple working folders all pointing at different branches. If you constantly have to switch between the development, release and production codebases, for example, it can be easier (and perhaps faster) to just switch working folders than to switch branches, especially now that disk space is so cheap.

Forking

The introduction of the Distributed Version Control System (D-VCS) adds another dimension to the branching strategy because a developer’s machine no longer just holds a working set of changes, but an entire repository. Because it’s possible to make changes and commit them to a local repo, the developer’s machine becomes a feature branch in its own right. It is still subject to the same issues in that upstream changes must be integrated frequently, but it can provide far more flexibility in the way those changes are then published.

No branch/feature toggle

Back in the days before version control systems were clever enough to support multiple threads of change through branches, there was just a single shared branch. This constraint in the tooling had an interesting side-effect that meant making changes had to be more carefully thought out.

Publishing a change that broke the build had different effects on different people. For some it meant that they kept everything locally for as long as possible and only committed once their feature was complete. Naturally this starts to get scary once you consider how unreliable hardware can be or what can go wrong every time you’re forced to update your working folder, which would entail a merge. Corruption of uncommitted changes is entirely possible if you mess the merge up and have no backup to return to.

The other effect was that some developers learnt to break their work down into much more fine-grained tasks. In contrast they tried to find a way to commit more frequently but without making changes that had a high chance of screwing over the team. For example new features often involve some refactoring work to bring things into shape, the addition of some new code and the updating or removing of other sections. Through careful planning, some of this work can often be done alongside other people’s changes without disturbing them, perhaps with some additional cost required to keep the world in check at all times. For instance, by definition, refactoring should not change the observable behaviour and so it must be possible to make those changes immediately (unanticipated performance problems notwithstanding).

This then is the premise behind using a single branch for development along with feature toggles to hide the functionality until it is ready for prime time. The notion of ‘always be ready to ship’ [ c2-3 ] engenders an attitude of very small incremental change that continually edges the product forward. The upshot of this is that ‘value’ can be delivered continually too because even the refactoring work has some value and that can go into production before the entire feature might be implemented. Feature toggles are a mechanism for managing delivery whereas branches are a mechanism for managing collaboration. The desire to increase collaboration and deliver more frequently will usually lead to the use of feature toggles as a way of resolving the tension created by partially implemented stories.

This method of development does not come easily though; it demands some serious discipline. Because every change is published to the team and the build server straight away, there must be plenty of good practices in place to minimise the likelihood of a bug or performance problem creeping in unnoticed. The practices will probably include a large, mostly automated test suite along with some form of reviewing/pairing to ensure there are many ‘eyes’ watching.

The way that the feature is ‘toggled’ can vary depending on whether its activation will be static (compile time) or dynamic (run time). From a continuous-testing point of view it makes far more sense to ensure any new feature is enabled dynamically otherwise there are more hoops to jump through to introduce it into the test suite. Doing it at runtime also helps facilitate A/B testing [ Wikipedia-2 ] which allows old and new features to run side-by-side for comparison.
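The difference between the two kinds of activation can be sketched as follows. The greeting functions and the NEW_GREETING macro are made up for the example. The point is that the static form needs two separate builds to exercise both paths, whereas the dynamic form lets a single build, and therefore a single test run, cover both.

```cpp
#include <string>

// Hypothetical old and new behaviour.
std::string old_greeting() { return "Hello"; }
std::string new_greeting() { return "Hello, world"; }

// Static toggle: the choice is fixed when the code is compiled, so covering
// both paths means building twice (with and without NEW_GREETING defined).
std::string greeting_static() {
#ifdef NEW_GREETING
    return new_greeting();
#else
    return old_greeting();
#endif
}

// Dynamic toggle: the choice is made while the program runs, so one build
// can exercise both paths, for instance from the test suite.
std::string greeting_dynamic(bool is_new_greeting_enabled) {
    return is_new_greeting_enabled ? new_greeting() : old_greeting();
}
```

A test can simply call greeting_dynamic(false) and greeting_dynamic(true) in the same run, which is also roughly what running old and new features side-by-side requires.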

The nature of the toggle varies depending on what mechanisms are available, but either way the number of points in the code where the toggle appears should be kept to an absolute minimum. For example, instead of littering the code with #ifdef style pre-processor statements to elide the code from compilation it is preferable to have a single conditional statement that enables the relevant code path:

  if (isNewFeatureEnabled)
    DoNewFeature();

The toggle could take the form of a menu item in the UI, an entry in a configuration file, an #ifdef compilation directive, a data file, an extra parameter in an HTTP request, a property in a message, the use of REM to comment in/out a command in a batch file, etc. Whatever the choice its absence will generally imply the old behaviour, with the new behaviour being the exception until it goes live for good. At that point it will disappear once again.
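As one possible shape for this, the sketch below assumes the toggle is an entry in some already-parsed configuration. The Settings type, the "new-feature" key and the function names are all hypothetical. It shows the two properties described above: a missing entry means the old behaviour, and the toggle is interpreted in exactly one place.

```cpp
#include <map>
#include <string>

// Hypothetical settings, e.g. as parsed from a configuration file.
using Settings = std::map<std::string, std::string>;

// The one place where the toggle is interpreted. A missing entry means the
// feature is off, so existing deployments keep the old behaviour.
bool is_feature_enabled(const Settings& settings, const std::string& name) {
    auto it = settings.find(name);
    return it != settings.end() && it->second == "true";
}

// Stand-ins for the old and new code paths.
std::string do_old_feature() { return "old"; }
std::string do_new_feature() { return "new"; }

// The single conditional that selects the code path.
std::string run_feature(const Settings& settings) {
    if (is_feature_enabled(settings, "new-feature"))
        return do_new_feature();
    return do_old_feature();
}
```

When the feature goes live for good, removing the toggle means deleting is_feature_enabled's one call site and the old path, rather than hunting through the codebase.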

One side-effect of working with feature toggles is that there might be a clean-up exercise required at the end if it gets pulled or if it supplants another feature – this will happen after go live and so needs to be planned in. During development there will also be periods of time where ‘unused code’ exists in the production codebase because the feature hasn’t been fully implemented yet. Whilst it’s beneficial that others get early sight of the ongoing efforts they need to be sure not to delete what might seem to be ‘dead code’.

The motivation for not branching is effectively to avoid merging at all. That won’t happen, simply because you need to continually refresh your working folder and any update could require a merge. However, the likelihood that any conflicts will crop up should be greatly diminished. In particular ‘noisy’ refactorings can be easier to co-ordinate because the changes can be made, pushed and pulled by others with the minimum of fuss.

Hybrid approaches

The three core branching strategies are not in any way mutually exclusive. It’s perfectly acceptable to do, say, the majority of development on the main integration branch with occasional feature branches for truly risky tasks and release branches to avoid getting stalled due to bureaucracy (e.g. waiting for the Change Review Board to process the paperwork).

Example: Visual Studio upgrade

Historically a tool like Visual C++ cannot be silently upgraded. Its project and solution data files are tied to a specific version and must match the version of the tool being used by the programmer. In the past this has created problems for larger teams, where not everyone can migrate at the same time without some serious groundwork. Aside from the project data file problems, in C++ at least, there is also the problem of the source code itself being compatible with the new toolchain. Visual C++ used to default to non-standard scoping rules for a for loop, meaning that the loop variable could leak outside the loop and into the enclosing scope. Bringing the codebase in line with the ISO standard therefore meant there were source code changes to handle too.

When I tackled this with a medium sized team that were working on separate projects on multiple integration branches I had to use a combination of branching approaches as a big bang switchover was never going to work. Although the build and deployment process was somewhat arcane, the fact that multiple streams already existed meant that the parallelisation aspect was going to be less painful.

As part of an initial spike, I used a feature branch to investigate what I needed to upgrade the tooling and to see what the impact would be vis-à-vis source code changes. The end result of that was just some new build scripts to handle the tooling upgrade; everything else was ditched.

The next step was to bring as much of the existing codebase up to scratch by fixing the for loop scoping manually (where necessary) by inducing an extra scope (you just enclose the existing loop with another pair of braces). On one integration branch I upgraded the toolchain locally, fixed all the compiler errors and warnings, then reverted the toolchain upgrade, re-compiled with the current toolchain to verify backwards compatibility and finally committed just the source code changes. An email also went out educating the other developers about the kinds of issues that might crop up in the future so that any new code would stand a better chance of being compatible at the final upgrade time.
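The extra pair of braces might look something like the sketch below. The function is invented for illustration and is not taken from the codebase in question. Under the standard rules each loop variable is already confined to its own loop; under the older leaking rules, the second declaration of i would clash with the first if the braces were missing. With each loop enclosed in its own scope the code means the same thing under either set of rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many values fall outside the closed range [low, high].
std::size_t count_outside(const std::vector<int>& values, int low, int high) {
    assert(low <= high);
    std::size_t count = 0;

    {   // Extra scope: 'i' cannot escape past the closing brace, whichever
        // for loop scoping rule the compiler applies.
        for (std::size_t i = 0; i < values.size(); ++i)
            if (values[i] < low)
                ++count;
    }

    {   // A second 'i' is safe here for the same reason.
        for (std::size_t i = 0; i < values.size(); ++i)
            if (values[i] > high)
                ++count;
    }

    return count;
}
```

The change is purely mechanical and does not alter behaviour, which is what makes it safe to commit ahead of the toolchain switch itself.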

Those code changes and the new upgrade scripts were then merged (cherry picked) into every integration branch so that each one could then be inspected and any new changes made since the original branch point occurred could be made compatible too. At this point all the integration branches were in good shape and ready to migrate once we had ironed out the non-syntactic problems.

The next step was to verify that a build with the new toolchain worked at runtime too and so a new feature branch was taken from one of the integration branches which could be used to build and deploy the product for system testing. This allowed me to iron out any bugs in the code that only showed up with the new compiler behaviour and runtime libraries. Once fixed, these changes could also be pushed across to the other integration branches so that all of the projects were then in a position to make the final switch.

My goal when doing this work was to avoid messing up any one project if at all possible. The uncertainty around the delivery schedule of each project meant that I didn’t know at the start which one was going to be the best to use to ‘bed in’ the upgrade, so I made sure they were all candidates. Whilst it felt wasteful to continuously throw changes away (i.e. the upgraded project files) during the migration process, the painless way the final switchover was done probably meant more time was saved by my teammates in the long run.

Gatekeeper workflows

In recent times one particular hybrid approach has sprung up that attempts to formalise the need to pass some review stage before the change can be accepted into the main codeline. This review process (the aforementioned gatekeeper) can either be done entirely automatically via a continuous integration server or via some form of manual intervention after the build server has given the change the green light.

This style of workflow is the opposite of the ‘no branching’ approach because it relies on not letting anyone commit directly to the integration branch. Instead each developer gets their own feature branch into which they make their changes. Any time a developer’s branch has changed the continuous integration server will attempt to merge it with the integration branch, then build and run the test suite.

If that process succeeds and an automated gatekeeper is in play then the merge is accepted and the main branch is advanced. If a manual gatekeeper is involved, perhaps to review the change too, they can perform that knowing it has already passed all the tests which helps minimise wasting time on reviewing low quality changes. If the change fails at any stage, such as the merge, build or test run then the developer will need to resolve the issue before going around the loop again.
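The decision flow just described can be sketched in a few lines. This is not how any particular continuous integration server works; the Gate structure, its hooks and the GateResult names are assumptions made purely to show the order of the checks and where a manual gatekeeper fits in.

```cpp
#include <cassert>
#include <functional>

enum class GateResult { Accepted, AwaitingReview, MergeFailed, BuildFailed, TestsFailed };

// Hypothetical hooks a CI server might supply; each returns true on success.
struct Gate {
    std::function<bool()> merge_with_integration;
    std::function<bool()> build;
    std::function<bool()> run_tests;
    bool manual_review_required;
};

// Runs the stages in order and stops at the first failure, so the developer
// knows which stage to fix before going around the loop again.
GateResult evaluate(const Gate& gate) {
    assert(gate.merge_with_integration && gate.build && gate.run_tests);

    if (!gate.merge_with_integration()) return GateResult::MergeFailed;
    if (!gate.build())                  return GateResult::BuildFailed;
    if (!gate.run_tests())              return GateResult::TestsFailed;

    // Only changes that have passed every automated stage reach a reviewer.
    return gate.manual_review_required ? GateResult::AwaitingReview
                                       : GateResult::Accepted;
}
```

With an automated gatekeeper an Accepted result would advance the main branch; with a manual one, AwaitingReview hands the already-tested change to a person.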

Whilst this has the benefit of ensuring the main development branch is always in a consistent state, build-wise, it does suffer from the same afflictions as any other feature branch – a need to continually merge up from the integration branch. That said, where the no branching approach relies heavily on diligence by the programmer, these workflows look to leverage the tooling in the continuous integration server to try and minimise the overhead. For example, just as they can automatically merge a feature branch to the integration branch on a successful build, they can also merge any changes from the integration branch back out to any feature branches when updated by the rest of the team. The net effect is that programmers can spend less time worrying about ‘breaking the build’ because they never contribute to it unless their changes are already known to be coherent.

This style of workflow could also be combined with feature toggles to aid in delivering functionality in a piecemeal fashion.

Summary

The goal of this article was to distil the folklore surrounding branching strategies down into the three key patterns – no branching, branching for a feature and branching for a release. We identified the policies that commonly drive the choice of strategy and the forces, often organisational in nature, that can push us in that direction. Finally we looked at how and when it might be suitable to combine them rather than blindly try to stick to the same strategy all the time, and how tooling is beginning to help reduce some of the overhead.

Acknowledgements

A big thumbs-up from me goes to Mike Long, Jez Higgins and the Overload review collective for their valuable input. And mostly to Fran, the Overload editor, for her patience as I made some significant last-minute edits.