Continuous Integration is NOT the key

I’ve been reading this book for several months and I finally finished it; hurray! It took so long because in the meantime I’ve been learning other programming languages and strategies and… well, honestly, the book was quite boring. Not only because of its contents, which I’ll explain in detail below, but also because of its poor style: long paragraphs, very few examples… I honestly find Robert Martin or Steve McConnell much more entertaining, because they make liberal use of examples, short sentences, short paragraphs and stories from experience, and they write as if they were talking directly to you.

Anyway, I started this book by carefully reading every single line, because I had really good feelings about it. Reading the summary, I found lots of interesting topics related to releasing, evolving and maintaining a software product. But as time went by I started to skim pages one after another, very fast, during my last summer vacation. And when I finished it, I felt relieved.

Why do I criticize this book? Well, the issue is that the book had its third edition in 2010 and it’s now obsolete. The content misses the point because it presents Continuous Integration as the one and only key to success in a software project. And, well, I have a completely different opinion about that. The thing is, when they talk about the advantages of continuous integration, what I see are its weaknesses:

  • Check in frequently. How could developers check in frequently on the mainline if it’s entirely possible to break the build, with lots of fingers then pointing in their direction, demanding that the problem be fixed within minutes? How could they even use meaningful comments in their checkins if they only dare to check in when they’re done with their work? Most people will just type “task done”, as all the changes have been checked in together… If they could check in the changes separately, in meaningful blocks, they would probably include comments such as: “refactor in class XXX done”, “fixed a problem in the service provider”, or “New form added to enter new users”. The key is to identify small tasks, but even small tasks will certainly require more than one bunch of changes.
  • If something goes wrong, stop the process until it’s fixed, or revert the changes! Weren’t we trying to increase productivity? Then why stop everybody until the build is fixed? That breaks the flow, increases stress, and certainly discourages developers from checking in frequently.
  • Tests should run as fast as possible, ideally within seconds; 10 minutes at most. That’s not a real necessity; it’s imposed to work around a Continuous Integration weakness: since the developers commit to the mainline all the time, you need to test every single changeset created. To achieve that, you need your tests to run fast. What could you test in less than 10 minutes, including compilation and building the installers? Let’s see… probably just (hopefully) well-designed unit tests.

Kent Beck would probably say that you can test whatever you want with unit tests alone, or that unit tests give you high confidence about your changes. Right, but when it comes to complex software systems, we all know that we need to test a GUI, we have acceptance tests, non-functional requirements (such as performance), and so on, and unit tests cannot cover those areas (this is by definition; I don’t say that; Kent Beck does). So, what confidence do you get that no freshly introduced bugs exist in the code?

If only developers could work in isolated environments, they could take the time to run as many tests as possible; that way, when the code is integrated into the mainline, there’s a strong certainty that it doesn’t introduce new bugs. Take into account that I’m talking about tests that should be run in a build, not local tests that are intended to be run by a developer. The latter should certainly be fast.

I understand the main intention: keep the product releasable all the time. That is good indeed, and I agree with it. But there must be a better option, right?

The problem is that the authors who strongly defend the Continuous Integration pattern are not very open-minded towards new, better approaches, Martin Fowler among them. They even state that branching is a bad solution, when what they actually mean is: merging is a nightmare if you don’t have the right tools to perform it. But things are not like that nowadays.

If only we had the right tools to branch our code, work in isolated environments without disturbing the rest of the team, and then merge our changes into the mainline… we wouldn’t have to worry about checking in code with bugs in it, since the code wouldn’t even need to compile until we’re done with it! We could check in our code as frequently as we want and create sets of changes (aka changesets) that really express intention, and, assuming that intention is important in our job (writing code and tests, writing comments in our checkins), that sounds good, doesn’t it?
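What this paragraph wishes for can be modelled in a few lines. This is a toy sketch with made-up names (`Repo`, `checkin`), not any real SCM’s API: checkins land on an isolated task branch, may even fail to compile, and the mainline only changes after an explicit merge once the task is done.

```python
# Toy model of branch isolation; illustrative only, not a real version control API.

class Repo:
    def __init__(self):
        self.branches = {"main": []}               # branch name -> list of changesets

    def branch(self, name, source="main"):
        """Create an isolated task branch starting from `source`."""
        self.branches[name] = list(self.branches[source])

    def checkin(self, branch, comment, compiles=True):
        """On an isolated branch, frequent checkins don't disturb anyone."""
        self.branches[branch].append({"comment": comment, "compiles": compiles})

    def merge(self, source, target="main"):
        """Integrate only when the task is done (here: the last changeset compiles)."""
        assert self.branches[source][-1]["compiles"], "finish the task first"
        self.branches[target] = list(self.branches[source])

repo = Repo()
repo.branch("task-add-user-form")
# Check in frequently, with meaningful comments, even before the code compiles:
repo.checkin("task-add-user-form", "New form added to enter new users", compiles=False)
assert repo.branches["main"] == []                 # mainline undisturbed, still releasable
repo.checkin("task-add-user-form", "fixed a problem in the service provider")
repo.merge("task-add-user-form")                   # integrate once the task is done
print([cs["comment"] for cs in repo.branches["main"]])
```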

On the other hand, you can find the book useful because it covers interesting topics related to development: testing strategies, infrastructure, some hints… but always oriented towards continuous integration, so please keep that in mind and set the continuous integration point of view aside.

Still, the book mentions two key concepts that I absolutely agree with: frequency and automation. It’s very important to release as frequently as possible (I do it every day), and it’s very, very important to automate the process as much as possible, both to avoid human errors and to trace where something went wrong, if that eventually happens.

There are other useful hints, too: the release process should be constantly improved, and someone should be responsible for leading the whole process, focused not only on improving how new releases are built, but also on introducing new tests, new opportunities for testing and new methodologies: coverage tests, performance, cloud computing and virtualization, cyclomatic complexity and other metrics, evolving tests, mutation tests, and so forth.

To reach a conclusion, in the next article I’ll explain a good solution for achieving success in software project development by means of rapid development, high quality, control and reduced stress, which inevitably leads to satisfaction. This comes from my own experience: almost four years working in a company that uses a different pattern (let’s say Continuous Integration redefined to perfection ;-)) and (almost) two years leading its release management.

Please, don’t misunderstand me: continuous integration is not evil; it’s better than nothing, but it’s not the best solution available. If you’re concerned about taking control of your development and achieving better quality and satisfaction, I’d set Continuous Integration aside for now and read the next article first.

19 thoughts on “Continuous Integration is NOT the key”

  1. Which version control system are you using? You seem to have the idea that not checking code into the mainline repository = not doing any checkins. This is just wrong! Using a distributed version control system (e.g. Git) means that the developer writes their update, checking in to their local repository where appropriate. Then, once the update and accompanying tests are written, they are pushed to the central repository from which CI is run.

    “If only we had the right tools to branch our code, work in isolated environments without disturbing the rest of the team and then merge our changes into the mainline”. We do; they are called distributed version control systems.

    Check in frequently: this makes sense now if we are talking about change sets. It just means keep those change sets as small as possible.

    If something goes wrong, stop the process until it’s fixed, or revert the changes: this also makes sense, since anything pushed to the mainline repository should be already tested and confirmed as working.

    Tests should run as fast as possible: any meaningful code change should run the test cases to make sure nothing is broken. The longer this takes, the less productive the developer, or the less inclined they are to run the tests. Both are bad.

    So if you alter your definition of checking in, I don’t know that you have too much against continuous integration :)

  2. We use Plastic SCM for source code control, and we use the Branch per Task pattern. This way every developer has their own isolated environment and can check in whenever they want; there’s no need to worry about breaking the build, since the tests are not run until the task is finished.

    Plastic SCM is a distributed version control system. What I wanted to do was throw out an open question; I know there are answers and solutions out there, but it seems that people who use CI don’t know them or don’t like them. So, before introducing distributed version control systems (take into account that, strictly speaking, distributed is not the only solution; there’s also the Branch per Task pattern), I wanted to raise that question.

    I remarked on purpose that unit tests should run in the blink of an eye. I stated clearly (I think) that the tests a developer runs in his local environment should be as fast as possible. But what certainty of bug-free code do such tests give us? In my company we need more types of tests, and it’s not that strange that they get broken every now and then. If you work in isolated environments you can run the tests on a sandbox machine and start a new, different task in the meantime. The faster the tests, the better, but with this solution, speed is no longer a must.

    I didn’t change the definition of checkin at all; I just said that developers should check in frequently, but if you want them to do that, you’ll need to provide them with the right environment to feel confident about doing checkins: isolated environments (i.e. branches or distributed environments).

    Best,
    Luis

    • I agree that branch per task provides the same sort of isolation as distributed source control. But I believe this is what they have in mind in the book, check-in of complete change sets, where you seemed to imply they say CI should run on every single check-in.

      Regarding testing, I thought you were saying that it was a CI weakness that required fast tests? However, I’m interested in which tests you have that can’t be run locally by the developer. The only ones I can think of are where the dataset is too large for the developer’s environment, but this would be very few tests and you should be able to test most cases using a subset of the data. Also, I would still think it would be an issue if these external tests were failing?

      • Did you read the book? The whole book is CI-oriented. They don’t agree that branching is a good idea at all, but that’s because they have in mind long-lived branches that are not frequently updated (rebase operations), which is a bad practice; I absolutely agree to that extent.

        CI imposes fast tests, as it’s necessary to test every changeset created to check the integrity of the build. If you work on task branches, that’s not needed. There are tests that cannot be run in the developer’s environment because they need a special setup (databases, large amounts of data… as you pointed out), or they take some time, or they even take control of the mouse and keyboard, such as GUI tests.

  3. So you’re saying that working on a branch means not having to test (or merge) each change set (which is not what I thought you meant by branch per task; I was thinking task = change set). When do these change sets get tested, then? When do they get merged into the main repository? When the whole feature built on them is complete? Isn’t it more work if it fails then?

  4. No, no, the idea is:
    1.- When you start working on a task, you create a branch and switch to it, and start working on it.
    2.- Then, you check in as you feel like it; the recommendation is often, and I would add: include useful comments in your checkins and create changesets that group changes that have some relationship among them: a refactor, a bugfix, a new dialog implementation… or check in at the end of your daily work as a checkpoint.
    3.- When you feel that your task is done, trigger the tests to be run on a separate testing server. This server will enqueue the pending tasks to test. It’s much less work building once per task than building once per changeset; you’ll probably agree with me on this point. This is why I’m saying that tests don’t need to be that fast. This trigger can be implemented in several ways: we use Plastic SCM attributes and the Plastic SCM plugin for Atlassian’s Bamboo. Other options would be triggers (Plastic SCM and other SCM tools include this feature), for instance.
    4.- In the meantime, the developer can start a new task (my suggestion: in a new workspace)… If something goes wrong, he’ll come back to the task and fix whatever must be fixed.
    5.- If everything goes fine, the integrator or build master will integrate that task into the next release by means of a merge operation (I recommend releasing a new version every day). If you configure your builds on the sandbox machine (Bamboo, Jenkins… whatever) to run as many test suites as possible, the daily build will most probably be successful, because you’ve reduced the risk by moving it to every single task, which is fine.
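    The five steps above can be sketched as a small simulation. Everything here is illustrative (`TaskBranch`, `finish_task` and the in-memory queue are made-up names, not Plastic SCM or Bamboo APIs); the point is that the branch is built and tested once, as a whole, when the task is finished:

```python
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class TaskBranch:
    """One short-lived branch per task (illustrative model, not a real SCM API)."""
    name: str
    changesets: list = field(default_factory=list)
    tests_passed: bool = False

    def checkin(self, comment: str) -> None:
        # Step 2: check in often, with a meaningful comment per changeset.
        self.changesets.append(comment)

test_queue: "Queue[TaskBranch]" = Queue()    # Step 3: the testing server's queue
mainline: list = []                          # task branches merged by the integrator

def finish_task(branch: TaskBranch) -> None:
    """Step 3: when the task is done, enqueue it for the testing server."""
    test_queue.put(branch)

def run_test_server(run_suite) -> None:
    """Steps 3-5: the server tests each finished task; passing tasks get merged."""
    while not test_queue.empty():
        branch = test_queue.get()
        branch.tests_passed = run_suite(branch)
        if branch.tests_passed:
            mainline.append(branch.name)     # Step 5: integrator merges into the release

# Steps 1-2: create a branch and check in grouped, commented changesets.
task = TaskBranch("task-1234-new-user-form")
task.checkin("New form added to enter new users")
task.checkin("refactor in class UserValidator done")
finish_task(task)

# Steps 3-5: the whole branch is tested once, not once per changeset.
run_test_server(lambda b: True)  # stand-in for the real (slow) test suite
print(mainline)                  # ['task-1234-new-user-form']
```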

    • It all depends on how big those tasks you talk about are. If we’re getting out to a few days’ work without building or testing, then you’re asking for trouble, probably on a larger scale than if you had tested earlier. The other issue with having long-running tasks in isolation is that they conflict with tasks from other developers when they are merged.

      CI tries to minimise the cost of code errors, conflicts or merging by decreasing the size of the code which is merged and tested and increasing the frequency so that errors are found early. This comes at the increased cost of time spent building and testing.

      Your integration method reduces the cost of building and testing during the task, but increases the cost of issues which occur at the end of the task, since more code has been built.

      So which is better depends on your environment: the cost of more frequent testing vs the cost of issues found later on in the process. It’s a tradeoff.

      • Damn, I forgot to mention this at the beginning of the article, but it’s in the next one, for sure; I remember that I included it.

        What you say is really interesting. Of course, the recommendation in every agile methodology is not to estimate more than 8 or 16 hours per task. If a task is bigger than that, it can be divided, I can tell you.

        This is very, very important; otherwise the branches would become too big, they would contain too many changes, and the merge operation would therefore get complicated if the branch is not rebased periodically (and the rebase operation normally implies a merge :-)).

        Taking this into account (small tasks, no more than 8 hours if possible, no more than 16 hours in any case), branch per task performs nicely and smoothly.

        In the next article I’ll talk about all this: Branch per task and some brief guidelines to implement it.

  5. I’d like to remark once again: there’s no good reason to implement tasks bigger than 16 hours, regardless of whether you use CI, Branch per Task or whichever pattern you like best ;-)

  6. It looks like you started to skim pages right after the first chapters. Some of your concerns are thoroughly addressed in the book, and some are not even valid. E.g. the ten-minute limit is imposed only on the first step of the integration pipeline, not on the whole process.

    • Yes, you’re right, but the 10-minute limit is imposed on the normal build that the methodology revolves around. And it should be good enough to ensure the quality of a release, as the main objective is to keep the mainline releasable all the time. I can tell you that I got the idea very well; thanks for your contribution.

  7. “If only developers could work in isolated environments, they could take the time to run as many tests as possible; that way, when the code is integrated into the mainline, there’s a strong certainty that it doesn’t introduce new bugs. Take into account that I’m talking about tests that should be run in a build, not local tests that are intended to be run by a developer. The latter should certainly be fast.”

    It seems like we have that today. Git and Mercurial make branching & merging much easier. I don’t really like the strict definitions of what CI is. Is it CI if you’re using branches? I don’t care. I’m one of the founders of CircleCI (https://circleci.com). To me, CI means 1) test every commit, 2) commit often, 3) merge with mainline often. At Circle, we test every branch by default. We recognize that testing is more than unit testing; it’s a process that includes integration and load testing. We make the build boxes as fast as we know how. We auto-parallelize tests, so you can throw hardware at the problem.

    IMO, avoid the dogma about the correct way to test, or what the definition of CI is. Just set up a process where you get feedback often.
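    The auto-parallelization mentioned above can be approximated with a classic greedy schedule: assign the longest tests first, always to the currently least-loaded worker. This is a minimal sketch under stated assumptions, not CircleCI’s actual algorithm, and the timings are hypothetical:

```python
import heapq

def split_tests(durations: dict, workers: int):
    """Greedily assign tests to the least-loaded worker (longest tests first),
    so the wall-clock time approaches total_time / workers."""
    buckets = [(0.0, i, []) for i in range(workers)]   # (load, worker id, tests)
    heapq.heapify(buckets)
    for test, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, i, names = heapq.heappop(buckets)        # least-loaded worker
        names.append(test)
        heapq.heappush(buckets, (load + secs, i, names))
    return buckets

# Hypothetical timings; a CI service would record these from previous runs.
timings = {"test_gui": 300, "test_db": 180, "test_api": 120, "test_unit": 30}
plan = split_tests(timings, workers=2)
for load, i, names in sorted(plan, key=lambda b: b[1]):
    print(f"worker {i}: {names} ({load}s)")
```

    With two workers the 630 seconds of sequential testing shrink to roughly half, which is the “throw hardware at the problem” effect.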

  8. Pingback: Why the Branch per Task pattern IS the best solution | Realizando la idealidad

  9. You start from a wrong analysis.

    “How could developers check in frequently on the mainline if it’s entirely possible to break the build, with lots of fingers then pointing in their direction, demanding that the problem be fixed within minutes? (…) If something goes wrong, stop the process until it’s fixed, or revert the changes!”

    When we talk about a build pipeline, we talk about different stages and responsibilities. A developer cannot check in code that doesn’t compile or doesn’t pass unit tests. If a developer breaks the build in these static stages, he will be blamed, because each individual must ensure (typically just one command should be enough) that his code passes these minimum steps.

    If a checkin fails in integration, functional or load tests, he won’t be blamed. That’s why companies have continuous integration systems: they’re expected to give early (THE EARLIEST) feedback about unexpected situations. IT’S NORMAL. A build pipeline isn’t just red or green; there are different grades of yellow, and the reaction to a yellow situation has to be managed properly. That makes a project move forward fast.

  10. Well, starting your comment by stating that I’m wrong is not a good start, is it? ;-)

    “A developer cannot check in code that doesn’t compile or doesn’t pass unit tests.”

    Tell me why. A good reason. The best reason you’ll pull out of your pocket is: because nobody should get code that doesn’t compile! Then you’re right, because you’re thinking in CI terms. Now forget about that. For a minute. Think of ISOLATED branches and environments. You don’t disturb anyone if you check in bad code. Then your precondition is no longer necessary: it’s due to a weakness of your procedure, not something imposed by the SCM tool you’re using.

    “If a checkin fails in integration, functional or load tests, he won’t be blamed. That’s why companies have continuous integration systems: they’re expected to give early (THE EARLIEST) feedback about unexpected situations. IT’S NORMAL. A build pipeline isn’t just red or green; there are different grades of yellow, and the reaction to a yellow situation has to be managed properly. That makes a project move forward fast.”

    Hum… that’s your opinion, and it probably works for you, but AFAIK builds are red or green; there are no different grades or… what? Yellow? Orange? If your mainline is anything other than green, you are lost somewhere in the middle of nowhere, for sure. I’ve seen lots of companies’ procedures, read lots of books, written plugins for Jenkins, Bamboo and so on, and I’ve never heard of a “yellow” state. Well, Jenkins uses a cloudy icon when several builds fail together, but that’s more for fun than serious.

    If a build is red, then it’s broken. Otherwise, it’s green.

  11. I didn’t say you’re wrong, I said that particular statement is wrong. You’re right identifying caveats of CI and prescribing a solution, but in my opinion you’re also distorting reality.

    What makes a build yellow is not a coloured icon, an alarm or some rabbit ears. It’s the reaction of the team when an unexpected situation happens.

    A build that breaks in the compilation and unit test stages is a red build. Hold the presses and fix or roll back. People have tools to avoid it. Commit -> local build -> checkin ensures a correct static build, and failing at this is a reason to blame somebody. This is a prerequisite for a proper, portable build system, and therefore it’s out of the scope of a discussion about CI’s problems.

    What happens next? Integration tests can fail. If it’s a test using a DB in an in-house staging environment we probably have to address the problem quickly. But if the case is an integration with a 3rd party service, we probably should wait and check the reports and the status of the service.

    Same with functional and non-functional tests. I’m not gonna stop the developers because a load test suddenly reports latency 40% over requirements; that’s ridiculous. There are many reasons why this can happen and, if worse comes to worst, I can always go back to a green build. Not a good thing, but expected to happen. Fingers shouldn’t point at anybody, and halting checkins is not mandatory.
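    The graded reactions described in this thread boil down to a tiny decision table. This is only a sketch of the commenter’s policy; the stage names and `Reaction` values are illustrative, not any CI tool’s API:

```python
from enum import Enum

class Reaction(Enum):
    STOP_THE_LINE = "red: halt check-ins, fix or roll back"
    INVESTIGATE = "yellow: keep working, check reports before sign-off"
    OK = "green"

def react(stage: str, failed: bool) -> Reaction:
    """Map a pipeline stage failure to the team's reaction: commit-stage
    (compile/unit) failures are showstoppers, later stages are graded."""
    if not failed:
        return Reaction.OK
    if stage in ("compile", "unit"):
        return Reaction.STOP_THE_LINE   # the static build must always pass
    return Reaction.INVESTIGATE         # integration, functional, load, ...

print(react("unit", failed=True).name)  # STOP_THE_LINE
print(react("load", failed=True).name)  # INVESTIGATE
```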

    • OK, the thing is that in my organisation CI is not possible, because we develop a powerful GUI that allows users to do lots of things. We have lots of GUI tests that cannot be implemented as unit tests, because unit tests are not suitable for testing concurrency when loading information, animations and so forth. The problem is that these tests are slow, and there’s no way to make them faster except parallelising; on the other hand, it’s quite easy to break these acceptance tests if someone changes something in the GUI: remove a button, change a view replacing some controls with others… so we cannot run CI tests within 10 minutes, and I bet that most companies can’t either.

      Secondly, the commit -> build -> checkin cycle seems to me an effective but complicated workaround to avoid breaking the mainline. But this is just my opinion, because I never check in something into the mainline that hasn’t passed the tests, so to me this sounds quite weird. If you’ve been working with CI for months or years, this will probably sound natural to you :-). I don’t need a pre-commit stage where I run the build and, if something goes wrong, revert or undo. When I commit, I commit to the repository.

      • Those GUI tests sound more like acceptance tests than unit tests. Functional tests are expected to take longer to execute, probably hours in a large app. They’re never expected to be included in the “<10 min” build; only compilation and unit tests are performed in this first stage.

        As you say, it’s easy to break a GUI test. It’s the least reliable part of a build pipeline. Changing the identifier of a button, or just a colour, can affect instrumentation, and the test will report an error. That’s why a team reacts differently in this case than to a problem with a DB integration test. The second is a showstopper; the first can easily be checked in a staging environment before deciding whether to sign off or not (and fixing it afterwards, of course).

        Is this good or not? Well, up to you. I like getting merging feedback constantly. It’s the cool thing about CI. When you’re writing an isolated app that sounds trivial, but when your software interacts with a bunch of services and third-party dependencies that are also works in progress, this early feedback is a big help. It’d be nice to have a fully traceable codebase, but in that case I favour other attributes in my build system. Each project is different. One solution is not the key over all the rest.

      • If we didn’t fix every test broken in the corresponding task, we’d have our tests broken all the time and, in the end, we wouldn’t have tests at all, or just the unit tests, which don’t cover everything.
