A version control system is a must-have for every software
development project, even if only one developer works on it. A lot of
engineers don't really appreciate that it's not just there to make
cooperation easier: used carefully, it provides highly valuable
information about the evolution of the code. That is essential if you
want to create something that will still be understood 5 years from now,
when you might be far away. A careful hiring manager should make sure
to hire people who understand this, as it's crucial for sustainable
software development.
But this means that a patch
containing unrelated changes is a big no-no. And even if the changes are
related, as the number of changed lines grows it gets harder and harder
for others to understand what the change means, and easier for
a subtle bug to hide in plain sight. Breaking up those changes into
a series of patches is therefore important: it helps others to review,
it helps future generations to learn what happened and why, and it helps
you, the developer, to really understand what you are doing.
There are
some general guidelines about how to slice up a big
changeset, but it's often hard, as the number one rule is:
- Don't break the rest of the code! When you apply the series, the code should build and work after each patch, ideally without regressions. This matters for another great git tool: bisect. It helps you narrow down when a particular problem was introduced, but it only works if your code builds after each commit and new commits don't break, not even temporarily, what already works.
- Nevertheless, try to split what you are doing into logical steps and cut there, even if those changes don't make much sense on their own. Think of it as a powerful way to communicate your train of thought! It's not easy, as development usually involves a lot of trial and error, which means you have to edit various parts of your series. We will discuss how to do that efficiently later on.
- If you import/copy files from somewhere else, rename or delete them, do that in separate patches. If you make changes to that imported code, do it in separate patches too, otherwise you force your reviewers to work out what is copied and what is new content. Or not, because I, for example, always reject patches like that automatically. If the imported code would break the build, only add it to your build system once those problems are fixed (in a separate patch, again).
- If you also make a lot of build system changes (modifying a lot of Makefiles, for example), it's worth doing that in a separate patch.
- If you add code in a completely new source file, you can leave adding it to the Makefiles to the last patch. That way you don't have to worry about "git bisect", as your code won't be compiled until the last patch. So you can break up your changes without caring whether each step compiles.
- New datatypes, struct members and variables can go into their own commit, before you make the functional changes. But be careful: if your compiler treats unused variables as errors, that might break your build. And if these changes are not very big, it isn't worth the effort.
- The same applies to removing unused identifiers, but of course after the actual changes have been made.
- Don't make unrelated non-functional changes, e.g. changing the code style of code you otherwise don't touch. Those should always go into a separate patch, which should probably be independent of your series altogether.
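One way to check the number one rule on a finished series is to let git replay it and run a command after every commit. The following is only a sketch under assumptions: the throwaway repository, the "Example Dev" identity and check.sh are made up for illustration, and check.sh stands in for your real build or test command (for example "make").

```shell
#!/bin/sh
# Sketch: verify that every commit of a series passes a check.
# Assumption: "sh check.sh" is a placeholder for the project's real build command.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Base commit: a trivial "build" that fails if the file contains BROKEN
printf '#!/bin/sh\n! grep -q BROKEN examplefile\n' > check.sh
echo "base" > examplefile
git add check.sh examplefile
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# A three-patch series on top of the base
for n in 1 2 3; do
    echo "change $n" >> examplefile
    git commit -q -am "Patch $n"
done

# Replay the series and run the check after each commit.
# The rebase stops at the first commit where the command fails.
if git rebase -q --exec "sh check.sh" "$base"; then
    result="every commit passes"
else
    result="a commit failed the check"
    git rebase --abort
fi
echo "$result"
```

If the command fails somewhere in the middle, git stops at that commit, so you can fix it in place and run "git rebase --continue".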
In the rest of the article I will go through an example to show which git features can help you when you don't have all the changes in mind as you start creating the series. Which is probably the case nearly all the time.
Create our example repository
Let's create a repository with a file 'examplefile' whose 50 lines each contain their own line number, and commit it as the initial commit:
mkdir example
cd example
git init
for (( i = 1; i <= 50; i++ )); do echo "$i" >> examplefile; done
git add examplefile
git commit -m "Initial commit"
Let's start our series with a few changes. We should create a separate branch for that:
git checkout -b exampleseries
Then make the changes like this:
$ git diff
diff --git a/examplefile b/examplefile
index 96cc558..3e451d2 100644
--- a/examplefile
+++ b/examplefile
@@ -2,9 +2,9 @@
 2
 3
 4
-5
+5 foo bar
 6
-7
+7 foo bar
 8
 9
 10
@@ -12,7 +12,7 @@
 12
 13
 14
-15
+15 foo bar
 16
 17
 18
Separate a bulk of changes in the working directory into commits
The "git add" command normally stages the whole file for commit, but if you want to create separate patches you have to invoke the interactive mode with the -i parameter:
$ git add -i examplefile
           staged     unstaged path
  1:    unchanged        +3/-3 examplefile

*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help
What now> 5
           staged     unstaged path
  1:    unchanged        +3/-3 examplefile
Patch update>> 1
           staged     unstaged path
* 1:    unchanged        +3/-3 examplefile
Patch update>>
diff --git a/examplefile b/examplefile
index 96cc558..3e451d2 100644
--- a/examplefile
+++ b/examplefile
@@ -2,9 +2,9 @@
 2
 3
 4
-5
+5 foo bar
 6
-7
+7 foo bar
 8
 9
 10
Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? ?
y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk nor any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk nor any of the later hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help
@@ -2,9 +2,9 @@
 2
 3
 4
-5
+5 foo bar
 6
-7
+7 foo bar
 8
 9
 10
Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
Split into 2 hunks.
@@ -2,5 +2,5 @@
 2
 3
 4
-5
+5 foo bar
 6
Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
@@ -6,5 +6,5 @@
 6
-7
+7 foo bar
 8
 9
 10
Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? q

*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help
What now> 7
Bye.
The above example explained:
- We need to choose the "patch" option, which is 5.
- Then we choose which file we want to go through; here it is the only one we gave on the command line.
- After another Enter we can decide, hunk by hunk, whether to stage the unstaged hunks of this file.
- There are an awful lot of possibilities, and the help (press '?') is surprisingly well written, so it's worth a look!
- We choose split, then stage the change made to line 5 and leave the others unstaged. We could also answer 'n' for each remaining hunk separately, but 'q' is quicker.
So now our repo looks like this:
$ git status
On branch exampleseries
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: examplefile
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: examplefile
We can commit this, then commit the rest of the file as a second commit:
git commit -m "First commit"
git add examplefile
git commit -m "Second commit"
Edit our series after it was committed to our working branch
Now imagine we realize that we missed something, and would like to add it to a previous commit:
$ git diff
diff --git a/examplefile b/examplefile
index 3e451d2..698723d 100644
--- a/examplefile
+++ b/examplefile
@@ -6,7 +6,7 @@
 6
 7 foo bar
 8
-9
+9 foo bar
 10
 11
 12
If it's the latest commit, we can just do this:
git add examplefile
git commit --amend
It will also allow us to edit the commit message. But if it's something earlier, we need to use 'stash' first to save our working tree, then use interactive rebase to edit a commit:
$ git stash
Saved working directory and index state WIP on exampleseries: 3ddfac4 Second commit
HEAD is now at 3ddfac4 Second commit
$ git rebase -i master
Then we get our editor where we can decide what to do with our commits between 'master' and HEAD:
pick 9f5dc11 First commit
pick 3ddfac4 Second commit

# Rebase 732eafd..3ddfac4 onto 732eafd
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
BE CAREFUL! If you delete a line, that commit is deleted! Apart from that, read the help text; again, it's self-explanatory. We need to change the action of our first commit from 'pick' to 'edit'. After we save the file and close the editor, git reapplies the first patch and stops for amending:
Stopped at 9f5dc111621f8a49180268db520ad210183348df... First commit
You can amend the commit now, with
git commit --amend
Once you are satisfied with your changes, run
git rebase --continue
Now we can apply our stash, add it to our commit then finish the rebase:
$ git stash pop
Auto-merging examplefile
rebase in progress; onto 732eafd
You are currently editing a commit while rebasing branch 'exampleseries' on '732eafd'.
(use "git commit --amend" to amend the current commit)
(use "git rebase --continue" once you are satisfied with your changes)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: examplefile
$ git add examplefile
$ git commit --amend
[detached HEAD afe54f7] First commit
1 file changed, 2 insertions(+), 2 deletions(-)
$ git rebase --continue
Successfully rebased and updated refs/heads/exampleseries.
Note 1: 'git stash pop' coped with the fact that line #7 was not yet modified at the first commit, so our hunk didn't apply cleanly. But of course auto-merge can't handle every case, so be prepared to resolve merge conflicts!
Note 2: Keep in mind that if this auto-merge fails, your latest stash stays on top of the stash list! Yes, you can have multiple stashes, and 'pop' only removes one if applying it succeeds.
Note 3: We could have created a temporary commit instead of stashing, then moved it after the first commit in the rebase todo list and used the 'fixup' command on it. It doesn't make much difference in practice.
Note 4: When we stop for editing, we can also inject new commits! Fortunately 'git commit --amend' doesn't automatically continue the rebase, and a plain 'git commit' creates a new commit.
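The temporary-commit variant from Note 3 can be automated. The sketch below is not the exact procedure used above: it relies on "git commit --fixup" and "git rebase --autosquash", which arrange the todo list for us. The repository lives in a temporary directory, and the identity and the mark_line helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: fold a late change into an earlier commit with --fixup/--autosquash.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

i=1
while [ "$i" -le 50 ]; do echo "$i"; i=$((i + 1)); done > examplefile
git add examplefile
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Hypothetical helper: append " foo bar" to the given line number
mark_line() {
    sed "${1}s/\$/ foo bar/" examplefile > examplefile.new
    mv examplefile.new examplefile
}

mark_line 5
git commit -q -am "First commit"
first=$(git rev-parse HEAD)

mark_line 7
mark_line 15
git commit -q -am "Second commit"

# The forgotten change that belongs in "First commit"
mark_line 9
git commit -q -a --fixup="$first"

# --autosquash moves the "fixup!" commit right after its target and marks
# it as "fixup"; GIT_SEQUENCE_EDITOR=true accepts that todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --oneline "$base"..HEAD
```

Leave out GIT_SEQUENCE_EDITOR=true if you want to review the prepared todo list in your editor before the rebase runs.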
Sending the patches
First we need to generate the patches for sending with format-patch:
git format-patch --patience -s \
  --from="Your Name <your@address.com>" \
  --to="<maintainer or mailing list>" \
  --cc="Another Interested <party@domain.com>" \
  --subject-prefix="PATCH" \
  --cover-letter --notes \
  -o ../patchseries/projectname/seriesname \
  master..exampleseries
Most of the arguments are self-explanatory, a few remarks:
- "--patience" makes it try to generate more readable diffs
- "-s" adds a "Signed-off-by:" line with your name and email address to each commit message
- Don't forget to edit "--subject-prefix" when you send a V2 of your series!
- When sent with "git send-email", each patch goes out as a reply to the first email, so the series is nicely threaded in a mailing list archive
- "--cover-letter" generates a patch file without a diff. It will be your cover letter and the first mail of the series, so the rest of the patches will be replies to it. Don't forget to edit it!
- "--notes" adds your patch notes after the commit message, separated by a triple dash line. Useful for keeping a record of your patch history, see below
- "-o" defines the output directory. It's worth figuring out a naming scheme, so you can find the patches later on.
- "-N" keeps the patches from being numbered in the subject line, useful if your changes are not related and you want to send them as unrelated mails. In that case you probably want to switch off threading as well ("--no-thread"), so they don't show up as one thread for the reader
- If you just want the top X patches, use "-X" (e.g. "-3" for the last three commits)
The output will be the file names generated. You can directly feed that to "git send-email", which will send it through your SMTP server.
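To see what ends up in the output directory, here is a small sketch that generates a two-patch series with a cover letter in a throwaway repository. The temporary output directory stands in for ../patchseries/projectname/seriesname, and the identity is a placeholder.

```shell
#!/bin/sh
# Sketch: generate a patch series with a cover letter and list the result.
set -eu

repo=$(mktemp -d)
out=$(mktemp -d)   # stand-in for ../patchseries/projectname/seriesname
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "base" > examplefile
git add examplefile
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

git checkout -q -b exampleseries
echo "first" >> examplefile
git commit -q -am "First commit"
echo "second" >> examplefile
git commit -q -am "Second commit"

# One file per commit, plus 0000-cover-letter.patch to edit before sending
git format-patch -q --cover-letter --subject-prefix="PATCH" \
    -o "$out" "$base"..HEAD

ls "$out"
grep -h '^Subject:' "$out"/*.patch

# Not run here, since it needs a configured SMTP server:
#   git send-email --to="<maintainer or mailing list>" "$out"/*.patch
```

The subject lines show the numbering ("0/2", "1/2", "2/2") that lets reviewers see at a glance whether they received the whole series.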
Updating your series
Unless your changes were very trivial, it's quite probable that someone will ask you to change something in one of your patches, to extend the series with new patches, or to remove some of them. The interactive rebase explained above can help you achieve that, but it's usually better to keep the first version of your series and make the changes on a copy of it. It's easy:
git checkout -b exampleseriesV2
This will create a new branch, which at the beginning will point to the same commit. But when you do a rebase and modify your commit it will create a new commit, as the SHA value changes.
You can use the "git notes" command to maintain the patch history after you modify it. "git format-patch --notes" will add them after the commit message, separated by triple dash. So reviewers can see what has changed between versions, but it doesn't get applied on the other end (as it should not).
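A minimal sketch of that workflow, run in a throwaway repository: attach a note to a commit, then generate the patch with the note included. The note text and the identity are made up, and "-v2" (short for "--reroll-count=2") is used here as an alternative to editing "--subject-prefix" by hand.

```shell
#!/bin/sh
# Sketch: record patch history in a note and emit it with format-patch.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "base" > examplefile
git add examplefile
git commit -q -m "Initial commit"

git checkout -q -b exampleseriesV2
echo "reworked change" >> examplefile
git commit -q -am "First commit"

# Record what changed since the previous version as a note on the commit
git notes add -m "v2: reworded the change as requested in review" HEAD

# -v2 marks the subject as "PATCH v2"; --notes puts the note below "---"
patch=$(git format-patch -v2 --notes -1 --stdout)

# Show only the lines of interest
printf '%s\n' "$patch" | grep -E '^(Subject:|---$|Notes:|    v2:)'
```

Because a note is attached to one specific commit, it is easiest to write it after the commit has reached its final form for that version of the series.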
Using checkpatch automatically plus saving your Eclipse settings during cleanup
A lot of projects use the Linux kernel's checkpatch.pl script to find code style problems. It's useful to run it at commit time, and probably the easiest way to do that is a wrapper script. I have this one in my ~/bin directory, which hijacks the git command from /usr/bin/git:
#!/bin/sh
# with no arguments there is nothing to intercept
[ $# -eq 0 ] && exec /usr/bin/git
command=$1
shift
case $command in
*clean)
    # don't clean out Eclipse settings
    /usr/bin/git "$command" -e .cproject -e .project -e .settings "$@"
    ;;
*commit)
    # if there is checkpatch, don't allow commit when it fails
    if [ -e scripts/checkpatch.pl ] && [ "$1" != "-n" ]; then
        # a bare "exit" would return 0 here, so report the failure explicitly
        /usr/bin/git diff --cached | scripts/checkpatch.pl --no-signoff -q - || exit 1
    fi
    # pass on git's own exit status if the commit fails
    /usr/bin/git "$command" "$@" || exit $?
    # run checkpatch again to look at the commit message
    if [ -e scripts/checkpatch.pl ] && [ "$1" != "-n" ]; then
        /usr/bin/git format-patch -1 --stdout | scripts/checkpatch.pl --no-signoff -q -
    fi
    ;;
*)
    /usr/bin/git "$command" "$@"
    ;;
esac
The first case, for "git clean", prevents it from deleting the Eclipse project settings. The second checks whether there is a checkpatch script in this repo, then runs it against the content we are about to commit. It stops the commit if the check fails, unless the first parameter is "-n". After the commit it checks the last commit again, so if we used --amend we still see the warnings, and it can check the commit message as well. However, at that point it can no longer abort the commit if it finds an issue.
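For comparison, the pre-commit part of the check could also live in git's own hook mechanism instead of a wrapper. This is a different technique from the script above, shown only as a sketch: the checkpatch.pl created here is a stub that rejects any added line containing "BADSTYLE", not the real kernel script, and the repository and identity are throwaway placeholders.

```shell
#!/bin/sh
# Sketch: run checkpatch from .git/hooks/pre-commit instead of a wrapper.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Stub standing in for the real checkpatch.pl (assumption for this demo)
mkdir scripts
cat > scripts/checkpatch.pl <<'EOF'
#!/bin/sh
# reject any patch on stdin that adds a line containing BADSTYLE
! grep -q '^+.*BADSTYLE'
EOF
chmod +x scripts/checkpatch.pl
git add scripts
git commit -q -m "Initial commit"

# The hook performs the same check as the wrapper's first step
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
[ -x scripts/checkpatch.pl ] || exit 0
git diff --cached | scripts/checkpatch.pl --no-signoff -q -
EOF
chmod +x .git/hooks/pre-commit

echo "BADSTYLE" > examplefile
git add examplefile
if git commit -q -m "Rejected" 2>/dev/null; then
    rejected=no
else
    rejected=yes
fi
echo "commit rejected by hook: $rejected"

# "git commit -n" (--no-verify) skips the hook, like the wrapper's "-n"
git commit -q -n -m "Committed with -n"
```

A hook lives inside one clone and is not versioned with the project, so it has to be installed per repository, while the wrapper applies to every repository you work in. The wrapper also covers "git clean" and the post-commit check, which this sketch does not.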