GIT filter-branch
git-filter-branch
can be used to rewrite the history of one or more branches.
As a Debian Developer working for Univention GmbH I often have to work with Debian packages.
Here are some more examples from my daily work.
Removing files
The manual page for git filter-branch contains several examples already.
The following two commands can be used to remove the file filename
from all commits in the current branch:
git filter-branch --tree-filter \
'rm -f filename' HEAD
git filter-branch --index-filter \
'git rm --cached --ignore-unmatch filename' HEAD
The second command is faster (especially for large repositories) as it does not need to check out each revision into the working tree.
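You can use the -d option to specify an alternative temporary directory, for example on a tmpfs file system or some other local file system on a fast SSD.
The directory must not exist yet: git filter-branch refuses to run if it already exists (unless -f is given) and removes it again when done.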
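To try this safely, the --index-filter variant can be run in a throwaway repository first. The following is only a sketch: the scratch directory, the file other and the commit messages are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the notice that newer git versions print before starting.

```shell
#!/bin/sh
# Sketch: purge "filename" from all commits of a scratch repository.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice printed by newer git

repo="$(mktemp -d)"                      # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"           # illustrative identity
git config user.email "example@example.org"

echo secret > filename
echo keep > other                        # "other" is a made-up second file
git add filename other
git commit -q -m "add both files"
echo more >> other
git commit -q -a -m "change other"

git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch filename' HEAD

# Number of commits on the branch that still touch "filename"
remaining="$(git rev-list --count HEAD -- filename)"
# The old history is kept as a backup below refs/original/
backup="$(git for-each-ref --format='%(refname)' refs/original)"
echo "commits touching filename: $remaining"
echo "backup ref: $backup"
```

The backup ref makes it possible to compare the old and the new history before anything is pushed or garbage collected.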
Change files
For removing or reverting changes to files, working with the index is fine and faster.
But as soon as you want to apply a change to all commits, it is easier to use --tree-filter.
The following example changes all occurrences of foo to bar in all revisions of the file filename:
git filter-branch -d /tmp/filter-branch --tree-filter \
'sed -i "s/foo/bar/g" filename' HEAD
It is important to apply the same transformation to every commit: otherwise the first commit after the changed ones will appear to revert everything, because its diff is computed against the modified parent.
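One caveat: git filter-branch aborts as soon as a filter exits with a non-zero status, and sed fails on a missing file, so the command stops at any commit that does not contain filename yet. A guarded variant might look like the sketch below. It runs in a throwaway repository; the scratch directory, the file other and the commit messages are invented for the demonstration, and GNU sed is assumed for -i.

```shell
#!/bin/sh
# Sketch: apply the sed substitution only where "filename" exists.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice printed by newer git

repo="$(mktemp -d)"                      # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"           # illustrative identity
git config user.email "example@example.org"

echo unrelated > other                   # first commit lacks "filename"
git add other
git commit -q -m "commit without filename"
echo "foo one" > filename
git add filename
git commit -q -m "add filename"
echo "foo two" >> filename
git commit -q -a -m "extend filename"

# The test keeps the filter from failing on the first commit
git filter-branch --tree-filter \
    'if [ -f filename ]; then sed -i "s/foo/bar/g" filename; fi' HEAD

# Search every rewritten revision of "filename" for a leftover "foo"
if git grep -q foo $(git rev-list HEAD) -- filename; then
    leftover=yes
else
    leftover=no
fi
echo "foo still present: $leftover"
```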
Dropping changes to files
Sometimes I have to apply the changes of one feature branch to multiple upstream branches.
This often leads to conflicts in debian/changelog, as different branches have different versions of this file.
So I often just strip all changes to those files, apply the feature branch to my target branch and generate a new entry for debian/changelog (using gbp dch or similar).
This can be done using --index-filter: for each revision I tell git to reset the file back to its content from the preceding commit.
git filter-branch --prune-empty --index-filter \
'git reset $(map $(git rev-parse $GIT_COMMIT~1)) -- "**/debian/changelog"' \
@{u}..HEAD
Some notes on that:
- @{u}..HEAD is my way to iterate over all commits starting at the branch point up to the current HEAD.
- You cannot use HEAD inside the filter as it always points to your last commit. It is not updated while the loop is iterating; it only gets updated once, after the rewrite has been done.
- Instead you can use $GIT_COMMIT, which points to the commit currently being processed. $GIT_COMMIT~1 references the original previous commit. Using map $REV this gets mapped to the rewritten previous commit.
With some more thinking this can be simplified to:
base="$(git merge-base @{u} HEAD)"
git filter-branch --prune-empty --index-filter \
"git reset '$base' -- '**/debian/changelog'" \
@{u}..HEAD
which basically tells git to reset all changelog
files back to the revision at the branch point.
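The simplified command can be tried out in a scratch repository. The sketch below is built on assumptions: the package layout pkg/debian/changelog, the file code.c and the branch name feature are invented, and the upstream is simply the local branch the feature branch was created from, so that @{u} resolves without a remote.

```shell
#!/bin/sh
# Sketch: drop all debian/changelog changes made on a feature branch.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice printed by newer git

repo="$(mktemp -d)"                      # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"           # illustrative identity
git config user.email "example@example.org"

mkdir -p pkg/debian                      # made-up package layout
echo "pkg (1.0-1)" > pkg/debian/changelog
echo "base" > pkg/code.c
git add pkg
git commit -q -m "base"

# Use the initial branch (whatever its name is) as upstream of "feature"
main="$(git rev-parse --abbrev-ref HEAD)"
git checkout -q -b feature --track "$main"

echo "pkg (1.0-2)" > pkg/debian/changelog
echo "feature" >> pkg/code.c
git commit -q -a -m "feature plus changelog"
echo "pkg (1.0-3)" > pkg/debian/changelog
git commit -q -a -m "changelog only"

base="$(git merge-base '@{u}' HEAD)"
git filter-branch --prune-empty --index-filter \
    "git reset -q '$base' -- '**/debian/changelog'" \
    '@{u}..HEAD'

count="$(git rev-list --count '@{u}..HEAD')"
changelog="$(git show HEAD:pkg/debian/changelog)"
echo "commits left on feature: $count"
echo "changelog at HEAD: $changelog"
```

The commit that only touched the changelog becomes empty after the reset, which is why --prune-empty removes it from the rewritten branch.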
Dropping hunks
The previous example is comparatively easy, as we drop all changes to a single file.
It gets more complicated if you only want to drop some changes (hunks), but keep others.
Then the previous technique fails because git is tree-based and not diff-based:
a -> b -> c  -> d  -> HEAD
      \
       -> c' -> d' -> HEAD'
If, for example, you only do the filtering for c' but not for d', then the diff from c' to d' will include the change that was dropped between c and c'.
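This effect is easy to reproduce. The following sketch filters only one commit of a scratch repository; the file name file, its content and the commit messages are invented, and GNU sed is assumed for -i.

```shell
#!/bin/sh
# Sketch: filtering a single commit does not drop the change for good.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice printed by newer git

repo="$(mktemp -d)"                      # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"           # illustrative identity
git config user.email "example@example.org"

printf 'line1\n' > file
git add file
git commit -q -m "b"
printf 'line1\nunwanted\n' > file
git commit -q -a -m "c: adds the unwanted line"
c="$(git rev-parse HEAD)"
printf 'line1\nunwanted\nwanted\n' > file
git commit -q -a -m "d: adds the wanted line"

# Drop the unwanted line in c only; d is left untouched
git filter-branch --tree-filter \
    "if [ \"\$GIT_COMMIT\" = '$c' ]; then sed -i '/unwanted/d' file; fi" HEAD

# Count the lines added by the diff from c' to d'
added="$(git diff HEAD~1 HEAD -- file | grep -c '^+[a-z]')"
echo "lines added by d': $added"
git diff HEAD~1 HEAD -- file
```

The diff of d' now adds both the wanted and the unwanted line, because d' still records the complete tree of d.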
TBC…