Sunday, March 20, 2016

Git vs Mercurial workflow: History, Commit, and Branching

This is going to be another Git vs Mercurial post. There are already plenty of those in cyberspace (this, this, this… I can go on), but I am writing another one nevertheless because I feel that many such posts are written with so much hatred (especially those that are in favour of Git, unfortunately). So this is going to be one of those Git vs Hg posts where the author is in favour of Git but is not trying to bash Mercurial (too hard, hopefully). Also, I am going to focus on the similarities and differences in workflow rather than in functionality. Differences in workflow will inevitably bring some differences in features into the discussion, but I will try to keep that to a minimum.
Note: This post assumes some basic knowledge of Git and Mercurial commands.

Attitude towards commit/changeset history

In Mercurial, history is “sacred” and not to be manipulated. Out of the box, about the only command that undoes history is hg rollback, which only removes the last changeset (hg revert, despite its name, only restores files in the working directory). To manipulate history further back, Mercurial extensions are available, but they are tedious to use and confusing at best (speaking from my experience using Mercurial on a project 2 years ago). For Git users like myself, this is a huge annoyance, as we are used to “fixing” history to make it look nicer and easier to trace in the future.
Git, on the other hand, allows users to manipulate commit history to their hearts’ content. In fact, it seems to be encouraged, as is evident from how easy it is to do so (git rebase, git rebase -i, git push -f, and many other relatively short commands that change history). This allows the creation of a better-looking, more linear commit history. However, an inexperienced user may break the entire repo with it. Luckily, Git keeps track of everything in its reflog, and one can go back to the exact state before the accident happened (provided it didn’t happen more than about a month ago, the default lifetime of unreachable reflog entries; plenty of time to realize something bad has happened, if you ask me).
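To make the “Git keeps track of everything” point concrete, here is a minimal toy model in Python. It is not how Git is implemented; the class ToyRepo and the commit ids are made up for illustration. It only shows the idea behind the reflog: every move of HEAD is recorded, so a botched history rewrite can be undone by moving back to an earlier entry.

```python
class ToyRepo:
    """Toy model of a repo whose HEAD movements are logged (hypothetical, not Git's code)."""

    def __init__(self, first_commit):
        self.head = first_commit
        self.reflog = [first_commit]  # every position HEAD has had, oldest first

    def move_head(self, commit_id):
        """Model any operation that moves HEAD (commit, rebase, reset) as one log entry."""
        self.head = commit_id
        self.reflog.append(commit_id)

    def undo_last_move(self):
        """Return HEAD to where it was before the most recent move."""
        self.move_head(self.reflog[-2])


repo = ToyRepo("a1")
repo.move_head("b2")   # a normal commit
repo.move_head("x9")   # a rebase gone wrong rewrites the branch
repo.undo_last_move()  # recover the pre-rebase state from the log
print(repo.head)       # b2
```

In real Git, the rough equivalent is to inspect git reflog and then run something like git reset --hard HEAD@{1}.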

Commit workflow

In Git, there are 4 states a file can be in: untracked, unstaged, staged, and committed. To move a file (or more precisely, a change/modification) from untracked or unstaged to the staged state, use git add, and to move it to the committed state, use git commit.
In comparison, in Mercurial there are only 3 states: untracked, uncommitted, and committed. To move a file from untracked to uncommitted, use hg add, and to commit, use hg commit.
The difference here is that in Git, the user can choose not to commit all changes in the working directory (using git add <filename> or git add --patch, for example). This is useful for keeping a commit atomic (which is part of Git’s, or any version control system’s, best practices), or when you have finished work in some files but not in others. In contrast, in Mercurial there is no staging state, and by default hg commit commits all changes in the working directory. If you come from a Git background like myself, you will find yourself repeatedly committing unfinished work, and what’s worse, it is difficult to fix the messy history caused by that mistake! That was the pain I personally went through in the Mercurial project I worked on 2 years ago.
There is a Mercurial extension that mimics this Git behavior, but I have not used it personally, so I cannot comment on it. However, from what I have found on the Internet, it seems to be a decent replacement for users coming from a Git background.
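The staging difference described above can be sketched with a small toy model in Python. The function names and file names are hypothetical and this is not either tool’s real behavior in full; it only contrasts “commit what was staged” with “commit everything that changed”.

```python
def git_style_commit(working_changes, staged):
    """Commit only the staged paths; everything else stays in the working directory."""
    committed = {p: c for p, c in working_changes.items() if p in staged}
    remaining = {p: c for p, c in working_changes.items() if p not in staged}
    return committed, remaining


def hg_style_commit(working_changes):
    """Model hg commit with no file arguments: every pending change is committed."""
    return dict(working_changes), {}


# Hypothetical working directory: one finished change, one unfinished.
changes = {"parser.py": "finished fix", "cli.py": "half-done refactor"}

committed, remaining = git_style_commit(changes, staged={"parser.py"})
print(sorted(committed))  # ['parser.py']
print(sorted(remaining))  # ['cli.py']

committed, remaining = hg_style_commit(changes)
print(sorted(committed))  # ['cli.py', 'parser.py']
```

In the second call the half-done refactor ends up in history too, which is exactly the mistake a Git user tends to make in Mercurial.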

Branching

In Git, there is only one way to branch (there are a few commands that create a new branch, but that’s beside the point). Any divergence in commit history is a branch, and the name of the branch is namespaced according to which repository the branch originates from.
For example, suppose I have a branch called test which tracks a remote branch with the same name. If at some point my local branch and the remote branch diverge, then when they are merged, the remote one will be called <remote>/test (for example, origin/test). There is no other “branching” method.
In Mercurial, there are at least 3 ways of branching.
The first one is a clone-branch. This seems to be the initially-intended branching workflow of Mercurial, evident from the “local-cloning-optimization” feature they have called “hardlink” which makes cloning from a local repo faster. This, however, is not a feasible branching workflow for certain types of projects where dependencies need to be downloaded separately (through npm or pip, for example) for each repo.
The second branching workflow that Mercurial supports is called named-branch. It is quite similar to a Git branch in that one can update (or checkout, in Git terminology) to the latest commit on that branch. In Mercurial, however, a named branch is not a lightweight pointer to a commit as it is in Git, but something that is recorded in a changeset’s (or commit’s, in Git terminology) metadata. This has some implications that people from the Git world don’t really like (such as cluttering the revision history with short-lived branches and needing to “close” a branch with an extraneous commit). On the other hand, it could be useful when tracing the history. But then again, I have worked with Git for quite a while and I have never encountered a problem where I needed to know the name of the branch a commit came from, so the usefulness is questionable.
The last branching workflow that Mercurial has is bookmarks. This is (claimed to be) the Git-branching equivalent in Mercurial. However, having used it in one of my projects 2 years ago, I find that they are not quite the same. In fact, I find that a Mercurial named branch is more similar to a Git branch than a Mercurial bookmark is. In Git, you are normally working on a branch (a detached HEAD is the exception), so whenever you commit, you commit to a branch, and other people who pull your work know that your commit belongs to a certain branch. In contrast, you are not always on a bookmark in Mercurial, so a user may accidentally commit while not on a bookmark and then have to manually move the bookmark to the intended changeset. Also, when sending a pull request, the name of the bookmark is not shown. Instead, the hash of the changeset is written as the “branch name”. This makes me doubt the claim that a bookmark is really the Mercurial equivalent of a Git branch.
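The difference between a name stored in the changeset and a movable pointer can be illustrated with a toy model in Python. Everything here (Changeset, ToyRepo, the ids c1 and c2) is hypothetical and greatly simplified; it is a sketch of the concepts above, not of either tool’s internals.

```python
class Changeset:
    def __init__(self, cid, parent=None, named_branch="default"):
        self.cid = cid
        self.parent = parent
        self.named_branch = named_branch  # named-branch style: stored in the changeset


class ToyRepo:
    def __init__(self):
        self.tip = None
        self.pointers = {}  # Git-branch or bookmark style: name -> changeset id
        self.active = None  # the pointer that follows new commits, if any

    def commit(self, cid, named_branch="default"):
        self.tip = Changeset(cid, parent=self.tip, named_branch=named_branch)
        if self.active is not None:
            self.pointers[self.active] = cid  # only an active pointer advances
        return self.tip


repo = ToyRepo()
repo.pointers["feature"] = None
repo.active = "feature"
repo.commit("c1")    # the active pointer moves along to c1

repo.active = None   # assumed scenario: no bookmark is active any more
c2 = repo.commit("c2", named_branch="stable")

print(repo.pointers["feature"])  # c1 (left behind; must be moved by hand)
print(c2.named_branch)           # stable (travels with the changeset)
```

The pointer left behind at c1 is the “accidental commit while not on a bookmark” situation, while the named branch stays attached to c2 no matter which pointer is active.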
Arguably, there is another Mercurial branching workflow, which is simply to do nothing: if history diverges from a certain changeset, just leave it as it is. Users can refer to a “branch” by the hash or revision number of the tip of the “branch” they intend to work on. This is good for quick fixes, but is not suitable for a development branch, as it becomes difficult to keep track of which “branch” is doing what. I am unsure whether this is one of the intended ways to use Mercurial.

Conclusion on Branching

In my opinion, Git’s branch workflow is better, as it is more consistent (i.e. there is only one way to do it). If I were to use Mercurial, though, I would probably use the named-branch workflow, as it is much more manageable than the other Mercurial branch workflows.
One may argue that Git also has some divergence in terms of branch workflow, namely “to merge or to rebase”. In my opinion, that is actually what makes Git great: you have the option either to preserve history as it is or to make a cleaner, nicer-looking history. In contrast, Mercurial’s different branch workflows are not really “options” but rather an inconsistency in Mercurial’s default development workflow. I mean, I can’t really think of a benefit of using the clone-branch workflow or the bookmark-branch workflow over any of the other options. But then again, maybe I have not used Mercurial enough to discover the benefits and disadvantages of the different ways of branching in Mercurial.
Just to reiterate, I wrote a comparison of two of the most popular distributed version control systems in terms of their attitudes towards history, commit workflow, and branching. I find Git’s way of doing things better than Mercurial’s because of the flexibility and consistency that Git provides.
Feel free to point out any mistakes in my post. I have not used Mercurial for quite a while so probably some information is outdated, but I did some research before posting this so it should not be too outdated.