February 8, 2010

Editorial. Today we interrupt our regularly scheduled programming to bring you a guest post by Oleg Kiselyov, reinterpreting our previous post about nested loops and continuations with exceptions.
Hello!
I noticed your recent article about nested loops and continuations. I should have commented on it using the provided form, but I was not sure how formatting would come out. The comment includes a lot of code. Please feel free to post the code in whole or in part, or do anything else with it.
The thesis of my comment is that callCC is not necessary for the implementation of break and continue in single and nested loops. We observe that the continuations of each iteration and of the entire loop are invoked either 0 or 1 time (but never more than once). That is the pattern of exceptions. So, the problem posed by your article can be solved with exceptions. Here are several variations of the solution.
First, a few preliminaries: this message is a complete literate Haskell program.
> import Prelude hiding (break, catch)
> import Control.Monad
> import Control.Monad.Trans
Alas, ErrorT in Control.Monad.Error has the stupid Error constraint. So, we have to write our own Exception monad transformer. The code below is standard.
> newtype ExT e m a = ExT{unExT :: m (Either e a)}
>
> instance Monad m => Monad (ExT e m) where
>     return = ExT . return . Right
>     m >>= f = ExT $ unExT m >>= either (return . Left) (unExT . f)
>
> instance MonadTrans (ExT e) where
>     lift m = ExT $ m >>= return . Right
>
> instance MonadIO m => MonadIO (ExT e m) where
>     liftIO m = ExT $ liftIO m >>= return . Right
>
> raise :: Monad m => e -> ExT e m a
> raise = ExT . return . Left
>
> catch :: Monad m => ExT e m a -> (e -> ExT e' m a) -> ExT e' m a
> catch m h = ExT $ unExT m >>= either (unExT . h) (return . Right)
>
> runExT :: Monad m => ExT e m a -> m a
> runExT m = unExT m >>= either (const $ fail "Unhandled exc") return
We are ready to code the first solution, for simple, non-nested loops. The idea is to treat ‘break’ and ‘continue’ as exceptions. After all, both control operators cause computations to be skipped—which is what exceptions do. We define the datatype of our ‘exceptions’:
> data BC = Break | Cont
>
> break, continue :: Monad m => ExT BC m a
> break = raise Break
> continue = raise Cont
Here is the code for the loop: it catches exceptions at some points:
> for_in :: Monad m => [a] -> (a -> ExT BC m ()) -> m ()
> for_in xs f = runExT $ mapM_ iter xs `catch` hbreak
>   where
>     iter x = catch (f x) hcont
>     hcont Cont = return () -- handle Cont, re-raise Break
>     hcont e    = raise e
>     hbreak Break = return ()
>     hbreak Cont  = return () -- Shouldn't happen actually
Here is your test:
> loopLookForIt1 :: IO ()
> loopLookForIt1 =
>     for_in [0..100] $ \x -> do
>         when (x `mod` 3 == 1) $ continue
>         when (x `div` 17 == 2) $ break
>         lift $ print x
Running it:
> tf1 = loopLookForIt1 :: IO ()
prints 23 numbers starting with 0, 2, 3 and ending with 30, 32, 33.
We have to generalize to nested loops. Two solutions are apparent. I would call the first one ‘dynamic’. We index the exceptions by levels, which are natural numbers. Level 0 pertains to the current loop, level 1 is for the parent loop, etc.
> data BCN = BCN BC Int -- Add the level of breaking
Operators break and continue now take a number: how many loop scopes to break out of. I think Perl has a similar breaking-with-number operator.
> breakN = raise . BCN Break
> continueN = raise . BCN Cont
The new iterator:
> for_inN :: Monad m => [a] -> (a -> ExT BCN m ()) -> ExT BCN m ()
> for_inN xs f = mapM_ iter xs `catch` hbreak
>   where
>     iter x = catch (f x) hcont
>     hcont (BCN Cont 0) = return () -- handle Cont, re-raise Break
>     hcont e            = raise e
>     -- If the exception is for a parent, re-raise it, decrementing its level
>     hbreak (BCN Break 0) = return ()
>     hbreak (BCN Cont 0)  = return () -- Shouldn't happen actually
>     hbreak (BCN exc n)   = raise (BCN exc (n-1))
The single-loop test now looks as follows.
> loopLookForItN :: ExT BCN IO ()
> loopLookForItN =
>     for_inN [0..100] $ \x -> do
>         when (x `mod` 3 == 1) $ continueN 0
>         when (x `div` 17 == 2) $ breakN 0
>         lift $ print x
>
> tfN = runExT loopLookForItN :: IO ()
We can now write the nested loop test. I took the liberty of enhancing the example in your article so as to exercise all cases:
> loopBreakOuter1 :: ExT BCN IO ()
> loopBreakOuter1 =
>     for_inN [1,2,3] $ \x -> do
>         lift $ print x
>         for_inN [4,5,6] $ \y -> do
>             lift $ print y
>             when (y == 4) $ continueN 0
>             when (x == 1) $ breakN 0
>             when (x == 3) $ breakN 1
>             when (y == 5) $ continueN 1
>             breakN 1
>         lift $ print x
>
> tbN1 = runExT loopBreakOuter1 :: IO ()
The result is the sequence of numbers: 1 4 5 1 2 4 5 3 4 5
There exists another solution to the nested-loop problem, which I call ‘static’. What if we just iterate the single-loop solution? We can nest ExT BC monad transformers to any given depth. To refer to a particular layer in the transformer stack, we use lift. We can use the for_in iterator and the operators break and continue defined earlier. We write the nested test as follows:
> loopBreakOuterS1 :: IO ()
> loopBreakOuterS1 =
>     for_in [1,2,3] $ \x -> do
>         liftIO $ print x
>         for_in [4,5,6] $ \y -> do
>             liftIO $ print y
>             when (y == 4) $ continue
>             when (x == 1) $ break
>             when (x == 3) $ lift $ break
>             when (y == 5) $ lift $ continue
>             lift $ break
>         liftIO $ print x
>
> tbS1 = loopBreakOuterS1 :: IO ()
I guess the lesson here might be that callCC is often not needed (I would argue that callCC is never needed, but that's an argument for another time). Here is another example of simple exceptions sufficing where call/cc was thought to be required:
http://okmij.org/ftp/Computation/lem.html
Cheers,
Oleg
February 5, 2010

The bread and butter of an imperative programmer is the loop. Coming from a C/assembly perspective, a loop is simply a structured goto which jumps back to a set location if some condition is not met. Frequently, this loop ranges over the elements of some list data structure. In C, you might be doing pointer arithmetic over the elements of an array or following pointers on a linked list until you get NULL; in Python and other higher-level languages you get the for x in xs construct which neatly abstracts this functionality. Inside of a loop, you also have access to the flow control operators break and continue, which are also highly structured gotos. An even more compact form of loops and nested loops is the list comprehension, which doesn't permit those flow operators.
Haskell encourages you to use higher-order forms such as map and fold, which even further restrict what may happen to the data. You'll certainly not see a for loop anywhere in Haskell… However, as a pernicious little exercise, and also a way to get a little more insight into what callCC might be good for, I decided to implement for...in loops with both the continue and break keywords. The end hope is to be able to write code such as:
import Prelude hiding (break)

loopLookForIt :: ContT () IO ()
loopLookForIt =
    for_in [0..100] $ \loop x -> do
        when (x `mod` 3 == 1) $ continue loop
        when (x `div` 17 == 2) $ break loop
        lift $ print x
as well as:
loopBreakOuter :: ContT () IO ()
loopBreakOuter =
    for_in [1,2,3] $ \outer x -> do
        for_in [4,5,6] $ \inner y -> do
            lift $ print y
            break outer
        lift $ print x
the latter solving the classic “nested loops” problem by explicitly labeling each loop. We might run these pieces of code using:
runContT loopBreakOuter return :: IO ()
Since continuations represent, well, “continuations” to the program flow, we should have some notion of a continuation that functions as break, as well as a continuation that functions as continue. We will store the continuations that correspond to breaking and continuing inside a loop “label”, which is the first argument of our hanging lambda:
data (MonadCont m) => Label m = Label {
    continue :: m (),
    break    :: m ()
}
It’s sufficient then to call continue label or break label inside the monad to extract and follow the continuation.
The next bit is to implement the actual for_in construct. If we didn’t have to supply any of the continuations, this is actually just a flipped mapM_:
for_in' :: (Monad m) => [a] -> (a -> m ()) -> m ()
for_in' xs f = mapM_ f xs
Of course, in the sample code, f has the type Label m -> a -> m (), so this won't do! Consider this first transformation:
for_in'' :: (MonadCont m) => [a] -> (a -> m ()) -> m ()
for_in'' xs f = callCC $ \c -> mapM_ f xs
This function does the same thing as for_in', but we placed it inside the continuation monad and made explicit a variable c. What does the current continuation c correspond to in this context? Well, it’s in the very outer context, which means the “current continuation” is completely out of the loop. That must mean it’s the break continuation. Cool.
Consider this second alternative transformation:
for_in''' :: (MonadCont m) => [a] -> (a -> m ()) -> m ()
for_in''' xs f = mapM_ (\x -> callCC $ \c -> f x) xs
This time, we’ve replaced f with a wrapper lambda that uses callCC before actually calling f, and the current continuation results in the next step of mapM_ being called. This is the continue continuation.
All that remains is to stick them together, and package them into the Label datatype:
for_in :: (MonadCont m) => [a] -> (Label m -> a -> m ()) -> m ()
for_in xs f = callCC $ \breakCont ->
    mapM_ (\x -> callCC $ \continueCont ->
        f (Label (continueCont ()) (breakCont ())) x) xs
Et voila! Imperative looping constructs in Haskell. (Not that you’d ever want to use them, nudge nudge wink wink.)
Addendum. Thanks to Nelson Elhage and Anders Kaseorg for pointing out a stylistic mistake: storing the continuations as () -> m () is unnecessary because Haskell is a lazy language (in my defense, the imperative paradigm was leaking in!)
Addendum 2. Added type signatures and code for running the initial two examples.
Addendum 3. Sebastian Fischer points out a mistake introduced by addendum 1. That’s what I get for not testing my edits!
February 3, 2010

And so classes begin this Spring Term of 2010. The classes that I am currently signed up to take are:
- 6.005: Software Construction
- 6.02: Intro to EECS II
- 6.045: Automata, Computing and Complexity
- 6.945: Large-scale Symbolic Systems
- 21M.283: Musicals of Stage and Screen
6.945 is the “fun” class of the semester; I expect to have to sink a lot of time into it and get a lot out of it in return. 6.005 and 6.02 are strictly being taken because my degree requires it (I’ve scheduled 6.02 as a conflict class on top of 6.945, so I’ll probably have to do a little more legwork to make sure I get all the material for that class.) 6.045 is my mathematical class for the semester; no pure course 18 class for me, unfortunately! And on the advice of Robert Jacobs, 21M.283 is my HASS-ish class (I’m quite pleased that I’ve gotten the HASS-D requirement out of the way).
Among the classes I’m not taking this term: 6.006 (goodness knows I do need the algorithms knowledge), 7.01X (punt punt punt) and 6.824 (sounds like lots of fun, but would result in a triple conflict, which I’m not willing to do.)
February 1, 2010

A classic stylistic tip given to C programmers is that inline functions should be preferred over macros, when possible. This advice stems from the fact that a macro and an inline function can achieve the same effect, but the inline function also gets type checking.
As it turns out, you can achieve static type checking with macros, if you’re willing to resort to the same cute trick that this following snippet from the Linux kernel uses:
#define module_param_named(name, value, type, perm) \
        param_check_##type(name, &(value)); \
        module_param_call(name, param_set_##type, param_get_##type, &value, perm); \
        __MODULE_PARM_TYPE(name, #type)
Hmm… I wonder what that param_check_##type call is all about. Digging through a few more macro definitions, we see:
#define __param_check(name, p, type) \
        static inline type *__check_##name(void) { return(p); }
So there you go. A throw-away inline function named __check_##name enforces that p has type type *. A comment is also given, explaining what’s going on:
/* The macros to do compile-time type checking stolen from Jakub
Jelinek, who IIRC came up with this idea for the 2.4 module init code. */
January 29, 2010

Nelson Elhage wrote a post about Git and usability, in which he discussed one of the reasons why Git seems to be so confusing to users who have come in straight from a Subversion-style workflow. When discussing this issue offline, one of the things that came up was the fact that, while Subversion imposes a fairly rigid workflow upon its users, Git is flexible enough to accommodate almost any sort of workflow. This is terrible for a user placed in a shop that uses Git: when they go Google for how to use Git, they’ll get any of a multitude of tutorials, each of which is for a different workflow.
In this multipart series, I’d like to discuss several different types of workflows that I’ve seen or experienced while using Git. This first post will look at a very simple example of a Git workflow, namely that of a single user, which will establish some basic idioms of Git that you might see in the other workflows.
A single-user workflow is, well, kind of simple. At its simplest, it’s not much more than a glorified backup system; you have lots of versions of your code. You can go back in time. Since I am assuming a general knowledge of version control systems, I don’t think I need to convince you why this is useful. This article also assumes that you’re comfortable enough to make commits in a repository (though we will not assume you know how to use the index; -a is a wondrous flag).
Backups
The very first thing you may notice when you move from a centralized VCS to a decentralized VCS is that your data never leaves your computer unless you explicitly say so. This is great if you are on an airplane and don’t have Internet access; you don’t have to pile up a stack of changes without being able to check in to the server. However, it means that you have to put in a little thought about where you are going to push your changes to. An easy way to do this is to utilize the multitude of free public hosting services. If you have a server that you have SSH access to, private offsite backups are also easy: create a bare git repository on another server using git init --bare and then set up a remote that you can push to… but I’m getting ahead of myself!
If you created a Git repository and working copy on your own computer with git init, you’ll now have to wrangle with Git remotes. I personally find this quite annoying, and thus always arrange to have my bare Git repository (i.e. the server) set up before I git clone my working copy (i.e. the client), which sets up the configuration that makes pushing easy. My steps are then:
- On my server, make a directory (I like /srv/git/project.git) and in it run git init --bare
- On my client, run git clone ssh://servername/srv/git/project.git
If you must set up the remotes on an existing repository, the following commands will do the trick:
git remote add origin $REPO_URL
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
For the curious, the first line adds a remote named “origin” (which, by convention, is the remote set up from the repository you may have cloned) associated with $REPO_URL. The second and third lines set up default behavior for when you pull changes from the repository, to simulate the configuration that normally gets set up when you do a clone. (Note: this kind of sucks. Git 1.7.0 introduces the --set-upstream flag which fixes these problems.)
From there, all you need to do is make commits with git commit, and then push them to the remote repository with git push.
Topic branches
As a single user, most of your work in your repository will play nicely together; you don’t have to worry about someone else coming in and trampling on your commits. However, every once in a while you may find yourself in the midst of a large refactoring, and you find yourself having to leave things off for the day, or take an interrupt to work on a more pressing, albeit smaller, bugfix. Here, cheap commits and branching make this very simple on Git.
If you think the changes you are currently working on are big, but you’ll be able to get back to them immediately, use git stash to temporarily set your changes aside in a stash. You can then perform your minor changes, and once done, use git stash pop to restore your old changes. Stash works best as a temporary scratch place for you to store changes, and should be immediately emptied out when possible; you don’t want to be looking at multiple stashed changes and trying to figure out which one contains the ones you care about.
If your changes are a smidge bigger than that, or you think that you’re not going to be able to work on whatever large change you’re making for a while, you can make what’s called a topic branch. First, change your working copy over to a new branch using git checkout -b new-branch-name (pick a descriptive name). Then, make a commit to save your changes. If you pop open gitk, you’ll now notice that you have a commit hanging off of master. You can checkout master again using git checkout master and work on whatever other changes you need.
When you finally decide that your topic branch is done, you need to stick it back into master. There are two ways to do this:
- You can pretend that your topic branch, as a whole, is just a big patch, and as such, this patch should reasonably apply to the most recent version of master. In that case, running git rebase master while on the topic branch (you can check with git status) will take this “patch” and apply it to master. You can then checkout master and git pull topic-branch to fast-forward master to the topic branch. Since getting rid of old branches is a good thing, I recommend running git branch -d topic-branch afterwards.
- You can take a stance that history is important, and perform a merge. On the master branch, run git merge topic-branch. Just as in the first case, you can then clean up the topic branch with git branch -d topic-branch.
Cleaning up after old topic branches is a good habit to get into, because it means you can use git branch to remind yourself quickly which topic branches might need your attention.
Additionally, if you care about backing up your topic branches, you should run git push origin topic-branch. You can delete topic branches from your remote using git push origin :topic-branch (note the colon).
Clean history
Many people pay a lot of attention to documentation inside a source file in order to puzzle out what a particular piece of code does. However, another excellent source of code documentation is the history of a piece of code: when did a particular snippet get introduced, and what explanation did the author give for it when making that change? git blame will give you a blow-by-blow description of when every line in a file was last changed, and git log will show you the conglomeration of changes made to a particular file.
Unfortunately, the usefulness of this mechanism highly depends on the quality of the messages you’re making in your commits, and if you’re using Git properly and committing often, you might have skimped a little on some of the messages. No worries; it happens to the best of us. You just have to remember to clean things up (i.e. rewrite history) when you’re done.
In this case, git rebase -i is your friend. Specify as an argument how far back you want to rewrite history (HEAD~N where N is a number is probably a good bet), and then rewrite history to your heart’s content. You have three primary tools:
- edit, and when Git gets to that commit, just run git commit --amend: This is fairly simple: you have a self-contained commit that you didn’t really write a good commit message for; amend will let you change that commit message into something that is useful.
- squash: If you made a bunch of very small commits, and now you look at them and decide that, no, they really logically go together, you can squash them together.
- edit with git checkout HEAD~: What this will do is give you a working tree with the changes of that commit, but without any of them actually part of a commit. You can then break a “too big” commit into bite-sized pieces using git add -p (which will selectively add hunks of your changes to the index) and then git commit without the -a flag.
This strategy interacts particularly well with topic branches, which lend themselves to the following workflow:
- Create the topic branch with git checkout -b topic-name,
- Hack a lot on the branch, making tiny commits with incomprehensible summaries,
- Review your changes with git log -u master..HEAD,
- Edit your changes with git rebase -i master,
- Checkout master and git pull topic-name.
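Run end to end in a scratch repository, the steps above look something like this (the repository path, branch name, and commit messages are invented, and GIT_SEQUENCE_EDITOR=true stands in for the editor session that git rebase -i normally opens):

```shell
# Scratch repository so nothing real is touched
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master        # pin the branch name
git config user.email demo@example.com
git config user.name Demo
echo one > file && git add file && git commit -q -m "initial commit"

git checkout -q -b topic-name                  # 1. create the topic branch
echo two >> file && git commit -q -am "wip"    # 2. tiny commits with
echo three >> file && git commit -q -am "moar" #    incomprehensible summaries
git log --oneline master..HEAD                 # 3. review the changes
GIT_SEQUENCE_EDITOR=true git rebase -i master  # 4. normally your editor opens here
git checkout -q master && git merge -q topic-name   # 5. fast-forward master
git branch -d topic-name                       # and tidy up afterwards
```

(git merge topic-name is the in-repository equivalent of the git pull in step 5.)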
And that’s it for part one! You may have noticed that all of these strategies seem to feed into each other: this unusual integration between all aspects is one of the benefits of Git’s simple internal model. If people would like to see some examples of these techniques in action, I’d be more than happy to blog about them some more. Thanks for reading.
January 27, 2010

As treasurer for the Assassins’ Guild, I often have to beg and plead with GMs to get posters for their respective games, since the UA Finboard has a requirement that, to get funding for events, you need to supply posters.
So I was quite surprised, amazed and impressed by Lauren Gust’s work on posters for Arcadia. The composition is simple, but effective, and definitely adds to the atmosphere of the game. Click for larger versions, and you can get full size PDFs of the posters in Lauren Gust’s public directory. For a little more background into what the posters are referencing, check out the Arcadia Rising scenario.
January 25, 2010

In my younger days, the stylistic convention of MM/DD/YYYY confused me; why on earth would people opt for such an illogical system that placed months, days and years in non-hierarchical order? Surely something on the order of YYYY-MM-DD would make far more sense: this format is sortable and, all-in-all, quite logical.
Eventually, though, I grudgingly accepted that MM/DD/YYYY trades machine-friendliness for human-friendliness; after all, the year entry rarely changes, and for humans the month and date are the most important pieces of information. Context is usually more than enough to implicitly specify what the year is.
But as an auto-complete user, I’ve come to appreciate that this sort of ordering can come in handy even when computers are involved. Consider the hierarchically named and non-hierarchically named lists of files:
# hierarchically named
test-algorithm.sh
test-bottles.sh
test-capistrano.sh
utils.sh
# non-hierarchically named
algorithm-test.sh
bottles-test.sh
capistrano-test.sh
utils.sh
In the hierarchical case, to auto-complete test-algorithm.sh, I need to type t<tab>a<tab>; a total of four keystrokes. In the non-hierarchical case, however, I only need to type a<tab>. If I’m frequently accessing these files, the extra keystrokes add up.
So here’s my plea: the next time you’re coming up with a naming convention for files you’re sticking in a directory, consider both moving the “category” component to the end and thinking of autocomplete-friendly names. Your fingers will thank you for it.
(Hat-tip to GameTeX for showing me the light.)
January 22, 2010

An unusual workflow for Git, one that Wizard employs extensively, is when a single developer needs to perform merges inside lots of working copies. Normally, a maintainer would pull from the branches he cared about, and offload a large amount of the work to those who were interested in contributing patches. However, Wizard is using Git to provide a service for people who don’t know and aren’t interested in learning Git, so we need to push updates and merge their software for them.
The problem encountered when you are doing a lot of merging is “repeated resolution of the same conflicts.” The solution, at least for the classical case, is git rerere. This feature saves the resolution of conflicts and then automatically applies those resolutions if the conflict comes up again. You can find out this much if you check man git-rerere.
Unfortunately, this merge resolution data is stored per .git directory, specifically in the rr-cache subdirectory, so some modest cleverness is necessary to make this work across many repositories. Fortunately, the simple solution of symlinking all of the rr-cache directories to a common directory both works and is safe from race conditions when initially merging (it’s not race-safe when writing out resolutions, but I am willing to consider this low-contention enough to be a non-issue).
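The symlinking itself is a one-liner per repository. A sketch, with a made-up directory layout (the shared cache path and repository names are my own, not Wizard's):

```shell
# Shared cache plus a couple of stand-in working copies, in a scratch directory
base=$(mktemp -d)
mkdir -p "$base/shared-rr-cache" "$base/app1/.git" "$base/app2/.git"

for gitdir in "$base"/*/.git; do
    [ -d "$gitdir" ] || continue
    rm -rf "$gitdir/rr-cache"                        # drop any per-repo cache
    ln -s "$base/shared-rr-cache" "$gitdir/rr-cache" # point at the shared one
done
```

Conveniently, the mere presence of an rr-cache directory is enough to activate rerere in a repository (you can also set rerere.enabled explicitly).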
Why is this solution race safe? At first glance at the code in rerere.c, this would seem not to be the case: if two merges were to happen to generate the same merge conflict (precisely the use case of git rerere), the following code would get executed with the same value of hex:
ret = handle_file(path, sha1, NULL);
if (ret < 1)
        continue;
hex = xstrdup(sha1_to_hex(sha1));
string_list_insert(path, rr)->util = hex;
if (mkdir(git_path("rr-cache/%s", hex), 0755))
        continue;
handle_file(path, NULL, rerere_path(hex, "preimage"));
The last three lines access the (now shared) rr-cache directory, and handle_file will attempt to write out the file rr-cache/$HEX/preimage with the preimage contents; if both instances run handle_file concurrently, this file will get clobbered.
But, as it turns out, we don’t care; barring a SHA-1 collision, both instances will write out the same file. The signature of handle_file is:
static int handle_file(const char *path,
unsigned char *sha1, const char *output)
The first argument is a path to read conflict markers from, and is mandatory. sha1 and output are optional; if output is not NULL, it names a file to which the entire contents, minus any diff3 conflict sections (the ones separated by ||||||| and =======), get written; if sha1 is not NULL, it receives a 20-byte binary digest of the contents that output would have received. And thus, balance is restored to the world.
Addendum
Anders brings up the interesting question of whether or not two processes writing the same contents to the same file is actually race safe. Indeed, there is a very similar situation involving two processes writing the same contents to the same file which is a classic example of a race condition:
((echo "a"; sleep 1; echo "b") & (echo "a"; sleep 1; echo "b")) > test
Under normal circumstances, the contents of test is:
a
a
b
b
But every once in a while, one of the processes loses the race, and you get:
a
a
b
due to a non-atomic combination of writing and updating the file offset.
However, the distinguishing characteristic between this example and Git’s case is that, in this example, there is only one file descriptor. In Git’s case, there are two file descriptors, since each process called open independently. A more analogous shell script would be:
((echo "a"; sleep 1; echo "b") > test & (echo "a"; sleep 1; echo "b") > test)
the contents of which are (as far as I can tell) invariably:
a
b
Now, POSIX actually doesn’t say what happens if two writes to the same offset with the same contents occur. However, casual testing seems to indicate that Linux and ext3 are able to give a stronger guarantee that writing the same values won’t cause random corruption (note that, if the contents of the file were different, either combination could be possible, and this is what you see in practice).
January 20, 2010

A bad habit in the domain of cooking that I’ve picked up from being a starving college student is the propensity to cook all of the ingredients I have on hand. Combine this with the fact that vegetables make a lot of food, the fact that you were planning on feeding 15-20 people (but realistically only fed 10), and that Costco has very large portions, and you have a recipe for post-Mystery Hunt leftover disaster.
Specifically, I now have a large pot filled to the brim with stir fried broccoli, chicken, carrots, celery, potatoes and beef chilling out in my refrigerator. It’s tasty food (if I may say so myself), but there’s only so much stirfry one person is willing to eat… and the quantities here would probably feed someone for a month. Perhaps it is time to invoke the power of living in a dorm.
January 18, 2010

Have you ever performed a merge in Git and not had it quite turn out the way you wanted it to? For example, you accidentally converted all of your UNIX line endings to DOS line endings, and now the entire file reports a conflict? Maybe you see a conflict that you don’t really care about resolving, and want to resolve as theirs? Or perhaps the conflicted file is empty and you can’t figure out just what happened there?
Here are some advanced techniques you can apply to your conflicted merges to make things go a little easier. Many of them utilize Git plumbing; that is, the internal commands that interface directly with the bare metal Git abstractions: the index, the tree, the commit graph. Others are as simple as flipping a configuration switch.
Turn on diff3 conflicts using git config --global merge.conflictstyle diff3. The diff3 conflict style adds an extra section between the ||||||| and ======= markers, which indicates the original contents of the conflicted section, with your changes above and their changes (from the branch that is being merged in) below. diff3 is a powerful way of reestablishing the context of a change you made several months ago (to see the changes you made, compare the middle section with the upper section; for the changes they made, compare the middle section with the lower section), and there is really no good reason not to have this on by default.
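You can see both styles without setting up a real merge, since git merge-file performs the same three-way merge on plain files (the file names and contents below are invented):

```shell
cd "$(mktemp -d)"
printf 'greeting: hello\n' > base     # the original (common ancestor)
printf 'greeting: hi\n'    > ours     # your change
printf 'greeting: hey\n'   > theirs   # their change

# diff3 style: the middle ||||||| section shows the original text,
# so you can see what each side changed it FROM.
git merge-file -p --diff3 ours base theirs || true  # exit code = conflict count
```

Without --diff3 you get only the two-way <<<<<<< / ======= / >>>>>>> markers, with no record of what the section originally said.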
If you’ve come in from Subversion, you may be familiar with the FILE.mine, FILE.r2 (the original you worked with) and FILE.r3 (the latest version checked in) files, as well as the ability to run svn resolve --accept theirs-full or mine-full, which says “I don’t care about the other changes, just use this version of the file.” Git offers similar facilities utilizing the merge parents, although they’re a little more hidden.
You may be already familiar with the git show command, which lets you view commits as well as arbitrary blobs inside the tree of any given commit. When you are inside a merge, you can use a special :N: syntax, where N is a number, to automatically select one of the merge parents. 1 selects the common base commit (the lower revision), 2 selects your version (“mine”), and 3 selects their version (the higher revision). So git show :3:foobar.txt shows the upstream version of foobar.txt.
To actually use one of these versions as the merge resolution, use git checkout {--ours|--theirs} filename.txt.
When you’re in a conflict, git diff will give you the deep and dirty of all the conflicts that occurred; sometimes this is too much information. In that case, you can run git ls-files -u to view all of the unmerged files (this is also a lot faster than git status, and will omit all of the files that were merged properly.)
You may notice that there are as many as three copies of a file inside the list; this tells you the state of the “common”, “ours” and “theirs” copies mentioned previously. If 1 (common) is missing, that means that the file appeared at the same time in our branch and their branch. If 2 (ours) is missing, it means we deleted the file, but it got a change upstream. If 3 (theirs) is missing, it means we made some changes, but upstream deleted the file. This is especially useful if a file is conflicted, but you can’t figure out why (since there are no conflict markers.)
Sometimes life gives you lemons. Many people suggest you make lemon juice. However, if Git gives you a really bad set of conflict markers, for example, you accidentally flipped the newline style for one of the files, so now the entire file conflicts, don’t settle for that: redo the merge for that file. You can do this with the handy git merge-file command. This runs a three-way file merge, and takes three arguments: the current file, the common file, and the upstream file, and writes out the merge into the current file (the first argument). Use git show to dump out your file, the common file and the upstream file, do whatever changes to those files you need (for example, run dos2unix), run git merge-file mine common theirs, and then copy mine over the old conflicted file. Voila, instant new set of conflict markers.
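Here is a sketch of the redo, outside any repository (in a real merge the three inputs would come from git show :1:, :2: and :3: as described above; the file contents here are invented):

```shell
cd "$(mktemp -d)"
printf 'color = red\nsize = 10\n'  > common   # git show :1:settings > common
printf 'color = blue\nsize = 10\n' > mine     # git show :2:settings > mine
printf 'color = red\nsize = 99\n'  > theirs   # git show :3:settings > theirs

# ... fix whatever was wrong with the inputs here (e.g. run dos2unix) ...

git merge-file mine common theirs   # merge result replaces "mine"
cat mine                            # then copy mine over the conflicted file
```

In this toy case each side changed a different line, so the redone merge is clean; if the changes overlapped, mine would instead contain a fresh, sane set of conflict markers.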
If you discover a global conflict relatively early in the merge process, and it was your fault, it might be easier to back out of the merge with git reset --hard, fix the mistake, and try merging again. However, if you’ve already made substantial progress merging a copy, re-merging just a single file can be a lifesaver.
Don’t merge, rebase! Instead of running git pull, run git pull --rebase. Instead of running git merge master, run git rebase master. Your history will be much cleaner as a result, and you won’t have to go on a massive rebase marathon later if you want to submit your patches upstream.
Now, go forth and merge to thy heart’s content!