Git diff with UTF-16 files

Git usually does a good job of guessing whether a file is text or binary, but UTF-16 files are not recognized as text. As a result, if you edit UTF-16 files such as Windows .reg files, git diff will only tell you that they have changed, without showing the actual text differences.
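
As far as I know, git's text/binary guess treats a file as binary when it finds a NUL byte near the start of its content, and UTF-16 encodes every ASCII character with a zero byte next to it. The following is only a small sketch to make that visible; the file name sample.reg and the temporary directory are made up for the illustration.

```shell
# Work in a throwaway directory (hypothetical location, illustration only).
tmpdir=$(mktemp -d)

# "Hi" encoded as UTF-16LE with a byte order mark: FF FE 48 00 69 00
printf '\377\376H\000i\000' > "$tmpdir/sample.reg"

# Delete every byte except NUL, then count what is left.
nul_count=$(( $(tr -cd '\000' < "$tmpdir/sample.reg" | wc -c) ))
echo "NUL bytes: $nul_count"
```

The same two characters stored as UTF-8 would contain no NUL byte at all, which is why ordinary text files are not affected.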

You can force a text diff by setting the “diff” attribute on the file type you’re interested in through the .gitattributes file as follows:

*.reg diff

But this will only make git treat those files as though they were UTF-8, which will produce useless garbage.

To achieve a proper diff, you need to tell git to “preprocess” the files through a converter (in this case, UTF-16 to UTF-8) before performing the diff. This is done in two steps:

First, declare a diff type “utf16” in your .gitconfig file:

[diff "utf16"]
    textconv = "iconv -f utf-16 -t utf-8"

The “textconv” setting is where you tell git how to preprocess your file; here we use iconv to convert from UTF-16 to UTF-8.
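
Git runs the textconv command with the path of the file to convert as its single argument and diffs whatever the command prints on standard output. To get a feel for what git will see, you can run the same iconv command by hand. This is just a sketch: the file name sample.reg and its content are invented for the example.

```shell
# Throwaway directory and sample file (both hypothetical).
tmpdir=$(mktemp -d)

# "Hi" plus a newline, as UTF-16LE with a byte order mark.
printf '\377\376H\000i\000\n\000' > "$tmpdir/sample.reg"

# The same command as in the textconv setting, with the file path appended,
# which mirrors how git invokes it.
converted=$(iconv -f utf-16 -t utf-8 "$tmpdir/sample.reg")
echo "$converted"
```

If this prints readable text for one of your own files, the same command should work as a textconv filter.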

Then associate your file type with the specific diff in .gitattributes:

*.reg diff=utf16

That’s it: git diff will now correctly show differences in .reg files as text differences!
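
If you want to try the whole setup without touching a real project, something like the following should do. It is a sketch under a few assumptions: git and iconv are both on the PATH, the repository and the file demo.reg are throwaway inventions, and the diff driver is stored in the repository's own config with git config instead of in your .gitconfig so that nothing global is modified.

```shell
# Create a scratch repository (hypothetical, safe to delete afterwards).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Step 1: declare the "utf16" diff type (repo-local here, not in .gitconfig).
git config diff.utf16.textconv "iconv -f utf-16 -t utf-8"

# Step 2: associate *.reg files with that diff type.
echo '*.reg diff=utf16' > .gitattributes

# Commit a UTF-16 file, then change it.
printf 'old value\n' | iconv -f utf-8 -t utf-16 > demo.reg
git add .gitattributes demo.reg
git commit -q -m "Add UTF-16 file"
printf 'new value\n' | iconv -f utf-8 -t utf-16 > demo.reg

# The diff should now show text lines instead of a "binary files differ" notice.
diff_output=$(git diff -- demo.reg)
echo "$diff_output"
```

Keeping the driver in the repository config is only a convenience for the experiment; for everyday use the .gitconfig entry shown above applies to all your repositories.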

A few notes:

  • this trick only affects the human-readable output of the git diff and git log commands; it is not meant to generate an actual patch!
  • we use the external command “iconv” to perform the conversion; this means iconv must be installed on the system and accessible in the PATH. It will work on most Unixes, in Cygwin and in SourceTree’s git terminal, but not in the Windows command prompt.
  • SourceTree (and probably other GUIs) will make use of this configuration to display the diffs correctly too.
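
Since everything depends on iconv being reachable, a quick check in the shell you actually run git from can save some head-scratching. A minimal sketch (the variable name iconv_status is only for the example):

```shell
# Look up iconv on the PATH of the current shell.
if command -v iconv >/dev/null 2>&1; then
    iconv_status="found"
else
    iconv_status="missing"
fi
echo "iconv: $iconv_status"
```

If it reports “missing”, git has no converter to call and the textconv setting cannot produce a readable diff.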

Credits to the following StackOverflow contribution: http://stackoverflow.com/a/21020607/170637

 

