restoring recovered files identity

I have a couple of one terabyte external hard drives. When I first got them I formatted them with FAT32 so that I could read and write them from both Windows and OS X.

FAT32 is not a journaling filesystem, which means if a write operation is interrupted the drive can end up in an inconsistent state. Within the first month of using the drive, I corrupted the filesystem. I forget exactly what happened: did I knock the power cord out? forget to eject the drive?

Luckily when this happens to a FAT32 disk the files can be recovered. Unfortunately, the filenames are lost. You get a folder called FOUND.000 with files named FILE0000.CHK up to, in my case, FILE0820.CHK. The extension on filenames is important on Windows and on OS X as well because Finder relies on it.

Unix-like systems come with a utility, called file, that can determine file types by examining the contents. When I first read about file it was described as checking the first 2-bytes, called the magic number or magic cookie. More modern versions must check more of the file, because they offer more information than can be contained in 2-bytes.

➜ {ninja} FOUND.000 $ file *.CHK
FILE0004.CHK: RIFF (little-endian) data, AVI, 628 x 254, 25.00 fps, video: DivX 5, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
FILE0007.CHK: RIFF (little-endian) data, AVI, 580 x 306, 23.98 fps, video: DivX 5, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
FILE0009.CHK: RIFF (little-endian) data, AVI, 576 x 320, 23.98 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
FILE0013.CHK: RIFF (little-endian) data, AVI, 612 x 250, 23.98 fps, video: DivX 5, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
FILE0016.CHK: RIFF (little-endian) data, AVI, 592 x 320, 25.00 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
FILE0019.CHK: RIFF (little-endian) data, AVI, 608 x 288, 25.00 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 48000 Hz)
FILE0022.CHK: RIFF (little-endian) data, AVI, 640 x 272, 23.98 fps, video: XviD, audio: MPEG-1 Layer 3 (stereo, 44100 Hz)
FILE0026.CHK: PNG image, 640 x 272, 8-bit/color RGB, non-interlaced
FILE0027.CHK: PC bitmap, Windows 3.x format, 640 x 272 x 32
FILE0028.CHK: PNG image, 640 x 272, 8-bit/color RGB, non-interlaced

I wanted to at least check what the contents of my FOUND files were before deleting them. I needed to get the appropriate extensions back on the filenames so that finder and other tools would work with them properly. I thought I should be able to do it completely in the shell, and best of all, it’s already a REPL!

First step was to sort them by type.

➜ {ninja} FOUND.000 $ file *.CHK | grep PNG
FILE0026.CHK: PNG image, 640 x 272, 8-bit/color RGB, non-interlaced
FILE0028.CHK: PNG image, 640 x 272, 8-bit/color RGB, non-interlaced

Next I needed the filename by cutting the first 8 characters. I did not know about cut before but I’ve certainly needed to pull a section out of a line before. I expect I’ll be using it again.

➜ {ninja} FOUND.000 $ file *.CHK | grep PNG | cut -c 1-8
FILE0026
FILE0028

I needed to turn these lines in to separate mv commands and xargs does exactly that. I had not used xargs beyond entirely simple commands before but the man page was enough to get me started. Since I was about to move files without a safety net, I wanted to do a quick test first.

➜ {ninja} FOUND.000 $ file *.CHK | grep PNG | cut -c 1-8 | xargs -I filename echo filename.CHK filename.png
FILE0026.CHK FILE0026.png
FILE0028.CHK FILE0028.png

Looked good, so it was time for the real command.

➜ {ninja} FOUND.000 $ file *.CHK | grep PNG | cut -c 1-8 | xargs -I filename mv -v filename.CHK filename.png
FILE0026.CHK -> FILE0026.png
FILE0028.CHK -> FILE0028.png

With the extensions corrected Finder is once again previewing and opening the files correctly. It’s working well for videos and images. I think multi-volume rars are going to be more of a challenge.

This was all prompted by Usher being released today. I’ve wanted an application to manage videos on OS X for a while.