Tuesday, February 6, 2007

Winter of code (part 4: Ramblings)

In this episode I am going to continue talking about NTFS details that I think will be important to this project. I do not have this project ready, so I am blogging as I think about it; expect some 'wardrobe malfunctions' along the way.

In fact, the most important item in my mind so far is the File Reference Number (FRN). The FRN is a 64-bit number that uniquely identifies each file in an NTFS volume. It is composed of a 48-bit part and a 16-bit part. The former is the index of the file's first record in the MFT, and the latter (called the ‘generation’) is a counter that NTFS increments each time it reuses an MFT record. Taken together, the two parts should make the FRN unique in time and space. Or at least I wish; this is not exactly true.
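To make the layout concrete, here is a minimal sketch that splits an FRN into its two parts and puts it back together. It assumes the 48-bit record part sits in the low bits and the 16-bit generation in the high bits, which matches the sample output further down (0x0001000000000ca7 for boot.ini). The names FrnParts, SplitFrn and MakeFrn are made up for this illustration; they are not part of any Windows API.

```cpp
#include <cstdint>

// Hypothetical helper type: the two parts of a File Reference Number.
struct FrnParts
{
    std::uint64_t record;      // low 48 bits: index of the first record in the MFT
    std::uint16_t generation;  // high 16 bits: reuse counter for that record
};

// Assumption: record in the low 48 bits, generation in the high 16 bits.
inline FrnParts SplitFrn(std::uint64_t frn)
{
    FrnParts parts;
    parts.record     = frn & 0x0000FFFFFFFFFFFFULL;
    parts.generation = static_cast<std::uint16_t>(frn >> 48);
    return parts;
}

// Inverse of SplitFrn; bits of 'record' above bit 47 are discarded.
inline std::uint64_t MakeFrn(std::uint64_t record, std::uint16_t generation)
{
    return (static_cast<std::uint64_t>(generation) << 48) |
           (record & 0x0000FFFFFFFFFFFFULL);
}
```

Under that assumption, SplitFrn(0x0001000000000ca7) gives record 0xca7 with generation 1.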

The conditions under which the FRN of a given file changes are kind of complicated and better left for another day.

The good news is that the FRN of a file should ‘rarely’, if ever, change: as long as the file has not been deleted, there is no reason to free or reuse its first record in the MFT, which is really what makes the FRN a stable id for a file. By the way, FAT volumes also have the notion of FRNs, but with the disadvantage that they can change under much more common conditions, making them more or less unusable. Lucky for us, we don't care about FAT volumes.

So let me introduce a little command line test program called fdb1.exe. The only parameter it takes is a path to a file or directory, and it outputs the corresponding FRN in hex along with the VSN (volume serial number). Taken together, these two numbers can uniquely identify a file in your computer. The total line count including blank lines is 113.

Here is a typical output:
file c:\$MFT
has FRN of 0x0001000000000000 and VSN is 0xc402ce7d
file c:\
has FRN of 0x0005000000000005 and VSN is 0xc402ce7d
file c:\boot.ini
has FRN of 0x0001000000000ca7 and VSN is 0xc402ce7d
file e:\
has FRN of 0x0005000000000005 and VSN is 0x481fccd6
file c:\$MFTMirr
has FRN of 0x0001000000000001 and VSN is 0xc402ce7d

Note how the FRNs of C:\ and E:\ are the same but the VSNs are different. This is explainable because the root of the volume, which is a directory named '.', is always record number 5 in the MFT. In fact, the first 12 or so MFT records all have very predictable contents.

Now let's get down and dirty. Here is main():

int wmain(int argc, wchar_t* argv[])
{
    try
    {
        if (2 != argc)
        {
            OutputUsageError();
            return 1;
        }

        // Desired access of 0 is enough to query metadata.
        // FILE_FLAG_BACKUP_SEMANTICS lets the same call open directories.
        FileHandle file = CreateFile(argv[1], 0,
                                     FILE_SHARE_READ | FILE_SHARE_WRITE,
                                     NULL, OPEN_EXISTING,
                                     FILE_FLAG_BACKUP_SEMANTICS, NULL);

        BY_HANDLE_FILE_INFORMATION fi = {0};
        BoolCheck bc =
            ::GetFileInformationByHandle(file.Get(), &fi);

        OutputResults(argv[1], fi.nFileIndexLow,
                      fi.nFileIndexHigh,
                      fi.dwVolumeSerialNumber);

        return 0;
    }
    catch (ApiError& ex)
    {
        OutputException(ex.error, ex.text);
        return 2;
    }
}


Yes, that is the actual program, *with* error handling included.

As you can see, the code just calls two Win32 APIs and then spits the info collected to the standard output. The first one, CreateFile(), opens the file or directory path received from the command line and gets you a handle that is fed into the second API, GetFileInformationByHandle, which returns a bunch of information about the file including the FRN. The only thing to note here is the use of the CreateFile parameters with “backup semantics” so that a single call can open both files and directories; the regular semantics (so to say) are meant for files, and the API fails when opening a directory.
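GetFileInformationByHandle reports the FRN as two 32-bit halves (nFileIndexHigh and nFileIndexLow), so something has to glue them together before printing. The sketch below shows one way to do that and to produce the two-line report shown earlier. FormatResults is a hypothetical stand-in for OutputResults, whose body is not shown in this post; it returns the text instead of printing it so that it can be checked without a console, and it uses only standard C++ so it does not need the Windows headers.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Joins the two 32-bit halves of the file index into one 64-bit FRN.
inline std::uint64_t CombineFileIndex(std::uint32_t low, std::uint32_t high)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Hypothetical replacement for OutputResults: builds the report text.
inline std::wstring FormatResults(const std::wstring& path,
                                  std::uint32_t low, std::uint32_t high,
                                  std::uint32_t vsn)
{
    std::wostringstream out;
    out << L"file " << path << L"\n"
        << L"has FRN of 0x" << std::hex << std::setfill(L'0')
        << std::setw(16) << CombineFileIndex(low, high)   // 16 hex digits
        << L" and VSN is 0x" << std::setw(8) << vsn;      // 8 hex digits
    return out.str();
}
```

Sending the returned string to std::wcout would give the same shape of output as the listing above.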

Besides showing some FRNs, I wanted to talk about the kind of code style I use. If you program Windows, take it as advice. First of all, get rid of that ugly TCHAR and _T( ) macro stuff all over the place; Windows is Unicode and we are not coming back from that trip. One of the main purposes of adding all the extra complexity to C++ was to provide viable alternatives to C macros, therefore I feel it is my duty to remove as many as I can. By the way, kudos to the VS2005 team, who made the Win32 app project wizards with the Unicode switch enabled by default.

Second, use negative logic when testing a condition. The “true” part of the ‘if’ is where I expect to find the error/failure handling logic and possibly a return. The other option, which is to have the “true” part of the ‘if’ be the success condition leads to multiple nested levels of {, which is harder to read, verify and to code right.
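Here is a small sketch of what that looks like in practice. The function and the enum are invented for this illustration (they are not part of fdb1.exe); the point is only the shape: each ‘if’ tests for a failure and leaves right away, and the success path stays flat at the bottom.

```cpp
#include <string>

// Hypothetical result codes for the illustration.
enum CheckResult { kOk = 0, kEmpty = 1, kNoColon = 2 };

// Checks that a path starts with a drive letter and a colon, e.g. "c:\".
inline CheckResult CheckDrivePath(const std::wstring& path)
{
    if (path.empty())
    {
        return kEmpty;      // failure handled and left behind immediately
    }
    if (path.size() < 2 || L':' != path[1])
    {
        return kNoColon;    // second failure, still at the same nesting level
    }
    // Success path: no nesting needed to get here.
    return kOk;
}
```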

However, in order to do this the right way you are going to need to follow the next item.

Third, use holders. A holder is a small, silly class that implements some form of the RAII pattern. I assume that you are familiar with this technique; if you are not, you really need to be. The form that I usually like for a holder does not try to hide the native API of the ‘held’ resource; instead it handles in its constructor the possible error conditions that the ‘create/open’ API call can dish out.

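// Holder for a Win32 file handle: checks it on construction, closes it on destruction.
class FileHandle
{
    HANDLE file_;

public:
    // Takes ownership of the handle returned by the 'create/open' call
    // and turns its failure value into an exception.
    FileHandle(HANDLE file) : file_(file)
    {
        if (INVALID_HANDLE_VALUE == file_)
        {
            throw ApiError(WIDEN(__FUNCTION__), ::GetLastError());
        }
    }

    ~FileHandle()
    {
        _ASSERTE(INVALID_HANDLE_VALUE != file_);

        // Never throw from a destructor; assert instead.
        if (!::CloseHandle(file_))
        {
            _ASSERTE(false);
        }
    }

    HANDLE Get() const
    {
        return file_;
    }

private:
    // No default construction, no copy construction.
    FileHandle();
    FileHandle(const FileHandle&);
};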


Nota bene: the built-in macro __FUNCTION__ resolves to a quoted ANSI string containing the undecorated name of the enclosing function.

The holder, as you can see, is quite simplistic. Oh, I hear you say, ‘I have a better one, it uses templates’, and ‘yours does not do X, Y or Z’. Yes, I know. I do have the fancy ones, but they are hard to read, hard to test and they are overkill here.

Never underestimate the power of simple, my little grasshopper.

And that goes for you too, Mr. boost:: lover. Not that there is anything wrong with liking boost.

Fourth, bake in some form of asserts, but in all things moderation. The best ones should be hidden somewhere in your helper classes, away from the main logic. For example, in my code you see an assert if CloseHandle fails. Why? Because that should not happen unless somebody is violating the RAII contract, which could actually happen at some point. I have seen way too many pointless asserts in my time. Besides, a destructor is a bad place to throw an exception.

Fifth, if you agree with the premise of using negative logic and if you buy the idea of using holders, then you inevitably end up using exception-based error handling, because you end up handling API errors in a constructor. I know, I know, influential people have spoken against using exceptions, but frankly the examples they use are written in a language I call C+ (a single plus), which is just C plus some type safety.

Speaking of exception-based error handling, here are two rules:
1. Throw a lot but catch just a few
2. Don’t catch what you cannot handle, and thus never do a catch(...)
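A minimal sketch of the two rules, using made-up names (DemoError, Require and RunSteps are not part of fdb1.exe): every helper is free to throw, but only one place catches, and it catches the one type it knows how to report.

```cpp
// Hypothetical error type, shaped like the one used in this program.
struct DemoError
{
    const wchar_t* text;
    unsigned long error;

    DemoError(const wchar_t* txt, unsigned long err)
        : text(txt), error(err) {}
};

// Rule 1, first half: throw a lot. Any failed step throws right where it fails.
inline void Require(bool ok, const wchar_t* what, unsigned long code)
{
    if (!ok)
    {
        throw DemoError(what, code);
    }
}

// Rule 1, second half, and rule 2: a single catch site, for a type it can
// handle. There is no catch(...), so anything unexpected keeps propagating.
inline int RunSteps(bool firstStepOk, bool secondStepOk)
{
    try
    {
        Require(firstStepOk, L"first step", 10);
        Require(secondStepOk, L"second step", 20);
        return 0;
    }
    catch (const DemoError& ex)
    {
        return static_cast<int>(ex.error);
    }
}
```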

I have also seen that once you follow this five-fold way, the line between what is just an expected error and what is an exceptional condition becomes less blurry.

Without further ado, here is my silly exception class for this program:


struct ApiError
{
    const wchar_t* text;
    unsigned long error;

    ApiError(const wchar_t* txt, unsigned long err)
        : text(txt), error(err) {}
};


The final trick that I want to show you is what I do with the APIs that return a boolean result, such as GetFileInformationByHandle. If you have seen a lot of Windows code the pattern is all too familiar: if the result is false then there was an error, whose code can be retrieved by calling GetLastError(). Of course you must do that before calling any other API, because the last error is a thread-local variable that may get overwritten if you call any other Win32 API on the same thread.


class BoolCheck
{
public:
    BoolCheck(BOOL res)
    {
        if (FALSE == res)
        {
            throw ApiError(WIDEN(__FUNCTION__), ::GetLastError());
        }
    }

private:
    BoolCheck();
    BoolCheck(const BoolCheck&);
};


And that's all there is to see. The rest of the program is the WIDEN macro that you can find anywhere (it converts ANSI string literals into Unicode ones), and two little output functions that just take the inputs and do your typical std::wcout << stuff.
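For completeness, here is the usual two-step form of such a macro, as a sketch; the exact definition used in fdb1.exe is not shown in this post, so treat this as an assumption. The second level is there so that an argument that is itself a macro, such as __FILE__, gets expanded to its string literal before the L prefix is pasted on. Using it with __FUNCTION__ relies on the compiler treating __FUNCTION__ as a string literal, which the Microsoft compiler does; other compilers may not, so the example below sticks to a plain literal and __FILE__. The two helper functions are hypothetical and only exist to exercise the macro.

```cpp
#include <string>

// Paste an L prefix onto a narrow string literal, e.g. "abc" becomes L"abc".
#define WIDEN2(x) L ## x
// Extra level of indirection so that macro arguments are expanded first.
#define WIDEN(x)  WIDEN2(x)

// Hypothetical helpers that exercise the macro.
inline std::wstring WideAbc()
{
    return WIDEN("abc");
}

inline std::wstring WideSourceFileName()
{
    return WIDEN(__FILE__);   // __FILE__ expands to a narrow literal first
}
```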

The bigger picture is that you can factor out into helper classes a lot of the noise that obscures the meat of the program, and that, I think, is worth the effort.
