Tuesday, September 26, 2006

GpStructuredStorage internals

[This is a second article in the 'embedded file system' series. If you missed the first part, you can read it here.]

GpStructuredStorage compound file is organized in 1 KB blocks. First block contains header, then the content alternates between a file allocation table (FAT) fragment block and 256 blocks managed by the preceeding FAT fragment. Each block can be represented by a number - header block is block #0, first fat fragment is block #1 and so on.

[header:HEADER:1024]
[fat entry:FATENTRY:1024]
256 x [block:FOLDER/FILE:1024]
[fat entry:FATENTRY:1024]
256 x [block:FOLDER/FILE:1024]
...
[fat entry:FATENTRY:1024]
<=256 x [block:FOLDER/FILE:1024]

Header starts with a signature, which must always be 'GpStructuredStorage file'#13#10#26#0.

HEADER
[signature:32] // PChar
[unused:964]
[storage attribute file:4]
[storage attribute file size:4]
[first FAT block:4]
[first unused block:4]
[root folder:4]
[root folder size:4]
[version:4] // storage system version
Header ends in:
  • block number and size of the internal file containing global storage attributes
  • block number of the first FAT block (always 1)
  • block number of the first unused file block
  • block number and size of the root folder
  • storage system version (at the moment $01000200, or 1.0.2.0)

Each FAT fragment contains 256 32-bit numbers, one for each of the 256 following file blocks. This number is either 0 if block is last in the FAT chain, or it is a number of the next file block in the chain. Each block is linked into exactly one chain. It can either belong to a file/folder or to a empty block list. Address of the first block in the empty list is stored in the header ([first unused block] entry).

FATENTRY
256*[next block:4] // next-pointers for this block; 0 = unused
FAT structure defines the limitations of my file format - last block must have number that is less than 4294967294. As block are 1 KB in size, total file size cannot exceed 4294967294 * 1 KB, which is slightly less than 4 TB. Enough for all practical purposes, I think.

Folders are simple - each folder is just a file. It contains file information records and is terminated with two 0 bytes.

FOLDER //potentially split over several blocks
[FILE_INFO]
[FILE_INFO]
...
[FILE_INFO]
[0:2]
File information record is a variable length record containing file name (up to 64 KB), attributes, length, and address of the first file block (additional blocks are reached by following the FAT chain).
FILE_INFO
[file name length:2]
[file name:1..65535]
[file attributes:ATTRIBUTES:4]
[file length:4] // 4 GB per file
[first file block:4]
At the moment, only two attributes are defined. One specifies that file is actually a subfolder, and another designates a special file containing file attributes (for discussion of attributes see the previous article).
ATTRIBUTES
$0001 = attrIsFolder
$0002 = attrIsAttributeFile
That's just about everything that is to tell about the compound file format. Armed with this knowledge, one can easily write a compound file browser/repair utility. 

7 Comments:

Blogger lhaymehr said...

Brilliant. I found this in GExperts code and decided to use it in a NewsReader program i'm writing. These posts will surely come in handy.

Hvala, pozdrav :)

11:23  
Blogger gabr said...

Thanks for the thumbs up!

(In hvala ;) )

11:42  
Blogger Sabbiolina said...

good work, but i find a bug.

try to increase file size in test:

function TForm1.TestGSSBigAndSmall: boolean;
var
storage: IGpStructuredStorage;
begin
Result := false;
Log('Testing big and small files');
try
storage := CreateStructuredStorage;
storage.Initialize(CStorageFile, fmCreate);
// 0 bytes
TestFile(storage, '/small.dat', true, -1);
TestFile(storage, '/small2.dat', true, -1);
TestFile(storage, '/small2.dat', true);
// cross the 257-block boundary
TestFile(storage, '/large.dat', true, 64);
storage := CreateStructuredStorage;
storage.Initialize(CStorageFile, fmOpenRead);
TestFile(storage, '/small.dat', false, -1);
TestFile(storage, '/small2.dat', false);
TestFile(storage, '/large.dat', false, 64);
Result := true;
except
on E: Exception do
Log(' '+E.Message);
end;
end; { TForm1.TestGSSBigAndSmall }


Self test fail!
I use delphi 2007 Version 11.0.2804.9245
on XP

09:04  
Blogger gabr said...

I can confirm the problem. I'll fix it as soon as I have a little spare time.

Thanks for reporting this!

11:39  
Blogger gabr said...

It turned out to be only a bug in the test suite.

When you passed a value larger than 63 as a fourth parameter to the TestFile method, TestFile made some incorrect assumptions and compared value that was larger than $FFFF to another value that was truncated to two bytes. As a result of that, error was raised (incorrectly).

I have updated test suite at http://gp.17slon.com/gp/files/testgpstructuredstorage_src.zip.

12:49  
Anonymous Anonymous said...

Hi;

Does it possible to store an executable file (.exe) in this storage and
execute from there?

13:43  
Blogger gabr said...

Store, yes. Execute, no.

16:01  

Post a Comment

Links to this post:

Create a Link

<< Home