Portable XCF
Portable XCF | Tor Lillqvist | 15 Aug 12:49 |
Portable XCF | Raphaël Quinet | 15 Aug 13:57 |
Portable XCF | Steinar H. Gunderson | 15 Aug 14:10 |
Portable XCF | Marc (A.) Lehmann | 15 Aug 14:44 |
Portable XCF | Guillermo S. Romero / Familia Romero | 15 Aug 16:40 |
Portable XCF | Alan Horkan | 15 Aug 15:24 |
Portable XCF | Tor Lillqvist | 16 Aug 05:05 |
Portable XCF | Austin Donnelly | 15 Aug 15:30 |
Portable XCF | Alastair Robinson | 16 Aug 01:20 |
Portable XCF | Sven Neumann | 16 Aug 19:02 |
Portable XCF | Kevin Myers | 15 Aug 14:45 |
Portable XCF | Mukund | 15 Aug 14:55 |
Portable XCF | Tino Schwarze | 15 Aug 15:02 |
Portable XCF | Mukund | 15 Aug 15:22 |
Portable XCF | Tino Schwarze | 15 Aug 15:44 |
Portable XCF | Sven Neumann | 15 Aug 15:51 |
Portable XCF | Tino Schwarze | 15 Aug 17:11 |
Portable XCF | Raphaël Quinet | 15 Aug 15:22 |
Portable XCF | Tino Schwarze | 15 Aug 15:41 |
Portable XCF | Guillermo S. Romero / Familia Romero | 15 Aug 17:17 |
Portable XCF | Austin Donnelly | 15 Aug 18:03 |
Portable XCF | Marc (A.) Lehmann | 16 Aug 16:32 |
Portable XCF | Carol Spears | 16 Aug 05:53 |
Portable XCF | Karl Heinz Kremer | 16 Aug 14:03 |
Portable XCF
I won't take any stand on either side (or how many sides are there?) in the ongoing discussion, just air some fresh thoughts...
Many of the image formats suggested are some kind of archive formats (zip, ar) on the outside.
I understand that one important benefit from this is that you can store layers and whatnot objects as different "files" in the archive, and easily access them separately. Even with other tools like ar or unzip if need be.
However, these formats have the drawback that even if you can easily have read access to just one of the component "files" in the archive, it is impossible to rewrite a component if its size has changed (well, at least if it has grown) without rewriting at least the rest of the archive. (Or, maybe leaving the old version of the component as garbage bits in the middle, appending the new version and updating the index, if that is estimated to be less expensive than rewriting.)
Now, what concept do the ar, zip, etc formats closely resemble? What other thingie is it that you store files in? Yeah, file systems.
Wouldn't it be neat to use a real file system inside the image file... I.e. the image file would be a self-contained file system, with the image components (layers, XML files, whatnot) as files.
What file system would be good? I don't know. Presumably something as small and simple as possible, but not any simpler. Maybe FAT? ;-) Early V6 Unix style file system (but with longer file names)? Minix? Or something completely different? ISO9660 (I have no knowledge of this, it might be way too complex)? UDF?
Does this make any sense?
Yeah, I can think of some drawbacks: For instance, there would have to be some code to defragment and/or compact the file system image files when needed (if the amount of data in the file system has radically decreased, it should be compacted, for instance). Another is that if the blocks of a layer are scattered here and there, reading it might be slower than from traditional image file formats, where the data is contiguous in the image file.
One neat benefit would be that on some operating systems it would be possible to actually mount the image file as a file system...
--tml
Portable XCF
On Fri, 15 Aug 2003 13:49:41 +0300 (EET DST), Tor Lillqvist wrote:
I won't take any stand on either side (or how many sides are there?) in the ongoing discussion, just air some fresh thoughts...
[...]
Now, what concept do the ar, zip, etc formats closely resemble? What other thingie is it that you store files in? Yeah, file systems.
Wouldn't it be neat to use a real file system inside the image file... I.e. the image file would be a self-contained file system, with the image components (layers, XML files, whatnot) as files.
What file system would be good? I don't know. Presumably something as small and simple as possible, but not any simpler. Maybe FAT? ;-) Early V6 Unix style file system (but with longer file names)? Minix? Or something completely different? ISO9660 (I have no knowledge of this, it might be way too complex)? UDF?
There is unfortunately one thing that most of these filesystems have in common: they are designed to store their data in a partition that has a fixed size. If you create such a filesystem in a regular file, you have to pre-allocate the space that you will need for storing your data.
I have played a lot with loopback filesystems, which are useful for creating things like a read-only encrypted ext2 or FAT filesystem on a CD-ROM. Unfortunately, this only works well when starting with a 600+MB file in which I create the image of the filesystem. It is not possible (or not easy) for the filesystem to grow as needed.
We could have a mixed solution, in which the GIMP would start with a relatively small file containing a filesystem and then replace it with a larger one whenever necessary. But this is neither elegant nor efficient, so the solution involving some kind of archive file format is better IMHO.
The proposal for XML + some kind of archive format looks good, except that I do not like the fact that all metadata (especially parasites) will have to be XML-escaped or encoded in Base64. Some parts may be stored as separate files in the archive, but that does not make the decoding easier because this means that some parts of the metadata are included directly while others are included by reference. The main advantage of using XML is that it can easily be debugged by hand. The other arguments that have been discussed so far (for or against XML) are not so significant. If we want something that can be easily read and edited by humans, let's go for XML. If we want something compact and efficient, let's go for something else.
-Raphaël
Portable XCF
On Fri, Aug 15, 2003 at 01:57:35PM +0200, Raphaël Quinet wrote:
There is unfortunately one thing that most of these filesystems have in common: they are designed to store their data in a partition that has a fixed size. If you create such a filesystem in a regular file, you have to pre-allocate the space that you will need for storing your data.
Unless, of course, you simply re-use the filesystem, and make the file a folder instead of a file. It has its definite disadvantages (what do you do if somebody messes with the case in the filenames, or 8.3 mangle them?), but I kind of like the idea. :-) (We've discussed this earlier, though. :-) )
/* Steinar */
Portable XCF
On Fri, Aug 15, 2003 at 01:57:35PM +0200, Raphaël Quinet wrote:
included directly while others are included by reference. The main advantage of using XML is that it can easily be debugged by hand. The other arguments that have been discussed so far (for or against XML) are not so significant.
Opinions differ... for me, debugging is absolutely unimportant. I never had to debug any xcf file, and I don't really want to change that :)
An XML format can be easily extended or updated; extending XCF was a pain, and with XML this at least could become easier.
and edited by humans, let's go for XML. If we want something compact and efficient, let's go for something else.
Indeed, "if". Efficiency is not the problem here (efficiency is much more a problem with the underlying image data storage, i.e. use flat or tiled areas etc.). XML isn't that inefficient compared to other serialization schemes, especially when this has to be done on load/save only, while it might be useful to dynamically swap in/out image data from the file (as some modern os'es do, while others rely on copying everything to swap first, as the gimp does :)
Portable XCF
I could be mistaken, but it doesn't seem that a file system with an extensible size would be a big problem...
We make a request to store a "file" in our "file system within a file", and what we want to store exceeds the available capacity of our present file system. No problem. Our file system's space request handling routine detects the out of space condition, and makes a request to the OS to extend the size of our real file, then proceeds with allocating the desired space in our internal file system. If the OS reports out of space, then our file system reports out of space. Pointers used in our file system would be sized such that they could handle any reasonable size, perhaps 32-bit pointers to 256-byte blocks = 1 terabyte capacity? We could even allow the block size to vary between different OS files to reduce wasted space for small "files", or support larger than 1 TB if necessary.
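A minimal sketch of that growth logic (hypothetical names, a fixed 256-byte block size, nothing resembling any real format):

    /* Grow-on-demand container: 32-bit block pointers to 256-byte
     * blocks give 2^32 * 256 bytes = 1 TB of addressable space. */
    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_SIZE 256

    typedef struct {
        FILE     *backing;   /* the real OS file */
        uint32_t  n_blocks;  /* blocks currently allocated inside it */
    } Container;

    /* Ask the OS to extend the backing file by `count` blocks; returns
     * the index of the first new block, or 0 on failure (block 0 is
     * reserved, so 0 never names a valid block). */
    static uint32_t container_grow(Container *c, uint32_t count)
    {
        long new_size = (long)(c->n_blocks + count) * BLOCK_SIZE;
        if (fseek(c->backing, new_size - 1, SEEK_SET) != 0 ||
            fputc(0, c->backing) == EOF)
            return 0;   /* the OS reported out of space, so do we */
        uint32_t first = c->n_blocks;
        c->n_blocks += count;
        return first;
    }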
BTW, the Microsoft Windows registry is already basically an extensible file system within a file. A high-end business product that I use, called SAS, also has something similar. I would guess there are others out there as well.
s/KAM
Portable XCF
On Fri, Aug 15, 2003 at 07:45:28AM -0500, Kevin Myers wrote:
| BTW, the Microsoft Windows registry is already basically an extensible
| file system within a file. A high-end business product that I use,
| called SAS, also has something similar. I would guess there are others
| out there as well.
You brought a strange thought to mind.
Subversion (http://subversion.tigris.org/) implements a versioned FS using Sleepycat's Berkeley DB database. It has a full library implementation which any application could use.
Imagine that images could be revisioned. Subversion also uses a hybrid delta algorithm for binary diffs.
Mukund
Portable XCF
On Fri, Aug 15, 2003 at 01:55:26PM +0100, Mukund wrote:
| BTW, the Microsoft Windows registry is already basically an extensible
| file system within a file. A high-end business product that I use,
| called SAS, also has something similar. I would guess there are others
| out there as well.
You brought a strange thought to mind.
Subversion (http://subversion.tigris.org/) implements a versioned FS using Sleepycat's Berkeley DB database. It has a full library implementation which any application could use.
Well, using a database as container might be a good idea. I'm not quite familiar with Berkeley DB but it might be useful as a backend.
Imagine that images could be revisioned. Subversion also uses a hybrid delta algorithm for binary diffs.
Worst case: I make my black image white. That's the point where a binary diff will only waste processing power.
Bye, Tino.
Portable XCF
On Fri, Aug 15, 2003 at 03:02:46PM +0200, Tino Schwarze wrote:
| > Subversion (http://subversion.tigris.org/) implements a versioned FS
| > using Sleepycat's Berkeley DB database. It has a full library
| > implementation which any application could use.
|
| Well, using a database as container might be a good idea. I'm not quite
| familiar with Berkeley DB but it might be useful as a backend.
Subversion provides its own client library for accessing the virtual file system. You won't have to work with the DB directly. It also provides an abstracted recover facility in one of its utilities (in case of stale locks).
| > Imagine that images could be revisioned. Subversion also uses a hybrid
| > delta algorithm for binary diffs.
|
| Worst case: I make my black image white. That's the point where a binary
| diff will only waste processing power.
I said hybrid delta algorithm for binary diffs. I didn't say straightforward A - B diffing.
Even if your images are black and white, they are most likely stored in a compressed format (if a Subversion based GIMP file format was ever invented), and if such compressed files are revisioned, no generic algorithm is going to give you a good difference.
The whole Subversion thing was a far-fetched *idea*, an alternative which is most definitely going to be blown off, as there are more reasonable ways of implementing the GIMP file format which are not far-fetched.
Mukund
Portable XCF
[Re-sending this because I sent it to Kevin instead of the list. Grumble...]
On Fri, 15 Aug 2003 07:45:28 -0500, "Kevin Myers" wrote:
I could be mistaken, but it doesn't seem that a file system with an extensible size would be a big problem...
It may be a problem with _existing_ filesystems.
We make a request to store a "file" in our "file system within a file", and what we want to store exceeds the available capacity of our present file system. No problem. Our file system's space request handling routine detects the out of space condition, and makes a request to the OS to extend the size of our real file, then proceeds with allocating the desired space in our internal file system. [...]
The whole point of Tor's proposal was to use an existing filesystem, such as FAT, Minix, UDF, ISO9660, etc. Using the Linux loopback devices (for example), one could easily mount these filesystems-in-a-file and use the standard tools to work with the files they contain. We could design a filesystem that can be extended dynamically, but then we lose the ability to use existing drivers and tools.
As I mentioned in my previous message, we could of course increase the size of a filesystem such as FAT, but that would basically require a new copy of the file in which we extend the file allocation table or inode table to leave enough room for the new sectors. The same tricks would have to be used when we want to shrink the file. In other words, this is not trivial.
I'd rather have some kind of archive format. If we want to replace an element in the archive by another one that is larger, we can append the larger one at the end of the archive, update the index and leave some unused bits in the middle. That would not waste more space than the filesystem idea. In both cases, we could have an option for defragmenting the file if we do not want to waste space with unused bits or unused sectors. Or we simply re-create a "clean" file when using the "Save As" option. This is exactly what is done by several software packages, including MS Office.
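A sketch of how that replacement could look, assuming a toy index of (offset, length) entries and nothing about any real archive format:

    #include <stdio.h>
    #include <stdint.h>

    typedef struct {
        uint64_t offset;  /* where the member's bytes start in the file */
        uint64_t length;
    } IndexEntry;

    /* Replace a member with `len` new bytes: append them at the end and
     * repoint the index entry. The old bytes remain as unused garbage
     * until a "Save As" style rewrite compacts the file. */
    static int member_replace(FILE *f, IndexEntry *e,
                              const void *data, uint64_t len)
    {
        if (fseek(f, 0, SEEK_END) != 0)
            return -1;
        long end = ftell(f);
        if (end < 0 || fwrite(data, 1, (size_t)len, f) != (size_t)len)
            return -1;
        e->offset = (uint64_t)end;
        e->length = len;
        return 0;  /* the caller still has to rewrite the index block */
    }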
-Raphaël
Portable XCF
On Fri, 15 Aug 2003, Tor Lillqvist wrote:
Date: Fri, 15 Aug 2003 13:49:41 +0300 (EET DST)
From: Tor Lillqvist
To: The Gimp Developers' list
Subject: Re: [Gimp-developer] Portable XCF
I won't take any stand on either side (or how many sides are there?) in the ongoing discussion, just air some fresh thoughts...
Many of the image formats suggested are some kind of archive formats (zip, ar) on the outside.
I understand that one important benefit from this is that you can store layers and whatnot objects as different "files" in the archive, and easily access them separately. Even with other tools like ar or unzip if need be.
However, these formats have the drawback that even if you can easily have read access to just one of the component "files" in the archive, it is impossible to rewrite a component if its size has changed (well, at least if it has grown) without rewriting at least the rest of the archive. (Or, maybe leaving the old version of the component as garbage bits in the middle, appending the new version and updating the index, if that is estimated to be less expensive than rewriting.)
For the XML files you can use whitespace padding; I was reading the Adobe XMP specifications and they do this in some places. It is less than ideal but it is an option.
The fact that others have already led the way with these types of file formats means there are plenty of existing examples to learn from and solutions to potential pitfalls.
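A sketch of the padding trick (purely illustrative, made-up names): reserve a fixed-size slot for the XML so it can be rewritten in place as long as the new version still fits.

    #include <stdio.h>

    /* Write `xml` into a reserved slot and fill the remainder with
     * spaces, which are harmless trailing whitespace to an XML parser.
     * Returns -1 if the XML outgrew the slot and a full rewrite of the
     * archive is needed after all. */
    static int write_padded(FILE *f, long slot_start, long slot_size,
                            const char *xml, long xml_len)
    {
        if (xml_len > slot_size || fseek(f, slot_start, SEEK_SET) != 0)
            return -1;
        fwrite(xml, 1, (size_t)xml_len, f);
        for (long i = xml_len; i < slot_size; i++)
            fputc(' ', f);
        return 0;
    }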
Now, what concept do the ar, zip, etc formats closely resemble? What other thingie is it that you store files in? Yeah, file systems.
Wouldn't it be neat to use a real file system inside the image file... I.e. the image file would be a self-contained file system, with the image components (layers, XML files, whatnot) as files.
What file system would be good? I don't know. Presumably something as small and simple as possible, but not any simpler. Maybe FAT? ;-) Early V6 Unix style file system (but with longer file names)? Minix? Or something completely different? ISO9660 (I have no knowledge of this, it might be way too complex)? UDF?
I am pretty sure you can have a Zip filesystem. (I found a request for something similar on the linux kernel mailing list but am having difficulty finding anything more substantial.)
Hopefully someone who knows more about Zip or virtual filesystems can provide more substantial information.
I recall mumblings about Gnome doing away with the need for programs like the predecessors of File-Roller and having Gnome-vfs sort it out and use Nautilus instead.
This looks more promising: http://www.hwaci.com/sw/tobe/zvfs.html and http://webs.demasiado.com/freakpascal/zfs.htm; hopefully someone else will come up with better links.
One neat benefit would be that on some operating systems it would be possible to actually mount the image file as a file system...
Zip is already in wide use, and as it is more popular it is more likely than an 'ar'-based solution to be available as a filesystem, if it is not already.
To change the subject slightly, the ad-hoc name 'Portable XCF' might be a bit misleading. Portable implies web formats, and I think that PNG/MNG/JNG and others largely have this area covered. The next-generation XCF will need to do many things, hold a fair bit of raw data, and be reasonably fast, which goes against being a web-ready portable format (or at least makes it a low priority). At this early stage hopefully no one will get too attached to any particular name; that can be left until later.
Sincerely
Alan Horkan http://advogato.org/person/AlanHorkan/
Portable XCF
Tor wrote:
[filesystem within a file]
It's a nice idea in theory, but makes it quite hard to write a parser for. MS Word files (until recently) were basically FAT filesystems, which makes it easy to handle under Windows but harder to parse when you don't have a convenient DLL to do it lying around.
The FlashPix format (now little used?) is also a FAT filesystem; it was this fact that persuaded me that writing a Gimp FlashPix loader wouldn't be particularly easy.
So sure, consider the idea, but bear in mind it might be hard to pull off.
When this discussion started, I didn't like the idea of XML with binary data portions. I liked the current binary, tagged format we have, and thought that it should just be extended. However, after the recent discussion I've come around to quite liking an ar-style archive with an XML catalog, XML metadata, and texels as separate members. I think this is roughly what Leonard was suggesting; we should listen to the voice of experience.
Austin
Portable XCF
On Fri, Aug 15, 2003 at 03:22:54PM +0200, Raphaël Quinet wrote:
I'd rather have some kind of archive format. If we want to replace an element in the archive by another one that is larger, we can append the larger one at the end of the archive, update the index and leave some unused bits in the middle. That would not waste more space than the filesystem idea. In both cases, we could have an option for defragmenting the file if we do not want to waste space with unused bits or unused sectors. Or we simply re-create a "clean" file when using the "Save As" option. This is exactly what is done by several software packages, including MS Office.
I want a new Preferences option in that case: [ ] Enable quick-save
BTW: Would it be possible to get a sparse file by zeroing the unused bits? Then it would be quite space efficient (at least with some file systems).
Bye, Tino.
Portable XCF
On Fri, Aug 15, 2003 at 02:22:03PM +0100, Mukund wrote:
| > Subversion (http://subversion.tigris.org/) implements a versioned FS
| > using Sleepycat's Berkeley DB database. It has a full library
| > implementation which any application could use.
| Well, using a database as container might be a good idea. I'm not quite
| familiar with Berkeley DB but it might be useful as a backend.
Subversion provides its own client library for accessing the virtual file system. You won't have to work with the DB directly. It also provides an abstracted recover facility in one of its utilities (in case of stale locks).
But we might want to access the DB directly, e.g. for shared memory.
The whole Subversion thing was a far-fetched *idea*, an alternative which is most definitely going to be blown off, as there are more reasonable ways of implementing the GIMP file format which are not far-fetched.
Hmmm.. it would be cool to have the Undo Stack saved, so I can _really_ continue where I left off when I saved the image.
Bye, Tino.
Portable XCF
Hi,
On Fri, 2003-08-15 at 15:22, Mukund wrote:
Even if your images are black and white, they are most likely stored in a compressed format (if a Subversion based GIMP file format was ever invented), and if such compressed files are revisioned, no generic algorithm is going to give you a good difference.
Actually with GEGL, a solid white or black image will be represented using a special layer node that has no image data at all.
Sven
Portable XCF
quinet@gamers.org (2003-08-15 at 1357.35 +0200):
There is unfortunately one thing that most of these filesystems have in common: they are designed to store their data in a partition that has a fixed size. If you create such a filesystem in a regular file, you have to pre-allocate the space that you will need for storing your data.
Or use a tool to change the size; they exist, and in some cases they allow resizing while online. Examples are ext2resize and growfs.
GSR
Portable XCF
On Fri, Aug 15, 2003 at 03:51:53PM +0200, Sven Neumann wrote:
Even if your images are black and white, they are most likely stored in a compressed format (if a Subversion based GIMP file format was ever invented), and if such compressed files are revisioned, no generic algorithm is going to give you a good difference.
Actually with GEGL, a solid white or black image will be represented using a special layer node that has no image data at all.
But only as far as I say "create new layer/image with white background"... Or, wait, are you suggesting that "filling" is an operation known to GEGL, so a SolidFilledLayer will just change its fill_color when it gets filled again? After all, this optimization does not work any more if I fill an arbitrary selection...
Bye, Tino.
Portable XCF
tino.schwarze@informatik.tu-chemnitz.de (2003-08-15 at 1541.28 +0200):
BTW: Would it be possible to get a sparse file by zeroing the unused bits? Then it would be quite space efficient (at least with some file systems).
Yes, try it with dd and cp (GNU version only?):
dd if=/dev/zero of=/tmp/zero-test count=1000
cp --sparse=always /tmp/zero-test /tmp/zero-sparse
ls -l /tmp/zero-test /tmp/zero-sparse
du -cs /tmp/zero-test /tmp/zero-sparse
If you get the same byte size, 512000 bytes, but different block usage, 0 vs 503 here, your fs is doing sparse files. Another test I did here with an 8258506-byte file, composed by catting a real data file of 7745389 bytes, then 512000 zero bytes and a final 1117-byte group of random data, gives a usage of 8098 blocks for the original and 7601 for the sparse copy.
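The same check can be done from C, assuming POSIX stat(2) (st_blocks counts 512-byte units on most systems, though that is not guaranteed everywhere):

    #include <sys/stat.h>

    /* Returns 1 if fewer blocks are allocated than the byte size would
     * need, i.e. the file has holes; 0 if not; -1 on error. */
    int is_sparse(const char *path)
    {
        struct stat st;
        if (stat(path, &st) != 0)
            return -1;
        return (long long)st.st_blocks * 512 < (long long)st.st_size;
    }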
What I do not know is how many filesystems support it, whether they can do it on the fly or a forced copy is needed, or whether it is a good idea from a performance point of view.
GSR
Portable XCF
Yes, try it with dd and cp (GNU version only?):
dd if=/dev/zero of=/tmp/zero-test count=1000
cp --sparse=always /tmp/zero-test /tmp/zero-sparse
ls -l /tmp/zero-test /tmp/zero-sparse
du -cs /tmp/zero-test /tmp/zero-sparse
[...]
What I do not know is how many filesystems support it, whether they can do it on the fly or a forced copy is needed
It is the copy which makes the sparse file. You can't make a hole in a file merely by writing a bunch of zeros to it. You can only do it by seeking past the (current) end of the file, then writing non-zero data. The bytes you seeked over are the hole, and will be read back as zeros.
GNU cp uses a bunch of heuristics to discover runs of zeros in the input file and seek over them in the output file, rather than just writing zeros.
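A small demonstration of that, a sketch using POSIX calls and a made-up path; the seek creates the hole, and the write after it commits it:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/hole-test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return 1;
        write(fd, "start", 5);
        lseek(fd, 512000, SEEK_CUR);  /* 512000 bytes that are never stored */
        write(fd, "end", 3);          /* the hole now exists on disk */
        close(fd);
        return 0;  /* compare ls -l and du on /tmp/hole-test */
    }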
Austin
Portable XCF
Hi,
On Friday 15 August 2003 2:30 pm, Austin Donnelly wrote:
When this discussion started, I didn't like the idea of XML with binary data portions. I liked the current binary, tagged, format we have, and thought that it should just be extended. However, after the recent discussion I've come around to quite liking an ar-style archive with a XML catalog, XML metadata, and texels as separate members. I think this is roughly what Leonard was suggesting; we should listen to the voice of experience.
If I may add my two penn'th:
Some thought needs to be given to how parasites are going to be stored - I'm thinking particularly of embedded ICC profiles here (IIRC the TIFF plugin attaches any encountered profile as a parasite).
Profiles can be large, so the last thing you'd want to do with one is attempt to text-encode it within an XML file.
I'd personally lean towards having a "Parasites" directory within the archive, and then filing the parasites within it by name, in text or binary format as is appropriate...
All the best,
Portable XCF
BTW, what happened to GNOME's libefs? From quickly browsing the sources, it seems to have still been included in bonobo as of bonobo-1.0.22, but then bonobo was renamed to libbonobo, and I don't see any trace of it in libbonobo-2.3.6. Was it such a badly designed disaster that it was dropped? Or did it mutate into part of gnome-vfs or something?
--tml
Portable XCF
Austin Donnelly wrote:
Yes, try it with dd and cp (GNU version only?):
dd if=/dev/zero of=/tmp/zero-test count=1000
cp --sparse=always /tmp/zero-test /tmp/zero-sparse
ls -l /tmp/zero-test /tmp/zero-sparse
du -cs /tmp/zero-test /tmp/zero-sparse
[...]
What I do not know is how many filesystems support it, whether they can do it on the fly or a forced copy is needed
It is the copy which makes the sparse file. You can't make a hole in a file merely by writing a bunch of zeros to it. You can only do it by seeking past the (current) end of the file, then writing non-zero data. The bytes you seeked over are the hole, and will be read back as zeros.
GNU cp uses a bunch of heuristics to discover runs of zeros in the input file and seek over them in the output file, rather than just writing zeros.
Austin
I looked up heuristic and it said it meant heuristisch! How can this be so?
I thought when I cp'd something I was totally making a copy of the file and simply giving it a new name. The size never changes, so how could this be true?
carol
Portable XCF
On Friday, August 15, 2003, at 11:53 PM, Carol Spears wrote:
[ ... ]
I looked up heuristic it said it meant heuristisch! How can this be so?
Did you use an English/German dictionary? Next time use Merriam-Webster Online at www.m-w.com
I thought when i cp'd something i was totally making a copy of the file and simply giving it a new name. The size never changes, so how could this be true?
The contents of the file will not change, just its representation on the disk. The "ls" will still report the original file size, but the file will actually use less space on the disk. Once the file is read back into memory, the "hole" is filled with the correct amount of zeros again.
Karl Heinz
Portable XCF
On Fri, Aug 15, 2003 at 03:41:28PM +0200, Tino Schwarze wrote:
BTW: Would it be possible to get a sparse file by zeroing the unused bits? Then it would be quite space efficient (at least with some file systems).
No, there is no way to do that. You will need to copy the file if you want to "sparsify" parts, or use OS-specific interfaces to do that (if they exist; they don't exist under Linux).
The closest you could get is to garbage collect the file and truncate it at the end.
Portable XCF
Hi,
On Sat, 2003-08-16 at 01:20, Alastair Robinson wrote:
Some thought needs to be given to how parasites are going to be stored - I'm thinking particularly of embedded ICC profiles here (IIRC the TIFF plugin attaches any encountered profile as a parasite).
ICC profiles shouldn't be handled as parasites. Parasites are things the core doesn't understand. They are a way to attach arbitrary data to the GIMP, images, or drawables. As soon as the core starts to use color profiles, it will know how to handle them and we won't need to use parasites for them.
I already suggested storing parasites in the archive, not embedded in the XML. I've also mentioned that I don't think we should have folders in the archive, since the structural information should be in one place, not both in the XML and in some sort of directory tree.
Sven