How to parse an .xcf file?
This discussion is connected to the gimp-developer-list.gnome.org mailing list which is provided by the GIMP developers and not related to gimpusers.com.
GimpCon RFC: Portable XCF | Nathan Carl Summers | 09 Aug 03:01 |
GimpCon RFC: Portable XCF | Guillermo S. Romero / Familia Romero | 10 Aug 00:13 |
GimpCon RFC: Portable XCF | Adam D. Moss | 10 Aug 04:02 |
GimpCon RFC: Portable XCF | Nick Lamb | 11 Aug 16:25 |
GimpCon RFC: Portable XCF | Adam D. Moss | 11 Aug 16:45 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 11 Aug 17:04 |
GimpCon RFC: Portable XCF | Adam D. Moss | 11 Aug 17:10 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 11 Aug 17:48 |
GimpCon RFC: Portable XCF | Austin Donnelly | 12 Aug 15:31 |
GimpCon RFC: Portable XCF | Stephen J Baker | 12 Aug 16:03 |
GimpCon RFC: Portable XCF | Joao S. O. Bueno | 12 Aug 21:23 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 12 Aug 17:25 |
GimpCon RFC: Portable XCF | Sven Neumann | 12 Aug 19:32 |
GimpCon RFC: Portable XCF | Guillermo S. Romero / Familia Romero | 11 Aug 22:53 |
GimpCon RFC: Portable XCF | Adam D. Moss | 12 Aug 00:01 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 12 Aug 19:43 |
GimpCon RFC: Portable XCF | Øyvind Kolås | 13 Aug 16:42 |
GimpCon RFC: Portable XCF | Guillermo S. Romero / Familia Romero | 10 Aug 19:18 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 12 Aug 17:45 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 12 Aug 17:47 |
GimpCon RFC: Portable XCF | Adam D. Moss | 13 Aug 19:27 |
GimpCon RFC: Portable XCF | Nathan Carl Summers | 14 Aug 15:31 |
GimpCon RFC: Portable XCF | David Neary | 15 Aug 18:47 |
How to parse an .xcf file? | Steve Litt | 13 Apr 18:27 |
How to parse an .xcf file? | Sven Neumann | 13 Apr 18:50 |
How to parse an .xcf file? | Dave Neary | 19 Apr 09:44 |
How to parse an .xcf file? | Sven Neumann | 19 Apr 12:10 |
How to parse an .xcf file? | David Neary | 19 Apr 20:37 |
How to parse an .xcf file? | Joao S. O. Bueno | 19 Apr 20:46 |
How to parse an .xcf file? | Sven Neumann | 19 Apr 21:01 |
How to parse an .xcf file? | Steve Litt | 19 Apr 21:45 |
How to parse an .xcf file? | Steve Litt | 19 Apr 22:06 |
How to parse an .xcf file? | Sven Neumann | 19 Apr 22:31 |
How to parse an .xcf file? | Steve Litt | 19 Apr 22:49 |
How to parse an .xcf file? | Sven Neumann | 20 Apr 00:08 |
GimpCon RFC: Portable XCF
Several XCF formats have already been proposed; why should I propose another? It seems to me that the existing proposals have all missed the main point. While they have nice properties for certain extreme cases, they miss the boat when it comes to the main point of a graphics format, which is to efficiently store (and load) graphical information. This has led to proposals that are neither elegant nor simple; instead they are cumbersome, with redundant and superficial information stored, along with potential for conflict between different sections of the file.
But rather than detail these problems, let me suggest my own solution.
Let us start with an existing graphics format, for inspiration if nothing else. The format I chose is PNG, because it is arguably the best existing lossless portable graphics format available. Before we continue, though, allow me to enumerate what characteristics the GIMP native format should possess, in no particular order:
1 lossless
2 portable between architectures and programs
3 extensible
4 capable of representing trees and graphs
5 recoverable from corruption
6 fast random access of data
7 able to support many color depths and spaces
8 able to represent any state that gimp maintains
9 fast loads and saves
10 compact
11 good compression of graphical data
PNG certainly supports 1,2,6,7,9,10, and 11. Let us examine the other issues in more detail.
Extensibility: PNG supports some degree of extensibility, but the namespace available is quite small, being only four letters. We could use the same chunk type name for all of our additions, say 'GIMP', and then have the first field in the chunk say which kind of chunk it really is, but this is an inelegant hack.
Capability of representing trees and graphs: Obviously, PNG's minimal extension facilities could be used to implement chunks that envelop an entire tree of chunks, but this would be difficult to reconcile with PNG's current order-based approach to metadata association, and would be awkward for GIMP-aware and non-GIMP-aware PNG readers alike.
Corruption Recovery: PNG has good corruption detection, but little to facilitate corruption recovery.
Representation of GIMP state: see extensibility.
While PNG's faults aren't serious, we can do better.
A pure XML format, by way of comparison, would fulfill requirements 1,2,3,4,7, and 8. Requirement 5 in practice would be difficult to fulfill in a pure XML format without hand-hacking, which is beyond the skills of most users. A zlib-style compression step could make some progress towards 10.
An archive with XML metadata and PNG graphical data, on the other hand, would satisfy requirements 1,2,3,4,7,8, and 11. Requirement 6 is fulfilled for simple images, but for more complex images XML does not scale well, since every byte from the beginning of the XML file to the place where the data you are interested in is located has to be parsed.
It seems like all we have to do is combine the strengths of PNG and the strengths of XML to create a format that satisfies our requirements. What we really need is not an extensible text markup language, but an extensible graphics markup format.
Such a format would bear strong resemblance to both PNG and XML.
Portable XCF would use a chunk system similar to PNG, with two major differences. First, chunk type would be a string instead of a 32-bit value. Second, chunks can contain an arbitrary number of subchunks, which of course can contain subchunks themselves.
At the end of each chunk is a checksum, as well as a close-chunk marker. The purpose of the close-chunk marker is to help recover in case of corruption; if no corruption is detected, the close-chunk marker is ignored.
One of the major advantages of this hybrid technique is that if an implementation does not understand or is not interested in a particular chunk, it can seek to the next chunk without having to read or parse any of the data in between.
Image data chunks should use PNG-style adaptive predictive compression. They should also use Adam7.
An example is worth a thousand words. Here is a simple RGB image with two layers (one with a parasite) and a comment. This is just a rough sketch of what it would look like:
(Labels in all capital letters are for illustrative purposes and do not take up any space in the file.)
FILE HEADER:
  "portable xcf file"
  version major - 1 byte
  version minor - 1 byte
CHUNK:
  chunk start, optional - 2 byte bitmask with some png-like flags
  "xcf-comment"
  total size of chunk and subchunks - 4 bytes
  size of chunk - 4 bytes
  "This is the comment"
  chunk end (flags) - 2 bytes
  "xcf-comment"
  1 (subchunk depth) - 1 byte
  crc32 - 4 bytes
CHUNK:
  chunk start, mandatory - 2 bytes
  "xcf-layerstack"
  total size - 4 bytes
  size - 4 bytes
  SUBCHUNK:
    chunk start, mandatory - 2 bytes
    "xcf-colorspace"
    total size - 4 bytes
    size - 4 bytes
    "xcf-sRGB"
    chunk end (flags) - 2 bytes
    "xcf-colorspace"
    2 (depth) - 1 byte
    crc32 - 4 bytes
  SUBCHUNK:
    chunk start, mandatory - 2 bytes
    "xcf-layer"
    total size - 4 bytes
    size - 4 bytes
    PARASITE SUBSUBCHUNK:
      chunk start, optional - 2 bytes
      "gimp-parasite"
      total size - 4 bytes
      size - 4 bytes
      chunk end - 2 bytes
      3 (depth) - 1 byte
      crc32 - 4 bytes
    chunk end (flags) - 2 bytes
    "xcf-layer"
    2 (depth) - 1 byte
    crc32 - 4 bytes
  SUBCHUNK:
    chunk start, mandatory - 2 bytes
    "xcf-layer"
    total size - 4 bytes
    size - 4 bytes
    chunk end (flags) - 2 bytes
    "xcf-layer"
    2 (depth) - 1 byte
    crc32 - 4 bytes
  chunk end (flags) - 2 bytes
  "xcf-layerstack"
  1 (subchunk depth) - 1 byte
  crc32 - 4 bytes
chunk begin, optional - 2 bytes
"xcf-end"
total size - 4 bytes
size - 4 bytes
crc of entire file - 4 bytes
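To make the layout concrete, here is a minimal Python sketch of writing chunks in roughly this shape and of a reader that seeks past chunks it does not care about. Several details are assumptions that the rough sketch leaves open: names are ASCII with a 1-byte length prefix, integers are big-endian, "total size" counts every byte after the two size fields, and the CRC covers only the chunk's own payload. The helper names (`pack_name`, `write_chunk`, `read_top_level`) are hypothetical.

```python
import io
import struct
import zlib


def pack_name(name: str) -> bytes:
    # Assumption: chunk names are ASCII with a 1-byte length prefix.
    raw = name.encode("ascii")
    return struct.pack(">B", len(raw)) + raw


def write_chunk(out, name, payload=b"", children=b"", flags=0, depth=1):
    # Trailer mirrors the sketch: end flags, name again, depth, CRC-32.
    # Assumption: the CRC covers only this chunk's own payload.
    trailer = (struct.pack(">H", flags) + pack_name(name)
               + struct.pack(">BI", depth, zlib.crc32(payload)))
    body = payload + children + trailer
    out.write(struct.pack(">H", flags))
    out.write(pack_name(name))
    # Assumption: "total size" counts every byte after the two size fields.
    out.write(struct.pack(">II", len(body), len(payload)))
    out.write(body)


def read_top_level(stream, wanted):
    """Return payloads of wanted top-level chunks, seeking past the rest."""
    found = {}
    while True:
        if len(stream.read(2)) < 2:          # start flags, or end of data
            break
        (name_len,) = struct.unpack(">B", stream.read(1))
        name = stream.read(name_len).decode("ascii")
        total, size = struct.unpack(">II", stream.read(8))
        if name not in wanted:
            stream.seek(total, io.SEEK_CUR)  # skip without parsing
            continue
        payload = stream.read(size)
        rest = stream.read(total - size)     # subchunks plus trailer
        (crc,) = struct.unpack(">I", rest[-4:])
        if zlib.crc32(payload) != crc:
            raise ValueError(f"checksum mismatch in {name}")
        found[name] = payload
    return found


layer = io.BytesIO()
write_chunk(layer, "xcf-layer", depth=2)
buf = io.BytesIO()
write_chunk(buf, "xcf-comment", b"This is the comment")
write_chunk(buf, "xcf-layerstack", children=layer.getvalue())
buf.seek(0)
comments = read_top_level(buf, {"xcf-comment"})
```

The reader never looks inside "xcf-layerstack"; it jumps over the whole subtree using the total size, which is the property the proposal is after.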
Any comments?
Rockwalrus
GimpCon RFC: Portable XCF
rock@gimp.org (2003-08-08 at 1801.54 -0700):
7 able to support many color depths and spaces
PNG certainly supports 1,2,6,7,9,10, and 11. Let us examine the other
IIRC (did I maybe read the spec wrongly?) the upper limit is RGBA with 16 bits per channel, with no arbitrary color spaces or data formats. You would have to use gray PNGs to assemble other color spaces... and either never want 32 bit ints or floats, or use a trick similar to the one for colour spaces and split the data. I think PNG does not fit 7 without tricks.
GSR
GimpCon RFC: Portable XCF
Guillermo S. Romero / Familia Romero wrote:
rock@gimp.org (2003-08-08 at 1801.54 -0700):
7 able to support many color depths and spaces
PNG certainly supports 1,2,6,7,9,10, and 11. Let us examine the other
IIRC (did I maybe read the spec wrongly?) the upper limit is RGBA with 16 bits per channel, with no arbitrary color spaces or data formats. You would have to use gray PNGs to assemble other color spaces... and either never want 32 bit ints or floats, or use a trick similar to the one for colour spaces and split the data. I think PNG does not fit 7 without tricks.
Another data point is that floats are just a bastard to serialise in a portable, precise manner. Personally I'd represent a 32-bit float with a 32-bit integer and 32-bit fixed-point fractional part. Redundant but complete-ish. (Practical better ideas welcomed.)
--Adam
GimpCon RFC: Portable XCF
leonardr@lazerware.com (2003-08-09 at 1830.56 -0700):
Is there a good reason not to use either PSD or TIFF as the native format? The only possible argument for either is that Adobe controls them both. However, I would state that TIFF has pretty much taken on a life of its own outside Adobe (via libtiff).
You are right, PSD is not an option; it would mean always being behind Adobe and never being able to include new things. And even less so now that the specs are harder to get, and some data I doubt will ever be public (is GIMP hardlight 100% the same as Photoshop's, for example?). About TIFF, every now and then someone appears with a horror story about TIFF files, so while better than PSD, I dunno if enough. :/
GSR
GimpCon RFC: Portable XCF
On Sun, Aug 10, 2003 at 03:02:41AM +0100, Adam D. Moss wrote:
Another data point is that floats are just a bastard to serialise in a portable, precise manner. Personally I'd represent a 32-bit float with a 32-bit integer and 32-bit fixed-point fractional part. Redundant but complete-ish. (Practical better ideas welcomed.)
IEEE floats are portable except for the endian issue. 32-bit floating point PCM audio is just IEEE floats prescribed as little (iirc) endian.
Where did you get the idea that this was problematic?
Nick.
GimpCon RFC: Portable XCF
Nick Lamb wrote:
> On Sun, Aug 10, 2003 at 03:02:41AM +0100, Adam D. Moss wrote:
>
>>Another data point is that floats are just a bastard to serialise
>>in a portable, precise manner. Personally I'd represent a 32-bit
>>float with a 32-bit integer and 32-bit fixed-point fractional part.
>>Redundant but complete-ish. (Practical better ideas welcomed.)
>
> IEEE floats are portable except for the endian issue. 32-bit floating point
> PCM audio is just IEEE floats prescribed as little (iirc) endian.
>
> Where did you get the idea that this was problematic?
IIRC, the Loki guys. Some ramblings a few years ago on the problems of interoperability of game data between windows/mac/linuxx86/linuxalpha/etc over network and on disk. They made a special point of saying something like 'never, ever serialize floats' and it sounded like the voice of experience.
--Adam
GimpCon RFC: Portable XCF
On Mon, 11 Aug 2003, Adam D. Moss wrote:
IIRC, the Loki guys. Some ramblings a few years ago on the problems of interoperability of game data between windows/mac/linuxx86/linuxalpha/etc over network and on disk. They made a special point of saying something like 'never, ever serialize floats' and it sounded like the voice of experience.
Java doesn't seem to have a problem with it. Even poor fools like me who are working on VM's for machines with non-IEEE floats don't have too much of a problem.
Rockwalrus
GimpCon RFC: Portable XCF
Nathan Carl Summers wrote:
On Mon, 11 Aug 2003, Adam D. Moss wrote:
IIRC, the Loki guys. Some ramblings a few years ago on the problems of interoperability of game data between windows/mac/linuxx86/linuxalpha/etc over network and on disk. They made a special point of saying something like 'never, ever serialize floats' and it sounded like the voice of experience.
Java doesn't seem to have a problem with it. Even poor fools like me who are working on VM's for machines with non-IEEE floats don't have too much of a problem.
That's good to know, it helps me out with some of my own stuff...
How is the serialization done then, just a raw 32-bit IEEE float dump with a predefined endianness? 64-bit doubles just as easy?
--Adam
GimpCon RFC: Portable XCF
On Mon, 11 Aug 2003, Adam D. Moss wrote:
Nathan Carl Summers wrote:
On Mon, 11 Aug 2003, Adam D. Moss wrote:
IIRC, the Loki guys. Some ramblings a few years ago on the problems of interoperability of game data between windows/mac/linuxx86/linuxalpha/etc over network and on disk. They made a special point of saying something like 'never, ever serialize floats' and it sounded like the voice of experience.
Java doesn't seem to have a problem with it. Even poor fools like me who are working on VM's for machines with non-IEEE floats don't have too much of a problem.
That's good to know, it helps me out with some of my own stuff...
How is the serialization done then, just a raw 32-bit IEEE float dump with a predefined endianness? 64-bit doubles just as easy?
Yup.
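A minimal Python sketch of what "a raw IEEE float dump with a predefined endianness" can look like. The choice of little-endian and the helper names are assumptions made for illustration; nothing in the thread fixes the byte order.

```python
import struct


def dump_floats(values, fmt="f"):
    # "<" forces little-endian IEEE 754 with no padding; "f" is 32-bit,
    # "d" is 64-bit. The choice of little-endian here is an assumption.
    return struct.pack(f"<{len(values)}{fmt}", *values)


def load_floats(data, fmt="f"):
    count = len(data) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", data))


samples = [0.5, -2.0, 1.0]
blob32 = dump_floats(samples)        # 12 bytes regardless of host byte order
blob64 = dump_floats(samples, "d")   # 24 bytes regardless of host byte order
```

Because the byte order is named explicitly in the format string, the same bytes come out on big-endian and little-endian hosts, so 64-bit doubles are handled exactly like 32-bit floats.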
Rockwalrus
GimpCon RFC: Portable XCF
rock@gimp.org (2003-08-08 at 1801.54 -0700):
Portable XCF would use a chunk system similar to PNG, with two major differences. First, chunk type would be a string instead of a 32-bit value. Second, chunks can contain an arbitrary number of subchunks, which of course can contain subchunks themselves.
PNG 32 bit names are char... or at least all of them can be read. :] And I think the purpose of this was, among other ideas: easy to parse (always four chars) and makes sense with some rules about chars (caps vs normal). Even the magic of PNG had a reasoning (part binary to avoid confusion with text and capable of detecting non 8 bit transmission or bad byte order). IOW, why not make it similar, but just bigger (four chars for name space and 12 more for function)? Arbitrary size strings do not seem a good idea to me.
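As a rough illustration of the fixed-width idea, a 16-byte chunk type could be packed like this in Python. The NUL padding, the ASCII restriction and the function names are assumptions, not something the thread settles.

```python
def pack_type(namespace: str, function: str) -> bytes:
    """Pack a 4-char namespace plus a function name of up to 12 chars."""
    ns = namespace.encode("ascii")
    fn = function.encode("ascii")
    if len(ns) != 4 or not 1 <= len(fn) <= 12:
        raise ValueError("need a 4-char namespace and a 1..12 char function")
    # Assumption: short function names are padded with NUL bytes.
    return ns + fn.ljust(12, b"\0")


def unpack_type(raw: bytes):
    if len(raw) != 16:
        raise ValueError("chunk type must be exactly 16 bytes")
    return raw[:4].decode("ascii"), raw[4:].rstrip(b"\0").decode("ascii")


layer_type = pack_type("GIMP", "layer")
```

A parser can then always read exactly 16 bytes for the type, just as a PNG reader always reads four.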
Another thing: is alignment (and thus padding) worth the problems it could cause? If the format has to be fast, maybe this should be taken into account, and not only for small sizes in memory (ie 32 bit), but maybe disks (ie blocks) or bigger sizes in memory (ie pages) too. Would the format be used just as storage, or would it be used as source / destination when memory is scarce? Remember that some apps are capable of working in areas instead of the full image, to improve global throughput.
At the end of each chunk is a checksum, as well as a close-chunk marker. The purpose of the close-chunk marker is to help recover in case of corruption; if no corruption is detected, the close-chunk marker is ignored.
One of the major advantages of this hybrid technique is that if an implementation does not understand or is not interested in a particular chunk, it can seek to the next chunk without having to read or parse any of the data in between.
Image data chunks should use PNG-style adaptive predictive compression. They should also use Adam7.
I would avoid compression inside the format. Files can be compressed as a whole, and IIRC Adam7 is about interlacing, not compression, dunno why an editor should do progressive load. Load smaller res in case of problem? I would try to avoid that instead of try to fix it, with proper storage and transmission. Load with proxy images? Too rough, IMO, it is not a scaled down version. PNG compression is the one provided by zlib, and I can show you cases in which other compressors have done a better job with my XCF files (anybody can try bzip2), and if computers keep evolving the same way, the extra CPU load is better than the disk or network transfer.
Letting other apps do it means those apps could be general, reducing work load. Or better, custom, but once the "look" of the data is well known and there are plenty of test cases (like FLAC but for XCF2, compression targeted at some kind of patterns). Realize too that this links to alignment things: if you know that a layer is always somewhere and requires X MB, you can overwrite and reread without problems.
An example is worth a thousand words. Here is a simple RGB image with two layers (one with a parasite) and a comment. This is just a rough sketch of what it would look like:
(Labels in all capital letters are for illustrative purposes and do not take up any space in the file.)
FILE HEADER:
"portable xcf file"
Note what I said about PNG file header above.
version major - 1 byte
version minor - 1 byte
CHUNK:
chunk start, optional - 2 byte bitmask with some png-like flags
"xcf-comment"
total size of chunk and subchunks - 4 bytes
size of chunk - 4 bytes
For all these sizes... why not 64 bits and avoid future problems? If someone likes it and uses it for really big things, segmentation is a negative point. Or maybe force a small max size for each chunk (forcing segmentation) which would give more CRCs. Options, options, options...
"This is the comment"
chunk end (flags) - 2 bytes
"xcf-comment"
1 (subchunk depth) - 1 byte
crc32 - 4bytes
[...]
I would add a unique chunk ID to each, so that references can be made.
So of your list of items, 1 (lossless), 2 (portable), 3 (extensible), 4 (graphs), 7 (depth and spaces), 8 (gimp states) are a must. 5 (recoverable) will be nice, a lot, but if you want it to work, it sounds like some escaping and reserved flags will be needed (like line code in transmissions). I would forget 11 (compression), and put 10 (compact) as a secondary to 9 (fast load/save) and 6 (fast access). I would add tile based as 12.
To some extent, it reminds me of the Blender format (with the add on that Blender files are 64 or 32 bit, little or big endian, and all the platforms can load them fine... Adam will love it :] ).
GSR
GimpCon RFC: Portable XCF
Guillermo S. Romero / Familia Romero wrote:
To some extent, it reminds me of the Blender format (with the add on that Blender files are 64 or 32 bit, little or big endian, and all the platforms can load them fine... Adam will love it :] ).
I wrote a Blender file reading C library as part of my 'day job'... I wouldn't use the word 'love' exactly.
--Adam
GimpCon RFC: Portable XCF
How is the serialization done then, just a raw 32-bit IEEE float dump with a predefined endianness? 64-bit doubles just as easy?
Yup.
The real problem comes when your code is running on a system without IEEE float support, and you need to manually convert from IEEE float to your local weird-ass machine float. Not common, I grant you, but a real pain when it bites.
Austin
GimpCon RFC: Portable XCF
Austin Donnelly wrote:
How is the serialization done then, just a raw 32-bit IEEE float dump with a predefined endianness? 64-bit doubles just as easy?
The real problem comes when your code is running on a system without IEEE float support, and you need to manually convert from IEEE float to your local weird-ass machine float. Not common, I grant you, but a real pain when it bites.
So it's somehow preferable to come up with our own weird-ass float format and make life equally hard for everyone?
By far the vast proportion of modern machines have IEEE float - so let's make life easy for the majority. The minority need a conversion routine no matter what we do. The last machine I used that didn't have IEEE float (some weird Hitachi microcontroller) had convenient library functions to interconvert between its native format and IEEE.
The only other alternative is to use a storage mechanism for which there is universal conversion support - but the only format that fits that bill is ASCII - surely we aren't contemplating that for bulk image data?
---- Steve Baker (817)619-2657 (Vox/Vox-Mail) L3Com/Link Simulation & Training (817)619-2466 (Fax) Work: sjbaker@link.com http://www.link.com Home: sjbaker1@airmail.net http://www.sjbaker.org
GimpCon RFC: Portable XCF
On Tue, 12 Aug 2003, Austin Donnelly wrote:
How is the serialization done then, just a raw 32-bit IEEE float dump with a predefined endianness? 64-bit doubles just as easy?
Yup.
The real problem comes when your code is running on a system without IEEE float support, and you need to manually convert from IEEE float to your local weird-ass machine float. Not common, I grant you, but a real pain when it bites.
Well, since my day job is working with a non-IEEE machine, I can tell you about that pain first hand. It probably took about three days to write conversion functions between the native format and IEEE float and double.
Rockwalrus
GimpCon RFC: Portable XCF
On Sat, 9 Aug 2003, Leonard Rosenthol wrote:
I see fast loads as an absolute requirement.
Then we need to also look at the GIMP itself and what can be done there.
Of course.
Hopefully, GIMP's file handling will improve to the point where it will load things on an as-needed basis. Therefore, fast random access is necessary.
Having load on demand via random access is what will really get you the fast loads!! If you don't solve that, then any work on the file format will be a waste towards your goal.
Exactly, it's a big priority.
Being compact is nice as well, because not everyone has 3 terabyte hard drives and a T3 line into their house.
Agreed, but what does this really mean in "real world" terms? Are we willing to sacrifice functionality to get a couple of bytes here and there? The image data is the big hit in this format - the structure will end up as a small fraction of any XCF file.
We certainly shouldn't sacrifice high-priority stuff for size, but size should still be a consideration.
* incremental update
just update a single layer w/o rewriting the whole file!
This seems like an excellent goal. It seems like you are suggesting a database-like format.
Not necessarily. You should be able to do it with any format with a good catalog system, but some will be easier than others.
How would you handle resizes? Either we could do immediate compaction or garbage collection. Both have their disadvantages.
I think sub-chunks is a bad idea. Although a common way to represent hierarchical relationship, they can also put overhead on random access and also slow down read/write under certain conditions.
How about a TIFF-like directory chunk at the beginning (except hierarchical)?
That would be one solution - sure.
Can you think of a better one?
I just think that doing a full "reinvent" of an image format seems like a waste of time. Let's look at Photoshop...
Photoshop is able to do 100% round-tripping of ALL of its functionality in three formats - PSD, TIFF and PDF. PDF is done via throwing their private info into an undocumented blob - so it doesn't really count. So let's look at the other two.
Both PSD and TIFF are rich image formats that already address most of your original list and are well defined and supported by many existing tools (including GIMP itself). Both are extensible (TIFF more so) and would allow for additional blocks to meet our needs.
Is there a good reason not to use either PSD or TIFF as the native format? The only possible argument for either is that Adobe controls them both. However, I would state that TIFF has pretty much taken on a life of its own outside Adobe (via libtiff).
I think the goal of the XCF redesign is to become the de-facto standard for interchange of layered images. Certainly one aspect of this is freedom from Adobe, but in addition to being an open standard, it needs to be a standard that people have confidence in. In other words, any XCF reader should be able to read any XCF writer's output. A layered TIFF by that name wouldn't cut it, because most tiff readers don't support layered images. Of course, we could always use TIFF internally but call it XCF. We might want to change the magic number as well.
I have no problem with basing Portable XCF on TIFF. It seems to be well designed without really being too overdesigned. On the other hand, I think there are a few improvements that we could make to make it better for the purposes of GIMP.
/me wonders if the CinePaint people have any thoughts...
Rockwalrus
GimpCon RFC: Portable XCF
On Sun, 10 Aug 2003, Leonard Rosenthol wrote:
At 7:18 PM +0200 8/10/03, Guillermo S. Romero / Familia Romero wrote:
About TIFF, every now and then someone appears with an horror story about TIFF files, so while better than PSD, I dunno if enough. :/
There are programs out there that generate bad TIFF - for one reason or another. But we already have to deal with that in our native TIFF coder...
This is what I mean by a standard that people can have confidence in -- people should trust that if their program writes good XCFs, a good program will be able to read them.
Rockwalrus
GimpCon RFC: Portable XCF
Hi,
On Tue, 2003-08-12 at 17:25, Nathan Carl Summers wrote:
Well, since my day job is working with a non-IEEE machine, I can tell you about that pain first hand. It probably took about three days to write conversion functions between the native format and IEEE float and double.
Does glib run on that machine at all? If not, there is not much point in caring and if yes, the conversion functions you wrote should be added to glib.
Sven
GimpCon RFC: Portable XCF
On Mon, 11 Aug 2003, Guillermo S. Romero / Familia Romero wrote:
rock@gimp.org (2003-08-08 at 1801.54 -0700):
Portable XCF would use a chunk system similar to PNG, with two major differences. First, chunk type would be a string instead of a 32-bit value. Second, chunks can contain an arbitrary number of subchunks, which of course can contain subchunks themselves.
PNG 32 bit names are char... or at least all of them can be read. :] And I think the purpose of this was, among other ideas: easy to parse (always four chars) and makes sense with some rules about chars (caps vs normal). Even the magic of PNG had a reasoning (part binary to avoid confusion with text and capable of detecting non 8 bit transmission or bad byte order). IOW, why not make it similar, but just bigger (four chars for name space and 12 more for function)? Arbitrary size strings do not seem a good idea to me.
This seems like a good proposal.
Another thing: is alignment (and thus padding) worth the problems it could cause? If the format has to be fast, maybe this should be taken into account, and not only for small sizes in memory (ie 32 bit), but maybe disks (ie blocks) or bigger sizes in memory (ie pages) too. Would the format be used just as storage, or would it be used as source / destination when memory is scarce? Remember that some apps are capable of working in areas instead of the full image, to improve global throughput.
Right. To be mmappable, the format should be aligned. I think with careful design, there won't be too much overhead from this.
When I wrote that the example was just a rough sketch, part of what I meant was that I didn't pay too much attention to bit sizes and alignment, because that would have been premature optimization.
One issue with alignment is which platform's alignment rules should be used. I think a good common-denominator format can be found. It won't get the weird ones, of course. I work on a Cray, and nothing follows Cray's alignment rules. :)
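As a small sketch of what aligned writing involves, the padding needed before a block is just the distance to the next boundary. The 8-byte default and the helper names are assumptions chosen for illustration; the thread does not pick an alignment.

```python
import io


def padding_for(offset: int, alignment: int) -> int:
    """Bytes of padding needed so that offset lands on an alignment boundary."""
    return (-offset) % alignment


def write_aligned(out, data: bytes, alignment: int = 8) -> int:
    """Pad with zero bytes, write data, and return the offset where it starts."""
    out.write(b"\0" * padding_for(out.tell(), alignment))
    start = out.tell()
    out.write(data)
    return start


buf = io.BytesIO()
first = write_aligned(buf, b"abc")      # starts at offset 0
second = write_aligned(buf, b"defgh")   # padded from offset 3 up to 8
```

With page-sized alignment (for example 4096) the same function shows the cost side of the trade-off: a block that ends one byte past a page boundary is followed by 4095 bytes of padding.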
Image data chunks should use PNG-style adaptive predictive compression. They should also use Adam7.
I would avoid compression inside the format. Files can be compressed as a whole
It does complicate in-place image manipulation, true. OTOH, you can get much better lossless compression using image-specific techniques such as predictive compression than you can using general purpose techniques.
and IIRC Adam7 is about interlacing, not compression, dunno why an editor should do progressive load. Load smaller res in case of problem? I would try to avoid that instead of try to fix it, with proper storage and transmission. Load with proxy images? Too rough, IMO, it is not a scaled down version.
Well, working on a scaled-down version of large files is an important optimization. It's true that not all image manipulation functions can credibly be approximated by working on a scaled-down version, but that's for the gegl people to worry about.
My guess is that it will be easier to use interlaced data than true scaled-down images, and the savings in terms of computational time and pipeline flexibility will be worth it.
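To show what "using interlaced data" as a preview amounts to, here is a minimal Python sketch of the Adam7 pass grid as defined for PNG. The first pass alone is every eighth pixel in both directions, which is a subsampled (not filtered) 1/8-scale image. The `preview` helper and the list-of-rows image representation are assumptions for illustration.

```python
# Adam7 passes as (x_start, y_start, x_step, y_step), in pass order.
ADAM7_PASSES = [
    (0, 0, 8, 8), (4, 0, 8, 8), (0, 4, 4, 8), (2, 0, 4, 4),
    (0, 2, 2, 4), (1, 0, 2, 2), (0, 1, 1, 2),
]


def pass_coords(width, height, index):
    """All (x, y) positions that belong to the given pass."""
    xs, ys, dx, dy = ADAM7_PASSES[index]
    return [(x, y) for y in range(ys, height, dy) for x in range(xs, width, dx)]


def preview(image, index=0):
    """Pixels of one pass, arranged as a small image (a list of rows)."""
    xs, ys, dx, dy = ADAM7_PASSES[index]
    return [row[xs::dx] for row in image[ys::dy]]


# A 16x16 test image whose pixels record their own coordinates.
image = [[(x, y) for x in range(16)] for y in range(16)]
thumb = preview(image)   # 2x2 result taken from pass 1 only
```

Since pass 1 picks single pixels rather than averaging neighbourhoods, the result is a point-sampled thumbnail, which is the "too rough" objection raised earlier in the thread.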
PNG compression is the one provided by zlib
PNG uses zlib compression on the image data, but the entropy is first significantly reduced by using predictive encoding. It's not the same as just running gzip on raw data.
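A small Python sketch of the effect, using only PNG's "Sub" predictor (each byte minus the byte to its left) on a synthetic horizontal gradient. Real PNG encoders choose among five filter types per scanline and store a filter-type byte per row; both are left out here, and the test image is a deliberately favourable assumption.

```python
import zlib


def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """PNG-style Sub filter: store the difference to the pixel on the left."""
    out = bytearray(len(row))
    for i, value in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0
        out[i] = (value - left) & 0xFF
    return bytes(out)


def sub_unfilter(filtered: bytes, bpp: int = 1) -> bytes:
    """Inverse of sub_filter; reconstructs the original row exactly."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out[i] = (value + left) & 0xFF
    return bytes(out)


# Synthetic 256x64 grayscale image: a horizontal ramp in every row.
rows = [bytes(range(256)) for _ in range(64)]
raw = b"".join(rows)
filtered = b"".join(sub_filter(row) for row in rows)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))
```

On this ramp the filtered rows are almost entirely the value 1, so `filtered_size` comes out smaller than `raw_size` even though the same zlib settings are used for both; on noisy images the gain can be much smaller.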
and I can show you cases in which other compressors have done a better job with my XCF files (anybody can try bzip2), and if computers keep evolving the same way, the extra CPU load is better than the disk or network transfer.
True.
Letting other apps do it means those apps could be general, reducing work load.
Of course, but we should not sacrifice functionality for convenience. :)
Or better, custom, but once the "look" of the data is well known and there is plenty of test cases (like FLAC but for XCF2, compression targeted at some kind of patterns).
Conformance testing is very important. That is a good idea.
Realize too that this links to alignment things: if you know that a layer is always somewhere and requires X MB, you can overwrite and reread without problems.
This will have to be worked out.
CHUNK:
chunk start, optional - 2 byte bitmask with some png-like flags
"xcf-comment"
total size of chunk and subchunks - 4 bytes
size of chunk - 4 bytes
For all these sizes... why not 64 bits and avoid future problems? If someone likes it and uses it for really big things, segmentation is a negative point. Or maybe force a small max size for each chunk (forcing segmentation) which would give more CRCs. Options, options, options...
Both have their plusses and minuses.
"This is the comment"
chunk end (flags) - 2 bytes
"xcf-comment"
1 (subchunk depth) - 1 byte
crc32 - 4 bytes
[...]
I would add a unique chunk ID to each, so that references can be made.
Good idea.
So of your list of items, 1 (lossless), 2 (portable), 3 (extensible), 4 (graphs), 7 (depth and spaces), 8 (gimp states) are a must. 5 (recoverable) will be nice, a lot, but if you want it to work, it sounds like some escaping and reserved flags will be needed (like line code in transmissions).
If the chunk recoverer finds what it thinks is a valid block, but the checksum doesn't match, then it will assume there is no valid block there. So escaping isn't really necessary.
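A minimal sketch of that recovery idea in Python, using a deliberately simplified block layout rather than the one in the RFC: a hypothetical 4-byte marker, a 4-byte big-endian length, the payload, then a CRC-32 of the payload. A candidate block is accepted only if its checksum matches; otherwise the scan moves on by one byte.

```python
import struct
import zlib

MARKER = b"\x89XCF"   # hypothetical block marker, not part of any spec


def frame(payload: bytes) -> bytes:
    return (MARKER + struct.pack(">I", len(payload)) + payload
            + struct.pack(">I", zlib.crc32(payload)))


def recover(data: bytes):
    """Scan damaged data and return the payloads of blocks whose CRC checks out."""
    found = []
    pos = 0
    while True:
        pos = data.find(MARKER, pos)
        if pos < 0:
            break
        header_end = pos + len(MARKER) + 4
        if header_end <= len(data):
            (length,) = struct.unpack(">I", data[pos + len(MARKER):header_end])
            end = header_end + length + 4
            if end <= len(data):
                payload = data[header_end:header_end + length]
                (crc,) = struct.unpack(">I", data[end - 4:end])
                if zlib.crc32(payload) == crc:
                    found.append(payload)
                    pos = end          # resume right after the valid block
                    continue
        pos += 1                       # false or damaged marker; keep scanning
    return found


damaged = bytearray(frame(b"bad block"))
damaged[10] ^= 0xFF                    # flip one payload byte
stream = b"\x00junk" + frame(b"layer one") + bytes(damaged) + frame(b"layer two")
recovered = recover(stream)
```

Marker bytes that happen to occur inside a payload are harmless under this scheme: either the enclosing block validates and is skipped as a whole, or the false candidate fails its own checksum.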
I would forget 11 (compression), and put 10 (compact) as a secondary to 9 (fast load/save) and 6 (fast access). I would add tile based as 12.
Compression of image data is important, although not essential. 10 is definitely less important than 6/9.
To some extent, it reminds me of the Blender format (with the add on that Blender files are 64 or 32 bit, little or big endian, and all the platforms can load them fine... Adam will love it :] )
Joyful. :)
Rockwalrus
GimpCon RFC: Portable XCF
Stephen J Baker wrote:
Austin Donnelly wrote:
How is the serialization done then, just a raw 32-bit IEEE float dump with a predefined endianness? 64-bit doubles just as easy?
The real problem comes when your code is running on a system without IEEE float support, and you need to manually convert from IEEE float to your local weird-ass machine float. Not common, I grant you, but a real pain when it bites.
So it's somehow preferable to come up with our own weird-ass float format and make life equally hard for everyone?
By far the vast proportion of modern machines have IEEE float - so let's make life easy for the majority. The minority need a conversion routine no matter what we do. The last machine I used that didn't have IEEE float (some weird Hitachi microcontroller) had convenient library functions to interconvert between its native format and IEEE.
I am all for IEEE FP as well.
Just as an illustration, the code I am working on for custom layer modes uses 32-bit fixed point, in 16.16 format. There are reasons that led me to choose it; I can comment if they are of interest to anyone. If the internal image format is 32-bit IEEE it will be easy for me to add the needed conversions, as the 8-bit unsigned integer and 16-bit unsigned integer conversions are in place already.
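For readers unfamiliar with 16.16 fixed point, here is a rough sketch of the conversions involved. It is not the layer-mode code mentioned above (which is not shown in this thread); the helper names are hypothetical, and mapping 8-bit 255 to exactly 1.0 is an assumption.

```python
ONE = 1 << 16  # 1.0 in 16.16 format: 16 integer bits, 16 fractional bits


def float_to_fixed(x: float) -> int:
    """Convert a float to the nearest 16.16 value."""
    return int(round(x * ONE))


def fixed_to_float(f: int) -> float:
    """Convert a 16.16 value back to a float."""
    return f / ONE


def u8_to_fixed(v: int) -> int:
    """Map an 8-bit sample (0..255) to 0.0..1.0 in 16.16, with rounding."""
    return (v * ONE + 127) // 255


def fixed_mul(a: int, b: int) -> int:
    """Multiply two 16.16 values; the shift restores the 16.16 scale."""
    return (a * b) >> 16
```

Python integers do not overflow, so a C version would need a 64-bit intermediate in `fixed_mul` to stay within range.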
The only other alternative is to use a storage mechanism for which there is universal conversion support - but the only format that fits that bill is ASCII - surely we aren't contemplating that for bulk image data?
----
Steve Baker (817)619-2657 (Vox/Vox-Mail)
GimpCon RFC: Portable XCF
* Nathan Carl Summers [030813 15:39]:
dunno why an editor should do progressive load. Load smaller res in case of problem? I would try to avoid that instead of try to fix it, with proper storage and transmission. Load with proxy images? Too rough, IMO, it is not a scaled down version.
Well, working a scaled-down version of large files is an important optimization. It's true that not all image manipulation functions can credibly be approximated with working on a scaled-down version, but that's for the gegl people to worry about.
My guess is that it will be easier to use interlaced data than true scaled-down images, and the savings in terms of computational time and pipeline flexibility will be worth it.
Ideally GEGL will collapse all affine transformations, thus doing resampling only once. That resample should ideally be from the original data, possibly stored in a tile cache for subsequent calculations of the compositing graph. If the ability to use a scaled-down version of the image directly from the file is needed, one should rather use a specialized image format storing an image pyramid (sizes 50%, 25%, 12.5%, 6.25%, etc.) that allows the GEGL faucet node providing the image to use the scale factor as a parameter when loading.
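A small sketch of how a loader might use such a pyramid, assuming each level halves both dimensions and that the loader picks the smallest stored level that is still at least as large as the requested scale. The function names and the `min_side` cutoff are hypothetical; this is not GEGL code.

```python
def pyramid_sizes(width: int, height: int, min_side: int = 64) -> list:
    """List (width, height) for the 100%, 50%, 25%, ... levels."""
    sizes = [(width, height)]
    while width // 2 >= min_side and height // 2 >= min_side:
        width, height = width // 2, height // 2
        sizes.append((width, height))
    return sizes


def pick_level(sizes: list, scale: float) -> int:
    """Index of the smallest level whose width still covers the requested scale."""
    wanted = sizes[0][0] * scale
    best = 0
    for index, (level_width, _level_height) in enumerate(sizes):
        if level_width >= wanted:
            best = index
    return best
```

Choosing a level that is at least as large as needed means the final resample only ever scales down, so no detail is invented.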
For general operation this will not be of great importance, and thus a format providing a linear buffer would be better. For 8 and 16 bit integer data, uncompressed PNG would provide random access if within a container file format. A compressed PNG file would be a little harder, but by making an intelligent PNG loader, one could get a row of tiles without much overhead (uncompressing the preceding tiles without actually storing the data).
/pippin
GimpCon RFC: Portable XCF
Stephen J Baker wrote:
So, I think what is needed to make a reliable file format is to provide a well written library for reading and writing the files that's freely available and properly maintained on every modern platform FROM DAY ONE.
I agree with this -- I think it's really important.
(That's if we either want or expect the new XCF to become a de facto standard in the first place. Personally I'm not sure either way, but in any case it makes sense to library-ise the XCF load/saver just from a technical abstraction standpoint.)
--Adam
GimpCon RFC: Portable XCF
On Thu, 14 Aug 2003, Øyvind Kolås wrote:
* Adam D. Moss [030814 09:59]:
Stephen J Baker wrote:
So, I think what is needed to make a reliable file format is to provide a well written library for reading and writing the files that's freely available and properly maintained on every modern platform FROM DAY ONE.
I agree with this -- I think it's really important.
[SNIP]
but in any case it makes sense to library-ise the XCF load/saver just from a technical abstraction standpoint.)
Which is why I suggested in an earlier mail developing a GEGL file format that GIMP could extend and use a subset of. By doing it this way, GEGL would be the aforementioned file loading and compositing library (e.g. if an application needs to load an XCF2 file but doesn't support layers, the library would be capable of compositing it). Putting the logic to do compositing of layers, layer groups, adjustment layers, u8, u16, float, double, CMYK, RGB, YCbCr and spot colors into a file loading library makes very little sense.
It actually makes a lot of sense to have GEGL support loading XCFs. It would probably be a good idea to have a separate library as well, for those apps that already have their own compositors and don't want to have another one as well.
Rockwalrus
GimpCon RFC: Portable XCF
Leonard Rosenthol wrote:
At 6:29 PM +0200 8/14/03, Øyvind Kolås wrote:
Then you just want to be able to understand the XML file, which is the reason I proposed using something like XML in the first place; the rest of the logic would then be contained in your application.
Well, yes, I need to understand the FILE FORMAT...whether that be XML, PNG, TIFF, XCF, etc.
But there seems to be a general belief that there should be a standard library for reading/writing the file format to help reduce the issues of multiple implementations. That library should ONLY be a file format handler, it should NOT be all of GEGL...
Surely this is a detail, and the important thing, that is using some kind of metadata manifest, with binary image data stored in some widely supported image format, is something we can agree on?
Whether gegl provides a separate libxcf or not is surely a detail that can be taken care of at the implementation stage...
That said, since the general idea is to store layer structure in the image data, and use compositing to generate the final image, libxcf would require access to quite a lot of gegl's internal workings most of the time... at least if the destination application wanted to use gegl for composing. Of course, if they wanted to work around gegl, and use a native layer model, then they wouldn't need to get at gegl's graphing stuff at all. But they'd be limiting themselves more or less to stacks, or very simple trees.
Cheers,
Dave.
How to parse an .xcf file?
Hi all,
I've created a .xcf file in which you cannot turn off the selection. If you want to see it, it's:
http://www.stevelitt.com/images/page2.xcf
What I'd like to do is go into the actual data with an editor, and manually remove the selection. I'm assuming here that a .xcf file is some sort of compressed markup language, but in fact I can't convert it to text with zless, unzip or gunzip.
What is the format for a .xcf file, and how do I parse it?
I'm using Gimp version 1.2.5 on Mandrake 9.2.
Thanks
SteveT
Steve Litt
Author:
* Universal Troubleshooting Process courseware
* Troubleshooting Techniques of the Successful Technologist
* Rapid Learning: Secret Weapon of the Successful Technologist
Webmaster
* Troubleshooters.Com
* http://www.troubleshooters.com
How to parse an .xcf file?
Hi,
Steve Litt writes:
I've created a .xcf file in which you cannot turn off the selection. If you want to see it, it's:
This file has a floating selection. Either anchor it or turn it into a new layer. There's nothing broken about the file.
What I'd like to do is go into the actual data with an editor, and manually remove the selection. I'm assuming here that a .xcf file is some sort of compressed markup language, but in fact I can't convert it to text with zless, unzip or gunzip.
What is the format for a .xcf file, and how do I parse it?
Don't attempt to parse it. There is no documentation on the file format. It's entirely GIMP's job to deal with this format. Don't try to use it outside the GIMP.
Sven
How to parse an .xcf file?
Hi Steve,
Steve Litt wrote:
What is the format for a .xcf file, and how do I parse it?
xcf is a pure-binary format. It is documented in several places -
1) in devel-docs/xcf.txt in the GIMP's CVS
2) in app/xcf/*.[ch] - notably xcf.c/xcf.h which describe the file format and xcf-load and -save which do the reading/writing.
3) in ImageMagick's xcf filter (this will flatten your image and the floating selection, I believe)
4) On Cinepaint's website http://cinepaint.sourceforge.net in the docs section.
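As a starting point, here is a minimal sketch that reads only the fixed-size start of an XCF file. It assumes the layout as commonly described for XCF: the 9 bytes `gimp xcf `, a 4-byte version tag (`file` for the oldest version, otherwise `v001` and so on), a zero byte, then width, height and base type as big-endian 32-bit integers. Please check this against devel-docs/xcf.txt before relying on it; the function name is hypothetical.

```python
import struct

BASE_TYPES = {0: "RGB", 1: "grayscale", 2: "indexed"}


def read_xcf_header(data: bytes) -> dict:
    """Decode the magic, version tag and canvas size from the start of an XCF file."""
    if len(data) < 26 or data[:9] != b"gimp xcf " or data[13:14] != b"\x00":
        raise ValueError("not an XCF file")
    version = data[9:13].decode("ascii")
    width, height, base_type = struct.unpack(">III", data[14:26])
    return {
        "version": version,
        "width": width,
        "height": height,
        "base_type": BASE_TYPES.get(base_type, "unknown"),
    }
```

Usage would be along the lines of `read_xcf_header(open("page2.xcf", "rb").read(26))`. Everything after these 26 bytes (properties, layer and channel pointers) needs the documentation listed above.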
Cheers, Dave.
How to parse an .xcf file?
Hi,
Dave Neary writes:
Steve Litt wrote:
What is the format for a .xcf file, and how do I parse it?
xcf is a pure-binary format. It is documented in several places -
1) in devel-docs/xcf.txt in the GIMP's CVS
2) in app/xcf/*.[ch] - notably xcf.c/xcf.h which describe the file format and xcf-load and -save which do the reading/writing.
3) in ImageMagick's xcf filter (this will flatten your image and the floating selection, I believe)
4) On Cinepaint's website http://cinepaint.sourceforge.net in the docs section.
Sorry, wrong answer. The right answer is: "You don't parse it."
Sven
How to parse an .xcf file?
Hi,
Sven Neumann wrote:
Dave Neary writes:
xcf is a pure-binary format. It is documented in several places -
1) in devel-docs/xcf.txt in the GIMP's CVS
2) in app/xcf/*.[ch] - notably xcf.c/xcf.h which describe the file format and xcf-load and -save which do the reading/writing.
3) in ImageMagick's xcf filter (this will flatten your image and the floating selection, I believe)
4) On Cinepaint's website http://cinepaint.sourceforge.net in the docs section.
Sorry, wrong answer. The right answer is: "You don't parse it."
I guess we'll have to agree to disagree. "you don't parse it" doesn't exactly tally with being an open source program.
Dave.
How to parse an .xcf file?
On Monday 19 April 2004 15:37, David Neary wrote:
Hi,
Sven Neumann wrote:
Dave Neary writes:
xcf is a pure-binary format. It is documented in several places -
1) in devel-docs/xcf.txt in the GIMP's CVS
2) in app/xcf/*.[ch] - notably xcf.c/xcf.h which describe the file format and xcf-load and -save which do the reading/writing.
3) in ImageMagick's xcf filter (this will flatten your image and the floating selection, I believe)
4) On Cinepaint's website http://cinepaint.sourceforge.net in the docs section.
Sorry, wrong answer. The right answer is: "You don't parse it."
I guess we'll have to agree to disagree. "you don't parse it" doesn't exactly tally with being an open source program.
Actually, I do read it as
"XCF is too hackish, and only The GIMP can read it. We need to speed up a new GIMP native format, with an XML header. Meanwhile use other image formats, please"
But that means that by all means, the GIMP should have another file format that could be readable by other apps.
What about starting work on this for 2.2, even if it is not going to completely replace .XCF by the time 2.2 is out? We could have an extensible file format that would hold most GIMP image information, and still not be the default file format, if that would be so hard to achieve.
I do not recall seeing a new file format in the "2.2" plans Sven wrote up a couple of weeks ago.
Dave.
How to parse an .xcf file?
Hi,
"Joao S. O. Bueno" writes:
Actually, I do read it as "XCF is too hackish, and only The GIMP can read it. We need to speed up a new GIMP native format, with an XML header. Meanwhile use other image formats, please"
Well, yes, but I wrote that in an earlier reply already.
What about starting work on this for 2.2, even if it is not going to completely replace .XCF by the time 2.2 is out? We could have an extensible file format that would hold most GIMP image information, and still not be the default file format, if that would be so hard to achieve.
I do not recall seeing a new file format in the "2.2" plans Sven wrote up a couple of weeks ago.
The plan wasn't a plan but a request for comments.
The new file format we outlined will be strongly dependent on GEGL and it will require decoders to either use GEGL or to implement functionality similar to GEGL. In case you didn't notice, OEyvind Kolaas is working on this and he already came up with a nice subset of what could be the final file format.
Sven
How to parse an .xcf file?
On Monday 19 April 2004 03:01 pm, Sven Neumann wrote:
Hi,
"Joao S. O. Bueno" writes:
Actually, I do read it as "XCF is too hackish, and only The GIMP can read it. We need to speed up a new GIMP native format, with an XML header. Meanwhile use other image formats, please"
Well, yes, but I wrote that in an earlier reply already.
What about starting work on this for 2.2, even if it is not going to completely replace .XCF by the time 2.2 is out? We could have an extensible file format that would hold most GIMP image information, and still not be the default file format, if that would be so hard to achieve.
I do not recall seeing a new file format in the "2.2" plans Sven wrote up a couple of weeks ago.
The plan wasn't a plan but a request for comments.
The new file format we outlined will be strongly dependent on GEGL and it will require decoders to either use GEGL or to implement functionality similar to GEGL. In case you didn't notice, OEyvind Kolaas is working on this and he already came up with a nice subset of what could be the final file format.
I'm just one user of millions, but I hope any new native file format is not binary. Being able to easily manipulate your data with Vim is a very nice plan B, in case it gets in a state Gimp can't handle, or in case one wants to run a script on it.
I do this with LyX all the time, including a VimOutliner to LyX conversion script.
Steve Litt
How to parse an .xcf file?
On Monday 19 April 2004 02:37 pm, David Neary wrote:
Hi,
Sven Neumann wrote:
Dave Neary writes:
xcf is a pure-binary format. It is documented in several places -
1) in devel-docs/xcf.txt in the GIMP's CVS
2) in app/xcf/*.[ch] - notably xcf.c/xcf.h which describe the file format and xcf-load and -save which do the reading/writing.
3) in ImageMagick's xcf filter (this will flatten your image and the floating selection, I believe)
4) On Cinepaint's website http://cinepaint.sourceforge.net in the docs section.
Sorry, wrong answer. The right answer is: "You don't parse it."
I guess we'll have to agree to disagree. "you don't parse it" doesn't exactly tally with being an open source program.
Dave.
I think what Sven probably meant is it's very difficult to parse, having to basically rip out the load and save functions, and get all the data definitions right. David -- thanks for the tips, because in my case it's worth the aggravation to be able to parse the data.
I've been reviewing xcf.h and xcf.c and it's pretty interesting. It looks to me like an image is basically a list of properties, layers, channels, floating selection, and selections. I'm not sure what a property is, or how it relates to a pixel.
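On the question of what a property is: as far as the XCF documentation describes it, a property is a small tagged record of metadata (compression mode, resolution, opacity and so on) rather than pixel data. The sketch below assumes each record is a big-endian 32-bit type, a 32-bit payload length and then the payload, with a record of type 0 ending the list. That layout should be verified against devel-docs/xcf.txt; the function name is hypothetical.

```python
import struct

PROP_END = 0  # a record of this type terminates a property list


def read_properties(data: bytes, offset: int = 0):
    """Walk a property list; return ([(type, payload), ...], offset after the list)."""
    props = []
    while True:
        if offset + 8 > len(data):
            raise ValueError("truncated property list")
        prop_type, length = struct.unpack(">II", data[offset:offset + 8])
        offset += 8
        if prop_type == PROP_END:
            return props, offset
        if offset + length > len(data):
            raise ValueError("property payload runs past end of data")
        props.append((prop_type, data[offset:offset + length]))
        offset += length
```

Since every record carries its own length, a parser can skip property types it does not understand, which is handy when dumping a file into an outline.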
I just began to study XcfImage *read_xcf_image( FILE *fp ), and after I fully understand it, I should be able to parse a .xcf file into an outline or something similar, and then pack the outline back into a .xcf file.
Thanks
Steve
Steve Litt
How to parse an .xcf file?
Hi,
Steve Litt writes:
I think what Sven probably meant is it's very difficult to parse, having to basically rip out the load and save functions, and get all the data definitions right. David -- thanks for the tips, because in my case it's worth the aggravation to be able to parse the data.
I also explicitly meant to discourage you and anyone else from attempting to load XCF files. If you need to read them, let GIMP read them for you. The GIMP PDB gives you everything you need to access the data contained in the XCF file.
Sven
How to parse an .xcf file?
On Monday 19 April 2004 04:31 pm, Sven Neumann wrote:
I also explicitly meant to discourage you and anyone else from attempting to load XCF files. If you need to read them, let GIMP read them for you. The GIMP PDB gives you everything you need to access the data contained in the XCF file.
What is the Gimp PDB?
Steve
Steve Litt
How to parse an .xcf file?
Hi,
Steve Litt writes:
I also explicitly meant to discourage you and anyone else from attempting to load XCF files. If you need to read them, let GIMP read them for you. The GIMP PDB gives you everything you need to access the data contained in the XCF file.
What is the Gimp PDB?
The GIMP Procedural Database; basically the API that GIMP exports to plug-ins and scripts.
Sven