Size on disk vs size reported on status bar
This discussion is connected to the gimp-developer-list.gnome.org mailing list which is provided by the GIMP developers and not related to gimpusers.com.
Size on disk vs size reported on status bar
I was curious about the disparity in reported size between the size on disk in a file manager, and the size reported by "%m". I'm guessing "%m" is the size in memory?
For example, I have a GIMP 2.9 XCF that is 4000 x 6000 pixels, 7 layers. It's the early stages of a drawing, with very little information on the various layers, just the main outlines, some guidelines and a merged layer in shades of gray.
The size of the XCF file on disk is 11.5 MiB as measured by the Dolphin file manager. When I open the XCF file in GIMP, the size shown in the status bar is 3.8 GB.
Why such a huge difference in reported size?
Size on disk vs size reported on status bar
Elle Stone (ellestone@ninedegreesbelow.com) wrote:
I was curious about the disparity in reported size between the size on disk in a file manager, and the size reported by "%m". I'm guessing "%m" is the size in memory?
Yeah, it is the in-memory size, including the overhead for all the gobjects etc.
The size of the XCF file on disk is 11.5 MiB as measured by the Dolphin file manager. When I open the XCF file in GIMP, the size shown in the status bar is 3.8 GB.
The current generation XCF files can contain compressed image data. That'd easily explain the difference.
Bye, Simon
simon@budig.de http://simon.budig.de/
Size on disk vs size reported on status bar
On 10/03/2017 01:21 PM, Simon Budig wrote:
The current generation XCF files can contain compressed image data.
When I save XCF files I don't use the option in the Save dialog to compress the data. Is there some other compression going on automatically?
Best, Elle
Size on disk vs size reported on status bar
Hi,
On Tue, Oct 3, 2017 at 7:17 PM, Elle Stone wrote:
I was curious about the disparity in reported size between the size on disk in a file manager, and the size reported by "%m". I'm guessing "%m" is the size in memory?
For example, I have a GIMP 2.9 XCF that is 4000 x 6000 pixels, 7 layers. It's the early stages of a drawing, with very little information on the various layers, just the main outlines, some guidelines and a merged layer in shades of gray.
The size of the XCF file on disk is 11.5 MiB as measured by the Dolphin file manager. When I open the XCF file in GIMP, the size shown in the status bar is 3.8 GB.
I don't know how this info is computed in GIMP code (i.e. does it check for actual memory size or does it do some basic maths as I am about to do), but assuming your image is 8 bits per channel, and all 7 layers are the size of the image and have no alpha, that makes: 4000*6000*7*24 = 4 032 000 000 / 1024^3 ~ 3.8GB! :-)
Why such a huge difference in reported size?
Basically, in memory we don't compress. It doesn't matter even if your image is a single flat color (which would compress almost perfectly and could result in a file of a few bytes in any compressed format, even if the image had billions of pixels). As such, every byte is stored in full in memory, even if they are all the same.
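A minimal sketch of this point, using Python's standard zlib module rather than anything from GIMP's own code: a buffer of identical bytes stands in for a single-color layer, and random bytes stand in for noise. The buffer size and variable names are illustrative assumptions.

```python
import os
import zlib

N = 1_000_000  # hypothetical buffer size in bytes, not a real GIMP layer

flat = bytes(N)        # every byte identical, like a single-color layer
noise = os.urandom(N)  # random bytes, with essentially no repetition

flat_packed = len(zlib.compress(flat))
noise_packed = len(zlib.compress(noise))

# Both buffers occupy N bytes uncompressed; only the compressed sizes differ.
print(f"uncompressed:     {N} bytes each")
print(f"flat compressed:  {flat_packed} bytes")
print(f"noise compressed: {noise_packed} bytes")
```

The uncompressed size depends only on the pixel count, while the compressed size depends entirely on the content.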
Jehan
_______________________________________________ gimp-developer-list mailing list
List address: gimp-developer-list@gnome.org List membership: https://mail.gnome.org/mailman/listinfo/gimp-developer-list List archives: https://mail.gnome.org/archives/gimp-developer-list
ZeMarmot open animation film http://film.zemarmot.net Patreon: https://patreon.com/zemarmot Tipeee: https://www.tipeee.com/zemarmot
Size on disk vs size reported on status bar
Hi again!
On Tue, Oct 3, 2017 at 8:18 PM, Elle Stone wrote:
On 10/03/2017 01:21 PM, Simon Budig wrote:
The current generation XCF files can contain compressed image data.
When I save XCF files I don't use the option in the Save dialog to compress the data. Is there some other compression going on automatically?
Yes, XCF historically uses RLE compression, which is basic hence not great, but still enough for a very simple image with a lot of identical (i.e. same color) adjacent pixels (as I understand your image is).
The option in the save dialog is about activating the newer zlib compression instead, which is much more sophisticated and should therefore result in even smaller files. But on very simple files, simple compression can work great too (sometimes even better than complex compression, that's rare but it may happen).
Jehan
Size on disk vs size reported on status bar
On 10/03/2017 09:33 PM, Jehan Pagès wrote:
I don't know how this info is computed in GIMP code (i.e. does it check for actual memory size or does it do some basic maths as I am about to do), but assuming your image is 8 bits per channel, and all 7 layers are the size of the image and have no alpha, that makes: 4000*6000*7*24 = 4 032 000 000 / 1024^3 ~ 3.8GB!:-)
Hi,
Actually the image is at 32-bit floating point precision, all the layers have alpha channels, most of the layers have masks, and there are a couple extra channels (saved masks) in the channels dialog.
Ignoring the masks and the extra saved channels, and assuming I'm doing the right math, this would be:
4000*6000*7*32*4 = 21 504 000 000 / 1024^3 ~ 20 GB
As an experiment, I followed Simon's suggestion and made a new document, same size, 7 layers, and filled each layer with random LCH noise. This brought the %m value up to 4.8 GB.
Removing all operations from the Undo history brought the %m value down to 2.9 GB. Saved to disk, the size is 1.6 GB. I guess random noise also compresses?
Best,
Elle
Size on disk vs size reported on status bar
On 10/03/2017 09:40 PM, Jehan Pagès wrote:
Hi again!
On Tue, Oct 3, 2017 at 8:18 PM, Elle Stone wrote:
On 10/03/2017 01:21 PM, Simon Budig wrote:
The current generation XCF files can contain compressed image data.
When I save XCF files I don't use the option in the Save dialog to compress the data. Is there some other compression going on automatically?
Yes, XCF historically uses RLE compression, which is basic hence not great, but still enough for a very simple image with a lot of identical (i.e. same color) adjacent pixels (as I understand your image is).
The option in the save dialog is about activating the newer zlib compression instead, which is much more sophisticated and should therefore result in even smaller files. But on very simple files, simple compression can work great too (sometimes even better than complex compression, that's rare but it may happen).
Hi Jehan,
Thanks for this information! I didn't realize any compression was being used if the option in the save dialog wasn't selected.
Saving large files can take a long time. Does anyone know how much time percent-wise is literally writing the file out (well, I suppose this would partly depend on the type of disk and the file system type), versus how much time is spent doing the compression before the file is written? How large would the XCF file be if there were no compression at all before writing the file to disk, compared to the compressed size?
Best,
Elle
Size on disk vs size reported on status bar
Hi,
On Wed, Oct 4, 2017 at 1:09 PM, Elle Stone wrote:
On 10/03/2017 09:40 PM, Jehan Pagès wrote:
Hi again!
On Tue, Oct 3, 2017 at 8:18 PM, Elle Stone wrote:
On 10/03/2017 01:21 PM, Simon Budig wrote:
The current generation XCF files can contain compressed image data.
When I save XCF files I don't use the option in the Save dialog to compress the data. Is there some other compression going on automatically?
Yes, XCF historically uses RLE compression, which is basic hence not great, but still enough for a very simple image with a lot of identical (i.e. same color) adjacent pixels (as I understand your image is).
The option in the save dialog is about activating the newer zlib compression instead, which is much more sophisticated and should therefore result in even smaller files. But on very simple files, simple compression can work great too (sometimes even better than complex compression, that's rare but it may happen).
Hi Jehan,
Thanks for this information! I didn't realize any compression was being used if the option in the save dialog wasn't selected.
Saving large files can take a long time. Does anyone know how much time percent-wise is literally writing the file out (well, I suppose this would partly depend on the type of disk and the file system type), versus how much time is spent doing the compression before the file is written?
No, but it would be easy to measure by timing the code if you really wanted to. :-)
In any case, RLE is much faster than zlib. That is why zlib is still disabled by default: it takes too much time, though it also compresses better.
How large would the XCF file be if there were no compression at all before writing the file to disk, compared to the compressed size?
If there were no compression at all, the size would be the size in memory (well minus the history, which takes space in memory as you noted, but which we don't save in the XCF). There is no "generic" percentage for the compressed size. It all depends on the file contents.
Removing all operations from the Undo history brought the %m value down to 2.9 GB. Saved to disk, the size is 1.6 GB. I guess random noise also compresses?
I assume you meant with the default compression (RLE)? If so, of course, though it will be very limited for random noise. RLE is an extremely simple compression which compresses repeated *sequential* pixels. For instance, if the pixel stream is C1 C2 C2 C2 C2 C2 C1 C2, RLE would compress it as: C1 5 C2 C1 C2. This works much better on real-life images, which have areas of similar color, and even better on drawn images, which have many more and bigger perfectly flat color areas. But on a perfectly noisy image, where each pixel is different from the previous one, RLE would compress absolutely nothing. In your generated noisy image, though, there may be some repetition here and there. But zlib should perform much better in such a case.
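The run-length idea can be sketched in a few lines of Python. This is a simplified illustration, not the encoder XCF actually uses: it stores a (value, count) pair for every run, including runs of length 1, so its output is written a little differently from the "C1 5 C2 C1 C2" notation above. The function names are made up for this example.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse each run of identical adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


stream = ["C1", "C2", "C2", "C2", "C2", "C2", "C1", "C2"]
encoded = rle_encode(stream)
print(encoded)

# A stream with no adjacent repeats gains nothing: one pair per pixel.
no_repeats = rle_encode(list(range(8)))
print(len(no_repeats), "pairs for 8 distinct pixels")
```

The five consecutive C2 values collapse into a single pair, while a stream in which every pixel differs from its neighbour produces as many pairs as there are pixels.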
Jehan
Size on disk vs size reported on status bar
Hi again!
On Wed, Oct 4, 2017 at 1:01 PM, Elle Stone wrote:
On 10/03/2017 09:33 PM, Jehan Pagès wrote:
I don't know how this info is computed in GIMP code (i.e. does it check for actual memory size or does it do some basic maths as I am about to do), but assuming your image is 8 bits per channel, and all 7 layers are the size of the image and have no alpha, that makes: 4000*6000*7*24 = 4 032 000 000 / 1024^3 ~ 3.8GB!:-)
Hi,
Actually the image is at 32-bit floating point precision, all the layers have alpha channels, most of the layers have masks, and there are a couple extra channels (saved masks) in the channels dialog.
Ignoring the masks and the extra saved channels, and assuming I'm doing the right math, this would be:
4000*6000*7*32*4 = 21 504 000 000 / 1024^3 ~ 20 GB
Sorry I was stupid earlier. By multiplying by 24 (or here 32), we get the size in bits, we want in bytes (instead, we must multiply by 3 bytes for 32 bits). So a single mono-channel image should be 4000*6000*3 bytes.
So assuming each layer has alpha and mask, that makes 5 channels per layer and each channel uses 3 bytes: 4000*6000*3*7*5 ~ 2.34 GB.
Hopefully I didn't make too many errors this time. :P You also said that there were a few extra channels. Selection is a channel as well, so if you have one, it must be taken into account. This said, I am wondering if the channels are all 3-bytes precision in a 32-bit image?
Jehan
Size on disk vs size reported on status bar
Hi!
On Wed, Oct 4, 2017 at 2:28 PM, Jehan Pagès wrote:
Hi again!
On Wed, Oct 4, 2017 at 1:01 PM, Elle Stone wrote:
On 10/03/2017 09:33 PM, Jehan Pagès wrote:
I don't know how this info is computed in GIMP code (i.e. does it check for actual memory size or does it do some basic maths as I am about to do), but assuming your image is 8 bits per channel, and all 7 layers are the size of the image and have no alpha, that makes: 4000*6000*7*24 = 4 032 000 000 / 1024^3 ~ 3.8GB!:-)
Hi,
Actually the image is at 32-bit floating point precision, all the layers have alpha channels, most of the layers have masks, and there are a couple extra channels (saved masks) in the channels dialog.
Ignoring the masks and the extra saved channels, and assuming I'm doing the right math, this would be:
4000*6000*7*32*4 = 21 504 000 000 / 1024^3 ~ 20 GB
Sorry I was stupid earlier. By multiplying by 24 (or here 32), we get the size in bits, we want in bytes (instead, we must multiply by 3 bytes for 32 bits). So a single mono-channel image should be 4000*6000*3 bytes.
So assuming each layer has alpha and mask, that makes 5 channels per layer and each channel uses 3 bytes: 4000*6000*3*7*5 ~ 2.34 GB.
Ok so I meant 4 bytes of course! I can't even do basic maths right anymore! :P
4000*6000*4*7*5 ~ 3.2G.
Hopefully I didn't make too many errors this time. :P You also said that there were a few extra channels. Selection is a channel as well, so if you have one, it must be taken into account. This said, I am wondering if the channels are all 3-bytes precision in a 32-bit image?
Ok so I had a quick look at the code. For the size displayed in the status bar, it seems it would use the size as returned by each GimpObject through the get_memsize() virtual method: https://git.gnome.org/browse/gimp/tree/app/core/gimpobject.c#n445
You can get an idea of the expected size by looking at GimpTemplate code which is used to guess the future memory size (to warn before image creation if the expected size will be big): https://git.gnome.org/browse/gimp/tree/app/core/gimptemplate.c#n426
We can see that the expected size seems to be based on the initial layer + the selection + the projection (which apparently always has alpha and multiplies the result by 1.33: https://git.gnome.org/browse/gimp/tree/app/core/gimpprojection.c#n326). So for your image with 7 layers, alpha (and assuming all with mask), following this estimation algorithm: 4000*6000*4*7*5 + 4000*6000*4 + 4000*6000*16*1.33 ~ 3.7GB.
This estimation is quite close to the actual 3.8GB GIMP gives you. With the few more channels you said you had, I guess that's about it. :-)
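The arithmetic in this estimate can be written out step by step. This is only a sketch of the figures quoted in the message (7 layers, 5 channels per layer, 4 bytes per channel, a 16-byte-per-pixel projection scaled by 1.33); the variable names are illustrative and it does not call or reproduce GIMP's get_memsize() code.

```python
WIDTH, HEIGHT = 4000, 6000
BYTES_PER_CHANNEL = 4     # 32-bit floating point
LAYERS = 7
CHANNELS_PER_LAYER = 5    # R, G, B, alpha, plus a mask (assumed for every layer)
PROJECTION_FACTOR = 1.33  # factor quoted in the message for the projection

pixels = WIDTH * HEIGHT

layers = pixels * BYTES_PER_CHANNEL * CHANNELS_PER_LAYER * LAYERS
selection = pixels * BYTES_PER_CHANNEL                            # one channel
projection = pixels * BYTES_PER_CHANNEL * 4 * PROJECTION_FACTOR   # RGBA, scaled

total = layers + selection + projection

print(f"layers:     {layers:>15,.0f} bytes")
print(f"selection:  {selection:>15,.0f} bytes")
print(f"projection: {projection:>15,.0f} bytes")
print(f"total:      {total:>15,.0f} bytes")
print(f"total / 1024^3 = {total / 1024**3:.2f}")
```

Changing LAYERS or CHANNELS_PER_LAYER shows how quickly the in-memory size grows, independent of what is actually painted on the layers.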
Jehan
Size on disk vs size reported on status bar
On 10/04/2017 09:02 AM, Jehan Pagès wrote:
Hi!
On Wed, Oct 4, 2017 at 2:28 PM, Jehan Pagès wrote:
Hi again!
On Wed, Oct 4, 2017 at 1:01 PM, Elle Stone wrote:
Actually the image is at 32-bit floating point precision, all the layers have alpha channels, most of the layers have masks, and there are a couple extra channels (saved masks) in the channels dialog.
Ignoring the masks and the extra saved channels, and assuming I'm doing the right math, this would be:
4000*6000*7*32*4 = 21 504 000 000 / 1024^3 ~ 20 GB
Sorry I was stupid earlier. By multiplying by 24 (or here 32), we get the size in bits, we want in bytes (instead, we must multiply by 3 bytes for 32 bits). So a single mono-channel image should be 4000*6000*3 bytes.
So assuming each layer has alpha and mask, that makes 5 channels per layer and each channel uses 3 bytes: 4000*6000*3*7*5 ~ 2.34 GB.
Ok so I meant 4 bytes of course! I can't even do basic maths right anymore! :P
4000*6000*4*7*5 ~ 3.2G.
Hi Jehan,
So a byte is 8 times a bit? For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
Ok so I had a quick look at the code. For the size displayed in the status bar, it seems it would use the size as returned by each GimpObject through the get_memsize() virtual method: https://git.gnome.org/browse/gimp/tree/app/core/gimpobject.c#n445
You can get an idea of the expected size by looking at GimpTemplate code which is used to guess the future memory size (to warn before image creation if the expected size will be big): https://git.gnome.org/browse/gimp/tree/app/core/gimptemplate.c#n426
We can see that the expected size seems to be based on the initial layer + the selection + the projection (which apparently always has alpha and multiplies the result by 1.33: https://git.gnome.org/browse/gimp/tree/app/core/gimpprojection.c#n326). So for your image with 7 layers, alpha (and assuming all with mask), following this estimation algorithm: 4000*6000*4*7*5 + 4000*6000*4 + 4000*6000*16*1.33 ~ 3.7GB.
This estimation is quite close to the actual 3.8GB GIMP gives you. With the few more channels you said you had, I guess that's about it. :-)
Thanks! That's a nice explanation! This might be useful to put in the user's manual, if it isn't already there.
Best, Elle
Size on disk vs size reported on status bar
Hi!
On Wed, Oct 4, 2017 at 3:51 PM, Elle Stone wrote:
On 10/04/2017 09:02 AM, Jehan Pagès wrote:
Hi!
On Wed, Oct 4, 2017 at 2:28 PM, Jehan Pagès wrote:
Hi again!
On Wed, Oct 4, 2017 at 1:01 PM, Elle Stone wrote:
Actually the image is at 32-bit floating point precision, all the layers have alpha channels, most of the layers have masks, and there are a couple extra channels (saved masks) in the channels dialog.
Ignoring the masks and the extra saved channels, and assuming I'm doing the right math, this would be:
4000*6000*7*32*4 = 21 504 000 000 / 1024^3 ~ 20 GB
Sorry I was stupid earlier. By multiplying by 24 (or here 32), we get the size in bits, we want in bytes (instead, we must multiply by 3 bytes for 32 bits). So a single mono-channel image should be 4000*6000*3 bytes.
So assuming each layer has alpha and mask, that makes 5 channels per layer and each channel uses 3 bytes: 4000*6000*3*7*5 ~ 2.34 GB.
Ok so I meant 4 bytes of course! I can't even do basic maths right anymore! :P
4000*6000*4*7*5 ~ 3.2G.
Hi Jehan,
So a byte is 8 times a bit?
Nowadays, on all common hardware, yes. It depends on the hardware, the processor in particular. Basically, a byte is the smallest unit of memory which you can manipulate (we cannot just grab a single bit). I am told there used to be hardware where bytes could have a size other than 8 bits, but I never saw such hardware in my life. And I'm pretty sure most software nowadays assumes that 1 byte == 8 bits and would likely crash horribly on a different kind of hardware (if it could even build). :-)
For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
Well, a bit has just 2 values (0 or 1). But as I said, we can't manipulate bits directly. We only manipulate bytes. This is a hardware limitation, and this is the reason why there are bytes and bits.
Ok so I had a quick look at the code. For the size displayed in the status bar, it seems it would use the size as returned by each GimpObject through the get_memsize() virtual method: https://git.gnome.org/browse/gimp/tree/app/core/gimpobject.c#n445
You can get an idea of the expected size by looking at GimpTemplate code which is used to guess the future memory size (to warn before image creation if the expected size will be big): https://git.gnome.org/browse/gimp/tree/app/core/gimptemplate.c#n426
We can see that the expected size seems to be based on the initial layer + the selection + the projection (which apparently always has alpha and multiplies the result by 1.33: https://git.gnome.org/browse/gimp/tree/app/core/gimpprojection.c#n326). So for your image with 7 layers, alpha (and assuming all with mask), following this estimation algorithm: 4000*6000*4*7*5 + 4000*6000*4 + 4000*6000*16*1.33 ~ 3.7GB.
This estimation is quite close to the actual 3.8GB GIMP gives you. With the few more channels you said you had, I guess that's about it. :-)
Thanks! That's a nice explanation! This might be useful to put in the user's manual, if it isn't already there.
Maybe. But I'm not sure it is useful to go into that much detail, which is really implementation-dependent. Just saying that the image in memory is uncompressed and would use a lot more memory than on disk should be enough, no? We could add some example computation, but I think a less accurate computation like my initial ones (basically stop at: a single layer image size, with alpha and 32-bit per channel, would be: width * height * 4 * 4) would be more useful for grasping the concepts than adding stuff like the projection and multiplying by 1.33 because of pyramid levels, which reads like cryptic language.
Anyway feel free to check if the manual already has such text and propose some additional text if missing. :-)
Jehan
P.S.: I was also wrong when dividing by 1024^3 to get GB. Our code uses g_format_size(), which says that 1 kB == 1000 B. Another very annoying historical thing about bytes is that their multiples originally used to be converted in different ways (depending on whether you were the technical department or the marketing one! :P). Since that made no sense (kilo should always be 1000, that's the international standard), the kibi was invented, which is the binary prefix for multiples of 1024. So our code follows the new standard, and I was wrong to use 1024 instead of 1000. This makes my estimation about 3.9 GB and not 3.7 in the end, so a bit more than the 3.8 you actually see in GIMP. But well… that was just an estimation and is close enough. ;-)
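The difference between the two conventions can be sketched as follows. This is an illustration only, not a reimplementation of g_format_size(): the helper name, the two-decimal rounding and the unit lists are assumptions made for the example, and the byte count is the estimate worked out from the figures in this thread.

```python
def format_size(n_bytes, base, units):
    """Scale a byte count by repeated division by `base` and label it."""
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"


SI_UNITS = ["B", "kB", "MB", "GB", "TB"]        # decimal prefixes, steps of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]   # binary prefixes, steps of 1024

estimate = 3_966_720_000  # bytes, from the estimation in this thread

decimal_text = format_size(estimate, 1000, SI_UNITS)
binary_text = format_size(estimate, 1024, IEC_UNITS)
print(decimal_text)
print(binary_text)
```

The same number of bytes yields a visibly smaller figure with binary prefixes, which accounts for the gap between the two estimates mentioned above.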
Size on disk vs size reported on status bar
For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
As Jehan mentioned, a bit is simply a holder for one 0 or 1 value (a yes or no answer). A byte is 8 of those answers. Why 8? Old (and weird, possibly arbitrary) hardware limitations. IIRC, to get a character like 'A', 'B', 'C', etc. to display we had to use bytes of bits, and that's why we're still counting them today ;)
Chris
Size on disk vs size reported on status bar
Le 04/10/2017 à 15:51, Elle Stone a écrit :
So a byte is 8 times a bit? For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
Well, the letter 'i' is obviously simpler than the letter 'y', and its shape is dead simple: that's a good hint.
So a possible one-sentence explanation: a Y (as in byte) is a much more complex glyph than an I (as in bit), 8 times more complex.
Notice also that the I glyph is very close in shape to 1, and 1 is the only possible bit value, along with 0, which is none...
Sure one can create some nice one liner poetry. JL
Size on disk vs size reported on status bar
Le 04/10/2017 à 15:51, Elle Stone a écrit :
So a byte is 8 times a bit? For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
Perhaps a food analogy might help you remember the difference between byte and bit.
If you aren't hungry you may just nibble a bit (bit) at your food, but when you are hungry you will take a bite (byte) of the food.
From that you should be able to visualize that a bit is smaller than a byte.
Size on disk vs size reported on status bar
On Fri, Nov 24, 2017 at 1:06 PM, Kevin Cozens wrote:
Le 04/10/2017 à 15:51, Elle Stone a écrit :
So a byte is 8 times a bit? For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
Perhaps a food analogy might help you remember the difference between byte and bit.
I believe this is the proper analogy.
1 binary digit is 1 bit: B_inary dig_IT. Values are 0 and 1.
4 bits is a nybble, and has the values 0 thru 15 (2^4-1). These are coded as 0-9,A-F in hexadecimal (6+10).
8 bits or 2 nybbles are a byte, with the values 0-255 (2^8-1), coded as 00 thru FF. Also called an octet.*
In C a stray backslash may cause you to accidentally run into octal numbers, being the 3 bits 0-7 representing base-8, but these are best left in the dustbin of history.
Chris
* Today, a byte being 8-bits is mostly by convention. In the dark ages of computing there were also 6-bit bytes (2 octals, being 2 of the 3-bit octal numbers 0-7, not the 8-bit octets) and other odd combinations that are also best forgotten.
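A short sketch of how bits, nybbles and bytes relate, assuming the usual 8-bit byte. The value 0xA7 is an arbitrary example chosen for illustration.

```python
value = 0xA7  # an arbitrary example byte

bits = f"{value:08b}"        # the 8 individual bits, most significant first
high_nibble = value >> 4     # upper 4 bits
low_nibble = value & 0x0F    # lower 4 bits

print("bits:    ", bits)
print("nybbles: ", f"{high_nibble:X}", f"{low_nibble:X}")
print("max nybble value:", 2**4 - 1)
print("max byte value:  ", 2**8 - 1)

# Two nybbles recombine into the original byte.
recombined = (high_nibble << 4) | low_nibble
print("recombined:", hex(recombined))
```

Each hexadecimal digit corresponds to exactly one nybble, which is why a byte is conventionally written as two hex digits.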
Size on disk vs size reported on status bar
On 11/24/17 19:06, Kevin Cozens wrote:
Le 04/10/2017 à 15:51, Elle Stone a écrit :
So a byte is 8 times a bit? For people like me who can never remember the difference between a byte and a bit, is there a one-sentence explanation for why there are bytes *and* bits?
Perhaps a food analogy might help you remember the difference between byte and bit.
If you aren't hungry you may just nibble a bit (bit) at your food
Actually a "nibble" is 4 bits : https://en.wikipedia.org/wiki/Nibble