
gegl-vips


gegl-vips                            jcupitt@gmail.com  17 Apr 09:22
  gegl-vips                          Øyvind Kolås       17 Apr 13:24
    Fwd: [Gimp-developer] gegl-vips  Øyvind Kolås       17 Apr 13:26
    gegl-vips                        jcupitt@gmail.com  17 Apr 20:40
      gegl-vips                      Øyvind Kolås       17 Apr 22:28
        gegl-vips                    jcupitt@gmail.com  18 Apr 16:49
  Fwd: [Gimp-developer] gegl-vips    Øyvind Kolås       17 Apr 13:25
jcupitt@gmail.com
2011-04-17 09:22:50 UTC (about 14 years ago)

gegl-vips

Hi all,

I've had a stab at a quick hack of gegl-0.1.6 to use libvips (another demand-driven image processing library) as the backend for batch processing. I think it is maybe an interesting way to look at gegl performance, for this use case at least.

https://github.com/jcupitt/gegl-vips

This has some very strong limitations. First, it will not work efficiently with interactive destructive operations, like "paint a line". This would need area cache invalidation in vips, which is a way off. Secondly, I've only implemented a few operations (load / crop / affine / unsharp / save / process), so all you can do is some very basic batch processing. It should work for dynamic graphs (change the parameters on a node and just downstream nodes will recalculate) but it'd need a "display" node to be able to test that.
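For illustration, a minimal GEGL pipeline of that general shape might look like this (stock gegl-0.1.x API; the crop geometry and the omission of the affine step are placeholder choices, not what the actual test program in the repository does):

  /* Sketch: load -> crop -> unsharp -> save, pulled through once. */
  #include <gegl.h>

  int
  main (int argc, char **argv)
  {
    gegl_init (&argc, &argv);

    GeglNode *graph = gegl_node_new ();
    GeglNode *load  = gegl_node_new_child (graph,
                                           "operation", "gegl:load",
                                           "path", argv[1],
                                           NULL);
    GeglNode *crop  = gegl_node_new_child (graph,
                                           "operation", "gegl:crop",
                                           "x", 100.0, "y", 100.0,
                                           "width", 4800.0, "height", 4800.0,
                                           NULL);
    GeglNode *sharp = gegl_node_new_child (graph,
                                           "operation", "gegl:unsharp-mask",
                                           NULL);
    GeglNode *save  = gegl_node_new_child (graph,
                                           "operation", "gegl:png-save",
                                           "path", argv[2],
                                           NULL);

    gegl_node_link_many (load, crop, sharp, save, NULL);
    gegl_node_process (save);  /* pull the whole image through the graph */

    g_object_unref (graph);
    gegl_exit ();
    return 0;
  }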

There's a README with some more detail on how it works, a test program and some timings.

If I run the test program linked against gegl-0.1.6 on a 5,000 x 5,000 pixel RGB PNG image on my laptop (a c2d at 2.4GHz), I get 96s real, 44s user. I tried experimenting with various settings for GEGL_SWAP and friends, but I couldn't get it to go faster than that; I probably missed something. Perhaps gegl's disk cache plus my slow laptop hard drive are slowing it down.

Linked against gegl-vips with the operations set to exactly match gegl's processing, the same thing runs in 27s real, 38s user. So it looks like some tuning of the disk cache, or maybe even turning it off for batch processing, where you seldom need pixels more than once, could give gegl a very useful speedup here. libvips has a threading system which is on by default and does double-buffered write-behind, which also helps.

If you use uncompressed tiff, you can save a further 15s off the runtime. libpng compression is slow, and even with compression off, file write is sluggish.

The alpha channel is not needed in this case; dropping it saves about 5s real time.

babl converts to linear float and back with exp() and log(). Using lookup tables instead saves 12s.
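To illustrate the lookup-table technique in general form (a sketch of the idea, not babl's actual code): 8-bit input has only 256 possible values per channel, so the sRGB decode curve can be tabulated once up front.

  /* Sketch: replace a per-pixel powf() with a 256-entry table for the
   * 8-bit sRGB -> linear float direction. */
  #include <math.h>

  static float srgb_to_linear[256];

  static void
  build_lut (void)
  {
    for (int i = 0; i < 256; i++)
      {
        float v = i / 255.0f;
        /* standard sRGB electro-optical transfer function */
        srgb_to_linear[i] = (v <= 0.04045f)
                              ? v / 12.92f
                              : powf ((v + 0.055f) / 1.055f, 2.4f);
      }
  }

  /* per-pixel decode then costs an array index instead of a powf() call */
  static inline float
  decode_u8 (unsigned char u8)
  {
    return srgb_to_linear[u8];
  }

The reverse, float-to-8-bit direction cannot be tabulated quite as directly, but similar tricks apply.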

The gegl unsharp operator is implemented as gblur/sub/mul/add. These are all linear operations, so you can fold the maths into a single convolution. Redoing unsharp as a separable convolution saves 1s.

Finally, we don't really need 16-bit output here; 8 is fine. This saves only 0.5s for tiff, but 8s for PNG.

Putting all these together, you get the same program running in 2.3s real, 4s user. This is still using linear float light internally. If you switch to a full 8-bit path you get 1s real, 1.5s user. I realise gegl is committed to float, but it's interesting to put a number on the cost.

Does this sound useful? I think it's maybe a way to weigh the benefits of the various possible optimisations. I might try running the tests on a machine with a faster hard disk.

John

"
2011-04-17 13:24:55 UTC (about 14 years ago)

gegl-vips

Thank you for taking a serious look at GEGL. I've trimmed away the bits relating to the VIPS backend and will rather focus on the performance numbers you get out, and try to explain them.

On Sun, Apr 17, 2011 at 10:22 AM, jcupitt@gmail.com wrote:

> Linked against gegl-vips with the operations set to exactly match gegl's processing, the same thing runs in 27s real, 38s user. So it looks like some tuning of the disk cache, or maybe even turning it off for batch processing, where you seldom need pixels more than once, could give gegl a very useful speedup here. libvips has a threading system which is on by default and does double-buffered write-behind, which also helps.

On my c2d 1.86GHz laptop I get 105s real, 41s user with default settings. Setting GEGL_SWAP=RAM in the environment to turn off the disk swapping of tiles makes it run in 43s real, 41s user. With the default settings GEGL will start swapping when using more than 128MB of memory for buffers; this limit can be increased by setting, for instance, GEGL_CACHE_SIZE=1024 to not start swapping until 1GB of memory is in use. This leads to similar behavior. The tile backend of GEGL uses reads and writes on the tiles; using mmap() instead could increase performance.
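(For reproducibility: these knobs are plain environment variables, read when GEGL initializes, so they can be set in the shell at launch or, as in this sketch, from the test program itself; note that GEGL_SWAP=RAM and a raised GEGL_CACHE_SIZE are alternative strategies, not meant to be combined.)

  /* Sketch: set the tuning knobs discussed above before gegl_init()
   * reads the environment. Equivalent to prefixing the command line
   * with GEGL_SWAP=RAM etc. when launching the binary. */
  #include <gegl.h>

  int
  main (int argc, char **argv)
  {
    g_setenv ("GEGL_SWAP", "RAM", TRUE);          /* never swap tiles to disk */
    /* g_setenv ("GEGL_CACHE_SIZE", "1024", TRUE);   or: swap only past 1GB  */

    gegl_init (&argc, &argv);
    /* ... build and process the graph as usual ... */
    gegl_exit ();
    return 0;
  }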

> If you use uncompressed tiff, you can save a further 15s off the runtime. libpng compression is slow, and even with compression off, file write is sluggish.

Loading a png into a tiled buffer as used by GeglBuffer is kind of bound to be slow. At the moment GEGL doesn't have a native TIFF loader; if the resources were spent on writing a proper TIFF backend for GeglBuffer, GEGL would be able to lazily swap in the image data from TIFF files as needed.

> babl converts to linear float and back with exp() and log(). Using lookup tables instead saves 12s.

If the original PNG was 8-bit, babl should have a valid fast path using lookup tables to convert it to 32-bit linear. For most other conversions involved in this process, babl would likely fall back to reference conversions that go via 64-bit floating point and process each pixel with lots of logic permuting components etc. By adding/fixing the fast paths in babl to match the reference conversion, a lot of the time spent converting pixels in this test should vanish.
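For concreteness, a conversion of this kind is requested through babl's public API roughly like so (a sketch; the format names are babl's standard ones, though whether this exact pair hits a lookup-table fast path depends on what is registered):

  /* Sketch: request a conversion "fish" from nonlinear 8-bit RGBA to
   * linear 32-bit float RGBA, the direction that should be served by
   * an 8-bit lookup table. Assumes babl_init() was called already. */
  #include <babl/babl.h>

  static void
  scanline_to_linear (const unsigned char *src, float *dst, long n_pixels)
  {
    static const Babl *fish = NULL;

    if (!fish)
      fish = babl_fish (babl_format ("R'G'B'A u8"),   /* sRGB-encoded 8-bit  */
                        babl_format ("RGBA float"));  /* linear 32-bit float */

    /* babl picks a registered fast path whose error is within tolerance,
       or falls back to the reference conversion via double precision. */
    babl_process (fish, src, dst, n_pixels);
  }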

> The gegl unsharp operator is implemented as gblur/sub/mul/add. These are all linear operations, so you can fold the maths into a single convolution. Redoing unsharp as a separable convolution saves 1s.

For smaller radii this is fine; for larger ones it is not. Ideally GEGL would be doing what is optimal behind the user's back.

> Finally, we don't really need 16-bit output here; 8 is fine. This saves only 0.5s for tiff, but 8s for PNG.

Making the test case you used save to 8-bit PNG instead gives me 34s real and 33s user. I am not entirely sure whether babl has a fast 32-bit float -> 8-bit nonlinear RGBA conversion; it might just be libpng's data throughput that makes this difference.

save = gegl_node_new_child (gegl,
                            "operation", "gegl:png-save",
                            "bitdepth", 8,
                            "path", argv[2],
                            NULL);

> Putting all these together, you get the same program running in 2.3s real, 4s user. This is still using linear float light internally. If you switch to a full 8-bit path you get 1s real, 1.5s user. I realise gegl is committed to float, but it's interesting to put a number on the cost.

This type of benchmark really stress-tests the file loading/saving parts of the code, where I am fully aware that GEGL is far from optimal; but it is also something that doesn't in any way reflect GIMP's _current_ use of GEGL, which involves converting 8-bit data to and from float with some very specific formats and then only doing raw processing. This will of course change in the future.

> Does this sound useful? I think it's maybe a way to weigh the benefits of the various possible optimisations. I might try running the tests on a machine with a faster hard disk.

It is useful, but it would perhaps be even more useful to see similar results for a test where the loading/saving is taken out of the benchmark, measuring raw image data crunching.

Setting GEGL_SWAP=RAM and BABL_TOLERANCE=0.02 in the environment, to make babl lenient about the error introduced by its fast paths, I get the results below. It should be possible to fix the fast paths in babl to be correct enough to pass the current, stricter criteria for use, and thus get these results without lowering standards. Even adding slightly faster but guaranteed-to-be-correct 8-bit/16-bit float conversions would likely improve this type of benchmark.

16-bit output:  real 28.3s   user 26.9s
8-bit output:   real 25.1s   user 23.6s

Thank you for looking at this - and I do hope my comments above help explain some of the reasons for the slower processing.

/Øyvind K.

«The future is already here. It's just not very evenly distributed»
                                                 -- William Gibson
http://pippin.gimp.org/                            http://ffii.org/
"
2011-04-17 13:25:46 UTC (about 14 years ago)

Fwd: [Gimp-developer] gegl-vips

---------- Forwarded message ----------
From: jcupitt@gmail.com
Date: Sun, Apr 17, 2011 at 10:22 AM
Subject: [Gimp-developer] gegl-vips
To: gimp-developer

[John's opening message, forwarded verbatim to the gegl-developer list; see the start of this thread.]
"
2011-04-17 13:26:50 UTC (about 14 years ago)

Fwd: [Gimp-developer] gegl-vips

I didn't see that the discussion was on gimp-devel, so I'm forwarding my reply as well, in case someone is following the GEGL developer list or digging in its archives.

---------- Forwarded message ----------
From: Øyvind Kolås
Date: Sun, Apr 17, 2011 at 2:24 PM
Subject: Re: [Gimp-developer] gegl-vips
To: jcupitt@gmail.com
Cc: gimp-developer

[Øyvind's reply of 13:24 above, forwarded verbatim to the gegl-developer list.]
jcupitt@gmail.com
2011-04-17 20:40:58 UTC (about 14 years ago)

gegl-vips

Hello Øyvind, thanks for the reply.

On 17 April 2011 14:24, Øyvind Kolås wrote:

> On my c2d 1.86GHz laptop I get 105s real, 41s user with default settings. Setting GEGL_SWAP=RAM in the environment to turn off the disk swapping of tiles makes it run in 43s real, 41s user.

I found GEGL_SWAP=RAM, but on my laptop the process wandered off into swap death before finishing. Is there some way to limit mem use? I only have 2GB.

> Loading a png into a tiled buffer as used by GeglBuffer is kind of bound to be slow. At the moment GEGL doesn't have a native TIFF loader,

You can work with tiled tiff straight from the file, but sadly for striped tiff (as 90%+ are, groan) you have to unpack the whole file first :-(

>> babl converts to linear float and back with exp() and log(). Using lookup tables instead saves 12s.

> If the original PNG was 8-bit, babl should have a valid fast path using lookup tables to convert it to 32-bit linear. For most other [...]

OK, interesting, I shall look at the callgrind output again.

>> The gegl unsharp operator is implemented as gblur/sub/mul/add. These are all linear operations, so you can fold the maths into a single convolution. Redoing unsharp as a separable convolution saves 1s.

> For smaller radii this is fine; for larger ones it is not. Ideally GEGL would be doing what is optimal behind the user's back.

Actually, it works for large radii as well. By separable convolution I mean doing a 1xn pass then an nx1 pass. You can "bake" the sub/mul/add into the coefficients you calculate in gblur.
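Spelled out, with input image I, gaussian kernel G, amount a, and identity kernel delta (a sketch of the algebra only, not any particular implementation):

  % unsharp mask folded into the convolution
  \[
    \mathrm{usm}(I) = I + a\,(I - G * I)
                    = \bigl((1+a)\,\delta - a\,G\bigr) * I
  \]
  % G separates into two 1-D passes, G = g_x * g_y, so with H = g_x * I:
  \[
    \mathrm{usm}(I) = (1+a)\,I - a\,(g_y * H)
  \]
  % i.e. the sub/mul/add folds into the coefficients of the second pass.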

> It is useful, but it would perhaps be even more useful to see similar results for a test where the loading/saving is taken out of the benchmark, measuring raw image data crunching.

Yes, good point, it should be easy to instrument it for this kind of test. I'll have a go.

John

"
2011-04-17 22:28:39 UTC (about 14 years ago)

gegl-vips

On Sun, Apr 17, 2011 at 9:40 PM, jcupitt@gmail.com wrote:

> On 17 April 2011 14:24, Øyvind Kolås wrote:

>> On my c2d 1.86GHz laptop I get 105s real, 41s user with default settings. Setting GEGL_SWAP=RAM in the environment to turn off the disk swapping of tiles makes it run in 43s real, 41s user.

> I found GEGL_SWAP=RAM, but on my laptop the process wandered off into swap death before finishing. Is there some way to limit mem use? I only have 2GB.

My laptop has 3GB of RAM and thus doesn't end up crunching swap on such a test.

Setting GEGL_CACHE_SIZE=1300 or so should have a similar effect; hopefully GEGL wouldn't need to make everything swap. (Note that in doing so you should _not_ set GEGL_SWAP=RAM.) I noticed that setting GEGL_THREADS to anything more than 1 causes things to crash; that, along with other things that break more subtly, is the reason GEGL doesn't yet default to keeping all cores busy.

>> Loading a png into a tiled buffer as used by GeglBuffer is kind of bound to be slow. At the moment GEGL doesn't have a native TIFF loader,

> You can work with tiled tiff straight from the file, but sadly for striped tiff (as 90%+ are, groan) you have to unpack the whole file first :-(

I'm not sure what a striped tiff is; if it stores each scanline separately, GeglBuffer could load data directly from it by using 1px-high tiles that are as wide as the image.
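(For reference: a striped TIFF stores horizontal bands of RowsPerStrip scanlines, each compressed as a unit. A quick way to check which layout a file uses, sketched with stock libtiff calls:)

  /* Sketch: report whether a TIFF is tiled or striped. */
  #include <stdio.h>
  #include <tiffio.h>

  int
  main (int argc, char **argv)
  {
    TIFF *tif = TIFFOpen (argv[1], "r");
    if (!tif)
      return 1;

    if (TIFFIsTiled (tif))
      {
        uint32 tw = 0, th = 0;
        TIFFGetField (tif, TIFFTAG_TILEWIDTH, &tw);
        TIFFGetField (tif, TIFFTAG_TILELENGTH, &th);
        printf ("tiled, %ux%u tiles\n", tw, th);
      }
    else
      {
        uint32 rps = 0;
        TIFFGetFieldDefaulted (tif, TIFFTAG_ROWSPERSTRIP, &rps);
        printf ("striped, %u rows per strip\n", rps);
      }

    TIFFClose (tif);
    return 0;
  }

Since a strip is usually many scanlines compressed together, scanline-level random access means decompressing a strip at a time.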

>>> babl converts to linear float and back with exp() and log(). Using lookup tables instead saves 12s.

>> If the original PNG was 8-bit, babl should have a valid fast path using lookup tables to convert it to 32-bit linear. For most other [...]

> OK, interesting, I shall look at the callgrind output again.

I'd recommend setting the BABL_TOLERANCE=0.004 environment variable as well, to permit some fast paths with errors around or below 1.0/256, avoiding the rather computationally intensive synthetic reference conversion code in babl.

>>> The gegl unsharp operator is implemented as gblur/sub/mul/add. These are all linear operations, so you can fold the maths into a single convolution. Redoing unsharp as a separable convolution saves 1s.

>> For smaller radii this is fine; for larger ones it is not. Ideally GEGL would be doing what is optimal behind the user's back.

> Actually, it works for large radii as well. By separable convolution I mean doing a 1xn pass then an nx1 pass. You can "bake" the sub/mul/add into the coefficients you calculate in gblur.

I thought you meant hard-coded convolutions similar to the crop-and-sharpen example. Baking it into the convolution might be beneficial, though at the moment I see it as more important to make sure gaussian blur is as fast as possible, since it is a primitive that this, drop shadow, and other commonly employed compositing operations are built from.

/Øyvind K.

«The future is already here. It's just not very evenly distributed»
                                                 -- William Gibson
http://pippin.gimp.org/                            http://ffii.org/
"
2011-04-17 22:28:39 UTC (about 14 years ago)

gegl-vips

On Sun, Apr 17, 2011 at 9:40 PM, wrote:

On 17 April 2011 14:24, Øyvind Kolås wrote:

On my c2d 1.86ghz laptop I get 105s real 41s user with default settings. Setting GEGL_SWAP=RAM in the environment to turn off the disk swapping of tiles makes it run in 43s real 41s user.

I found GEGL_SWAP=RAM, but on my laptop the process wandered off into swap death before finishing. Is there some way to limit mem use? I only have 2gb.

My laptop has 3gb of RAM and thus doesn't end up crunching swap on such a test.

Setting GEGL_CACHE_SIZE=1300 or so, should have a similar effect, hopefully GEGL wouldn't need to make everying swap. (not that in doing so you should _not_ set GEGL_SWAP=RAM). I noticed that setting GEGL_THREADS=anything_more_than_1 causes things to crash, along with other things that more subtly break.. are the reason GEGL doesnt default to keep all cores busy yet.

Loading a png into a tiled buffer as used by GeglBuffer is kind of bound to be slow, at the moment GEGL doesnt have a native TIFF loader,

You can work with tiled tiff straight from the file, but for sadly for striped tiff (as 90%+ are, groan) you have to unpack the whole file first :-(

I'm not sure what a striped tiff is, if it stores each scanline separately GeglBuffer could be able to load data directly from it by using 1px high tiles that are as wide as the image.

babl converts to linear float and back with exp() and log(). Using lookup tables instead saves 12s.

If the original PNG was 8bit, babl should have a valid fast path for using lookup tables converting it to 32bit linear. For most other

OK, interesting, I shall look at the callgrind output again.

I'd recommend setting the BABL_TOLERANCE=0.004 environment variable as well, to permit some fast paths with errors around or below 1.0/256 avoiding the rather computationally intensive synthetic reference conversion code in babl.

The gegl unsharp operator is implemented as gblur/sub/mul/add. These are all linear operations, so you can fold the maths into a single convolution. Redoing unsharp as a separable convolution saves 1s.

For smaller radiuses this is fine, for larger ones it is not, ideally GEGL would be doing what is optimal behind the users back.

Actually, it works for large radius as well. By separable convolution I mean doing a 1xn pass then a nx1 pass. You can "bake" the sub/mul/add into the coefficients you calculate in gblur.

I thought you meant hard-coded convultions similar to the crop-and-sharpen example, baking it into the convolution might be beneficial, though at the moment I see it as more important to make sure gaussian blur is as fast as possible since it is a primitive that both this, and dropshadow and other commonly employed compositing things are built from.

/Øyvind K.

«The future is already here. It's just not very evenly distributed»
                                                 -- William Gibson
http://pippin.gimp.org/                            http://ffii.org/
_______________________________________________
Gegl-developer mailing list
Gegl-developer@lists.XCF.Berkeley.EDU
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gegl-developer
jcupitt@gmail.com
2011-04-18 16:49:44 UTC (about 14 years ago)

gegl-vips

On 17 April 2011 23:28, Øyvind Kolås wrote:

> Setting GEGL_CACHE_SIZE=1300 or so should have a similar effect; hopefully GEGL wouldn't need to make everything swap. (Note that in doing so you should _not_ set GEGL_SWAP=RAM.) [...]

I tried on a desktop machine with a fast hard drive and it's much quicker, thank you for your help. This is two 2.7GHz Opteron 254s, 4GB of RAM, 10k RPM hard drive:

gegl-0.1.6, default settings:                     50s real, 44s user
gegl-0.1.6, BABL_TOLERANCE=0.004:                 46s real, 39s user
gegl-0.1.6, BABL_TOLERANCE=0.004, GEGL_SWAP=RAM:  41s real, 39s user (memory use peaks at 1.8GB)
  + 8-bit PNG output:                             38s real, 36s user

I'm puzzled as to why 8-bit output makes so little difference; you'd think there would be rather a large improvement. I experimented with GeglNode::dont_cache, but I wasn't able to bring memory use down without slowing the program horribly.

I'm still not certain I'm hitting the fast BABL path, though: if I run under callgrind I see 56M calls to pow(), 10% of total run time. 56M is roughly output image size * 3, if that helps track it down.

John
