
GPU-accelerated Image Filtering w/ CUDA


GPU-accelerated Image Filtering w/ CUDA Alan Reiner 29 Aug 20:13
  GPU-accelerated Image Filtering w/ CUDA Jon Nordby 29 Aug 20:50
   GPU-accelerated Image Filtering w/ CUDA Alan 03 Sep 02:37
    GPU-accelerated Image Filtering w/ CUDA Sven Neumann 04 Sep 21:44
  GPU-accelerated Image Filtering w/ CUDA Jacopo Corzani 29 Aug 20:51
  GPU-accelerated Image Filtering w/ CUDA Alan Reiner 30 Aug 00:40
   GPU-accelerated Image Filtering w/ CUDA Øyvind Kolås 30 Aug 00:46
    GPU-accelerated Image Filtering w/ CUDA Dov Grobgeld 30 Aug 06:48
Alan Reiner
2010-08-29 20:13:13 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

This is a long message, so let me start with the punchline: *I have a lot of CUDA code that harnesses a user's GPU to accelerate very tedious image processing operations, with potentially a 200x speedup. I am ready to donate this code to the GIMP project. This code can be run on Windows or Linux, and probably Mac, too.* *It only works on NVIDIA cards, but it can detect at runtime whether the user has acceptable hardware and disable itself if not.*

Hi all, I'm new here. I work on real-time image processing applications that must run at 60-240 Hz, which is typically too fast to do things like convolutions on large images on the CPU. However, the new fad is to use CUDA to harness the parallel computing power of your graphics card for general computation, instead of just rendering graphics. The speedups are phenomenal.

For instance, I implemented a basic convolution algorithm (blurring) that operates on a 4096x4096 image with a 15x15 kernel/PSF. On my CPU it took *27 seconds* (AMD Athlon X3 440). Running the identical algorithm in CUDA, I get it done in *0.1 to 0.25 seconds*, so between 110x and 250x speedup (NVIDIA GTX 460). Which end of that range you land on depends on whether the memory already resides in GPU device memory or it needs to be copied in/out on each operation.

Any kind of operation that resembles convolution, such as edge detection, blurring, morphology operations, etc., is highly parallelizable and ideal for GPU acceleration. *I have a lot of this code already written for grayscale images, and it can be donated to the GIMP project.* I would be interested in expanding the code to work on color images (though I suspect just running it three times, once per channel, would not be ideal), and I don't think it will be that hard to integrate into the existing GIMP project (only a couple of extra libraries need to be added for a user's computer to benefit from it).

Additionally, CUDA comes with convenient functions for determining whether a user has a CUDA-enabled GPU, and the code can default to regular CPU operations if they don't have one. It can determine how many cards they have, select the fastest one, and adjust the function calls to accommodate older GPU cards. Therefore, I believe the code can safely be integrated and dynamically enable itself only when it can actually be used.
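
To give a concrete picture, here is a minimal sketch of that kind of runtime check, using only the standard CUDA runtime API (cudaGetDeviceCount, cudaGetDeviceProperties, cudaSetDevice). The function name and the "most multiprocessors wins" heuristic are illustrative, not code from my library:

// Sketch: pick a usable CUDA device at startup, or report that the CPU
// path should be used instead.  Only standard CUDA runtime calls are used;
// the selection heuristic is illustrative.
#include <cuda_runtime.h>
#include <cstdio>

// Returns the chosen device id, or -1 if no CUDA device is usable.
int pickCudaDeviceOrFallback(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;                              // no driver or no CUDA card

    int best = -1, bestSMs = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        if (prop.multiProcessorCount > bestSMs) {   // crude "fastest" proxy
            bestSMs = prop.multiProcessorCount;
            best = dev;
        }
    }
    if (best >= 0) {
        cudaSetDevice(best);
        printf("Using CUDA device %d\n", best);
    }
    return best;                                // caller falls back to CPU on -1
}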

My solution works for any image size (within the limits of GPU memory), but the kernel/PSF size must be odd and no larger than 25x25. That's not to say larger kernel sizes can't be done in CUDA, but my solution is only "elegant" below that size because of the limited amount of shared memory. I believe it will still work up to a 61x61 kernel, with a substantial slowdown (though probably still much faster than the CPU). Beyond that, I believe a different algorithm is needed.
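
For anyone curious where that limit comes from, below is a stripped-down sketch of the shared-memory tiling technique (an illustration only, not the code being offered). Each block stages its output tile plus a halo of 'radius' pixels in shared memory, so the staged tile grows with the kernel radius; the PSF is assumed to have been copied into constant memory with cudaMemcpyToSymbol beforehand:

// Illustrative tiled 2-D convolution: each block computes a TILE x TILE
// output patch and stages that patch plus a 'radius'-pixel halo in shared
// memory.  Launch with dim3 block(TILE, TILE).  Assumes zero padding
// outside the image and a square PSF with radius <= MAX_R.
#define TILE  16
#define MAX_R 12                      // 25x25 kernel -> radius 12

__constant__ float d_psf[(2 * MAX_R + 1) * (2 * MAX_R + 1)];  // set via cudaMemcpyToSymbol

__global__ void convolveTiled(const float *in, float *out,
                              int width, int height, int radius)
{
    __shared__ float tile[TILE + 2 * MAX_R][TILE + 2 * MAX_R];

    const int outX = blockIdx.x * TILE + threadIdx.x;
    const int outY = blockIdx.y * TILE + threadIdx.y;

    // Cooperative load of the tile plus halo; out-of-image pixels become 0.
    for (int ty = threadIdx.y; ty < TILE + 2 * radius; ty += TILE)
        for (int tx = threadIdx.x; tx < TILE + 2 * radius; tx += TILE) {
            int srcX = blockIdx.x * TILE + tx - radius;
            int srcY = blockIdx.y * TILE + ty - radius;
            bool inside = srcX >= 0 && srcX < width &&
                          srcY >= 0 && srcY < height;
            tile[ty][tx] = inside ? in[srcY * width + srcX] : 0.0f;
        }
    __syncthreads();

    if (outX >= width || outY >= height)
        return;

    const int k = 2 * radius + 1;
    float sum = 0.0f;
    for (int dy = 0; dy < k; ++dy)
        for (int dx = 0; dx < k; ++dx)
            sum += tile[threadIdx.y + dy][threadIdx.x + dx] * d_psf[dy * k + dx];
    out[outY * width + outX] = sum;
}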

I have implemented basic convolution (which assumes 0s outside the edge of the image), a bilateral filter (which blurs without destroying edges), and most of the basic binary morphological operations (kernel-based erode, dilate, opening, closing). I believe it would be possible to develop a morphology plugin that lets you start with a binary image, click buttons for erode, dilate, opening, etc., and have it respond immediately. This would let someone start with an image and try lots of different combinations of morphological operations to determine whether their problem can be solved with morphology (which usually requires a long and complex sequence of morph ops).
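
As a sketch of what the morphology kernels amount to (again illustrative, not the donated code): binary dilation is a windowed maximum over the structuring element, and erosion is the dual with a minimum. A simple global-memory version looks like this:

// Illustrative binary dilation: a windowed maximum over the structuring
// element 'se' (1.0f where active), with zero padding outside the image.
// Erosion is the dual: start from 1.0f and take fminf instead.
__global__ void dilate(const float *in, float *out, const float *se,
                       int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    const int k = 2 * radius + 1;
    float result = 0.0f;                        // background
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            if (se[(dy + radius) * k + (dx + radius)] == 0.0f)
                continue;                       // pixel not part of the SE
            int sx = x + dx, sy = y + dy;
            if (sx < 0 || sx >= width || sy < 0 || sy >= height)
                continue;                       // treat outside as background
            result = fmaxf(result, in[sy * width + sx]);
        }
    out[y * width + x] = result;
}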

Unfortunately, I don't have much time to become a GIMP developer, but I feel like I can still contribute. I will happily develop the algorithms to be run on the GPU, as they will probably benefit my job, too (I'm open to suggestions for functions that operate over the whole image but on each pixel independently). And I can help with linking to the CUDA libraries, which NVIDIA claims can be done quickly by someone with no CUDA experience.

Please let me know if anyone is interested in working with me on this: etotheipi@gmail.com

-Alan

Jon Nordby
2010-08-29 20:50:46 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

On 29 August 2010 20:13, Alan Reiner wrote:

This is a long message, so let me start with the punchline:  *I have a lot of CUDA code that harnesses a user's GPU to accelerate very tedious image processing operations, potentially 200x speedup.  I am ready to donate this code to the GIMP project.

Hi

The first thing you should do if you want an (open source) project to be able to use your code is to provide the code. Preferably in the form of a public source control repository. Without this first step, nothing can happen. :)

Please let me know if anyone is interested to work with me on this: etotheipi@gmail.com

Please note that it is expected in open source projects that communication is kept in public channels (eg: a mailing list like this one) unless there is a very good reason not to. For this reason I've cc'ed the list in this reply, and I urge you to do the same.

Jacopo Corzani
2010-08-29 20:51:07 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

On 08/29/2010 08:13 PM, Alan Reiner wrote:

This is a long message, so let me start with the punchline: *I have a lot of CUDA code that harnesses a user's GPU to accelerate very tedious image processing operations, potentially 200x speedup. I am ready to donate this code to the GIMP project.* [...]

Alan Reiner
2010-08-30 00:40:31 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

I forgot that CUDA is not OSS. We don't have to worry about that at my job because we only use it for in-house simulations, so I only remembered that it was free for such use.

I know that similar stuff can be done with OpenGL, but that's a completely different beast. There's also OpenCL but I don't know anything about that either. At least those two solutions should work on both NVIDIA and ATI, but I believe the code still needs to be tailored specifically for each architecture.

As for portability, I don't see that as a concern for any of these. On platforms where it isn't available, it would simply be compiled out with the preprocessor. Everywhere else, it can detect at runtime whether it will work on the resident card and disable itself if not.

I might look a little bit into the OpenGL solution to see if that's feasible, but my understanding is that it's more archaic and not as powerful. And I personally don't have a reason to learn it. Perhaps one day when I have time to contribute directly to an OSS project.

-Alan

On Aug 29, 2010 2:51 PM, "Jacopo Corzani" wrote:

On 08/29/2010 08:13 PM, Alan Reiner wrote:

This is a long message, so let me start with the punchl...

Øyvind Kolås
2010-08-30 00:46:43 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

On Sun, Aug 29, 2010 at 11:40 PM, Alan Reiner wrote:

I forgot that CUDA is not OSS. [...]

I might look a little bit into the OpenGL solution to see if that's feasible, but my understanding is that it's more archaic and not as powerful. [...]

Doing image processing on the GPU using OpenGL and GLSL is planned for GIMP's next-generation engine, and an initial proof of concept of such a system, deeply integrated with GEGL, exists in a branch of the git repository at http://git.gnome.org/browse/gegl/log/?h=gsoc2009-gpu . The approach taken there is to implement automatic migration of tiles between the CPU and the GPU.
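
(As a toy illustration of the tile-migration idea, and not GEGL's actual API: a tile can simply remember where its freshest copy lives and be copied lazily when the other side needs it. The sketch below uses CUDA host calls only because that is what this thread has been discussing; the GEGL branch itself works with OpenGL/GLSL textures.)

// Toy illustration only (not GEGL's actual API): a tile remembers where
// its freshest copy lives and is copied lazily on demand.
#include <cuda_runtime.h>

struct Tile {
    float  *host;      // CPU copy
    float  *device;    // GPU copy, allocated on first use
    size_t  bytes;
    bool    onGpu;     // true if the newest version is the GPU copy
};

// Call before running a GPU operation on the tile.
static float *tileGetGpu(Tile *t)
{
    if (!t->device)
        cudaMalloc((void **)&t->device, t->bytes);
    if (!t->onGpu) {
        cudaMemcpy(t->device, t->host, t->bytes, cudaMemcpyHostToDevice);
        t->onGpu = true;
    }
    return t->device;
}

// Call before CPU code reads or writes the tile again.
static float *tileGetCpu(Tile *t)
{
    if (t->onGpu) {
        cudaMemcpy(t->host, t->device, t->bytes, cudaMemcpyDeviceToHost);
        t->onGpu = false;
    }
    return t->host;
}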

/Øyvind K.

Dov Grobgeld
2010-08-30 06:48:06 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

Alan,

Your code certainly sounds very useful, and I would love to see it open sourced. May I suggest, as was already stated, that you decide on a license, find a name for your library, and then open a github (http://github.com) account (or any other free hosting) where you upload the code. Whether it will be made part of GIMP or not is a different issue, and I agree that introducing closed-source dependencies into such a project is not a good idea.

Btw, there is an open standard for CUDA-like operations being developed, called OpenCL, but it is not widely supported yet. See: http://en.wikipedia.org/wiki/OpenCL . Perhaps you want to investigate whether there is NVIDIA support for the operations that you use, and if so, recode the algorithms in OpenCL? But again, I would do the work in a separate repository on github.

Regards, Dov

On Mon, Aug 30, 2010 at 01:46, Øyvind Kolås wrote:

On Sun, Aug 29, 2010 at 11:40 PM, Alan Reiner wrote:

I forgot that CUDA is not OSS. [...]

Doing image processing on the GPU using OpenGL and GLSL is planned for GIMP's next-generation engine, and an initial proof of concept of such a system, deeply integrated with GEGL, exists in a branch of the git repository at http://git.gnome.org/browse/gegl/log/?h=gsoc2009-gpu [...]

/Øyvind K.

Alan
2010-09-03 02:37:38 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

Hi all,

It sounds like CUDA is not ideal for GIMP itself, but individuals on this list might be personally interested in it (it is free, just not OSS, so it might be good for plugin development). I pushed my code into a git repo:

http://github.com/etotheipi/CUDA_Image_Convolution

Keep in mind that the code is still very immature in structure, but the algorithms are solid and run very fast if you have a CUDA-enabled graphics card. The readme has a lot of useful information.

Some timings:

Basic convolution, erosion, or dilation (on an NVIDIA GTX 460):

  4096x4096 image with 15x15 PSF/SE:  125 ms compute time    (8 Hz)
  4096x4096 image with  3x3 PSF/SE:    20 ms compute time   (50 Hz)
  2048x2048 image with  5x5 PSF/SE:   7.5 ms compute time  (130 Hz)
    512x512 image with  3x3 PSF/SE:  0.36 ms compute time (2750 Hz)

These timings are without memory transfers, which run somewhere between 1 GB/s and 3 GB/s host<->device. Keep in mind that the code operates only on floats (which I need for my application), but it could be significantly faster if modified to work on 8-bit integers and to batch memory operations in 128-bit chunks. Maybe one day...
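
A rough sketch of the "128-bit chunks" idea (hypothetical, not in the repo): each thread pulls one uint4, i.e. sixteen 8-bit pixels, in a single 16-byte load and unpacks them to floats. It assumes the 8-bit buffer is 16-byte aligned and a multiple of 16 pixels long:

// Hypothetical sketch, not in the repo: each thread reads one uint4
// (16 bytes = 16 eight-bit pixels) in a single 128-bit transaction and
// unpacks them to floats.  Assumes 16-byte alignment and numPixels being
// a multiple of 16 (a real version would handle the tail).
__global__ void unpack8bitTo32f(const uint4 *in8, float *out32f, int numPixels)
{
    int chunk = blockIdx.x * blockDim.x + threadIdx.x;  // 16 pixels per thread
    if (chunk * 16 >= numPixels)
        return;

    uint4 v = in8[chunk];                               // one 128-bit load
    unsigned int words[4] = { v.x, v.y, v.z, v.w };

    for (int w = 0; w < 4; ++w)
        for (int b = 0; b < 4; ++b)                     // little-endian bytes
            out32f[chunk * 16 + w * 4 + b] =
                (float)((words[w] >> (8 * b)) & 0xFFu);
}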

Let me know if you have any interest in developing or simply using this code.

Cheers, -Alan

On 08/29/2010 02:50 PM, Jon Nordby wrote:

The first thing you should do if you want an (open source) project to be able to use your code is to provide the code. Preferably in the form of a public source control repository. Without this first step, nothing can happen. :) [...]

Sven Neumann
2010-09-04 21:44:13 UTC (over 14 years ago)

GPU-accelerated Image Filtering w/ CUDA

On Thu, 2010-09-02 at 20:37 -0400, Alan wrote:

It sounds like CUDA is not ideal for GIMP itself, but individuals on this list might be personally interested in it (it is free, just not OSS, so it might be good for plugin development). I pushed my code into a git repo:

http://github.com/etotheipi/CUDA_Image_Convolution [...]

You might want to resend this offer to the gegl-developer list. It might be interesting to integrate your work with the gsoc2009-gpu branch.

Sven