Hi!
I'm afraid there is still some work to be done in the opencl branch. I've
been reorganizing the code a little and rebasing it whenever I can. But
there are two important things still missing, which I intend to do as soon
as possible:
1. implementing a gegl:over operator.
2. asynchronous OpenCL tasks instead of synchronizing after each one; this
requires a major rework of my code.
1. is relatively easy to do, but 2. is really needed if we want good
performance.
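For 1., something along these lines should do [a minimal sketch, assuming
premultiplied RGBA float pixels; the argument names are just placeholders]:

  __kernel void
  gegl_over (__global const float4 *in,    /* background */
             __global const float4 *aux,   /* foreground */
             __global       float4 *out)
  {
    int gid = get_global_id (0);
    float4 b = in[gid];
    float4 a = aux[gid];
    /* Porter-Duff "over" with premultiplied alpha:
       out = A + B * (1 - alpha(A)) */
    out[gid] = a + b * (1.0f - a.w);
  }

For 2., the idea is to stop synchronizing after every task and instead
chain non-blocking enqueues through events, with a single sync point at
the end of the chain [again just a sketch, error checking omitted]:

  cl_event ev[2];
  /* non-blocking upload */
  clEnqueueWriteBuffer (queue, buf_in, CL_FALSE, 0, size, host_in,
                        0, NULL, &ev[0]);
  /* the kernel waits on the upload through the event, not the host */
  clEnqueueNDRangeKernel (queue, kernel, 1, NULL, &global_size, NULL,
                          1, &ev[0], &ev[1]);
  /* non-blocking download, waits on the kernel */
  clEnqueueReadBuffer (queue, buf_out, CL_FALSE, 0, size, host_out,
                       1, &ev[1], NULL);
  /* one synchronization for the whole chain instead of one per task */
  clFinish (queue);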
---
Also, I have some questions on which I'd like the community's opinion, if
possible:
The current scheme, where I allocate as many tiles in the GPU as there are
in the image, is complicated for two reasons:
* GPUs have weird addressing modes, so it's hard to know if a tiled image
can fit in the GPU until I try to allocate a buffer there and it fails.
Also, the drivers are not optimized for this use case: they usually expect
a few large memory buffers, while we have many small ones. I've run into
some weird memory-management problems during this project.
* PCI transfers for small tiles have too much overhead; if we really want
good performance we need to use very big tiles [like 4096x4096], which
defeats the purpose of using tiles in the first place. So shouldn't we
just un-tile the processing region of the image for GPU processing?
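To give an idea of what I mean by un-tiling [just a sketch; roi_* and
linear_roi are made-up names]: instead of one transfer per tile, we'd
allocate and fill one big buffer for the whole region of interest, and
we'd find out right away whether it fits:

  size_t roi_bytes = roi_width * roi_height * 4 * sizeof (float);
  cl_int err;
  cl_mem roi_buf = clCreateBuffer (context, CL_MEM_READ_WRITE,
                                   roi_bytes, NULL, &err);
  if (err != CL_SUCCESS)
    {
      /* e.g. CL_MEM_OBJECT_ALLOCATION_FAILURE: the region doesn't fit,
         fall back to CPU processing [in practice the failure can also
         show up only when the buffer is first used] */
    }
  else
    clEnqueueWriteBuffer (queue, roi_buf, CL_FALSE, 0, roi_bytes,
                          linear_roi, 0, NULL, NULL);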
I know the main point of the current scheme is to avoid transferring
memory back and forth to the GPU for each operation in a link. But I'd
like someone to take a look at the current state of the code in my branch.
The code is very complex and requires locking for each operation on a tile
[because it's possible for CPU and GPU data to be out of sync]; maybe we
should just keep it simple and do what the Darktable guys have been doing
[which is the simple scheme I mentioned].
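The simple scheme would look roughly like this [a pseudocode-ish sketch;
gegl_cl_run_op and swap_buffers are made-up helpers]:

  /* upload the un-tiled input region once */
  clEnqueueWriteBuffer (queue, buf_a, CL_FALSE, 0, roi_bytes,
                        host_in, 0, NULL, NULL);
  /* run every operation in the link on the GPU, ping-ponging
     between two buffers, with no intermediate PCI transfers */
  for (op = first_op; op != NULL; op = op->next)
    {
      gegl_cl_run_op (queue, op, buf_a, buf_b);  /* made-up helper */
      swap_buffers (&buf_a, &buf_b);             /* made-up helper */
    }
  /* download the result once at the end */
  clEnqueueReadBuffer (queue, buf_a, CL_TRUE, 0, roi_bytes,
                       host_out, 0, NULL, NULL);

No per-tile locking is needed here, since the GPU owns the data for the
whole chain.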
Also, there are some new processors on the market with integrated GPUs,
which should greatly reduce this memory problem. But I'm not sure about
this.
Pros:
* Predictable GPU memory use
* Less overhead in PCI transfers and GPU processing
* Much simpler code
Cons:
* We have to bring data back and forth to the GPU for each operation
So, the question is: I have to do a major rework of the code anyway, and I
have no problem doing it the way I explained, which I believe is less
error-prone. But what do you guys think about it?
bye!
Victor Oliveira