Test plan
Test plan | yahvuu | 05 Jul 20:09 |
Test plan | Øyvind Kolås | 05 Jul 21:49 |
Test plan | yahvuu | 06 Jul 14:26 |
Test plan | yahvuu | 06 Jul 15:46 |
Test plan
Hi,
here are some thoughts about a static test framework for gegl operations.
* Functional Tests These can be as simple as a set of files including a composition, reference output and a test description. A perl/ruby script feeds the composition through the gegl binary and compares the output with reference data. Optionally some PNGs + HTML are created for visual inspection.
Test case:
  foo.xml          composition
  foo.xxx          reference data; format to be determined
  foo.specs.yaml   test specification
  foo.input1.xxx   optional input data

Output:
  output/foo.xxx          compositing result
  output/foo.result.yaml  pass/fail info + basic statistical measures
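For concreteness, a hypothetical foo.xml might look roughly like this. The XML syntax is sketched from memory of GEGL's format and may differ in detail; the operation names and parameter spelling are illustrative, not authoritative:

```xml
<?xml version='1.0' encoding='UTF-8'?>
<gegl>
  <!-- nodes are listed output-first: gegl:load feeds gegl:invert -->
  <node operation='gegl:invert'/>
  <node operation='gegl:load'>
    <params>
      <param name='path'>foo.input1.xxx</param>
    </params>
  </node>
</gegl>
```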
Optional report output:
report/foo.output.png
report/foo.reference.png
report/foo.diff.png
report/foo.html
The test specification foo.specs.yaml looks like:

  comment: Regression test for Bug #1234
  errorbounds:
    peak-absolute: 1e-6
The result foo.result.yaml looks something like:

  testname: foo
  result: pass
  errors:
    peak-absolute: 1e-10
    mean-square: 1e-15
    differing-pixels: 123
    [...]
  timing:
    total: 500  # ms
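As an illustration of the pass/fail logic, the statistics above could be derived like this. A Ruby sketch in the spirit of the proposed perl/ruby runner; all names are hypothetical:

```ruby
# Compare rendered output against reference data, both given as flat
# arrays of float samples, and derive the fields for foo.result.yaml.
def compare(output, reference, bounds)
  diffs = output.zip(reference).map { |o, r| (o - r).abs }
  peak  = diffs.max
  mse   = diffs.map { |d| d * d }.sum / diffs.length
  {
    'result'           => (peak <= bounds['peak-absolute'] ? 'pass' : 'fail'),
    'peak-absolute'    => peak,
    'mean-square'      => mse,
    'differing-pixels' => diffs.count { |d| d > 0.0 },
  }
end
```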
The data format .xxx should be a float format understood by ImageMagick; candidates so far are MIFF, PFM, FITS and TIFF.
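Of those, PFM is simple enough to read and write directly should the runner ever need to bypass ImageMagick. A minimal grayscale sketch (row ordering and channel handling are left to the caller):

```ruby
# Minimal PFM (Portable Float Map) I/O for grayscale data: one float32
# per sample, little-endian (signalled by a negative scale factor).
def write_pfm(path, width, height, samples)
  File.open(path, 'wb') do |f|
    f.write("Pf\n#{width} #{height}\n-1.0\n")  # Pf = grayscale, -1.0 = little-endian
    f.write(samples.pack('e*'))                # 'e' = float32, little-endian
  end
end

def read_pfm(path)
  File.open(path, 'rb') do |f|
    raise 'not a grayscale PFM' unless f.gets.strip == 'Pf'
    width, height = f.gets.split.map(&:to_i)
    raise 'big-endian PFM not handled in this sketch' if f.gets.to_f > 0
    [width, height, f.read.unpack('e*')]
  end
end
```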
Reference data is most easily created using Octave. Probably a 'make reference' target will generate them from some ground truth m-files.
* Unit Tests Unit tests for operations are most comfortably expressed as functional tests. However, some tests are common to all/many operations:
- rendering must be independent of tile size
- operations with spatial parameters should be scale invariant, i.e. applying an operation to a downscaled composition, using parameters downscaled by the same factor, should give a good approximation of applying the unscaled operation to the unscaled composition and downscaling afterwards
- neutral settings
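The tile-size check can be made generic. Sketched here in Ruby on a toy point operation (standing in for driving the real gegl binary at different tile sizes), the idea is simply: render whole, render tile by tile, require identical output. A naively tiled area op would fail this immediately:

```ruby
# Whole-buffer rendering of a per-sample operation.
def render(samples, &op)
  samples.map(&op)
end

# The same operation applied tile by tile.
def render_tiled(samples, tile_size, &op)
  samples.each_slice(tile_size).flat_map { |tile| tile.map(&op) }
end

# Pass only if every tested tile size reproduces the whole-buffer result.
def tile_independent?(samples, op, tile_sizes = [1, 3, 64])
  whole = render(samples, &op)
  tile_sizes.all? { |t| render_tiled(samples, t, &op) == whole }
end
```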
Somewhat unrelated, I'd also like to set up a round-trip test for GEGL's XML format, but haven't found a good way to deal with value rounding yet.
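One way to sidestep the rounding problem, at least for the serializer's own output, is to print doubles with enough digits that parsing recovers the exact value: 17 significant decimal digits suffice for IEEE 754 doubles, so serialize-then-parse becomes the identity and the round-trip test can compare for exact equality. A Ruby sketch (function names hypothetical):

```ruby
# Round-trip-safe float serialization: 17 significant digits are enough
# to reconstruct any IEEE 754 double exactly.
def serialize(v)
  format('%.17g', v)
end

def roundtrips?(v)
  serialize(v).to_f == v
end
```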
Comments, other things to consider, do's and don'ts? Is somebody already working on this?
peter
Test plan
On Sat, Jul 5, 2008 at 7:09 PM, yahvuu wrote:
here are some thoughts about a static test framework for gegl operations.
* Functional Tests These can be as simple as a set of files including a composition, reference output and a test description. A perl/ruby script feeds the composition through the gegl binary and compares the output with reference data. Optionally some PNGs + HTML are created for visual inspection.
GEGL should be capable of hosting its own inspection framework for the generated images. Although not advertised as such, GeglBuffer serializes itself to a specified file format structure, for both tiled and linear buffers, in any format supported by babl.
This is more or less what the current gallery is a start of. At the moment this gallery (sometimes augmented with development- or developer-specific examples) is one of the primary methods used to test for both functional and performance regressions. Comparisons are done manually, though.
Other, better regression tests in a similar mindset exist for the functionality of GeglBuffer (this also being an ad-hoc test set, but at least comparisons and regressions are detected automatically).
[snip]
Reference data is most easily created using Octave. Probably a 'make reference' target will generate them from some ground truth m-files.
This would mean reimplementing all the operations in Octave as well. I do not see the purpose of this: once an operation is implemented as readable C source using floating point and has yielded a correct result, that result should be added to version control (but probably not the tarballs) to prevent future regressions.
See also bug #427532 for other thoughts about testing of operations.
http://bugzilla.gnome.org/show_bug.cgi?id=427532
* Unit Tests
Unit tests for operations are most comfortably expressed as functional tests. However, some tests are common to all/many operations:
- rendering must be independent of tile size
The core implementation of operations should not be implemented using APIs that allow specifying the tile size; further regression tests of GeglBuffer should be added to the GeglBuffer test suite (which doesn't even need to make use of GeglNode/GeglOperation etc.).
- operations with spatial parameters should be scale invariant, i.e. applying an operation to a downscaled composition, using parameters downscaled by the same factor, should give a good approximation of applying the unscaled operation to the unscaled composition and downscaling afterwards
- neutral settings
One idea I've had in the past with regard to example compositions for the ops is adding the snippet of XML itself inside the .c file (maintaining the idea that for most ops a rather small .c file should be a self-contained entity in the source tree). This embedded composition could have a small set of default images available for input. The resulting rendering could be displayed in the autogenerated operation reference.
/Øyvind K.
Test plan
Øyvind Kolås wrote:
GEGL should be capable of hosting its own inspection framework for the generated images. Although not advertised as such, GeglBuffer serializes itself to a specified file format structure, for both tiled and linear buffers, in any format supported by babl.
I'm not sure I fully understand what you imply by this. It is nice to have the comparison routines inside GEGL so they can be shared with the dynamic test system as sketched in bug #427532. On the other hand, duplicating these few routines outside GEGL makes it possible to largely decouple the test runner from the GEGL API.
The test runner which crawls the directories should IMHO be an external tool, preferably written in a dynamic language, at least until the test specification format has settled.
I think it should be easy to add new test cases (following the route of turning each bug into a test) and it should be easy to inspect the test suite, that is, to spot missing tests and avoid redundant ones. An optional detailed HTML report including images would be useful for that.
If I read gegl_buffer_save() correctly, it uses a custom serialization format. Using this format makes it hard to utilize externally generated reference data, limiting the test suite to pure regression tests.
[..]
This would mean reimplementing all the operations in Octave as well. I do not see the purpose of this: once an operation is implemented as readable C source using floating point and has yielded a correct result, that result should be added to version control (but probably not the tarballs) to prevent future regressions.
Not using independent reference data effectively degrades testing to visual inspection. Chances are low that plug-in authors actually check the values by hand, so the tests become blind to a large class of potential errors.
Octave is just an example here, as it allows very concise implementations. For linear filters, this boils down to specifying the convolution matrix and calling a well-tested generic filter function - just a couple of lines. I don't think there will be Octave equivalents for each and every op, though.
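The same conciseness is available in the runner's own language. A generic 1-D convolution, sketched here in Ruby with a clamp-to-edge boundary policy (an assumption, not a GEGL convention), is enough to generate reference data for simple linear filters:

```ruby
# Generic 1-D convolution with clamp-to-edge borders: a reference-data
# generator for simple linear filters.
def convolve(signal, kernel)
  half = kernel.length / 2
  signal.each_index.map do |i|
    kernel.each_with_index.sum do |k, j|
      idx = (i + j - half).clamp(0, signal.length - 1)
      k * signal[idx]
    end
  end
end

# A 3-tap box blur is then just a kernel definition:
box3 = [1.0 / 3, 1.0 / 3, 1.0 / 3]
```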
- rendering must be independent of tile size
The core implementation of operations should not be implemented using APIs that allow specifying the tile size
So why not test the ops for compliance with this advice? The test will be generic and cheap to implement, though not cheap in terms of CPU cycles.
Even the non-optimized op implementations offer a surprising number of potential pitfalls. As an example, the Gaussian IIR filter features heavy ringing at radius=0, resulting in outlined tile borders. Although not strictly a bug, this needs special casing. And it is all too easy to overlook such problems.
One idea I've had in the past with regard to example compositions for the ops is adding the snippet of XML itself inside the .c file (maintaining the idea that for most ops a rather small .c file should be a selfcontained entity in the sourcetree).
Yeah, self-documenting is a good thing. I doubt you want to ship comprehensive test suites within the plugins, though.
greetings, peter
Test plan
yahvuu wrote:
Octave is just an example here, as it allows very concise implementations. For linear filters, this boils down to specifying the convolution matrix and calling a well-tested generic filter function - just a couple of lines. I don't think there will be Octave equivalents for each and every op, though.
Uh, that was a little imprecise. I don't intend to implement the full operations in Octave. It's just that Octave is a convenient test-data generator. And in some cases, like linear filters, these small generators will come close to the full functionality as a side effect.
peter