Perf tools evaluation and proposal
Perf tools evaluation and proposal | Henrik Akesson | 17 Jun 11:37
Perf tools evaluation and proposal | Boudewijn Rempt | 17 Jun 11:52
Perf tools evaluation and proposal | Henrik Akesson | 17 Jun 12:44
Perf tools evaluation and proposal | jcupitt@gmail.com | 20 Jun 23:15
Perf tools evaluation and proposal | Henrik Akesson | 22 Jun 11:38
Perf tools evaluation and proposal | jcupitt@gmail.com | 22 Jun 12:00
Perf tools evaluation and proposal
Hi,
here's the promised evaluation of the current profiling tools and a proposal.
Regards,
Henrik
TOOLS
Valgrind - retained
Oprofile - retained
gprof, sprof - obsoleted by oprofile
ltrace, ptrace - not capable of profiling dynamically loaded objects (dlopen-ed)
GEGL instrumentation - the same functionality is achieved by the proposal below.
Further info: http://sites.google.com/site/computerresearcher/profiling-tools/
################################################################################
PART ONE - VALGRIND - Where the time is spent
################################################################################
Current tools are capable of producing an abundance of profiling data.
Callgrind (with Cachegrind activated) will produce 13 different
measurements for every line of code that has been executed during the
program run:
1 on how many times the line has been executed
8 on cache use
4 on branching
Running callgrind on a command line gegl program that runs gaussian-blur produces profiling data for 149 different objects (89 of which are gegl operations that get loaded)!!!
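For illustration, here is a rough sketch of how such a run could be driven from a ruby script (the gegl command line and file names are just placeholders, not the actual test case):

  #!/usr/bin/env ruby
  # Sketch: run a gegl benchmark under callgrind with cache and branch
  # simulation switched on, so that all 13 per-line counters are produced.
  benchmark = "gegl blur-test.xml -o /tmp/out.png"   # placeholder test case

  cmd = ["valgrind",
         "--tool=callgrind",
         "--cache-sim=yes",                  # the 8 data/cache counters
         "--branch-sim=yes",                 # the 4 branch counters
         "--callgrind-out-file=callgrind.out.%p",
         benchmark].join(" ")

  system(cmd) or abort "callgrind run failed"
  # callgrind_annotate callgrind.out.<pid> then lists the counts per line.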
NOTE: A serious limitation of Valgrind is that it can only count events; it cannot tell you how much time things take (such as a cache miss or the execution of an instruction). This is because it heavily modifies the code before running it, which renders any time measurements useless.
I propose to implement a tool that allows the user to (step 1) select
the data he is interested in and (step 2) present the results in an
easy-to-understand way. (A rough sketch of the selection step follows
the three step lists below.)
Step 1 - SELECTION - user workflow:
1a) Select the libraries of interest
1b) Select the entry/exit function (normally the main function), i.e.
only data measured inside this function (including calls to other
functions) is displayed
1c) Hotpath elaboration, i.e. display and selection of the code
execution path of interest. (TBD)
Step 2 - EVALUATION - workflow:
2a) Code annotation. I.e. display the above selected code with measurements
2b) Trend display
Step 3 - MANAGEMENT - workflow:
3a) Adding new data (cmd line and web)
3b) Adding new evaluation scenarios (web)
3c) Listing data
3d) Deleting data
3e) Listing scenarios
3f) Deleting scenarios
3g) Adding scenarios
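As a rough sketch of the selection step (1a), the per-object part could be as simple as summing the counters in the callgrind output file. This is only illustrative: it ignores callgrind's name compression and relative-position shorthand, and step 1b (inclusive cost of an entry function) would additionally need the calls= records to be followed.

  # usage: ruby sum_by_object.rb callgrind.out.<pid>
  events     = []
  current_ob = nil
  skip_next  = false     # the cost line after "calls=" is inclusive call cost
  totals     = {}

  File.foreach(ARGV[0]) do |line|
    case line
    when /^events:\s*(.*)/ then events = $1.split
    when /^ob=(.*)/        then current_ob = $1.strip
    when /^calls=/         then skip_next = true
    when /^\d+(\s+\d+)+\s*$/             # "<line> <cost> <cost> ..." record
      if skip_next
        skip_next = false
      else
        costs = line.split.drop(1).map { |c| c.to_i }
        totals[current_ob] ||= Array.new(events.size, 0)
        costs.each_with_index { |c, i| totals[current_ob][i] += c }
      end
    end
  end

  totals.each do |ob, sums|
    puts ob
    events.zip(sums) { |ev, n| puts "  #{ev}: #{n}" }
  end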
################################################################################
PART TWO - OPROFILE - What the processor is doing
################################################################################
An important limitation of the processor's performance counters is that, out of a choice of about 100 events, only 2-8 (depending on processor make and model) can be used at the same time. There have been some attempts by researchers to multiplex them, but this requires modifying kernel code and is thus not accessible to the ordinary mortal developer.
I propose to:
a) Assemble all the important data into the database by repeatedly
running the test/performance case (sketched below)
b) Possibly use statistics in order to obtain reliable data (i.e.
determine which distribution the samples follow and use this during
data collection). TBD.
c) Add groups of annotation data to point 2a above. Basically groups
such as "L1 Cache use", "L2 Cache use", "Vector extensions" etc...
Normally the data should be stored internally in the same format as for valgrind. This means that only an import tool needs to be written in order to add oprofile data.
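As a very rough sketch of point a), the data collection could look something like the following. The event names are just examples for one CPU family, and the opcontrol calls assume root and a prior --no-vmlinux/--vmlinux setup; this is not a finished design:

  # Cycle through groups of hardware events, rerunning the same benchmark,
  # since only a few counters are available at the same time.
  EVENT_GROUPS = {
    "l1-cache" => ["L1D_REPL:100000", "L1I_MISSES:100000"],
    "l2-cache" => ["L2_LINES_IN:100000"]
  }
  BENCHMARK = "gegl blur-test.xml -o /tmp/out.png"   # placeholder

  EVENT_GROUPS.each do |group, events|
    event_args = events.map { |e| "--event=#{e}" }.join(" ")
    system("opcontrol --reset")
    system("opcontrol --setup #{event_args}")
    system("opcontrol --start")
    system(BENCHMARK)
    system("opcontrol --stop")
    system("opcontrol --dump")
    # opreport's per-symbol output is what would be imported into the database.
    system("opreport --symbols > report-#{group}.txt")
  end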
################################################################################
IMPLEMENTATION DETAILS
################################################################################
I propose a web-based implementation using JRuby and Ruby on Rails together with an SQL database (potentially an object database such as db4o).
For the code annotation, I propose to use the existing subversion repository for retrieving the code to be annotated, and GNU Source-highlight.
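To make this slightly more concrete, here is a rough sketch of the storage side in Rails; the models, columns and the Rails-2-style query below are only illustrative, not a decided schema:

  # Illustrative ActiveRecord models; table and column names are placeholders.
  class Run < ActiveRecord::Base
    # columns: revision (svn revision), scenario, tool ("callgrind"/"oprofile"), run_at
    has_many :measurements
  end

  class Measurement < ActiveRecord::Base
    # columns: run_id, object (shared library), file, line, event, value
    belongs_to :run
  end

  # Trend sketch: total of one counter per svn revision for a given scenario.
  def trend(scenario, event = "Ir")
    Measurement.sum(:value,
      :joins      => :run,
      :group      => "runs.revision",
      :conditions => ["runs.scenario = ? AND measurements.event = ?",
                      scenario, event])
  end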
Perf tools evaluation and proposal
On Wed, 17 Jun 2009, Henrik Akesson wrote:
I propose to implement a tool that allows the user to (step 1) select the data he is interested in and (step 2) present the results in an easy-to-understand way.
Doesn't kcachegrind provide all of this for you?
Boudewijn
Perf tools evaluation and proposal
The main differences between my solution and KCachegrind are:
I would use a database approach that allows performance history to be displayed. I.e. in the case of a drop in performance, a developer would be able to localise it to a certain commit (see the small sketch after this list). Also useful for comparing the performance of two different solutions during development.
The aggregation of counters, IMHO, is very important for understanding what the processor is doing. It is a very difficult task, and just having data from 2-8 counters is like providing only a small window onto a big, complex image.
I think you could summarise KCachegrind as a tool that shows you all the data, whereas the approach I propose is to show as little data as possible in order to localise performance issues and, once one is found, to show the full picture of what the processor is doing.
IMHO KCachegrind is neither easy to use nor easy to understand (e.g. what does "Distance 6-11 (6)" mean? etc...)
My solution is web based.
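To make the "localise it to a certain commit" point concrete, here is a tiny standalone sketch with made-up numbers:

  # Given per-revision totals for one counter (e.g. instruction count),
  # flag the commits where the value jumps by more than 5%.
  def regressions(totals_by_revision, threshold = 1.05)
    points = totals_by_revision.sort_by { |rev, _| rev }
    points.each_cons(2).
      select { |(_, prev), (_, cur)| cur > prev * threshold }.
      map    { |_, (rev, _)| rev }
  end

  # regressions(3010 => 1_200_000, 3011 => 1_210_000, 3012 => 1_900_000)
  # => [3012]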
I believe that these differences are enough to justify a new tool. Are they to you?
Henrik
2009/6/17 Boudewijn Rempt :
On Wed, 17 Jun 2009, Henrik Akesson wrote:
I propose to implement a tool that allows the user to (step 1) select the data he is interested in and (step 2) present the results in an easy-to-understand way.
Doesn't kcachegrind provide all of this for you?
Boudewijn
Perf tools evaluation and proposal
Hi Henrik,
2009/6/17 Henrik Akesson :
I believe that these differences are enough to justify a new tool. Are they to you?
OK, though I hope the project is not becoming too ambitious. One of the reasons we all liked your original proposal so much was that it seemed focused and clearly achievable. I would be slightly concerned that you are maybe looking too much at the presentation of results and not enough at getting the numbers to start with.
How about as a next step setting up a small test suite, a series of automated measurements, and a simple reporting generator? Leave the fancy web interface until the basic tests and measurements are working. This would also give you something concrete to show at the mid-term evaluation.
John
Perf tools evaluation and proposal
John,
Hmm, I'm not sure I've expressed myself clearly...
Currently, I have a simple ruby test harness and a gaussian-blur test that I use to generate test data with Valgrind. For the valgrind part, I plan on using all the measurements callgrind + cachegrind can provide. Oprofile is much more complicated (as there's a choice of roughly 80 measurements and because it's based on statistical sampling).
This is why I focus on valgrind for mid-term. Paying attention to your concern, I propose to start with a very simple web interface (using text when possible instead of fancy graphics etc.) and then work towards improving it.
However, I will not have the time to set up a test suite before mid-term. Therefore I propose to start that ASAP after the mid-term.
I intend to produce a fully functional suite based on Valgrind before starting with oprofile, as I consider the latter the "high-risk" part of the project. That way I can guarantee you a working tool-set using valgrind, and hopefully also something that can provide the "bells and whistles" of oprofile.
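For reference, a minimal sketch of the kind of harness I mean (heavily simplified; the benchmark command, file names and directory layout are placeholders):

  # Run one benchmark under callgrind and file the result under the current
  # svn revision, so runs can later be compared across commits.
  require 'fileutils'

  revision  = `svnversion .`.strip
  benchmark = "gegl gaussian-blur-test.xml -o /tmp/out.png"   # placeholder
  outfile   = "callgrind.out.tmp"

  system("valgrind --tool=callgrind --cache-sim=yes --branch-sim=yes " \
         "--callgrind-out-file=#{outfile} #{benchmark}") or abort "run failed"

  FileUtils.mkdir_p("results/#{revision}")
  FileUtils.mv(outfile, "results/#{revision}/gaussian-blur.callgrind")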
Does this seem reasonable to you?
Henrik
2009/6/20 :
Hi Henrik,
2009/6/17 Henrik Akesson :
I believe that these differences are enough to justify a new tool. Are they to you?
OK, though I hope the project is not becoming too ambitious. One of the reasons we all liked your original proposal so much was that it seemed focused and clearly achievable. I would be slightly concerned that you are maybe looking too much at the presentation of results and not enough at getting the numbers to start with.
How about as a next step setting up a small test suite, a series of automated measurements, and a simple reporting generator? Leave the fancy web interface until the basic tests and measurements are working. This would also give you something concrete to show at the mid-term evaluation.
John
Perf tools evaluation and proposal
Hi again Henrik,
2009/6/22 Henrik Akesson :
Currently, I have a simple ruby test harness and a gaussian-blur test that I use to generate test data with Valgrind.
Ah, OK, sorry, I'd only seen the report, I didn't realise you had something concrete going already. I was worried there might not be much to see at mid-term.
Your proposal sounds good to me, well done.
However, I will not have the time to set up a test suite before mid-term. Therefore I propose to start that ASAP after the mid-term.
On the performance tests, a scenario I've used in the past is described here:
http://www.vips.ecs.soton.ac.uk/index.php?title=Benchmarks
It's derived from a real application which took high-resolution master images off a server and generated files for printing on a large-format inkjet. It might be a bit fiddly to implement. A much simpler version is here:
http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use
This is just: load, crop, shrink, sharpen, save. It's not very demanding, but it is very easy to implement, and typical of applications like Picasa or F-Spot. It might be helpful I guess.
I'm sure the gimp list could suggest an application benchmark that would be typical of GEGL use in Gimp.
John