frontend to help classify images

This discussion is connected to the gimp-user-list.gnome.org mailing list which is provided by the GIMP developers and not related to gimpusers.com.

frontend to help classify images h121@ied.com 08 May 20:02
frontend to help classify images Helge Hielscher 08 May 20:43
  frontend to help classify images Shawn Willden 08 May 22:37
h121@ied.com
2004-05-08 20:02:32 UTC (over 20 years ago)

frontend to help classify images

Short:
I'm looking for a front-end tool to help me process (qualify and classify / catalog) about 10,000 scanned images.

Long:

Hi,

I've promised someone to process a bunch (10,000) of images, and am realizing it might not be as simple as I hoped.

I'm soliciting, in this forum of image processing experts, experiences and suggestions for possible solutions. I am a complete newbie to image processing, with zero experience (I know Gimp exists), so please don't think that I know what I want. I only have an idea of what the ideal target should roughly look like.

Before I start, first a meta-question: is this a good forum to ask this in? Which forum(s) would be good for these types of questions? (I've noticed that gimp-perl has on average only 1.5 posts/mth)

I'm looking for some tool(s) that would help me qualify and classify / catalog a bunch of images. I can easily build myself the database structures I need in MySQL or PostgreSQL. I'm having trouble finding the frontend tool which would allow me to view (and manipulate a little) the image and would be an effective data entry tool for these qualifications / classifications. I'm not so concerned at this point about the viewing of the images once they are all processed, although I imagine that the same tool might also be used for viewing these images at the end. I would prefer to run it all on linux, but would settle for a Windows front-end, if necessary. I know Javascript, PHP & Perl if some integration is needed. (I also know C++ and Java, but would prefer not to use them for this, if possible.)

I have about 4,000 photos w/EXIF info, but I'll write about those at the bottom of this post. More importantly, I have about 7,000 B&W (dithered) scans of docs of various contrasts (sometimes light gray), all of them are text (no photos). I currently have them all (99%) in pixel format (png and pbm), about 1% are jpegs. None are multipage scans, all are single-image scans.

I need to classify them in several "dimensions", but elements / attributes of those dimensions may vary based on the type of content the document carries;

I need to build a searchable database, so I can find them by specifying criteria in one or more dimensions. E.g. "all expense docs from `Botanical Gardens' involving period June 23, 2003 to July 23, 2003", and a set of 140 image files would fall out for display / browsing.
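A minimal sketch of such a search, using Python's built-in sqlite3 as a stand-in for MySQL or PostgreSQL. The table layout (`docs`, `scans`), the column names and the sample rows are assumptions made for illustration, and "involving period" is read here as "the document's period overlaps the requested range".

```python
import sqlite3

# Hypothetical schema: one row per logical document, one row per scanned page.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (
    id           INTEGER PRIMARY KEY,
    from_entity  TEXT,     -- entities dimension, "from which entity"
    is_expense   INTEGER,  -- expense flag, 1 or 0
    period_start TEXT,     -- ISO dates, which sort correctly as text
    period_end   TEXT
);
CREATE TABLE scans (
    id     INTEGER PRIMARY KEY,
    doc_id INTEGER REFERENCES docs(id),
    path   TEXT
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?, ?)", [
    (1, "Botanical Gardens", 1, "2003-06-25", "2003-07-10"),
    (2, "Botanical Gardens", 0, "2003-06-25", "2003-07-10"),
    (3, "Botanical Gardens", 1, "2004-01-01", "2004-01-31"),
])
conn.executemany("INSERT INTO scans (doc_id, path) VALUES (?, ?)", [
    (1, "scan0001.png"), (1, "scan0002.png"),
    (2, "scan0003.png"), (3, "scan0004.png"),
])

def find_scans(conn, entity, first_day, last_day):
    """Return paths of expense scans from `entity` whose period overlaps the range."""
    rows = conn.execute(
        """SELECT s.path FROM scans s JOIN docs d ON d.id = s.doc_id
           WHERE d.is_expense = 1 AND d.from_entity = ?
             AND d.period_start <= ? AND d.period_end >= ?
           ORDER BY s.path""",
        (entity, last_day, first_day))
    return [r[0] for r in rows]

print(find_scans(conn, "Botanical Gardens", "2003-06-23", "2003-07-23"))
```

Each further dimension would become another column or another joined table, and each search criterion another condition in the WHERE clause.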

I would really hope to have a frontend which would be fully controllable via kbd, just because kbd is so much faster to use than mouse (for most things (*1))

Key  Meaning                                                    Flag
a    "This is another page from the same doc. Write it into
     the DB and display next scan."
b    "This page is blank - doesn't contain any information."   6.1
n    "This scan is a page of another doc. Close the previous
     logical doc."
d    "Add this page to a doc that has been created before."
s    "Start a new doc."
f    "This page pertains to finances."                          6.2
c    "This page pertains to finances / income."                 6.2.1
e    "This page pertains to finances / expense."                6.2.2
l    "This page pertains to legal."                             6.3
i    "This page pertains to info."                              6.4
...

(*1) - mouse comes in handy for only two actions: see G4 and G7 below
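The key bindings above could be driven by a small dispatch table. This is only a sketch under assumptions: the `Session` class and its fields are hypothetical, "n" and "s" are both simplified to "start a new doc with this scan", and "d" is left out because it would need a way to pick the earlier doc.

```python
from dataclasses import dataclass, field

# Keys that only set a flag on the current scan (names follow the flag list).
FLAG_KEYS = {"b": "blank", "f": "financial", "c": "income",
             "e": "expense", "l": "legal", "i": "info"}

@dataclass
class Session:
    """Tracks logical docs (lists of scans) and per-scan flags."""
    docs: list = field(default_factory=list)   # each doc is a list of scan names
    flags: dict = field(default_factory=dict)  # scan name -> set of flag names

    def press(self, key, scan):
        """Apply one keystroke to the scan currently on screen."""
        if key in FLAG_KEYS:
            self.flags.setdefault(scan, set()).add(FLAG_KEYS[key])
        elif key in ("s", "n"):
            # Simplification: both keys open a new logical doc with this scan.
            self.docs.append([scan])
        elif key == "a":
            # Another page of the doc that is currently open.
            if not self.docs:
                self.docs.append([])
            self.docs[-1].append(scan)
        else:
            raise KeyError(f"unbound key: {key!r}")

session = Session()
for key, scan in [("s", "scan1.png"), ("f", "scan1.png"), ("e", "scan1.png"),
                  ("a", "scan2.png"), ("n", "scan3.png")]:
    session.press(key, scan)
print(session.docs)
```

Keeping the bindings in a plain dictionary means new flags can be added without touching the dispatch logic.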

So I guess I would be looking for a "graphical engine", or "display engine" capable of (hopefully fast) display and manipulation of images. Separate zoomed window for fine navigation would be a nice extra. It would be nice if it would have combo boxes for choosing / adding items (see dimension 4 below), where the selection of items narrows down as you type lookup codes / starting letters of the entities. ( see point 4 below )
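The narrowing-as-you-type behaviour of such a combo box can be sketched with a prefix search over a sorted list. The function name `narrow` and the sample entity names are assumptions; a real widget would call something like this on every keystroke.

```python
import bisect

def narrow(sorted_names, typed):
    """Return the entries of a sorted, lower-cased list that start with `typed`."""
    prefix = typed.lower()
    lo = bisect.bisect_left(sorted_names, prefix)
    # "\uffff" sorts after any ordinary character, so this marks the end
    # of the run of names sharing the prefix.
    hi = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]

entities = sorted(["botanical gardens", "bank", "bookstore", "city hall"])
print(narrow(entities, "b"))
print(narrow(entities, "bo"))
```

Because the list stays sorted, each lookup is a binary search rather than a scan, which matters little for a few hundred entities but keeps the interface responsive as the list grows.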

If worse comes to worst, I would settle for this whole thing being done in javascript (I found that it is possible to draw lines / rectangles in javascript - see maptuit.com: http://tremblant.www.maptuit.com/corporate/testdrive/getamap.html). But using a browser and javascript for this image manipulation would be terribly slow, probably ugly, and I would hate it if I had to use MS Explorer's extensions :-( Not to mention that I have no idea how I could do an 8x-zoom popup with fine mouse control in Javascript. Plus, browsers don't really allow for easy image panning.

Below is what I think my wishlist should be. But then again, I'm new to image processing ...

Thanks,

John

This is what I imagine the graphical engine should be able to do:

G1 fit-to-window

G2 fit-width-to-window

G3 1-to-1 pixel zoom

G4 8-to-1 pixel zoom (in a smaller window - see G7)

G5 mouse movement in the above three items moves the image, so whole page could be quickly visually scanned for defects

G6 ability to specify areas (mostly rectangular, possibly occasionally rotated) of an image [ this would tango with the system feature 2.5 below - ability to treat these areas as separate scans (as pieces of different documents) ]

G7 fine-navigation: nice extra: when the Control key or something is pressed, a fine-navigation (8x zoomed) window pops up on the side, and mouse movement is 8x finer - allows for specifying the fine rotation angle (1.7.2) by means of clicking on two points which *should* be in a straight horizontal or vertical line on the original

G8 another nice extra: "increase contrast" algorithm - in a B/W or dithered picture: draw a 2 or 3-pixel wide line between pixels that are less than distance X apart (this will enhance). This is just my formulation of what a "contrast enhancing" algorithm should do. Or another algorithm with similar effect: if a pixel has another pixel less than distance X away, turn other pixels black in its 2 or 3-pixel diameter.
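The second formulation in G8 can be sketched in plain Python on a bitmap stored as a list of rows, with 1 for black and 0 for white. The function names, the use of Euclidean distance and the default values are assumptions; an isolated black pixel (likely noise) is deliberately left unthickened.

```python
def has_neighbour(img, y, x, max_dist):
    """True if another black pixel lies closer than max_dist (an integer) to (y, x)."""
    h, w = len(img), len(img[0])
    for dy in range(-max_dist, max_dist + 1):
        for dx in range(-max_dist, max_dist + 1):
            if (dy or dx) and dy * dy + dx * dx < max_dist * max_dist:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and img[yy][xx]:
                    return True
    return False

def thicken(img, max_dist=3, radius=1):
    """Return a copy in which every black pixel that has a close black
    neighbour also blackens the square of side 2*radius+1 around it."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] and has_neighbour(img, y, x, max_dist):
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out

page = [[0] * 9 for _ in range(5)]
page[2][1] = page[2][3] = 1   # two dither dots of the same stroke
page[0][8] = 1                # an isolated speck
for row in thicken(page):
    print("".join("#" if p else "." for p in row))
```

With `radius=1` the result corresponds to the 3-pixel diameter mentioned above. This pure-Python loop is far too slow for full 300 or 600 dpi pages and is meant only to pin down the idea.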

The dimensions would be:

1 picture quality dimension:
1.2 resolution : 300 ? 600 ? other ?
1.3 lineart or dithered ?
1.4 legible scan ?
1.5 the whole page is scanned ? or are parts / edges missing?
1.6 needs re-scan ?
1.7 needs post-processing ?
1.7.1 rotation by X*90 degrees
1.7.2 rotation by Y*0.1 degrees
1.7.3 increasing "contrast" (difficult with B&W/dithered pics)
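The fine rotation in 1.7.2, picked by clicking two points as described in G7, comes down to one arctangent. A minimal sketch, assuming pixel coordinates given as (x, y) tuples; the function name is hypothetical, and the sign of the correction to apply depends on whether the viewer's y axis points up or down.

```python
import math

def skew_angle(p1, p2):
    """Deviation, in degrees rounded to 0.1, of the line p1 -> p2 from the
    nearest horizontal or vertical direction."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx))
    # Fold the angle into the range [-45, 45) around the nearest multiple of 90.
    deviation = (angle + 45.0) % 90.0 - 45.0
    return round(deviation, 1)

# Two clicks along a text baseline that rises 10 pixels over 1000 pixels.
print(skew_angle((0, 0), (1000, 10)))
```

The farther apart the two clicked points are, the less a one-pixel clicking error affects the angle, which is presumably why the 8x fine-navigation window is useful here.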

2 document structure dimension: (2.1 to 2.3 erased)
2.4 which scan is the chapter title page, if any ?
2.5 if one scan contains more than one logical document, how does the scan divide into areas containing them ?
2.6 which library does it belong to ?
2.7 which shelf within library does it belong to ?
2.8 which volume of books on that shelf does it belong to ?
2.9 which book in that volume does it belong to ?
2.10 which chapter in that book does it belong to ?
2.11 which page of the chapter is it ?
2.12 which side of that page is it ?

3 time dimension:
3.1 date & time
3.2 period (from date to date)
3.3 expiry date
3.4 other date

4 entities dimension:
4.1 from which entity ? [ choose from / add to list of entities ]
4.2 to which entity ? [ choose from / add to list of entities ]
4.3 publishing entity ? [ choose from / add to list of entities ]
4.4 from which address ? [ choose from / add to list of addresses ]
4.5 to which address ? [ choose from / add to list of addresses ]

5 values:
5.1 ID1
5.2 ID2
5.3 title
5.4 subject
5.5 value1
5.6 value2
5.7 value3

6 flag:
6.1 blank page ?
6.2 financial ?
6.2.1 expense ?
6.2.2 income ?
6.3 legal ?
6.4 informational ?
6.5 expired ?

7 ownership / responsibility for this doc:
7.1 Jack's group
7.1.1 Jack
7.1.2 Peter
7.2 Mary's group
7.2.1 Mary
7.2.2 Dennis

Then I have about 4,000 JPEG color pics, most of them w/EXIF data.

With these, there may be additional qualifications, and some of the above may not apply:
1.8 rating of quality of composition (capturing the intended subject)
1.9 rating of technical quality
1.8.1 focused
1.8.2 not shaken (when tripod not used)
1.8.3 proper lighting / timing / contrast

and then sorting them into categories:
8. category
8.1 trees
8.1.1 indoor
8.1.2 outdoor
8.2 bushes
8.3 tools

Helge Hielscher
2004-05-08 20:43:34 UTC (over 20 years ago)

frontend to help classify images

On Sat, 08 May 2004 14:02:32 -0400, h121 wrote:

> Short:
> I'm looking for a front-end tool to help me process (qualify and classify / catalog) about 10,000 scanned images.

You may want to have a look at KimDaBa. http://ktown.kde.org/kimdaba
http://kde-apps.org/content/show.php?content=10065

Regards, Helge

Shawn Willden
2004-05-08 22:37:28 UTC (over 20 years ago)

frontend to help classify images

On Saturday 08 May 2004 12:43 pm, Helge Hielscher wrote:

> On Sat, 08 May 2004 14:02:32 -0400, h121 wrote:
>
>> Short:
>> I'm looking for a front-end tool to help me process (qualify and classify / catalog) about 10,000 scanned images.
>
> You may want to have a look at KimDaBa. http://ktown.kde.org/kimdaba
> http://kde-apps.org/content/show.php?content=10065

I second the recommendation. A little more information:

KimDaBa uses an XML file to store all of the image metadata. While this solution isn't ideal for huge image databases, because it has to load the entire metadata database into RAM to get reasonable performance, it will be just fine with only 10,000 images (I have about 7,000 images in mine). Once all of the metadata is in XML format, you can write a little code to put it in any other format you like.
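As a rough idea of how little code such a conversion takes, here is a sketch that flattens an XML metadata file into CSV rows. The element and attribute names in `SAMPLE` are invented for illustration and are not KimDaBa's actual file layout; the real file would need to be inspected first and the tag names adjusted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up layout, NOT the real KimDaBa schema.
SAMPLE = """<images>
  <image file="scan0001.png">
    <category name="Flags"><value>financial</value><value>expense</value></category>
    <category name="Entity"><value>Botanical Gardens</value></category>
  </image>
  <image file="scan0002.png">
    <category name="Flags"><value>blank</value></category>
  </image>
</images>"""

def xml_to_rows(xml_text):
    """Flatten the XML into (file, category, value) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for image in root.iter("image"):
        for cat in image.iter("category"):
            for val in cat.iter("value"):
                rows.append((image.get("file"), cat.get("name"), val.text))
    return rows

def rows_to_csv(rows):
    """Render the tuples as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file", "category", "value"])
    writer.writerows(rows)
    return buf.getvalue()

print(rows_to_csv(xml_to_rows(SAMPLE)))
```

The same (file, category, value) rows could just as well be inserted into a MySQL or PostgreSQL table instead of being written out as CSV.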

KimDaBa supports arbitrary classification schemes. Basically, you define a set of categories (you called them dimensions), and the values for each category. The image classification window is customizable, so you can arrange the defined categories for convenience. Though it may not be applicable for your application, you can also define groups within a category. Groups contain values or other groups.
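To illustrate what "groups contain values or other groups" means in practice, here is a sketch that expands a group into its plain values. It is not KimDaBa code; the dictionary layout and the top-level "Everyone" group are assumptions, while the two smaller groups come from the ownership list in the original post.

```python
def expand(group, groups, _seen=None):
    """Return the set of plain values reachable from `group`, following nested groups."""
    seen = set() if _seen is None else _seen
    if group in seen:            # guard against groups that contain each other
        return set()
    seen.add(group)
    values = set()
    for member in groups.get(group, ()):
        if member in groups:     # the member is itself a group
            values |= expand(member, groups, seen)
        else:
            values.add(member)
    return values

GROUPS = {
    "Everyone": ["Jack's group", "Mary's group"],   # hypothetical top-level group
    "Jack's group": ["Jack", "Peter"],
    "Mary's group": ["Mary", "Dennis"],
}
print(sorted(expand("Everyone", GROUPS)))
```

Searching for a group then amounts to searching for any of the values it expands to.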

After images are categorized, you can drill down pretty much any way you like. Selecting one criterion, then another, then another, until you've identified the set of images you want, or at least narrowed it down far enough to make a manual search reasonable.
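The drill-down described here is essentially repeated filtering, each criterion shrinking the set left by the previous one. A sketch under assumptions: the in-memory dictionary layout and the function name are hypothetical and say nothing about how KimDaBa implements it.

```python
def drill_down(images, *criteria):
    """Keep the images that carry every (category, value) pair in `criteria`."""
    result = images
    for category, value in criteria:
        result = {name: tags for name, tags in result.items()
                  if value in tags.get(category, ())}
    return result

# Made-up sample metadata: file name -> {category: set of values}.
IMAGES = {
    "scan0001.png": {"Flags": {"financial", "expense"}, "Entity": {"Botanical Gardens"}},
    "scan0002.png": {"Flags": {"financial", "income"}, "Entity": {"Botanical Gardens"}},
    "scan0003.png": {"Flags": {"legal"}, "Entity": {"City Hall"}},
}

step1 = drill_down(IMAGES, ("Entity", "Botanical Gardens"))
step2 = drill_down(step1, ("Flags", "expense"))
print(sorted(step1), sorted(step2))
```

Applying the criteria one at a time or all at once gives the same result, so the order in which the user clicks does not matter.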

KimDaBa will also allow you to "mark up" images, rotating them, drawing on them, etc., but all of the modifications are stored as metadata, without modifying the actual image file. When you display the photo with KimDaBa, the markup is visible, the image is rotated, etc.

KimDaBa reads date/time and rotation data from EXIF, if available.

One thing KimDaBa does not do (AFAIK) well is provide convenient ways to zoom in. Also, it's a bit slow to display each image. I think this is because it just uses the KDE image code to decompress and display images, and it isn't terribly fast. In a KDE photo manipulation tool I wrote, I had to resort to trading off some RAM to cache the last few and the next few images. I think JPEG decoding time was the biggest part of the problem, but frankly didn't look into it that much after I found an acceptable solution.
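The RAM-for-speed trade-off mentioned above can be sketched as a sliding cache around the current image. This is not the code from that tool; the class name, the `decode` callback and the window size are assumptions, and the neighbours are decoded synchronously here, whereas a real viewer would probably decode them in the background.

```python
class WindowCache:
    """Keeps decoded images for the current index and `radius` neighbours on each side."""

    def __init__(self, paths, decode, radius=2):
        self.paths = paths
        self.decode = decode      # any callable: path -> decoded image
        self.radius = radius
        self.cache = {}           # index -> decoded image

    def show(self, index):
        """Return the decoded image at `index`, refreshing the window around it."""
        lo = max(0, index - self.radius)
        hi = min(len(self.paths), index + self.radius + 1)
        wanted = range(lo, hi)
        # Drop entries that fell out of the window, then fill the gaps.
        self.cache = {i: img for i, img in self.cache.items() if i in wanted}
        for i in wanted:
            if i not in self.cache:
                self.cache[i] = self.decode(self.paths[i])
        return self.cache[index]

decoded = []

def fake_decode(path):
    """Stand-in for a slow JPEG decoder; records which files were decoded."""
    decoded.append(path)
    return path.upper()

viewer = WindowCache([f"img{i}.jpg" for i in range(10)], fake_decode, radius=2)
viewer.show(3)          # decodes img1 .. img5
viewer.show(4)          # only img6 is new
print(len(decoded))
```

Stepping forward or backward by one image then costs a single decode instead of a full window, at the price of holding 2*radius+1 decoded images in memory.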

Also, it sounds like you might need something more flexible than KimDaBa's HTML generation for the final image database.

One other plus: the primary author, Jesper Pedersen, is generally quite responsive, so if you can make a case that some need of yours is likely to be shared by others, he's likely to add your requested feature, and quickly. The code is also reasonably clean, so if you're a C++ programmer you can always look into adding whatever bits you need yourself.

Shawn.