New serialization format
This discussion is connected to the gegl-developer-list.gnome.org mailing list which is provided by the GIMP developers and not related to gimpusers.com.
New serialization format | Ville Sokk | 05 Jul 14:55 |
New serialization format | Michael Natterer | 05 Jul 14:57 |
New serialization format | Ville Sokk | 05 Jul 15:16 |
New serialization format | gfxuser | 05 Jul 16:07 |
New serialization format | Øyvind Kolås | 05 Jul 18:03 |
New serialization format | Victor Oliveira | 05 Jul 18:07 |
New serialization format | Hendrik Boom | 06 Jul 00:42 |
New serialization format | Daniel Rogers | 09 Jul 14:33 |
New serialization format | Michael Muré | 23 Jul 13:57 |
New serialization format | rassahah@googlemail.com | 24 Jul 22:00 |
New serialization format | Jon Nordby | 05 Jul 16:27 |
New serialization format | Ville Sokk | 05 Jul 17:20 |
New serialization format | rassahah@googlemail.com | 05 Jul 22:59 |
New serialization format | Isaac Wagner | 05 Jul 18:07 |
New serialization format
GEGL needs a new serialization format. On IRC two popular choices were JSON and YAML. Fancy features are not required so both should be good choices.
YAML pros:
* very readable
JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML
BatO's YAML example: http://pastebin.com/QML9BkCb and JSON: http://pastebin.com/9ZV9y7Vz
Any ideas why one should be chosen over the other? Maybe there's an even better option?
gegl-developer-list mailing list gegl-developer-list@gnome.org https://mail.gnome.org/mailman/listinfo/gegl-developer-list
New serialization format
On Thu, 2012-07-05 at 17:55 +0300, Ville Sokk wrote:
GEGL needs a new serialization format. On IRC two popular choices were JSON and YAML. Fancy features are not required so both should be good choices.
YAML pros:
* very readable
JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML
Bat´O's YAML example: http://pastebin.com/QML9BkCb and JSON: http://pastebin.com/9ZV9y7Vz
Any ideas why one should be chosen over the other? Maybe there's an even better option?
And XML was ruled out because it's not the latest fad any longer?
confused, --mitch
New serialization format
And XML was ruled out because it's not the latest fad any longer?
confused, --mitch
I don't think anything was ruled out. People just don't like XML for this kind of thing. But the good thing about XML is no additional dependencies.
New serialization format
On 05.07.12 16:57, Michael Natterer wrote:
On Thu, 2012-07-05 at 17:55 +0300, Ville Sokk wrote:
GEGL needs a new serialization format. On IRC two popular choices were JSON and YAML. Fancy features are not required so both should be good choices.
YAML pros:
* very readable
JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML
Bat´O's YAML example: http://pastebin.com/QML9BkCb and JSON: http://pastebin.com/9ZV9y7Vz
Any ideas why one should be chosen over the other? Maybe there's an even better option?
And XML was ruled out because it's not the latest fad any longer?
Hi,
I was also wondering why not XML. IIRC image processing in GEGL is internally represented by a tree (correct me if I'm wrong). Are YAML and JSON able to handle this, better than a native tree format like XML?
Best regards,
grafxuser
New serialization format
On 5 July 2012 16:55, Ville Sokk wrote:
GEGL needs a new serialization format. On IRC two popular choices were JSON and YAML. Fancy features are not required so both should be good choices.
Why does it need a new one? I am not saying it does not, but I think a justification, rationale and use cases are necessary to have a reasonable discussion and end result.
YAML pros:
* very readable
JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML
Bat´O's YAML example: http://pastebin.com/QML9BkCb and JSON: http://pastebin.com/9ZV9y7Vz
Any ideas why one should be chosen over the other? Maybe there's an even better option?
New serialization format
On Thu, Jul 5, 2012 at 7:07 PM, gfxuser wrote:
I was also wondering why not XML. IIRC image processing in GEGL is internally represented by a tree (correct me if I'm wrong). Are YAML and JSON able to handle this, better than a native tree format like XML?
Look at the examples. They are JSON and YAML versions of docs/gallery/clones.xml from GEGL source tree. No problem with this. Hopefully you can see that the JSON and YAML versions are more readable.
On Thu, Jul 5, 2012 at 7:27 PM, Jon Nordby wrote:
Why does it need a new one? I am not saying it does not, but I think a justification, rationale and use cases are necessary to have a reasonable discussion and end result.
Of course. I'm not aware why pippin and BatO were interested in a new format but the problem for me was how nodes are connected in the current format. Stacked nodes go input -> output and child node's output is connected to the parent's aux. You can't use operations with aux2 this way.
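A small sketch may make the aux2 limitation easier to see. It is only an illustration: the `Edge` record, the node names and the `inputs_of` helper are made up here and are not part of GEGL. The point is that once every connection names its target pad explicitly, `aux2` is no different from `input` or `aux`.

```python
from collections import namedtuple

# Hypothetical edge record: (source node, source pad) -> (target node, target pad).
Edge = namedtuple("Edge", "src src_pad dst dst_pad")

# Three sources feeding three different input pads of one (made-up) node.
edges = [
    Edge("image", "output", "blend", "input"),
    Edge("mask", "output", "blend", "aux"),
    Edge("extra", "output", "blend", "aux2"),
]


def inputs_of(node, edges):
    """Return {pad name: (source node, source pad)} for one node."""
    return {e.dst_pad: (e.src, e.src_pad) for e in edges if e.dst == node}


blend_inputs = inputs_of("blend", edges)
```

With a stacked layout only the `input` and `aux` entries could be expressed; here the third entry needs no special treatment.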
New serialization format
On Thu, Jul 5, 2012 at 6:07 PM, gfxuser wrote:
I was also wondering why not XML. IIRC image processing in GEGL is internally represented by a tree (correct me if I'm wrong). Are YAML and JSON able to handle this, better than a native tree format like XML?
The reason that a new/extended format is needed is that GEGL is more general than a tree - it uses graphs. At the moment GEGL uses the implicit tree of the nodes from XML + the ability to clone outputs to represent the subset of graphs that are representable with sources, filters and composers (two input pads, one output pad). All of the programming API in GEGL deals with graphs not trees - the tree representation of constrained subsets can however still be useful to use both for _some_ deserialization purposes as well as UI representations.
/Øyvind K.
«The future is already here. It's just not very evenly distributed» -- William Gibson http://pippin.gimp.org/ http://ffii.org/
New serialization format
Hi,
I was also wondering why not XML. IIRC image processing in GEGL is internally represented by a tree (correct me if I'm wrong). Are YAML and JSON able to handle this, better than a native tree format like XML?
Best regards,
grafxuser
I have been under the impression that while no operations currently have more than one output, in theory gegl will support multiple outputs and in fact parts of the codebase already do (again, in theory) support multiple outputs. Perhaps someone could clarify on this? If it is the case, then eventually anything to do with gegl graphs, such as serialization of them, will need to treat them as generalized graphs and can't assume they will be trees. While XML is great at representing trees, like you said, I imagine the current format would start to get pretty convoluted when trying to add rules for representing non-tree graphs.
New serialization format
JSON++
On Thu, Jul 5, 2012 at 11:03 AM, Øyvind Kolås wrote:
On Thu, Jul 5, 2012 at 6:07 PM, gfxuser wrote:
I was also wondering why not XML. IIRC image processing in GEGL is internally represented by a tree (correct me if I'm wrong). Are YAML and JSON able to handle this, better than a native tree format like XML?
The reason that a new/extended format is needed is that GEGL is more general than a tree - it uses graphs. At the moment GEGL uses the implicit tree of the nodes from XML + the ability to clone outputs to represent the subset of graphs that are representable with sources, filters and composers (two input pads, one output pad). All of the programming API in GEGL deals with graphs not trees - the tree representation of constrained subsets can however still be useful to use both for _some_ deserialization purposes as well as UI representations.
/Øyvind K.
--
«The future is already here. It's just not very evenly distributed» -- William Gibson http://pippin.gimp.org/ http://ffii.org/
New serialization format
Ville Sokk wrote:
GEGL needs a new serialization format. On IRC two popular choices were JSON and YAML. Fancy features are not required so both should be good choices.
YAML pros:
* very readable
JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML
BatO's YAML example: http://pastebin.com/QML9BkCb and JSON: http://pastebin.com/9ZV9y7Vz
Any ideas why one should be chosen over the other? Maybe there's an even better option?
Some ideas regarding this serialization thing follow (lengthy)...
from the pastebin example:
{
  "gegl.png":
  {
    "name": "gegl:load",
    "path": "data/gegl.png"
  },
  ...
I assume the "name" here is a mistake and it should actually be
"operation", right? Anyway, i use "name" for the rest of this post.
As far as i understand this: A gegl serialization file contains an object, which maps node names to nodes; "gegl.png" is a name of a node. So later on the json refers to this node for example in the node with the name "bg1":
..., "bg1":
  {
    "name": "gegl:gaussian-blur",
    "input": "gegl.png:output",
    "std-dev-x": 5.0,
    "std-dev-y": 5.0
  },
Personally i do not like this kind of reference:
"input": "gegl.png:output",
This is because of the string encoding, in which a colon separates the referenced node from the output pad name. Effectively this would prohibit the use of the colon in names of nodes and parameters. Even though there are currently no properties with a ':' in their name, the special use of the ':' would make the node implementations depend on the serialization format, which is bad. It would be better to give the node reference and the name of the parameter as a pair of strings. It would look like this:
..., "bg1":
  {
    "name": "gegl:gaussian-blur",
    "input": ["gegl.png", "output"],
    "std-dev-x": 5.0,
    "std-dev-y": 5.0
  },
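To illustrate the concern about the colon, here is a minimal sketch comparing the two reference styles. The helper names and the node name "layer:1" and pad name "out:a" are hypothetical, and splitting at the last colon is just one convention a parser might pick.

```python
def parse_colon_ref(ref):
    """Split a "node:pad" string at the last colon (one possible convention)."""
    node, _, pad = ref.rpartition(":")
    return node, pad


def parse_pair_ref(ref):
    """A two-element list needs no parsing at all."""
    node, pad = ref
    return node, pad


# Works as long as neither name contains a colon.
simple = parse_colon_ref("gegl.png:output")

# A hypothetical node "layer:1" with a pad "out:a" cannot be recovered
# from the joined string, whichever colon the parser splits at.
ambiguous = parse_colon_ref("layer:1" + ":" + "out:a")
exact = parse_pair_ref(["layer:1", "out:a"])
```

The joined string comes back split in the wrong place, while the pair form round-trips both names unchanged.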
It might be useful to consider using even objects for this kind of reference, because the components can be named. Could look like this:
..., "bg1":
  {
    "name": "gegl:gaussian-blur",
    "input": { "node": "gegl.png", "pad": "output" },
    "std-dev-x": 5.0,
    "std-dev-y": 5.0
  },
I do not know which is better here. Generally, in json based formats you will often see the array style (["gegl.png", "output"]) when file size is the concern and the object style ({ "node": "gegl.png", "pad": "output" }) when readability matters more, so for this case i would choose the object style.
Also i would suggest allowing for some way to specify a node inline, without creating a name, for example it could be done with the object notation like this:
..., "bg1":
  {
    "name": "gegl:gaussian-blur",
    "input": {
      "node": {
        "name": "gegl:load",
        "path": "data/gegl.png"
      },
      "pad": "output"
    },
    "std-dev-x": 5.0,
    "std-dev-y": 5.0
  },
The node references ("gegl.png", "bg1" etc) would still be required for nodes that get used in more than one place, though. But in my experience those global references will always get in the way in one or two places, sooner or later, because, for example, they prevent you from taking some nodes out of one file and pasting them into another file without doing some renaming first. In general, modifying a graph will be more difficult, because one always has to watch the references of the nodes.
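A loader could accept both forms and normalize them. The following is a rough sketch, assuming the object-style reference from the examples above; the function name `hoist_inline_nodes` and the generated `_inlineN` names are inventions for illustration, and the sketch assumes no existing node already uses such a name.

```python
import itertools


def hoist_inline_nodes(graph):
    """Return a copy of `graph` in which every inline node has been given
    a generated name and replaced by a reference to that name."""
    counter = itertools.count(1)
    result = {}

    def visit(node):
        flat = {}
        for key, value in node.items():
            if isinstance(value, dict) and isinstance(value.get("node"), dict):
                # Inline node: hoist it (recursively) and refer to it by name.
                new_name = "_inline%d" % next(counter)
                result[new_name] = visit(value["node"])
                value = {"node": new_name, "pad": value["pad"]}
            flat[key] = value
        return flat

    for name, node in graph.items():
        result[name] = visit(node)
    return result


graph = {
    "bg1": {
        "name": "gegl:gaussian-blur",
        "input": {
            "node": {"name": "gegl:load", "path": "data/gegl.png"},
            "pad": "output",
        },
        "std-dev-x": 5.0,
        "std-dev-y": 5.0,
    }
}
flat = hoist_inline_nodes(graph)
```

After the call, `flat` contains only named nodes, so the rest of a loader would not need to know that inline nodes exist.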
Another thing: how are other data types to be represented (the pastebin does not contain that case)? For example, in json the gegl:color node should look something like this:
{
  "my-color": {
    "name": "gegl:color",
    "value":
  }
}
but what should the value be? Perhaps it could be an object with rgba values:
"my-color": {
  "name": "gegl:color",
  "value": {"r": 0.1, "g": 0.2, "b": 0.3, "a": 0.4 }
}
Are there other data types, that would need to be represented (for example
the "d" property of a gegl:fill-path node)?
For json vs xml: i think json is more readable than xml, especially for small files, whereas xml has more support from things like schemas, xsl, etc. But the benefits of xml (in my opinion at least) only come into play when the additional overhead of using xml is outweighed by the processing time (number of files) one needs to handle, which is probably not the case for gegl, OR (and that might be worth thinking about) when intermixing the gegl xml with some other xml by using different namespaces, for example by joining a gegl graph with some svg nodes in it, all in a single file (personally i do not know any concrete use case for this, so i would ignore this for now). I do not know enough about yaml to compare with it.
Best regards - Rasmus
New serialization format
On Thu, Jul 05, 2012 at 06:07:22PM +0200, gfxuser wrote:
On 05.07.12 16:57, Michael Natterer wrote:
On Thu, 2012-07-05 at 17:55 +0300, Ville Sokk wrote:
GEGL needs a new serialization format. On IRC two popular choices were JSON and YAML. Fancy features are not required so both should be good choices.
YAML pros:
* very readable
JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML
BatO's YAML example: http://pastebin.com/QML9BkCb and JSON: http://pastebin.com/9ZV9y7Vz
Any ideas why one should be chosen over the other? Maybe there's an even better option?
And XML was ruled out because it's not the latest fad any longer?
Hi,
I was also wondering why not XML. IIRC image processing in GEGL is internally represented by a tree (correct me if I'm wrong). Are YAML and JSON able to handle this, better than a native tree format like XML?
XML was originally a text format. It's rather complicated to parse, mainly because of legacy compatibility issues. It's bulky. It's usable for word processing (as in OpenOffice) only because it's been compressed.
-- hendrik
New serialization format
On Jul 5, 2012, at 7:57 AM, Michael Natterer wrote:
And XML was ruled out because it's not the latest fad any longer?
I think this is pretty much the right answer. There is a ton of XML hate in the world right now.
Having fought this battle when dealing with millions of lines of code and hundreds of thousands of lines of JSON and/or XML, I can offer the following advice.
XML is probably the right answer here. XML sucks in the following ways.
1. It's verbose. This is actually good for humans, but it sucks as a wire format, and some people feel the verbosity is unreadable. That's only true if you're able to keep all the context in your head. Once someone screws up the indentation, or you're 1000 lines in and 12 nested levels deep, having the extra context of tag names makes a huge difference. Also, gzip is awesome here and solves the on-disk space issues.
2. It's complex. No argument here. There are a lot of things it is supposed to do, and there is a major ambiguity that people always complain about (attributes vs. elements).
3. Many of the parsers are memory hogs (tree parsers) or very slow (though that's gotten much better and doesn't apply to the parser gegl is using). They were copying too many strings.
1 and 3 mean it sucks as an on-wire format for interactive HTTP requests (though gzip pretty much negates 1). 2 means it's hard to write a fast JS parser for it, which means your HTML5 app will get slow.
Everyone says "it's more readable!" Then they try to maintain a large JSON file. Then they discover that validation, line numbers for errors, and a more expressive grammar go a long way towards keeping programs simpler. The first time you spend an hour trying to track down where your missing "," caused your entire file to fail to parse, you'll wish you had a better parser. I haven't found a JSON parser that will actually spit out line numbers and context for errors.
With XML, it's easy to combine multiple grammars (think embedding GEGL ops into another XML document). It has a validation language (two of them, in fact; yes, they have warts, but they do actually work for most things). It's easier for new brains to look at (though slower for familiar brains). It's more self-describing, for those who expect their file format to be produced or consumed by many other programs. It's amazing how important strict specification can be when it comes to using a file as an interchange format. XML is much better at this than most other options.
Anyway, if you expect your serialization to be temporary (like a wire format), to need fast parsing by a huge variety of hardware in languages without a byte array (JS), or to be produced and consumed only by your own application, then JSON (or BSON, or protocol buffers) seems like a good choice. If you're going for more of an interchange format, stick with XML.
Thus I would strongly suggest using XML for this.
Also, as far as structure goes, if you want to represent a general graph, you can draw inspiration from DOT, the language of Graphviz. There is also GraphML. You could frankly use GraphML straight out of the box, though it has lots of features you're probably not interested in.
The general structure is usually:
.. graph attributes
'
So you don't try to put a tree in the text at all. It's just a list of nodes and edges.
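As a rough sketch of the "list of nodes and edges" idea, the snippet below keeps the graph as two flat lists and renders them as DOT text. The `to_dot` helper, the tuple layout and the pad labels are assumptions made for this illustration, and no quoting or escaping of unusual characters in names is attempted.

```python
def to_dot(nodes, edges, name="gegl"):
    """Render a flat node list and edge list as Graphviz DOT text.

    nodes: list of (node id, operation name)
    edges: list of (source id, source pad, target id, target pad)
    """
    lines = ["digraph %s {" % name]
    for node_id, operation in nodes:
        lines.append('  "%s" [label="%s"];' % (node_id, operation))
    for src, src_pad, dst, dst_pad in edges:
        lines.append('  "%s" -> "%s" [label="%s -> %s"];'
                     % (src, dst, src_pad, dst_pad))
    lines.append("}")
    return "\n".join(lines)


nodes = [("gegl.png", "gegl:load"), ("bg1", "gegl:gaussian-blur")]
edges = [("gegl.png", "output", "bg1", "input")]
dot = to_dot(nodes, edges)
```

Nothing in the two lists is nested, so a node that feeds several consumers simply appears as the source of several edges.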
-- Daniel
New serialization format
Let's roll the ball a bit further.
Here is another try, with XML this time. In the process, I corrected some flaws that were present in the YAML and JSON tests:
* the output node is mentioned explicitly. It was implicit before, the output node being the root of the tree.
* parameters and connections are separated by different tags, not just by name
* an operation's ID and pad are separated when describing a connection, so there is no need to parse, and each can have ':' in its name
The different kinds of parameters we can have:
- boolean
- int
- double
- string
- enumeration
- GeglColor (serialization defined in gegl-color.c)
- GeglCurve (no serialization format defined)
- GeglPath (serialization defined in gegl-path.c)
Is this better?
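The XML proposal itself is not reproduced in the text above, so the following is only a guess at a layout with the three properties listed. The element and attribute names (`gegl`, `node`, `param`, `connection`, `from-node`, `from-pad`) and the `load` helper are assumptions for illustration, not the proposed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: explicit output node, separate tags for
# parameters and connections, node ID and pad as separate attributes.
DOC = """
<gegl out="bg1">
  <node id="gegl.png" operation="gegl:load">
    <param name="path">data/gegl.png</param>
  </node>
  <node id="bg1" operation="gegl:gaussian-blur">
    <param name="std-dev-x">5.0</param>
    <param name="std-dev-y">5.0</param>
    <connection pad="input" from-node="gegl.png" from-pad="output"/>
  </node>
</gegl>
"""


def load(text):
    """Parse the sketch format into (output node id, {node id: description})."""
    root = ET.fromstring(text)
    graph = {}
    for node in root.findall("node"):
        graph[node.get("id")] = {
            "operation": node.get("operation"),
            # Parameter values are kept as strings; typing them is left out.
            "params": {p.get("name"): p.text for p in node.findall("param")},
            "connections": {
                c.get("pad"): (c.get("from-node"), c.get("from-pad"))
                for c in node.findall("connection")
            },
        }
    return root.get("out"), graph


out_node, graph = load(DOC)
```

Because the node ID and the pad are separate attributes, no string splitting is needed when reading a connection.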
New serialization format
Michael Muré wrote:
Let's roll the ball a bit further.
Here is another try, with XML this time. In the process, I corrected some flaws that were present in the YAML and JSON tests:
* the output node is mentioned explicitly. It was implicit before, the output node being the root of the tree.
* parameters and connections are separated by different tags, not just by name
* an operation's ID and pad are separated when describing a connection, so there is no need to parse, and each can have ':' in its name
The different kinds of parameters we can have:
- boolean
- int
- double
- string
- enumeration
- GeglColor (serialization defined in gegl-color.c)
- GeglCurve (no serialization format defined)
- GeglPath (serialization defined in gegl-path.c)
Is this better?
Looks ok to me. One thing i would do: leave out the 'out' attribute of the root node (which probably should designate the root of the graph). It does not really fit with the rest of the file. The XML already describes completely what the graph looks like. The 'out' attribute seems to be different; it seems to describe how to USE the graph instead of how it looks. But for this a single attribute is not enough anyway. For example, i sometimes use gegl for splitting an image into the channels R, G and B, so i have three root nodes, and there would be no useful single attribute to use as a root. And because i do not have a better idea at hand for how this should be included in a serialized format (if it should be included at all), i would leave it out.
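The channel-splitting case can be sketched in a few lines. The node names and the `sink_nodes` helper are made up for this illustration; the idea is only that the nodes nobody consumes can be computed from the edges, and that there may be more than one of them.

```python
def sink_nodes(node_ids, edges):
    """Return the nodes whose output is not consumed by any other node."""
    consumed = {src for src, _dst in edges}
    return sorted(n for n in node_ids if n not in consumed)


# Hypothetical graph: one loader feeding three channel extractors.
node_ids = ["load", "red", "green", "blue"]
edges = [("load", "red"), ("load", "green"), ("load", "blue")]
roots = sink_nodes(node_ids, edges)
```

Here `roots` holds three nodes, which a single 'out' attribute could not name.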
wbr - Rasmus
--
Michael