Since recent compiz updates in cooker, we have a "small" issue: the
window decoration plugin is not enabled correctly if the gconf
plugin is loaded in compiz, which we do by default (#32251).
Actually, it's a bug in the compiz part that initializes plugins.
It maintains a list of loaded plugins, and fills it at startup after
each successful plugin load. The problem is that the gconf plugin
also updates this list when initializing.
It happens this way: compiz first loads the glib plugin, adds it to
the plugins list, then loads the gconf plugin, which reads GConf
settings and notably loads the active_plugins GConf key into the
plugins list. And since the gconf plugin is now loaded, compiz adds it
to the plugins list, but in the wrong place: it doesn't handle the fact
that a plugin could have modified the current plugins list
(more details in my post on the compiz ML).
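To make the ordering problem more concrete, here is a minimal sketch in C. It is not compiz's actual code: the plugin_list structure, the load_buggy() and load_tolerant() helpers, and the stored plugin names are all hypothetical, chosen only to show how a position computed before a plugin's init hook runs can go stale when that hook rewrites the list.

```c
#include <assert.h>
#include <string.h>

#define MAX_PLUGINS 16

/* Hypothetical stand-in for the core's list of loaded plugins. */
struct plugin_list {
    const char *names[MAX_PLUGINS];
    int count;
};

typedef void (*init_hook)(struct plugin_list *);

/* Insert name at position pos, shifting later entries right. */
static void list_insert(struct plugin_list *l, int pos, const char *name)
{
    assert(l->count < MAX_PLUGINS && pos >= 0 && pos <= l->count);
    memmove(&l->names[pos + 1], &l->names[pos],
            (size_t)(l->count - pos) * sizeof l->names[0]);
    l->names[pos] = name;
    l->count++;
}

/* Hypothetical init hook: like a settings plugin, it overwrites the
 * list with a stored value (these names are made up for the sketch). */
static void settings_plugin_init(struct plugin_list *l)
{
    static const char *stored[] = { "decoration", "wobbly" };
    l->count = 0;
    for (size_t i = 0; i < sizeof stored / sizeof stored[0]; i++)
        l->names[l->count++] = stored[i];
}

/* Buggy variant: the slot is chosen before init runs, so it is stale
 * if init rewrote the list. */
static void load_buggy(struct plugin_list *l, const char *name, init_hook init)
{
    int slot = l->count;
    if (init)
        init(l);
    list_insert(l, slot, name);
}

/* Tolerant variant (one possible approach, not the patch from the
 * post): look at the list again after init and append if missing. */
static void load_tolerant(struct plugin_list *l, const char *name, init_hook init)
{
    if (init)
        init(l);
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return;
    list_insert(l, l->count, name);
}
```

Loading "glib" and then "gconf" (with settings_plugin_init as its hook) through load_buggy() leaves the list as decoration, gconf, wobbly: the new entry lands in the middle of the list the hook just wrote. The same sequence through load_tolerant() gives decoration, wobbly, gconf.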
I wrote a quick patch (see previous link) to make compiz handle the
fact that a plugin modifies the active plugins list, and it makes
compiz load the decoration plugin fine. It has an unfortunate side
effect: the gconf plugin is not added to the active plugins list
anymore, and this makes compiz segfault when it is stopped.
To debug it further, I told gdb not to stop on the SIGINT
signal and to pass the signal on to the debugged program
(gdb doc about signals):
(gdb) handle SIGINT nostop
(gdb) handle SIGINT pass
But I wasn't really able to catch the segfault this way, so I kinda
gave up. Anyway, this bug is not very important: we used to load the
gconf plugin by default so that the default plugins were stored in
gconf and the deprecated gset-compiz frontend could be used. Now,
the default plugins list has been added by coling in the core.xml
file, so this is not very useful anymore. (Hey, no, this wasn't a
waste of time!)
Ok, there is a bug and I haven't issued a proper fix, but well, it
won't obsess me much. Furthermore, my Asperger Quotient
is only 14, I'm not even up to the so-called "average women" quotient.
So, why bother with this buglet? :-p
At that time, last night, I thought that my day at the office was
over, and that I could gently modify the compiz package from home
so that it no longer loads the gconf plugin by default. So, a bit
later, I leisurely updated my laptop at home with the latest cooker,
and prepared to test my updated package.
That was kinda naive, I should have known by now that the world is
full of bugs, and that they have a special kinship with me.
Of course, compiz didn't start correctly. Worse, it made my system
hard lock.
A quick look at freedesktop's bugzilla shows a bug with a similar
description:
r300 DRI misrenders 3D objects (bug 11380).
In this report, the bug impacts blender. Ok, it's maybe a really long
shot. This bug seems to involve gcc and the -O2 optimization level,
especially the -ftree-vrp pass that it enables
(upstream gcc bug 32544).
Yep, building with -O0 solves my bug \o/
Still, I couldn't leave this miscompilation with -O2 unsolved.
So, the day after (today), I had a closer look at the incriminated
Mesa source code (src/mesa/drivers/dri/r300/r300_state.c).
Let's try the -fno-tree-vrp option mentioned in the bug report.
It works as well \o/
I extracted the code of the r300SetupPixelShader() function into a
separate r300_pixelshader.c file to make debugging
easier, since Bero already noticed it was the culprit (in the gcc bug
report).
First, the preprocessed files are the same:
gcc -O2 <options> r300_pixelshader.c -E -o r300_pixelshader.i
gcc -O2 -fno-tree-vrp <options> r300_pixelshader.c -E -o r300_pixelshader.notreevrp.i
Then, we can compare the generated assembly code (using
-fverbose-asm to get some hints):
gcc -fverbose-asm -O2 <options> r300_pixelshader.c -S -o r300_pixelshader.s
gcc -fverbose-asm -O2 -fno-tree-vrp <options> r300_pixelshader.c -S -o r300_pixelshader.notreevrp.s
It differs; maybe -fno-inline could have helped to reduce the diff,
but here I'm lost (see preprocessed diff on the freedesktop bug report).
Strangely enough, building with the -O3 optimization level instead of
-O2 also makes the r300 driver run fine.
According to the gcc manpage, -O3 is equivalent to
-O2 -finline-functions -funswitch-loops -fgcse-after-reload
A quick look at gcc's source code (decode_options() in gcc/opts.c)
seems to confirm that, but actually, a lot more options are enabled.
For example, some bits of loop unrolling (from tree_complete_unroll()
in gcc/tree-ssa-loop.c) are enabled if the optimization level is
greater than or equal to 3. Building with
-O2 -finline-functions -funswitch-loops -fgcse-after-reload -funroll-loops
also made it work.
By the way, the code also runs fine when built with gcc 4.3.
This is all very interesting, but it does not help to find what's
breaking it between -O0 and -O2, and it does not change the fact
that I'm lost /o\
Pixel, our beloved gcc maintainer in law, has helped me a lot so far,
and we were able to extract a not so big code portion (the assembly
diff is about 150 lines) that triggers the bug.
So let's give him the hot potato ;-)
Now, I'm sort of out of the loop, Pixel handles this fine. He has a
very small testcase (a dozen lines) that involves union variables and
Value Range Propagation (VRP) on trees, which can be debugged using
-fdump-tree-vrp, and gdb...
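For readers who have not met these terms, here is a small illustrative sketch in C of the kind of code where unions and value ranges meet. It is emphatically not Pixel's testcase and is not known to trigger the bug: the float_bits() and has_normal_exponent() helpers are hypothetical, and the sketch assumes IEEE 754 single-precision floats. It only shows a value read back through a union and then compared against constant bounds, which is the sort of comparison a range-propagation pass may try to simplify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: view the bits of a float through a union. */
union float_pack {
    float f;
    uint32_t u;
};

static uint32_t float_bits(float f)
{
    union float_pack p;
    p.f = f;     /* write one member ... */
    return p.u;  /* ... and read the other one back */
}

/* After the mask, e can only lie in [0, 255]; a range-propagation
 * pass may use that knowledge when it looks at the two comparisons. */
static int has_normal_exponent(float f)
{
    uint32_t e = (float_bits(f) >> 23) & 0xff;
    return e >= 1 && e <= 254;
}
```

Building such a file with -O2 -fdump-tree-vrp asks gcc to write out what the VRP pass did, which can then be compared with a -fno-tree-vrp build.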
For now, I'm uploading a Mesa package built with -fno-tree-vrp to
cooker, we'll get a proper fix later if possible.
I can now focus on "desktop" features, expect a new drak3d soon!