Since recent compiz updates in cooker, we have a "small" issue: the window decoration plugin is not enabled correctly if the gconf plugin is loaded in compiz, which we do by default (#32251).
Actually, it's a bug in the compiz part that initializes plugins. It maintains a list of loaded plugins, and fills it at startup after each successful plugin load. The problem is that the gconf plugin also updates this list when initializing.
It happens this way: compiz first loads the glib plugin, adds it to the plugins list, then loads the gconf plugin, which reads the GConf settings and, notably, loads the active_plugins GConf key into the plugins list. Since the gconf plugin has now been loaded, compiz adds it to the plugins list too, but in the wrong place: it doesn't account for the fact that a plugin could have modified the list in the meantime (more details in my post on the compiz ML).
I wrote a quick patch (see previous link) to make compiz handle the fact that a plugin modifies the active plugins list, and it makes compiz load the decoration plugin fine. It has an unfortunate side effect: the gconf plugin is not added to the active plugins list anymore, and that makes compiz segfault when it is stopped.
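To make the ordering problem concrete, here is a tiny model of it in C. This is not compiz code and not my patch: the structure, the function names (load_naive, load_careful, init_gconf, simulate_startup) and the content of the fake active_plugins value are all made up for the illustration. It only assumes what is described above, namely that a loader records each plugin after a successful load while a plugin's init may rewrite the same list.

```c
#include <assert.h>
#include <string.h>

#define MAX_PLUGINS 8

/* Hypothetical stand-in for the core's list of loaded plugins. */
struct plugin_list {
    const char *names[MAX_PLUGINS];
    int count;
};

typedef void (*init_fn)(struct plugin_list *);
typedef void (*load_fn)(struct plugin_list *, const char *, init_fn);

/* Returns 1 if name is already in the list, 0 otherwise. */
int list_contains(const struct plugin_list *list, const char *name)
{
    for (int i = 0; i < list->count; i++)
        if (strcmp(list->names[i], name) == 0)
            return 1;
    return 0;
}

/* A plugin whose init leaves the list alone (models glib). */
void init_noop(struct plugin_list *list)
{
    (void)list;
}

/* Models the gconf plugin: its init replaces the whole list with a
 * made-up value of the active_plugins setting. */
void init_gconf(struct plugin_list *list)
{
    static const char *active[] = { "glib", "gconf", "decoration", "move" };

    list->count = 4;
    for (int i = 0; i < list->count; i++)
        list->names[i] = active[i];
}

/* Naive loader: picks the slot before init runs and trusts it afterwards. */
void load_naive(struct plugin_list *list, const char *name, init_fn init)
{
    int slot = list->count;      /* goes stale if init rewrites the list */

    init(list);
    assert(slot < MAX_PLUGINS);
    list->names[slot] = name;
    list->count = slot + 1;      /* silently drops whatever init stored after slot */
}

/* Careful loader: looks at the list again once init has run. */
void load_careful(struct plugin_list *list, const char *name, init_fn init)
{
    init(list);
    if (list_contains(list, name))
        return;                  /* init already registered this plugin */
    assert(list->count < MAX_PLUGINS);
    list->names[list->count++] = name;
}

/* Loads glib, then gconf, using the given loader. */
void simulate_startup(load_fn load, struct plugin_list *list)
{
    list->count = 0;
    load(list, "glib", init_noop);
    load(list, "gconf", init_gconf);
}
```

Calling simulate_startup(load_naive, &list) leaves only glib and gconf in the list, so decoration is lost, while simulate_startup(load_careful, &list) keeps the four entries that init_gconf stored. The real compiz code is more involved than this, so take it as a picture of the failure mode rather than a description of the actual fix.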
To debug it further, I told gdb not to stop on the SIGINT signal and to pass it on to the debugged program (gdb doc about signals):
(gdb) handle SIGINT nostop
(gdb) handle SIGINT pass
But I wasn't really able to catch the segfault this way, so I kinda gave up. Anyway, this bug is not very important: we used to load the gconf plugin by default so that the default plugins were stored in gconf and so that the deprecated gset-compiz frontend could be used. Now, the default plugins list has been added by coling in the core.xml file, so this is not very useful anymore. (Hey, no, this wasn't a waste of time!)
Ok, there is a bug and I haven't issued a proper fix, but well, it won't obsess me much. Furthermore, my Asperger Quotient is only 14, I'm not even up to the so-called "average women" quotient. So, why bother with this buglet? :-p
At that point, last night, I thought that my day at the office was over, and that I could quietly modify the compiz package from home so that it no longer loads the gconf plugin by default. So, a bit later, I leisurely updated my laptop at home to the latest cooker, and prepared to test my updated package.
That was kinda naive, I should have known by now that the world is full of bugs, and that they have a special kinship with me. Of course, compiz didn't start correctly. Worse, it made my system hard lock.
A quick look at freedesktop's bugzilla shows a bug with a similar description: r300 DRI misrenders 3D objects (bug 11380). In this report, the bug impacts blender. Ok, it's maybe a really long shot. This bug seems to involve gcc and the -O2 optimization level, especially the -ftree-vrp optimization (upstream gcc bug 32544). Yep, building with -O0 solves my bug \o/
Still, I couldn't leave this -O2 miscompilation unsolved. So, the day after (today), I had a closer look at the offending Mesa source code (src/mesa/drivers/dri/r300/r300_state.c). Let's try the -fno-tree-vrp option mentioned in the bug report. It works as well \o/
I extracted the code of the r300SetupPixelShader() function into a separate r300_pixelshader.c file to make debugging easier, since Bero had already noticed it was the culprit (in the gcc bug report).
First, the preprocessed files are the same:
gcc -O2 <options> r300_pixelshader.c -E -o r300_pixelshader.i
gcc -O2 -fno-tree-vrp <options> r300_pixelshader.c -E -o r300_pixelshader.notreevrp.i
Then, we can compare the generated assembly code (using -fverbose-asm to get some hints):
gcc -fverbose-asm -O2 <options> r300_pixelshader.c -S -o r300_state.s
gcc -fverbose-asm -O2 -fno-tree-vrp <options> r300_pixelshader.c -S -o r300_state.notreevrp.s
The generated assembly differs. Maybe -fno-inline would have helped to reduce the diff, but here I'm lost (see preprocessed diff on the freedesktop bug report).
Strangely enough, building with the -O3 optimization level instead of -O2 also makes the r300 driver run fine. According to the gcc manpage, -O3 is equivalent to -O2 -finline-functions -funswitch-loops -fgcse-after-reload. A quick look at gcc's source code (decode_options() in gcc/opts.c) seems to confirm that, but actually, a lot more options are enabled. For example, some bits of loop unrolling (from tree_complete_unroll() in gcc/tree-ssa-loop.c) are enabled if the optimization level is greater than or equal to 3. Building with -O2 -finline-functions -funswitch-loops -fgcse-after-reload -funroll-loops also made it work.
By the way, the code also runs fine when built with gcc 4.3.
This is all very interesting, but it does not help to find what's breaking between -O0 and -O2, and it does not change the fact that I'm lost /o\
Pixel, our beloved gcc maintainer in law, has helped me a lot so far, and we were able to extract a fairly small portion of code (the assembly diff is about 150 lines) that triggers the bug.
So let's give him the hot potato ;-)
Now, I'm sort of out of the loop: Pixel is handling this just fine. He has a very small testcase (a dozen lines) that involves union variables and Value Range Propagation (VRP) on trees, which can be debugged using -fdump-tree-vrp, and gdb...
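For readers wondering what such a testcase looks like in general, here is a rough sketch of the shape: a small function that writes a union through one member inside a range-limited loop and reads it back through another, checked against a reference that avoids the union. To be clear, this is not Pixel's testcase, the names (pack_bytes, pack_bytes_ref, selfcheck) are invented here, and I make no claim that this particular code triggers the gcc bug. It only shows how to build something self-checking that can be compiled with and without -fno-tree-vrp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two views of the same four bytes. */
union word {
    uint32_t u;
    unsigned char b[4];
};

/* Function under test. noinline (a gcc extension) keeps it as a separate
 * function, so the optimized code is easy to find in the assembly or in
 * the tree dumps. */
__attribute__((noinline))
uint32_t pack_bytes(const unsigned char *src, int n)
{
    union word w;

    assert(sizeof w.u == sizeof w.b);
    w.u = 0;
    if (n > 4)
        n = 4;                   /* gives the optimizer a known upper bound */
    for (int i = 0; i < n; i++)
        w.b[i] = src[i];         /* write through one member... */
    return w.u;                  /* ...read back through the other */
}

/* Reference: the same bytes copied with memcpy, no union involved. */
uint32_t pack_bytes_ref(const unsigned char *src, int n)
{
    uint32_t u = 0;

    if (n > 4)
        n = 4;
    if (n > 0)
        memcpy(&u, src, (size_t)n);
    return u;
}

/* Returns the number of inputs on which the two versions disagree;
 * 0 means the compiled code behaves as expected. */
int selfcheck(void)
{
    static const unsigned char src[6] = { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66 };
    int mismatches = 0;

    for (int n = 0; n <= 6; n++)
        if (pack_bytes(src, n) != pack_bytes_ref(src, n))
            mismatches++;
    return mismatches;
}
```

The idea is to call selfcheck() from a main() that returns its result, build the file once with -O2 and once with -O2 -fno-tree-vrp, and compare. With a correct compiler both builds report 0 mismatches; if one of them does not, adding -fdump-tree-vrp to the failing build gives the dumps to look at.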
For now, I'm uploading a Mesa package built with -fno-tree-vrp to cooker; we'll get a proper fix later if possible. I can now focus on "desktop" features, expect a new drak3d soon!