Saturday, May 14, 2011

Hunting the glib-networking / libproxy mystery bug

Just when we wanted to stabilize KDE-4.6, an unpleasant surprise appeared out of nowhere. For no apparent reason, polkit was failing to make DBUS connections. As one consequence, some users could not log into KDE; the desktop would simply hang during session initialization. After upgrading and downgrading packages, and after a lot of communication on bugzilla (since no one on the Gentoo KDE team could reproduce the bug), a pattern emerged: net-libs/glib-networking-2.28.6.1 was somehow causing the problem, and force-unmerging it with "emerge -C" restored correct behaviour. Later, this was narrowed down to net-libs/glib-networking[libproxy].

Minus docs and debug info, this is what is installed by glib-networking:
/usr/lib/gio/modules/libgiolibproxy.so
/usr/lib/gio/modules/libgiognutls.so
/usr/lib/gio/modules/libgiognomeproxy.so
/usr/share/dbus-1/services/org.gtk.GLib.PACRunner.service
/usr/libexec/glib-pacrunner

This raises the question of how these seemingly unrelated libraries can cause DBUS hangs. Let's just say the Gnome guys did not know either, but the most intrusive part is obviously the proxy autoconfiguration service registered with DBUS.
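For anyone trying to find out whether their own system carries these files, a small check script can help. The following is a minimal sketch, not an official Gentoo tool: it simply tests which of the paths listed above exist. The grouping of libgiolibproxy.so, glib-pacrunner and the PACRunner DBUS service as the "libproxy-related" parts is an assumption based on their names, and the paths may differ on other setups. The function names are hypothetical.

```python
from pathlib import Path

# Files installed by net-libs/glib-networking-2.28.6.1, as listed above
# (docs and debug info omitted). Paths may differ on other setups.
GLIB_NETWORKING_FILES = [
    "/usr/lib/gio/modules/libgiolibproxy.so",
    "/usr/lib/gio/modules/libgiognutls.so",
    "/usr/lib/gio/modules/libgiognomeproxy.so",
    "/usr/share/dbus-1/services/org.gtk.GLib.PACRunner.service",
    "/usr/libexec/glib-pacrunner",
]

# Assumption: these files belong to the libproxy / proxy autoconfiguration part.
LIBPROXY_SUSPECTS = {
    "/usr/lib/gio/modules/libgiolibproxy.so",
    "/usr/share/dbus-1/services/org.gtk.GLib.PACRunner.service",
    "/usr/libexec/glib-pacrunner",
}


def check_installed(paths, root="/"):
    """Map each absolute path to True/False depending on whether it exists under root."""
    base = Path(root)
    return {p: (base / p.lstrip("/")).exists() for p in paths}


def libproxy_parts_present(status):
    """Return the sorted list of suspected libproxy-related files that are present."""
    return sorted(p for p, present in status.items() if present and p in LIBPROXY_SUSPECTS)


if __name__ == "__main__":
    status = check_installed(GLIB_NETWORKING_FILES)
    for path, present in status.items():
        print(f"{'present' if present else 'missing':8} {path}")
    if libproxy_parts_present(status):
        print("libproxy-related glib-networking files found (see bug 365479)")
```

The optional `root` argument makes it possible to inspect a chroot or a mounted system instead of the running one.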

As KDE-4.6.2 stabilization was pending, one way to temporarily work around the problem was to block concurrent installation of net-libs/glib-networking[libproxy]. After all, that package only appeared in the main tree as ~arch on 24 Apr 2011 (one week before the bug report was filed), and it is hard-required by exactly one package (net-voip/telepathy-gabble). Yes, we checked that thoroughly beforehand. User responses, however, were not exactly, err, welcoming.
"Anyway it is high time ppl start thinking before committing any changes in the main tree... even if we are talking about an unstable ~ stuff..." (bug 365479, comment 38)

"Now we have repeat with strange blocker... and all because some guy forgotten (or didnt want to) try it before pushing into tree... Sometimes i think that Gentoo developers comes from the round-up." (bug 365479, comment 43)

Ah well. Now the blocker is gone again and we won't stop people from shooting themselves in the foot, but we still don't know what the problem actually is. In any case, libproxy seems to be prone to more misbehaviour (what the %$&%$ is it doing in NVidia OpenGL code??!!). So far, the various reported details do not really add up to a coherent picture.
