#1 2012-11-06 21:57:29

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,537
Website

Cairo: bad xlib event (ShmCompletion?) in cairo_xlib_surface_create()

EDIT: the title of this thread has changed to reflect how the issue has been narrowed down.  As can be seen by the thread progress, I didn't initially suspect a cairo-xlib problem.  The original first post follows:

A less concatenated summary than the title:
A dwm-style array of function pointers declared with size 'LASTEvent' but initialized only for the events the program should respond to seems to leave the remaining (declared but uninitialized) elements holding stray data, which could then be treated as function pointers and lead to a seg fault.

Details:
In the AUR package Slider-git, I have used an array of function pointers and a simple while loop for event handling.  This is modelled directly after dwm's event handling and it is an approach I've used effectively in a number of X11 programs.  Slider has recently started seg faulting immediately after start up.  I have traced the problem to responding to an event of type #65.

I have initialized the array of function pointers to have functions for ButtonPress and KeyPress as follows:

static void (*handler[LASTEvent])(XEvent *) = {
    [ButtonPress]    = buttonpress,
    [KeyPress]        = keypress,
};

buttonpress and keypress are defined functions whose prototypes come before the initialization of the handler array.

The main loop was as follows:

while (running && !XNextEvent(dpy,&ev))
    if (handler[ev.type]) handler[ev.type](&ev);

The intent of these code segments is to declare an array of function pointers (handler) and initialize the elements indexed by the event type values "ButtonPress" and "KeyPress" with the relevant functions.  The main loop checks whether the received event has an initialized function pointer in the handler array, and if it does, that handler function is called.

This approach has worked well in many programs and is taken directly from dwm.

Recently slider has begun segfaulting shortly after startup.  I have traced it to the main event loop, and further narrowed it down to a specific event type number (ev.type == 65).  Neither ButtonPress nor KeyPress is 65.  I found that the following variation on the main loop side-steps the problem:

while (running && !XNextEvent(dpy,&ev)) {
    if (ev.type == ButtonPress) buttonpress(&ev);
    if (ev.type == KeyPress) keypress(&ev);
}

I have put this up on git as a temporary solution to the problem, but I'd like help understanding why this is needed.


I realize that the handler array is declared to have LASTEvent number of elements (LASTEvent is defined in X11/X.h, which Xlib.h includes, as one more than the highest core event type).  So handler is an array of *many* elements, each of a function pointer type.  In my code only two of those elements are actually initialized, and in any given program (such as dwm), only a subset of the pointers in such an array are initialized.

As the elements of the array that are not intended to be used are never explicitly zeroed out they are not null pointers.  Thus the conditional test in the while loop can pass as true on 'random' data left in the memory range of the handler array.  I therefore suspect that such a random 'junk' element is passing the conditional for a #65 event and my program is attempting to "call" a random address as if it was a function. Am I misinterpreting this?
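One thing worth checking against this suspicion: in standard C, an object with static storage duration is zero-initialized where no initializer is given, and elements skipped by a designated initializer list are initialized the same way, so they should compare equal to a null pointer. A minimal sketch to test that, assuming stand-in constants in place of the real X11 values (every demo_ and Demo name below is hypothetical and only mirrors the shape of the handler array above):

```c
#include <stddef.h>

/* Stand-in values for illustration only; the real ones come from <X11/X.h>. */
enum { DemoKeyPress = 2, DemoButtonPress = 4, DemoLastEvent = 36 };

typedef void (*demo_handler_fn)(int *);

static void demo_keypress(int *ev)    { *ev = DemoKeyPress; }
static void demo_buttonpress(int *ev) { *ev = DemoButtonPress; }

/* Same shape as the handler array above: only two slots are named. */
static demo_handler_fn demo_handlers[DemoLastEvent] = {
    [DemoButtonPress] = demo_buttonpress,
    [DemoKeyPress]    = demo_keypress,
};

/* Count the slots that compare equal to NULL. */
static int demo_count_null(void)
{
    int n = 0;
    for (size_t i = 0; i < DemoLastEvent; i++)
        if (demo_handlers[i] == NULL)
            n++;
    return n;
}
```

If demo_count_null() reports that every slot other than the two named ones is null, stray data inside the array is unlikely to be the cause, and the `if (handler[ev.type])` test should be reliable for any index that is actually inside the array.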

This, in fact, seems like a likely or expected result of this type of event handling.  Yet many programs use this approach quite effectively.  What am I missing in how dwm does this or why doesn't this problem show up far more often?

(edit: typos)

Last edited by Trilby (2012-11-08 02:09:47)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#2 2012-11-07 21:03:16

moetunes
Member
From: A comfortable couch
Registered: 2010-10-09
Posts: 1,033

Re: Cairo: bad xlib event (ShmCompletion?) in cairo_xlib_surface_create()

Have you tried it without a comma after keypress ?

static void (*handler[LASTEvent])(XEvent *) = {
    [ButtonPress]    = buttonpress,
    [KeyPress]        = keypress
};

You're just jealous because the voices only talk to me.


#3 2012-11-07 22:24:28

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,537
Website

Re: Cairo: bad xlib event (ShmCompletion?) in cairo_xlib_surface_create()

Thanks, I did try with and without the comma and I get the same results.

I've come to wonder whether dwm does, in fact, initialize a function pointer for every event it registers to receive.  This would make sense as a way to avoid this problem, though I am skeptical because programs register to get event messages for whole groups of event types (with event masks).  I can test this idea later tonight though.  I'll report back if I get anywhere.

Follow-up: Nope, this hypothesis can be rejected.  I stopped looking once I found the first event that dwm registers to receive without providing a handler for it in the function pointer array: dwm sets the SubstructureRedirectMask on the root window and does not initialize a handler function pointer for the CirculateRequest event.

Last edited by Trilby (2012-11-08 00:36:20)




#4 2012-11-08 00:47:29

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,537
Website

Re: Cairo: bad xlib event (ShmCompletion?) in cairo_xlib_surface_create()

My assessment was incorrect.

I just checked the Xlib event definitions.  There is no core event #65; the core event types all fall below LASTEvent, which is 36.  So the segfault is coming from trying to read from outside the handler array.
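Whatever is sending the event, the dispatch itself can be made safe with a range check before indexing. A minimal sketch of that guard, written without Xlib so it stands alone: SketchEvent is a hypothetical stand-in for XEvent, the Sketch constants mirror the values in X11/X.h, and sketch_dispatch is an illustrative helper, not something taken from dwm or slider.

```c
#include <stddef.h>

/* Stand-in values for illustration only; the real ones come from <X11/X.h>. */
enum { SketchKeyPress = 2, SketchButtonPress = 4, SketchLastEvent = 36 };

/* Hypothetical stand-in for XEvent: only the type field matters here. */
typedef struct { int type; } SketchEvent;

static int sketch_calls = 0;

static void sketch_keypress(SketchEvent *ev)    { (void)ev; sketch_calls++; }
static void sketch_buttonpress(SketchEvent *ev) { (void)ev; sketch_calls++; }

static void (*sketch_handler[SketchLastEvent])(SketchEvent *) = {
    [SketchButtonPress] = sketch_buttonpress,
    [SketchKeyPress]    = sketch_keypress,
};

/* Returns 1 if a handler ran, 0 if the event was ignored. */
static int sketch_dispatch(SketchEvent *ev)
{
    /* Reject anything outside the table, e.g. an event of type 65. */
    if (ev->type < 0 || ev->type >= SketchLastEvent)
        return 0;
    /* Inside the table, an unset slot is a null pointer. */
    if (!sketch_handler[ev->type])
        return 0;
    sketch_handler[ev->type](ev);
    return 1;
}
```

In the real main loop the equivalent change would be to test `ev.type < LASTEvent` before `handler[ev.type]`, which keeps the function pointer table and simply ignores out-of-range event types.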

Now the question is why on earth is Xlib sending my program an event with an event type value that is outside the acceptable range.

------------

I've done some more debugging and found that this mystery event has send_event = True, which means this is not a 'normal' event from the X server.  This explains how it can have an inappropriate event number.  It is also reported from a window other than my program's window (and not the root window).  As this problem only started with updates of poppler and cairo, I'm wondering if one of them may be making a call to XSendEvent with a bad event number.  As I use a cairo-xlib surface and (AFAIK) poppler doesn't have any direct interaction with xlib, I'm thinking this may be a change in the cairo library.

Now, off to go grep through all the cairo-xlib source code for any calls to XSendEvent.

--------------

I believe this is the relevant change in cairo-xlib: http://comments.gmane.org/gmane.comp.lib.cairo/23258

Now, to see if I can make sense of what they are doing.

-------------

I have confirmed that the bad event is being generated by cairo-xlib during a cairo_xlib_surface_create() call by having the main thread wait until the rendering thread is complete.  If the main loop is delayed until after the rendering thread has completed and the XEvents are flushed (and discarded), then this runs as expected and the problem is solved.  If I do this without discarding the events, then the #65 event still comes through.

-------------

This seems like it may be a ShmCompletion event (X11/extensions/XShm.h), but these are entirely new to me.  I understand that cairo may use these internally, but why are such events being sent to my window?
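A rough way to test the ShmCompletion guess: the X protocol reserves event codes 64 through 127 for extensions, and an extension event's type is the extension's first event number plus a fixed offset. For MIT-SHM the base is returned by XShmGetEventBase(dpy) and ShmCompletion is offset 0. The sketch below keeps that arithmetic free of Xlib so it can run anywhere; classify_event, ev_kind, and the SHM_ constants are hypothetical names, and the base has to be passed in by the caller because I don't know what value the server actually assigns.

```c
/* Ranges from the core X protocol; the names here are hypothetical. */
enum { SHM_CORE_LIMIT = 36, SHM_EXT_FIRST = 64, SHM_EXT_LAST = 127 };

/* ShmCompletion is the first (offset 0) event of the MIT-SHM extension. */
enum { SHM_COMPLETION_OFFSET = 0 };

typedef enum {
    EV_CORE,            /* a core event type, 2 .. LASTEvent-1 */
    EV_SHM_COMPLETION,  /* matches the MIT-SHM base plus offset */
    EV_OTHER_EXTENSION, /* in the extension range, but not ShmCompletion */
    EV_UNKNOWN          /* outside both ranges */
} ev_kind;

/* shm_event_base is assumed to come from XShmGetEventBase(dpy) in real code. */
static ev_kind classify_event(int type, int shm_event_base)
{
    if (type >= 2 && type < SHM_CORE_LIMIT)
        return EV_CORE;
    if (type < SHM_EXT_FIRST || type > SHM_EXT_LAST)
        return EV_UNKNOWN;
    if (type == shm_event_base + SHM_COMPLETION_OFFSET)
        return EV_SHM_COMPLETION;
    return EV_OTHER_EXTENSION;
}
```

If the base reported on this machine turns out to be 65, then classify_event(65, base) would confirm the guess; any other base would point at a different extension.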

Crap, now cairographics.org is down, so I can't even fetch the sources to start exploring.  Oh well, I guess that means troubleshooting is over for the evening.

Last edited by Trilby (2012-11-08 02:02:11)


