Bug 149 - there is no way to cancel jobs that are submitted but not yet completed

Status CLOSED FIXED
Reported 2010-09-08 18:08:00 +0200
Modified 2011-01-05 12:01:09 +0100
Product: FieldTrip
Component: peer
Version: unspecified
Hardware: PC
Operating System: Mac OS
Importance: P1 normal
Assigned to: Robert Oostenveld
URL:
Tags:
Depends on:
Blocks:
See also:

Robert Oostenveld - 2010-09-08 18:08:46 +0200

For example, consider peercellfun(@pause, {3600, 3600, 3600, 3600}), which would cause a one-hour pause on each slave. Shortly after starting this, the user presses ctrl-c. The jobs will continue running, effectively blocking the busy slaves for one hour.

My idea is to implement a "dead man's switch", i.e. the master should announce every second that it is still interested in the job results. If the master fails to indicate its interest, the job on the busy slave is terminated. This could (partially) be implemented using the MATLAB timer object.

PS: The MATLAB timer object could also be used to enforce a timavail in peerexec.
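To make the dead man's switch idea concrete, here is a minimal sketch in Python. It is not the peer toolbox code: the class name, the 3-second timeout and the step-wise job loop are assumptions chosen for illustration, and the clock is injected so that the behaviour can be checked without waiting.

```python
import time


class DeadMansSwitch:
    """Tracks whether the master has recently confirmed interest in a job."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        # timeout is a hypothetical grace period in seconds
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def heartbeat(self):
        """Called whenever the master announces it still wants the result."""
        self.last_seen = self.clock()

    def expired(self):
        """True once the master has been silent for longer than the timeout."""
        return self.clock() - self.last_seen > self.timeout


def run_job(steps, switch):
    """Run a job step by step, aborting as soon as the switch has expired."""
    done = 0
    for step in steps:
        if switch.expired():
            return ("aborted", done)
        step()
        done += 1
    return ("finished", done)
```

The limitation is that the switch is only consulted between steps. A single uninterruptible call such as pause(3600) would still need to be interrupted from outside the job.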


Robert Oostenveld - 2010-09-08 23:33:17 +0200

The general idea is that a job should be aborted if the master disappears. Using the onCleanup function in peercellfun now ensures that the master switches to zombie when ctrl-c is pressed.

On the slave side I tried implementing it using a MATLAB timer, but the timer could only raise an error outside of the workspace in which the job was eval'ed, so it did not interrupt the job itself. A possible but undesired solution for peerslave.m was to call exit from the timer. For the command-line peerslave, the timer does not seem to work at all, probably because the engine does not have a Java VM.

In the end, I did manage to implement something. On the master side it involves immediately switching to zombie on exit of peercellfun (also on a forced exit with ctrl-c). On the slave side it involves a kill switch that is enabled during job execution and triggered if the master disappears. The check_killswitch() function is executed at the end of the expire loop. The kill switch is enabled or disabled with the option --killswitch 0|1.
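The two halves of this solution can be sketched as follows, again in Python and only as an illustration. The master_session context manager plays the role that onCleanup plays in peercellfun. The check_killswitch function here is a hypothetical stand-in with an assumed signature, not the actual function from the peer code, and the dictionary and set used to represent peers are likewise assumptions.

```python
from contextlib import contextmanager


@contextmanager
def master_session(peer):
    """Mark the peer as master for the duration, then as zombie on any exit."""
    peer["status"] = "master"
    try:
        yield peer
    finally:
        # Runs on normal return, on errors, and on KeyboardInterrupt (ctrl-c).
        peer["status"] = "zombie"


def check_killswitch(enabled, master_id, visible_masters):
    """Return True if the running job should be killed.

    enabled corresponds to the --killswitch 0|1 option; visible_masters is
    an assumed set of the master ids that the slave can currently see.
    """
    return bool(enabled) and master_id not in visible_masters
```

A slave would call such a check periodically while a job is running and terminate the job when it returns True.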


Robert Oostenveld - 2010-09-08 23:38:56 +0200

It turns out that switching the master immediately to zombie on a ctrl-c has an unexpected side effect on the command-line peerslave, as demonstrated by

peerslave[90489]: starting MATLAB engine
peerslave[90489]: executing job 1 from robert@MacBook.local (jobid=250879757)
peerslave[90489]: failed to write jobdef

and then it exits. I filed a new bug for that, see http://bugzilla.fcdonders.nl/show_bug.cgi?id=151


Robert Oostenveld - 2011-01-05 11:57:04 +0100

Selected a long list of resolved bugs from roboos and changed their status to "RESOLVED".


Robert Oostenveld - 2011-01-05 12:01:09 +0100

Selected all old bugs from roboos with status RESOLVED and changed their status to CLOSED.