
Bug 275 - timallow/timreq have to be made consistent on master and slave

Status CLOSED FIXED
Reported 2010-12-08 11:41:00 +0100
Modified 2011-01-05 12:01:02 +0100
Product: FieldTrip
Component: peer
Version: unspecified
Hardware: PC
Operating System: Mac OS
Importance: P1 normal
Assigned to: Robert Oostenveld

Robert Oostenveld - 2010-12-08 11:41:32 +0100

in peercellfun.m:

    % estimate the time that it took the collected jobs to finish
    estimated_min = min(collecttime(collected) - submittime(collected));
    estimated_max = max(collecttime(collected) - submittime(collected));
    estimated_avg = estimated_max; % the maximum is used instead of the mean

    if ~isempty(ResubmitTime)
      % use the user-specified amount
      estimated = ResubmitTime;
    elseif any(collected)
      % estimate the expected time of the jobs, assuming a "normal" distribution
      % the rationale for the estimate is the mean plus X times the standard deviation
      % instead of the standard deviation the min-max range is used
      estimated = estimated_avg + 2*(estimated_max - estimated_min);
      % take into account that the estimate is less accurate in case of only few collected jobs
      estimated = estimated * (1 + 1/(1+log10(sum(collected))));
      % add some time to allow the matlab engine to start
      estimated = estimated + 10;
    elseif ~isempty(timreq)
      % assume that it will not take more than 2x the required time
      estimated = 2*timreq;
      % add some time to allow the matlab engine to start
      estimated = estimated + 10;
    else
      % it is not possible to estimate the time that a job will take
      estimated = inf;
    end

the code above determines when a job is considered to be lost and requires resubmission

in peerslave.c:

    /* determine the maximum allowed job duration */
    timallow = 2*(host->timavail+1);

this means that a 24h peerslave can continue running for 48h, whereas the job is only a 10 minute job. The job probably will be resubmitted much sooner than killed.
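To put numbers on the mismatch, here is a minimal sketch in C that evaluates both formulas for the case described: a job with timreq of 10 minutes on a peerslave with timavail of 24 hours. The function names are hypothetical, and treating all values as seconds is an assumption; only the two formulas are taken from the code quoted in this comment.

```c
/* Master side: resubmission estimate, mirroring the timreq branch
 * quoted from peercellfun.m (2x the required time plus 10 for the
 * matlab engine to start). Seconds are assumed as the unit. */
double resubmit_estimate_from_timreq(double timreq)
{
    return 2.0 * timreq + 10.0;
}

/* Slave side: maximum allowed job duration, mirroring the line
 * quoted from peerslave.c. */
double slave_timallow(double timavail)
{
    return 2.0 * (timavail + 1.0);
}
```

With timreq = 600 and timavail = 86400 the master gives up on the job after 1210 seconds, while the slave would only kill it after 172802 seconds, so the master resubmits long before the slave stops working on the original copy.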


Robert Oostenveld - 2010-12-08 11:50:27 +0100

in peerslave.c the killswitch should be triggered if either host->timallow or job->timreq (times a fudge factor) is exceeded, i.e. take the minimum of the two
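A minimal sketch of this rule in C, not the actual change made in peerslave.c: the helper names and the plain double arguments are hypothetical, and the value of the fudge factor is left to the caller because it is not specified here.

```c
/* Effective kill limit: the smaller of the host-wide limit and the
 * job's own requested time multiplied by a fudge factor.
 * All names are hypothetical; seconds are assumed as the unit. */
double kill_limit(double host_timallow, double job_timreq, double fudge)
{
    double job_limit = fudge * job_timreq;
    return (job_limit < host_timallow) ? job_limit : host_timallow;
}

/* Returns 1 if the job has run longer than the effective limit, else 0. */
int should_kill(double elapsed, double host_timallow,
                double job_timreq, double fudge)
{
    return elapsed > kill_limit(host_timallow, job_timreq, fudge);
}
```

For example, with host_timallow = 172802, job_timreq = 600 and an assumed fudge factor of 3, the limit becomes 1800 seconds, so a 10 minute job that hangs is killed after half an hour instead of after 48 hours.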


Robert Oostenveld - 2010-12-15 21:42:55 +0100

fixed in revision 2442


Robert Oostenveld - 2010-12-15 21:43:51 +0100

there is currently still a problem with the estimated time of the jobs in peercellfun.
furthermore, the 64-bit linux peerslave needs to be recompiled and started on mentat and esi-hpc1.


Robert Oostenveld - 2011-01-05 11:56:57 +0100

selected a long list of resolved bugs from roboos and changed the status into "RESOLVED"


Robert Oostenveld - 2011-01-05 12:01:02 +0100

selected all old bugs from roboos with status RESOLVED and changed it into CLOSED