
Bug 936 - make sure qsubexec's output becomes available in an atomic operation

Status CLOSED FIXED
Reported 2011-09-06 14:26:00 +0200
Modified 2011-09-14 14:33:36 +0200
Product: FieldTrip
Component: peer
Version: unspecified
Hardware: PC
Operating System: Windows
Importance: P1 normal
Assigned to: Eelke Spaak
URL:
Tags:
Depends on:
Blocks:
See also:

Eelke Spaak - 2011-09-06 14:26:04 +0200

I figured I'd file a bug instead of sending an email, just to keep the info about this issue in one place. After some googling, it turns out that MATLAB's rename() is not an atomic operation, and movefile() would not do the trick either. So I'll now implement the renaming through a system() call to the rename command, which will hopefully help.


Eelke Spaak - 2011-09-06 15:35:25 +0200

This seems to do the trick. I will do some more testing to make sure. One more note: the walltime option seems to impose quite a strict limit on the computation time. I noted a few jobs that were killed because they lasted 47s (as evident from the stderr output file), while timreq was 30.


Eelke Spaak - 2011-09-06 16:18:45 +0200

Before I forget: it turns out that even the 'rename' Unix command is not atomic. However, 'mv' internally uses the rename(2) system call, which *is* atomic (as long as the moving/renaming is done within the same filesystem).
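
As an illustration of the write-then-rename pattern described above, here is a minimal sketch in Python. It is not the qsubexec code: the function name publish_atomically and the file names are hypothetical, and it assumes a POSIX system, where os.replace() is implemented with the rename(2) system call. The temporary file is created in the destination directory so that the rename stays within one filesystem.

```python
import os
import tempfile


def publish_atomically(data: bytes, final_path: str) -> None:
    """Write data to a temporary file, then rename it onto final_path.

    Hypothetical helper for illustration; readers polling final_path see
    either no file (or the old one) or the complete new file.
    """
    target_dir = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the destination, so the rename
    # does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the contents to disk before renaming
        # On POSIX this maps to rename(2), which replaces the target atomically.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A caller would write the job result with something like publish_atomically(result_bytes, "job_output.mat") and let the waiting side poll only for the final name. If the temporary file lived on a different filesystem, the move would fall back to copy-and-delete and the atomicity would be lost.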


Robert Oostenveld - 2011-09-06 17:34:50 +0200

Great! It seems to work for me, although for 1000 jobs qsubcellfun somehow is still very slow. But the slowness is not due to the output not working any more. One option to consider is a rename mex file that directly uses the atomic OS call.


Robert Oostenveld - 2011-09-06 17:35:07 +0200

(In reply to comment #1)
> One more note: the walltime option seems to impose quite a strict limit on the
> computation time. I noted a few jobs that were killed because they lasted 47s
> (as evident from the stderr output file), while timreq was 30.

I was not aware that jobs were already killed in the current torque configuration. Something to look into...


Robert Oostenveld - 2011-09-14 14:33:36 +0200

I closed all the bugs that were in the status RESOLVED. This includes the ones that we just discussed in the weekly FieldTrip meeting, but also the bugs that we did not discuss.