
Bug 914 - qsubcellfun should use proper working directory and meaningful job IDs

Status CLOSED FIXED
Reported 2011-08-31 15:50:00 +0200
Modified 2011-09-14 14:33:50 +0200
Product: FieldTrip
Component: core
Version: unspecified
Hardware: PC
Operating System: Windows
Importance: P1 normal
Assigned to: Eelke Spaak
URL:
Tags:
Depends on:
Blocks:
See also:

Eelke Spaak - 2011-08-31 15:50:59 +0200

When I run a very simple sequence of test commands (the one mentioned in the reference docs for qsubcellfun, in fact):

    fname = 'power';
    x1 = {1, 2, 3, 4, 5};
    x2 = {2, 2, 2, 2, 2};
    y = qsubcellfun(fname, x1, x2);

The function takes forever (well, half an hour at least, and then I Ctrl+C'ed it) to return. Executing qstat in a shell tells me that the individual jobs were done a few seconds after they were submitted.


Eelke Spaak - 2011-08-31 16:13:11 +0200

After doing some debugging, it seems that, although the _input.mat, .sh, and .m files related to a job are created, the job execution itself malfunctions, since no corresponding _output.mat file is ever produced. I do get .sh.oXXX and .sh.eXXX files (presumably created by the torque system?), which seem to contain standard output and error output, respectively. For the contents of those files, see below:

    eelspa@mentat286:~/MATLAB/maier-analysis 661 $ cat job_95750684.sh.o1998
    Starting version 79 of MATLAB
    Executing /opt/matlab79/bin/matlab -singleCompThread -nosplash -nodisplay -r job_95750684

    < M A T L A B (R) >
    Copyright 1984-2009 The MathWorks, Inc.
    Version 7.9.0.529 (R2009b) 64-bit (glnxa64)
    August 12, 2009

    To get started, type one of these: helpwin, helpdesk, or demo.
    For product information, visit www.mathworks.com.

    >>
    eelspa@mentat286:~/MATLAB/maier-analysis 662 $ cat job_95750684.sh.e1998
    ??? Undefined function or variable 'job_95750684'.

The weird thing is that if I manually start matlab, go to my home directory, and execute 'job_95750684', then everything seems to work: some computation is done and matlab exits (as it should, because qsubexec is called), and a _output.mat file is generated.


Eelke Spaak - 2011-08-31 16:40:10 +0200

I found the issue: in my home directory, I have a startup.m file that automatically executes "cd MATLAB", so MATLAB starts in a different directory from the one assumed by the auto-generated shell script. Removing my startup.m is a temporary fix, but this seems inelegant. Maybe we should use a '~/.ftqsub/' folder to place all the temporary stuff in? If we start matlab from that folder, any startup.m in other folders will be ignored of course. Another advantage is that we then don't get all this junk floating around users' home directories in case a job fails.


Robert Oostenveld - 2011-08-31 17:19:08 +0200

qsub/private/feval does the following, which I would expect to overrule the pwd that results from your startup.m:

    "fexec.m" line 69 of 225 --30%-- col 3

    % try setting the same path directory
    option_path = ft_getopt(optin, 'path');
    setcustompath(option_path);

    % try changing to the same working directory
    option_pwd = ft_getopt(optin, 'pwd');
    setcustompwd(option_pwd);

I agree that we can consider another directory for the temporary files, but ideally they are cleaned up already by qsubexec and by qsubget:

    manzana> grep delete *.m
    qsubexec.m: delete(inputfile);
    qsubget.m: delete(shellscript);
    qsubget.m: delete(matlabscript);
    qsubget.m: delete(outputfile);
    qsubget.m: % the files cannot be deleted if the user changes the present working
    qsubget.m: delete(logfile_e);
    qsubget.m: delete(logfile_o);


Robert Oostenveld - 2011-08-31 18:02:29 +0200

(In reply to comment #3) it indeed does not clean up properly...


Robert Oostenveld - 2011-08-31 18:06:24 +0200

(In reply to comment #4) and it makes the temporary files in the homedir, instead of in the pwd (which I think would be more appropriate), whereas the o and e files are in the pwd. We could consider making the tmp files in a hidden directory as you suggest, but over the course of years it would fill up with junk. And we cannot use the /tmp directory, because it is not shared over mentats.


Eelke Spaak - 2011-09-01 09:18:56 +0200

The setcustompwd does work I think, but the pwd stored in optin will be set to the working directory in which qsubfeval was called. Since the .m-files are stored in the home directory, matlab cannot find them. Shall I change qsubfeval so that all temporary files are generated in the calling matlab's pwd? Or shall I implement the hidden-directory feature? And should I also look at the deletion of the temporary files?

(In reply to comment #3)
> qsub/private/feval does the following, which I would expect to overrule the pwd
> that results from your startup.m
>
> "fexec.m" line 69 of 225 --30%-- col 3
>
> % try setting the same path directory
> option_path = ft_getopt(optin, 'path');
> setcustompath(option_path);
>
> % try changing to the same working directory
> option_pwd = ft_getopt(optin, 'pwd');
> setcustompwd(option_pwd);
>
> I agree that we can consider another directory for the temporary files, but
> ideally they are cleaned up already by qsubexec and by qsubget:
>
> manzana> grep delete *.m
> qsubexec.m: delete(inputfile);
> qsubget.m: delete(shellscript);
> qsubget.m: delete(matlabscript);
> qsubget.m: delete(outputfile);
> qsubget.m: % the files cannot be deleted if the user changes the present
> working
> qsubget.m: delete(logfile_e);
> qsubget.m: delete(logfile_o);


Eelke Spaak - 2011-09-01 09:27:06 +0200

(In reply to comment #6)
> The setcustompwd does work I think, but the pwd stored in optin will be set to
> the working directory in which qsubfeval was called. Since the .m-files are
> stored in the home directory, matlab cannot find them.

No, scrap that, I was confused. Once inside qsubexec (and feval), everything works fine actually. The issue is that matlab cannot find the temporary job_XXXXXX.m script (located in the home dir), when the startup.m issues a cd command. So what would actually solve this issue is if we change the matlab command from:

    matlab -r job_XXXXXX

to

    matlab -r "run ~/job_XXXXXX"

(untested, but should work)

This still leaves the directories-stuff of course.
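
A minimal shell sketch of the change proposed in this comment, for illustration only. It builds the MATLAB command line from the full path of the job script, so that a cd in startup.m cannot hide the script. The helper name build_matlab_cmd and its arguments are hypothetical and are not part of FieldTrip. The path is expanded by the shell, so the sketch does not rely on MATLAB interpreting "~".

```shell
#!/bin/sh
# Hypothetical helper (not FieldTrip code): print the MATLAB command line
# for one job.
#   $1 = directory holding the generated job script (assumed absolute)
#   $2 = job name without the .m extension
build_matlab_cmd() {
    jobdir=$1
    jobname=$2
    # run() is given the full path of the script, so MATLAB does not depend
    # on its current directory, which startup.m may have changed.
    printf "matlab -nosplash -nodisplay -r \"run('%s/%s.m')\"\n" "$jobdir" "$jobname"
}

# Example: print the command for a job script stored in the home directory.
build_matlab_cmd "$HOME" job_XXXXXX
```

A directory name that itself contains a single quote would break the quoting in this sketch. A real implementation would need to escape it.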


Eelke Spaak - 2011-09-01 12:44:34 +0200

The changes I made:
- The temp files are now stored in the calling matlab's pwd;
- Job IDs are now meaningful strings;
- A job's stdout and stderr are now redirected to /dev/null (errors are caught by fexec anyway, so this shouldn't be a problem).
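
The changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code. The naming scheme (user, host, process id, job counter) and the helper names make_jobid and job_files are hypothetical. The file suffixes are the ones mentioned earlier in this report.

```shell
#!/bin/sh
# Hypothetical naming scheme, not necessarily the one used in FieldTrip:
# user name, host name, process id of the submitting session, job counter.
make_jobid() {
    user=$1; host=$2; pid=$3; num=$4
    printf '%s_%s_p%s_j%03d\n' "$user" "$host" "$pid" "$num"
}

# List the temporary files for one job, placed in the caller's working
# directory rather than in the home directory.
#   $1 = working directory, $2 = job id
job_files() {
    for suffix in .sh .m _input.mat _output.mat; do
        printf '%s/%s%s\n' "$1" "$2" "$suffix"
    done
}

jobid=$(make_jobid "$(id -un)" "$(uname -n)" "$$" 1)
job_files "$PWD" "$jobid"

# Assumption: with Torque, the scheduler's own log files could be discarded
# at submission time with something like
#   qsub -o /dev/null -e /dev/null "$PWD/$jobid.sh"
```

Since the job id is also used as the name of a MATLAB script, a real implementation would probably have to replace characters such as "." or "-" in the host name, because MATLAB script names may only contain letters, digits and underscores.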


Robert Oostenveld - 2011-09-14 14:33:50 +0200

I closed all the bugs that were in the status RESOLVED. This includes the ones that we just discussed in the weekly fieldtrip meeting, but also the bugs that we did not discuss.