Back to the main page.

Bug 1615 - qsubcellfun does not detect java runtime error

Status CLOSED WONTFIX
Reported 2012-07-19 17:18:00 +0200
Modified 2012-12-31 11:46:28 +0100
Product: FieldTrip
Component: qsub
Version: unspecified
Hardware: PC
Operating System: Linux
Importance: P3 normal
Assigned to: Robert Oostenveld
URL:
Tags:
Depends on:
Blocks:
See also:

Marcel Zwiers - 2012-07-19 17:18:58 +0200

When I submit a series of jobs, a portion of those already crash at startup but qsubcellfun does not notice that and keeps on waiting for the results to return. The error file remains empty, I suppose because a java logfile was created instead (and in my home) and a hs_err_pid#.log (in the working directory). Here's an example output file: Starting version 79 of MATLAB Executing /opt/matlab79/bin/matlab -singleCompThread -nodisplay -singleCompThread -nosplash -nodisplay -r restoredefaultpath;addpath('/home/common/matlab/fieldtrip/qsub');qsubexec('marzwi_mentat277_p11308_b41_j040');exit # # An unexpected error has been detected by Java Runtime Environment: # # java.lang.OutOfMemoryError: Cannot create GC thread. Out of system resources. # # Internal Error (gcTaskThread.cpp:41), pid=45005, tid=140680095475456 # Error: Cannot create GC thread. Out of system resources. # # Java VM: Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode linux-amd64) # An error report file with more information is saved as: # /home/mrphys/marzwi/mridata/hs_err_pid45005.log # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # Opening log file: /home/mrphys/marzwi/java.log.45005


Marcel Zwiers - 2012-07-19 17:51:07 +0200

The error file does not always stay empty. Here's an example of the (very short) error file: /opt/matlab79/bin/matlab: fork: retry: Resource temporarily unavailable The corresponding output file was the same as before and for completeness, here's the java log-file: Operating System: Linux 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 Processor ID: x86 Family 15 Model 9 Stepping 1, AuthenticAMD Host Name: dccn-c006.dccn.nl Stack Trace: [0] libjvm.so:0x00007f7d7bbc4e2e(0x7f7d8c18af20, 0x7f7d8407c410, 0x7f7d7bebbdf0, 0x5000000000) [1] libjvm.so:0x00007f7d7bce2250(0x7f7de0000001, 0x7f7d7bd19e40 "Cannot create GC thread. Out of ..", 0x7f7d 84032800, 0) [2] libjvm.so:0x00007f7d7b9541c8(0x7f7d8407b000, 0x7f7d8403c880, 39, 0xffffffff) [3] libjvm.so:0x00007f7d7b9a6cad(0x7f7d8403ca30 "ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ..", 0x7f7d8403c880, 48, 0x 08000000) [4] libjvm.so:0x00007f7d7b9a57ee(0x7f7d8c18b1f0, 0x7f7d7bbd8d07, 0x7f7d642e0000, 0x08000000) [5] libjvm.so:0x00007f7d7b9a5583(0x7f7d642e0000, 0x08000000, 65536, 0x7fffed1ff800) [6] libjvm.so:0x00007f7d7bbd8d07(0x7f7d84032801, 0, 0x7f7d8c18b270, 0x7f7d7bcbda9a) [7] libjvm.so:0x00007f7d7bcbddec(0x7f7d7b010100, 0, 0x7f7d84032d70, 0x7f7d84032f00) [8] libjvm.so:0x00007f7d7bcbda9a(0x7f7d84032800, 0x7f7d84032da0, 0x7f7d84033360, 0x7f7d84033370) [9] libjvm.so:0x00007f7d7b9da8d5(0x332e0010b0, 0x332e0010c4, 0, 0x7f7d8c18b3f0) [10] libjvm.so:0x00007f7d7bca7b47(0x1007f7d8401c900, 14, 0x7f7d840246a0 "-Djava.library.path=/opt/matlab2.." , 14) [11] libjvm.so:JNI_CreateJavaVM~(0x0c3ba8fa, 8, 9, 0x7f7d8401c900) + 128 bytes [12] libmwjmi.so:InitSunVM()(0x7f7d8c18b800, 0, 0x7f7d8c18b818, 0x184020fb8) + 1885 bytes [13] libmwjmi.so:InitJava()(0x7f7d9f404aa0, 0xffffffff, 0x7f7d8c18bc40, 0x7f7d9f1c615e) + 155 bytes [14] libmwjmi.so:mljInit()(0, 0x332cc09276, 0, 0x7f7d8c18b9c0) + 21 bytes [15] libmwmcr.so:mcr_initialize0(MfileReader*, MexFileReader*, bool, char const*, char const**)(0x01a7c598, 0x7f7d8c18bcc0 "°K", 0x7f7d8c18c700, 0) + 1726 bytes [16] libmwmcr.so:mcr::runtime::InterpreterThread::Impl::run(boost::shared_ptr<mcr::runtime::InterpreterThrea d::Impl> const&, mcr::runtime::InterpreterThread::Impl::init_context*)(0x7f7d7c004bb0, 0x7f7d7c004c50, 0x7f7d8 40008c0, 0x7f7d8c18bdb0) + 164 bytes [17] libmwmcr.so:run_init_and_handle_events(void*)(0x7f7d8c18bd90, 4096, 0, 0x00a01000) + 63 bytes [18] MATLAB:boost::function0::operator()() const(0x7fffed1ce318, 0x800000005, 0, 0x7f7d840008c0) + 412 bytes [19] MATLAB:mcrMain(int, char const**)(0x7f7d8c18bef0, 0x7f7d8c18c700, 0, 0x7fffed1ce160) + 243 bytes [20] libmwmcr.so:runMcrMain(void*)(0x7f7d8c18c700, 0, 0, 0) + 28 bytes ------------------------------------------------------------------------ Fatal Java Exception detected at Thu Jul 19 17:39:59 2012 ------------------------------------------------------------------------ Configuration: MATLAB Version: 7.9.0.529 (R2009b) MATLAB License: 38957 Operating System: Linux 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 GNU C Library: 2.12 stable Window System: No active display Current Visual: None Processor ID: x86 Family 15 Model 9 Stepping 1, AuthenticAMD Virtual Machine: Java is not enabled Default Encoding: ISO-8859-1 Java is not enabled </p>

Robert Oostenveld - 2012-11-23 20:02:23 +0100

The underlying cause was linux instabilities. I don't think the ft code can be hardened against this. It is not a problem any more on our cluster, so I close this one.


Robert Oostenveld - 2012-12-31 11:46:28 +0100

closed several bugs that have been resolved for some time. Feel free to reopen the bug if you disagree.