Bug 645 - Increase the speed of reading in xml files, Anyone an idea how to??

Status CLOSED FIXED
Reported 2011-05-06 23:54:00 +0200
Modified 2012-04-11 16:48:39 +0200
Product: FieldTrip
Component: external
Version: unspecified
Hardware: PC
Operating System: Windows
Importance: P1 enhancement
Assigned to: Robert Oostenveld

Ingrid Nieuwenhuis - 2011-05-06 23:54:04 +0200

Created attachment 46
xml file by egi_mff containing the events

In toolbox xml4mat (now added to external), the function xml2mat.m does the reading in. This can take very long (in case of reading in an egi_mff event file with many triggers). Would there be ways to speed this up? On my quite fast 64 bit Windows 7 PC it can take up to 30 minutes for a 45 minute file. I've added an xml file as an example.


Robert Oostenveld - 2011-05-10 20:53:06 +0200

this affects the following functions

ft_definetrial
statistics_analytic
statistics_montecarlo
private/prepare_design (is this still in use?)
statistics_stats (should be removed anyway)


Robert Oostenveld - 2011-05-10 20:53:51 +0200

(In reply to comment #1)
> this affects the following functions
>
> ft_definetrial
> statistics_analytic
> statistics_montecarlo
> private/prepare_design (is this still in use?)
> statistics_stats (should be removed anyway)

oops, that was a reply to the incorrect bug, it was meant for 640


Robert Oostenveld - 2011-05-10 21:29:56 +0200

it is indeed very slow, whereas it only contains 600 events

MacBook> grep beginTime ~/Desktop/test_bug645.xml | wc -l
600

perhaps try another xml parser, such as
http://www.artefact.tk/software/matlab/xml
this one is implemented by Guillaume Flandin (from SPM/FIL)
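For those without grep at hand (the report came from a Windows PC), roughly the same count can be obtained from within MATLAB. This is only a minimal sketch: the file written below is a made-up stand-in, not the attached test_bug645.xml, and its element layout is an assumption for illustration.

```matlab
% Write a small stand-in XML file; the layout is hypothetical and only
% serves to have some lines that contain "beginTime".
xmlfile = [tempname '.xml'];
nevents = 5;
fid = fopen(xmlfile, 'w');
fprintf(fid, '<eventTrack>\n');
for i = 1:nevents
  fprintf(fid, '  <event>\n');
  fprintf(fid, '    <beginTime>%d</beginTime>\n', i);
  fprintf(fid, '  </event>\n');
end
fprintf(fid, '</eventTrack>\n');
fclose(fid);

% Count the lines that contain the string, like "grep beginTime file | wc -l".
% A line with both the opening and the closing tag still counts once.
txt       = fileread(xmlfile);
textlines = regexp(txt, '\n', 'split');
hits      = ~cellfun(@isempty, strfind(textlines, 'beginTime'));
count     = sum(hits);
fprintf('%d lines contain beginTime\n', count);

delete(xmlfile);
```

To use it on a real file, point xmlfile at that file and skip the part that writes and deletes the stand-in.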


Robert Oostenveld - 2011-05-10 21:39:57 +0200

or http://www.mathworks.com/matlabcentral/fileexchange/4278


Ingrid Nieuwenhuis - 2012-03-30 02:48:44 +0200

We (Thanks to Gio) now have a solution that works much faster:


Ingrid Nieuwenhuis - 2012-03-30 02:59:30 +0200

Created attachment 244 new xml2struct function which is much faster than the one in external>xml4mat


Ingrid Nieuwenhuis - 2012-03-30 03:00:58 +0200

Oops, I meant to continue the description before submitting this bug. As I said, thanks to Gio we have a faster solution now. We use a new xml2struct function that is much faster! I've tested it and used it without any problems. This is Gio's description, and I have attached the new xml2struct function that goes with it:

Now I use the xmlread function from matlab and this one from Matlab File Exchange:
http://www.mathworks.com/matlabcentral/fileexchange/28518-xml2struct
Then I modified it in such a way that it gives the same output that Fieldtrip is expecting.

To run it, download the function and put it in the path higher than /path/to/fieldtrip/external/xml4mat

This is what I do:
1) add fieldtrip as usual:
addpath /path/to/fieldtrip
ft_defaults
addpath /path/to/fieldtrip/external/xml4mat
2) and afterwards
addpath /path/to/dir/with/xml2struct
3) Then try to read an MFF file.

If you see "xml2struct reading NAME_OF_THE_XML_FILE", it means you're using the faster xml2struct.m

I haven't included it in Fieldtrip because I don't know the licence and it relies on xmlread, which relies on Java.
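The recipe above works because addpath puts a directory at the top of the search path by default, so the directory added last takes priority. The following is a self-contained sketch of that behaviour, not part of Gio's instructions: the two temporary directories stand in for external/xml4mat and for the directory with the new function, and demo_xml2struct is a hypothetical name chosen so that no real function is shadowed.

```matlab
% Two temporary directories, each with its own version of the same function.
olddir = tempname; mkdir(olddir);
newdir = tempname; mkdir(newdir);

fid = fopen(fullfile(olddir, 'demo_xml2struct.m'), 'w');
fprintf(fid, 'function s = demo_xml2struct()\ns = ''old'';\n');
fclose(fid);

fid = fopen(fullfile(newdir, 'demo_xml2struct.m'), 'w');
fprintf(fid, 'function s = demo_xml2struct()\ns = ''new'';\n');
fclose(fid);

addpath(olddir);   % added first, like external/xml4mat
addpath(newdir);   % added last; addpath prepends by default, so this one wins
rehash;

winner = which('demo_xml2struct');   % full path of the version that will run
result = demo_xml2struct();          % calls the version in newdir
fprintf('using %s (returns "%s")\n', winner, result);

% clean up the path and the temporary files
rmpath(olddir); rmpath(newdir);
delete(fullfile(olddir, 'demo_xml2struct.m')); rmdir(olddir);
delete(fullfile(newdir, 'demo_xml2struct.m')); rmdir(newdir);
```

In a real session, "which xml2struct" after step 2 shows in the same way whether the faster version is the one that will be called.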


Robert Oostenveld - 2012-03-30 08:58:57 +0200

it is only a single function, so it makes sense to add it to fileio/private (where it will always have higher priority than external/xml2struct). I don't see any explicit copying or license restrictions mentioned, so I assume that the file can be added to fieldtrip. We should of course keep the original authors' names in the code and provide a link to where the file came from. Would the external/xml2struct still be needed after adding this one?


Gio Piantoni - 2012-03-30 09:54:09 +0200

Created attachment 245
diff xml2struct_fileexchange.m xml2struct_modifiedforFT.m

Hi,

sorry I did not include the code in the main Fieldtrip code before. The xml2struct is indeed much faster than the xml4mat. However, we should keep in mind:

(1) xml2struct relies on xmlread and java. So you need the javamanager running. I can imagine cases in which matlab runs with the -nojvm flag. In this case, xml2struct would crash (while xml2mat would run).

(2) xml4mat is GNU licensed, while xml2struct has BSD license. Does BSD license allow us to include the function in a GNU project and modify it?

(3) I made some small changes to the xml2struct code to make it 100% compatible with the output of xml2mat (see attachment). If we get rid of xml2mat, then we can use the original xml2struct code, without changes (but we need to modify ft_read_event and ft_read_header). I just noticed that they added a small change to the code: "The function now replaces element and attribute names containing - by _dash_, . by _dot_ and : by _colon_" but this change can be integrated easily.

Thus, if points (1) and (2) are good for you, we can add xml2struct to fileio/private and test it. What do you think?

Cheers,
G


Ingrid Nieuwenhuis - 2012-03-30 18:09:57 +0200

Soon EGI will have the new MFF reading code ready, and that also relies on java. So probably when using MFF code, one should just always run matlab with Java... So for now, I think it will be easiest if we just add this new modified xml2struct to private. Would it be possible to detect if JAVA is available, and if so use this one, and if not, use the old toolbox? Hopefully this will all be very temporary for whenever EGI's code is ready.
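Detecting Java is possible with usejava, which is a standard MATLAB function. The following is only a sketch of the switch suggested above; the two strings merely label the options, and how FieldTrip would actually dispatch between the two readers is an assumption not shown here.

```matlab
% usejava('jvm') returns true when the Java Virtual Machine is running
% and false when MATLAB was started with the -nojvm flag.
if usejava('jvm')
  parser = 'xml2struct';   % xmlread-based version, needs Java
else
  parser = 'xml4mat';      % slower toolbox in external/xml4mat, no Java needed
end
fprintf('JVM available: %d, would use %s\n', usejava('jvm'), parser);
```

Instead of falling back silently, the else branch could also raise an informative error that asks the user to restart MATLAB with Java enabled.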


Robert Oostenveld - 2012-04-04 09:20:00 +0200

(In reply to comment #9)

regarding license issues with http://www.mathworks.com/matlabcentral/fileexchange/28518-xml2struct

http://www.gnu.org/licenses/gpl-faq.html#OrigBSD states

-----
Why is the original BSD license incompatible with the GPL? (#OrigBSD)

Because it imposes a specific requirement that is not in the GPL; namely, the requirement on advertisements of the program. Section 6 of GPLv2 states:

You may not impose any further restrictions on the recipients' exercise of the rights granted herein.

GPLv3 says something similar in section 10. The advertising clause provides just such a further restriction, and thus is GPL-incompatible.

The revised BSD license does not have the advertising clause, which eliminates the problem.
-----

The license attached to xml2struct states

Copyright (c) 2010, Wouter Falkena
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

which means that there is no requirement for advertisement. This is further explained on http://en.wikipedia.org/wiki/BSD_licenses which clarifies that xml2struct is covered by the (GPL compatible) BSD-2-clause-license.

That means we can include it in fieldtrip/fileio/private and that it does not have to go into a separate external directory.


Gio Piantoni - 2012-04-04 09:32:19 +0200

Great, thanks for looking into it! I agree about adding it to fileio/private. We can ask users to run java when converting MFF files (maybe throwing an informative error if Java is not running). I don't think it's of much use to have a toolbox which only works as a very rare fallback. In this way, we can get rid of external/xml2mat. What do you think?


Robert Oostenveld - 2012-04-04 10:20:39 +0200

(In reply to comment #12)

I made some improvements to the first implementation of the egi_mff file format (egi_mff_v1), use alternative (faster) xml2struct function, give informative error in case JVM is not running, added script to test the speed of reading the xml file (see http://bugzilla.fcdonders.nl/show_bug.cgi?id=645), added script to test the general correctness of the main reading functions (see http://bugzilla.fcdonders.nl/show_bug.cgi?id=1407).

Sending fileio/ft_read_data.m
Sending fileio/ft_read_event.m
Sending fileio/ft_read_header.m
Adding fileio/private/xml2struct.m
Adding test/test_bug1407.m
Adding test/test_bug645.m
Transmitting file data ......
Committed revision 5582.

Note that the test_bug645 script shows that the old xml2struct takes ~400 seconds, whereas the alternative one takes ~20 seconds.


Robert Oostenveld - 2012-04-04 10:44:39 +0200

I added a FAQ to the fieldtrip wiki. See http://fieldtrip.fcdonders.nl/faq/how_can_i_read_egi_mff_data_without_the_jvm


Robert Oostenveld - 2012-04-11 16:48:39 +0200

I cleaned up my bugzilla list by changing the status from resolved (either fixed or wontfix) into closed. If you don't agree, please reopen the bug. Robert