
Bug 1264 - Revive automated testing

Status CLOSED FIXED
Reported 2012-01-17 15:23:00 +0100
Modified 2012-12-31 11:46:25 +0100
Product: FieldTrip
Component: release
Version: unspecified
Hardware: PC
Operating System: Linux
Importance: P1 major
Assigned to: Robert Oostenveld
URL:
Tags:
Depends on: 1441
Blocks:
See also:

Boris Reuderink - 2012-01-17 15:23:20 +0100

For each commit, the tests in the /test dir should be run, and a summary should be presented at http://fieldtrip.fcdonders.nl/development/dashboard .


Boris Reuderink - 2012-01-17 15:23:48 +0100

Updated hours worked.


Robert Oostenveld - 2012-01-18 12:41:33 +0100

There is a mockup dashboard, containing actual content. I am not confident yet that the historical overview provides the desired information to help in improving code quality. What should be done to automate the test executions? What should be done to allow a manual test run of one or multiple scripts? What should be done to automate the update of the dashboard? What should be done to upload the test result log files?


Boris Reuderink - 2012-01-18 14:32:49 +0100

Changed status to assigned. In the FieldTrip meeting, the following suggestions were made:
- when a change breaks a test, the diff between the last-good and the broken revision could be linked on Google Code.
- the number of failing tests over time could be shown (this indicates particularly reliable FieldTrip revisions).
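Both suggestions need only the per-revision test outcomes. Below is a minimal Python sketch. It assumes the results are available as a mapping from revision number to a dict of test name → status string ("passed", "failed", "unknown"). That layout and the function names are hypothetical and not taken from the actual dashboard scripts.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: {revision: {test_name: status}}
Results = Dict[int, Dict[str, str]]


def failing_counts(results: Results) -> Dict[int, int]:
    """Number of failing tests per revision, in revision order."""
    return {
        rev: sum(1 for status in tests.values() if status == "failed")
        for rev, tests in sorted(results.items())
    }


def last_good_first_broken(results: Results, test: str) -> Optional[Tuple[int, int]]:
    """Return (last_good, first_broken) for the most recent passed->failed
    transition of `test`, or None if it never went from passed to failed.

    Revisions where the test is missing or 'unknown' are skipped.
    """
    transition = None
    previous_good = None
    for rev in sorted(results):
        status = results[rev].get(test)
        if status == "passed":
            previous_good = rev
        elif status == "failed" and previous_good is not None:
            transition = (previous_good, rev)
            previous_good = None  # later failures belong to the same breakage
    return transition
```

The returned revision pair is what a "diff between last-good and broken revision" link would be built from. The exact Google Code diff URL is not specified in this thread, so it is left out here.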


Robert Oostenveld - 2012-03-05 14:05:29 +0100

The dashboard at http://fieldtrip.fcdonders.nl/development/dashboard lists a number of test cases with an UNKNOWN outcome. These should be fixed. One issue with these unknown outcomes is that the log files are not correctly formatted; see for example http://fieldtrip.fcdonders.nl/development/dashboard/r5384/test_bug1027 and http://fieldtrip.fcdonders.nl/development/dashboard/r5384/test_bug1223 . I don't know whether that is related to the detection of whether a job passed or failed.


Boris Reuderink - 2012-03-28 10:58:39 +0200

OK. Furthermore, no action is coupled to detecting failing tests. A mail should be sent when a commit causes previously passing tests to no longer pass.
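Deciding when to send such a mail amounts to comparing the outcomes of two consecutive revisions. Below is a minimal Python sketch under the same hypothetical layout (test name → status string). The mail text format is an assumption, and actually sending it (e.g. with Python's smtplib) is left out.

```python
from typing import Dict, List


def new_failures(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Tests that passed in the previous revision but no longer pass now.

    Tests that disappeared from `current` are not reported; that would
    need a separate check.
    """
    return sorted(
        name
        for name, status in current.items()
        if previous.get(name) == "passed" and status != "passed"
    )


def regression_mail_body(revision: int, failures: List[str]) -> str:
    """Plain-text body for a notification mail (hypothetical format)."""
    lines = [f"Revision r{revision} broke {len(failures)} previously passing test(s):"]
    lines += [f"  {name}" for name in failures]
    return "\n".join(lines)
```

A mail would only be sent when `new_failures` returns a non-empty list. Tests that were already failing therefore do not trigger repeated mails.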


Boris Reuderink - 2012-04-03 12:05:25 +0200

A new issue: some of the log files are quite big (0.5 MB!) because of very verbose output. Since all log files are stored for every revision, this takes up quite some disk space.


Robert Oostenveld - 2012-04-23 13:49:54 +0200

I checked the tests with unknown results; below are some findings:
- test_bug1027 may very well take a long time, but also seems to be interactive (i.e. it opens a figure and waits for input)
- test_bug1093: idem
- test_bug1223 makes a figure
- test_bug1227 was a script, not a function (I just fixed it)
- test_bug168 is interactive and runs forever
- test_datasets: perhaps because it returns a variable? Why would that be a problem?
- test_historical was not meant for regression testing and is now disabled
- test_warp: don't know
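A test that waits for input or runs forever can at least be stopped from blocking the whole run by enforcing a time limit. Below is a minimal Python sketch. The command is whatever starts a single test; the actual MATLAB invocation used by the test framework is not given here, so the argument list is an assumption.

```python
import subprocess
from typing import List


def run_with_timeout(cmd: List[str], timeout_s: float) -> str:
    """Run `cmd` and classify the outcome as 'passed', 'failed' or 'timeout'.

    A test that waits for user input (or never terminates) is killed after
    `timeout_s` seconds instead of blocking all subsequent tests.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # no interactive input is available
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,          # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "passed" if completed.returncode == 0 else "failed"
```

Reporting "timeout" separately from "failed" keeps interactive tests distinguishable from genuine regressions on the dashboard.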


Robert Oostenveld - 2012-04-23 13:56:20 +0200

(In reply to comment #6) I checked the sizes; these are the large ones:

    roboos@dccn-l005> ls -lS
    total 11604
    -rw-r--r-- 1 roboos mrphys 4301549 Apr 23 13:27 test_historical.txt
    -rw-r--r-- 1 roboos mrphys 2410931 Apr 23 13:27 test_bug1309.txt
    -rw-r--r-- 1 roboos mrphys  992621 Apr 23 13:27 test_bug1049.txt
    -rw-r--r-- 1 roboos mrphys  967932 Apr 23 13:27 test_tutorial_beamformer.txt
    -rw-r--r-- 1 roboos mrphys  331430 Apr 23 13:27 test_ft_qualitycheck.txt
    -rw-r--r-- 1 roboos mrphys  256667 Apr 23 13:27 test_tutorial_beamformer20120321.txt
    -rw-r--r-- 1 roboos mrphys  208241 Apr 23 13:27 test_bug472.txt
    -rw-r--r-- 1 roboos mrphys  193114 Apr 23 13:27 test_ft_freqstatistics.txt
    -rw-r--r-- 1 roboos mrphys  132724 Apr 23 13:27 test_ft_sourcemovie.txt
    -rw-r--r-- 1 roboos mrphys  129214 Apr 23 13:27 test_ft_connectivityanalysis.txt
    -rw-r--r-- 1 roboos mrphys  112993 Apr 23 13:27 test_tutorial_connectivity.txt

The first one has been resolved (see comment 7). I suggest that the others are extended with the following lines at the top of the test script:

    global ft_default
    ft_default.feedback = 'no';

This will be automagically mixed into all subsequent cfgs for all FieldTrip function calls, and should make those functions much less verbose.
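To illustrate what "mixed into all subsequent cfgs" means, here is a Python analogy. FieldTrip itself does this in MATLAB, and the dict and function below are illustrations, not FieldTrip code. The sketch assumes that options explicitly set in a cfg take precedence over the global defaults, which only fill in what is missing.

```python
from typing import Any, Dict

# Analogy for the MATLAB global `ft_default`; not actual FieldTrip code.
ft_default: Dict[str, Any] = {"feedback": "no"}


def merge_defaults(cfg: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg in which missing options are taken from defaults.

    Options explicitly present in cfg win over the defaults.
    """
    return {**defaults, **cfg}
```

Under this assumption, setting `feedback` once at the top of a test script silences every later call that does not set it explicitly.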


Boris Reuderink - 2012-04-23 14:32:08 +0200

I have disabled the feedback as suggested in comment #8 in SVN revision 5694.


Robert Oostenveld - 2012-04-24 21:49:17 +0200

Hi Boris, a number of test scripts fail in the regression framework because they create figures and/or ask for user input. For figures, the regression framework should in general be robust, e.g. by using xvfb. Scripts that require interactive input can of course not be made robust. Can you come up with a suggestion for dealing with 1) non-interactive figure-generating test scripts and 2) interactive test scripts?


Boris Reuderink - 2012-04-25 13:29:34 +0200

(In reply to comment #10)

# On the point of interactive tests: I think we can change the names of interactive test scripts so that they are not run automatically after each commit. It should be obvious which scripts are not run automatically:
- test_* -> run automatically.
- one of {debug,inspect,notest}_* -> do not run automatically.
Is that an idea?

# On figure-generating scripts: I'll try to figure (no pun intended) something out with a virtual frame-buffer.
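The proposed naming convention is easy for the automatic runner to apply. Below is a minimal Python sketch of the filter. It assumes the runner receives a list of MATLAB file paths, and the function names are hypothetical.

```python
import os
from typing import List

# Proposed convention: only test_*.m runs automatically; scripts named
# debug_*, inspect_* or notest_* (or anything else) are skipped.
AUTO_PREFIX = "test_"


def runs_automatically(filename: str) -> bool:
    """True for MATLAB files whose name starts with test_ and ends with .m."""
    base = os.path.basename(filename)
    return base.startswith(AUTO_PREFIX) and base.endswith(".m")


def select_automatic(filenames: List[str]) -> List[str]:
    """Keep only the scripts that the per-commit run should execute."""
    return sorted(f for f in filenames if runs_automatically(f))
```

Because the rule only looks at the file name, renaming an interactive script (e.g. to inspect_bug168.m) is enough to take it out of the automatic run.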


Robert Oostenveld - 2012-08-23 09:19:39 +0200

We had a discussion yesterday in the FT meeting on the dashboard. I won't repeat the outcome here, but will list a technical idea. If you look at http://fieldtrip.fcdonders.nl/tutorial/shared?do=index there are shared sections. These allow, for example, the same dataset to be explained in multiple tutorials: http://fieldtrip.fcdonders.nl/tutorial/shared/dataset is used in http://fieldtrip.fcdonders.nl/tutorial/preprocessing and referred to at various locations. The inclusion of one page in another is done with

    {{page>:tutorial:shared:dataset}}

That is something we can use here as well, i.e. think of dashboard.txt being a plain wiki page, with

    dashboard_sorted_rev.txt
    dashboard_sorted_status.txt
    dashboard_sorted_duration.txt
    dashboard_sorted_test.txt

included in it; dashboard.txt would just have headers (to create the page index in the left column) and the included pages. Creating the sorted txt files is probably very little work on the Linux command line (using sort). It would mean that Eelke does not have to look into a sortable-table plugin for the DokuWiki CMS. However, that might also be pretty simple; see https://www.dokuwiki.org/plugin:sortablejs and the links on that page, like https://www.dokuwiki.org/plugins?plugintag=tables#extension__table

Please discuss with Eelke what the most efficient way forward is.
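Producing the four sorted variants from one set of rows takes little more than one sort per column. The sketch below does it in Python rather than with the command-line sort mentioned above. The row layout and the key names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical row layout: (revision, test, status, duration_in_seconds)
Row = Tuple[int, str, str, float]

SORT_KEYS: Dict[str, Callable[[Row], object]] = {
    "rev": lambda r: r[0],
    "test": lambda r: r[1],
    "status": lambda r: r[2],
    "duration": lambda r: r[3],
}


def sorted_views(rows: List[Row]) -> Dict[str, List[Row]]:
    """One ordering of the same rows per column, e.g. for dashboard_sorted_<key>.txt."""
    return {name: sorted(rows, key=key) for name, key in SORT_KEYS.items()}
```

Test names sort as strings here, so test_bug1014 comes before test_bug2. A natural sort would need an extra key function.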


Eelke Spaak - 2012-08-23 09:25:05 +0200

From a usability standpoint, just having a sortable table is highly preferable to having the same table repeated multiple times, with different sort options. Wouldn't you agree? Actually I already looked into this yesterday evening, and my suspicions that it would be quite simple were indeed confirmed. I was not too happy with the sortablejs plugin Robert mentioned, it seems a bit buggy. But I will try to get it working nonetheless; and otherwise just writing my own plugin should be trivial, since there are very good non-dokuwiki javascript-plugins for the functionality (e.g. http://tablesorter.com/docs/).


Eelke Spaak - 2012-08-23 09:41:52 +0200

Actually the sortablejs plugin works without any trouble whatsoever; apparently the bugs mentioned on its page have all been solved :) See http://fieldtrip.fcdonders.nl/playground for an example. Boris, if you make the table like this:

    <sortable 4>
    ^Revision ^ Test ^ Bugzilla ^ Status ^ Duration ^ History ^
    |[[http://code.google.com/p/fieldtrip/source/detail?r=6399|6399]] | test_bug1014 | [[http://bugzilla.fcdonders.nl/show_bug.cgi?id=1014|1014]] | [[http://fieldtrip.fcdonders.nl/development/dashboard/r6399/test_bug1014|passed]] | 1.896| ++++++++++|
    ...
    </sortable>

then it will automatically be sortable, and initially sorted on column 4 (which is Status, so 'failed' will be on top).
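The markup above could be generated from the test results by a small script. Below is a minimal Python sketch. It assumes each result is a (revision, test, status, duration, history) tuple; that layout and the function names are hypothetical, while the URL patterns are copied from the example row. The bug number is taken from names of the form test_bugNNNN, and the Bugzilla cell is left empty otherwise.

```python
import re
from typing import Iterable, Optional, Tuple

CODE_URL = "http://code.google.com/p/fieldtrip/source/detail?r={rev}"
BUG_URL = "http://bugzilla.fcdonders.nl/show_bug.cgi?id={bug}"
LOG_URL = "http://fieldtrip.fcdonders.nl/development/dashboard/r{rev}/{test}"


def bug_number(test: str) -> Optional[str]:
    """Extract the bug number from names like test_bug1014, if any."""
    match = re.fullmatch(r"test_bug(\d+)", test)
    return match.group(1) if match else None


def table_row(rev: int, test: str, status: str, duration: float, history: str) -> str:
    """One DokuWiki table row in the format of the example above."""
    bug = bug_number(test)
    bug_cell = f"[[{BUG_URL.format(bug=bug)}|{bug}]]" if bug else ""
    cells = [
        f"[[{CODE_URL.format(rev=rev)}|{rev}]]",
        test,
        bug_cell,
        f"[[{LOG_URL.format(rev=rev, test=test)}|{status}]]",
        f"{duration:.3f}",
        history,
    ]
    return "| " + " | ".join(cells) + " |"


def sortable_table(rows: Iterable[Tuple[int, str, str, float, str]], sort_column: int = 4) -> str:
    """Wrap header and rows in <sortable N> tags for the sortablejs plugin."""
    header = "^ Revision ^ Test ^ Bugzilla ^ Status ^ Duration ^ History ^"
    body = [table_row(*row) for row in rows]
    return "\n".join([f"<sortable {sort_column}>", header, *body, "</sortable>"])
```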


Robert Oostenveld - 2012-08-23 09:55:35 +0200

(In reply to comment #14) cool!


Boris Reuderink - 2012-08-23 10:34:14 +0200

(In reply to comment #14) Great! I'll update the dashboard's format.


Boris Reuderink - 2012-08-23 14:34:30 +0200

Okay, the dashboards should now be wrapped in <sortable> tags. I'll have to wait for the automatic sync to see if it was successful. I also added a very simple email script, but apparently I don't have the rights to mail to fieldtrip-bugs@science.ru.nl (waiting for approval).

Boris Reuderink - 2012-08-23 15:05:34 +0200

(In reply to comment #17) It does not seem to work; the wiki claims the file was updated, but the table is not sortable. I can't see the source for this page :/. Perhaps Robert can? Eelke, is the sortable tag enabled for http://fieldtrip.fcdonders.nl/development/dashboard ?


Eelke Spaak - 2012-08-23 15:09:19 +0200

(In reply to comment #18) I touched DokuWiki's conf/local.php, triggering a cache refresh, and now it works. I expect it to keep working on the next dashboard update as well; otherwise I will investigate further.


Boris Reuderink - 2012-08-23 15:15:27 +0200

(In reply to comment #19) Cool, really slick.


Boris Reuderink - 2012-08-23 15:29:37 +0200

OK. I mark this bug as RESOLVED FIXED (it has been open long enough). Let's see if it keeps working before closing it. For mailing the results I have opened a new bug. For new issues, I would propose to do the same.


Robert Oostenveld - 2012-08-23 16:22:59 +0200

Please do make each test script name a clickable link to its latest version on Google Code.


Boris Reuderink - 2012-09-12 16:39:56 +0200

During the FT meeting today, JM mentioned that although a test case was fixed, the dashboard still shows it as broken (test_ft_plot_vector, fixed somewhere between r6402 and r6410). Might this be related to the recent cluster failure?


Boris Reuderink - 2012-09-28 14:23:39 +0200

The problem that JM mentioned (#1714) was solved. TODO:
- check why mailing the reports keeps failing,
- make the test scripts clickable.


Robert Oostenveld - 2012-10-03 09:06:49 +0200

Let me also post a positive comment: I fixed the regression issues with bug 2 and bug 62. They now show as "passed" :-)


Boris Reuderink - 2012-10-04 12:51:09 +0200

Yay. New problem: test_nanstat does not appear to be run any more (since it was moved to the src dir?)


Robert Oostenveld - 2012-10-04 13:40:09 +0200

(In reply to comment #26) I (and some of my cron jobs) would expect tests to be in a test (sub)directory.


Boris Reuderink - 2012-10-04 16:34:12 +0200

(In reply to comment #27) I realize I made the same assumption. I moved the test script to $FT/src to be able to use the files in the private dir, but of course that is not necessary.


Jörn M. Horschig - 2012-10-05 11:18:15 +0200

The test cases in http://fieldtrip.fcdonders.nl/development/dashboard/r6693/test_bug1168 that are supposed to fail do fail correctly on my machine and on a mentat, but not in the automatic script. The probable reason is that this function calls ft_multiplotTFR, which can only be used with a display. Without a display there is no plot, so nothing can go wrong with plotting (although it should).


Robert Oostenveld - 2012-10-05 11:35:49 +0200

Some test scripts produce figures that do not result in an error. I know that some low-level figure features can indeed cause problems when running in a cron job or on torque without a display. Can we confirm the error with a simple test case? E.g. try the script

    load avgFIC.mat
    cfg = [];
    ft_multiplotER(cfg, avgFIC)

and use matlab_sub to execute it. If there is an error, it should show up in the "e" and "o" files.


Boris Reuderink - 2012-10-05 13:13:08 +0200

(In reply to comment #24) Making the test scripts clickable requires (correctly and robustly) parsing the log files. I'll see if I can get that working.


Robert Oostenveld - 2012-10-05 16:33:30 +0200

(In reply to comment #31) Could you not do some awk/perl/python where each 2nd column is wrapped like this?

    <a href="http://code.google.com/p/fieldtrip/source/browse/trunk/test/test_bug2.m">test_bug2</a>

Oh, the test files being at different locations under the trunk messes this up. Should we then move them all to one location (fieldtrip/test)?
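The location problem can be handled without moving files by looking up each test's path instead of assuming fieldtrip/test. Below is a minimal Python sketch of the suggested column wrapping. It assumes whitespace-separated rows with the test name in the 2nd column and a name → path mapping, which could for instance be collected by scanning the checkout. Both the row format and the mapping are hypothetical; the browse URL pattern is copied from the example above.

```python
import html
from typing import Dict

BROWSE_URL = "http://code.google.com/p/fieldtrip/source/browse/trunk/{path}"


def link_test_column(line: str, paths: Dict[str, str]) -> str:
    """Wrap the 2nd whitespace-separated field of `line` in an HTML link.

    `paths` maps a test name to its location under trunk, e.g.
    'test/test_bug2.m' or 'src/test_nanstat.m'. Lines whose test name is
    unknown are returned unchanged. Note that rewritten lines have their
    whitespace collapsed to single spaces.
    """
    fields = line.split()
    if len(fields) < 2 or fields[1] not in paths:
        return line
    name = fields[1]
    url = BROWSE_URL.format(path=paths[name])
    fields[1] = f'<a href="{html.escape(url)}">{html.escape(name)}</a>'
    return " ".join(fields)
```

With a lookup table, tests can stay next to the code they test, which also fits the preference expressed in the next comment.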


Boris Reuderink - 2012-10-09 11:08:19 +0200

(In reply to comment #32) I actually like keeping code and tests close together; anything to keep testing low-effort. For my Python code, I use the same file prefix (code.py and code_test.py) and store them in the same dir. I see code, documentation and tests as different representations of my intent.


Boris Reuderink - 2012-11-02 13:29:56 +0100

I am no longer working on FieldTrip. Hence, I donate all my bugs to the joint development user.


Robert Oostenveld - 2012-11-29 09:29:28 +0100

The following scripts now run as cron jobs:

    2-59/15 * * * * /home/mrphys/roboos/fieldtrip/ft-test/poll-job.sh
    5-59/15 * * * * /home/mrphys/roboos/fieldtrip/ft-test/parse-job.sh
    8-59/15 * * * * /home/mrphys/roboos/fieldtrip/ft-test/format-job.sh
    0 20 * * tue,wed /home/mrphys/roboos/fieldtrip/ft-test/mail-job.sh

Probably the frequency can be turned down a bit.
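To see at which minutes the three quarter-hourly jobs fire, the cron minute field can be expanded. Below is a minimal Python sketch, written here for illustration. It handles only the common forms (lists, ranges, steps, '*'), not day names such as 'tue'.

```python
from typing import List


def expand_cron_field(field: str, low: int, high: int) -> List[int]:
    """Expand one numeric cron field, e.g. '2-59/15', '*', '0' or '1,5'.

    `low` and `high` are the allowed bounds of the field (0-59 for minutes,
    0-23 for hours). Named values like 'tue' are not supported.
    """
    values = set()
    for part in field.split(","):
        range_part, _, step_part = part.partition("/")
        step = int(step_part) if step_part else 1
        if range_part == "*":
            start, end = low, high
        elif "-" in range_part:
            first, last = range_part.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(range_part)
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For the minute fields above this gives 2, 17, 32, 47 (poll), 5, 20, 35, 50 (parse) and 8, 23, 38, 53 (format). The three-minute offsets presumably let each step finish before the next one starts.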


Robert Oostenveld - 2012-12-31 11:46:25 +0100

Closed several bugs that have been resolved for some time. Feel free to reopen the bug if you disagree.