Back to the main page.

Bug 2314 - MOFB - mother of all ft_databrowser bugs

Status NEW
Reported 2013-10-09 14:50:00 +0200
Modified 2017-11-28 09:22:22 +0100
Product: FieldTrip
Component: core
Version: unspecified
Hardware: All
Operating System: All
Importance: P3 normal
Assigned to: Roemer van der Meij
URL:
Tags:
Depends on: 2116411642231827103033163316351661166221942203229623042312231523162317233523482424247325592656271227132714287829202934299531213123313732233279
Blocks:
See also:

Roemer van der Meij - 2013-10-09 14:50:03 +0200

This bug is intended for general discussion of improvements to ft_databrowser. Bugs related to ft_databrowser should have this bug as a dependency. Then, when the specific bugs get updated, an email is also sent to the CC list of this bug. If you are working on the databrowser, or would like to join the discussion, please add yourself as a CC.


Roemer van der Meij - 2013-10-09 14:51:07 +0200

(Jorn and Eelke, I added you as CC based on previous databrowser work, please remove yourself if you want ;))


Jan-Mathijs Schoffelen - 2013-10-10 10:48:57 +0200

Hoi Roemer, I tentatively assigned this back to you. It's not meant that you need to do all the work, but at least it makes a named person responsible for pushing this forward. If it is kept on the unassigned list, for sure it will not be pushed forward.


Robert Oostenveld - 2013-10-10 11:47:39 +0200

gentlemen, please keep the mother-bug the top-most one and list other bugs here as "depends on". In the respective page for the other bugs, this bug 2314 can be listed as "blocks". There is this "show dependency tree" option (http://bugzilla.fcdonders.nl/showdependencytree.cgi?id=2314&hide_resolved=1), which may help you to understand the direction. The graph option does not seem to work.


Craig Richter - 2013-10-10 15:04:17 +0200

One proposed enhancement that I discussed with Roemer is to significantly change the internal workings of the databrowser in the following way: It appears to function by determining the data of interest to be plotted and then injecting the new data into the figure and redrawing the plot. For large multichannel data sets there is certainly a speed issue with moving through the data, e.g. paging through time segments. An enhancement we discussed is to load all of the data into the figure and use adjustment e.g. of the axis limits to move pan through time or channels. In my experience this is very fast. I wanted to bring this up for discussion to debate the advantages/disadvantages since it will require a rather large change in the code.


Jörn M. Horschig - 2013-10-10 15:24:37 +0200

Hey, first of all, I like the title of the bug. I like intuitive abbreviations (one could have also gone ahead and named it MOAFT_DABU - how silly). I like the idea, however there are some problems we might want to think about beforehand. Loading all the data might be alright for continuous data, but for segmented data, the databrowser should only plot one trial (thus, not all data at the same time). Not sure if possible, but we could try to draw different trials on different axes and set the visibility of all but one to 0. That would fasten up the databrowser tremendously, given that there will be no memory problem. In the low-level implementation, data is stored with the figure. If I recall correctly no adjustments to the data are made, so no copy of the data is made. In future the databrowser should not get unstable because of memory problems. We would need to check how data are stored, which mostly entails finding out when the different scaling-variables are applied. One thing that *will* result in a data copy is the on-the-fly data preprocessing (applying filters etc.). If that is done on all data, one might easily run out of memory. As a test-scenario, we could use both MEG and TMS-EEG data from the tutorials. They are fairly big in size because of sampling rate and/or #channels. I guess that will reasily result in a copy of several gigs, thus I would propose to apply the preprocessing just on the chunk of data that is currently on the screen and update sequentially. But we'd need to test this. Craig, if you feel that these are no bigger obstacles, maybe you can go ahead and create a new bug (and link it to this one)?


Craig Richter - 2013-10-10 16:30:16 +0200

I think the way to handle the epoch case is to concatenate the trial data and adjust the GUI controls so that it is impossible to end up between trials, i.e. you can only present data of one trial, and then zoom, etc. inside of that. This shouldn't be tough. I'm not very experienced with Matlab's handling of the memory when it is in a figure, so any enlightenment in terms of not letting the memory blow up would be very helpful. Also, I'm trying to make a bug for this, but it wants to put the whole developer list as the CC. How do I add this specific list of developers?


Jörn M. Horschig - 2013-10-10 17:05:11 +0200

Hi Craig, don't touch the CC list but add the dependence to this bug. Bugzilla handles what needs to be done :) (however, if you want to change it, use CTRL to add multiple users) The memory issue is not specific to figures, but to Matlab. In essence, Matlab tries to not copy data unless necessary. E.g. dataA = dataB results in no copy, thus no increase in memory requirements, but if you subsequently do dataA(1) = 0; then Matlab will create a copy of dataB, assign the outcome to dataA and make a change to dataA(1). Thus, memory requirements have just doubled


Robert Oostenveld - 2013-10-11 09:31:53 +0200

(In reply to comment #7) Jorn is right. But note that memory might not be so "cheap" here: the data is organized in matrices, whereas in the figure each line has associated data in two vectors (the x and y-data). Doing y1 = randn(100, 10000); y2 = cell(1,100); for i=1:100 y2{i} = y(1,:); end will result in two real copies in memory. Had it been y1 = randn(100, 10000); y2 = y1; then indeed only a single real copy would be present, with y2 being a "shallow" copy (with its own name, but pointing to the same section in RAM).


Roemer van der Meij - 2013-10-23 11:44:58 +0200

Jorn makes a very good point, but one that can be avoided I think without too much trouble. We can set a hard limit on how many samples are 'active' in the plot. If this is reasonable, no explosion can occur. For example, let's say we set this to 1e7 data-points. As double precision float is 8 bytes, this would put ~76MB in the figure. Trying this out will quickly tell us whether this is clunky or not on an average desktop. 1e7 data-points with common 1000Hz sampling and 250 channels, is 40 seconds of data. Already far more than will probably be useful. On the trial vs segmentation... Just as we modularize code right now with computational functions (or try to), we should do something similar with plotting functions. I think most of us already think in this way, just making it explicit here. I.e., completely separate code that does the visualization, with that of the bookkeeping. The reasons are simple, 1) a set of plotted lines is a set of plotted lines, whether it comes from segmented or non-segmented data, and 2) code is easier to read/fix/expand/etc. Right now, we sort of do this in the databrowser, but not clearly. Redraw_cb, the main subfunction that plots everything, does a lot of data-handling and bookkeeping besides plotting. And other subfunctions are probably kind of in between. Here is what I propose as a simple plan of development (please shoot/comment on this). 1) in the current databrowser, strictly separate plotting from bookkeeping, without changing functionality. The plotting bit itself should have no 'intelligence', it should only care about samples that are plotted, where they should be plotted, labels, topo's, etc. 2) once functionality is separated, enhance the way the data is plotted by instead of plotting a selection, load more than the selection and use axis-limits (or any other modification). Possible substeps in step 1: 1A: split redraw_cb into two subfunctions: prepvisdat and plotvisdat. Choose as a cutting point the first occurrence of visualizations. 1B: make sure the appropriate input is given to plotvisdat (prepvisdat can stay the same). 1C: migrate data-handling still present in plotvisdat to prepvisdat and adopt input accordingly. As a rule of thumb, plotvisdat should think in terms of input to the plot_vector function, not in terms of input to ft_prepocessing. This is still unspecific, and a more general strategy. Just a proposal, let's discuss this here.


Robert Oostenveld - 2013-10-23 17:35:50 +0200

(In reply to comment #9) I think it is a good idea to split the prep/plot parts. We might reconsider how "data/information" is passed around in the function. Now part of it probably goes through guidata, whereas other stuff might be passed to functions. I think it is easier if that were better defined and more consistent.


Jan-Mathijs Schoffelen - 2016-03-01 21:17:38 +0100

I have noticed some stagnation regarding this set of bugs. Is it possible to come up with a plan how to address these open issues? I suggest doing an inventory to check which ones pertain to bugs, and which are feature requests/enhancement. Based in this, you can prioritize them and/or decide not to further pursue them. Some of them may already even be solved.


Roemer van der Meij - 2016-03-03 14:47:12 +0100

(In reply to Jan-Mathijs Schoffelen from comment #11) Ha, this has been a while indeed. I'm still working on the databrowser occasionally, and am still interested with working on speed increases. As the past showed, I find it difficult to get going with this. A huge effort it will take. In the next few months fieldtrip-wise my efforts will be on adding ecog plotting functionality (see my github if you're curious!). In the meantime, I'll go through some of the bugs and see what their status is.