Back to the main page.

Bug 3012 - fltpadding doesn't work with data in memory

Status CLOSED FIXED
Reported 2015-11-24 23:34:00 +0100
Modified 2016-06-14 16:14:51 +0200
Product: FieldTrip
Component: core
Version: unspecified
Hardware: PC
Operating System: Windows
Importance: P5 normal
Assigned to: Robert Oostenveld
URL:
Tags:
Depends on:
Blocks:
See also: http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=2978http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=3005

Teresa Madsen - 2015-11-24 23:34:48 +0100

I'm not exactly sure if this is a bug or if I'm just not understanding how this is supposed to work. I think there are multiple problems here, which is why this is so long and complicated. If it's not exactly broken, you can take this as a feature request to make it work the way I was hoping it would. ;-) I've loaded my data from NEX files and am working on artifact rejection. I parsed the continuous data into 100 chunks to allow for easier visualization without crashing my graphics card. When I search for low frequency artifacts, I often get large edge artifacts at the beginning or end of these fake "trials," so I tried to add filter padding, but it doesn't seem to work in this format. 1) First, I tried this: %% load continuous LFP data, all channels cfg = []; cfg.dataset = nexfile{r,p}; cfg.channel = 'AD*'; cfg.continuous = 'yes'; cfg.outputfile = outputfile1; data = ft_preprocessing(cfg); %% break data into 100 chunks for visualization % during artifact rejection process, will be % re-merged before filtering cfg = []; cfg.dataset = nexfile{r,p}; cfg.trialdef.triallength = floor(... (data.sampleinfo(end,2)-... data.sampleinfo(1,1))/100)/data.fsample; cfg.trialdef.ntrials = 100; cfg = ft_definetrial(cfg); data = ft_redefinetrial(cfg, data); trl = data.cfg.trl; %% define low frequency artifacts cfg = []; cfg.trl = trl; % channel selection, cutoff and padding cfg.artfctdef.zvalue.channel = channels; cfg.artfctdef.zvalue.cutoff = 10; cfg.artfctdef.zvalue.trlpadding = 0; cfg.artfctdef.zvalue.fltpadding = 2; cfg.artfctdef.zvalue.artpadding = 0.25; % algorithmic parameters cfg.artfctdef.zvalue.lpfilter = 'yes'; cfg.artfctdef.zvalue.lpfreq = 1; cfg.artfctdef.zvalue.rectify = 'yes'; cfg.artfctdef.zvalue.boxcar = 1; % make the process interactive cfg.artfctdef.zvalue.interactive = 'yes'; disp(['Low frequency artifact detection for ' regions{t} '.']); [~, artifact.lowfreq{t}] = ft_artifact_zvalue(cfg,data); Running that last cell gives these messages/errors: Low frequency artifact detection for mPFC. searching for artifacts in 7 channels Subscript indices must either be real positive integers or logicals. Error in ft_fetch_data (line 117) count = count (begsample:endsample); Error in ft_artifact_zvalue (line 285) dat{trlop} = ft_fetch_data(data, 'header', hdr, 'begsample', trl(trlop,1)-fltpadding, 'endsample', trl(trlop,2)+fltpadding, 'chanindx', sgnind, 'checkboundary', strcmp(cfg.continuous,'no'), 'skipcheckdata', 1); 117 count = count (begsample:endsample); I figured out that begsample is negative because it's looking for data before the first sample of the file, so I added a step to reject any NaNs and 2s before and after them and the beginning and end of the file. It seems like it would be easier to just pad with zeros in this case (PROBLEM #1). 2) Attempt 2, run the first 2 cells as above, then: %% mark NaNs as artifacts cfg = []; cfg.artfctdef.nan.channel = 'AD*'; cfg.artfctdef.nan.pretim = 2; % pre-artifact rejection interval in sec cfg.artfctdef.nan.psttim = 2; [~, artifact.nan] = ft_artifact_nan(cfg,data); %% also mark first & last 2 seconds of data as % artifacts, to allow filter padding later artifact.pad = [data.sampleinfo(1,1) ... data.sampleinfo(1,1)+2000; ... data.sampleinfo(end,2)-2000 ... data.sampleinfo(end,2)]; %% remove NaNs & padding from data cfg = []; cfg.artfctdef.nan.artifact = artifact.nan; cfg.artfctdef.pad.artifact = artifact.pad; cfg.artfctdef.reject = 'partial'; data = ft_rejectartifact(cfg,data); trl = data.cfg.trl; %% define low frequency artifacts cfg = []; cfg.trl = trl; % channel selection, cutoff and padding cfg.artfctdef.zvalue.channel = channels; cfg.artfctdef.zvalue.cutoff = 10; cfg.artfctdef.zvalue.trlpadding = 0; cfg.artfctdef.zvalue.fltpadding = 2; cfg.artfctdef.zvalue.artpadding = 0.25; % algorithmic parameters cfg.artfctdef.zvalue.lpfilter = 'yes'; cfg.artfctdef.zvalue.lpfreq = 1; cfg.artfctdef.zvalue.rectify = 'yes'; cfg.artfctdef.zvalue.boxcar = 1; % make the process interactive cfg.artfctdef.zvalue.interactive = 'yes'; disp(['Low frequency artifact detection for ' regions{t} '.']); [~, artifact.lowfreq{t}] = ft_artifact_zvalue(cfg,data); When I run the last cell, I get the following messages/error: Low frequency artifact detection for mPFC. searching for artifacts in 7 channels searching in trial 101 from 101 showing trial 1, channel AD02 Error using plot Vectors must be the same length. Error in ft_artifact_zvalue>redraw_cb (line 919) plot(opt.h1, xval, yval, 'linestyle', '-', 'color', 'b', 'displayname', 'data'); Error in ft_artifact_zvalue (line 492) redraw_cb(h); 919 plot(opt.h1, xval, yval, 'linestyle', '-', 'color', 'b', 'displayname', 'data'); xval is a 1x57110 double, containing the sample numbers from "trial" 1. yval is a 1x61110 double (the number of data points that would be in the padded trial - so PROBLEM #2 is that it's not removing the padding when it's time to plot the data), but contains all zeros because nansum was previously used on a matrix of all NaNs: ft_artifact_zvalue, line 370 zsum{trlop} = nansum(zdata,1); % accumulate the z-values over channels zdata was all NaNs because dat{1} was padded with NaNs at the beginning because the data variable obviously didn't include any time before the first trial (I removed it to get it to work this much!). Shouldn't that be padded with zeros instead, to allow the filter to work? That's PROBLEM #3. 3) My next approach was to feed in the raw continuous data, first using cfg.inputfile, with the trl configuration as defined above (100 chunks, minus 2s at beginning and end of file and around NaNs), and that opened a whole different can of worms: %% define low frequency artifacts cfg = []; cfg.inputfile = outputfile1; cfg.trl = trl; % channel selection, cutoff and padding cfg.artfctdef.zvalue.channel = channels; cfg.artfctdef.zvalue.cutoff = 10; cfg.artfctdef.zvalue.trlpadding = 0; cfg.artfctdef.zvalue.fltpadding = 2; cfg.artfctdef.zvalue.artpadding = 0.25; % algorithmic parameters cfg.artfctdef.zvalue.lpfilter = 'yes'; cfg.artfctdef.zvalue.lpfreq = 1; cfg.artfctdef.zvalue.rectify = 'yes'; cfg.artfctdef.zvalue.boxcar = 1; % make the process interactive cfg.artfctdef.zvalue.interactive = 'yes'; disp(['Low frequency artifact detection for ' regions{t} '.']); [~, artifact.lowfreq{t}] = ft_artifact_zvalue(cfg); Running that cell gives these messages/error: Low frequency artifact detection for mPFC. reading 'data' from file 'S:\Teresa\Analyses\TempData\Raw16LFPContData_200Conditioning.mat' searching for artifacts in 7 channels searching in trial 1 from 1 Reference to non-existent field 'datafile'. Error in ft_artifact_zvalue>redraw_cb (line 900) data = ft_read_data(cfg.datafile, 'header', hdr, 'begsample', trl(trlop,1), 'endsample', trl(trlop,2), 'chanindx', sgnind, 'checkboundary', strcmp(cfg.continuous,'no')); Error in ft_artifact_zvalue (line 492) redraw_cb(h); This error occurs because opt.data was left empty on line 473, because nargin==1 (PROBLEM #4). I think line 472 should be changed to "if ~hasdata". But even with that fixed, note that it's "searching in trial 1 from 1" instead of 101 as before, which is because it ignores cfg.trl and uses data.sampleinfo instead (PROBLEM #5). And problem 2 still happens. 4) This is my current workaround, sending ft_artifact_zvalue back to the original NEX file, but it takes way longer than just passing in the already loaded data. %% define low frequency artifacts cfg = []; cfg.dataset = nexfile{r,p}; cfg.trl = trl; % channel selection, cutoff and padding cfg.artfctdef.zvalue.channel = channels; cfg.artfctdef.zvalue.cutoff = 10; cfg.artfctdef.zvalue.trlpadding = 0; cfg.artfctdef.zvalue.fltpadding = 2; cfg.artfctdef.zvalue.artpadding = 0.25; % algorithmic parameters cfg.artfctdef.zvalue.lpfilter = 'yes'; cfg.artfctdef.zvalue.lpfreq = 1; cfg.artfctdef.zvalue.rectify = 'yes'; cfg.artfctdef.zvalue.boxcar = 1; % make the process interactive cfg.artfctdef.zvalue.interactive = 'yes'; disp(['Low frequency artifact detection for ' regions{t} '.']); [~, artifact.lowfreq{t}] = ft_artifact_zvalue(cfg); What would make even more sense would be to have the option of feeding in the continuous data (as I tried in #3), letting that be what gets filtered, but then display based on the info in cfg.trl. But then again, my NaNs would break that. So, I think the BEST solution would be to concatenate all contiguous samples in the data structure, pad those long chunks with zeros, filter, and then drop the padding and re-parse the trials for plotting. Let me know if that makes any sense.


Robert Oostenveld - 2015-11-26 11:31:18 +0100

Regarding "I parsed the continuous data into 100 chunks to allow for easier visualisation" -> ft_databrowser was designed to visualise data either from disk or from memory. When reading for disk, it will only read the section that is needed. We designed this for ~100 GB datasets. This is something to consider when working with large data. Regarding visualizing, you can use ft_databrowser to show a small part of a continuous recording that is in memory. This will show one-second segments: data = []; data.label = {'1'} data.time{1} = (1:100000)/1000; data.trial{1} = randn(1, length(data.time{1})); ft_databrowser([], data) So I don't understand the initial problem (yet). It might be that I am overlooking some peculiarity that is due to the NEX files. But moving on to the question about padding, first of all I tried replicating your situation (with the fake data from above): cfg = []; cfg.artfctdef.zvalue.cutoff = 3 cfg.artfctdef.zvalue.trlpadding = 0; cfg.artfctdef.zvalue.fltpadding = 2; cfg.artfctdef.zvalue.artpadding = 0.25; cfg.artfctdef.zvalue.lpfilter = 'yes'; cfg.artfctdef.zvalue.lpfreq = 1; cfg.artfctdef.zvalue.rectify = 'yes'; cfg.artfctdef.zvalue.boxcar = 1; cfg.artfctdef.zvalue.channel = 'all' cfg.artfctdef.zvalue.interactive = 'yes'; ft_artifact_zvalue(cfg, data) I ran into two problems, which I just fixed. See https://github.com/fieldtrip/fieldtrip/commit/271560f4cfd7e127535f49cc5b5cd0bed356dd31 for details. mac011> svn commit utilities/ft_fetch_data.m private/preproc.m Sending private/preproc.m Sending utilities/ft_fetch_data.m Transmitting file data .. Committed revision 10940. With the code above, filtering is not done (as per design) since the padding extends outside the available data. cfg = [] cfg.trl(:,1) = 1:1000:100000; cfg.trl(:,2) = 999+(1:1000:100000); cfg.trl(:,3) = 0; segmented = ft_redefinetrial(cfg, data) Also here I ran into problems mac011> svn commit Sending ft_artifact_zvalue.m Sending utilities/ft_fetch_data.m Transmitting file data .. Committed revision 10942. Now it works as I expect. The first two trials in zvalue are not handled very well because the requested filter padding extends beyond the available data, causing the filter NOT to be applied. I am surprised about the lack of warnings (it is only shown once). This commit will make sure that the warning is more visible. mac011> svn commit ft_artifact_zvalue.m private/preproc.m Sending ft_artifact_zvalue.m Sending private/preproc.m Transmitting file data .. Committed revision 10944. With these changes to the code, could you please re-evaluate your issues?


Teresa Madsen - 2015-12-01 18:54:36 +0100

(In reply to Robert Oostenveld from comment #1) Thanks, that does work better, but still not great. To answer your first question, yes, I know databrowser can display continuous data with no problem, and I do use it to review the detected artifacts before rejecting them, but my graphics problems occur while interactively selecting the z-value threshold for artifact detection, at which point it does try to plot the entire "single trial" at once. So, for the chopped data, your fixes now get me to the ft_artifact_zvalue GUI with no errors, but as you said, there are no filters applied to the first and last trials, so those skew the z-value detection and are pretty much completely lost, if I set the threshold to a reasonable level for the rest of the data. I could work around that by creating trials just for padding, but wouldn't it be nice to add an option to pad with 0s instead of NaNs? For the continuous data, I first run into Bug 3005 (http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=3005), which no one has replied to, but that's an easy fix. After that, I get to the ft_artifact_zvalue GUI with no more errors, but it's very sluggish and gets finicky because of the large amount of data being displayed. Minor problems include the data disappearing from view after I've zoomed in and back out (but it comes back when I click another GUI button, as this refreshes the data), and major problems include my monitor going into powersave mode because the graphics card has completely stopped. Sometimes it recovers with a graphics driver error (sends me to this link: http://nvidia.custhelp.com/app/answers/detail/a_id/3007) and I just have to restart Matlab, but other times I have to force the computer to reboot. I may need a new graphics driver or card or something, but this problem only occurs when trying to display large amounts of continuous data like this - hence my desire to break it up.


Teresa Madsen - 2015-12-01 22:14:35 +0100

(In reply to Teresa Madsen from comment #2) Here's the workaround I came up with: load(outputfile1); % from the 1st cell in my original message %% mark NaNs as artifacts cfg = []; cfg.artfctdef.nan.channel = 'AD*'; [~, artifact.nan] = ft_artifact_nan(cfg,data); %% remove NaNs from data cfg = []; cfg.artfctdef.nan.artifact = artifact.nan; cfg.artfctdef.reject = 'partial'; data = ft_rejectartifact(cfg,data); otrl = data.cfg.trl; % original trl structure %% divide continuous data into ~30s chunks, % with 2s "trials" at beginning and end that % will be lost to padding with NaNs padsec = 2; % seconds of padding to add padsamp = ceil(padsec*data.fsample); % # of samples trlsec = 30; % minimum length of trials trlsamp = ceil(trlsec*data.fsample); % # of samples ntrl = cell(size(otrl,1),1); for ot = 1:size(otrl,1) % original trial # otlen = otrl(ot,2)-otrl(ot,1); % ot length if otlen < (2*padsamp) % if less than 4s of data between NaNs warning(['Skipping original trial # ' ... int2str(ot) ... ' because of insufficient length.']); ntrl{ot} = []; else nnt = floor((otlen ... % number of new trials - (2*padsamp)) ... % minus 4 seconds lost to padding /trlsamp); % divided by min trial length ntlen = floor((otlen ... % new trial length - (2*padsamp)) ... % minus 4 seconds lost to padding /nnt); % divided by # of new trials ntrl{ot} = nan([nnt+2 2]); % preallocate space ntrl{ot}(1,:) = [otrl(ot,1) ... (otrl(ot,1) + padsamp)]; for nt = 1:nnt % new trial # ntrl{ot}(nt+1,:) = [(ntrl{ot}(nt,2)+1) ... ((ntrl{ot}(nt,2)+1) + ntlen-1)]; end ntrl{ot}(nnt+2,:) = [(ntrl{ot}(nnt+1,2)+1) ... ((ntrl{ot}(nnt+1,2)+1) + padsamp)]; if ntrl{ot}(1,1) ~= otrl(ot,1) ... || ntrl{ot}(end,2) > otrl(ot,2) ... || any(any(isnan(ntrl{ot}))) error(['something is wrong with ntrl{' ... int2str(ot) '}']); end end end ttrl = vertcat(ntrl{:}); % temp trl trl = [ttrl ttrl(:,1)]; % add offset from beginning of file % TO DO: change to offset from nearest tone % (probably by adding this cell's function to % my_trialfun) %% restructure data into those chunks cfg = []; cfg.trl = trl; data = ft_redefinetrial(cfg, data); % ... farther down in my script ... %% define low frequency artifacts cfg = []; cfg.trl = trl; % channel selection, cutoff and padding cfg.artfctdef.zvalue.channel = channels; cfg.artfctdef.zvalue.cutoff = 10; cfg.artfctdef.zvalue.trlpadding = 0; cfg.artfctdef.zvalue.fltpadding = 2; cfg.artfctdef.zvalue.artpadding = 0.25; % algorithmic parameters cfg.artfctdef.zvalue.lpfilter = 'yes'; cfg.artfctdef.zvalue.lpfreq = 1; cfg.artfctdef.zvalue.rectify = 'yes'; cfg.artfctdef.zvalue.boxcar = 1; % make the process interactive cfg.artfctdef.zvalue.interactive = 'yes'; disp(['Low frequency artifact detection for ' ... regions{t} '.']); [~, artifact.lowfreq{t}] = ft_artifact_zvalue(cfg,data); ^ This works fine, just eliminating the 2s trials I built in for the purpose of padding. But again, simply adding an option to pad with 0s seems easier.


Teresa Madsen - 2015-12-22 22:29:04 +0100

I guess you fixed what was broken, and my workaround is sufficient for what I need to do, so I'll close this bug. Thanks.


Robert Oostenveld - 2016-06-14 16:14:51 +0200

Hereby I am closing multiple bugs that have been resolved for some time now. If you don't agree to the resolution, please reopen.