
Bug 498 - Numerical accuracy affecting usage of "nearest" on time axes

Status CLOSED FIXED
Reported 2011-02-16 14:49:00 +0100
Modified 2014-05-15 15:31:17 +0200
Product: FieldTrip
Component: core
Version: unspecified
Hardware: All
Operating System: All
Importance: P1 critical
Assigned to: Robert Oostenveld
URL:
Tags:
Depends on:
Blocks:
See also: http://bugzilla.fcdonders.nl/show_bug.cgi?id=1943

Craig Richter - 2011-02-16 14:49:36 +0100

This issue concerns how numerical accuracy in MATLAB can cause unexpected results when using FieldTrip. An initial instance, which has already been addressed, concerned the time vector found in most data structures. When using ft_redefinetrial to select data of a specific length, trials that were long enough were not being selected, due to numerical accuracy. For example: if a length of 0.400 seconds had been requested, ft_redefinetrial would subtract the first from the last element of the time vector and compare the result to 0.400. In this specific case many trials were exactly 0.400 seconds long, yet the comparison failed, because the 0.400 generated by MATLAB for the comparison differed (by something on the order of 1e-16) from the 0.400 that resulted from subtracting the time vector elements - an issue of numerical accuracy.

I fixed this in ft_redefinetrial by changing the way the trial length is measured: the sampling rate and the data length in samples are used to calculate the length in seconds, which avoids the accuracy issue. I imagine this issue may reappear in many other places, so we might want to consider a general way to avoid it. One possibility that Chris and I cooked up is to define the accuracy of the time vector when it is generated, based on the sampling rate, i.e. if fs = 1000, then the time vector elements are defined to 3 decimal places. It is the colon operator that is the real problem, since it generates numbers to full floating-point precision.
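As an illustration, a minimal sketch contrasting the two ways of measuring trial length (the first part reuses the example Craig gives later in the thread; fs and nsamples are assumed numbers, not taken from the original report):

% the failing comparison, reusing the 0.5:0.1:1 example verified further down
time = 0.5:0.1:1;
time(5) - time(4) == 0.1       % logical 0, although both values are nominally 0.1 apart

% the fix for trial-length checks: derive the length from the sample count and
% the sampling rate instead of from the time axis, e.g. for a 0.400 s trial at
% an assumed fs of 1000 Hz
fs       = 1000;
nsamples = 401;
(nsamples - 1)/fs == 0.400     % logical 1: 400/1000 rounds to exactly the same double as 0.400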


Boris Reuderink - 2011-11-17 10:46:38 +0100

Changed the status of bugs without a specific owner to UNCONFIRMED. I'll try to replicate these bugs (potentially involving the submitter), and change confirmed bugs to NEW. Boris


Boris Reuderink - 2011-12-07 11:29:44 +0100

Dear Craig, Could you construct a minimal example that demonstrates this problem? Or better yet, provide (in diff format?) the changes you made to resolve this issue? Best regards, Boris


Craig Richter - 2011-12-07 11:49:51 +0100

(In reply to comment #2) Hi Boris, The issue is pretty much as I described it. In the older implementation of ft_redefinetrial, the trial length is determined by subtracting the start time from the end time of the data.time vector. Since the values in the time vector are computed to the limit of accuracy in MATLAB, subtracting them gives unexpected numbers, e.g. 1.34 x 10^-17 instead of zero. If this is compared with zero, MATLAB determines that the two are not equal, which is unexpected for the user. I was using 400 ms. If you go to line 302 of ft_redefinetrial you can see the change I made: there I calculate the trial length without using the time vector, so there is no numerical accuracy problem when the comparison is made on line 310. Here is an example:

>> time = 0.5:0.1:1
time =
    0.5000    0.6000    0.7000    0.8000    0.9000    1.0000
>> time(5) - time(4) == .1
ans =
     0

So here numerical accuracy attacks. One solution is to define the time vector to the accuracy allowed by the sampling rate, so 3 decimals in the case of 1000 Hz. Another is to use the sampling rate and the length in samples to determine the trial length, which is what I did on line 302; it amounts to the same thing. Fixing the time vector would be faster, since then we would not have to change anything downstream. Best, Craig


Boris Reuderink - 2012-01-11 17:30:55 +0100

Robert is working on an improved nearest function and ft_selectdata


Robert Oostenveld - 2012-01-11 17:47:04 +0100

(In reply to comment #4) I have made an improved version of the nearest function that allows the calling function to specify whether or not to allow out-of-range values and, optionally, whether to allow a tolerance of half a bin width. See nearest and test_nearest.

manzana> svn commit test/test_nearest.m utilities/nearest.m
Adding         test/test_nearest.m
Sending        utilities/nearest.m
Transmitting file data ..
Committed revision 5130.

What still needs to be done is to check all uses of the nearest function. In the case of a begin-end range, it is important to check that at least one of the two lies within the range of the array. To be continued.
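A compact sketch of a nearest lookup with the two options described here; the argument names and details are illustrative and do not necessarily match the committed utilities/nearest.m:

function indx = nearest_sketch(array, val, insideflag, toleranceflag)
array = array(:);
if insideflag
  % reject values outside the range of the array, optionally allowing a
  % tolerance of half a bin width beyond the first and last element
  binwidth = mean(diff(array));            % assumes a (roughly) regular axis
  if toleranceflag
    minallowed = min(array) - binwidth/2;
    maxallowed = max(array) + binwidth/2;
  else
    minallowed = min(array);
    maxallowed = max(array);
  end
  if val < minallowed || val > maxallowed
    error('the requested value lies outside the range of the array');
  end
end
[dum, indx] = min(abs(array - val));       % plain nearest-neighbour lookup
end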


Robert Oostenveld - 2012-06-05 09:29:58 +0200

On 5 Jun 2012, at 3:21, Richter, Craig wrote:

I'm battling with my friend 'numerical accuracy' again. It is once again my unusual use of ft_redefinetrial. I use it to find trials that have data between a set of times (unequal trial lengths). I do this so that I can do sliding-window analysis using custom scripts, for good reasons (trust me ;-)). The issue I have is with your nearest.m function. I think it is usually well behaved, but it can create problems when the tois fall exactly halfway between the times defined by data.time{i}. For example: if toilim = [.1 .2] and data.time{1} = 0.05:0.1:0.25, then nearest would usually go with the lower match, since this is the default behaviour of min, BUT, due to numerical accuracy it will actually flip between the higher and the lower value, and this causes the generation of data that fluctuates in length by 1 sample depending on the specific toilim input. This is not expected behaviour, since all the parameters are the same in terms of the data length wanted, etc., but based on the specific toilim pair the outcome will differ. I tried about a million different ways to get around this, but I realized that (I think???) there is fundamentally no way to do it without a threshold. Can you think of anything else? I know it is ugly, but I can't see another fix. I've added this:

[dum, indx] = min(round(10^9.*(abs(array(:) - val)))./10^9);

which is a threshold at one billionth of a second (when using nearest.m on time data). If anybody has an amp that can do that… I haven't committed it since it is so ugly, but it is working for me for now. What do you propose? Would working completely in integer samples be the solution? I like using time better for its intuitiveness, but this has caused me so many headaches.


Robert Oostenveld - 2012-06-05 09:31:39 +0200

(In reply to comment #6) So what you are saying is that if the time axis is [1 2 3 4 5], the nearest point to 1.5 is sometimes the 1st and sometimes the 2nd sample, because the time is sometimes actually [1+eps 2 3 4 5] and sometimes [1 2-eps 3 4 5]. I recall that some code was implemented and added somewhere to ensure that there is no difference between the time axes of subsequent trials. The question then is: why does that code not solve the problem?


Craig Richter - 2012-06-05 10:56:25 +0200

Not exactly. I believe the time axes of each trial are identical. As I understand it, the issue arises from the subtraction abs(array(:) - val). Since I'm selecting data by moving a window along the time axis, e.g. [1 2 3 4 5], the operation may give 0.5+eps for a toi of 1.5, and thus snap to 2, while a toi of 2.5 may give 0.5-eps and thus also go to 2. In that case, if toilim = [1.5 4] I will get a resulting trial of 2 samples, whereas if toilim = [2.5 5] I will get a result that is 3 samples long. If I run ft_redefinetrial with a sliding toilim, I therefore get an inaccuracy of one sample in the trials that are generated.
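To make this concrete, an illustration with hand-inserted perturbations that mimic the round-off from the subtraction (the eps-sized offsets are artificial, chosen only to show the flip; this is not the FieldTrip code):

array = [1 2 3 4 5];

% a toi of 1.5 whose computation happened to land a hair above the midpoint
[dum, i1] = min(abs(array - (1.5 + 2e-16)));   % i1 = 2, snaps up
% a toi of 2.5 whose computation happened to land a hair below the midpoint
[dum, i2] = min(abs(array - (2.5 - 3e-16)));   % i2 = 2, snaps down

% both nominal window onsets end up on the same sample, so two windows of the
% same nominal duration can come out one sample different in length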


Robert Oostenveld - 2012-06-05 12:18:05 +0200

(In reply to comment #8) If the array is the same and the val is the same, you would expect deterministic behaviour. But what you are saying is that for subsequent vals of 1.5, 2.5, 3.5, ... the val is sometimes closer to the higher array index and sometimes closer to the lower. Why do you start with these values anyway? Why not use array = [1 2 3 4 5 ...] and loop over each value in [1 2 3 4], instead of what you are doing here, looping over each value in [1.5 2.5 3.5 4.5]? Should we not simply give an error if the user specifies an ambiguous value, i.e. if the value is too close to the centre of the interval?


Craig Richter - 2012-06-05 13:11:09 +0200

I suppose there could be a rule that the time intervals selected must be integer multiples of the sampling interval. In my case I have a 250 Hz sampling rate and I want a 200 ms window sliding by 50 ms, and this recipe leads to the numerical error causing the problem. Exactly how the sampling rate, window and slide interact is something I would have to work out; I don't know how uniquely deadly this combination is. The specific values I used in the example were just for illustration. What happens with the above parameters is that the start or end points of certain windows lie exactly in the middle of two samples, and then the rounding error occurs. If you program an error, I think it would be hard for the user to understand. The threshold does handle the problem. It does cap the precision of nearest.m at 1e-9, but this is probably sufficient for our purposes? We could go lower, since the error is on the order of 1e-16.

Here is a concrete example. If the time axis of a trial is defined as 1:0.004:1.1 and I want a 50 ms window starting from 1 s, then with this time axis and sampling rate there is no sample at exactly 1.05 s, so nearest needs to decide whether to go up or down. In most cases it will go down to 1.048, as intended, BUT as one slides this window across a longer series of data, cases WILL occur where it goes up, and this is what causes the problem. The threshold resolves the ambiguity by rounding the values to 9 decimal places, so that the lower of the two candidates is selected rather than the upper one, which could be marginally smaller due to the numerical error of the subtraction.

[dum, indx] = min(round(10^9.*(abs(array(:) - val)))./10^9); % this rounds to the lowest in case of a tie
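A minimal sketch of how the proposed rounding resolves the tie for this example (the time axis, the toi of 1.05 and the 1e-9 factor follow the text; the rest is illustration only):

array = 1:0.004:1.1;          % the 250 Hz time axis from the example above
val   = 1.05;                 % falls exactly halfway between 1.048 and 1.052

% raw absolute differences: the two candidates differ only at the ~1e-16
% level, so which one wins depends on the direction of the round-off
d_raw = abs(array(:) - val);

% rounding the differences to 1e-9 makes the tie exact; min then returns the
% first (i.e. lower) of the tied indices, so the selection is deterministic
[dum, indx] = min(round(10^9.*d_raw)./10^9);
array(indx)                   % 1.048, the sample below the requested toi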


Robert Oostenveld - 2012-06-05 14:00:37 +0200

(In reply to comment #10) I agree with the suggested change, please implement it. I suggest making the threshold larger, e.g. 1e-6 instead of 1e-9, as it should also work for single-precision data representations, where

>> eps('single')
ans =
  1.1921e-07
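Presumably the committed version then amounts to the same one-liner with the larger tolerance, along these lines (an assumption, not the actual commit):

[dum, indx] = min(round(10^6.*(abs(array(:) - val)))./10^6); % tie-break at 1e-6 rather than 1e-9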


Craig Richter - 2012-06-05 14:29:25 +0200

Threshold fix has been committed


Craig Richter - 2012-06-05 14:30:07 +0200

fixed


Robert Oostenveld - 2012-08-23 10:33:52 +0200

closed multiple bugs that have been resolved for some time


Jörn M. Horschig - 2013-09-25 10:24:09 +0200

I changed the way nearest works in light of bug 1943 (the 'precision' parameter caused the function to become unreliable for higher values):

svn ci utilities/nearest.m -m "bugfix #1943 - nearest now works using find rather than on absolute differences weighted by some precision matrix"
Sending        utilities/nearest.m
Transmitting file data .
Committed revision 8534.
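A rough sketch of what a find-based lookup could look like (an assumption based on the commit message, not the committed code; it assumes array is sorted ascending and val lies within its range):

indx = find(array <= val, 1, 'last');   % closest grid point at or below val
if indx < numel(array) && abs(array(indx+1) - val) < abs(array(indx) - val)
  indx = indx + 1;                      % only move up if the point above is strictly closer
end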