Bug 1038 - the reading functions can support tgz and zip datasets, unzip on the fly

Status	CLOSED FIXED
Reported	2011-10-12 16:46:00 +0200
Modified	2019-08-10 12:29:19 +0200
Product:	FieldTrip
Component:	fileio
Version:	unspecified
Hardware:	PC
Operating System:	Mac OS
Importance:	P3 enhancement
Assigned to:	Eelke Spaak
URL:
Tags:
Depends on:
Blocks:
See also:	http://bugzilla.fcdonders.nl/show_bug.cgi?id=1747

Robert Oostenveld - 2011-10-12 16:46:41 +0200

this requires unpacking in a temporary directory and a unique way of assigning a tmp-identifier

TODO
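
A minimal sketch of the "unique temporary directory" requirement, written in Python purely for illustration (FieldTrip itself is MATLAB). The function name `unpack_to_unique_tmp` and the `ft_unpack_` prefix are hypothetical and not part of FieldTrip.

```python
import shutil
import tempfile
from pathlib import Path


def unpack_to_unique_tmp(archive, prefix="ft_unpack_"):
    """Extract `archive` into a freshly created, uniquely named temp directory."""
    # mkdtemp creates the directory with a random suffix, so two readers
    # working on the same archive never share an extraction directory.
    target = Path(tempfile.mkdtemp(prefix=prefix))
    # unpack_archive infers the format from the extension (.zip, .tgz, .tar.gz, ...).
    shutil.unpack_archive(str(archive), str(target))
    return target
```

The caller owns the returned directory and would remove it (for example with `shutil.rmtree`) once the data has been read.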

Boris Reuderink - 2011-11-17 10:46:39 +0100

Changed the status of bugs without a specific owner to UNCONFIRMED. I'll try to replicate these bugs (potentially involving the submitter), and change confirmed bugs to NEW. Boris

Boris Reuderink - 2012-01-03 14:38:29 +0100

Confirmed (enhancement by Robert). Changed status to NEW.

Boris Reuderink - 2012-02-03 12:04:16 +0100

Maybe some ideas can be borrowed from Python's DataSource: http://docs.scipy.org/doc/numpy/reference/generated/numpy.DataSource.html It implements caching and opening from URLs on top of Robert's proposal.

Robert Oostenveld - 2012-02-03 15:33:02 +0100

Jan-Mathijs, Could this download-and-cache-on-the-fly be of relevance for the HCP? Do you know how the mgz format is dealt with? I suspect it to be compressed.

Jan-Mathijs Schoffelen - 2012-02-03 16:34:28 +0100

Yes, the mgz format is compressed. It is compressed/uncompressed on the fly, using unix('....gzip etc') for compression, and platform-dependent unix('...gunzip etc') or unix('...zcat ...') for decompression. This doesn't seem to work on Windows.
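
For comparison, a sketch of decompressing a gzip file through a library call instead of shelling out to gunzip or zcat, so no external command-line tool is needed. This is Python for illustration only, not the mgz reader discussed here; `gunzip_file` is a hypothetical name.

```python
import gzip
import shutil


def gunzip_file(src, dst, chunk_size=1 << 20):
    """Decompress the gzip file `src` into `dst` without an external tool."""
    # Stream in chunks so large volumes are never held in memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```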

Eelke Spaak - 2012-02-29 14:47:01 +0100

Proposal for implementing:

- Change ft_filetype to detect .zip, .tgz, .tar.gz, .gz file extensions, and return that as the filetype.
- Change ft_read_header and ft_read_data to check whether filetype equals one of the compressed types, and if so, extract the file to a temporary directory, and recursively call ft_read_header/ft_read_data on the extracted file set.

Am I missing something why this would not work? (Don't have any experience with these reading functions or data formats other than CTF.)
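
The two steps of the proposal can be sketched as follows. This is Python for illustration only: `detect_filetype` and `read_header` are hypothetical stand-ins for ft_filetype and ft_read_header, the returned dictionary is a placeholder for real header parsing, and plain single-file .gz is left out because `shutil.unpack_archive` does not handle it.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: only archive types that shutil.unpack_archive understands.
COMPRESSED_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def detect_filetype(path):
    """Hypothetical stand-in for ft_filetype: 'compressed' or the bare extension."""
    name = Path(path).name.lower()
    if name.endswith(COMPRESSED_SUFFIXES):
        return "compressed"
    return Path(name).suffix.lstrip(".") or "unknown"


def read_header(path):
    """Hypothetical stand-in for ft_read_header."""
    if detect_filetype(path) == "compressed":
        tmpdir = tempfile.mkdtemp(prefix="inflate_")
        shutil.unpack_archive(str(path), tmpdir)
        entries = sorted(Path(tmpdir).iterdir())
        if len(entries) != 1:
            raise ValueError("sketch only handles one top-level entry per archive")
        # Recurse on the extracted file or directory.
        return read_header(entries[0])
    # Placeholder for the format-specific header parsing.
    return {"filetype": detect_filetype(path), "source": str(path)}
```

Because the recursion re-enters the normal reading path, the format-specific code does not need to know that the data was ever compressed.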

Eelke Spaak - 2012-02-29 15:07:52 +0100

JM thinks it's a good idea, working on it.

Robert Oostenveld - 2012-03-02 17:25:52 +0100

(In reply to comment #7)

indeed good idea. I suggest the test scenario to include at least these cases:

1) ctf_ds, i.e. *.ds directory, where the directory would be zipped.
2) simple one-file EEG format, e.g. biosemi_bdf or ns_cnt, where the file is zipped
3) brainvision triplet of vhdr+vmrk+eeg files. These are normally not in a directory.

case 1 also applies to neuralynx_sdma, also egi_mff would be a good test case

case 2 applies to a lot of formats, and is simple. No additional testing needs to be done. But it raises the question: how do ft_filetype and ft_read_xxx interact with each other? Should ft_filetype already unzip? How to avoid multiple unzip actions? Should there be a "zipcache" function in fileio/private with a persistent list/struct-array with the original zipped filename and the alternative unzipped filename?

case 3 also applies to a set of dicom images (ft_read_mri), or to an analyze hdr+img anatomical file. In case 3, should it support both a "flat" zipfile and one with a subdir in it?

Robert Oostenveld - 2012-09-26 22:34:45 +0200

(In reply to comment #3) I moved the URL idea to the separate bug 1747

Eelke Spaak - 2014-01-29 14:59:52 +0100

bash-4.1$ svn commit test/test_readcompresseddata.m fileio
Sending        fileio/ft_read_data.m
Sending        fileio/ft_read_header.m
Sending        fileio/ft_read_mri.m
Adding         fileio/private/inflate_file.m
Adding         test/test_readcompresseddata.m
Transmitting file data .....
Committed revision 9149.

Robert Oostenveld - 2019-08-10 12:29:19 +0200

This closes a whole series of bugs that have been resolved (either FIXED/WONTFIX/INVALID) for quite some time. If you disagree, please file a new issue describing the issue on https://github.com/fieldtrip/fieldtrip/issues.