temp_ascii_reader.py does not properly read most current data files #192

The current reader has several issues.

**First Issue**
The guessing logic is assumed to be the same for 1D and 2D data, and it is not obvious to me that it should be. For 1D, however, SasView spent some time long ago optimizing the universality of its guesses, which the new code for some reason does not follow. This leads to several problems:
* start position - the current code looks for the first line beginning with a number. This is not correct: headers often include rows that start with a number, some header rows consist entirely of numbers, and sometimes there are several such rows in succession. After a few years of trial and error across a number of formats, the recipe for finding the starting line became: find the first 3 consecutive rows that contain only numbers and have exactly the same number of values (i.e. the same number of columns).
* number of columns - a minor issue. Currently some code counts the columns on every row after the first (what happens when it hits a footer row that no longer contains numbers?) and picks the most frequently encountered count. All data rows should have exactly the same number of columns, and that is in fact the criterion used above to decide that one is inside the data block. This whole method should be removed and the column-count check moved into the start position method IMO.
* ending - currently the assumption seems to be that everything from the "starting" row onward is valid data. This is mostly true. However, there used to be some data formats that used footers instead of headers. The way around that was to define the number of rows as the count from the starting row until either EOF **or** a line that is not a row of numbers with the same number of columns as the rest.
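
The recipe in the three bullets above can be sketched roughly as follows. This is only an illustration, not the existing SasView or `sasdata` code: the function names `_parse_numeric_row` and `find_data_block` are hypothetical, and it assumes whitespace-separated columns (a comma-separated file would need a different split).

```python
def _parse_numeric_row(line):
    """Return the row as a list of floats, or None if any token is not a number."""
    tokens = line.split()  # assumption: whitespace-delimited columns
    if not tokens:
        return None
    try:
        return [float(tok) for tok in tokens]
    except ValueError:
        return None


def find_data_block(lines, min_rows=3):
    """Return (start, stop, n_columns) for the first data block, or None.

    start is the index of the first of `min_rows` consecutive all-numeric
    rows sharing one column count. stop is one past the last row that still
    matches, i.e. EOF or the first footer/malformed line.
    """
    rows = [_parse_numeric_row(line) for line in lines]
    # Width 0 marks a row that is blank or contains non-numeric tokens.
    widths = [len(row) if row is not None else 0 for row in rows]

    for start in range(len(rows) - min_rows + 1):
        ncols = widths[start]
        if ncols == 0:
            continue
        if all(w == ncols for w in widths[start:start + min_rows]):
            stop = start + min_rows
            # Extend until EOF or the first row that breaks the pattern.
            while stop < len(rows) and widths[stop] == ncols:
                stop += 1
            return start, stop, ncols
    return None
```

With a header that contains an all-numeric row and a short numeric row, followed by four 3-column data rows and a text footer, the sketch skips the header rows, starts at the first data row, and stops before the footer. Because the column count is checked while locating the block, no separate "most frequent column count" pass is needed.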

**Second Issue**
The column orders assumed for `onedim` and `twodim` are, I believe, incorrect for most data out there. Almost all existing `onedim` data follows the order q, I, dI, sigmaQ, "mean Q", and shadow factor (where mean Q corrects Q for the shadow factor). So the current order is right for 2- and 3-column data but not for 4. The last 2 columns were proposed at a noBUGS meeting years ago but, as far as I know, only implemented in the NIST ABS data format.
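
As a minimal sketch of what a default based on that order could look like: the column labels and the function name `guess_onedim_columns` below are made up for illustration and are not taken from the reader. Whether a 5-column file should be accepted at all is an open question, since the last two columns normally appear together.

```python
# Column order for 1D data as described above; the labels are illustrative only.
ONEDIM_COLUMNS = ("Q", "I", "dI", "sigmaQ", "meanQ", "shadow_factor")


def guess_onedim_columns(n_columns):
    """Return default column labels for a 1D file with n_columns columns."""
    if not 2 <= n_columns <= len(ONEDIM_COLUMNS):
        raise ValueError(f"cannot guess 1D columns for {n_columns} columns")
    # Assumption: columns are only ever dropped from the right-hand end.
    return ONEDIM_COLUMNS[:n_columns]
```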

There are rather few 2D ASCII data formats in Q space out there. The NIST `*.DAT` format is the main one I know of (unless GRASP has one, but I believe its 2D data is in pixel format?), and it served as the basis for the first 2D reduced data used by SasView. That format follows the order Qx, Qy, I(Qx,Qy,Qz), dI(Qx,Qy,Qz), Qz, sigmaQ parallel (to Q), sigmaQ perpendicular (to Q), beamstop shadow factor, mask (only in some cases). However, I note that we have 3-, 4-, and 7-column `*.dat` formats in our 2D example data. We need to investigate what these are and how they were being treated. I note that the ASCII extension has been deprecated. I also note that Q data can be put on a grid in many ways, so it is hard to choose a default for that.

**Third Issue**
Only two datatypes currently appear to be envisioned: `onedim` and `twodim`, with `*.DAT` being the only 2D ASCII format envisioned (possibly true). The ABS 1D type is unique and should be addressed separately. Other 1D types might also need to be addressed separately if one wants to extract metadata from the headers (not currently done in SasView 6.x).

In fact, I think all extensions should be handled by `sasdata`, as discussed in [sasview#3899](https://github.com/SasView/sasview/issues/3899). Moreover, I believe `xml` and `hdf` should **not** require those extensions: the files themselves are self-describing, so the reader should be able to decide whether a file is in `xml` or `hdf` format, if I'm not mistaken?
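
To illustrate the self-describing point, here is a rough sketch of content-based detection. The function names are hypothetical and nothing here is existing `sasdata` API. It relies on the 8-byte HDF5 signature at the start of the file and on XML beginning with `<`. It assumes the HDF5 file has no user block (the signature may otherwise sit at offset 512, 1024, ...) and that the XML is in an ASCII-compatible encoding such as UTF-8.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format_bytes(head):
    """Guess 'hdf', 'xml' or 'unknown' from the first bytes of a file."""
    if head.startswith(HDF5_SIGNATURE):
        return "hdf"
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    # Assumption: any file whose first non-blank byte is '<' is treated as XML.
    if head.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"


def sniff_format(path):
    """Read the start of the file at `path` and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format_bytes(handle.read(512))
```

Anything reported as `unknown` would then fall through to the ASCII guessing logic, regardless of the file extension.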
