LAL  7.5.0.1-b72065a
Module StreamSeriesInput.c

Converts an input stream into a time or frequency series.

Author
Creighton, T. D.

Prototypes

void
LAL<typecode>ReadTSeries( LALStatus *stat,
<datatype>TimeSeries *series,
FILE *stream )
void
LAL<typecode>ReadTVectorSeries( LALStatus *stat,
<datatype>TimeVectorSeries *series,
FILE *stream )
void
LAL<typecode>ReadTArraySeries( LALStatus *stat,
<datatype>TimeArraySeries *series,
FILE *stream )
void
LAL<typecode>ReadFSeries( LALStatus *stat,
<datatype>FrequencySeries *series,
FILE *stream )

Description

These routines parse an input stream *stream to fill in the data and metadata fields of a time or frequency series *series. The field series->data must be NULL, so that it can be created and filled by the routine. The other fields may be initialized or not; they will be overwritten by metadata read from *stream. If an error occurs, *series will be left unchanged, but *stream will have been read up to the point where the error occurred.

For each of these prototype templates there are in fact 10 separate routines corresponding to all the atomic datatypes <datatype> (except CHAR) referred to by <typecode>:

<typecode>  <datatype>    <typecode>  <datatype>
I2          INT2          U2          UINT2
I4          INT4          U4          UINT4
I8          INT8          U8          UINT8
S           REAL4         C           COMPLEX8
D           REAL8         Z           COMPLEX16

Format for *stream:
The input stream is assumed to be a text stream (ASCII) consisting of a header containing metadata followed by numerical data in standard integer or floating-point format, as recognized by the routines in StringConvert.c. The header consists of zero or more lines beginning with a # character, followed by a metadata field name and value in the format:
# fieldname=value

The = sign in this format is standard but optional; it may be replaced or surrounded with any amount of any whitespace except a newline \n. If fieldname is unrecognized, it is ignored; if it is recognized, then value must be in a suitable format for the field type, as described below. Blank lines, or lines containing just a # character, are skipped. Once a line is encountered that contains non-whitespace characters and does not start with #, that line is assumed to be the beginning of the numerical data. From that point on, all non-whitespace characters must be part of parseable numbers; no more comments are permitted (although blank lines will still be skipped).
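The header-line rules above can be sketched in a few lines of standalone C. This is only an illustration of the described format, not the code used by these routines: the function name parse_header_line, its return codes, and the is_blank helper are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: whitespace other than a newline. */
static int is_blank(int c)
{
    return c != '\n' && isspace((unsigned char)c);
}

/* Hypothetical sketch, not part of LAL: classify one line of the stream.
 * Returns  1 if the line is "# fieldname=value" (name and *value are set),
 *          0 if the line is blank or a bare "#" (to be skipped),
 *         -1 if the line has content but does not start with '#',
 *            i.e. the numerical data begin here.
 * nameSize must be at least 1. */
int parse_header_line(const char *line, char *name, size_t nameSize,
                      const char **value)
{
    const char *p = line;
    size_t n = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;                 /* only whitespace: blank line */
    if (*line != '#')
        return -1;                /* first line of numerical data */

    p = line + 1;
    while (is_blank(*p))
        p++;
    if (*p == '\0' || *p == '\n')
        return 0;                 /* bare '#': skipped */

    /* The field name runs up to whitespace or '='; overlong names are cut. */
    while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p)) {
        if (n + 1 < nameSize)
            name[n++] = *p;
        p++;
    }
    name[n] = '\0';

    /* The '=' is optional and may be surrounded by non-newline whitespace. */
    while (is_blank(*p))
        p++;
    if (*p == '=')
        p++;
    while (is_blank(*p))
        p++;

    *value = p;                   /* points at the start of the value text */
    return 1;
}
```

With this sketch, the lines "# length = 1024" and "# length 1024" both yield the name length and a value beginning with 1024. A caller would then compare name against the recognized field names and ignore anything else.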

If a metadata field appears twice in the header, the later one takes precedence. At present these routines do not track which fields have been previously assigned, so no warnings or errors are generated.

How the data is packed into the series->data structure depends on what metadata has been provided, as described below.

Required, conditional, and optional metadata:

The input stream need not contain a complete set of metadata, allowing some metadata to be read from *stream and others to be set elsewhere. For each type of series, some metadata will be required, and the routine will abort if the metadata is not found. Other metadata are conditional, meaning that the routine will operate differently depending on whether or not these metadata were found. The remaining metadata are optional; if they are not found in *stream, they will be left unchanged. The recognized metadata fields are listed below.

<datatype>TimeSeries:

Required fields:
none
Conditional fields:
length
Optional fields:
name, epoch, deltaT, f0, sampleUnits, datatype

<datatype>TimeVectorSeries:

Required fields:
none
Conditional fields:
length, vectorLength
Optional fields:
name, epoch, deltaT, f0, sampleUnits, datatype

<datatype>TimeArraySeries:

Required fields:
dimLength
Conditional fields:
length, arrayDim
Optional fields:
name, epoch, deltaT, f0, sampleUnits, datatype

<datatype>FrequencySeries:

Required fields:
none
Conditional fields:
length
Optional fields:
name, epoch, deltaT, f0, deltaF, sampleUnits, datatype

Below we describe the required format for the field values, as well as what occurs if a conditional field is or isn't present.

Required fields:

dimLength
(TimeArraySeries only): value consists of a sequence of UINT4s separated by whitespace (but not a newline '\n'). These data are stored in series->data->dimLength: the number of integers gives the number of array indices, while the value of each integer gives the dimension of the corresponding array index.

Conditional fields:

arrayDim

(TimeArraySeries only): value is a single UINT4, to be stored in series->data->arrayDim. This must equal the product of the index ranges in dimLength, above, or an error is returned. If not given, the arrayDim field will be set equal to the product of the index ranges in dimLength. (The arrayDim and dimLength fields can appear in any order in *stream; checking is done only after all header lines have been read.)
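The consistency rule between arrayDim and dimLength amounts to a product check performed once the whole header has been read. The following standalone sketch shows one way to express it; the function name resolve_array_dim and the haveArrayDim flag are hypothetical, and the overflow guard is an added assumption rather than documented behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not part of LAL: derive or validate arrayDim.
 * dimLength/nDims hold the index ranges read from the header.
 * haveArrayDim is nonzero if the header also gave an arrayDim value.
 * Returns 0 on success (result in *arrayDim), -1 on error. */
int resolve_array_dim(const uint32_t *dimLength, size_t nDims,
                      int haveArrayDim, uint32_t headerArrayDim,
                      uint32_t *arrayDim)
{
    uint64_t product = 1;
    size_t i;

    if (nDims == 0)
        return -1;                      /* dimLength is a required field */

    for (i = 0; i < nDims; i++) {
        product *= dimLength[i];
        if (product > UINT32_MAX)
            return -1;                  /* assumed guard: must fit a UINT4 */
    }

    if (haveArrayDim && headerArrayDim != product)
        return -1;                      /* header values are inconsistent */

    *arrayDim = (uint32_t)product;
    return 0;
}
```

For a header giving dimLength as 2 3 4, this yields arrayDim 24 when the field is absent, accepts an explicit 24, and rejects any other explicit value.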

vectorLength

(TimeVectorSeries only): value is a single UINT4, to be stored in series->data->vectorLength. If not specified in the header portion of *stream, it will be taken to be the number of data on the first line of the data portion of *stream, or half the number of real data for a complex-valued TimeVectorSeries; if an odd number of real data are found on the first line of a complex TimeVectorSeries, then an error is returned.
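As an illustration of the fallback rule for vectorLength, the sketch below counts the numbers on the first data line and halves the count for complex series. It is a standalone approximation using strtod() rather than the StringConvert.c routines, and the name infer_vector_length is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch, not part of LAL: infer vectorLength from the first
 * data line when the header did not specify it.
 * Returns 0 on success (result in *vectorLength), -1 on error. */
int infer_vector_length(const char *firstLine, int isComplex,
                        size_t *vectorLength)
{
    const char *p = firstLine;
    char *end;
    size_t count = 0;

    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n')
            break;                      /* end of the first data line */
        (void)strtod(p, &end);          /* only the token extent matters */
        if (end == p)
            return -1;                  /* not a parseable number */
        count++;
        p = end;
    }

    if (count == 0)
        return -1;
    if (isComplex) {
        if (count % 2 != 0)
            return -1;                  /* odd number of reals: error */
        count /= 2;                     /* two reals per complex value */
    }
    *vectorLength = count;
    return 0;
}
```

A first line of "1.0 2.0 3.0 4.0" therefore gives a vectorLength of 4 for a real-valued series and 2 for a complex one, while "1.0 2.0 3.0" is an error in the complex case.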

length:
value is a single UINT4, to be stored in series->data->length. If it is specified in the header portion of *stream, data will be read until length is reached. Otherwise, *stream will be read to its end or until an unparseable character is read, and length will then be set accordingly. (If parsing stops in the middle of filling a complex, vector, or array valued element, the partly-read element is discarded.)

Optional fields:

name:

value is a string surrounded by double-quotes, which is parsed in the manner of a string literal in C: it may contain ordinary printable characters (except double-quote and \), escape sequences (such as \t for tab, \n for newline, or \\ and \" for literal backslash and quote characters), and octal or hexadecimal codes (\ooo or \xhh, respectively) for arbitrary bytes. Unlike in C, literals cannot be split between lines, adjacent literals are not concatenated, and converted strings longer than LALNameLength-1 will be truncated. The resulting string is stored in series->name, and will always contain a \0 terminator, beyond which the contents are unspecified.
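The quoting rules for name can be illustrated with a small standalone decoder. This is a sketch only: parse_quoted_name and hex_value are hypothetical names, only the simple escapes \n, \t, \\ and \" are handled for brevity, and the caller-supplied outSize stands in for LALNameLength.

```c
#include <stddef.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch, not part of LAL: decode a double-quoted value into
 * out, truncating to outSize-1 bytes and always writing a terminator.
 * Returns the number of bytes stored, or -1 on a malformed literal. */
int parse_quoted_name(const char *value, char *out, size_t outSize)
{
    size_t n = 0;

    if (outSize == 0 || *value != '"')
        return -1;
    value++;

    while (*value != '"') {
        unsigned int byte = 0;
        int digits = 0, h;

        if (*value == '\0' || *value == '\n')
            return -1;                  /* literals cannot span lines */
        if (*value != '\\') {
            byte = (unsigned char)*value++;
        } else if (value[1] >= '0' && value[1] <= '7') {
            value++;                    /* octal code: up to three digits */
            while (digits < 3 && *value >= '0' && *value <= '7') {
                byte = byte * 8 + (unsigned int)(*value++ - '0');
                digits++;
            }
        } else if (value[1] == 'x') {
            value += 2;                 /* hex code: up to two digits */
            while (digits < 2 && (h = hex_value(*value)) >= 0) {
                byte = byte * 16 + (unsigned int)h;
                value++;
                digits++;
            }
            if (digits == 0)
                return -1;
        } else {
            switch (value[1]) {
            case 'n':  byte = '\n'; break;
            case 't':  byte = '\t'; break;
            case '\\': byte = '\\'; break;
            case '"':  byte = '"';  break;
            default:   return -1;       /* other escapes omitted here */
            }
            value += 2;
        }
        if (n + 1 < outSize)            /* silently truncate long names */
            out[n++] = (char)(byte & 0xFFu);
    }
    out[n] = '\0';
    return (int)n;
}
```

Bytes beyond the stored terminator are left untouched by this sketch, which matches the statement that their contents are unspecified.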

epoch:

value is a single INT8 number of GPS nanoseconds, or a pair of INT4s representing GPS seconds and nanoseconds separately, separated by non-newline whitespace.
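The two accepted forms of epoch can be told apart by how many integers appear in the value. The sketch below shows one way to do that with sscanf(); the function name parse_epoch is hypothetical, the range checks are an added assumption, and the split of a single nanosecond count uses plain C integer division.

```c
#include <inttypes.h>
#include <stdio.h>

#define NS_PER_SEC INT64_C(1000000000)

/* Hypothetical sketch, not part of LAL: parse an epoch value given either
 * as one integer (GPS nanoseconds) or as two integers (GPS seconds and
 * nanoseconds). value is assumed to hold a single header line's value.
 * Returns 0 on success, -1 on error. */
int parse_epoch(const char *value, int32_t *seconds, int32_t *nanoseconds)
{
    int64_t first = 0, second = 0;
    int64_t sec, nsec;
    int n = sscanf(value, "%" SCNd64 " %" SCNd64, &first, &second);

    if (n == 2) {                       /* "seconds nanoseconds" */
        sec = first;
        nsec = second;
    } else if (n == 1) {                /* single nanosecond count */
        sec = first / NS_PER_SEC;
        nsec = first % NS_PER_SEC;
    } else {
        return -1;                      /* nothing parseable */
    }

    /* Assumed guard: both parts must fit a 32-bit signed integer. */
    if (sec < INT32_MIN || sec > INT32_MAX ||
        nsec < INT32_MIN || nsec > INT32_MAX)
        return -1;

    *seconds = (int32_t)sec;
    *nanoseconds = (int32_t)nsec;
    return 0;
}
```

Under these assumptions the values "1000000000123456789" and "1000000000 123456789" both describe 1000000000 s plus 123456789 ns.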

deltaT

(any time series): value is a single REAL8 number.

f0:

value is a single REAL8 number.

deltaF

(FrequencySeries only): value is a single REAL8 number.

sampleUnits:

value is a string surrounded by double-quotes; the quotes are stripped and the string passed to XLALParseUnitString() to determine series->sampleUnits. Since XLALParseUnitString() is not very robust, it is recommended to use only unit strings that have been generated by XLALUnitAsString(), or to remove this metadata field and set series->sampleUnits within the code.

datatype:

value is a string identifying the series type; e.g. REAL4TimeSeries (not surrounded by quotes). This should correspond to the type of *series, not to any field in *series. If there is a type mismatch, a warning is generated (and errors may occur later while parsing the data).

Data format:
The data portion of *stream consists of whitespace-separated integer or real numbers. For complex input routines, the real data are parsed as alternately the real and imaginary parts of successive complex numbers. By convention, each line should correspond to a single base, complex, vector, or array valued element of the series->data sequence. However, this is required only in the case of a TimeVectorSeries where the vectorLength metadata was not set in the header, since in this case the value of vectorLength will be taken from the number of elements read on the first data line. After this, and in all other cases, newlines are treated as any other whitespace.

If a length value is specified in the header, then data are read until the required length is achieved; if fscanf() returns zero or negative before this (representing either the end-of-input or a character that cannot be interpreted as part of the numerical data), an error is returned. If a length value was not specified, data are read until fscanf() returns zero or negative: at this point any partially-completed complex, vector, or array valued element is discarded, and series->data->length set to the number of elements read.
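The two reading modes described above can be sketched with a short fscanf() loop. This is a simplified standalone illustration, not the routines' own code: the name read_elements is hypothetical, all data are read as double, a requestedLength of 0 is used here to mean "no length in the header", and storage is grown with realloc() for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not part of LAL: read whitespace-separated reals
 * from stream. realsPerElement is the number of reals in one series
 * element (e.g. 2 for a complex scalar). requestedLength is the header
 * length, or 0 if none was given (a convention of this sketch).
 * Returns 0 on success, with *data malloc'd and *length in elements. */
int read_elements(FILE *stream, size_t realsPerElement,
                  size_t requestedLength, double **data, size_t *length)
{
    size_t capacity = 64, count = 0;
    size_t limit = requestedLength * realsPerElement;   /* 0: unlimited */
    double value, *buf, *tmp;

    if (realsPerElement == 0)
        return -1;
    buf = malloc(capacity * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* Stop at the requested size, at end of input, or at the first
     * character that cannot be part of a number. */
    while ((limit == 0 || count < limit) &&
           fscanf(stream, "%lf", &value) == 1) {
        if (count == capacity) {
            tmp = realloc(buf, 2 * capacity * sizeof *buf);
            if (tmp == NULL) {
                free(buf);
                return -1;
            }
            buf = tmp;
            capacity *= 2;
        }
        buf[count++] = value;
    }

    if (requestedLength > 0 && count < limit) {
        free(buf);                      /* too few data for given length */
        return -1;
    }

    /* Integer division drops any partially read trailing element. */
    *length = count / realsPerElement;
    *data = buf;
    return 0;
}
```

For example, a data section holding the five reals 1 2 3 4 5, read as complex values without a header length, gives two elements and discards the trailing 5; the same data with a header length of 3 is an error.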

Algorithm

These routines use LALCHARReadVector() to read the header lines and the first line of data. After this, data are parsed directly from *stream using fscanf(). This is done for efficiency: repeated calling of the LAL string parsing routines in StringConvert.c involves far too much computational overhead.

After the first data line has been read, the length of each sequence element will be known from the atomic type, as well as the specified dimLength (for arrays), vectorLength (for vectors), or number of elements on the first data line (for vectors without an explicitly specified vectorLength). If length is also specified, a sequence of the appropriate size is allocated, and all the data are copied or read directly into it. If length was not specified, the data read with fscanf() are stored in a linked list of buffers of size BUFFSIZE (a local #defined constant) until parsing stops. Then a sequence of the appropriate size is allocated and the data copied into it.
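The buffering strategy for the unknown-length case can be sketched as a linked list of fixed-size chunks that is flattened once parsing stops. The code below is a standalone illustration under stated assumptions: the Chunk and ChunkList types and both function names are hypothetical, data are stored as double, and the BUFFSIZE value shown is illustrative only and may differ from the constant in StreamSeriesInput.c.

```c
#include <stdlib.h>
#include <string.h>

#define BUFFSIZE 16   /* illustrative value only */

/* Hypothetical types, not part of LAL. */
typedef struct Chunk {
    double values[BUFFSIZE];
    size_t used;
    struct Chunk *next;
} Chunk;

typedef struct {
    Chunk *head, *tail;
    size_t total;                       /* values stored in all chunks */
} ChunkList;

/* Append one value, starting a new chunk when the last one is full. */
int chunklist_append(ChunkList *list, double value)
{
    if (list->tail == NULL || list->tail->used == BUFFSIZE) {
        Chunk *c = calloc(1, sizeof *c);
        if (c == NULL)
            return -1;
        if (list->tail != NULL)
            list->tail->next = c;
        else
            list->head = c;
        list->tail = c;
    }
    list->tail->values[list->tail->used++] = value;
    list->total++;
    return 0;
}

/* Copy all values into one contiguous malloc'd array and free the list.
 * Returns NULL (with *count set to 0) if the list is empty or the
 * allocation fails. */
double *chunklist_flatten(ChunkList *list, size_t *count)
{
    double *out = NULL, *dst;
    Chunk *c, *next;

    if (list->total > 0)
        out = malloc(list->total * sizeof *out);
    dst = out;
    for (c = list->head; c != NULL; c = next) {
        next = c->next;
        if (out != NULL) {
            memcpy(dst, c->values, c->used * sizeof *dst);
            dst += c->used;
        }
        free(c);
    }
    *count = (out != NULL) ? list->total : 0;
    list->head = list->tail = NULL;
    list->total = 0;
    return out;
}
```

The appeal of this layout is that no value is copied more than once: each number is written into a chunk as it is parsed, and the final sequence is allocated only after its exact size is known.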