
EpiData Analysis

Command and Function Reference Guide (version 2.8)


### Commands

  • Manage data: read, save, append, merge, aggregate, use datasets, create new content, list content, edit content, delete content, Consistency and Validity Checks, Reports
  • Analyze data: describe variables, tables, frequencies, means, count, survival analysis, scatter plot, line plot, bar chart, histogram, epicurve
  • Write programs: select observations, if-then, sort data, Disk and file commands, set parameters, Labels, Values and format in output, Types of Variables, How to use Variables and References, run scripts, Clean up & stop, Functions, Operators, Startup options

Some commands are currently available only in EpiData Analysis Classic, which is a separate download:

  • Linear regression
  • SPC graphs - Pareto Charts, Ichart etc.

Syntax for all commands

command <variables | expression> [!option] [!option := a|b]

In command descriptions, the following notation is used

  [ ] : optional specification of observation number.    
  {a|b|...} : indicates alternative choices  
  <...> : indicates user specified name/identifier  

If you are in doubt about when to use double quotes "", the rule is:

  • Use "..." for all external references (e.g. "file names.ext") and for assignments of text values (e.g. set "COMMANDLOG" := "ON")
  • Double quotes are NOT needed for variables, defined value labels, or dataset names

Read and Save Data

## read
read [{"<filename>" | <expression>}] [!options ...]

Read a copy of the data file into memory.

Note: The file name, including the file extension, must be enclosed in double quotes " ".
On a case-sensitive operating system (e.g. Linux), the case of the file name and extension matters.

parameters

  • filename

An optional filename may be given in quotes. The file format is detected based on the file extension, which must be one of {csv | epx | epz | dta}. If no filename is given, the open file dialog is started.

All EpiData products support reading Stata files from version 4 to 16 (format 105 to 118). Stata has a special .dta version (format 119) which is only for datasets with more than 32767 variables. This format may also be used in Stata 15 and 16, but is not the standard format. This format is not supported by EpiData.

For a technical description of the .dta formats, see the Stata documentation (help dta).

  • expression

An optional expression (instead of <filename>) that resolves to a string specifying the filename. (see examples)

options

  • !force

Will force reading a locked epx/epz file.

Note: Use with caution since the project may be used by someone else

  • !c

Close the current project. This is only required when the project (data, labels, variables, valuelabels, etc.) has changed.

  • !d := "<single character>"

Use the character as the delimiter used when reading .txt/.csv files. This option is only valid when importing delimited files. During import the delimiter will be validated. If the structure of the file does not fit with the selected delimiter, it will be rejected.

If no delimiter is set for importing delimited files, the program will guess which delimiter is used.

  • !q := "<single character>"

Use the character for recognizing quoted strings when reading .txt/.csv files. This option is only valid when importing delimited files. If the character is set to empty/nothing (e.g. !q := "") then the complete content of the file is used in guessing/validating delimiters. If no quote character is set, the default " (double quote) is used to identify strings.

  • !vn := {true|false}

Read the first line in a .txt/.csv file as either variable names or as data.

  • true: read the first line as variable names, regardless of the actual content.
  • false: read the first line as data, regardless of the actual content.

If the option is not used when reading delimited files, the program will look at the line and guess if it is data or headers.

Note: In files with all string data and where first line is known to be variable names, the program has a high probability of guessing incorrectly whether first line is headers or not. In such cases please use this option to ensure correct reading of data.

  • !pw := "<string>"

If you use this option with a password-protected file (.epx | .epz | .rec), the file is read directly; otherwise you will be prompted for a password.

  • !login := "<string>"

When an EpiData project (.epx or .epz) has Extended Access enabled, you will be prompted to log in with a username and a password. With this option, the file is opened with the supplied login. The password for that user may be supplied with the option !pw := "<string>".

examples

read "bromar.rec";      // complete filename provided
read "bromar" + ".epx"; // expression using two strings in concatenation
new global fn string;
fn := "bromar.epx";
read fn;                // expression using the variable fn
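
The options for delimited files described above can be combined in one read. This is a minimal sketch using only the options documented in this section; the file names, the delimiter and the password are hypothetical.

```epidata
// Read a semicolon-delimited file and treat the first line as variable names
// (survey.csv is a hypothetical file).
read "survey.csv" !d := ";" !vn := true;

// Close the project currently in memory, then read a password-protected project
// without being prompted (file name and password are hypothetical).
read "survey.epx" !c !pw := "secret";
```

Setting !d and !vn explicitly avoids relying on the program's guess, which the note above describes as unreliable for files that contain only string data.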

save

save [{"<filename>" | <expression>}] [!replace] [!force] [!output [:="{html | text}"] ]
     [!format:="{stata|epidata|csv}"] [options]

Save a copy of all variables in memory to a file so that the data can be used again.

parameters

  • filename

An optional filename may be given in quotes. The file format is detected based on the file extension which must be one of {csv|epx|epz|dta}. If no filename is given the save file dialog is started.

  • expression

An optional expression (instead of filename) that resolves to a string with the filename. This can be e.g. a concatenation of strings, a global variable or something else. See read for examples.

options

  • !replace

Overwrite existing file

  • !force

Will force overwriting a locked .epx/.epz file

Note: Use with caution since the project may be used by someone else

  • !output [:= "{html | text}"]

Instead of saving the project, this option saves the output to a file. If no format is specified (text/html), the current output format is used.

  • !format := "{stata|epidata|csv}"

Save the project as a specific type.

This will ONLY change the content. The user must make sure the file has the correct extension.

  • !d := "<single character>"

Will force the delimiter of values/names used when writing .txt/.csv files. This option is only valid when saving delimited files. The default value is "," (comma)

  • !q := "<single character>"

Will force the character used for writing quoted strings. This option is only valid when saving delimited files. The default value is " (double quotation mark)

  • !version := <integer>

Specify which Stata version to save the data as. Accepted values range from 4 to 14; the default is 14

  • !vn := {true|false}

If true the first line in the .csv/.txt file is the variable names. Default value is "true"

  • !dated := "<single character>"

Will force the character used for writing date delimiters. Default value is "/" (slash)

  • !timed := "<single character>"

Will force the character used for writing time delimiters. Default value is ":" (colon)

  • !decd := "<single character>"

Will force the character used for writing decimal separators. Default value is "." (dot)

  • !nl := "<single character>"

Will force the end of line character used in the .txt/.csv file. Default value is that of the operating system:
Linux: #10 MacOS: #13 Windows: #13#10

  • !memonl := "<single character>"

Will force the character used for writing the linebreaks from a memo variable. Default value is " " (space)

  • !fixed

Forces the writing of the .csv/.txt in a fixed format, where the length of each variable is used to specify the length of each column.

Note: Using !fixed ignores the use of the option !d

  • !bom

This option adds the UTF-8 Byte Order Mark to the .csv/.txt file.
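
As an illustration of how the save options combine, here is a short sketch that uses only the options listed above. The file names are hypothetical, and the delimiter choices are just one plausible setup for a locale that writes decimal commas.

```epidata
// Write a semicolon-delimited file with decimal commas,
// overwriting export.csv if it already exists.
save "export.csv" !replace !d := ";" !decd := ",";

// Write the same data as a Stata file in an older format version.
save "export.dta" !replace !version := 12;
```

Because the file format is detected from the extension, !format is not needed in either call.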

Merge, Append & Aggregate

append

append [<var1> <var2>...] [!ds := <dataset>] [!fn := <filename>]

Add observations after all observations in current file

options

  • !fn := <filename>

    Append this file. Without !fn the file open dialog is shown. All known file types may be used and all options from read associated with reading of data may be used. e.g. !d or !h options for reading csv files.

  • !ds := <dataset>

    Specifies which dataset to use from the external file. It is only needed if the external file contains multiple datasets.

    Only fields with same name as variables in memory will be read. Variables from previous read which are not in the appended file will be set to missing for the appended observations.

See variables on using referenced variables for this command
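
A minimal sketch of a typical append, assuming two hypothetical project files with the same variable names:

```epidata
// Read the first file, then add the observations of the second file after it.
read "visits_2023.epx";
append !fn := "visits_2024.epx";
```

Only fields whose names match variables already in memory are read from the appended file, as described above.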

merge

merge [<key1> <key2> ...]
      [!fn [:= "<filename>"]] [!ds := <dataset>]
      [!table] [!combine | !update | !replace]

Merge the current data file with another dataset file based on key variable(s). The result is a NEW dataset which is added to the top level of the project.

options

  • !fn [:= <filename>]

    Merge with a dataset in this file. Without !fn, it is assumed that the currently used dataset should be merged with a related dataset. If no filename is provided, a dialog is shown to open the external file. All known file types may be used, and all options from read associated with reading of data can be used too, e.g. !d or !h options for reading csv files.

  • !ds := <dataset>

    Specifies which dataset to use. It is only needed if multiple datasets exist (either as related datasets or in an external file).

  • !label := "<text>"

  • !l := "<text>"

    Assign the descriptive text as a caption/label for the resulting dataset.

  • !table

    The external file is used as a lookup table. e.g. to add person information to a file with clinical results

  • !combine

    All non-missing values in the external dataset replace all MISSING values for common variables

  • !update

    All non-missing values in the external dataset replace ALL values for common variables

  • !replace

    All values in the external dataset replace all values for common variables

  • !nu

    By default the resulting data is automatically used. By using this option the program will stay on the current dataset after the command has completed.

  • !r

Specify a different name for the resulting dataset. Default naming is a concatenation of the two used datasets

To keep information in variables with an identical name (e.g. mergevar or name) from all files, rename the variables before you merge the next file, e.g.

read "pt.epx";
edit variable name !r := "ptname";
merge hospid !filename := "hospital.epx" !r := MergeDs;
use MergeDs;
edit variable name !r := "hospname";
etc ....

After merge, the variable mergevar indicates the source of information for each observation (record). mergevar is defined with value labels for these values:

1 = In main dataset only
2 = In merged dataset only
3 = In both datasets

Note: if mergevar already exists in the dataset, an error will occur and the merge will be stopped. Drop the mergevar variable first.

example

// Load a project
read "Clinical Example.epx";

// A: Internal merge (1 related dataset)
// Since the currently used dataset has only a single related dataset,
// the key variable provides enough information to combine the two datasets.
merge;

// B: Internal merge (2+ related datasets)
// The currently used dataset has 2 or more related datasets, so we need to use the
// option !ds := <id> to specify which dataset we want to merge with.
// Use list to see which dataset ids you have in the project
list ds;
merge !ds := labdata;

// C: External merge (1+ dataset in external file - e.g. a .csv file)
// In order to merge with an external file the key variables to merge on must be provided
merge patientid !filename := "PatientNames.csv";

// D: External merge (2+ datasets in external file)
// In order to merge with an external file that has 2+ datasets, both the key variables
// AND the dataset you wish to merge with must be provided!
merge patientid !ds := firstdataset !filename := "PatientNames.epx";

See variables on using referenced variables for this command
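
The lookup-table case can be sketched as follows. The file names, the key variable and the result name are hypothetical. The file option is written as !fn, as in the option list, whereas the examples above use the !filename spelling.

```epidata
// Add patient information to a file with lab results.
// patients.epx is used as a lookup table keyed on patientid.
read "labresults.epx";
merge patientid !fn := "patients.epx" !table !r := LabWithPatients;

// The result dataset is used automatically; check where each observation came from.
freq mergevar;
```

Adding !nu to the merge would instead leave the original dataset active after the command completes.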

aggregate / agg

aggregate [<var1> <var2>...] [!options]

Aggregate (collapse, combine) data when you wish to change from individual to group level.

See variables on using referenced variables for this command

options

  • !m

Include observations with missing data (.)

  • !q

Hides all output

  • !nc

Do not show counts of observations for each variable

  • !nt

Do not show total counts of observations

  • !ds := <dataset>

Save the aggregated dataset as its own dataset with the given name

  • !replace

Use in combination with !ds to replace an existing dataset

  • !caption := "<string>"

Give the dataset a caption (used both in output and with !ds)

  • !hd := <global string vector>

Give custom labels for the generated variables and show these as column descriptors in the table head

  • !u

use in combination with !ds. Use the generated dataset after this command completes. Cannot be used with an active select!

  • !full

Expands the resulting dataset with ALL possible value-combinations from ... All entries with no data will contain system missing.

See labeling for options on changing between labels/values

See formatting for options on formatting percentages

Summary statistic options

Each of these options must be followed by := variable. Only one variable can be given, but each option can be used multiple times with different variables.

  • !mv Count of missing values (system and user defined)

  • !mean Mean of the variable

  • !sd Standard deviation

  • !sv Variance

  • !min Minimum value

  • !med Median value

  • !max Maximum value

  • !pXX XX percentile (XX = 1, 5, 10, 25, 50, 75, 90, 95, 99)

  • !sum Sum of values

  • !des Min, Median and Max

  • !iqr p25 and p75

  • !idr p10 and p90

  • !isr p5 and p95

  • !mci Mean and 95% CI (low + high)

example

// define a global vector for column texts:
new global columntxt[5] string;
columntxt[1] := "Group";
columntxt[2] := "N Total";
columntxt[3] := "n (observed)";
columntxt[4] := "Mean";
columntxt[5] := .;
agg sex age family !hd:=columntxt  !mci:=economy !mci:=children ;
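
A simpler sketch that stores the aggregated result as a dataset, using only the options listed above. The variable names sex and age and the dataset name AgeBySex are hypothetical.

```epidata
// One row per value of sex, with the mean and standard deviation of age.
// The result is saved as dataset AgeBySex, replacing it if it already exists.
agg sex !mean := age !sd := age !ds := AgeBySex !replace;
```

Each summary option takes a single variable, so a second statistic for age is requested by repeating the variable with another option.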

Using datasets & Sorting

use

use <dataset>

Change the active dataset of a project.

See variables on using referenced variables for this command

example

read "Clinical Example.epx";
list dataset;
use datafile_id_2;

sort

sort <variable1> [<variable2> ...] [!descending]

Sort the current dataset based on the given variables. Sort respects the current select.

  • !descending
    !d

Sorts the dataset in descending order

See variables on using referenced variables for this command
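
A short sketch that combines use and sort. The dataset id is taken from the use example above; the variables sex and age are hypothetical.

```epidata
read "Clinical Example.epx";
use datafile_id_2;
// Sort on sex and then age, in descending order.
sort sex age !d;
```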

Creating content


new project / new p

new project

Creates a new empty project, e.g. for simulation or testing.

options

  • !size:= <integer>

Adds an initial dataform with observations in it. If omitted, no dataform is created and must be created manually.

  • !label:= "<text>"

Adds a title to the project. If not used a default title is given.

  • !c

Closes an open project if modified

  • !pw := <string>

Encrypts the data of the project with a single password. All data is encrypted with the AES/Rijndael algorithm, but metadata is not encrypted.

  • !showFieldNames := <boolean>
    !sfn := <boolean>

Show/Hide the field name next to the entry field in Manager and EntryClient

  • !showFieldBorders := <boolean>
    !sfb := <boolean>

Show/Hide the border of the entry field in Manager and EntryClient

  • !backupInterval := <integer>
    !bi := <integer>

Interval at which Manager and EntryClient automatically save the project

  • !backupOnShutdown := <boolean>
    !bos := <boolean>

Perform a backup when closing the project. The name for the backup is based on the current date/time.
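
A minimal sketch of setting up an empty project for testing. The title, size and variable are hypothetical.

```epidata
// Close any modified project and start a new one with
// 100 observations in an initial dataform.
new project !size := 100 !label := "Simulation" !c;

// Every observation gets the same starting value.
new variable dose float := 2.5;
```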

new dataset / new

new dataset <dataset> [!options...]

Create a new dataset for the project. Use the options to specify relations between datasets. If the command completes successfully, the newly created dataset is automatically used

options

  • !parent := "<parentform id>"

Used for creating parent-child relations. If omitted the dataform is created as a top-level dataform.

  • !label := "<text>"
    !l := "<text>"

    Assign the descriptive text as a caption/label for the dataset.

  • !childobs := <integer>

    Used only in combination with !parent. Gives the number of allowed child observations in the child dataset (0 = no limit)

  • !afterobs := <integer>

    Used only in combination with !parent. Tells EntryClient what happens after entry of one complete observation

0 = new observation
1 = return to parent
2 = return on max number of observations
3 = stay on the current observation

  • !statusbar := "<text>"

    Sets the "content string" of a dataform (see manager for formatting).

  • !size := <integer>

    Initialize the dataform with empty observations.

See variables on using referenced variables for this command
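
A sketch of creating a child dataset with the relation options above. The dataset names patients and visits are hypothetical.

```epidata
// Child dataset of "patients" with no limit on child observations.
// EntryClient returns to the parent after each completed observation.
new dataset visits !parent := "patients" !label := "Follow-up visits" !childobs := 0 !afterobs := 1;
```

The new dataset is used automatically once the command succeeds, so variables created next are added to visits.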

new variable / new var / new v

new variable <variable> <type> [:= expression] [!options...]

Create a new variable of a given type and optionally assign the value in expression. The variable type and the expression type must be compatible. Variables contain a value for each observation. If no expression is given, all values will be missing.

options

  • !label := "<text>"
    !l := "<text>"

    Assign the descriptive text as a label for the variable. An existing variable label will be replaced with the new one.

  • !valuelabel := <valuelabel name>
    !vl := <valuelabel name>

    Assign an existing valuelabel set to the variable. An existing assignment will be replaced but not deleted. To delete the existing valuelabel set see deleting content

  • !length := <integer>
    !le := <integer>

    Changes the entry length of a variable

  • !decimal := <integer>
    !dec := <integer>

    Changes the decimal entry length for floating point variables. Changing the decimal length for other variable types has no effect

  • !rangelow := <value>

    Set the lower bound for a range of values. Must be used in combination with !rangehigh

  • !rangehigh := <value>

    Set the upper bound for a range of values. Must be used in combination with !rangelow

  • !entrymode := {0|1|2}

    Change the entry mode used in EpiData EntryClient

0 = default
1 = must enter
2 = no enter

  • !confirm

    If used, the variable has the "confirm entry" flag set. Used in EpiData EntryClient

  • !key

    Adds the variable to be part of the key for the current dataset

  • !cmpX := <variable>

Where "X" is replaced with one of GT, LT, GE, LE, EQ, NE. Adds comparison between the new variable and the assigned variable

  • !u | !memo

When creating a string variable, the sub type can be specified with one of the above options. !u specifies an uppercase string variable; !memo specifies a memo variable.

  • !dmy | !mdy | !ymd

When creating a date variable, the sub type can be specified with one of the above options. If none is used, !dmy is the default.

  • !auto [{0|1|2}]

When creating a variable that supports automatic content (date, time or integer), using this option changes the default type to the automatic type.
Integer becomes AutoIncrement, DMY becomes AutoDMY, etc.

For time and date variables the number specifies when the variable is updated:

 0 = When the observation is created (default)
 1 = When the observation is first saved
 2 = Each time the observation is saved after being edited

examples

Examples where all observations get the same value:

new variable v1 integer := 1 + 2 * 3 - 4;
new variable v2 float   := (2 * pi) * 5;
new variable v3 string  := "Hello World!";
new variable v4 time    := now();
new variable v5 boolean := (2 > 3);
new variable v6 date    := today();

Examples where a value depends on other variables:

new variable v1 integer := v14 + v17;
// v1 is equal to sum of v14 and v17
new variable age integer := integer((today() - dateborn)/365.25);
// calculated age in whole years

See variables on using referenced variables for this command
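
The metadata options can be set when the variable is created. This is a sketch with hypothetical variable names that uses only the options listed above.

```epidata
// Integer variable restricted to a range; both bounds must be given together.
new variable age integer !label := "Age in years" !rangelow := 0 !rangehigh := 120;

// Uppercase string sub type with an entry length of 3.
new variable initials string !u !length := 3;

// Date stored with the year-month-day sub type instead of the default !dmy.
new variable visitdate date !ymd !label := "Date of visit";
```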

new global / new g

new global Variable <type> [:= expression]
new global Variable[<integer expression>] <type> [:= expression]

Create a new global variable

parameters

  • Variable must be unique. If the variable name is followed by square brackets [...], a global vector is created, where each entry can be accessed individually using its index in square brackets (e.g. g7[3])
  • type is a valid EpiData type
  • expression is a value assigned to the global variable. The global variable type and the expression type must be compatible

A global variable or parameter has only one value, whereas a standard variable has one value for each observation. Global variables can for the most part be used like regular variables.

If a value is assigned when creating a new global vector all entries of the vector will have the same value!

example

new global g1 integer     := 1 + 2 * 3 - 4;
new global g2 float       := (2 * pi) * 5;
new global g3 string      := "Hello World!";
new global g4 time        := now();
new global g5 boolean     := (2 > 3);
new global g6 date        := today();
new global g7[10] integer := 10;
g7[3]                     := 20;

See variables on using referenced variables for this command

new valuelabel / new vl

new valuelabel <name> <type> (<value> , <label>) (...) [!m := <value>]

Create a new value label set with a given type (boolean not supported) and assign at least one (value, label) pair.

parameters

  • Each (value, label) pair will be added to the newly created set. The datatype of the value MUST match the defined datatype for the value label set itself. It is not possible to create an empty valuelabel set.

  • The valuelabel name must be unique; it cannot be the same as any variable. A useful practice is to start the valuelabel name with an underscore: _

Note: An empty set will restrict data entry to system missing only!

options

  • !m := <value>

    Marks the given value in the value label set as missing. If the value is not part of the (value, label) pairs or the datatype does not match an error will be reported. This option can be used multiple times with different values.

example

// "normal" value label
new valuelabel _VL1 integer (1, "Value A") (2, "Value B") (9, "Missing") !m := 9;
// using expression
new valuelabel _VL2 integer (0 + 1, "This " + "is " + "value " + 1) (1 + 1, "This " + "is " + "value " + 2) (2 + 1, "This " + "is " + "value " + 3);

See edit valuelabels for more advanced use of variables and loops to create additional valuelabels

See variables on using referenced variables for this command
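
A value label set becomes useful once it is attached to a variable. This sketch uses the !vl option of new variable described earlier; the names _sex and sex are hypothetical.

```epidata
// Define the set, marking 9 as a missing value.
new valuelabel _sex integer (1, "Male") (2, "Female") (9, "Unknown") !m := 9;

// Attach the set when the variable is created.
new variable sex integer !label := "Sex" !vl := _sex;
```

The leading underscore follows the naming practice suggested above and keeps the set name distinct from the variable name.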

Listing content

browse

browse [variable1 [variable2 ...] ] [options]

Show the variables mentioned in a spreadsheet grid

parameters

  • without variable names, browse all variables

After browse has started you may Right Click and see how to close or adapt columns. Browse will, by default, follow the show formats setting.

options

  • !caption := "<string>"

Give the browser window a custom caption

  • !c

Close all currently open browsers

  • !a

Arrange all browsers in a cascade

  • !vn

    Show variable names instead of following the set "FORMAT VALUE LABEL" and
    set "FORMAT VARIABLE LABEL" options

See variables on using referenced variables for this command

Note: browse is much faster than list

list data / list d

list data [variable1 [variable2 ...]]

Show values on the screen for all variables mentioned, with one observation per line (not limited by the width of the display)

parameters

  • without variable names, list all variables.

See labeling for options on changing between labels/values

Note: browse is much faster than list.

Note: When list follows select, the sequence number is within the current select, not for the whole dataset.

See variables on using referenced variables for this command
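
A sketch contrasting browse and list data. The variables id, age and sex are hypothetical, and the select form is the one used in the drop data example later in this guide.

```epidata
// Spreadsheet view of three variables with a custom window caption.
browse id age sex !caption := "Key variables";

// List two variables for the selected observations only.
select (age > 65) do
  list data id age;

// Close all browser windows again.
browse !c;
```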

list project / list p

list project

Shows a brief overview of the project

options

  • !info

    Also shows the study information

list dataset / list ds

list dataset

Shows a list of datasets for the project

options

  • !all

    Outputs additional information about the listed datasets

list variable / list var / list v

list variable

List all currently defined variable names, types, formats and labels

list valuelabel / list vl

list valuelabel

Show the full list of all valuelabel sets. Each set is listed individually as value/label pair and marked whether a value is considered missing or not.

list results / list res / list r

list results

List all current result variables and their values.

means, describe, tables and other estimation commands create result variables, e.g. $mean[1] or $count. All result variables for a command are cleared when running the same command again.

list global / list g

list global

List currently defined global variables with their types and values. Global variables contain a single value and global vectors contain multiple values. The list shows both types.

Editing variable and label definitions

edit project / edit p

edit project

Edits a project.

options

  • !label:= "<text>"

    Adds a title to the project. If not used a default title is given.

  • !showFieldNames := <boolean>
    !sfn := <boolean>

Show/Hide the field name next to the entry field in Manager and EntryClient

  • !showFieldBorders := <boolean>
    !sfb := <boolean>

Show/Hide the border of the entry field in Manager and EntryClient

  • !backupInterval := <integer>
    !bi := <integer>

Interval at which Manager and EntryClient automatically save the project

  • !backupOnShutdown := <boolean>
    !bos := <boolean>

Perform a backup when closing the project. The name for the backup is based on the current date/time.

edit dataset / edit ds

edit dataset <dataset> [!options...]

Edit an existing dataset in the project.

options

  • !label := "<text>"

    Assign the descriptive text as a caption/label for the dataset.

  • !childobs := <integer>

    Used only if dataset is related to a parent. Gives the number of allowed child observations (0 = no limit)

  • !afterobs := <integer>

    Used only if the dataset is related to a parent. Tells EntryClient what happens after entering the whole observation

0 = new observation
1 = return to parent
2 = return on max observation
3 = stay on current observation

  • !statusbar := "<text>"

    Sets the "content string" of a dataform (see manager for formatting).

  • !size := <integer>

    Changes the size of the dataset to amount of observations.

  • !r := <new dataset name>

    Changes the name of the dataset. If the name is already in use an error will occur.

  • !noparent

    Moves the current dataset (and all related datasets) to be a top-level dataset.

Note: To restore the relation, create a new empty child dataset and then merge the data from the child datasets.

See variables on using referenced variables for this command

edit variable / edit var / edit v

edit variable <variable1> [!options...]

Edit the metadata of variable1. The options specify which metadata are changed; multiple options may be used at once

options

  • !label := "<text>"

    Assign the descriptive text as a label for the variable. An existing variable label will be replaced with the new one.

  • !vl := <valuelabel id>

    Assign an existing valuelabel set to the variable. An existing assignment will be replaced but not deleted. To delete the existing valuelabel set, see deleting content

  • !novl

    Removes an existing valuelabel set from the variable.

  • !l := <integer>

    Changes the entry length of a variable

  • !d := <integer>

    Changes the decimal entry length for floating point variables. Changing the decimal length for other variable types has no effect

  • !min := <value>

    Set the lower bound for a range of values. Must be used in combination with !max

  • !max := <value>

    Set the upper bound for a range of values. Must be used in combination with !min

  • !norange

    Removes an existing defined range for the variable

  • !entry := <integer>

    Changes the entry mode used in EpiData EntryClient

0 = default
1 = must enter
2 = no enter

  • !cmpX := <variable>

    Where "X" is replaced with one of GT, LT, GE, LE, EQ, NE. Adds comparison between this variable and the assigned variable.

  • !confirm

    If used, the variable has the "confirm entry" flag set. Used in EpiData EntryClient.

  • !noconfirm

    If used, the variable has the "confirm entry" flag unset. Used in EpiData EntryClient.

  • !key

    Adds the variable to be part of the key for the current dataset

  • !nokey

    Removes the variable from being part of the key for the current dataset

  • !r := <new variable name>

    Changes the name of the variable. If the name is already in use an error will occur.

Note: Data values are NOT changed, even if the new length or number of decimals is shorter than the actual content. To keep the changes made, you must save the data.

See variables on using referenced variables for this command
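
A sketch of several metadata edits followed by a save. The label text and range are hypothetical, and the rename follows the form used in the merge example.

```epidata
read "pt.epx";

// Change the label and set a valid range; !min and !max must be used together.
edit variable age !label := "Age at admission" !min := 0 !max := 120;

// Rename a variable, in the form used in the merge example.
edit variable name !r := "ptname";

// The edits exist only in memory until the data are saved.
save "pt.epx" !replace;
```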

edit valuelabel / edit vl

edit valuelabel <valuelabel> [(<value> , <text>) ...] [!m := <value>] [!delete := <value>] [!nomissing := <value>]

Edit an existing value label set and optionally assign any number of (value, label) pairs.

If a (value, label) pair already exists, the new label will replace the old label. Otherwise the (value, label) pair will be added to the set. The datatype of the value MUST match the datatype for the value label set itself.

options

  • !m := <value>

    Marks the given value in the value label set as missing. If the value is not part of the given (value, label) pairs or an already existing pair, or if the datatype does not match, an error will be reported.

    This option can be used multiple times with different values.

  • !d := <value>

Deletes the value label pair with the given value. If no such pair exists, an error will be reported.

This option can be used multiple times with different values.

  • !nom := <value>

Removes the mark on the given value label pair that it should be considered missing. If no such pair exists, an error will be reported.

This option can be used multiple times with different values.

  • !r := <new value label name>

Changes the name of the value label. If the new name is already in use an error will occur.

Note: All variables already using this value label will continue to have the same label.

To remove a valuelabel from a variable, see edit variable

example

// Create a new simple valuelabel
new vl _VL1 int (1, "A");

// Simple edit: add another valuelabel
edit vl _VL1 (2, "B");

// Simple edit: replace an existing valuelabel
edit vl _VL1 (2, "Replaced B");

// Create a valuelabel set using a loop.
new valuelabel _VL2 int (1, "This is the first value label"); // create a new valuelabel set
new global i integer;                                         // we also need a loop variable

// now create 4 more pairs (2, "... 2") (3, "... 3") ...
for i := 2 to 5 do
  edit valuelabel _VL2 (i, "This is valuelabel no: " + i);

See variables on using referenced variables for this command

edit data / edit d

edit data [!md] [!nomd] [!mv] [!nomv]

Edit the status of observations

options

  • !md / !nomd

    Marks / Unmarks the currently selected observations for deletion

  • !mv / !nomv

    Marks / Unmarks the currently selected observations as verified

Deleting content

drop dataset / drop ds

drop dataset <dataset1> [<dataset2> ...]

Remove the listed datasets (and related datasets) from memory

See variables on using referenced variables for this command

drop data / drop d

drop data [!del]

Drop all data within current select from memory. Save the data first if you wish to keep any changes.

options

  • !del Drops all observations marked for deletion.

Note: Make sure to test whether this creates a problem in a related dataset with the check command

example

read "bromar.epx";
select (id > 1000) do
  drop data;  // Drops all observations where id > 1000, but keeps the rest.

read "bromar.epx";
drop data !del ; // drop all observations "marked for deletion"

drop variable / drop var / drop v

drop variable variable1 [variable2 ...]

Remove the listed variables from memory

See variables on using referenced variables for this command
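
A short sketch of drop variable, assuming the variables dectime, kmgrp and age exist in bromar.epx (they are named in other examples in this guide). The variables are removed from memory only; the file on disk is unchanged until it is saved.

```
read "bromar.epx";
drop variable dectime;   // remove a single variable
drop var kmgrp age;      // short form, several variables at once
```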

Descriptive statistics

freq / fre

freq variable1 [!<option> ...]

Frequency distribution for variable1

parameters

  • variable1 may be any type

options

  • !m

Include observations with missing data (.)

  • !cum

Add cumulative percentage

  • !r

Add row percentage

See labeling for options on changing between labels/values

See formatting for options on formatting percentages

See variables on using referenced variables for this command
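
A minimal sketch of the freq options, assuming the sample project bromar.epx and its variable sex:

```
read "bromar.epx";
freq sex;           // plain frequency table
freq sex !m;        // also count observations with missing data
freq sex !m !cum;   // missing data included, cumulative percentage added
```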

means

means variable1 [!by := variable2] [!t]

Basic descriptive statistics for variable1, optionally stratified by variable2 with analysis of variance.

  • Statistics: count, total, mean, variance, standard deviation, 95% confidence interval for the mean, standard error, skewness, excess kurtosis.
  • Percentiles: minimum, 5%, 10%, 25%, median, 75%, 90%, 95%, maximum.

See variables on using referenced variables for this command

parameters

  • variable1 must be numeric

options

  • !by:

Stratify by this variable

  • !t:

Analysis of Variance to test for homogeneity of the mean across strata, including Bartlett's test for homogeneity of variance.

With !by (stratified), an F-test is given.

Without !by (one stratum), a T-test that mean=0 is given (e.g. as a paired T-test for the difference in before and after measures)

Warning: Check results carefully if the !by variable has only one observation in a stratum

Estimates are saved as result variables. Use list results for details

See labeling for options on changing between labels/values
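
A short sketch of typical means calls, assuming that age is numeric and that sex is a suitable stratifying variable in bromar.epx:

```
read "bromar.epx";
means age;                 // descriptive statistics for age, one stratum
means age !t;              // adds the T-test that mean = 0
means age !by := sex !t;   // stratified by sex, with the F-test and Bartlett's test
list results;              // show the saved result variables
```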

methodology notes:

describe

describe variable list [option list]

Basic descriptive statistics and frequencies for a group of variables

With no options specified, a single table will be provided, with one row per variable showing: number of observations, number of unique values

For numerical variables, the output will also include mean, standard deviation, minimum, median, maximum

statistic options

Use any combination of options to customize the output

  • !msd: mean, standard deviation and sum

  • !mci: mean and confidence interval. See set to change the confidence interval

  • !rm: minimum, median, maximum

  • !idr: 10th percentile, median, 90th percentile

  • !iqr: 25th percentile, median, 75th percentile

  • !fh: 5 most frequent values

  • !fl: 5 least frequent values

  • !fb: 5 most frequent and 5 least frequent values

  • !ct: force one row per variable when the above options are specified. This will be ignored if one of !fh !fl !fb is specified.

  • !m: number of missing values

See variables on using referenced variables for this command

See labeling for options on changing between labels/values

methodology notes

  • All statistics are based on the means command and all frequencies are based on the freq command, so results from describe will be exactly the same as those from means or freq.
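
A minimal sketch of describe with and without statistic options. The variable names are assumed to exist in bromar.epx, and age and dectime are assumed to be numeric.

```
read "bromar.epx";
describe age dectime sex;        // default: one row per variable
describe age dectime !msd !iqr;  // mean, sd and sum plus 25th percentile, median, 75th percentile
describe sex kmgrp !fh !m;       // 5 most frequent values and number of missing values
```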

count

count

Counts number of observations. Count may be used with select to count within a subgroup. No parameters or options apply.

result variables:

  • $count
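
A short sketch of counting within a subgroup and reusing the result, assuming the variable age in bromar.epx; the threshold of 50 is arbitrary.

```
read "bromar.epx";
count;                 // all observations

select (age > 50) do
  count;               // only observations where age > 50

? $count;              // the result variable holds the latest count
```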

tables / tab

tab <column variable> <row variable> [!<option> ...]

Crosstabulate the variables chosen.

Data and output options

  • !m

    Include observations with missing data (.)

  • !w := <variable>

Use number of observations in the variable as frequency weight

  • !by := <variable>

Stratify the data by this variable. If multiple !by options are used, each unique combination of values from the by-variables will have its own sub-table.

  • !q

Hide all output! Result variables are still calculated

  • !nc

Hide combined/unstratified tables

  • !nb

Hide sub/stratified tables

  • !ns

Hide summary table

percentage options

  • !pr

Show row percents for each table cell and col/row totals

  • !pc

Show col percents for each table cell and col/row totals

  • !pt

Show total percents for each table cell and col/row totals

sort optionlist

Indicate by !sxxx where the x may include
r:row c:Column a:Ascending d:Descending t:Total l:label (else numerical)

  • !sa Sort col & row in ascending value order

  • !sd Sort col & row in descending value order

  • !sla Sort col & row in ascending label order

  • !sld Sort col & row in descending label order

  • !sca := <index> Sort col ascending value order in given index

  • !scd := <index> Sort col descending value order in given index

  • !sra := <index> Sort row ascending value order in given index

  • !srd := <index> Sort row descending value order in given index

  • !scta Sort on col totals ascending order

  • !sctd Sort on col totals descending order

  • !srta Sort on row totals ascending order

  • !srtd Sort on row totals descending order

estimation and testing options

  • !t

Chi2

  • !ex

Fisher Exact test for 2x2 tables only

  • !odds

Odds Ratio and confidence interval for 2x2 tables, including Mantel-Haenszel adjustment for stratified data

  • !rr

Risk Ratio and confidence interval for 2x2 tables, including Mantel-Haenszel adjustment for stratified data

Note: The default is to estimate the 95% confidence interval for odds ratio or risk ratio. See the set command to choose a different interval.

See labeling for options on changing between labels/values

See formatting for options on formatting percentages
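
The sketch below combines a few of the tables options. The first call assumes the variables sex and kmgrp from bromar.epx. The second uses a hypothetical project outbreak.epx with hypothetical two-valued variables ill and exposed and a hypothetical stratifying variable agegrp, because the odds ratio and Fisher Exact test require a 2x2 table.

```
read "bromar.epx";
tab sex kmgrp !pr !t;      // row percentages and chi-square test

read "outbreak.epx";       // hypothetical project
tab ill exposed !odds !ex;              // odds ratio and Fisher Exact test
tab ill exposed !odds !by := agegrp;    // stratified, with Mantel-Haenszel adjustment
tab ill exposed !odds !by := agegrp !nb; // same, but hide the stratified sub-tables
```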

ctable / cta

cta <column variable> <row variables> [!<option> ...]

The ctable command summarizes a series of cross tables for the first variable against each of the following variables.

parameters

  • column variable will usually have only two values, as with an outcome

The ctable options have the same meaning as in the tables command.

data and output options

  • !m

Include observations with missing data (.)

  • !w := <variable>

Use number of observations in the variable as frequency weight

  • !by := <variable>

Stratify the data by this variable. If multiple !by options are used, estimates of odds ratio, risk ratio and chi-square will be based on the combination of all stratifying variables.

Note that attack rates and the Fisher Exact Test will be based on unstratified data always.

  • !q Hide all output! Result variables are still calculated

sort options

Sorting (applies to individual variable tables). Indicate by !sxxx where the x indicate:
r:row c:Column a:Ascending d:Descending t:Total l:label (else numerical)

  • !sa Sort col & row in ascending value order

  • !sd Sort col & row in descending value order

  • !sla Sort col & row in ascending label order

  • !sld Sort col & row in descending label order

  • !sca := <integer> Sort col ascending value order in given index

  • !scd := <integer> Sort col descending value order in given index

  • !sra := <integer> Sort row ascending value order in given index

  • !srd := <integer> Sort row descending value order in given index

  • !scta Sort on col totals ascending order

  • !sctd Sort on col totals descending order

  • !srta Sort on row totals ascending order

  • !srtd Sort on row totals descending order

estimation and testing options

  • !t

Chi2 and p-value

  • !ex

Fisher Exact test for 2x2 tables only

  • !odds

Odds Ratio for 2x2 tables, including Mantel-Haenszel adjustment for stratified data

  • !rr

Risk Ratio for 2x2 tables, including Mantel-Haenszel adjustment for stratified data

Note: The default is to estimate the 95% confidence interval for odds ratio or risk ratio. See the set command to choose a different interval.

Attack rate table

An attack rate table is commonly used in food-borne outbreak investigations. These options simplify review and reporting of multiple exposures.

  • !ar

Show unstratified 2x2 tables, attack rates and risk ratios

  • !en

Show unstratified 2x2 tables

output table sort options

Only one may be given

  • !sn

Sort the table rows by variable name

  • !sl

Sort the table rows by variable label

  • !ss

Sort the table rows by key statistic, depending on the estimation options
priority is given to RR then OR then Fisher Exact P then Chi2 P

See labeling for options on changing between labels/values

See formatting for options on formatting percentages

See variables on using referenced variables for this command
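
A minimal sketch of ctable in an outbreak setting. The project outbreak.epx and the variables ill, salad, chicken and icecream are hypothetical; each is assumed to have two values so that every cross table is 2x2.

```
read "outbreak.epx";                      // hypothetical project

// One summary row per exposure, with attack rates and risk ratios
cta ill salad chicken icecream !ar;

// Odds ratios and chi-square, rows sorted by the key statistic
cta ill salad chicken icecream !odds !t !ss;
```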

Graphs and charts

survival / sur

survival outcomevariable timevariable [!by:=stratifyvariable] [options]
survival outcomevariable date1 date2 [!by:=stratifyvariable] [options]

Kaplan-Meier plots and lifetables for time-to-failure data with censoring. Tabulations of survival at each time when there were deaths (failures), plus confidence intervals. A summary table shows the median survival by stratum. The KM plot is always provided in a separate window unless !q is specified as an option.

parameters

  • outcome variable must have discrete values, one of which indicates failure or death
  • timevariable must be an integer. The outcome variable may be numeric or string
  • date1 and date2 must be date variables. Elapsed time is calculated as date2 - date1

options

  • !o

Specify the value of outcome indicating death (failure), which may be numeric or text; the default is zero

  • !by

Stratify by this variable

  • !t

Log-rank test for equality of survival among strata

  • !ref:=value

reference value for the hazard ratio (only with !t)

  • !w:=weightVariable

Specify a weight variable

  • !mt Missing values of date2 take the maximum value of date2

  • !exit:=datevalue

Missing values of date2 are assigned this date. It may be easiest to use the createdate function to specify the date.

  • !i:="t1,t2,t3,...tn"

    Aggregate data to these time intervals. If the string is missing, the set value for LIFETABLE INTERVAL is used

  • !adj

    When intervals are specified, adjust the number at risk to exclude half of the censored subjects (Hosmer, Lemeshow)

output options

  • !nt Omit the lifetables

  • !nou Omit the unstratified lifetable

  • !nos Omit the stratified lifetables

  • !ns Omit the summary table

  • !ng Do not show the KM plot

Kaplan-Meier plot options

  • !cb Copy the KM plot points to the clipboard for use in other software

By default, confidence intervals are shown as error bars

  • !cin Omit the confidence intervals from the KM plot

  • !cib Show the confidence intervals as shaded bands. The default KM plot shows the upper and lower confidence intervals as dotted lines.

  • !cil Show the confidence intervals as dotted lines.

  • See graph options

survival is a graph command and any graph option may be specified

result variables

Estimates are saved as result variables. Use list results for details

methodology

  • confidence intervals calculated using the method in Statistics with Confidence, referenced elsewhere.

See labeling for options on changing between labels/values

See variables on using referenced variables for this command
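
The sketch below shows both forms of the survival command. The project cohort.epx and all variable names (died, days, entrydate, exitdate, treatment) are hypothetical. Writing the failure value as !o := 1 is an assumption based on the general option syntax !option := value.

```
read "cohort.epx";                            // hypothetical project

// Elapsed time already calculated in an integer variable
survival died days !o := 1;                   // 1 = death (assumed option syntax)
survival died days !o := 1 !by := treatment !t;   // stratified, with log-rank test

// Elapsed time calculated as exitdate - entrydate
survival died entrydate exitdate !o := 1 !mt !nt; // missing exitdate takes the maximum; lifetables omitted
```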

scatter

scatter Xvariable Yvariable [graphoptionlist]

Simple scatter plot for two variables.

parameters

  • Xvariable may be integer, float or date/time
  • Yvariable may be integer or float

options

  • !l

    Draw a line instead of points

  • !p

    Draw points as well as a line (use if !l was specified)

  • !colors:="colorMap"

    colorMap is a string of up to 10 digits mapping the Analysis colours to the chart series. For scatter, a single digit may be specified: scatter xvar yvar !colors:="4"

  • scatter is a graph command and any graph option may be specified

See variables on using referenced variables for this command
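
A short sketch of scatter, assuming that age and dectime are numeric variables in bromar.epx:

```
read "bromar.epx";
scatter age dectime;                     // points only
scatter age dectime !l;                  // a line instead of points
scatter age dectime !l !p !colors:="4";  // line and points, using colour 4
```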

barchart

barchart Variable [StratifyVariable] [options]

Draw a barchart for Variable. A barchart shows frequencies at each individual value of Variable.

parameters

  • Variable may be of any type
  • Stratifyvariable may be of any type

options

  • !pct

    Y-axis values are percentage of the total across strata

  • !w:=weightVariable

for grouped data, specify the weights

  • !stack

stack bars for stratified data; !stack and !pct together will have stacked bars that sum to 100%

  • graph options

barchart is a graph command and any graph option may be specified

See variables on using referenced variables for this command
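
A minimal sketch of barchart, assuming the variables kmgrp and sex in bromar.epx:

```
read "bromar.epx";
barchart kmgrp;                   // frequencies for each value of kmgrp
barchart kmgrp sex;               // one series per value of sex
barchart kmgrp sex !pct !stack;   // stacked bars that sum to 100%
```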

histogram

histogram Variable [StratifyVariable] [options]

Draw a histogram for a variable, based on consecutive integer or day intervals. The user is responsible for recoding variables so that consecutive intervals make sense.

A histogram is a bar chart where every integer value within range is represented on the X-axis.

parameters

  • Variable may be integer or date
  • Stratifyvariable may be of any type

options

  • !interval:=i

where i is an integer > 1, will group bars; the default is 1

  • !w:=weightVariable

for grouped data, specify the weights.

  • !stack

stack bars for stratified data.

  • graph options histogram is a graph command and any graph option may be specified

See variables on using referenced variables for this command
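
A short sketch of histogram, assuming that age is an integer variable in bromar.epx; the interval of 5 is arbitrary.

```
read "bromar.epx";
histogram age;                          // one bar per integer value of age
histogram age !interval:=5;             // bars grouped in intervals of 5
histogram age sex !interval:=5 !stack;  // stratified by sex, stacked
```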

epicurve

epicurve Variable [StratifyVariable] [options]

Draw an epidemic curve for a variable, based on consecutive integer or day intervals. The user is responsible for recoding variables so that consecutive intervals make sense.

An epicurve is a stacked histogram, where individual boxes are shown for each subject

parameters

  • Variable may be integer or date
  • Stratifyvariable may be of any type

options

  • !interval:=i

where i is an integer > 1, will group bars; the default is 1

  • graph options epicurve is a graph command and any graph option may be specified
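
A minimal sketch of epicurve. The project outbreak.epx, the date variable onsetdate and the stratifying variable sex are hypothetical names.

```
read "outbreak.epx";                // hypothetical project
epicurve onsetdate;                 // one box per case, by day of onset
epicurve onsetdate !interval:=7;    // grouped in 7-day intervals
epicurve onsetdate sex;             // boxes distinguished by sex
```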

Consistency and Validity Check of data

check data

check data [var1 ...]

Use this command to perform a check of the data in selected variables (if no variables are specified, then ALL variables are checked).

The data is checked for:

  • Data length: Is the number of characters used in data within the length specified for the variable
  • Range/Valuelabel: Is the data within the specified range and/or is it a legal value label
  • Must Enter: Does the variable have data for all observations if it is marked as Must Enter
  • Jumps: If a variable has jumps assigned, do the skipped fields have the correct values
  • Comparison: If a variable is compared to another variable, does the comparison hold

example

read "bromar.epx";
check data;                   // checks all variables
check data dectime kmgrp age; // Only checks the variables dectime, kmgrp and age

See variables on using referenced variables for this command

check key

check key [var1 ...]

Check that the data in specified variables are unique and represent a key.

If no variables are specified and a key is already present in the current dataset, this key is checked.

example

read "bromar.epx";
check key id;                // checks if the variable ID represents a unique key

See variables on using referenced variables for this command

check relate

check relate

Check that all observations have a valid parent observation

example

read "related_data.epx";  // Load the project
use child_dataset;        // Change dataset to a related dataset
check relate;             // Perform the check from the child dataset "upwards" to the parent.
                          // Must be repeated if you have more levels

check study

check study

Check whether the study information is specified.

example

read "samplev3.epx";  // Load the project
check study;          // Perform the check

REPORTS

report users

report users

If a project is using Extended Access control, this command will show a condensed report of the log entries and a list of failed login attempts.

If the project is not using Extended Access control, an error will be displayed.

report validate / report val

report validate [var1 var2 ...] [!options]

Compares two datasets or projects against each other, validates the data content and outputs a report of the differences found.

parameters

The variables var1 .. varn denote the sorting variables. They are required when not comparing whole projects OR when the datasets do not contain any key variables.

options

  • !fn := "<string>"

Opens an external file to compare with.

  • !ds := <dataset id>

Specifies a single dataset (internal/external) to compare with.

  • !nos

Excludes all string types from comparison

  • !nodt

Excludes all date and time types from comparison

  • !noauto

Excludes all auto types from comparison

  • !noc

All text comparisons are case-insensitive

  • !nol

Only show the condensed report - do not show the list of observations

  • !val

All records that pass the comparison will be marked as verified. The pass is based on the option chosen from above!

example

read "bromar.epx";               // Load the project

// Run a report based on the two internal datasets
// (1st is currently used, 2nd is the one marked with !ds :=...)
report val id !ds := ds2;

// Run a report based on the two datasets, one internal and one external
// (1st is currently used, 2nd is the one marked with !ds :=...)
report val id !fn := "double_entry.epx" !ds := ds1;

// If you have two projects there are two ways to compare them.
// If you wish to compare individual datasets, use the options above.
// If you wish to validate two complete projects, use the following:

// Run a report based on the two complete projects, one internal and one external
report val !fn := "double_entry.epx";

// The last example is a special case where both the internal and external project only contain
// a single dataset each. In this case you only need to specify the sorting variable(s)
// and the external file. The dataset option is not needed since the external project only has a single dataset.
report val id !fn := "double_entry.epx";

report countby / report cby

report cby [var1 var2 ...] [!options]

Compares the combination of variables across several datasets. The variables var1 .. varn are considered a "key" and each unique combination of this key is counted across all the specified datasets.

The output is a report with a condensed table of the found keys and a complete table with the found unique key values and the count of these in each dataset.

options

  • !fn := <global string vector>

    This option accepts a global vector with the filenames that are included in the report. The files can be in different formats, but the variable names MUST be the same in each file.

    If a file name is sys.missing (.), the dataset in the currently opened project is used.

  • !ds := <global string vector>

    This option accepts a global vector with the dataset names that are included in the report. The number of entries in the dataset vector MUST be the same as the number of filenames.

  • !nol

    Only show the condensed report - do not show the list of observations

example

// Setup the input for the report:
new global filenames[5] string;
filenames[1] := "count_file_1.epx";
filenames[2] := "count_file_2.rec";
filenames[3] := "count_file_3.dta";
filenames[4] := "count_file_4.csv";
filenames[5] := .;   // use the current file
// Setup the dataset names
new global datasets[5] string;
datasets[1] := "ds1";
datasets[2] := "ds1";
datasets[3] := "ds1";
datasets[4] := "ds1";
datasets[5] := "ds1";
// Run the report:
report cby id !fn := filenames !ds := datasets;

Disk commands

cd

cd ["<directory path>"]

Change the working directory (folder) to the specified path. If no path is given a dialog is shown to select the working directory.

ls / dir

ls ["<directory path>"]
dir ["<directory path>"]

List files in a directory

parameters

  • directory name may include wild cards (* or ?). If no path is given a dialog is shown to select the working directory

erase

erase "<file name>"

Delete the file from disk.

parameters

  • may use wildcards (* or ?), but the directory name should not as this may or may not be allowed by the operating system
  • If no path is given, the current working directory is used.

Warning: The file is deleted (if the file exists) with no confirmatory question
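
A short sketch of the disk commands used together. The directory and file names are hypothetical, and the forward-slash path style is an assumption; use whatever your operating system expects.

```
cd "C:/data/project";       // hypothetical working directory
ls "*.epx";                 // list EpiData project files in it
erase "temp_export.csv";    // hypothetical file; deleted without confirmation
```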

Program-wide options

set

set ["parameter"] [:= "value"]

Change the value of an EpiData setting. An example of this is colour or font selection.

All set ["parameter"] definitions may be added to the file startup.pgm to define your own defaults. Edit this file using the menu as its location will vary, depending on the operating system.

  • MacOS: Analysis / Preferences
  • Windows / Linux: Edit / Options

parameters

  • Without parameters, provides a list of available parameters and their current values

  • parameter any legal set option (must be enclosed in double quotes)

  • value depends on the parameter, but may be a number, text, ON/OFF or a hexadecimal font colour (see colour examples here)

    For settings with ON/OFF or a text value include this in double quotes

    If no value is specified, show the current value

The case of parameters and values does not matter.

examples

set "echo";
set "echo" := "off";
set "COMMANDLINE FONT COLOUR" := "#FFF000";

Set parameters and defaults

| Option | Possible values | Default value | Comments or example |
|---|---|---|---|
| BROWSER BG COLOUR | hex colour code | "#FFFFFF" | Adjust the colour of the background, e.g. #000000 is black. |
| BROWSER FONT COLOUR | hex colour code | "#000000" | Adjust the colour of the font, e.g. #FFF000 is yellow. |
| BROWSER FONT NAME | string | (depends on the operating system) | Name of the font used in the browser. |
| BROWSER FONT SIZE | | 10 | Adjust the size of the font in the browser. |
| BROWSER FONT STYLE | <fsBold/fsItalic/fsUnderline> | " " | Adjust the style of the text in the browser, e.g. underlined text, bold text. |
| BROWSER OBS DEFAULT COLOUR | hex colour code | "#F0F0F0" | Adjust the colour of the "obs" column for normal/default observations |
| BROWSER OBS DELETED COLOUR | hex colour code | "#FF0000" | Adjust the colour of the "obs" column for observations marked for deletion |
| BROWSER OBS VERIFIED COLOUR | hex colour code | "#008080" | Adjust the colour of the "obs" column for verified observations |
| BROWSER VALUE LABEL | L/V/LV/VL | V | Default option for output of variable data (value and/or label). See Valuelabels for options. This option applies to "list data" and "browse" only |
| BROWSER VARIABLE LABEL | VLA / VLN / VN / VNL | VN | Default option for displaying variable name and/or label. See Variable labels for options. This option applies to "list data" and "browse" only |
| COMMANDLINE BG COLOUR | hex colour code | "#FFFFFF" | Adjust the colour of the background, e.g. #000000 is black. |
| COMMANDLINE FONT COLOUR | hex colour code | "#000000" | Adjust the colour of the font, e.g. #FFF000 is yellow. |
| COMMANDLINE FONT NAME | string | (depends on the operating system) | Name of the font used in the commandline edit. |
| COMMANDLINE FONT SIZE | | 10 | Adjust the size of the font in the commandline edit. |
| COMMANDLINE FONT STYLE | <fsBold/fsItalic/fsUnderline> | " " | Adjust the style of the font, e.g. bold, underline etc. |
| COMMANDLOG | ON/OFF | ON | When "ON" a complete list of executed commands is saved to a file in the current active directory. |
| COMMANDLOGFILE | string | commandlog.pgm | Name of the file to save the executed commands |
| COMMANDLOGLINES | | 1000 | The number of lines kept in the commandlog file. If the number of lines is exceeded, lines are dropped from the beginning |
| CONFIDENCE INTERVAL | 90 / 95 / 99 | 95 | Set the default confidence interval to be estimated by table or ctable |
| CSV DELIMITER | | , | The separator used between variables when you export to the clipboard from the browser. |
| DISPLAY COMMANDTREE WINDOW | ON/OFF | OFF | Opens/Closes the command tree window |
| DISPLAY DATASET WINDOW | ON/OFF | OFF | Opens/Closes the dataset window |
| DISPLAY HISTORY WINDOW | ON/OFF | OFF | Opens/Closes the history window |
| DISPLAY VARIABLE WINDOW | ON/OFF | OFF | Opens/Closes the variable window |
| ECHO | ON/OFF | ON | When ON, results are shown; OFF is "silent". Use SHOW ERROR if you wish to suppress errors too! |
| EDITOR FONT NAME | string | (depends on the operating system) | Name of the font used in the editor. |
| EDITOR FONT SIZE | | 10 | Adjust the size of the font. |
| EXITSAVE | YES/NO | NO | If "YES" the user is prompted to save when closing the program if a project is open and has been modified |
| INCLUDE DELETED | ON/OFF | OFF | If "ON" then observations marked for deletion are also included in calculations |
| OUTPUT BG COLOUR | hex colour code | "#FFFFFF" | Adjust the colour of the output background, e.g. #000000 is black. |
| OUTPUT CSS FILE | string | (empty) | When using HTML output it is possible to use an external CSS file. If the file name specified does not exist a file will be created with the content of the built-in CSS. |
| OUTPUT CSS INTERNAL | YES/NO | YES | If set to YES, the content of the CSS FILE is embedded into the HTML. If set to NO the CSS FILE is referenced from within the HTML output. |
| OUTPUT FONT COLOUR | hex colour code | "#000000" | Adjust the colour of the output font, e.g. #FFF000 is yellow. |
| OUTPUT FONT NAME | string | (depends on the operating system) | Name of the font used in the text output. |
| OUTPUT FONT SIZE | | 10 | Adjust the size of the font in the text output. |
| OUTPUT FONT STYLE | <fsBold/fsItalic/fsUnderline> | " " | Adjust the style of the font, e.g. bold, underline etc. |
| OUTPUT FORMAT | TEXT/HTML | TEXT | Format of the output window. HTML viewing is currently in beta |
| OUTPUT SAVE FORMAT | TEXT/HTML | TEXT | Set the default format when saving the output to file. |
| SHORT MONTH NAMES | string | (depends on the language of the operating system), e.g. Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec | The function createdate will use these values when trying to match against the short month names. This list MUST contain 12 items separated by commas |
| SHOW COMMAND | ON/OFF | ON | If "ON" then each line that is run (from command line or editor) is added to output as ".<command...>". "OFF" = no output |
| SHOW DEBUG | ON/OFF | ON | If "ON" then lines containing debug information are shown. "OFF" = no output |
| SHOW ERROR | ON/OFF | ON | If "ON" then lines containing error information are shown. "OFF" = no output |
| SHOW INFO | ON/OFF | ON | If "ON" then lines containing informational output are shown. "OFF" = no output |
| SHOW WARNING | ON/OFF | ON | If "ON" then lines containing warning information are shown. "OFF" = no output |
| STATISTICS VALUE LABEL | L/V/LV/VL | L | Default option for output of variable data (value and/or label). See Valuelabels for options. This option applies to commands not covered by "BROWSER VALUE LABEL" |
| STATISTICS VARIABLE LABEL | VLA / VLN / VN / VNL | VLA | Default option for displaying variable name and/or label. See Variable labels for options. This option applies to commands not covered by "BROWSER VARIABLE LABEL" |
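
A short sketch of changing a few of the settings listed above, following the quoting rule for ON/OFF and text values. Which values suit your own work is of course a matter of preference.

```
set "OUTPUT FORMAT";                     // no value: show the current setting
set "OUTPUT FORMAT" := "HTML";           // switch the output window to HTML
set "STATISTICS VALUE LABEL" := "LV";    // show label then value in statistical output
set "INCLUDE DELETED" := "ON";           // include observations marked for deletion
```

Placing lines like these in startup.pgm makes them the defaults for every session.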

Common options

Valuelabels

  • !v Show only the value (fallback if the value has no label)

  • !l Show only the label (default)

  • !vl Show the value then the label

  • !lv Show the label then the value

Variable Labels

  • !vn Show only the name (fallback if no variable label is assigned)

  • !vla Show only the label (default)

  • !vnl Show the name then the label

  • !vln Show the label then the name

Decimals for percentages or statistics

  • !d0 0 decimals

  • !d1 1 decimal

  • !d2 2 decimals

  • !d3 3 decimals

  • !d4 4 decimals

  • !d5 5 decimals
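
A minimal sketch of the common options applied to commands that refer to the labeling and formatting sections, assuming the variables sex and kmgrp in bromar.epx:

```
read "bromar.epx";
freq sex !vl !d1;               // value then label, percentages with 1 decimal
tab sex kmgrp !pr !v !vn !d2;   // values and variable names only, row percentages with 2 decimals
```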

Variable types

  • integer / int / i

    A variable (standard, result or global) that contains an integer value.

  • float / f

    A variable (standard, result or global) that contains a floating point value.

Note: all floating points shown on screen appear in the current national setting (locale), but input (from editor or command line) must always use "." (period) as the decimal separator. The saved data in a given project can be used in different national settings without giving problems or need for conversions.

  • string / str / s

    A variable (standard, result or global) that may contain any string

  • boolean / bool / b

    A variable (standard, result or global) that contains only true or false

  • time / t

    A variable (standard, result or global) that contains a time value.

  • date / d

    A variable (standard, result or global) that contains a date value. All new date variables created will be a DMY type, but this may change in the future.
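
A short sketch of declaring global variables with long and short type names. The variable names are hypothetical, and initialising a float with := is assumed to work the same way as the string and time initialisations shown elsewhere in this guide.

```
new global n integer := 10;
new global ratio f := 2.5;            // input always uses "." as the decimal separator
new global title string := "Table";
new global started t := now();

? title + " " + n;                    // text and integer concatenated for output
```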

Variable references

Any command that accepts more than one variable as parameters can use the following schemes for variable expansion.

  • var1-var4 (dash)

    Use the two variables given and all variables between them.

  • var* (asterisk)

    * is used as a replacement for 0 to many characters. This cannot be the first character.

  • var? (question mark)

    ? is used as a replacement for exactly 1 character. This cannot be the first character.

Note: If no variables match the expression, an error is reported

It is possible to combine "*" and "?" for more elaborate expressions, but neither can be combined with "-"

examples

// Consider the following set of variables (and in that order):
// V1, V2, V3, V4, V10, V11, V100
list data V2 - V10;      // V2 - V10 is expanded to the variables V2, V3, V4 and V10
list data V1* ;          // V1* is expanded to V1, V10, V11, V100 
list data V?  ;          // V?  is expanded to V1, V2, V3 and V4
list data V1??;          // V1?? is expanded to V100 only!

A referenced variable may also be used in the expansion. These will be evaluated before the expansion!

Referenced Variable

@{variable1}

With a referenced variable, you essentially use the content of another variable (global, result) to provide the variable name.

examples

new global gvar1 string := "sex";
read "bromar.epx";
freq sex;       // Outputs a frequency table for the variable "sex"
freq @{gvar1};  // Does the same as above, because the content of gvar1 is "sex"

This can be combined with indexing of a variable. Using some of the builtin result variables like $dataset and $variable:

new global i integer;
for i := 1 to size($variable) do
  begin
    // Here we output the name of all the variables:
    // - not using the @{..} because we want the content of the $variable result var.
    ? $variable[i];

    // Here we do a frequency table of the variable.
    // - using @{..} because "freq" needs a variable and not the content of $variable
    freq @{$variable[i]};
  end;

The Variable inside the @{..} may itself be another reference (with or without index), making it possible to combine multiple levels of references.

new global f    string := "age";
new global g[3] string := "f";  // All entries have the value "f", but that is fine for this example.
new global h[3] string := "g";  // All entries have the value "g", but that is fine for this example.

// The line below is a valid construction, which evaluates the following way
// 1: h[1] is evaluated into the string "g"
// 2: "g" is used in @{"g"}, which means - use the content of g as a variable
// 3: g[1] is evaluated into the string "f"
// 4: "f" is used in @{"f"}, which means - use the content of f as a variable
// 5: @{f} is evaluated to the variable AGE
// 6: The command freq is run on the variable AGE.

freq @{ @{h[1]}[1] };

Example of how to loop over a referenced variable:

// you wish to estimate the time for parts of an analysis and have created a number of time stamps:
new global tx t:= now(); // where x is 1 , 2, 3 etc.

// now to display these and the difference: - assume you had five of these:

new global i i;
for i:= 1 to 5 do
   begin
     ? i + " time: " + @{"t" + i};  // this works because the expression inside the braces evaluates to t1, t2, t3 etc.
   end;

// now also calculate the difference in time between the two:

new global tdif t;   // tdif is a time difference

for i:= 2 to 5 do
   begin
     tdif := (@{"t" + i} - @{"t" + (i-1)});  // e.g. tdif = t2 - t1 when i = 2
     ? i + " difference : " + tdif;
   end;

#Programming aids

These commands are not normally used in interactive mode.

runtest

runtest ["<directory path>"]

Run all .pgm files in a given directory (folder) to verify function.

This is provided for testing of correct estimation etc. If no path is given, a dialog is shown to select the working directory.

parameters

  • directory path

    a directory that contains multiple .pgm files

  • without parameters, the open file dialogue is started

run

run ["<filename.pgm>"]

Execute the commands saved in a .pgm file

parameters

  • filename.pgm

    may include a path

  • without parameters, the open file dialogue is started

Clean up - stop

close

close

Stop using a project

  • all unsaved variables and changes to existing variables and labels will be lost
  • global variables will remain in memory

cls

cls

Clear the output screen

clh

clh

Clear the history of commands

reset

reset

Resets all parameters of the program. This is almost equivalent to running:

close;
drop global !all;
cls;
clh;

Note: reset also clears all result variables!

Functions available in EpiData Analysis

In the following, takes indicates the variable type for each parameter and result indicates the type of the result of the function:

  • s: string
  • b: boolean
  • d: date
  • t: time
  • i: integer
  • f: floating point
  • n: any numeric
  • v: variable

Parameters may be variables read from fields, created variables, or any expression that evaluates to the correct type.

String functions

| function | takes | result | example |
| --- | --- | --- | --- |
| length(str) | s | i | length("Abcde") => 5 |
| pos(instr, findstr) | s, s | i | pos("Abcde", "cd") => 3<br>pos("Abcde", "z") => 0 |
| substring(str, start, len) | s, i, i | s | substring("Abcde", 2, 3) => "bcd" |
| trim(str) | s | s | trim("Abcde ") => "Abcde"<br>trim(" Abcde") => "Abcde" |
| lower(str) | s | s | lower("Abcde") => "abcde" |
| upper(str) | s | s | upper("Abcde") => "ABCDE" |
| concat(X, s1, s2, ..., sn) | s, any, ... | s | Concatenates the values s1 to sn into a string. If any of these parameters is system missing, it is replaced by the value of X.<br>concat("X", "a", v1) => "aX" if v1 is missing, otherwise "a" followed by the value of v1<br>For user-defined missing values, the actual value is added to the string. |
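
The replacement rule of concat can be illustrated with a short Python model. This is a sketch under assumptions: `MISSING` is a hypothetical stand-in for EpiData's system missing value, and the function only mirrors the behaviour described for concat, not its implementation.

```python
MISSING = None  # hypothetical stand-in for EpiData's system missing

def concat(placeholder, *values):
    """Join the values into one string, substituting placeholder for system missing."""
    return "".join(placeholder if v is MISSING else str(v) for v in values)

print(concat("X", "a", MISSING))  # prints: aX
print(concat("X", "a", 42))       # prints: a42
```

Note that the first argument is never part of the result unless at least one later argument is missing.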

Arithmetic functions (including Random numbers)

| function | takes | result | example |
| --- | --- | --- | --- |
| abs(x) | n | n | abs(-12) => 12 |
| exp(x) | n | f | exp(1) => 2.71828182845905 |
| fraction(x) | f | f | fraction(12.34) => 0.34 |
| ln(x) | n | f | ln(2.71828182845905) => 1<br>ln(0) => missing |
| log(x) | n | f | log(10) => 1<br>log(0) => missing |
| round(x, digits) | n, d, t | f | round(12.44,1) => 12.4<br>round(12.5,0) => 13 |
| sqrt(x) | n | f | sqrt(4) => 2 |
| random(x) | i | i | Random integer from 0 to x |
| sum(n1, n2, ..., nn) | n, ... | n | Sums the non-missing values n1 to nn; system missing and user-defined missing values are ignored. |
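
How sum treats missing values can be modelled in a few lines of Python. This is a sketch only: `MISSING` and the `user_missing` parameter are hypothetical names introduced for illustration, and the reference does not state what EpiData returns when every argument is missing (this model returns 0 in that case).

```python
MISSING = None  # hypothetical stand-in for EpiData's system missing

def sum_non_missing(*values, user_missing=()):
    """Add up the values, skipping system missing and user-defined missing codes."""
    return sum(v for v in values if v is not MISSING and v not in user_missing)

print(sum_non_missing(1, 2, MISSING, 4))                 # prints: 7
print(sum_non_missing(1, 2, 99, 4, user_missing=(99,)))  # prints: 7
```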

Trigonometry functions

| function | takes | result | example |
| --- | --- | --- | --- |
| tan(x) | f | f | tan(0) => 0 |
| arctan(x) | f | f | arctan(1) => pi/4 |
| cos(r) | f | f | cos(pi/2) => 6.12303176911189E-17<br>cos(pi) => -1 |
| arccos(r) | f | f | arccos(0) => pi/2 |
| sin(r) | f | f | sin(pi/2) => 1<br>sin(pi) => 6.12303176911189E-17 |
| arcsin(r) | f | f | arcsin(0) => 0 |

Date functions

| function | takes | result | example |
| --- | --- | --- | --- |
| createdate(datestr) | s | d | The form of datestr is detected automatically; if the string is ambiguous, DMY is always preferred over MDY.<br>If parts of datestr are omitted, they are filled with today's values.<br>If the string is not recognised as a date, system missing is returned.<br>createdate("31/12/2016") => 31/12/2016 |
| createdate(datestr, date-type) | s, s | d | createdate("31/12/2016", "dmy") => 31/12/2016<br>createdate("12/31/2016", "mdy") => 31/12/2016<br>createdate("2016/12/31", "ymd") => 31/12/2016 |
| createdate(datestr, fmt-string) | s, s | d | Converts a string to a date based on the format specified in fmt-string. The format options can be found in the FPC source documentation.<br>createdate("31-dec-16", "dd-mmm-yy") => 31/12/2016<br>For the "mmm" format, the abbreviated month names can be controlled using the set options. The default is based on the language of the operating system. |
| createdate(d, m, y) | i, i, i | d | createdate(31, 12, 2016) => 31/12/2016 |
| today() | - | i | Returns today's date; may be assigned to a date variable or an integer |
| day(d) | d | i | day(31/12/2004) => 31 |
| dayofweek(d) | d | i | dayofweek(31/12/2004) => 5<br>Monday=1, Sunday=7 |
| month(d) | d | i | month(31/12/2004) => 12 |
| week(d) | d | i | week(22/02/2001) => 8 |
| year(d) | d | i | year(31/12/2004) => 2004 |
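
The dayofweek and week examples can be cross-checked with Python's standard library. This is a sketch under assumptions that the reference does not state: that week() follows ISO 8601 week numbering, and that a date converts to an integer as the number of days since 30/12/1899. Both assumptions reproduce the documented values, including the integer(31/12/2016) => 42735 example listed under the conversion functions.

```python
from datetime import date

# dayofweek(31/12/2004) => 5, with Monday=1 and Sunday=7
weekday = date(2004, 12, 31).isoweekday()

# week(22/02/2001) => 8, assuming ISO 8601 week numbers
week_number = date(2001, 2, 22).isocalendar()[1]

# integer(31/12/2016) => 42735, assuming day 0 is 30/12/1899
ASSUMED_EPOCH = date(1899, 12, 30)
day_number = (date(2016, 12, 31) - ASSUMED_EPOCH).days

print(weekday, week_number, day_number)  # prints: 5 8 42735
```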

Time functions

| function | takes | result | example |
| --- | --- | --- | --- |
| createtime(timestr) | s | t | createtime("12:34:56") => 12:34:56<br>The form of timestr is detected automatically. If parts of timestr are omitted, they are filled with 0 (zero). |
| createtime(h, m, s) | i, i, i | t | createtime(12, 34, 56) => 12:34:56 |
| now() | - | f | Returns the current time. It can be assigned to a time or float variable |
| second(t) | t | i | second(12:34:56) => 56 |
| minute(t) | t | i | minute(12:34:56) => 34 |
| hour(t) | t | i | hour(12:34:56) => 12 |

Logic functions

| function | takes | result | example |
| --- | --- | --- | --- |
| b1 and b2 | b, b | b | true and true => TRUE<br>true and false => FALSE<br>false and true => FALSE<br>false and false => FALSE |
| b1 or b2 | b, b | b | true or true => TRUE<br>true or false => TRUE<br>false or true => TRUE<br>false or false => FALSE |
| b1 xor b2 | b, b | b | true xor true => FALSE<br>true xor false => TRUE<br>false xor true => TRUE<br>false xor false => FALSE |
| not b | b | b | not(false) => TRUE |

Conversion functions

| function | takes | result | example |
| --- | --- | --- | --- |
| boolean(x) | any | b | boolean(x) => TRUE, for any non-zero x<br>boolean(0) => FALSE<br>boolean("true") => TRUE, the text "true" is case-insensitive<br>boolean(x) => FALSE, for any text other than "true" |
| integer(x) | any | i | integer(1.23) => 1<br>integer(31/12/2016) => 42735<br>integer("2") => 2<br>integer("a") => .<br>Any input x that cannot be interpreted as an integer returns missing "." |
| float(x) | any | f | float(1) => 1.00<br>float("12,34") => 12.34<br>Any input x that cannot be interpreted as a float returns missing "." |
| string(x) | n | s | string(1.23) => "1.23" |
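
The rules for boolean(x) and float(x) can be restated as a small Python model. This is a sketch of the documented examples only: `MISSING`, `to_boolean` and `to_float` are hypothetical names, and behaviour for inputs not covered by the examples (for instance missing values passed to boolean) is not modelled.

```python
MISSING = None  # hypothetical stand-in for EpiData's system missing "."

def to_boolean(x):
    """Text is TRUE only if it reads 'true' (any case); numbers are TRUE if non-zero."""
    if isinstance(x, str):
        return x.lower() == "true"
    return x != 0

def to_float(x):
    """Accept a decimal comma as in float("12,34"); return MISSING if not numeric."""
    try:
        return float(str(x).replace(",", "."))
    except ValueError:
        return MISSING

print(to_boolean(5), to_boolean(0), to_boolean("TRUE"), to_boolean("yes"))
print(to_float("12,34"), to_float("a"))
```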

Identifier functions

| function | takes | result | example |
| --- | --- | --- | --- |
| exist(x) | v | b | Returns true/false depending on whether the provided identifier exists |
| idtype(x) | v | i | Returns the type of the identifier provided. This function can be used on all valid identifiers, and the integer value returned has the following meaning:<br>0: Global variable<br>1: Global vector<br>2: Regular Variable<br>3: Dataset<br>4: Valuelabel<br>5: Result Variable<br>6: Result Vector<br>7: Result Matrix<br>If idtype(x) is used with the eval function "?", the output will be in text. |
| datatype(x) | v | i | Returns the type of data stored in the variable. The integer value returned has the following meaning:<br>-1: Variable has no data type - e.g. a dataset variable.<br>0: Boolean<br>1: Integer<br>2: Auto Increment<br>3: Float<br>4: DMY Date<br>5: MDY Date<br>6: YMD Date<br>7: DMY Auto Date<br>8: MDY Auto Date<br>9: YMD Auto Date<br>10: Time<br>11: Auto Time<br>12: Uppercase String<br>13: String<br>14: Memo<br>If datatype(x) is used with the eval function "?", the output will be in text. |
| size(x) | v | i | Returns the size/length of an identifier (if applicable).<br>Global & Result variables always have size 1<br>Global vector, Result Vector, Variable & Valuelabel return the length/size/count of elements/data<br>Result Matrix is not implemented yet - it returns -1<br>Dataset returns the total number of observations (even if a select is applied). |
| label(v) | v | s | Returns the descriptive label of the identifier. This is only possible for variables and datasets. |

Test and special functions

| function | takes | result | example |
| --- | --- | --- | --- |
| lre(x,y) | n | n | lre($mean1, 1.23456789123456)<br>returns the number of digits of precision of $mean1 |
| iif(b, x, y) | b<br>n<br>n | n | iif(..., true value, false value) evaluates the boolean expression (b) inline and, based on the result, returns either the true value or the false value.<br>iif(2 = 3, "This is true", "This is false") => "This is false" |
| samevalue(x, y, z) | n, d, t<br>n, d, t<br>i | b | samevalue($mean1, 1.23456789123456, 10)<br>returns true or false indicating whether abs(x-y) < 10^-10<br>Best used for comparing floating point values. Since the internal binary representation of two seemingly similar numbers may differ, using x = y can fail. |
| samevalue(x, y) | n, d, t | b | samevalue($mean1, 1.23456789123456)<br>returns true or false indicating whether x = y<br>The same as calling samevalue(x, y, 15) |
| cwd() | - | s | Returns the current working directory |
| deleted([index]) | [i] | b | Returns true/false depending on whether the record is marked for deletion. If no index is supplied, the current record number is tested.<br>select deleted() do edit data !nomd<br>selects records marked for deletion and unmarks them |
| verified([index]) | [i] | b | Returns true/false depending on whether the record is marked as verified. If no index is supplied, the current record number is tested.<br>select verified() do edit data !nomd<br>selects records marked as verified and removes any deletion mark from them |
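
The tolerance test behind samevalue can be sketched in Python to show why it is preferred over x = y for floating point values. This is a model of the documented rule (the absolute difference is below 10 to the power of -z, with z defaulting to 15), not EpiData's own implementation.

```python
def samevalue(x, y, z=15):
    """True when x and y differ by less than 10^-z (z defaults to 15)."""
    return abs(x - y) < 10 ** (-z)

a = 0.1 + 0.2                     # stored as 0.30000000000000004 in binary floating point
print(a == 0.3)                   # prints: False
print(samevalue(a, 0.3, 10))      # prints: True
print(samevalue(1.0, 1.001, 10))  # prints: False
```

The first comparison fails only because of the binary representation, while the tolerance test treats the two values as the same.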

Operators used in EpiData Analysis

| operator | syntax | result | meaning | example |
| --- | --- | --- | --- | --- |
| + | n+n | n | addition | 1+2 => 3 |
| + | s+any<br>any+s | s | concatenation | "A"+"B" => "AB"<br>"A"+1 => "A1" |
| + | d+n | d | date addition | "30/11/2004"+31 => "31/12/2004" |
| - | n-n | n | subtraction | 2-1 => 1 |
| - | d-d | n | date subtraction | "31/12/2004"-"30/11/2004" => 31 |
| - | d-n | d | date subtraction | "31/12/2004"-31 => "30/11/2004" |
| \* | n\*n | n | multiplication | 2\*3 => 6 |
| / | n/n | n | division | 5/2 => 2.5<br>5/0 => missing |
| div | n div n | i | integer result of division | 5 div 2 => 2<br>5 div 0 => missing |
| ^ | n^n | f | exponentiation | 5^2 => 25<br>4^0.5 => 2 |
| ( ) | | | group expressions | (5\*(2+4))/2 => 15<br>5\*2+4/2 == (5\*2)+(4/2) => 12 |
| = | n = n | b | equal | 1 = 2 => FALSE |
| < | n < n | b | less than | 1<2 => TRUE |
| > | n > n | b | greater than | 1>2 => FALSE |
| <= | n <= n | b | less than or equal | 1<=2 => TRUE<br>2<=2 => TRUE |
| >= | n >= n | b | greater than or equal | 1>=2 => FALSE<br>2>=2 => TRUE |
| <> | n <> n | b | not equal to | 1<>2 => TRUE<br>1<>1 => FALSE |
| $ | $resultvar | | result value | ? $count => 4027 |
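
The division rules and the grouping examples can be checked with a short Python model. This is a sketch only: `MISSING`, `divide` and `int_divide` are hypothetical names, and the behaviour of div for negative operands is not described in the reference, so only non-negative operands are shown.

```python
MISSING = None  # hypothetical stand-in for EpiData's system missing

def divide(a, b):
    """Model of a / b: division by zero yields missing instead of an error."""
    return MISSING if b == 0 else a / b

def int_divide(a, b):
    """Model of a div b for non-negative operands: integer part of the quotient."""
    return MISSING if b == 0 else a // b

print(divide(5, 2), divide(5, 0))          # prints: 2.5 None
print(int_divide(5, 2), int_divide(5, 0))  # prints: 2 None
print((5 * (2 + 4)) / 2, 5 * 2 + 4 / 2)    # prints: 15.0 12.0
```

The last line shows the same precedence as in the table: multiplication and division are evaluated before addition unless parentheses group the terms differently.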

Startup options for EpiData Analysis

The use of startup options depends on the operating system. You may be able to create a desktop shortcut that includes these or start analysis from the command line.

epidataanalysis [options]

options

  • -h or --help

    Show this help and exit.

  • -v or --version

    Show version info and exit.

  • -i or --inifile [FILE]

    Uses [FILE] as startup program. If no location is specified startup.pgm is used.

examples

With Linux:

./epidataanalysis -i /path/to/startup.pgm