<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
                   "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<!-- lifted from troff+man by doclifter -->
<refentry id='mlp1'>
<!--  @(#)mlp.1 2001/04/02 NIST -->
<!--  I Image Group -->
<!--  G. T. Candela, Craig I. Watson &amp; C. L. Wilson -->

<refmeta>
<refentrytitle>MLP</refentrytitle>
<manvolnum>1A</manvolnum>
<refmiscinfo class='date'>02 April 2001</refmiscinfo>
<refmiscinfo class='source'>NIST</refmiscinfo>
<refmiscinfo class='manual'>NFIS Reference Manual</refmiscinfo>
</refmeta>
<refnamediv id='name'>
<refname>mlp</refname>
<refpurpose>Performs training and testing runs using a 3-layer feed-forward linear perceptron neural network.</refpurpose>
</refnamediv>
<!-- body begins here -->
<refsynopsisdiv id='synopsis'>
<cmdsynopsis>
  <command>mlp</command>    
    <arg choice='opt'>-c </arg>
    <arg choice='opt'><replaceable>specfile</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>


<refsect1 id='description'><title>DESCRIPTION</title>
<para><emphasis remap='B'>Mlp</emphasis>
trains a 3-layer feed-forward linear perceptron
using novel methods of machine learning that help control the
learning dynamics of the network. As a result, the derived minima
are superior, the decision surfaces of the trained network are
well-formed, the information content of confidence values is increased,
and generalization is enhanced.  The theory behind the machine learning
techniques used in this program is discussed in the following reference:</para>

<para>[C. L. Wilson, J. L. Blue, O. M. Omidvar, "The Effect of Training
Dynamics on Neural Network Performance," NIST Internal Report 5696,
August 1995.]</para>

<para>Machine learning is controlled through a batch-oriented iterative
process of training the MLP on a set of prototype feature vectors,
and then evaluating the progress made by running the MLP (in its
current state) on a separate set of testing feature vectors. Training on
the first set of patterns then resumes for a predetermined number
of passes through the training data, and then the MLP is tested again
on the evaluation set. This process of training and then testing
continues until the MLP has been determined to have satisfactorily
converged.</para>

<para>The MLP neural network is suitable for use as a classifier or as a
function-approximator. The network has an input layer, a hidden layer,
and an output layer, each layer comprising a set of nodes. The 
input nodes are feed-forwardly connected to the hidden nodes, and
the hidden nodes to the output nodes, by connections whose weights
(strengths) are trainable. The activation function used for the
hidden nodes can be chosen to be sinusoid, sigmoid (logistic), or
linear, as can the activation function for the output nodes. Training
(optimization) of the weights is done using either a Scaled Conjugate
Gradient (SCG) algorithm [1], or by starting out with SCG and 
then switching to a Limited Memory Broyden Fletcher Goldfarb
Shanno (LBFGS) algorithm [2]. Boltzmann pruning [3], i.e. dynamic
removal of connections, can be performed during training if desired.
Prior weights can be attached to the patterns (feature vectors) in
various ways.</para>

<para>[1] J. L. Blue and P. J. Grother, "Training Feed Forward Networks Using
Conjugate Gradients," NIST Internal Report 4776, February 1992, and
in Conference on Character Recognition and Digitizer Technologies,
Vol. 1661, pp.  179-190, SPIE, San Jose, February 1992.</para>

<para>[2] D. Liu and J. Nocedal, "On the Limited Memory BFGS Method for
Large Scale Optimization," <emphasis remap='I'>Mathematical Programming B</emphasis>, Vol. 45,
pp. 503-528, 1989.</para>

<para>[3] O. M. Omidvar and C. L. Wilson, "Information Content in Neural Net
Optimization," NIST Internal Report 4766, February 1992, and in <emphasis remap='I'>Journal
of Connection Science</emphasis>, 6:91-103, 1993.</para>

<para><emphasis remap='B'>Training and Testing Runs</emphasis></para>

<para>When mlp is invoked, it performs a sequence of runs. Each run does
either training or testing:</para>

<para><emphasis remap='B'>training run:</emphasis>
A set of patterns is used to train (optimize) the weights
of the network. Each pattern consists of a feature vector, along with
either a class or a target vector. A feature vector is a tuple of
floating-point numbers, which typically has been extracted from some
natural object such as a handwritten character. A class denotes the actual 
class to which the object belongs, for example the character which a
handwritten mark is an instance of. The network can be trained to become
a classifier: it trains using a set of feature vectors extracted from
objects of known classes.  Or, more generally, the network can be
trained to learn, again from example input-output pairs, a function
whose output is a vector of floating-point numbers, rather than a class;
if this is done, the network is a sort of interpolator or
function-fitter. A training run finishes by writing the final values of
the network weights as a file. It also produces a summary file showing
various information about the run, and optionally produces a longer
file that shows the results the final (trained) network produced for
each individual pattern.</para>  

<para><emphasis remap='B'>testing run:</emphasis>
A set of patterns is sent through a network, after the
network weights are read from a file. The output values, i.e. the
hypothetical classes (for a classifier network) or the produced output
vectors (for a fitter network), are compared with target classes or
vectors, and the resulting error rate is computed. The program can
produce a table showing the correct classification rate as a function
of the rejection rate.</para>

</refsect1>

<refsect1 id='options'><title>OPTIONS</title>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='I'>[-c]</emphasis></term>
  <listitem>
<para>Only do error checking on the specfile parameters and print any
warnings or errors that occur in the specfile format.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='I'>[specfile]</emphasis></term>
  <listitem>
<para>Specfile to be used by mlp. The default is a specfile
named "spec" located in the current working directory.</para>

<para>This is a file produced by the user, which sets the parameters
(henceforth "parms") of the run(s) that mlp is to perform. It consists
of one or more blocks, each of which sets the parms for one run. Each
block is separated from the next one by the word "newrun" or "NEWRUN".
Parms are set using name-value pairs, with the name and value separated
by non-newline white space characters (blanks or tabs). Each name-value
pair is separated from the next pair by newline(s) or semicolon(s).
Since each parm value is labeled by its parm name, the name-value
pairs can occur in any order. Comments are allowed; they are delimited
the same way as in C language programs, with /* and */. Extraneous
white space characters are ignored.</para>
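<para>As an illustration of the syntax just described, the following sketch
shows a two-block specfile fragment. The parm names are those described
below; the values are hypothetical, and the fragment does not set every
parm that a real run needs.</para>

<programlisting>/* Comments are delimited as in C. */
train_or_test   train
/* A semicolon can separate name-value pairs on one line. */
npats 2000; niter_max 50
/* A switch parm accepts a name or its code number: 0 is mse. */
errfunc         0

newrun

/* Second block: the parms for the second run. */
train_or_test   test
npats           500
</programlisting>

<para>Running mlp with the -c option on such a file only checks it, and reports
any errors or warnings (for instance, about necessary parms that this
fragment leaves unset).</para>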

<para>When mlp is run, it first scans the entire specfile to find and report
any fatal errors (e.g. a necessary parm left unset, or an illegal parm
name or value) and any conditions that, although not fatal, merit a
warning (e.g. a superfluous parm that has been set). Mlp writes any
applicable warning or error messages; then, if there are no errors in
the specfile, it starts to perform the first run. Warnings do not
prevent mlp from starting to run. Mlp checks the entire specfile before
starting even the first run so that a multi-run specfile cannot fail
many hours or days after it was started because of an error in a block
far into the file: such errors are detected up front, and the user must
fix them before mlp will get past its checking phase. To make mlp only
check the specfile without running it, use the -c option.</para>

<para>The following listing describes all the parms that can be set in a
specfile. There are four types of parms: string (value is a filename),
integer, floating-point, and switch (value must be one of a set of
defined names, or may be specified as a code number). A block of the
specfile, which sets the parms for one run, can often leave several
parms unset, either because the parm is unneeded
(e.g., a training "stopping condition" when the run is a test run;
or, <emphasis remap='B'>temperature</emphasis> when boltzmann is <emphasis remap='I'>no_prune</emphasis>), or because it is an
architecture parm (<emphasis remap='B'>purpose, ninps, nhids, nouts, acfunc_hids, or
acfunc_outs</emphasis>), whose value will be read from <emphasis remap='B'>wts_infile</emphasis>. The
descriptions below indicate which of the parms are needed only for
training runs (in particular, those described as stopping conditions).
Architecture parms should be set in a specfile block only if its run is
to be a training run that generates random initial network weights: a
training run that reads initial weights from a file (typically, final
weights produced by a previous training session), or a test run (must
read the network weights from a file), does not need to set any of the
architecture parms in its specfile block, because their values are
stored in the weights file that it will read. (Architecture parms
are ones whose values it would not make sense to change between
training runs of a single network that together comprise a training
"meta-run", nor between a training run for a network and a test run
of the finished network.) Setting unneeded parms in a specfile block
will result in warning messages when mlp is run, but not fatal
errors; the unneeded values will be ignored.</para>
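<para>The following sketch illustrates this rule under stated assumptions: the
first block trains from random initial weights and therefore sets the
architecture parms, while the second block tests the resulting network
and leaves them unset. All filenames and numeric values are hypothetical,
and other parms a real run needs (for example
<emphasis remap='B'>patsfile_ascii_or_binary</emphasis> and the stopping conditions) are omitted.</para>

<programlisting>/* Run 1: training from random initial weights. */
train_or_test    train
purpose          classifier
ninps            128
nhids            64
nouts            10
acfunc_hids      sinusoid
acfunc_outs      sinusoid
seed             12347
regfac           1.0
patterns_infile  train.pat
npats            2000
short_outfile    train.short
wts_outfile      trained.wts

newrun

/* Run 2: testing. The architecture parms are read from wts_infile. */
train_or_test    test
wts_infile       trained.wts
regfac           1.0
patterns_infile  test.pat
npats            500
short_outfile    test.short
</programlisting>

<para>Note that <emphasis remap='B'>regfac</emphasis> appears in both blocks, since it must be set
even for test runs.</para>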

<para>If a parm-name/parm-value pair occurring in a specfile has just its
value deleted, i.e. leaving just a parm name, then the name is ignored
by mlp; this is a way to temporarily unset a parm while leaving its
name visible for possible future use.</para>
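<para>For instance, in the following hypothetical fragment the value of
<emphasis remap='B'>long_outfile</emphasis> has been deleted, so that parm is treated as unset,
while <emphasis remap='B'>short_outfile</emphasis> is still set (the filename is illustrative only):</para>

<programlisting>/* Value deleted: the bare name is ignored. */
long_outfile
short_outfile   run1.short
</programlisting>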

<para><emphasis remap='B'>String Parms (Filename)</emphasis></para>

  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>short_outfile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This file will contain summary information about the run, including a
history of the training process if the run is a training run. The set of information
to be written is controlled, to some extent, by the switch parms
<emphasis remap='B'>do_confuse</emphasis> and <emphasis remap='B'>do_cvr</emphasis>.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>long_outfile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This optionally produced file will have two lines of header information
followed by a line for each pattern. The line will show: the sequence
number of the pattern; the correct class of the pattern (as a number
in the range 1 through <emphasis remap='B'>nouts</emphasis>); whether the hypothetical class the
network produced for this pattern was right (R) or wrong (W); the
hypothetical class (number); and the <emphasis remap='B'>nouts</emphasis> output-node activations the
network produced for the pattern. (See the switch parm
<emphasis remap='B'>show_acs_times_1000</emphasis> below, which controls the formatting of the
activations.) In a testing run, mlp produces this file for the result
of running the patterns through the network whose weights are read from
<emphasis remap='B'>wts_infile</emphasis>; in a training run, mlp produces this file only for the
final network weights resulting from the training session. This is often
a large file; to save disk space by not producing it, just leave the
parm unset.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>patterns_infile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This file contains patterns upon which mlp is to train or test a network.
A pattern is either a feature-vector and an associated class, or a
feature-vector and an associated target-vector. The file must be in
one of the two supported patterns-file formats, i.e. ASCII and
(FORTRAN-style) binary; the switch parm <emphasis remap='B'>patsfile_ascii_or_binary</emphasis>
must be set to tell mlp which of these formats is being used.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>wts_infile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This optional file contains a set of network weights. Mlp can read such
a file at the start of a training run - e.g., final weights from a
preceding training run, if one is training a network using a sequence of
runs with different parameter settings (e.g., decreasing values of
<emphasis remap='B'>regfac</emphasis>) - or, in a testing run, it can read the final weights
resulting from a training run. This parm should be left unset if
random initial weights are to be generated for a training run (see the
integer parm <emphasis remap='B'>seed</emphasis>).</para>
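<para>A sequence of training runs with decreasing <emphasis remap='B'>regfac</emphasis> might be chained
as in the sketch below, where each run after the first reads the weights
written by the run before it. The filenames and <emphasis remap='B'>regfac</emphasis> values are
hypothetical, and the other parms each block needs are omitted.</para>

<programlisting>/* Stage 1: wts_infile is unset, so random weights are generated. */
train_or_test   train
seed            12347
regfac          2.0
wts_outfile     stage1.wts

newrun

/* Stage 2: continue from the stage 1 weights with a smaller regfac. */
train_or_test   train
wts_infile      stage1.wts
regfac          1.0
wts_outfile     stage2.wts
</programlisting>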

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>wts_outfile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This file is produced only for a training run; it contains the final
network weights resulting from the run.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>lcn_scn_infile</emphasis></term>
  <listitem>
<!-- .br -->
<para>Each line of this optional file should consist of a long class-name
(as shown at the top of <emphasis remap='B'>patterns_infile</emphasis>) and a corresponding short
class-name (1 or 2 characters), with the two names separated by white
space; the lines can be in any order. This file is required only for
a run that requires short class-names, i.e. only if <emphasis remap='B'>purpose</emphasis> is
<emphasis remap='I'>classifier</emphasis> and (1) <emphasis remap='B'>priors</emphasis> is <emphasis remap='I'>class</emphasis> or <emphasis remap='I'>both</emphasis>
(these settings of <emphasis remap='B'>priors</emphasis> require class-weights to be read
from <emphasis remap='B'>class_wts_infile</emphasis>, and that type of file can be read only
if the short class-names are known) or (2) <emphasis remap='B'>do_confuse</emphasis> is
<emphasis remap='I'>true</emphasis> (proper output of confusion matrices requires the short
class-names, which are used as labels).</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>class_wts_infile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This optional file contains class-weights, i.e. a "prior weight" for
each class. (See switch parm <emphasis remap='B'>priors</emphasis> to find out how mlp can use these
weights.) Each line should consist of a short class-name (as shown in
<emphasis remap='B'>lcn_scn_infile</emphasis>) and the weight for the class, separated by white
space; the order of the lines does not matter.</para>
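<para>As a sketch of the two file layouts, assume a hypothetical three-class
problem whose long class-names are arch, loop, and whorl. The
<emphasis remap='B'>lcn_scn_infile</emphasis> could then contain:</para>

<programlisting>arch    A
loop    L
whorl   W
</programlisting>

<para>and the matching <emphasis remap='B'>class_wts_infile</emphasis>, keyed by the short class-names,
could contain (the weights are made-up values):</para>

<programlisting>A   0.05
L   0.65
W   0.30
</programlisting>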

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>pattern_wts_infile</emphasis></term>
  <listitem>
<!-- .br -->
<para>This optional file contains pattern-weights, i.e. a "prior weight" for
each pattern. (See switch parm <emphasis remap='B'>priors</emphasis> to find out how mlp can use
these weights.) The file should be just a sequence of ASCII floating-point
numbers separated from each other by white space, with the numbers in
the same order as the patterns they are to be associated with.</para>

  </listitem>
  </varlistentry>
</variablelist>

<para><emphasis remap='B'>Integer Parms</emphasis></para>

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>npats</emphasis></term>
  <listitem>
<!-- .br -->
<para>Number of (first) patterns from <emphasis remap='B'>patterns_infile</emphasis> to use.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>ninps, nhids, nouts</emphasis></term>
  <listitem>
<!-- .br -->
<para>Specify the number of input, hidden, and output nodes in the network.
If <emphasis remap='B'>ninps</emphasis> is smaller than the number of components in the
feature-vectors of the patterns, then the first <emphasis remap='B'>ninps</emphasis> components of
each feature-vector are used. If the network is a <emphasis remap='I'>classifier</emphasis>
(see <emphasis remap='B'>purpose</emphasis>), then <emphasis remap='B'>nouts</emphasis> is the number of classes, since there is
one output node for each class. If the network is a <emphasis remap='I'>fitter</emphasis>, then
<emphasis remap='B'>ninps</emphasis> and <emphasis remap='B'>nouts</emphasis> are the dimensionalities of the input and
output real vector spaces. These are architecture parms, so they should
be left unset for a run that is to read a network weights file.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>seed</emphasis></term>
  <listitem>
<!-- .br -->
<para>Seed for the UNI random number generator, used if initial weights for a
training run are to be randomly generated. Its value must be positive. Random weights
are generated only if <emphasis remap='B'>wts_infile</emphasis> is not set. (Of course, the
<emphasis remap='B'>seed</emphasis> value can be reused to generate identical initial weights in
different training runs; or, it can be varied in order to do several
training runs using the same values for the other parameters. It is
often advisable to try several seeds, since any particular <emphasis remap='B'>seed</emphasis>
may produce atypically bad results (training may fail). However, the
effect of varying the <emphasis remap='B'>seed</emphasis> is minimal if Boltzmann pruning is used.)</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>niter_max</emphasis></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>A stopping condition:</emphasis> maximum number of iterations a training run
will be allowed to use.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>nfreq</emphasis></term>
  <listitem>
<!-- .br -->
<para>At every nfreq'th iteration during a training run, the <emphasis remap='B'>errdel</emphasis> and
<emphasis remap='B'>nokdel</emphasis> stopping conditions are checked and a pair of status lines
is written to the standard error output and to <emphasis remap='B'>short_outfile</emphasis>.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>nokdel</emphasis></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>A stopping condition:</emphasis> stop if the number of iterations used so far is
at least kmin and, for each of the most recent NNOT (defined in
<emphasis remap='I'>src/lib/mlp/optchk.c</emphasis>) sequences of <emphasis remap='B'>nfreq</emphasis> iterations, the number
right and the number right minus number wrong have both failed to increase
by at least <emphasis remap='B'>nokdel</emphasis> during the sequence.</para>


  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>lbfgs_mem</emphasis></term>
  <listitem>
<!-- .br -->
<para>This value is used for the m argument of the LBFGS optimizer (if that
optimizer is used, i.e. only if there is no Boltzmann pruning). This is
the number of corrections used in the BFGS update. The value must be
positive. Values less than 3 are not recommended, and large values result
in excessive computing time as well as increased memory usage; values in
the range 3 through 7 are recommended.</para>

  </listitem>
  </varlistentry>
</variablelist>

<para><emphasis remap='B'>Floating-Point Parms</emphasis></para>

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>regfac</emphasis></term>
  <listitem>
<!-- .br -->
<para>Regularization factor. The error value that a training run attempts
to minimize contains a term consisting of <emphasis remap='B'>regfac</emphasis> times half the average
of the squares of the network weights. (The use of a regularization
factor often improves the generalization performance of a neural network,
by keeping the size of the weights under control.) This parm must always
be set, even for test runs (since they also compute the error value,
which always uses <emphasis remap='B'>regfac</emphasis>); however, its effect can be nullified by
just setting it to 0.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>alpha</emphasis></term>
  <listitem>
<!-- .br -->
<para>A parm required by the <emphasis remap='B'>type_1</emphasis> error function.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>temperature</emphasis></term>
  <listitem>
<!-- .br -->
<para>For Boltzmann pruning: see the switch parm <emphasis remap='B'>boltzmann</emphasis>. A higher
temperature causes more severe pruning.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>egoal</emphasis></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>A stopping condition:</emphasis> stop when error becomes less than or
equal to <emphasis remap='B'>egoal</emphasis>.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>gwgoal</emphasis></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>A stopping condition:</emphasis> stop when | <emphasis remap='B'>g</emphasis> | / | <emphasis remap='B'>w</emphasis> | becomes
less than or equal to <emphasis remap='B'>gwgoal</emphasis>, where <emphasis remap='B'>w</emphasis> is the vector of
network weights and <emphasis remap='B'>g</emphasis> is the gradient vector of the error with
respect to <emphasis remap='B'>w</emphasis>.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>errdel</emphasis></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>A stopping condition:</emphasis> stop if the number of iterations used so far
is at least kmin and the error has not decreased by at least a
factor of <emphasis remap='B'>errdel</emphasis> over the most recent block of <emphasis remap='B'>nfreq</emphasis> iterations.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>oklvl</emphasis></term>
  <listitem>
<!-- .br -->
<para>The value of the highest network output activation produced when the
network is run on a pattern (the position of this highest activation
among the output nodes is the hypothetical class) can be thought of as
a measure of confidence. This confidence value is compared with the
threshold <emphasis remap='B'>oklvl</emphasis>, in order to decide whether to classify the
pattern as belonging to the hypothetical class, or to reject it,
i.e. to consider its class to be unknown because of insufficient
confidence that the hypothetical class is the correct class. The
numbers and percentages of the patterns that <command>mlp</command> reports as
<emphasis remap='I'>correct</emphasis>, <emphasis remap='I'>wrong</emphasis>, and <emphasis remap='I'>unknown</emphasis>, are affected by
<emphasis remap='B'>oklvl</emphasis>: a high value of <emphasis remap='B'>oklvl</emphasis> generally increases the number
of unknowns (a bad thing) but also increases the percentage of the
accepted patterns that are classified correctly (a good thing). If
no rejection is desired, set <emphasis remap='B'>oklvl</emphasis> to 0. (<emphasis remap='I'>Mlp</emphasis> uses the single
<emphasis remap='B'>oklvl</emphasis> value specified for a run; but if the switch parm <emphasis remap='B'>do_cvr</emphasis>
is set to <emphasis remap='I'>true</emphasis>, then <command>mlp</command> also makes a full
<emphasis remap='I'>correct vs. rejected</emphasis> table for the network (for the
finished network if a training run). This table shows the (number
correct) / (number accepted) and (number unknown) / (total number)
percentages for each of several standard <emphasis remap='B'>oklvl</emphasis> values.)</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>trgoff</emphasis></term>
  <listitem>
<!-- .br -->
<para>This number sets how mildly the target values for network output
activations vary between their "low" and "high" values. If <emphasis remap='B'>trgoff</emphasis> is
0 (least mild, i.e. most extreme, effect), then the low target value
is 0 and the high, 1; if <emphasis remap='B'>trgoff</emphasis> is 1 (most mild effect), then low
and high targets are both (1 / <emphasis remap='B'>nouts</emphasis>); if <emphasis remap='B'>trgoff</emphasis> has an
intermediate value between 0 and 1, then the low and high targets
have intermediately mild values accordingly.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>scg_earlystop_pct</emphasis></term>
  <listitem>
<!-- .br -->
<para>This is a percentage that controls how soon a hybrid SCG/LBFGS training
run (hybrid training can be used only if there is to be no
Boltzmann pruning) switches from SCG to LBFGS. The switch is done the
first time a check (checking every nfreq'th iteration) of the network
results finds that every class-subset of the patterns has at least
<emphasis remap='B'>scg_earlystop_pct</emphasis> percent of its patterns classified correctly.
A suggested value for this parm is 60.0.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>lbfgs_gtol</emphasis></term>
  <listitem>
<!-- .br -->
<para>This value is used for the gtol argument of the LBFGS optimizer. It
controls the accuracy of the line search routine mcsrch. If the function
and gradient evaluations are inexpensive with respect to the cost of
the iteration (which is sometimes the case when solving very large
problems) it may be advantageous to set <emphasis remap='B'>lbfgs_gtol</emphasis> to a small
value. A typical small value is 0.1. <emphasis remap='B'>Lbfgs_gtol</emphasis> must be greater
than 1.e-04.</para>

  </listitem>
  </varlistentry>
</variablelist>
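<para>The stopping conditions described above (<emphasis remap='B'>niter_max</emphasis>, <emphasis remap='B'>nokdel</emphasis>,
<emphasis remap='B'>egoal</emphasis>, <emphasis remap='B'>gwgoal</emphasis>, and <emphasis remap='B'>errdel</emphasis>), together with the checking
interval <emphasis remap='B'>nfreq</emphasis>, might be set in a training block as in the following
sketch. All values are hypothetical placeholders rather than
recommendations.</para>

<programlisting>/* Stopping conditions for a training run (illustrative values). */
niter_max   500
/* errdel and nokdel are checked at every nfreq'th iteration. */
nfreq       10
nokdel      2
egoal       0.0
gwgoal      0.0
errdel      0.98
</programlisting>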

<para><emphasis remap='B'>Switch Parms</emphasis></para>

<para>Each of these parms has a small set of allowed values; the value is
specified as a string, or less verbosely, as a code number (shown
in parentheses after string form):</para>

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>train_or_test</emphasis></term>
  <listitem>
<!-- .br -->
  <!-- .RS -->
<para>Whether the run is to train or to test a network. The allowed values are:</para>
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>train </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Train a network, i.e. optimize its weights in the sense of minimizing
an error function, using a training set of patterns.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>test </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>Test a network, i.e. read in its weights and other parms from a file,
run it on a test set of patterns, and measure the quality of the
resulting performance.</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>purpose</emphasis></term>
  <listitem>
<!-- .br -->
<para>Which of two possible kinds of engine the network is to be. This is
an architecture parm, so it should be left unset for a run that is to
read a network weights file. The allowed values are:</para>

  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>classifier </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>The network is to be trained to map any feature vector to one of a
small number of classes. It is to be trained using a set of
feature vectors and their associated correct classes.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>fitter </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>The network is to be trained to approximate an unknown function that
maps any input real vector to an output real vector. It is to be
trained using a set of input-vector/output-vector pairs of the
function. <emphasis remap='B'>NOTE: this is not currently supported.</emphasis></para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>errfunc</emphasis></term>
  <listitem>
<!-- .br -->
<para>Type of error function to use (always with the addition of a
regularization term, consisting of <emphasis remap='B'>regfac</emphasis> times half the average
of the squares of the network weights).</para>

  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>mse </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Mean-squared-error between output activations and target values, or its
equivalent computed using classes instead of target vectors. This is the
recommended error function.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>type_1 </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>Type 1 error function; requires floating-point parm <emphasis remap='B'>alpha</emphasis> be set.
(Not recommended.)</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>pos_sum </emphasis><literal>2</literal></term>
  <listitem>
<!-- .br -->
<para>Positive sum error function. (Not recommended.)</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>boltzmann</emphasis></term>
  <listitem>
<!-- .br -->
<para>Controls whether Boltzmann pruning of network weights is to be done
and, if so, the type of threshold to use:</para>

  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>no_prune </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Do no Boltzmann pruning.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>abs_prune </emphasis><literal>2</literal></term>
  <listitem>
<!-- .br -->
<para>Do Boltzmann pruning using threshold exp(- |<emphasis remap='B'>w</emphasis>| / <emphasis remap='B'>T</emphasis>), where
<emphasis remap='B'>w</emphasis> is a network weight being considered for possible pruning and
<emphasis remap='B'>T</emphasis> is the Boltzmann <emphasis remap='B'>temperature</emphasis>.</para>

  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>square_prune </emphasis><literal>3</literal></term>
  <listitem>
<!-- .br -->
<para>Do Boltzmann pruning using threshold exp(- <emphasis remap='B'>w^2</emphasis> / <emphasis remap='B'>T</emphasis>), where
<emphasis remap='B'>w</emphasis> and <emphasis remap='B'>T</emphasis> are as above.</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>acfunc_hids, acfunc_outs</emphasis></term>
  <listitem>
<!-- .br -->
<para>The types of <emphasis remap='I'>activation functions</emphasis> to be used on the hidden nodes and
on the output nodes (separately settable for each layer). These are
architecture parms, so they should be left unset for a run that is to
read a network weights file. The allowed values are:</para>

  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>sinusoid </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>f(x) = 0.5 * (1 + sin(0.5 * x))</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>sigmoid </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>f(x) = 1 / (1 + exp(-x)) (Also called logistic function.)</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>linear </emphasis><literal>2</literal></term>
  <listitem>
<!-- .br -->
<para>f(x) = 0.25 * x</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->
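
<para>The three activation functions listed above can be written out directly.
The following Python sketch is for illustration only; the function names and
the lookup table keyed by code number are hypothetical conveniences, not part
of mlp.</para>

<programlisting>import math

def sinusoid(x: float) -> float:
    # f(x) = 0.5 * (1 + sin(0.5 * x))
    return 0.5 * (1.0 + math.sin(0.5 * x))

def sigmoid(x: float) -> float:
    # f(x) = 1 / (1 + exp(-x)), the logistic function
    return 1.0 / (1.0 + math.exp(-x))

def linear(x: float) -> float:
    # f(x) = 0.25 * x
    return 0.25 * x

# Hypothetical table mapping the code numbers above to the functions.
ACTIVATIONS = {0: sinusoid, 1: sigmoid, 2: linear}

for code, f in ACTIVATIONS.items():
    print(code, f.__name__, f(0.0), f(1.0))
</programlisting>

<para>All three functions have slope 0.25 at x = 0, and both sinusoid and
sigmoid take the value 0.5 there.</para>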

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>priors</emphasis></term>
  <listitem>
<!-- .br -->
<para>What kind of prior weighting to use to set the final pattern-weights,
which control the relative amounts of impact the various patterns have
when doing the computations. These final pattern-weights remain fixed
for the duration of a training run, but of course they can be changed
between training runs.</para>

  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>allsame </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Set each final pattern-weight to (1 / <emphasis remap='B'>npats</emphasis>). (The simplest thing
to do; appropriate if the set of patterns has a natural distribution.)</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>class </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>Set each final pattern-weight to the class-weight of the class of the
pattern concerned, divided by <emphasis remap='B'>npats</emphasis>. The class-weights are derived
by dividing the given-class-weights, read from the <emphasis remap='B'>class_wts_infile</emphasis>,
by the derived-class-weights, computed for the current data set, and
then normalizing them to sum to 1.0.  (Appropriate if the frequencies of
the several classes in the set of patterns are not approximately equal
to the natural frequencies (prior probabilities); this setting compensates
for that situation.)</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>pattern </emphasis><literal>2</literal></term>
  <listitem>
<!-- .br -->
<para>Set the final pattern-weights to values read from <emphasis remap='B'>pattern_wts_infile</emphasis>
divided by <emphasis remap='B'>npats</emphasis>. (Appropriate if none of the other settings of
priors gives satisfactory weights, since the values in the file can be
computed in whatever way one desires, or if one wants to dynamically change
these weights between sessions of training.)</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>both </emphasis><literal>3</literal></term>
  <listitem>
<!-- .br -->
<para>Set each final pattern-weight to the class-weight of the class of the
pattern concerned, times the provided pattern-weight, and divided by
<emphasis remap='B'>npats</emphasis>; compute the class-weights as previously described in
<emphasis remap='B'>class priors</emphasis> and read pattern-weights from file
<emphasis remap='B'>pattern_wts_infile</emphasis>. (Appropriate if one wants to both adjust
for unnatural frequencies, and dynamically change the pattern weights.)</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->
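
<para>The four settings of <emphasis remap='B'>priors</emphasis> above can be sketched in Python
as follows. This is a minimal illustration, not the mlp implementation: the
function names are hypothetical, the given-class-weights and pattern-weights
are passed in directly rather than read from <emphasis remap='B'>class_wts_infile</emphasis> and
<emphasis remap='B'>pattern_wts_infile</emphasis>, and the derived-class-weights are assumed to be
the observed fraction of patterns in each class, which this page does not
state explicitly.</para>

<programlisting>from collections import Counter

def class_weights(given, classes):
    """given: class label to given-class-weight; classes: label per pattern."""
    npats = len(classes)
    counts = Counter(classes)
    # Assumption: derived-class-weight is the observed class frequency.
    derived = {c: counts[c] / npats for c in counts}
    raw = {c: given[c] / derived[c] for c in derived}
    total = sum(raw.values())
    # Normalize so that the class-weights sum to 1.0.
    return {c: r / total for c, r in raw.items()}

def final_pattern_weights(priors, classes, given=None, pattern_wts=None):
    npats = len(classes)
    if priors == "allsame":
        return [1.0 / npats] * npats
    if priors == "class":
        cw = class_weights(given, classes)
        return [cw[c] / npats for c in classes]
    if priors == "pattern":
        return [p / npats for p in pattern_wts]
    if priors == "both":
        cw = class_weights(given, classes)
        return [cw[c] * p / npats for c, p in zip(classes, pattern_wts)]
    raise ValueError("unknown priors setting: " + str(priors))

# Hypothetical data: three patterns of class A, one of class B.
classes = ["A", "A", "A", "B"]
given = {"A": 0.5, "B": 0.5}
print(class_weights(given, classes))
print(final_pattern_weights("class", classes, given=given))
</programlisting>

<para>In this made-up data set the given-class-weights are equal but class A
is three times as frequent as class B, so under the stated assumption the
class-weights work out to 0.25 for A and 0.75 for B, and each pattern of the
under-represented class B receives the larger final pattern-weight.</para>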

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>patsfile_ascii_or_binary</emphasis></term>
  <listitem>
<!-- .br -->
<para>Tells mlp which of two supported formats to expect for the patterns file
that it will read at the start of a run.  (If much compute time is being
spent reading ascii patsfiles, it may be worthwhile to convert them to
binary format: binary-format files are read faster and are considerably
smaller.)</para>
  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>ascii </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>patterns_infile</emphasis> is in ascii format.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>binary </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para><emphasis remap='B'>patterns_infile</emphasis> is in binary (FORTRAN-style binary) format.</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>do_confuse</emphasis></term>
  <listitem>
<!-- .br -->
  <!-- .RS -->
<para>Whether to compute the confusion matrices and miscellaneous information.
The allowed values are:</para>
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>true </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>Compute the confusion matrices and miscellaneous information and include
them in <emphasis remap='B'>short_outfile</emphasis>.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>false </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Do not compute the confusion matrices and miscellaneous information.</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>show_acs_times_1000</emphasis></term>
  <listitem>
<!-- .br -->
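<para>Controls how the network output activations are recorded in
<emphasis remap='B'>long_outfile</emphasis>. This parm need be set only if the run is to produce
a <emphasis remap='B'>long_outfile</emphasis>.</para>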
  <!-- .RS -->
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>true </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>Before recording the network output activations in <emphasis remap='B'>long_outfile</emphasis>,
multiply them by 1000 and round to integers.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>false </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Record the activations as their original floating-point values.</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->
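
<para>The effect of the two settings of <emphasis remap='B'>show_acs_times_1000</emphasis> can be
pictured with the short Python sketch below. The function name and sample
activations are hypothetical, and the handling of exact ties when rounding,
as well as the actual layout of <emphasis remap='B'>long_outfile</emphasis>, are assumptions
not taken from this page.</para>

<programlisting>def format_activations(acts, show_acs_times_1000):
    """Return the activations as they might be recorded."""
    if show_acs_times_1000:
        # true: multiply by 1000 and round to integers.
        return [int(round(a * 1000)) for a in acts]
    # false: keep the original floating-point values.
    return list(acts)

# Hypothetical output activations, for illustration only.
acts = [0.98765, 0.0123, 0.5]
print(format_activations(acts, True))
print(format_activations(acts, False))
</programlisting>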

<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>do_cvr </emphasis><emphasis remap='I'>(See the notes on </emphasis><emphasis remap='B'>oklvl</emphasis><emphasis remap='I'>.)</emphasis></term>
  <listitem>
<!-- .br -->
  <!-- .RS -->
<para>Whether to produce a correct-vs.-rejected table. The allowed values are:</para>
  </listitem>
  </varlistentry>
</variablelist>
<variablelist remap='TP'>
  <varlistentry>
  <term><emphasis remap='B'>true </emphasis><literal>1</literal></term>
  <listitem>
<!-- .br -->
<para>Produce a correct-vs.-rejected table and include it in <emphasis remap='B'>short_outfile</emphasis>.</para>
  </listitem>
  </varlistentry>
  <varlistentry>
  <term><emphasis remap='B'>false </emphasis><literal>0</literal></term>
  <listitem>
<!-- .br -->
<para>Do not produce a correct-vs.-rejected table.</para>
  </listitem>
  </varlistentry>
</variablelist>
<!-- .RE -->

</refsect1>

<refsect1 id='examples'><title>EXAMPLE(S)</title>
<para>From <emphasis remap='I'>test/pcasys/execs/mlp/mlp.src</emphasis>:</para>

<!-- .RS -->
<para><emphasis remap='B'>% mlp</emphasis>
<!-- .br -->
Runs mlp assuming the default specfile ("spec") in the local directory.</para>

<para><emphasis remap='B'>% mlp myspecfile</emphasis>
<!-- .br -->
Runs mlp using the specfile "myspecfile".</para>
</refsect1>

<refsect1 id='see_also'><title>SEE ALSO</title>
<para>fixwts (1A), mlpfeats (1A)</para>


</refsect1>

<refsect1 id='author'><title>AUTHOR</title>
<para>NIST/ITL/DIV894/Image Group</para>
</refsect1>
</refentry>

