APPENDIX A. MLP TRAINING OUTPUT

Explanation of the output produced during MLP training

When the program mlp does a training run, it writes output to the standard error and writes the same output to the short_outfile specified in the specfile. The purpose of this appendix is to explain the meaning of this output. Mlp produces similar output for a testing run, except that the "training progress" part is missing.

Pattern-Weights

As a preliminary, it will be helpful to discuss the "pattern-weights" which mlp uses, since they are used in the calculations of many of the values shown in the output. The pattern-weights are "prior" weights, one for each pattern;12 they remain constant during a training (or testing) run, although it is possible to do a training "meta-run" that is a sequence of training runs and to change the pattern-weights between the runs. The setting of the pattern-weights is controlled by the priors value set in the specfile and may be affected by provided data files, as follows (in all cases, the division by N is merely a normalization that slightly reduces the amount of calculation needed later):

allsame: If priors is allsame, then each pattern-weight is set to 1/N, where N is the number of patterns.

class: A file of given class-weights must be supplied; each given class-weight is divided by the actual class-weight of the input data set, and the new class-weights are normalized so their sum is 1.0. Then each pattern-weight is set to the new class-weight of the class of the corresponding pattern, divided by N (the number of patterns). The end result is that if the actual distribution of the data set does not equal that of the given class-weights, the class-weights are adjusted so the final results approximate what the scores would be if the distribution were the same as the given class-weights. If the user is only concerned about the unadjusted score for the given data, set the given class-weights equal to the actual class-weights.

pattern: A file of (original) pattern-weights must be supplied; each of them is divided by N to produce the corresponding pattern-weight.
both: Files of class-weights and (original) pattern-weights must both be supplied; each pattern-weight is then set to the class-weight (the class-weights are adjusted as discussed in the class portion of this list) of the class of the corresponding pattern, times the corresponding (original) pattern-weight, divided by N.

The pattern-weights are used in the calculation of the error value that mlp attempts to minimize during training. When the training patterns are sent through the network, each pattern produces an error contribution, which is multiplied by the pattern-weight for that pattern before being added to an error accumulator (Section A.1.1.2.2). The pattern-weights are also involved in the calculations of several other quantities besides the error value; all these uses are described below. Reference [49] discusses the use of class-based prior weights (Section 5.4, pages 10-11), which correspond to the class setting of priors.

12 A pattern is a feature-vector/class or feature-vector/target-vector pair.

Explanation of Output

A.1.1.1 Header

The first part of the output is a "header" showing the specfile parameter values. Here is the header of the short_outfile test/pcasys/execs/mlp/mlp_dir/trn1.err produced by the first training run of a sequence of runs used to train the fingerprint classifier:

 Classifier MLP
 Training run
 Patterns file: fv1-9mlp.kls; using all 24300 patterns
 Final pattern-wts: made from provided class-wts and pattern-wts, files priors and patwts
 Error function: sum of squares
 Reg. factor: 2.000e+00
 Activation fns. on hidden, output nodes: sinusoid, sinusoid
 Nos. of input, hidden, output nodes: 128, 128, 6
 Boltzmann pruning, thresh. exp(-w^2/T), T 1.000e-05
 Will use SCG
 Initial network weights: random, seed 12347
 Final network weights will be written as file trn1.wts
 Stopping criteria (max. no.
 of iterations 50):
   (RMS err) <= 0.000e+00 OR
   (RMS g) <= 0.000e+00 * (RMS w) OR
   (RMS err) > 9.900e-01 * (RMS err 10 iters ago) OR
   (OK - NG count) < (count 10 iters ago) + 1. (OK level: 0.000)
 Long outfile: trn1l.err
 Given and Actual Prior Weights
  A => 0.036583  0.038025
  L => 0.338497  0.319506
  R => 0.316920  0.306584
  S => 0.000000  0.005597
  T => 0.029482  0.030123
  W => 0.278518  0.300165
 Given/Actual = New Prior Weights
  A -> 0.193897
  L -> 0.213518
  R -> 0.208333
  S -> 0.000000
  T -> 0.197247
  W -> 0.187005
 SCG: doing <= 50 iterations; 17286 variables.

A.1.1.2 Training Progress

The next part of the output is a running update on the training progress. The first few lines of training progress reported are:

 pruned 80 6 86 C 1.67872e+05 H 2.40068e+04 R 85.70 M -0.00 T 0.0841
  Iter    Err  (   Ep    Ew)    OK  UNK    NG     OK  UNK   NG
     0  0.474  (0.240 0.289)  6564    0 17736 = 27.0  0.0 73.0 %
   0.0    0   4  19   0   0  70
 pruned 108 3 111 C 1.75555e+05 H 2.54052e+04 R 85.53 M -0.00 T 0.0836
 pruned 124 5 129 C 1.84026e+05 H 2.58204e+04 R 85.97 M -0.00 T 0.0824
 pruned 129 6 135 C 2.20275e+05 H 2.72642e+04 R 87.62 M -0.00 T 0.0814
 pruned 138 3 141 C 1.73226e+05 H 2.76075e+04 R 84.06 M -0.00 T 0.0803
 pruned 138 5 143 C 1.78328e+05 H 2.99593e+04 R 83.20 M -0.00 T 0.0762
 pruned 152 4 156 C 1.74579e+05 H 3.03576e+04 R 82.61 M -0.00 T 0.0745
 pruned 167 5 172 C 1.81337e+05 H 3.14710e+04 R 82.65 M -0.00 T 0.0681
 pruned 149 7 156 C 1.89832e+05 H 3.95510e+04 R 79.17 M -0.00 T 0.0536
 pruned 178 7 185 C 1.78410e+05 H 3.90489e+04 R 78.11 M -0.00 T 0.0526
 pruned 184 7 191 C 2.19716e+05 H 3.99658e+04 R 81.81 M -0.00 T 0.0490
    10  0.328  (0.103 0.220) 19634    0  4666 = 80.8  0.0 19.2 %
   0.0    2  90  99   0   1  68

The line

  Iter    Err  (   Ep    Ew)    OK  UNK    NG     OK  UNK   NG

comprises column headers that pertain to those subsequent lines that begin with an integer ("first progress lines"); each first progress line is followed by a "second progress line," and there are "pruning lines" if Boltzmann pruning is used. These three types of lines are discussed below, second progress lines first, because
some of the calculations used to produce them are later used to make the first progress lines.

A.1.1.2.1 Second progress lines

These are the lines that begin with fractional numbers; the first of them in the above example is

   0.0    0   4  19   0   0  70

Ignoring for a moment the first value in such a line, the remaining values are the "percentages" right by class, which mlp calculates as follows. It maintains three pattern-weight accumulators for each class:

 a_i^(r) = right pattern-weight accumulator for correct class i
 a_i^(w) = wrong pattern-weight accumulator for correct class i
 a_i^(u) = unknown (rejected) pattern-weight accumulator for correct class i

When mlp sends a training pattern through the network, the result is an output activation for each class; the hypothetical class is, of course, whichever class receives the highest activation. If the highest activation equals or exceeds the rejection threshold oklvl set in the specfile, then mlp accepts its result for this pattern, and adds its pattern-weight (Section A.1.1) to either a_i^(r) or a_i^(w) (where i is the correct class of the pattern), according to whether the network classified the pattern rightly or wrongly. Otherwise (i.e. if the highest activation is less than oklvl), mlp adds the pattern-weight to a_i^(u). These accumulators reach their final values after all of the training patterns are sent through the network. Mlp then defines the right "percentage" of correct class i to be

 100 a_i^(r) / (a_i^(r) + a_i^(w) + a_i^(u))

It shows these values, rounded to integers, in the second progress lines, as the values after the first one. For example, the second progress line above shows that the right "percentages" of correct classes 0 and 1 are 0 and 4.13 If priors is allsame then the pattern-weights are all equal, and so a_i^(r), etc. are the numbers classified rightly, etc., times this single pattern-weight; the pattern-weight cancels out between the numerator and denominator of the above formula, so that the resulting value really is the percentage of the patterns of class i that the network classified rightly. If priors has a value other than allsame (i.e.
class, pattern, or both), then the right "percentages" of the classes are not the simple percentages but rather are weighted quantities, which may make more sense than the simple percentages if some patterns should have more impact than others, as indicated by their larger weights.14 As for the first value of a second progress line, this is merely the minimum of the right "percentages" of the classes, but shown rounded to the nearest tenth rather than to the nearest integer. This minimum value shows how the network is doing on its "worst" class.15

A.1.1.2.2 First progress lines

These are the lines that begin with an integer. The column headings, which pertain to these lines, and the first of these lines in the example, are:

  Iter    Err  (   Ep    Ew)    OK  UNK    NG     OK  UNK   NG
     0  0.474  (0.240 0.289)  6564    0 17736 = 27.0  0.0 73.0 %

The values in a first progress line have the following meanings:

Iter: Training iteration number, numbering starting at 0. A first progress line (and second progress line) are produced every nfreq'th iteration (nfreq is set in the specfile).

Err, Ep, Ew: The calculations leading to these values are as follows.

13 In this case the classes' index numbers are 0 through 5, and the classes are fingerprint types Arch (A), Left Loop (L), Right Loop (R), Tented Arch (T), Scar (S), and Whorl (W). In this discussion, "class i" merely means the class whose index number, numbering starting at 0, is i. Note also that although the software uses class index numbers that start at 0, the class index numbers it writes to long_outfile start at 1.

14 In particular, if the training pattern set is such that the proportions of the patterns belonging to the various classes are not approximately equal to the natural frequencies of the classes, then it may be a good idea to use class-weights (priors set to class, and class-weights provided in a file) to compensate for the erroneous distribution. See [49].

15 When mlp uses hybrid SCG/LBFGS training rather than only SCG (it does this only if pruning is not specified), it switches from SCG to LBFGS when the minimum reaches or exceeds a specified threshold, scg_earlystop_pct.
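The per-class accumulator bookkeeping of Section A.1.1.2.1 can be sketched as follows. This is a minimal Python illustration, not the NIST C implementation; the function name right_percentages, its argument layout, and the name ok_lvl are invented for this sketch.

```python
from collections import defaultdict

def right_percentages(patterns, ok_lvl):
    """patterns: iterable of (correct_class, activations, pattern_weight).

    Returns ({class: weighted right "percentage"}, minimum percentage),
    mirroring the values of a second progress line (before rounding)."""
    acc_r = defaultdict(float)  # a_i^(r): right pattern-weight accumulator
    acc_w = defaultdict(float)  # a_i^(w): wrong pattern-weight accumulator
    acc_u = defaultdict(float)  # a_i^(u): unknown (rejected) accumulator
    for correct, acts, wt in patterns:
        hyp = max(range(len(acts)), key=acts.__getitem__)  # highest activation
        if acts[hyp] >= ok_lvl:                 # result accepted
            (acc_r if hyp == correct else acc_w)[correct] += wt
        else:                                   # result rejected (unknown)
            acc_u[correct] += wt
    classes = set(acc_r) | set(acc_w) | set(acc_u)
    pct = {i: 100.0 * acc_r[i] / (acc_r[i] + acc_w[i] + acc_u[i])
           for i in classes}
    return pct, min(pct.values())
```

With equal pattern-weights (the allsame case), the weight cancels and the returned values are the plain per-class percentages right.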
 N                   = number of patterns
 n                   = number of classes
 a_ij                = activation produced by pattern i at output node j (i.e. class j)
 t_ij                = target value for a_ij
 w_i^(pat)           = pattern-weight of pattern i (Section A.1.1)
 E_(pat i)^(mse)     = sum_{j=0..n-1} (a_ij - t_ij)^2,
                       the error contribution of pattern i if errfunc is mse
 E^(mse)             = (1/(2n)) sum_{i=0..N-1} w_i^(pat) E_(pat i)^(mse)
 E_(pat i)^(type_1)  = sum_{j != k} 1 / (1 + exp(alpha (a_ik - a_ij))), where k is the
                       correct class of pattern i and alpha is set in the specfile;
                       the error contribution of pattern i if errfunc is type_1
 E^(type_1)          = (1/n) sum_{i=0..N-1} w_i^(pat) E_(pat i)^(type_1)
 E_(pat i)^(pos_sum) = error contribution of pattern i if errfunc is pos_sum
 E^(pos_sum)         = (1/n) sum_{i=0..N-1} w_i^(pat) E_(pat i)^(pos_sum)
 E_1                 = E^(mse), E^(type_1), or E^(pos_sum), according to errfunc
 Ep                  = E_1 if errfunc is pos_sum, sqrt(2 E_1) otherwise
 s^(wsq)             = half of the mean squared network weight
 Ew                  = sqrt(2 s^(wsq))
 E                   = E_1 + regfac * s^(wsq)
 Err                 = sqrt(2 E)

Mlp prints the Err, Ep and Ew values as defined above. Note that the value mlp attempts to minimize is E, but presumably the same effect would be had by attempting to minimize Err, since it is an increasing function of E.

OK, UNK, NG, OK, UNK, NG: "Numbers" of patterns OK (classified correctly), UNKnown (rejected), and wroNG or No Good (classified incorrectly), then the corresponding "percentages." Mlp calculates these values as follows. It adds up the by-class accumulators a_i^(r), a_i^(w), and a_i^(u) defined earlier to make overall accumulators, where n is the number of classes:

 a^(r) = sum_{i=0..n-1} a_i^(r)
 a^(w) = sum_{i=0..n-1} a_i^(w)
 a^(u) = sum_{i=0..n-1} a_i^(u)

It computes "numbers" right, wrong, and unknown -- the first OK, NG, and UNK values of a first progress line -- as follows, where N is the number of patterns and square brackets denote rounding to an integer:

 a^(rwu) = a^(r) + a^(w) + a^(u)
 n^(r)   = [N a^(r) / a^(rwu)] = "number" right
 n^(w)   = [N a^(w) / a^(rwu)] = "number" wrong
 n^(u)   = N - n^(r) - n^(w)   = "number" unknown

From these "numbers," mlp computes the corresponding "percentages" -- the second OK, NG, and UNK values -- as follows:

 p^(r) = [100 n^(r) / N]
 p^(w) = [100 n^(w) / N]
 p^(u) = [100 n^(u) / N]

If priors is allsame then, since the pattern-weights are all equal, cancellation of the single pattern-weight occurs between the numerators and denominators of the formulas above for n^(r) and n^(w), so that they really are the numbers of patterns classified rightly and wrongly. Then it is obvious that n^(u) really is the number unknown and that p^(r), etc. really are the percentages classified rightly, etc.

A.1.1.2.3 Pruning lines (optional)

These lines, which begin with "pruned," appear if Boltzmann pruning is specified (boltzmann set to abs_prune or square_prune in the specfile, and a temperature set). The first pruning line of the example is

 pruned 80 6 86 C 1.67872e+05 H 2.40068e+04 R 85.70 M -0.00 T 0.0841

Regardless of nfreq, mlp writes a pruning line every time it performs pruning. The first three values of a pruning line are the numbers of network weights that mlp pruned (temporarily set to zero) in the first weights layer, in the second layer, and in both layers together. The remaining values, announced by the letters C, H, R, and M, are calculated as follows (the value announced by T actually is not calculated correctly, and should be ignored):

 n^(wts)      = number of network weights (both layers)
 n^(pruned)   = number of weights pruned
 n^(unpruned) = n^(wts) - n^(pruned)
 w_max, w_min = maximum and minimum absolute values of unpruned weights
 C            = n^(unpruned) (log w_max - log w_min + log 2) / log 2 = capacity
 s^(abslog)   = sum of logarithms of absolute values of unpruned weights
 H            = (s^(abslog) - n^(unpruned) (log w_min - log 2)) / log 2 = entropy
 R            = 100 (C - H) / C
 M            = mean of unpruned weights

A.1.1.3 Confusion Matrices and Miscellaneous Information (Optional)

If do_confuse is set to true in the specfile, the next part of the output consists of two "confusion matrices" and some miscellaneous information:

 oklvl 0.00
 # Highest two outputs (mean) 0.784 0.145; mean diff 0.639
 key name
  A   A
  L   L
  R   R
  S   S
  T   T
  W   W
 # key:   A    L    R    S    T    W
 # row: correct, column: actual
 # A:   333  315  267    0    0    9
 # L:    12 7522   86    0    0  144
 # R:    21  148 7128    0    0  153
 # S:     0    0    0    0    0    0
 # T:    60  346  323    0    0    3
 # W:     2  798  509    0    0 5985
 # unknown
 # *      0    0    0    0    0    0
 percent of true IDs correctly identified (rows)       36  97  96   0   0  82
 percent of predicted IDs correctly identified (cols)  78  82  86   0   0  95
 # mean highest activation level
 # row: correct, column: actual
 # key:   A    L    R    S    T    W
 # A:    35   43   43    0    0   38
 # L:    32   83   41    0    0   48
 # R:    32   43   83    0    0   49
 # S:    88 4666 4042    0    0  317
 # T:    33   49   48    0    0   38
 # W:    29   61   58    0    0   85
 # unknown
 # *      0    0    0    0    0    0
 Histogram of errors, from 2^(-10) to 1
 15899 5322 10477 14278 15596 22398 16728 16005 13376 9364 6357
  10.9  3.7   7.2   9.8  10.7  15.4  11.5  11.0   9.2  6.4  4.4%

The first line of this optional section of the output shows the value of the rejection threshold oklvl set in the specfile (this was already shown in the header). The next line shows the mean values, over the training patterns as sent through the network at the end of training, of the highest and second-highest output node values, and the mean difference of these values. Next is a table showing the short class name ("key") and long class name ("name") of each class. In this example the keys and names are the same, but in general the names can be quite long, whereas the keys must be no longer than two characters; the short keys are used to label the confusion matrices.

Next are the confusion matrices of "numbers" and of "mean highest activation level." Mlp has the following accumulators:

 a_ij^(patwts)  = pattern-weight accumulator for correct class i and hypothetical class j
 a_ij^(highac)  = high-activation accumulator for correct class i and hypothetical class j
 a_i^(highac,u) = high-activation unknown accumulator for correct class i

If a pattern sent through the network produces a highest activation that meets or exceeds oklvl (so that mlp accepts its result for this pattern), then mlp adds its pattern-weight to a_ij^(patwts) and adds the highest activation to a_ij^(highac), where i and j are the correct class and hypothetical class of the pattern.
Otherwise, i.e. if mlp finds the pattern to be unknown (rejects the result), it adds its pattern-weight to a_i^(u) (Section A.1.1.2.1) and adds the highest activation to a_i^(highac,u), where i is the correct class of the pattern. After it has processed all the patterns, mlp calculates the confusion matrix of "numbers" and its "unknown" line; some additional information concerning the rows and columns of that matrix; and the confusion matrix of "mean highest activation level" and its "unknown" line, as follows. First define some notation:

 N_i^(pats)      = number of patterns of correct class i
 n_ij^(confuse)  = value in row i and column j of the first confusion matrix (of "numbers")
 n_i^(confuse,u) = ith value of the "unknown" line at the bottom of the first confusion matrix
 p_i^(r,row)     = ith value of the "percent of true IDs correctly identified (rows)" line
 p_j^(r,col)     = jth value of the "percent of predicted IDs correctly identified (cols)" line
 h_ij^(confuse)  = value in row i and column j of the second confusion matrix
 h_i^(confuse,u) = ith value of the "unknown" line at the bottom of the second confusion matrix

Mlp calculates these values as follows, where a_i^(r), a_i^(w), and a_i^(u) are as defined in Section A.1.1.2.1 and square brackets again denote rounding to an integer:16

 n_ij^(confuse)  = [N_i^(pats) a_ij^(patwts) / (a_i^(u) + sum_{j=0..n-1} a_ij^(patwts))]
 n_i^(confuse,u) = [N_i^(pats) a_i^(u) / (a_i^(r) + a_i^(w) + a_i^(u))]
 p_i^(r,row)     = [100 n_ii^(confuse) / (N_i^(pats) - n_i^(confuse,u))]
 p_j^(r,col)     = [100 n_jj^(confuse) / sum_{i=0..n-1} n_ij^(confuse)]
 h_ij^(confuse)  = [100 a_ij^(highac) / n_ij^(confuse)]
 h_i^(confuse,u) = [100 a_i^(highac,u) / n_i^(confuse,u)]

If priors is allsame, the pattern-weights are all equal, and cancellation of the single pattern-weight between numerator and denominator causes n_ij^(confuse) above to be the number of patterns of correct class i and hypothetical class j; similarly, n_i^(confuse,u) really is the number of patterns of correct class i that were unknown; p_i^(r,row) and p_j^(r,col) really are the percentages that the on-diagonal (correctly classified) numbers in the matrix comprise of their rows and columns respectively; h_ij^(confuse) really is the mean highest activation level (multiplied by 100 and rounded to an integer) of the patterns of correct class i and hypothetical class j; and h_i^(confuse,u) really is the mean highest activation level of the patterns of correct class i that were unknown. If priors has one of its other values, the printed values are weighted versions of these quantities.

The final part of this optional section of the output is a histogram of errors. This pertains to the absolute errors between output activations and target activations, across all output nodes (6 nodes in this example) and all training patterns (24,300 patterns in this example), when the patterns are sent through the trained network. Of the resulting set of absolute error values (145,800 values in this example), this histogram shows the number (first line) and percentage (second line) of these values that fall into each of the 11 intervals (-inf, 2^-10], (2^-10, 2^-9], ..., (2^-1, 1].

16 The denominators of the expressions shown here for n_ij^(confuse) and n_i^(confuse,u) are equal, but these expressions show what the software actually calculates.
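In the allsame case, where the single pattern-weight cancels, the first confusion matrix and its "unknown" line reduce to plain pattern counts. The following is a minimal sketch of that case only (not the NIST C implementation; the function name confusion_counts and its argument layout are invented):

```python
def confusion_counts(patterns, n_classes, ok_lvl):
    """patterns: iterable of (correct_class, activations).

    Returns (matrix, unknown): matrix[i][j] counts accepted patterns of
    correct class i and hypothetical class j; unknown[i] counts
    rejected patterns of correct class i (the "unknown" line)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    unknown = [0] * n_classes
    for correct, acts in patterns:
        hyp = max(range(n_classes), key=acts.__getitem__)  # highest activation
        if acts[hyp] >= ok_lvl:       # accepted: tally in row correct, column hyp
            matrix[correct][hyp] += 1
        else:                         # rejected: tally in the "unknown" line
            unknown[correct] += 1
    return matrix, unknown
```

From these counts, p_i^(r,row) follows as 100 * matrix[i][i] / (N_i^(pats) - unknown[i]), and p_j^(r,col) as 100 * matrix[j][j] divided by the column sum.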
A.1.1.4 Final Progress Lines

The next part of the output consists of a repeat of the column-headers line, final first-progress-line, and final second-progress-line of the training progress part of the output, but with an F prepended to the final first-progress-line:

  Iter    Err  (   Ep    Ew)    OK  UNK    NG     OK  UNK   NG
 F  50  0.098  (0.081 0.040) 21211    0  3089 = 87.3  0.0 12.7 %
   0.0   36  97  96   0   0  82

A.1.1.5 Correct-vs.-Rejected Table (Optional)

If do_cvr is set to true in the specfile, the next part of the output is a correct-vs.-rejected table; the first and last few lines of this table, from the example output, are:

       thresh    right  unknown    wrong  correct rejected
  1tr 0.000000   21211        0     3089    87.29     0.00
  2tr 0.050000   21211        0     3089    87.29     0.00
  3tr 0.100000   21211        0     3089    87.29     0.00
  4tr 0.150000   21211        0     3089    87.29     0.00
  5tr 0.200000   21211        0     3089    87.29     0.00
 ...
 48tr 0.975000    3777    20521        2    99.95    84.45
 49tr 0.980000    3230    21068        2    99.94    86.70
 50tr 0.985000    2691    21607        2    99.93    88.92
 51tr 0.990000    2141    22158        1    99.95    91.19
 52tr 0.995000    1509    22791        0   100.00    93.79

Mlp produces these table values as follows. It has a fixed array of rejection-threshold values, which have been set in an unequally-spaced pattern that works well, and it uses three pattern-weight accumulators for each threshold. As mlp sends each pattern through the finished network,17 it loops over the thresholds t_k: for each k, it compares the highest network activation produced for the pattern with t_k to decide whether the pattern would be accepted or rejected.
If accepted, it adds the pattern-weight of that pattern either to a_k^(cvr,r) or to a_k^(cvr,w), according to whether it classified the pattern rightly or wrongly; if rejected, it adds the pattern-weight to a_k^(cvr,u). Here

 t_k         = kth threshold
 a_k^(cvr,r) = right pattern-weight accumulator for the kth threshold
 a_k^(cvr,w) = wrong pattern-weight accumulator for the kth threshold
 a_k^(cvr,u) = unknown pattern-weight accumulator for the kth threshold

After all the patterns have been through the network, mlp finishes the table as follows. For each threshold t_k it calculates the following values, where square brackets again denote rounding to an integer:

 a_k^(cvr,rwu) = a_k^(cvr,r) + a_k^(cvr,w) + a_k^(cvr,u)
 n^(cvr,r)     = [N a_k^(cvr,r) / a_k^(cvr,rwu)] = "number right"
 n^(cvr,w)     = [N a_k^(cvr,w) / a_k^(cvr,rwu)] = "number wrong"
 n^(cvr,u)     = N - n^(cvr,r) - n^(cvr,w)       = "number unknown" (rejected)
 p^(cvr,corr)  = 100 n^(cvr,r) / (n^(cvr,r) + n^(cvr,w)) = "percentage correct"
 p^(cvr,rej)   = 100 n^(cvr,u) / N                        = "percentage rejected"

Mlp then writes a line of the table. The values of the line are the threshold index k plus 1 with "tr"18 appended, t_k ("thresh"), n^(cvr,r) ("right"), n^(cvr,u) ("unknown"), n^(cvr,w) ("wrong"), p^(cvr,corr) ("correct"), and p^(cvr,rej) ("rejected"). If priors is allsame then, since all pattern-weights are the same, cancellation of the single pattern-weight occurs between numerator and denominator in the above expressions for n^(cvr,r) and n^(cvr,w), so they really are the numbers of patterns classified rightly and wrongly if threshold t_k is used. Also, it is obvious that n^(cvr,u) really is the number of patterns unknown for this threshold, p^(cvr,corr) really is the percentage of the patterns accepted at this threshold that were classified correctly, and p^(cvr,rej) really is the percentage of the N patterns that were rejected at this threshold. If priors has one of its other values, then the tabulated values are weighted versions of these quantities.

17 If do_cvr is true then mlp calculates a correct vs. reject table, but only for the final state of the network in the training run.

18 For "training"; the correct vs. reject table for a test run uses "ts".

A.1.1.6 Final Information

The final part of the output shows miscellaneous information:

 Iter 50; ierr 1 : iteration limit
 Used 51 iterations; 154 function calls; Err 0.098; |g|/|w| 1.603e-04
 Rms change in weights 0.289
 User+system time used: 3607.3 (s) 1:00:07.3 (h:m:s)
 Wrote weights as file trn1.wts

The first line here shows what iteration the training run ended on, and the value and meaning of the return code ierr, which indicates why mlp stopped its training run: in the example, the specified maximum number of iterations (niter_max), 50, had been used. This training run was actually the first run of a sequence; its initial network weights were random, but each subsequent run used the final weights of the preceding run as its initial weights. The only parameter varied from one run to the next was the regularization factor regfac, which was decreased at each step: successive regularization. Each run was limited to 50 iterations, and it was assumed that this small iteration limit would be reached before any of the other stopping conditions were satisfied. When sinusoid activation functions are used, as in this case, best training requires that successive regularization be used. If sigmoid functions are used, it is just as well to do only one training run, and in that case one should probably set the iteration limit to a large number so that training will be stopped by one of the other conditions, such as an error goal (egoal).
The next line shows: how many iterations mlp used (this counts the 0'th iteration, so it is one more than the final iteration number reported on the previous line); how many calls of the error function it made; the final error value; and the final size of the error gradient vector (square root of the sum of squares), normalized by dividing it by the final size of the weights. The next line shows the root-mean-square of the change in weights between their initial values and their final values. The next line shows the combined user and system time used by the training run.19 The final line merely reports the name of the file to which mlp wrote the final weights.

19 Setting the initial network weights, reading the patterns file, and other (minor) setup work are not timed.
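The correct-vs.-rejected tabulation of Section A.1.1.5 can be sketched as follows, again for the allsame case in which the pattern-weight accumulators reduce to counts. This is a minimal illustration, not the NIST C implementation; the function name cvr_table and its argument layout are invented, and the convention of reporting 0.0 for "percentage correct" when every pattern is rejected is an assumption made for this sketch.

```python
def cvr_table(patterns, thresholds):
    """patterns: list of (correct_class, activations).

    Returns one (thresh, right, unknown, wrong, pct_correct, pct_rejected)
    tuple per threshold, mirroring the columns of the table."""
    rows = []
    for t in thresholds:
        n_right = n_wrong = n_unknown = 0
        for correct, acts in patterns:
            hyp = max(range(len(acts)), key=acts.__getitem__)
            if acts[hyp] >= t:            # accepted at this threshold
                if hyp == correct:
                    n_right += 1
                else:
                    n_wrong += 1
            else:                         # rejected at this threshold
                n_unknown += 1
        accepted = n_right + n_wrong
        pct_correct = 100.0 * n_right / accepted if accepted else 0.0
        pct_rejected = 100.0 * n_unknown / len(patterns)
        rows.append((t, n_right, n_unknown, n_wrong, pct_correct, pct_rejected))
    return rows
```

As in the table of Section A.1.1.5, raising the threshold trades a higher "correct" percentage among accepted patterns against a higher "rejected" percentage overall.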