4.1.4 Running PCASYS

4.1.4.1 PCASYS Data Files

For the purpose of conveniently storing and transporting data, formats have been defined for three types of data:

matrix: A matrix of real numbers.

covariance: A covariance matrix of real numbers. This format saves disk space by storing only the non-strict lower triangle (the diagonal and the entries below it), which is sufficient because a covariance matrix is symmetric.

classes: A list of classes, thought of as unsigned characters. For use with fingerprints in PCASYS, class values 0 through 5 denote arch, left loop, right loop, scar, tented arch, and whorl, respectively. A classes file can be used for any classification situation with no more than 255 classes.

Each type of file can exist in either an ASCII or a binary storage mode. A data file contains header information followed by the data itself. The header information contains a description string (which can be of any length but must contain no newlines, or can be left empty), code bytes indicating the file type and storage mode, and additional information specific to the file type. The additional information is: for a matrix, the two dimensions; for a covariance, the order (i.e., the common value of both dimensions of the symmetric matrix) and the number of vectors used to build the covariance; and for classes, the number of elements.

The datainfo command can be run on any PCASYS data file. It writes a report of the header information to the standard output.

4.1.4.2 Commands

Installation of PCASYS provides the following commands, shown here with short descriptions. For a complete description and usage instructions for any of these commands, consult the manual pages in Appendix B or on the CD-ROM.
4.1.4.2.1 Classifier Demos

pcasys      non-graphical demo
pcasysx     graphical demo

4.1.4.2.2 Training (Optimization) Commands

eva_evt     finds the eigenvalues and eigenvectors
lintran     runs a linear transform on a set of vectors
meancov     makes mean and covariance from a set of vectors
kltran      runs a Karhunen-Loève transform on a set of vectors
mkoas       makes orientation arrays from fingerprints
mktran      makes transform matrix incorporating the optimized regional weights
optosf      optimizes the overall smoothing factor
optrws      optimizes the regional weights

4.1.4.2.3 Utility Commands

asc2bin     converts an ASCII data file to binary
bin2asc     converts a binary data file to ASCII
chgdesc     changes the description string of a data file
cmbmcs      combines several mean/covariance file pairs
datainfo    reports the header info of a data file to standard output
oas2pics    makes IHead pictures of orientation arrays
rwpics      makes IHead pictures of regional weights or estimated gradients
stackms     stacks several matrix files together

4.1.4.3 Running the Classifier

4.1.4.3.1 Graphical and Non-graphical Versions

The classifier has a graphical version (pcasysx) and a non-graphical version (pcasys). The graphical version, which requires the X Window System, produces windows on the screen containing graphics showing the results of the phases of processing used to classify each fingerprint. Many of the illustrations in this report were made from screen dumps of the graphical demo. The non-graphical version classifies the fingerprints but produces no graphics; it is suitable if you do not have the X Window System, or if you want the greatest running speed. Both versions optionally produce a stream of messages on the terminal showing which fingerprint the classifier is working on and what phase of processing it is performing, and both versions produce an output file.

4.1.4.3.2 Default Parameters and Specifying Parameters

The default files needed by the classifier are located in the distribution's top-level pcasys directory.
The subdirectory pcasys/images contains a set of images used to create the screens when running the graphical version. The subdirectory pcasys/parms has all the default parameter files used by the classifier. The pcasys/weights directory is split into two subdirectories, pnn and mlp, which contain the optimized prototypes for each of the classifiers. The 2700 sample images used by the classifier are located in test/pcasys/data/images. If disk space is limited, this directory can be created as a link to the mounted CD-ROM.

Please note that if the installation directory is other than /usr/local/nfis, then by default the PCASYS utilities will not know where the parameter files are located in the distribution. In this case, the definition of INSTALL_DIR in the header file include/little.h must be changed prior to compilation. See Section 2.1 for installation instructions.

4.1.4.3.3 Output File

The output file has a line for each of the fingerprints that were classified. Each line shows:

- the fingerprint file name;
- the actual class (A, L, R, S, T, and W stand for the pattern-level classes arch, left loop, right loop, scar, tented arch, and whorl);
- the output of the classifier (a hypothesized class and a confidence);
- the output of the auxiliary pseudo-ridge tracing whorl detector (whether or not a concave-upward shape, a "conup," was found);
- the final output of the hybrid classifier, which is a hypothesized class and a confidence; and
- whether this hypothesized class was right or wrong.
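As an illustration of the line layout just described, the following Python sketch parses one per-fingerprint line of the output file into named fields. It is not part of PCASYS: the regular expression is inferred only from the sample lines shown in this section, and the names LINE_RE, Result, and parse_line are hypothetical. Output produced under other settings may need a different pattern.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output lines in this section, e.g.
#   s0024301.wsq: is W; nn: hyp W, conf 0.59; conup y; hyp W, conf 1.00; right
# This is an assumption about the layout, not a documented format.
LINE_RE = re.compile(
    r"^(?P<name>\S+): is (?P<actual>[ALRSTW]); "
    r"nn: hyp (?P<nn_hyp>[ALRSTW]), conf (?P<nn_conf>\d+\.\d+); "
    r"conup (?P<conup>[yn]); "
    r"hyp (?P<hyp>[ALRSTW]), conf (?P<conf>\d+\.\d+); "
    r"(?P<verdict>right|wrong)$"
)


class Result(NamedTuple):
    name: str        # fingerprint file name
    actual: str      # actual class letter
    nn_hyp: str      # class hypothesized by the main classifier
    nn_conf: float   # confidence of the main classifier
    conup: bool      # True if the whorl detector found a conup
    hyp: str         # final class hypothesized by the hybrid classifier
    conf: float      # final confidence
    right: bool      # True if the final hypothesis matched the actual class


def parse_line(line: str) -> Optional[Result]:
    """Return the parsed fields, or None for lines that do not match
    (for example, the summary lines at the end of the file)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Result(
        name=m["name"],
        actual=m["actual"],
        nn_hyp=m["nn_hyp"],
        nn_conf=float(m["nn_conf"]),
        conup=(m["conup"] == "y"),
        hyp=m["hyp"],
        conf=float(m["conf"]),
        right=(m["verdict"] == "right"),
    )
```

Returning None for non-matching lines lets a caller skip the trailing summary (percent error and confusion matrix) without special handling.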
The output for the first and last 10 sample images, using the PNN classifier, is:

s0024301.wsq: is W; nn: hyp W, conf 0.59; conup y; hyp W, conf 1.00; right
s0024302.wsq: is R; nn: hyp R, conf 0.88; conup n; hyp R, conf 0.88; right
s0024303.wsq: is R; nn: hyp R, conf 1.00; conup n; hyp R, conf 1.00; right
s0024304.wsq: is R; nn: hyp R, conf 1.00; conup n; hyp R, conf 1.00; right
s0024305.wsq: is R; nn: hyp R, conf 0.99; conup n; hyp R, conf 0.99; right
s0024306.wsq: is L; nn: hyp L, conf 0.99; conup n; hyp L, conf 0.99; right
s0024307.wsq: is L; nn: hyp L, conf 0.94; conup n; hyp L, conf 0.94; right
s0024308.wsq: is L; nn: hyp L, conf 0.99; conup n; hyp L, conf 0.99; right
s0024309.wsq: is L; nn: hyp L, conf 1.00; conup n; hyp L, conf 1.00; right
s0024310.wsq: is L; nn: hyp L, conf 1.00; conup n; hyp L, conf 1.00; right
...
s0026991.wsq: is W; nn: hyp W, conf 1.00; conup y; hyp W, conf 1.00; right
s0026992.wsq: is W; nn: hyp W, conf 1.00; conup y; hyp W, conf 1.00; right
s0026993.wsq: is T; nn: hyp A, conf 0.79; conup n; hyp A, conf 0.79; wrong
s0026994.wsq: is W; nn: hyp W, conf 1.00; conup y; hyp W, conf 1.00; right
s0026995.wsq: is W; nn: hyp W, conf 1.00; conup y; hyp W, conf 1.00; right
s0026996.wsq: is W; nn: hyp W, conf 0.84; conup y; hyp W, conf 1.00; right
s0026997.wsq: is W; nn: hyp W, conf 0.75; conup y; hyp W, conf 1.00; right
s0026998.wsq: is L; nn: hyp L, conf 0.84; conup n; hyp L, conf 0.84; right
s0026999.wsq: is W; nn: hyp W, conf 1.00; conup y; hyp W, conf 1.00; right
s0027000.wsq: is W; nn: hyp W, conf 0.96; conup y; hyp W, conf 1.00; right

pct error: 7.07

          A            L            R            S            T            W
A    41( 83.7)    3(  6.1)    0(  0.0)    0(  0.0)    4(  8.2)    1(  2.0)
L     3(  0.4)  784( 97.5)    3(  0.4)    0(  0.0)    5(  0.6)    9(  1.1)
R     7(  1.0)    6(  0.8)  699( 95.1)    0(  0.0)    5(  0.7)   18(  2.4)
S     0(  0.0)    4( 80.0)    0(  0.0)    0(  0.0)    1( 20.0)    0(  0.0)
T    19( 22.6)   26( 31.0)   14( 16.7)    0(  0.0)   25( 29.8)    0(  0.0)
W     1(  0.1)   35(  3.4)   27(  2.6)    0(  0.0)    0(  0.0)  960( 93.8)

The last part of the output file is a brief summary of the
results. First, there is the percent error, i.e., the percentage of the fingerprints that were classified incorrectly. Following this is a confusion matrix. It has the same format as Table 2 and Table 3, described in the next section.

4.1.5 Classification Results

The fingerprint images used to train and test the PCASYS classifier were taken from NIST Special Database 14 (SD14) [20]. This database consists of images scanned from 2700 pairs of standard fingerprint cards. Each pair of cards contains fingerprints taken from a single individual, but captured on two different occasions. One card is the card stored in the FBI file for this person and is denoted the file card. The other card was sent in to be searched against the database and is denoted the search card. Each card was scanned at 19.69 pixels per millimeter (500 pixels per inch), then parsed into individual fingerprint images by cutting out rectangles of predefined locations and dimensions, corresponding to the printed boxes in which the rolled finger impressions were made.

We trained (optimized) the main classifiers using file prints f0000001.wsq through f0024300.wsq of SD14. Then, the finished classification system was made by adding the pseudo-ridge tracer to the classifier, with its parameters set to values that had been arrived at much earlier as a result of testing. With all aspects of the classification system settled, we then tested its accuracy on search prints s0024301.wsq through s0027000.wsq of SD14.

The test set that was used is provided on the CD-ROM in directory test/pcasys/data/images, in the form of the original fingerprint images. The classifier may be run on this entire set to duplicate the test results, or it may be run on a subset of these prints or on other prints provided by the user.
The 24,300 prints from which the NN training feature vectors are derived are not provided on the CD-ROM because there would not be enough space, but the prototype feature vectors themselves are provided (in test/pcasys/data/oas).

The result of the test was an error rate (fraction of the test prints misclassified) of 7.07% for PNN and 8.19% for MLP. More insight into the behavior of the classifiers can be obtained by examining the confusion matrices of Table 2 and Table 3. Each matrix has a row for each actual class and a column for each hypothesized class, and it shows, as the non-parenthesized numbers, how many test prints fell into each (actual class, hypothesized class) cell. For example, Table 3 shows that 783 of the L (left loop) prints were classified as L and that 4 of them were classified as R (right loop). Each parenthesized number is the percentage that its corresponding count comprises of the sum of the counts in that row. For example, the parenthesized numbers in Table 3 show that 97.4% of the L prints were classified as L, and that 0.5% of them were classified as R. The entries on the main diagonal correspond to correct classifications.

The 7.07% (or 8.19%) error rate and the confusion matrix pertain to the use of the classifier without rejection: it is required to produce a hypothesized class for every print. However, if the classifier is allowed to reject some prints, indicating that it is uncertain about the hypothesized class, it can achieve an error rate much lower than 7.07% (or 8.19%) on the prints that it accepts. The confidence number produced by the classifier is used to provide an adjustable rejection level. To implement rejection, it is sufficient to set a confidence threshold, then reject all prints for which the classifier produces a confidence below the threshold. The larger the threshold, the greater the percentage of prints that are rejected, but also the smaller the percentage of accepted prints that are misclassified.
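The rejection mechanism described above can be sketched in a few lines of Python. This is only an illustration, not code from PCASYS: the function name error_vs_reject and the five (confidence, correct) pairs are hypothetical, made up to show the arithmetic rather than taken from the test results.

```python
from typing import List, Tuple


def error_vs_reject(
    results: List[Tuple[float, bool]], thresholds: List[float]
) -> List[Tuple[float, float, float]]:
    """For each confidence threshold, reject every print whose confidence is
    below the threshold and report (threshold, reject %, error % on accepted)."""
    total = len(results)
    curve = []
    for t in thresholds:
        # Accepted prints are those with confidence at or above the threshold.
        accepted = [ok for conf, ok in results if conf >= t]
        reject_pct = 100.0 * (total - len(accepted)) / total
        if accepted:
            wrong = sum(1 for ok in accepted if not ok)
            error_pct = 100.0 * wrong / len(accepted)
        else:
            error_pct = 0.0  # nothing accepted, so nothing misclassified
        curve.append((t, reject_pct, error_pct))
    return curve


# Hypothetical (confidence, was the hypothesis right?) pairs, for illustration.
toy_results = [(0.99, True), (0.95, True), (0.80, False), (0.60, True), (0.55, False)]

for t, rej, err in error_vs_reject(toy_results, [0.0, 0.7, 0.9]):
    print(f"threshold {t:.2f}: reject {rej:5.1f}%, error {err:5.1f}%")
```

On this toy data, raising the threshold increases the reject percentage and lowers the error percentage among accepted prints, which is the trade-off the text describes.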
The curves in Figure 17 are error versus reject curves that summarize this behavior, produced from the results of the test runs. Curves are included for a classifier consisting of PNN or MLP alone or with the help of the pseudo-ridge analyzer; the hybrid classifier is more accurate than the PNN or MLP alone at all rejection levels.

[Figure 17: plot of error percentage (vertical axis, 0 to 10) against reject percentage (horizontal axis, 0 to 80), with curves labeled mlp.ntrc, mlp.trc, pnn.ntrc, and pnn.trc.]

Figure 17. Error versus reject curves for PNN and MLP classifiers and hybrid combinations.

Table 2. PNN confusion matrix. Non-parenthesized: actual count that occurred for that cell. Parenthesized: percentage of the row sum.

Actual                           Hypothesized Class
Class        A           L           R           S           T           W
A        41 (83.7)    3 (6.1)     0 (0.0)     0 (0.0)     4 (8.2)     1 (2.0)
L         3 (0.4)   784 (97.5)    3 (0.4)     0 (0.0)     5 (0.6)     9 (1.1)
R         7 (1.0)     6 (0.8)   699 (95.1)    0 (0.0)     5 (0.7)    18 (2.4)
S         0 (0.0)     4 (80.0)    0 (0.0)     0 (0.0)     1 (20.0)    0 (0.0)
T        19 (22.6)   26 (31.0)   14 (16.7)    0 (0.0)    25 (29.8)    0 (0.0)
W         1 (0.1)    35 (3.4)    27 (2.6)     0 (0.0)     0 (0.0)   960 (93.8)

Table 3. MLP confusion matrix. Same layout as Table 2.

Actual                           Hypothesized Class
Class        A           L           R           S           T           W
A        10 (20.4)   20 (40.8)   18 (36.7)    0 (0.0)     0 (0.0)     1 (2.0)
L         0 (0.0)   783 (97.4)    4 (0.5)     0 (0.0)     0 (0.0)    17 (2.1)
R         0 (0.0)     9 (1.2)   704 (95.8)    0 (0.0)     0 (0.0)    22 (3.0)
S         0 (0.0)     3 (60.0)    2 (40.0)    0 (0.0)     0 (0.0)     0 (0.0)
T         0 (0.0)    43 (51.2)   40 (47.6)    0 (0.0)     0 (0.0)     1 (1.2)
W         0 (0.0)    29 (2.8)    12 (1.2)     0 (0.0)     0 (0.0)   982 (96.0)
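The relationship between a confusion matrix, its row percentages, and the overall error rate can be checked with a short Python sketch. The counts are the PNN counts as they appear in the output listing of Section 4.1.4.3.3; the function names error_rate_pct and row_percentages are hypothetical and are not part of PCASYS.

```python
from typing import List

CLASSES = ["A", "L", "R", "S", "T", "W"]

# PNN counts: rows are actual classes, columns are hypothesized classes,
# both in the order given by CLASSES.
PNN = [
    [41, 3, 0, 0, 4, 1],
    [3, 784, 3, 0, 5, 9],
    [7, 6, 699, 0, 5, 18],
    [0, 4, 0, 0, 1, 0],
    [19, 26, 14, 0, 25, 0],
    [1, 35, 27, 0, 0, 960],
]


def error_rate_pct(matrix: List[List[int]]) -> float:
    """Percentage of prints off the main diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * (total - correct) / total


def row_percentages(matrix: List[List[int]]) -> List[List[float]]:
    """Each count as a percentage of the sum of its row."""
    out = []
    for row in matrix:
        s = sum(row)
        out.append([100.0 * c / s if s else 0.0 for c in row])
    return out


print(f"pct error: {error_rate_pct(PNN):.2f}")
for label, pcts in zip(CLASSES, row_percentages(PNN)):
    print(label, " ".join(f"{p:5.1f}" for p in pcts))
```

The diagonal sums to 2509 of 2700 prints, so 191 prints are misclassified, which reproduces the reported 7.07% error rate for PNN.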