NAME
mat2harbo.pl - Convert a matrix in SenseClusters sparse format to Harwell-Boeing (HB) format and set input parameters (lap2) for input to SVDPACKC.
SYNOPSIS
mat2harbo.pl [OPTIONS] MATRIX
Suppose the file input contains the following SenseClusters sparse matrix:
cat input
Output =>
5 4 12
1 1.5 3 2.5 4 1.0
2 2.5 3 2.5
1 1.5 3 2.5 4 1.0
2 2.5 3 2.5
2 2.5 3 2.5
Convert that to Harwell-Boeing form.
mat2harbo.pl input --title "matrix format conversion" --id "sample" --numform 10f8.4
Output =>
matrix format conversion sample
#
rra 5 4 12 0
(10i8) (10i8) (10f8.4) (10f8.4)
1 3 6 11 13
1 3 2 4 5 1 2 3 4 5
1 3
1.5000 1.5000 2.5000 2.5000 2.5000 2.5000 2.5000 2.5000 2.5000 2.5000
1.0000 1.0000
The Harwell Boeing format stores data in 80 columns. The numform 10f8.4 says that there should be 10 numbers per line, each occupying 8 character positions, with 4 digits to the right of the decimal point.
See http://math.nist.gov/MatrixMarket/formats.html#hb for a detailed explanation of Harwell Boeing format.
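The MfD.F layout can be illustrated with a short Python sketch. The helper format_reals is our own and is not code taken from mat2harbo.pl. It right-justifies each value in a D-character field with F fractional digits and starts a new line after every M values. Here it reproduces the values block of the example above, although the exact spacing written by mat2harbo.pl may differ.

```python
def format_reals(values, per_line, width, frac):
    """Lay out values in MfD.F style.

    per_line = M values per line, each right-justified in width = D
    characters with frac = F digits after the decimal point.
    """
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("".join(f"{v:{width}.{frac}f}" for v in chunk))
    return lines


# Values block of the example above, formatted as 10f8.4.
values = [1.5, 1.5] + [2.5] * 8 + [1.0, 1.0]
for line in format_reals(values, 10, 8, 4):
    print(line)
```

Python treats a format width as a minimum. A value too wide for D characters therefore widens its field, whereas Fortran would print asterisks. A stricter writer would need to check for that case.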
Type mat2harbo.pl --help for a quick summary of options.
DESCRIPTION
Converts a sparse matrix in SenseClusters format to Harwell-Boeing (HB) sparse format, which is the format required by SVDPACKC. Optionally, this program also creates the lap2 file, which provides parameter settings for SVDPACKC.
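The core of the conversion is moving row-ordered sparse data into column order. The Python sketch below illustrates this under our own assumptions and is not the actual mat2harbo.pl code. It assumes each row is given as a list of (column, value) pairs with 1-based columns, and the function name rows_to_hb_blocks is hypothetical. It builds the three HB data blocks (column pointers, row indices, values) and reproduces the SYNOPSIS example.

```python
def rows_to_hb_blocks(ncols, rows):
    """Convert row-ordered sparse rows into HB column-order blocks.

    rows[i] is a list of (col, value) pairs for row i+1 (1-based columns).
    Returns (pointers, row_indices, values), all 1-based as in HB format.
    """
    # Bucket every entry by column. Rows are visited in order, so each
    # bucket ends up sorted by row index.
    columns = [[] for _ in range(ncols)]
    for r, row in enumerate(rows, start=1):
        for c, v in row:
            columns[c - 1].append((r, v))

    pointers, row_indices, values = [1], [], []
    for c, entries in enumerate(columns, start=1):
        if not entries:
            # The format described in this document does not allow NULL columns.
            raise ValueError(f"column {c} has no non-zero values")
        for r, v in entries:
            row_indices.append(r)
            values.append(v)
        pointers.append(len(values) + 1)  # where the next column starts
    return pointers, row_indices, values


rows = [
    [(1, 1.5), (3, 2.5), (4, 1.0)],
    [(2, 2.5), (3, 2.5)],
    [(1, 1.5), (3, 2.5), (4, 1.0)],
    [(2, 2.5), (3, 2.5)],
    [(2, 2.5), (3, 2.5)],
]
ptr, rind, vals = rows_to_hb_blocks(4, rows)
print(ptr)   # [1, 3, 6, 11, 13]
print(rind)  # [1, 3, 2, 4, 5, 1, 2, 3, 4, 5, 1, 3]
```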
INPUT
Required Arguments:
MATRIX
A sparse MATRIX in SenseClusters' format that is to be converted into Harwell Boeing format.
The first line should contain exactly 3 numbers separated by blanks:
#nrows #ncols #nnz
where
#nrows = Number of rows
#ncols = Number of columns
#nnz = Total number of non-zero values
in the MATRIX.
Each line thereafter should show a row of the MATRIX in sparse format. A sparse row should be a space-separated list of pairs of numbers, where the first number is the column index of a non-zero value and the second number is the non-zero value itself that appears at that column index.
Column index counting starts from 1.
Sample MATRIX examples =>
-
5 5 15
2 9 4 9
1 6 2 5 3 7 4 8 5 6
1 4 2 5
1 7 2 6 3 7
1 9 2 8 3 9
Shows a 5 x 5 integer matrix containing a total of 15 non-zero elements. The ith line after the first line shows the non-zero elements in the ith row. For example, the 2nd line (1st row) has 2 non-zero values (both 9) at column indices 2 and 4. The 6th line (5th row) has 3 non-zero values: 9 at index 1, 8 at index 2 and 9 at index 3.
-
7 8 34
1 0.160 2 -0.059 3 1.864 5 0.724 6 -0.472 7 -0.467
2 -0.209 4 1.487 5 6.728 7 -3.085 8 1.396
1 14.594 3 -2.858 4 -0.618 6 16.510 8 -2.314
3 -0.384 5 -1.189 7 -0.155 8 0.006
1 -0.128 3 0.020 4 -0.125 8 0.039
2 0.062 3 0.058 4 0.016 5 0.057 7 0.407 8 0.015
4 0.033 6 1.377 7 0.074 8 0.994
Shows a 7 x 8 real matrix =>
7 8
0.160 -0.059 1.864 0.000 0.724 -0.472 -0.467 0.000
0.000 -0.209 0.000 1.487 6.728 0.000 -3.085 1.396
14.594 0.000 -2.858 -0.618 0.000 16.510 0.000 -2.314
0.000 0.000 -0.384 0.000 -1.189 0.000 -0.155 0.006
-0.128 0.000 0.020 -0.125 0.000 0.000 0.000 0.039
0.000 0.062 0.058 0.016 0.057 0.000 0.407 0.015
0.000 0.000 0.000 0.033 0.000 1.377 0.074 0.994
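A reader for this input format might look like the Python sketch below. The function name parse_senseclusters and the strictness of its checks are our own assumptions, not behavior documented for mat2harbo.pl. The usage lines parse the first sample matrix above.

```python
def parse_senseclusters(text):
    """Parse a SenseClusters sparse matrix.

    Returns (nrows, ncols, nnz, rows) where rows[i] is a list of
    (column, value) pairs for row i+1, with 1-based column indices.
    """
    lines = text.splitlines()
    # Header: exactly three numbers (the unpacking fails otherwise).
    nrows, ncols, nnz = (int(tok) for tok in lines[0].split())
    rows = []
    for line in lines[1:]:
        tokens = line.split()
        if len(tokens) % 2:
            raise ValueError(f"unpaired token in row {len(rows) + 1}")
        pairs = [(int(tokens[i]), float(tokens[i + 1]))
                 for i in range(0, len(tokens), 2)]
        for col, _ in pairs:
            if not 1 <= col <= ncols:
                raise ValueError(f"column index {col} outside 1..{ncols}")
        rows.append(pairs)
    if len(rows) != nrows:
        raise ValueError(f"header says {nrows} rows, found {len(rows)}")
    if sum(len(r) for r in rows) != nnz:
        raise ValueError("number of non-zero pairs does not match #nnz")
    return nrows, ncols, nnz, rows


sample = """5 5 15
2 9 4 9
1 6 2 5 3 7 4 8 5 6
1 4 2 5
1 7 2 6 3 7
1 9 2 8 3 9
"""
nrows, ncols, nnz, rows = parse_senseclusters(sample)
print(rows[4])  # [(1, 9.0), (2, 8.0), (3, 9.0)]
```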
Optional Arguments:
--title TITLE
Allows the user to specify the title of the MATRIX, which is displayed in columns 1-72 of Line1 of the output HB matrix. If --title is not specified, mat2harbo uses the MATRIX file name as the default title.
--id ID
Programs processing the HB formatted matrix can identify the matrix by the ID stored in columns 73-80 of Line1. The default ID is "harbomat". This identifier is limited to 8 characters.
--cpform CPFORM
Specifies the Column Pointer Format. Column pointers use a format of type MiN, which indicates that each line in Block1 contains M integer pointers, each occupying N character spaces. The default format is 10i8.
Note: M x N must be 80.
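The MiN rule can be checked and applied as in the Python sketch below. Both helpers are hypothetical and are not part of mat2harbo.pl. Accepting optional surrounding parentheses, as they appear on header Line4, is our own assumption. The sketch enforces M x N = 80 as stated above.

```python
import re


def parse_int_format(spec):
    """Parse an MiN format such as '10i8' or '(10i8)' into (M, N)."""
    match = re.fullmatch(r"\(?(\d+)[iI](\d+)\)?", spec.strip())
    if not match:
        raise ValueError(f"not an MiN format: {spec!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if m * n != 80:
        raise ValueError(f"{spec}: M x N = {m * n}, expected 80")
    return m, n


def format_ints(values, spec):
    """Right-justify each integer in N characters, M integers per line."""
    m, n = parse_int_format(spec)
    return ["".join(f"{v:{n}d}" for v in values[i:i + m])
            for i in range(0, len(values), m)]


# Column pointers from the SYNOPSIS example, in the default 10i8 format.
print(format_ints([1, 3, 6, 11, 13], "10i8"))
```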
--rpform RPFORM
Specifies the format of the row indices in Block2. It uses the same MiN type of format as --cpform.
--numform NUMFORM
Specifies the Numeric Format to represent the matrix values in Block3.
mat2harbo allows 2 numeric formats:
- 1. MiN - which means that there are M integers on each line of Block3, each occupying N character spaces (the same interpretation as for --cpform).
- 2. MfD.F - which means that there are M real numbers on each line of Block3, each occupying D character spaces in total, of which the last F digits show the fractional portion.
Thus, matrix values can be integer or real, depending on the format selected.
Default NUMFORM is (5f16.10), which uses 16 character spaces for each MATRIX value, of which the last 10 digits are the fractional part, and puts 5 such real numbers on each line.
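Telling the two NUMFORM kinds apart is a small parsing task, sketched below in Python. The name parse_numform and its return layout are hypothetical and do not come from mat2harbo.pl. The requirement that F be smaller than D is our own sanity check, based on D including the decimal point.

```python
import re


def parse_numform(spec):
    """Classify a NUMFORM string as MiN (integer) or MfD.F (real).

    Returns (kind, per_line, width, decimals); decimals is None for MiN.
    """
    s = spec.strip().strip("()")
    m = re.fullmatch(r"(\d+)[iI](\d+)", s)
    if m:
        return "int", int(m.group(1)), int(m.group(2)), None
    m = re.fullmatch(r"(\d+)[fF](\d+)\.(\d+)", s)
    if m:
        per_line, width, decimals = (int(g) for g in m.groups())
        if decimals >= width:
            raise ValueError(f"{spec}: F must be smaller than D")
        return "real", per_line, width, decimals
    raise ValueError(f"unsupported NUMFORM: {spec!r}")


print(parse_numform("(5f16.10)"))  # ('real', 5, 16, 10)
print(parse_numform("10i8"))       # ('int', 10, 8, None)
```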
Parameter Setting Options :
The options listed in this section create the parameter file (lap2) for las2.c automatically.
--param
Creates the parameter file lap2 that can be directly used while running las2.
--k K
Sets the value of the maxprs option in the lap2 file to K, i.e. requests K singular triplets from las2. The value of K should not exceed the number of columns of the MATRIX. Default K = 300
--rf RF
Reduces the dimensions of the column space of the MATRIX by the scaling factor RF, i.e. if the MATRIX has N columns, maxprs is set to N/RF, where RF >= 1.
In other words, N/RF singular triplets are requested from las2. The default RF = 10 reduces the column space to 10% of its original dimensions.
If both --k and --rf are specified, maxprs = min(K, N/RF). Thus, by default maxprs = min(300, N/10).
--iter I
Specifies the number of iterations for las2. I, if specified, should not exceed the number of columns in the MATRIX and I should be at least as high as maxprs. Default I = min((3 * maxprs),#cols) where maxprs = min(K,N/RF).
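The defaults for maxprs and iter described above can be written as a small Python function. lap2_counts is a hypothetical name. The text does not say how N/RF is rounded, so rounding down is an assumption here.

```python
def lap2_counts(ncols, k=300, rf=10):
    """Compute (maxprs, iter) from the rules for --k, --rf and --iter.

    maxprs = min(K, N/RF) and iter = min(3 * maxprs, N), where N = ncols.
    N/RF is rounded down here; that rounding is an assumption.
    """
    if rf < 1:
        raise ValueError("RF must be >= 1")
    maxprs = min(k, int(ncols // rf))
    iterations = min(3 * maxprs, ncols)  # never below maxprs, since maxprs <= N
    return maxprs, iterations


print(lap2_counts(5000))  # (300, 900)
print(lap2_counts(1000))  # (100, 300)
```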
Help on setting parameters in file las2.h
The header file las2.h in SVDPACKC specifies values of various constants for las2. This section provides some guidelines on setting these constants for using SenseClusters. Please note that the version of SVDPACKC found in /External has been modified with the settings as described below.
NMAX
Specifies the maximum possible number of columns in the matrix given to las2. las2.h initially has a value of NMAX = 3000, which allows a maximum of 3000 columns. However, we have found this default is too small for many of our experiments, so we recommend setting NMAX much higher. We routinely use a value of 30,000, and will assume that the user has reset NMAX in las2.h to this value in the rest of this discussion.
In general, this value should be higher than NCOLS, shown as the 3rd field on the 3rd line of the output of mat2harbo.pl.
NZMAX
Specifies the maximum possible number of non-zero values in the matrix. Initially the settings in las2.h have NZMAX = 100000. However, again we have found this to be too small. If the user sets NMAX to 30,000, and if we assume a 30,000 x 30,000 matrix is approximately 1% dense, NZMAX could be set to 9,000,000 (30,000 x 30,000 / 100). This is the value we routinely use, and we will assume that the user has reset NZMAX to this value in the rest of this discussion.
The user can check the exact number of non-zero values (NNZ) in their matrix in the 4th field of line 3 of the output matrix displayed by mat2harbo.pl, and then set NZMAX to something higher than that.
LMTNW
This specifies the maximum total memory to be allocated by las2. The initial setting of LMTNW in las2.h is 600000; however, we find that this is often too small. In general, the size of LMTNW is determined by the values you set NMAX and NZMAX to. LMTNW should be at least as large as:
LMTNW = (6*NMAX + 4*NMAX + 1 + NMAX*NMAX)
mat2harbo.pl assumes that NMAX has been reset to 30,000 and that NZMAX is set to 9,000,000. Thus,
LMTNW = ((6 * 30,000) + (4 * 30,000) + 1 + (30,000 * 30,000))
This leads to a new value for LMTNW of 900,300,001, which is equivalent to a maximum working memory size of approximately 1 GB. We have found this size to be more than adequate for computing the SVD of a 25,000 x 25,000 matrix.
mat2harbo.pl will show an advisory message indicating the minimum value to which LMTNW should be set, and will issue a warning message if the actual size needed for the user's matrix exceeds 900,300,001 (approx. 1 GB).
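The arithmetic behind this advisory is sketched below in Python, using the formula as evaluated in the worked computation for NMAX = 30,000. The helper names are hypothetical. Substituting the matrix's own column count for NMAX is our assumption about how the minimum is computed, not a description of the real mat2harbo.pl code.

```python
LMTNW_LIMIT = 900_300_001  # value derived above for NMAX = 30,000


def lmtnw_bound(n):
    """Minimum LMTNW for n columns: 6n + 4n + 1 + n*n."""
    return 6 * n + 4 * n + 1 + n * n


def lmtnw_advice(ncols):
    """Advisory text; using the matrix's column count for n is an assumption."""
    need = lmtnw_bound(ncols)
    msg = f"LMTNW should be at least {need:,}"
    if need > LMTNW_LIMIT:
        msg += f"; this exceeds {LMTNW_LIMIT:,}, so LMTNW in las2.h must be raised"
    return msg


print(lmtnw_bound(30_000))   # 900300001
print(lmtnw_advice(25_000))  # LMTNW should be at least 625,250,001
```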
Memory is dynamically allocated by las2 depending on the size of the input matrix, regardless of the value of LMTNW. In short, LMTNW specifies an upper limit on memory consumption, while the actual consumption depends on the size of the matrix. Hence, LMTNW is not the amount of memory that las2 will *always* consume; rather, it is an upper limit that could be consumed if necessary.
If las2 fails because these parameters in las2.h are set too low, the output file lao2 will contain an error message suggesting that the matrix is too large, or something similar. The user is advised to check the 3rd line of the Harwell-Boeing matrix (as produced by this program) that is given to las2. Check whether NCOLS, shown in the 3rd field of line 3 of the HB matrix, exceeds NMAX. If so, increase NMAX to something higher than NCOLS. If not, check whether NNZ, shown in the 4th field of line 3 of the HB matrix, exceeds NZMAX in las2.h; if so, increase NZMAX. If not, increase LMTNW to something higher than (6*NMAX + 4*NMAX + 1 + NMAX*NMAX), or simply keep increasing it, without too much computation, until las2 succeeds :-)
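The checklist above can be expressed as a small Python function. The name diagnose is hypothetical, and the defaults assume the recommended settings NMAX = 30,000 and NZMAX = 9,000,000. It reads the 3rd header line of the HB file, where the 3rd field is NCOLS and the 4th is NNZ, and reports which las2.h constant to raise first.

```python
def diagnose(hb_line3, nmax=30_000, nzmax=9_000_000):
    """Suggest which las2.h constant to raise, following the checklist.

    hb_line3 is the 3rd header line of the HB file, e.g. 'rra 5 4 12 0'.
    """
    fields = hb_line3.split()
    ncols, nnz = int(fields[2]), int(fields[3])
    if ncols > nmax:
        return f"increase NMAX above {ncols}"
    if nnz > nzmax:
        return f"increase NZMAX above {nnz}"
    return f"increase LMTNW above {6 * nmax + 4 * nmax + 1 + nmax * nmax}"


print(diagnose("rra 5 4 12 0"))  # increase LMTNW above 900300001
```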
Another problem that a user might notice is that las2 sometimes runs for a very long time, possibly more than a few days. In such cases, the user is advised to restart las2 after reducing the values of the parameters 'maxprs' and 'iter' in the parameter file lap2. Specifically, the 2nd parameter in lap2 is iter and the 3rd is maxprs. Remember that iter has to be >= maxprs.
Other Options :
--help
Displays this message.
--version
Displays the version information.
Harwell Boeing Format
Header Section
Line1 (Title[72], Id[8])
Line2 Skipped (as SVDPack ignores this line)
Line3 (Type[3], 11x, Nrows[14], Ncols[14], NNZ[14], Nrhs[14])
where Type[3] is a 3 Character Field in which
- 1. char[1] =
r for Real matrix,
c for Complex matrix,
p for Pattern matrix
- 2. char[2] =
s for Symmetric matrix,
u for Unsymmetric matrix,
h for Hermitian matrix (Aij = Aji*, where Aji* is the complex conjugate of Aji),
z for Skew-symmetric matrix,
r for Rectangular matrix
- 3. char[3] =
a for Assembled matrix,
e for Elemental (unassembled finite element) matrix
Nrows = Number of Rows
Ncols = Number of Columns
NNZ = Number of Non-Zero Elements
Nrhs = Number of Right-Hand Sides (not used in SenseClusters)
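A 3-character type code such as the "rra" written by mat2harbo.pl can be decoded with a lookup table, as in the Python sketch below. The tables follow the Harwell-Boeing specification (the MatrixMarket page cited in the SYNOPSIS). The function decode_type is a hypothetical helper.

```python
TYPE_CODES = (
    {"r": "real", "c": "complex", "p": "pattern"},
    {"s": "symmetric", "u": "unsymmetric", "h": "hermitian",
     "z": "skew-symmetric", "r": "rectangular"},
    {"a": "assembled", "e": "elemental"},
)


def decode_type(code):
    """Expand a 3-letter HB matrix type such as 'rra'."""
    if len(code) != 3:
        raise ValueError(f"matrix type must be 3 characters: {code!r}")
    try:
        return tuple(table[ch] for table, ch in zip(TYPE_CODES, code.lower()))
    except KeyError as err:
        raise ValueError(f"unknown type character {err.args[0]!r} in {code!r}") from None


print(decode_type("rra"))  # ('real', 'rectangular', 'assembled')
```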
Line4
(Pointer_Format[16], Row_index_Format[16], Numeric_Value_Format[20], RHS_Format[20])
Pointers and Row Indices can use an MiN format, which specifies that there are M integers on each line, each represented with N digits. (M x N must be 80, as this format only supports a maximum line width of 80 characters.)
Numeric Values can use either the MiN format, with the same interpretation of M and N as above, or the MfD.F format, which specifies that there are M real numbers on each line, each occupying D character spaces in total, of which the last F digits show the fractional part.
Note: D is the total space used to represent a number, including the decimal point and the +/- sign, if any.
The above 4 Lines form the Header of the HB sparse matrix.
Data Section
This section contains 3 blocks which contain the non-zero values in the matrix along with their row and column index information.
*************************************************************************
NON-ZERO ENTRIES ARE STORED IN COLUMN ORDER !!!
*************************************************************************
The data section thus consists of 3 blocks:
BLOCK1 POINTERS
The first block is an array whose entries show the indices (in block3) of the leading non-zero value of every column.
e.g. If a given matrix is
4 6
2 3 0 0 0 1
0 2 0 1 2 0
0 0 2 4 1 0
1 1 0 0 5 0
Then the first block will contain the pointers
[1 3 6 7 9 12 13]
This shows that
The first column begins at the 1st non-zero entry (2).
The second column begins at the 3rd non-zero entry (3) [in COLUMN ORDER].
The third column begins at the 6th non-zero entry (2).
The fourth column begins at the 7th non-zero entry (1), and so on ...
*************************************************************************
NULL columns (having no non-zero elements) are not allowed.
*************************************************************************
Note: The column pointers start at 1.
The last entry in block1 is an extra pointer pointing to one location past the last non-zero entry. So the last entry in block1 will always be #nnz + 1 (where #nnz = total number of non-zero entries).
BLOCK2 ROW_INDICES
This block stores the row indices of the non-zero matrix entries in column order.
For above matrix, this block will look like
[1 4 1 2 4 3 2 3 2 3 4 1]
which shows that
The 1st non-zero entry is in the 1st row.
The 2nd non-zero entry is in the 4th row.
The 3rd non-zero entry is in the 1st row, and so on ...
Note: Row indices start from 1.
BLOCK3 VALUES
This block contains the actual non-zero values from the matrix in column order.
Thus, the block3 for the above shown matrix will look like
[2 1 3 2 1 2 1 4 2 1 5 1]
General Observations:
- 1. The length(block2)=length(block3) and each is equal to the number of non-zero entries in the matrix.
- 2. The length(block1) = #columns of matrix + 1. Each column has an entry in block1 that shows the position of its leading non-zero element (NULL columns are not allowed), and the +1 is the extra pointer pointing to the location after the last non-zero entry.
- 3. The column pointers in block1 are also the pointers to block3 entries where the leading(first) non-zero entry of each column is located.
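The three blocks and the observations above can be checked mechanically. The Python sketch below uses dense_to_hb, a hypothetical helper that is not part of mat2harbo.pl. It scans a dense matrix column by column, records 1-based row indices and values, and appends a pointer after each column. Applied to the 4 x 6 example matrix, it reproduces the blocks listed above.

```python
def dense_to_hb(matrix):
    """Return (pointers, row_indices, values) for a dense matrix, 1-based, column order."""
    nrows, ncols = len(matrix), len(matrix[0])
    pointers, row_indices, values = [1], [], []
    for c in range(ncols):
        for r in range(nrows):
            v = matrix[r][c]
            if v != 0:
                row_indices.append(r + 1)
                values.append(v)
        if pointers[-1] == len(values) + 1:  # nothing added for this column
            raise ValueError(f"column {c + 1} is a NULL column")
        pointers.append(len(values) + 1)
    return pointers, row_indices, values


example = [
    [2, 3, 0, 0, 0, 1],
    [0, 2, 0, 1, 2, 0],
    [0, 0, 2, 4, 1, 0],
    [1, 1, 0, 0, 5, 0],
]
ptr, rind, vals = dense_to_hb(example)
print(ptr)   # [1, 3, 6, 7, 9, 12, 13]
print(rind)  # [1, 4, 1, 2, 4, 3, 2, 3, 2, 3, 4, 1]
print(vals)  # [2, 1, 3, 2, 1, 2, 1, 4, 2, 1, 5, 1]
```

The output also shows the general observations: len(rind) == len(vals) == #nnz, len(ptr) == #columns + 1, and ptr[-1] == #nnz + 1.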
Sample Output
matrix.dat harbomat
#
rra 4 6 11 0
(10i8) (10i8) (8f10.3) (8f10.3)
1 3 5 6 7 10 12
2 3 2 3 4 1 1 2 3 3
4
1.000 1.000 2.000 4.000 1.000 2.000 1.000 2.000
2.000 3.000 1.000
Shows the HB format for a 4 x 6 matrix :
4 6
0 0 0 2 1 0
1 2 0 0 2 0
1 4 0 0 2 3
0 0 1 0 0 1
AUTHORS
Amruta Purandare, University of Pittsburgh
Ted Pedersen, University of Minnesota, Duluth tpederse at d.umn.edu
COPYRIGHT
Copyright (c) 2003-2008, Amruta Purandare and Ted Pedersen
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.