Virtual Computational Chemistry Laboratory

Input data format

The program processes data in tabular form. Each data entry (one case) corresponds to one row and each variable corresponds to one column. For example, Kubinyi's set contains 5 independent (input, X1-X5) variables and one dependent (output, Y) variable, with 8 data entries.

You can copy and paste this set into the program in either of two layouts:
 
(A):

8
0  0  1  1  0   1
1  0  0  1  1   1
0  1  1  2  1   2
2 -2  2  3  1   2
0  0  1  1  1   1
1  0  1  2  1   2
1  0  0  1  1   1
0  3 -1  3  .99 2.1

(B):

8
1   0  0  1  1  0
1   1  0  0  1  1
2   0  1  1  2  1
2   2 -2  2  3  1
1   0  0  1  1  1
2   1  0  1  2  1
1   1  0  0  1  1
2.1 0  3 -1  3  .99


The position of the Y values in the two tables is:

(A): X1 X2 X3 X4 X5  Y    (Y is the last column)
(B): Y   X1 X2 X3 X4 X5   (Y is the first column)

The default layout is (A), with the dependent variable (Y) following the independent variables (X1-X5). For layout (B) you should use REVERSED=1. The first line of the data should indicate the number of rows (data entries) available for the training data set.
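
As a rough illustration of the two layouts, the following Python sketch reads a block in this format and separates the input values from the target values. The function name parse_training_block and its reversed_order argument are hypothetical. They only mimic what REVERSED=1 is described as doing above and are not the program's actual parser.

```python
# Kubinyi's set in layout (A): X1..X5 followed by Y.
TABLE_A = """8
0  0  1  1  0   1
1  0  0  1  1   1
0  1  1  2  1   2
2 -2  2  3  1   2
0  0  1  1  1   1
1  0  1  2  1   2
1  0  0  1  1   1
0  3 -1  3  .99 2.1
"""

# The same set in layout (B): Y followed by X1..X5.
TABLE_B = """8
1   0  0  1  1  0
1   1  0  0  1  1
2   0  1  1  2  1
2   2 -2  2  3  1
1   0  0  1  1  1
2   1  0  1  2  1
1   1  0  0  1  1
2.1 0  3 -1  3  .99
"""


def parse_training_block(text, reversed_order=False):
    """Return (inputs, targets) from a block whose first line is the row count.

    reversed_order is a hypothetical stand-in for REVERSED=1 (Y in the first column).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_rows = int(lines[0].split()[0])
    data_lines = lines[1:1 + n_rows]
    if len(data_lines) != n_rows:
        raise ValueError("fewer data rows than announced on the first line")

    inputs, targets = [], []
    for line in data_lines:
        values = [float(tok) for tok in line.split()]
        if reversed_order:
            targets.append(values[0])
            inputs.append(values[1:])
        else:
            targets.append(values[-1])
            inputs.append(values[:-1])
    return inputs, targets


x_a, y_a = parse_training_block(TABLE_A)
x_b, y_b = parse_training_block(TABLE_B, reversed_order=True)
```

Both calls should yield the same inputs and targets, which is the sense in which (A) and (B) describe the same data set.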

Suppose you want to use the last 2 rows as a test set. This can be done by giving the size of each set on the first line:

6 2
0  0  1  1  0   1
1  0  0  1  1   1
0  1  1  2  1   2
2 -2  2  3  1   2
0  0  1  1  1   1
1  0  1  2  1   2
1  0  0  1  1   1
0  3 -1  3  .99 2.1

The program will then know that there are two data sets. The first set is always used for training, and the second is used to test the algorithm's performance. Up to 10 sets can be added in the same way; only the first set is used to train the program.
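
The way the first line partitions the rows can be sketched in Python as follows. This is only an illustration under the assumption that the sizes on the first line are consumed in order from the top of the table; split_sets is a hypothetical helper name, not part of the program.

```python
DATA = """6 2
0  0  1  1  0   1
1  0  0  1  1   1
0  1  1  2  1   2
2 -2  2  3  1   2
0  0  1  1  1   1
1  0  1  2  1   2
1  0  0  1  1   1
0  3 -1  3  .99 2.1
"""


def split_sets(text):
    """Split the rows into consecutive sets using the sizes on the first line."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    sizes = [int(tok) for tok in rows[0]]
    body = rows[1:]
    if sum(sizes) != len(body):
        raise ValueError("set sizes do not add up to the number of data rows")

    sets, start = [], 0
    for size in sizes:
        sets.append(body[start:start + size])
        start += size
    return sets


# The first set is the training set, the second is the test set.
training, test = split_sets(DATA)
```

Here training holds the first 6 rows and test holds the last 2 rows, each row kept as a list of string tokens.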

If you do not know the target values of the test set, make the size of the test set negative on the first line and leave out the Y column in its rows:

6 -2
0  0  1  1  0   1
1  0  0  1  1   1
0  1  1  2  1   2
2 -2  2  3  1   2
0  0  1  1  1   1
1  0  1  2  1   2
1  0  0  1  1
0  3 -1  3  .99
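
A possible reading of this example, sketched in Python: a negative size on the first line marks a set whose rows carry no Y column. This interpretation is inferred from the example above, and split_with_unknown_targets is a hypothetical helper, not the program's own code.

```python
DATA = """6 -2
0  0  1  1  0   1
1  0  0  1  1   1
0  1  1  2  1   2
2 -2  2  3  1   2
0  0  1  1  1   1
1  0  1  2  1   2
1  0  0  1  1
0  3 -1  3  .99
"""


def split_with_unknown_targets(text):
    """Split rows into sets; a negative size is assumed to mean 'no target column'."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    body = rows[1:]

    sets, start = [], 0
    for tok in rows[0]:
        size = int(tok)
        count = abs(size)
        sets.append({
            "has_targets": size > 0,
            "rows": body[start:start + count],
        })
        start += count
    return sets


sets = split_with_unknown_targets(DATA)
```

With this data the training rows have 6 columns (X1-X5 and Y), while the two test rows have only the 5 input columns.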

If the data set contains names of data entries, this should be indicated by NAMES=1. An example of the same data set with names is:

8
1 0  0  1  1  0   1
999999 1  0  0  1  1   1
This_is_a_long_name 0  1  1  2  1  2
The_name_can_be 2 -2  2  3  1   2
any_character 0  0  1  1  1   1
@3$$091 1  0  1  2  1   2
but 1  0  0  1  1   1
no_space_and_tabs! 0  3 -1  3  .99 2.1

You can also see that the data do not need to be aligned in columns. The values can be separated by any number of tabs and spaces.
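
To illustrate why alignment does not matter, here is a small Python sketch that reads named rows by splitting on any run of spaces and tabs. It assumes, as in the example above, that the name is the first token of each row; parse_named_rows is a hypothetical helper and the three rows are taken from the example, with a tab added to one of them.

```python
# Three rows from the named example; "\t" inserts a tab between name and values.
NAMED = """3
1 0  0  1  1  0   1
999999 1  0  0  1  1   1
no_space_and_tabs!\t0  3 -1  3  .99 2.1
"""


def parse_named_rows(text):
    """Return (names, values); the first token of each row is taken as its name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_rows = int(lines[0].split()[0])

    names, values = [], []
    for line in lines[1:1 + n_rows]:
        tokens = line.split()  # splits on any mix of spaces and tabs
        names.append(tokens[0])
        values.append([float(tok) for tok in tokens[1:]])
    return names, values


names, values = parse_named_rows(NAMED)
```

Note that a name such as 1 or 999999 is kept as text, which is why a name must not itself contain spaces or tabs.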

See the FAQ if you have questions.

Copyright 2001 -- 2023 https://vcclab.org. All rights reserved.