Scientific journal
Scientific Review. Technical Sciences
ISSN 2500-0799
ПИ №ФС77-57440

A MEASURE OF INFORMATION SIMILARITY FOR SEMISTRUCTURED INFORMATION ANALYSIS

Butakova M.A.¹, Klimanskaya E.V.¹, Yants V.I.²
¹ Rostov State Transport University
² Rostov State Building University
The paper proposes a new information measure for analyzing the similarity of semistructured documents, based on an interference-wave approach. The subject area of the research, semistructured data, is described, and examples of weakly structured documents are presented. The principles of organizing the storage of semistructured documents in databases are described, together with the tools available in existing schema-less databases and in databases with a variable data schema. The principle of interference wave vectors and of interference is set out, and a formula for calculating the measure from interference vectors is given. The process of indexing and of retrieving relevant information with this measure is described. A modification of the interference-wave information similarity measure in summary form is developed. The model is tested on an experimental database. It is found that the proposed algorithm for computing the measure has linear computational complexity, and conclusions are drawn about the applicability of the method to large databases.
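The abstract does not reproduce the formula for the measure, so the following is only a hypothetical sketch, not the authors' method. It illustrates the general idea of obtaining similarity from the interference of two superposed vectors: since |a + b|² = |a|² + |b|² + 2⟨a, b⟩, the interference term 2⟨a, b⟩ can be isolated and normalized by 2|a||b|. The bag-of-words representation and the names `term_vector` and `interference_similarity` are assumptions made for illustration. Each call makes a fixed number of passes over the two vectors, so its cost grows linearly with their size.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Hypothetical document representation: bag-of-words term counts."""
    return Counter(text.lower().split())


def interference_similarity(a: Counter, b: Counter) -> float:
    """Normalized interference term of two superposed term vectors.

    Assumption for illustration only: |a + b|^2 = |a|^2 + |b|^2 + 2<a, b>,
    so (|a + b|^2 - |a|^2 - |b|^2) / (2|a||b|) isolates the interference.
    Every step is a single pass over the vectors, i.e. linear time.
    """
    # Squared norms ("intensities") of each wave taken separately
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0  # an empty document interferes with nothing
    # Superpose the two vectors term by term (counts are positive here)
    superposed = a + b
    intensity = sum(v * v for v in superposed.values())
    # What remains after subtracting the separate intensities is 2<a, b>
    interference = intensity - sq_a - sq_b
    return interference / (2 * math.sqrt(sq_a * sq_b))


if __name__ == "__main__":
    doc1 = term_vector("semistructured document storage")
    doc2 = term_vector("semistructured document index")
    print(interference_similarity(doc1, doc2))
```

For non-negative term weights this normalized interference term coincides with cosine similarity and lies in [0, 1], which makes it a simple baseline against which a richer interference-wave measure could be compared.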