Methods for organizing highly effective specialized repositories for scientific and educational purposes based on cluster computing technologies

A number of approaches and methods were developed for creating data warehouses for scientific and educational purposes. Unlike most analogues, the developed warehouses are oriented toward the analysis of unstructured data sets rather than toward document workflow. In particular, a method for designing a warehouse architecture oriented toward semantic and advanced analytical processing was developed. Ways to support very large repositories of text data using cluster computing technologies were developed. A method for the analytical processing of information in very large repositories of text data, including automatic abstracting, classification, and clustering of scientific and educational information, was created. Ways to identify and analyze the structure of text information objects, focused primarily on unstructured and semi-structured data, were created. Techniques that establish associative links between data elements of information objects were developed. A method for assessing the originality of textual scientific and educational information objects and resources was proposed. A way to visualize text warehouse data by presenting the data as a graph was created. The developed methods and techniques were implemented in an experimental prototype of a heterogeneous data warehouse. Thus, the work offers a scientific basis for realizing a new type of cluster-centric warehouse for mostly unstructured or semi-structured data, designed for use in the scientific and educational field. The proposed methods also allow the creation of tools to automate the structuring, cataloging, and semantic search of scientific and educational data.
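
The abstract does not say how the originality of a text object is assessed. Purely as an illustration of what such an assessment could look like, the sketch below scores a candidate text against a small corpus using word shingles and Jaccard similarity. This is a common generic technique swapped in for illustration, not the method of the work; the function names (shingles, jaccard, originality) and the shingle size k=3 are hypothetical choices.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def originality(candidate, corpus, k=3):
    """Score in [0, 1]: 1.0 means no shingle overlap with any corpus document.

    Assumption: originality is taken as 1 minus the highest Jaccard
    similarity between the candidate and any single corpus document.
    """
    if not corpus:
        return 1.0
    cand = shingles(candidate, k)
    max_overlap = max(jaccard(cand, shingles(doc, k)) for doc in corpus)
    return 1.0 - max_overlap
```

In a very large repository the pairwise comparison would not scale as written; on a cluster, the per-document comparisons could be distributed across nodes, or the exact Jaccard computation replaced by an approximate scheme.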
