Development and implementation of a methodology for intellectual data analysis using Bayesian network theory and regression analysis
A new two-stage method for intellectual data analysis is proposed that combines Bayesian network theory and regression analysis. The method relies on two sets of mathematical techniques. The first is used to construct the topology of a Bayesian network and to perform probabilistic inference; the inference results then support decision making based on forecast estimates. The second is used to build a regression model with a logistic link function, which serves as the basis for forecast estimation. The modeling results and forecast estimates are combined into an integrated forecast estimate that provides the decision maker with statistically substantiated recommendations on the further development of the process under study. To build the model in the form of a Bayesian network, mutual information is used to select statistically significant process variables, and the network topology is constructed using a functional based on the minimum description length. The weight of evidence method is used to select optimal threshold state values. An optimal regression model is constructed by sequential forward inclusion of independent variables using the value of . To calibrate the combined model, the data sample is divided into training, control and testing sets, stratified by the analyzed variable.
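As an illustration of two of the techniques named above, the following minimal Python sketch computes the empirical mutual information between a discrete variable and a target, and the weight of evidence of each state of a binned variable. It is not the DMTwoStage implementation (which is written in SAS/IML); the function names, the 0.5 additive smoothing, and the sign convention ln(share of non-events / share of events) are assumptions made for the example only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * ln( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def weight_of_evidence(bins, target, smoothing=0.5):
    """Weight of evidence per bin for a binary target (1 = event, 0 = non-event).

    Assumed convention: WoE = ln(share of non-events / share of events).
    The additive smoothing is an assumption that avoids ln(0) for empty cells.
    """
    events = Counter(b for b, t in zip(bins, target) if t == 1)
    non_events = Counter(b for b, t in zip(bins, target) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    woe = {}
    for b in set(bins):
        share_non_events = (non_events[b] + smoothing) / total_non_events
        share_events = (events[b] + smoothing) / total_events
        woe[b] = math.log(share_non_events / share_events)
    return woe


if __name__ == "__main__":
    # Hypothetical toy data: a binned variable and a binary target.
    state = ["low", "low", "high", "high", "high", "low"]
    default = [0, 0, 1, 1, 0, 1]
    print(mutual_information(state, default))
    print(weight_of_evidence(state, default))
```

Under these assumptions, variables whose mutual information with the target exceeds a chosen threshold would be kept as candidate network nodes, and the weight of evidence values would guide how adjacent states are merged or where thresholds are placed.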
An original architecture for a decision support system for modeling and forecasting was developed on the basis of the proposed methods and algorithms. The two-stage method has been implemented as the program module DMTwoStage in the SAS/IML programming language and can be run on any platform of the SAS family. Thanks to the open modular architecture of the decision support system, the use of the universal SAS platform makes it possible to modify the developed computing procedures quickly. A set of practical problems has been solved with the DMTwoStage module, yielding practically significant results in the form of mathematical models and combined forecast estimates based on these models.
To verify the correctness of the developed theory, a set of analytical procedures and computational experiments was carried out using substantial volumes of statistical data and expert estimates.