IJMTES – IMPROVING EFFICIENCY AND ACCURACY OF FINDING BEST ATTRIBUTES TO GENERATE DECISION TREE FOR MINING DATA STREAMS

Journal Title : International Journal of Modern Trends in Engineering and Science

Author’s Name : K Rajesh Kannan | M Gokul | R Natesan | T R Lekhaa

Volume 02 Issue 06  Year 2015 

ISSN no: 2348-3121

Page no: 20-24

Abstract This paper addresses the problem of selecting the best attribute at a given node from a sample of the data, such that the choice is the same as the one that would be made if the whole data stream were considered. In data stream mining, the existing decision tree learning systems are the Hoeffding tree algorithm and the ID3 algorithm. These approaches are either not mathematically justified or are time consuming. The proposed system rests on a solid mathematical foundation and is expected to be more accurate than the existing systems. Decision tree algorithms designed for static data sets, such as ID3, cannot be applied directly to data streams and need significant modification. The main difficulty is determining the best attribute at each node, because the stream is of infinite size. Because the ID3 algorithm provides the background for the proposed method, it is briefly presented. ID3 is intended to produce non-binary trees, but it can easily be adapted to produce binary trees. The proposed system focuses only on the binary case, although all the presented methods can be adapted to non-binary trees. During the learning process, each created node Lq processes a particular subset Sq of the training data set S. If all elements of Sq belong to the same class, the node is tagged as a leaf and no split is made. Otherwise, the best attribute to split on is chosen from the attributes available at that node according to the split-measure function.
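
The node-processing step described in the abstract can be illustrated with a minimal Python sketch. It assumes binary attributes coded as 0/1 and uses information gain (the split measure conventionally associated with ID3) as the split-measure function; the abstract does not state which measure the paper uses. The function names `entropy`, `information_gain`, and `choose_split` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Entropy reduction from a binary split on attribute index `attr`.

    Assumption: every attribute value is 0 or 1.
    """
    left = [y for x, y in zip(rows, labels) if x[attr] == 0]
    right = [y for x, y in zip(rows, labels) if x[attr] == 1]
    if not left or not right:
        # The split sends every element to one side, so nothing is gained.
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def choose_split(rows, labels, attrs):
    """Return None if the node should be a leaf, else the best attribute index.

    `rows` and `labels` play the role of the subset Sq held by node Lq.
    """
    if len(set(labels)) <= 1:
        # All elements of Sq share one class: tag the node as a leaf.
        return None
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy subset Sq: attribute 0 separates the classes, attribute 1 does not.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "b", "b"]
best = choose_split(rows, labels, attrs=[0, 1])
```

This sketch works on a static subset. It does not address the stream setting, where the paper's concern is whether the attribute chosen from a finite sample matches the one that would be chosen from the whole stream.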

Keywords— Best Attribute; Data Stream; Decision Tree; ID3 Algorithm

