IJMTES – CLASSIFICATION OF IMBALANCE PROBLEM BY MWMOTE AND SSO

Journal Title : International Journal of Modern Trends in Engineering and Science

Author’s Name : S.Jayasree | A Alice Gavya

Volume 02 Issue 05  Year 2015

ISSN no: 2348-3121

Page no: 1-4

Abstract: In this paper, we develop a resampling ensemble algorithm for classification problems on imbalanced datasets. In this method, small classes are oversampled and large classes are undersampled. The resampling scale is determined by the ratio between the sizes of the smallest and largest classes. Oversampling of the “small” classes is done with the MWMOTE (Majority Weighted Minority Oversampling Technique) method, and undersampling of the “large” classes is performed with the SSO (Sample Subset Optimization) technique. Our aim is to reduce time complexity while improving the accuracy of the classification result.

Keywords— Imbalanced classification, Resampling algorithm, SMOTE, MWMOTE, SSO
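
The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: plain random duplication stands in for MWMOTE, plain random subset selection stands in for SSO, and the rule that turns the class-size ratio into a target size (the geometric mean of the smallest and largest class sizes) is a hypothetical choice made only for illustration. The function name `resample` and its parameters are likewise assumptions.

```python
import random
from collections import Counter


def resample(samples, labels, seed=0):
    """Balance classes by random over- and undersampling (illustrative only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    n_min, n_max = min(counts.values()), max(counts.values())

    # Assumed definition: ratio of the smallest class size to the largest.
    ratio = n_min / n_max

    # Hypothetical scale rule: n_max * sqrt(ratio) is the geometric mean of
    # n_min and n_max, so the target always lies between the two extremes.
    target = round(n_max * ratio ** 0.5)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) > target:
            # "Large" class: keep a random subset (stand-in for SSO).
            chosen = rng.sample(xs, target)
        else:
            # "Small" class: duplicate random members (stand-in for MWMOTE,
            # which would synthesize new minority samples instead).
            extra = [rng.choice(xs) for _ in range(target - len(xs))]
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y, ratio


if __name__ == "__main__":
    data = list(range(110))
    classes = ["maj"] * 100 + ["min"] * 10
    new_x, new_y, r = resample(data, classes)
    print(Counter(new_y), r)
```

With 100 majority and 10 minority samples, this sketch brings both classes to 32 samples each. A real implementation would replace the two stand-in steps with the MWMOTE and SSO procedures named in the abstract.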

