IJMTES – A REVIEW ON EFFICIENT BIG DATA ANALYTICAL ARCHITECTURE FOR RETRIEVAL OF DATA BASED ON WEB SERVER

Journal Title : International Journal of Modern Trends in Engineering and Science

Author’s Name : Prof. Pramod Deshmukh | Pankaj Patil | Yogesh Lokare | Ajay Katware

Volume 03 Issue 07 2016

ISSN no:  2348-3121

Page no: 76-80

Abstract – The cloud has become a computational and storage solution for many information-centric organizations. The problem these organizations now face is searching cloud-hosted data efficiently. An efficient framework is needed to distribute the work of searching and fetching across thousands of computers. Data in HDFS is distributed and takes a long time to retrieve. There is therefore a need to design a web server in the map phase, using the Jetty web server, which provides a fast and efficient way of searching data in the MapReduce paradigm. For real-time processing on Hadoop, a search mechanism is implemented in HDFS by creating a multi-level index in the web server, with multi-level index keys and indexing in the DataNode. The web server can be used to handle traffic throughput, and web clustering technology can improve the application's overall performance. To keep the work down, the load balancer should be able to distribute load among newly added nodes in the server.
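The two mechanisms named in the abstract, a multi-level index and a load balancer that takes newly added nodes into account, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, block identifiers, byte offsets, and node names are hypothetical, the index assumes each block holds a non-overlapping sorted key range, and the least-loaded policy is only one possible way to send work to new nodes.

```python
from bisect import bisect_right


class MultiLevelIndex:
    """Two-level index: a sparse top level selects a block,
    a dense per-block level selects the offset inside it."""

    def __init__(self, blocks):
        # blocks: {block_id: {key: byte_offset}} (hypothetical layout).
        # Assumption: key ranges of different blocks do not overlap.
        self._per_block = {b: dict(e) for b, e in blocks.items() if e}
        # Top level: smallest key of each block, sorted for binary search.
        firsts = sorted((min(e), b) for b, e in self._per_block.items())
        self._first_keys = [k for k, _ in firsts]
        self._block_ids = [b for _, b in firsts]

    def lookup(self, key):
        """Return (block_id, offset) for key, or None if it is not indexed."""
        pos = bisect_right(self._first_keys, key) - 1
        if pos < 0:
            return None  # key sorts before every indexed block
        block_id = self._block_ids[pos]
        offset = self._per_block[block_id].get(key)
        return None if offset is None else (block_id, offset)


class LeastLoadedBalancer:
    """Sends each request to the node with the fewest active requests,
    so a newly added (idle) node starts receiving work immediately."""

    def __init__(self, nodes=()):
        self._load = {n: 0 for n in nodes}

    def add_node(self, node):
        # A new node starts with zero active requests.
        self._load.setdefault(node, 0)

    def acquire(self):
        """Pick a node for one request and count the request against it."""
        if not self._load:
            raise RuntimeError("no nodes registered")
        # Ties are broken by node name to keep the choice deterministic.
        node = min(self._load, key=lambda n: (self._load[n], n))
        self._load[node] += 1
        return node

    def release(self, node):
        """Mark one request on the node as finished."""
        if self._load.get(node, 0) > 0:
            self._load[node] -= 1


if __name__ == "__main__":
    index = MultiLevelIndex({
        "blk_1": {"apple": 0, "banana": 128},
        "blk_2": {"mango": 0, "peach": 64},
    })
    print(index.lookup("banana"))

    balancer = LeastLoadedBalancer(["web1", "web2"])
    print(balancer.acquire(), balancer.acquire())
    balancer.add_node("web3")
    print(balancer.acquire())
```

In this sketch a lookup costs one binary search over the block-level keys plus one dictionary probe, rather than a scan of every block, which is the kind of saving a multi-level index is meant to provide.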

Keywords – Hadoop; MapReduce; compute cloud; Web Server; load balancing
