International Journal of Engineering
Trends and Technology

Research Article | Open Access

Volume 44 | Number 2 | Year 2017 | Article Id. IJETT-V44P241 | DOI : https://doi.org/10.14445/22315381/IJETT-V44P241

Review of Web Clustering Algorithms and Evaluation


Sarika, Mukesh Rawat

Citation :

Sarika, Mukesh Rawat, "Review of Web Clustering Algorithms and Evaluation," International Journal of Engineering Trends and Technology (IJETT), vol. 44, no. 2, pp. 211-214, 2017. Crossref, https://doi.org/10.14445/22315381/IJETT-V44P241

Abstract

Clustering is the process of dividing a set of data objects into a set of meaningful sub-classes, called clusters. Clustering finds groups of data objects that are similar to one another in some sense: the members of a cluster are more similar to each other than they are to members of other clusters. The goal of clustering is to find high-quality clusters, such that the inter-cluster similarity is low and the intra-cluster similarity is high. Clustering can be done by various methods, such as hierarchical, partitioning, density-based, and grid-based methods. Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. It generally falls into two types. Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. The purpose of clustering is to group the data in a massive data set and transform it into an understandable form for further use. Clustering is an important task in data analysis and data mining applications.
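
The agglomerative ("bottom-up") approach described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest points), neither of which the abstract specifies, and the names single_linkage and agglomerative are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points (assumed linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    # Bottom-up start: every observation begins in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i (i < j, so deleting j keeps i valid).
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Illustrative data only: two visually separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
result = agglomerative(points, k=2)
print(result)
```

Stopping at k clusters is one possible choice. Recording every merge instead would yield the full hierarchy, and a divisive method would run in the opposite direction, starting from one cluster and splitting recursively.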

Keywords

Clustering, Hierarchical clustering, Sub-classes, Agglomerative hierarchical clustering, Divisive hierarchical clustering

