Mining Sequential Access Pattern with Low Support From Large Pre-Processed Web Logs
Abstract
Problem statement: To find frequently occurring Sequential patterns from web log file on the basis of minimum support provided. We introduced an efficient strategy for discovering Web usage mining is the application of sequential pattern mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Approach: The approaches adopt a divide-and conquer pattern-growth principle. Our proposed method combined tree projection and prefix growth features from pattern-growth category with position coded feature from early-pruning category, all of these features are key characteristics of their respective categories, so we consider our proposed method as a pattern growth, early-pruning hybrid algorithm. Results: Our proposed Hybrid algorithm eliminated the need to store numerous intermediate WAP trees during mining. Since only the original tree was stored, it drastically cuts off huge memory access costs, which may include disk I/O cost in a virtual memory environment, especially when mining very long sequences with millions of records. Conclusion: An attempt had been made to our approach for improving efficiency. Our proposed method totally eliminates reconstructions of intermediate WAP-trees during mining and considerably reduces execution time.
DOI: https://doi.org/10.3844/jcssp.2010.1293.1300
Copyright: © 2010 S. Vijayalakshmi and V. Mohan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- 3,651 Views
- 2,949 Downloads
- 5 Citations
Download
Keywords
- Data mining
- sequential pattern mining
- frequent pattern mining
- web usage mining
- hybrid algorithm
- WAP-tree