Parallel Mining for High Utility Itemsets Mining by Efficient Data Structure
Abstract
Mining high utility itemsets in transaction databases is an important data mining task that is widely applied in many areas. Many algorithms have been proposed recently, but most of them must first generate candidate itemsets by overestimating their utility and then compute the exact utility of each candidate. As a result, the number of candidate itemsets is much larger than the actual number of high utility itemsets. In this paper, we introduce the Retail Transaction-Weighted Utility (RTWU) structure and propose two algorithms: the EAHUI-Miner algorithm and the parallel PEAHUI-Miner algorithm. We evaluated both experimentally against EFIM and FHM, the two most efficient existing algorithms. The results show that our algorithms perform better on sparse datasets.
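The gap between candidates and true high utility itemsets can be illustrated with a minimal sketch. It assumes the standard transaction-weighted utility (TWU) upper bound used by two-phase approaches; it is not the RTWU structure or the EAHUI-Miner algorithm proposed in this paper. The toy database, the profit table, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(t):
    """Total utility of all items in one transaction."""
    return sum(q * profit[i] for i, q in t.items())


def utility(itemset, db):
    """Exact utility: summed over the transactions that contain the whole itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def twu(itemset, db):
    """Transaction-weighted utility: an upper bound on the exact utility."""
    return sum(transaction_utility(t) for t in db if all(i in t for i in itemset))


def compare(db, min_util):
    """Return (candidates kept by the TWU bound, itemsets that are truly high utility)."""
    items = sorted({i for t in db for i in t})
    itemsets = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    candidates = [s for s in itemsets if twu(s, db) >= min_util]
    high_utility = [s for s in candidates if utility(s, db) >= min_util]
    return candidates, high_utility


candidates, high_utility = compare(transactions, min_util=12)
print(len(candidates), "candidates,", len(high_utility), "high utility itemsets")
```

With a minimum utility of 12, the bound keeps six candidates, of which only four are high utility itemsets; the exact utility of the other two must still be computed before they can be discarded.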
R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proceedings of the 20th International Conference on Very Large Data Bases, 1994, pp. 487–499.
W. Song, Y. Liu, and J. Li, “Vertical mining for high utility itemsets,” in IEEE International Conference on Granular Computing (GrC). IEEE, 2012, pp. 429–434.
G.-C. Lan, T.-P. Hong, and V. S. Tseng, “An efficient projection-based indexing approach for mining high utility itemsets,” Knowledge and Information Systems, vol. 38, no. 1, pp. 85–107, 2014.
D. H. Phong and N. M. Hung, “Một mô hình hiệu quả khai phá tập mục lợi ích cao” [An efficient model for mining high utility itemsets], Research and Development on Information and Communication, pp. 26–36, Jun. 2015.
V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu, “UP-Growth: an efficient algorithm for high utility itemset mining,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2010, pp. 253–262.
S. Dawar and V. Goyal, “UP-Hist tree: An efficient data structure for mining high utility patterns from transaction databases,” in Proceedings of the 19th International Database Engineering & Applications Symposium. ACM, 2015, pp. 56–61.
M. Liu and J. Qu, “Mining high utility itemsets without candidate generation,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 2012, pp. 55–64.
S. Zida, P. Fournier-Viger, J. C.-W. Lin, C.-W. Wu, and V. S. Tseng, “EFIM: a highly efficient algorithm for high-utility itemset mining,” in Mexican International Conference on Artificial Intelligence. Springer, 2015, pp. 530–546.
P. Fournier-Viger, C.-W. Wu, S. Zida, and V. S. Tseng, “FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning,” in International Symposium on Methodologies for Intelligent Systems. Springer, 2014, pp. 83–92.
Y. Liu, W.-K. Liao, and A. Choudhary, “A two-phase algorithm for fast discovery of high utility itemsets,” in Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2005, pp. 689–695.
Philippe Fournier-Viger’s site. http://www.philippe-fournierviger.com.