METHOD FOR MINING HIGH-UTILITY PATTERNS IN TRANSACTION STREAM DATA BASED ON LINKED LIST STRUCTURE
Abstract
Mining high-utility patterns in data streams is a significant challenge in data mining. The task matters because it identifies highly profitable itemsets in transaction databases. As new transactions are continually added, new high-utility patterns emerge and the utility of previously analyzed patterns changes. This information must be updated promptly to support effective business decision-making. However, existing mining methods require considerable time to identify new patterns and update the stored information when applied to transaction stream datasets. This article proposes a new transaction stream mining method called High-Utility Stream Linked-List Mining. The method uses a linked list structure, the High-Utility Stream Linked List (HUSLL), to store information about patterns in the database. Mining and updating of transaction information are performed directly on the HUSLL structure. Experimental results show that the proposed method achieves shorter execution times than previous solutions.
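To make the problem setting concrete, the following minimal sketch uses the standard definition of itemset utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset) and shows how a single new transaction changes the set of high-utility itemsets. It is a brute-force recomputation for illustration only and is not the HUSLL structure, whose details are not given in this abstract. The profit table, the sample transactions, the threshold, and the function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical unit profits per item (illustrative values, not from the paper).
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions):
    """Sum quantity * unit profit over the transactions containing every item."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFIT[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, min_util):
    """Brute-force enumeration of all itemsets; only suitable for tiny examples."""
    items = sorted({item for tx in transactions for item in tx})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions)
            if utility >= min_util:
                result[itemset] = utility
    return result


# Each transaction maps an item to its purchased quantity.
stream = [
    {"a": 1, "b": 2},
    {"b": 3, "c": 4},
]
before = high_utility_itemsets(stream, min_util=10)

# A new transaction arrives and the full recomputation is repeated.
stream.append({"a": 2, "c": 1})
after = high_utility_itemsets(stream, min_util=10)

print(before)
print(after)
```

In this toy run, the itemsets {a} and {a, c} only reach the threshold after the third transaction arrives. Repeating the full enumeration on every arrival is the cost that an incremental structure is meant to avoid.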