"FPGA-coprocessor Enhanced Ant Colony Systems Data Mining"
Jason C. Isaacs and Simon Y. Foo
Florida A&M University - Florida State University
Data mining has recently become a popular research topic. The increase in the use of computers has resulted in an explosion of information that can be used to find hidden knowledge. The term data mining was coined in 1995 by Evangelos Simoudis of IBM, who defined it as: “The process of extracting previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.” Data mining spans many disciplines, including computer science, machine learning, artificial intelligence, parallel algorithms, pattern recognition, and databases. Its core can be traced back to classical statistics, artificial intelligence, and machine learning. Statistics is the foundation of most of the technologies on which data mining is built, and includes concepts such as regression, variance, and discriminant analysis.
We are developing new computation, simulation, and data analysis methods through the exploration of Ant Colony System (ACS) engineering. Our attempt to create a full-function data mining system using bio-inspired semi-autonomous agents takes advantage of computation-saving innovations such as context-reconfigurable hardware and fast data preprocessing algorithms. The major feature of this system is that the mining algorithm evolves through the use of genetic algorithms combined with ACS, which lets the algorithm adapt to its specific environment. The expected result is an evolvable ACS, implemented on an FPGA, that provides solution sets for data mining problems. Our target device is a PCI Virtex II development board.
The focus of this paper is to analyze an ACS clustering technique, a subset of the ACS data mining described above, specifically pattern classification over various datasets. The major steps of the tests are as follows: (1) Data collection - considering size, feature space, and the number of classes. (2) Data preprocessing - removing noise, handling missing data, and normalizing and rescaling the data. (3) Data compression - using PCA. (4) Pattern classification/clustering - using ACS. Steps 3 and 4 are the ones implemented in hardware.
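Steps 2 and 3 can be illustrated with a minimal software sketch. The abstract does not say which normalization or which PCA formulation is used, so the z-score rescaling, the SVD-based PCA, the function names `standardize` and `pca_compress`, and the synthetic data below are all assumptions made for illustration. This is not the hardware implementation described here.

```python
import numpy as np

def standardize(X):
    """Rescale each feature to zero mean and unit variance (assumed form of step 2)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled to avoid division by zero
    return (X - mean) / std

def pca_compress(X, n_components):
    """Project the rows of X onto the leading principal components (step 3)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic stand-in for a small dataset: 150 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) * np.array([5.0, 2.0, 1.0, 0.1])

Xs = standardize(X)
Z, ratio = pca_compress(Xs, n_components=2)
print(Z.shape, ratio)
```

The compressed matrix `Z` is what a clustering stage such as step 4 would then receive. Reducing the feature count before clustering shrinks the per-sample work, which is presumably part of the appeal of doing both steps in hardware.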
Five datasets were chosen from the UCI Machine Learning Repository. Three are two-class datasets from the Wisconsin Breast Cancer Databases: the Original database (WDBC-9), the New diagnostic database (WDBC-30), and the New prognostic database (WPBC). The other two are three-class datasets: the Wine Recognition database (WINE) and the Iris Plant database (IRIS).
KEYWORDS: Data Mining, Ant Colony Systems, FPGA Co-Processor
2005 MAPLD International Conference Home Page