Abstract
Closed Itemset mining is a major task both in Data Mining and Formal
Concept Analysis. It is an efficient way to determine patterns which are
hidden in raw data. These patterns may be expressed as association rules
which capture relations between variables in databases. In the case of Formal
Concept Analysis, Closed Itemsets form the basis of formal concepts. A
concept consists of an extent and intent. It turns out that the intent is
exactly a closed itemset whose support is the cardinality of extent. With
the increasing application of Data Mining and Formal Concept Analysis in
knowledge discovery, Closed Itemset mining is fast becoming an attractive
research area.
While many prominent algorithms have been developed to mine closed
itemsets in both Formal Concept Analysis and Frequent Closed Itemset mining
area, they are typically unsuitable for distributed implementation. The
root cause is that the theoretical foundation of closed itemset mining is
based on a centralized data representation. By examining the features of
closed itemset, we propose distributed closed itemset computing theories for
Formal Concept Analysis and Frequent Closed Itemset mining in a distributed
environment. Additionally, taking the MapReduce framework as our
inspiration we propose a general distributed approach for performing closed
itemset mining. We then provide representative exemplars of how the classic
centralized algorithms in Formal Concept Analysis and Frequent Closed
Itemset mining can be implemented in a distributed fashion using our methodology
and present a family of MR* algorithms, where the abbreviation
stands for MapReduce and * stands for the underlying algorithm that has
been adapted. To analyse the factors which impact a distributed algorithm's
performance, we compare our MR* algorithms with the state-of-the-art. Experiments
conducted on real datasets demonstrate that the MR* algorithms
for Formal Concept Analysis are efficient and scalable. The MR* algorithm
and theory for Frequent Closed Itemset mining of this work set the scene for
future developments. We analyze our experimental results and discuss the
bottlenecks which we will focus on in future work.
Original language | English |
---|---|
Awarding Institution | |
Supervisors/Advisors |
|
Publication status | Unpublished - 2011 |
Keywords
- Distributed algorithms