minimum k union algorithm implementation

Author:Staal A. Vinterbo
Copyright:2009 Staal A. Vinterbo
Version:generated for 0.01

Module Synopsis

Let m be a list of lists representation of a collection of sets, and let k be a positive integer. Then:

from minkunion import minkunion
d = minkunion(m, k)

produces a list of elements that is an approximation of the minimum union of k elements from m.

If we let X be list of lists, each of which represents a row in a data matrix, and we let k be a positive integer, then:

from minkunion import kambiguate
Y = kambiguate(X, k)

produces a k-ambiguated version of X in Y by cell-suppression. This means that each list in Y matches at least k lists in X if we interpret '?' as representing any possible value.

Standalone Invocation

Run as a standalone program this module will perform k-ambiguation on the input data set.

Note that the rectangular data set required as input is assumed to contain attribute names on the first line. Each subsequent line contains the attribute values for an object. The last attribute, i.e., last column, is taken to contain class labels.

Try (assuming a shell command line):

$ python -h
$ python -e | less
$ python dataset.txt | less
$ python -k 3 dataset.txt | less

To generate a html version of this short explanation:

$ python -e | rst2html > explanation.html

rst2html is a part of the python docutils package


The minimum k union of a collection of sets is the set of smallest cardinality obtainable by taking the union of k members of the collection.

This module contains a simple greedy algorithm that iteratively selects the set that contributes the least to the union. This algorithm is used to implement k-ambiguation by cell suppression.

Details in:

Vinterbo, S. A.
A Stab at Approximating Minimum Subadditive Join
Algorithms and Data Structures: 10th International Workshop,
WADS 2007, Halifax, Canada, August 15-17, 2007
Proceedings, Springer Verlag, 2007