Read Data Mining Online

Authors: Mehmed Kantardzic

Data Mining (119 page)

The crossover does not require any modification. We take advantage of the fact that all classifiers are of equal length. Therefore, to crossover two selected parents, say Q
1
and Q
2

we generate a random crossover-position point. Suppose that we crossover after the third character in the string as marked; then the offspring are

The strength of the offspring is an average (possibly weighted) of the parents’ strengths. Now the system is ready to continue its learning process: starting another cycle, accepting further positive and negative samples from the training set, and modifying the strengths of classifiers as a measure of fitness. We can note that the training data set is included in the learning process through evaluation of schema’s strengths for each iteration. We expect that the population of classifiers converges to some rules with very high strengths.

One of the possible implementations of the previous ideas is the
GIL system
, which moves the GA closer to the symbolic level—mainly by defining specialized operators that manipulate binary strings. Previous symbolic classifiers are translated into binary strings, where for each attribute a binary string of fixed length is generated. The length is equal to the number of possible values for the given attribute. In the string, the required value is set to 1 and all others are set to 0. For example, if attribute A
1
has the value z, it is represented with the binary string 001 (0’s are for values x, y). If the value of some attribute is *, that means that all values are possible, so it is represented with value 1 in all positions of the binary string.

For our previous example, with six attributes and a total number of 17 different values for all attributes, the classifier symbolically represented as

can be transformed into the binary representation

where bars separate bit sets for each attribute. The operators of the GIL system are modeled on inductive reasoning, which includes various inductive operators such as RuleExchange, RuleCopy, RuleGeneralization, RuleSpecialization, RuleSplit, SelectorDrop, ReferenceChange, and ReferenceExtension. We discuss some of them in turn.

13.6.1 RuleExchange

The RuleExchange operator is similar to a crossover of the classical GA as it exchanges selected complex between two parent chromosomes. For example, two parents (two rules)

may produce the following offspring (new rules)

Other books

Hippomobile! by Jeff Tapia
Forest Spirit by David Laing
A Door in the River by Inger Ash Wolfe
Vengeance by Jonas Saul
Cravings by Liz Everly
Deadfall by Stephen Lodge
Fighting for Love by L.P. Dover
Gypsy Girl by Kathryn James