Topic: Bug in Feature Generation: side effects
steffen
Sr. Member
****
Posts: 376



« on: June 20, 2008, 11:12:03 AM »

Hello RapidMiner Team

I am using the latest CVS version and tried to implement a Z-transformation: calculate the mean and standard deviation from the input ExampleSet, then apply a series of RM operators by calling them from my code. While trying some preprocessing steps before my operator, I stumbled over strange behaviour in the FeatureGeneration operator, which I also use. I then reproduced my code as a process using only built-in RM operators, and the strange behaviour occurred again. Two notes regarding the following setups:
  • I had to perform the seemingly "useless" renaming because I originally wanted to use an attribute name containing a "(" within FeatureGeneration (confidence...)
  • In the following setups I used the dataset described by golf.aml, which ships with the RM distribution.
1. Here is my basic setup...which works!
Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="golf.aml"/>
    </operator>
    <operator name="Temperature->ijon" class="ChangeAttributeName">
        <parameter key="new_name" value="ijon"/>
        <parameter key="old_name" value="Temperature"/>
    </operator>
    <operator name="apply_ztrans" class="FeatureGeneration">
        <list key="functions">
          <parameter key="tichy" value="/(-(ijon,const[73.571]()),const[6.3326]())"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
    <operator name="skip_ijon" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="ijon"/>
    </operator>
    <operator name="tichy->Temperature" class="ChangeAttributeName">
        <parameter key="new_name" value="Temperature"/>
        <parameter key="old_name" value="tichy"/>
    </operator>
</operator>
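
For reference, the prefix expression /(-(ijon,const[73.571]()),const[6.3326]()) computes (ijon - 73.571) / 6.3326, i.e. a Z-transformation with a fixed mean and standard deviation. The following is a minimal Python sketch of that arithmetic, not RapidMiner code. It assumes the usual 14 Temperature values of the golf sample data (the list below is an assumption, not taken from this thread) and a population standard deviation; the helper name z_transform is made up for illustration.

```python
import statistics

# Assumed Temperature column of the golf sample data (14 examples).
temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

mu = statistics.mean(temperature)       # arithmetic mean
sigma = statistics.pstdev(temperature)  # population standard deviation (divides by n)


def z_transform(values, mean, std):
    """Return (x - mean) / std for every value, like the 'tichy' expression."""
    return [(x - mean) / std for x in values]


tichy = z_transform(temperature, mu, sigma)

print(round(mu, 3), round(sigma, 4))  # 73.571 6.3326
```

Under these assumptions the two constants in the process match the rounded mean and population standard deviation, so a correct run should give a transformed column with mean 0 and standard deviation 1.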


2. When I accidentally used the wrong attribute name (Temperature instead of ijon) within FeatureGeneration, no error message appeared, only a wrong result.
Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="golf.aml"/>
    </operator>
    <operator name="Temperature->ijon" class="ChangeAttributeName">
        <parameter key="new_name" value="ijon"/>
        <parameter key="old_name" value="Temperature"/>
    </operator>
    <operator name="apply_ztrans" class="FeatureGeneration">
        <list key="functions">
          <parameter key="tichy" value="/(-(Temperature,const[73.571]()),const[6.3326]())"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
    <operator name="skip_ijon" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="ijon"/>
    </operator>
    <operator name="tichy->Temperature" class="ChangeAttributeName">
        <parameter key="new_name" value="Temperature"/>
        <parameter key="old_name" value="tichy"/>
    </operator>
</operator>

3. Using the correct names but applying the Sorting operator beforehand produces the same wrong result as in setup 2.
Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="golf.aml"/>
    </operator>
    <operator name="sort_temperature" class="Sorting">
        <parameter key="attribute_name" value="Temperature"/>
    </operator>
    <operator name="Temperature->ijon" class="ChangeAttributeName">
        <parameter key="new_name" value="ijon"/>
        <parameter key="old_name" value="Temperature"/>
    </operator>
    <operator name="apply_ztrans" class="FeatureGeneration">
        <list key="functions">
          <parameter key="tichy" value="/(-(ijon,const[73.571]()),const[6.3326]())"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
    <operator name="skip_ijon" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="ijon"/>
    </operator>
    <operator name="tichy->Temperature" class="ChangeAttributeName">
        <parameter key="new_name" value="Temperature"/>
        <parameter key="old_name" value="tichy"/>
    </operator>
</operator>


At this point I came to the conclusion that the problem must lurk deep in the RapidMiner entrails ...

I hope this error description is helpful.

greetings

Steffen

"I want to make computers do what I mean instead of what I say"
Read The Fantastic Manual
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



WWW
« Reply #1 on: June 20, 2008, 01:01:14 PM »

Hi Steffen,

wow, what a wonderfully detailed bug report. I must admit I have only skimmed it since I do not have much time today, but I will take a closer look on Monday if nobody else has done so by then ... ;-)

Regards,
Tobias

Tobias Malbrecht
Director of Product Marketing
RapidMiner
steffen
« Reply #2 on: July 15, 2008, 02:16:40 PM »

Hello RapidMiner-Team

I just checked out the 4.2 release, and it seems that this bug is still there. I will open a ticket now, because I guess that makes it easier to keep track of such things given the huge amount of work you have to do. I thought about it before, but I didn't want to be annoying ;-)

Besides this ... keep up the good work!

greetings

Steffen
Ingo Mierswa
Administrator
Hero Member
*****
Posts: 1226



WWW
« Reply #3 on: July 16, 2008, 02:11:55 PM »

Hey,

thanks for the reminder. We indeed missed this, sorry.

Cheers,
Ingo

Did you try our new Marketplace? Upload or download new Extensions, add comments, and organize your operators. Have a look at  http://marketplace.rapid-i.com
jean-charles
Guest
« Reply #4 on: July 21, 2008, 02:27:57 PM »

Hi Steffen, Hi Tobias, Hi All,

I have found a bug in FeatureGeneration too (maybe the same one?), but strangely the same kind of bug also appears in AttributeConstructionLoader.

The basic idea of my experiment is to merge two lexical matrices in text mining: I have 10 documents in ".doc" format and 13 in "pdf". I use a "TextInput" subtree for each, but I then have to merge two example sets with different rows and different attributes.

I have tried "ExampleSetMerge/Join/Cartesian"; none of them is satisfactory. I have now tried AttributeConstructionLoader and FeatureGeneration, both with the "keep all=true" and "filepath= true" options, but I get this message:
"The function name 'const' must be used with empty arguments".

Here is my experiment:

<operator name="Deux_repertoires" class="Process" expanded="yes">
    <description text="analyse du premier repertoire pour mise en forme#ylt#br#ygt#analyse du deuxieme repertoire#ylt#br#ygt#analyse_croisee, voir avec #yquot#attribute construction loader#yquot#"/>
    <parameter key="encoding"   value="win-1250"/>
    <operator name="Country" class="OperatorChain" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes"   value="D:\users\default\project\base_doc\fichiers_croises\dummy\file_doc.aml"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="FeatureGeneration" class="FeatureGeneration">
            <parameter key="filename"   value="D:\users\default\project\base_doc\fichiers_croises\dummy\attributs_html.att"/>
            <list key="functions">
            </list>
            <parameter key="keep_all"   value="true"/>
        </operator>
        <operator name="ExampleSource (2)" class="ExampleSource">
            <parameter key="attributes"   value="D:\users\default\project\base_doc\fichiers_croises\dummy\file_html.aml"/>
        </operator>
        <operator name="FeatureGeneration (2)" class="FeatureGeneration">
            <parameter key="filename"   value="D:\users\default\project\base_doc\fichiers_croises\dummy\file_cross.att"/>
            <list key="functions">
            </list>
            <parameter key="keep_all"   value="true"/>
        </operator>
    </operator>
    <operator name="Elements" class="OperatorChain" activated="no" expanded="yes">
    </operator>
    <operator name="Croisement" class="OperatorChain" activated="no" expanded="yes">
    </operator>
</operator>


Is this the known behaviour Steffen has been talking about?
Cheers,
  Jean-Charles.
Ingo Mierswa
« Reply #5 on: July 22, 2008, 04:13:54 PM »

Hi Steffen, Hi Jean-Charles,

so, back again to feature generation. First, some comments on Steffen's Report:

Quote
At this point I came to the conclusion that the problem must lurk deep in the RapidMiner entrails ...

Yes, it does. Very deep. We have two different data structures (actually only one data structure and a view structure) for the data we handle: the ExampleTable, which actually holds the data, and the ExampleSets, which define views on the underlying tables. All operators work on the ExampleSets with one exception: the feature generation operators work directly on the tables, both for performance reasons and to easily share newly generated attributes among views without the need for re-creation. This is useful, for example, for the evolutionary feature construction approaches.

However, changing the underlying table columns without "notifying" the view columns (attributes) might lead to some strange behaviour. For that reason, one simply has to copy the attribute (I kept the renaming operator but deactivated it) as in the following process. Then it works with both attribute names in the construction:

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\sample\data\golf.aml"/>
    </operator>
    <operator name="AttributeCopy" class="AttributeCopy">
        <parameter key="attribute_name" value="Temperature"/>
        <parameter key="new_name" value="ijon"/>
    </operator>
    <operator name="Temperature->ijon" class="ChangeAttributeName" activated="no">
        <parameter key="new_name" value="ijon"/>
        <parameter key="old_name" value="Temperature"/>
    </operator>
    <operator name="apply_ztrans" class="FeatureGeneration">
        <list key="functions">
          <parameter key="tichy" value="/(-(Temperature,const[73.571]()),const[6.3326]())"/>
        </list>
        <parameter key="keep_all" value="true"/>
    </operator>
    <operator name="skip_ijon" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="ijon"/>
    </operator>
    <operator name="skip_Temperature" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="Temperature"/>
    </operator>
    <operator name="tichy->Temperature" class="ChangeAttributeName">
        <parameter key="new_name" value="Temperature"/>
        <parameter key="old_name" value="tichy"/>
    </operator>
</operator>
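
The table/view distinction described above can be pictured with a toy model. This is a minimal Python sketch under loud assumptions: the classes ExampleTable and ExampleSetView below only borrow the names from the explanation, their methods are hypothetical, and nothing here is RapidMiner's actual implementation. It does not try to reproduce the exact results from setups 2 and 3; it only shows how a rename that touches the view alone can leave the table with a different name, and why a real copy avoids that mismatch.

```python
class ExampleTable:
    """Toy table: holds the actual data columns, keyed by table-level name."""

    def __init__(self, columns):
        self.columns = dict(columns)


class ExampleSetView:
    """Toy view: maps the names an operator sees to table-level column names."""

    def __init__(self, table):
        self.table = table
        self.names = {name: name for name in table.columns}

    def rename(self, old, new):
        # Only the view changes; the table column keeps its original name.
        self.names[new] = self.names.pop(old)

    def copy_attribute(self, old, new):
        # A real copy: the table itself gets a new, independent column.
        self.table.columns[new] = list(self.table.columns[self.names[old]])
        self.names[new] = new

    def values(self, name):
        return self.table.columns[self.names[name]]


# Arbitrary illustrative values.
renamed_table = ExampleTable({"Temperature": [85, 80, 83]})
renamed_view = ExampleSetView(renamed_table)
renamed_view.rename("Temperature", "ijon")

copied_table = ExampleTable({"Temperature": [85, 80, 83]})
copied_view = ExampleSetView(copied_table)
copied_view.copy_attribute("Temperature", "ijon")

# After the rename, view and table disagree about the column name.
print(sorted(renamed_view.names), sorted(renamed_table.columns))
# After the copy, both names exist at both levels.
print(sorted(copied_view.names), sorted(copied_table.columns))
```

In this toy model, code that resolves names against the table directly still finds "Temperature" after the rename and never sees "ijon", whereas after the copy both names resolve consistently at both levels.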


About the attribute construction loading: please use the operator "AttributeConstructionLoader" instead. The file parameter of the "FeatureGeneration" operator is more or less deprecated (unfortunately, we cannot mark parameters as such) and is only left in for backwards compatibility.


However, just a small comment on the whole feature generation stuff: we will revise the feature generation algorithms by the next release anyway in order to ease the generation process and allow more generation types.

Cheers,
Ingo
steffen
« Reply #6 on: July 22, 2008, 05:32:09 PM »

Hello Ingo

Thank you for the workaround!

Quote
However, just a small comment on the whole feature generation stuff: we will revise the feature generation algorithms by the next release anyway in order to ease the generation process and allow more generation types.

This would be nice. Did you consider using a language like JavaScript for user-defined functions? Something like the "Modified Java Script Value" step in Pentaho Kettle? Besides "click-it-together" functions, it would be nice to have something powerful for users with a stronger programming background.

greetings

Steffen
Ingo Mierswa
« Reply #7 on: July 22, 2008, 05:46:31 PM »

Hi again,

we have also been thinking about a scripting engine for user-defined functions; scripting is supported in Java 6 anyway (at least JavaScript should be supported).


For the more "traditional" mathematical functions we are currently evaluating JEP:

http://www.singularsys.com/jep/index.html

which would really nicely fit into RapidMiner.
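
To make the idea of formula-style functions concrete, here is a minimal Python sketch of evaluating an infix expression such as (Temperature - 73.571) / 6.3326 against one example, as an alternative to the prefix notation /(-(Temperature,const[73.571]()),const[6.3326]()). It uses neither JEP nor the Java scripting API; the evaluate helper and the row dictionary are hypothetical and only illustrate the kind of syntax under discussion.

```python
import ast
import operator

# Supported binary operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, row):
    """Evaluate an arithmetic expression over the attribute values in `row`."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return row[node.id]  # unknown attribute names raise KeyError
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression element: " + type(node).__name__)

    return walk(ast.parse(expression, mode="eval"))


value = evaluate("(Temperature - 73.571) / 6.3326", {"Temperature": 85})
print(value)
```

One design point worth noting in this sketch: an attribute name that is not present in the row fails loudly with a KeyError instead of silently producing a wrong result.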


Any thoughts about this?

Cheers,
Ingo