Topic: NullPointerException (Read 1937 times)
Darme
Newbie
*
Posts: 10


« on: April 23, 2013, 11:02:28 AM »

Hi,

I am a newbie to RapidMiner. I am trying to use Expectation Maximization to cluster some data. I have around 500,000 rows of data in a .csv file. I am using the process "Read CSV" -> Normalize -> Replace Missing Values -> Clustering.
However, I always get a NullPointerException at clustering time :(
Am I doing something wrong here?

Thanks in advance
Darme
Logged
Marius
Administrator
Hero Member
*****
Posts: 1794



« Reply #1 on: April 23, 2013, 11:31:14 AM »

Do you get an error dialog that lets you submit a bug report? If so, please use the corresponding button.
If there is no such dialog, please post your process setup and give us a detailed description of your data (number and types of attributes, and any peculiarities).

Best regards,
Marius
Logged

Please add [SOLVED] to the topic title when your problem has been solved! (do so by editing the first post in the thread and modifying the title)
Please click here before posting.
Darme
Newbie
*
Posts: 10


« Reply #2 on: April 23, 2013, 12:30:54 PM »

Hi Marius,

Thank you for your prompt reply. The following is the error message I get:

The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem.

The log contains the following:

           subprocess 'Main Process'
             +- Read CSV[1] (Read CSV)
             +- Normalize[1] (Normalize)
             +- Replace Missing Values[1] (Replace Missing Values)
       ==>   +- Clustering[1] (Expectation Maximization Clustering)
Apr 23, 2013 4:49:13 PM SEVERE: java.lang.NullPointerException

The data has 11 attributes of types text, number, and date. In the Normalize operator I have set the value type to numeric.
In the clustering operator I have set the initial distribution to "randomly assigned examples".
In the Replace Missing Values operator I have set the attribute filter type to all and the default to average.

Do you need any more information? Please let me know.

Thanks again
Darme
Logged
Marius
Administrator
Hero Member
*****
Posts: 1794



« Reply #3 on: April 23, 2013, 12:43:58 PM »

Hi,

It seems that you also have missing values in your nominal and/or date attributes. You should remove or replace all missing values before applying Expectation Maximization Clustering.

Best regards,
Marius
Logged

Darme
Newbie
*
Posts: 10


« Reply #4 on: April 23, 2013, 05:29:57 PM »

Hi again,

I added two Replace Missing Values steps to the process. One has the attribute filter type "value_type" set to text, with the default set to value and the replenishment value set to "extra".

The other has the value type "date" and a replenishment value of 23/4/2013.

I still get the same error. Am I still on the wrong path? Please help.

Thank you very much
Darme
Logged
Marius
Administrator
Hero Member
*****
Posts: 1794



« Reply #5 on: April 24, 2013, 10:18:21 AM »

Can you please post your process setup as described in the post linked in my signature?

Additionally, try to set a breakpoint before the clustering operator and inspect the metadata for missing values.

Best regards,
Marius
Logged

Darme
Newbie
*
Posts: 10


« Reply #6 on: April 24, 2013, 03:00:25 PM »

Hi Marius,

Once again, thank you for your advice.
I have attached the code of the process I am using, and I believe all the required information is there.

Since I have a very large data set, if a breakpoint is set for the clustering, I think I would need to step through the data row by row.
Is there a way to stop when a value is missing, similar to setting conditions on breakpoints?

Thanks and Regards
Darrshan

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.009">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
    <process expanded="true" height="494" width="709">
      <operator activated="true" class="read_csv" compatibility="5.1.009" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\yahoo\Desktop\CSEtemp.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="StockCode.true.text.attribute"/>
          <parameter key="1" value="SectorKey.true.text.attribute"/>
          <parameter key="2" value="TimeKey.true.date.attribute"/>
          <parameter key="3" value="OpenPrice.true.real.attribute"/>
          <parameter key="4" value="ClosePrice.true.real.attribute"/>
          <parameter key="5" value="NetChange.true.real.attribute"/>
          <parameter key="6" value="ChangePercentage.true.real.attribute"/>
          <parameter key="7" value="Highest.true.real.attribute"/>
          <parameter key="8" value="Lowest.true.real.attribute"/>
          <parameter key="9" value="Volume.true.integer.attribute"/>
          <parameter key="10" value="TotalValue.true.real.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.1.009" expanded="true" height="94" name="Normalize" width="90" x="45" y="255">
        <parameter key="attribute_filter_type" value="value_type"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values (3)" width="90" x="179" y="345">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values" width="90" x="313" y="345">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="text"/>
        <parameter key="default" value="value"/>
        <list key="columns">
          <parameter key="SectorKey" value="value"/>
          <parameter key="StockCode" value="value"/>
          <parameter key="TimeKey" value="value"/>
        </list>
        <parameter key="replenishment_value" value="extra"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" height="94" name="Replace Missing Values (2)" width="90" x="447" y="345">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="date"/>
        <parameter key="default" value="value"/>
        <list key="columns"/>
        <parameter key="replenishment_value" value="23/4/2013"/>
      </operator>
      <operator activated="true" class="expectation_maximization_clustering" compatibility="5.1.009" expanded="true" height="76" name="Clustering" width="90" x="514" y="75">
        <parameter key="k" value="3"/>
        <parameter key="add_as_label" value="true"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="inital_distribution" value="randomly assigned examples"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Replace Missing Values (3)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (3)" from_port="original" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Replace Missing Values (2)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Logged
Marius
Administrator
Hero Member
*****
Posts: 1794



« Reply #7 on: April 25, 2013, 09:35:06 AM »

No, you don't need to check each row one by one: just switch to the metadata view in the results perspective, and for each attribute you'll see the number of missing values.

Anyway, my suspicion is that in the second Replace Missing Values operator you should select value_type nominal, polynominal or binominal instead of text (text is a special data type used only in the Text Processing extension).
Experiment with that setting, *and* check the result with a breakpoint.
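
As a rough sketch of that change, the second Replace Missing Values operator from the process posted above might look like this with value_type switched to nominal. The parameter keys are copied from the posted process and only the value_type is changed; the per-column list is left empty here for brevity, and whether nominal, polynominal or binominal fits best is an assumption that depends on how Read CSV types the columns:

```xml
<!-- Sketch only: same operator as in the posted process, with value_type
     changed from "text" to "nominal". Which nominal subtype is correct
     is an assumption and depends on the Read CSV column types. -->
<operator activated="true" class="replace_missing_values" compatibility="5.1.009" expanded="true" name="Replace Missing Values">
  <parameter key="attribute_filter_type" value="value_type"/>
  <parameter key="value_type" value="nominal"/>
  <parameter key="default" value="value"/>
  <list key="columns"/>
  <parameter key="replenishment_value" value="extra"/>
</operator>
```

After a change like this, the missing-value counts in the metadata view at a breakpoint before Clustering should show whether the nominal columns were actually filled.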

Best regards,
Marius
Logged

Darme
Newbie
*
Posts: 10


« Reply #8 on: April 25, 2013, 01:59:24 PM »

Hi,

As you advised, I changed the settings of the Replace Missing Values operator and also changed the Read CSV operator's data types accordingly.
I am still getting the same result :(

I also created breakpoints before the clustering, and in the metadata view the "Missing value" column shows only "?". I also set breakpoints at each step and looked at the metadata, and the result was the same.

Furthermore, I created the given schema on an MS SQL Server evaluation edition and ran a query to retrieve null values for the given data set. The result was that there are no null values.

Do you think something else has gone wrong? Is any more information needed?

Thanks again
Darme
Logged
Marcin
Global Moderator
Full Member
*****
Posts: 165


« Reply #9 on: April 29, 2013, 10:08:59 AM »

I have tried to reproduce your error with my own data (with missing values included), but your process runs without an error. Your process XML says you are still using quite an old version (5.1). Could you update RapidMiner to 5.3.8 and check again?
Logged
Darme
Newbie
*
Posts: 10


« Reply #10 on: April 29, 2013, 11:40:19 AM »

Hi again,

I updated to 5.3.008 and still get the same error. Could it be some setting or configuration issue?
Could you send me your XML file so that I can check it here?

Many thanks again
Darme
Logged
Darme
Newbie
*
Posts: 10


« Reply #11 on: May 01, 2013, 07:03:31 AM »

Hi again,

I tried out RM version 5.3.8 with modifications to the process, but the result is still the same.
I have attached the XML code below.
It seems something is fundamentally wrong either in the way I am doing this or in the data.
Could you please share your XML so that I can try it out with my data?

Thanks a lot
Darme

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
        <parameter key="csv_file" value="C:\Users\yahoo\Desktop\CSEtemp.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="StockCode.true.polynominal.attribute"/>
          <parameter key="1" value="SectorKey.true.binominal.attribute"/>
          <parameter key="2" value="TimeKey.true.date.attribute"/>
          <parameter key="3" value="OpenPrice.true.real.attribute"/>
          <parameter key="4" value="ClosePrice.true.real.attribute"/>
          <parameter key="5" value="NetChange.true.real.attribute"/>
          <parameter key="6" value="ChangePercentage.true.real.attribute"/>
          <parameter key="7" value="Highest.true.real.attribute"/>
          <parameter key="8" value="Lowest.true.real.attribute"/>
          <parameter key="9" value="Volume.true.integer.attribute"/>
          <parameter key="10" value="TotalValue.true.real.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.3.008" expanded="true" height="94" name="Normalize" width="90" x="45" y="255"/>
      <operator activated="true" class="replace_missing_values" compatibility="5.3.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="112" y="390">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="date"/>
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.3.008" expanded="true" height="94" name="Replace Missing Values (2)" width="90" x="246" y="390">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="real"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.3.008" expanded="true" height="94" name="Replace Missing Values (3)" width="90" x="380" y="390">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="binominal"/>
        <parameter key="default" value="value"/>
        <list key="columns">
          <parameter key="SectorKey" value="value"/>
        </list>
        <parameter key="replenishment_value" value="BFI"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.3.008" expanded="true" height="94" name="Replace Missing Values (4)" width="90" x="514" y="390">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="polynominal"/>
        <parameter key="default" value="value"/>
        <list key="columns">
          <parameter key="StockCode" value="value"/>
        </list>
        <parameter key="replenishment_value" value="AAAA"/>
      </operator>
      <operator activated="true" class="expectation_maximization_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering" width="90" x="514" y="210">
        <parameter key="inital_distribution" value="randomly assigned examples"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="original" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="original" to_op="Replace Missing Values (2)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (2)" from_port="original" to_op="Replace Missing Values (3)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (3)" from_port="original" to_op="Replace Missing Values (4)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (4)" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Logged
Marcin
Global Moderator
Full Member
*****
Posts: 165


« Reply #12 on: May 02, 2013, 08:50:16 AM »

OK, I think I see where the problem is. It is a very subtle error that I did not spot right away in your processes. You are connecting the second output port of your "Replace Missing Values" operators to the next operator. The three letters "ori" indicate that this is the original output, which is passed through without any changes, so your data still contains missing values. Please use the first port, "exa".

For the NullPointerException we have already created an internal ticket.
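
For reference, the connections in the 5.3.008 process posted above could be rewired roughly as follows, so that each operator passes on its processed data rather than the untouched copy. The port name "example set output" is taken from the posted XML. Switching the Normalize connection as well is an assumption, since only the Replace Missing Values ports were mentioned:

```xml
<!-- Sketch: every step uses "example set output" (shown as "exa" in the GUI)
     instead of "original" ("ori"). Changing the Normalize connection too
     is an assumption; the posted process also used "original" there. -->
<connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Replace Missing Values (2)" to_port="example set input"/>
<connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Replace Missing Values (3)" to_port="example set input"/>
<connect from_op="Replace Missing Values (3)" from_port="example set output" to_op="Replace Missing Values (4)" to_port="example set input"/>
<connect from_op="Replace Missing Values (4)" from_port="example set output" to_op="Clustering" to_port="example set"/>
```

Only the from_port values differ from the posted process; the operators and their order are unchanged.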
Logged
Darme
Newbie
*
Posts: 10


« Reply #13 on: May 04, 2013, 07:04:56 AM »

Many thanks for your advice.

I used the above process with the "exa" output and got rid of the NullPointerException.
However, I have some issues with the result.

1. In the "Replace Missing Values" operator for date, I have set the default to zero, and all of the date values have been replaced by "Jan 1, 1970".
2. In the "Replace Missing Values" operator for real, I have set the default to average, and in most of the columns the actual values have been replaced by the average figure.
3. In the "Replace Missing Values" operator for binominal, I have set the default value to "BFI", and all of the actual values have been replaced with this.

Is it possible for me to do the clustering with the actual values? Is there any reason why the tool replaces actual values with the replacement values?

In another experiment, I kept all of the above the same but changed the "Replace Missing Values" operator for date by setting a default value of 1/1/2009. Then I got the NullPointerException again.
Could you explain this behaviour?

Once again, thank you for your understanding and continued help in this regard. I hope there are solutions to my questions.

Regards
Darme
Logged
Darme
Newbie
*
Posts: 10


« Reply #14 on: May 13, 2013, 07:33:25 AM »

Hi Marius,

I managed to get results by trying out various options in the tool. Mainly, I used attribute_type for all attributes rather than their data types and set one as the prediction. I guess that if we keep attributes in certain data types there could be a NullPointerException, possibly because of data type mismatches. Please correct me if I am wrong here.

Once again, thank you very much for all your help in this regard.

P.S. Shall I mark this issue as solved?

Regards
Darrshan
Logged