Pages: [1]
  Print  
Author Topic: compare values in current and previous row  (Read 2307 times)
sebisawa
Newbie
*
Posts: 6


« on: August 04, 2013, 03:38:15 PM »

Hi


I want to know intervals of purchases. For example, I have records like

- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06

How I can add "days since the last purchase"?  like

- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days


Logged
fras
Global Moderator
Jr. Member
*****
Posts: 84


« Reply #1 on: August 12, 2013, 04:15:56 PM »

Hi,

Operator "Generate Attribute" is useful. There are many ways to add new attributes.
Especially you can manipulate date attributes with "date_add", "date_diff", "date_before", etc.
Here is an example that adds 5 days to each date:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_sales_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75"/>
      <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="date"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="75">
        <list key="function_descriptions">
          <parameter key="date02" value="date_add(date, 5, DATE_UNIT_DAY)"/>
        </list>
      </operator>
      <connect from_op="Generate Sales Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Logged
sebisawa
Newbie
*
Posts: 6


« Reply #2 on: August 12, 2013, 04:55:00 PM »

Hi,


Thanks for your reply. What I expected is not how to calculate dates, but retrieve values from previous records and compare with the current ones.
Isn't it possible?
Logged
fras
Global Moderator
Jr. Member
*****
Posts: 84


« Reply #3 on: August 12, 2013, 05:27:02 PM »

Perhaps you should clarify our example. What do you mean by "X" ?
Logged
sebisawa
Newbie
*
Posts: 6


« Reply #4 on: August 12, 2013, 05:33:00 PM »

Sorry for confusion.

- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days

What I would like to know is, how many days passed since the last purchases.
Line 1 is the first record for user01, so we can't calculate the duration. I meant "X" as null.
Line 2 is the same as Line 1.
Line 3, it is the second purchase of user 01, the diff of 2012/01/01 and 2012/01/04 is "3 days".
Line 4, it is the second purchase of user 02, the diff of 2012/01/02 and 2012/01/06 is "4 days".

Does it make sense?

Logged
awchisholm
Sr. Member
****
Posts: 392


WWW
« Reply #5 on: August 13, 2013, 11:09:11 AM »

Hello

Here's an example that uses the Lag operator that you could try.

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="5.3.013" expanded="true" height="76" name="Subprocess" width="90" x="112" y="75">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="111" y="30">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user1&quot;"/>
              <parameter key="date" value="&quot;2013-08-01&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="120">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user1&quot;"/>
              <parameter key="date" value="&quot;2013-08-08&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="112" y="210">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user2&quot;"/>
              <parameter key="date" value="&quot;2013-08-01&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="112" y="300">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user2&quot;"/>
              <parameter key="date" value="&quot;2013-08-06&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="130" name="Append" width="90" x="380" y="75"/>
          <operator activated="true" class="nominal_to_date" compatibility="5.3.013" expanded="true" height="76" name="Nominal to Date" width="90" x="581" y="75">
            <parameter key="attribute_name" value="date"/>
            <parameter key="date_format" value="yyyy-MM-dd"/>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 4"/>
          <connect from_op="Append" from_port="merged set" to_op="Nominal to Date" to_port="example set input"/>
          <connect from_op="Nominal to Date" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="76" name="Loop Values" width="90" x="112" y="165">
        <parameter key="attribute" value="user"/>
        <process expanded="true">
          <operator activated="true" class="filter_examples" compatibility="5.3.013" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="75">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="user=%{loop_value}"/>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="5.3.000" expanded="true" height="76" name="Lag Series" width="90" x="380" y="75">
            <list key="attributes">
              <parameter key="date" value="1"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
          <connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="76" name="Append (2)" width="90" x="112" y="255"/>
      <operator activated="true" class="rename" compatibility="5.3.013" expanded="true" height="76" name="Rename" width="90" x="246" y="75">
        <parameter key="old_name" value="date-1"/>
        <parameter key="new_name" value="date1"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="165">
        <list key="function_descriptions">
          <parameter key="diff" value="date_diff(date1,date)/1000/3600/24"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="255">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|user|diff"/>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_op="Append (2)" to_port="example set 1"/>
      <connect from_op="Append (2)" from_port="merged set" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

regards

Andrew
Logged

sebisawa
Newbie
*
Posts: 6


« Reply #6 on: August 13, 2013, 11:52:34 AM »

Hi Andrew


Thanks for your post, however I can't execute it with RapidMiner 5.3.
I could not find the operator named "Lag Series". Is this why?

Aug 13, 2013 7:49:50 PM SEVERE: Process failed: The dummy operator Lag Series (replacing series:lag_series) cannot be executed.
Logged
wessel
Hero Member
*****
Posts: 558


« Reply #7 on: August 13, 2013, 12:35:48 PM »

The lag operator requires time series plugin.

Maybe you want to experiment with the Script operator.
Put this inside your script operator and find out what happens:



ExampleSet es = operator.getInput(ExampleSet.class);

es.recalculateAllAttributeStatistics();

for (Attribute a : es.getAttributes()) {
    double mean = es.getStatistics(a, Statistics.AVERAGE);
    String name = a.getName();
    for (Example example : es) {
        example[name] = example[name] - mean;
    }
}

double last = 0;
for (Example e : es) {
    e["a"] = e["id"] + last;
    last = e["id"];
}

int size = es.size()
for (int i = 1; i < size; i++) {
   Example e1 = es.getExample(i-1);
   Example e0 = es.getExample(i);
   e0["b"] = e0["id"] + e1["id"];      
}


return es;
« Last Edit: August 13, 2013, 12:44:31 PM by wessel » Logged
wessel
Hero Member
*****
Posts: 558


« Reply #8 on: August 13, 2013, 12:44:53 PM »

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="driller oscillation timeseries"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.3.013" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
      <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
        <list key="function_descriptions">
          <parameter key="a" value="&quot;z&quot;"/>
          <parameter key="b" value="&quot;z&quot;"/>
        </list>
      </operator>
      <operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="450" y="29">
        <parameter key="script" value="&#10;ExampleSet es = operator.getInput(ExampleSet.class);&#10;&#10;es.recalculateAllAttributeStatistics();&#10;&#10;for (Attribute a : es.getAttributes()) {&#10;    double mean = es.getStatistics(a, Statistics.AVERAGE);&#10;    String name = a.getName();&#10;    for (Example example : es) {&#10;        example[name] = example[name] - mean;&#10;    }&#10;}&#10;&#10;double last = 0;&#10;for (Example e : es) {&#10;    e[&quot;a&quot;] = e[&quot;id&quot;] + last;&#10;    last = e[&quot;id&quot;];&#10;}&#10;&#10;int size = es.size()&#10;for (int i = 1; i &lt; size; i++) {&#10;&#9;Example e1 = es.getExample(i-1);&#10;&#9;Example e0 = es.getExample(i);&#10;&#9;e0[&quot;b&quot;] = e0[&quot;id&quot;] + e1[&quot;id&quot;];&#9;&#9;&#10;}&#10;&#10;&#10;return es;"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
      <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Logged
sebisawa
Newbie
*
Posts: 6


« Reply #9 on: August 13, 2013, 01:10:13 PM »

Hi wessel

Thanks for your reply. "Execute Script" seems very helpful.
Where is the document regarding to the scripting?
Logged
awchisholm
Sr. Member
****
Posts: 392


WWW
« Reply #10 on: August 13, 2013, 03:11:14 PM »

The time series extension is what you need for the Lag operator. 
Logged

sebisawa
Newbie
*
Posts: 6


« Reply #11 on: August 15, 2013, 04:25:52 AM »

Hi Andrew and Wessel

Thanks for your kindly support.

I could run the process from Andrew by downloading Series Extension from the marketplace.
However it takes long time to finish processing. I assume it is because the examples must
be filtered as much as users exists, it is not efficient. So I use this operator without "loop values"
operator.
 
Script one is fantastic and someday I would love to try.

Best regards
Logged
Pages: [1]
  Print  
 
Jump to: