Author Topic: [SOLVED] Sparse Data/Collaborative Filtering  (Read 300 times)
abcac


« on: July 30, 2012, 03:01:47 PM »

Hello,

I am trying to do collaborative filtering; however, I am having difficulty reading the data in.

Originally I formatted the data as a map, where a line contained ID, ID, Boolean. This would process in a few seconds.

What I need is a matrix with the two ID fields being coordinates and the Boolean being the entry. I could not figure out how to do this.
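
To make the target structure concrete, here is a minimal sketch outside RapidMiner, in plain Python. It assumes the triples are already parsed into (ID, ID, Boolean) tuples; the names `build_sparse_matrix` and `entry` are hypothetical and only illustrate the idea of storing the True entries by coordinate.

```python
from collections import defaultdict


def build_sparse_matrix(triples):
    """Map each row ID to the set of column IDs whose entry is True."""
    matrix = defaultdict(set)
    for row_id, col_id, flag in triples:
        if flag:  # False entries are implied by absence
            matrix[row_id].add(col_id)
    return dict(matrix)


def entry(matrix, row_id, col_id):
    """Return the Boolean stored at (row_id, col_id)."""
    return col_id in matrix.get(row_id, set())


# Hypothetical sample data in the ID, ID, Boolean layout described above.
triples = [(1, 10, True), (1, 12, True), (2, 10, False), (3, 11, True)]
matrix = build_sparse_matrix(triples)
```

Only the True entries take up memory, which is why this layout may scale better than a dense matrix over all ID pairs.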

I moved on to trying the Read Sparse operator; however, it now takes 1 minute to read in the data. This seems odd and probably won't scale.

*I am new to RapidMiner; any suggestions on resources would be great.

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="-20" width="-50">
      <operator activated="true" class="read_sparse" compatibility="5.2.008" expanded="true" height="60" name="Read Sparse" width="90" x="28" y="230">
        <parameter key="format" value="yx"/>
        <parameter key="data_file" value="*****************t"/>
        <parameter key="dimension" value="216370"/>
        <parameter key="datamanagement" value="boolean_sparse_array"/>
        <list key="prefix_map"/>
      </operator>
      <connect from_op="Read Sparse" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


Thanks in advance
« Last Edit: August 10, 2012, 01:21:00 AM by abcac »
Nils
Administrator


« Reply #1 on: August 01, 2012, 08:22:34 AM »

Hi,

As you can see in the help view, the Read Sparse operator does not support the format you are trying to load. It supports only simple arrays.

Quote
Reads an example file in sparse format, i.e. lines have the form
label index:value index:value index:value...

What do you want to do with your data afterwards? Maybe you can try another Read operator.
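
If you do want to stay with Read Sparse, one option might be to convert the ID, ID, Boolean triples into the line format quoted above before loading. The following is only a rough sketch in plain Python, not part of RapidMiner: the function name `to_sparse_lines` is hypothetical, the constant label "0" is a placeholder assumption (the data has no real label), and whether the operator expects 0-based or 1-based indices is not checked here.

```python
def to_sparse_lines(triples, label="0"):
    """Turn (row_id, col_id, flag) triples into 'label index:value ...' lines.

    One line is produced per row ID that has at least one True entry.
    The label is a placeholder assumption, since the data has no label.
    """
    rows = {}
    for row_id, col_id, flag in triples:
        if flag:  # only True entries are written; the rest stay implicit
            rows.setdefault(row_id, set()).add(col_id)

    lines = []
    for row_id in sorted(rows):
        pairs = " ".join(f"{col}:1" for col in sorted(rows[row_id]))
        lines.append(f"{label} {pairs}")
    return lines


# Hypothetical sample triples.
sample = [(2, 5, True), (1, 3, True), (1, 1, True), (2, 7, False)]
sparse_lines = to_sparse_lines(sample)
```

Note that the row IDs themselves are not written out: lines follow the sorted order of the row IDs, and rows without any True entry are skipped, so a separate ID list would be needed to map lines back to IDs.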

Best,
Nils
abcac


« Reply #2 on: August 10, 2012, 01:20:35 AM »

I decided to move forward without using a sparse matrix. I am testing my process using a small subset of attributes (~1000).