Author Topic: Preprocessing for FPGrowth  (Read 1120 times)
guilhermecr
Newbie
*
Posts: 4


« on: June 03, 2009, 08:39:20 PM »

I am working on market basket analysis. I am already generating the binomial format using other programs.

What RM operator can I use to transform the dataset from this format:

1,3
2,3,4
1,2,3

to this:

1,0,1,0
0,1,1,1
1,1,1,0

Thanks in advance :)
Sebastian Land
Administrator
Hero Member
*****
Posts: 2426


« Reply #1 on: June 04, 2009, 09:59:38 AM »

Hi,
your data format is called sparse, because it only stores the indices of the columns that are non-zero. RapidMiner supports a sparse format, but it differs slightly from yours. If you can bring your data into the following format, you can easily load it:
1:1 3:1
2:1 3:1 4:1
1:1 2:1 3:1

If you then use the operator SparseFormatExampleSource with the parameter format set to no_label and the parameter dimension set to the number of dimensions (the highest index occurring in your file), it will work.
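
In case it helps, here is a minimal Python sketch of the conversion, done outside RapidMiner. It assumes one basket per line with comma-separated, 1-based item indices, as in the question. The function names (parse_baskets, to_sparse_lines, to_binary_rows) and the sep parameter are made up for this illustration and are not part of RapidMiner.

```python
def parse_baskets(lines, sep=","):
    """Parse lines such as '1,3' into sorted lists of integer item indices."""
    baskets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # A set removes duplicate items within one basket.
        items = {int(token) for token in line.split(sep) if token.strip()}
        baskets.append(sorted(items))
    return baskets


def to_sparse_lines(baskets):
    """Render each basket in the 'index:value' form shown above, e.g. '1:1 3:1'."""
    return [" ".join(f"{item}:1" for item in basket) for basket in baskets]


def to_binary_rows(baskets):
    """Expand each basket into a 0/1 row; the width is the highest index seen."""
    dimension = max(max(basket) for basket in baskets)
    return [
        [1 if item in basket else 0 for item in range(1, dimension + 1)]
        for basket in baskets
    ]


# The three example baskets from the question.
baskets = parse_baskets(["1,3", "2,3,4", "1,2,3"])
sparse_lines = to_sparse_lines(baskets)
binary_rows = to_binary_rows(baskets)

for line in sparse_lines:
    print(line)
for row in binary_rows:
    print(",".join(str(value) for value in row))
```

The width computed in to_binary_rows is the same number you would enter for the dimension parameter. If your file uses a different separator, pass it as sep.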

Greetings,
  Sebastian
guilhermecr
Newbie
*
Posts: 4


« Reply #2 on: June 04, 2009, 02:49:49 PM »

I am just starting with market basket analysis, so I have been practicing with datasets available on the internet.
I have used the 'retail' data set available at http://fimi.cs.helsinki.fi/data/retail.dat, which is in the sparse format.

But since I will get my own data from a friend's shop, my question is:

What is the best format for a market basket analysis with RM?


Thanks

PS: I will probably use Apriori and FPGrowth.