 on: March 06, 2015, 08:30:12 AM 
Started by joandcruz - Last post by Martin Schmitz

If you just want to assign a score based on the keywords, you might want to have a look at this thread,8638.0.html

If you want to find the words automatically, you can do standard text mining on them. The trick is that you can cluster the documents first. Afterwards, you can use the cluster assignment as the label and run a feature selection against it. That gives you the important words per cluster.
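The second half of that idea (cluster assignment as label, then pick the words that separate the clusters) can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: the toy documents and their cluster ids are made up, the clustering step itself is taken as already done, and the score (document frequency inside the cluster minus outside it) is a simple stand-in for a real feature selection or weighting operator.

```python
from collections import Counter

# Hypothetical toy corpus: (text, cluster_id). The cluster ids are assumed
# to come from an earlier clustering step that is not shown here.
docs = [
    ("cheap flights hotel booking", 0),
    ("hotel booking travel deals", 0),
    ("python code compiler error", 1),
    ("compiler error stack trace", 1),
]

def top_words_per_cluster(docs, k=2):
    """Rank words by document frequency inside a cluster minus outside it."""
    result = {}
    for cluster in sorted({label for _, label in docs}):
        inside = [set(text.split()) for text, label in docs if label == cluster]
        outside = [set(text.split()) for text, label in docs if label != cluster]
        df_in = Counter(word for doc in inside for word in doc)
        df_out = Counter(word for doc in outside for word in doc)
        scores = {
            word: df_in[word] / len(inside)
            - (df_out[word] / len(outside) if outside else 0.0)
            for word in df_in
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[cluster] = ranked[:k]
    return result

print(top_words_per_cluster(docs))
```

With real data the clustering and the weighting would each be done by a proper operator; the point is only that the cluster id plays the role of the label.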



 on: March 06, 2015, 08:29:39 AM 
Started by ammargh - Last post by ammargh
Double-click RapidMiner Studio in my applications folder

 on: March 06, 2015, 08:26:15 AM 
Started by joandcruz - Last post by Martin Schmitz

Could you give a bit more background? I am fully aware of the binomial distribution stuff, but I do not understand the example.



 on: March 05, 2015, 08:40:53 PM 
Started by joandcruz - Last post by joandcruz

I am struggling with the following task: I am using the "Series" plugin to create windows of a series, loop over the created examples, transform each into a "series" and extract features like max, min, ZCR, etc. using the extraction operators. Most of those operators create a single attribute associated with the series. What I want to do is create a new example set containing one example per window, where the attributes correspond to the features I extracted on those windows. How would I do that? I would need to collect new examples into a set within the loop, and I would have to access the single attributes associated with the series, and I don't know how to solve either problem. I appreciate any advice from you!
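To make the target layout concrete, here is a minimal sketch in plain Python of "one example per window, one attribute per feature". It is not the Series plugin's API; the function names, the sample values, and the window size and step are all assumptions chosen for illustration.

```python
def zero_crossing_rate(values):
    """Fraction of adjacent value pairs whose signs differ."""
    if len(values) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(values, values[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(values) - 1)

def window_features(series, window_size, step):
    """Return one row (dict) per window, with one key per extracted feature."""
    rows = []
    for start in range(0, len(series) - window_size + 1, step):
        window = series[start:start + window_size]
        rows.append({
            "window_start": start,
            "max": max(window),
            "min": min(window),
            "zcr": zero_crossing_rate(window),
        })
    return rows

# Hypothetical series, split into two non-overlapping windows of length 4.
series = [1.0, -2.0, 3.0, 4.0, -1.0, -3.0, 2.0, 5.0]
table = window_features(series, window_size=4, step=4)
for row in table:
    print(row)
```

Each dict corresponds to one example of the desired example set, so the loop body only has to append a row instead of attaching the feature to the series itself.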

 on: March 05, 2015, 08:28:09 PM 
Started by joandcruz - Last post by joandcruz

I am currently working in RapidMiner 4.6.

I have extracted features from my main data set into 2 sets of features. One is a word vector (53 features) and the other is a set with 10 different features.

I have 2 different classifiers that I would like to combine in an ensemble method:

    Logistic Regression on the word vector
    W-J48graft on a different set of features

From my understanding, I can only use operators such as stacking and voting if I give them one and the same data set as input.

How would I go about combining predictions from both my data sets using an ensemble method?

Thank you in advance!
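One way to picture such a combination, independent of any RapidMiner operator, is to let each model score the same examples on its own feature set and then average the confidences. The sketch below assumes a binary task, that both models output a confidence for the positive class, and that the examples are in the same order in both lists; the numbers and function names are made up.

```python
def average_confidences(conf_a, conf_b, weight_a=0.5):
    """Weighted average of two per-example confidences for the positive class."""
    if len(conf_a) != len(conf_b):
        raise ValueError("both models must score the same examples in the same order")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(conf_a, conf_b)]

def to_labels(confidences, threshold=0.5):
    """Turn combined confidences into class labels."""
    return ["positive" if c >= threshold else "negative" for c in confidences]

# Hypothetical confidences: logistic regression on the word vector,
# and the tree learner on the other 10 features.
lr_conf = [0.9, 0.4, 0.2]
tree_conf = [0.7, 0.8, 0.1]

combined = average_confidences(lr_conf, tree_conf)
labels = to_labels(combined)
print(labels)
```

This is plain confidence averaging (soft voting), not stacking; a stacked variant would instead train a third model on the two confidence columns.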

 on: March 05, 2015, 08:24:09 PM 
Started by joandcruz - Last post by joandcruz
Hi guys, I am trying to understand how Quinlan calculates the error for a leaf when pruning a decision tree. I have read his book on the subject and he says:

"For a confidence level CF, the upper limit on this probability can be found from the confidence limits for the binomial distribution. This upper limit here is written U_cf(E,N)"

where E is the number of incorrectly classified events and N is the total number of events in the leaf. He has an example with a confidence level of 25%, N=6 and E=0, and he calculates the error as U_25%(0,6) = 0.206. Could anyone explain how this is actually calculated? I have had no luck searching for it. Thank you for any help!
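One reading that reproduces the quoted number is the exact binomial upper limit: find the error probability p for which seeing at most E errors in N events has probability CF. For E=0 this reduces to (1 - p)^N = CF, so p = 1 - CF^(1/N) = 1 - 0.25^(1/6), which is about 0.206. The sketch below solves the general case by bisection; the function names are my own, and as far as I know the actual C4.5 code uses approximations for E > 0, so values there may differ slightly from this exact version.

```python
from math import comb

def binom_cdf(e, n, p):
    """P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(e + 1))

def upper_error_limit(e, n, cf=0.25, tol=1e-10):
    """Solve P(X <= e | n, p) = cf for p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid  # the CDF decreases as p grows, so p is still too small
        else:
            hi = mid
    return (lo + hi) / 2

print(round(upper_error_limit(0, 6, cf=0.25), 3))
```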

 on: March 05, 2015, 08:19:26 PM 
Started by joandcruz - Last post by joandcruz

Hi all,
Sorry for the long title, but I could not find a more concise one :)
I am new to RM, and I am using it to find document similarities. My sources are web pages, and I basically read them and compare them.
So far so good; but here is the problem:
I want to determine keywords and a title for the documents, and I also want to assign weights to the keywords. When I run the program, the title and keywords show up as '?'. So, is there a way to manually enter the keywords and title for now? For later stages: how can RM automatically get keywords from web pages?

 on: March 05, 2015, 08:17:36 PM 
Started by joandcruz - Last post by joandcruz

Please Help Me. I am stuck.

I have a general decision tree and also CHAID and ID-3.

The parameters are

- minimal size for split
- minimal leaf size
- minimal gain
- maximal depth
- confidence

My training data is 400.
My features are 6707.
My total amount of text is 27910.

How can I determine good values for these parameters without test runs? Test runs would take too much time due to the enormous amount of data.
Who has an idea for me?

Thank you!!!

 on: March 05, 2015, 06:31:22 PM 
Started by Rhmanig - Last post by Rhmanig

I am using Process Document to tokenize text (plus transform case, filter stop words and generate n-grams). I wonder why RapidMiner does not make use of the free memory and CPU, and why the process takes such a long time.

The current data size is 1059MB and the process has been running for almost 5 days :/ The system has four cores and 29GB RAM. On average it uses 46% of the CPU, and right now it uses 75% of the memory (the memory usage is going up slowly).

Please explain if you know why.



 on: March 05, 2015, 03:55:31 PM 
Started by mafern76 - Last post by mafern76
Thank you Martin!

It's not Generate Function Set I need, just precise calculations like you said, with a loop.

For example, I'm using generate %{loop_attribute} - %{loop_attribute}_1. This works, but my time intervals are arbitrary, so I need to generate a time instance index like the one below. Any idea on how to get it? I couldn't work it out on my own, thanks a lot.

id   time   time_instance
1   4   1
1   8   2
1   20   3
2   3   1
2   4   2
2   80   3
2   120   4
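The time_instance column above is a running count within each id after sorting by id and time. A minimal sketch of that logic in plain Python follows (not a RapidMiner operator; the function name is made up), using the rows from the table as input.

```python
from collections import defaultdict

def add_time_instance(rows):
    """rows: iterable of (id, time). Returns (id, time, time_instance) tuples,
    where time_instance counts 1, 2, 3, ... within each id in time order."""
    counters = defaultdict(int)
    out = []
    for id_, time in sorted(rows):  # sort by id first, then by time
        counters[id_] += 1
        out.append((id_, time, counters[id_]))
    return out

rows = [(1, 4), (1, 8), (1, 20), (2, 3), (2, 4), (2, 80), (2, 120)]
for row in add_time_instance(rows):
    print(*row, sep="\t")
```

Because the index only depends on the order within each id, the arbitrary gaps between the time values do not matter.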
