Topic: Text filtering problem! Please help!
karhunen
Newbie
*
Posts: 1


« on: September 14, 2013, 12:15:54 PM »

Hey community,

I'm new to RapidMiner and I'm trying to filter multiple words from different PDF files.

First I tried to filter just one word after tokenizing the files, using the "Filter Tokens (by Content)" operator.
I used the condition "contains" and specified my "string". This works fine.
Now I want to filter multiple words, but I just don't know how to do this.

Can you please help me? I would really appreciate it!

Background:
I'm trying to classify some documents using a word list of positive and negative words.
RapidMiner should analyse the given PDF files with regard to the number of positive and negative words they contain.
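
To make the goal concrete, the counting step could look roughly like this in plain Python. This is only a sketch of the idea, not a RapidMiner process: the word lists, the function names and the "more positive than negative" rule are all hypothetical, and the text is assumed to have been extracted from the PDF already.

```python
import re
from collections import Counter

# Hypothetical word lists; replace with the real positive/negative lists.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_sentiment_words(text):
    """Return (positive_count, negative_count) for one document."""
    counts = Counter(tokenize(text))
    # Counter returns 0 for words that never occur in the document.
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    return pos, neg


def classify(text):
    """Label a document by whichever word list occurs more often (assumed rule)."""
    pos, neg = count_sentiment_words(text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

For example, `classify("Good service, great food, but terrible parking.")` counts two positive words and one negative word.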

Any ideas?
« Last Edit: September 14, 2013, 12:27:19 PM by karhunen »
awchisholm
Sr. Member
****
Posts: 390


« Reply #1 on: September 15, 2013, 04:13:23 PM »

Hello

You could use a word list to filter the document for those words only.

Here is an example that does more than you need.

http://rapidminernotes.blogspot.co.uk/2013/04/finding-needles-in-text-haystacks.html

You will need to make some changes for what you want.
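
The word-list idea itself can be illustrated outside RapidMiner. The Python sketch below joins several words into one regular-expression alternative and keeps only the tokens that match one of them exactly. The function names and sample words are hypothetical, and this shows only the filtering logic, not the process from the linked post.

```python
import re


def build_wordlist_pattern(words):
    """Compile one case-insensitive pattern that matches any word in the list."""
    # re.escape guards against characters with a special meaning in a regex.
    return re.compile("|".join(re.escape(word) for word in words), re.IGNORECASE)


def filter_tokens(tokens, words):
    """Keep only the tokens that are exactly one of the listed words."""
    pattern = build_wordlist_pattern(words)
    return [token for token in tokens if pattern.fullmatch(token)]


tokens = ["The", "profit", "rose", "despite", "Loss", "warnings"]
kept = filter_tokens(tokens, ["profit", "loss"])
print(kept)
```

Using `fullmatch` keeps only whole-word hits; swapping it for `search` would behave more like a "contains" condition and also keep tokens such as "profitable".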

Regards,

Andrew
