Topic: Text Mining - Select Tokens Based on a Dictionary File (Read 454 times)
mbjasser
Newbie
Posts: 4


« on: June 14, 2014, 06:02:59 AM »

Hi everyone,

I'm trying to build a text mining workflow that keeps only the content written in a specific language, based on a dictionary for that language (most likely a TXT file).

I was able to remove stopwords with the "Filter Stopwords (Dictionary)" operator, which filters content against a dictionary. Now I'm trying to select tokens based on a dictionary as well, but the only operator that seems to be available is "Filter Tokens (by Content)". That operator selects tokens using a regular expression; it has no option for selecting tokens based on a dictionary file.
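The difference being asked about is allow-list versus deny-list filtering: a stopword filter removes tokens that are in the dictionary, while the requested operation keeps only those tokens. The following is a minimal Python sketch of both, outside RapidMiner, assuming tokens are already split and that case-insensitive matching is wanted (both are assumptions; `select_tokens` and `remove_tokens` are hypothetical names, not RapidMiner operators).

```python
from typing import Callable, Iterable, List, Set


def _vocab(dictionary: Iterable[str], ignore_case: bool) -> tuple:
    # Normalise dictionary entries the same way tokens will be normalised.
    norm: Callable[[str], str] = str.casefold if ignore_case else (lambda s: s)
    vocab: Set[str] = {norm(w) for w in dictionary}
    return norm, vocab


def select_tokens(tokens: Iterable[str], dictionary: Iterable[str],
                  ignore_case: bool = True) -> List[str]:
    """Keep only tokens that appear in the dictionary (allow-list)."""
    norm, vocab = _vocab(dictionary, ignore_case)
    return [t for t in tokens if norm(t) in vocab]


def remove_tokens(tokens: Iterable[str], dictionary: Iterable[str],
                  ignore_case: bool = True) -> List[str]:
    """Drop tokens that appear in the dictionary (stopword-style deny-list)."""
    norm, vocab = _vocab(dictionary, ignore_case)
    return [t for t in tokens if norm(t) not in vocab]


if __name__ == "__main__":
    tokens = ["Das", "Haus", "is", "big"]
    dictionary = ["das", "haus"]
    print(select_tokens(tokens, dictionary))  # ['Das', 'Haus']
    print(remove_tokens(tokens, dictionary))  # ['is', 'big']
```

Both functions use a set for the dictionary, so each lookup is constant time on average, and the original token order and spelling are preserved in the output.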

I'd appreciate your help if you know of an operator that does this, or if I'm missing something.

Thank you in advance.
JEdward
Full Member
Posts: 145


« Reply #1 on: June 17, 2014, 05:23:51 AM »

Seconded. 

See my post here: http://rapid-i.com/rapidforum/index.php/topic,8008.msg27328.html#msg27328 suggesting that a group of us band together to pay RapidMiner (RM) to improve language support in the text extension, particularly for more difficult languages such as Arabic, Chinese, Indonesian, or even TxtSpk.
fras
Global Moderator
Jr. Member
Posts: 83


« Reply #2 on: June 17, 2014, 10:56:23 AM »

There is an operator "Filter Stopwords" where you can supply a TXT file with one stopword per line.
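The one-entry-per-line format described here is also easy to read outside RapidMiner, for example to check or clean a dictionary before using it. A minimal Python sketch, assuming UTF-8 text and case-insensitive entries (assumptions, not documented operator behaviour); `parse_dictionary` and `load_dictionary` are hypothetical helper names.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Set


def parse_dictionary(lines: Iterable[str], ignore_case: bool = True) -> Set[str]:
    """One entry per line; surrounding whitespace and blank lines are ignored."""
    words: Set[str] = set()
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip empty lines
        words.add(word.casefold() if ignore_case else word)
    return words


def load_dictionary(path, ignore_case: bool = True) -> Set[str]:
    # "utf-8-sig" also strips a byte-order mark if the editor wrote one.
    with open(path, encoding="utf-8-sig") as f:
        return parse_dictionary(f, ignore_case)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = Path(tmp) / "dictionary.txt"
        p.write_text("Haus\n  das \n\nhaus\n", encoding="utf-8")
        print(sorted(load_dictionary(p)))  # ['das', 'haus']
```

Duplicates collapse because the result is a set, so "Haus" and "haus" become one entry when case is ignored.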
mbjasser
Newbie
Posts: 4


« Reply #3 on: June 18, 2014, 11:46:03 AM »

Thanks fras. I'm already using the dictionary-based Filter Stopwords operator, but it filters words out; I want to select (keep) tokens based on a dictionary, not remove them. I haven't found an operator for that so far. The only option on offer is Filter Tokens (by Content), using its regular expression field.
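Since the regular expression field is the only option mentioned, one possible workaround is to generate a single anchored alternation from the dictionary and paste it into that field. A minimal Python sketch, assuming the operator keeps tokens whose whole text matches the expression (the `^...$` anchors are there so this holds whether the operator uses full or partial matching; this is untested against RapidMiner). `dictionary_to_regex` is a hypothetical helper. The inline `(?i)` flag for case-insensitive matching is supported by both Python's `re` and Java's `java.util.regex`.

```python
import re
from typing import Iterable


def dictionary_to_regex(words: Iterable[str], ignore_case: bool = True) -> str:
    """Build one anchored alternation that matches exactly one dictionary word."""
    # Deduplicate, drop blanks, and order longest-first (ties alphabetical)
    # so the output is deterministic.
    unique = sorted({w.strip() for w in words if w.strip()},
                    key=lambda w: (-len(w), w))
    if not unique:
        raise ValueError("dictionary is empty")
    # Escape each entry so characters such as '.' or '+' match literally.
    body = "|".join(re.escape(w) for w in unique)
    prefix = "(?i)" if ignore_case else ""
    return prefix + "^(?:" + body + ")$"


if __name__ == "__main__":
    print(dictionary_to_regex(["das", "haus"]))  # (?i)^(?:haus|das)$
```

For a dictionary with many thousands of entries the pattern becomes very long, which may be slow or impractical to paste into an operator parameter, so this is better suited to small word lists.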

mbjasser
Newbie
Posts: 4


« Reply #4 on: June 18, 2014, 12:57:12 PM »

It's a good idea, JEdward; hopefully it'll work.
JEdward
Full Member
Posts: 145


« Reply #5 on: July 02, 2014, 04:47:14 AM »

Yes, definitely something I'm going to follow up on in the next couple of weeks. 