Topic: [SOLVED] Problem with tokenize
jose
« on: February 07, 2012, 02:37:34 PM »

Hello!

My question is this: as I understand it, the Tokenize operator splits sentences into single words. Is there a way to split the sentences into units of two words instead of the single words that Tokenize normally produces?
« Last Edit: February 10, 2012, 10:58:53 AM by Marius »
text_miner
« Reply #1 on: February 07, 2012, 03:41:56 PM »

Hi Jose,

Are you asking whether you can have terms of more than one word/token? If so, the answer is yes. After you tokenize, use the Generate n-Grams (Terms) operator. This will generate phrases of up to n sequential tokens. Note: you will still have the single terms in your term-by-document matrix too. For example, when generating 2-grams you would have "heart", "attack", and "heart attack" in the matrix.
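
To make the idea concrete outside of the operator itself, here is a minimal Python sketch of what n-gram generation over an already tokenized sentence amounts to. It is only an illustration, not the operator's actual implementation: the function name `generate_ngrams`, its parameters `max_length` and `separator`, and the use of a space to join the tokens are all assumptions made for this example.

```python
def generate_ngrams(tokens, max_length=2, separator=" "):
    """Return every run of 1..max_length consecutive tokens.

    Hypothetical helper for illustration only. Single tokens are kept,
    so the result contains the original terms plus the longer phrases.
    """
    terms = []
    for n in range(1, max_length + 1):
        # Slide a window of n tokens across the token list.
        for start in range(len(tokens) - n + 1):
            terms.append(separator.join(tokens[start:start + n]))
    return terms


# Tokenize first (here: a plain whitespace split), then build the n-grams.
tokens = "heart attack risk".split()
print(generate_ngrams(tokens, max_length=2))
# ['heart', 'attack', 'risk', 'heart attack', 'attack risk']
```

If you only want the two-word terms, you could filter the result afterwards, for instance by keeping only the entries that contain the separator.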
jose
« Reply #2 on: February 07, 2012, 06:03:25 PM »

OK, perfect, thanks.