Web Mining for Other Languages
Topic: Web Mining for Other Languages (Read 497 times)
krsnewwave
Newbie
Posts: 1
Web Mining for Other Languages
« on: May 02, 2012, 03:43:31 AM »
Hi!
I have some Japanese pages in my analysis, and I noticed that the "Extract Content" operator does not cope well with text in encodings other than UTF-8. Is there any way to change how it handles encoding?
(EDIT: While this question is pending, I am going to try the complementary approach: removing all HTML tags and the regions enclosed in <script>. It seems to work okay so far.)
« Last Edit: May 02, 2012, 03:52:30 AM by krsnewwave »
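The workaround mentioned in the edit above (decode the page with its own encoding, then drop the tags and the <script> regions) could be sketched outside RapidMiner roughly as follows. This is only a minimal illustration using the Python standard library, not how "Extract Content" works internally; the names decode_html, TextOnly and extract_text are made up for this example, and it assumes the page declares its charset in a <meta> tag near the top.

```python
import re
from html.parser import HTMLParser


def decode_html(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw page bytes using the charset declared in the page, if any."""
    # Only inspect the first 2048 bytes, where a charset declaration normally sits.
    head = raw[:2048].decode("ascii", errors="ignore")
    match = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)", head, re.IGNORECASE)
    encoding = match.group(1) if match else fallback
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:  # the declared encoding name is unknown to Python
        return raw.decode(fallback, errors="replace")


class TextOnly(HTMLParser):
    """Collect text nodes, skipping everything inside <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(raw: bytes) -> str:
    """Return the visible text of a page, one text node per line."""
    parser = TextOnly()
    parser.feed(decode_html(raw))
    parser.close()
    return "\n".join(parser.parts)
```

For a Shift_JIS page, extract_text(page_bytes) would return the Japanese text with the markup and script bodies removed. Pages that declare no charset, or declare the wrong one, would still need a detection step that this sketch does not attempt.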
restuar
Newbie
Posts: 9
Re: Web Mining and Text mining for Other Languages
« Reply #1 on: July 12, 2012, 02:36:04 PM »
I agree. My colleagues and I plan to conduct comparative studies of English and Chinese online newspapers. Is there anything in the pipeline for Cantonese and Mandarin character recognition? Thanks!
haddock
Hero Member
Posts: 837
Re: Web Mining for Other Languages
« Reply #2 on: July 12, 2012, 03:13:26 PM »
Hi there,
I know people who do this sort of thing. They say that one problem with Chinese is tokenising the sentences, as there are no spaces to separate the words. Check this out:
http://www.foreverastudent.com/2012/03/chinese-word-frequency-list-news.html
It is possible, but not easy!
Good luck.
« Last Edit: July 12, 2012, 03:17:20 PM by haddock »
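To make the tokenising problem above concrete: with no spaces, a tokeniser has to decide where each word ends. One simple dictionary-based approach is greedy forward maximum matching, sketched below. This is a toy illustration only, not what RapidMiner or the linked page uses; the function name max_match and the tiny lexicon are invented for the example, and practical segmenters rely on much larger dictionaries or statistical models.

```python
def max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so the loop always advances.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon, made up for this example only.
LEXICON = {"我们", "喜欢", "数据", "挖掘", "数据挖掘"}

print(max_match("我们喜欢数据挖掘", LEXICON))
# → ['我们', '喜欢', '数据挖掘']
```

Because the match is greedy, "数据挖掘" is kept as one token rather than split into "数据" and "挖掘". Whether that is the right choice depends on the analysis, which is part of why this is, as said above, possible but not easy.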
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
T.S.Eliot ~ Choruses from the Rock 1934
restuar
Newbie
Posts: 9
Re: Web Mining for Other Languages
« Reply #3 on: August 02, 2012, 03:24:10 PM »
Thank you. My teammates and I have agreed to use the English version of the website. My problem now is how to tell RapidMiner that when it accesses the Chinese website, it should use the translated version instead. Would you know how to solve this problem?