Topic: Character encoding problem with AttributeFilter
kostadin
« on: September 01, 2008, 04:38:37 PM »

I have the following problem (bug?). I want to do the following:

1. Load data with an ExcelExampleSource operator (the data is labeled, i.e. the first row contains the labels of the Excel columns).
2. Apply an AttributeFilter to the loaded data to filter on certain attribute names.

The Excel input file is in German, so the column labels can contain German umlauts such as ä, ö, ü.
In the AttributeFilter operator I set the parameter "condition_class" to the value "attribute_name_filter". As the parameter string I use a regular expression containing umlauts, e.g. "Häuser|Bäume".
For that reason I set the encoding in the root operator to UTF-16:
<parameter key="encoding"   value="UTF-16"/>

Since I work with the GUI version of RapidMiner, I then want to switch from the XML editor tab to the parameter editor tab. At that point I receive the following error message:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. Cancel to ignore changes, Ok to go on editing.

As soon as I remove the umlauts, everything works fine. It seems that the regular expression is expected to be UTF-8 when it really should be treated as UTF-16, but that is only a guess.
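
To illustrate the guess above: the following is a minimal Java sketch, not RapidMiner's actual code, of how an XML parser can reject text whose bytes do not match the encoding it assumes. The class name `XmlEncodingDemo`, the helper `parses`, and the key `parameter_string` are made up for this example. Without an XML declaration a parser assumes UTF-8 (or UTF-16 if it sees a byte order mark), so umlauts written as single ISO-8859-1 bytes are not well-formed input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlEncodingDemo {
    // Hypothetical stand-in for one line of a process file ("Häuser|Bäume").
    static final String BODY =
        "<parameter key=\"parameter_string\" value=\"H\u00e4user|B\u00e4ume\"/>";

    /** Encodes xml with bytesCharset and reports whether the JDK parser accepts it. */
    static boolean parses(String xml, Charset bytesCharset) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(bytesCharset)));
            return true;
        } catch (IOException | SAXException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String latin1Decl = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>";

        // No declaration, ISO-8859-1 bytes: the parser assumes UTF-8 and fails.
        System.out.println(parses(BODY, StandardCharsets.ISO_8859_1));              // false
        // Declaration and bytes agree: the same text parses.
        System.out.println(parses(latin1Decl + BODY, StandardCharsets.ISO_8859_1)); // true
        // Real UTF-8 bytes need no declaration.
        System.out.println(parses(BODY, StandardCharsets.UTF_8));                   // true
    }
}
```

The point of the sketch is only that the declared (or assumed) encoding and the actual bytes have to agree; whether this is what happens when the editor tabs are switched is an assumption.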

I can temporarily rename the column labels in the input file so that they contain no umlauts, but in the long run that is not a real option. Any suggestions?
Tobias Malbrecht
« Reply #1 on: September 01, 2008, 05:34:56 PM »

Hi,

We are already aware of this problem, but unfortunately we have not yet found a solution. Sorry, but for now you will have to stick with the dirty workaround of renaming the attributes before loading the data into RapidMiner. We will keep trying to solve the problem, but I doubt we will manage it in the short term.

Regards,
Tobias

Tobias Malbrecht
Director of Product Marketing
RapidMiner
kostadin
« Reply #2 on: September 02, 2008, 07:20:49 AM »

Okay, thanks for the answer. As you said, there are several possible workarounds.
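
One such workaround, renaming the attributes before loading, could be sketched roughly like this in Java. This is only an illustration under assumptions: the class `UmlautLabels` and its method `toAscii` are hypothetical names, and how the renamed labels get back into the Excel file (or a CSV export) is left open. The conventional German transliteration (ä to ae, ö to oe, ü to ue, ß to ss) keeps the labels readable and pure ASCII.

```java
import java.util.ArrayList;
import java.util.List;

final class UmlautLabels {
    /** Replaces German umlauts and sharp s with their ASCII transliterations. */
    static String toAscii(String label) {
        return label
            .replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
            .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue")
            .replace("\u00df", "ss");
    }

    /** Applies toAscii to every column label, keeping the original order. */
    static List<String> toAscii(List<String> labels) {
        List<String> result = new ArrayList<>(labels.size());
        for (String label : labels) {
            result.add(toAscii(label));
        }
        return result;
    }

    public static void main(String[] args) {
        // "Häuser" and "Bäume" become "Haeuser" and "Baeume".
        System.out.println(toAscii(List.of("H\u00e4user", "B\u00e4ume")));
    }
}
```

With labels renamed this way, the filter expression from the first post would become "Haeuser|Baeume" and would no longer contain any non-ASCII characters.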