Topic: [SOLVED] Accessing Text Data in Blob in MySQL
Datadude
Newbie
*
Posts: 9


« on: December 17, 2012, 04:50:02 AM »

I'm trying to access text stored in a Blob field in MySQL.  However, as I'm putting my job together, I'm getting an error message:

The example set must contain at least one text attribute.  

It doesn't seem like RapidMiner understands that there might be text data in that binary field.  I'm trying to figure out how to do the conversion so that RapidMiner understands what is going on here.  I'm attempting to connect my Read Database operator to my Process Documents from Data operator so that Process Documents from Data can work on the text residing in the binary field.  Is there a way to do this conversion in RapidMiner?
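
For reference, outside RapidMiner the conversion I have in mind amounts to decoding the raw bytes of the Blob into a string. A minimal Python sketch, assuming the content was stored as UTF-8 (the encoding, the helper name `blob_to_text`, and the sample value are all assumptions for illustration, not taken from my database):

```python
def blob_to_text(blob: bytes, encoding: str = "utf-8") -> str:
    """Decode raw BLOB bytes into a text string.

    The encoding is an assumption; errors="replace" substitutes U+FFFD
    for undecodable bytes instead of raising an exception.
    """
    return blob.decode(encoding, errors="replace")


# Hypothetical BLOB value, as a database driver might hand it back.
raw = "<title>All Saints, Vista</title>".encode("utf-8")
text = blob_to_text(raw)
print(type(text).__name__, len(text))
```

If the stored encoding is something other than UTF-8, the second argument would need to change accordingly.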
« Last Edit: December 20, 2012, 05:40:43 AM by Datadude » Logged
awchisholm
Sr. Member
****
Posts: 394


« Reply #1 on: December 17, 2012, 06:35:22 AM »

Hello,

It's probably because you need to change the type of the attribute you're interested in to text. Use the "Nominal to Text" operator.

Regards,

Andrew
Logged

Datadude
Newbie
*
Posts: 9


« Reply #2 on: December 18, 2012, 01:06:42 AM »

OK, so I did that and I seem to be getting to the next step.  Thanks. But now I'm getting another exception:

Dec 16, 2012 11:39:00 PM SEVERE: Process failed: operator cannot be executed (java.lang.String cannot be cast to org.jdom.Text). Check the log messages...
Dec 16, 2012 11:39:00 PM SEVERE: Here:           Process[1] (Process)
           subprocess 'Main Process'
             +- Read Database[1] (Read Database)
             +- Nominal to Text[1] (Nominal to Text)
             +- Process Documents from Data[1] (Process Documents from Data)
           subprocess 'Vector Creation'
       ==>         +- Extract Information[1] (Extract Information)

Is there something I need to do before I can process an xml string with XPath?
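
To show what I mean by "something I need to do before", here is a rough Python sketch of the general pattern outside RapidMiner: the string is first parsed into a tree, and only then queried. It is not a description of what Extract Information does internally. The sample markup is made up, and `substring_before` is a hypothetical helper that mimics the XPath 1.0 `substring-before()` function, since Python's standard-library `ElementTree` only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET


def substring_before(value: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-before(): the text before the first
    occurrence of sep, or an empty string if sep does not occur."""
    if not sep:
        return ""
    before, found, _ = value.partition(sep)
    return before if found else ""


# Made-up, well-formed sample; real pages may not parse as XML.
markup = "<html><head><title>All Saints, Vista CA</title></head><body/></html>"

root = ET.fromstring(markup)                     # step 1: string -> tree
title = root.findtext(".//title", default="")    # step 2: query the tree
name = substring_before(title, ",")              # step 3: string function
print(name)
```

Note that the result of step 3 is a plain string rather than a node in the tree.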
Logged
awchisholm
Sr. Member
****
Posts: 394


« Reply #3 on: December 18, 2012, 09:16:02 AM »

The best thing to do is to post your process so we can see the details.

Andrew
« Last Edit: December 19, 2012, 03:47:21 PM by awchisholm » Logged

Datadude
Newbie
*
Posts: 9


« Reply #4 on: December 18, 2012, 02:41:24 PM »

Here it is:

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <parameter key="logfile" value="/Users/wardloving/Documents/Data Mining/log.out"/>
    <parameter key="resultfile" value="/Users/wardloving/Documents/Data Mining/results.out"/>
    <process expanded="true" height="116" width="614">
      <operator activated="true" class="read_database" compatibility="5.2.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
        <parameter key="connection" value="Local MySQL Nutch"/>
        <parameter key="query" value="SELECT content&#10;FROM webpage where Id = 'org.episcopalchurch.www:http/parish/all-saints-episcopal-church-vista-ca'"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="5.2.008" expanded="true" height="76" name="Nominal to Text" width="90" x="246" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="content"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="514" y="30">
        <parameter key="create_word_vector" value="false"/>
        <list key="specify_weights"/>
        <process expanded="true" height="252" width="1095">
          <operator activated="true" class="text:extract_information" compatibility="5.2.004" expanded="true" height="60" name="Extract Information" width="90" x="112" y="75">
            <parameter key="query_type" value="XPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="Name" value="substring-before(//title, ',')"/>
              <parameter key="Staff Name" value="substring-before(//*[@class = 'field field-type-text field-field-clergy']/div/div/node()[not(self::div)],',')"/>
              <parameter key="Staff Title" value="substring-after(//*[@class = 'field field-type-text field-field-clergy']/div/div/node()[not(self::div)], ',')"/>
              <parameter key="Address Line 1" value="//*[@class = 'street-address']"/>
              <parameter key="City" value="//*[@class = 'locality']"/>
              <parameter key="State" value="//*[@class = 'region']"/>
              <parameter key="Zip" value="//*[@class = 'postal-code']"/>
              <parameter key="Email" value="//*[@class = 'field field-type-text field-field-email']/div/div/node()[not(self::div)]"/>
              <parameter key="Phone" value="//*[@class = 'field field-type-text field-field-phone']/div/div/node()[not(self::div)]"/>
              <parameter key="Fax" value="//*[@class = 'field field-type-text field-field-fax']/div/div/node()[not(self::div)]"/>
              <parameter key="URL" value="//*[@class = 'field field-type-text field-field-fax']/div/div/node()[not(self::div)]"/>
              <parameter key="Twitter" value="//*[@class = 'field field-type-text field-field-twitter']/div/div/node()[not(self::div)]"/>
            </list>
            <list key="namespaces"/>
            <list key="index_queries"/>
          </operator>
          <connect from_port="document" to_op="Extract Information" to_port="document"/>
          <connect from_op="Extract Information" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="36"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Logged
awchisholm
Sr. Member
****
Posts: 394


« Reply #5 on: December 19, 2012, 03:59:52 PM »

Hello

I tried with some local data, and it seems the output from the Read Database operator is already of the right type, so the conversion is not necessary. So I've learned something :)

Try unchecking "assume html" on the Extract Information operator.

Andrew
Logged

Datadude
Newbie
*
Posts: 9


« Reply #6 on: December 20, 2012, 05:39:10 AM »

Thanks, awchisholm, for the tip.  When I unchecked that option, the error stopped showing up.  This is good.  Unfortunately, it revealed that the HTML content in my database has been truncated/corrupted, which makes parsing with XPath difficult.  Sometimes you just can't win, but at least I know what is going on.
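
In case it helps anyone else, a rough way to flag suspect rows is to check whether each stored page still ends with a closing `</html>` tag. The Python sketch below is only a heuristic under my own assumptions: the helper name `looks_truncated` and the sample values are made up, and the size check relies on the fact that a plain MySQL BLOB column holds at most 65,535 bytes. I have not confirmed that this limit is what truncated my data.

```python
BLOB_MAX_BYTES = 65_535  # maximum size of a MySQL BLOB (not MEDIUMBLOB/LONGBLOB)


def looks_truncated(raw: bytes) -> bool:
    """Heuristic: a page is suspect if it fills the whole BLOB column
    or does not end with a closing </html> tag."""
    text = raw.decode("utf-8", errors="replace").rstrip().lower()
    return len(raw) >= BLOB_MAX_BYTES or not text.endswith("</html>")


# Made-up sample values for illustration.
complete = b"<html><body>hi</body></html>\n"
cut_off = b"<html><body>hi"
print(looks_truncated(complete), looks_truncated(cut_off))
```

A page can pass this check and still be damaged in the middle, so it only catches the obvious cases.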
Logged