Topic: [SOLVED] "Hierarchical Classification" operator (Read 1150 times)
mdc
Jr. Member
Posts: 60


« on: July 11, 2013, 12:51:50 AM »

Hello,

I'm trying to do hierarchical classification of documents, and I believe the 'Hierarchical Classification' operator is the way to go, as recommended here in the forum. My problem is that I can't figure out how to use this operator or what output to expect from it. I couldn't find any usage example in the forum either. Can somebody post a sample process using this operator?

thanks in advance,
Matthew
« Last Edit: July 29, 2013, 05:50:44 PM by mdc »
awchisholm
Sr. Member
Posts: 390


« Reply #1 on: July 11, 2013, 12:49:11 PM »

Hello

Here's an example of top-down clustering. It uses the Top Down Clustering operator, which itself contains another clustering operator. In this case the inner operator is Expectation Maximization with k = 2. From what I've observed, the process works roughly as follows:

- The outer operator invokes the inner one, which splits the example set into k = 2 clusters.
- The outer operator then repeats this on the examples in each of those 2 clusters, and the inner operator splits each of them into 2 more clusters.
- This continues down to the depth set by the max depth parameter of the Top Down Clustering operator.

I believe the Flatten Clustering operator is needed to extract a particular clustering. To confirm this, I added a Map Clustering on Labels operator followed by a Performance operator to see how well the clusters map to the ground truth.

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="75">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="top_down_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering (2)" width="90" x="112" y="165">
        <parameter key="max_depth" value="2"/>
        <process expanded="true">
          <operator activated="true" class="expectation_maximization_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering" width="90" x="179" y="75"/>
          <connect from_port="example set" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="cluster model"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_cluster model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="flatten_clustering" compatibility="5.3.008" expanded="true" height="76" name="Flatten Clustering" width="90" x="112" y="255"/>
      <operator activated="true" class="map_clustering_on_labels" compatibility="5.3.008" expanded="true" height="76" name="Map Clustering on Labels" width="90" x="380" y="210"/>
      <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="514" y="75"/>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering (2)" to_port="example set"/>
      <connect from_op="Clustering (2)" from_port="cluster model" to_op="Flatten Clustering" to_port="hierarchical"/>
      <connect from_op="Clustering (2)" from_port="clustered set" to_op="Flatten Clustering" to_port="example set"/>
      <connect from_op="Flatten Clustering" from_port="flat" to_op="Map Clustering on Labels" to_port="cluster model"/>
      <connect from_op="Flatten Clustering" from_port="example set" to_op="Map Clustering on Labels" to_port="example set"/>
      <connect from_op="Map Clustering on Labels" from_port="example set" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Map Clustering on Labels" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Performance" from_port="performance" to_port="result 2"/>
      <connect from_op="Performance" from_port="example set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
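The Python sketch below shows the same top-down recursion outside RapidMiner. It is not the operator's implementation. It assumes one-dimensional data and uses a plain 2-means split in place of Expectation Maximization. Its `flatten` helper simply returns all leaves, whereas the Flatten Clustering operator's exact behaviour may differ. The names `two_means`, `top_down` and `flatten` are hypothetical.

```python
# Sketch of top-down (divisive) clustering: split with 2-means, recurse on each
# part up to max_depth, then read the leaves off as a flat clustering.
# Assumption: 1-D points and plain 2-means stand in for the EM operator.
from dataclasses import dataclass, field


@dataclass
class Node:
    points: list
    children: list = field(default_factory=list)


def two_means(points, iters=20):
    """Split 1-D points into two groups; centroids start at min and max."""
    c0, c1 = min(points), max(points)
    a, b = points, []
    for _ in range(iters):
        a = [p for p in points if abs(p - c0) <= abs(p - c1)]
        b = [p for p in points if abs(p - c0) > abs(p - c1)]
        if not a or not b:
            break  # degenerate split, e.g. all points identical
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return a, b


def top_down(points, max_depth):
    """Recursively bisect until max_depth is reached or a node cannot be split."""
    node = Node(points)
    if max_depth == 0 or len(points) < 2:
        return node
    a, b = two_means(points)
    if a and b:
        node.children = [top_down(a, max_depth - 1), top_down(b, max_depth - 1)]
    return node


def flatten(node):
    """Return the leaf clusters from left to right."""
    if not node.children:
        return [node.points]
    return [leaf for child in node.children for leaf in flatten(child)]


tree = top_down([1, 2, 10, 11, 20, 21, 30, 31], max_depth=2)
print(flatten(tree))
```

Here every split succeeds, so max_depth=2 yields 2^2 = 4 leaf clusters. With max_depth=1 you would get only the two top-level clusters.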

regards

Andrew

mdc
Jr. Member
Posts: 60


« Reply #2 on: July 11, 2013, 03:01:05 PM »

Thanks for the reply, Andrew.
But I'm looking for hierarchical classification, specifically the 'Hierarchical Classification' operator. I have hierarchical labels that I can enter in the operator's table. Beyond that, I have no idea how to use it: what input it expects and what output it produces.

Matthew
awchisholm
Sr. Member
Posts: 390


« Reply #3 on: July 11, 2013, 11:37:28 PM »

Hello Matthew

Good point. I didn't read the question properly and substituted clustering for classification.

I'm not familiar with hierarchical classification in the context of machine learning. My guess is that it involves dividing example sets into smaller and smaller pieces based on a rule at each stage. That's roughly what the clustering example does, with one proviso: the rule is not controllable, because the same clustering algorithm is applied at every stage. The clustering also produces a prediction, so it could be used as a classifier. Again there is a proviso: the cluster labels are not derived from the training labels, so the true identity of each cluster would be ambiguous.
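One simple way to resolve that ambiguity is to give each cluster the most frequent true label among its members, then score the result like a classifier. The Python sketch below does this. Majority voting is an assumption for illustration; the Map Clustering on Labels operator may use a different mapping strategy. The helper names are hypothetical.

```python
# Sketch: give each cluster the majority true label of its members, then score
# the mapped clusters as if they were class predictions.
# Assumption: majority voting is one simple mapping; RapidMiner's
# Map Clustering on Labels operator may use a different strategy.
from collections import Counter


def map_clusters_to_labels(cluster_ids, true_labels):
    """Return {cluster_id: most frequent true label within that cluster}."""
    votes = {}
    for cluster, label in zip(cluster_ids, true_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


def mapped_accuracy(cluster_ids, true_labels):
    """Fraction of examples whose cluster maps to their true label."""
    mapping = map_clusters_to_labels(cluster_ids, true_labels)
    hits = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
    return hits / len(true_labels)


clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1", "cluster_1"]
labels = ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica", "Iris-setosa"]
print(map_clusters_to_labels(clusters, labels))
print(mapped_accuracy(clusters, labels))
```

Majority voting can map two different clusters to the same label. That is exactly the situation where cluster identities stay ambiguous.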


regards

Andrew

mdc
Jr. Member
Posts: 60


« Reply #4 on: July 12, 2013, 12:24:01 AM »


Hi,

A couple of years ago I built a hierarchical classification process similar to what you describe: modelling and applying a different set of labels to each subset of the divided example set. The label sets are hierarchical. But since there is a dedicated 'Hierarchical Classification' operator, I thought it could make the process simpler.
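The Python sketch below illustrates this manual approach, where each subset gets its own label set. It assumes the hierarchy is a tree (no cycles) given as (parent, child) pairs, mirroring the parent/child rows of the Hierarchical Classification operator's table. The feature values and function names are made up for illustration, and this is not necessarily how the operator works internally.

```python
# Sketch: derive the training set for each internal node of a label hierarchy.
# Each example keeps only the child of that node on the path to its leaf label.
# Assumption: the hierarchy is a tree given as (parent, child) pairs, like the
# rows of the operator's hierarchy table; all names here are hypothetical.


def parent_map(pairs):
    """Map each child to its parent."""
    return {child: parent for parent, child in pairs}


def path_to_root(label, parents):
    """Return [label, parent, grandparent, ..., root] (assumes no cycles)."""
    path = [label]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def node_training_sets(examples, pairs):
    """Return {node: [(features, child_label), ...]} for every internal node."""
    parents = parent_map(pairs)
    sets = {}
    for features, leaf in examples:
        path = path_to_root(leaf, parents)
        # Walk the path pairwise: path[i] is a child of path[i + 1].
        for child, node in zip(path, path[1:]):
            sets.setdefault(node, []).append((features, child))
    return sets


hierarchy = [
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
]
data = [([1.4], "Iris-setosa"), ([4.5], "Iris-versicolor"), ([5.8], "Iris-virginica")]
for node, rows in node_training_sets(data, hierarchy).items():
    print(node, rows)
```

Each internal node gets its own training set. The `root` set uses the labels `Iris-setosa` and `versicolor_virginica`, while the `versicolor_virginica` set contains only the two classes below it. A separate model would then be trained on each set.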

Anyway, if anybody has a sample process, please post it, or at least give a hint on how the operator works.

thanks,
Matthew
mdc
Jr. Member
Posts: 60


« Reply #5 on: July 19, 2013, 04:03:13 PM »


Anybody? Any hint on how to use the 'Hierarchical Classification' operator?
Marius
Administrator
Hero Member
Posts: 1794



« Reply #6 on: July 22, 2013, 01:35:12 PM »

The following process performs a hierarchical classification on Iris. You have to define the hierarchy in tabular form, starting from a "root" node.
Please have a look at the process below and come back with any questions you have.

Best regards,
Marius

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="hierarchical_multi_class_classification" compatibility="5.3.008" expanded="true" height="76" name="Hierarchical Classification" width="90" x="179" y="30">
        <list key="hierarchy">
          <parameter key="versicolor_virginica" value="Iris-versicolor"/>
          <parameter key="versicolor_virginica" value="Iris-virginica"/>
          <parameter key="root" value="Iris-setosa"/>
          <parameter key="root" value="versicolor_virginica"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="support_vector_machine" compatibility="5.3.008" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
          <connect from_port="training set" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Hierarchical Classification" to_port="training set"/>
      <connect from_op="Hierarchical Classification" from_port="model" to_port="result 2"/>
      <connect from_op="Hierarchical Classification" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
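The Python sketch below, independent of RapidMiner, checks how the table rows above form a tree. It reads the same rows (key = parent, value = child, as in the `hierarchy` list) and raises an error if a class is given two parents. It also checks that the leaves below `root` are exactly the three Iris classes and prints the tree. The checks and function names are illustrative assumptions, not the operator's own validation.

```python
# Sketch: turn the rows of the hierarchy list (key = parent, value = child)
# into a tree and sanity-check it before training.
# Assumption: these checks reflect what the example implies (a single "root",
# leaves equal to the label values); the operator's own validation may differ.
from collections import defaultdict


def build_tree(rows):
    """Return {parent: [children]}; raise ValueError on a child with two parents."""
    children = defaultdict(list)
    seen = {}
    for parent, child in rows:
        if child in seen and seen[child] != parent:
            raise ValueError(f"{child!r} has two parents: {seen[child]!r} and {parent!r}")
        seen[child] = parent
        children[parent].append(child)
    return dict(children)


def leaves(tree, node="root"):
    """Collect the leaf nodes below node; these should be the original class values."""
    if node not in tree:
        return [node]
    return [leaf for child in tree[node] for leaf in leaves(tree, child)]


def render(tree, node="root", depth=0):
    """Return an indented text view of the hierarchy, one line per node."""
    lines = ["  " * depth + node]
    for child in tree.get(node, []):
        lines.extend(render(tree, child, depth + 1))
    return lines


rows = [
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
]
tree = build_tree(rows)
assert sorted(leaves(tree)) == ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
print("\n".join(render(tree)))
```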

Please add [SOLVED] to the topic title when your problem has been solved! (do so by editing the first post in the thread and modifying the title)
Please click here before posting.
mdc
Jr. Member
Posts: 60


« Reply #7 on: July 23, 2013, 03:52:05 PM »


Thanks, Marius. It works, but when I apply the model to an example set, the result does not show the hierarchical labels, only the original labels (Iris-*). Is there a way to make the prediction include the parent labels too, for example as another column?

Matthew


Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="hierarchical_multi_class_classification" compatibility="5.3.008" expanded="true" height="76" name="Hierarchical Classification" width="90" x="179" y="30">
        <list key="hierarchy">
          <parameter key="versicolor_virginica" value="Iris-versicolor"/>
          <parameter key="versicolor_virginica" value="Iris-virginica"/>
          <parameter key="root" value="Iris-setosa"/>
          <parameter key="root" value="versicolor_virginica"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="support_vector_machine" compatibility="5.3.008" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
          <connect from_port="training set" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="75">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Hierarchical Classification" to_port="training set"/>
      <connect from_op="Hierarchical Classification" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Hierarchical Classification" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
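Outside RapidMiner, one workaround is to derive the parent labels from the predicted leaf after applying the model. This works because in a tree every leaf has exactly one path to the root. The Python sketch below assumes the predictions are rows of dicts with a `prediction(label)` entry and that the hierarchy is given as (parent, child) pairs. The `parent` and `path` column names are hypothetical. The sketch only restates what the predicted leaf already implies, so it gives no separate confidences for the parent nodes.

```python
# Sketch: add parent-level columns to predictions by walking from the predicted
# leaf label up to "root" in the hierarchy.
# Assumption: predictions are dicts with a "prediction(label)" entry and the
# hierarchy is a tree given as (parent, child) pairs; column names are hypothetical.


def ancestors(label, parent_of):
    """Return the chain from the label's parent up to and including the root."""
    chain = []
    while label in parent_of:
        label = parent_of[label]
        chain.append(label)
    return chain


def add_parent_columns(rows, pairs, prediction_key="prediction(label)"):
    """Return copies of rows with a "parent" and a "path" column added."""
    parent_of = {child: parent for parent, child in pairs}
    out = []
    for row in rows:
        leaf = row[prediction_key]
        new_row = dict(row)  # leave the input rows unchanged
        new_row["parent"] = parent_of.get(leaf)  # None if the label is not in the table
        new_row["path"] = "/".join(reversed([leaf] + ancestors(leaf, parent_of)))
        out.append(new_row)
    return out


hierarchy = [
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
]
predictions = [{"prediction(label)": "Iris-virginica"}, {"prediction(label)": "Iris-setosa"}]
for row in add_parent_columns(predictions, hierarchy):
    print(row)
```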

Marius
Administrator
Hero Member
Posts: 1794



« Reply #8 on: July 24, 2013, 02:33:05 PM »

Matthew,

Unfortunately, that is not possible with a single operator. It is possible to build a custom process that creates hierarchical labels, but that is much more complex.
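The Python sketch below outlines such a custom process. It trains one classifier per internal node of the hierarchy, then routes each example from `root` downwards and records every node it passes through, which yields a label for each level. A nearest-centroid classifier stands in for the SVM to keep the sketch self-contained. The hierarchy format, feature values and function names are assumptions for illustration.

```python
# Sketch of a "custom process" that predicts every level of the hierarchy:
# train one classifier per internal node, then route each example from the
# root downwards and record every node it passes through.
# Assumption: a nearest-centroid classifier stands in for the SVM, and the
# hierarchy is a tree of (parent, child) pairs; all names are hypothetical.
import math
from collections import defaultdict


def train(examples, pairs):
    """Return {node: {child: centroid}} learned from (features, leaf_label) pairs."""
    parent_of = {child: parent for parent, child in pairs}
    grouped = defaultdict(lambda: defaultdict(list))
    for features, label in examples:
        child = label
        while child in parent_of:  # credit the example to every node on its path
            node = parent_of[child]
            grouped[node][child].append(features)
            child = node
    return {
        node: {
            child: [sum(col) / len(col) for col in zip(*vectors)]
            for child, vectors in by_child.items()
        }
        for node, by_child in grouped.items()
    }


def predict_path(model, features, node="root"):
    """Descend from node, choosing the nearest child centroid at each level."""
    path = [node]
    while node in model:
        centroids = model[node]
        node = min(centroids, key=lambda c: math.dist(features, centroids[c]))
        path.append(node)
    return path


hierarchy = [
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
]
# Made-up two-feature examples, for illustration only.
training = [
    ([1.4, 0.2], "Iris-setosa"), ([1.5, 0.3], "Iris-setosa"),
    ([4.5, 1.4], "Iris-versicolor"), ([4.2, 1.3], "Iris-versicolor"),
    ([5.8, 2.2], "Iris-virginica"), ([6.0, 2.4], "Iris-virginica"),
]
model = train(training, hierarchy)
print(predict_path(model, [5.9, 2.1]))
```

Because the full path is returned, each level of the hierarchy could be written to its own column, which a single leaf prediction does not provide.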

Best regards,
Marius
mdc
Jr. Member
Posts: 60


« Reply #9 on: July 29, 2013, 05:48:58 PM »


Thanks. That's good to know.

Matthew