Topic: Write filters to disk
chris_ml
« on: June 11, 2009, 06:06:24 PM »

The ModelGrouper operator is convenient when preprocessing models and prediction models must be
written to disk together. However, a data mining process often also contains filters, such as the
"FeatureNameFilter" operator, which are not written to disk when the ModelWriter is used.

In the following code, is there a way to also dump the "FeatureNameFilter" to a file, so that the complete
process can later be read back in and applied to unseen data?

Code:
<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">

  <operator name="Root" class="Process" expanded="yes">
      <parameter key="logverbosity"     value="init"/>
      <parameter key="random_seed"      value="2001"/>
      <parameter key="encoding" value="SYSTEM"/>
      <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
          <parameter key="target_function"      value="polynomial classification"/>
          <parameter key="number_examples"      value="100"/>
          <parameter key="number_of_attributes" value="5"/>
          <parameter key="attributes_lower_bound"       value="-10.0"/>
          <parameter key="attributes_upper_bound"       value="10.0"/>
          <parameter key="local_random_seed"    value="-1"/>
          <parameter key="datamanagement"       value="double_array"/>
      </operator>
      <operator name="NoiseGenerator" class="NoiseGenerator">
          <parameter key="random_attributes"    value="3"/>
          <parameter key="label_noise"  value="0.05"/>
          <parameter key="default_attribute_noise"      value="0.0"/>
          <list key="noise">
          </list>
          <parameter key="offset"       value="0.0"/>
          <parameter key="linear_factor"        value="1.0"/>
          <parameter key="local_random_seed"    value="-1"/>
      </operator>
      <operator name="Normalization" class="Normalization">
          <parameter key="return_preprocessing_model"   value="true"/>
          <parameter key="create_view"  value="false"/>
          <parameter key="method"       value="Z-Transformation"/>
          <parameter key="min"  value="0.0"/>
          <parameter key="max"  value="1.0"/>
      </operator>
      <operator name="FeatureNameFilter" class="FeatureNameFilter">
          <parameter key="filter_special_features"      value="false"/>
          <parameter key="skip_features_with_name"      value="result"/>
      </operator>
      <operator name="NearestNeighbors" class="NearestNeighbors">
          <parameter key="keep_example_set"     value="false"/>
          <parameter key="k"    value="3"/>
          <parameter key="weighted_vote"        value="false"/>
          <parameter key="measure_types"        value="MixedMeasures"/>
          <parameter key="mixed_measure"        value="MixedEuclideanDistance"/>
          <parameter key="nominal_measure"      value="NominalDistance"/>
          <parameter key="numerical_measure"    value="EuclideanDistance"/>
          <parameter key="divergence"   value="GeneralizedIDivergence"/>
          <parameter key="kernel_type"  value="radial"/>
          <parameter key="kernel_gamma" value="1.0"/>
          <parameter key="kernel_sigma1"        value="1.0"/>
          <parameter key="kernel_sigma2"        value="0.0"/>
          <parameter key="kernel_sigma3"        value="2.0"/>
          <parameter key="kernel_degree"        value="3.0"/>
          <parameter key="kernel_shift" value="1.0"/>
          <parameter key="kernel_a"     value="1.0"/>
          <parameter key="kernel_b"     value="0.0"/>
      </operator>
      <operator name="ModelGrouper" class="ModelGrouper">
      </operator>
      <operator name="ModelWriter" class="ModelWriter">
          <parameter key="model_file"   value="combined_model_bin.mod"/>
          <parameter key="overwrite_existing_file"      value="true"/>
          <parameter key="output_type"  value="XML"/>
      </operator>
  </operator>

</process>
Sebastian Land
Administrator
« Reply #1 on: June 15, 2009, 09:55:19 AM »

Hi Chris,
unfortunately, this is not possible. You still have to design a separate process for applying the model. But there is a trick that simplifies this:
If you store all the preprocessing steps in a single process, you can load and apply it in both the training process and the application process using the process embedder. That process then behaves like a model itself.
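
A minimal sketch of how this could look, assuming the ProcessEmbedder operator of RapidMiner 4.4 and its process_file and use_input parameters. The file name preprocessing.xml and the operator name "Preprocessing" are hypothetical, and the parameter names should be checked against your installation. The filter steps that cannot be stored as a model go into their own process file:

Code:
<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">

  <!-- preprocessing.xml (hypothetical file name): contains only the filter steps -->
  <operator name="Root" class="Process" expanded="yes">
      <operator name="FeatureNameFilter" class="FeatureNameFilter">
          <parameter key="filter_special_features"      value="false"/>
          <parameter key="skip_features_with_name"      value="result"/>
      </operator>
  </operator>

</process>

In both the training process and the application process, the FeatureNameFilter operator would then be replaced by an embedder that points at this file:

Code:
      <!-- assumed parameter names; verify them in the operator's parameter list -->
      <operator name="Preprocessing" class="ProcessEmbedder">
          <parameter key="process_file" value="preprocessing.xml"/>
          <parameter key="use_input"    value="true"/>
      </operator>

The Normalization model and the learner can still be combined with ModelGrouper and written with ModelWriter as before. Only the filter step lives in the shared process file, so a change to the filter is picked up by both processes.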

Greetings,
  Sebastian