Topic: stock prediction model problem
ahanazi
Newbie
*
Posts: 11


« on: June 04, 2009, 11:52:54 PM »

Gentlemen,
I hope you are doing well.

I'm using an SVM (LibSVMLearner).

I got this result:
class precision: pred. Sell = 95%
                 pred. Buy  = 90%
When I apply the model to another dataset, every prediction is Buy!
How can I solve this?

Here is the XML:
<operator name="XLU Prediction with a SVM" class="Process" expanded="yes">
    <parameter key="resultfile"   value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\result.res"/>
    <operator name="Load Data from Spreadsheet" class="ExcelExampleSource">
        <parameter key="excel_file"   value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\data\index-weekly-from-1-1-2003-T0-28-2-2009.xls"/>
        <parameter key="first_row_as_names"   value="true"/>
        <parameter key="create_label"   value="true"/>
        <parameter key="label_column"   value="11"/>
        <parameter key="create_id"   value="true"/>
        <parameter key="id_column"   value="2"/>
    </operator>
    <operator name="Normalize the Data" class="Normalization">
        <parameter key="return_preprocessing_model"   value="true"/>
        <parameter key="create_view"   value="true"/>
        <parameter key="min"   value="0.1"/>
        <parameter key="max"   value="0.9"/>
    </operator>
    <operator name="DataStatistics" class="DataStatistics">
    </operator>
    <operator name="Cross Validate" class="XValidation" expanded="yes">
        <parameter key="keep_example_set"   value="true"/>
        <parameter key="create_complete_model"   value="true"/>
        <operator name="Train the SVM" class="LibSVMLearner">
            <parameter key="keep_example_set"   value="true"/>
            <parameter key="degree"   value="5"/>
            <parameter key="gamma"   value="0.8976"/>
            <parameter key="C"   value="19.0"/>
            <list key="class_weights">
            </list>
            <parameter key="calculate_confidences"   value="true"/>
        </operator>
        <operator name="ModelWriter" class="ModelWriter">
            <parameter key="model_file"   value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\realModelFile\SVM ahmed model.mod"/>
            <parameter key="overwrite_existing_file"   value="false"/>
            <parameter key="output_type"   value="XML"/>
        </operator>
    </operator>
    <operator name="Test the SVM's Performance" class="OperatorChain" expanded="yes">
        <operator name="Apply the SVM to Test Data" class="ModelApplier">
            <parameter key="keep_model"   value="true"/>
            <list key="application_parameters">
            </list>
            <parameter key="create_view"   value="true"/>
        </operator>
        <operator name="Give Performance Stats" class="ClassificationPerformance">
            <parameter key="keep_example_set"   value="true"/>
            <parameter key="accuracy"   value="true"/>
            <parameter key="weighted_mean_recall"   value="true"/>
            <parameter key="weighted_mean_precision"   value="true"/>
            <parameter key="correlation"   value="true"/>
            <parameter key="margin"   value="true"/>
            <parameter key="logistic_loss"   value="true"/>
            <list key="class_weights">
            </list>
        </operator>
    </operator>
</operator>
Best regards to all.
Logged
Sebastian Land
Administrator
Hero Member
*****
Posts: 2426


« Reply #1 on: June 05, 2009, 07:32:12 AM »

Hi,
you are estimating the performance on the training set. Since SVMs easily overfit the data (especially with an RBF kernel), performance on the training set may be very good (possibly 100%), but the model will fail on new examples.
I suggest you take a look at the sample processes for XValidation, since you are using it in a strange way: you are learning a model for each fold and writing each of them to the same file, overwriting the previous one. XValidation was originally built for performance estimation, which would be achieved by a setup like the following:


Code:
<operator name="XLU Prediction with a SVM" class="Process" expanded="yes">
    <parameter key="resultfile" value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\result.res"/>
    <operator name="Load Data from Spreadsheet" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\data\index-weekly-from-1-1-2003-T0-28-2-2009.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_column" value="11"/>
        <parameter key="create_id" value="true"/>
        <parameter key="id_column" value="2"/>
    </operator>
    <operator name="Normalize the Data" class="Normalization">
        <parameter key="return_preprocessing_model" value="true"/>
        <parameter key="create_view" value="true"/>
        <parameter key="min" value="0.1"/>
        <parameter key="max" value="0.9"/>
    </operator>
    <operator name="DataStatistics" class="DataStatistics">
    </operator>
    <operator name="Cross Validate" class="XValidation" expanded="yes">
        <parameter key="keep_example_set" value="true"/>
        <parameter key="create_complete_model" value="true"/>
        <operator name="Train the SVM" class="LibSVMLearner">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="degree" value="5"/>
            <parameter key="gamma" value="0.8976"/>
            <parameter key="C" value="19.0"/>
            <list key="class_weights">
            </list>
            <parameter key="calculate_confidences" value="true"/>
        </operator>
        <operator name="Test the SVM's Performance" class="OperatorChain" expanded="no">
            <operator name="Apply the SVM to Test Data" class="ModelApplier">
                <parameter key="keep_model" value="true"/>
                <list key="application_parameters">
                </list>
                <parameter key="create_view" value="true"/>
            </operator>
            <operator name="Give Performance Stats" class="ClassificationPerformance">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="accuracy" value="true"/>
                <parameter key="weighted_mean_recall" value="true"/>
                <parameter key="weighted_mean_precision" value="true"/>
                <parameter key="correlation" value="true"/>
                <parameter key="margin" value="true"/>
                <parameter key="logistic_loss" value="true"/>
                <list key="class_weights">
                </list>
            </operator>
        </operator>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="C:\Documents and Settings\ahanazi\Desktop\Testinf Rapid\myFirstModel\Model2\realModelFile\SVM ahmed model.mod"/>
        <parameter key="overwrite_existing_file" value="false"/>
        <parameter key="output_type" value="XML"/>
    </operator>
</operator>
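
To make the difference between a training-set estimate and a cross-validated estimate concrete, here is a minimal Python sketch (not RapidMiner code). Everything in it is an assumption made for illustration: a 1-nearest-neighbour learner stands in for a model that memorises its training data, and the toy data has alternating Buy/Sell labels, chosen deliberately so that memorising works and generalising does not.

```python
def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose label is predicted correctly."""
    hits = sum(1 for x, label in test if predict_1nn(train, x) == label)
    return hits / len(test)


def cross_validated_accuracy(data, folds):
    """k-fold estimate: every point is scored by a model that never saw it."""
    scores = []
    for fold in range(folds):
        test = [p for i, p in enumerate(data) if i % folds == fold]
        train = [p for i, p in enumerate(data) if i % folds != fold]
        scores.append(accuracy(train, test))
    return sum(scores) / len(scores)


# Hypothetical toy data: one numeric attribute, labels alternate Buy/Sell.
data = [(float(i), "Buy" if i % 2 == 0 else "Sell") for i in range(30)]

train_accuracy = accuracy(data, data)            # scored on its own training data
cv_accuracy = cross_validated_accuracy(data, 5)  # scored on held-out folds

print(f"training-set accuracy: {train_accuracy:.2f}")
print(f"cross-validated accuracy: {cv_accuracy:.2f}")
```

On this contrived data the learner is perfect on its own training set and wrong on every held-out point. Real data will rarely be this extreme, but the gap between the two numbers is the thing to watch.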

Greetings,
  Sebastian
Logged
emolano
Newbie
*
Posts: 13


« Reply #2 on: June 05, 2009, 07:11:42 PM »

So the ModelWriter should come after the cross-validation chain? Always?
Does that mean the example at http://www.neuralmarkettrends.com/2007/05/09/building-an-ai-financial-model-lesson-iv/ is wrong?
Please look at how they do it at http://www.neuralmarkettrends.com/wp-content/uploads/2007/05/preformance-pref.JPG: they write the model after every iteration. Maybe Land got the code from them.
Thanks
e
Logged
haddock
Hero Member
*****
Posts: 853



WWW
« Reply #3 on: June 05, 2009, 07:52:49 PM »

Hi there,

Sebastian is right to say that writing out the model each time makes no sense at all. I'd like to add that you should look at sliding window validation as well, and avoid stratified sampling. This latter issue has already been covered at...

http://rapid-i.com/rapidforum/index.php/topic,908.msg3395.html#msg3395

Otherwise you will have some very nasty surprises when you trade!
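
The idea behind sliding window validation can be sketched in a few lines of Python. This is only an illustration of the principle (train on a block of past examples, test on the block that immediately follows, then move forward in time); the function name and its parameters are hypothetical and do not describe how the RapidMiner operator is configured.

```python
def sliding_windows(n_examples, train_size, test_size, step):
    """Yield (train_indices, test_indices) pairs in time order.

    Every test window lies strictly after its training window, so the
    model is never evaluated on data older than what it was trained on.
    """
    start = 0
    while start + train_size + test_size <= n_examples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step


# Hypothetical series of 10 weekly examples, indexed 0..9 in time order.
splits = list(sliding_windows(n_examples=10, train_size=4, test_size=2, step=2))
for train, test in splits:
    print("train", train, "-> test", test)
```

Unlike shuffled or stratified sampling, no split here lets the model peek at examples that come after the ones it is tested on.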

Good weekend to all!
Logged

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S.Eliot ~ Choruses from the Rock 1934
ahanazi
Newbie
*
Posts: 11


« Reply #4 on: June 05, 2009, 10:06:39 PM »

Thanks a lot.
Actually, I'm new to RapidMiner...
How can I find ready-made models that I can learn from and develop further?

My objective is to create models for trading stocks. I lost a lot of money... now I'm back with RapidMiner as a last chance.
Logged
haddock
Hero Member
*****
Posts: 853



WWW
« Reply #5 on: June 06, 2009, 12:52:05 AM »

Hi there Ahanazi,

Reading your last post I felt a lot of sympathy, but also a need to warn you about what you can achieve with RM and what you cannot. I've been in the field of artificial intelligence/machine learning and finance for twenty years. I've made mistakes and tried to learn from them; some are obvious, while others sneak up on you when you aren't paying attention. So here are my two cents...

Whatever you do, never trade a system you don't understand, so don't get one off the shelf, because you will never understand how to set your stop losses and profit targets, and without money management only doom awaits. Also understand that markets evolve, which means that there is not a model that will always work, unless of course it can evolve as well. We are talking about human activity, where the rules of the game change, not copper sulphate, whose properties are constant.

The good news is that by being here you are on the right track, but you will need to put in a lot of work. RM is in my experience simply the best environment to build and test models in a rigorous way. That means that you can build and test extensive systems rapidly, just like it says on the box. I use it to identify markets where models work best, and to understand what performance I can reasonably expect; generally if you ask people why they trade market X rather than market Y they have no idea. No wonder most traders lose their money in less than two years. So, in a nutshell, you can use RM to identify markets and trading horizons. If you press on the globe underneath my ludicrous picture you'll get an idea of what I mean. I'll post up more details of my methods there if folks are interested.

So you can reasonably expect to build and test systems whose limitations you will understand. What you will then need is a way of simulating their behaviour with various money management techniques, by integrating with Tradestation or the like. I cannot stress too much the need to match the performance of whatever you make with your ability to handle the hits that the real world will dole out to any model. Risk too much and die quickly, risk too little and die from boredom!

If you have managed to stay awake through all of the above you probably have the stamina to see it through. Start with the examples, and don't even think about markets until you can explain all of those examples. Like most things, it will take many many hours of hard work, but the rewards are there. No point in doing otherwise, as poor is not fun!

" I'm so poor I can't even pay attention." ~Ron Kittle, 1987

Logged

ahanazi
Newbie
*
Posts: 11


« Reply #6 on: June 06, 2009, 06:46:02 AM »

Mr. Haddock,
Good morning.

I read what you have written carefully, and I would like to comment:
1- I like talking to experts like you.
2- In my work with RM, I would like to achieve two things: I want to create a model that gives me the probability of a high return, together with the risk of buying the stock (or group of stocks). I believe RM will help with this. I have used AmiBroker a lot, but I feel it will not help me here; that is how and why I came to use RM.
3- I visited your website on risk, and I would like to learn more if you can post examples created with RM.

My question: where can I get lots of useful RM examples?

Again, thanks a lot for your valuable advice.
BR
Logged
wessel
Hero Member
*****
Posts: 558


« Reply #7 on: June 06, 2009, 01:46:22 PM »

I have a problem very similar to this:
Regression problem: 2 numerical attributes, 1 numerical class attribute.

I would like to evaluate the performance of 2 different "machine learning algorithms"
- WEKA REP-TREE
- WEKA Linear Regression

I'm able to do this in WEKA, but I can't figure out the correct setup in RapidMiner.
I would be grateful for some XML examples of how to compare different "machine learning algorithms".

This is what I do in WEKA:
TRAIN / TEST percentage split, random order using seed
do this 5 times, using a different seed

Dataset:
v      u      label
---------------------
v0     u0     label0
v1     u1     label1
v2     u2     label2
...    ...    ...
v99    u99    label99

performance := <abs error, rel error>

Output example:
seed, performance(Linear Regression), performance(REP-TREE)
0, <5, 100%>, <3, 60%>
1, <5, 100%>, <2, 40%>
2, <5, 100%>, <3, 60%>
3, <5, 100%>, <2, 40%>
4, <5, 100%>, <3, 60%>

significant_difference(REP-TREE, LinearRegression) == True
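
One possible way to fill in significant_difference is a paired t-test on the per-seed errors. The Python sketch below is an assumption about what such a check could look like, not what WEKA does internally; it uses the illustrative absolute errors from the output example above and the standard two-sided 5% critical value for 4 degrees of freedom.

```python
import math
import statistics

# Absolute errors per seed, copied from the illustrative output example.
linear_regression = [5, 5, 5, 5, 5]
rep_tree = [3, 2, 3, 2, 3]

# Paired differences: both learners are scored on the same split per seed.
differences = [a - b for a, b in zip(linear_regression, rep_tree)]

mean_diff = statistics.mean(differences)
std_error = statistics.stdev(differences) / math.sqrt(len(differences))
t_statistic = mean_diff / std_error

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776
significant = abs(t_statistic) > T_CRITICAL_DF4

print(f"mean difference = {mean_diff}, t = {t_statistic:.2f}, significant = {significant}")
```

Keep in mind that repeated random splits of the same dataset overlap, so the runs are not independent and a plain paired t-test tends to be optimistic. Treat the result as a rough indication.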

Logged
haddock
Hero Member
*****
Posts: 853



WWW
« Reply #8 on: June 06, 2009, 07:07:53 PM »

Hi Folks!

Quick replies, as food beckons and stomach makes funny noises...

@ ahanazi - Sorry, when I said "examples" I should have said "samples", my apologies. I mean everything you can open via File/Open/Samples once you pick your subject area. These are the files that the tutorial (Help/Rapidminer Tutorial) selects from.

@ wessel - If you cast a glance over the following code you'll soon see that it is just a wrapper applied around one of those validation samples, and some more learners bashed in, just a few minutes to produce more or less infinite combos. Given it grinds on random junk it is only a junk muncher, but it might point you in interesting directions, hope so.

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_of_attributes" value="2"/>
    </operator>
    <operator name="For Leaner = 1 to 3" class="IteratingOperatorChain" expanded="yes">
        <parameter key="iterations" value="3"/>
        <operator name="Set Learner Number" class="SingleMacroDefinition">
            <parameter key="macro" value="Learner"/>
            <parameter key="value" value="%{a}"/>
        </operator>
        <operator name="Train and Test" class="XValidation" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="Create Model Using Learner Number" class="OperatorSelector" expanded="yes">
                <operator name="1 Training" class="LibSVMLearner">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="C" value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="2 LinearRegression" class="LinearRegression">
                </operator>
                <operator name="3 GPLearner" class="GPLearner">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="relative_error" value="true"/>
                    <parameter key="normalized_absolute_error" value="true"/>
                    <parameter key="root_relative_squared_error" value="true"/>
                    <parameter key="squared_error" value="true"/>
                    <parameter key="correlation" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Map results to a table" class="ProcessLog">
            <list key="log">
              <parameter key="Learner" value="operator.Set Learner Number.value.macro_value"/>
              <parameter key="Performance" value="operator.Train and Test.value.performance"/>
            </list>
        </operator>
    </operator>
    <operator name="Cleanup and view results" class="IOConsumer">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="deletion_type" value="delete_one"/>
    </operator>
</operator>
Logged

wessel
Hero Member
*****
Posts: 558


« Reply #9 on: June 06, 2009, 11:08:28 PM »

This looks really nice. Thanks a lot!

I think I can play with it and make it perform 10 X validations, all with a different seed.

I'm not sure how to modify the "Map results to a table" though.
When I double-click on it and then click "log: Edit List (2)...", it doesn't let me add columns.
Your table currently looks like this:
Learner Performance
1          0.306
2          0.297
3          0.300

Something like this would be more informative:
Seed Absolute_Error_Learner1 Absolute_Error_Learner2 Absolute_Error_Learner3
1        0.306        0.297        0.300
2        0.326        0.227        0.320
3        0.333        0.233        0.333
4        0.346        0.247        0.340
5        0.305        0.295        0.305
...
10      0.306        0.297        0.300


With a table like this it's easy to calculate by hand whether there is a significant difference.
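
If the log only comes out in "long" form (one row per seed and learner), the wide table can be built afterwards. Here is a small Python sketch of that pivot; the row layout and the function name are hypothetical, and the numbers are the first two rows of the wished-for table above.

```python
from collections import defaultdict

# Hypothetical log rows in "long" form: (seed, learner, absolute_error).
log_rows = [
    (1, 1, 0.306), (1, 2, 0.297), (1, 3, 0.300),
    (2, 1, 0.326), (2, 2, 0.227), (2, 3, 0.320),
]


def pivot_by_seed(rows):
    """Turn long rows into a nested dict {seed: {learner: error}}."""
    table = defaultdict(dict)
    for seed, learner, error in rows:
        table[seed][learner] = error
    return dict(table)


wide = pivot_by_seed(log_rows)
learners = sorted({learner for _, learner, _ in log_rows})

# Print one line per seed with one error column per learner.
print("Seed " + " ".join(f"Absolute_Error_Learner{n}" for n in learners))
for seed in sorted(wide):
    print(seed, " ".join(f"{wide[seed][n]:.3f}" for n in learners))
```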
« Last Edit: June 06, 2009, 11:16:05 PM by wessel » Logged
haddock
Hero Member
*****
Posts: 853



WWW
« Reply #10 on: June 07, 2009, 11:45:18 AM »

Hi Wessel,

I'm not sure about the edit list problem you encountered; sometimes you need to add entries one by one. Perhaps you could elaborate? Anyway, here is a rework along the lines you suggest. I've used the parameter iteration operator, as it is neater than nested loops, and I've converted the results log to an example set so that the results can be aggregated. Have fun tinkering with it!

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
        <list key="parameters">
          <parameter key="ExampleSetGenerator.local_random_seed" value="[1.0;5.0;5;linear]"/>
          <parameter key="Create Model Using Learner Number.select_which" value="[1.0;3.0;3;linear]"/>
        </list>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="local_random_seed" value="5"/>
        </operator>
        <operator name="Train and Test" class="XValidation" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="Create Model Using Learner Number" class="OperatorSelector" expanded="yes">
                <parameter key="select_which" value="3"/>
                <operator name="1 Training" class="LibSVMLearner">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="C" value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="2 LinearRegression" class="LinearRegression">
                </operator>
                <operator name="3 GPLearner" class="GPLearner">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="relative_error" value="true"/>
                    <parameter key="normalized_absolute_error" value="true"/>
                    <parameter key="root_relative_squared_error" value="true"/>
                    <parameter key="squared_error" value="true"/>
                    <parameter key="correlation" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Map results to a table" class="ProcessLog">
            <list key="log">
              <parameter key="Seed" value="operator.ExampleSetGenerator.parameter.local_random_seed"/>
              <parameter key="Learner" value="operator.Create Model Using Learner Number.parameter.select_which"/>
              <parameter key="Performance" value="operator.Evaluation.value.absolute_error"/>
            </list>
        </operator>
    </operator>
    <operator name="Convert log for aggregation" class="ProcessLog2ExampleSet">
    </operator>
    <operator name="Make model number nominal" class="AttributeSubsetPreprocessing" expanded="no">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="attribute_name_regex" value="Learner"/>
        <operator name="Numerical2FormattedNominal" class="Numerical2FormattedNominal">
        </operator>
    </operator>
    <operator name="Do stat by leaner" class="Aggregation">
        <list key="aggregation_attributes">
          <parameter key="Performance" value="average"/>
          <parameter key="Performance" value="variance"/>
          <parameter key="Performance" value="standard_deviation"/>
        </list>
        <parameter key="group_by_attributes" value="Learner"/>
    </operator>
</operator>
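
For readers who want to check the final aggregation step by hand, this is roughly what grouping the log by learner and computing average, variance and standard deviation amounts to. It is a Python sketch with made-up numbers, and it assumes the sample (n-1) variance; whether the Aggregation operator uses the sample or the population formula is not stated in this thread.

```python
import statistics
from collections import defaultdict

# Hypothetical log rows: (seed, learner, absolute_error).
log_rows = [
    (1, 1, 0.30), (2, 1, 0.32), (3, 1, 0.34),
    (1, 2, 0.20), (2, 2, 0.24), (3, 2, 0.28),
]

# Group the errors by learner, ignoring the seed.
errors_by_learner = defaultdict(list)
for _seed, learner, error in log_rows:
    errors_by_learner[learner].append(error)

# One summary row per learner, using sample (n-1) variance and deviation.
summary = {
    learner: {
        "average": statistics.mean(errors),
        "variance": statistics.variance(errors),
        "standard_deviation": statistics.stdev(errors),
    }
    for learner, errors in errors_by_learner.items()
}

for learner, stats in sorted(summary.items()):
    print(learner, {name: round(value, 4) for name, value in stats.items()})
```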
Logged

wessel
Hero Member
*****
Posts: 558


« Reply #11 on: June 07, 2009, 12:40:38 PM »

What version of RapidMiner are you using?
When I load the xml I get XMLException: unknown operator: Numerical2FormattedNominal

I installed the Value Series plugin, but it still gives me the same error.
I have version 4.4.

When I google:
Numerical2FormattedNominal + site:http://rapid-i.com/
it only finds this post.
« Last Edit: June 07, 2009, 12:57:16 PM by wessel » Logged
haddock
Hero Member
*****
Posts: 853



WWW
« Reply #12 on: June 07, 2009, 01:09:15 PM »

Hi Wessel,

I'm on 4.4 Enterprise, and I note that the operator is not in the Community edition; I'm not sure why. You can use Numerical2Polynominal instead, like this...

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
        <list key="parameters">
          <parameter key="ExampleSetGenerator.local_random_seed" value="[1.0;5.0;5;linear]"/>
          <parameter key="Create Model Using Learner Number.select_which" value="[1.0;3.0;3;linear]"/>
        </list>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="local_random_seed" value="5"/>
        </operator>
        <operator name="Train and Test" class="XValidation" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="Create Model Using Learner Number" class="OperatorSelector" expanded="yes">
                <parameter key="select_which" value="3"/>
                <operator name="1 Training" class="LibSVMLearner">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="C" value="1000.0"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="2 LinearRegression" class="LinearRegression">
                </operator>
                <operator name="3 GPLearner" class="GPLearner">
                </operator>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="relative_error" value="true"/>
                    <parameter key="normalized_absolute_error" value="true"/>
                    <parameter key="root_relative_squared_error" value="true"/>
                    <parameter key="squared_error" value="true"/>
                    <parameter key="correlation" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Map results to a table" class="ProcessLog">
            <list key="log">
              <parameter key="Seed" value="operator.ExampleSetGenerator.parameter.local_random_seed"/>
              <parameter key="Learner" value="operator.Create Model Using Learner Number.parameter.select_which"/>
              <parameter key="Performance" value="operator.Evaluation.value.absolute_error"/>
            </list>
        </operator>
    </operator>
    <operator name="Convert log for aggregation" class="ProcessLog2ExampleSet">
    </operator>
    <operator name="Make model number nominal" class="AttributeSubsetPreprocessing" expanded="yes">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="attribute_name_regex" value="Learner"/>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
    </operator>
    <operator name="Do stat by leaner" class="Aggregation">
        <list key="aggregation_attributes">
          <parameter key="Performance" value="average"/>
          <parameter key="Performance" value="variance"/>
          <parameter key="Performance" value="standard_deviation"/>
        </list>
        <parameter key="group_by_attributes" value="Learner"/>
    </operator>
</operator>
Logged
