Author Topic: Polynominal Value Reduction  (Read 753 times)
seanv507
Newbie
*
Posts: 2


« on: August 07, 2013, 10:58:09 PM »

Hi,

I would like to replicate in RapidMiner a process I have already done in Python/scikit-learn/R.

I am looking at advertising click-through rate (CTR) prediction: millions of rows and about 5 polynominal features, each with up to 1000 different values (e.g. feature = Website, Country, etc.).

The feature data is "skewed", i.e. many values have very few instances in the data, while a few values have very many. I therefore want to keep only those values of each polynominal feature that change the CTR significantly from the base CTR, and replace the "long tail" with a single "NA" category per feature.

Is there any way of doing this within RapidMiner?
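
For reference, the reduction described above could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions, not the original scikit-learn/R code: the row layout (dicts with a "click" field), the function names, the normal-approximation z-test and the threshold of 3.0 are all hypothetical choices.

```python
import math
from collections import defaultdict

def significant_values(rows, feature, base_ctr, z_threshold=3.0):
    """Return the values of `feature` whose CTR deviates from base_ctr.

    Assumption: a normal approximation to the binomial is used, which is
    rough for values with very few impressions.
    """
    shows = defaultdict(int)
    clicks = defaultdict(int)
    for row in rows:
        shows[row[feature]] += 1
        clicks[row[feature]] += row["click"]
    keep = set()
    for value, n in shows.items():
        expected = n * base_ctr
        std = math.sqrt(n * base_ctr * (1.0 - base_ctr))
        if std > 0 and abs(clicks[value] - expected) / std >= z_threshold:
            keep.add(value)
    return keep

def collapse_long_tail(rows, feature, keep, other="NA"):
    """Replace every value not in `keep` with the single category `other`."""
    return [
        {**row, feature: row[feature] if row[feature] in keep else other}
        for row in rows
    ]

# Toy data (made up): "a" clicks often, "b" rarely, "c" has almost no data.
rows = (
    [{"website": "a", "click": 1}] * 100 + [{"website": "a", "click": 0}] * 900
    + [{"website": "b", "click": 1}] * 10 + [{"website": "b", "click": 0}] * 990
    + [{"website": "c", "click": 1}] * 1 + [{"website": "c", "click": 0}] * 4
)
base_ctr = sum(r["click"] for r in rows) / len(rows)
keep = significant_values(rows, "website", base_ctr)
collapsed = collapse_long_tail(rows, "website", keep)
```

With this toy data, "a" and "b" are kept and the five rows of "c" fall into "NA". An exact binomial test would be a safer choice for the smallest categories.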

fras
Global Moderator
Jr. Member
*****
Posts: 81


« Reply #1 on: August 12, 2013, 05:25:47 PM »

Hi,

As far as I understand the problem, I would do two things first:

- take a sample of your data (reduce the rows to about 1%)
- apply the operator "NominalToBinominal"

Then analyse how sparse your data is.
For more advice, examples would be useful.
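
Outside RapidMiner, the "sample, then look at sparsity" check could be sketched like this in plain Python. The function names, the row layout and the min_count cut-off of 5 are assumptions made for the example, not RapidMiner operators.

```python
import random
from collections import Counter

def sample_rows(rows, fraction, seed=0):
    """Draw a random sample containing roughly `fraction` of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)

def sparsity_report(rows, feature, min_count=5):
    """Count how many values of `feature` occur fewer than min_count times."""
    counts = Counter(row[feature] for row in rows)
    rare = [v for v, c in counts.items() if c < min_count]
    return {
        "distinct_values": len(counts),
        "rare_values": len(rare),
        "rows_in_rare_values": sum(counts[v] for v in rare),
    }

# Toy data (made up): 10 common websites plus 20 websites seen only once.
rows = [{"website": "site%d" % (i % 10)} for i in range(1000)]
rows += [{"website": "rare%d" % i} for i in range(20)]

sample = sample_rows(rows, 0.01)
report = sparsity_report(rows, "website")
```

Each rare value would become an almost all-zero column after binarization, so the "rare_values" count gives a quick idea of how sparse the binominal data will be.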
seanv507
Newbie
*
Posts: 2


« Reply #2 on: August 27, 2013, 02:11:09 PM »

CTR data is "unbalanced", i.e. there is only a ~1% chance of a click. So subsampling is good, but I have to do it only on the "non-click" class and then reweight that class in the training algorithm (e.g. if the data contains 100 clicks and 100000 non-clicks, I am happy to subsample the non-clicks).
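
A rough sketch of what I mean by subsampling only the non-clicks and reweighting, in plain Python. The "click" and "weight" field names and the function name are hypothetical; the weight is simply the inverse of the sampling rate, to be passed to whatever learner accepts per-example weights.

```python
import random

def downsample_non_clicks(rows, n_keep, seed=0):
    """Keep all clicks, keep n_keep random non-clicks, and attach weights.

    Each kept non-click gets weight (non-clicks before) / (non-clicks kept),
    so the weighted class totals match the original data.
    """
    rng = random.Random(seed)
    clicks = [r for r in rows if r["click"] == 1]
    non_clicks = [r for r in rows if r["click"] == 0]
    n_keep = min(n_keep, len(non_clicks))
    kept = rng.sample(non_clicks, n_keep)
    weight = len(non_clicks) / n_keep if n_keep else 0.0
    out = [{**r, "weight": 1.0} for r in clicks]
    out += [{**r, "weight": weight} for r in kept]
    return out

# The numbers from the example above: 100 clicks, 100000 non-clicks.
rows = [{"click": 1}] * 100 + [{"click": 0}] * 100000
reduced = downsample_non_clicks(rows, n_keep=1000)
```

Here 1100 rows remain, and each kept non-click carries a weight of 100.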

The feature data is JUST IDs: WebsiteID, AdID, etc. (e.g. google.com=1, yahoo.com=2, cnbc.com=3, ...), so there is no description of the website.

So yes, I want to apply NominalToBinominal, but then (or at the same time, or before) I want to FILTER out those binominals for which there is little training data, e.g. certain websites.
(See e.g. http://www.kaggle.com/about/papers ... click-through rate.)
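
To make the filtering step concrete, here is a minimal plain-Python sketch that filters at the same time as it binarizes: values with fewer than min_count rows share one "NA" indicator column. The function name, the column naming scheme and the min_count value are assumptions for the example, and it assumes no real ID is literally "NA".

```python
from collections import Counter

def binarize_frequent(rows, feature, min_count, other="NA"):
    """One indicator column per frequent value, plus one shared `other` column."""
    counts = Counter(r[feature] for r in rows)
    frequent = sorted(v for v, c in counts.items() if c >= min_count)
    columns = ["%s=%s" % (feature, v) for v in frequent]
    columns.append("%s=%s" % (feature, other))
    encoded = []
    for r in rows:
        value = r[feature] if counts[r[feature]] >= min_count else other
        name = "%s=%s" % (feature, value)
        encoded.append({c: int(c == name) for c in columns})
    return columns, encoded

# Toy data (made up): IDs 1 and 2 are common, IDs 3 and 4 appear once each.
rows = [{"WebsiteID": 1}] * 6 + [{"WebsiteID": 2}] * 4
rows += [{"WebsiteID": 3}, {"WebsiteID": 4}]

columns, encoded = binarize_frequent(rows, "WebsiteID", min_count=3)
```

In this example the columns are WebsiteID=1, WebsiteID=2 and WebsiteID=NA, and the two rare websites both land in the NA column.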