[SOLVED] Probability of relevance in text classification
wmarella
Newbie
Posts: 6
[SOLVED] Probability of relevance in text classification
on: July 06, 2012, 04:48:07 PM
Hello, does anyone know if there is an operator or setting that will let me generate a vector or table of probabilities, in which each document in the corpus is rated by the probability that it is relevant?
I've trained a Naive Bayes operator on a set of about 1000 short documents to classify them as relevant or not relevant. It works well enough that the AUC is 0.853. I'm wondering if there's a way to have not just two classes but three: definitely relevant, definitely not relevant, and not able to be classified. A human would definitely be able to classify the documents that the machine would put in the third group, so I'm thinking that if I could generate the probability of relevance for each document, I could pull out the ones in the midrange and improve the accuracy on those remaining.
Thanks in advance for any advice.
Last Edit: July 11, 2012, 02:38:40 AM by wmarella
Marius
Global Moderator
Hero Member
Posts: 1283
Re: Probability of relevance in text classification
Reply #1 on: July 09, 2012, 10:43:36 AM
Hi, you can use the operator "Drop Uncertain Predictions" for exactly that.
Best, Marius
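For readers who want to see the idea outside RapidMiner, here is a minimal sketch in plain Python of the triage described in the question: keep a prediction only when the model is confident enough, and set the midrange documents aside for a human. This is not the implementation of "Drop Uncertain Predictions"; the function name `triage_by_confidence`, the `min_confidence` parameter, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def triage_by_confidence(
    scored_docs: List[Tuple[str, float]],
    min_confidence: float = 0.8,
) -> Dict[str, List[str]]:
    """Split documents into relevant / not_relevant / uncertain.

    scored_docs holds (document id, probability of relevance) pairs.
    A document is kept in a class only if the probability of that class
    is at least min_confidence; everything else is marked uncertain.
    """
    if not 0.5 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.5 and 1.0")

    groups: Dict[str, List[str]] = {
        "relevant": [],
        "not_relevant": [],
        "uncertain": [],
    }
    for doc_id, p_relevant in scored_docs:
        if p_relevant >= min_confidence:
            groups["relevant"].append(doc_id)
        elif (1.0 - p_relevant) >= min_confidence:
            groups["not_relevant"].append(doc_id)
        else:
            # Midrange probability: leave this one for manual review.
            groups["uncertain"].append(doc_id)
    return groups


# Made-up probabilities, only to show the shape of the input.
example_scores = [("doc1", 0.97), ("doc2", 0.55), ("doc3", 0.04), ("doc4", 0.72)]
groups = triage_by_confidence(example_scores, min_confidence=0.8)
print(groups)
```

Raising `min_confidence` moves more documents into the uncertain group, which should improve accuracy on the remaining ones at the cost of more manual review.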
wmarella
Newbie
Posts: 6
Re: Probability of relevance in text classification
Reply #2 on: July 11, 2012, 02:37:51 AM
Thanks, Marius, helpful as always!