First of all: thanks for your kind words. We always really appreciate it when people like working with our product!
Before I start, I must admit that requests like yours go somewhat beyond the scope of this forum and resemble consultancy more than technical support. However, please find below some comments and hints which might help. But don't expect too much: it is hardly possible to condense years of experience into a few lines, especially without having seen the data.
- first, a short comment: if the classes are more or less equally distributed, 30% might not be that bad (at least it's 30 times "better" than just guessing), taking into account that the data seems to be noisy; humans may not do much better at this task either.
- it's often more about preprocessing than about learning. However, just try one or two different SVMs with a linear kernel (often the best choice for text classification) and vary the important parameters, to make sure that you don't give away too much through poorly selected or poorly tuned learning schemes
- make sure that exactly the same term space is used for modeling and scoring
- try with and without stop word removal and stemming, and also try n-grams. The latter might be important for texts of lower quality
- you only have about 200 texts for each class. This is not really much - are the texts at least equally distributed across the classes? If necessary, try over-sampling rare classes by text windowing approaches
- try different vectorization schemes, especially TFIDF instead of mere term frequency
- make sure that you have used appropriate distance measures for text classification if applicable (e.g. for K-NN)
- sometimes grouping classes first, maybe even into a hierarchy of classes (if possible), delivers much better results
- especially if the data is extremely noisy and the amount of data is limited, you should consider postprocessing, such as returning multiple predictions based on the confidence scores or explicitly handling uncertainty
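To make the points about the term space, n-grams and TFIDF a bit more concrete, here is a minimal sketch in plain Python. It is not a RapidMiner process; the function names, the whitespace tokenizer, the smoothed IDF variant and the toy texts are all assumptions made for illustration only. The idea is that the vocabulary and the IDF weights are learned once from the training texts and then reused unchanged for scoring, so both end up in exactly the same term space.

```python
import math
from collections import Counter


def tokenize(text, n=1):
    """Lowercase, split on whitespace, and return all word n-grams of size 1..n."""
    words = text.lower().split()
    terms = []
    for size in range(1, n + 1):
        for i in range(len(words) - size + 1):
            terms.append(" ".join(words[i:i + size]))
    return terms


def fit_term_space(train_texts, n=1):
    """Learn the vocabulary and IDF weights from the training texts only."""
    doc_freq = Counter()
    for text in train_texts:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(text, n)))
    vocab = sorted(doc_freq)
    num_docs = len(train_texts)
    # Smoothed IDF: one common variant among several (an assumption here).
    idf = {t: math.log((1 + num_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def vectorize(text, vocab, idf, n=1):
    """Map a text into the fixed term space; terms unseen in training are dropped."""
    counts = Counter(tokenize(text, n))
    return [counts[t] * idf[t] for t in vocab]


# Toy data, purely hypothetical.
train = ["cheap flights to rome", "rome hotel deals", "cheap hotel"]
vocab, idf = fit_term_space(train, n=2)

# Modeling and scoring both use the same vocab and idf.
train_vectors = [vectorize(t, vocab, idf, n=2) for t in train]
new_vector = vectorize("cheap rome flights today", vocab, idf, n=2)
```

Note that the word "today" never appears in the training texts, so it is simply ignored at scoring time instead of silently changing the dimensions of the vector.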
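The postprocessing idea from the last hint could look roughly like the following sketch, again in plain Python and not tied to any RapidMiner operator. It assumes that the learner delivers one confidence per class as a dictionary; the function names, the threshold of 0.5 and the value of k are hypothetical and would have to be tuned on your data.

```python
def top_k_predictions(confidences, k=3):
    """Return up to k (label, confidence) pairs, highest confidence first."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def decide(confidences, accept_threshold=0.5, k=3):
    """Accept the best class if it is confident enough; otherwise return
    the k most likely classes so the case can be handled as uncertain."""
    candidates = top_k_predictions(confidences, k=k)
    if not candidates:
        return {"status": "uncertain", "labels": []}
    best_label, best_conf = candidates[0]
    if best_conf >= accept_threshold:
        return {"status": "accepted", "labels": [best_label]}
    return {"status": "uncertain", "labels": [label for label, _ in candidates]}


# Hypothetical confidence scores for a single text.
scores = {"sports": 0.34, "politics": 0.31, "finance": 0.20, "travel": 0.15}
result = decide(scores, accept_threshold=0.5, k=2)
```

With noisy data, evaluating whether the true class is among the top k candidates often gives a more realistic picture of how useful the model is than plain accuracy.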
As I have said before, these are general hints rather than real "consulting". If you would like to discuss these and many other possible options with one of our consultants, please get in touch with Rapid-I.