I have a problem with my text preprocessing. Maybe someone can help me.
My text looks like this:
T-Mobile US Inc. and two regional carriers, General Communication Inc. in Alaska and CT Cube LP in Texas. The order is subject to review by President Barack Obama.
Oil futures rose 67 cents to $93.98 a barrel as U.S. crude supplies dropped, while gold for August delivery climbed $8 to $1,405 an ounce.
European markets finished sharply lower today with shares in London leading the region. The FTSE 100 was down 2.12% while France's CAC 40 was off 1.87% and Germany's DAX fell lower by 1.20%.
: http://www.proactiveinvestors.com/companies/overview/2245/Salesforce.com : http://www.proactiveinvestors.comcompanies/overview/2245/salesforcecom--2245.html : http://www.proactiveinvestors.com/companies/overview/2397/Goldman+Sachs : http://www.proactiveinvestors.comcompanies/overview/3787/general-motors-company--3787.html : http://www.proactiveinvestors.com/companies/overview/1189/Dell : http://www.proactiveinvestors.comcompanies/overview/1189/dell-1189.html : http://www.proactiveinvestors.com/companies/overview/1189/Dell : http://www.proactiveinvestors.com/companies/overview/2306/Apple : http://www.proactiveinvestors.comcompanies/overview/2306/apple-2306.html : http://www.proactiveinvestors.com/companies/overview/4450/Samsung+Electronics : http://www.proactiveinvestors.com/companies/overview/2306/Apple :
I want to remove the URLs from the text. How can I do this? I don't think Filter Tokens works for this. Is Remove Document Parts the solution?
I think the solution should be a rule like this: if a word starts with http or www., delete that word from the text (only the URLs, nothing else).
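
To make the rule concrete, here is a minimal sketch of what I mean, written in Python only as an illustration (it is not part of my actual process). The names URL_PATTERN and remove_urls are made up for this example, and the regular expression assumes that a URL is any run of non-whitespace characters starting with http://, https://, or www.

```python
import re

# Assumption: a URL is a whitespace-delimited token that starts with
# "http://", "https://", or "www." (hypothetical rule for illustration).
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)


def remove_urls(text: str) -> str:
    """Delete URL-like tokens and collapse the spaces they leave behind."""
    without_urls = URL_PATTERN.sub("", text)
    # Removing a token leaves two adjacent spaces; squeeze them to one.
    return re.sub(r"[ \t]{2,}", " ", without_urls).strip()


if __name__ == "__main__":
    sample = (
        "Germany's DAX fell lower by 1.20%. "
        ": http://www.proactiveinvestors.com/companies/overview/2306/Apple :"
    )
    print(remove_urls(sample))
```

The same pattern should also work in any operator that accepts a regular expression, although the leftover " : " separators would still need a separate rule.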