[SOLVED] Using k-means clustering on web log data (Read 431 times)
Posted by star on December 03, 2013, 12:27:58 PM

I have a data set from a web access log file, and I'm interested in finding clusters of similar items in it. (I'm an absolute beginner in data mining.) So far I have referred to many research papers in the same problem domain, for example:

An Efficient Approach for Clustering Web Access Patterns from Web Logs
http://www.sersc.org/journals/IJAST/vol5/1.pdf

Classifying the user intent of web queries using k-means clustering
http://faculty.ist.psu.edu/jjansen/academic/jansen_user_intent_kmeans.pdf

I want to use k-means clustering to cluster web pages. Although these papers discuss the algorithm, they do not specify how the input data set should be prepared. k-means measures similarity between data points using Euclidean distance, so how should I transform (normalize) my dataset for k-means, given that URLs cannot be used directly as input? Any help or good references on this?

Example dataset (p1..pn are different web pages):

p1,p2,p3,p4
p1,p2
p1,p5,p6,p7
p1,p2,p3,p5
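To make the Euclidean-distance point concrete: k-means only ever works with numeric vectors, so whatever represents a page or a user has to become a list of numbers first. Below is a minimal sketch of plain Lloyd's k-means in Python, assuming the data is already a list of equal-length numeric vectors. The function names and the "first k points as initial centroids" rule are illustrative choices of mine, not something taken from the papers above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means. Returns (labels, centroids).

    Assumption for illustration: the first k points are used as the
    initial centroids (real code usually picks them randomly or with k-means++).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` puts the first two points in one cluster and the last two in the other. The point is that nothing here can consume a URL string; each row has to be numbers.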

(Last edit: January 07, 2014, 08:50:29 AM by star)
Reply #1 by ighyboo on December 05, 2013, 02:05:32 PM

Hi Star,

I'm not an expert, but the way I would approach the problem is to create a table with p1...pn as columns and individual users as rows.
The values filling the table would be the number of times each page has been visited by that user.

UserID   p1  p2  p3  p4  ...
User1    1   1   1   1
User2    1   1   0   0
User3    1   0   0   0

Just an idea. :)
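The user-by-page table described above can be built from the raw page lists in the question. A minimal sketch in Python, assuming each line of the example dataset belongs to one user; the thread doesn't say how users are identified, so the IDs `User1`...`User4` and the function name `build_visit_matrix` are hypothetical.

```python
from collections import Counter


def build_visit_matrix(sessions):
    """Turn per-user page lists into a user x page visit-count matrix.

    sessions: dict mapping a (hypothetical) user ID to the list of pages visited.
    Returns (pages, matrix): pages is the column order (alphabetical, so
    "p10" would sort before "p2"), and matrix maps each user ID to a list of
    visit counts in that column order.
    """
    pages = sorted({page for visited in sessions.values() for page in visited})
    matrix = {}
    for user, visited in sessions.items():
        counts = Counter(visited)  # how many times this user hit each page
        matrix[user] = [counts.get(page, 0) for page in pages]
    return pages, matrix


# The example dataset from the question, one (assumed) user per line.
example_sessions = {
    "User1": ["p1", "p2", "p3", "p4"],
    "User2": ["p1", "p2"],
    "User3": ["p1", "p5", "p6", "p7"],
    "User4": ["p1", "p2", "p3", "p5"],
}

pages, matrix = build_visit_matrix(example_sessions)
print(pages)
for user, row in matrix.items():
    print(user, row)
```

Each row is now a plain numeric vector (e.g. User2 becomes `[1, 1, 0, 0, 0, 0, 0]`), so the rows can be handed to any k-means implementation that uses Euclidean distance.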
(Last edit: December 05, 2013, 07:30:30 PM by ighyboo)
Reply #2 by star on January 07, 2014, 07:40:15 AM

Hi ighyboo,

Thanks for the reply; this is exactly what I ended up doing.