Copyright 2017 - CSIM - Asian Institute of Technology

Information Retrieval and Data Mining

Course code: AT71.07
Credits: 3(3–0)
This course is required

Course objectives

With the growth of massive digital data archives, which are not necessarily organized in any order, the complementary processes of information retrieval and data mining have emerged together as a particularly important discipline within the information sciences. The object of information retrieval is to automatically search a data archive in order to respond to a user’s query. The object of data mining, on the other hand, is to automatically process a data archive in order to find patterns that represent knowledge or, equivalently, information of interest to the user (not necessarily in response to a targeted query). Both draw on multidisciplinary techniques, including those from artificial intelligence, statistics, machine learning, and pattern analysis.
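
The contrast between the two processes can be illustrated with a minimal sketch. It is not course material: the toy documents and the function names `build_inverted_index`, `boolean_and`, and `frequent_pairs` are assumptions made up for this illustration. Retrieval answers a specific query against an index, while mining looks for recurring patterns without any query.

```python
from collections import defaultdict
from itertools import combinations

# Toy archive; the documents are invented for illustration only.
DOCS = {
    1: "data mining finds patterns in data",
    2: "information retrieval answers a user query",
    3: "retrieval and mining of data archives",
}

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def boolean_and(index, terms):
    """Retrieval: return ids of documents containing every query term."""
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

def frequent_pairs(docs, min_support):
    """Mining: term pairs that co-occur in at least min_support documents."""
    counts = defaultdict(int)
    for text in docs.values():
        # Count each unordered pair of distinct terms once per document.
        for pair in combinations(sorted(set(text.split())), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

index = build_inverted_index(DOCS)
print(boolean_and(index, ["data", "mining"]))  # query-driven
print(frequent_pairs(DOCS, min_support=2))     # pattern-driven, no query
```

On this toy archive the query "data AND mining" matches documents 1 and 3, and the only term pair that appears in at least two documents is ("data", "mining").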

Learning outcome

The object of this course is to introduce information retrieval and data mining techniques with a view to practical application. Topics covered will include association and rule generation, classification and prediction (including Bayesian and rule-based methods), cluster analysis (including partitioning, hierarchical, and grid-based methods, and outlier analysis), data stream mining, social network analysis, Boolean retrieval, index construction and compression, the vector space model, relevance feedback and query expansion, and probabilistic information retrieval. Practical case studies will use both commercial and non-commercial software packages.

Course outline

I. Boolean Retrieval
1. Inverted index
2. Processing Boolean queries
3. Extended Boolean model
4. Ranked retrieval

II. Index Construction
1. Blocked sort-based indexing
2. Single-pass in-memory indexing
3. Distributed indexing
4. Dynamic indexing

III. Index Compression
1. Statistical properties of terms in information retrieval
2. Dictionary compression
3. Postings file compression

IV. Scoring and the Vector Space Model
1. Parametric and zone indexes
2. Vector space model for scoring

V. Mining Frequent Patterns, Associations, and Correlations
1. Efficient and scalable frequent itemset mining methods
2. Mining association rules
3. Association mining to correlation analysis
4. Constraint-based association mining

VI. Classification and Prediction
1. Classification and prediction methods
2. Accuracy and error measures
3. Evaluation techniques
4. Model selection

VII. Cluster Analysis
1. Clustering methods
2. High-dimensional data
3. Constraint-based cluster analysis
4. Outlier analysis

VIII. Special Applications
1. Mining data streams
2. Mining time series data
3. Graph mining
4. Social network analysis

IX. Case Studies

Learning resources

Textbooks

J. Han and M. Kamber (2006), Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann.
C. D. Manning, P. Raghavan and H. Schütze (2009), An Introduction to Information Retrieval, Cambridge University Press.

Reference books

M. J. A. Berry and G. Linoff (1997), Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Wiley.
I. H. Witten and E. Frank (2001), Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann.
T. Soukup and I. Davidson (2002), Visual Data Mining: Techniques and Tools for Data Visualization and Mining, Wiley.
P. Tan, M. Steinbach and V. Kumar (2005), Introduction to Data Mining, Addison-Wesley.
D. T. Larose (2006), Data Mining Methods and Models, Wiley.

Grading

Assignment 30%,
Midterm exam 30%,
Final exam 40%

School of Engineering and technologies     Asian Institute of Technology