Copyright 2019 - CSIM - Asian Institute of Technology

Information Retrieval and Data Mining

Course code: AT71.07
Credits: 3(3–0)
This course is required

Course objectives

With the growth of massive digital data archives, which are not necessarily organized in any order, the twin and complementary processes of information retrieval and data mining have emerged together as a particularly important discipline within the information sciences. The object of information retrieval is to automatically search a data archive in order to respond to a user’s query. The object of data mining, on the other hand, is to automatically process a data archive in order to find patterns that represent knowledge or, equivalently, information interesting to the user (not necessarily in response to a targeted query). Information retrieval and data mining draw on multidisciplinary techniques, including those from artificial intelligence, statistics, machine learning, pattern analysis, and others.

Learning outcome

The object of this course is to introduce information retrieval and data mining techniques with a view to practical application. Topics covered will include association and rule generation, classification and prediction (including Bayesian and rule-based), cluster analysis (including partitioning, hierarchical and grid-based methods, and outlier analysis), data stream mining, social network analysis, Boolean retrieval, index construction and compression, the vector space model, relevance feedback and query expansion, and probabilistic information retrieval. Practical case studies will use both commercial and non-commercial software packages.

Course outline

I. Boolean Retrieval
   1. Inverted index
   2. Processing Boolean queries
   3. Extended Boolean model
   4. Ranked retrieval
II. Index Construction
   1. Blocked sort-based indexing
   2. Single-pass in-memory indexing
   3. Distributed indexing
   4. Dynamic indexing
III. Index Compression
   1. Statistical properties of terms in information retrieval
   2. Dictionary compression
   3. Postings file compression
IV. Scoring and the Vector Space Model
   1. Parametric and zone indexes
   2. Vector space model for scoring
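
As a rough illustration of the first unit (inverted index and processing Boolean queries), the following Python sketch builds a small in-memory inverted index and answers a conjunctive (AND) query by merging sorted postings lists. It is not taken from the course materials: the function names, the whitespace tokenizer, and the three sample documents are assumptions made only for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to a sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both (Boolean AND)."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, *terms):
    """Answer a conjunctive query, starting from the shortest postings list."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


# Hypothetical sample collection, for illustration only.
docs = {
    1: "data mining finds patterns in data",
    2: "information retrieval answers a user query",
    3: "retrieval and mining of information archives",
}
index = build_inverted_index(docs)
print(boolean_and(index, "information", "retrieval"))  # [2, 3]
print(boolean_and(index, "mining", "retrieval"))       # [3]
```

Processing the shortest postings list first keeps intermediate results small, which is a common heuristic for conjunctive queries; a production index would also need the construction and compression techniques listed in units II and III.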

Learning resources


J. Han and M. Kamber (2006), Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann.
C. D. Manning, P. Raghavan and H. Schütze (2009), An Introduction to Information Retrieval, Cambridge University Press.


M. J. A. Berry and G. Linoff (1997), Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Wiley.
I. H. Witten and E. Frank (2001), Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann.
T. Soukup and I. Davidson (2002), Visual Data Mining: Techniques and Tools for Data Visualization and Mining, Wiley.
P. Tan, M. Steinbach and V. Kumar (2005), Introduction to Data Mining, Addison-Wesley.
D. T. Larose (2006), Data Mining Methods and Models, Wiley.

School of Engineering and Technologies, Asian Institute of Technology