Welcome to the laboratory for database and data mining at Simon Fraser University. In today's information society, we are witnessing explosive growth in the amount of information that is available in electronic form and stored in large databases. For example, many companies operate huge data warehouses collecting many different types of information about their customers.

In biology, rapid progress has been made over the last few years in obtaining the genetic code of humans and many other organisms. Now, all these institutions want to make sense of their valuable data. The size and diversity of these databases prohibit manual analysis. The goal of knowledge discovery in databases (KDD) is the (semi-)automatic extraction of implicit, valid and potentially useful knowledge from these databases. The core step of KDD, which has received the most attention from researchers, is data mining, i.e. the application of efficient algorithms to extract all valid patterns from a database. Data mining techniques have already been successfully applied in a wide range of applications including direct marketing, fraud detection, analysis of web logs, and analysis of genome data.

KDD is an interdisciplinary area at the intersection of database systems, machine learning, statistics and other disciplines. Database systems provide a uniform framework for data mining by efficiently managing large datasets, integrating different data types and storing the discovered knowledge. Our KDD research takes a database systems perspective and addresses both the foundations of data mining and its applications. In particular, it has the following major objectives:

  • scalability for large databases
  • integration of database systems and data mining
  • discovery of understandable and actionable knowledge
  • mining complex data types such as text data, spatial data or biological data

Our lab also investigates emerging applications of information systems that pose new challenges for database systems, such as on-demand map generation and focused web crawlers. The challenges of on-demand map generation, for example, are the complexity of map generalization operations and the ill-defined notion of the quality of the resulting maps. The World Wide Web can be understood as a huge, distributed text database, and focused web crawlers need efficient and effective strategies to explore it.

More detailed information is available on the pages of the individual research projects.