Possible Thesis Topics with Oliver Schulte (December 2010)
These topics are in the area of statistical-relational learning, which is essentially machine learning for relational databases. They are listed roughly in descending order of my interest, so the one I'm keenest on comes first. However, I'm interested in all of them, and possibly in others too, such as relating learning to decision-making, planning, and game-playing.
- Inference with Bayes nets for relational data. I am working on a new approach to the difficult problem of performing inference with Bayes nets when there are cyclic dependencies, which often arise with relational data. For example, suppose that Jack, Jane, and Cecile are all friends with each other, and that whether Jane smokes predicts whether Jack smokes, which predicts whether Cecile smokes, which in turn predicts whether Jane smokes. This is a great topic for a Ph.D. thesis, but it requires mature mathematical skills, specifically the ability to pursue mathematical conjectures and prove theorems.
- Link-based classification by combining probabilistic predictions from standard classifiers. This is a new way to upgrade standard classifiers so that they apply to relational data. For instance, I would like to adapt decision trees and relevance vector machines for link-based classification. Relevance vector machines are a probabilistic (sparse Bayesian) counterpart of support vector machines.
- Graphical models for OLAP. Online Analytical Processing (OLAP) is a mainstream tool for analyzing complex, highly structured data and is widely used in the database industry. Hierarchies are an important part of this structure: for example, sales in Hamburg are part of sales in Germany, which in turn are part of sales in Europe. I'm interested in developing and learning Bayes nets that can compactly represent statistical patterns at different levels of such a hierarchy.
- Combining Bayes nets with recommendation systems. Nonnegative matrix factorization models are among the state-of-the-art methods for recommendation systems, and they can naturally be represented as graphical models with latent variables. Typically, the main focus is on a latent-variable analysis of the link/rating matrix. The methods we have developed so far deal with observed features of users, such as gender, age, and profession. The idea is to combine our methods with latent-variable analysis to obtain a model of the correlations between observed and latent features.
- Bayes nets for ontologies, the semantic web, and description logic. Ontological hierarchies are essential and widely used structures in knowledge representation. Adding hierarchical information to web pages is a key part of the semantic web, and the formal foundation for this is typically description logic. I would like to extend Bayes nets for relational structures to Bayes nets with ontologies, in the spirit of the P-Classic system of Koller, Levy, and Pfeffer. A very nice system for representing ontologies is Protege, from Stanford University and the University of Manchester. It has a plug-in for adding Bayes nets; a neat project would be to extend the plug-in so that it learns a Bayes net for a given T-box and A-box (the terminological and assertional parts of a description-logic knowledge base).
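To see why the smoking example in the first topic is a problem, recall that a Bayes net defines its joint distribution as a product of conditional probabilities over a directed acyclic graph. Once the friendship dependency is instantiated for Jane, Jack, and Cecile, the resulting ground graph contains a directed cycle, so that product is no longer guaranteed to define a consistent distribution. The following is a minimal sketch, assuming a hand-written ground graph; the node names and the `find_cycle` helper are hypothetical. It only detects the cycle with a depth-first search and illustrates the problem, not the new inference approach itself.

```python
from typing import Dict, List, Optional

# Hypothetical ground dependency graph for the smoking example:
# an edge X -> Y means "whether X smokes is used to predict whether Y smokes".
dependencies: Dict[str, List[str]] = {
    "Smokes(Jane)": ["Smokes(Jack)"],
    "Smokes(Jack)": ["Smokes(Cecile)"],
    "Smokes(Cecile)": ["Smokes(Jane)"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one directed cycle (first node repeated at the end), or None if acyclic."""
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path: List[str] = []  # nodes on the current depth-first search path

    def visit(node: str) -> Optional[List[str]]:
        state[node] = ON_PATH
        path.append(node)
        for child in graph.get(node, []):
            child_state = state.get(child, UNVISITED)
            if child_state == ON_PATH:
                # Back edge: the path from `child` to `node`, plus this edge, is a cycle.
                return path[path.index(child):] + [child]
            if child_state == UNVISITED:
                cycle = visit(child)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


print(find_cycle(dependencies))
# ['Smokes(Jane)', 'Smokes(Jack)', 'Smokes(Cecile)', 'Smokes(Jane)']
```

Any method that relies on a topological ordering of the ground variables (such as forward sampling or the usual chain-rule factorization) has no valid ordering to work with here, which is what makes inference in this setting hard.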
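The point of the OLAP topic, that statistical patterns can differ across levels of a hierarchy, can be made concrete with a small roll-up. This is a minimal sketch using hypothetical sales records and hypothetical hierarchy tables (`CITY_TO_COUNTRY`, `COUNTRY_TO_CONTINENT`, and the function `product_distribution` are illustrative only). It just computes relative frequencies of products at each level and is not a Bayes net learner.

```python
from collections import Counter

# Hypothetical location hierarchy: city -> country -> continent.
CITY_TO_COUNTRY = {"Hamburg": "Germany", "Munich": "Germany", "Lyon": "France"}
COUNTRY_TO_CONTINENT = {"Germany": "Europe", "France": "Europe"}

# Hypothetical sales records: (city, product).
sales = [
    ("Hamburg", "beer"), ("Hamburg", "wine"), ("Munich", "beer"),
    ("Munich", "beer"), ("Lyon", "wine"), ("Lyon", "wine"),
]


def product_distribution(records, level):
    """Return relative product frequencies per location, rolled up to `level`."""
    counts = {}
    for city, product in records:
        if level == "city":
            key = city
        elif level == "country":
            key = CITY_TO_COUNTRY[city]
        elif level == "continent":
            key = COUNTRY_TO_CONTINENT[CITY_TO_COUNTRY[city]]
        else:
            raise ValueError(f"unknown level: {level}")
        counts.setdefault(key, Counter())[product] += 1
    return {
        loc: {p: n / sum(c.values()) for p, n in sorted(c.items())}
        for loc, c in counts.items()
    }


for level in ("city", "country", "continent"):
    print(level, product_distribution(sales, level))
```

With this toy data, Germany leans towards beer (0.75) and France towards wine (1.0), while Europe as a whole is evenly split (0.5 each). A model that only captures the top level would miss the lower-level patterns, and storing every level separately is redundant; a compact representation across levels is what the topic aims at.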
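The recommendation topic mentions nonnegative matrix factorization. As a minimal sketch, assuming NumPy and a small hypothetical rating matrix, the code below factors a users-by-items matrix V into nonnegative W (users by latent features) and H (latent features by items) using Lee and Seung's multiplicative updates for squared error. The rows of W are the kind of latent user features that the proposed work would relate to observed features such as age or gender. The function name `nmf` and all data are illustrative, and treating unrated entries as zeros is a simplification; practical recommenders usually mask missing ratings.

```python
import numpy as np


def nmf(V, rank, iterations=500, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (users x items) as V ~ W @ H.

    Uses Lee and Seung's multiplicative updates for the squared Frobenius
    error. Simplification: every entry of V, including zeros, is treated
    as an observed rating.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = V.shape
    W = rng.random((n_users, rank))  # latent user features
    H = rng.random((rank, n_items))  # latent item features
    for _ in range(iterations):
        # eps avoids division by zero; the updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy rating matrix: 4 users x 5 items, ratings 0-5.
V = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 5],
    [1, 0, 4, 5, 4],
], dtype=float)

W, H = nmf(V, rank=2)
print(np.round(W @ H, 1))  # approximate reconstruction of V
```

Because the factors are nonnegative, each user's row of W can be read as additive membership in latent "taste" groups, which is what makes it natural to view the model as a graphical model with latent variables.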