Structure Learning for Directed and Undirected Graphical Models in Relational Data

Our general approach is to first learn a directed model (Bayes net). An undirected model (Markov Logic Network) can be obtained by converting the Bayes net using moralization.

Directed Model Learning

Our basic structure learning algorithm is the learn-and-join algorithm. The operation of the algorithm can be visualized as a level-wise search through a lattice of table joins, where dependencies learned on smaller joins are propagated to larger ones. At each point in the join lattice, a standard propositional Bayes net learner can be employed to build a model of context-sensitive dependencies. A complete presentation of the lattice search model is available in our Machine Learning journal paper (unprotected version here). The initial AAAI 2010 conference presentation of the learn-and-join algorithm is available here.
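To make the lattice search concrete, here is a minimal Python sketch. The callables join and bn_learn are assumptions standing in for a relational join routine and a constraint-aware propositional Bayes net learner; they are not the actual routines in our implementation. For simplicity the sketch enumerates all table subsets, whereas the actual algorithm restricts the lattice to valid relationship joins.

    from itertools import combinations

    def learn_and_join(tables, join, bn_learn):
        """Illustrative sketch of the learn-and-join lattice search.

        tables   -- table names; the singleton sets form level 1 of the lattice.
        join     -- join(subset) -> joined data table for a set of tables (assumed).
        bn_learn -- propositional Bayes net learner: bn_learn(data, required_edges)
                    returns the set of directed edges it finds (assumed).
        """
        required = set()  # edges propagated upward from smaller joins
        for level in range(1, len(tables) + 1):
            for subset in combinations(tables, level):
                data = join(subset)
                # Dependencies found on subjoins are inherited as constraints,
                # so each lattice point only searches for new edges that are
                # specific to this larger join (context-sensitive dependencies).
                required |= bn_learn(data, required)
        return required

The key design choice is that edges discovered on smaller joins are passed upward as constraints, so each lattice point only searches for dependencies that are new at that level; this is what keeps the search tractable.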

Undirected Model Learning

Markov Logic Networks (MLNs) form one of the most prominent SRL model classes; they generalize both first-order logic and Markov network models. The qualitative component or structure of an MLN is a finite set of formulas or clauses {p_i}, and its quantitative component is a set of weights {w_i}. Pedro Domingos and his collaborators have developed an ingenious way to view Markov Logic Networks as templates for Markov random fields. MLNs have become popular as a unifying framework because their expressive structure can represent knowledge from many different fields of AI. Markov Logic Networks provide a principled log-linear model for inference with a strong theoretical foundation. The Alchemy system is an open-source toolbox for MLNs with complete learning and inference routines.
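Concretely, an MLN defines the standard log-linear distribution over possible worlds x:

    P(X = x) = (1/Z) exp( sum_i w_i * n_i(x) )

where n_i(x) is the number of true groundings of formula p_i in x and Z is a normalization constant.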
Many current state-of-the-art algorithms for learning MLNs have focused on relatively small datasets with few descriptive attributes, where predicates are mostly binary and the main task is usually the prediction of links between entities. Scaling to larger datasets has been a challenge: state-of-the-art learners report results on datasets with only up to 3,000 ground atoms (basic facts). Our moralization approach performs structure learning for MLNs on medium-sized relational datasets whose schemas feature a significant number of descriptive attributes relative to the number of relationships. The evaluation was done on datasets with up to 170,000 true ground atoms; structure learning with the learn-and-join algorithm takes only minutes.
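As a rough illustration of the conversion, the Python sketch below turns each conditional probability table (CPT) entry of a learned Bayes net into a weighted conjunctive clause, using the log conditional probability as the weight, one simple way to obtain MLN parameters from the Bayes net parameters. The bn data structure is a stand-in, not our actual representation.

    import math

    def moralize_to_mln(bn):
        """Convert a parametrized Bayes net into weighted MLN clauses.

        bn -- dict mapping each node name to (parents, cpt), where cpt maps
              (child_value, parent_values) -> conditional probability.
              Assumes all probabilities are nonzero (e.g., after smoothing),
              so that the log weight is well defined.
        """
        clauses = []
        for child, (parents, cpt) in bn.items():
            for (child_val, parent_vals), prob in cpt.items():
                # One clause per CPT entry: the conjunction of the child and
                # parent value assignments, weighted by log P(child | parents).
                literals = [f"{child} = {child_val}"]
                literals += [f"{p} = {v}" for p, v in zip(parents, parent_vals)]
                clauses.append((math.log(prob), " ^ ".join(literals)))
        return clauses

Note that moralization itself (marrying co-parents and dropping edge directions) is implicit here: the clause for a CPT entry mentions the child together with all its parents, which are exactly the cliques of the moralized graph.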

Implementations

We provide three main methods for using our algorithm, depending on the desired output.

  • Tetrad GUI. We have integrated our system into the Tetrad interface. Tetrad is a powerful open-source Bayes net learning program from Carnegie Mellon University. There is some overhead in learning the Tetrad interface, but for many users this pays off: we provide pull-down menus for setting up database connections, generating Markov Logic Networks, evaluating system performance, and more. The integration with Tetrad was carried out by Tianxiang Gao.
  • Structure Learning Only. This is run from the command line. Input: a relational database. Output: a Markov Logic Network structure, that is, a set of MLN clauses.
  • Structure Learning + Parameter Learning. This is run from the command line. Input: a relational database. Output: a Markov Logic Network structure with weights attached to the clauses (an illustrative example follows this list).
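For illustration only, the weighted output of the third mode is a clause list in Alchemy-style syntax, roughly of the following form; the predicates and weights below are made up for a hypothetical university schema.

    // illustrative only: hypothetical predicates and weights
    1.38   Intelligence(s, Hi) ^ Registered(s, c) ^ Difficulty(c, Hi)
    -0.51  Intelligence(s, Lo) ^ Registered(s, c) ^ Difficulty(c, Hi)

Each line pairs a clause with its weight; higher weights mark patterns that are more strongly supported by the data.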

Limitations

The main limitations of our current implementation are as follows.

  1. Attribute Dependencies Only. The system finds dependencies among attributes given links (e.g., difficulty of a course predicts the intelligence of a student taking it). It does not find dependencies among links (e.g., if professor X advises student S, they are likely to be coauthors). We are actively working on extending the system so it finds joint statistical patterns over both links and attributes.
  2. Complete Data. The system does not accommodate missing data, except to treat "null" as another attribute value. Khot et al. have shown that an EM-style approach can work in directed relational models. This would be a project for future work.
  3. Normalization Assumption. We assume that the relational data tables satisfy a certain normalization assumption (described in the detailed pages), where information about entities is stored only once. This is not a fundamental limitation of the algorithm; it mainly serves to remove an ambiguity about the kind of statistical pattern to be discovered. From a practical point of view, the main consequence is that not all relational databases will run "as is". We are working on scripts to automatically convert a relational database into the appropriate format if necessary. An example schema in the expected format follows this list.
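As an example of the expected format, a hypothetical university schema might look as follows; each entity table stores the descriptive attributes of its entities exactly once, and relationship tables carry only foreign keys plus attributes of the relationship itself.

    Student(student_id, intelligence, ranking)        -- entity table
    Course(course_id, difficulty, rating)             -- entity table
    Registered(student_id, course_id, grade)          -- relationship table

Here a student's intelligence is recorded only in the Student table and never duplicated in the Registered table.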

Inference

For relational data, there are two inference types of interest (more details in this ILP paper).
  • Class-level inferences. An example would be "what is the percentage of Asian countries that are democracies?". These are performed most easily using standard Bayes net inference within the Tetrad GUI.
  • Instance-level inferences. An example would be "given the populations of its neighbors, what is the probability that this country has a high population?". Currently we would recommend carrying out this type of inference by converting the Bayes net to a Markov Logic Network and using Alchemy; this can be done using the Markov Logic Network learning routines described above. An example invocation follows this list.
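For example, given an MLN file produced by our learner and an evidence database, an Alchemy invocation along the following lines runs MC-SAT inference for a query predicate (the file and predicate names are placeholders):

    infer -ms -i learned.mln -e evidence.db -r query.results -q population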

Other Systems

If you have read this far, you may well be interested in other statistical-relational systems. We provide a brief selective list here. Disclaimer: our selection is not meant to reflect a judgment on the quality of different systems; rather, these are just the systems with which we have some experience that we can pass on.
  • Alchemy. Sets the standard for SRL implementations. Provides learning procedures (both generative and discriminative). Sophisticated parameter learning, which in our experience provides high-quality weight estimates, but can be slow if the number of parameters or the size of the dataset is large. Comes with tutorials, data files, user manuals, FAQs, etc.
  • Tuffy. Provides inference for Markov Logic Networks with a high degree of systems sophistication. Also performs parameter learning.
  • Aleph. A major system for inductive logic programming. Performs discriminative learning for a given target predicate.
  • RDN-Boost. A beautiful application of boosting for learning predictive clauses using relational regression trees. The basic learning mode is discriminative, but the intention is that you apply the program to each predicate and then combine the resulting conditional probability models into a relational dependency network. Needs negative examples (e.g., students who are not registered in a course), but has a default mode for generating negative examples if they are not supplied.
  • MLN boosting. Similar approach to RDN-Boost, but applies boosting to learning Markov Logic Networks (both structure and parameters). A neat idea, and the published evaluation results are quite impressive; code will hopefully be posted soon.

Developer's Manual

An overview of the main routines in our source code. We have not posted the source code; please ask us for it if you want to work with it directly.