Structure Learning for Directed and Undirected Graphical Models in Relational Data
We used Join Bayes Nets (JBNs) to learn the structure of Markov Logic Networks (MLNs). MLNs are a great invention by Pedro Domingos and his collaborators. Because of their strong theoretical foundations and well-developed inference capabilities, MLNs are one of the leading statistical-relational formalisms, if not the leading one.
Motivation and Overview
Markov Logic Networks (MLNs) form one of the most prominent statistical-relational learning (SRL) model classes; they generalize both first-order logic and Markov network models. The qualitative component, or structure, of an MLN is a finite set of formulas or clauses {p_i}, and its quantitative component is a set of weights {w_i}, one per formula. MLNs have become popular as a unifying framework because their expressive structure can model knowledge in many different fields of AI. Many of the current state-of-the-art algorithms for learning MLNs have focused on relatively small datasets with few descriptive attributes, where predicates are mostly binary and the main task is usually prediction of links between entities. Scaling to larger datasets has been a challenge: state-of-the-art learners report results on datasets only as big as 3,000 ground atoms (basic facts). Our moralization approach performed structure learning for MLNs in medium-sized relational datasets whose schemas feature a significant number of descriptive attributes compared to the number of relationships. The evaluation was done on datasets with up to 170,000 true ground atoms. This work was published in the proceedings of the AAAI-10 conference. [Find the related paper here.]
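For reference, the standard log-linear form of an MLN shows how the formulas and weights work together. The symbols x (a possible world, i.e. a truth assignment to all ground atoms), n_i(x) (the number of true groundings of formula p_i in x) and Z (the normalization constant) do not appear in the text above and are introduced here only for this sketch.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

Structure learning chooses the formulas p_i; weight learning chooses the w_i.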
Approach
Our learn-and-join algorithm addressed structure learning for MLNs in medium-sized relational datasets. The approach is to learn a (parametrized) Bayes net (BN) whose edges connect predicates/functions, and which represents the joint distribution of discrete descriptive attributes given the links between entities in a relational database. The system learns such a Bayes net on medium to large datasets. A standard moralization technique then produces MLN clauses from the BN structure. The weights for the formulas are obtained using the weight learning methods implemented in the Alchemy MLN system from the University of Washington.
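The moralization step can be illustrated with a minimal Python sketch. It is not the code in the package: the node names, the `moralize` and `family_conjunctions` helpers, and the choice of writing one conjunction per family with parents are all assumptions made for illustration, and the "^" notation is only illustrative text.

```python
from itertools import combinations

# Hypothetical Bayes net structure: each node maps to the list of its parents.
EXAMPLE_PARENTS = {
    "intelligence(S)": [],
    "difficulty(C)": [],
    "ranking(S)": ["intelligence(S)"],
    "grade(S,C)": ["intelligence(S)", "difficulty(C)"],
}


def moralize(parents):
    """Return the undirected edges of the moral graph of a DAG."""
    edges = set()
    for child, pars in parents.items():
        # Keep every directed edge, but drop its direction.
        for parent in pars:
            edges.add(frozenset((parent, child)))
        # "Marry" every pair of parents that share this child.
        for a, b in combinations(pars, 2):
            edges.add(frozenset((a, b)))
    return edges


def family_conjunctions(parents):
    """Write one conjunction per family (child plus parents), skipping roots.

    Assumption for illustration: each family becomes a single clause template.
    """
    return {
        child: " ^ ".join([child] + list(pars))
        for child, pars in parents.items()
        if pars
    }


if __name__ == "__main__":
    for edge in sorted(tuple(sorted(e)) for e in moralize(EXAMPLE_PARENTS)):
        print(edge)
    for clause in family_conjunctions(EXAMPLE_PARENTS).values():
        print(clause)
```

In the moral graph, the two parents of grade(S,C) become connected, so every family forms a clique that a single formula can cover.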
Installation
The main program is packaged as a jar file. The input to the package is the name of a database that already exists in MySQL, and the output is an MLN structure without weights for the given dataset.
System Requirements
- The system runs under Unix.
- You need Java version 1.6 or greater.
- An installation of MySQL. The command "mysql" should work from the command line.
To use the package, please follow these steps.
- Download the package from [here]
- The package contains MBN_learning.jar and a config.xml file.
- Check MySQL. JBNs use a relational database for data management, which allows them to take advantage of efficient querying techniques developed in the database community. The JBN package connects to MySQL, so MySQL must be installed on your computer.
- Prepare the database. JBNs take a database that follows the Entity-Relationship model as input. The database is expected to consist of entity tables and relationship tables. The primary key of each entity table should be set, and the name of the primary key should end with "_id". The foreign keys in the relationship tables should have the same names as the primary keys they reference, and foreign key constraints should be set. The foreign keys should cascade on delete.
- Modify the config.xml file so that it contains your MySQL connection settings.
- Change into the directory where you extracted the package and run "java -jar MBN_learning.jar foo" in a Linux terminal, where foo is the name of a schema that already exists in your MySQL database.
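The database conventions in the steps above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table and column names (student, course, registered and their attributes) are hypothetical. Column types and engine settings would differ in a real MySQL schema.

```python
import sqlite3

# Hypothetical schema following the stated conventions:
# entity tables have a primary key ending in "_id"; the relationship table
# reuses those key names as foreign keys with ON DELETE CASCADE.
DDL = """
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    intelligence INTEGER
);
CREATE TABLE course (
    course_id  INTEGER PRIMARY KEY,
    difficulty INTEGER
);
CREATE TABLE registered (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    grade      TEXT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES student (student_id) ON DELETE CASCADE,
    FOREIGN KEY (course_id)  REFERENCES course (course_id)  ON DELETE CASCADE
);
"""


def build_demo_db():
    """Create the demo schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def count_registrations(conn):
    """Return the number of rows in the relationship table."""
    return conn.execute("SELECT COUNT(*) FROM registered").fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    db.execute("INSERT INTO student VALUES (1, 3)")
    db.execute("INSERT INTO course VALUES (10, 2)")
    db.execute("INSERT INTO registered VALUES (1, 10, 'A')")
    # Deleting the entity also removes the relationship rows that refer to it.
    db.execute("DELETE FROM student WHERE student_id = 1")
    print(count_registrations(db))
```

With cascading deletes, removing an entity never leaves relationship rows that point at a missing key.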
Examples
Examples of MLN files learned using MBN are available [here].
We go through the steps above with a database server called kripke and a database schema called University hosted on kripke.
- Modify the line jdbc:mysql://kripke/ in the config.xml file so that "kripke" is replaced by your database server name.
- Change into the directory where you extracted the package and run "java -jar MBN_learning.jar University" in a Linux terminal, where University is the name of a schema that already exists in your MySQL database.
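The server-name substitution can also be scripted. The sketch below is a plain text replacement on the JDBC URL prefix quoted above; it makes no assumption about the rest of the config.xml layout. The helper name `set_jdbc_host` and the host name "db.example.org" are hypothetical.

```python
def set_jdbc_host(config_text, new_host, old_host="kripke"):
    """Return config_text with the MySQL host in the JDBC URL replaced."""
    old_url = "jdbc:mysql://" + old_host + "/"
    new_url = "jdbc:mysql://" + new_host + "/"
    if old_url not in config_text:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError("expected JDBC URL not found: " + old_url)
    return config_text.replace(old_url, new_url)


if __name__ == "__main__":
    # Only the URL itself is taken from the example above.
    print(set_jdbc_host("jdbc:mysql://kripke/", "db.example.org"))
```

To apply it, read config.xml into a string, pass it through the function, and write the result back.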
Structure + Parameter Learning: The Virtual Join Algorithm
In this paper we extend the moralization approach to parameter learning in addition to structure learning, where MLN weights are directly inferred from Bayes net CP-table entries. This work is currently under review.
Approach
Our parameter estimation methods are based on methods from directed graphical models, while previous MLN parameter learning techniques were based on undirected graphical models (optimizing log-linear weights). We estimate the CP-table entries from the database statistics and then investigate several conversion functions to map the conditional probabilities into MLN weights.
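As a rough illustration of this pipeline, the sketch below estimates CP-table entries as relative frequencies and then applies one candidate conversion function, the natural logarithm of the conditional probability. Both choices are assumptions made for illustration, not necessarily the conversion functions studied in this work. The data rows and attribute names are hypothetical, and the counts are taken over a single flat table rather than over the groundings of a relational database.

```python
import math
from collections import Counter


def estimate_cpt(rows, child, parents):
    """Estimate P(child | parents) as a relative frequency.

    Returns a dict keyed by (child_value, parent_values_tuple).
    Only value combinations that occur in the data get an entry.
    """
    joint = Counter()
    marginal = Counter()
    for row in rows:
        parent_values = tuple(row[p] for p in parents)
        joint[(row[child], parent_values)] += 1
        marginal[parent_values] += 1
    return {key: n / marginal[key[1]] for key, n in joint.items()}


def log_weight(probability):
    """One candidate conversion: weight = log of the conditional probability."""
    return math.log(probability)


# Hypothetical data: one row per (student, course) registration.
ROWS = [
    {"intelligence": "hi", "grade": "A"},
    {"intelligence": "hi", "grade": "A"},
    {"intelligence": "hi", "grade": "A"},
    {"intelligence": "hi", "grade": "B"},
    {"intelligence": "lo", "grade": "B"},
]

if __name__ == "__main__":
    cpt = estimate_cpt(ROWS, child="grade", parents=["intelligence"])
    for (value, parent_values), p in sorted(cpt.items()):
        print(value, parent_values, p, log_weight(p))
```

Because only observed combinations appear in the table, the logarithm is never taken of zero; a real implementation would still need a policy for unseen combinations.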
Installation
System Requirements
- The system runs under Unix.
- You need Java version 1.6 or greater.
- An installation of MySQL. The command "mysql" should work from the command line.
The main program is packaged as a jar file. The input to the package is the name of a database that already exists in MySQL, and the output is an MLN structure with weights for the given dataset. To use the package, please follow these steps.
- Download the package from [here]
- The package contains VJ_learning.jar and a config.xml file.
- The system uses a relational database for data management, which allows it to take advantage of efficient querying techniques developed in the database community. The package connects to MySQL, so MySQL must be installed on your computer.
- The system is designed for databases whose schema follows the Entity-Relationship model. The database is expected to consist of entity tables and relationship tables. The primary key of each entity table should be set, and the name of the primary key should end with "_id". The foreign keys in the relationship tables should have the same names as the primary keys they reference, and foreign key constraints should be set. The foreign keys should cascade on delete.
- Modify the config.xml file so that it contains your MySQL connection settings.
- Change into the directory where you extracted the package and run "java -jar VJ_learning.jar foo" in a Linux terminal, where foo is the name of a schema that already exists in your MySQL database.
Examples
Examples of MLN files learned using Virtual Join are available [here].
We go through the steps above with a database server called kripke and a database schema called University hosted on kripke.
- Modify the line jdbc:mysql://kripke/ in the config.xml file so that "kripke" is replaced by your database server name.
- Change into the directory where you extracted the package and run "java -jar VJ_learning.jar University" in a Linux terminal, where University is the name of a schema that already exists in your MySQL database.