Publications

I have published papers on a number of topics in different areas: computational learning theory, statistical-relational learning, computation theory, inductive inference, philosophy of science, game theory (foundations, algorithms, evolutionary analysis), belief revision, and machine learning for particle physics. The notes on the papers describe some of the connections between these different topics.
Learning Directed Relational Models With Recursive Dependencies. Oliver Schulte and Hassan Khosravi (2011).
Proceedings of the Conference on Inductive Logic Programming (ILP), extended abstract.
One of the biggest problems in using Bayes nets with relational data is that the data feature cyclic dependencies (see papers below). One result we prove in this paper is that this happens exactly when you have self-relationships, where an entity type is related to other members of the same type (e.g., users are friends with each other in a social network). I proposed a pseudo-likelihood measure that allows us to learn Bayes nets efficiently even in this case (see the SIAM SDM 2011 paper). This paper presents a set of experiments that focus on learning with the pseudo-likelihood on databases with cyclic dependencies.
There is a nice theoretical contribution as well. Relational Bayes nets can represent recursive dependencies elegantly using "copies" of variables. Thus we can have a rule like "Smokes(X), Friend(X,Y) -> Smokes(Y)" to say that the smoking of a person X is associated with the smoking of their friends. The problem for structure learning now is that we don't want to treat all the copies of Smokes equally, because that would lead to a duplication of the same statistical patterns. Our proposal is to designate one of the copies as the main copy (e.g., "Smokes(Y)"). The rule is then that only the main copy can have parents. So a copy like "Smokes(X)" is just an auxiliary random variable for modelling the dependence of smoking on itself. We prove a theorem to the effect that restricting Bayes nets to this format involves no loss of generality. In a way this is a detail, but it's an important one, and without the theorem it's not easy to see how to address it.
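The main-copy restriction is easy to state operationally. Here is a minimal sketch (the string-named nodes and dictionary layout are my own illustration, not the paper's formalism): a candidate structure is valid exactly when every child node is the designated main copy of its functor.

```python
def functor(node):
    """'Smokes(Y)' -> 'Smokes'."""
    return node.split("(")[0]

def valid_structure(edges, main_copies):
    # Only the main copy of each functor may have parents,
    # i.e. appear as the child of a directed edge.
    return all(child == main_copies[functor(child)]
               for _parent, child in edges)
```

For the smoking rule above, edges [("Smokes(X)", "Smokes(Y)"), ("Friend(X,Y)", "Smokes(Y)")] with main copy "Smokes(Y)" pass the check; adding an edge into the auxiliary copy "Smokes(X)" would violate it.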
Learning Compact Markov Logic Networks With Decision Trees. Hassan Khosravi, Oliver Schulte, Jianfeng Hu and Tianxing Gao (2011).
Proceedings of the Conference on Inductive Logic Programming (ILP), extended abstract.
One of the ideas we have been working out is to combine learning for directed relational models with inference in undirected relational models. This addresses the problem of how to do inference when there are cyclic dependencies in the relational data (see papers below). One problem we noticed is that if you just convert a Bayes net into a Markov Logic Network, you get many logical clauses (rules). The reason is that tables of conditional probabilities contain more rows than necessary because the tabular representation does not exploit local independencies for a given child-parent configuration. In this paper we show that if you use decision trees instead of conditional probability tables in the Bayes net, you get a much more compact set of rules. This makes the learned set of rules more readable, improves weight learning for rules, and ultimately increases predictive accuracy.
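To see the blowup, count clauses: a tabular CPT contributes one clause per row, while a decision-tree CPD contributes one per leaf. A toy sketch (the nested-dict tree encoding is my own illustration, not the paper's representation):

```python
def cpt_rule_count(parent_domain_sizes):
    # One clause per CPT row: the product of the parent domain sizes.
    n = 1
    for size in parent_domain_sizes:
        n *= size
    return n

def leaf_count(tree):
    # One clause per decision-tree leaf; leaves are probabilities,
    # internal nodes are dicts from attribute values to subtrees.
    if not isinstance(tree, dict):
        return 1
    return sum(leaf_count(subtree) for subtree in tree.values())
```

Three parents with 2, 3, and 4 values give 24 clauses from the table, whereas a tree that exploits local independencies may need only a handful of leaves.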
A tractable pseudo-likelihood function for Bayes Nets applied to relational data. Oliver Schulte (2011).
Proceedings of the SIAM SDM Conference on Data Mining, pp.462-473.
The most practically useful theoretical paper I have written. It considers the question of how we can quantify how well a Bayes net model fits the information in a relational database. In fact, the proposal works for any statistical-relational model that uses 1st-order logical variables. Quantifying model fit is a problem because in relational data, units are not independent of each other, so we cannot apply the model to each unit separately and multiply the results. My proposal is to define the model fit by considering how well the model explains, on average, the properties of a random instantiation of the 1st-order variables. For instance, in a social network, we can imagine randomly sampling pairs of actors and applying the Bayes net to compute a probability for that pair. The average log-probability over all pairs is my proposed pseudo-likelihood measure.
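In code, the proposal looks roughly like this (a minimal sketch; `bn_prob` is a hypothetical stand-in for Bayes net inference on one grounding, not an API from the paper):

```python
import math

def pseudo_loglikelihood(groundings, bn_prob):
    # Average log-probability the Bayes net assigns to a random
    # instantiation of its first-order variables, e.g. a random
    # pair of actors in a social network.
    logs = [math.log(bn_prob(g)) for g in groundings]
    return sum(logs) / len(logs)
```

The paper's contribution is that this average has an efficient closed form, so it never has to be computed by brute-force enumeration as above.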
The paper establishes various desirable theoretical properties of the pseudo-likelihood measure, gives an efficient closed form for computing it, and describes efficient algorithms for Bayes net parameter and structure learning that maximize the measure. Warning: the reviewers said the paper was "a little dense", so you may want to look at my PowerPoint presentation first.
Erratum: In Table 1, the column labelled C(X) should be omitted.
Learning Conservation Laws Via Matrix Search. Oliver Schulte and Mark S. Drew (2010).
Proceedings of the 13th Conference on Discovery Science, pp.236-250, Springer LNAI 6332. The original publication is available at www.springerlink.com.
Continues the research on learning conservation laws in physics and other areas of science. In my IJCAI 2009 paper I showed that choosing a maximally strict set of conservation laws---that rules out as many unobserved data points as possible---formalizes an important part of the methodology used by scientists. It's actually also a classic principle from Machine Learning known as selecting the least generalization of the data. In itself the principle is not enough to determine a unique set of conservation laws, so in this paper we added the requirement that the conservation laws should be maximally simple. "Simplicity" here means assigning small conserved quantities (the L1-norm of the conservation matrix). The combined criterion we call maximally simple maximally strict laws, or MSMS laws. A beautiful and surprising connection with particle ontology is that if there is a set of maximally strict laws that corresponds to a disjoint partition of particles into families (like a clustering), then that set is also the MSMS-optimum. This means that the MSMS criterion can be used to discover not only conservation laws but also particle families - it's like doing classification and clustering at the same time.
Finding an MSMS theory isn't easy because it requires minimizing the L1-norm with a non-linear constraint. My collaborator Mark Drew devised a clever way to do the minimization which is blindingly fast on realistic data sets. We show that this method re-discovers the laws and families in the Standard Model of particle physics, which is a very important scientific theory. It also works in chemistry, re-discovering the molecular structure of substances like ammonia and water.
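As a small illustration of the objects involved (toy numbers, not real particle data): a candidate conservation-law matrix must annihilate every observed reaction vector, and the MSMS criterion compares such matrices by their L1-norm.

```python
import numpy as np

def is_conserved(law_matrix, observed_reactions):
    # Each law (a row of conserved-quantity assignments) must give
    # zero when applied to each observed reaction vector.
    return all(np.all(law_matrix @ r == 0) for r in observed_reactions)

def l1_norm(law_matrix):
    # The "simplicity" of a law set: sum of absolute entries.
    return int(np.abs(law_matrix).sum())
```

For particles (p1, p2, p3) and an observed reaction p1 -> p2 + p3, encoded as r = [1, -1, -1], the quantity assignment q = [2, 1, 1] is conserved (2 - 1 - 1 = 0) with L1-norm 4. The hard part, which this sketch omits, is the joint optimization: minimizing the L1-norm while also ruling out as many unobserved reactions as possible.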
Structure Learning for Markov Logic Networks with Many Descriptive Attributes. Hassan Khosravi, Oliver Schulte, Tong Man, Xiaoyuan Xu, Bahareh Bina (2010).
Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI), pp.487-493.
This paper is the first archival presentation of our new Bayes net structure learning algorithm for relational data. The learn-and-join algorithm is based on a single-table BN learner, which is used as a black box and can be chosen by the user. The single-table learner is applied to increasingly larger table joins, where the result of learning on smaller tables constrains learning on their joins. The simulations in this paper provide evidence that the algorithm is fast for relational databases of realistic sizes and fairly complex structure, returning a result within 10 minutes or so. In contrast, other state-of-the-art Markov Logic Network learners do not return a result on these datasets even after days of running.
To evaluate the quality of the models learned by the algorithm, we use the log-linear formalism of Markov Logic Networks to derive predictions about attributes of individual entities from the learned BN model. (For instance, the probability that Jack is highly intelligent is .7 given his grades in the courses he's taken.) Essentially, we convert the Bayes net to a Markov net via the standard moralization procedure (marry the parents of all nodes, drop the directions on the edges). Compared to other Markov Logic Network learning procedures, the predictions of the model are very good.
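The moralization step is simple to sketch (the dict-of-parent-lists encoding of the Bayes net is my own choice for illustration):

```python
from itertools import combinations

def moralize(parents):
    # Marry all co-parents of each child, then drop edge directions.
    # `parents` maps each node to the list of its parents.
    undirected = set()
    for child, ps in parents.items():
        for p in ps:                       # keep parent-child edges
            undirected.add(frozenset((p, child)))
        for p, q in combinations(ps, 2):   # marry the co-parents
            undirected.add(frozenset((p, q)))
    return undirected
```

Moralizing the v-structure A -> C <- B yields the triangle {A-C, B-C, A-B}.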
One reviewer asked why we don't derive predictions directly from the relational Bayes net. There are two reasons, the combining problem and the cyclicity problem. (1) The learned Bayes net predicts the generic probability that a student is highly intelligent given the properties of a single course they have taken. But the database may contain information about many courses the student has taken, which needs to be combined. I sometimes call this the combining problem. To address the combining problem, one needs to use an aggregate function, or a combining rule, or the log-linear formula of Markov Logic Networks. (2) The main reason is the cyclicity problem: there may be cyclic dependencies between the properties of individual entities. For example, if there is generally a correlation between the smoking habits of friends, then we may have a situation where the smoking of Jane predicts the smoking of Jack, which predicts the smoking of Cecile, which predicts the smoking of Jane, where Jack, Jane, and Cecile are all friends with each other. This has turned out to be a knotty problem for directed graphical models. Neville and Jensen (2007) go so far as to conclude that "the acyclicity constraints of directed models severely limit their applicability to relational data". Converting the Bayes net to an undirected model avoids the cyclicity problem. So we're trying to get the best of both the directed and the undirected worlds: scalability and interpretability from Bayes nets, and the solutions to the combining and cyclicity problems from Markov nets.
The Imap Hybrid Method for Learning Gaussian Bayes Nets. O. Schulte, G. Frigo, R. Greiner and H. Khosravi (2010).
Proceedings of the 23rd Canadian Conference on Artificial Intelligence (CANAI), pp.123-134, Springer LNCS 6085. Click here for a full version with proofs and more experimental results. The original publication is available at www.springerlink.com. Best Paper Award.
We continue the project of using the learning-theoretic analysis of learning Bayes nets from statistical tests ("Mind-change optimal learning of Bayes net structure from dependency and independency data") to develop new algorithms for this problem. We follow the hybrid approach of the previous I-map learning paper that combines model selection scores with statistical tests. The previous paper considered Bayes nets with discrete variables; this one considers continuous variables with linear Gaussian distributions. This is a subclass of structural equation models, a very widely used model class in economics, psychology, biology and other disciplines.
We give simulation evidence to show that in these models, standard model selection scores (like Bayes Information Criterion) tend to select overly complex models with too many edges. We propose using statistical tests to cross-check the model selection score: before we add an edge to the graph, we check to see if this would help to cover a statistically significant correlation. If not, we don't add the edge, even if it would increase the score. We prove that this procedure converges to a correct graphical model in the sample size limit. In simulations it gives a better match to the true data-generating graph, on average, than the unconstrained model selection score. The procedure is especially good at finding models with the correct number of edges, which is the measure of model complexity that comes out of applying the mind change analysis from learning theory to Bayes net learning.
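The cross-check can be sketched as a gate in front of score-based edge addition. This is a toy version using a marginal correlation test (the paper's procedure tests correlations conditional on the current parent sets, which this sketch omits):

```python
import numpy as np
from scipy import stats

def edge_allowed(x, y, alpha=0.05):
    # Only allow an edge between variables x and y if they are
    # significantly correlated; otherwise veto the edge even if
    # adding it would increase the model selection score.
    _r, p = stats.pearsonr(x, y)
    return bool(p < alpha)
```

In the hybrid procedure, the score (e.g., BIC) still ranks candidate edges; the test merely filters out edges that do not cover a statistically significant correlation.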
Evolutionary Equilibrium in Bayesian Routing Games: Specialization and Niche Formation. With P. Berenbrink (2010).
Theoretical Computer Science, 411:1054-1074.
An extended journal version of our ESA paper below with the same title (sorry!). Analyses the selfish routing game, a popular model of user behavior on a computer network. This is basically a congestion game. Our paper combines evolutionary analysis with Bayesian games of incomplete information. A description of results appears with the conference version below. The journal version has examples and nicer pictures :-)
Erratum: in Figure 4 and on page 1060, replace the symbol p with \sigma.
Mind-change optimal learning of Bayes net structure from dependency and independency data. Schulte, O., W. Luo, and R. Greiner (2010).
Information and Computation, 208:63-82.
An extended journal version of our COLT paper on "Mind Change Optimal Learning of Bayes Net Structure" (see below). This paper continues the program of applying the theory of mind change optimal learning to derive algorithms for practical problems. Another example is my work on discovering conservation laws in particle physics. The learning problem in this paper is to find a Bayes net structure that fits an increasing set of dependence or independence facts (correlations or the lack thereof). The COLT paper showed that if the data are observed significant correlations, the optimal method is to conjecture the Bayes net with the smallest number of edges that entails these correlations. A new result in this paper is that if the data are observed independencies, the optimal method is to conjecture the Bayes net with the largest number of edges that entails these independencies. The famous PC method for Bayes net learning seems to be a heuristic method for the independency case. For each data type, it is NP-hard to decide whether there is a unique optimal Bayes net and if so find it. The proof is fairly simple for the case of independence data, but for dependencies it runs over 7 pages. That was the most complicated NP-hardness proof I've done. We have implemented heuristic methods based on the learning-theoretic analysis of this paper (see our CIDM paper below).
Simultaneous Discovery of Conservation Laws and Hidden Particles With Smith Matrix Decomposition. Schulte, O. (2009).
Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), pp.1481-1487.
One of the referees was kind enough to nominate this paper for the best paper award. It's another installment in my series on conservation laws in particle physics. Previously seen on this channel ("Inferring Conservation Laws in Particle Physics", 2000): A learning-theoretic analysis shows that minimizing mind changes requires a particle theorist to adopt the following principle: try to find a set of conservation laws that is consistent with all observed reactions, and rules out as many unobserved reactions as possible. I also showed that optimally fulfilling the principle may require hidden particles: Though it may seem paradoxical, explaining the data about observable entities may require positing unobserved entities. The 2000 paper gave a mathematical analysis that characterizes exactly when this phenomenon occurs. The technical answer is that it occurs when there is an unobserved reaction that is a linear combination of observed reactions, but one that involves fractions as coefficients (e.g., 1/2 reaction1 + 1/2 reaction2).
What's new is an efficient algorithm that allows me to compute when the fractionality condition for introducing hidden particles is met. Applying it to the actual particle data leads to the rediscovery of the electron antineutrino. This may not sound exciting for readers outside of particle physics, but in the particle physics literature they say that the existence of this particle is one of the two main questions the field is investigating. From a computer science point of view, the cool thing is that the algorithmic solution is to compute the Smith Normal Form of the reaction data: a hidden particle is needed if and only if the Smith Normal Form contains a diagonal entry greater than 1. I didn't know about the Smith Normal Form before I looked at this research. It's a classic tool for integer matrices from the 19th century; it tells you a lot about the matrix you are dealing with. It's kind of like singular value decomposition for real-valued matrices. So from learning theory we got to hidden particles, which led to fractions of reactions, which led to the Smith Normal Form.
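With SymPy, the criterion is a few lines (the toy reaction matrices below are made up for illustration, not real particle data):

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

def needs_hidden_particle(reaction_matrix):
    # A hidden particle is required iff the Smith Normal Form of
    # the integer reaction matrix has a diagonal entry > 1.
    snf = smith_normal_form(Matrix(reaction_matrix), domain=ZZ)
    diag = [snf[i, i] for i in range(min(snf.shape))]
    return any(d > 1 for d in diag), diag
```

For instance, a reaction matrix whose Smith Normal Form is diag(1, 1) needs no hidden particle, while one with invariant factors (2, 4) does.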
How I found about the Smith Normal Form is a nice story about scientific collaboration, one of my best experiences in that regard. Once I had the question about hidden particles down to a linear algebra question about fractional linear combinations, I thought I could consult one of our theory experts, so I asked Arvind Gupta. Arvind said that if it was about linear algebra, I should ask Wayne Eberley from the University of Calgary. So I sent Wayne e-mail and he said it sounded like a job for the Smith Normal Form, and if that was the case, I should ask Mark Giesbrecht from the University of Waterloo. So I sent the problem to Mark, and he sent back the solution, complete with correctness proof and implementation advice. Thank you Mark!
Virtual Joins With Nonexistent Links. Khosravi, H., Bina, B., Schulte, O. (2009).
Inductive Logic Programming (ILP-09).
A new dynamic programming algorithm for computing the number of instantiations that satisfy a first-order clause. We need this for parameter estimation in our relational Bayes nets. Yes, the problem is #P-complete, but read on!
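For intuition, here is the brute-force version of the counting problem the paper speeds up (my own toy encoding of a two-variable clause as a Python predicate, not the paper's algorithm):

```python
from itertools import product

def count_instantiations(domain, clause):
    # Count pairs (x, y) from the domain that satisfy the clause.
    # Naive enumeration is exponential in the number of logical
    # variables, which is why a smarter algorithm is needed.
    return sum(1 for x, y in product(domain, repeat=2) if clause(x, y))
```

Counting pairs connected by a nonexistent link, e.g. the pairs with x != y over a 3-element domain, gives 6.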
Class-Level Bayes Nets for Relational Data. O. Schulte, H.
Khosravi, F. Moser, M. Ester (2009). CS-Learning Preprint Archive.
This is the current description of our research results on learning Bayes nets for relational data. The main idea is to learn Bayes nets for the class-level distribution over attributes and relationships, as defined in Halpern and Bacchus' classic AI work from the 90s. Shorter versions are available as follows.
- 3 pages: Bayes Nets for combining logical and probabilistic structure. Schulte, O., Khosravi H., Bina, B. (2009). STRUCK workshop at IJCAI-09.
- 6 pages: Join Bayes Nets: A New Type of Bayes net for Relational Data. Schulte, O., Khosravi H., Bina, B., F. Moser (2009). GKR workshop at IJCAI-09. More on structure learning for statistical-relational models.
- A technical report, SFU School of Computing Science Technical Report 2008-17. Pretty much superseded by the preprint.
A New Hybrid Method for Bayesian Network Learning With Dependency Constraints. Schulte, O.; Frigo, G.; Greiner, R.; Luo, W. & Khosravi, H. (2009).
Proceedings IEEE CIDM Symposium, pp.53-60.
The learning-theoretic model of Bayes net structure learning presented in the paper "Mind Change Optimal Learning of Bayes Net Structure" below suggests using dependencies or significant correlations as a constraint in learning Bayes nets. We present a new hybrid criterion to make this idea mathematically precise (maximize a model selection score subject to the constraint of fitting the observed correlations), give an algorithm for optimizing the hybrid criterion, and report experimental results from simulations and real-world data. Our general conclusion is that the dependency constraints are helpful for discrete-variable Bayes nets that are not too large (10 nodes or less), especially on small to medium sample sizes (1,000-3,000). One of the main things I've learned is that multiple hypothesis testing is hard! Statisticians have recognized this for a long time and proposed various ingenious schemes. We tried a number of them with more or less success, but in our experiments it was hard to find one that solves the basic dilemma for all or even most problems: if the testing is too conservative (e.g., with low significance levels), you get little information from the test and diminishing returns on the computational overhead. If the testing is too aggressive, there are too many false positives leading to overly complex models. Hence our caveats on the range of problems for applying the hybrid method.
Review of Harman, G. and Kulkarni, S., Reliable Reasoning: Induction and Statistical Learning Theory. The MIT Press, London, England, 2007. O. Schulte (2008).
Published in Biometrics, Sep 2008, Vol. 64(3), pp.992-993.
A review for a statistics journal by a computer scientist of a book written by a philosopher and an engineer - I couldn't resist. The book is basically an introduction to VC learning theory (known in CS as PAC theory, for "probably approximately correct"), with as little technicality as possible. Unfortunately, the word limit for Biometrics was very tight, so pretty much all I could do was a brief summary of the VC dimension and an outline of what Harman and Kulkarni do with it to discuss problems of inductive inference.
Co-Discovery of Conservation Laws and Particle Families. O. Schulte (2008).
Studies in History and Philosophy of Modern Physics, Vol 39/2, pp.288-314, doi:10.1016/j.shpsb.2007.11.003. The version typeset by the journal is here.
This is the expanded journal version of the previous paper on underdetermination of conservation laws in particle physics. The title alludes to a quote by David Lewis, that "laws and natural properties get discovered together". The main new result added is that particle reaction data alone are sufficient to determine the matter/antimatter division of the particle world (without a general model of particle ontology like the Standard Model). I also give more detailed comparisons between the algorithmic or learning-theoretic method for discovering conservation laws and how physicists have approached this problem in practice.
Particle Physics Cut Nature At Its Joints. O. Schulte (2009).
Proceedings of the 13th International Congress on Logic, Methodology and the Philosophy of Science (LMPS), Beijing, China, pages 267-286. Eds. Glymour, Wei, Westerstahl. College Publications. Invited Lecture.
The Congress takes place every 4 years; it began 1960. It is organized by the International Scientific Union for History and Philosophy of Science (IUHPS), which is an organization affiliated with UNESCO. This is a major interdisciplinary conference with a proud tradition; for instance, Nobel Laureate Harsanyi and computation theorist Buechi presented major results there. I felt honoured to receive an invitation to give a lecture.
The paper considers the challenge of discovering conservation laws in particle physics from a philosophical point of view rather than an algorithmic one. The main new issue is to single out a unique set of conservation laws from the infinitely many sets that make the same predictions. In previous work I showed that the criteria based on mind change efficient learning developed in previous papers select exactly the conservation laws that make the same predictions as those found in the Standard Model of Particle Physics (see "Algorithmic Derivation of Additive Selection Rules and Particle Families from Reaction Data" and "Inferring Conservation Principles in Particle Physics: A Case Study in the Problem of Induction" below). But there are many different sets of conservation laws that achieve this (as many as there are bases for a vector space). So the question arises: why pick one of the law sets over another? In philosophy of science, this is sometimes called "global underdetermination" and sometimes the problem of choosing between empirically equivalent theories, meaning theories that agree on all their predictions. (The standard example is Copernicus' theory of the solar system vs. Ptolemy's - with suitable parameters, both can be made to give the same predictions.) The answer in this paper is that particle ontology selects a unique theory. I prove that if you want a set of conservation laws that clusters particles into disjoint families (e.g., Baryons, Muon, Tau, Electron Types), then there is at most one way of doing that. Which---you guessed it---is the one physicists have chosen. It was very surprising to me that the clustering criterion picks out a unique set of conservation laws; mathematically it is a pretty result in linear algebra. So the main thing I learned in this research is the importance of ontology or taxonomy in scientific research. Based on this insight, I make a philosophical proposal, namely to answer the question "What is a natural kind?" with: a natural kind is a category that appears in the ontologically simplest empirically adequate set of generalizations.
Association Rules in the Relational Calculus. O. Schulte, F. Moser, M. Ester, Z. Lu (2007).
SFU School of Computing Science Technical Report 2007-23. Also posted in the CS-Database Preprint Archive.
I'm starting to work on various issues that arise when we combine probabilistic concepts with a logic-based formalism, like relational databases. This paper defines the concepts of support and confidence for a large class of association rules, based on safe queries in the domain relational calculus (a logical query language equivalent to SQL). The key problem is this: given a logical formula with free variables and a database instance (finite model), what is the percentage of instantiations of the free variables that satisfy the formula? For example, consider the formula Customer(X) AND Student(X). It seems this should be neither the percentage of students who are customers, nor the percentage of customers who are students. Our proposal is to define the frequency as the ratio (#people who are both students and customers)/(#people who are students or customers). We extend this basic idea to general safe queries and prove various metatheorems about it, such as the Apriori property.
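Concretely, for Customer(X) AND Student(X) the proposal amounts to the following (the predicates and domain are made up for illustration):

```python
def rule_frequency(is_student, is_customer, people):
    # Frequency = |students AND customers| / |students OR customers|,
    # i.e. relative to the union of the base tables, not to either
    # single table.
    a = {p for p in people if is_student(p)}
    b = {p for p in people if is_customer(p)}
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With students {1, 2, 3} and customers {3, 4}, the frequency is 1/4: one person is both, out of four who are either.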
The key differences with other relational rule approaches (e.g., WARMR) are these: (1) we do not assume that the user specifies a single target or base table; instead, we consider combinations of base tables (Student union Customer). (2) The base tables are defined dynamically and depend on the particular query formula, so there is no fixed table with respect to which the frequency of all formulas is defined. (3) Our rule language is much richer than conjunctions of atoms; it allows nested quantification and negation.
Evolutionary Equilibrium in Bayesian Routing Games: Specialization and Niche Formation. With Petra Berenbrink (2007).
15th European Symposium on Algorithms (ALGO/ESA).
Most of my papers on game theory are about using logic to address foundational questions or solve computational problems. This paper is more in the standard mould of an economics game theory paper: take a game of interest and figure out its equilibria. The game of interest here is the parallel links network routing game popularized by Papadimitriou, which is a simple kind of congestion game. Our novel contribution is to consider evolutionarily stable equilibria in this game. In my view it makes a lot of sense to model a network with a large population of users, and in a population model evolutionary stability is a very natural concept to apply. The main result mathematically is that there is at most one evolutionarily stable Nash equilibrium. It was surprising to me that in this game evolutionary equilibrium has so much predictive power. We also give necessary and sufficient conditions concerning the distribution of tasks to resources that hold in the evolutionary equilibrium, which leads to some pretty pictures.
Minimum Consistent Subset Cover Problem and its Applications in Data Mining. B. Gao, J. Cai, M. Ester, O. Schulte and H. Xiong (2007).
Proceedings of KDD 2007 (Knowledge Discovery and Data Mining).
Consider the following algorithmic problem: given a set of points X and a consistency constraint on subsets of X, find a minimum collection of consistent subsets that covers all points in X. We show that a number of data summarization problems in data mining can be viewed as instances of this problem (e.g., finding a minimum rule set for classification, frequent pattern search, converse k-clustering). Many of these problems satisfy a monotonicity constraint: all subsets of a consistent set are also consistent. For subset cover problems with the monotonicity constraint, the paper describes new heuristic algorithms that take advantage of the constraint.
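A minimal greedy sketch of the problem (not the paper's algorithms), assuming monotonicity and that every singleton is consistent:

```python
def greedy_cover(points, consistent):
    # Repeatedly grow one consistent subset as large as possible,
    # then start a new subset for the points still uncovered.
    # Monotonicity guarantees that growing never has to backtrack.
    uncovered = set(points)
    cover = []
    while uncovered:
        subset = set()
        for p in sorted(uncovered):
            if consistent(subset | {p}):
                subset.add(p)
        cover.append(subset)
        uncovered -= subset
    return cover
```

With the (monotone, for positive points) constraint "the subset sums to at most 7", covering {1, ..., 5} greedily takes three subsets.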
Algorithmic Derivation of Additive Selection Rules and Particle Families from Reaction Data. O. Schulte and M.S. Drew (2007). Posted at the Preprint Archive.
A succinct summary (6 pages) of the result of applying the learning-theoretically optimal algorithm to finding conservation laws and particle families. (The algorithm was described in "Inferring Conservation Principles in Particle Physics", see below.) Written for physicists. Shows how particle families (e.g., baryons, lepton generations) are uniquely determined by reaction data. States the result that for any class of reactions (e.g., strong interactions), the number of conserved quantities in that class is no greater than the number of stable particles without a decay mode in the reaction class. This assumes the principles of Special Relativity - my only piece of theoretical physics so far! (Probably ever.) This paper supersedes our previous technical report posted at http://www.cs.sfu.ca/research/publications/techreports/#2006.
Mind Change Optimal Learning of Bayes Net Structure. O. Schulte, W. Luo and R. Greiner (2007).
In 20th Annual Conference on Learning Theory (COLT), San Diego, CA, June 12-15.
Previous papers of mine developed an approach to inductive inference based on success criteria like fast convergence to a correct hypothesis, and minimizing theory changes. One application area was the discovery of conservation laws and particle families in particle physics. This paper opens a new application domain: learning Bayes nets. Our learning model is based on increasing information about what statistically significant correlations obtain between domain variables (conditional on others). For example, the learner may be told that "father's eye colour is correlated with mother's eye colour given the child's eye colour".
We apply our previous results from general learning theory to determine that the optimal method by our success criteria is to find a Bayes net that entails the observed significant correlations with a minimum number of edges, or directed dependencies between variables. So the learning theory leads to a new notion of simplicity for Bayes nets: the fewer edges, the simpler the net.
The paper shows that it is NP-hard to determine whether there is a unique minimum-edge Bayes net for a set of observed probabilistic dependencies. As far as I know, this is the first NP-hardness result in the literature concerning the existence of a uniquely optimal Bayes net structure. Personally, it's the first NP-hardness proof I've done for a published paper (as opposed to an exercise). It was fun but hard because of the uniqueness constraint (the full proof runs over 6 pages). Since implementing the mind-change optimal learner is NP-hard, applying the learning theory requires heuristic solutions. We've done quite a lot of work on suitable heuristics already - coming soon on this webpage!
Reliable Inductive Inference. O. Schulte (2007).
In Induction, Algorithmic Learning Theory, and Philosophy, pp.157-178, eds. Michele Friend, Norma B. Goethe, Valentina S. Harizanov. Series: Logic, Epistemology, and the Unity of Science, vol. 9. Dordrecht: Springer, 2007.
Another introductory piece on formal learning theory. Provides a fair amount of comparison with other approaches to inductive inference like confirmation theory in philosophy and language learning theory in computer science.
Do the Harper and Levi Identities Constrain Belief Change? O. Schulte (2006).
Truth and Probability, Essays in Honour of Hugh LeBlanc, eds. Francois LePage and Bryson Brown, pp.123-137. College Publications.
One of the oldest ideas in belief revision theory is due to Isaac Levi: that changing your beliefs given some new information should proceed in two steps: first, give up old beliefs contradicted by the new information, second add the new information. Bill Harper proposed the reverse for belief contraction or weakening: to give up a belief p, consider what you would think if you learned that p was false. I show exactly what these two ideas entail, and under what conditions they can be used to make belief revision and belief contraction inverses of each other. - I saw Harper and Levi once at a conference at the same time, so I got to tell them about these results at the same time. Fun for me at least.
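In the standard AGM notation (with * for revision, ÷ for contraction, and + for expansion), the two identities read:

```latex
% Levi identity: revise by first contracting with the negation of
% the new information, then expanding with the new information.
K * \varphi = (K \div \lnot\varphi) + \varphi

% Harper identity: contract by intersecting the current beliefs
% with the result of revising by the negation.
K \div \varphi = K \cap (K * \lnot\varphi)
```

The paper's question is exactly when these two equations can hold together, making revision and contraction inverses of each other.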
Mind Change Efficient Learning. Wei Luo and Oliver Schulte (2006). Information and Computation, 204:989-1011.
Our final paper on the relationship between topology and mind changes in the Gold learning model. It extends the previous conference paper with the same name (sorry!) in two ways: (1) we introduce a stronger, and mathematically simpler, notion of what it is for a learner to minimize mind changes at a given stage of learning; (2) we relate mind change complexity as measured in point-set topology to other notions of learning complexity from the literature (e.g., "reducibility" of one learning problem to another).
Scientific Method. W.L. Harper and O. Schulte (2006). Macmillan Encyclopedia of Philosophy.
Covers conceptions of scientific method from Galileo and Newton to contemporary statistics and epistemology. Explains how modern work on inferring Bayes Nets or graphical models of causation fits into traditional philosophical methodology going back at least to the Middle Ages.
Mind Change Efficient Learning. W. Luo and O. Schulte (2005). 18th Annual Conference On Learning Theory (COLT), Bertinoro, Italy, June 27-30. Springer LNAI 3559, pp. 398-412, eds. P. Auer and R. Meir.
This paper continues my work on using the criterion of minimizing belief changes to determine optimal solutions to problems of inductive inference. We look at mind changes in the Gold learning paradigm, generalizing the idea of a mind change bound beyond finite bounds to transfinite (ordinal) ones. First we characterize the learning problems that can be solved with a given ordinal mind change bound. It turns out that this corresponds exactly to Cantor's classic concept of accumulation order in a topological space. Second we characterize the mind change efficient inductive methods that achieve the best possible mind change bound given any data sequence. Basically, the mind change efficient learners are those that produce conjectures maximizing accumulation order. We show that mind change efficiency places strong constraints on inductive inference, and illustrate these constraints with examples such as identifying a linear subspace and identifying a one-variable pattern language (Angluin).
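A minimal simulation conveys the flavor of counting mind changes; the learning problem and encoding below are illustrative choices of mine, not taken from the paper:

```python
# Learning problem: the target language is an initial segment {0, ..., n}
# of the naturals, for an unknown n. The learner conjectures the segment
# determined by the largest number seen so far. No finite mind change bound
# covers every target, but the ordinal bound omega does: once the first
# datum arrives, only finitely many further changes can occur.

def segment_learner(data_stream):
    """Yield a conjecture (the segment endpoint n) after each observation."""
    seen_max = None
    for x in data_stream:
        if seen_max is None or x > seen_max:
            seen_max = x
        yield seen_max

def count_mind_changes(conjectures):
    """Count how often consecutive conjectures differ."""
    changes, prev = 0, None
    for c in conjectures:
        if prev is not None and c != prev:
            changes += 1
        prev = c
    return changes

# Target language {0,...,3}, presented in some order with repetitions.
stream = [1, 0, 3, 2, 3, 1]
conjs = list(segment_learner(stream))
print(conjs)                      # [1, 1, 3, 3, 3, 3]
print(count_mind_changes(conjs))  # 1
```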
Representing von Neumann-Morgenstern Games in the Situation Calculus. O. Schulte and J. Delgrande (2004). Annals of Mathematics and Artificial Intelligence 42(1-3): 73-101. (Special Issue on Multi-agent Systems and Computational Logic.) A shorter version of this paper appeared in the 2002 AAAI Workshop on Decision and Game Theory.
Game Trees are a very general, highly-developed mathematical formalism for representing what an agent knows about its environment to plan goal-directed actions. The Situation Calculus is a logical, computational formalism for representing environmental dynamics. In this paper we show a strong relationship between the two: just about every game-theoretic model of an environment can be described in the situation calculus with a categorical set of axioms. This implies that all and only true features of the game-theoretic model are expressed by the situation calculus axioms. Some provisos are that the game tree must provide each agent with at most countably many actions, and that the agents' utility functions (goal specifications) must be continuous in the Baire topology. We give situation calculus definitions for game-theoretic concepts such as Nash equilibrium and backward induction.
Iterated Backward Inference: An Algorithm for Proper Rationalizability. O. Schulte (2003). Proceedings of TARK IX (Theoretical Aspects of Reasoning About Knowledge), Bloomington, Indiana, pp. 15-28. ACM: New York. Expanded version with full proofs.
An important approach to game theory is to examine the consequences of beliefs that rational agents may have about each other. This paper considers respect for public preferences. Asheim's epistemic characterization of respect for preferences interprets this condition as follows, roughly speaking: An agent explains unexpected behaviour by other agents as mistakes, and considers more serious mistakes less likely than less serious ones. "Properly rationalizable" outcomes of a multi-agent interaction are those that are consistent with common belief in respect for preferences. It is difficult to determine in a given game model what follows from common belief in respect for preferences. This paper proposes an algorithm for deriving consequences of that assumption called Iterated Backward Inference. Iterated Backward Inference is a procedure that generalizes standard backward induction reasoning for games of both perfect and imperfect information. Our main result shows that if respect for public preferences and perfect recall obtains, then players choose in accordance with Iterated Backward Inference.
The algorithm in this paper does not necessarily find all strategies consistent with proper rationalizability. This problem was recently solved by Andres Perea.
Formal Learning Theory. O. Schulte. The Stanford Online Encyclopaedia of Philosophy, ed. Edward N. Zalta.
The online encyclopaedia is a great idea! My contribution is an introductory entry on formal learning theory.
Minimal belief change, Pareto-optimality and logical consequence. Oliver Schulte (2002). Economic Theory 19.
A survey of some basic results about minimal belief revision. Covers topics such as Pareto-minimal theory change, the Levi and Harper Identities, base revision, conditionals and the Ramsey test, and the Grove representation theorem for the AGM postulates.
Knowledge and Planning in an Action-Based Multi-Agent Framework: A Case Study. B. Bart, J. Delgrande and O. Schulte (2001). Advances in Artificial Intelligence, eds. E. Stroulia and S. Matwin. Springer Lecture Notes in AI 2056, pp. 121-130.
Together with Jim Delgrande, I showed that just about every game tree has a sound and complete (categorical) axiomatization in the situation calculus (see above). This shows that the situation calculus is sufficiently expressive to represent multi-agent interactions. We apply this insight to give a natural axiomatization of a variant of the "Clue" game in the situation calculus. The paper discusses a number of the issues that arise, such as the representation of what is common knowledge between various players and successor state axioms for the rules of the game.
Conservation Principles in Particle Physics: A Case Study in the Problem of Induction. Oliver Schulte (2001). The British Journal for the Philosophy of Science, 51.
Presents a learning-theoretic analysis of a problem that arises in particle physics: to infer conservation principles governing elementary particles from data about reactions among such particles. The criteria of reliable, fast and steady convergence to a correct theory (see "Means-Ends Epistemology" below) give strong constraints on what these inferences should be. In some cases they dictate that a particle theorist has to posit the existence of hidden particles.
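The underlying inference problem can be sketched as linear algebra over reaction data (the encoding below is a simplified illustration of mine; the particle names and quantum numbers are standard):

```python
# Each observed reaction becomes an integer vector of particle counts,
# reagents minus products, with an antiparticle counted as -1 of its
# particle. A candidate quantity q (one number per particle type) is
# conserved on the data iff every reaction vector is orthogonal to q, so
# the conserved quantities form the nullspace of the reaction matrix.

particles = ["p", "n", "e-", "nu_e"]   # proton, neutron, electron, neutrino

# Neutron beta decay: n -> p + e- + anti-nu_e, indexed as in `particles`.
beta_decay = [-1, 1, -1, 1]

# The hypothetical reaction n -> p + e- (no neutrino), which is never seen.
no_neutrino = [-1, 1, -1, 0]

charge = [1, 0, -1, 0]    # electric charge
baryon = [1, 1, 0, 0]     # baryon number
lepton = [0, 0, 1, 1]     # electron lepton number

def conserved(reaction, quantity):
    """A quantity is conserved on a reaction iff their dot product is zero."""
    return sum(r * q for r, q in zip(reaction, quantity)) == 0

print(all(conserved(beta_decay, q) for q in (charge, baryon, lepton)))  # True
print(conserved(no_neutrino, lepton))                                   # False
```

Positing a hidden particle then corresponds to adding a column so that the observed reactions can again be made orthogonal to a candidate conserved quantity, as with the neutrino in beta decay.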
What to Believe and What to Take Seriously: A Reply to David Chart Concerning the Riddle of Induction. O. Schulte (2000). The British Journal for the Philosophy of Science.
In response to my article "Means-Ends Epistemology", David Chart constructed an example in which means-ends analysis sanctions the inference "all emeralds are grue" from a sample of green emeralds. I take this to show two points: (1) means-ends analysis does not depend on the predicates that an inquirer might have chosen to include in her language, but (2) means-ends analysis does depend on what hypotheses the inquirer takes seriously at the outset of inquiry. What hypotheses to take seriously is a different (though related) issue from what inferences to draw from given evidence.
Review of Martin and Osherson's "Elements of Scientific Inquiry". Oliver Schulte (2000). The British Journal for the Philosophy of Science, 51:347-352.
A review of Martin and Osherson's treatment of many topics in scientific inquiry and inductive inference from a learning-theoretic point of view.
Minimal Belief Change and the Pareto Principle. Oliver Schulte (1999). Synthese, 118:324-361.
What do changing beliefs, revising legal codes and updating knowledge bases have in common? The answer, according to many philosophers, logicians and computer scientists, is that they all ought to obey the principle of minimal belief change: in the light of new information, change your beliefs or legal code or knowledge base just enough to incorporate the new information, but no more. Our paper "Reliable Belief Revision" below critically examines the effects of this principle on an agent's ability to reliably arrive at true beliefs.
In this paper, I ask what exactly a minimal belief change is. I apply the fundamental decision-theoretic principle of Pareto-Optimality to define a notion of minimal belief change, and then prove that a theory change is Pareto-minimal iff it satisfies certain axioms. It turns out that these axioms correspond exactly to three axioms for counterfactuals that are familiar from Lewis and Stalnaker's work. I also work out the consequences of Pareto-Optimality for minimal revision of axiomatic bases representing beliefs (rather than deductively closed theories). I compare Pareto-minimal belief change with the well-known AGM belief revision axioms.
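The Pareto comparison of candidate belief changes can be rendered in a small finite example (a toy of mine: plain finite sets stand in for the deductively closed theories the paper actually works with):

```python
# A change from belief set K to K' incurs losses K - K' and gains K' - K.
# Candidate K1 Pareto-dominates K2 (relative to K) if K1's losses and gains
# are both subsets of K2's, at least one strictly. Pareto-minimal changes
# are the undominated candidates.

def dominates(K, K1, K2):
    l1, g1 = K - K1, K1 - K
    l2, g2 = K - K2, K2 - K
    return l1 <= l2 and g1 <= g2 and (l1, g1) != (l2, g2)

def pareto_minimal(K, candidates):
    return [C for C in candidates
            if not any(dominates(K, D, C) for D in candidates)]

K = {"p", "q"}
# New information: not-p. Three candidate revised belief sets: keep q,
# drop q as well, or keep q and add an unrelated belief r.
candidates = [{"not-p", "q"}, {"not-p"}, {"not-p", "q", "r"}]
print(pareto_minimal(K, candidates) == [{"not-p", "q"}])  # True
```

Dropping q loses more than necessary and adding r gains more than necessary, so only the middle-ground candidate survives; the paper characterizes such survivors axiomatically.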
Minimal Belief Change and Pareto-Optimality. Oliver Schulte (1999). Advanced Topics in Artificial Intelligence, ed. Norman Foo. Springer Lecture Notes in Computer Science.
The Logic of Reliable and Efficient Inquiry. Oliver Schulte (1999). The Journal of Philosophical Logic, 28:399-438 (compressed version only).
This paper pursues a thorough-going instrumentalist, or means-ends, approach to the theory of inductive inference. I consider three epistemic aims: convergence to a correct theory, fast convergence to a correct theory and steady convergence to a correct theory (avoiding retractions). For each of these, two questions arise: (1) What is the structure of inductive problems in which these aims are feasible? (2) When feasible, what are the inference methods that attain them? Formal learning theory provides the tools for a complete set of answers to these questions. As an illustration of the results, I apply means-ends analysis to various versions of Goodman's Riddle of Induction.
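The flavor of the means-ends comparison can be sketched in a toy Riddle of Induction (the encoding, method names, and the particular rival rule below are mine, purely for illustration):

```python
# Emeralds are observed one per step, each "green" or "blue". Hypotheses:
# "green" (green forever) or ("grue", t): green before step t, blue from
# step t on. We compare retraction counts for the natural projection rule
# and for a rule that starts out projecting grue_k.

def natural_learner(colors):
    """Project 'all green'; switch to grue_t only when blue first appears."""
    conjecture = "green"
    for t, c in enumerate(colors):
        if conjecture == "green" and c == "blue":
            conjecture = ("grue", t)
        yield conjecture

def gruesome_learner(colors, k):
    """Start by projecting grue_k instead of green."""
    conjecture = ("grue", k)
    for t, c in enumerate(colors):
        if conjecture == ("grue", k) and c == "green" and t >= k:
            conjecture = "green"   # grue_k refuted: a green emerald at t >= k
        if conjecture == "green" and c == "blue":
            conjecture = ("grue", t)
        yield conjecture

def retractions(conjectures):
    conjs = list(conjectures)
    return sum(1 for a, b in zip(conjs, conjs[1:]) if a != b)

all_green = ["green"] * 6
turns_blue = ["green"] * 4 + ["blue", "blue"]
print(retractions(natural_learner(all_green)))       # 0
print(retractions(gruesome_learner(all_green, 3)))   # 1
print(retractions(natural_learner(turns_blue)))      # 1
print(retractions(gruesome_learner(turns_blue, 2)))  # 2
```

Both rules converge to a correct hypothesis on every data stream, but the grue-projecting rule risks an extra retraction; steady convergence favors the natural projection rule.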
Means-Ends Epistemology. O. Schulte (1999). The British Journal for the Philosophy of Science, 50:1-31.
Develops the same ideas as "The Logic of Reliable and Efficient Inquiry", but in different directions, and at a much lower level of technicality. Features a means-ends vindication of (a version of) Occam's Razor, as well as its natural generalizations in a Goodmanian Riddle of Induction. I establish a hierarchy of means-ends notions of empirical success, and discuss a number of issues, results and applications of means-ends epistemology.
Learning Theory and the Philosophy of Science. Kevin Kelly, Oliver Schulte and Cory Juhl (1997). Philosophy of Science.
Compares learning theory with other approaches to inductive inference, especially Bayesian confirmation theory. Has a section of examples that illustrate learning-theoretic analysis, from scientific revolutions to cognitive science. No symbols!
Common Reasoning about Admissibility. Cristina Bicchieri and Oliver Schulte (1997). Erkenntnis 45:299-325. Reprinted in: Probability, Dynamics and Causality, eds. Costantini, D. and Galavotti, M.C.
I regularly use the decision-theoretic criterion of admissibility (avoiding weak dominance) to evaluate the performance of inductive methods. This paper applies admissibility to make recommendations not just for games of inquiry but for any game, including those modelling social phenomena of great interest. We define a natural procedure for eliminating weakly dominated strategies in both game trees and matrix games. We show that this procedure satisfies a commonly accepted wishlist of properties that we would like in our tools for analyzing games---one of the few solution concepts to do so. Technically speaking, iterated admissibility generalizes backward and forward induction reasoning and yields the same result in game trees as it does in the matrix games that represent them (proviso: the game tree must have perfect recall).
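The elimination procedure for matrix games can be sketched as follows (a minimal implementation of iterated elimination of weakly dominated strategies; the entry-game example and all names are mine, not the paper's):

```python
# Payoffs: a dict mapping (row_strategy, col_strategy) to (row_payoff,
# col_payoff). Each round removes, for both players simultaneously, every
# strategy weakly dominated relative to the surviving strategies.

def weakly_dominated(s, own, opp, payoff):
    """True if some strategy t weakly dominates s against all opp strategies."""
    for t in own:
        if t == s:
            continue
        if all(payoff(t, o) >= payoff(s, o) for o in opp) and \
           any(payoff(t, o) > payoff(s, o) for o in opp):
            return True
    return False

def iterated_admissibility(rows, cols, pay):
    rows, cols = list(rows), list(cols)
    while True:
        bad_r = [r for r in rows
                 if weakly_dominated(r, rows, cols, lambda a, b: pay[a, b][0])]
        bad_c = [c for c in cols
                 if weakly_dominated(c, cols, rows, lambda a, b: pay[b, a][1])]
        if not bad_r and not bad_c:
            return rows, cols
        rows = [r for r in rows if r not in bad_r]
        cols = [c for c in cols if c not in bad_c]

# Normal form of a simple entry game: the entrant stays Out or goes In;
# the incumbent Fights or Accommodates.
pay = {("Out", "Fight"): (0, 2), ("Out", "Accom"): (0, 2),
       ("In", "Fight"): (-1, -1), ("In", "Accom"): (1, 1)}
print(iterated_admissibility(["Out", "In"], ["Fight", "Accom"], pay))
# (['In'], ['Accom'])
```

Fight is removed first (weakly dominated by Accommodate), after which Out is removed, matching the backward induction outcome of the underlying game tree.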
Topology as Epistemology. Oliver Schulte and Cory Juhl (1996). The Monist, vol. 79:1, 141-147.
One interpretation of point-set topology is as the theory of inquiry for logically omniscient agents with no limitations on memory capacity (cf. Vickers' book "Topology via Logic"). This view turns topology into a powerful tool for epistemology. The paper is a friendly introduction to the connection between topology and epistemology. Features topological interpretations of Popper's falsifiability criterion.
Reliable Belief Revision. Kevin Kelly, Oliver Schulte and V. Hendricks. Proceedings of the XII Joint International Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995. Compressed version.
When somebody proposes general norms for empirical inquiry, means-ends epistemology asks whether these norms help or hinder the goals of inquiry. In this paper we investigate whether axioms for "minimal-change belief revision" prevent agents from reliably finding the truth. In a nutshell, the answer is that these axioms don't help but they don't have to hurt either, provided we design our "belief revision operators" in the right way.
Church's Thesis and Hume's Problem. Kevin Kelly and Oliver Schulte (1995). Proceedings of the XII Joint International Congress for Logic, Methodology and the Philosophy of Science.
Tradition has it that deductive logic and empirical inductive reasoning are two disparate forms of inquiry. However, modern theories of computation and of empirical inquiry reveal deep and fundamental analogies between the two. We describe some of the most important similarities. These analogies are yet another reason to study the empirical capacities of computationally bounded agents---hence we review the results from the paper below.
The Computable Testability of Uncomputable Theories. Kevin Kelly and Oliver Schulte (1995). Erkenntnis 43:29-66. This paper was reviewed in The Journal of Symbolic Logic, 61(3), p. 1049.
If a computer cannot derive the predictions of a theory, is it possible for a computer to test the theory against empirical data? The surprising answer is: sometimes yes, even for highly uncomputable (hyperarithmetical) theories. The paper gives a complete map of the relationship between the inductive complexity of a theory---how hard it is to test the theory---and its deductive complexity---how hard it is to derive the predictions of the theory.