Learning First Order Rules

A classic problem that requires first-order logic is learning concepts about nodes in a graph.

We can't set each linked node as an attribute, because how many attributes would there be? Not being able to use a fixed set of attributes means we need to learn connectivity concepts.

Representing Graph Concepts in First Order Logic

Can represent graph concepts as:

(1)
\begin{align} Ancestor(x, y) \leftarrow Parent(x,y) \end{align}

This representation is called Inductive Logic Programming.

Prolog is based on logic programming (Horn Clause subset).

Induction as Inverted Deduction

Induction is finding h such that

(2)
\begin{align} (\forall<x_i, f(x_i)> \in D) B \wedge h \wedge x_i \vdash f(x_i)) \end{align}
  • Where xi is the ith training instance
  • f(x_i) is the target function value for x_i
  • B is other background knowledge

Example

Relations between people, such that x<u,v> means that v is the x of u.

e.g.

(3)
\begin{equation} f(x_i): Child(Bob, Sharon) \end{equation}

Deductive Operators vs Inductive Operators

Deductive operators are functions like F:

(4)
\begin{align} F(A, B) = C \text{, where } A \wedge B \vdash C \end{align}

We need inductive operators like O:

(5)
\begin{align} O(B, D) = h \text{, where } (\forall<x_i, f(x_i)> \in D) B \wedge h \wedge x_i \vdash f(x_i)) \end{align}

Positives of Induction as Inverted Deduction

  • Subsumes earlier idea of finding h that “fits” training data
  • Domain theory B helps define meaning of “fit” the dat
  • Suggests algorithms that search H guided by B

Negatives of Induction as Inverted Deduction

  • Doesn't allow for noisy data
  • First order logic gives a huge hypothesis space H - makes it intractable to calculate all acceptable h's.

Resolution Rules

If we have a true clause:
C1 = "P or L"
then we can find any other true clause that has "not L" (e.g. C2 = "not L or R") and create following resolution that must be true:
C = "P or R"

Screen%20Shot%202012-04-17%20at%207.45.12%20PM.png

(Because if both the two are true then at least one of P and R must be true, as L and not L can't both be true).

Inverting Resolution

Taking the resolution and one clause and figuring out the other is easy:

Screen%20Shot%202012-04-17%20at%207.49.40%20PM.png

take C1 and C, and find the literal that occurs in C1 but not C
(e.g. C1 = "PassExam or not KnowMaterial", C = "PassExam or not Study" - literal would be not KnowMaterial).

Form C2 by taking the terms from C that aren't in the and of C1 and C, and adding "or not Literal" of the literal.
(e.g. C2 = "not Study or not not KnowMaterial" = "not Study or KnowMaterial").

Duce Operators

These are just ways of replacing terms with other equivalent terms.

I.e.
If we know that "A, B implies p" and "A, q implies p"
Then we can just say that "B implies q" and use "A, q implies p"

Screen%20Shot%202012-04-17%20at%207.54.42%20PM.png

First-Order Resolution

First order resolution is exactly the same as resolution, except that we find a substitution, $\theta$ that makes the literals in C1 and C2 the same (e.g. if we have study(a) and study(b) we might make theta be replace a with b).

(6)
\begin{align} C = (C_1 - {L-1})\theta \cup (C_2 - {L_2})\theta \end{align}

Inverting First-Order Resolution

We do the same thing as before - take C1 and C and try to find C2.

So we take out the common terms from C1 and C, and then set C2 to be what's left in C or'd with the not of the literal in C1 but not in C. We also apply our theta transformation to make sure the literals are the same.

e.g. C1 = Father(Shannon, Tom) and C = GrandChild(Bob, x)
Hence C1's literal is Father(Shannon, Tom) and what's left of C is GrandChild(Bob,Shannon).
We hence set C2 to be GrandChild(Bob,Shannon) or not Father(Shannon, Tom)

We apply theta to replace the Shannon's in order to generalise them from literals to predicates.