# Approximating Predicates

A *predicate* is just a
Boolean function that applies to a variable (or, equally, a tuple of
variables), expressed formally. A predicate is used to describe a condition or
test to be applied to the variable (or collection of variables). The WHERE
clause from an SQL query is a typical example: for each possible result row,
the WHERE clause determines whether that row should or should not actually be
included in the result.

It often occurs that we have a predicate which cannot be directly
used in a given context. For example, we may tasked to interrogate a database
for entries that satisfy a particular predicate, but the predicate doesn't
directly translate into an SQL condition, perhaps because the formalism used
for the predicate is more powerful than that afforded by SQL, or perhaps
because it depends on quantities or functions that aren't available in the
database itself. In these situations, we wish to derive an *approximation* to the predicate such that the predicate satisfies certain
additional constraints (e.g., can be expressed in SQL). Continuing the
example, the approximating predicate could be used to construct an SQL query
that produces a collection of candidates from which the final result would be
computed by applying the original predicate.

## Necessary and Sufficient Approximations

Typically it's not enough just to "simplify" a predicate by simply ignoring parts that can't be used directly: the result would be an approximation that bears an unknown relationship to the original predicate. In other words, knowing the value of such an approximation still tells us nothing about the value of the original predicate when applied to the same variable value.

Much more useful is to identify approximations that are
either *necessary* or *sufficient* conditions.

A *sufficient condition*
for P(x) is a predicate S(x) such that:

S(x) => P(x) for all x.

Any value that satisfies a sufficient condition will also satisfy P. A sufficient condition may be "too strict," in that there may be values for which P would return true but S would not.

In the case of the query example, if our task is to select *all* the rows that satisfy the original predicate, then
we must ensure that the approximation not cause the SQL query to overlook any
candidates which the full predicate might accept. Such an approximation is
called a *necessary condition, *and
has the property that all values it rejects would also be rejected by P.
Formally, a necessary condition for P(x) is a predicate N(x) such that:

**not** N(x) => **not** P(x) for all x

or, equivalently,

P(x) => N(x) for all x.

Just as a sufficient condition is potentially too strict, a necessary condition is potentially too loose: there may be values of x for which N(x) returns true but P(x) returns false.

The universally true predicate, T(x) = **true**, is a necessary condition for any predicate. The
universally false predicate, F(x) = **false**, is a sufficient condition for any predicate. Even if no information
can be gleaned from a given predicate P, it's always possible to use these
universal predicates as approximations. Generally, however, we're interested
in constructing an approximation that's as close to the original predicate P as
possible; the universally true and universally false predicates are the always the
worst possible approximations.

## Conjunction and Disjunction

Typically a given predicate has a recognizable structure
involving conjunction ("**and**"),
disjunction ("**or**"),
and negation ("**not**").
Recognizing these Boolean operators makes it possible to deconstruct the
original predicate into simpler constructs and the form a full approximation
from the best approximations of those simpler constructs.

Let's take conjunction first, and suppose P(x) = P1(x) **and** P2(x). Then N_{P}(x) = N_{P1}(x) **and** N_{P2}(x) is a necessary condition for P,
and S_{P}(x) = S_{P1}(x) **and** S_{P2}(x), provided N_{P1}(x) is a
necessary condition for P1(x) and N_{P2}(x) is a necessary condition
for P2(x). Similarly, S_{P}(x) = S_{P1}(x) **and** S_{P2}(x) is a sufficient condition for
P(x), provided S_{P1}(x) is a sufficient condition for P1(x) and S_{P2}(x)
is a sufficient condition for P2(x).

Not only are N_{P}(x) and S_{P}(x)
respectively necessary and sufficient conditions, but they're also the best
approximations possible using the approximations of P1(x) and P2(x).

## Negation

Typically we're more interested in obtaining necessary
conditions than sufficient ones, because we need to use the necessary condition
to perform an initial search or query from which we'll select the final results
that satisfy our original predicate. The possibility of negation, however,
requires us to consider necessary and sufficient conditions together. That's
because a necessary condition for **not **P(x)
is the negation of a sufficient condition for P(x), that is:

N_{not}_{ P(x)} = **not** S_{P}(x)

Similarly, a sufficient condition of a negated predicate is the negation of a necessary condition:

S_{not}_{ P(x)} = **not** N_{P}(x).

## Indeterminacy

Under some definitions, it's allowed to have predicates whose value depends on more than just the argument value(s). As a kind of silly example, "x < 15 when the moon is full, and false otherwise" depends on x, but also on the phase of the moon, a completely external condition. If the phase of the moon isn't to be considered part of the argument to the predicate, then part of the predicate is indeterminate for our purposes.

We can re-write this example predicate in the logically-equivalent form:

P(x)
= (the moon is full) **and** (x < 15)

The "the moon is full" part is indeterminate, for our purposes, and so can only be approximated with the universally true or universally false predicates. In other words,

N_{P}(x)
= N_{the moon is full}(x) **and** N_{x
< 15}(x)

= T(x) **and** x < 15

= x < 15

assuming that x < 15 can serve as its own necessary condition.

## Applications

For CXWS, the CxCondition class expresses predicates for retrieving CIM instances. CxCondition provides a getRestriction method that providers use to test whether the condition requires a particular value for a given property for all qualifying CIM instances. (The provider uses this information to narrow its search, e.g. by knowing that instances are required from one particular remote system, and not all possible remote systems.)

The getRestriction method can be conceptualized as the construction of a necessary condition for the CxCondition, where the approximation has the form

x.p = c

for the given property p and some value c.

CxCondition can express arbitrary conditions, but it's much more typical that they're composed from a small vocabulary of primitive conditions combined with Boolean operators. The full CxCondition can be viewed as a tree, with Boolean operators on the interior nodes and primitive conditions as the leaves. One of the primitive conditions is, not surprisingly, a property value test, that is, P(x) is x.p = c for some property p and some value c.

To implement getRestriction, then, we just walk the CxCondition's tree down to the leaves. If a leaf is an x.p = c primitive for the property p we're interested in, then the leaf can be used unchanged as its own approximation; otherwise, we treat the primitive condition as indeterminate, and approximate it using T(x) or F(x), depending on whether we're looking for a necessary or sufficient condition.

The initial result of the tree walk is one of: T(x), F(x), x.p = c for some value c, or some Boolean composition of these. T(x) and F(x) can always be optimized away when used with Boolean operators, while combinations of property equality tests can always be simplified because they're either redundant (same value c) or contradictory (different values). Performing these optimizations and simplifications yields a final approximation that is either T(x), F(x), or x.p = c, exactly what we need for implementing getRestriction.