Approximating Predicates

A predicate is just a Boolean function that applies to a variable (or, equally, a tuple of variables), expressed formally. A predicate is used to describe a condition or test to be applied to the variable (or collection of variables). The WHERE clause from an SQL query is a typical example: for each possible result row, the WHERE clause determines whether that row should or should not actually be included in the result.

It often occurs that we have a predicate which cannot be directly used in a given context. For example, we may tasked to interrogate a database for entries that satisfy a particular predicate, but the predicate doesn't directly translate into an SQL condition, perhaps because the formalism used for the predicate is more powerful than that afforded by SQL, or perhaps because it depends on quantities or functions that aren't available in the database itself. In these situations, we wish to derive an approximation to the predicate such that the predicate satisfies certain additional constraints (e.g., can be expressed in SQL). Continuing the example, the approximating predicate could be used to construct an SQL query that produces a collection of candidates from which the final result would be computed by applying the original predicate.

Necessary and Sufficient Approximations

Typically it's not enough just to "simplify" a predicate by simply ignoring parts that can't be used directly: the result would be an approximation that bears an unknown relationship to the original predicate. In other words, knowing the value of such an approximation still tells us nothing about the value of the original predicate when applied to the same variable value.

Much more useful is to identify approximations that are either necessary or sufficient conditions.

A sufficient condition for P(x) is a predicate S(x) such that:

S(x) => P(x) for all x.

Any value that satisfies a sufficient condition will also satisfy P. A sufficient condition may be "too strict," in that there may be values for which P would return true but S would not.

In the case of the query example, if our task is to select all the rows that satisfy the original predicate, then we must ensure that the approximation not cause the SQL query to overlook any candidates which the full predicate might accept. Such an approximation is called a necessary condition, and has the property that all values it rejects would also be rejected by P. Formally, a necessary condition for P(x) is a predicate N(x) such that:

not N(x) => not P(x) for all x

or, equivalently,

P(x) => N(x) for all x.

Just as a sufficient condition is potentially too strict, a necessary condition is potentially too loose: there may be values of x for which N(x) returns true but P(x) returns false.

The universally true predicate, T(x) = true, is a necessary condition for any predicate. The universally false predicate, F(x) = false, is a sufficient condition for any predicate. Even if no information can be gleaned from a given predicate P, it's always possible to use these universal predicates as approximations. Generally, however, we're interested in constructing an approximation that's as close to the original predicate P as possible; the universally true and universally false predicates are the always the worst possible approximations.

Conjunction and Disjunction

Typically a given predicate has a recognizable structure involving conjunction ("and"), disjunction ("or"), and negation ("not"). Recognizing these Boolean operators makes it possible to deconstruct the original predicate into simpler constructs and the form a full approximation from the best approximations of those simpler constructs.

Let's take conjunction first, and suppose P(x) = P1(x) and P2(x). Then N_P(x) = N_P1(x) and N_P2(x) is a necessary condition for P, and S_P(x) = S_P1(x) and S_P2(x), provided N_P1(x) is a necessary condition for P1(x) and N_P2(x) is a necessary condition for P2(x). Similarly, S_P(x) = S_P1(x) and S_P2(x) is a sufficient condition for P(x), provided S_P1(x) is a sufficient condition for P1(x) and S_P2(x) is a sufficient condition for P2(x).

Not only are N_P(x) and S_P(x) respectively necessary and sufficient conditions, but they're also the best approximations possible using the approximations of P1(x) and P2(x).

Negation

Typically we're more interested in obtaining necessary conditions than sufficient ones, because we need to use the necessary condition to perform an initial search or query from which we'll select the final results that satisfy our original predicate. The possibility of negation, however, requires us to consider necessary and sufficient conditions together. That's because a necessary condition for not P(x) is the negation of a sufficient condition for P(x), that is:

N_not_P(x) = not S_P(x)

Similarly, a sufficient condition of a negated predicate is the negation of a necessary condition:

S_not_P(x) = not N_P(x).

Indeterminacy

Under some definitions, it's allowed to have predicates whose value depends on more than just the argument value(s). As a kind of silly example, "x < 15 when the moon is full, and false otherwise" depends on x, but also on the phase of the moon, a completely external condition. If the phase of the moon isn't to be considered part of the argument to the predicate, then part of the predicate is indeterminate for our purposes.

We can re-write this example predicate in the logically-equivalent form:

P(x) = (the moon is full) and (x < 15)

The "the moon is full" part is indeterminate, for our purposes, and so can only be approximated with the universally true or universally false predicates. In other words,

N_P(x) = N_{the moon is full}(x) and N_{x
< 15}(x)

= T(x) and x < 15

= x < 15

assuming that x < 15 can serve as its own necessary condition.

Applications

For CXWS, the CxCondition class expresses predicates for retrieving CIM instances. CxCondition provides a getRestriction method that providers use to test whether the condition requires a particular value for a given property for all qualifying CIM instances. (The provider uses this information to narrow its search, e.g. by knowing that instances are required from one particular remote system, and not all possible remote systems.)

The getRestriction method can be conceptualized as the construction of a necessary condition for the CxCondition, where the approximation has the form

x.p = c

for the given property p and some value c.

CxCondition can express arbitrary conditions, but it's much more typical that they're composed from a small vocabulary of primitive conditions combined with Boolean operators. The full CxCondition can be viewed as a tree, with Boolean operators on the interior nodes and primitive conditions as the leaves. One of the primitive conditions is, not surprisingly, a property value test, that is, P(x) is x.p = c for some property p and some value c.

To implement getRestriction, then, we just walk the CxCondition's tree down to the leaves. If a leaf is an x.p = c primitive for the property p we're interested in, then the leaf can be used unchanged as its own approximation; otherwise, we treat the primitive condition as indeterminate, and approximate it using T(x) or F(x), depending on whether we're looking for a necessary or sufficient condition.

The initial result of the tree walk is one of: T(x), F(x), x.p = c for some value c, or some Boolean composition of these. T(x) and F(x) can always be optimized away when used with Boolean operators, while combinations of property equality tests can always be simplified because they're either redundant (same value c) or contradictory (different values). Performing these optimizations and simplifications yields a final approximation that is either T(x), F(x), or x.p = c, exactly what we need for implementing getRestriction.