Spheres of Knowledge

William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California

Sept 1990

> 1 INTRODUCTION
> 2 GENERAL CHARACTERISTICS
>> 2.1 Remembered and Consulted Knowledge of Objects
>> 2.2 Native and Known Objects
>> 2.3 Functions
>> 2.4 Expressions
>> 2.5 Types
> 3 IDENTITY AND IDENTIFIERS
> 4 SCHEMAS
> 5 PERSISTENCE, CONCURRENCY OF SPHERES
> 6 INCONSISTENT AND MULTIPLE KNOWLEDGE
> 7 NAMING CONVENTIONS

1 INTRODUCTION

We want our databases to be infallible windows on reality. When we need travel information, we want the database to list the actually available flights to our destination. Once in a while we're disappointed: it may list a flight that doesn't exist, or not list a flight that does exist. The database may or may not know a lot of other things about a real flight, such as the type of aircraft, or the movies to be shown.

It's like being disappointed in a travel agent in whom we've had a lot of confidence. Neither the travel agent nor the database is an infallible window on reality; each is a fallible agent of knowledge about reality. The knowledge of either one might be inaccurate or incomplete.

We begin to realize that, far from being windows on reality, they are memory banks of information. They only know what they've been told.

This is an important metaphorical shift, affecting the way we think about the semantics of databases.

Dealing with multiple databases introduces a new twist: inconsistency. A single database or travel agent can at least maintain internal consistency of its knowledge, even if it is incorrect. Semantic models and schemas of databases rest on an assumption of consistency. But multiple databases are likely to be inconsistent.

What happens when we ask several databases/agents about flights between Los Angeles and San Jose? One might list fifteen, another twenty, another might be listing flights to San Jose (in Costa Rica!), another might say it never heard of Los Angeles, and yet another doesn't know what a flight is.

It would be nice if a coordinated system of databases could be relied on to resolve such inconsistencies internally, and present us with the coherent view of a single database. That's not always possible, and we need to account for this in our models of database semantics.

A sphere of knowledge is an abstraction of one of these databases or agents. It might in fact be a subset of the information in a database, a combination of several databases, or perhaps some other sort of information source.

Our model of a multi-database system will be a system of such spheres, in which a user is dealing directly with one sphere, which may in turn be consulting other spheres. The metaphor supports various notions of transparency. For instance, the user's immediate sphere might spontaneously consult other spheres, or the user might direct it to consult particular spheres. The immediate sphere might be able to integrate the information it gets by consulting others, or it might simply report back what each has said and let the user cope with discrepancies.

The analogy holds. Your travel agent might spontaneously consult others, or you might tell him whom to consult. Your agent might work out discrepancies, or he might just pipeline the answers back to you, and let you cope.

Casting this in terms of an object model, two fundamental things are relativized with respect to spheres of knowledge:

The existence and identity of objects.
The behavior of operations.

A given city may be known in one sphere, but not in another. We know that creating or destroying a city object has little correspondence with the actual founding or destruction of a city. It only has to do with whether or not the database knows about the city. Creation is a way of introducing a city to a particular database, and destruction a way of making the database forget it. This goes on independently for different databases, even if they are interconnected. Later we will talk about remembered versus consulted information known in a database.

Correspondingly, an operation that asks for a list of all cities will behave differently depending on which database we ask. With interconnected databases, the answer can also vary depending on which other databases it consults, and how it reconciles their responses.

2 GENERAL CHARACTERISTICS

A sphere of knowledge is something in which an object is "known".

Every request (function invocation) occurs with respect to a sphere.

A user of a database is connected to a given sphere, which is the default sphere for all requests issued by that user.

One sphere may be attached as a subsphere of another; everything known in the subsphere is known in the supersphere. The attachment and detachment of subspheres to superspheres might be quite dynamic (more later).

2.1 Remembered and Consulted Knowledge of Objects

An object is remembered in a sphere in which it was created. (It might be remembered in more than one sphere; more about that later.)

An object is known in a sphere if it is remembered in that sphere or known in any attached subsphere.

Suppose that employees e0 and e1 were created in spheres s0 and s1 respectively. While the spheres are detached from each other,

s0:Instances(Employee) = {e0},

s1:Instances(Employee) = {e1}.

But if s1 is attached as a subsphere of s0, we have

s0:Instances(Employee) = {e0, e1}.

Knowledge does not propagate downward. A request for the instances of employee addressed to sphere s1 would still yield

s1:Instances(Employee) = {e1}.

After s1 is detached from s0, we again have

s0:Instances(Employee) = {e0}.

Thus the information obtained by a request depends on the sphere to which it is addressed, and on the spheres to which that sphere is in turn connected. (This could cascade.)
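The attach/detach behavior above can be mimicked in a few lines of Python. This is only a minimal sketch under our own assumptions: the class name Sphere and its methods create, attach, detach and instances are hypothetical, objects are stood in for by strings, and cyclic attachments are not handled.

```python
class Sphere:
    """Hypothetical model of a sphere: it remembers the objects created
    in it and consults its attached subspheres on every request."""

    def __init__(self, name):
        self.name = name
        self.remembered = {}   # type name -> set of objects created here
        self.subspheres = []   # currently attached subspheres

    def create(self, type_name, obj):
        self.remembered.setdefault(type_name, set()).add(obj)
        return obj

    def attach(self, sub):
        self.subspheres.append(sub)

    def detach(self, sub):
        self.subspheres.remove(sub)

    def instances(self, type_name):
        # Known = remembered here, or known in any attached subsphere.
        # The recursion is what makes consultation cascade.
        known = set(self.remembered.get(type_name, set()))
        for sub in self.subspheres:
            known |= sub.instances(type_name)
        return known


s0, s1 = Sphere("s0"), Sphere("s1")
s0.create("Employee", "e0")
s1.create("Employee", "e1")

detached = s0.instances("Employee")       # s0 alone
s0.attach(s1)
attached = s0.instances("Employee")       # s0 now also consults s1
downward = s1.instances("Employee")       # s1 learns nothing from s0
s0.detach(s1)
after_detach = s0.instances("Employee")   # back to what s0 remembers
```

Nothing is copied on attachment in this sketch; s0 simply consults s1 while it is attached, which is why detaching restores the earlier answer.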

2.2 Native and Known Objects

Object creation, like any operation, occurs with respect to a sphere:

s0:Create Employee;

The Employee type must be known in sphere s0; more about that later. If s0 is the default, it may be omitted from the request.

An object created in a sphere is native to that sphere. The objects known to a sphere are the objects native to that sphere together with the objects known to its subspheres. In this case, the created employee is native to s0; it is known in s0 and any superspheres of s0.

There may be operations which coerce several objects to be the same object. If they were created in different spheres, then the resulting object may be native to several spheres. Destroying the object in one of those spheres does not automatically destroy it in the other spheres.

The idea is this: one object is known in sphere s1, and another object is known in sphere s2. If s0 is a supersphere of both s1 and s2, it may be known in s0 that the two are the same object. Deleting the object from s1 need not delete it from s2.

The default assumption is that distinct creation events yield distinct objects.

Object persistence depends on the spheres in which it is native. If a sphere disappears, objects native only to that sphere also disappear.
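One way to picture objects native to several spheres is to keep, for each object, the set of spheres in which it is native. The sketch below assumes exactly that bookkeeping; the class Universe and its methods coerce_same, destroy and drop_sphere are hypothetical names of ours, not operations defined by the model.

```python
class Universe:
    """Hypothetical bookkeeping: the spheres each object is native to."""

    def __init__(self):
        self.native_in = {}   # object -> set of sphere names

    def create(self, sphere, obj):
        self.native_in.setdefault(obj, set()).add(sphere)

    def coerce_same(self, keep, drop):
        # Declare `drop` to be the same object as `keep`. The merged object
        # is native to every sphere that either one was native to.
        self.native_in[keep] |= self.native_in.pop(drop)

    def destroy(self, sphere, obj):
        # Make one sphere forget the object; other spheres are unaffected.
        self.native_in[obj].discard(sphere)
        if not self.native_in[obj]:
            del self.native_in[obj]   # native nowhere: the object is gone

    def drop_sphere(self, sphere):
        # The sphere disappears, taking with it objects native only to it.
        for obj in list(self.native_in):
            self.destroy(sphere, obj)


u = Universe()
u.create("s1", "a")
u.create("s2", "b")
u.create("s1", "c")

u.coerce_same("a", "b")               # "a" is now native to s1 and s2
u.destroy("s1", "a")                  # forgotten in s1 only
survives_in = set(u.native_in["a"])   # still native to s2

u.drop_sphere("s1")                   # "c" was native only to s1
remaining = dict(u.native_in)
```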

These ideas apply to all objects, including types and functions. Literals and certain system objects may be known to all spheres, by definition.

2.3 Functions

Functions are, first of all, objects. They are subject to creation and identification disciplines like ordinary objects.

A request, consisting of a function and arguments, occurs with respect to a "requesting sphere", which may be explicitly specified or defaulted. The notation s:f(x) signifies that the request f(x) is being addressed to the sphere s. If omitted, the sphere is assumed to be the current default sphere.

The function and the arguments must be known in the requesting sphere. Naming conventions (qualifiers) for the function and arguments may explicitly refer to subspheres of the given sphere. The result will only include objects known in the requesting sphere.

Assume that s1 is attached as a subsphere of s0, and that the Employee type and the Instances function are known in both spheres. That is, Employee is known to be the same type object in both spheres, and Instances is known to be the same function object in both. [May need to elaborate on that later.]

A user connected to s0 can create an employee in s0 with either of the requests

e0 <- s0:Create Employee;

e0 <- Create Employee;

This employee is remembered and known in s0, but not in s1.

Our user, still connected to s0, can create another employee in s1:

e1 <- s1:Create Employee;

This employee is remembered and known in s1, and is known in s0 so long as s1 remains attached.

An Instances request can be addressed to one sphere or the other, yielding the results shown earlier.

It's important to understand that we consider Instances to be the same function in all these cases, but its behavior depends on the requesting sphere. In effect, the requesting sphere can be considered a hidden parameter to any function, so that its behavior can depend on the sphere.

In effect, the extension of a function may be partitioned over various spheres. When arguments or results include objects which are native to different spheres, that portion of the function's extension is only known to superspheres which include all of those spheres.
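One reading of the partitioned extension can be sketched as a filter: a stored (argument, result) pair is visible to a requesting sphere only if both objects are known there. Everything in the sketch is an assumption made for illustration: the tables SUBSPHERES and NATIVE, the objects x1 and y2, and the extension F_EXTENSION are hypothetical, and we ignore the separate question of where the function itself is native.

```python
# Hypothetical configuration: s1 and s2 are attached as subspheres of s0.
SUBSPHERES = {"s0": ["s1", "s2"], "s1": [], "s2": []}
NATIVE = {"s0": set(), "s1": {"x1"}, "s2": {"y2"}}


def known(sphere):
    """Objects native to the sphere or known in its attached subspheres."""
    objs = set(NATIVE[sphere])
    for sub in SUBSPHERES[sphere]:
        objs |= known(sub)
    return objs


# One function object with one stored extension: (argument, result) pairs.
F_EXTENSION = {("x1", "y2"), ("x1", "x1")}


def visible(extension, requesting_sphere):
    """Portion of the extension whose argument and result are both known
    in the requesting sphere, which acts as a hidden parameter."""
    k = known(requesting_sphere)
    return {(arg, res) for (arg, res) in extension if arg in k and res in k}


in_s0 = visible(F_EXTENSION, "s0")   # s0 knows x1 and y2: both pairs
in_s1 = visible(F_EXTENSION, "s1")   # s1 knows only x1
in_s2 = visible(F_EXTENSION, "s2")   # s2 knows only y2
```

The pair that spans s1 and s2 shows up only in s0, the one supersphere that includes both.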

This section needs to be further elaborated. Consider examples of functions which are native to one, two, or three of these spheres, and what happens with arguments and results from the different spheres. Consider what happens with updates, and with constraints.

Here's an interesting example to play with. Consider a subsphere s3 of s0 in which the Assigned function is defined:

s3:create function Assigned (Employee) -> Department.

The Assigned function is thus native to s3, and is known to s3 and s0 but not to s1 or s2 (subspheres of s0).

What does it mean to say in s0

s0:Set Assigned(e1)=d2

if e1 is native to s1 and d2 is native to s2? I think we can give consistent answers to that and similar questions, but it will be interesting to work them out.

2.4 Expressions

In an expression, including a query, we have to be able to establish the appropriate sphere for each function call (operation). When we are able to recast the expression as a composition of functions, we still have some open questions. For example, suppose that the request

s1:f(g(x))

is issued in sphere s0. Should the default for g be s0, the global default, or s1, the "local" sphere currently in effect at that point?
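The two candidate defaults can be compared side by side. The sketch below does not settle the question; it only makes the alternatives concrete. The nested-tuple encoding of expressions, the function resolve_spheres and the policy names "global" and "local" are all hypothetical.

```python
def resolve_spheres(expr, default, policy):
    """Return (function name, sphere) pairs, innermost call first.

    expr is a nested tuple (sphere_or_None, function_name, argument),
    where argument is either another such tuple or a plain value.
    policy is "global" (unqualified calls use the user's default sphere)
    or "local" (they use the sphere of the enclosing call).
    """
    sphere, fname, arg = expr
    effective = sphere if sphere is not None else default
    calls = []
    if isinstance(arg, tuple):
        inner_default = effective if policy == "local" else default
        calls += resolve_spheres(arg, inner_default, policy)
    calls.append((fname, effective))
    return calls


# s1:f(g(x)), issued by a user whose default sphere is s0.
expr = ("s1", "f", (None, "g", "x"))

global_choice = resolve_spheres(expr, "s0", "global")
local_choice = resolve_spheres(expr, "s0", "local")
```

Under the "global" policy g is addressed to s0; under the "local" policy it is addressed to s1. In both cases f is addressed to s1.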

There are other questions, too, I'm sure.

2.5 Types

Types are, first of all, objects. They are subject to creation and identification disciplines like ordinary objects.

The behavior of types should follow from the preceding behaviors of objects and functions.

The SubSuper predicate is itself a function. Any operation which has the effect of

s:Set SubSuper(T1,T2)=True

means that in the sphere s, T1 is known to be a subtype of T2. This is also known in any supersphere of s. As before, this can create interesting results if T1 or T2 is native not to s but to some subsphere of s.

Again, we should work out some detailed examples. For example,

s:Instances(Employee)

would include instances of Programmer if Programmer is known to be a subtype of Employee in the sphere s. [We will probably uncover some sticky cases when we elaborate these examples, but I trust we can work out solutions.]
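As a first step toward such examples, here is a small sketch in which subtype knowledge is itself sphere-relative. It rests on assumptions of ours: the class Sphere and its attributes are hypothetical, SubSuper is treated as transitive, and a SubSuper assertion made in a sphere is visible in its superspheres but not in its subspheres.

```python
class Sphere:
    """Hypothetical sphere holding directly typed objects and SubSuper facts."""

    def __init__(self):
        self.subspheres = []
        self.direct = {}        # type name -> objects created here with that type
        self.subsuper = set()   # (subtype, supertype) pairs asserted here

    def known_subsuper(self):
        pairs = set(self.subsuper)
        for sub in self.subspheres:
            pairs |= sub.known_subsuper()
        return pairs

    def direct_instances(self, type_name):
        objs = set(self.direct.get(type_name, set()))
        for sub in self.subspheres:
            objs |= sub.direct_instances(type_name)
        return objs

    def instances(self, type_name):
        # Collect the type and every type known HERE to be a subtype of it.
        pairs = self.known_subsuper()
        types, frontier = {type_name}, [type_name]
        while frontier:
            current = frontier.pop()
            for sub_t, super_t in pairs:
                if super_t == current and sub_t not in types:
                    types.add(sub_t)
                    frontier.append(sub_t)
        result = set()
        for t in types:
            result |= self.direct_instances(t)
        return result


s0, s1 = Sphere(), Sphere()
s0.subspheres.append(s1)
s1.direct = {"Employee": {"e1"}, "Programmer": {"p1"}}
s0.subsuper.add(("Programmer", "Employee"))   # asserted in s0 only

from_s0 = s0.instances("Employee")   # includes the programmer
from_s1 = s1.instances("Employee")   # s1 does not know the subtype fact
```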

We ought to be very careful in distinguishing two cases:

A type is known to be the same object in a sphere and its subspheres, although the Instances function returns different results depending on where it is requested.
A type in one sphere is a super-type of types in other spheres. These are known to be different type objects.

Similarly, we should carefully distinguish two notations:

s.f is a name qualification convention, meaning that the name f might refer to different objects in different spheres. s.f refers to the object which is known by the name f in sphere s. Thus s1.f and s2.f are qualified names for possibly different objects. The expressions s1.f(x) and s2.f(x) may be invoking different function objects.
s:f is a sphere constraint on the execution of f. Thus s1:f(x) and s2:f(x) are invoking the same function, which may behave differently in the two spheres.
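The difference between the two notations can be made concrete with a toy model in which each sphere carries its own name table. The dictionaries s1 and s2, the helpers qualified and request, and the two counting functions are hypothetical, chosen only to show name lookup and sphere-constrained execution as separate steps.

```python
def count_native(sphere):
    """A function object that counts only objects native to the sphere."""
    return len(sphere["native"])


def count_known(sphere):
    """A different function object that also consults subspheres."""
    return len(sphere["native"]) + sum(
        count_known(sub) for sub in sphere["subspheres"]
    )


# The name f is bound to different function objects in the two spheres.
s2 = {"native": {"e2"}, "subspheres": [], "names": {"f": count_native}}
s1 = {"native": {"e1"}, "subspheres": [s2], "names": {"f": count_known}}


def qualified(sphere, name):
    """s.f : the object known by the name f in sphere s."""
    return sphere["names"][name]


def request(sphere, func):
    """s:f : address a request for the function object func to sphere s."""
    return func(sphere)


# s1.f and s2.f denote different function objects.
different_objects = qualified(s1, "f") is not qualified(s2, "f")

# s1:f and s2:f invoke the same function, which behaves differently.
same_function_results = (request(s1, count_known), request(s2, count_known))
```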

By analogy with subtypes, the types of an object may differ depending on the sphere in which they are requested; this is the inverse of the behavior of the Instances function.

3 IDENTITY AND IDENTIFIERS

Every sphere defines an implementation of object identifiers and the object identity operator ==. There are defaults, which might be the following (we need to discuss these):

Objects native to the sphere are assigned system-generated oid's with a high probability of uniqueness within the sphere.
An object native to a subsphere is identified by concatenating the oid provided by the subsphere with an identifier of the subsphere itself.
The object identity operator == consists of simple string comparison on oid's.
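These defaults could be sketched as follows. The details are assumptions of ours rather than part of the proposal: oid's are generated with Python's uuid module, the subsphere identifier is placed in front of the oid with a "/" separator, and the names Sphere, oids and identical are hypothetical.

```python
import uuid


class Sphere:
    """Hypothetical sphere that issues oid's and qualifies subsphere oid's."""

    def __init__(self, sphere_id):
        self.sphere_id = sphere_id
        self.native_oids = []
        self.subspheres = []

    def create(self):
        # System-generated oid, unique with high probability.
        oid = uuid.uuid4().hex
        self.native_oids.append(oid)
        return oid

    def oids(self):
        # Native oid's as issued; subsphere oid's prefixed by the subsphere id.
        result = list(self.native_oids)
        for sub in self.subspheres:
            result += [f"{sub.sphere_id}/{oid}" for oid in sub.oids()]
        return result


def identical(oid_x, oid_y):
    """Default identity operator: plain string comparison of oid's."""
    return oid_x == oid_y


s0, s1 = Sphere("s0"), Sphere("s1")
s0.subspheres.append(s1)
e0 = s0.create()
e1 = s1.create()

seen_from_s1 = s1.oids()   # the bare oid of e1
seen_from_s0 = s0.oids()   # e0 as issued, e1 qualified by "s1/"
```

Under this sketch the same employee carries a different identifier string in s0 than in s1, so a string comparison only makes sense between oid's obtained from the same sphere.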

Explicit specifications could override some of these defaults. For example, oid's for instances of certain types within the sphere could be defined in terms of certain properties of those objects, i.e., user-defined oid's. [This should be subject to the constraints of an earlier note I wrote on this subject.]

We postulate that object identity is ultimately defined in terms of the identity operator x == y, which is true iff x and y refer to the same object. The default definition is based on string comparison of oid's. Overrides may be specified by various procedural specifications. One such specification might say that if x and y are instances of T1, then x == y iff p(x) == p(y) for some property p defined on T1. Another possibility might say x == y if id(x,y) has been asserted, where id is some user predicate. This would allow assertion of identity between arbitrary objects.

Identity specifications should also be subject to constraints [they are documented in one of my papers]. For example, if x == y, then we would expect f(x) == f(y).

The Iris/OSQL model should be clarified to use this == identity operator wherever appropriate, and to rely on the sphere-specific definition whenever necessary. For example, the Distinct operator on a query should not allow x and y to both be returned if x == y. If they have different oid's, then we need some mechanism to establish which one is preferred to be returned.

As with other operations, identity itself may depend on the requesting sphere. Thus x == y may be true in one sphere and not in another. For example, one supersphere might define identity on the basis of matching employee numbers, another on the basis of matching social security numbers.

Thus we should have a sphere-specific notation for the identity operation, e.g., x(s:==)y, or s:Ident(x,y).
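A sphere-specific identity operation might look like the sketch below, where each sphere is configured with the property on which identity is based. The names are hypothetical (EmployeeRecord, Sphere, ident, distinct), the sample values are made up, and the rule that Distinct keeps the first object encountered is just one possible preference mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeRecord:
    oid: str
    emp_no: int
    ssn: str


class Sphere:
    """Hypothetical sphere carrying its own definition of identity."""

    def __init__(self, key=None):
        # key: the property on which identity is based; the default is the oid.
        self.key = key if key is not None else (lambda obj: obj.oid)

    def ident(self, x, y):
        """s:Ident(x, y): identity as defined in this sphere."""
        return self.key(x) == self.key(y)

    def distinct(self, objs):
        # Never return both x and y when they are identical in this sphere.
        # Assumption: the first object encountered is the preferred one.
        kept = []
        for obj in objs:
            if not any(self.ident(obj, k) for k in kept):
                kept.append(obj)
        return kept


by_emp_no = Sphere(key=lambda e: e.emp_no)
by_ssn = Sphere(key=lambda e: e.ssn)
by_oid = Sphere()

x = EmployeeRecord("oid-1", 1001, "111-11-1111")
y = EmployeeRecord("oid-2", 1001, "222-22-2222")

same_by_emp_no = by_emp_no.ident(x, y)        # employee numbers match
same_by_ssn = by_ssn.ident(x, y)              # social security numbers differ
deduplicated = by_emp_no.distinct([x, y])     # one representative survives
```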

4 SCHEMAS

Every sphere has a schema, consisting of the types and functions known in that sphere. It changes dynamically as subspheres are attached and detached. The native schema of a sphere consists of the types and functions native to that sphere.

There is a default under which the schema of a sphere includes the disjoint union of the schemas of its subspheres. [Need to work out details: universal objects such as literals and system objects are "above" the disjoint union.]
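The default can be pictured as a tagged union: each schema entry is paired with the sphere it is native to, so that like-named types from different subspheres stay distinct. This is only a sketch of that one default. The dictionary layout and the function schema are hypothetical, and universal objects such as literals are left out.

```python
def schema(sphere):
    """Schema of a sphere: its native schema plus the disjoint union of the
    schemas of its attached subspheres. Each entry is tagged with the sphere
    it is native to, which is what keeps the union disjoint."""
    entries = {(sphere["name"], item) for item in sphere["native_schema"]}
    for sub in sphere["subspheres"]:
        entries |= schema(sub)
    return entries


s1 = {"name": "s1", "native_schema": {"Employee"}, "subspheres": []}
s2 = {"name": "s2", "native_schema": {"Employee", "Department"}, "subspheres": []}
s0 = {"name": "s0", "native_schema": {"Assigned"}, "subspheres": [s1, s2]}

with_both = schema(s0)

# The schema changes dynamically when a subsphere is detached.
s0["subspheres"].remove(s2)
without_s2 = schema(s0)
```

Here the two Employee types remain separate entries in the schema of s0. Knowing that they are the same type object would be an override of this default.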

[Schemas also reflect identifier/identity conventions.]

5 PERSISTENCE, CONCURRENCY OF SPHERES

What happens if several users connect to the same sphere, and independently attach and detach subspheres? Do we have transient and persistent subsphere attachments? How are concurrent persistent attachments/detachments reconciled?

6 INCONSISTENT AND MULTIPLE KNOWLEDGE

This relates to the question of how many times a movie is highly rated.

Here we have a subtle but crucial shift in the basic semantics. Instead of simply modeling facts, we become conscious of the assertions of those facts. Instead of simply saying that movie X is good, we are now remembering that Y said movie X is good, or two people said movie X is good, or X said it but Y said the opposite.
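The shift can be illustrated by storing assertions, each with its source, instead of bare facts. The sample data and the helpers tally and is_contested are hypothetical; they show only that counting and disagreement become expressible once the asserter is remembered.

```python
from collections import Counter

# Each entry records who said what about which subject, not a bare fact.
assertions = [
    ("Y", "movie X", "good"),
    ("Z", "movie X", "good"),
    ("W", "movie X", "bad"),
]


def tally(assertions, subject):
    """How many times each claim has been asserted about the subject."""
    return Counter(claim for _, subj, claim in assertions if subj == subject)


def is_contested(assertions, subject):
    """True when the remembered assertions about the subject disagree."""
    return len(tally(assertions, subject)) > 1


ratings = tally(assertions, "movie X")
contested = is_contested(assertions, "movie X")
```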

While this appears to be a subtlety, it is a qualitative jump in the nature of the information we are dealing with. We get into some areas of AI and natural language, such as belief systems and reconciliation of multiple perceptions. The semantics of this extension are not to be taken lightly.

7 NAMING CONVENTIONS

[Discuss overloading. Differentiate between name mappings and the binding between indirect and direct functions.]