William Kent, "Carriers And Cargos: A General Paradigm For Modeling Collections, Tables, Multimeda, Spatial Constructs, And Other Intensional/Extionsional Objects", Oct 1993. [20 pp]


Carriers And Cargos: A General Paradigm For Modeling Collections, Tables, Multimeda, Spatial Constructs, And Other Intensional/Extionsional Objects

William Kent
Oct 1993


CONTENTS:
> 1 INTRODUCTION/ABSTRACT . . . 2
> 2 SETS . . . 2
>> 2.1 A Question . . . 3
>> 2.2 Extensional Sets . . . 3
>>> 2.2.1 Identity . . . 3
>>> 2.2.2 Immutability . . . 3
>>> 2.2.3 Existence . . . 3
>>> 2.2.4 Operations on Extensional Sets . . . 4
>> 2.3 Intensional Sets . . . 4
>>> 2.3.1 Identity . . . 4
>>> 2.3.2 Mutability . . . 5
>>> 2.3.3 Existence . . . 6
>>> 2.3.4 Operations on Intensional Sets . . . 6
>> 2.4 A Terminological Digression . . . 6
>>> 2.4.1 Extensional and Intensional Specification . . . 7
>>> 2.4.2 What's an Object? . . . 7
>>> 2.4.3 Casting . . . 7
>>> 2.4.4 Parameterized Types . . . 8
>> 2.5 Set Types . . . 9
> 3 CARRIERS AND CARGOS . . . 10
>> 3.1 Mutation Operators . . . 11
>> 3.2 Constraints on Mutation . . . 12
>> 3.3 Intensional Set Types . . . 13
>> 3.4 The Answer . . . 14
> 4 OTHER USES FOR CARRIERS AND CARGOS . . . 14
>> 4.1 Multisets, Lists, and Other Aggregates . . . 14
>> 4.2 Tuples and Records . . . 14
>> 4.3 Spatial and Geometric Constructs . . . 15
>> 4.4 Documents, Files, Multimedia, and Other Containers . . . 16
>> 4.5 Large Objects . . . 17
>> 4.6 Events . . . 17
>> 4.7 Relational Tables . . . 17
>> 4.8 A Comparison of Carriers and Variables . . . 19
> 5 SUMMARY/CONCLUSIONS . . . 19
> 6 REFERENCES . . . 20


1 INTRODUCTION/ABSTRACT

Confusion over identity, existence, and mutability of sets can be sorted out by distinguishing between intensional and extensional sets. The same applies to other collections, of course. A similar pattern also arises in distinguishing between a relational table and its collection of tuples, a team and its members, a document and its text, a file and its contents, a circular object and a geometric circle, a movable point and a point in geometric space, a photograph and the image it contains, an event and the time at which it occurs, a counter and its value, and probably other things as well.

The essential paradigm for dealing with all these phenomena is to treat them as pairs of associated but distinct objects, which we can generically call carriers and cargos. Carriers are created objects which carry other objects (often literals) as their cargos. The cargo of a given carrier is constrained to be of a specified cargo type. Thus the cargo of an intensional set is an extensional set, the cargo of a team is a set of persons, the cargo of a document is its text, and so on. A carrier will often "inherit" operations defined on its cargo, so that the union of two intensional sets is the union of their extensions, and printing a document means printing its character string text. In effect, a carrier can be "cast" into a role normally played by its cargo. At the same time, a carrier has unique operations of its own, such as recording the author of a document or the owner of a team.

This paradigm can be realized using familiar language facilities such as casting and parameterized types.

This paper explores carriers and cargos as a modeling technique. It has not yet been prototyped in a real language, and some specific concerns remain to be investigated.

2 SETS

We begin with a careful review of the familiar identity, existence, and mutability properties of sets, and then show how the principles extend nicely to other areas.

2.1 A Question

Suppose the set of Bowlers is {Dick,Jane}, and the set of Golfers also happens to be {Dick,Jane}. (Let's simplify one thing by pretending that people are uniquely identified by their first names.) Are Bowlers and Golfers the same? Well, yes and no. A precise answer requires a careful distinction between extensional and intensional sets.

An analogous question will help illustrate certain concepts. Suppose a compiler keeps two counters for counting Symbols and Procedures which occur in a given program. If there happen to be five symbols and five procedures, are Symbols and Procedures the same?

2.2 Extensional Sets

2.2.1 Identity

Extensional sets are the basic mathematical constructs we learned about in school. The essential catch phrase was "a set is determined by its extension" (i.e., by its members), which implies an essential principle of identity. If a set contains just Dick and Jane, that uniquely identifies the set. There is no other set which contains just Dick and Jane. It's even a mistake to say that two sets which contain just Dick and Jane are the same, because they weren't two sets to begin with. It's like saying two numbers which are both five are the same number. There is only one number five. There is only one set {Dick,Jane} which contains just Dick and Jane.

The nature and identity of an extensional set is not affected by type considerations. If Dick and Jane are both students and teachers, then the set of teachers {Dick,Jane} and the set of students {Dick,Jane} are one and the same set.

A natural corollary to the identity principle is that there is exactly one empty set, denoted {}.

2.2.2 Immutability

The identity principle also implies immutability. {Dick,Jane} and {Dick,Jane,Spot} are different sets, just as five and six are different numbers. The careless phrase "insert Spot into the set {Dick,Jane}" is a misnomer. Nothing gets inserted into that set; the set {Dick,Jane} is still the set {Dick,Jane}. This "insertion" is just an operation which yields a different set, i.e., {Dick,Jane} UNION {Spot} = {Dick,Jane,Spot}. The analogy here is that "incrementing" five doesn't change the number five in any way. It's just an operation which yields a different number: 5+1=6.

2.2.3 Existence

There's also an important existence principle. If anything exists, then the set containing it exists. That's the foundation of the set-theoretic definition of numbers: the empty set exists (corresponding to 0), and the set containing an existing set also exists (thus establishing the notion of successor). Furthermore, the union of any sets is an existing set. The net effect is that for any bunch of things which exist, the set containing those things exists. It's somewhat analogous to saying that all the numbers exist.

All sets of existing things exist, including such bizarre combinations as {Dick,5,Chicago,Jupiter,red,{}}. Such sets exist whether or not they have ever been mentioned before, and whether or not they are currently recorded in some data item. The number 49786554313989 exists, whether or not anyone has ever mentioned it before, and so does that bizarre set.

It follows that extensional sets cannot be explicitly created or destroyed. They exist or don't exist automatically in correspondence with the existence of their members. At the moment Dick comes into existence, all the sets containing Dick (with zero or more other existing members) come into existence, together with all the sets containing such sets, ad nauseam. The moment Dick ceases to exist, all such sets cease to exist.

We can call this an ephemeral mode of existence. Such things do not exist forever, nor are they overtly created or destroyed by some explicit operation. Their existence is determined by some rule which might be satisfied at some times but not at other times; they exist only while the rule holds. Things which always exist, like the numbers and the empty set, are said to have an eternal mode of existence. Things which are explicitly created and destroyed, such as people or departments or tables, can be said to have a mortal mode of existence. Some sets are in fact eternal, namely sets of eternal things, such as the sets of numbers. Thus the set {1,2,49786554313989} is eternal, always existing.

Calling the expression {Dick,Jane} a "constructor" is another misnomer to the extent that it implies that something new is being created. It's simply an expression that refers to an existing thing, just as 1+1 refers to an existing number. We will instead call such an expression a designator, to avoid the unintended connotation. (The term has been adopted in SQL3 [reference].)

Sets have no ordering, so that the expressions {Dick,Jane} and {Jane,Dick} designate the same set, just as 5+1 and 1+5 designate the same number. Thus different designators may designate the same set.

This also arises from the absence of duplicates in sets. Since sets can't contain duplicates, evaluation of set designator expressions eliminates duplicates, so that the expressions {Dick,Jane} and {Dick,Jane,Dick} designate the same set. Thus, if x and y are variables, then {x,y} might designate a set with either one or two members, depending on whether x and y refer to the same thing. {x,y} might even designate the empty set if neither variable has a value. Thus you can't necessarily determine the size of a set from the form of a designator expression.

2.2.4 Operations on Extensional Sets

The usual operations of union, intersection, and difference of sets are well known. Card(s) gives the cardinality of a set s, i.e., the number of members it has. (We are not much concerned about infinite sets.) The predicate Occurs(x,s), often written x IN s, is true if and only if x is a member of the set s.

Equality of set-valued expressions, in the sense of being identical (one and the same), is defined as

s1=s2 <=> FORALL x, x IN s1=x IN s2.

The expressions refer to the same set if and only if the memberships are the same for all objects.

2.3 Intensional Sets

The set Bowlers is an intensional set, differing significantly from an extensional set in the essential properties of identity, mutability, and existence.

2.3.1 Identity

Bowlers and Golfers are different teams, regardless of their membership.

If Tom becomes one of the Bowlers, the team is still the Bowlers, whose membership has changed to {Tom,Dick, Jane}. Tom might also become one of the Golfers, making its membership {Tom,Dick,Jane} as well, but Bowlers remains a different team from Golfers.

By analogy, Symbols and Procedures are different counters in the compiler, even if they happen to have the same value.

Thus the identity of an intensional set cannot be determined by its membership. Its identity is constant while its membership can change. In object systems, such identity is typically characterized by an "object identifier" (oid), which essentially traces back to a creation event [Section 2.3.3].

2.3.2 Mutability

Intensional sets are mutable, via a replacement process. An intensional set always has an associated extensional set (possibly empty), sometimes called the "value" of the intensional set. We can think of the intensional set as a carrier, having an extensional set as its cargo, being the value of a Cargo function applied to the carrier.

An intensional set (i.e., a carrier) is mutated by replacing its cargo with a different cargo. It's crucial to distinguish the two distinct steps involved:

  1. Compute the new set. This does not alter the old cargo, nor does it yet affect the carrier.
  2. Make this result the new cargo of the carrier. It replaces the previous cargo, without altering the previous cargo. Now we consider the carrier to be mutated.

Thus, Tom becomes one of the Bowlers in these two steps:

  1. Compute the new cargo value: Cargo(Bowlers) UNION {Tom} = {Dick,Jane} UNION {Tom} = {Dick,Jane,Tom}. This does not alter the old cargo, which is the extensional set {Dick,Jane}, nor does it yet affect the carrier Bowlers.
  2. Make this result the cargo of the carrier: Cargo(Bowlers)<-{Dick,Jane,Tom}. It replaces the previous cargo, without altering the previous cargo. Now we consider the carrier to be mutated.

Analogously, incrementing the Symbols counter occurs in the same two steps:

  1. Compute the new value, e.g., Symbols+1 = 5+1 = 6.
  2. Assign that as the new value: Symbols<-6.

Separating the two steps clarifies the significance of constraints (including type declarations). Constraints typically apply to carriers rather than directly to their cargos. For example, suppose that teams such as Bowlers and Golfers may only have sets of people as their cargos. It's meaningless to think of this as a constraint on the set {Dick,Jane}; there's no point in constraining it - it is a set of people, and it will always be one so long as Dick and Jane are people. The constraint is on the team, restricting what sorts of cargos it may carry.

Thus, if we try to add Spot the dog to Bowlers, the first step is still valid: it is perfectly legal to compute the set {Dick,Jane} UNION {Spot}={Dick,Jane,Spot}. That is, the union operation is legal. It is only the second step which is illegal, namely trying to make this set the cargo of Bowlers.

Similarly, if counters are restricted to non-negative integers, that does not constrain the number 5; it simply is a non-negative integer. Subtracting 10 from the current value of Symbols is perfectly legal arithmetic, though it would not be legal to decrement the Symbols counter by 10. The subtraction Symbols-10 is legal, while the assignment is not.

We are essentially differentiating between operations on the cargo and mutation of the carrier. Operations which may have the look and feel of altering something in place should really be thought of as a replacement.If we replace the third digit of 1234 with a 4, we haven't altered the number 1234. We have replaced a mention of 1234 with a mention of 1244. Editing operations on a document can logically be thought of as designating new text strings to replace the old.

2.3.3 Existence

There is no automatic way that teams such as Bowlers or Golfers come into existence. They are mortal, having to be overtly created by some explicit operation such as New(Team). Such a creation event establishes their identity, typically captured in a "birthmark" which serves as the oid for such objects. Such mortal objects can also be overtly destroyed.

2.3.4 Operations on Intensional Sets

We'd often like to perform operations like union, intersection, difference, cardinality, and occurrence on things like Bowlers and Golfers. These operations are defined on sets, but Bowlers and Golfers are teams, not sets.

It is thus useful to treat Cargo as a casting function [Section 2.4.3], which allows operations defined on cargos to be applied to the carriers of such cargos. We'd like Bowlers UNION Golfers to be interpreted as Cargo(Bowlers) UNION Cargo(Golfers). This isn't true for all operations, equality being one of the notable exceptions. Bowlers=Golfers is very different from Cargo(Bowlers)=Cargo(Golfers). Casting is only invoked for operations which are not otherwise defined on the carriers.

Carriers often have properties of their own apart from the properties of their cargos. A team might belong to a league, and have an owner, manager, schedule, and record. Thus Bowlers and Golfers would be instances of the type Team, on which such properties are defined.

Another general class of operations defined on carriers but not on their cargos are the mutators [Section 3.1], which change the cargos of carriers. Thus, for example, the union operation Bowlers UNION {Harry} may yield a different set, but it does not alter the membership of Bowlers. An Insert operation would be a different thing, assigning the result of such a union to be the new cargo of Bowlers. The Insert operation would be defined as

Insert(x,c) ::= Cargo(c)<-Cargo(c) UNION {x},

incorporating the two steps described in Section 2.3.2.

2.4 A Terminological Digression

Certain terminological matters need to be sorted out before we can progress further.

2.4.1 Extensional and Intensional Specification

If you find this section confusing, ignore it. It only matters if you've seen the terms used otherwise.

The terms "extensional" and "intensional" are sometimes used in a different sense to describe the form of specification of a set. An extensional specification enumerates the members, while an intensional specification provides a membership rule. The two usages are orthogonal, since we can find examples of extensional and intensional sets specified either extensionally or intensionally.

Here's an extensional specification of an extensional set: {1,2,3,4,5}. The members are simply enumerated in the specification. The same set can be specified intensionally by the membership rule "the integers between one and five, inclusive". Though expressed in the form of a membership rule, it defines an extensional set determined by its members. In fact, it is the same as the set specified by the rule "the prime factors of 120, including one".

Conversely, the specification Bowlers={Dick,Jane} defines the membership of the intensional set by enumerating the current members. It could also be defined by a rule such as "Bowlers are all the employees of the Sales department".

2.4.2 What's an Object?

The terminology of object technology is in great disarray, such that any choice of terminology is bound to alienate a sizable portion of readers. The best we can do is to make clear how terms are being used, hoping that the reader is willing and able to translate into his own favorite set of definitions.

The main issue with the word "object" itself is whether it denotes only mortal things [Section 2.2.3], or all things whether mortal, eternal, or ephemeral. We use it here in the latter broad sense, using the term "surrogate" for the narrower concept. We thus have the following taxonomy, with some alternatives shown in brackets:

2.4.3 Casting

Casting is a general mechanism which replaces an invalid argument to an operation with a related valid argument, selected by a casting function. For example, the Factorial function is defined on integers. Without casting, applying it to a rational, as in Factorial(2.5), would be a type violation. However, if Truncate is defined to be the casting function from rationals to integers, then Factorial(2.5) would be evaluated as Factorial(Truncate(2.5)), i.e., Factorial(2).

A casting function g is defined between a pair of types T1 and T2, with signature g:T1->T2. In effect it overloads the definitions of all functions which are defined on T2 but not on T1. If a function f:T2->T3 is not also defined on T1, then it implicitly acquires another definition on T1 such that f(x)=f(g(x)) if x is an instance of T1.

Casting may not be essential, since it can always be written explicitly, but it is a commonly provided convenience in many languages. It does help to simplify programming, and enhance readability of programs.

Languages place various sorts of constraints on casting functions. The following restrictions are adequate for our purposes:

There are several mechanisms by which a casting function may be identified:

For the purpose of this paper we will assume that Cargo is a reserved name for a casting function which can be overloaded with definitions for different argument types. We do not require multiple definitions for the same argument type; a carrier has just one cargo. (That might be a useful extension, but we won't pursue it now.) An aliasing or keyword capability is also desirable, since it is sometimes more natural for the cargo function to have a name such as Contents, Value, Text, Location, etc.

2.4.4 Parameterized Types

A family of types such as "Set of Persons", "Set of Integers", etc., can be described as "parameterized types" of the form SET(Person), SET(Integer), etc. SET(Person) designates a type whose instances are all the sets having only people as members (its extension is the power set of the set of all persons).

SET itself is not a type, but a type generator. It is a function which maps a type such as Person into another type, SET(Person), which is a generated type. The term parameterized type should be taken precisely to mean a generated type, not the generator itself.

Such generated types are themselves ephemeral objects. Whenever the type T exists, the type SET(T) is also an existing object, again recursively ad nauseam. Thus the type whose instances are sets whose members are sets of sets of integers exists, whether anyone cares or not. To be really precise, we should also avoid the term "generator", since it too implies that something new is being created. However, we will continue to use the terms "type generator" and "generated type" rather than the more precise "type designator" and "designated type".

Generated types, being ephemeral, are a distinct sub-category of types, distinguished from the more familiar sub-category of types which are explicitly created.

A type template is a description of a type generator. It is analogous to a function body (procedure); when the parameters in the body are instantiated, the template then serves as the definition of a specific generated type.

A simple type generator takes a single type as its only parameter. Such a generator is a function mapping any type into a generated type. Tuple types [Section 4.2] provide an example of non-simple type generators.

An interesting subtype relationship exists among simply generated types. Let's use T2 SUBSET T1 to denote subtyping. If T2 SUBSET T1, then G(T2) SUBSET G(T1) for any simple type generator G. Thus SET(Student) SUBSET SET(Person) - a set of students is also a set of persons.

A simple type generator G also has a maximal generated type G(Object) which is a supertype of any other type G(T) generated by G. Thus although SET itself is not a type, SET(Object) serves the same purpose, being the type whose instances are all the existing sets.

There is also a parallel notion of sub-generators: if G2 SUBSET G1, then G2(T) SUBSET G1(T) for any type T. If we wanted Set to be a subtype of Multiset, we might define SET as a sub-generator of MULTISET. Then SET(Person) would be a subtype of MULTISET(Person).

2.5 Set Types

We could say that ExtensionalSet is a single type, having all the extensional sets shown in earlier examples among its instances. However, since types are primarily used to constrain the permissible values of variables and functions, it's useful to have more specific set types, such as "Set of Persons" or "Set of Integers". These are conveniently described in terms of generated types, as just discussed.

Let SET be the type generator for extensional set types. (The unqualified term "set" will always mean an extensional set.) It is a function which takes a type as a parameter and returns the corresponding extensional set type. Thus the two types we just mentioned would be designated as SET(Person) and SET(Integer). The type whose instances include all extensional sets is given by SET(Object).

If Dick and Jane are students, then {Dick,Jane} is an instance of both SET(Student) and SET(Person). If an object can be an instance of several types which are not related as subtypes, then a set can be an instance of set types which are not related as subtypes. Thus if Dick and Jane are also teachers, then {Dick,Jane} is also an instance of SET(Teacher), which is neither a subtype nor supertype of SET(Student). In general, a set is an instance of SET(T) whenever all its members are instances of T.

If objects can change types, then sets containing those objects may also change types. If Dick alone becomes an Employee, then {Dick,Jane} is not yet an instance of SET(Employee) - but it will be as soon as Jane also becomes an Employee. This membership ceases as soon as either person stops being an Employee. If s1 and s2 are instances of SET(T1) and SET(T2), respectively, then the result of s1 UNION s2 is an instance of SET(Ti), where Ti is any type having both T1 and T2 as subtypes, i.e., any common supertype of T1 and T2. Such a common supertype is not necessarily unique. This is illustrated by the following example, under which the union of a set of interns with a set of trainees yields something which is both a set of students and a set of teachers.

        ..........
        : Person :
        ..........
           /\        
          /  \       
         /    \      
        /      \     
...........  ...........
: Student :  : Teacher :
...........  ...........
   |    \      /    |
   |     \    /     |
   |      \  /      |
   |       \/       |
   |       /\       |
   |      /  \      |
   |     /    \     |
   |    /      \    |
...........  ...........
: Intern  :  : Trainee :
...........  ...........

We'll deal with intensional set types in Section 3.3, after a general introduction to carriers and cargos.

3 CARRIERS AND CARGOS

We have defined Cargo to be a function which serves as a casting function. It can be redefined on various argument and result types. A specific Cargo function is a definition of Cargo on a particular argument type T, which will be denoted as T.Cargo.

A carrier type is a type which is the argument type of a Cargo definition, and a carrier is an instance of a carrier type. The cargo of a carrier c is the value of Cargo(c).

The following restrictions are useful:

3.1 Mutation Operators

Although it might be commonly assumed that casting functions are not updatable, the Cargo function is updatable. Mutation of carriers is defined as update of the Cargo function. The basic form is direct assignment:

Cargo(x)<-y,

as in

Cargo(Bowlers)<-{Dick,Jane},
Cargo(Symbols)<-5.

Many operations which modify the current cargo can be described in the general form:

Mutate(x,op,y) ::= Cargo(x)<-op(Cargo(x),y).

For example, a counter might be incremented by

Increment(Symbols,2) ::= Mutate(Symbols,Plus,2),

i.e.,

Cargo(Symbols)<-Plus(Cargo(Symbols),2).

Someone might be added to a team by

Insert(Bowlers,Tom) ::= Mutate(Bowlers,Append,Tom),

i.e.,

Cargo(Bowlers)<-Append(Cargo(Bowlers),Tom),

where

Append(x,y) ::= x UNION {y}.

It might also be useful to consider a form of reverse casting on assignment. If there is a Cargo definition such that x is an instance of the argument type (i.e., the carrier type) and y is an instance of the result type, then

x<-y

can be interpreted as

Cargo(x)<-y.

Then

Bowlers<-{Dick,Jane}

would be interpreted as

Cargo(Bowlers)<-{Dick,Jane}.

3.2 Constraints on Mutation

Constraints on mutation are defined by the applicable definition of the Cargo function. For the types Team and Counter, the Cargo function would be defined with signatures

Cargo:Team->SET(Person),
Cargo:Counter->Integer.

Thus any mutation of Bowlers or Golfers must leave their cargo as a set of persons, while any mutation of Symbols or Procedures must leave their cargo as an integer.

The type of cargo which a carrier can carry is the result type of the specific Cargo function which is applicable to that carrier. All instances of this carrier type have the same cargo type.

It is sometimes necessary to further constraint the cargo type for individual instances of a carrier type. For example, one team might have to be a team of doctors, another a team of pilots, etc. (This becomes relevant for relational tables [Section 4.7].) This essentially requires a CargoType function with signature

CargoType:T->Type.

Mutation of carriers is now further constrained to satisfy

Cargo(x) IN CargoType(x).

Given a carrier type T1 with the two functions

Cargo:T1->T2,
CargoType:T1->Type,

let CargoType(x)=T3 for some instance x of T1.

Cargo(x) must now be an instance of both T2 and T3, which only makes sense of they are non-disjoint. In general, it seems reasonable to require that T3 be a subtype of T2. In effect, this makes the cargo type of a carrier a subset of the cargo type defined for the carrier type. Thus we might have Medics as a team, with

CargoType(Medics)=SET(Doctor).

Type checking for carrier mutation now has to go beyond signature-based constraints implied by the Cargo function, and enforce the specific CargoType constraint as well.

One plausible approach to all this is to define Carrier as a type, with the following functions defined on it:

Cargo:Carrier->Object,
CargoType:Carrier->Type.

The Cargo function could be redefined on any subtype of Carrier. If Cargo is redefined on a subtype T of Carrier, then the following rules apply for any instance x of T:

Any subtype of Carrier could be considered a carrier type.

3.3 Intensional Set Types

Intensional sets are objects which are carriers of extensional sets, i.e., whose CargoType is a subtype of SET(Object). There does not seem to be a need to separately define a generator for intensional set types.

3.4 The Answer

Bowlers and Golfers are always different intensional sets, being distinct instances of the type Team. Thus Bowlers=Golfers is never true. Their cargos may sometimes be the same, so that Cargo(Bowlers)=Cargo(Golfers) is sometimes true.

4 OTHER USES FOR CARRIERS AND CARGOS

Most applications of the carrier/cargo concept take the following form:

4.1 Multisets, Lists, and Other Aggregates

The concepts extend trivially to multisets and lists. Details are not too important for the present paper. MULTISET and LIST can be defined as generators. MULTISET may or may not be a sub-generator of SET. There could be an abstract (non-instantiable) COLLECTION type which is the supertype of all the SET, MULTISET, and LIST types. Designator expressions are evaluated differently. Evaluating the multiset designator [Dick, Jane, Dick] does not eliminate duplicates, while evaluating the set designator {Dick, Jane, Dick} does. Similarly, the list designators <Dick,Jane> and <Jane,Dick> evaluate to different lists, while {Dick,Jane} and {Jane,Dick} evaluate to the same set.

The carrier/cargo paradigm also applies obviously to other aggregates such as arrays and vectors.

4.2 Tuples and Records

Tuple and record types will be useful later in the discussion of relational tables [Section 4.7].

A tuple type is defined by a sequence (list) of one or more types. There are two main variants on the tuple concept:

The TUPLE type generator is not a simple generator [Section 2.4.4], since it takes a list of types rather than a single type as its parameter. It has no corresponding maximal type. The type which contains all tuples has to be defined separately as the supertype of TUPLE(Object), TUPLE(Object,Object), TUPLE(Object, Object,Object),....

As a notational convenience, we allow G() to designate the union of all types generated by the generator G, allowing us to use TUPLE() to denote the type whose instances are all tuples.

4.3 Spatial and Geometric Constructs

A point in space is not the same thing as a point that can move in space. Points in a geometric space are much like literals. Once the space has been defined, all the points exist. They can't be created and destroyed, and they have immutable properties. Thus, we might define GeometricPoint as a two-dimensional Euclidean space. The points in the space all exist, having rectangular and polar coordinates as immutable properties. Such points might be designated via either sort of coordinates, e.g., RectPoint(10,20) or PolarPoint(10,30). Immutability of properties means that it makes no sense to move these points; their coordinates can't be updated. The identity of a geometric point is inseparable from its location; its location is its identity.

The definition of such a type might take the form

On the other hand, suppose we want to talk about a line between PointA and PointB, such that the line can be moved by moving these end points. These points are a different sort of thing, since they have to be explicitly created and destroyed, and they can be moved (i.e., their coordinates can change). Their identities are independent of their locations. They can change location without changing identity, and two of them can be in the same place without being the same point.

For these we need a different type, MovablePoint, which is a carrier type having GeometricPoint as its cargo:

PointA, an instance of MovablePoint, has an instance of GeometricPoint as its cargo. A property such as Xcoord(PointA) is defined by cargo casting, being evaluated as Xcoord(Cargo(PointA)); aliasing makes this equivalent to Xcoord(Location(PointA)). Mutators could be defined on MovablePoint such that, for example, Xcoord(PointA)<-25 means

Cargo(PointA)<-RectPoint(25,Ycoord(PointA)).

The same reasoning extends to other geometric constructs. A circle is not the same thing as a circular object. Circles are abstractions like literals, in the sense that all circles exist, and there is only one circle with a one-inch diameter. The identity of a circle is inseparable from its diameter (and its radius and circumference as well).

Circular objects, such as circular figures in a drawing, have to be created and destroyed, and many of them can have a diameter of one inch. Circular objects may change size without changing identity. Circles don't have a location, but circular figures do. Thus circular figures can be defined as a carrier type, having circles as their cargos.

4.4 Documents, Files, Multimedia, and Other Containers

A river is not the same thing as the water in it. A document is not the same thing as the text in it. A file is not the same thing as the bit-string or byte-string it contains. A container is not the same as its contents. All of these can be modeled as carriers and cargos.

Properties of document objects include such things as authors and publication schedules, which are not properties of character strings or structures of character strings. Operations such as concatenation, or finding substrings, are defined on character strings, not documents. Such operations can be performed on documents if they are defined as carriers whose cargos are character strings or structures of character strings (e.g., chapters). Documents are edited by effectively replacing one cargo with another.

Properties of files include such things as owners, creation dates, privileges, positions in directories, etc. Operations on bit strings are much the same as operations on character strings. Files can thus be modeled as carriers of bit string cargos.

A photograph is not the same thing as the image in it. Different photos may have the same image, and the image in a photo can be changed by retouching or other forms of enhancement. A photograph is thus a carrier having an image as its cargo.

A general characteristic of containers (things with content) is that many operations on the container objects are intended to be performed on their contents. In general, these cargos need not be literals.

4.5 Large Objects

Large objects [4,1] may also fit the paradigm of carriers and cargos. The distinction between a very long string and its "handle" is visible to the user to the extent that he can control which is to be moved or copied or otherwise operated on. This parallels the carrier/cargo distinction. At the same time, operations defined on strings are to be applied to their handles, with the intent of having them performed on the corresponding strings. This parallels the casting semantics of the Cargo function.

Different sorts of large objects (e.g., bit, byte, and character strings) can be distinguished by different result types on specific Cargo functions, or by different values of CargoType.

Some fine tuning may be needed. It may be desirable for equality tests on carriers to propagate to their cargos. Also, in order to behave like constants, the Cargo property should not be updatable once it has been initialized.

4.6 Events

An event can be a carrier having a time point as its cargo. The time of occurrence can be an updatable property, at least for events scheduled in the future. Operations on events which should be "inherited" from time points include relative ordering of events (did the stockholder meeting occur before the election?), finding the time interval between two events (how long before?) and specifying some period of time before or after an event (three weeks before the election). Such operations are really defined on time points, not events. When applied to events, they could be interpreted in terms of their cargo time points.

4.7 Relational Tables

For tables, the signature of the Cargo function defines the form of relational model, while the individual cargo types describe the row types for individual tables.

In its basic form, a relational table is a named, created object with an associated varying set of tuples of atomic elements, very much like a team whose members happen to be tuples rather than people. Thus a table is a carrier whose cargo type is SET(TUPLE(AtomicType,...,AtomicType)). This form of relational table is described by Cargo function signatures of the form:

Cargo: Table->SET(TUPLE(AtomicType,...,AtomicType)).

Variants of the relational model can be characterized by different sorts of cargo functions. Duplicates are admitted by using the MULTISET type generator in place of SET. Non-First Normal Form relations are admitted by allowing arbitrary tuples [Section 4.2], i.e.,

Cargo: Table->MULTISET(TUPLE()).

Ordered tables are admitted by allowing arbitrary collections. If the COLLECTION generator has SET, MULTISET, and LIST as sub-generators [Section 2.4.4], then the signature takes the form

Cargo: Table->COLLECTION(TUPLE()).

To go even further and allow a table to be a collection of arbitrary objects, the signature can be generalized to

Cargo: Table->COLLECTION(Object).

Generalized relations of this form have been considered for SQL [2,3].

The form of a particular table is given by the CargoType function. A Scores table might be created as an instance of Table with its CargoType initialized as

CargoType(Scores)<-SET(TUPLE(Person,Integer)).

In general, CargoType for a table has the form COLLECTION(RT), where RT is the row type of the table.

Overall, this approach differentiates between a table object and the collection it carries as its cargo. This allows queries to be described more precisely as closed operations on collections rather than on tables. The semantics are the same. When a query is applied to a table, it is naturally cast into a query on the associated collection. Queries on multiple tables can be defined as operating on the cross-products of their associated collections.

Letting the results of queries be collections rather than tables simplifies certain existence, identity, and typing issues. If results of queries are table objects, then executing the same query twice necessarily yields unequal results. Treating the results as extensional collections allows them to be equal, in the sense that {Tom} UNION {Dick} and {Tom} UNION {Dick} yield the same result.

Collections are instances of generated types, which are ephemeral and hence don't need explicit creation [Section 2.4.4]. Hence result types do not have to be created for the results of arbitrary queries. Result types can be determined from the form of the SELECT clause. For example, the result of a query to SELECT x.Name, x.Age FROM... is an instance of MULTISET(TUPLE(Character,Integer)). This type always exists, so long as the types Character and Integer exist, and does not have to be fabricated for any particular query.

If tables are desired as query results, then the collection yielded by a query can be assigned as the cargo of a table object, so long as it satisfies the CargoType constraint of the table.

As with carriers in general, tables have other properties in addition to the properties of their cargos, such as names, the schemas they belong to, and authorization constraints. Depending on how tuples are defined [Section 4.2], the list of column names could also be a property of the table.

4.8 A Comparison of Carriers and Variables

Even a variable is not quite the same thing as its value. A carrier and its cargo bear some resemblance to a variable and its value, with the cargo type of the carrier being analogous to the declared type of the variable. The Cargo function is somewhat analogous to an expression evaluator - for a carrier c, f(c) sometimes behaves like f(Cargo(c)), while for a variable x, f(x) behaves like f(Eval(x)). In other words, a counter such as Symbols sometimes behaves like an integer-valued variable.

However, carriers and cargos do not really provide a good model for variables and values. There are significant differences. Carriers and variables are created in different ways - carriers are created explicitly as persistent objects, while variables are implicitly created via declarations during a particular execution of a procedure. Carriers have single cargos, while the value of a variable may depend on which procedural environment it occurs in. A reference (not an assignment) to a variable is always replaced by its value, while a reference to a carrier is only replaced by its cargo when carrier casting applies, or when Cargo is explicitly invoked. Finally, it is not possible to define additional properties for a variable, apart from the properties of its values.

5 SUMMARY/CONCLUSIONS

Carriers and cargos provide a general mechanism for modeling many sorts of things which come in natural pairs, usually having an intensional/extensional or container/content flavor.

The Cargo and CargoType functions are defined on all carriers, imposing the constraint

Cargo(x) IN CargoType(x).

A carrier type is the argument type of a specific Cargo function. The default CargoType for a carrier is the result type of the specific Cargo function defined on its carrier type. The CargoType can be explicitly specified for individual carriers, so long as it remains a subtype of the default.

The essential characteristics of carriers and cargos:

This approach seems useful for such things as intensional sets and other collections, documents, files, geometric and spatial constructs, relational tables, and many others.

This approach is closely related to identity vs. equality in some object models, with identity referring to objects and equality referring to their "values" [6]. Such values are in fact the cargos for objects which are their carriers. The difference in the present approach is to recognize such objects and their values as being distinct things, with an explicit mapping between them.

Although it seems quite promising, this approach still needs to be validated by prototyping in a real language. Areas needing further investigation include:

6 REFERENCES

  1. ANSI X3H2-93-359R/ISO DBL MUN-003, (ISO-ANSI Working Draft) Database Language SQL (SQL3), Jim Melton (ed.), August 1993.
  2. ANSI X3H2-93-231, "Tables of ADTs", May 1, 1993, by Phil Shaw.
  3. ANSI X3H2-93-302, "Removal of table of ADT", March 8, 1993.
  4. ANSI X3H2-93-341r1, "Large Object Strings", July 11, 1993, by Paul Cotton, Toby Lehman, Nelson Mattos and Frank Pellow.
  5. ANSI X3H2-93-425, "Another Approach to BLOBs", Sept. 10, 1993, by Jim Melton.
  6. Setrag Khoshafian and George Copeland, "Object Identity", Proc. Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Portland, Oregon, 1986.