William Kent, "Carriers And Cargos: A General Paradigm For Modeling Collections, Tables, Multimeda, Spatial Constructs, And Other Intensional/Extionsional Objects", Oct 1993. [20 pp]

William Kent

Oct 1993

**CONTENTS:**

> 1 INTRODUCTION/ABSTRACT . . . 2

> 2 SETS . . . 2

>> 2.1 A Question . . . 3

>> 2.2 Extensional Sets . . . 3

>>> 2.2.1 Identity . . . 3

>>> 2.2.2 Immutability . . . 3

>>> 2.2.3 Existence . . . 3

>>> 2.2.4 Operations on Extensional Sets .
. . 4

>> 2.3 Intensional Sets . . . 4

>>> 2.3.1 Identity . . . 4

>>> 2.3.2 Mutability . . . 5

>>> 2.3.3 Existence . . . 6

>>> 2.3.4 Operations on Intensional Sets .
. . 6

>> 2.4 A Terminological Digression . . . 6

>>> 2.4.1 Extensional and Intensional
Specification . . . 7

>>> 2.4.2 What's an Object? . . . 7

>>> 2.4.3 Casting . . . 7

>>> 2.4.4 Parameterized Types . . . 8

>> 2.5 Set Types . . . 9

> 3 CARRIERS AND CARGOS . . . 10

>> 3.1 Mutation Operators . . . 11

>> 3.2 Constraints on Mutation . . . 12

>> 3.3 Intensional Set Types . . . 13

>> 3.4 The Answer . . . 14

> 4 OTHER USES FOR CARRIERS AND CARGOS . . . 14

>> 4.1 Multisets, Lists, and Other Aggregates .
. . 14

>> 4.2 Tuples and Records . . . 14

>> 4.3 Spatial and Geometric Constructs . . . 15

>> 4.4 Documents, Files, Multimedia, and Other
Containers . . . 16

>> 4.5 Large Objects . . . 17

>> 4.6 Events . . . 17

>> 4.7 Relational Tables . . . 17

>> 4.8 A Comparison of Carriers and Variables .
. . 19

> 5 SUMMARY/CONCLUSIONS . . . 19

> 6 REFERENCES . . . 20

Confusion over identity, existence, and mutability of sets can be sorted out by distinguishing between intensional and extensional sets. The same applies to other collections, of course. A similar pattern also arises in distinguishing between a relational table and its collection of tuples, a team and its members, a document and its text, a file and its contents, a circular object and a geometric circle, a movable point and a point in geometric space, a photograph and the image it contains, an event and the time at which it occurs, a counter and its value, and probably other things as well.

The essential paradigm for dealing with all these phenomena is to treat them as pairs
of associated but distinct objects, which we can generically call *carriers* and*
cargos*. Carriers are created objects which carry other objects (often literals) as
their cargos. The cargo of a given carrier is constrained to be of a specified cargo type.
Thus the cargo of an intensional set is an extensional set, the cargo of a team is a set
of persons, the cargo of a document is its text, and so on. A carrier will often
"inherit" operations defined on its cargo, so that the union of two intensional
sets is the union of their extensions, and printing a document means printing its
character string text. In effect, a carrier can be "cast" into a role normally
played by its cargo. At the same time, a carrier has unique operations of its own, such as
recording the author of a document or the owner of a team.

This paradigm can be realized using familiar language facilities such as casting and parameterized types.

This paper explores carriers and cargos as a modeling technique. It has not yet been prototyped in a real language, and some specific concerns remain to be investigated.

We begin with a careful review of the familiar identity, existence, and mutability properties of sets, and then show how the principles extend nicely to other areas.

Suppose the set of Bowlers is {Dick,Jane}, and the set of Golfers also happens to be {Dick,Jane}. (Let's simplify one thing by pretending that people are uniquely identified by their first names.) Are Bowlers and Golfers the same? Well, yes and no. A precise answer requires a careful distinction between extensional and intensional sets.

An analogous question will help illustrate certain concepts. Suppose a compiler keeps two counters for counting Symbols and Procedures which occur in a given program. If there happen to be five symbols and five procedures, are Symbols and Procedures the same?

Extensional sets are the basic mathematical constructs we learned about in school. The
essential catch phrase was "a set is determined by its extension" (i.e., by its
members), which implies an essential principle of *identity*. If a set contains
just Dick and Jane, that uniquely identifies the set. There is no other set which contains
just Dick and Jane. It's even a mistake to say that two sets which contain just Dick and
Jane are the same, because they weren't two sets to begin with. It's like saying two
numbers which are both five are the same number. There is only one number five. There is
only one set {Dick,Jane} which contains just Dick and Jane.

The nature and identity of an extensional set is not affected by type considerations. If Dick and Jane are both students and teachers, then the set of teachers {Dick,Jane} and the set of students {Dick,Jane} are one and the same set.

A natural corollary to the identity principle is that there is exactly one empty set, denoted {}.

The identity principle also implies *immutability*. {Dick,Jane} and
{Dick,Jane,Spot} are different sets, just as five and six are different numbers. The
careless phrase "insert Spot into the set {Dick,Jane}" is a misnomer. Nothing
gets inserted into that set; the set {Dick,Jane} is still the set {Dick,Jane}. This
"insertion" is just an operation which yields a different set, i.e., {Dick,Jane}
UNION {Spot} = {Dick,Jane,Spot}. The analogy here is that "incrementing" five
doesn't change the number five in any way. It's just an operation which yields a different
number: 5+1=6.

There's also an important *existence* principle. If anything exists, then the
set containing it exists. That's the foundation of the set-theoretic definition of
numbers: the empty set exists (corresponding to 0), and the set containing an existing set
also exists (thus establishing the notion of successor). Furthermore, the union of any
sets is an existing set. The net effect is that for any bunch of things which exist, the
set containing those things exists. It's somewhat analogous to saying that all the numbers
exist.

All sets of existing things exist, including such bizarre combinations as {Dick,5,Chicago,Jupiter,red,{}}. Such sets exist whether or not they have ever been mentioned before, and whether or not they are currently recorded in some data item. The number 49786554313989 exists, whether or not anyone has ever mentioned it before, and so does that bizarre set.

It follows that extensional sets cannot be explicitly created or destroyed. They exist or don't exist automatically in correspondence with the existence of their members. At the moment Dick comes into existence, all the sets containing Dick (with zero or more other existing members) come into existence, together with all the sets containing such sets, ad nauseam. The moment Dick ceases to exist, all such sets cease to exist.

We can call this an *ephemeral* mode of existence. Such things do not exist
forever, nor are they overtly created or destroyed by some explicit operation. Their
existence is determined by some rule which might be satisfied at some times but not at
other times; they exist only while the rule holds. Things which always exist, like the
numbers and the empty set, are said to have an *eternal* mode of existence. Things
which are explicitly created and destroyed, such as people or departments or tables, can
be said to have a *mortal* mode of existence. Some sets are in fact eternal, namely
sets of eternal things, such as the sets of numbers. Thus the set {1,2,49786554313989} is
eternal, always existing.

Calling the expression {Dick,Jane} a "constructor" is another misnomer to the
extent that it implies that something new is being created. It's simply an expression that
refers to an existing thing, just as 1+1 refers to an existing number. We will instead
call such an expression a *designator*, to avoid the unintended connotation. (The
term has been adopted in SQL3 [reference].)

Sets have no ordering, so that the expressions {Dick,Jane} and {Jane,Dick} designate the same set, just as 5+1 and 1+5 designate the same number. Thus different designators may designate the same set.

This also arises from the absence of duplicates in sets. Since sets can't contain duplicates, evaluation of set designator expressions eliminates duplicates, so that the expressions {Dick,Jane} and {Dick,Jane,Dick} designate the same set. Thus, if x and y are variables, then {x,y} might designate a set with either one or two members, depending on whether x and y refer to the same thing. {x,y} might even designate the empty set if neither variable has a value. Thus you can't necessarily determine the size of a set from the form of a designator expression.

The usual operations of union, intersection, and difference of sets are well known. Card(s) gives the cardinality of a set s, i.e., the number of members it has. (We are not much concerned about infinite sets.) The predicate Occurs(x,s), often written x IN s, is true if and only if x is a member of the set s.

Equality of set-valued expressions, in the sense of being identical (one and the same), is defined as

s1=s2 <=> FORALL x, x IN s1=x IN s2.

The expressions refer to the same set if and only if the memberships are the same for all objects.

The set Bowlers is an *intensional* set, differing significantly from an
extensional set in the essential properties of identity, mutability, and existence.

Bowlers and Golfers are different teams, regardless of their membership.

If Tom becomes one of the Bowlers, the team is still the Bowlers, whose membership has changed to {Tom,Dick, Jane}. Tom might also become one of the Golfers, making its membership {Tom,Dick,Jane} as well, but Bowlers remains a different team from Golfers.

By analogy, Symbols and Procedures are different counters in the compiler, even if they happen to have the same value.

Thus the identity of an intensional set cannot be determined by its membership. Its identity is constant while its membership can change. In object systems, such identity is typically characterized by an "object identifier" (oid), which essentially traces back to a creation event [Section 2.3.3].

Intensional sets are *mutable*, via a replacement process. An intensional set
always has an associated extensional set (possibly empty), sometimes called the
"value" of the intensional set. We can think of the intensional set as a
carrier, having an extensional set as its cargo, being the value of a *Cargo*
function applied to the carrier.

An intensional set (i.e., a carrier) is mutated by replacing its cargo with a different cargo. It's crucial to distinguish the two distinct steps involved:

- Compute the new set. This does not alter the old cargo, nor does it yet affect the carrier.
- Make this result the new cargo of the carrier. It replaces the previous cargo, without altering the previous cargo. Now we consider the carrier to be mutated.

Thus, Tom becomes one of the Bowlers in these two steps:

- Compute the new cargo value: Cargo(Bowlers) UNION {Tom} = {Dick,Jane} UNION {Tom} = {Dick,Jane,Tom}. This does not alter the old cargo, which is the extensional set {Dick,Jane}, nor does it yet affect the carrier Bowlers.
- Make this result the cargo of the carrier: Cargo(Bowlers)<-{Dick,Jane,Tom}. It replaces the previous cargo, without altering the previous cargo. Now we consider the carrier to be mutated.

Analogously, incrementing the Symbols counter occurs in the same two steps:

- Compute the new value, e.g., Symbols+1 = 5+1 = 6.
- Assign that as the new value: Symbols<-6.

Separating the two steps clarifies the significance of constraints (including type
declarations). Constraints typically apply to carriers rather than directly to their
cargos. For example, suppose that teams such as Bowlers and Golfers may only have sets of
people as their cargos. It's meaningless to think of this as a *constraint* on the
set {Dick,Jane}; there's no point in constraining it - it *is* a set of people, and
it will always be one so long as Dick and Jane are people. The constraint is on the team,
restricting what sorts of cargos it may carry.

Thus, if we try to add Spot the dog to Bowlers, the first step is still valid: it is perfectly legal to compute the set {Dick,Jane} UNION {Spot}={Dick,Jane,Spot}. That is, the union operation is legal. It is only the second step which is illegal, namely trying to make this set the cargo of Bowlers.

Similarly, if counters are restricted to non-negative integers, that does not constrain the number 5; it simply is a non-negative integer. Subtracting 10 from the current value of Symbols is perfectly legal arithmetic, though it would not be legal to decrement the Symbols counter by 10. The subtraction Symbols-10 is legal, while the assignment is not.

We are essentially differentiating between operations on the cargo and mutation of the carrier. Operations which may have the look and feel of altering something in place should really be thought of as a replacement.If we replace the third digit of 1234 with a 4, we haven't altered the number 1234. We have replaced a mention of 1234 with a mention of 1244. Editing operations on a document can logically be thought of as designating new text strings to replace the old.

There is no automatic way that teams such as Bowlers or Golfers come into existence. They are mortal, having to be overtly created by some explicit operation such as New(Team). Such a creation event establishes their identity, typically captured in a "birthmark" which serves as the oid for such objects. Such mortal objects can also be overtly destroyed.

We'd often like to perform operations like union, intersection, difference, cardinality, and occurrence on things like Bowlers and Golfers. These operations are defined on sets, but Bowlers and Golfers are teams, not sets.

It is thus useful to treat Cargo as a casting function [Section 2.4.3], which allows operations defined on cargos to be applied to the carriers of such cargos. We'd like Bowlers UNION Golfers to be interpreted as Cargo(Bowlers) UNION Cargo(Golfers). This isn't true for all operations, equality being one of the notable exceptions. Bowlers=Golfers is very different from Cargo(Bowlers)=Cargo(Golfers). Casting is only invoked for operations which are not otherwise defined on the carriers.

Carriers often have properties of their own apart from the properties of their cargos. A team might belong to a league, and have an owner, manager, schedule, and record. Thus Bowlers and Golfers would be instances of the type Team, on which such properties are defined.

Another general class of operations defined on carriers but not on their cargos are the mutators [Section 3.1], which change the cargos of carriers. Thus, for example, the union operation Bowlers UNION {Harry} may yield a different set, but it does not alter the membership of Bowlers. An Insert operation would be a different thing, assigning the result of such a union to be the new cargo of Bowlers. The Insert operation would be defined as

Insert(x,c) ::= Cargo(c)<-Cargo(c) UNION {x},

incorporating the two steps described in Section 2.3.2.

Certain terminological matters need to be sorted out before we can progress further.

If you find this section confusing, ignore it. It only matters if you've seen the terms used otherwise.

The terms "extensional" and "intensional" are sometimes used in a different sense to describe the form of specification of a set. An extensional specification enumerates the members, while an intensional specification provides a membership rule. The two usages are orthogonal, since we can find examples of extensional and intensional sets specified either extensionally or intensionally.

Here's an extensional specification of an extensional set: {1,2,3,4,5}. The members are simply enumerated in the specification. The same set can be specified intensionally by the membership rule "the integers between one and five, inclusive". Though expressed in the form of a membership rule, it defines an extensional set determined by its members. In fact, it is the same as the set specified by the rule "the prime factors of 120, including one".

Conversely, the specification Bowlers={Dick,Jane} defines the membership of the intensional set by enumerating the current members. It could also be defined by a rule such as "Bowlers are all the employees of the Sales department".

The terminology of object technology is in great disarray, such that any choice of terminology is bound to alienate a sizable portion of readers. The best we can do is to make clear how terms are being used, hoping that the reader is willing and able to translate into his own favorite set of definitions.

The main issue with the word "object" itself is whether it denotes only mortal things [Section 2.2.3], or all things whether mortal, eternal, or ephemeral. We use it here in the latter broad sense, using the term "surrogate" for the narrower concept. We thus have the following taxonomy, with some alternatives shown in brackets:

- Object [abstract object, anything]
- Surrogate [object, mortal thing]
- Literal [value, eternal thing]

*Casting* is a general mechanism which replaces an invalid argument to an
operation with a related valid argument, selected by a *casting function*. For
example, the Factorial function is defined on integers. Without casting, applying it to a
rational, as in Factorial(2.5), would be a type violation. However, if Truncate is defined
to be the casting function from rationals to integers, then Factorial(2.5) would be
evaluated as Factorial(Truncate(2.5)), i.e., Factorial(2).

A casting function g is defined between a pair of types T1 and T2, with signature g:T1->T2. In effect it overloads the definitions of all functions which are defined on T2 but not on T1. If a function f:T2->T3 is not also defined on T1, then it implicitly acquires another definition on T1 such that f(x)=f(g(x)) if x is an instance of T1.

Casting may not be essential, since it can always be written explicitly, but it is a commonly provided convenience in many languages. It does help to simplify programming, and enhance readability of programs.

Languages place various sorts of constraints on casting functions. The following restrictions are adequate for our purposes:

- There is at most one casting function from a given type to a given type.
- There may be any number of casting functions from a given type.
- Casting is only permitted between disjoint types.

There are several mechanisms by which a casting function may be identified:

- Use a reserved name, such as
*Cast*. It can be overloaded with multiple definitions for different argument and result types. Some languages might not allow overloading with more than one definition for a given argument type, thereby disallowing such casting to more than one target type. - A casting function can be defined with its own user-specified name, with a keyword flagging it as a casting function. This is somewhat similar to the previous case, if we consider the keyword as providing a common alias for the functions.

For the purpose of this paper we will assume that *Cargo* is a reserved name for
a casting function which can be overloaded with definitions for different argument types.
We do not require multiple definitions for the same argument type; a carrier has just one
cargo. (That might be a useful extension, but we won't pursue it now.) An aliasing or
keyword capability is also desirable, since it is sometimes more natural for the cargo
function to have a name such as Contents, Value, Text, Location, etc.

A family of types such as "Set of Persons", "Set of Integers", etc., can be described as "parameterized types" of the form SET(Person), SET(Integer), etc. SET(Person) designates a type whose instances are all the sets having only people as members (its extension is the power set of the set of all persons).

SET itself is not a type, but a *type generator*. It is a function which maps a
type such as Person into another type, SET(Person), which is a *generated type*.
The term *parameterized type* should be taken precisely to mean a generated type,
not the generator itself.

Such generated types are themselves ephemeral objects. Whenever the type T exists, the type SET(T) is also an existing object, again recursively ad nauseam. Thus the type whose instances are sets whose members are sets of sets of integers exists, whether anyone cares or not. To be really precise, we should also avoid the term "generator", since it too implies that something new is being created. However, we will continue to use the terms "type generator" and "generated type" rather than the more precise "type designator" and "designated type".

Generated types, being ephemeral, are a distinct sub-category of types, distinguished from the more familiar sub-category of types which are explicitly created.

A *type template* is a description of a type generator. It is analogous to a
function body (procedure); when the parameters in the body are instantiated, the template
then serves as the definition of a specific generated type.

A *simple type generator* takes a single type as its only parameter. Such a
generator is a function mapping any type into a generated type. Tuple types [Section 4.2] provide an example of non-simple type
generators.

An interesting subtype relationship exists among simply generated types. Let's use T2 SUBSET T1 to denote subtyping. If T2 SUBSET T1, then G(T2) SUBSET G(T1) for any simple type generator G. Thus SET(Student) SUBSET SET(Person) - a set of students is also a set of persons.

A simple type generator G also has a *maximal* generated type G(Object) which is
a supertype of any other type G(T) generated by G. Thus although SET itself is not a type,
SET(Object) serves the same purpose, being the type whose instances are all the existing
sets.

There is also a parallel notion of sub-generators: if G2 SUBSET G1, then G2(T) SUBSET G1(T) for any type T. If we wanted Set to be a subtype of Multiset, we might define SET as a sub-generator of MULTISET. Then SET(Person) would be a subtype of MULTISET(Person).

We could say that ExtensionalSet is a single type, having all the extensional sets shown in earlier examples among its instances. However, since types are primarily used to constrain the permissible values of variables and functions, it's useful to have more specific set types, such as "Set of Persons" or "Set of Integers". These are conveniently described in terms of generated types, as just discussed.

Let SET be the type generator for extensional set types. (The unqualified term "set" will always mean an extensional set.) It is a function which takes a type as a parameter and returns the corresponding extensional set type. Thus the two types we just mentioned would be designated as SET(Person) and SET(Integer). The type whose instances include all extensional sets is given by SET(Object).

If Dick and Jane are students, then {Dick,Jane} is an instance of both SET(Student) and SET(Person). If an object can be an instance of several types which are not related as subtypes, then a set can be an instance of set types which are not related as subtypes. Thus if Dick and Jane are also teachers, then {Dick,Jane} is also an instance of SET(Teacher), which is neither a subtype nor supertype of SET(Student). In general, a set is an instance of SET(T) whenever all its members are instances of T.

If objects can change types, then sets containing those objects may also change types. If Dick alone becomes an Employee, then {Dick,Jane} is not yet an instance of SET(Employee) - but it will be as soon as Jane also becomes an Employee. This membership ceases as soon as either person stops being an Employee. If s1 and s2 are instances of SET(T1) and SET(T2), respectively, then the result of s1 UNION s2 is an instance of SET(Ti), where Ti is any type having both T1 and T2 as subtypes, i.e., any common supertype of T1 and T2. Such a common supertype is not necessarily unique. This is illustrated by the following example, under which the union of a set of interns with a set of trainees yields something which is both a set of students and a set of teachers.

.......... : Person : .......... /\ / \ / \ / \ ........... ........... : Student : : Teacher : ........... ........... | \ / | | \ / | | \ / | | \/ | | /\ | | / \ | | / \ | | / \ | ........... ........... : Intern : : Trainee : ........... ...........

We'll deal with intensional set types in Section 3.3, after a general introduction to carriers and cargos.

We have defined *Cargo* to be a function which serves as a casting function. It
can be redefined on various argument and result types. A *specific* Cargo function
is a definition of Cargo on a particular argument type T, which will be denoted as
T.Cargo.

A *carrier type* is a type which is the argument type of a Cargo definition, and
a *carrier* is an instance of a carrier type. The *cargo* of a carrier c is
the value of Cargo(c).

The following restrictions are useful:

- Argument types of Cargo definitions should be disjoint, so that an object is an instance of at most one carrier type, and at most one specific Cargo function applies to any object.
- Argument and result types for any specific Cargo function should be disjoint.

Although it might be commonly assumed that casting functions are not updatable, the Cargo function is updatable. Mutation of carriers is defined as update of the Cargo function. The basic form is direct assignment:

Cargo(x)<-y,

as in

Cargo(Bowlers)<-{Dick,Jane},

Cargo(Symbols)<-5.

Many operations which modify the current cargo can be described in the general form:

Mutate(x,op,y) ::= Cargo(x)<-op(Cargo(x),y).

For example, a counter might be incremented by

Increment(Symbols,2) ::= Mutate(Symbols,Plus,2),

i.e.,

Cargo(Symbols)<-Plus(Cargo(Symbols),2).

Someone might be added to a team by

Insert(Bowlers,Tom) ::= Mutate(Bowlers,Append,Tom),

i.e.,

Cargo(Bowlers)<-Append(Cargo(Bowlers),Tom),

where

Append(x,y) ::= x UNION {y}.

It might also be useful to consider a form of *reverse casting* on assignment.
If there is a Cargo definition such that x is an instance of the argument type (i.e., the
carrier type) and y is an instance of the result type, then

x<-y

can be interpreted as

Cargo(x)<-y.

Then

Bowlers<-{Dick,Jane}

would be interpreted as

Cargo(Bowlers)<-{Dick,Jane}.

Constraints on mutation are defined by the applicable definition of the Cargo function. For the types Team and Counter, the Cargo function would be defined with signatures

Cargo:Team->SET(Person),

Cargo:Counter->Integer.

Thus any mutation of Bowlers or Golfers must leave their cargo as a set of persons, while any mutation of Symbols or Procedures must leave their cargo as an integer.

The type of cargo which a carrier can carry is the result type of the specific Cargo function which is applicable to that carrier. All instances of this carrier type have the same cargo type.

It is sometimes necessary to further constraint the cargo type for individual instances of a carrier type. For example, one team might have to be a team of doctors, another a team of pilots, etc. (This becomes relevant for relational tables [Section 4.7].) This essentially requires a CargoType function with signature

CargoType:T->Type.

Mutation of carriers is now further constrained to satisfy

Cargo(x) IN CargoType(x).

Given a carrier type T1 with the two functions

Cargo:T1->T2,

CargoType:T1->Type,

let CargoType(x)=T3 for some instance x of T1.

Cargo(x) must now be an instance of both T2 and T3, which only makes sense of they are non-disjoint. In general, it seems reasonable to require that T3 be a subtype of T2. In effect, this makes the cargo type of a carrier a subset of the cargo type defined for the carrier type. Thus we might have Medics as a team, with

CargoType(Medics)=SET(Doctor).

Type checking for carrier mutation now has to go beyond signature-based constraints implied by the Cargo function, and enforce the specific CargoType constraint as well.

One plausible approach to all this is to define Carrier as a type, with the following functions defined on it:

Cargo:Carrier->Object,

CargoType:Carrier->Type.

The Cargo function could be redefined on any subtype of Carrier. If Cargo is redefined on a subtype T of Carrier, then the following rules apply for any instance x of T:

- The initial (default) value of CargoType(x) is the result type of T.Cargo.
- CargoType(x) must be a subtype of the result type of T.Cargo.

Any subtype of Carrier could be considered a carrier type.

Intensional sets are objects which are carriers of extensional sets, i.e., whose CargoType is a subtype of SET(Object). There does not seem to be a need to separately define a generator for intensional set types.

Bowlers and Golfers are always different intensional sets, being distinct instances of the type Team. Thus Bowlers=Golfers is never true. Their cargos may sometimes be the same, so that Cargo(Bowlers)=Cargo(Golfers) is sometimes true.

Most applications of the carrier/cargo concept take the following form:

- Define a Cargo function on a type T, making it a carrier type.
- Define additional operations on the type T.

The concepts extend trivially to multisets and lists. Details are not too important for the present paper. MULTISET and LIST can be defined as generators. MULTISET may or may not be a sub-generator of SET. There could be an abstract (non-instantiable) COLLECTION type which is the supertype of all the SET, MULTISET, and LIST types. Designator expressions are evaluated differently. Evaluating the multiset designator [Dick, Jane, Dick] does not eliminate duplicates, while evaluating the set designator {Dick, Jane, Dick} does. Similarly, the list designators <Dick,Jane> and <Jane,Dick> evaluate to different lists, while {Dick,Jane} and {Jane,Dick} evaluate to the same set.

The carrier/cargo paradigm also applies obviously to other aggregates such as arrays and vectors.

Tuple and record types will be useful later in the discussion of relational tables [Section 4.7].

A tuple type is defined by a sequence (list) of one or more types. There are two main variants on the tuple concept:

- The tuple type is simply defined by a sequence of types. The elements can be accessed by
numerical index, just like a list. Tuples of the same length are equal if corresponding
elements are equal. Column names in a table correspond to ordinal positions within its
tuples.
Within this variant there are two ways to relate tuples and lists:

- They are always different things. A list of two people is not an instance of TUPLE(Person,Person).
- Lists can be instances of tuples. A list of two people is an instance of TUPLE(Person,Person). More generally, a list of n elements is an instance of TUPLE(T1,...,Tn) if the i-th element of the list is an instance of Ti for each i.

- The tuple type is defined by a sequence of name:type pairs, where the name serves as an "attribute name" or "field name". The generator then takes the form TUPLE(a1:T1,...,an:Tn).This variant is sometimes also called a "record type". Tuple equality depends on matching attribute names as well as element values. The column names of a table are the attribute names of its tuples.

The TUPLE type generator is not a simple generator [Section 2.4.4], since it takes a list of types rather than a single type as its parameter. It has no corresponding maximal type. The type which contains all tuples has to be defined separately as the supertype of TUPLE(Object), TUPLE(Object,Object), TUPLE(Object, Object,Object),....

As a notational convenience, we allow G() to designate the union of all types generated by the generator G, allowing us to use TUPLE() to denote the type whose instances are all tuples.

A point in space is not the same thing as a point that can move in space. Points in a
geometric space are much like literals. Once the space has been defined, all the points
exist. They can't be created and destroyed, and they have immutable properties. Thus, we
might define GeometricPoint as a two-dimensional Euclidean space. The points in the space
all exist, having rectangular and polar coordinates as immutable properties. Such points
might be designated via either sort of coordinates, e.g., RectPoint(10,20) or
PolarPoint(10,30). Immutability of properties means that it makes no sense to move these
points; their coordinates can't be updated. The identity of a geometric point is
inseparable from its location; its location *is* its identity.

The definition of such a type might take the form

CREATE LITERAL TYPE GeometricPoint p

FUNCTIONS

Xcoord->Real;

Ycoord->Real;

Magnitude->Real;

Angle->Real;

CONSTRAINTS

Magnitude(p)=Sqrt(Xcoord(p)² + Ycoord(p)²);

Ycoord(p)=Xcoord(p) * Tan(Angle(p));

DESIGNATORS

RectPoint(x Real, y Real): Xcoord(p)=x, YCoord(p)=y;

PolarPoint(m Real, a Real): Magnitude(p)=m, Angle(p)=a;

EQUAL(RectPoint(x1,y1),RectPoint(x2,y2)): x1=x2, y1=y2;

EQUAL(PolarPoint(m1,a1),PolarPoint(m2,a2)): m1=m2, a1=a2;

EQUAL(RectPoint(x,y),PolarPoint(m,a)): m=Sqrt(x²+y²), y=x*Tan(a);

On the other hand, suppose we want to talk about a line between PointA and PointB, such that the line can be moved by moving these end points. These points are a different sort of thing, since they have to be explicitly created and destroyed, and they can be moved (i.e., their coordinates can change). Their identities are independent of their locations. They can change location without changing identity, and two of them can be in the same place without being the same point.

For these we need a different type, MovablePoint, which is a carrier type having GeometricPoint as its cargo:

CREATE TYPE MovablePoint

FUNCTIONS

Cargo->GeometricPoint;

Color->Colors;

OccursOn->SET(Line);

Location ALIAS Cargo;

PointA, an instance of MovablePoint, has an instance of GeometricPoint as its cargo. A property such as Xcoord(PointA) is defined by cargo casting, being evaluated as Xcoord(Cargo(PointA)); aliasing makes this equivalent to Xcoord(Location(PointA)). Mutators could be defined on MovablePoint such that, for example, Xcoord(PointA)<-25 means

Cargo(PointA)<-RectPoint(25,Ycoord(PointA)).

The same reasoning extends to other geometric constructs. A circle is not the same thing as a circular object. Circles are abstractions like literals, in the sense that all circles exist, and there is only one circle with a one-inch diameter. The identity of a circle is inseparable from its diameter (and its radius and circumference as well).

Circular objects, such as circular figures in a drawing, have to be created and destroyed, and many of them can have a diameter of one inch. Circular objects may change size without changing identity. Circles don't have a location, but circular figures do. Thus circular figures can be defined as a carrier type, having circles as their cargos.

A river is not the same thing as the water in it. A document is not the same thing as the text in it. A file is not the same thing as the bit-string or byte-string it contains. A container is not the same as its contents. All of these can be modeled as carriers and cargos.

Properties of document objects include such things as authors and publication schedules, which are not properties of character strings or structures of character strings. Operations such as concatenation, or finding substrings, are defined on character strings, not documents. Such operations can be performed on documents if they are defined as carriers whose cargos are character strings or structures of character strings (e.g., chapters). Documents are edited by effectively replacing one cargo with another.

Properties of files include such things as owners, creation dates, privileges, positions in directories, etc. Operations on bit strings are much the same as operations on character strings. Files can thus be modeled as carriers of bit string cargos.

A photograph is not the same thing as the image in it. Different photos may have the same image, and the image in a photo can be changed by retouching or other forms of enhancement. A photograph is thus a carrier having an image as its cargo.

A general characteristic of containers (things with content) is that many operations on the container objects are intended to be performed on their contents. In general, these cargos need not be literals.

Large objects [4,1] may also fit the paradigm of carriers and cargos. The distinction between a very long string and its "handle" is visible to the user to the extent that he can control which is to be moved or copied or otherwise operated on. This parallels the carrier/cargo distinction. At the same time, operations defined on strings are to be applied to their handles, with the intent of having them performed on the corresponding strings. This parallels the casting semantics of the Cargo function.

Different sorts of large objects (e.g., bit, byte, and character strings) can be distinguished by different result types on specific Cargo functions, or by different values of CargoType.

Some fine tuning may be needed. It may be desirable for equality tests on carriers to propagate to their cargos. Also, in order to behave like constants, the Cargo property should not be updatable once it has been initialized.

An event can be a carrier having a time point as its cargo. The time of occurrence can be an updatable property, at least for events scheduled in the future. Operations on events which should be "inherited" from time points include relative ordering of events (did the stockholder meeting occur before the election?), finding the time interval between two events (how long before?) and specifying some period of time before or after an event (three weeks before the election). Such operations are really defined on time points, not events. When applied to events, they could be interpreted in terms of their cargo time points.

For tables, the signature of the Cargo function defines the form of relational model, while the individual cargo types describe the row types for individual tables.

In its basic form, a relational table is a named, created object with an associated varying set of tuples of atomic elements, very much like a team whose members happen to be tuples rather than people. Thus a table is a carrier whose cargo type is SET(TUPLE(AtomicType,...,AtomicType)). This form of relational table is described by Cargo function signatures of the form:

Cargo: Table->SET(TUPLE(AtomicType,...,AtomicType)).

Variants of the relational model can be characterized by different sorts of cargo functions. Duplicates are admitted by using the MULTISET type generator in place of SET. Non-First Normal Form relations are admitted by allowing arbitrary tuples [Section 4.2], i.e.,

Cargo: Table->MULTISET(TUPLE()).

Ordered tables are admitted by allowing arbitrary collections. If the COLLECTION generator has SET, MULTISET, and LIST as sub-generators [Section 2.4.4], then the signature takes the form

Cargo: Table->COLLECTION(TUPLE()).

To go even further and allow a table to be a collection of arbitrary objects, the signature can be generalized to

Cargo: Table->COLLECTION(Object).

Generalized relations of this form have been considered for SQL [2,3].

The form of a particular table is given by the CargoType function. A Scores table might be created as an instance of Table with its CargoType initialized as

CargoType(Scores)<-SET(TUPLE(Person,Integer)).

In general, CargoType for a table has the form COLLECTION(RT), where RT is the *row
type* of the table.

Overall, this approach differentiates between a table object and the collection it carries as its cargo. This allows queries to be described more precisely as closed operations on collections rather than on tables. The semantics are the same. When a query is applied to a table, it is naturally cast into a query on the associated collection. Queries on multiple tables can be defined as operating on the cross-products of their associated collections.

Letting the results of queries be collections rather than tables simplifies certain existence, identity, and typing issues. If results of queries are table objects, then executing the same query twice necessarily yields unequal results. Treating the results as extensional collections allows them to be equal, in the sense that {Tom} UNION {Dick} and {Tom} UNION {Dick} yield the same result.

Collections are instances of generated types, which are ephemeral and hence don't need explicit creation [Section 2.4.4]. Hence result types do not have to be created for the results of arbitrary queries. Result types can be determined from the form of the SELECT clause. For example, the result of a query to SELECT x.Name, x.Age FROM... is an instance of MULTISET(TUPLE(Character,Integer)). This type always exists, so long as the types Character and Integer exist, and does not have to be fabricated for any particular query.

If tables are desired as query results, then the collection yielded by a query can be assigned as the cargo of a table object, so long as it satisfies the CargoType constraint of the table.

As with carriers in general, tables have other properties in addition to the properties of their cargos, such as names, the schemas they belong to, and authorization constraints. Depending on how tuples are defined [Section 4.2], the list of column names could also be a property of the table.

Even a variable is not quite the same thing as its value. A carrier and its cargo bear some resemblance to a variable and its value, with the cargo type of the carrier being analogous to the declared type of the variable. The Cargo function is somewhat analogous to an expression evaluator - for a carrier c, f(c) sometimes behaves like f(Cargo(c)), while for a variable x, f(x) behaves like f(Eval(x)). In other words, a counter such as Symbols sometimes behaves like an integer-valued variable.

However, carriers and cargos do not really provide a good model for variables and values. There are significant differences. Carriers and variables are created in different ways - carriers are created explicitly as persistent objects, while variables are implicitly created via declarations during a particular execution of a procedure. Carriers have single cargos, while the value of a variable may depend on which procedural environment it occurs in. A reference (not an assignment) to a variable is always replaced by its value, while a reference to a carrier is only replaced by its cargo when carrier casting applies, or when Cargo is explicitly invoked. Finally, it is not possible to define additional properties for a variable, apart from the properties of its values.

Carriers and cargos provide a general mechanism for modeling many sorts of things which come in natural pairs, usually having an intensional/extensional or container/content flavor.

The Cargo and CargoType functions are defined on all carriers, imposing the constraint

Cargo(x) IN CargoType(x).

A carrier type is the argument type of a specific Cargo function. The default CargoType for a carrier is the result type of the specific Cargo function defined on its carrier type. The CargoType can be explicitly specified for individual carriers, so long as it remains a subtype of the default.

The essential characteristics of carriers and cargos:

- They are distinct objects, each having properties of their own.
- The Cargo function serves as a casting function, so that a carrier can be replaced by its cargo in operations which are defined on the cargo but not on the carrier. Thus in many respects a carrier behaves as though it was its cargo.

This approach seems useful for such things as intensional sets and other collections, documents, files, geometric and spatial constructs, relational tables, and many others.

This approach is closely related to identity vs. equality in some object models, with identity referring to objects and equality referring to their "values" [6]. Such values are in fact the cargos for objects which are their carriers. The difference in the present approach is to recognize such objects and their values as being distinct things, with an explicit mapping between them.

Although it seems quite promising, this approach still needs to be validated by prototyping in a real language. Areas needing further investigation include:

- Interaction with other forms of inheritance and casting.
- Type checking mechanisms for enforcing CargoType constraints.
- Complex cargos. A movie or video might contain several concurrent frame sequences and sound tracks, with synchronization controls.

- ANSI X3H2-93-359R/ISO DBL MUN-003,
*(ISO-ANSI Working Draft) Database Language SQL (SQL3)*, Jim Melton (ed.), August 1993. - ANSI X3H2-93-231, "Tables of ADTs", May 1, 1993, by Phil Shaw.
- ANSI X3H2-93-302, "Removal of table of ADT", March 8, 1993.
- ANSI X3H2-93-341r1, "Large Object Strings", July 11, 1993, by Paul Cotton, Toby Lehman, Nelson Mattos and Frank Pellow.
- ANSI X3H2-93-425, "Another Approach to BLOBs", Sept. 10, 1993, by Jim Melton.
- Setrag Khoshafian and George Copeland, "Object Identity", Proc. Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Portland, Oregon, 1986.