William Kent, "User-Defined Behavior of Generic Functions", HPL-93-57, Hewlett-Packard Laboratories, July 1993. [8 pp]


User-Defined Behavior of Generic Functions

William Kent
June 1993

> 1 INTRODUCTION . . . 1
>> 1.1 Purpose . . . 1
>> 1.2 Overview . . . 2
> 2 SPECIFIC FUNCTIONS . . . 2
> 3 GENERIC FUNCTIONS . . . 4
> 4 APPLICATION TO MULTIDATABASE SYSTEMS . . . 7
>> 4.1 Function Merging for Inconsistent Data . . . 7
>> 4.2 Merging Equivalent Objects . . . 9
> 5 CONCLUSIONS . . . 10
> 6 REFERENCES . . . 10


1 INTRODUCTION

1.1 Purpose

Overloaded functions (polymorphic operations) are a mainstay of object orientation. A "generic function" might be thought of as the equivalence class of functions having the same name. Little attention is paid to the generic functions themselves in most object-oriented languages. Even in languages such as OSQL [3], generic functions are implicitly defined and managed behind the scenes by the system.

Specific functions having a given name imply the existence of a generic function with that name. Allowing users to explicitly define behavior of generic functions to override the behavior implicitly defined by the system would permit:

This capability is being proposed as an extension to OSQL [3]. Besides their general utility, these features are particularly useful for multidatabase systems [Section 4].

1.2 Overview

A behavior may be specified for a generic function f with respect to a set of relevant types. Specifications may include a result type, a default result value, a disambiguation specification, and a uniqueness specification (these terms will be defined shortly). Different behaviors may be specified for different sets of relevant types. For example, they may have different result types; any function named f whose argument type is in a relevant type set must have the specified result type.

When f(x) is invoked:

Another extension proposed below would allow automatic resolution of overload ambiguity when the eligible functions are consistent, i.e., all return the same value.

2 SPECIFIC FUNCTIONS

A specific function has:

Inheritance behaves as follows:

A subtype T2 of T1 is an immediate subtype of T1 (and T1 is an immediate supertype of T2) if no supertype of T2 is a subtype of T1. An instance x of type T is an immediate instance of T (and T is an immediate type of x) if x is not an instance of any subtype of T.

In particular, both T1.f and T2.f are known for T4 in the following case:

                  ------
                  | T1 |
             T1.f ------
                    :
          .....................
          :                   :
       ------               ------
       | T2 |               | T3 |
  T2.f ------               ------
          :                   :
          .....................
                    :
                  ------
                  | T4 |
                  ------

Unless otherwise defined for the generic function, all specific functions having the same simple name must have the same result type. (OSQL may allow overloaded functions defined on literals to have different result types from functions with the same name defined on surrogates. That's not particularly significant here.)

A function call may involve a simple (generic) or specific function name. If specific, the call invokes the behavior defined for the specific function.

Unless otherwise defined for the generic function, a generic function call f(x) is processed as follows:

1 The eligible functions are the specific functions T.f which are known for the immediate types of x.

2 If there is exactly one eligible function, invoke it and exit.

3 If there are no eligible functions:

3a If strict type checking is in effect, then raise a type violation error and exit.

3b Else return null (or empty) together with a warning condition (if warnings are supported) and exit.

4 If there is more than one eligible function:

4a If all the non-null values are the same, then return that value (or null if all the values are null) and exit.

4b Else raise an ambiguity error and exit.

Item 3b (relaxed type checking) is supported in OSQL.

In current OSQL, f(x) is considered ambiguous when there are several eligible functions - even if they have the same value. Item 4a is a proposed extension allowing automatic disambiguation based on consistent values of the eligible functions.

3 GENERIC FUNCTIONS

A generic function has:

Behavior of a generic function may be defined with the following statement (the syntax is illustrative, in the style of OSQL [3]):

DEFINE GENERIC FUNCTION generic-name
[FOR type-list]
[RESULT_TYPE result-type]
[DEFAULT_VALUE [FOR var-1 IS] expr-1]
[DISAMBIGUATE [FOR var-1] USING expr-2 WITH {VALUE_BAG | FUNC_SET} var-2 ]
[UNIQUE];

We will use the function name f for illustration, with f(x) being a generic function call. (Note: we don't address the case where x is null.)

The statement may occur before any specific function named f is created, in order to provide control over result types.

The relevant types are those in the type-list and their subtypes. Types which become subtypes of these in the future will automatically be included. Different behaviors may be defined for different sets of relevant types for the same generic function f. A type may not be in more than one relevant type set for a given generic function. If type-list is omitted, then the relevant type set contains all types.

Specified behaviors are associated with a particular generic function and relevant type set.

If result_type is specified, then any specific function named f whose argument type is in the relevant set must have the specified result type. Specific functions named f may have different result types if their argument types are in different relevant type sets with associated result-type specifications. Specific functions named f whose argument types are not in any relevant type set having a result-type specification must all have the same result type; this is the default result type.

The DEFAULT_VALUE clause has the effect of making a function f defined for all the relevant types. If the types of x are in exactly one relevant type set for f having a DEFAULT_VALUE specification, and there is no specific function T.f known on any type T of x, then expr-1 defines the value of f(x). If var-1 is provided, it is bound to the value of x, and it may be used in expr-1. In the following example, if T0 is in the type-list but there is no specific function T0.f, then expr-1 defines the value of f for immediate instances of T0. This can be considered a limited form of upward inheritance.

                  ------
                  | T0 |
                  ------
                    :
                  ------
                  | T1 |
             T1.f ------
                    :
          .....................
          :                   :
       ------               ------
       | T2 |               | T3 |
  T2.f ------               ------
          :                   :
          .....................
                    :
                  ------
                  | T4 |
                  ------

The DISAMBIGUATE clause specifies action to be taken when a generic function call f(x) is ambiguous, and the argument types of all the eligible functions are in the relevant type set. The value of expr-2 is returned; it must yield a value whose type is the specified or default result-type. The variables var-1 and var-2 may be used in expr-2. If var-1 is provided, it is bound to the value of x. If VALUE_BAG is specified, then var-2 is bound to the bag of non-null results of the evaluated eligible functions. If FUNC_SET is specified, then var-2 is bound to the set of unevaluated eligible functions, allowing expr-2 to reason over these functions and select the ones to be invoked.

The UNIQUE keyword signifies that the generic function must be unique-valued for all instances of all its relevant types. It means that Ti.f(x)=Tj.f(y) => x=y if Ti and Tj are relevant types. Thus, for example, if a generic function named SSN ("social security number") was defined to be unique over a set of types, then distinct instances of relevant types may not have the same value of SSN, even via different specific functions named SSN. In OSQL, without this extension, uniqueness can only be specified within an individual specific function.

Section 4 provides other examples.

The processing of a generic function call f(x) is now extended as follows:

1 The eligible functions are the specific functions T.f which are known for the immediate types of x.

2 If there is exactly one eligible function, then invoke it and exit.

3 If there are no eligible functions:

3a If the types of x are in exactly one relevant type set for generic function f which has a specified result value, then return that value and exit.

3b If strict type checking is in effect, then raise a type violation error and exit.

3c Else return null (or empty) together with a warning condition (if warnings are supported) and exit.

4 If there is more than one eligible function:

4a If the argument types of all the eligible functions are in exactly one relevant type set for generic function f which has a disambiguation specification, then use that disambiguation specification and exit.

4b If all the non-null values are the same, then return that value (or null if all the values are null) and exit.

4c Else raise an ambiguity error and exit.

As before, the non-underlined portions of items 3 and 4 characterize the default behavior of the generic function.

4 APPLICATION TO MULTIDATABASE SYSTEMS

Multidatabase systems such as Pegasus and Multibase [1, 2] include the ability to reconcile inconsistent data from different data sources, and also to treat things imported from different data sources as the same object. These things can be done with user-defined generic function behavior.

The general approach to integration is to import data from different sources as distinct imported types, and then to provide correlation mechanisms across the imported types.

4.1 Function Merging for Inconsistent Data

A typical situation here is that data from two sources may be inconsistent, e.g., Salary(x) in one source differs from Wages(x) in another. If we impose the not unreasonable requirement that "semantically equivalent" functions should have the same name, even by renaming if necessary, then the disambiguation behavior of generic functions can be used to correlate their values.

Suppose we have the following schema after renaming, where T1 and T2 are imported types:

                       ------
                       | T0 |
                       ------
                         :
               .....................
               :                   :
            ------               ------
            | T1 |               | T2 |
  T1.Salary ------               ------ T2.salary

Then

DEFINE GENERIC FUNCTION Salary
FOR T0
RESULT_TYPE Number
DEFAULT_VALUE Return(0)
DISAMBIGUATE USING Average(sal_bag) WITH VALUE_BAG sal_bag;

defines the generic function behavior for Salary(x) which would be applicable if x is an instance of T0 or any of its subtypes. Any Salary function defined on T0 or any of its subtypes must have result type Number. When Salary(x) is invoked:

If the behavior was defined as

DEFINE GENERIC FUNCTION Salary
FOR T0
RESULT_TYPE Number
DEFAULT_VALUE Return(0)
DISAMBIGUATE FOR x USING Best_sal(x,funcs) WITH FUNC_SET funcs;

then, if x belongs to both T1 and T2, the functions T1.Salary(x) and T2.Salary(x) will not be evaluated. The variable funcs will contain the set of unevaluated functions T1.Salary and T2.Salary, which in this case will be passed to the Best_sal function along with x. That function might, for example, consider T1 to always be more reliable, and only evaluate and return T1.Salary(x). It might also do more complex reasoning, such as dealing with null values.

If no disambiguation was defined, then the default behavior in case of ambiguity under the present proposal would be to return the consistent value if T1.Salary(x)=T2.Salary(x), and raise an ambiguity error otherwise.

This approach to correlating imported data has several benefits:

Accidental conflicts with other unrelated functions which might also be named "Salary" can be avoided by

4.2 Merging Equivalent Objects

Multidatabase systems provide the ability to treat objects imported from different external data sources as being the same object, based on some criterion such as social security number. The general mechanism is to provide equivalence specifications across the imported types [2]. Unique-valued generic functions can simplify their specification.

A simple equivalence specification might take the form

T1.SocSecNum(x)=T2.SSN(y) => x=y,

implying that x and y are to be treated as the same object if T1.SocSecNum(x)=T2.SSN(y). Generic functions can become involved if we again adopt the same-name assumption for semantically equivalent functions. If the functions are all named SSN, and the generic function named SSN is defined to be unique for T1 and T2, this would imply

T1.SSN(x)=T2.SSN(y) => x=y.

More generally, we would have

Ti.SSN(x)=Tj.SSN(y) => x=y

whenever Ti and Tj were relevant. Thus a single generic function behavior definition can imply equivalence specifications between many pairs of types.

The equivalence holds even when Ti and Tj are the same. Thus if two people in the same imported type somehow had the same social security number, they would be treated as the same person. (A more plausible example might arise when courses using the same textbook are to be treated as the same course.) Of course, the earlier treatment of function merging does not work for such intra-type equivalence. Such intra-type equivalence is beyond the scope of the present paper.

We need to distinguish between two responses to "violations" of uniqueness:

This can be distinguished on the basis of a "view" concept:

Equivalence based on conjunction of multiple properties, e.g., Nationality and PassportNumber, can be handled by defining another function on each of the imported types as the concatenation of Nationality and PassportNumber, e.g.,

Ident(x) ::= <Nationality(x),PassportNumber(x)>,

and then defining the generic function Ident to be unique.

Equivalence based on disjunction of multiple properties, e.g., social security number or passport number, can be handled by separately defining a unique-valued generic function for each. (Cycles of such equivalence specifications can also cause intra-type equivalence. A person in one type might have a social security number corresponding to one person in another type and a passport number corresponding to a different person in that type.)

As before, this approach is readily extensible. New imported types can be accommodated by appropriately naming the functions and making the types relevant to the generic type.

5 CONCLUSIONS

Allowing users to explicitly define behavior of generic functions permits:

This capability is being proposed as an extension to OSQL [3].

6 REFERENCES

  1. R. Ahmed, P. DeSmedt, W. Du, W. Kent, M. Ketabchi, W. Litwin, A. Rafii, M.-C. Shan, "The Pegasus Heterogeneous Multidatabase System", IEEE Computer, December 1991.
  2. Umeshwar Dayal and Hai-Yann Hwang, "View Definition and Generalization for Database Integration in a Multidatabase System", IEEE Trans. on Software Engineering, SE-10(6), Nov. 1984, pp. 628-645.
  3. Dan Fishman, et al, "Overview of the Iris DBMS",Object-Oriented Concepts, Databases, and Applications, Kim and Lochovsky, eds, Addison-Wesley, 1989.