William Kent, "The Evolving Role of Database in Object Systems", British National Conference on Databases, York, England, July 1990. Also HPL-90-04, Hewlett-Packard Laboratories, Feb. 1990. [6 pp]

The Evolving Role of Database in Object Systems

William Kent

> ABSTRACT
> 1 INTRODUCTION
> 2 BEHAVIORAL AND STRUCTURAL OBJECT ORIENTATION
>> 2.1 Two Levels of Interface
>> 2.2 Corollaries For Data Analysis And Design
> 3 TRANSIENT AND PERSISTENT OBJECTS
> 4 CONCLUSION
> 5 ACKNOWLEDGMENTS
> 6 REFERENCES

ABSTRACT

In the context of object systems, database as we know it today may not survive as an identifiable entity. The distinction between behavioral and structural object orientation will diffuse the database interface, while the management of transient and persistent objects will blur the distinction between application run-time environments and database environments.

1 INTRODUCTION

The nature of database in object systems will be profoundly influenced by two factors: the distinction between behavioral and structural object orientation, and the relationship between transient and persistent objects. These factors have opposing effects. Behavioral and structural object orientation will divide a single interface into two, while the management of transient and persistent objects will unify two interfaces into one.

Other factors will also be influential, and predictions are risky. But a plausible extrapolation suggests that database as we know it today may not survive as an identifiable entity distinct from other system components.

2 BEHAVIORAL AND STRUCTURAL OBJECT ORIENTATION

2.1 Two Levels of Interface

Today's applications use database operations to manipulate data structures. There is a single interface between user programs and the system code managing the database (Figure 1). The semantics of the operations at this interface are defined in terms of the system-supplied data structures.

                                  |
                   application--->|structure
                                  |
                    user programs | database
                                  |

                              Figure 1.

Object-oriented database introduces a new middle ground [Ke] (Figure 2). Object-oriented applications no longer directly manipulate data structures. In keeping with the object-oriented principle of abstraction, applications apply operations (send messages) to objects without knowing the implementation of such objects.

                           B          S
                           |          |
            application--->|method--->|structure
                           |          |
                           |          |
                           | ......object base........
                           |          |
           ......user programs....... |
                                      |

                            Figure 2.

Applications use operations defined to be meaningful in the application domain. Working on documents, applications might use operations that return the next line after a given line, or insert a paragraph or a diagram between two other paragraphs. Other operations might identify or retrieve the authors of the document, or define the contract between the publisher and authors, or manage the records of royalty payments. The applications would not manipulate arrays or lists or pointers or tables. The structure of a document, as well as its other related information, could be implemented in various ways using such data constructs, which should be of no concern to the applications.

The operations are implemented by methods, which are programs in the object base that access the underlying data structures. An operation may have various implementations in terms of different method programs and different data structures, so long as they all provide the same behavior to the application.
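As a minimal sketch of this point, the following Python fragment defines one document operation set and two method implementations over different data structures. All names here (`Document`, `insert_paragraph`, `ListDocument`, `LinkedDocument`, `draft`) are hypothetical illustrations and do not come from any particular object base; indexes are assumed to be non-negative.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Document(ABC):
    """Operations meaningful in the application domain (hypothetical names)."""

    @abstractmethod
    def insert_paragraph(self, index: int, text: str) -> None:
        """Insert text at position `index` (0-based; past the end appends)."""

    @abstractmethod
    def paragraphs(self) -> List[str]:
        """Return the paragraphs in document order."""


class ListDocument(Document):
    """Methods implemented over an array-like Python list."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def insert_paragraph(self, index: int, text: str) -> None:
        self._items.insert(index, text)

    def paragraphs(self) -> List[str]:
        return list(self._items)


class _Node:
    def __init__(self, text: str, next_node: "Optional[_Node]") -> None:
        self.text = text
        self.next = next_node


class LinkedDocument(Document):
    """The same operations implemented over a chain of pointers."""

    def __init__(self) -> None:
        self._head: Optional[_Node] = None

    def insert_paragraph(self, index: int, text: str) -> None:
        if index == 0 or self._head is None:
            self._head = _Node(text, self._head)
            return
        node = self._head
        # Walk to the node that will precede the new paragraph,
        # stopping at the tail if index is past the end.
        for _ in range(index - 1):
            if node.next is None:
                break
            node = node.next
        node.next = _Node(text, node.next)

    def paragraphs(self) -> List[str]:
        out, node = [], self._head
        while node is not None:
            out.append(node.text)
            node = node.next
        return out


def draft(doc: Document) -> List[str]:
    """Application code: it sees only operations, never the structure."""
    doc.insert_paragraph(0, "Conclusion")
    doc.insert_paragraph(0, "Introduction")
    doc.insert_paragraph(1, "Body")
    return doc.paragraphs()
```

Calling `draft(ListDocument())` and `draft(LinkedDocument())` yields the same paragraph order, which is the sense in which both implementations provide the same behavior to the application.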

The boundary between application and object base no longer coincides with the boundary between user programs and data structure. User programs exist both as applications and as methods in the object base. The object base includes both method programs and data structures. Applications do not manipulate data structures, but invoke operations implemented by methods which manipulate data structures.

In a certain sense, the whole notion of data structure is an illusion supported by a hierarchy of levels of abstraction. The application is using operations implemented by methods designed to provide the semantics of documents. Those methods in turn look like applications using operations implemented in, say, a relational data manager to provide the semantics of relations. The relational data manager in turn looks like an application using operations implemented in a file system and an operating system, to provide the semantics of files and buffers. This layering of abstraction illusions cascades down until we get to physical behaviors that support the semantics of binary digits implemented in magnetic and atomic phenomena.

Where is database in this picture? It depends on what you mean by database. Database is suffering an identity crisis at dual interfaces.

In its traditional role as a manager of structured data, the database exists to the right of the interface we have labeled "S" for "structure". A database is "structurally object oriented" [Di] if this interface supports complex data structures that physically mirror the structural composition of objects. In a structurally object-oriented database, a document might be represented as a data structure including all of its text and diagrams as well as lists of authors, contracts, royalty records, etc. Operations to manipulate such data structures are provided by the database management system.

However, in its other traditional role of providing the interface by which applications manipulate data, the database exists to the right of the interface labeled "B" for "behavior". A database is "behaviorally object oriented" [Di] if this interface supports operations defined to be meaningful in the application domain.

The behavior of complex objects can be described without reference to the implementing data structures, in terms of propagation of operations. For example, a diagram behaves as though it were contained in a document if the diagram is displayed, copied, or destroyed when the document is displayed, copied, or destroyed.
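The propagation idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any real system: the classes `Document` and `Diagram`, and the distinction between `contain` and `reference`, are hypothetical names chosen for the example. Containment is expressed purely as "copy and destroy propagate", with no commitment to how the document is stored.

```python
class Diagram:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def destroy(self):
        self.alive = False


class Document:
    def __init__(self, title):
        self.title = title
        self.alive = True
        self._contained = []   # copy and destroy propagate to these
        self._referenced = []  # merely referenced: nothing propagates

    def contain(self, diagram):
        self._contained.append(diagram)

    def reference(self, diagram):
        self._referenced.append(diagram)

    def contained(self):
        return list(self._contained)

    def referenced(self):
        return list(self._referenced)

    def copy(self):
        clone = Document(self.title)
        # Copy propagates: each contained diagram gets its own copy.
        for d in self._contained:
            clone.contain(Diagram(d.name))
        # References are shared, not copied.
        for d in self._referenced:
            clone.reference(d)
        return clone

    def destroy(self):
        self.alive = False
        # Destroy propagates to contained diagrams only.
        for d in self._contained:
            d.destroy()
```

In this sketch a diagram added with `contain` is destroyed along with its document, while one added with `reference` survives; that observable difference is the whole definition of containment here.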

At this level, from the viewpoint of the application, database is evolving from its traditional role of supporting structure to a new role of supporting behavior. Structure is hidden from the application by the methods implementing the behavior.

Structural and behavioral object orientation are orthogonal. Either can be provided without the other. Structural object orientation without behavioral retains the configuration of Figure 1, the difference lying in the complexity of data structures available to applications. More complex data structures allow applications to model more complex structures of objects. Applications still manipulate data structures, and are still sensitive to differences or changes in the structures used to implement the applications. In this configuration, the database management system itself constitutes the methods implementing the operations used by the applications.

Behavioral object orientation without structural allows applications to be expressed in terms of operators which are semantically meaningful to the application domain, without benefit of complex data structures that might improve the efficiency of the applications. Applications can be designed in object-oriented fashion, yet implemented with methods that map to conventional data storage such as files or relational databases. Such applications are relatively data independent and robust, enjoying most of the benefits of object orientation: ease of development, extensibility, maintainability, adaptability, interoperability, etc. They can be migrated in object-oriented fashion to more efficient implementations, including various forms of structural object orientation as they become available, simply by altering the methods that implement the operators.
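A rough sketch of behavioral object orientation over conventional storage, using Python's standard `sqlite3` module as a stand-in for a relational database. The table layout, the class `RelationalDocument`, the helper `open_store`, and the operation names are all assumptions made for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory relational store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (doc INTEGER, pos INTEGER, text TEXT)")
    conn.execute("CREATE TABLE authors (doc INTEGER, name TEXT)")
    return conn


class RelationalDocument:
    """Document operations whose methods map onto flat relational tables."""

    def __init__(self, conn, doc_id):
        self._conn = conn
        self._id = doc_id

    def append_line(self, text):
        """Add a line at the end; return its position as a line handle."""
        row = self._conn.execute(
            "SELECT COALESCE(MAX(pos), 0) FROM lines WHERE doc = ?",
            (self._id,),
        ).fetchone()
        pos = row[0] + 1
        self._conn.execute(
            "INSERT INTO lines (doc, pos, text) VALUES (?, ?, ?)",
            (self._id, pos, text),
        )
        return pos

    def next_line(self, pos):
        """Return the text of the line after handle `pos`, or None."""
        row = self._conn.execute(
            "SELECT text FROM lines WHERE doc = ? AND pos > ? "
            "ORDER BY pos LIMIT 1",
            (self._id, pos),
        ).fetchone()
        return row[0] if row else None

    def add_author(self, name):
        self._conn.execute(
            "INSERT INTO authors (doc, name) VALUES (?, ?)", (self._id, name)
        )

    def authors(self):
        rows = self._conn.execute(
            "SELECT name FROM authors WHERE doc = ? ORDER BY name",
            (self._id,),
        ).fetchall()
        return [name for (name,) in rows]
```

An application calling `next_line` or `authors` never sees a table or a query. Moving to a richer storage structure would mean rewriting these method bodies while leaving the callers unchanged.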

2.2 Corollaries For Data Analysis And Design

This realignment of interfaces implies corollary realignments in the roles of data analysis and design.

Data analysis produces a conceptual schema formally capturing the semantics of the data requirements, generally expressed in terms of entities and relationships in current methodologies. The design phase then transforms the conceptual schema into a data model specifying the data structures in the database which will be used by applications. There is currently a "semantic gap" between the concepts and structures in which the conceptual schema expresses requirements and the concepts and structures in which applications manage information (Figure 3). Data analysts and application developers speak different languages.

            analysis         design
      rqmts---------->conc------------->data
                      schema            model
                       :                  |
                       :   application--->|structure
                       :                  |
                       :    user programs | database
                       :                  |
                       :                  |
                       :<--semantic gap-->|

                          Figure 3.

The methodology will obviously adapt to object orientation by expanding its concepts. The notions of entities and relationships will be enriched to encompass such object-oriented features as subtypes, behavior, and polymorphism. Data design will encompass more complex data structures.

              analysis       design
        rqmts---------->conc-------->data
                        schema       model
                           |          |
            application--->|method--->|structure
                           |          |
                           |          |
                           | ......object base........
                           |          |
           .......user programs...... |
                                      |

                       Figure 4.

Perhaps less obvious is a shift in boundary alignments, the closing of the semantic gap (Figure 4). In effect, although the major phases of data analysis and design remain much the same, their "clients" will shift. The conceptual schema, now expressed in terms of objects, is directly usable as the interface to which applications are developed. The specifications produced by the design phase are for the benefit of method writers, not application developers.

3 TRANSIENT AND PERSISTENT OBJECTS

Object-oriented programming emerged as a way to enrich the data structures available to programs while making the programs less dependent on the implementation details. Object orientation was realized in terms of enhancements to run-time libraries, binding of variables, storage (heap) management, type checking, and similar programming facilities.

Typical objects were extensions of familiar programming constructs, such as lists, arrays, queues, and stacks. These objects were used in the course of some computation, vanishing when the program terminated. Then, for some applications, a need was realized for such objects to persist, to remain available for later executions of the same or different programs.

File systems provide the first approach to persistent objects. Objects are made persistent by writing them to files. When referenced, they are read back from the files into the run-time environment (Figure 5).

                         :
          application--->:transient run-time environment
                         :     |
                         :     |
                         :     V
                         --------------------------------
                           persistent environment

                                Figure 5.
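The file-based approach can be sketched with Python's standard `pickle` module. The helper names `save`, `load`, and `round_trip` are hypothetical; the temporary file merely stands in for a file that a later program execution would reopen.

```python
import os
import pickle
import tempfile


def save(obj, path):
    """Make an object persistent by writing it to a file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load(path):
    """Read a persistent object back into the run-time environment."""
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip(obj):
    """Write obj out and read it back, as a later execution would."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)
    try:
        save(obj, path)
        return load(path)
    finally:
        os.remove(path)
```

The object that comes back is an equal but distinct copy: the whole object moves between the file and program space, and the file itself offers nothing beyond longevity.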

Many influences are currently at work, and it's not entirely clear how to present them in a historical or logical sequence. It may not matter, since the various paths are likely to converge (evolution progresses along many paths), and debates over how we'll get there aren't that important.

Longevity isn't enough; persistent objects need other database-like services such as recovery, concurrent usage, and security. Database techniques for optimizing data access can sometimes eliminate the need to move entire objects into program space. Those objectives, as well as requirements to understand the scope of copy and delete operations, mean that the database must understand more of the semantics of objects, such as object properties, inter-object references, and sub-object containment.

Database management systems provide a persistent execution space as well as secondary storage. Executing methods in the persistent space of the database management system rather than in the application's transient run-time environment allows optimized data retrieval, and avoids the double movement of data from disk to database buffers to application space.
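To make the data-movement argument concrete, here is a small sketch using Python's standard `sqlite3` module. It is only an analogy: SQLite runs inside the application process, so the "boundary" is illustrative, and the `royalties` table and all function names are assumptions made for the example. Each function returns its result together with the number of rows handed across to application code.

```python
import sqlite3


def make_royalties(n):
    """Build a hypothetical royalties table with n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE royalties (author TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO royalties VALUES (?, ?)",
        [("kent" if i % 2 == 0 else "other", i) for i in range(n)],
    )
    return conn


def total_in_application(conn, author):
    """Move every row into application space, then filter and sum there."""
    rows = conn.execute("SELECT author, amount FROM royalties").fetchall()
    total = sum(amount for name, amount in rows if name == author)
    return total, len(rows)


def total_in_database(conn, author):
    """Let the data manager filter and sum; only the answer crosses over."""
    rows = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM royalties WHERE author = ?",
        (author,),
    ).fetchall()
    return rows[0][0], len(rows)
```

Both functions compute the same total, but the first transfers every row while the second transfers a single one, whatever the table size.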

As more and more of the semantics of objects move into the persistent space, object services appear redundantly and often inconsistently in both spaces. This includes such services as the definition, installation, and maintenance of types and classes; resolution of polymorphism; type checking; and storage management and garbage collection. Other facilities are also provided unevenly, such as complex queries and set-oriented processing.

These factors sometimes lead to a configuration which requires applications to be aware of such differences and interact separately with programming facilities and database facilities (Figure 6).

                         :
          application--->:transient run-time environment
               |         :     |
               |         :     |
               V         :     V
          -----------------------------------------------
                  persistent environment

                            Figure 6.

The existence of all these interfaces may not be recognized. There is often a feeling that the application and its run-time environment are intimately bound together, while the file system or the database are alien, on the other side of some fence. The boundary to the persistent environment (file system or database) is recognized as an interface, while the boundary between the application and its run-time environment might not be.

Trying to extrapolate logically into the future (always a risky business), it seems we ought to evolve to a unified object management architecture (Figure 7).

                         |
                         |    transient environment
                         |
          application--->|object mgr ................
                         |
                         |    persistent environment
                         |

                         Figure 7.

Here applications deal with a unified object management interface, behind which are managed transient and persistent objects in a consistent and integrated fashion. The boundary between applications and object management becomes more defined, while the distinction between transient space and persistent space may become less so.
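One way to picture such an interface is the toy sketch below, which assumes Python's standard `shelve` module as the persistent space and a plain dictionary as the transient space. The class `ObjectManager`, its `put` and `get` operations, and the `demo` function are hypothetical; a real object manager would also have to address types, queries, concurrency, and recovery.

```python
import os
import shelve
import tempfile
import uuid


class ObjectManager:
    """A single interface in front of transient and persistent spaces."""

    def __init__(self, path):
        self._transient = {}                  # vanishes with the process
        self._persistent = shelve.open(path)  # survives across executions

    def put(self, obj, persistent=False):
        """Store an object and return an identifier for it."""
        oid = uuid.uuid4().hex
        space = self._persistent if persistent else self._transient
        space[oid] = obj
        return oid

    def get(self, oid):
        """Fetch by identifier; the caller does not say which space."""
        if oid in self._transient:
            return self._transient[oid]
        return self._persistent[oid]  # raises KeyError if unknown

    def close(self):
        self._persistent.close()


def demo():
    """Simulate two program executions sharing one persistent space."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "objects")
        first = ObjectManager(path)
        scratch = first.put([1, 2, 3])
        report = first.put({"title": "Q3"}, persistent=True)
        first.close()

        later = ObjectManager(path)  # stands in for a later execution
        try:
            found = later.get(report)
            try:
                later.get(scratch)
                lost = False
            except KeyError:
                lost = True
        finally:
            later.close()
        return found, lost
```

The application uses the same `put` and `get` for both kinds of object. Persistence is a property chosen when the object is stored, not a separate interface the application must address.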

We can get there by either of two paths: extending persistent programming objects to provide database capability, or extending databases with object-oriented capability. The two forks will merge in a unified facility, and the debates over how we'll get there won't matter.

Database seems to have merged here with the run-time program environment, a merger likely to be reflected in a unified database programming language (DBPL). Thus, while the distinction between transient and persistent objects remains, other object services are provided in a coordinated and consistent fashion. As mentioned earlier, object services include such things as the definition, installation, and maintenance of types and classes; resolution of polymorphism; type checking; and storage management and garbage collection. A unified DBPL would also provide other services uniformly, such as complex queries and set-oriented processing.

Arriving at such a unified DBPL will require some innovation in program and operating system technology to coordinate such things as process and storage management, and to be able to bind program variables directly to persistent space (e.g., database buffers) at run time.

4 CONCLUSION

Combining the two lines of development, we arrive at the configuration of Figure 8.

                         behavioral  structural
                         -----------------------
                         |          |          |
                         |method--->|structure |transient
                         |          |          |
          application--->|------object mgr-----|
                         |          |          |
                         |method--->|structure |persistent
                         |          |          |
                         -----------------------

                     Figure 8.  Where's the Database?

Once again, where's the database in this picture? We've already seen its identity split between behavioral and structural, depending on whether you think of database as the application's interface to data or as the manager of stored data structures. Now it's in danger of losing its identity as something distinct from the management of transient run-time program data. There is so much in common in the management of transient and persistent objects that the concept of database may give way to the unified notion of object base, integrating the management of transient and persistent objects.

We've only considered some factors here. Other requirements seem to work against such a monolithic approach. Rather than a simple dichotomy of transient and persistent spaces, we may need a spectrum of varied capabilities. Truly process-local data may need minimal services, but query and set-processing capability might be desirable. File systems provide persistence with a minimal degree of other database services, though security might be wanted here. Persistence may come in a range of durations, reflecting the varied lifetimes of short and long or nested transactions. Private databases (e.g., locally checked out data) may not need concurrency or security control. Distributed systems and client/server architectures also suggest dispersal of capabilities. The management of application operators strongly overlaps the technology of network message management.

Thus the configuration of Figure 8 may be too simplistic, but it is a reasonable intermediate stage of extrapolation. Other investigations [Th] are exploring how the various facilities of object and database technology can be mixed and matched in different contexts.

5 ACKNOWLEDGMENTS

This work evolved in discussions with Kevin Wentzel of Hewlett-Packard and Mary Loomis of Object Sciences in the course of working on a reference model for the Object Management Group.

6 REFERENCES

[Di] K.R. Dittrich, "Object-Oriented Database Systems: The Notion and the Issues", Proc. 1986 IEEE International Workshop on Object-Oriented Database Systems, Asilomar, Pacific Grove, California, Sept. 23-26, 1986 (K.R. Dittrich and U. Dayal, eds.).

[Ke] William Kent, "Object-Oriented Database: New Roles and Boundaries", InfoDB 4(3) Fall 1989.

[Th] Craig Thompson et al., "Open Architecture for Object-Oriented Database Systems", Texas Instruments Information Technology Laboratory Technical Report 89-12-01, December 6, 1989.