A Taxonomy for Entity-Relationship Models

William Kent
1983 


> ABSTRACT
> 1 INTRODUCTION
> 2 THE QUESTIONS
> 3 OTHER QUESTIONS, OTHER MODELS
> 4 CONCLUSIONS
> 5 REFERENCES


ABSTRACT

There is a long list of characteristics by which the many variations of the entity-relationship model can be differentiated.

1 INTRODUCTION

It is inaccurate to speak of "the" entity-relationship (ER) model. There are many ER models. ER is an approach rather than a single model, as reflected in the names of the definitive conferences on the subject [ERA1-3].

One of the principal debates over the ER "model" is whether or not the attribute concept is necessary. Also at issue is the degree of relationships: are they binary or n-ary? These alternatives yield at least four legitimate ER models, having either binary or n-ary relationships, with or without an attribute concept. This was the basis of the framework for ER models proposed in [Chen].

There are, however, many more ways in which ER models differ from one another. We have developed a list of such characteristics, which we express as questions with multiple possible answers.

The list is preliminary, not definitive. It suggests a direction of work, in terms of completing and agreeing on such a list of questions and possible answers. In particular, we haven't dealt with the various extensions and enhancements that have been proposed in many ERA papers [ERA1-3].

Such a list of characteristics provides a basis for defining and classifying the variants of the ER model. It should be used as a checklist against any published work dealing with an ER model: does the work clearly define its variant of the model? Does it provide answers to all the questions in the list? Most (if not all) works involving the ER model are ill-defined in this regard. It might be interesting to see how well the questions can be answered with respect to each of the papers appearing in [ERA1-3]. Such an exercise might also uncover additional characteristics that should be listed.
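As a rough illustration of the checklist idea, the description of a model variant can be recorded as a mapping from question number to answer, and the open questions read off mechanically. This is a minimal sketch: the function name `unanswered`, the answer encoding, and the sample answers are all assumptions made for illustration and are not taken from any published model.

```python
# Hypothetical sketch: question numbers follow Section 2 of this paper;
# the recorded answers are invented for illustration only.
QUESTION_NUMBERS = range(1, 32)  # questions 1..31


def unanswered(answers):
    """Return the question numbers that a model description leaves open."""
    skipped = set()
    # Question 6: without an attribute concept, questions 7-11 do not apply
    # ("If not, skip to question 12").
    if answers.get(6) == "no":
        skipped = set(range(7, 12))
    return [q for q in QUESTION_NUMBERS
            if q not in answers and q not in skipped]


# An invented, partially specified model variant.
example_model = {1: "b", 3: "yes", 6: "no", 14: "no", 21: "yes"}
missing = unanswered(example_model)
print(len(missing), "questions left open:", missing)
```

A description that leaves many questions open is, in the sense used here, ill-defined; the sketch only counts the gaps and makes no judgment about which answers are preferable.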

The list should provide a basis for defining certain "generic" models, such as binary relational or entity-attribute-relationship (EAR) [ISO]. Does "binary" refer to the whole family of models in which relationships have exactly two participants, or are there some assumptions regarding the other characteristics as well? A similar question can be addressed to the "EAR model": does the term cover all variants that have any form of attribute concept, or is it limited to one definition of "attribute" (which?), and what is assumed about the other characteristics?

Incidentally, while speaking of classifying models, we might note that there are models which don't use the terms "entity" and "relationship", but are ER models nonetheless. One such model is the "object-role" model of Falkenberg [Falk].

The work of refining, adopting, and applying this taxonomy is not "research" work. There are no natural answers waiting to be discovered. The answers depend on what the model is to be used for, and by whom [Kent]. In any case, the answers will have to be a matter of consensus, to be arrived at by some study group or advisory committee, or (less likely) by an evolutionary pattern of common usage.

The number of ER models, according to this taxonomy, is on the order of 2**n, where n is the number of questions, if the questions have two possible answers. The exact number of models doesn't matter - this isn't a mathematical exercise, but an informal exploration into the diversity of ER models. The exact number is affected by the number of questions that we consider relevant, by the fact that some questions may have more than two answers, and by the fact that some combinations don't make sense.
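The counting argument can be sketched for a small subset of the questions. The three questions chosen, their yes/no answers, and the rule used to discard a combination are assumptions made for illustration; in particular, the sketch treats a "no" to question 9 as "not applicable" when there is no attribute concept.

```python
from itertools import product

# Three yes/no questions from the list, labelled by their question numbers.
questions = {
    "attribute concept (6)": ["yes", "no"],
    "multi-valued attributes (9)": ["yes", "no"],
    "n-ary relationships (14)": ["yes", "no"],
}


def makes_sense(combo):
    has_attributes, multi_valued, _ = combo
    # Assumed rule: without an attribute concept, question 9 cannot be "yes".
    return not (has_attributes == "no" and multi_valued == "yes")


all_combos = list(product(*questions.values()))  # 2**3 combinations
sensible = [c for c in all_combos if makes_sense(c)]

print(len(all_combos), len(sensible))  # 8 6
```

With n two-answer questions the enumeration yields 2**n combinations, and each rule that excludes a meaningless combination lowers the count, which is why the exact number of models is of little importance.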

We provide only the framework of the taxonomy. We don't try to classify any models; we can't, because most of them are ill-defined, as noted above, so we don't know how to classify them.

2 THE QUESTIONS

1. What is an entity?

a. Any thing.

b. An important thing, something about which information is maintained.

c. Something about which single-valued information is maintained.

d. A collection of information.

e. A collection of single-valued information.

2. Is it expected that an entity type corresponds to a record type (relation, table, etc.)?

If so, is the correspondence between entities and records one-to-one? If not, why not?

3. Can entity subtypes be specified?

4. Are all descriptions of an entity type inherited by its subtypes?

5. Can non-subset overlaps be specified?

E.g., customers and suppliers might overlap, with neither being a subset of the other.

6. Is there an attribute concept? (If not, skip to question 12.)

7. What is an attribute?

a. Any fact about an entity.

E.g., employee number, department, and height might be three attributes of an employee. (This is the attribute concept in the relational model.)

b. Something which is associated with an entity, but is not an entity in itself.

E.g., an employee's height.

c. The representation of something associated with an entity.

E.g., if one employee's height is "6 feet" and another's is "72 inches", they might not have the same height attribute.

Note: some models introduce a "value" concept, in which "attribute" refers to the type of information, e.g., height, and "value" refers to a particular occurrence, e.g., your height. The question still remains open: are "6 feet" and "72 inches" the same value of the height attribute?

d. Or is an attribute something that can be associated with various entity types?

E.g., is height one attribute, regardless of whether it is the height of a person or of a building, or are "height of person" and "height of building" two different attributes?

8. For definition 7b, is there an objective basis for distinguishing between attributes and entities?

E.g., what are the criteria for deciding whether or not the city in which an employee was born is to be treated as an entity?

What are the consequences? If the model provides attributes, but one chooses to model everything using only entities and relationships, what is lost?

9. Can attributes be multi-valued?

E.g., the multiple colors of an object.

10. Can attributes have attributes?

E.g., the percentage of the object covered by each color.

11. Can attributes be optional? (Omitted, null.)

12. Is there a domain concept? Does it correspond to entities or to representations?

E.g., if we have employees and employee numbers and social security numbers, how many domains are there? Ditto for weights in grams and weights in pounds.

13. Does "domain" refer to a perhaps infinite population of possible past and future occurrences, or does it refer to some finite set of currently valid occurrences?

14. Can relationships be n-ary (n>2)?

15. Can they be unary?

16. If binary, are they directed?

17. If binary, is there one relationship for each direction?

18. If binary, are there one or two names for the relationship?

19. Are relationships named?

Does the model allow for different relationships between the same pair of entity types?

20. Are the roles named?

21. Are many-to-many relationships permitted?

22. Is there a corresponding concept for n-ary relationships?

23. Can several entity types participate in the same role of a relationship?

24. Can optional relationships be specified?

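I.e., some vs. all: can the model state whether some or all entities of a type must participate in the relationship?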

25. Can relationships have attributes?

26. Can relationships participate in other relationships?

27. Are relationships entities?

28. Is there a distinction between things and their representations?

E.g., lexical object types and non-lexical object types.

One test: how many of the following can be distinctly expressed?

29. Are representations considered to be entities in themselves?

30. Is the connection between a thing and a name modelled as a relationship?

31. What other auxiliary features does the model include? How are they defined?

Some candidates: constraints, existence dependences, sets, ordering, time, events, processes, update, propagations, ...
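The "6 feet" versus "72 inches" example from questions 7c, 12, and 28 can be made concrete with a small sketch. The class `Height`, the conversion table, and the two comparison names are hypothetical and serve only to show the two possible readings; the sketch does not prescribe which reading a model should adopt.

```python
from dataclasses import dataclass

# Assumed conversion factors, used only to compare the underlying lengths.
INCHES_PER_UNIT = {"inches": 1, "feet": 12}


@dataclass(frozen=True)
class Height:
    magnitude: int
    unit: str

    def in_inches(self):
        return self.magnitude * INCHES_PER_UNIT[self.unit]


a = Height(6, "feet")
b = Height(72, "inches")

# Reading 1: an attribute value is a representation (magnitude and unit).
same_representation = a == b
# Reading 2: an attribute value is the thing represented (the length itself).
same_thing = a.in_inches() == b.in_inches()

print(same_representation, same_thing)  # False True
```

A model that answers question 28 with "no distinction" has to pick one of the two comparisons as its notion of equality; a model that distinguishes things from representations can express both.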

3 OTHER QUESTIONS, OTHER MODELS

This approach to a model taxonomy can be extended to other models as well. Below we list some similar questions which might be applied to other models. Some of these questions may be applicable to the ER models as well, which would provide yet another point of discussion.

32. Does the model provide for entity types?

33. Are the types modelled as entities in themselves?

I.e., "entity type" is an entity type in itself, whose occurrences are the entity types.

34. Can an entity type have more than one name?

35. Is the type-instance connection modelled as a relationship?

36. Is more than one level of type-instance hierarchy allowed?

E.g., "engineer" is an entity type, but it is also an occurrence of the "job" entity type.

37. To what extent does the model require that entities of the same type have the same identifiers, relationships, attributes?

38. How complex is the unique identification of entities allowed to be?

39. Does the model allow more than one identifier type for an entity type?

40. Does the model allow more than one identifier for an entity occurrence?

41. May an occurrence of a relationship have more than one entity instance occurring in a given role?

In some formulations, a department and all of its employees constitute one occurrence of the "employs" relationship.

42. Does the model have any provision for redundancy among relationships (e.g., sub-relations, or implications)?

43. Does the model allow relationships between entities of the same type?

44. Does the model allow relationships between an entity and itself?

45. Does the model allow a "loop" of relationships (A R1 B R2 ... Rn A) either on the type or instance level?
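Questions 43 to 45 can be checked mechanically at the type level once a schema is written down as a list of relationships. The sketch below is one possible encoding, not part of any particular model: the triple format, the function name `find_loop`, the decision to treat relationships as directed, and the sample schemas are all assumptions.

```python
def find_loop(relationships):
    """Return one loop of entity types as a list, or None if there is none.

    `relationships` is a list of (source_type, name, target_type) triples,
    treated as directed for the purposes of this sketch.
    """
    graph = {}
    for source, _name, target in relationships:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    visiting, done = [], set()

    def visit(node):
        if node in visiting:
            # The path has returned to a type it already passed through.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            loop = visit(nxt)
            if loop:
                return loop
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        loop = visit(start)
        if loop:
            return loop
    return None


# Invented example schemas.
schema = [
    ("department", "employs", "employee"),
    ("employee", "works on", "project"),
    ("project", "funded by", "department"),
]
print(find_loop(schema))
print(find_loop([("employee", "manages", "employee")]))
```

A relationship between entities of the same type (question 43) appears here as a loop of length one. The same function could be applied to instance-level data by passing entity occurrences instead of entity types.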

4 CONCLUSIONS

There is a large variety of models that come under the umbrella of the entity-relationship approach. We have proposed a list of characteristics as the basis for distinguishing the various ER models. This list serves, in itself, to illustrate the profuse variety of possible ER models.

This proposed list is tentative; it would take further work to achieve consensus on the appropriate set of questions and answers.

In any form, however, the list is a useful measure of the precision with which an ER model is defined. We suggest that in any work dealing with an ER model, all of the questions should be answered (explicitly or implicitly), to eliminate ambiguity.

Such a list of characteristics also provides a basis for classifying ER models, and for developing precise definitions of such terms as "binary relational model" and "entity-attribute-relationship model".

5 REFERENCES

[Chen] P.P. Chen, "A Preliminary Framework for Entity-Relationship Models", in [ERA2].

[ERA1] Entity-Relationship Approach to Systems Analysis and Design, North Holland, 1980 (P.P. Chen, ed.).

[ERA2] Entity-Relationship Approach to Information Modelling and Analysis, North Holland, 1981 (P.P. Chen, ed.).

[ERA3] Entity-Relationship Approach to Software Engineering, North Holland, 1983 (Davis, Jajodia, Ng, Yeh, eds.).

[Falk] E. Falkenberg, "Concepts for Modelling Information", in G.M. Nijssen, Modelling in Data Base Management Systems, North Holland, 1976. (Proc. IFIP TC-2 Working Conf., Freudenstadt, W. Germany, Jan. 5-9, 1976.)

[ISO] "Concepts and Terminology for the Conceptual Schema and the Information Base", ISO/TC97/SC5 Report N695, March 1982, J.J. van Griethuysen (ed.).

[Kent] W. Kent, "Issues in Semantic Data Modelling", Proc. IEEE COMPCON, Sept. 26-29, 1983, Washington DC.