What is DATR?
DATR is a language for lexical knowledge representation. The lexical knowledge is encoded in a network of nodes, each of which has a set of attributes associated with it. A node can represent a word or a word form.
DATR was developed in the late 1980s by Roger Evans, Gerald Gazdar, and Bill Keller, and used extensively in the 1990s; the standard specification is contained in the Evans and Gazdar RFC, available on the Sussex website. DATR has been implemented in a variety of programming languages, and several implementations are available on the internet, including an RFC-compliant implementation at the Bielefeld website.
DATR is still used for encoding inheritance networks in various linguistic and non-linguistic domains and is under discussion as a standard notation for the representation of lexical information.
What is DATR used for?
DATR is a simple, spartan language for defining nonmonotonic inheritance networks with path/value equations, one that has been designed specifically for lexical knowledge representation. In keeping with its intendedly minimalist character, it lacks many of the constructs embodied either in general-purpose knowledge representation languages or in contemporary grammar formalisms.
The DATR theory makes use of some standard abbreviatory devices that enable nodes and/or paths to be omitted in certain cases. For example, sets of sentences relating to the same node are written with the node name implicit in all but the first-given sentence in the set. The theory defines the properties of seven nodes:
- an abstract Verb node,
- an EnVerb node,
- an Aux node,
- a Modal node,
- the abstract lexeme Walk,
- the lexeme Mow,
- the lexeme Can.
Each node is associated with a collection of definitional sentences that specify the values associated with different paths. This specification is achieved either explicitly or implicitly. Values given explicitly are specified either directly, by exhibiting a particular value, or indirectly, in terms of local and/or global inheritance. Implicit specification is achieved via DATR's default mechanism.
As an example, the definition of the Verb node gives the values of the paths <syn cat> and <syn type> directly, as verb and main, respectively. Similarly, the definition of Walk gives the value of <mor root> directly as walk.
On the other hand, the value of the empty path at Walk is given indirectly, by local inheritance, as the value of the empty path at Verb. In itself, this might not appear to be particularly useful, since the theory does not provide an explicit value for the empty path in the definition of Verb. However, DATR's default mechanism permits any definitional sentence to apply not only to the path specified on its left-hand side but also to any rightward extension of that path for which no more specific definitional sentence exists.
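The interplay of direct values, local inheritance, and the default mechanism can be sketched in a few lines of Python. This is only an illustrative sketch, not a DATR implementation: the names `Inherit`, `THEORY`, and `lookup` are hypothetical, the `mor root` path for Walk is an assumption of the example, and only the two nodes discussed above are modeled. The default mechanism is approximated by trying the longest defined prefix of the queried path first.

```python
from typing import Dict, Optional, Tuple, Union

Path = Tuple[str, ...]


class Inherit:
    """Hypothetical marker: inherit locally from another node, same path."""

    def __init__(self, node: str) -> None:
        self.node = node


Theory = Dict[str, Dict[Path, Union[str, Inherit]]]

# Toy fragment covering only what the text describes (assumed encoding).
THEORY: Theory = {
    "Verb": {
        ("syn", "cat"): "verb",    # value given directly
        ("syn", "type"): "main",   # value given directly
    },
    "Walk": {
        (): Inherit("Verb"),       # empty path: local inheritance from Verb
        ("mor", "root"): "walk",   # value given directly (path is assumed)
    },
}


def lookup(theory: Theory, node: str, path: Path) -> Optional[str]:
    """Return the value of node:path, or None if the theory defines none."""
    definitions = theory[node]
    # Default mechanism: the most specific (longest) defined prefix wins.
    for cut in range(len(path), -1, -1):
        prefix = path[:cut]
        if prefix in definitions:
            rhs = definitions[prefix]
            if isinstance(rhs, Inherit):
                # Local inheritance: ask the other node for the full path.
                return lookup(theory, rhs.node, path)
            return rhs
    return None


print(lookup(THEORY, "Walk", ("syn", "cat")))   # inherited via the empty path
print(lookup(THEORY, "Walk", ("mor", "root")))  # more specific sentence wins
```

Because the empty path is a prefix of every path, the single sentence for the empty path at Walk covers every path that Walk does not define more specifically, which is why the indirect definition is useful even though Verb itself has no value for the empty path.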
What are DATR models?
To the first level of approximation, the DATR theory can be understood as a representation of an inheritance hierarchy (a 'semantic network'). Nodes can be written as labeled boxes, and arcs correspond to (local) inheritance.
Thus, the node Can inherits from Modal, which inherits from Aux, which in turn inherits from Verb. The hierarchy provides a useful means of visualizing the overall structure of the lexical knowledge encoded by the DATR theory. However, the semantic network metaphor is of far less value as a way of thinking about the DATR language itself.
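Viewed purely as a network, the hierarchy is just a parent relation between node names. The following Python sketch records only the links mentioned in the text; `PARENT` and `inheritance_chain` are hypothetical names, and the links for EnVerb and Mow are left out because the text does not state them.

```python
from typing import Dict, List

# Local inheritance links stated in the text (child -> parent).
PARENT: Dict[str, str] = {
    "Can": "Modal",
    "Modal": "Aux",
    "Aux": "Verb",
    "Walk": "Verb",
}


def inheritance_chain(node: str) -> List[str]:
    """Follow parent links from a node up to the top of the hierarchy."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(" -> ".join(inheritance_chain("Can")))
```

A static table like this captures the shape of the hierarchy but nothing about paths, values, or defaults, which is one way to see the limits of the network metaphor.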
What’s more, the DATR language includes constructs that cannot be visualized in terms of simple networks of nodes connected by (local) inheritance links. Global inheritance, for example, has a dynamic aspect that is difficult to represent in terms of static links. Similar problems are presented by both string values and evaluable paths. Our conclusion is that the network metaphor is of primary value to the DATR user. To provide a satisfactory, formal model of how the language 'works', it is necessary to adopt a different perspective.
DATR theories can be viewed semantically as collections of definitions of partial functions ('nodes' in DATR parlance) that map paths onto values. A model of a DATR theory is then an assignment of functions to node symbols that is consistent with the definitions of those nodes within the theory. This picture of DATR as a formalism for defining partial functions is complicated by two features of the language, however. First, the meaning of a given node depends, in general, on the global context of interpretation, so that nodes do not correspond directly to mappings from paths to values, but rather to functions from contexts to such mappings. Second, it is necessary to provide an account of DATR's default mechanism. It will be convenient to present our account of the semantics of DATR in two stages.
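The first complication, that a node denotes a function from contexts to path/value mappings, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the context is simplified to the name of the node where the query started, the `mor form` path and its definition by global inheritance are hypothetical, and the default mechanism is ignored (paths are matched exactly).

```python
from typing import Callable, Dict, Optional, Tuple

Path = Tuple[str, ...]
Context = str                                   # simplified global context
PathMap = Callable[[Path], Optional[str]]       # partial function on paths
Node = Callable[[Context], PathMap]             # node: context -> mapping


def verb(context: Context) -> PathMap:
    def value(path: Path) -> Optional[str]:
        if path == ("syn", "cat"):
            return "verb"
        if path == ("mor", "form"):
            # Global inheritance (hypothetical sentence): look up the root
            # at the node where the query started, not at Verb itself.
            return NODES[context](context)(("mor", "root"))
        return None                             # undefined: partial function
    return value


def walk(context: Context) -> PathMap:
    def value(path: Path) -> Optional[str]:
        if path == ("mor", "root"):
            return "walk"
        return verb(context)(path)              # local inheritance from Verb
    return value


NODES: Dict[str, Node] = {"Verb": verb, "Walk": walk}


def query(node: str, path: Path) -> Optional[str]:
    """Evaluate node:path with the query node as the global context."""
    return NODES[node](node)(path)


print(query("Walk", ("mor", "form")))   # resolved through the context Walk
print(query("Verb", ("mor", "form")))   # no root defined in this context
```

The same definition at Verb yields a value when the context is Walk and no value when the context is Verb, so Verb cannot be modeled as a single fixed mapping from paths to values.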
What are the constraints of DATR?
DATR is an 'untyped' inheritance network representation language, in the sense that all extensional values are of the same basic type (sequences of atoms), and there is no restriction on which extensional values can be derived for a given path. 'Typed' languages, on the other hand, constrain an entity to have (or represent) some value of the appropriate type. Characters, strings, integers, and integer subranges are familiar examples of types. 'Typed feature structures' are those which are obliged to satisfy some set of 'type constraints', typically a logical formula consisting of disjunctions, negations, equalities, and logical connectives.
The constraints with which we are concerned here are intra-lexical; that is, they constrain a single lexical entry, not some projection of it (e.g., a phrase). They may be used for consistency checking in the lexicon, but they are not constraints on composite feature structures. An external constraint grammar (such as HPSG), by contrast, imposes constraints on relations between constituents of separate feature structures.
DATR is designed only to represent lexical items, so we may say that any constraints evaluable in DATR are intra-lexical-item constraints. These are not without value, as certain types of constraints are inheritable within, and apply to, the lexical hierarchy itself. In particular, lexical signs (or entries) defined with respect to a type subsumption hierarchy must satisfy the type constraints corresponding to that hierarchy.
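As a rough illustration of intra-lexical consistency checking, the Python sketch below checks a single lexical entry against constraints collected along a type subsumption hierarchy. It is not drawn from any DATR implementation: the types `verb` and `modal`, the two constraints, and the names `SUPERTYPE`, `CONSTRAINTS`, and `violations` are all hypothetical. Values are modeled as sequences of atoms, in line with DATR's untyped values.

```python
from typing import Callable, Dict, List, Optional, Tuple

Path = Tuple[str, ...]
Value = Tuple[str, ...]                 # untyped value: a sequence of atoms
Entry = Dict[Path, Value]               # one lexical entry
Constraint = Callable[[Entry], bool]

# Hypothetical type subsumption hierarchy (subtype -> supertype).
SUPERTYPE: Dict[str, str] = {"modal": "verb"}

# Hypothetical constraints attached to each type.
CONSTRAINTS: Dict[str, List[Constraint]] = {
    "verb": [lambda e: e.get(("syn", "cat")) == ("verb",)],
    "modal": [lambda e: e.get(("syn", "type")) == ("aux",)],
}


def violations(entry_type: str, entry: Entry) -> List[str]:
    """List failed constraints of the entry's type and all its supertypes."""
    failed: List[str] = []
    current: Optional[str] = entry_type
    while current is not None:
        for index, constraint in enumerate(CONSTRAINTS.get(current, [])):
            if not constraint(entry):
                failed.append(f"{current}#{index}")
        current = SUPERTYPE.get(current)    # constraints are inherited
    return failed


good = {("syn", "cat"): ("verb",), ("syn", "type"): ("aux",)}
bad = {("syn", "cat"): ("noun",), ("syn", "type"): ("aux",)}
print(violations("modal", good))
print(violations("modal", bad))
```

Each check looks at one entry in isolation, which is what makes the constraints intra-lexical; nothing here relates one entry to another or to a larger phrase.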