Technical Notes

What is happening in Deform and why
Martin Oehm, 15−10−2005

This document sketches some of the ideas that have gone into the implementation of deform, my translation of Inform 6 into German. I think there are some interesting ideas; I hope that some of them might be useful for the translation of Inform into other languages.

The document is quite lengthy, I’m afraid, especially towards the end where I’m bitching about some awkward properties both of the German language and of the current Inform library implementation. Both, it has to be said, are not as bad as I describe them, though. ;-)

You’ll need to know the Inform library to understand this document. Knowledge of German is helpful but not required.

1 Output

1.1 Names and Inflection

The normal way to define the output properties of an object is to give it a name in the object header or a short_name property and to specify one of the gender attributes male, female or neuter. Additionally, pluralname and proper alter the name of an object.

In German, the form of an article depends on three factors: gender, number and case.

So, up to now, everything is easy and done as described in the DM4. I’ve extended DefArt, IndefArt etc. to accept an additional parameter and have created new printing rules.

The problem is with adjectives: Adjectives have a suffix which depends on the same three factors mentioned above. It is not enough to know that my object is a male singular and define it with

short_name "red herring", ...

but I have to mark the place where the inflection is to be inserted. In deform, I use the low string @00 for this:

short_name "rot@00 Hering", ...

That definition gives me output like so (note the ending of “rot”):

Ein rot·er Hering ist hier.
Du siehst hier einen rot·en Hering.
Der rot·e Hering ist nicht wichtig.
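The substitution can be modelled outside Inform. The following Python sketch is not deform's code: the tables cover only nominative and accusative singular, and the names ENDINGS, ARTICLES and inflect are made up for illustration.

```python
# Keys are (definite?, case, gender). Illustrative tables, not deform's data.
ENDINGS = {
    (True, "nom", "male"): "e",     (True, "acc", "male"): "en",
    (True, "nom", "female"): "e",   (True, "acc", "female"): "e",
    (True, "nom", "neuter"): "e",   (True, "acc", "neuter"): "e",
    (False, "nom", "male"): "er",   (False, "acc", "male"): "en",
    (False, "nom", "female"): "e",  (False, "acc", "female"): "e",
    (False, "nom", "neuter"): "es", (False, "acc", "neuter"): "es",
}
ARTICLES = {
    (True, "nom", "male"): "der",     (True, "acc", "male"): "den",
    (True, "nom", "female"): "die",   (True, "acc", "female"): "die",
    (True, "nom", "neuter"): "das",   (True, "acc", "neuter"): "das",
    (False, "nom", "male"): "ein",    (False, "acc", "male"): "einen",
    (False, "nom", "female"): "eine", (False, "acc", "female"): "eine",
    (False, "nom", "neuter"): "ein",  (False, "acc", "neuter"): "ein",
}

def inflect(short_name, gender, case, definite):
    """Print article plus name, filling the @00 marker with the ending."""
    key = (definite, case, gender)
    return ARTICLES[key] + " " + short_name.replace("@00", ENDINGS[key])

print(inflect("rot@00 Hering", "male", "nom", False))
print(inflect("rot@00 Hering", "male", "acc", False))
print(inflect("rot@00 Hering", "male", "nom", True))
```

The three calls correspond to the three example sentences: indefinite nominative, indefinite accusative and definite nominative.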

Additionally, some nouns, especially male and plural ones, need a special inflection for some cases. These are stored in the low strings @01 to @04. They are not often needed, and luckily one noun never needs more than one of those.

The low strings work for the plural and list_together properties as well.

In rare cases, the short name has to be defined as a routine that prints the correctly suffixed noun by hand.

* * *

Why have the @ notation at all and not just add the suffixes after the short_name? Well, the program doesn’t know which of the strings is actually an adjective or a noun, so the author has to tell. And it is not necessarily the last word that is a noun. In German, adjectives always come before nouns and adverbs are put in front of the corresponding adjective, but there are noun phrases like these:

The @ notation is ugly but universal.

* * *

The official lib used another approach which I have kept as an alternative. The name is defined as a combination of properties:

adj

an array of strings to precede the noun as adjectives, each with the proper suffix

dekl

a numerical value defining the declension type. There were ten such declensions plus the null declension. The large number of types stems from the mistaken idea that the parser should be able to build plurals. It never has to, and thus the ten declension types boil down to five: the null type plus one for each of the low string suffixes @01 to @04.

The appropriate suffix is always added to the short_name (or object name).

suffixes

an array of four suffixes, one for each case, in case none of the predefined declensions matched. Can only be used with a null declension and may be a routine, taking the case as argument.

post

text to be added, like “über Algebra” above.

The article property works as in English, except that one cannot use a string like "a" or "an", because the article depends on the case. A special constant definite has been added which uses the correct definite article. The constant no_article suppresses indefinite articles and only prints text when a definite article is called for. The constant yours prints the corresponding form of “your”, i.e. “dein”, “deine”, “deinen” etc.

(There are no contraction forms in German, so once the gender, number and case are known, the article is found.)

1.2 Plurals and lists

Plurals are very irregular in German. They are usually built by appending a suffix (die Karte / die Karten [cards, maps]; das Kind / die Kinder [children]), and that suffix might be nothing (das Messer / die Messer [knives]).

Sometimes, but not always, vowels are transformed into the corresponding umlaut (Buch / Bücher [books]; Apfel / Äpfel [apples], but: Kuchen / Kuchen [cakes]).

Nonetheless, plurals are no problem, because the author specifies the correct one in the plural property, with appropriate suffix low strings if necessary.

The list_together property works likewise. The only restriction is that the group name should be plural when a string is used. That sounds self-evident, but theoretically the group name could be a grammatical singular: “some food (a ham sandwich and a boiled egg)”.

The list writer looks at a global variable short_name_case which is already defined in the library. WriteListFromCase is a routine that passes the case as an extra argument to the list writer.

In order to allow low strings in plural and list_together, the list writer has to call a routine to set the low strings according to the output case just before sending a message to these properties.

Nested lists look very clumsy in German. I found it hard to find good translations for “(providing light)” or for “(inside which are ...)”. German nearly never uses gerunds and uses subordinate clauses instead. Besides, the word order in German is quite different from English. The sentence

There is a rucksack, inside which is a rope, here.

would read

A rucksack, in which a rope is, is here.

(Ein Rucksack, in dem ein Seil ist, ist hier.)

in English reordered to resemble a German sentence. I’d still prefer the much more ordered

A rucksack is here. In the rucksack there is a rope.

(Ein Rucksack ist hier. In dem Rucksack ist ein Seil.)

or something like that. Therefore I’ve added a new style bit, APPEND_BIT: output is postponed, pushed onto a stack and processed after the list has been written. This works, but you have to call a function to read the stack after writing lists:

WriteListFromCase(child(x1), my_bits, Nom);
    print "."; WriteSublists(); "";
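The idea of postponing sublists can be modelled in a few lines of Python. This is only a sketch of the principle with English output; write_list and write_sublists are hypothetical stand-ins and do not mirror the actual signatures of WriteListFromCase or WriteSublists.

```python
def write_list(items, deferred):
    """Return the top-level list; contents are pushed onto `deferred`."""
    names = []
    for noun, contents in items:
        names.append("a " + noun)
        if contents:
            deferred.append((noun, contents))  # postpone the sublist
    return " and ".join(names)

def write_sublists(deferred):
    """Process the postponed sublists, one sentence per container."""
    sentences = []
    while deferred:
        noun, contents = deferred.pop()
        # Nested containers are pushed again by write_list.
        inner = write_list(contents, deferred)
        sentences.append("In the " + noun + " there is " + inner + ".")
    return " ".join(sentences)

stack = []
room = [("rucksack", [("rope", [])]), ("lamp", [])]
text = "Here: " + write_list(room, stack) + ". " + write_sublists(stack)
print(text)
```

As in the snippet above, the caller writes the list first and then has to remember to flush the stack.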

The “providing light” problem was solved (to some extent) by adding a text constant LIT__TX which passes the problem of finding a suitable translation to the author.

[T.A.G. provides a property contents which allows the “In the x you can see...” phrase to be adapted for each container, supporter and NPC.]

2 Input

2.1 Informisation

This is the process of preparing the player’s input and of rearranging it into Informese. This is done in the routine LanguageToInformese(). The German lib makes heavy use of this routine.

  1. Processing umlauts

    In German, there are four special letters, the umlauts 'ä', 'ö' and 'ü' and the sharp double s 'ß', which are very common. All of these are in the standard extended ZSCII character set, but would take up four Z-characters each out of the nine available in dictionary words.

    A solution would be to rearrange the Z-character table with the Zcharacter directive. I’ve chosen another approach: The Z-character set is left as it is, and the umlauts are changed in LTI: 'ä' becomes 'ae', 'ö' is 'oe', 'ü' is 'ue' and 'ß' is 'ss'. This is the way one would write German on a typewriter without accented characters (as in e-mail addresses or in early games of the Commodore C64 era). Crossword puzzles use this notation, too, perhaps because it introduces more instances of the most common letter e.

    That way, the umlauts take up two characters, and games can be played on machines with US keyboards. A drawback is that the author has to define all dictionary words in this notation, which makes vocabulary that contains umlauts or ‘ß’ effectively “untypeable”. (A debug action ##LibCheck is provided for checking this.)

    [When discussing this with Fredrik Ramsberg, he told me that such an approach would be absolutely unusable for Swedish. There might be ways to transliterate the Swedish special characters, but they are not unique. In German, they are interchangeable, but the “real” umlauts are normally used. Some proper names for towns and family names (like my own) are traditionally written in the form with ‘e’, but this should not be a problem in parsing.

    I can imagine that for Spanish and Italian (but maybe not so for French), it might be sufficient to strip accents and convert, say, ‘á’ to ‘a’. That might not work for the eñe character, though.]
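The replacement itself is a plain character mapping. As a Python sketch of the rule (deform does this on the input buffer in Inform; the function name replace_umlauts is invented here):

```python
# The four replacements named in the text.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def replace_umlauts(text):
    """Lower-case the input and spell out umlauts and sharp s."""
    return "".join(UMLAUTS.get(ch, ch) for ch in text.lower())

print(replace_umlauts("Haustürschlüssel"))
print(replace_umlauts("schließe"))
```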

  2. Remove suffixes

    To make the parsing as easy as possible, the suffixes -e, -em, -en, -er, -es, -n and -s are cut off if that leaves a valid dictionary word. These suffixes can be found in several types of words: The imperative for the second person singular can be formed in two ways for regular verbs: “leg” and “lege” [put] are equivalent, but only 'leg' is defined as a verb. The adjective and noun suffixes are ignored, too. This might be a bit unfortunate since it removes information on gender and case, but seems a practical method. Plural suffixes might be cut off, too, interpreting plural names as singular, if there’s no plural defined.

    To illustrate: The word 'pumpen' [to pump or the plural pumps] could have the suffixes ‘-n’ or ‘-en’ cut off. If 'pumpen' is in the dictionary, nothing happens. Otherwise, if 'pumpe' is in the dictionary, the ‘-n’ is cut off; failing that, if 'pump' is defined, the ‘-en’ is removed. If pruning the word cannot result in a dictionary word, the word is left as it is. So, the rule is to cut off the shortest possible suffix (which may be nil) and to leave a dictionary word if any pruning is done.

    The drawback of this is that if you have a 'pumpe' (a pump, as for a bicycle) in your game you’ll probably want to define the verb 'pump' which could be entered as the imperative ‘pump’ or ‘pumpe’. If you use the verb form ‘pumpe’, the ‘-e’ won’t be stripped, and the verb whose definition is ‘pump’ can’t be recognised. A possible solution would be to define an additional verb ‘pumpe’. The recommended solution is to define the ‘pumpe’ as ‘pump’ which would match the verb and the tool alike. The distinction is by context: In the phrase “take empty bucket”, “empty” is never analysed as verb.

    (In fact, that is not true: “empty” may be analysed as a verb if a sentence may be an answer to a disambiguation question. See below.)
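The “shortest suffix that leaves a dictionary word” rule can be sketched in Python. This is a model of the rule as described, not the library routine; strip_suffix and the use of a set as dictionary are assumptions made for the example.

```python
# The suffixes from the text, shortest first.
SUFFIXES = ["e", "n", "s", "em", "en", "er", "es"]

def strip_suffix(word, dictionary):
    """Cut off the shortest suffix that leaves a dictionary word."""
    if word in dictionary:            # the nil suffix wins
        return word
    for suffix in SUFFIXES:
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem in dictionary:
            return stem
    return word                       # no pruning possible

print(strip_suffix("pumpen", {"pumpe", "pump"}))
print(strip_suffix("pumpen", {"pump"}))
print(strip_suffix("pumpe", {"pumpe", "pump"}))
```

The last call shows the drawback: once 'pumpe' is a dictionary word, the input ‘pumpe’ is never reduced to the verb 'pump'.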

  3. Synonyms

    The concept of synonyms is as follows: The author can define a table of dictionary word / string pairs before including Parser.h. LTI then looks through the words, and if the dictionary word of a synonym definition is found, it is replaced with the corresponding string.

    So if the user defines:

    Array Synonyms table
        'don^t'         "do not"
        'carefully'     "";

    and the input is “don’t”, it is changed to “do not” which might be part of some syntax.

    In German, there are a lot of contractions of prepositions and definite articles, like “ins” meaning “in das” or “im” meaning “in dem”. These are defined in the array LanguageSynonyms in the library file.

    The string can contain any number of words, even none, removing a word completely. (Like “carefully” above, which just ignores the adverb.)

    A similar concept is that of “twins” which are synonyms for pairs of words. In German.h, for example, the two words “bis auf” are replaced by “ausser” [both: except]. This could be used to contract dots with abbreviations like:

    Array Synonyms table
        'mr' './/'      "mr"
        'mrs' './/'     "mrs"
        'dr' './/'      "dr";

    (It was a logical extension to the synonym concept, so it’s implemented anyway.)
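A rough Python model of the replacement pass follows. It works on a list of words rather than on the input buffer, and checking twins before single-word synonyms is an assumption of this sketch; apply_synonyms is a hypothetical name.

```python
# Entries taken from the examples in the text.
SYNONYMS = {"ins": "in das", "im": "in dem", "carefully": ""}
TWINS = {("bis", "auf"): "ausser"}

def apply_synonyms(words):
    """Replace twins and synonyms; an empty string removes the word."""
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in TWINS:
            out.extend(TWINS[pair].split())
            i += 2
        elif words[i] in SYNONYMS:
            out.extend(SYNONYMS[words[i]].split())
            i += 1
        else:
            out.append(words[i])
            i += 1
    return out

print(apply_synonyms("geh carefully ins haus".split()))
print(apply_synonyms("nimm alles bis auf lampe".split()))
```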

  4. Splitting compound words

    In German, nouns can be compound and terribly long. For example, what would be a “front door key” in English is a “frontdoorkey” in German. (“Haustürschlüssel” is the word, which has 16 letters, 18 after replacing the two ‘ü’ with ‘ue’.)

    The dictionary resolution of version 5 z-code games and the default for Glulx is nine z-characters. For example, a match is a 'streichholz'. Okay, two letters are skipped, and the rest is pretty unique, so no harm is done here. But what if you want to distinguish this from its plural, 'streichhoelzer'? There are ways to parse longer words, as described below, but another method would be to split off a “compound head”.

    The principle is much like for synonyms: The user defines a table array CompoundHeads which contains pairs of strings and additional data. If the string is matched at the beginning of a word, it is split off and a hyphen is added. So, defining

    Array CompoundHeads table
        "streich" 0;

    splits 'streichholz' into 'streich-' and 'holz' which then can be defined as name entries. If 'hoelzer//p' is defined, this makes it possible to distinguish singular and plural here. This method converts the head of the compound noun effectively to an adjective.

    A second table array defines CompoundTails which clips the string off from the tail. Which version to use depends on the word. If you have many keys, it might be wise to define 'key' as a tail, and if you have many magic items, defining 'magic' as head is the thing to do. (“Magic” not being an adjective, but a part of the compound words, of course, e.g.: “Zauberformel”, “Zauberstab”)

    The drawback here is that if you use compound heads or tails, the matching words are always split, so you have to use the split fragments throughout your game.

    The values for the additional data are 0 and 1 at present. If 1 is defined, the hyphen is left out. (The hyphen is used here to mark the split-off word as adjective, so a 'car-' 'key' is clearly distinguished from the 'car'. The compound heads are usually nouns of their own.)
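Splitting off a compound head can be sketched like this in Python. The function split_head is hypothetical, and requiring that something is left after the head is an assumption of the sketch rather than documented behaviour.

```python
# (head, flag) pairs; flag 1 leaves out the hyphen.
COMPOUND_HEADS = [("streich", 0)]

def split_head(word, heads=COMPOUND_HEADS):
    """Split a known head off the front of a compound word."""
    for head, flag in heads:
        if word.startswith(head) and len(word) > len(head):
            marker = "" if flag == 1 else "-"
            return [head + marker, word[len(head):]]
    return [word]

print(split_head("streichholz"))
print(split_head("streichhoelzer"))
print(split_head("zauberstab", [("zauber", 1)]))
```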

  5. Handle special punctuation

    Comma, full stop and quotation marks are defined as “word separators” in Inform. That is, these are always treated as a single word, even if they have no spaces around them.

    Other punctuation, like question and exclamation marks or parentheses are not treated specially, so “what is a grue?” does not work. (The DM4 suggests defining 'grue?' as a name, but that’s, well, stupid.)

    As an extension of the system used in the official lib, Deform lets the author choose between three methods of treating special punctuation. They are activated by mutually exclusive constants:

    IGNORE_PUNCTUATION

    The special punctuation is replaced by spaces.

    REPLACE_PUNCTUATION

    The special punctuation is replaced by full stops.

    SEPARATE_PUNCTUATION

    The special punctuation is moved away from adjacent words, and a space is inserted. The punctuation behaves like the “word separators”. With this constant defined, the ##WhatIs grammar would be:

    Verb 'who' 'what'
        * 'is'/'are' scope=Topic          -> WhatIs
        * 'is'/'are' scope=Topic '?//'    -> WhatIs;

    The fourth option is, of course, not to handle the punctuation. The routine Is_punctuation(char) determines which characters are treated specially.
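The three methods differ only in what replaces a special character. In the following Python sketch the set SPECIAL stands in for whatever Is_punctuation accepts (the exact set is an assumption here), and the mode strings are made-up names for the three constants.

```python
SPECIAL = set("?!()")  # assumed; deform's Is_punctuation decides this

def handle_punctuation(text, mode):
    """Apply one of the three methods; any other mode leaves text alone."""
    out = []
    for ch in text:
        if ch not in SPECIAL:
            out.append(ch)
        elif mode == "ignore":      # IGNORE_PUNCTUATION
            out.append(" ")
        elif mode == "replace":     # REPLACE_PUNCTUATION
            out.append(".")
        elif mode == "separate":    # SEPARATE_PUNCTUATION
            out.append(" " + ch + " ")
        else:
            out.append(ch)
    return "".join(out)

print(handle_punctuation("what is a grue?", "separate").split())
```

With the separating method, the question mark arrives as a word of its own, which is what the '?//' entry in the grammar above matches.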

Two entry points, PreInformese and PostInformese, allow the author to write their own LTI routines. Returning true from PreInformese skips the normal LTI process.

Since LTI changes the input in many ways and it is not always clear what the input looks like if seen with the parser’s eyes, a debug command “echo on/off” is defined which echoes the Informised input. (This is separated from the “trace” command, so you don’t have to read reams of output if you just want to check LTI.)

A copy of the original is always kept and used for error output with a marker of word shifts. It could be used for character-wise parsing of codes, maybe. (The example code for writing on the blackboard in Toyshop does this.)

2.2 Parsing noun phrases

Parsing noun phrases in German works like in English. The standard procedure is to read in as many words as possible matching the name properties of an object. The longest match wins. Suffixes are happily ignored.

As mentioned above, the German language makes use of quite long words. One solution is to split words in LTI, but long words can be analysed in a parse_name routine, too. In order to simplify parsing of long words, the routine WordMatch(s) is defined. It takes a string as argument and checks it against the word at wn. If it doesn’t match, the answer is false. If there is a match, wn is moved on and the length of the string, a true value, is returned. This allows for code like so:

Object -> Hauptsicherung "Sicherungsschalter"
  with article definite,
       name 'schalter' 'notstrom' 'hebel',
       parse_name [n;
            while (WordMatch("sicherungsschalter")
                || WordMatch("sicherungshebel")
                || WordInProperty(NextWord(),
                   self, name)) n++;
            return n;
        ],
    has male static;

A second parameter can be passed to WordMatch. If it is true, the length of the word must match, too. (Usually, trailing letters are ignored, so “sicherungshebeln” would be matched by the above code.)
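The matching behaviour described for WordMatch can be modelled as follows. This Python sketch works on a word list with a 0-based index and returns the new position explicitly, whereas the Inform routine reads the input buffer and advances the global wn; word_match is a stand-in name.

```python
def word_match(words, wn, s, exact=False):
    """Return (len(s), wn + 1) on a match, otherwise (0, wn)."""
    if wn >= len(words):
        return 0, wn
    word = words[wn]
    # Trailing letters are ignored unless an exact match is requested.
    ok = (word == s) if exact else word.startswith(s)
    return (len(s), wn + 1) if ok else (0, wn)

words = ["sicherungshebeln"]
print(word_match(words, 0, "sicherungshebel"))
print(word_match(words, 0, "sicherungshebel", exact=True))
```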

A problem arising in parsing noun phrases is that objects have a GNA, but name entries don’t. That does not show in English, since only the number usually changes, but for most other languages I know this can lead to bizarre dialogue:

Ein Schränkchen hängt an der Wand.

> öffne Schrank
Es [should be: Er] ist abgeschlossen.

Likewise, in Spanish:

En la arena encuentras un pequeño bloque de granito.

>coge la piedra
Cogido. [Should be: Cogida.]

>mírala
No estoy seguro de a qué se refiere “-la”. [Pronoun should refer to “piedra”.]

The female “piedra” [stone] is defined as a male “bloque de granito”, the male “Schrank” [cupboard] as a neuter “Schränkchen”.

The output problem can be resolved by rewriting the vague Inform messages, usually referring to “these” and “those” and “that”, into clear sentences explicitly mentioning the addressed object, like “Has cogido el bloque de granito” or “Das Schränkchen ist abgeschlossen”.

The problem of wrong or unknown pronoun references can only be resolved if the gender or grammatical number is known for each synonym.

Deform has adapted a system used in the official lib: after a dictionary word in name, a gender may be specified. This can be done by defining one of the gender attributes:

name 'schraenkchen' 'schrank' male;

Unfortunately, the compiler admonishes the non-dictionary-word entry male, so alternatively, one of the untypable words 'm.', 'f.', 'n.' and 'p.' may be used. An entry without GNA information “inherits” the one defined by the object’s attributes.

Earlier, a (blank) property changing_gender had to be defined for every object that makes use of this system. Newer versions of deform store this information in a temporary table, so that it is enough to specify the gender information in name without any further requirements.

The printing rules for printing out personal pronouns print them according to the gender stored in that array. The “regular” rules for printing the names reset this value to zero, meaning: use the standard, i.e. the attributes, instead.
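How a name list with gender markers might be read is sketched below in Python. The assumption that a marker applies only to the word directly before it, the mapping of 'p.' to a plural value and the names MARKS, PRONOUN and name_genders are all made up for this illustration.

```python
MARKS = {"m.": "male", "f.": "female", "n.": "neuter", "p.": "plural"}
PRONOUN = {"male": "Er", "female": "Sie", "neuter": "Es", "plural": "Sie"}

def name_genders(name, object_gender):
    """Map each name word to its gender; unmarked words inherit."""
    result = {}
    last = None
    for entry in name:
        if entry in MARKS:
            if last is not None:
                result[last] = MARKS[entry]
        else:
            result[entry] = object_gender
            last = entry
    return result

genders = name_genders(["schraenkchen", "schrank", "m."], "neuter")
print(PRONOUN[genders["schrank"]], "ist abgeschlossen.")
```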

Inform defines so-called descriptors parsed before the noun phrase. Descriptors include articles, numerals, possessive and demonstrative pronouns. When one or more are found, flags are set accordingly, and the word marker is moved on.

deform does not define the possessive pronouns. In my opinion, they are seldom used, and if so, the way Inform treats them is misleading. In Inform these pronouns really refer to possession, i.e. an object marked ‘my’ is valid if it is a child of the player object. But people tend to use “my” in an owner sense: my purse is still my purse, even if it’s lying on the table or if my [sic!] sister is holding it. So if you want to allow possessive pronouns in deform, you have to define them as names. (The validity of possessives is, as for all Inform descriptors, only checked when there are ambiguities.)

Another reason for this decision is that the word ‘ihr’ (like the English ‘her’) can be both a possessive and personal pronoun, so that in phrases like “give her the apple” (or likewise “gib ihr den apfel”), it is understood as possessive and the phrase “her the apple” is interpreted as a single noun phrase.

[TADS 3 knows an owner property, which is quite sneaky and is used to parse ‘his’ or ‘her’ or ‘Rodney’s’.]

The articles are parsed, and Inform determines whether an indefinite or definite article was used. This information is stored in indef_mode as usual. The gender, better: the GNA, of the article is stored, but not much is done with it. (It would be hard to tell exactly anyway in German, since the descriptors have only one general GNA attached. But the articles change with the case and are not unique. The two cases regularly parsed are accusative and dative, and for example ‘den’ can be plural dative or male accusative.)

This regular method works in most cases. Very occasionally, there are two equal words that mean something different depending on which gender is used. The standard examples are “See” (meaning lake when male, sea when female) and “Schild” (meaning sign when neuter, shield or buckler when male). There are more, but they have abstract meanings which are hard to imagine as Inform objects anyway. So, this case must be considered an exception.

In the normal Inform process, the GNA of the article is only looked at when ambiguities arise, and then it gets one point in the scoring system described at the end of §33 in the DM4. It is very likely that other circumstances like the objects’ positions override the gender. This is counterintuitive, because if the player clearly addresses one object with its correct article, the parser should not try to disambiguate; the correct article describes an object like an adjective.

The matter might be academic, but I have added a variable parse_noun_from which marks the word at the beginning of the noun phrase. A parse_name routine could then set wn back to that value and look at the descriptors, if desired. (There’s also article_word which stores the value of an article found, if any, but that’s pretty redundant.)

2.3 Parsing verbs

The normal process of parsing verbs is standard. The form used is the imperative in second person singular. This form is derived from the infinitive by leaving off the ending ‘-(e)n’ for regular verbs, like “gehen” [to go] for which the imperative would be “geh” or “gehe”. (Both forms are correct and only the first one is actually defined as verb, since the second one is reduced to it by cutting off the ending ‘-e’.)

So far, so good. Unfortunately this is not the only way to write an instruction. For example in English, the command ‘listen’ can be an imperative addressed to one or more persons in informal or formal language. It can also be interpreted as “(I) listen” or as “(I want to) listen”. The latter variants are often used to explain the input syntax to newcomers to text adventures.

So, if we consider the command “take apple”, we have:

> nimm den Apfel

In this case, there is no form with ‘-e’, since “nimm” is irregular. (But the non-existing word ‘nimme’ would be recognised anyway.) For the few irregular verbs, the regular form is also defined. I’m not sure whether they are valid imperatives in the strict meaning, but they are used in colloquial speech, and besides they provide the base for further forms. So:

> nehm den Apfel
> nehme den Apfel

The two other forms of imperative are the second and third persons plural. The latter is used in formal, polite speech, and is always plural, even if only one person is being spoken to:

> nehmt den Apfel
> nehmen Sie den Apfel

I think it is worth implementing these two forms, as they may be used to address an NPC (or a group of NPCs), and it doesn’t seem natural to address a witness in colloquial speech when acting as a police detective.

There is also the possibility to give an instruction as an infinitive. The form would be:

> den Apfel nehmen

Note that in this case the verb goes to the end. Infinitives in German end in ‘-en’. Exceptions are a few where the ‘e’ is skipped as it wouldn’t be pronounced anyway; they end in ‘-ern’ or ‘-eln’. The two other exceptions are ‘sein’ [to be] and ‘tun’ [to do], both of which are seldom used as imperatives in text adventures. All of these would be recognised provided that for verbs like “klettern” [to climb] two forms, 'klettre' and 'kletter', are defined.

Split verbs are compound in infinitive, so the phrase

> hebe den Apfel auf

(which is a paraphrase of the phrases above, like “pick up” instead of “take”) would become

> den Apfel aufheben

in infinitive. The ‘auf’ really is part of the verb and used to distinguish for example “aufheben” [to pick up] from “anheben” [to lift (barely off the ground)] and “abheben” [to lift off (into the air)]. Usually, the second part of the verb is at the end of one of the defined verb patterns, and if not, there’s always one of the possible patterns which has it. Additionally, the second part is not arbitrary, it is usually a preposition or an adverbial pronoun. deform defines a table array VerbPreps of about 40 such possible second parts.

If the verb is at the end, it is rearranged in LTI. First the parser looks if the last word of a sentence may be a verb, which is denoted by a flag in the dictionary word. If so, that word is moved to the front, everything is re-tokenised and the first word is declared the verb_word.

Otherwise, the parser checks whether it can split off one of the possible words defined in the VerbPreps table. If so, the verb part is moved to the front and the preposition part is left where it was.
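The two rearrangement steps can be modelled on a word list in Python. The verb set, the short VERB_PREPS excerpt and the way is_verb tolerates a few endings are assumptions of this sketch; deform checks a flag in the dictionary word and works on the buffer instead.

```python
VERBS = {"nehm", "heb"}            # made-up verb stems
VERB_PREPS = ["auf", "an", "ab"]   # tiny excerpt of such second parts

def is_verb(word):
    """True if the word, possibly minus an ending, is a known verb."""
    for ending in ("", "en", "n", "e"):
        stem = word[:len(word) - len(ending)]
        if word.endswith(ending) and stem in VERBS:
            return True
    return False

def move_verb_to_front(words):
    """Move a trailing verb, or the verb part of a split verb, forward."""
    if not words:
        return words
    last = words[-1]
    if is_verb(last):
        return [last] + words[:-1]
    for prep in VERB_PREPS:
        rest = last[len(prep):]
        if last.startswith(prep) and is_verb(rest):
            # The preposition part stays where the word was.
            return [rest] + words[:-1] + [prep]
    return words

print(move_verb_to_front("den apfel nehmen".split()))
print(move_verb_to_front("den apfel aufheben".split()))
```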

For easy shifting, the routine LTI_Delete(i, b) was defined. This routine returns the deleted character c which can then be used in LTI_Insert(i, c, b). (In deform, the buffer into which to insert and from which to delete can be passed as a parameter, hence the b.)

A special situation of verb parsing is done before LTI, where the special verbs “undo” and “oops” are treated. Now, the German equivalent of ‘undo’ would be ‘rückgängig’, a short form of “rückgängig machen”. This is admittedly clumsy but worse than that, there are two umlauts in it. The LTI routine has not yet processed the input, so any umlauts are still in. But since we allow the player to use the ‘ae’, ‘ue’ and so on alternatively, we should define

'rückg[ä]ngig'
'rückga|engig'
'rueckg[ä]ngig'
'rueckgaen|gig'

The pipes and brackets denote the nine character limit. The brackets around the [ä] mean that this limit falls within the four z-characters used to represent the umlaut. (Since the usual policy is to use ‘ae’ and so on we don’t bother to put them in cheaper parts of the z-character table. In output they also eat up four z-characters. If this seems wasteful, the author can always define them as abbreviations, saving two bytes on every occurrence.)
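Where the limit falls can be worked out by counting. The Python sketch below is a simplification: it charges one z-character for a plain lowercase letter and four for anything else, which covers these words but not digits or punctuation; dictionary_cut is an invented name.

```python
def zchar_cost(ch):
    """1 z-character for a-z, 4 for an umlaut (simplified model)."""
    return 1 if "a" <= ch <= "z" else 4

def dictionary_cut(word, limit=9):
    """Return the fully encoded part and whether the limit falls
    inside the next character (the bracket case)."""
    used = 0
    for i, ch in enumerate(word):
        cost = zchar_cost(ch)
        if used + cost > limit:
            return word[:i], used < limit
        used += cost
    return word, False

for w in ("rückgängig", "rückgaengig", "rueckgängig", "rueckgaengig"):
    print(w, dictionary_cut(w))
```

A result flagged True corresponds to the bracket notation, a result flagged False to the pipe.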

That leaves us with four of the UNDO*__WDs defined. Given that “rückgängig” is such an awkward word, we’ll want to use the English equivalent “undo”, too. (Actually it seems a good idea to provide the well-known English commands for all meta-verbs.)

So instead of using the very limited method of defining these words as constants, I’ve changed the treatment of these special words to routines, like so:

[ Is_and_word w;
    if (w=='und') rtrue;
    if (w=='sowie') rtrue;
    rfalse;
];

Only the THEN*__WDs are left since THEN1__WD is used very often to refer to the full stop throughout the code, even in Infix.h, which I don’t dare touch. The “then” philosophy doesn’t catch in German anyway, because instead of

> get the bread then eat it
> go north then open door

the word order would be

> get the bread, eat it then
> go north, open then the door

and the “dann” [then] would even be redundant.

Another thing where verb parsing is not as easy as it seems is after questions asked by the parser like “What do you want to ...?” or “What do you mean, ...?” If the new input is not an answer but a new command, the question is ignored and that command used instead.

The problem here is that the old command is stored in buffer and parse. Since the answer is pasted in there, these arrays are kept and answers to parser questions are written to buffer2 and parse2. These are not passed through LTI, so the additional ‘-e’ at the end is not recognised for verbs. Neither are umlauts. The correct solution would be to move buffer and parse to buffer2 and parse2, read the input and write it to buffer and parse as usual, pass it through LTI, paste buffer into buffer2 if needed and then copy everything back to buffer again.

I’ve opted for the quick but incomplete solution of making a sloppy Informisation in LanguageIsVerb by just replacing the umlauts and stripping the ‘-e’. All the jazz of recognising different forms of addressing people and infinitives is skipped.

But even so, things are not made easy. As described above, I’ve introduced the buffer as optional parameter to LTI_Insert. If this parameter is zero, buffer is used as a default. I use DictionaryLookup in order to see if after cutting tails off I’m left with a known word. That routine uses buffer2 as temporary storage, so I can’t use it to informise buffer2. As just the ‘-e’ is cut off here, it’s not so important. The rest is returned as verb_word which is either a known word or zero.

2.4 Parse tokens

Basically, the structure of an imperative sentence is like in English, with a few exceptions, though:

  1. noun can refer to accusative and dative objects which differ in article and adjective endings.

    This is not so much a problem in parsing (where suffixes are ignored at the moment) but in constructing the command in questions like “What do you want to look under?”.

    To resolve that, I’ve added a token dative which flags the subsequent token as dative, i.e. a global variable dative_mode is set to true. After parsing a result token, it is reset to false, meaning accusative.

    I’ve refrained from using the dative_noun proposed in the DM4, since the tokens in Inform do not only carry grammatical information, but mainly information on the context. With the dative token, which always matches and doesn’t eat up words, it is possible to flag a token as using the dative case, and keep the semantics:

    Verb 'oeffne'
        * noun -> Open
        * noun 'mit' dative noun=KeyItem -> Unlock
        * noun 'mit' dative held -> Crack;

    In PrintCommand(), the flag is also set and cleared accordingly and allows a clean printing of the sentence.

    Here’s the dative token in all its glory:

    [ dative;
        dative_mode = true; return GPR_PREPOSITION;
    ];

    The dative_mode is set back at the end of SearchScope(), so parse_name routines can make use of it if desired. (For example to distinguish two objects with the same name but different genders.)
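The effect of such a zero-width flag on printing can be illustrated with a Python model. It only mimics the idea behind PrintCommand: the pattern format, the ARTICLE table and the function print_command are assumptions made for the example.

```python
# Definite articles for the two cases that are regularly parsed.
ARTICLE = {
    ("acc", "male"): "den", ("acc", "female"): "die", ("acc", "neuter"): "das",
    ("dat", "male"): "dem", ("dat", "female"): "der", ("dat", "neuter"): "dem",
}

def print_command(pattern, objects):
    """pattern holds 'noun', 'dative' or literal prepositions;
    objects holds (noun, gender) pairs consumed by 'noun' tokens."""
    dative_mode = False
    remaining = list(objects)
    out = []
    for token in pattern:
        if token == "dative":
            dative_mode = True          # matches always, eats no word
        elif token == "noun":
            noun, gender = remaining.pop(0)
            case = "dat" if dative_mode else "acc"
            out.append(ARTICLE[(case, gender)] + " " + noun)
            dative_mode = False         # back to accusative
        else:
            out.append(token)
    return " ".join(out)

objs = [("Tür", "female"), ("Schlüssel", "male")]
print(print_command(["noun", "mit", "dative", "noun"], objs))
```

Keeping the case in a separate token leaves the noun token free to carry its usual context information, which is the point made above.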

  2. The order of the objects is less strict, so all of

    > schließe mit dem Schlüssel die Tür auf
    > schließe die Tür mit dem Schlüssel auf
    > schließe die Tür auf mit dem Schlüssel

    should be matched. There’s no other way than to define all possibilities. (So maybe there is, but if you want to use the standard Inform parser without too much modification, this is the way to go, I guess. T.A.G. needs just one definition here: that there is a direct object, an indirect object with the preposition ‘mit’ and a split verb whose second part is ‘auf’. The infinitive is ‘aufschließen’.)

  3. Verbs can be reflexive, calling for a reflexive pronoun. This would ordinarily be ‘dich’, the accusative form of second person singular, ‘du’. Depending on the form of imperative, this might also be ‘euch’ (second person plural form of ‘ihr’) or ‘sich’ (third person plural, polite form.)

    In infinitive form, the reflexive may be left out: “auf den Stuhl setzen” (sounding a bit rude, though), or may be ‘mich’, the accusative of first person singular (as in the response to “What do you want to do?” – “I want to ...”)

    This can be achieved by a token dich which matches all of the possible words. A second token dir is the dative form. It is not really a reflexive, but sometimes it is used in (colloquial?) speech: “Sieh dir den Mann an.”, “Nimm dir einen Keks.” It can be left out, though, and usually is.
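
    A minimal sketch of such a token, written as an ordinary general parsing routine, could look like this. The word list is taken from the forms named above; that the routine also matches when no pronoun is present, the verb word 'setz' and the choice of the Enter action are assumptions made for the example and may differ from the actual deform code:

    [ dich w;
        w = NextWord();               ! look at the next word; this advances wn
        if (w == 'dich' or 'euch' or 'sich' or 'mich')
            return GPR_PREPOSITION;   ! reflexive pronoun consumed
        wn--;                         ! something else: put the word back
        return GPR_PREPOSITION;       ! still match, the pronoun is optional
    ];

    Verb 'setz'
        * dich 'auf' noun -> Enter;

    The routine has to be defined before the grammar that uses it. Like dative, it never yields an object, so it does not disturb the assignment of noun and second.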

  4. There are adverbial pronouns like ‘damit’, meaning “with it”, where “it” refers to an inanimate object. The adverbial pronouns are contractions of a general kind of pronoun (with no explicit gender or case attached) and a preposition, and they are fairly common. If a pronoun is used in an indirect object, an adverbial pronoun is used when the object is inanimate; when the object is animate, its name is simply replaced by the appropriate personal pronoun. (This is the only case I can think of where German grammar distinguishes between animate and inanimate objects.)

    I’ve got two solutions for this, and I use both. The easy, straightforward solution can only be used in some cases; ‘damit’ is one of them. In LTI, ‘damit’ is split into a preposition and a typeable but unlikely “general pronoun” referring to the last inanimate object. (The word I use is ‘ihm/r’ as a short form of ‘ihm/ihr’, the dative pronouns for ‘him/it’ and ‘her’ respectively.) It had to be a typeable word because it must be parsed regularly as if it belonged to the real vocabulary.

    The second way is a bit more fuzzy. The word ‘hinein’ means “into it”, for example. This could easily be broken up into ‘in’ and ‘ihm/r’, but consider the following sentences:

    1. Lege das Schwert in die Truhe hinein.
    2. Öffne die Truhe, lege das Schwert hinein.

    In phrase (1), ‘hinein’ is just an unnecessary appendage; the phrase would have been complete without it. Nevertheless, it is often used. (German is cluttered with such redundancies, even more so in colloquial speech.) In phrase (2), the ‘hinein’ really means ‘into it’, referring to the Truhe [chest]. So what the grammar does is this:

    Verb 'leg'
        * multiexcept 'in' noun prep_hinein   -> Insert
        * multiexcept noun_hinein             -> Insert
        * ...;

    Both patterns are defined. The meaning of the ‘hinein’ is reflected by different tokens. prep_hinein matches ‘hinein’ or a bunch of similar words or no word at all and returns GPR_PREPOSITION.

    noun_hinein matches the last inanimate object if it is present. There’s also held_hinein which matches the last inanimate object and tries to pick it up if it isn’t held. (Actually, held_hinein doesn’t make a lot of sense, but there are held_* tokens which are useful for other prepositions.)

* * *

These are the differences between German and English grammar as far as imperative sentences are concerned. Some problems arise due to the “first match” policy of the Inform parser, so some additional hackery is called for.

Inform has a ##Consult verb, and the ways to use it in English are:

> consult encyclopaedia about Pythagoras
> look up Pythagoras in encyclopaedia
> read about Pythagoras in encyclopaedia

In German, one could say:

> schlage/sieh Pythagoras im Lexikon nach
> schlage/sieh Pythagoras nach im Lexikon
> lies Pythagoras im Lexikon nach
> lies über Pythagoras im Lexikon (nach)
> lies im Lexikon über Pythagoras (nach)
> lies nach im Lexikon über Pythagoras
and so on.

The more or less arbitrary order of the objects and the greedy behaviour of the topic token make parsing a bit difficult. The first sentence could be parsed by both of the patterns

* topic 'in' 'nach' dative noun -> Consult
* topic 'in' dative noun 'nach' -> Consult

(‘im’ is transformed to “in dem” in LTI.) The second pattern is the one we want, but the first one matches, too, with “Pythagoras im Lexikon” being the consult topic. It’s no use swapping these lines, because that would cause problems the other way round.

What makes this really annoying is that the compound verbs used for ##Consult have a meaning of their own when used just with an object and without the “nach”: “schlage”, “sieh” and “lies” mean “hit”, “look” and “read” respectively, so saying “schlage Grfml” would try to match the nonsense word ‘grfml’ as topic (and succeed).

To work around this, I use general parsing routines that never advance the word marker wn and return either GPR_FAIL or GPR_PREPOSITION. These routines enforce the presence of 'nach' for 'schlag' and 'sieh', and they enforce the correct order of 'in' and 'nach'. A pain in the neck, but it works.
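
As an illustration, a guard of this kind might be sketched as follows. The routine name needs_nach, the grammar lines and the Attack fallback are hypothetical and only show the idea; the actual routines in deform also check the order of 'in' and 'nach', which this sketch does not attempt. It relies on the library globals wn and num_words and on NextWord():

[ needs_nach save_wn w found;
    save_wn = wn; found = false;
    while (wn <= num_words) {      ! scan the rest of the input (simplified:
        w = NextWord();            ! this does not stop at 'dann' or a full stop)
        if (w == 'nach') found = true;
    }
    wn = save_wn;                  ! leave the word marker where it was
    if (found) return GPR_PREPOSITION;
    return GPR_FAIL;               ! no 'nach': this grammar line must not match
];

Verb 'schlag'
    * needs_nach topic 'in' dative noun 'nach'  -> Consult
    * noun                                      -> Attack;

Because the guard comes first in the line and consumes nothing, “schlage Grfml” fails on the Consult line before the greedy topic token gets a chance to swallow the nonsense word, and the parser moves on to the next line.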

Luckily, the grammar for ##Ask and ##Tell is just like in English. (Come to think of it, it might not be a bad idea to allow switching off the consulting verbs entirely with a constant.)

The seemingly clever invention of the dative token caused problems with multiinside and multiexcept which use a lookahead. There is a mechanism for parsing these tokens when the lookahead has failed, but it doesn’t seem to work properly. And the conditions for a proper lookahead are that after the multiinside and multiexcept tokens, there has to be a preposition, and that it is followed by a noun token.

So, in German, there might be one of the noun_hinein or similar tokens following, and in the case of a preposition following, the next token might not be noun but dative.

The lib now uses the replaceable routines Lookahead_Skip, Lookahead_Parse and Lookahead_Valid to override these conditions. (The noun_hinein token can match equally well as a preposition, since it matches a particular word, only that the return value is an object and not GPR_PREPOSITION. And the dative token may be skipped, too. And whether other tokens than noun can be allowed is just a matter of expanding the code.)

Above, when describing the Informisation, it was said that in the phrase “take empty bucket”, “empty” was never checked as a verb. That is not entirely true: “empty” is analysed as verb if a sentence may be an answer to a question asked by the parser in order to disambiguate between several objects. I’ve tweaked LanguageVerbMayBeName so that it considers “empty” as part of a noun phrase if any of the name entries of possible answers contains “empty”. This allows for dialogue like so:

> take bucket
Which bucket do you mean? The empty bucket or the full bucket?

> empty bucket
Taken.

> drop tool
Which tool do you mean? The awl or the spanner?

> empty bucket
You turn the bucket over and try to make a neat little puddle, but the bucket is empty.

... if there is a visible object whose name contains both “empty” and “bucket”.