Victor Docs

7. Pointers and Quantifiers

1. Introduction

Pronouns and proper names are generated by objects, which we find already in the nucleus of English messages. To generate nouns and noun phrases—like "water", "some water", "any water", "that water over there"—we require some elaboration. I account for the data here with three elaborations:

type Message
  = ...
  | INDIRECT Int Description Message
  | ENUMERATED Int Multiplicity Message
  | AMASSED Int Proportion Message
  | ...

I will explain descriptions, multiplicities, and proportions presently. Suffice it to say, to begin with, that they are encoded in noun phrases like those just given above. First, a word about the integer variable that these elaborations all take.

Though elaborations apply to messages as a whole, they typically operate on one specific aspect of the input message. I call this the target of the elaboration. For most elaborations, the target is fixed. In a couple of cases—the PRIOR and NEGATIVE elaborations—it changes depending on the previous elaborations in the message that it operates on. In the case of the three elaborations we are now examining, the target must be an object in the nucleus, but there are no restrictions on which of the potentially many objects it can be. This is itself an informational choice at the discretion of the speaker. The integer argument, then, identifies the target object. Balancing objects are indexed from 0, and so I use -1 to represent the main object. (In the interface, however, more helpful descriptions are given to the user, hiding this underlying implementation detail.)
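The indexing convention can be sketched in a few lines. This is Python rather than the Elm of the model, and the function and argument names are mine, not the model's; objects are simplified to strings.

```python
def resolve_target(target, main_object, balancing_objects):
    """Resolve an elaboration's integer target to an object in the nucleus.

    -1 picks out the main object; 0, 1, 2, ... index the balancing objects.
    """
    if target == -1:
        return main_object
    return balancing_objects[target]

# With a nucleus along the lines of ( Male, Do "like", [ Female "Grannie", Others ] ):
resolve_target(-1, "Male", ["Female Grannie", "Others"])  # the main object
resolve_target(0, "Male", ["Female Grannie", "Others"])   # the first balancing object
```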

2. Descriptions, Multiplicities, and Proportions

Turning now to the more interesting arguments, descriptions are for describing an object rather than naming it with a pronoun or proper name, and thereby referring to it indirectly (hence my choice of label for the elaboration that takes this argument). Their string outputs include so-called “definite” descriptions like "the king of France" or "the elephants in the room" (a notorious subject of philosophical debate), but also demonstrative descriptions (e.g. "this king of France", "those elephants over there") and relative descriptions (e.g. "my favourite king", "your pet elephants"). The most important part of descriptions is what I call a pointer, responsible for the first word in each of these phrases. It has four possible values:

type Pointer
  = The
  | This
  | That
  | RelatedTo Object

The first three, as you would expect, are encoded in the articles "the", "this", and "that" respectively. The last is encoded in the first relative form of a pronoun ("my", "your", "their", etc.), the pronoun in question being determined by the subsequent object argument. What is signalled here is not the relation of possession in particular (as the standard terminology of the “possessive” pronoun suggests), but merely the idea of a relation more generally. What that relation is, in any particular case, must be gleaned pragmatically. For example, "your book" may encode a reference to the book you wrote, the book you edited, the book you bought me as a present, the book you lent me a little while ago, or perhaps something else besides.
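The RelatedTo encoding can be illustrated with a small lookup. This is a Python sketch, not the model's Elm; the object-to-pronoun table is abridged, and the names are mine.

```python
# First relative forms of the pronouns, keyed by (simplified) object values
FIRST_RELATIVE = {
    "Speaker": "my",
    "Hearer": "your",
    "Male": "his",
    "Female": "her",
    "Others": "their",
}

def related_to(obj, noun):
    """Encode a RelatedTo pointer plus a bare category as a noun phrase."""
    return FIRST_RELATIVE[obj] + " " + noun

related_to("Hearer", "book")       # "your book"
related_to("Others", "elephants")  # "their elephants"
```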

Multiplicities, secondly, are for picking out one or more objects of a particular type, with a view to saying something about each of those objects; for example: "a person", "one person", "two people", "several people", "each person". Proportions, finally, are for picking out a proportion of some set, with a view to saying something about that quantity; for example: "all water", "most water", "enough water". The most important part of multiplicities and proportions alike is what I call a quantifier, responsible for the first word in each of these phrases. Corresponding to the multiplicity/proportion distinction, I classify quantifiers as enumerating or amassing, with the complication—which I will come to in due course—that Some and Any fall under both categories (i.e. they can show up in both multiplicities and proportions). Because of these two quantifiers, it is necessary to have a single type definition; the quantifiers up to and including Some and Any are enumerating, while those following and including them are amassing:

type Quantifier
  = A
  | Integer Int
  | Several
  | Many
  | Each
  | Every
  | Both
  | Some
  | Any
  | All
  | Much
  | Most
  | Enough
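The positional convention in this definition can be made explicit as a pair of predicates. This is a Python sketch with my own names; the quantifiers are listed in the order of the type definition above.

```python
# Quantifiers in definition order; Some and Any sit in both camps.
QUANTIFIERS = ["A", "Integer", "Several", "Many", "Each", "Every", "Both",
               "Some", "Any", "All", "Much", "Most", "Enough"]

def is_enumerating(q):
    """Enumerating: everything up to and including Some and Any."""
    return QUANTIFIERS.index(q) <= QUANTIFIERS.index("Any")

def is_amassing(q):
    """Amassing: everything from Some and Any onwards."""
    return QUANTIFIERS.index(q) >= QUANTIFIERS.index("Some")

is_enumerating("Each")  # True
is_amassing("Each")     # False
is_enumerating("Some")  # True: Some is in both camps
is_amassing("Some")     # True
```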

It is also possible to have proportions (but not multiplicities) with no quantifier at all. These, in my design, are behind noun phrases that have neither an article nor a determiner at their head: "water", "fresh water", "carnivorous animals", and so on. In these cases the proportion the speaker has in mind is deliberately left vague, and the overall effect is to convey something about the category in question in general, but not in total. For instance, I take it that the message encoded as "Dogs have four legs" is true, even though some dogs have lost a leg or two; while the message encoded as "All dogs have four legs" is false.

Here are the remaining type definitions necessary to complete the picture:

type alias Description =
  ( Pointer, Bool, Haystack )

type alias Multiplicity =
  ( Quantifier, Bool, Haystack )

type alias Proportion =
  ( Maybe Quantifier, Bool, Haystack )

type alias Haystack =
  ( String, Maybe String, Maybe String )

Where pointers and quantifiers are responsible for the articles, relative pronouns, and determiners at the start of complex noun phrases, the haystack is responsible for most of the rest: the first string argument encodes the category (e.g. "rice"), the second encodes an optional description (e.g. "brown rice"), and the third encodes an optional restriction (e.g. "brown rice in a bag"). These are all instances of unearthed variables that users must encode for themselves, at least for now. We already met properties in the context of counters above; categories are new, but just like properties and adjectives, I have no intention of giving my system a large dictionary of categories and nouns any time soon.

Restrictions are much more likely to be dug up in a future update, since they are in fact just balances, precisely like those we have already seen in the nucleus of plain messages. The reason I have not yet modelled them as such is a complication that will require a fair bit of work to implement. Being balances, restrictions include objects; and objects—wherever they appear in a message—can be targeted by one of the elaborations we are investigating in this section. The balance in the haystack encoded as "brown rice in a bag", for example, has itself been subjected to a dose of the ENUMERATED elaboration (whence the noun phrase "a bag"). To keep things from getting too large too soon, therefore, I have left restrictions unearthed for now.
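The left-to-right assembly of a haystack can be sketched as follows. This is Python with my own function name; in the sketch the restriction is a plain string rather than the balance it really is.

```python
def render_haystack(category, description=None, restriction=None):
    """Assemble a haystack triple into the body of a noun phrase.

    The optional description precedes the category; the optional
    restriction follows it.
    """
    parts = [description, category, restriction]
    return " ".join(p for p in parts if p is not None)

render_haystack("rice")                       # "rice"
render_haystack("rice", "brown")              # "brown rice"
render_haystack("rice", "brown", "in a bag")  # "brown rice in a bag"
```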

The fundamental categories of person, place, and thing, when coupled with the quantifiers Some, Any, or Every, give rise to abbreviated determiners and nouns: "someone", "somebody", "anywhere", "everything", etc. These abbreviations are triggered in my model when one of these quantifiers is selected, and users enter "one", "body", "where", or "thing" as their category. This, admittedly, is something of a fudge; but until the unearthed category variable is at least partially dug up, it seems like the best solution.

Descriptions, multiplicities, and proportions all have a boolean argument in between the pointer or quantifier and the haystack. When set to True, this typically triggers the introduction of the word "other" immediately following the article or determiner; as in "my other car", "the other elephant in the room", or "every other house in the street". In the case of the abbreviated determiners and nouns just mentioned, it triggers instead the introduction of the word "else": "someone else", "anything else", "everything else".
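Both behaviours (the fused determiner-noun forms and the "other"/"else" toggle) can be sketched together. This is a Python simplification of my own that handles only Some, Any, and Every, not the model's actual Elm encoding.

```python
FUSING = {"Some": "some", "Any": "any", "Every": "every"}
FUNDAMENTAL = {"one", "body", "where", "thing"}  # the fudged category strings

def render_multiplicity(quantifier, other, category):
    """Render quantifier + boolean + bare category, fusing the fundamental
    categories and choosing between "other" and "else"."""
    if quantifier in FUSING and category in FUNDAMENTAL:
        fused = FUSING[quantifier] + category  # e.g. "someone", "everywhere"
        return fused + " else" if other else fused
    determiner = FUSING.get(quantifier, quantifier.lower())
    return determiner + (" other " if other else " ") + category

render_multiplicity("Some", False, "one")    # "someone"
render_multiplicity("Some", True, "one")     # "someone else"
render_multiplicity("Every", True, "house")  # "every other house"
```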

It is tempting to think of descriptions, multiplicities, and proportions as simply overwriting their target objects, but this is not quite right. First, only third person (i.e. Other) objects can be the target of any of these elaborations, and so the object variable itself must be kept around at least to check that it is of the right type. Secondly, the main noun in the noun phrases generated by these variables must be either singular or plural, and the underlying object determines which. And finally, if the targeted object is the main object, it may still be needed to decide the pronoun for any balancing object set to SameAsMain. Consider:

INDIRECT -1 ( That, "person" ) ( Male, Do "like", [ SameAsMain ] )
  -> "That person likes himself."

INDIRECT -1 ( That, "person" ) ( Others, Do "like", [ SameAsMain ] )
  -> "Those people like themselves."

The noun phrases "that person" and "those people" are determined jointly by the description and the underlying object; the former delivers the determiner "that" and the noun "person", but the latter sets the forms of these words. And the pronouns "himself" and "themselves" are determined solely by the main object. For these reasons, then, that object must be retained in the overall message, although it is somewhat obscured by the description that elaborates it.

3. Discrete and Continuous Categories

The distinction between multiplicities and proportions is, I hope, intuitively clear. It is related to the distinction between discrete (or “countable”) and continuous (or “uncountable”) categories, in that multiplicities are for the former and proportions are for the latter. The distinction between discrete and continuous categories is not an absolute one, however, and though it admits of clear paradigms on either side, more or less any category can, with sufficient ingenuity, be taken in either sense. Paradigmatically discrete categories include those encoded in "leg", "frog", or "person", and multiplicities involving these are common: "one leg", "two legs", "several frogs", "every person", and so on. Paradigmatically continuous categories include those encoded in "air", "meat", or "water", and proportions involving these are similarly common: "most air", "enough meat", "much water". But continuous stuff can be divided into discrete chunks, and thereby enumerated: "several meats", for example, can be used to refer to several types of meat. On the other side, discrete things can be bundled together into a continuous whole, with a view to saying something about a proportion of that whole: "all people", "most cars", "enough frogs". Typically this goes along with the plural; but the singular is also possible, if there is some way of intelligibly recasting the category as continuous: "enough leg", for example, might be used in talk about a continuous amount of chicken-leg meat.

More precisely, we can note the following general distinction between enumerating and amassing quantifiers: the former insist rigidly on the grammatical number of their underlying objects, some always taking the singular, others always taking the plural; the latter are flexible in this regard. "All water" and "all waters" are both fine, for example, as are "most meat" and "most meats". But while "one car", "each car", and "every car" are all fine, "one cars", "each cars", and "every cars" are not; conversely, "two cars", "several cars", and "many cars" are fine, but "two car", "several car", and "many car" are not. There is, however, an exception to this rule: the amassing quantifier Much behaves like an enumerating quantifier, in that it insists rigidly on the singular: "much time" is fine, but "much times" is not. Yet Much cannot be an enumerating quantifier on semantic grounds: it plainly serves to denote a proportion rather than a multiplicity. I expect there is some explanation for this anomaly. At present, the best one I have is based on the (unique) relationship that holds between Much and Many, the latter doing for discrete categories exactly what the former does for continuous ones. Consequently there is nothing for "much times" to convey that isn’t already conveyed by "many times".
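The agreement rule, exception included, can be put as a small predicate. This is a Python sketch with my own names; it treats Some and Any in their flexible amassing use, and ignores the Integer quantifiers, which agree with the number they carry.

```python
SINGULAR_ONLY = {"A", "Each", "Every", "Much"}  # Much is the noted anomaly
PLURAL_ONLY = {"Several", "Many", "Both"}

def agrees(quantifier, plural):
    """Check quantifier/noun number agreement: rigid for the enumerating
    quantifiers (and Much), flexible for the rest."""
    if quantifier in SINGULAR_ONLY:
        return not plural
    if quantifier in PLURAL_ONLY:
        return plural
    return True  # All, Most, Enough, Some, Any: both numbers are fine

agrees("Each", plural=False)  # True: "each car"
agrees("Each", plural=True)   # False: no "each cars"
agrees("Much", plural=True)   # False: no "much times"
agrees("All", plural=True)    # True: "all waters"
```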

By this criterion, Some and Any would be amassing quantifiers: "some water" and "some waters" are both fine, as are "any person" and "any people". On closer inspection, however, it seems that these two quantifiers can be used in enumerations as well as proportions (when used in enumerations, they are like A, Each, and Every in insisting on the singular). The distinction reveals itself when we consider how these quantifiers most naturally interact with (paradigmatically) discrete and continuous categories in the singular: "some water" is most naturally taken as referring to some proportion of water, while "some car" is most naturally taken as referring to some one car; "any tea" may be taken in an enumerated sense, to refer to any (one) type of tea, or in an amassed sense, to refer to any amount of tea.

4. Negation

When they target the main object, the AMASSED and ENUMERATED elaborations may—depending on the quantifier—change the target of the NEGATIVE elaboration from the condition to the multiplicity or proportion itself. The result is either a "not" prefixed to the determiner or, in the case of "some", the replacement of this word with "no". For example:

NEGATIVE ENUMERATED -1 ( Many, "person" ) ( Others, Do "like", [ Female "Grannie" ] )
  -> "Not many people like Grannie."

NEGATIVE ENUMERATED -1 ( Some, "one" ) ( Other, Be, "good enough", [ For, Hearer ] )
  -> "No one is good enough for you."

NEGATIVE AMASSED -1 ( All, "apple" ) ( Others, Be, "red" )
  -> "Not all apples are red."

Some enumerating quantifiers, however, are not negatable in this way: no English sentence begins with "not a(n)", "not several", or "not each". It must be something in the nature of the quantifiers A, Several, and Each that precludes this, but I confess I am not quite able to put my finger on what that something is.
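On the rendering side, the rule just described amounts to something like the following. This is a Python sketch over bare determiner words, with my own names; the model itself of course operates on messages, not strings.

```python
NOT_NEGATABLE = {"a", "an", "several", "each"}

def negate_determiner(word):
    """Negate a quantifier's determiner: "some" becomes "no", most others
    take a "not" prefix, and a few resist negation altogether."""
    if word in NOT_NEGATABLE:
        raise ValueError(f'no English sentence begins with "not {word}"')
    if word == "some":
        return "no"
    return "not " + word

negate_determiner("many")  # "not many"
negate_determiner("all")   # "not all"
negate_determiner("some")  # "no"
```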

5. Scope Ambiguities

The interaction of the INDIRECT, ENUMERATED, and AMASSED elaborations, both with each other and with other elaborations, is another considerable source of ambiguity, since the order in which these elaborations are applied typically has no effect on the output sentence. Philosophers are entirely familiar with quantifiers and their scope ambiguities, and the way in which my model handles the phenomena here is not at all unusual or surprising, so I can be relatively brief.

The sentence "Everyone loves someone", for example, admits of two readings: on one reading, Some has widest scope, and the claim is that there is some special person who has the remarkable property of being loved by everyone; on the other, Every has widest scope, and the claim is merely that everyone has some special person in their lives (not necessarily the same person for all). These ambiguities receive exactly the sort of treatment in my system that you would expect: they depend on the order in which the (in this case ENUMERATED) elaborations are applied, something which leaves no mark on the output sentence:

ENUMERATED 0 ( Some, "one" ) ( ENUMERATED -1 ( Every, "one" ) ( Other, Do "love", [ Other ] ) )
  -> "Everyone loves someone." -- lucky him

ENUMERATED -1 ( Every, "one" ) ( ENUMERATED 0 ( Some, "one" ) ( Other, Do "love", [ Other ] ) )
  -> "Everyone loves someone." -- lucky them
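In other words, the two messages are structurally distinct even though their encodings coincide. A Python sketch of the two structures as nested tuples (the representation, though not the analysis, is mine):

```python
# A simplified nucleus: ( Other, Do "love", [ Other ] )
nucleus = ("Other", 'Do "love"', ["Other"])

# Some takes widest scope: one special person loved by everyone
some_wide = ("ENUMERATED", 0, ("Some", "one"),
             ("ENUMERATED", -1, ("Every", "one"), nucleus))

# Every takes widest scope: everyone has someone or other
every_wide = ("ENUMERATED", -1, ("Every", "one"),
              ("ENUMERATED", 0, ("Some", "one"), nucleus))

# Two distinct messages, one output sentence: "Everyone loves someone."
assert some_wide != every_wide
```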

When what exactly is picked out by some description, multiplicity, or proportion depends on the time at which it is picked out, the order of application of any elaboration that affects the time of the condition’s satisfaction also matters. For example, the sentence "The president of the United States was a Democrat" has two readings. The first maintains that the current president was, at some point in the past, a Democrat (but has perhaps changed affiliations since). The second maintains that, at some point in the past, the then president was a Democrat (but perhaps the current president is not). As you would expect, this difference boils down, in my system, to the difference between an INDIRECT PAST message (the current president used to be a Democrat) and a PAST INDIRECT message (it used to be that the then president was a Democrat). And as with the past, so with counterfactual possibilities: "The president of the United States could have been a woman" has two readings, a PRIOR PAST PREORDAINED INDIRECT reading (Hilary could have won) and an INDIRECT PRIOR PAST PREORDAINED reading (Donald could have had a sex change).

Though my account of these ambiguities is in essence the familiar and standard one, it is worth pointing out that it is only thanks to the codebreaker’s methodology that it can actually be applied to English. Because ambiguities like these do not arise in formal languages like the predicate calculus, logicians and philosophers have become complacent with regard to them. I defy anyone to account for them in English, however, if English is modelled as a function from sentences to messages. This is for the very simple reason that, to one and the same English sentence, there corresponds more than one message. This is the way of the English code quite generally. As such, it must be modelled as a function from messages to sentences rather than the other way around.

6. Further Work

Here, as elsewhere, much work remains. For example, I cannot yet account for multi-word determiner phrases like "too little" or "too much". I am as yet unsure whether these should simply be hard-coded as the results of additional quantifiers, or whether it should be possible to construct them out of smaller parts (and if so, how this should be done). I am also unable to account for noun phrases containing more than one article, relative pronoun, or determiner, like "all the king's horses", "some of the time", "enough of Grannie's nonsense", and so on. On the face of it, it is tempting to diagnose these as involving multiple INDIRECT, ENUMERATED, or AMASSED elaborations applied to the same underlying object, and this may well prove the right analysis. But I simply have not considered the data here enough to venture this hypothesis with any confidence. There is also the puzzle of why the little word "of" creeps into so many of these phrases. For the time being, my model tolerates only one of these three elaborations applied to any object.

I mentioned already in the page on plain messages, section 4 that I cannot yet account for sentences like "I am here / there", still less sentences like "I am somewhere / anywhere / everywhere". I am reasonably confident that the messages resulting in the latter are ENUMERATED elaborations of the messages encoded in the former. But until I have incorporated places into my model, I cannot deal with multiplicities involving them either. There is also the fact that descriptions can—unless the pointer they involve is The—make do with no haystack at all: "this", "that", "these", and "those" make for complete noun phrases in their own right; in the case of the RelatedTo pointer, meanwhile, the absence of a haystack triggers the second relative form of the pronoun instead of the first: "mine", "yours", "hers", etc. This is a relatively easy addition, but as always one must pause somewhere.
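When the haystack is eventually made optional, the pronoun choice might look like this. A Python sketch of the rule just described, with my own names and an abridged table:

```python
RELATIVE_FORMS = {
    # (simplified) object: (first relative form, second relative form)
    "Speaker": ("my", "mine"),
    "Hearer": ("your", "yours"),
    "Female": ("her", "hers"),
}

def relative_pronoun(obj, haystack=None):
    """A haystack selects the first relative form ("your book");
    its absence selects the second ("yours")."""
    first, second = RELATIVE_FORMS[obj]
    return first + " " + haystack if haystack else second

relative_pronoun("Hearer", "book")  # "your book"
relative_pronoun("Hearer")          # "yours"
```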

I have restricted my implementation of INDIRECT, ENUMERATED, and AMASSED elaborations to the targeting of objects. The time argument for the PAST and PREORDAINED elaborations can also be targeted by the INDIRECT and ENUMERATED elaborations, however, giving rise to phrases like "the day after tomorrow", "your birthday", "one day soon", or "several years ago". The duration argument for the EXTENDED elaboration, furthermore, can apparently be targeted, at least somewhere in its inner workings, by the ENUMERATED and AMASSED elaborations, at least partially accounting for phrases like "for three hours", "for several minutes", or "all day".

The frequency argument of the REGULAR elaboration, for its part, often (if not always) has the air of a proportion of occasions, with an amassing quantifier at its heart: "sometimes" appears to result from the Some quantifier, and "always" from the All quantifier; perhaps "often" results (somewhat less obviously) from the Much quantifier. What is more, when the frequency argument is absent, this has very much the feeling of a proportion of occasions with no quantifier, sharing the same sense of generality discovered in, for instance, "Dogs have four legs". I would not say that the frequency argument is itself a potential target of the AMASSED elaboration; rather, it appears that it already is, in and of itself, a proportion. And if the frequency argument of REGULAR elaborations is a proportion, then the tally argument of SCATTERED elaborations is surely a multiplicity: "once", "twice", "two times", "several times", "many times", etc. My model currently leaves the frequency and tally arguments unearthed, but I have little doubt that the key to digging them up will be the amassing and enumerating quantifiers respectively.