The improbably complex Tagalog verb

Here is an excerpt from one of our papers that describes how TBTA can be used for Tagalog, the national language of the Philippines. I hope anyone can read through it with some interest, but it’s written with linguistics geeks in mind.

Tagalog is a difficult language! Not only is it difficult for second language learners, but it is problematic in several ways for computational systems. In this section we will present the basics of the Tagalog verb and discuss several factors that can cause difficulties in computational systems. In the next section we will describe the LA implementation.

An extremely important note to begin with: this linguistic description is not meant to be authoritative in any way. The Bible Translator’s Assistant (TBTA) team distilled these main points during approximately one month of working through Schachter & Otanes’ Tagalog Reference Grammar [12]. It is inevitable that our understanding is incomplete and in some cases completely incorrect. We found it interesting to see how far we could get implementing a language from a thorough reference such as Schachter and Otanes. This section, therefore, represents the starting point from which the real work will begin. We want to emphasize that the “real” work will be done by Philippine linguists, researchers and students. We do believe, however, that we can accurately describe several aspects of the language that make it difficult computationally and subsequently show how TBTA can handle those complexities with relative ease.

At the clause level, Tagalog is a focus- or topic-oriented language, as opposed to being organized by grammatical roles such as Subject and Direct Object. The topic usually contains “old information”; that is, objects or participants that have been previously introduced into the context. It can be approximated in English by these three situations surrounding a “give” event:

  • Actor focus: some man did something… It was the man who gave some money to a girl. (“the man” is topic)
  • Object focus: something happened concerning some money… It was the money that a man gave to a girl. (“the money” is topic)
  • Direction focus: some girl is being talked about… It was the girl to whom a man gave some money. (“the girl” is topic)

The speaker determines which constituent of the clause is in focus and places that constituent in the topic position. There are a variety of heuristics for choosing a topic; we will briefly discuss some of them in the next section. Obviously there are other important clause-level phenomena in Tagalog (word order, marking the nominal constituents, etc.), but our focus in this paper is on the verb. Furthermore, this paper will only describe basic sentences with one of the three types of focus shown above; Tagalog has other, more complicated constructions that we will not cover.

The verb is marked by an affix to show which nominal constituent is the topic. There is a relatively complicated system of affixes to accomplish this. Schachter and Otanes divide verbs into three general classes:[1]

  1. object verbs. For example, kita (to see)
  2. directional verbs. masid (to look at); talo (to win against)
  3. double object verbs. bigay (to give)

In general, object verbs correspond to English verbs that take a direct object; directional verbs correspond to English verbs that take an indirect object or prepositional phrase argument; and double object verbs correspond to ditransitive English verbs. But these distinctions are very approximate.

An object verb can be marked with either actor focus (AF) or object focus (OF). A directional verb can be marked with either actor focus (AF) or directional focus (DF). And a double object verb can be marked with actor focus (AF), object focus (OF) or directional focus (DF). There are approximately 15 common affixes (including prefixes, infixes, suffixes, and combinations of prefixes and suffixes) used to mark the focus type of a verb. Each Tagalog verb uses a particular set of affixes; Schachter and Otanes call this set the affix correspondence set of the verb. Table 1 shows the affix correspondence classes for several verbs.

To recapitulate: the speaker chooses which constituent is in focus; the verb is then marked with the appropriate affix to indicate whether it is AF, OF or DF.
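The relationship between verb class and available focus types can be written down as a small lookup. This is only a sketch: the names ALLOWED_FOCUS and can_mark are ours and hypothetical, not part of TBTA or LA.

```python
# Hypothetical names; the mapping itself restates the three verb classes above.
ALLOWED_FOCUS = {
    "object": ("AF", "OF"),
    "directional": ("AF", "DF"),
    "double object": ("AF", "OF", "DF"),
}


def can_mark(verb_class: str, focus: str) -> bool:
    """Return True if a verb of this class can be marked with this focus."""
    return focus in ALLOWED_FOCUS[verb_class]


print(can_mark("object", "OF"))       # an object verb can take object focus
print(can_mark("directional", "OF"))  # a directional verb cannot
```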

Complication 1. Some verbs have irregular forms when the -an or -in suffix is used. For example, the kinig base becomes kingg. Note: Tagalog orthography represents the velar nasal as “ng”. To further complicate matters, the addition of aspect sometimes deletes the -in focus suffix, which causes the stem to revert to its regular form. This will be covered in Complication 5 below.

Table 1: Affix correspondence class for select verbs

Verb                      Type         AF      OF        DF
hiwa (to cut)             object       mag-    -in
kinig (to listen to)      object       ma-     pa-…-an
tahi (to sew)             object       -um-    -in
kita (to see)             object       maka-   ma-
himasok (to meddle in)    directional  mang-             pang-…-an
tuto (to learn)           object       ma-     ma-…-an
talo (to win against)     directional  mang-             ma-
bahagi (to apportion)     object       mang-   ipang-
kailangan (to need)       object       mang-   -in
bigay (to give)           double obj   mag-    i-        -an
kudkod (to grate)         object       mag-    -in
damdam (to feel)          object       maka-   ma-…-an
tuto (to learn)           object       ma-     ma-…-an
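For computational purposes, Table 1 is naturally a lexicon keyed by verb base. The sketch below shows one possible encoding; AffixSet, AFFIX_SETS and focus_affix are hypothetical names of ours, and None stands for a focus type the verb does not take. It does not claim to reflect how LA stores this information.

```python
from typing import NamedTuple, Optional


class AffixSet(NamedTuple):
    """One row of Table 1 (hypothetical encoding)."""
    verb_class: str
    af: Optional[str]
    of: Optional[str]
    df: Optional[str]


AFFIX_SETS = {
    "hiwa": AffixSet("object", "mag-", "-in", None),
    "kinig": AffixSet("object", "ma-", "pa-…-an", None),
    "tahi": AffixSet("object", "-um-", "-in", None),
    "kita": AffixSet("object", "maka-", "ma-", None),
    "himasok": AffixSet("directional", "mang-", None, "pang-…-an"),
    "tuto": AffixSet("object", "ma-", "ma-…-an", None),
    "talo": AffixSet("directional", "mang-", None, "ma-"),
    "bahagi": AffixSet("object", "mang-", "ipang-", None),
    "kailangan": AffixSet("object", "mang-", "-in", None),
    "bigay": AffixSet("double obj", "mag-", "i-", "-an"),
    "kudkod": AffixSet("object", "mag-", "-in", None),
    "damdam": AffixSet("object", "maka-", "ma-…-an", None),
}


def focus_affix(base: str, focus: str) -> Optional[str]:
    """Look up the affix for a focus type ("AF", "OF" or "DF")."""
    return getattr(AFFIX_SETS[base], focus.lower())


print(focus_affix("bigay", "DF"))
print(focus_affix("himasok", "OF"))
```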

Aspect. Tagalog does not mark tense in a traditional sense. Instead it uses an aspectual system with the following possible values: contemplative, imperfective and perfective.[2] These aspects are marked on the verb in the following manner:

  • Contemplative is marked by reduplication.
  • Imperfective is marked by reduplication and by adding N-.
  • Perfective is marked by adding N-.

Reduplication. The contemplative and imperfective aspects include reduplication. Tagalog syllables are CV or CVC.[3] Reduplication associated with aspect copies only the initial CV. Thus hiwa -> hihiwa and kudkod -> kukudkod.
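As a minimal sketch of the two ingredients described so far, the block below records which aspects reduplicate and which add N-, and implements the initial-CV copy. The names ASPECT_MARKING and reduplicate_cv are hypothetical. Two simplifying assumptions are ours: the digraph “ng” is treated as a single consonant, and a vowel-initial base copies only its vowel because the glottal stop is not written.

```python
VOWELS = "aeiou"

# aspect -> (reduplicate?, add N-?), restating the three bullets above
ASPECT_MARKING = {
    "contemplative": (True, False),
    "imperfective": (True, True),
    "perfective": (False, True),
}


def reduplicate_cv(base: str) -> str:
    """Prefix a copy of the base's initial CV to the base."""
    if base.startswith("ng"):
        onset = 2          # "ng" spells one consonant (velar nasal)
    elif base[0] in VOWELS:
        onset = 0          # unwritten glottal stop: copy the vowel only
    else:
        onset = 1
    return base[: onset + 1] + base


print(reduplicate_cv("hiwa"))
print(reduplicate_cv("kudkod"))
```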

Complication 2. In some cases, the base is reduplicated before the focus prefix is added. In other cases, the focus prefix is added first and then the prefix itself is reduplicated. We have made a first approximation of a rule to describe these two possibilities: if the focus-related prefix is one syllable long (or there is no focus-related prefix; i.e. only an infix or suffix is used), then reduplicate the base before adding the prefix. If the focus prefix is more than one syllable long, then add the focus prefix to the base before reduplication (the last syllable of the focus prefix is then reduplicated). For example, maka-kita (where maka- is the focus prefix) -> makaka-kita.
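The first-approximation rule of Complication 2 can be sketched as follows. The function names are hypothetical, syllables are counted simply as vowel letters, and the sketch only handles multi-syllable prefixes that end in a vowel (such as maka-); anything else raises an error rather than guessing.

```python
VOWELS = "aeiou"


def reduplicate_cv(base: str) -> str:
    """Copy the initial CV of the base; "ng" counts as one consonant."""
    onset = 2 if base.startswith("ng") else (0 if base[0] in VOWELS else 1)
    return base[: onset + 1] + base


def syllable_count(affix: str) -> int:
    """Approximate syllables by counting vowel letters."""
    return sum(ch in VOWELS for ch in affix)


def contemplative(prefix: str, base: str) -> str:
    """Combine a focus prefix and a base, reduplicating per Complication 2."""
    p = prefix.strip("-")
    if syllable_count(p) <= 1:
        # Short or missing prefix: reduplicate the base, then add the prefix.
        redup = reduplicate_cv(base)
        return f"{p}-{redup}" if p else redup
    if p[-1] not in VOWELS:
        raise ValueError("sketch only handles vowel-final long prefixes")
    # Long prefix: copy the prefix's last (CV) syllable instead.
    return f"{p}{p[-2:]}-{base}"


print(contemplative("maka-", "kita"))
print(contemplative("mag-", "hiwa"))
```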

Complication 3. Focus prefixes that end in “ang-” (i.e. ipang-, mang- and pang-) cause a morphophonemic alternation in the base before reduplication. This occurs even though, in a computational sense, the reduplication must occur before the prefix is added. Thus:

mang- + redup(bahagi) -> ma- + redup(mahagi) -> ma-mamahagi

mang- + redup(talo) -> ma- + redup(nalo) -> ma-nanalo[4]

mang- + redup(kailangan) -> ma- + redup(ngailangan) -> ma-ngangailangan
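The three derivations above can be reproduced with a small sketch. The names NASAL_FOR and assimilate are hypothetical. The consonant table is an assumption of ours: only b, t and k are attested in the examples here, and p and s are added as further bilabial and alveolar cases. Any other initial consonant leaves both prefix and base unchanged.

```python
VOWELS = "aeiou"

# Base-initial consonant -> nasal that replaces it (partly assumed, see text).
NASAL_FOR = {"b": "m", "p": "m", "t": "n", "s": "n", "k": "ng"}


def reduplicate_cv(base: str) -> str:
    """Copy the initial CV of the base; "ng" counts as one consonant."""
    onset = 2 if base.startswith("ng") else (0 if base[0] in VOWELS else 1)
    return base[: onset + 1] + base


def assimilate(prefix: str, base: str) -> tuple:
    """Apply the -ang alternation, returning the new (prefix, base)."""
    p = prefix.rstrip("-")
    if p.endswith("ang") and base[0] in NASAL_FOR:
        # The prefix loses its "ng"; the base's first consonant becomes nasal.
        return p[:-2] + "-", NASAL_FOR[base[0]] + base[1:]
    return prefix, base


def af_contemplative(prefix: str, base: str) -> str:
    """Alternate the base first, then reduplicate it, then add the prefix."""
    new_prefix, new_base = assimilate(prefix, base)
    return new_prefix + reduplicate_cv(new_base)


for verb in ("bahagi", "talo", "kailangan"):
    print(af_contemplative("mang-", verb))
```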

The N- Aspectual morpheme. The imperfective and perfective aspects add the N- morpheme.

Complication 4. The realization of N- is complex.

  • If the focus prefix begins with “m”, simply change this to “n”. For example, N- + ma-kikinig -> na-kikinig
  • If the focus infix is -um-, delete N-. Thus, N- + t-um-atahi -> t-um-atahi
  • If the focus prefix is i-, then if the base starts with an L (l,r,w or y) or H (h or the glottal stop), add ni- after the focus prefix.[5] Thus N- + i-walisan -> i-ni-walisan. But if the base does not start with L or H, add the -in- infix after the first letter of the base. Thus N- + i-bigay -> i-b-in-igay.
  • If the focus prefix is not i- but it begins with “i” (for example ipag-), then insert the -in- infix after the second letter of the focus prefix. For example, N- + ipag-bili -> ip-in-ag-bili.
  • Otherwise, if the “basic form” (base plus any focus affix) begins with L, add ni- (we could not find examples of this case). Otherwise, add the -in- infix after the first letter of the basic form. Thus, N- + bigay-an -> b-in-igay-an.
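The cases above form an ordered decision list, which can be sketched as a single function. This is not the LA rule set; realize_n and its parameters are hypothetical. The function takes the focus prefix and the rest of the word separately, uses the hyphenated notation of this section, and treats a vowel-initial base as belonging to the H class because its glottal stop is unwritten.

```python
L_CLASS = set("lrwy")
H_CLASS = set("h") | set("aeiou")  # vowel-initial = unwritten glottal stop


def realize_n(prefix: str, stem: str, um_infix: bool = False) -> str:
    """Realize N- on prefix + stem, following the cases in order."""
    p = prefix.rstrip("-")
    if um_infix:
        return stem                          # N- is deleted with -um-
    if p.startswith("m"):
        return f"n{p[1:]}-{stem}"            # m -> n
    if p == "i":
        if stem[0] in L_CLASS | H_CLASS:
            return f"i-ni-{stem}"            # ni- after the i- prefix
        return f"i-{stem[0]}-in-{stem[1:]}"  # -in- inside the base
    if p.startswith("i"):
        return f"{p[:2]}-in-{p[2:]}-{stem}"  # -in- inside the prefix
    form = f"{p}-{stem}" if p else stem      # the "basic form"
    if form[0] in L_CLASS:
        return f"ni-{form}"
    return f"{form[0]}-in-{form[1:]}"


print(realize_n("ma-", "kikinig"))
print(realize_n("i-", "bigay"))
print(realize_n("ipag-", "bili"))
```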

Overview of complications so far. From a computational point of view, it would be nice to have the following series of actions that apply to the base to form the imperfective:

  1. If there is a focus suffix, substitute the irregular base if necessary. For example, kinig -> kingg
  2. Add any focus suffix to the base. For example, kingg + -an -> kingg-an.
  3. Reduplicate initial CV of base: kikingg-an
  4. Add the focus prefix: pa-kikingg-an
  5. Add N-: p-in-a-kikingg-an
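The five steps can be sketched as a single straight-line function. All names here are hypothetical, and step 5 is deliberately reduced to two cases (m -> n, otherwise infix -in- after the first letter), so this is the wished-for pipeline rather than a working analysis of Tagalog.

```python
VOWELS = "aeiou"
IRREGULAR_BASES = {"kinig": "kingg", "sunod": "sund"}  # used before a suffix


def reduplicate_cv(base: str) -> str:
    """Copy the initial CV of the base; "ng" counts as one consonant."""
    onset = 2 if base.startswith("ng") else (0 if base[0] in VOWELS else 1)
    return base[: onset + 1] + base


def naive_imperfective(base: str, prefix: str = "", suffix: str = "") -> str:
    """Apply steps 1-5 in a fixed order, with no special cases."""
    p, s = prefix.rstrip("-"), suffix.lstrip("-")
    stem = IRREGULAR_BASES.get(base, base) if s else base  # step 1
    stem = f"{stem}-{s}" if s else stem                    # step 2
    stem = reduplicate_cv(stem)                            # step 3
    form = f"{p}-{stem}" if p else stem                    # step 4
    if form.startswith("m"):                               # step 5 (simplified)
        return "n" + form[1:]
    return f"{form[0]}-in-{form[1:]}"


print(naive_imperfective("kinig", prefix="pa-", suffix="-an"))
print(naive_imperfective("kita", prefix="maka-"))
```

For kinig this fixed order yields p-in-a-kikingg-an, as in the steps above. For kita with maka- it yields naka-kikita, whereas the form given in this paper is nakaka-kita.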

Unfortunately, such a straightforward series of actions is impossible. Complication 1 shows that we cannot substitute the irregular base solely on the basis of the presence of a focus suffix because some aspects will delete the -in focus suffix. Complication 2 shows that in some cases the prefix is reduplicated instead of the base. Complication 3 shows that some prefixes will cause a change to the base before reduplication of the base (that is, before the prefix itself is actually added!). Complication 4 shows that the realization of N- can affect the prefix or the base. Therefore sometimes it needs to be applied before the prefix is added, sometimes after.

In summary, you need to know the focus prefix before you can reduplicate, but in some cases you cannot actually add the focus prefix until after reduplication. You cannot reduplicate until you know the correct base. Some focus suffixes require an irregular base, yet some aspects remove the suffix, and the base then reverts to its regular form (see Complication 5).

Complication 5. The addition of N- by the imperfective and perfective aspects causes an -in focus suffix to be deleted. This in turn causes any irregular base to revert to its regular form. Thus sunod + -in -> sundin, but the perfective is N- + sundin, which deletes the -in suffix and causes the base to revert to its regular form, giving N- + sunod -> sinunod.
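A minimal sketch of this interaction, for verbs whose object focus suffix is -in: the basic form uses the irregular base, while the perfective is built from the regular base because the suffix has been deleted. The function names are hypothetical, forms are written without hyphens as in sundin and sinunod, and the morphophonemic rules of Complication 6 are ignored.

```python
IRREGULAR_BASES = {"sunod": "sund"}  # irregular stem used before -in


def basic_form_in(base: str) -> str:
    """Base plus the -in focus suffix, using the irregular stem if any."""
    return IRREGULAR_BASES.get(base, base) + "in"


def perfective_in(base: str) -> str:
    """N- deletes -in, so the regular base returns and N- surfaces on it."""
    if base[0] in "lrwy":
        return "ni" + base            # ni- before l, r, w, y
    return base[0] + "in" + base[1:]  # -in- after the first letter


print(basic_form_in("sunod"))
print(perfective_in("sunod"))
```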

Complication 6. There are at least two morphophonemic rules that operate on top of all this:

  1. A base-final or base-initial d -> r when surrounded by vowels. Thus the contemplative, object focus (OF) of kudkod is kukudkod + -in -> kukudkur-in. (The next morphophonemic rule changes the final “o” to “u”.) Note also the interaction with Complication 5 when we move to the imperfective of kudkod: N- + kukudkur-in -> k-in-ukudkod (the -in suffix is deleted, which removes the environment for the d -> r change). Also, ma- + damdam -> ma-ramdam. Notice that this occurs after reduplication, so we get ma-dadamdam -> ma-radamdam, not *ma-raramdam.
  2. o -> u before -in or -an, and oC -> uC in the same environment. Thus the kukudkod + -in -> kukudkur-in example above.
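Both rules can be sketched as a post-processing step over an already assembled prefix, stem and suffix. The function name apply_rules is hypothetical, and the sketch assumes the stem has already been reduplicated, which is what keeps the second d of ma-dadamdam intact.

```python
import re

VOWELS = "aeiou"


def apply_rules(prefix: str, stem: str, suffix: str) -> str:
    """Apply o -> u and d -> r, then join the parts with hyphens."""
    p, s = prefix.rstrip("-"), suffix.lstrip("-")
    if s in ("in", "an"):
        # Rule 2: final o (optionally followed by one consonant) -> u.
        stem = re.sub(r"o([^aeiou]?)$", r"u\1", stem)
    if s and stem[-1] == "d" and stem[-2] in VOWELS and s[0] in VOWELS:
        stem = stem[:-1] + "r"   # Rule 1, stem-final d between vowels
    if p and stem[0] == "d" and p[-1] in VOWELS and stem[1] in VOWELS:
        stem = "r" + stem[1:]    # Rule 1, stem-initial d between vowels
    return "-".join(part for part in (p, stem, s) if part)


print(apply_rules("", "kukudkod", "-in"))
print(apply_rules("ma-", "dadamdam", "-an"))
```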

The following verbs exemplify most of these processes and complications. All of these verbs were produced automatically by the current LA implementation. For each verb base we show the contemplative, imperfective and perfective for each type of focus. See section 5 for several sample derivations.

hiwa: (to cut; N- changes m to n; N- -> -in-; -in deleted with -in-)

    contemplative         imperfective          perfective
AF: Mag-hihiwa            Nag-hihiwa            Nag-hiwa
OF: Hihiwa-in             H-in-ihiwa            H-in-iwa

kinig: (to listen to; irregular base with -an)

AF: Ma-kikinig            Na-kikinig            Na-kinig
OF: Pa-kikingg-an         Pina-kikingg-an       Pina-kingg-an

tahi: (to sew; -um- deleted in contemplative; N- deleted with -um-)

AF: Tatahi                T-um-atahi            T-um-ahi
OF: Tatahi-in             T-in-atahi            T-in-ahi

kita: (to see; 2 syllable prefix reduplicated instead of base)

AF: Makaka-kita           Nakaka-kita           Naka-kita
OF: Ma-kikita             Na-kikita             Na-kita

himasok: (to meddle in; oC -> uC before -in or -an)

AF: Mang-hihimasok        Nang-hihimasok        Nang-himasok
DF: Pang-hihimasuk-an     Pinang-hihimasuk-an   Pinang-himasuk-an

tuto: (to learn; o-v -> u-v)

AF: Ma-tututo             Na-tututo             Na-tuto
OF: Ma-tututu-an          Na-tututu-an          Na-tutu-an

talo: (to win against; uses mang- + alveolar -> ma- + n; stem change before reduplicating stem)

AF: Ma-nanalo             Na-nanalo             Na-nalo
DF: Ma-tatalo             Na-tatalo             Na-talo

bahagi: (to apportion; uses mang- + bilabial -> ma- + m; ipang- + bilabial -> ipa- + m; stem change before reduplicating stem)

AF: Ma-mamahagi           Na-mamahagi           Na-mahagi
OF: Ipapa-mahagi          Ipinapa-mahagi        Ipina-mahagi

kailangan: (to need; uses mang- + velar -> nasalize velar then duplicate if necessary)

AF: Ma-ngangailangan      Na-ngangailangan      Na-ngailangan
OF: Kakailangan-in        K-in-akailangan       K-in-ailangan

bigay: (to give to; N- + iC -> iCin…)

AF: Mag-bibigay           Nag-bibigay           Nag-bigay
OF: I-bibigay             I-b-in-ibigay         I-b-in-igay
DF: Bibigay-an            B-in-ibigay-an        B-in-igay-an

kudkod: (to grate; vd-v -> vr-v; oC{-in | -an} -> uC{-in | -an} but the latter disappears in imperfective/perfective because the -in disappears)

AF: Mag-kukudkod          Nag-kukudkod          Nag-kudkod
OF: Kukudkur-in           K-in-ukudkod          K-in-udkod

damdam: (to feel; v-dv -> v-rv)

AF: Makaka-ramdam         Nakaka-ramdam         Naka-ramdam
OF: Ma-radamdam-an        Na-radamdam-an        Na-ramdam-an

tuto: (to learn; o{-in | -an} -> u{-in | -an})

AF: Ma-tututo             Na-tututo             Na-tuto
OF: Ma-tututu-an          Na-tututu-an          Na-tutu-an

[1] We will not consider intransitive verbs in this paper. In general they do not complicate the verbal system beyond what is described below.

[2] There is also a recent perfective that we will not cover here.

[3] Syllables that apparently start with a vowel have a glottal stop that is not indicated in the orthography. LA includes the glottal stop in its underlying representation but deletes it in the final translation.

[4] As an example of the difficulty in learning Tagalog, compare this with the DF contemplative of the same verb, talo, which is ma- + redup(talo) -> ma-tatalo.

[5] There is another optional realization (using an -in- infix), but computationally speaking, we can just choose one.