The improbably complex Tagalog verb

Here is an excerpt from one of our papers that describe how TBTA can be used in Tagalog, the national language in the Philippines. I hope anybody can read through it with some amount of interest, but it’s written with linguistic geeks in mind.

Tagalog is a difficult language! Not only is it difficult for second language learners, but it is problematic in several ways for computational systems. In this section we will present the basics of the Tagalog verb and discuss several factors that can cause difficulties in computational systems. In the next section we will describe the LA implementation.

An extremely important note to begin with: this linguistic description is not meant to be authoritative in any way. The Bible Translator’s Assistant (TBTA) team distilled these main points during approximately one month of working through Schachter & Otanes’ Tagalog Reference Grammar [12]. It is inevitable that our understanding is incomplete and in some cases completely incorrect. We found it interesting to see how far we could get implementing a language from a thorough reference such as Schachter and Otanes. This section, therefore, represents the starting point from which the real work will begin. We want to emphasize that the “real” work will be done by Philippine linguists, researchers and students. We do believe, however, that we can accurately describe several aspects of the language that make it difficult computationally and subsequently show how TBTA can handle those complexities with relative ease.

At the clause level, Tagalog is a focus- or topic-oriented language as opposed to being organized by grammatical roles such as Subject and Direct Object. The topic usually contains “old information”; that is, objects or participants that have been previously introduced into the context. It can be approximated in English by these three situations surrounding a “give” event:

  • Agent focus: some man did something… It was the man who gave some money to a girl. (“the man” is topic)
  • Object focus: something happened concerning some money… It was the money that a man gave to a girl. (“the money” is topic)
  • Direction focus: some girl is being talked about… It was the girl to whom a man gave some money. (“the girl” is topic)

The speaker determines which constituent of the clause is in focus and places that constituent in the topic position. There are a variety of heuristics for choosing a topic; we will briefly discuss some of them in the next section. Obviously there are other important clause-level phenomena in Tagalog (word order, marking of the nominal constituents, etc.), but our focus in this paper is on the verb. Furthermore, this paper will only describe basic sentences with one of the three types of focus shown above; Tagalog has other, more complicated constructions that we will not cover.

The verb is marked by an affix to show which nominal constituent is the topic. There is a relatively complicated system of affixes to accomplish this. Schachter and Otanes divide verbs into three general classes:[1]

  1. Object verbs, for example kita (to see)
  2. Directional verbs, for example masid (to look at) and talo (to win against)
  3. Double object verbs, for example bigay (to give)

In general, object verbs correspond to English verbs that take a direct object; directional verbs correspond to English verbs that take an indirect object or prepositional phrase argument; and double object verbs correspond to ditransitive English verbs. But these distinctions are very approximate.

An object verb can be marked with either actor focus (AF) or object focus (OF). A directional verb can be marked with either actor focus (AF) or directional focus (DF). And a double object verb can be marked with actor focus (AF), object focus (OF) or directional focus (DF). There are approximately 15 common affixes (including prefixes, infixes, suffixes, and combinations of prefixes and suffixes) used to mark the focus type of a verb. Each Tagalog verb uses a particular set of affixes; Schachter and Otanes call this set the affix correspondence set of the verb. Table 1 shows the affix correspondence classes for several verbs.

To recapitulate: the speaker chooses which constituent is in focus; the verb is then marked with the appropriate affix to indicate whether it is AF, OF or DF.

Complication 1. Some verbs have irregular forms when the -an or -in suffix is used. For example, the kinig base becomes kingg. Note: Tagalog orthography represents the velar nasal as “ng”. To further complicate matters, the addition of aspect sometimes deletes the -in focus suffix, which causes the stem to revert back to its regular form. This will be covered in complication 5 below.

Table 1: Affix correspondence classes for selected verbs

Verb                     Type            AF      OF         DF
hiwa (to cut)            object          mag-    -in
kinig (to listen to)     object          ma-     pa-…-an
tahi (to sew)            object          -um-    -in
kita (to see)            object          maka-   ma-
himasok (to meddle in)   directional     mang-              pang-…-an
tuto (to learn)          object          ma-     ma-…-an
talo (to win against)    directional     mang-              ma-
bahagi (to apportion)    object          mang-   ipang-
kailangan (to need)      object          mang-   -in
bigay (to give)          double object   mag-    i-         -an
kudkod (to grate)        object          mag-    -in
damdam (to feel)         object          maka-   ma-…-an
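Computationally, an affix correspondence set is a lookup from a base and the speaker's chosen focus to an affix. The Python sketch below encodes a few rows of Table 1 together with the class restrictions described above (object verbs take AF or OF, directional verbs AF or DF, double object verbs all three). The names `AFFIX_SETS`, `ALLOWED_FOCUS` and `focus_affix` are hypothetical illustrations, not part of TBTA or the LA implementation.

```python
# Which focus types each verb class allows (from the prose before Table 1).
ALLOWED_FOCUS = {
    "object": ("AF", "OF"),
    "directional": ("AF", "DF"),
    "double object": ("AF", "OF", "DF"),
}

# A few rows of Table 1: base -> (verb class, {focus: affix}).
# Hyphens show position: "mag-" prefix, "-in" suffix, "-um-" infix,
# "pa-…-an" prefix plus suffix.
AFFIX_SETS = {
    "hiwa": ("object", {"AF": "mag-", "OF": "-in"}),
    "kinig": ("object", {"AF": "ma-", "OF": "pa-…-an"}),
    "tahi": ("object", {"AF": "-um-", "OF": "-in"}),
    "himasok": ("directional", {"AF": "mang-", "DF": "pang-…-an"}),
    "talo": ("directional", {"AF": "mang-", "DF": "ma-"}),
    "bigay": ("double object", {"AF": "mag-", "OF": "i-", "DF": "-an"}),
}


def focus_affix(base: str, focus: str) -> str:
    """Return the affix that marks `focus` (AF, OF or DF) on `base`."""
    verb_class, affixes = AFFIX_SETS[base]
    if focus not in ALLOWED_FOCUS[verb_class]:
        raise ValueError(f"{base} ({verb_class} verb) does not take {focus}")
    return affixes[focus]
```

For example, `focus_affix("bigay", "DF")` returns `"-an"`, while asking for DF on the object verb hiwa raises an error, mirroring the restriction that object verbs only take AF or OF.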

Aspect. Tagalog does not mark tense in a traditional sense. Instead it uses an aspectual system with the following possible values: contemplative, imperfective and perfective.[2] These aspects are marked on the verb in the following manner:

  • Contemplative is marked by reduplication.
  • Imperfective is marked by reduplication and by adding N-.
  • Perfective is marked by adding N-.

Reduplication. The contemplative and imperfective aspects include reduplication. Tagalog syllables are CV or CVC.[3] Reduplication associated with aspect copies only the initial CV. Thus hiwa -> hihiwa and kudkod -> kukudkod.
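As a rough sketch (not the LA implementation), aspect reduplication of a bare base can be written as copying its initial CV. The function name `reduplicate_cv` is hypothetical; treating the digraph “ng” as one consonant and copying only the vowel of a vowel-initial base (whose glottal-stop onset is unwritten, per footnote 3) are assumptions based on the orthography notes in this section.

```python
VOWELS = set("aeiou")


def reduplicate_cv(base: str) -> str:
    """Prefix a copy of the base's initial CV: hiwa -> hihiwa.

    Assumptions: the onset is a single letter, except the digraph "ng"
    (the velar nasal); a vowel-initial base has an unwritten glottal-stop
    onset, so only the vowel is copied in the written form.
    """
    if base.startswith("ng"):
        onset = "ng"
    elif base[0] in VOWELS:
        onset = ""
    else:
        onset = base[0]
    vowel = base[len(onset)]
    return onset + vowel + base
```

With this, `reduplicate_cv("hiwa")` gives `"hihiwa"` and `reduplicate_cv("kudkod")` gives `"kukudkod"`, matching the examples above.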

Complication 2. In some cases, the base is reduplicated before the focus prefix is added. In other cases, the focus prefix is added first and then it is reduplicated. We have made a first approximation of a rule to describe these two possibilities: if a focus-related prefix is one syllable long (or there is no focus-related prefix, i.e., only an infix or suffix is used), then reduplicate the base before adding the prefix. If the focus prefix is more than one syllable, then add the focus prefix to the base before reduplication (the last syllable of the focus prefix is then reduplicated). For example, maka-kita (where maka- is the focus prefix) -> makaka-kita.
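The first-approximation rule can be sketched in Python as follows. The function names are hypothetical, syllables are counted by vowels (following the CV/CVC syllable structure described above), and the sketch implements only the rule as stated, not the exceptions a full grammar would need.

```python
VOWELS = set("aeiou")


def syllable_count(affix: str) -> int:
    # Syllables are CV or CVC, so each vowel marks one syllable.
    return sum(ch in VOWELS for ch in affix)


def reduplicate_cv(base: str) -> str:
    # Copy the initial CV of the base (one-letter onset, or the digraph "ng").
    onset = "ng" if base.startswith("ng") else ("" if base[0] in VOWELS else base[0])
    return onset + base[len(onset)] + base


def reduplicate_last_syllable(prefix: str) -> str:
    # Copy the CV of the prefix's final syllable: maka -> makaka.
    last_vowel = max(i for i, ch in enumerate(prefix) if ch in VOWELS)
    start = last_vowel
    if last_vowel > 0 and prefix[last_vowel - 1] not in VOWELS:
        start -= 1  # include the syllable's onset consonant
    return prefix[:start] + prefix[start:last_vowel + 1] + prefix[start:]


def prefix_and_reduplicate(prefix: str, base: str) -> str:
    """Attach `prefix` (no hyphen, "" if none) and reduplicate per the rule."""
    if syllable_count(prefix) <= 1:
        # One-syllable prefix (or none): reduplicate the base, then prefix it.
        redup = reduplicate_cv(base)
        return f"{prefix}-{redup}" if prefix else redup
    # Longer prefix: attach it, then reduplicate its last syllable.
    return f"{reduplicate_last_syllable(prefix)}-{base}"
```

Under these assumptions, `prefix_and_reduplicate("maka", "kita")` yields `"makaka-kita"` and `prefix_and_reduplicate("mag", "hiwa")` yields `"mag-hihiwa"`.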

Complication 3. Focus prefixes that end in “ang-” (i.e., ipang-, mang- and pang-) cause a morphophonemic alternation in the base before reduplication. This occurs even though, in a computational sense, the reduplication must occur before the prefix is added. Thus:

mang- + redup(bahagi) -> ma- + redup(mahagi) -> ma-mamahagi

mang- + redup(talo) -> ma- + redup(nalo) -> ma-nanalo[4]

mang- + redup(kailangan) -> ma- + redup(ngailangan) -> ma-ngangailangan
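A minimal sketch of Complication 3 combined with the reduplication rule of Complication 2, assuming the base-initial consonant alone decides the alternation. The text attests b -> m (bahagi), t -> n (talo) and k -> ng (kailangan); extending the table to p, d and s by place of articulation is an assumption, as are all function names.

```python
VOWELS = set("aeiou")

# Nasal that replaces a base-initial consonant after an ang- prefix.
# Attested in the text: b -> m, t -> n, k -> ng.
# p, d and s are assumptions based on place of articulation.
NASAL_FOR = {"b": "m", "p": "m", "t": "n", "d": "n", "s": "n", "k": "ng"}


def reduplicate_cv(base: str) -> str:
    # Copy the initial CV; "ng" counts as one consonant.
    onset = "ng" if base.startswith("ng") else ("" if base[0] in VOWELS else base[0])
    return onset + base[len(onset)] + base


def reduplicate_last_syllable(prefix: str) -> str:
    # Copy the CV of the prefix's final syllable: ipa -> ipapa.
    last_vowel = max(i for i, ch in enumerate(prefix) if ch in VOWELS)
    start = last_vowel
    if last_vowel > 0 and prefix[last_vowel - 1] not in VOWELS:
        start -= 1
    return prefix[:start] + prefix[start:last_vowel + 1] + prefix[start:]


def ang_contemplative(prefix: str, base: str) -> str:
    """Contemplative of a base carrying mang-, pang- or ipang- (no hyphen)."""
    if not prefix.endswith("ang"):
        raise ValueError("expected an ang- prefix such as mang, pang or ipang")
    if base[0] in NASAL_FOR:
        # Complication 3: the prefix loses "ng" and the base-initial consonant
        # becomes the matching nasal, before anything is reduplicated.
        prefix, base = prefix[:-2], NASAL_FOR[base[0]] + base[1:]
    if sum(ch in VOWELS for ch in prefix) > 1:
        # Complication 2: a two-syllable prefix is reduplicated, not the base.
        return f"{reduplicate_last_syllable(prefix)}-{base}"
    return f"{prefix}-{reduplicate_cv(base)}"
```

This reproduces the three derivations above and also the ipang- form of bahagi from the example list (`"ipapa-mahagi"`); a base such as himasok, whose first consonant is not in the table, keeps the full prefix (`"mang-hihimasok"`).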

The N- aspectual morpheme. The imperfective and perfective aspects add the N- morpheme.

Complication 4. The realization of N- is complex.

  • If the focus prefix begins with “m”, simply change this to “n”. For example, N- + ma-kikinig -> na-kikinig
  • If the focus infix is -um-, delete N-. Thus, N- + t-um-atahi -> t-um-atahi
  • If the focus prefix is i-, then if the base starts with an L (l,r,w or y) or H (h or the glottal stop), add ni- after the focus prefix.[5] Thus N- + i-walisan -> i-ni-walisan. But if the base does not start with L or H, add the -in- infix after the first letter of the base. Thus N- + i-bigay -> i-b-in-igay.
  • If the focus prefix is not i- but it begins with “i” (for example ipag-), then insert the -in- infix after the second letter of the focus prefix. For example, N- + ipag-bili -> ip-in-ag-bili.
  • Otherwise, if the “basic form” (base plus any focus affix) begins with L, add ni- (we could not find examples of this case). Otherwise, add the -in- infix after the first letter of the basic form. Thus, N- + bigay-an -> b-in-igay-an.
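The realizations of N- listed above can be written as an ordered cascade in which the first matching rule wins. In this hypothetical Python sketch the caller supplies the focus prefix separately from the rest of the verb (already reduplicated and suffixed), and -um- verbs are passed without the infix plus a flag; spelling the glottal stop as an apostrophe is also an assumption.

```python
L_SOUNDS = set("lrwy")
H_SOUNDS = {"h", "'"}  # "'" is an assumed spelling for the glottal stop


def insert_in(form: str, after: int) -> str:
    """Insert the -in- infix after the first `after` letters of `form`."""
    return f"{form[:after]}-in-{form[after:]}"


def add_n(prefix: str, stem: str, um_infix: bool = False) -> str:
    """Realize aspectual N- (Complication 4).

    prefix:   focus prefix without its hyphen ("ma", "i", "ipag"), "" if none
    stem:     rest of the verb, already reduplicated/suffixed ("kikinig")
    um_infix: True if focus is marked by -um- (stem is given without it)
    """
    if um_infix:
        # N- is deleted: only -um- appears, after the first letter.
        return f"{stem[0]}-um-{stem[1:]}"
    if prefix.startswith("m"):
        # ma- -> na-, mag- -> nag-, maka- -> naka-, ...
        return f"n{prefix[1:]}-{stem}"
    if prefix == "i":
        if stem[0] in L_SOUNDS or stem[0] in H_SOUNDS:
            return f"i-ni-{stem}"
        return f"i-{insert_in(stem, 1)}"
    if prefix.startswith("i"):
        # ipag- -> ip-in-ag-
        return f"{insert_in(prefix, 2)}-{stem}"
    basic = f"{prefix}-{stem}" if prefix else stem
    if basic[0] in L_SOUNDS:
        return f"ni-{basic}"  # stated in the text; no attested example
    return insert_in(basic, 1)
```

For instance, `add_n("ipag", "bili")` returns `"ip-in-ag-bili"` and `add_n("pa", "kikingg-an")` returns `"p-in-a-kikingg-an"`. Ordering the checks this way is a design choice: each rule in the text starts with a condition on the prefix, so testing the most specific prefixes first keeps the fall-through case simple.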

Overview of complications so far. From a computational point of view, it would be nice to have the following series of actions that apply to the base to form the imperfective:

  1. If there is a focus suffix, substitute the irregular base if necessary. For example, kinig -> kingg
  2. Add any focus suffix to the base. For example, kingg + -an -> kingg-an.
  3. Reduplicate initial CV of base: kikingg-an
  4. Add the focus prefix: pa-kikingg-an
  5. Add N-: p-in-a-kikingg-an
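To make the problem concrete, here is the idealized five-step pipeline written as straight-line Python. It is a deliberately naive, hypothetical sketch: the irregular-base table holds only the kinig example, reduplication assumes a single-letter onset, and step 5 uses just the last of the N- rules from Complication 4.

```python
# Hypothetical one-entry lexicon of irregular bases (Complication 1's example).
IRREGULAR_BASES = {"kinig": "kingg"}


def naive_imperfective(base: str, prefix: str = "", suffix: str = "") -> str:
    """The five steps from the text, applied in a fixed order."""
    # 1. Substitute the irregular base whenever there is a focus suffix.
    if suffix:
        base = IRREGULAR_BASES.get(base, base)
    # 2. Add the focus suffix.
    stem = f"{base}-{suffix}" if suffix else base
    # 3. Reduplicate the initial CV (assumes a single-letter onset).
    stem = stem[:2] + stem
    # 4. Add the focus prefix.
    form = f"{prefix}-{stem}" if prefix else stem
    # 5. Add N- as the -in- infix after the first letter.
    return f"{form[0]}-in-{form[1:]}"
```

It reproduces `"p-in-a-kikingg-an"` for kinig with pa-…-an, but for kita with maka- the same fixed order yields `"m-in-aka-kikita"` instead of the attested Nakaka-kita, because the two-syllable prefix should be reduplicated and its m changed to n.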

Unfortunately, such a straightforward series of actions is impossible. Complication 1 shows that we cannot substitute the irregular base solely on the basis of the presence of a focus suffix, because some aspects will delete the -in focus suffix. Complication 2 shows that in some cases the prefix is reduplicated instead of the base. Complication 3 shows that some prefixes change the base before the base is reduplicated (even before that prefix is actually added!). Complication 4 shows that the realization of N- can affect the prefix or the base, so sometimes it needs to be applied before the prefix is added, and sometimes after.

In summary, you need to know the focus prefix before you can reduplicate, but in some cases you cannot actually add the focus prefix until after reduplication. You cannot reduplicate until you know the correct base; some focus suffixes cause irregular bases, but then again some aspects remove the suffix, and the base reverts to its regular form (see complication 5).

Complication 5. The addition of N- by the imperfective and perfective aspects causes an -in focus suffix to be deleted. This in turn causes any irregular base to revert to its regular form. Thus sunod + -in -> sundin, but the perfective is N- + sundin, which deletes the -in suffix and causes the base to revert to its regular form, giving N- + sunod -> sinunod.
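A minimal sketch of how Complication 5 changes the order of decisions: the irregular base can only be chosen once the aspect is known. It handles only -in object focus for bases that take the -in- infix, assumes a single-letter onset for reduplication, and lists only the sunod example from the text; the function name is hypothetical. Forms are written without morpheme hyphens, as in the sunod example.

```python
# Irregular bases used before a focus suffix (Complication 1). Only the
# text's sunod example; a real lexicon would list many more.
IRREGULAR_BEFORE_SUFFIX = {"sunod": "sund"}


def reduplicate_cv(stem: str) -> str:
    # Copy the initial CV (assumes a single-letter onset).
    return stem[:2] + stem


def in_object_focus(base: str, aspect: str) -> str:
    """-in object-focus form of `base` in the given aspect."""
    if aspect == "contemplative":
        # The -in suffix survives, so the irregular base (if any) is used.
        stem = IRREGULAR_BEFORE_SUFFIX.get(base, base)
        return reduplicate_cv(stem) + "in"
    if aspect in ("imperfective", "perfective"):
        # N- appears as the -in- infix and deletes the -in suffix, so the
        # regular base is used from the start.
        stem = reduplicate_cv(base) if aspect == "imperfective" else base
        return stem[0] + "in" + stem[1:]
    raise ValueError(f"unknown aspect: {aspect!r}")
```

Here `in_object_focus("sunod", "perfective")` gives `"sinunod"`, while the contemplative keeps the irregular base and gives `"susundin"`.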

Complication 6. There are at least two morphophonemic rules that operate on top of all this:

  1. A word-final or word-initial d in the base -> r when surrounded by vowels. Thus the contemplative, object focus (OF) of kudkod is kukudkod + -in -> kukudkur-in. (The next morphophonemic rule changes the final “o” to “u”.) Note also the interaction with complication 5 when we move to the imperfective of kudkod: N- + kukudkur-in -> k-in-ukudkod (the -in suffix is deleted, which removes the environment for the d -> r change). Also, ma- + damdam -> ma-ramdam. Notice that this occurs after reduplication, so we get ma-dadamdam -> ma-radamdam, not *ma-raramdam.
  2. o -> u before -in or -an, and oC -> uC in the same environment. Thus the kukudkod + -in -> kukudkur-in example above.
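The two rules of Complication 6 can be sketched as a final clean-up pass over a form split into prefix, stem and suffix, applied in the order given above. The split representation and the function name are assumptions for illustration; the stem includes any reduplicated syllable, and the test cases are forms from the kudkod, damdam, tuto and himasok examples.

```python
VOWELS = set("aeiou")


def apply_morphophonemics(prefix: str, stem: str, suffix: str = "") -> str:
    """Apply the two Complication 6 rules and join the parts with hyphens.

    prefix/suffix are given without hyphens ("ma", "an"); stem includes any
    reduplicated syllable ("kukudkod", "dadamdam").
    """
    # Rule 1: a stem-initial or stem-final d between vowels becomes r.
    if len(stem) > 1 and stem[0] == "d" and prefix[-1:] in VOWELS and stem[1] in VOWELS:
        stem = "r" + stem[1:]
    if len(stem) > 1 and stem[-1] == "d" and suffix[:1] in VOWELS and stem[-2] in VOWELS:
        stem = stem[:-1] + "r"
    # Rule 2: o -> u, and oC -> uC, before -in or -an.
    if suffix in ("in", "an"):
        if stem.endswith("o"):
            stem = stem[:-1] + "u"
        elif len(stem) > 1 and stem[-2] == "o" and stem[-1] not in VOWELS:
            stem = stem[:-2] + "u" + stem[-1]
    return "-".join(part for part in (prefix, stem, suffix) if part)
```

Only the first d of ma-dadamdam is stem-initial in this representation, so the sketch gives `"ma-radamdam-an"` rather than *ma-raramdam-an, matching the text.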

The following verbs exemplify most of these processes and complications. All of these verbs were produced automatically by the current LA implementation. For each verb base we show the contemplative, imperfective and perfective for each type of focus. See section 5 for several sample derivations.

Focus  Contemplative        Imperfective          Perfective

hiwa (to cut; N- changes m to n; N- -> -in-; -in deleted with -in-)
AF:    Mag-hihiwa           Nag-hihiwa            Nag-hiwa
OF:    Hihiwa-in            H-in-ihiwa            H-in-iwa

kinig (to listen to; irregular base with -an)
AF:    Ma-kikinig           Na-kikinig            Na-kinig
OF:    Pa-kikingg-an        Pina-kikingg-an       Pina-kingg-an

tahi (to sew; -um- deleted in contemplative; N- deleted with -um-)
AF:    Tatahi               T-um-atahi            T-um-ahi
OF:    Tatahi-in            T-in-atahi            T-in-ahi

kita (to see; two-syllable prefix reduplicated instead of base)
AF:    Makaka-kita          Nakaka-kita           Naka-kita
OF:    Ma-kikita            Na-kikita             Na-kita

himasok (to meddle in; oC -> uC before -in or -an)
AF:    Mang-hihimasok       Nang-hihimasok        Nang-himasok
DF:    Pang-hihimasuk-an    Pinang-hihimasuk-an   Pinang-himasuk-an

tuto (to learn; o{-in | -an} -> u{-in | -an})
AF:    Ma-tututo            Na-tututo             Na-tuto
OF:    Ma-tututu-an         Na-tututu-an          Na-tutu-an

talo (to win against; mang- + alveolar -> ma- + n; stem change before reduplicating stem)
AF:    Ma-nanalo            Na-nanalo             Na-nalo
DF:    Ma-tatalo            Na-tatalo             Na-talo

bahagi (to apportion; mang- + bilabial -> ma- + m; ipang- + bilabial -> ipa- + m; stem change before reduplicating stem)
AF:    Ma-mamahagi          Na-mamahagi           Na-mahagi
OF:    Ipapa-mahagi         Ipinapa-mahagi        Ipina-mahagi

kailangan (to need; mang- + velar -> nasalize velar, then reduplicate if necessary)
AF:    Ma-ngangailangan     Na-ngangailangan      Na-ngailangan
OF:    Kakailangan-in       K-in-akailangan       K-in-ailangan

bigay (to give to; N- + i-C… -> i-C-in-…)
AF:    Mag-bibigay          Nag-bibigay           Nag-bigay
OF:    I-bibigay            I-b-in-ibigay         I-b-in-igay
DF:    Bibigay-an           B-in-ibigay-an        B-in-igay-an

kudkod (to grate; vd-v -> vr-v; oC{-in | -an} -> uC{-in | -an}, but both changes disappear in the imperfective and perfective because the -in suffix is deleted)
AF:    Mag-kukudkod         Nag-kukudkod          Nag-kudkod
OF:    Kukudkur-in          K-in-ukudkod          K-in-udkod

damdam (to feel; v-dv -> v-rv)
AF:    Makaka-ramdam        Nakaka-ramdam         Naka-ramdam
OF:    Ma-radamdam-an       Na-radamdam-an        Na-ramdam-an

[1] We will not consider intransitive verbs in this paper. In general they do not complicate the verbal system beyond what is described below.

[2] There is also a recent perfective that we will not cover here.

[3] Syllables that apparently start with a vowel have a glottal stop that is not indicated in the orthography. LA includes the glottal stop in its underlying representation but deletes it in the final translation.

[4] As an example of the difficulty in learning Tagalog, compare this with the DF contemplative of the same verb, talo, which is ma- + redup(talo) -> ma-tatalo.

[5] There is another optional realization (using an -in- infix), but computationally speaking, we can just choose one.

TBTA #Korean #BibleTranslation of Ruth 1

Several years ago we worked on a #Korean module for TBTA. Below is the #BibleTranslation of Ruth 1. See for the rest of Ruth and other Korean translations, along with translations from other languages. I don’t read Korean, so someone let me know if this font is not working correctly. Also let us know if anybody is interested in helping to put this into a format for a children’s ministry (see one of their pics below).

Ruth 1:1 엘리멜렉이라는 남자가 이스라엘에서 살았다. 엘리멜렉의 아내의 이름은 나오미였다. 사사들이 이스라엘을 다스렸을 때 엘리멜렉과 나오미는 이스라엘에서 살았다.

Ruth 1:2 엘리멜렉과 나오미는 아들 두 명이 있었다. 아들 한 명의 이름은 말론이었다. 그리고 다른 아들의 이름은 기룐이었다. 엘리멜렉과 나오미는 베들레헴이라는 도시에 살고 있는 에브랏 사람들이었다. 베들레헴에 흉년이 들었기 때문에 그 가족은 베들레헴에서 모압 땅으로 이사하였다.

Ruth 1:3 얼마후에 엘리멜렉은 죽었다. 그래서 나오미는 아들 두 명과 함께 살았다.

Ruth 1:4 나오미의 아들들은 자라서 어른들이 되었고 모압 출생인 여자들과 결혼하였다. 아들 한 명은 오르바라는 여자와 결혼하였다. 그리고 다른 아들은 룻이라는 여자와 결혼하였다. 나오미의 아들들이 여자들과 결혼한 후에 나오미는 아들들과 함께 10 년 동안 모압에서 살았다.

Ruth 1:5 그리고서 말론과 기룐도 죽었다. 그래서 나오미는 홀로 남았다. 나오미는 남편과 아들들이 없었다.

Ruth 1:6 나오미가 모압에서 사는 동안 어떤 사람이 나오미에게 하나님께서 베들레헴에서 자기 사람들을 돌보고 계시다라고 말하였다. 하나님께서는 이 사람들에게 음식을 주고 계셨다. 그래서 나오미는 베들레헴으로 돌아가기 위해 준비하였다. 나오미의 며느리들도 나오미와 함께 베들레헴으로 돌아가기 위해 준비하였다.

Ruth 1:7 그리고서 이 여자 세 명은 모압을 떠나서 유다로 여행하기 시작하였다.

Ruth 1:8 그러나 나오미는 자기 며느리들에게 말하였다. “내 딸들아, 너희들은 모압으로 돌아가야만 한다. 어머니들의 집으로 돌아가라. 나는 너희들이 나와 내 아들들을 돌봤던 것처럼 여호와께서 너희들을 돌보시기를 바란다.

Ruth 1:9 나는 여호와께서 너희들에게 새 남편들을 주셔서 너희들에게 평화를 주시기를 바란다.” 그리고서 나오미는 자기 며느리들에게 입을 맞추었다. 그러자 며느리들은 큰 소리로 울었다.

Ruth 1:10 나오미의 며느리들은 나오미에게 “저희들은 어머니와 함께 어머니의 사람들에게 가고 싶습니다” 라고 말하였다.

Ruth 1:11 그러나 나오미는 말하였다. “아니다. 내 딸들아, 네 집으로 돌아가고 나와 함께 오지 말아라. 나는 너희들을 돌볼 수 없을 것이다. 나는 아들들을 더 낳을 수 없기 때문에 너희들에게 새 남편들을 줄 수 없을 것이다.

Ruth 1:12 그래서 너희들은 부모님의 집으로 돌아가야 한다. 나는 너무 늙었기 때문에 다른 남자와 결혼할 수 없다. 만일 내가 오늘 밤에 남자와 결혼하더라도 그리고 만일 내가 아들들을 낳더라도 너희들은 그 아들들이 자라기를 기다릴 수 없을 것이다. 그 아들들이 자라는 동안 어떤 사람도 너희들을 돌보지 않을 것이다.

Ruth 1:13 여호와께서 나를 슬프게 하셨기 때문에 나는 매우 슬프다. 만일 너희들이 나와 함께 오면 너희들도 매우 슬플 것이다.”

Ruth 1:14 나오미의 며느리들은 다시 울었다. 그리고서 오르바는 나오미에게 입을 맞추었고 나오미와 작별하였고 모압으로 돌아갔다. 그러나 룻은 나오미를 떠나기를 거부하였다.

Ruth 1:15 나오미는 룻에게 말하였다. “오르바는 자기 사람들과 자기 신들에게 돌아갈 것이다. 너도 네 사람들에게 돌아가야 한다.”

Ruth 1:16 그러나 룻은 나오미에게 말하였다. “저에게 어머니를 떠나라고 말씀하지 마세요. 제가 어머니를 따라가도록 하세요. 만일 어머니께서 다른 곳으로 여행하시면 저도 그 곳으로 여행할 것입니다. 만일 어머니께서 그 곳에서 사시면 저도 그 곳에서 살 것이고 어머니의 사람들과 함께 살 것입니다. 그리고 저는 어머니의 하나님을 경배할 것입니다.

Ruth 1:17 저는 어머니께서 돌아가실 곳에서 죽을 것이고 그 곳에 사람들에 의해 묻혀질 것이고 죽을 때까지 어머니와 함께 머무를 것을 여호와께 맹세합니다. 만일 제가 그 약속을 지키지 않으면 저는 여호와께서 저를 처벌하여 주시기를 부탁할 것입니다.”

Ruth 1:18 그래서 나오미는 룻과 논쟁하는 것을 멈추었고 룻이 자기 부모님의 집으로 돌아가지 않는 것을 이해하였다.

Ruth 1:19 그래서 나오미와 룻은 베들레헴으로 여행하였다. 나오미와 룻이 베들레헴으로 들어갔을 때 베들레헴에 살고 있는 사람들은 흥분되었다. 베들레헴에 살고 있는 여자들은 서로에게 “이 여자는 나오미예요?” 라고 물었다.

Ruth 1:20 그러나 나오미는 사람들에게 말하였다. “나를 나오미라고 부르지 마세요. 여호와께서 나를 슬프게 하셨기 때문에 나를 마라라고 부르세요.” 나오미는 행복이라는 뜻이다. 그러나 마라는 슬픔이라는 뜻이다.

Ruth 1:21 나오미는 말하였다. “나는 베들레헴을 떠났을 때 많은 것을 가졌어요. 그러나 여호와께서는 나에게서 이 것을 빼앗으셨어요. 그래서 여호와께서 나에게 시련을 겪게 하셨기 때문에 나를 나오미라고 부르지 마세요.”

Ruth 1:22 그래서 나오미와 룻은 모압에서 베들레헴으로 돌아왔다. 나오미와 룻은 사람들이 정기적으로 보리를 추수하는 시기에 베들레헴에 도착하였다.

Example-based Machine translation??

The nature of blogs is that they are semi-off-the-cuff, so don’t hold me too closely to this one. But The Bible Translator’s Assistant’s methodology is probably closer to a smart example-based machine translation system than to a regular machine translation system. Example-based MT typically uses a huge corpus of bilingual texts to “train” its translation engine. For example, it might use the English Wall Street Journal corpus with an “aligned” Spanish corpus, in which the paragraphs and sentences are more or less in the same order as in the English. A computer program (by the way, I was an innovator in this field back in the 1990s at Carnegie Mellon) goes through and figures out that “Microsoft acquired ABC” is translated as “Microsoft adquiere ABC” in Spanish. It’s easy to see that if the computer ever needed to translate “Microsoft acquired ABC” in a different document at some later date, it would know how to do it. Really easy, right?

The trickier part is using the same input pair of sentences and a little common sense (which is in short supply for computers) to know that “Microsoft acquired DEF” is probably translated “Microsoft adquiere DEF”, even though that wasn’t the exact sentence in the original corpus. And even trickier, given two pairs of sentences:

Microsoft acquired ABC                  Microsoft adquiere ABC
John went to the store yesterday        Juan fue a la tienda ayer.

then if in a new document the computer needs to translate “Microsoft acquired DEF yesterday” the computer can guess with some confidence that a good translation would be “Microsoft adquiere DEF ayer.”

So the whole example-based process is: 1) learning the translations for small phrases (and even words) from the bilingual corpus, combined with 2) a smart algorithm for stitching together bits of translations into the translation for a whole sentence.
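A toy illustration of step 2, assuming the phrase translations from step 1 have already been learned from the two sentence pairs above. It uses greedy longest-match lookup and copies unknown words (such as the company name DEF) through unchanged; this is a simplification for illustration, not TBTA's algorithm or any real example-based MT system, and the names `PHRASE_TABLE` and `translate` are hypothetical.

```python
# Phrase translations assumed to have been learned (step 1) from the two
# aligned sentence pairs in the text.
PHRASE_TABLE = {
    ("Microsoft", "acquired"): ("Microsoft", "adquiere"),
    ("yesterday",): ("ayer",),
}
LONGEST = max(len(source) for source in PHRASE_TABLE)


def translate(sentence: str) -> str:
    """Stitch known phrase translations together (step 2), longest match first."""
    words = sentence.split()
    output = []
    i = 0
    while i < len(words):
        for n in range(min(LONGEST, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                output.extend(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No known phrase starts here: treat the word as a name and copy it.
            output.append(words[i])
            i += 1
    return " ".join(output)
```

Given "Microsoft acquired DEF yesterday", this returns "Microsoft adquiere DEF ayer", the guess described above. Real systems must also reorder and inflect the stitched pieces, which this sketch ignores.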

TBTA takes this one step further. One BIG step. We have defined an unambiguous, straightforward semantic language and are encoding the whole Bible into it. Additionally, since we designed the semantic language, we know EVERYTHING that needs to be said in it. In fact, we know everything that CAN be said. For example, we know we can say these kinds of things:

John built the building.

John will build the building.

John might build the building.

John finished building the building.



So what we in TBTA do is go to a target language (that needs an Old Testament translation), and we build our own specialized and highly targeted bilingual corpus. We figure out how to say all the above sentences in the target language, and about 300 other sentences too, which, in their entirety, will let us say ANYTHING WE WANT TO SAY in that target language!! TBTA then utilizes a really smart algorithm to stitch bits and pieces of translations together. For example, if we want to translate “Jesus finished speaking to the crowd” we can take bits of translations that tell us how to handle “finished”, “Jesus”, “speak”, “crowd”, etc, and weave them all together to make a good translation for the whole sentence.

How does that compare to regular machine translation? First of all, “regular” translation will try to translate from, for example, the NIV or KJV (or the Greek or Hebrew) scriptures. Any human language is extremely ambiguous and almost impossible for a computer to understand reliably. So that’s the first problem, and it’s a huge one. In fact, it makes any kind of regular machine translation of the Bible into smaller languages impossible to do accurately (using “regular” techniques). The second problem is that the linguists and programmers need to try to figure out hundreds of rules that translate English into the target language. This is really only possible for major language pairs (like English and Spanish) for which millions of dollars and hundreds of man-years of research have been invested. Other machine translation approaches try to use statistical techniques similar to example-based translation, but they are impractical for small languages because few or no bilingual corpora are available between, say, English and the small target language.

OK – I hope that wasn’t too boring. I think it’s cool and I especially think that it is cool that we can use this methodology to accurately and quickly translate the Old Testament for potentially thousands of languages.


Tagalog translation for poor in Manila

Dr. Tod Allman (president and co-founder of TBTA) has been living in Manila with his family for almost two years now. He has worked with native speakers to produce an accurate TBTA model of Tagalog – one of the national languages of the Philippines. (He is also currently working on at least two other small languages which have no Old Testament, and TBTA’s goal and prayer is to expand to many of the country’s 183 languages).

We have partnered with to produce Bible portions in illustrated format. We are working with local Christians to distribute these to poor youth in Manila. The response has been wonderful – thank God for helping us to find this practical method for creating interest in reading His Word amongst some of the neediest children in the world.

Below are some examples of our new Tagalog translation from the book of Ruth. Thanks for praying for and supporting our work in this country! Also, please visit to see English versions of the illustrated Bible.




TBTA Workshop plans in Vanuatu

Steve Beale of TBTA spent three years as a Wycliffe Bible translator’s consultant on Motalava, Vanuatu. We are now returning to Vanuatu with TBTA in tow to help with Old Testament translation in this country with over 100 languages.

We will begin working on Tanna, an island in the south of the country where Summer Institute of Linguistics (SIL) Vanuatu director Greg Carlson and his family spent 17 years translating the New Testament for the North Tanna people group. There are several languages on the island, and the plan is for representatives of each language to attend a TBTA workshop for two or three months. The goal is to use TBTA to produce drafts of the narrative sections of the Old Testament. The workshop is scheduled to occur in 2018, but as you can imagine, planning and actual work are already well underway.

How will it work?

Well, first of all there are a number of logistical things for which we ask your prayers. We will need to transport and house several people per language, and then run a workshop somewhere. The participants will be giving up a lot to attend: normal, everyday responsibilities like taking care of their families never stop. It’s not like they can just go on vacation for three months. So we ask for God’s provision for them, along with practical ideas about how best to run the workshop on a day-to-day basis while keeping a balance between 1) maintaining enthusiasm and attention, 2) getting the work done, 3) promoting an atmosphere of worship, and 4) allowing the participants to do what they need to do for themselves and their families.

What about the workshop itself? In a nutshell, here’s what a TBTA “language description” workshop needs to do. We have an unambiguous “Semantic language” that we developed. We are in the process of encoding the Bible into this language. The resulting semantic (or meaning-based) descriptions of the Bible are very accurate, unambiguous and computer-friendly. SO – for each language, X, where we want to translate the Bible, we need to “teach” the computer how to translate between this semantic language and language X. We can accomplish a large part of that process using our “Grammar and dictionary startup.” For example, we know that we need to be able to translate things like this:

John built the house.

John will build the house (today/tomorrow/next year – sometimes they look different)

John wants to build the house.

John started building the house.

John finished building the house.


So in the workshop we will go through all the groups of such related sentences that make up the semantic language. We will ask each native-speaking participant to produce a translation for each. A big part of the process will be to ensure that the translations are natural and really “get at” what we want. We will carefully work through each group of sentences, put them in context, use Biblical examples when appropriate, and encourage collaboration between the languages (which are all fairly closely related and will probably say things “in the same way” even if the exact words are different). And of course we will also need to uncover the vocabulary that is necessary to translate the Bible.

My approach to making this a fast-moving, encouraging process is to focus the sessions on what is needed to translate: first a few verses, then a chapter, then a whole book of the Old Testament. We like to use Ruth for this purpose. It is an interesting book, a relatively simple narrative, and it allows us to proceed through the initial stages of “language description” fairly quickly, grounded in actual Scriptural examples, and allows us to keep enthusiasm high.

I estimate that we can get through 90% of the language description process (for these multiple languages) in a relatively short time in such a workshop environment. Then we will start producing drafts of the Old Testament, adding vocab as we go, and “tweaking” our rules and grammars as we see them in action. Along with this will be modules on the editing process and, if necessary and appropriate, translation principles.  By the end of the two or three months we hope and pray that we will have: 1) the language description pretty much completed, 2) drafts of the narrative sections of the Old Testament produced, and 3) a trained group of editors working on the editing and checking stages.

Please pray with us! During the lead-up time to the workshop we will be preparing the workshop materials and methodology, finishing up the semantic representations for the narrative sections of the Old Testament, and fundraising for the costs associated with travel, food and lodging for the workshop participants.

A simple example of how/why The Bible Translator’s Assistant works

Daniel 1:2 “And the Lord gave Jehoiakim king of Judah into his hand.”  KJV

What are the potential difficulties with translating this verse into other languages? Let me count the ways…

  1. “the Lord” – is this a name, a title? Is it referring to God? You probably know the answers, but the computer is stupid. In this case, “The Lord” is the KJV translation of “Yahweh” – God’s name. Now, whether you think this should continue to be translated as something like “Lord” (in our target languages) or if you think that some transliteration of “Yahweh” should be used – in either case, the computer needs to know that this refers to God. It needs to know this for a number of reasons, but one in particular: many languages use a system of honorifics, and God would get the highest honor.
  2. Jehoiakim – the computer needs to know that this is a human name for a male. Some languages will put different affixes on the verb or add different words to the sentence depending on whether the name is for a male or a female.
  3. “Jehoiakim king of Judah” – the computer needs to know that “king of Judah” is a description of “Jehoiakim.”
  4. “Judah” – the computer needs to know that this is the name of a place (here, the kingdom of Judah), not a person.
  5. “his” – who does this refer to? If you, as a human, had just read verse 1, you would know it refers to Nebuchadnezzar, the king of Babylon. But again, computers are stupid. We need to specifically tell the computer to whom the pronoun is referring. And see the next point for more about “his hand”.
  6. X gave Y into Z’s hand: this is an idiom in English and would not make sense in many languages. A literal, word-for-word translation would be confusing (at best) and completely misleading (at worst) – think of Nebby carrying around Jehoiakim in his arms! This is what a “must use word for word translation” approach fails to realize – we need to make the target readers understand the same thing that the original readers understood (in this case, the original Biblical Hebrew language speakers). THAT is meaning-based translation, and it is critical in Bible translation. In this case, we need the computer to understand that the Lord (Yahweh) caused/enabled Nebuchadnezzar to capture Jehoiakim. And even more specifically, it is the kind of capture that happens, for example, when police capture a criminal, or a soldier captures an enemy soldier. It isn’t the kind of capture like when my pet gerbil got loose and I re-captured it.

SO – for that simple verse, there are at least 6 things that most of us probably took for granted, but would cause my stupid computer to start spewing steam. Each of those six things (and thousands of other issues, especially ambiguities that we don’t even notice) are VERY hard for computers to figure out on their own. That’s why there are so few good computer translation programs – and the ones that do exist were developed over many years and with millions of dollars. But at TBTA – we bypass that REALLY HARD STEP by telling the computer what each verse means – by hand. We use a semantic (meaning-based) representation that unambiguously and extremely accurately encodes the meaning of each verse in a code that the computer understands. We have an underlying “semantic language” that these meaning representations are written in.  THEN, for each target language that we want to translate into, we “just” need to tell the computer how to translate the different elements of the semantic language – which has its own unambiguous dictionary and syntactic patterns.

In a nutshell – that’s what TBTA is and why it works so well. We bypass the step that is impossible for the computer to do well by itself.  That leaves us with the much simpler (and technologically feasible) task of “just” having to tell the computer how to translate the meaning language into the target language. That might sound like a bit of mumbo jumbo to you – we still have to tell the computer how to translate the meaning language into the target language. But since the meaning language is completely clear and unambiguous, it is much, much, much, MUCH easier to do than trying to translate English, or Hebrew, or Greek directly into a new target language.