Collocation in Arabic-Thesis

COLLOCATION AND SYNONYMY IN CLASSICAL ARABIC: A CORPUS-BASED STUDY

A thesis submitted to the University of Manchester Institute of Science and Technology (UMIST) for the degree of Doctor of Philosophy

2004

Abdel-Hamid Elewa
Centre for Computational Linguistics

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university, or other institution of learning.

Acknowledgements

First and foremost, I thank God Almighty, Who teaches man what he does not know. Then, I would like to express my gratitude to my supervisor, Dr. Paul Bennett, who throughout the years I have spent on my research showed unequivocal perseverance, gave me so much of his time and enriched my work with his invaluable comments.

I am deeply grateful to Mona Baker, Professor of Translation Studies and Director of CTIS, University of Manchester, who provided me with the first drops of genuine knowledge. I am also indebted to Paul Johnston, Department of Computation, University of Manchester, for all the technical support he gave me and for the statistical programs he wrote specifically for my research.

During this work I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all those who have helped me with my work in the Department of Language Engineering, particularly Sattar Izwaini and Amin Almuhanna; together, through our discussions and commentary on the Arabic language, we raised many interesting points.

I would also like to thank my examiners, Prof. Harold Somers, Dept. of Informatics, University of Manchester, and Dr. James Dickens, Dept. of Middle Eastern Studies, University of Durham, for their criticism and helpful comments that gave my thesis its academic form.

My thanks are also due to my wife, Iman Refaey, who helped me in assembling the electronic corpus for use in this research.
This work was funded by the Egyptian Government.

Table of Contents

Abstract
Notes on Transliteration
Arabic Transliteration Chart
Chapter One: Introduction
    1.1 The Rationale Behind the Study
    1.2 Goals
    1.3 Corpus-driven or Corpus-based
    1.4 Lexical Collocation
    1.5 Synonymy
Chapter Two: Some Aspects of the Arabic Language
    2.1 Introduction
    2.2 The Status of Arabic
    2.3 Factors in the Survival of the Classical Arabic
    2.4 The Development of Arabic Linguistics
        2.4.1 Recent Contributions to Arabic Linguistics
    2.5 Some Features of Arabic Grammar
Chapter Three: Corpus Linguistics
    3.1 Introduction
    3.2 Intuition vs. Empiricism
    3.3 Historical Survey
        3.3.1 Pre-computational Corpus Linguistics
        3.3.2 Computational Corpus Linguistics
    3.4 Corpus Design
        3.4.1 The Purpose of the Corpus
        3.4.2 Text Sampling
        3.4.3 Text Typology
    3.5 Technical Requirements
    3.6 Corpus Processing
Chapter Four: Description of the Corpus and Tools of Analysis
    4.1 Introduction
    4.2 Arabic for Computational Analysis
        4.2.1 Progress in Machine-readable Arabic Language
        4.2.2 Arabic Language Resources
            4.2.2.1 Available Arabic Corpora
            4.2.2.2 Arabic Online Texts
        Tagging Arabic Texts
        Tools for Processing Arabic
            Lemmatisation
            Concordances
            Frequency
    Description of the Corpus
        The Rationale Behind This Selection
        Why These Texts?
Chapter Five: Lexical Collocation
    Introduction
    Definition of Collocation
        Four Approaches
    Collocation and Colligation
    Types of Collocation
    Spans
    Semantic Prosody
    Extraction of Collocation
        Using Statistics in Collocation Extraction
        T-test: A Measure of Difference
    Conclusion
Chapter Six: Synonymy: An Overview
    Introduction
    Definition
        Synonymy
        Degrees of Synonymy
            Absolute Synonymy
            Propositional Synonymy
            Near-synonymy
    Synonymy in Arabic
    The Repetition of Synonyms in Arabic
    Conclusion
Chapter Seven: Collocational Treatment of Synonymy in Arabic
    Introduction
    Data Choice
    Data Analysis
        A Few Remarks
    A Case Study: The Word Pair jaa’a and ata ‘come’
        Summary
    A Case Study: The Word Pair ithm and dhanb ‘sin’
        Summary
    A Case Study: The Word Pair ḥasiba and ẓanna ‘think’
        Summary
    A Case Study: The Word Pair ḥbb and wdd ‘love’
        Conclusion
Chapter Eight: Conclusion
Bibliography
Appendices
    Appendix 1: Copyrights
    Appendix 2
    Appendix 3
    Appendix 4
    Appendix 5
    Genres and Texts Included in CAC
    The Contents of the CAC Are Summarised in the Following Charts (1) & (2)

Abstract

I am concerned in this study with applying the corpus linguistics methodology that concentrates on investigating language use, with particular reference to Classical Arabic. I do not wish to undermine what has been done on the basis of intuition, but the time is now opportune to use modern tools to discover new facets of linguistic behaviour in relation to Classical Arabic and to demonstrate the potential impact of computational methods on Arabic linguistic studies. To do this, I have assembled a Classical Arabic corpus which covers the early period of Islam, because the available Arabic corpora are limited to the Classical Arabic of today, which is called Modern Standard Arabic.

One of our main aims will be to demonstrate the usefulness of the corpus methodology in describing Classical Arabic by examining lexical collocations. This study is also an attempt at explaining some issues in semantic relations, particularly synonymy, which can be accounted for in terms of collocations by using a computerised concordancer that enables large quantities of text to be searched for all occurrences of a particular lexical item. By this technique, it is possible to compare seemingly synonymous words and find out whether they are real synonyms or not.

Through collocation we can distinguish one sense of a word from another and know whether a seemingly synonymous pair are real synonyms or not. Collocation is, therefore, a device with which a particular sense of a word is activated. Through lexical collocational analysis I can compare and contrast the characteristic uses of semantically related words such as synonyms.

According to Cruse (1986), two lexical units would be absolute synonyms (i.e. would have identical meanings) if and only if all their contextual relations were identical. Through corpus analysis we can show whether two items are indeed absolute synonyms or not by checking their relations in all available contexts. I will argue that absolute synonyms do not exist in terms of their collocational patterns. This will be explored through the analysis of seemingly synonymous Arabic words. In order to prove that subtle differences can be brought out by collocation, the collocates for a list of synonymous pairs are analysed, aiming to show that many synonyms are partial or incomplete, and none can be called true (absolute) synonyms.

Notes on Transliteration

There are two common ways to represent the Arabic script in the Roman script: transliteration and transcription. The former is based on graphemic mapping and the latter is phonemic; which of the two is chosen depends on the purpose one has for rendering Arabic in either way. ‘This leaves plenty of scope for scholarly debate, with the result that there are now many supposedly international standards’ (Whitaker, 2002).

Among the most common systems are the one adopted by the International Convention of Orientalist Scholars in 1936, the British Standard (BS 4280), and that of the US Library of Congress and the American Library Association. The latter have issued “Romanisation tables” for more than 150 non-Roman written languages and dialects including Arabic (ibid). One of the reasons given by Whitaker (ibid) for the inefficiency of these Romanisation systems is that they are not easy to key, owing to the sophisticated figures they use, like dots, lines and other marks.

There are some Arabic consonants and vowels which have equivalent letters in the Roman alphabet. These are easy to transliterate or transcribe. For the letters which have no Roman equivalent, linguists or Arabic users sometimes adopt a set of symbols which are mainly transcriptions. Such a process yields a mixed system of transliteration and transcription.

For a practical reason, I tried to use a transliteration system which makes the utmost use of the English alphabet, except for proper nouns which are commonly used among Arabs and Arabists. This is much easier than struggling with new symbols. The system depends to a great extent on the one adopted by the US Library of Congress, with some modifications as shown below:

Arabic Transliteration Chart

Name of letter    Arabic letter shape    Symbol in transliteration
hamza             ء                      ‘
ba:               ب                      b
ta:               ت                      t
θa:               ث                      th
ji:m              ج                      j
ha:               ح                      ḥ
xa:               خ                      kh
da:l              د                      d
dha:l             ذ                      dh
ra:               ر                      r
za:               ز                      z
si:n              س                      s
shi:n             ش                      sh
sa:d              ص                      ṣ
da:d              ض                      ḍ
ta:               ط                      ṭ
ẓa:               ظ                      ẓ
cayn              ع                      c
ġayn              غ                      gh
fa:               ف                      f
qa:f              ق                      q
ka:f              ك                      k
la:m              ل                      l
mi:m              م                      m
nu:n              ن                      n
ha:               ه                      h
wa:w              و                      w
ya:               ي                      y

Such a chart is easy to use because it is familiar to both Arabic and English speakers. For Arabic consonants that do not have equivalents in English we used the most common system. This applies to two types of sounds: emphatic and pharyngeal. For the former we put a dot under the symbol to show emphasis, and for the latter we used two symbols (c and ‘).

Some consonants are written with two letters (for example dh and kh). This makes it difficult to represent the doubling of consonants, as in dhdh or khkh, so we would rather ignore doubling with such consonants. The long vowels are marked by doubling the short vowel, to avoid putting more figures on the symbols. The Arabic definite article al ‘the’, which sometimes takes another form when assimilated with the following sound, is represented as is, without showing any sort of assimilation.
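The chart above is in effect a lookup table from Arabic letters to Roman symbols, so it can be sketched in a few lines of Python. This is only an illustration, not a tool used in the thesis: the names `CHART` and `transliterate` are hypothetical, the emphatic consonants are written with a dot below as the notes describe, and the sketch assumes unvocalised input. It does not handle alif, short vowels, the doubling of long vowels, or the definite article, and any character outside the chart is passed through unchanged.

```python
# Letters and symbols are listed in chart order; the pairing is positional.
ARABIC_LETTERS = "ء ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي".split()
SYMBOLS = "‘ b t th j ḥ kh d dh r z s sh ṣ ḍ ṭ ẓ c gh f q k l m n h w y".split()

# Hypothetical name: a plain dict built from the 28 rows of the chart.
CHART = dict(zip(ARABIC_LETTERS, SYMBOLS))


def transliterate(text: str) -> str:
    """Replace each charted Arabic letter with its symbol.

    Characters that are not in the chart (spaces, alif, diacritics,
    punctuation) are copied through unchanged.
    """
    return "".join(CHART.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Unvocalised dhanb 'sin' yields only its consonant skeleton.
    print(transliterate("ذنب"))  # dhnb
```

Because short vowels are not written in unvocalised Arabic, the output is a consonant skeleton (dhnb rather than dhanb); a fuller treatment would need vocalised text or a lexicon.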
Chapter One: Introduction

1.1 The Rationale Behind the Study

A general motivation for many recent linguistic studies has been the desire to automate some descriptive processes and to employ scientific observation in the study of language. Employing modern technology in investigating language use should enable us to research more aspects of linguistic behaviour, in more detail. We can investigate how people exploit the resources of their language and how they use it to achieve their communicative goals.

Linguistic studies in Arabic were first introduced and established by Al-Khalil, who was the first lexicographer to give lexical order in the collection of his dictionary (cf. Haywood 1965), and his outstanding pupil, Sibawayh, in the late 8th century. What Al-Khalil and Sibawayh did was to investigate language use to formulate rules and describe linguistic devices. Although Arab lexicographers were the first to integrate corpus analysis into the dictionary-making process, with Al-Khalil’s manual corpus discussed below in Chapter Two, a corpus-based approach is certainly not used in contemporary lexicography in the Arab world. The mainstream lexicography is undoubtedly intuition-based.

With this in mind, I decided to work toward the compilation of a comprehensive corpus of written Classical Arabic in order to facilitate research in a range of disciplines concerned with Arabic and with the general methodology of Corpus Linguistics. I would like to emphasise that the Classical Arabic Corpus will be available for any potential user for her or his needs.

1.2 Goals

The current study will provide the resources for accurate descriptions of the way words co-occur in Classical Arabic. For that purpose, the major activity of the study has been the assembly and analysis of a corpus comprising samples of different types of written Arabic: biography, poetry, religion, etc.

1.3 Corpus-driven or Corpus-based

Two approaches can be at play when working with corpora: corpus-based and corpus-driven (Tognini Bonelli, 2000). When a linguist describing a language with this methodology observes a phenomenon without prior knowledge of the validity of a particular theory, i.e. when he/she finds out something unexpected, it is called corpus-driven. For instance, the subtle differences that occur between synonymous pairs and the semantic features extracted for every word that distinguish it from another (as shown in Chapter Seven) are not obvious by casual observation nor available in the literature I have examined. On the other hand, when we use corpus linguistics methodology to support or invalidate an existing hypothesis or theory, then it is called corpus-based. For example, in Chapter Five we test a collocation assumed to be fixed and find out that it is not a collocation at all.

1.4 Lexical Collocation

Lexical collocation has become trendy in linguistic research. This phenomenon gained such currency after computational corpus-based methodology had been adopted as an accurate and effective way of text analysis [1].

Collocation was recognised early by Arab linguists, but the phenomenon was just referred to between the lines and did not get an extensive study. For example, Al-Sakkaki, in Miftah al-cUlum, defined it as ‘likull kalimah maca ṣaaḥibatiha maqaam’ (every word has with its companion a position [lit. trans.]). This roughly means that every word has a different sense with a different adjacent word. Emery (1988: 51) regards this quotation as equivalent to Firth’s (1957: 179) definition of collocation, which is the company that a word keeps. He also considers the classification of Thacalibi’s lexicon, Fiqh Al-Lughah [2], as showing his awareness of how significant collocational relations are.

Linguistic units can be combined with each other phonologically, morphologically, syntactically, lexically, or semantically. We are only concerned with combinations on the lexical level. This is what is traditionally called collocation [3]. In this sense, ‘collocation is restricted to idiosyncratic relationships between words’ (Wouden, 1997: 24).

We will use corpus-based analysis and the computer technology that can help us identify easily the relative frequency of words, whether throughout the whole corpus or in a particular genre. Subsequently, we can explore the collocates of words and further isolate the various meanings, or senses, a word has. This is especially interesting for words which are considered synonyms, since an investigation may reveal differences in syntactic and/or stylistic distribution. Such research might show that near-synonymous words or structures are used in different ways.

[1] Corpus-based methodology has been widely used for other linguistic fields (Biber, 1998; Meyer, 2002).
[2] This lexicon, which was written ten centuries ago, classifies, for example, the types of actions with their specific doers and the types of words with their specific predicates. So, it can be considered like Benson’s (1997) work on collocation, The BBI Dictionary of English Word Combinations.
[3] Extensive definitions and explanation of collocation will be given in Chapter Five.

1.5 Synonymy

One of the main goals of this study is to check the synonymy or non-synonymy of a given pair of items. Synonymy is understood as a gradual cline along which we may locate different degrees of synonymy: near, cognitive and absolute. However, there is a widely held opinion among semanticists that strict or absolute synonymy is rare in human languages (see Cruse, 1986). A further step is taken in this study to demonstrate that absolute synonymy does not exist in Arabic. The study will argue that Arabic never has two words that mean nearly the same thing and are used in the same range of grammatical and lexical patterns.

Chapter Two discusses the scope of Arabic linguistics and pinpoints some technical problems in digitising Arabic. Chapter Three gives a brief account of the methodology of corpus linguistics and surveys its historical background. Chapter Four describes the corpus compiled especially for this study and gives an account of the tools used for analysis. Chapter Five discusses lexical collocations with a particular emphasis on Arabic. Chapter Six addresses the concept of synonymy in English and Arabic. Chapter Seven tries to find differences between seemingly synonymous word pairs by studying their collocation and suggests that applying corpus linguistics methodology to Arabic can help us become aware of lexical matters. Chapter Eight is dedicated to findings and conclusions.

Chapter Two: Some Aspects of the Arabic Language

2.1 Introduction

The Arabic language originated in Arabia in pre-Islamic times, and spread rapidly across the Middle East. Today it is spoken as an official language by almost 200 million people, Muslims and Christians, in more than twenty-two countries, from Morocco in Africa to Iraq in Asia, and as far south as Somalia and Sudan. It is the liturgical language of about one billion Muslims. It is taught as a first language in all Arab countries and as a second language in non-Arab Muslim states.

There are many varieties of Arabic: Classical Arabic (CA), Modern Standard Arabic (MSA), and colloquial Arabic. The term Classical Arabic is sometimes used as a synonym of Standard Arabic. However, I will use the former to refer to the early Classical Arabic which extends over the first four centuries of Islam, i.e. until the early eleventh century, whereas the latter is used to refer to the modern Classical Arabic. Modern Standard Arabic is the variety of Arabic which is essentially a continuation of Classical Arabic as it was passed down to us throughout the ages and which is partly a modernised form of expression of contemporary ideas, concepts, science and technology. These two varieties are sometimes interchangeable. In addition, they can be used in formal situations such as schools, universities, lectures (whether religious or academic), textbooks, mass media and personal writing, as in letters and autobiography.

2.2 The Status of Arabic

Arabic [4] is the oldest language which is still used for communication and culture in the Arab world. Classical written Arabic, however old, has changed little over the centuries. As the language of the Qur’an, the Holy book of Islam, it is to some extent familiar throughout the Muslim world, rather as Latin was in the lands of the Roman church. Classical Arabic is still employed today as the written language, but it is restricted to formal usage as a spoken tongue. It differs considerably from its descendant, the modern colloquial Arabic that is the medium of general conversation, the language of communication and entertainment, which differs from country to country. Modern Standard Arabic is the lingua franca used and respected by educated Muslims throughout the entire world.

Although colloquial Arabic is widely used throughout the Arab world, with different vernaculars, in everyday language, Modern Standard Arabic is still adopted as the formal language of press, writing and speeches. However, the general desire among educated Arabs is to write and read literary works, Islamic and general books in an elegant language, and nothing can be more beautiful than Classical Arabic. They always emphasise that Classical Arabic, as a living language, should be used in formal written and spoken language. Bakalla (1983) argued that a ‘living’ language is by definition the language acquired by children in their early age, and this is not the case with Classical Arabic. ‘In that sense Classical Arabic is [a] ‘living’ language, but it is not a ‘living’ in the sense of colloquial’ (Bakalla 1983: xvii).

[4] The term ‘Arabic’ is applied to a number of speech-forms which, in spite of many and sometimes substantial differences, are reckoned as dialectal varieties of a single language.

2.3 Factors in the Survival of the Classical Arabic

One of the main characteristics of language is change. If a language does not change through time, it is likely to become obsolete, or extinct in terms of its usage. This could make one wonder how Classical Arabic has been preserved over so many centuries. The obvious connection between the Holy Qur’an and the language in which it was revealed to Prophet Muhammad explains the preservation of this language. Because the Qur’an is revealed in Arabic, most Arabs think that this language must be perpetuated and kept alive (Haeri, 2003). Below we will give three reasons that made the Classical Arabic language survive throughout the past centuries.

1. Belief in its divinity

Most Arab grammarians and theologians regarded Arabic as a divine language. Ibn Faris (Ṣaḥibi, p. 17) said, ‘Explaining Allah’s saying, “And He taught Adam all the names (of everything)” (Qur’an: Sura 2, 31, trans. by Mohamed Khan), Ibn Abbas [a well-known exegete of the Qur’an] said, “Allah taught him all common names [i.e. all generic nouns] such as animal, earth, valley, mountain, donkey etc.”’

This is an important question in linguistic study because if we believed that Arabic is God-given, we would stick to the Qur’anic language and the expressions used by the ancient Arabs and the early Muslims. ‘We are not entitled to-day to innovate, to use expressions which they did not use, or to develop analogies which they did not know, for this would mean corrupting the language and annihilating its essence’ (Ibn Faris, Ṣaḥibi: 33).

Unlike English and other languages, there was no detailed discussion in Arabic literature concerning the origin of speech. This question was considered as theological rather than linguistic. Arab linguists did not concern themselves with this question because, owing to the aforementioned Qur’anic verses, they thought that Arabic is revealed by Allah. Even those who thought that Arabic is not revealed by Allah gave up investigating this question since there was no conclusive evidence for either position. Most grammarians, however, regarded Arabic as a God-given language. Therefore, Arabs had to stick to the usage of their predecessors to whom the Qur’an was revealed. All they could do was to describe this usage for Arab and non-Arab people in order to stick to the genuine Arabic.

As a point of departure, we can realise how Islam influenced the study of language. Arabic itself was very limited before the advent of Islam in terms of use by a large number of people. The introduction of Arabic grammar was motivated by Islamic incentives to protect the language from being corrupted by converts.

2. Belief in its supremacy

As a God-given language, the language of the Qur’an, Arabic is of supreme and great importance for all Muslims and for those who are interested in the study of the Orient; for the former it is their religious language, which contains the Qur’an, the Prophetic traditions and the early Muslim works, and for the latter it is the medium of Arabic culture. Arabs believe that Arabic is the most perfect, the noblest, the clearest and the richest language. Ibn Faris (Ṣaḥibi, p. 17) noted that Arabic is the most eloquent language. In the introduction of his Lisan Al-Arab, Ibn Manzur says, “Allah made the Arabic language superior to all other languages and enhanced it further by revealing the Qur’an through it and by making it the language of the people of Paradise. The Prophet was reported to have said, ‘I am an Arab, the Qur’an is Arabic, and the language of the people of Paradise is Arabic.’” This is why Arabs believe in the supremacy of Arabic as a God-given language.

Attempting to translate the word sayf (sword), for example, into Persian we would have only one word as equivalent. In Arabic, we can have many words for ‘sayf’, each with a specific connotation.

To most Arabs, Arabic has a magical effect on their souls. Hitti (1958: 90) said,

No people in the world, perhaps, manifest such enthusiastic admiration for literary expression and are so moved by the word, spoken or written, as the Arabs. Hardly any language seems capable of exercising over the minds of its users such irresistible influence as Arabic. Modern audiences in Baghdad, Damascus and Cairo can be stirred to the highest degree by the recital of poems, only vaguely comprehended, and by the delivery of orations in the classical tongue, though it be only partially understood. The rhythm, the rhyme, the music produce on them the effect of what they call ‘lawful magic’ (siḥr ḥalaal).

3. It has a long-standing and genuine linguistic heritage

After the expansion of the Muslim Empire and the increase in the number of foreign people who embraced Islam, Arabic became corrupted in the course of being used by the new converts. Those new converts made mistakes when reading the Qur’an. Muslim scholars began to fear lest the language become completely corrupted. They had to put an end to such a situation to protect the Holy Qur’an. Thus, the main motivation for the introduction of Arabic descriptive models was to preserve the knowledge of Classical Arabic. On the one hand, they wanted to preserve their language from the distortion and the solecism introduced by non-Arabic speakers and, on the other hand, to teach those converts Arabic to help them perform their Islamic rituals properly, since prayers can only be performed in Arabic.

There is no consensus among Arab or foreign linguists with regard to who is the founder of Arabic grammar. Some argued that Ali (the fourth Caliph) is the true founder of Arabic grammar as a science. He gave the first glimpse by dividing the word classes into a ‘noun’, a ‘verb’ or a ‘particle’. Others said that Abu Al-Aswad Ad-Du’ali was the first one to write the first treatise of Arabic grammar on the basis of what Ali or Ziyad Ibn Abihi, who was the governor of Iraq by then, supposedly told him. Al-Anbari (Nuzhat: 11) concluded that the first founder of grammar was Ali ibn Abi Talib, because all stories referred to him and Abu al-Aswad referred to Ali ibn Abi Talib. Moreover, Abu al-Aswad himself admitted that he learned grammar from Ali ibn Abi Talib. Although people differ as to who introduced Arabic grammar, they are unanimous in asserting that it was introduced to preserve the language of the Qur’an.

2.4 The Development of Arabic Linguistics

It is well known that Arabic linguistics emerged in the seventh century for a religious motivation: to preserve the language of the Holy Qur’an from the mistakes made by the new foreign converts. Some modern linguists assumed that the beginning of Arabic linguistics was influenced by Indian or Greek linguistics, but there is no concrete evidence for such a theory. Therefore, Arabic linguistics was introduced by Arabs, since Ali Ibn Abi Talib, the true founder of Arabic linguistics, had no contact with Indian or Greek culture at that time.

The science introduced by Abu al-Aswad dealt with all branches of modern linguistics as a whole. The first written treatises in Arabic grammar appeared at the end of the eighth century, when Al-Khalil ibn Ahmad and his outstanding pupil Sibawayh wrote their influential and pioneering books describing the Arabic language. The former wrote his dictionary of Arabic, Al-cAyn, and the latter wrote his grammatical description of Arabic. There was no separation among the different fields of linguistics as in modern times. For example, Sibawayh’s Kitab dealt with phonetics, syntax, morphology and phonology. Many of the early Arab scholars had the ability to write in all branches of linguistics. Al-Zamakhshari had outstanding works in the field of syntax and lexicography, in addition to his pioneering work in the exegesis of the Qur’an.
The science was founded before the beginning of the great movement of translation from other languages into Arabic in the Umayyad and Abbasid eras.

The golden age of Arabic linguistics was between the eighth and the eleventh century. Owens (1998, ch. 9) argued that Arabic linguistics reached its highest methodology and its most sophisticated level with Jurjani (d. 1078). The early Arab linguists felt that their contribution was not enough. Al-Khalil ibn Ahmad, for example, said, "If someone has in mind another cause for grammar than the one I mentioned, let him come forth with it!" (Al-Iid{aah{, p. 66, quoted in Versteegh 1997: 74). There are many contributions made by later linguists until the end of the eleventh century, but they were mainly interested in reworking what had been done by their predecessors. 'Sibawayh had, in fact, laid down the basic rules and methods of grammar, while the later grammarians' contribution consisted only in expounding his theory in a more explicit and systematic form, or in finding new applications for it' (Bohas, Guillaume and Kouloughli 1990: p. 5).

Little contribution has been made in the past millennium. In other words, linguists throughout this period used only to remodel or to add relatively slight changes to what has been done in the early ages of Islam. They were mainly concerned with codifying and preserving the literature of their predecessors. However, this little contribution, based on the same corpus used by their predecessors, was still within the general framework introduced by the early linguists, as '...the major preoccupation of grammarians… (after 1077)… was to find ever new ways of saying the same thing' (Carter 1985a: p. 270, quoted by Owens 1988: p. 8). Chejne (1969: 170) notes that "in the 12th and 13th centuries Arabic was looked upon with admiration by the West, in the same manner the Arab of today looks at the more developed Western languages."

2.4.1 Recent Contributions to Arabic Linguistics
There is still something to be done in the study of Arabic language, especially with the introduction of scientific approaches and modern technology in the field of linguistic investigation. In the early 20th century the current trend was to rely totally on what has been formulated during the early period of Arabic linguistics. On one hand, this approach was more interested in verifying and editing the grammatical manuscripts left by the Arab grammarians. On the other hand, it tries to explain and interpret such work in modern linguistic terms.

During the last four decades the study of Arabic language has increased dramatically, both in the universities of the Arab world and abroad. The current tendency has been to enrich Arabic with modern theories of linguistics through comparative or applied linguistic studies, especially to the teaching of Arabic as a first language. Much of the work in this field was done in thesis or dissertation form. Very few of these studies have been published. Straley (1989) listed the dissertations done in the American universities in the field of Arabic linguistics from 1967 to 1987 in an annotated bibliography. He noticed that these dissertations, in general, cover a wide variety of topics: phonology, grammar, comparative linguistics, language planning, sociolinguistics and pedagogy.

There are two main features which characterise modern Arabic linguistics of the last decades. First, the tendency towards the application of linguistic theories and methodologies. Bakalla (1983: p. xxxvii) pointed out that much of the work on Arabic linguistics 'has been influenced by developments within linguistic theory and that many studies have been formed in, and reflect, contemporaneous theory'. There are also indications of the same interest in engaging with the development in linguistic theory, as it is a very dominant paradigm in all branches of science, represented by the establishment of some Arabic teaching centres in the Arab world and abroad and the appearance of some periodicals and journals interested in Arabic linguistics like the Journal of Arabic and Islamic Studies (JAIS), Journal of Arabic Linguistics (in Germany), Arabica and Al-cArabiyya (Arabic). Moreover, a number of the big universities all over the world are now engaged in organising conferences, workshops and seminars devoted to Arabic linguistics for many purposes: scientific, commercial, or others.

Secondly, the use of modern techniques in linguistic research, as in computational linguistics and corpus linguistics. With the introduction of computational techniques into the field of linguistics in USA and Europe, a corresponding interest in the use of computers to investigate the Arabic language grew. Academic centres, companies and conferences specialised in Natural Language Processing flourished in the Arab countries and abroad5. Research in this domain is currently under development, as was also the case for the theoretical linguistics. Also, some companies like Sakhr (based in Egypt) are involved with developing solutions for Arabic computationally, and there are also conferences which are specialised in Arabic worldwide.

5 The Institute for the Languages & Cultures of the Middle East, University of Nijmegen, focuses nowadays on Arabic Natural Language Processing. It managed lately to produce an Arabic/Dutch dictionary based on a large Arabic corpus.

2.5 Some Features of Arabic Grammar
So far I have briefly outlined some aspects of the status and development of Arabic, Classical Arabic in particular, in order to acquaint the reader with the variety I am going to use in this study. To pursue the notion, I will illustrate the main features of Arabic grammar to help those who are to construct a computational system for Arabic know what kind of complexities they may face. More importantly, this section serves as an introduction to the problems encountered when attempting to search the Arabic texts by lemmas. Below are some of these features:
1. Arabic is written from right to left.
2. Arabic script has twenty-eight letters representing the consonants in addition to three long vowels. Unlike English, the shape of each letter depends on what position it occurs in a word: initial, middle, or final.
3. Arabic short vowels are written in a diacritical form, under or above the preceding consonant.
4. Arabic, like Latin, is a synthetic (inflectional) language. English, on the other hand, is nonsynthetic. Arabic has three cases: nominative, accusative and genitive. The use of cases in Arabic is complicated by the fact that they are mainly represented by short vowels and the Arabic script only allows the writer to show consonants and long vowels. Diacritics which are traditionally used for case endings are computationally problematic. 'For technical reasons the diacritisation is impossible when using the computer. This results in compound cases of morphological-lexical and morphological-syntactical ambiguities' (Khalid et al 1974: 29). This has been sorted out recently with programs that can handle all diacritics in Arabic (cf. 4.2.1).
5. Arabic words are formed from roots, based on fixed morphological patterns, where vowels, suffixes, prefixes, or infixes can be added to form new words. Once we know these patterns, it is easy to form any possible word without making mistakes. More interestingly, we can add to the base form other linguistic units such as person, tense, mood, participle, case, and verbal noun. English words, on the other hand, are generated from stems. Therefore, the key word for searching the traditional lexicon in Arabic is the root6, whereas in English it is the stem (the basic word form).
6. As Arabic is a synthetic language, it allows pronouns to combine with words forming one single word.
Such personal pronouns can be suffixed to nouns, verbs or particles. We may form an Arabic word representing a whole sentence. Consider the following word in (1) below.
(1) ضربوك d{arabuuka (they hit you).
This property raises another problem of analysing Arabic computationally. When searching for a word in an electronic text, we have to search for every possible form of this word. This is because, if we look for the stem of this word, as in English, we will find a huge number of results which are not needed. In Arabic, adding more characters to a string can yield words from different roots. For example, a search for cam (year) can also return camer (populated), nacam (ostrich) and camel (worker), which are derived from different roots. Searching for all the occurrences of each word form separately, even in a simple word search program which is not trained on Arabic idiosyncrasies, can give a good result which will not need laborious hand-editing.
7. Word order in Arabic is more flexible than in English. There are two types of word order in Arabic: VSO and SVO.

6 By the word 'root' I mean the three or four nuclear consonantal letters from which we can generate all possible word forms in Arabic by adding suffixes, prefixes or infixes.

Chapter Three: Corpus Linguistics

3.1 Introduction
Corpus is a Latin word which means 'body', hence any collection of texts, linguistic or nonlinguistic, can be called a corpus, such as the Corpus Juris Civilis, which was a collection of early Roman laws and legal principles in the sixth century, and the corpus Manuscript of Chaucer (1400), which included Chaucer's works. In 1731 Alexander Cruden used the Bible (King James Version) as a corpus to show that the Bible is consistent (Kennedy 1998: 14). In modern linguistic terms, a corpus is a designed collection of written data, spoken data, or a mixture of the two, which can be used for linguistic investigation.
In this sense, not any collection of texts can be called a corpus since there is a big difference between a corpus and a text database; the former has to be ‘a systematic, planned, and structured compilation of text’ (ibid: 4). Linguists throughout the history of linguistic research used to rely on textual resources as a source of evidence, at least, to prove the correctness of their theories about language. ‘It is obvious that if someone sets about writing a grammar of English, he must have a suitable body of material from which he is to elicit his rules, whether they be purely descriptive, or, as is more common, prescriptive or even pedagogical. These bodies of material may be considered corpora, with some extension of the term’ (Francis 1992: 28). The study of language in general, whether in the context of modern linguistics or in the context of earlier linguistic studies has also been largely based on empirical research. This empirical approach to language is basically dominated by the observation of naturally occurring data, as linguists tended to gather evidence for the grammaticality of a given word or a sentence. This is partly what corpus linguistics deals with. However, corpus linguistics goes beyond the use of corpora as a source of evidence in linguistic description. ‘Corpus linguistics, like all linguistics, is concerned primarily with the description and explanation of the nature, structure and use of language and languages and with particular matters such as language acquisition, variation and change’ (Kennedy 1998: 8). ` 21 Nowadays, two main objectives can be met via corpus collection: linguistic investigation and language processing. As Souter and Atwell (1993: i-ii) explained, Two primary research applications of corpora can be identified. On the one hand, linguists hope to exploit computer technology to explore linguistic data for the purpose of identifying linguistic trends and developing new theories. 
On the other, computer scientists and practitioners of artificial intelligence hope to use the linguistic information (including frequencies) present in and derivable from machine-readable corpora to develop software tools and systems for the automatic analysis, understanding and generation of natural languages like English. In some cases, of course, they will also employ the frameworks developed by the linguists, but this is by no means always the case. 3.2 Intuition vs. Empiricism A general motivation for much of the linguistic studies before 1950s was the desire to deal with linguistics on the ground of a positivist and behaviourist view of the science. Linguists like Harris and Hill regarded the corpus as the ‘primary explicandum of linguistics’. For such linguists, the corpus can sufficiently meet this approach, whereas intuition can, if need be, be used as a second source (Leech 1991: 8). With the advent of Chomskyan theories in the 1950s, less emphasis was placed on empirical observations. With the authority of his works, Chomsky has directed linguistics away from empiricism and the study of language use towards rationalism for many years. Following de Saussure, he made a distinction between two approaches to looking at language: a theory of language system and a theory of language use. These two approaches are drawn (1965) as competence and performance.7 Chomsky, rejecting the corpus linguistics approach, argued that: Any natural corpus will be skewed. Some sentences won’t occur 7 Competence can be defined as ‘the speaker-hearer’s knowledge of his language’ whereas performance is ‘the actual use of language in concrete situations’ (Chomsky: 1965: 4). Competence both explains and characterises one’s internalised knowledge of a language. The only way to investigate competence is through introspection. ` 22 Therefore. he gave a lecture at the Linguistic Society of America Summer Institute in 1964. To prove his argument. 
Horrocks (1987: 13-14) argues that although performance is the only available evidence to the linguist. This is because our mind has a finite storage capacity and the choices of language we produce are infinite. In fact. The corpus. he gave the following examples in (1a & 1b) below: 1a. still others because they are implicit. He (ibid: 16) expounded that an observationally adequate grammar cannot simply list all the wellformed sentences of a given language. However. I live in Dayton. quoted in Leech 1991: 8) In the course of invalidating the corpus-based studies. it is not a transparent reflection of competence. Horrocks (1987: 16-17) further argued that relying on a corpus to derive grammatical rules will lead to some sort of rules which have a predictive power which can generate strings not available in the corpus itself. most notably register variation where probability plays a major role in selecting certain combinations of meaning with certain frequencies. do not cope with vast areas in language study. Only by positing competence can we account for a finite system with the capacity to define the membership of an infinite set. the approaches based on Chomsky’s theories. we can only test the validity of such strings through referring to the intuition of a native speaker. The sentence (a) above is more likely to occur more frequently. just for demographic reasons! Following Chomsky. if natural. 1996: 5). 1962. I live in New York. However. in which he rejected any kind of quantitive (statistical) data.because they are obvious. (Chomsky. Ohio. 1b. others because they are false. the bitter criticism of corpus data arising from the tradition ` 23 . will be so wildly skewed that the description [of language] would be no more than a mere list. as the linguist must seek to model language competence rather than performance’ (McEnery and Wilson. Chomsky suggested that ‘the corpus could never be a useful tool for the linguist. 
which were considered mainstream in linguistics. that needs a radical surgery. He argued that ‘we may see formal patterns being used overtly as criteria for analysing meaning. Hence. I would suggest. Fillmore (1992) argued that the two approaches can have interface and complement each other. however small. can pinpoint interesting facts. Makkai (1987) considers the total reliance on intuition a serious disease that affects modern linguistics. etc. On the other hand. Criticising de Saussure’s approach. studying corpora of naturally occurring data is a very useful way to test a ` 24 . we would rather make use of both of them in a more interactive way. Firth and Halliday. He observed that the language we produce is governed to a large extent by particular conventions (social.which Chomsky established has led corpus linguists to remedy the drawbacks of corpus data such as balance and representativeness. 1991: 6-7). which he called textphobia. which is the main object of linguistic study (Roulet. Instead of treating corpus-based and intuition-based linguistics as two contradicting disciplines. is inadequate to cover all aspects of language. In conclusion. is reading Malinowski. which is a more secure and less eccentric position for a discipline which aspires to scientific seriousness’ (Sinclair. since a corpus. To pursue the premise. A useful cure for this disease. Sinclair (1991) also criticised the reliance on intuitive data. lexis. however large. 1975: 78).). especially in the field of word meaning. he proposes. Firth (1957) also discredited the introspection of the native speaker as a reliable source of data. the grammatical rules are derived by analysis and generalisation of a corpus. It is worth stressing that eliminating observation from the study of language was fervently criticised by linguists even before Chomsky. if someone sets about writing a grammar of a given language. following Francis (1992). a corpus. situational. 
he must have a corpus from which he is to derive his rules. He emphasised the role of the native speaker’s introspective judgement as a subsequent step. Malinowski in 1936 suggested overlooking the question of langue and parole and paying more attention to the living speech in a context of situation. poetry and nomad proverbs. most studies of corpus linguistics are mainly focused on English. since none of them has left an account of the methodology used. but on the other hand. They made it as representative as possible. This is obvious in their use of quotations from these sources as linguistic evidence. With the introduction of the computer into the field.1 Pre-computational Corpus Linguistics The definition of corpus as a designed collection of texts for linguistic investigation subsumes all early corpora compiled in this respect. being limited to the text of the Qur’an and the pre-Islamic poetry. However. genres. The computer made the process easier and more reliable. styles and varied topics including poetry and ` 25 .3. Thrax’s grammars of Greek and early Arab linguistics were definitely based on textual resources. 3. However. Versteegh explained.3 Historical Survey We have to bear in mind that the manual collection of textual resources was the regular means before the invention of computers. The early Arab linguists relied mainly on three sources of linguistic data to describe their language: the Holy Qur’an. 3. registers.theoretical model put forward through intuition or to investigate a language with an emphasis on what is typical in this language or what is called norms of use. the grammarians upheld the fiction of native speakers whose judgement could be trusted’ (1997: 42). Such quotations were certainly taken from a corpus they designed for their inquiry about language. Thus we can distinguish between two stages of corpus collection: Pre-computational and computational corpus Linguistics. 
This is because the manipulation of large corpora accurately is quite hard without the use of computer techniques. the interest in corpora has grown and continues to increase. the corpus used by the grammarians was closed. Ditters (1990: 130) described this corpus as consisting of specific media. ‘on the one hand. although corpora in this sense are deeply rooted in the history of linguistics as most of the great civilizations have long traditions of the study of language. we do not know exactly what form of corpus they used. For instance Panini’s grammar of Sanskrit. They have postulated certain selection criteria for designing such a corpus. apart from Arabic. 1996:10). 1995: 244).prose. But he pinpointed some drawbacks in these collections due to (1) the editors of lexicographical collections like Oxford English Dictionary and Webster’s Dictionary in particular. boring and very expensive to process (McEnery and Wilson. as a basis for describing English grammar. Therefore. encountered a big problem. this grammar became the norm for language use. which was initially manually assembled is considered a transitional point between a non-computerised corpus and modern corpus linguistics. in an attempt to avoid the shortcomings of the other corpora. at the expense of the normal core of the language’ (Francis 1992: 28). taken from a wide range of genres. Commenting on this. articles and pronouns. collected a more representative corpus (spoken and written). later computerised yielding the LondonLund Corpus’ (Svartvik. as they did not have enough citations for function and simple words like. He divided corpora into three types: lexicographical. (2) The major difficulty with collections assembled for grammatical investigation is that ‘they are inevitably skewed in the direction of the unusual and interesting constructions that the readers encounter. 1990 quoted in Kenny 1999: 32). It now takes a ` 26 . As for English language corpora. error prone. 
but instead of the grammar being tested out again and again on corpus-data in a cyclic process as is the case in modern corpus linguistics. Undoubtedly. Johansson (1995) suggested. dialectological and grammatical. ‘the natural solution to this problem is to collect texts in a systematic manner and subject them to the principle of “total accountability”‘ (Johansson. Quirk. prepositions. banal. Therefore. however. Kennedy (1990: 17) pointed out that the SEU corpus. Francis (1992) gives a full description of English precomputer corpora. It is important to note that ‘the spoken part of SEU corpus was. This is because corpora without the assistance of computer techniques are time-consuming. He (ibid: 133) pointed out the way early Arab grammarians employed the corpus they assembled: Originally corpus-information constituted the basis for a grammar of the Arabic language. his Survey of English Usage is considered a landmark in corpus-based grammatical description in the 20th century. working on such large corpora was tedious and exhausting. matter of minutes to process such corpora by computer accurately. As a point of departure we can conclude that the methodology of corpus linguistics, however unrepresentative of the actual use of language, was widespread in linguistics for a long time. Corpora remained as a source of data for linguistic research in spite of the difficulties raised above until the 1950s, when the corpus for linguistic research underwent a severe blow at the hands of Chomsky, who invalidated it as a reliable methodology (see 3.2). 3.3.2 Computational Corpus linguistics With the introduction of computers to the field of corpus linguistics, much attention has been given to this methodology. The electronic corpus has become widely recognised and exploited when Francis and Kucera launched their pioneering corpus (Brown Corpus) in 1961. 
Then, linguists began to realise that electronic corpora can offer a new insight and a reliable methodology for natural language processing, as they found out that computers have made possible the collection, storage and processing of very large and varied texts. Unlike manual corpora, computerised corpora can be well designed and representative, and they can be processed in a few minutes. This can reveal unexpected features of language. More importantly, 'the ability to examine large text corpora in a systematic manner allows access to a quality of evidence that has not been available before' (Sinclair, 1991a: 4).

Computerised English Corpora
Today, there are many electronic corpora available on either punched cards or CD ROMs in various languages. English examples include the Lancaster/Oslo-Bergen Corpus (LOB), the London-Lund Corpus, the Lancaster/IBM Spoken English Corpus (SEC), the Longman/Lancaster English Language Corpus, and the British National Corpus (BNC). Below I am going to give a brief account of two major English corpora: the Brown Corpus as the first computerised corpus and the Birmingham Collection as the first major computerised corpus used for dictionary-making based on a thorough study of language use.

Brown Corpus
This was, undoubtedly, a pioneering corpus not only because it was the first computerised corpus of English, but also because it was against the mainstream, which was intuition-oriented. The corpus consisted of about one million words of written English printed in the US in 1961, comprising 500 text samples of about 2000 words each. The samples were taken from a variety of genres excluding verse and drama. The project started in 1961 and only after three years (in 1964) was the corpus ready for distribution on a magnetic tape.
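The sample-based design just described (500 samples of about 2000 words each, giving about one million words) can be illustrated with a minimal sketch. It is not the procedure used by the Brown Corpus compilers: the function names `take_sample` and `build_sample_corpus` are hypothetical, tokens are simply split on whitespace, and texts shorter than the sample size are skipped, all of which are assumptions made for illustration.

```python
def take_sample(text, size=2000):
    """Return the first `size` whitespace-separated tokens of `text`."""
    return text.split()[:size]


def build_sample_corpus(texts, n_samples=500, size=2000):
    """Collect up to `n_samples` samples of exactly `size` tokens each.

    Texts that are too short to yield a full sample are skipped
    (an assumption of this sketch, not a documented design rule).
    """
    samples = []
    for text in texts:
        if len(samples) == n_samples:
            break
        tokens = take_sample(text, size)
        if len(tokens) == size:
            samples.append(tokens)
    return samples


# The target size follows directly from the design figures in the text.
target_words = 500 * 2000  # one million words
```

Fixing the sample length keeps any single text from dominating the corpus, at the cost of cutting texts off before their end.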
Birmingham Collection The starting point of this corpus goes back to the 1960s in the form of research carried out at Birmingham University where Sinclair (1969) issued his early computational British corpus: OSTI project (135000 running words of informal conversation transcribed and computerised). The collection undertaken at Birmingham University is made up of written texts and transcribed speech. It was intended to provide raw language data for a variety of purposes, relevant to the needs of the learners and teachers, lexicographic in particular (Renouf, 1984: 4-5). Since 1980 Cobuild, which is a joint venture between Collins and the School of English at Birmingham University, has been collecting a corpus for dictionary compilation and language study, making use of the Birmingham collection. In October 2000 the latest release of the corpus amounted to 415 million words and it continues to grow with the constant addition of new material. Research at COBUILD over the last fifteen years has shown that very large samples of text are necessary for good linguistic study, since the vocabulary of English is so large (well over half a million different words) and there is such variety in current usage. In order to draw statistically valid conclusions from computerised analysis of a corpus, researchers need to have adequate data samples at their disposal (http://titania.cobuild.collins.co.uk/). In addition to the corpora mentioned above, there are ‘a number of initiatives that have aimed at collecting and disseminating textual material amongst the international research community’ (Kenny 1999: 34). Below are examples of these initiatives: The ACL/DCI (the Association for Computational Linguistics’ Data Collection Initiative) which produced a CDROM containing just plain orthographic text. 
It consists of the Collins English Dictionary; selections from the Wall Street Journal; the Penn Treebank of skeleton-parsed data compiled by Mitch Marcus and his team at the University of Pennsylvania; and a database of scientific abstracts. There are also some other initiatives like ECI (European Corpora Initiative), LDC (The Linguistic Data Consortium) and ELRA (The European Language Resources Association).

3.4 Corpus Design
The corpora we have mentioned above are not assembled haphazardly, since a corpus is defined as a designed collection of texts. Prior to the process of collecting a corpus there should be theoretical research to specify what type, time period, language variety or state, size and design method a corpus involves (Sinclair 1987; Atkins et al. 1992; Biber 1993; McEnery & Wilson 1996; Kennedy 1998; Meyer 2002).

3.4.1 The purpose of the corpus
From the many corpora we have discussed above we can conclude that corpora can be designed for several purposes: as a basis for a dictionary; to create a word frequency list; to study some linguistic phenomenon; to study the language of a particular author or time period; to study language change; to train an NLP system; as a teaching resource for non-native speakers; to study language acquisition. Due to the diversity of corpora purposes, there is no consensus among corpus linguists as to the procedures or the selection criteria to be followed in corpus design. For example, the selection criteria for Cobuild excluded poetry, drama and technical language (Renouf, 1984: 6). In addition to excluding poetry and drama, the Brown Corpus is designed to be a synchronic corpus: it contains written texts of American English published in 1961. If the purpose of the corpus is to highlight the features of a language over a period of time, we will definitely need a criterion that allows that purpose to be met. Moreover, specialist corpora may introduce different criteria to study a certain aspect of the language.
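One of the purposes listed above, creating a word frequency list, can be shown in a minimal sketch. The function name `word_frequencies` is hypothetical, and the tokenisation (lower-casing and splitting on non-word characters) is a simplifying assumption that suits the English toy example only; it would not handle diacritised Arabic text without further normalisation.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count word forms in `text` after lower-casing.

    Tokens are maximal runs of word characters; punctuation is dropped.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


corpus = "The corpus is a body of texts. The texts form the corpus."
freq = word_frequencies(corpus)

# Print the word forms from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

A list of this kind counts word forms, not lemmas, so inflected forms of the same word are counted separately.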
One of the first considerations in constructing a corpus is to specify for whom and for what the corpus is designed: for personal research, or to serve as a general resource. Kennedy (1998: 70) argued, ‘the optimal design of a corpus is highly dependent on the purpose for which it is intended to be used.’ Anyhow, Atkins et al (1992) and Meyer (2002) drew up the principal features of corpus design for whatever purpose. They discussed the practical stages in building a corpus: selection of sources, text annotation, copyright permission, in addition to some extra-linguistic variables.

3.4.2 Text Sampling

The next step after deciding the type, purpose and content of a corpus is to select and sample the actual texts which will make up the corpus. Biber (1993: 243) pointed out that any selection of texts is considered a sample, irrespective of being representative or not, but he noted that ‘a corpus must be ‘representative’ in order to be appropriately used as the basis for generalisations concerning a language as a whole.’ However, we have to bear in mind, in the first place, that there may be a corpus that is designed to represent not the language as a whole but one particular genre or the whole works of an author for example. Secondly, a corpus, however big, is small when compared with the entire population of the language under investigation. This is because it is too difficult to access all the publications in a given language, let alone speech. With this in mind, it is feasible to get a grip of the complete Old English corpus or the complete Early Middle English corpus, but a complete 20th c. British or American English corpus is not feasible. Moreover, Garside, Leech and Sampson (1987: 6) noted that Sinclair (1982) defined the problem of corpus compilation as a problem of selecting the right sample from the existing massive quantities of machine-readable texts. In addition, ‘the value of a corpus as a research tool cannot be measured in terms of brute size. The diversity of the corpus, in terms of the variety of registers or text types it represents, can be an equally important (or even more important) criterion’ (Garside, Leech and McEnery, 1997: 2).

There are two ways of sampling a language: language reception and language production, i.e. whether to sample the audible and readable language or the spoken and written language (Atkins et al, 1992: 5). We can hardly achieve a representative sample of the total language production for the vast demographic and contextual variation among people. The main challenge in sampling the population [8] of a given language lies in representing all the relevant genres, topics or registers while keeping the corpus at a manageable size. Therefore, in general corpora, we have to ensure the diversity of the selected data. With the diversity of the corpus, we can avoid the pervasiveness of a certain genre or the stylistics of an author. Sampling from various genres can reduce the possibility of being dominated by stylistic idiosyncrasies of a particular author (Atkins et al, 1992: 2). Sampling all data randomly, in relation to texts such as newspapers, books, radio programmes, where all texts have a chance to be represented, can also reduce the stylistic idiosyncrasies of authors. More importantly, in order to achieve an accurate representativeness of the samples, sampling has to be conducted according to statistical measures and thus will be qualitatively and quantitatively representative of the entire publication and population. However, Biber (1993: 244) argued that the process of random sampling is mostly used within each subgenre to ensure a representative selection of texts.

[8] To statisticians, this word does not necessarily refer to human beings as commonly used. We may have a population of anything to be counted such as people, animals, trees, cars, books, companies, etc. (Stuart, 1968: 10).

Sinclair (1995: 27-28) made a distinction between a ‘whole text’ corpus and a ‘sample corpus’. Unlike many corpus linguists like Francis and Kucera in their pioneering corpus (Brown Corpus) in 1961, he thinks that ‘whole text corpus’ should be a default value for anyone building a corpus. He noted that ‘samples are small, and of a constant size, hence not qualifying as texts.’ To him, ‘the use of small samples is just a remnant of the early restraints on corpus building’ (ibid). Stubbs (1993: 11) also argues in favour of whole texts being the unit of study. He also quoted Sinclair saying that ‘few linguistic features of a text are distributed evenly throughout’, which could be overlooked with use of sample texts.

3.4.3 Text Typology

Atkins et al (1992) distinguished between two criteria for constructing a corpus: external (non-linguistic) and internal (linguistic). The former criteria are the first to look at when compiling a corpus, whereas the latter won’t be attained until the corpus becomes available for analysis (ibid: 5). Atkins et al. (1992) have given a full systematic account of non-linguistic characteristics in corpus design. In sampling written texts, the designer of the corpus has to take into account some important information about both the author and the reader who differ in regard to certain author-related and work-related criteria. These criteria are by definition non-linguistic. Author-related criteria are those associated with authors. These criteria are mainly demographic: geographical, national or international, ethnic, and social (age, sex, education, profession, nationality, socioeconomic, etc.). Work-related criteria include, among other things, genre, topic, style, mode (written, spoken, written to be read, written to be spoken), text origin, date of publication, setting, factuality, age and size of intended audience or readership. Such considerations, in addition to contextual criteria, are also required when sampling spoken data. Contextual criteria refer to situationally-defined varieties such as conversation (face-to-face vs. telephone (informal)), monologue vs. dialogue, personal vs. impersonal, participants, preparedness.

3.5 Technical Requirements

In addition to the criteria mentioned above, there are also some considerations one has to keep in mind when designing a corpus such as getting permission, data capturing, marking-up. Before starting the process of creating a corpus, the designer may have to get permission from the publishers of his selected works, to use the text in an electronic form for language research. Having got permission, he needs to capture the data.

Written corpora are easy to capture by keyboarding, scanning or downloading from the Internet. However, proofreading is still needed to make sure of the reliability of the data. Spoken material, on the other hand, is difficult to capture. Spoken materials need to be recorded and then transcribed before processing. To have a reliable transcribed text is, undoubtedly, time-consuming, expensive and error-prone. This is because people’s perception of speech may differ in respect of prosodic features, homophonous words, sentence boundaries, situations, etc. Once a text, written or spoken, is captured electronically, some information can be added, electronically, to indicate some text features such as titles, headings, chapters, paragraphs, various types of hyphenation, etc. This process is called marking-up. There is also some other information, which can be added to the text to show the parts of speech of each sentence (as in tagged corpora), or the sentence structure and the function in the sentence for each word (as in parsed corpora).

3.6 Corpus Processing

Once a corpus is available to use in an electronic form it needs to be processed by computer for use in linguistic research. Barnbrook (1996), Meyer (2002) and Kenny (2001) gave an overview of how to process such a corpus. Since most corpora are incredibly large, it is nonsense to search a corpus without the help of some software that can highlight what we look for accurately and fast. Therefore, we need tools to turn the electronic texts into databases, which can be searched. There are a lot of tools designed for such a purpose. The first thing the computer techniques can do with texts is to provide word frequency lists for the whole contents of the texts.

Frequency Lists

These lists can be made by identifying every word form in the text, counting identical forms and classifying them according to a particular order: alphabetical, or according to their frequency. This can be done in descending or ascending order. Listing words according to their frequencies can show how often every single word form occurs in the text. Hence, ‘by examining a list, one can get an idea of what further information would be worth acquiring: or one can make guesses about the structure of the text, and so focus on investigation’ (Sinclair, 1991: 31).

Concordances

A concordance can be defined as listing all occurrences of search-words in the text with a short section of the context that precedes and follows each word. This process is also called KWIC (key word in context). Unlike word frequency lists, the search-word is represented within its contextual environment; this can give more information about the nature and behaviour of words. The search-word can be highlighted by putting it in the centre of each line, with a space on each side. The arrangement of each key word is alphabetical according to the left-hand or the right-hand context. Barnbrook (1996) describes the main features of concordance programs in detail.

Collocation

In addition to KWIC and word frequency lists, most programs also offer the possibility of searching for word combinations within a specified range of words. Furthermore, if the program is a bit more sophisticated, it might also provide its user with lists of collocates based on some statistical tests. Collocation is discussed in detail in Chapter Five.

3.7 Summary

This chapter has given a brief account about the methodology of corpus linguistics and has surveyed its historical background. We have investigated some aspects of corpus linguistics to make it easy for the reader to be aware of the state of the art. Such aspects include the methodology for creating a corpus, such as representativeness, sampling, size, etc., the types of corpora as well as the technical requirements needed for utilising corpora.

Chapter Four: Description of the Corpus and Tools of Analysis

4.1 Introduction

Based on the information given in the previous chapter we embarked on building a computerised Arabic corpus to use in our linguistic study on lexical collocations and synonymy in Arabic, taking into consideration the state of the art of Arabic which we will discuss below. We attempted to meet all the design criteria for corpora compilation in order that we can conduct a methodical study based on it and to make it available as a resource for other researchers to use in the future.

4.2 Arabic for Computational analysis

Work in Arabic computing did not start as early as European languages. Attempts have been made, but due to some technical problems with Arabic script (orthography) and grammar there is far less development than in English and languages written with the Roman alphabet. This is because ‘the native Arabic grammar [which is produced by early Arab linguists], although one of the most sophisticated systems of linguistic analysis ever devised, was developed by scholars who lacked the concepts of consonant, vowel, and syllable’ (Koenraad et al, 1999: 162-63). This raises some problems of digitising Arabic which require laborious work of computation. For instance, the absence of vowels in Arabic [9] makes the process of tagging or any morphological analysis quite hard and sometimes ambiguous. Consider for example the three-letter word ورد wrd which can be lexicalised as a verb وَرَدَ warada ‘come, be mentioned’, a noun وِرْدٌ wird ‘watering place’, a noun وَرْدٌ ward ‘flower’. For more details about the difficulties of analysing Arabic computationally see Goweder and Roeck (2001), Khoja, Garside and Knowles (2001), Van Mol (2002).

[9] A few written Arabic texts contain vowels, the most famous one is Qur’an, with a fully-detailed vowel system. Then we can find some old Arabic poems and some primary schoolbooks with only vowels that mark the words cases.

4.2.1 Progress in machine-readable Arabic language

The Sakhr Company has been working on digitising Arabic since 1985. Two years later they managed to produce the first Arabic morphological analyser. Not until 2001 did they manage to launch an Arabic OCR that can handle Arabic efficiently. Using the latest techniques to handle Arabic through OCR, a lot of attention has been given to render Arabic texts, especially religious material, into machine-readable form. So, many such texts are now available on the web. However, these texts cannot be considered corpora because they lack systematicity, representativeness and proper planning.

Nevertheless, there was some work predating the widespread use of personal computers capable of handling Arabic script at European universities, although this work used transliterated versions of the Arabic texts. One of the pioneering projects, done on a mainframe computer, was the corpus of early Arabic poetry assembled by Alan Jones at Oxford University. This corpus was considered one of the major computerised sources of Arabic literary material before the personal computer could handle Arabic. Al-Jabouri and Knowles (1988) compiled a transcribed corpus of Arabic to investigate the quantitative properties of cohesion in Arabic. This corpus is also transliterated. They noted some difficulties that they encountered in the process of digitising Arabic that had not previously been tackled, even the problem of diacritics. For instance, identifying orthographic words in Arabic is more complicated than in English, because many Arabic words can be attached to the following string of characters, like wa ‘and’ and fa ‘then’ which are always attached in writing to the following word.

Later on, after the remarkable development in the field of computational Arabic, Izwaini (2000) attempted to use corpus-based analysis with respect to Arabic but he ended up using a manual Arabic corpus: it was not manually keyed, but he used the corpus he selected in a hard copy form. He studied the impact of translation on collocations in Arabic using two corpora: English and Arabic. The English corpus, consisting of translated English text from Arabic, which is part of TEC (Translational English Corpus) [10], is electronic. The Arabic corpus which is used for analysis with the naked eye [11] consists of translated Arabic text from Swedish. Izwaini in his Ph.D. thesis (in progress) used another corpus covering these three languages (Arabic, English and Swedish) electronically.

[10] This corpus consists of translated works into English; it was first suggested by M. Baker (1995), CTIS, Manchester University.

[11] He analysed the English corpus electronically using Wordsmith tool, but with Arabic corpus he could not find at the time an efficient tool (OCR) to convert the text into an electronic form nor a tool to process it (a concordancer); he used to look up the novels he selected with his naked eyes to find interesting patternings.

4.2.2 Arabic Language resources

With respect to the development of tools for machine-readable Arabic that can handle Arabic, textual material in digital format can be obtained from Arab publishing houses or companies interested in building Arabic databases for commercial purposes. Moreover, a lot of links to Arabic linguistic and Arabic and Islamic cultural sources exist on the web. For the most part, there is a strong tendency among Arabic newspaper publishers to post their articles on the Internet, but most of them do so in the form of images and this is useless for the assembling of computational corpora.

4.2.2.1 Available Arabic Corpora

Compilation of a large corpus of MSA (Modern Standard Arabic), consisting of several million words, was completed in 2003 at the University of Nijmegen for lexicographical use (http://www.let.kun.nl/WBA/index.htm). This corpus is a raw corpus (not tagged or lemmatized) containing a variety of genres: newspapers, books, novels, reports, etc. After a long search, they decided to use Monoconc Concordance Program which they found sufficient for their needs. They at last managed to publish a Dutch-Arabic dictionary (2003), based on that corpus.

ELRA (European Language Resources Association) provide two Arabic corpora: An-Nahar newspaper Corpus, containing around 140 million words and Al-Hayat newspaper corpus, containing 18 million words. The former is just a raw corpus whereas the latter has only markup notation, i.e. with more information relating to the original layout of the texts, including sentence and paragraph boundaries, headings, deletions, and typographic features. LDC (Linguistic Data Consortium) have also two Arabic corpora: a corpus of Arabic newswire text, containing 76 million words and a corpus of Egyptian Arabic speech, consisting of 60 unscripted telephone conversations, lasting between 5 and 30 minutes.

The Sakhr software Company in Egypt is a pioneering company using the latest techniques to fulfil the needs of the Arabic market and the Arabic speaking population in the field of Arabic processing. It provides a large number of text collections and databases, which have recently become available on its web site: www.sakhr.com.

4.2.2.2 Arabic Online Texts

There are many sites on the internet which provide Arabic books in digital format for free, like www.muhaddith.org and www.alwaraq.com, but the material is mostly religious. The non-religious texts are mainly journalistic. There are also some other Arabic newspapers posted on the internet in text (not images) such as:
Al-Ahram Newspaper (an Egyptian daily newspaper).
Al-Akhbar Newspaper (an Egyptian daily newspaper).
Al-Wafd Newspaper (an Egyptian daily newspaper).
Albaath Newspaper (a daily Syrian newspaper).
Al Bayan Newspaper (a daily newspaper from the United Arab Emirates).
And many others which can be found on: www.sahafa.com.

Al-Hayat newspaper, published in Arabic, produced a CD-ROM of all its recent issues but it is in Macintosh format. Likewise, E.J. Brill in Leiden is going to release a CD-ROM version of the Encyclopedia of Islam (in Arabic).

4.2.3 Tagging Arabic Texts

A tagged corpus is a corpus which is informed with coding to indicate additional information like Part of Speech, tense and/or aspect of verbs etc. Although tagged corpora are now available for many Roman languages, this sort of corpora is lagging behind in connection with Arabic. The process of tagging an Arabic corpus is in itself tedious and time-consuming. To tag an Arabic text, the text must be segmented into their component lexemes (Freeman 2001). This is because we may find an Arabic word representing a whole sentence. There is an Arabic tagged corpus in Lancaster assembled by Shereen Khoja based on an Arabic morphosyntactic tagset along with an Arabic part-of-speech tagger in the Computing Department, University of Lancaster (Khoja, et al 2001). But it is manually tagged and is very small; it only consists of 1700 words with the following tags (Arabic POS (N, V and Particle) plus some syntactic information (sing., masc., and definite common noun)).
Khoja also has a tagged corpus of 50,000 words of Arabic newspaper text with the basic tags (N, V, Particle). Not until 2003 was Khoja able to produce a tagger for Arabic in the fulfilment of her Ph.D. thesis (Khoja 2002). Indeed, the Arabic language is relatively difficult to tag due to most of the problems raised in section 2.4. The Institute of Modern Languages of the Catholic University of Leuven started with the manual annotating of a 4-million-word Arabic corpus. They are still working hard to elaborate this corpus which will be used in the future as a basis for a semi-automatic tagging of raw Arabic corpora (Van Mol, 2002). Apart from Khoja’s corpus, which is very small and manually tagged, and the Leuven one, which is still in progress, we do not know, at the time of writing, of any other tagged corpus except for LDC’s and Sakhr’s. The most recent of these is the one produced by the Linguistic Data Consortium (LDC) in 2003. They produced an Arabic Treebank: Part 1 v 2.0 consisting of 140,265 words (168,123 tokens after clitic segmentation). This is published as part one of a one-million-word Modern Standard Arabic corpus. As for Sakhr’s, Sakhr Company in Egypt often claims that it owns a tagged corpus, but the company said it is for their own purposes; they did not want to share it even for academic research.

Although these years witnessed a vast stride in development of machine-readable tools that can handle Arabic, we can barely find a public domain tagged corpus [12] or a POS tagger that can work on Arabic to disambiguate unvoweled written Arabic texts, which is a very daunting task. Almuhanna (2003), for example, had to romanise the Arabic alphabet (transliteration) following Buckwalter [13] in an attempt to tag his Arabic corpus. He followed this process: (1) compiling a raw corpus, (2) transliteration, (3) segmentation, (4) tagging, (5) re-transliteration into Arabic.
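Steps (2) and (5) of the process above, transliteration into Roman characters and back, can be sketched as follows. This is only an illustration under stated assumptions: the table is a partial Buckwalter-style mapping (plain consonants plus five diacritics, with the hamza carriers and tanween omitted), and the names AR2BW, BW2AR, to_latin and to_arabic are hypothetical, not taken from Almuhanna's work.

```python
# Partial Buckwalter-style table (assumed subset, for illustration only).
AR2BW = {
    "\u0621": "'", "\u0627": "A", "\u0628": "b", "\u0629": "p",
    "\u062A": "t", "\u062B": "v", "\u062C": "j", "\u062D": "H",
    "\u062E": "x", "\u062F": "d", "\u0630": "*", "\u0631": "r",
    "\u0632": "z", "\u0633": "s", "\u0634": "$", "\u0635": "S",
    "\u0636": "D", "\u0637": "T", "\u0638": "Z", "\u0639": "E",
    "\u063A": "g", "\u0641": "f", "\u0642": "q", "\u0643": "k",
    "\u0644": "l", "\u0645": "m", "\u0646": "n", "\u0647": "h",
    "\u0648": "w", "\u0649": "Y", "\u064A": "y",
    # Diacritics: fatha, damma, kasra, shadda, sukun.
    "\u064E": "a", "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
}

# Reverse table for re-transliteration into Arabic script.
BW2AR = {latin: arabic for arabic, latin in AR2BW.items()}


def to_latin(text):
    """Romanise Arabic text; characters outside the table pass through unchanged."""
    return "".join(AR2BW.get(ch, ch) for ch in text)


def to_arabic(text):
    """Map romanised text back to Arabic script using the reverse table."""
    return "".join(BW2AR.get(ch, ch) for ch in text)
```

Because the mapping is one-to-one, text that contains only characters from the table survives the round trip unchanged, which is what allows a Roman-alphabet tagger to be applied in between.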
Almuhanna used the language-independent Brill tagger to automatically tag his transliterated and segmented text after training it on a training corpus of 100,000 words, which was already tagged manually using Freeman’s tagset (2001). With Brill’s tagger Almuhanna achieved 93% accuracy in his corpus consisting of 1-million words.

[12] The tagged LDC corpus was not personally assessed; in addition we could not find in the literature a proper description of how much computation was involved in tagging that corpus other than what is mentioned above (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T06).

[13] http://www.xrce.xerox.com/competencies/content-analysis/arabic/info/translit-chart.html

Khoja’s APT (Arabic Part-of-Speech Tagger) [14] skips the above steps needed to tag Arabic texts and works directly on Arabic script. She used a corpus of 50,000 words to train the tagger. Her rule-based tagger arrives at word-roots by removing all affixes, which are then used to determine the grammatical position of the attached word. Some words were so ambiguous that they did not receive any tags. So, she used a probability-based tagger with which she managed to achieve 90% accuracy after disambiguating ambiguous words. Nonetheless, there must be a manual tagging for all the lexical items in the training phase (www.comp.leeds.ac.uk/bshawar/papers/group_paper.doc).

The main difference between the tagset Almuhanna used and Khoja’s is that the former is based on Latin convention in terms of the labelling categories like N, V, Adj, Adv etc., whereas the latter follows the Arabic traditional classification of the word into N, V and particle; all other categories are treated as subcategories which can be marked by inheritance, i.e. all the subcategories of the tripartite division inherit properties from the parent categories. Secondly, Khoja’s tagger arrives at the item to be tagged directly without a need of segmentation or transliteration.
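The idea of a tripartite tagset whose subcategories inherit from N, V and particle can be illustrated with a small parent table. This is a sketch only: the subcategory names below are hypothetical placeholders chosen for the example and are not Khoja's actual tag labels.

```python
# Hypothetical tag hierarchy: each tag points to its parent category,
# and the three top-level categories have no parent.
PARENT = {
    "Noun": None,
    "Verb": None,
    "Particle": None,
    "Adjective": "Noun",       # treated as a subcategory of the noun
    "ProperNoun": "Noun",
    "PerfectVerb": "Verb",
    "Preposition": "Particle",
}


def ancestors(tag):
    """Return the tag followed by its chain of parent categories."""
    chain = []
    while tag is not None:
        chain.append(tag)
        tag = PARENT[tag]
    return chain


def is_a(tag, category):
    """True if the tag is the category itself or inherits from it."""
    return category in ancestors(tag)
```

Under such a scheme a query for all nouns also retrieves adjectives and proper nouns, whereas a flat N, V, Adj, Adv labelling would treat them as unrelated categories.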
However, both Almuhanna’s and Khoja’s approaches would, in the first place, require a lot of human intervention to hand-edit the results prior to tagging; secondly, they would fail to deal with some aspects of Arabic like seeming homonyms unless they resort to a more sophisticated semantically-based analyser.

[14] Khoja’s tagger was not personally assessed.

4.2.4 Tools for Processing Arabic

As for the tools that can be used for processing Arabic corpora, there are a few that can handle Arabic texts, though not as well as English. They are as follows:
XConcord, developed by Malek Boualem, Mark Leisher and Bill Ogden (1996)
MonoConc, designed by Michael Barlow (1999)
Concordance by R. J. C. Watt (2001)
aConCorde by Andy Roberts (early release, 2004)

XConcord is designed to work on texts in Unicode Standard. It supports 17 languages, including Arabic, and allows flexible searching. It displays Arabic correctly and the target word is well aligned. However, it only works on the Solaris operating system.

MonoConc is a concordance program. This Windows tool is very easy to use; it can initiate concordance searches for words and phrases immediately. MonoConc offers functionality and flexibility through a variety of configurable options. This program works well for Arabic text analysis in Arabic Windows, but with one major drawback: the concordance output is presented on the screen backwards. In other words, in the middle we find the KWIC (keyword-in-context) as normal, then the context that is supposed to come after the key word appears before it and that which precedes the key word follows it. However, you can save the concordance output to a text-only file, and when you open it in a text editor (e.g. MS Arabic Word), the text appears in the right order. Although this is a serious interface problem, the program generally gives a good result. A sample of the search screen of the Monoconc program and the search results after saving it to a text-only file are given in Appendix 4 and 5.
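What a concordancer of this kind computes can be sketched in a few lines. The following is a minimal illustration, not a description of how MonoConc works internally: it assumes whitespace tokenisation and exact matching, and the function names frequency_list and kwic are hypothetical. Each concordance line is kept as a (preceding context, key word, following context) tuple in logical reading order, so that the choice of which context to draw on which side of the screen is left to the display layer, which is where the right-to-left problem described above arises.

```python
from collections import Counter


def frequency_list(text):
    """Return (word form, count) pairs, most frequent first."""
    return Counter(text.split()).most_common()


def kwic(text, keyword, width=4):
    """Return one (before, key, after) tuple per exact match of keyword.

    'before' and 'after' hold up to `width` tokens each, in reading order.
    """
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            before = " ".join(tokens[max(0, i - width):i])
            after = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((before, token, after))
    return lines
```

Because matching is exact, a voweled and an unvoweled spelling of the same word would be counted and concordanced separately in this sketch.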
MonoConc was not designed to deal with Arabic orthography, since Arabic is not among the list of languages it is claimed to handle. When I used it with Arabic texts, it turned out that it can deal with them but with some discrepancies, due to the idiosyncrasies of Arabic mentioned in section 2.4. The fact that Arabic is written with or without vowels [15] requires extra laborious work to search for all the possible forms of a given word separately. For example, if you search using Monoconc for a voweled word, it will give you all the exact occurrences of that word in the corpus disregarding the other possible forms of that word or even the character variations such as alif with/without hamza and dotted/un-dotted yaa’. This makes it extra hard to arrive at all possible forms of words under examination or work out the various conventions of Arabic writing. A big problem that needs to be solved is lemmatization. A lot of Arabic connectives (conjunctions) are attached to their following constituents, so we need to strip off any affixes and look for the base-form, but this program also fails to define prefixes to ignore when sorting. For example, الولد al-walad ‘the boy’ and ولد walad ‘boy’ get counted as separate items unless the user lists all possible forms of this word. Anyhow, I found this program useful in dealing with Arabic despite the problem of interface. Without that program I would not have finished my work.

[15] Vowels are diacritics to be put above or under the consonants. In Modern writing it is left to the intuition of the reader to guess about them.

Concordance is a program for Windows NT 4.0, Windows 95/98, Windows 2000, and Windows ME which makes wordlists, concordances, and Web Concordances from electronic texts. This is an online program; it gives you a 30-day free trial and for further use the user needs to buy a registration. Watt’s Concordance is not designed to handle Arabic; it only accepts one item as a search-term, which is not a major impediment. Text lines need to be short (no more than 15 words per line), otherwise it would be awkward to trace the full line of the ‘headword’ in the ‘view’ window. To me, the major drawback is that it creates a large number of temporary files and one huge ‘Concordance data file’. For example, for a text document of 120 KB, a 1.9 MB concordance data file is created. It would, therefore, be essential to have a large storage capacity on a machine for text files containing several million words.

As mentioned above, MonoConc and Concordance are not designed to handle Arabic texts, but they happen to do so. The major problem that faces the user is that these two programs handle the Arabic text as languages written in the Roman alphabet in terms of alignment whereas Arabic is a right-aligned language, i.e. written right-to-left. It is noteworthy to mention that Watt is currently working to add Arabic in the language list in his program (Concordance).

aConCorde is originally developed for native Arabic concordance and supports right-to-left languages. This program, which can be downloaded free (http://www.comp.leeds.ac.uk/andyr/software/index.html), is written in Java and will run on any platform that has the Java Runtime Environment installed. However, this program is released early with shortcomings noted by the designer, like its inability to cope with markup notation, ignoring punctuation and limited search options.

For quick concordances and word frequency counts MonoConc is a very useful tool, and it is particularly useful for anyone involved in Arabic lexicography and basic corpus-based linguistics. In addition, it is compatible with Windows 95, 98 and 2000 and thus assures a reasonable degree of user-friendliness with its graphical user interface. The reason why I can use Monoconc is that the only text that has diacritics is the Holy Qur’an which constitutes 1.8% of the corpus as discussed in 4.3 below.

4.3 Description of the Corpus

The Classical Arabic Corpus (CAC) is a raw corpus, and entirely consists of written materials [16]. Because we are concerned in investigating lexical issues in Arabic we do not see having a tagged corpus as a major requirement. Lexical investigations can be carried out with the aid of a raw corpus. The time span of these writings starts as early as the advent of Islam up to the end of the eleventh century. A good percentage of all published material in Classical Arabic now exists in electronic form, so it is easy to include this in my corpus without it having to be scanned or retyped. Using electronic texts available through the Internet is useful for the following reasons: it saves time and cost and it is more accurate. The works I downloaded are mainly books. I also gathered some short poems written by one poet into a collection and I treated them as a text. As for the question of copyright, all of these materials, apart from the Holy Qur’an, go back ten centuries or more, so they do not need copyright from their authors. However, I got a copyright permission for academic use from the web site designers for the effort they have made in making these books available on the Internet (see appendix 1).

[16] Some of the Classical Arabic texts are originally spoken texts such as the Holy Qur’an which is Allah’s Book and the Prophet traditions; they were transmitted orally for long time before the early Muslims put them in a written form. However, I will count them as written texts because they reached us in such a form.

To investigate lexical collocation in Arabic it is important to create a big corpus to inform our research. This corpus has currently around five million words. Experience has shown that grammatical patterning can be identified and described on the basis of a relatively small corpus, since the frequency of occurrences of so-called grammatical or function words is quite high. Sinclair (1991:100) argues, ‘fairly small corpora, of one million words or even fewer, are adequate for grammatical purposes. Lexical patterning on the other hand requires the use of very large corpora.’ It is easy to pinpoint some generalities concerning function words through checking how common a word is. For example, in LOB, the first most frequent word that one can notice is the at 68,315. Although the corpus is just one million words, we still are able to make some investigations about the function words and other grammatical issues which are expected to co-occur frequently in any corpus.

Generally speaking, a 5-million-word corpus is not large compared to the available non-Arabic corpora. Nevertheless, it is the biggest Classical Arabic corpus assembled so far and I have a motivation to keep on maintaining it to become bigger and much more diverse. In addition, it is noteworthy to mention that the Cobuild dictionary was informed at the very beginning by the observations derived from a 7.3-million-word corpus. Since the corpus is limited to the early period of Islam, there is a possibility to include every text that exists, but this may take a long time to do. By doing this, it would definitely be representative (Biber, 1993). However, as Sinclair (ibid) pointed out, it is enough for the purpose of my corpus to conduct a principled selection rather than a mere accumulation of texts.

With regard to our corpus, the first possible dichotomy is into fiction and non-fiction. The proportion of fiction, which is apparently a part of literature, at 11% is drastically less than the non-fictional texts. This is because of the considerable lack of fictional materials in the period under investigation. It is a well-known fact that the novel and drama have only recently been introduced to Arabic literature. Moreover, there are a variety of stories and popular legends written to have a moralistic impact on Muslims. Although the majority of these narratives concern the leading personalities in Islam, many of them are fictional (Somekh, 1991: 21). Therefore a further dichotomy for my corpus is needed which can give a close picture of the major interests of the early Muslim writers.

The corpus can be divided into four genres: belief and thought, literature, linguistics and science as represented in table 4.1. The genres can further be divided into subgenres as shown in table 4.2. These two tables are represented in charts as shown in Appendix 2. Under belief and thought we have five subgenres: the Holy Qur’an, the Prophetic Tradition, theology, biography and philosophy. Literature has also two subgenres: poetry and fiction. Linguistics is represented in this corpus as having two subgenres: proverbs and lexicons. Finally, under science we have geography, mathematics, physics and medicine. For more illustration about the natural texts included in the corpus see appendix (3).

Table (4.1): The genres of CAC. Columns: Genre; Size in Words; Percentage. Rows: Thought and Belief; Literature; Linguistics; Science.

Table (4.2): Subgenres included in CAC [17]. Columns: Genre and subgenre; Text Size in Words; Percentage. Rows: Thought and Belief (The Holy Qur’an, Prophetic Tradition (Hadith), Biography, Philosophy, Theology); Literature (Poetry, Fiction); Linguistics (Proverbs, Lexicons); Science (Geography, Physics, Medicine, Mathematics); Total.

[17] The overall total of CAC is exactly 5-million words; we arrived at that number after deleting a part from the theology genre, particularly the Tabari’s book on Tafseer (exegesis of the Qur’an), since it is too long to include in the CAC. Tabari’s book after that deletion constitutes about one-sixth of the corpus.

4.3.1 The rationale behind this selection

The selection of texts to be included in a corpus can be done by chance or by choice. As Atkins et al (1992: 3) put it, ‘the selection of sources might be based on a systematic analysis of the target population or on a random selection method’. The latter alternative enables the corpus builder to make deliberate selection of the texts to be included. This can ensure some sort of representativeness and this is what I adopted in this thesis, as it is more convenient for investigating Arabic. As mentioned earlier, like the Cobuild corpus and the Bank of English, I tried not to skew the corpus too much in any direction as ‘the stylistic idiosyncrasies of a particular author can be reduced in significance if texts by many different authors are included’ (Atkins et al, 1992).

4.3.2 Why these texts?

Unlike early Arab linguists whose corpus, gathered for linguistic investigation in their period, mainly comprised first and foremost the Qur’an and the old tribal poetry in addition to the nomads’ proverbs and sayings (Versteegh, 1997: 42), I used other genres and subgenres to have a real representative corpus. Below is a description of the subgenres included in the corpus:

1. The first text on which early Arab linguists relied is the Holy Qur’an. This is the primary evidence which Arab linguists relied on to prove the correctness of any linguistic issue. The Qur’an’s structure is neither poetry nor prose. It is not poetry because it does not observe the metre and rhyme of poetry, and it is not prose because it is not composed in the same manner in which prose was customarily composed, where we may come across sentences that extend over a number of lines. The Qur’an consists of 114 chapters (surahs) covering the social, political,
The corpus in hand includes among other things texts from the main branches of knowledge introduced by the advent of Islam. 1992: 2). There are many corpora nowadays based on the approach of whole texts. cultural. By doing this. I prefer whole texts to be the unit of study. there are two ways of sampling: ‘whole text’ or ‘word text fragment’. and religious life of Arabs of the early seventh century with references to some previous peoples.To select parts of the population as an object of research there has to be consensus among linguists on the authenticity of the selected parts or the selection has to be based on principled choices. the Qur’an has the highest position in religion and in language. where they used to present valuable prizes for the best poet. To me. He said. poetry is unrepresentative of mainstream linguistic behaviour (Renouf. to Qur’an 447 times. and 350 to prose. we have to bear in mind that the older the poetry. Thus the Qur’an says: If the whole of mankind and the jinn were to gather together to produce the like of this Qur’an. To me. poetry has been regarded as a main and authentic source of pure language. especially at ‘Ukaz. Ibn Abbas in his commentary on the Qur’an relied on poetry to explain the meaning of unclear lexical items in the text of the Qur’an. The Qur’an is inimitable. This point is repeatedly emphasised in the Holy Book itself.’ Also. Next to the Qur’an. they could not produce the like thereof. The importance of poetry as a source of data for linguistic investigation can be shown in Sibawayh’s reliance on it as the primary type of textual evidence. God challenged the Arabs to produce even a verse (a line) like the Qur’an but they could not. especially Classical Arabic. look for it in poetry. ‘When you want to learn the meaning of any weird word in the Qur’an. it is unique in style and unexcelled in beauty. Within this specifically Arab context.The early Arabs privileged language. 
Poetry was highly valued in Arabic cultures of the Middle Ages. The selection criteria for Cobuild excluded poetry from the Cobuild collection because. to them. In his Kitab he referred to poetry as evidence 1050 times. 3. (17:88) 2. six times to Hadith. poetry cannot be ignored when looking into Arabic linguistics. even if they backed each other up. Hadith by definition can be subsumed under spoken material as it includes all the recorded sayings and actions of the Prophet Muhammad. the more authority it possessed. It was transmitted orally as was the Holy Qur’an. the use of Hadith for grammatical investigation in classical Arabic is of great importance since the Prophet Muhammad is considered one of the most eloquent speakers of his community because of his early upbringing among Bedouins who were renowned for ` 47 . 1984). they held public fairs for poetry in Mecca. Hadith (Prophetic Tradition): It is a main source of authentic data. the Prophet Muhammad was sent as a Messenger and his major evidence is the Qur’an. The language of the Bedouin has changed less than other varieties because they live away from urban communities where different people of different dialects and languages live together. are usually referred to by scholars as Al-S{ah{ih{aan. Theology: This type of texts flourished very early as the Muslims encouraged by caliphs and motivated by their interest in studying their religion. Jurisprudence was also introduced to explain the Islamic rulings that concern all Muslims in worshipping. These rulings were derived from the Qur’an and Hadith. political system and relations with other people. and comments on the syntactic and semantic structure. dogmatics. Another branch of theology was the Foundations of Creed. Based on what Al. This gave rise to the rational approach of presenting Islam to nonMuslims. i. 922). the two authentic collections. the reasons behind their revelation i.Mufad}d}al did. 5. These two collections. 
1124) compiled their collections of proverbs in a more comprehensive way. Al-Mydani’s Majmac Al-Amthaal (the Collection of Proverbs) contains explanatory notes on poetry. The first person to collect the Arabic proverbs was Al-Mufad}d}al ibn Salim (d. 4.e. etc. Abu Hilal Al-Askary (d. 1004) and Al-Maydani (d. One of the most famous works on Tafseer is Al-Tabari’s (d.e. 784 AD). the historical references. 935) Al-Ibaanah fi ` 48 . introduced some sciences related to the Holy Qur’an and Hadith such as the Qur’an exegesis. the most authentic of them are AlBukhari and Muslim. dogmatics. This science of studying the basics of belief was introduced as a result of defending the Islamic belief against heretics and other sects. jurisprudence (Fiqh). Some scholars compiled and classified these hadiths in systematic collections. One of the most important works in this field is Al-Ashcari’s (d. daily transactions.their eloquence. Hadith literature also retains a lot of ancient usage. which is considered ‘the richest repository in this branch of study containing from verse to verse everything he could gather from earlier literature’ (Goldziher: 1966: 46). Proverbs and Bedouin sayings: As already mentioned the third authentic source of data which early Arab grammarians depended on is the Bedouin proverbs. which I included in my corpus. The Qur’an Exegesis deals with the meaning of the verses. Al-Khalil (d. Then came Ibn Sina. The shift undertaken by Al-Kindi (d. Philosophy was just another aspect of religious studies.‘Us}uul al-Diyaanah (The Explanation of the Roots of Creed). Ibn Ishaq’s work was more comprehensive than Ibn Hisham’s. 895). who is known in the West as Avicenna. For example. including a book in logic. Linguistics: Early Arab linguists influenced linguistic investigation universally. It is a significant contribution to sociology and political science. Biography: The first coherent biography of the Prophet was written by Ibn Ishaq (d. 
was considered the first systematic and comprehensive work of its kind. Al-cayn. Al-Ashcari was the first to formulate the orthodox thinking of creed. One of his most important contribution is Aara’ Ahl Al-Madiinah Al-Faad}}ilah (The Utopia). History then became an independent genre with works like Al-Akhbaar Al-T}iwaal (Long Narratives) by Abu Hanifa al-Dinawri (d. 8. 833) to make the oldest and most classical work in this field. 7. Ibn Ishaq was first entrusted by the Caliph Al-Mansur with the task of writing a book for his son Al-Mahdi on history since the first man on earth until their time. nations and kings) by Al-Tabari (d. to inform this genre with many works. Philosophy: Arabs started to try the speculative methods in order to defend or spread Islam. 922) which reflected various historical and cultural aspects of Islamic life. 768) whose Siirat Rasuul Allaah (The life of the Apostle of Allah) was revised and reworked by Ibn Hisham (d. Taariikh al-rusul wal-umam wa al-muluuk (the History of prophets. as the latter was only on the biography of the Prophet. 6. He combined Greek philosophy and Muslim theology. 872) from writing on philosophy as a religious tool to writing on pure philosophy is considered the beginning of the separation of philosophy from dogmatics. Therefore he was the first independent writer on philosophy. 786) was the first to give lexical order in the collection of his dictionary. Al-Khalil’s lexicon. ` 49 . held the belief that philosophy and Islam are in harmony. His book Al-Ibaanah has influenced most writings on theology even today. Al-Farabi. for example. In medicine Ibn Sina’s book: Al-Qaanuun fi AlT}ibb (Canon of Medicine) was considered the first comprehensive encyclopaedia in medicine. This is not the case with the works I included in CAC since the works of linguistics and lexicons I included are written entirely in Arabic without quoting any single foreign word. ` 50 . Al-Biruni’s (d. pre-Islamic poetry and nomad proverbs (cf. 
977) studied most of the Islamic world and wrote his marvellous book: Ah}san al-taqaasiim fi macrifat al-aqaliim (The best Division in the knowledge of Climes) that made him a pioneering geographer of his time. to include such works in a corpus could be misleading in the sense that they may use other languages to prove a universal phenomenon or to investigate these languages themselves (Paul Bennett. Physics was also studied by Arab scholars. 9. His book Kitaab al-Jamaahir discusses the properties of various precious stones. Arabs are the inventors of the symbol 0 (zero) and this laid the foundation of positional arithmetic.In addition to lexicography. medicine. personal communication).2). In geography Al-Maqdisi (d. Secondly. Science: The early Arab scientists paved the way for the modern scientific observation in mathematics. it became the textbook for medical education in Europe in the 12th century. It is noteworthy to mention that there are arguments for not including lexicons and linguistics works in a corpus because in the first place they may contain citations from other works which have their own grammars and stylistics. physics and so on. citations from other works are only restricted to certain texts: Qur’an. 849). philology was also mastered by the early Arab linguists. In addition. When the Al-Qaanuun fi Al-T}ibb (Canon of Medicine) was translated into Latin. AlThacalibi’s Fiqh Al-Lughah (The Code of Language) (d. 1037) was really a marvellous compendium of philology. He was a pioneer in the study of metals and precious stones. Another field of science in which the West was indebted to the Arabs was mathematics. 1048) contributions in physics were pervasive during the first part of the last millennium. 3. The first to write an arithmetic was Al-Khawarizmi (d.2. Al-Jahiz (d. ` 51 . The Misers is a collection of anecdotes that criticises the social conditions of his time in a comic way. 3)It is a monitor corpus. 
It was translated and reworked completely to leave no Persian traces so as not to contradict the Islamic thought during the Abbasid period. i. the 5-million word Classical Arabic corpus (CAC) is considered a pioneering corpus for the following reasons: 1)It is an electronic corpus.4 Conclusion To sum up. which deals with only one variety of Arabic along a particular period of time. however. The Thousand and One nights (Arabian Nights) (850) was originally written in Persian. we will keep on maintaining it by adding more texts and genres. 2)It is balanced. This can make the study based on it more consistent and more methodical.868) had contributions in a variety of genres among which are philology and artistic prose. His book. Fiction: As mentioned earlier that there is a considerable lack in Arabic fictional works. early Classical Arabic.e. it covers a wide scope of written Arabic texts to be used for more than one purpose. some early Arabic works can be subsumed under the narrative prose such as AlBukhalaa’ (The Misers) by Al-Jahiz and Arabian Nights. 4)More importantly.10. 4. this makes investigating Arabic a more accurate and faster process. this corpus is synchronic. etc. place. as will be discussed in detail in chapter six. or the noun work with the adjective interesting. and collocation can only be observed through repeated usage (Smadja. and the copulative verb is is related with the adjective interesting. what is the first word that comes into your mind when you come across a word like work? There are many possible answers such as is. Generally speaking. The word work in (1) above is syntagmatically related with the definite article the. does. derivational. syntagmatic. it is the next word in the phrase or the sentence. Sense relations are divided into three classes: paradigmatic. not mentioned in the sentence. We are of the position that both types of sense relations. are complementary to each other because words acquire meaning from both axes. Finally. 
paradigmatic and syntagmatic. therefore. This is called a syntagmatic reply because it provides the phrase or the sentence with a required syntactic form.Chapter Five: Lexical Collocation 5. if the 18 Any two words can have a relation. Synonyms in their propositional sense can be substituted for one another. McKeown. This is called a paradigmatic reply because it chooses another word from a set of semantically related words. it would be sensible to give a brief account of the relationship that holds between synonymy and collocation. This is mainly a syntactic relation. and Hatzivassiloglou. The significance of a relation is discussed by Cruse (2000:145-47). Through collocation we can distinguish one sense of a word from another and know whether the seemingly synonymous words (for example) are real synonyms or not. ` 52 . a device with which a particular sense of a word is activated. but there might be words which are more significant than others. 1996:5). The relationship that a linguistic element has with other elements inside the sentence is called syntagmatic. Or the answer could be words like job or career.1 Introduction As the subject matter of this thesis is to look at synonymy in Arabic contextually through lexical collocation. Collocation is. Both involve two different kinds of relations18: synonymy is a paradigmatic relation and collocation is syntagmatic. Let us consider the following example: (1) The work is interesting. the early bird catches the ` 53 . A list of phraseological norms derived from corpus analysis corresponds to a cognitive profile of the word’s meaning. as are idioms. compounds and clichés. Hanks (2000:1) argues that ‘corpus analysis shows that differences in meaning (metaphorical and literal alike) are associated with different phraseological and syntactic contexts. ingenuity. that has lost originality. 1967).answer uses the same word but in a different form. Likewise. this is called derivational. 
Collocations can be defined as the co-occurrence of words. kick the bucket. usually expressing a popular or common thought or idea. quoted in Stubbs (2001a)) noticed. collocates provide observable evidence of word meaning. As Fraas (in press. as will be illustrated below. and impact by long overuse’ (the Random House Dictionary of the English Language. a sentence or phrase. With the possibility of carrying out linguistic contextual analysis of large quantities of electronic texts. stereotyped expression. Firth (1957) emphasised that the meaning of a word is determined by its co-occurrence with other words. Sinclair states. Since then the term has been extensively used by linguists to explain how words are related to one another and for other purposes. He called this phenomenon collocation. we become more or less able to account for the interaction between meaning and syntactic structure in an empirical way. Idioms are those in which the meaning of the whole cannot be understood from the meaning of its parts. Collocation is a clear-cut way of looking at word meaning in a practical way rather than by means of conceptual analysis. For example. break a leg. A cliché is defined as a ‘trite. Such a trend could be an interpretation of Wittgenstein’s statement that ‘the meaning of a word is its use in the language’ (1953: 20). ‘meaning can be associated with a distinct formal patterning’ (1991: 6). For instance.’ Stubbs also noted that the word meaning could be defined not only by individual words or grammatical structures but also by collocations (Stubbs 1996: 89). Collocation had a significant currency in linguistics from Firth’s Modes of Meaning (1957) on. etc.2 Definition of Collocation As mentioned above one of the relationships that hold between words on the syntagmatic or horizontal axis is collocation. 5. the meanings of collocation can be predicted or deduced from the meanings of their parts. 1990: 16). 
Let us consider the following examples: (2) John took the bull by the horns.e. Cowie (1981: 224) refers to them as composite units. under which collocation. (4) John crossed swords with Bill. In (5). expansion. ‘idioms contain frozen parts that do not allow any sort of substitutions’ (Gross. salt of the earth. the subject) for any thing equivalent without missing out the idiomatic sense. Nelson (2000) calls such a phenomenon of word packaging Multi-Word Items. He gives an ` 54 . only John and Bill can be changed. (3) John took Bill for a ride. Following Mitchell. the main distinction between collocation and idiom is that. a collocation is a group of words that occur together more often than by chance. He points out that the former permits the substitutability of at least one item of its constituent elements. In (4) and (5) only John and Bill can be changed. Specifically speaking. This varies from idiom to idiom as some idioms are more frozen than others. (6) The game is not worth the candle. unlike idioms. etc. etc. kick the bucket which is more fixed in terms of the transformational or substitutional processes than idioms like spill the beans. He then makes a distinction between collocation and idioms in terms of substitutability of items. On the other hand. on the other hand. So. both the subject and the object can be swapped or changed. life sucks. In (6). which can undergo a process of passivisation as follows: the beans have been spilt.worm. (ibid: 224). In (3). and then you die. For example. The latter. only the tense is free. In (2) above we can only change one part of the sentence (i. The above sentences are idiomatic in the sense that we cannot understand the meaning of the whole by understanding the meaning of its parts. idioms and compounds can be subsumed (1971: 57). transposition. cannot undergo any type of transformational processes of substitution. Mitchell uses the term ‘composite element’. (5) John cut the ground from under Bill’s feet. 
Compounds are built up of two or more free morphemes in a single lexical unit. . 5. 9. 3. 1990).. 1986: 40). as well as methods for their extraction and classification. 1991: 170). ‘Collocation is the occurrence of two or more words within a short space of each other’ (Sinclair. 8. ‘Two words co-occur if they are in the same sentence and are not separated by no more ` 55 . 1987:133). ‘A recurrent co-occurrence of words’ (Clear. Firth. ‘A collocation is an arbitrary and recurrent word combination’ (Benson. There have been many diverse definitions of collocations. ‘A sequence of words that occurs more than once in identical form. 2. We are going below to present some definitions and examples of collocations. but which are nonetheless fully transparent in the sense that each lexical constituent is also a semantic constituent’ (Cruse. and which is grammatically well structured’ (Kjellmer. defined collocation as the company that a word keeps (Firth 1957:179). 1978: 132). 6. 1995b: 245). 7. He illustrates his point by the following example: ‘One of the meanings of ass is its habitual collocation with an immediately preceding you silly’ (1957: 196). For example: 1.. ‘sequences of lexical item which habitually co-occur.interesting brief account of the definitions and types of such multi-word items since 1864. Let us first have a look at some of the definitions of collocation. 1993:277). ‘The co-occurrence of two or more lexical items as realisations of structural elements within a given syntactic patterns’ (Cowie. ‘The habitual co-occurrence of words’ (Stubbs. 4. who was the first to introduce the idea. in its general sense without any sort of syntactic restrictions is more likely in conformity with the commonly asserted claim that all words and expressions. put it more specifically within the boundaries of sentence.8. discuss in more detail the questions of spans and frequency to arrive at a definition which could be closer to the purpose of the present study. 
the other two. are restricted in their distribution (van der Wouden 1997: 45). ` 56 . These definitions seem to have three main characteristics: the co-occurrence of at least two words. the frequency of this co-occurrence and the fact that the whole chunk should occur within a given span of words. This also enables us to investigate interrupted phrases of interesting distribution which we might not be able to account for with a restricted definition of collocation. as far as I know.3 Collocation and Colligation The company that a word keeps could be lexical or grammatical. For instance. Thirdly. 10. is contrary to all linguistic definitions of collocation. collocation. The term colligation is introduced by Firth. Choueka. 1993:151). a word may collocate freely with another lexical item or with a particular grammatical class. To me. For Kjellmer and Cowie. and whose exact and unambiguous meaning or connotation cannot be derived directly from the meaning or connotation of its components’ (Choueka. However these definitions do not mention how frequent a given combination must be or whether a single occurrence in a corpus should be eliminated or not (see 5. which. 1988 quoted in Manning 1999: 172). apart from Kjellmer. To pursue the premise I shall. that has characteristics of a syntactic and semantic unit. Secondly. ‘a sequence of two or more consecutive words. Cowie. 5.4 for more detail). Choueka’s definition deals only with collocation of adjacent words.than five words’ (Smadja. there is no syntactic condition given. Choueka and Smadja. after addressing the types of collocations. the grammatical structure must be considered. and Smadja. regardless of their syntactic position. The former is called collocation and the latter is colligation. who states. For example. factors of collocational eligibility. what. without reference to grammar. first person singular nominative. Following Halliday. He also states. an open window. that enters into collocation’ (1966: 20). 
We can only preserve the simplicity of our grammatical description if we are prepared from the start to let it be understood that there are lexical factors. cannot be described in this way. Firth (1957: 13) Halliday makes a distinction between collocational and grammatical levels or lexis and grammar but he noted that they are still interrelated. McIntosh considers the distinction between grammar and lexis necessary ‘if the patternings are to be economically stated or defined’ (1966: 183). He argues that ‘collocation is outside grammar: it has no connection with the classes of the words. To differentiate between grammatical and lexical levels. Halliday et al (1964: 32-33) note that where there is a choice between different classes of language items at a place in structure we have the grammatical level. Such items do not belong to grammar but to lexis. which tend to rule ` 57 . as grammar cannot fully distinguish between items like table and chair. the past tense of a transitive verb and the third person singular in the oblique or objective form. He used the term lexical as a substitute for collocational (1966: 152). or the opening of the window collocate with window in the same way irrespective of its grammatical position. on the other hand. which. Other language items. It is the lexical item. when we choose between this which is singular and which is not that and between items like who. For example open in open the window. we can account for differences between such items grammatically. whose.The statement of meaning at the grammatical level is in terms of word and sentence classes or of similar categories and of the inter-relation of those categories in colligation. both McIntosh and Sinclair view grammar and lexis as separate. Grammatical relations should not be regarded as relations between words as such – between ‘watched’ and ‘him’ in ‘I watched him’ – but between a personal pronoun. which(ever). 
Mitchell (1971: 53) argues in favour of Firth’s approach but with additional implications. Recently he defines colligation as ‘the co-occurrence of words with grammatical choices’ (2000: 10). the grammatical item or class that a word tends to co-occur with. that.). So the relationship that holds between collocation and colligation is just a matter of generality. To Mitchell. 1966: 183-84) Sinclair (1987b: 322) emphasises that lexical collocations and grammatical collocation are just tendencies and choices.out of actual use a large number of ‘sentences’ (and smaller units) even though they seem to conform to all the rules of grammatical pattern. Recently. Further distinctions are made between upward and downward collocations (Sinclair 1987b). For example. whose) (ibid). Counting the frequency of the occurrence of each sense shows that reason in its sense of ‘cause’ occurs more frequently with the demonstrative deictics (like this. 5. Hoey (1998: 8) reintroduces the term colligation. colligation can be defined as a class of collocations. When the collocates around the search term (node) are more frequently used than the node itself. He rejects the Halliday and Sinclair approach and views colligation as necessary to account for the assemblages that prefer to appear in a certain structure. your. the different senses of the word reason as meaning cause or rational faculty or logic can be accounted for grammatically. what(ever)) and not with the possessive ones (like my. For ` 58 .4 Types of Collocation We have mentioned earlier that in the collocation literature distinctions are made between grammatical and lexical collocation (see 4. it is called upward collocation.more specifically. (‘motive’ verb + ‘directional particle). John’s.3. for example. (McIntosh and Halliday. He notes that lexical items tend to co-occur with other lexical items in a certain grammatical position. Therefore. 
collocation is different from colligation in the sense that the former uses words and the latter uses word-classes. He therefore defines colligation as ‘the grammatical company a word keeps’ (ibid).

In upward collocation, for example, the search term back is less frequent than collocates like at, from, get, he, him and on. Downward collocation, on the other hand, is when the collocates around the search term (node) are less frequently used than the node itself, like back with arrive, bring, climbed, come, etc.

Some authors, like Emery (1988), Smadja (1993) and Lewis (1993), made another distinction where several types of collocations can be identified according to the degree of collocational strength and currency. Smadja (1993) identifies three types of collocations: 1) rigid noun phrases, 2) predicative relations, and 3) phrasal templates. The first is the most fixed type of collocation; it is a sequence of words that cannot be broken or interrupted without losing the meaning of the phrase, such as stock market and foreign exchange. The second, which is the most flexible one yet the hardest to identify, is made of ‘two words repeatedly used together in a similar syntactic relation’ (ibid.: 148); to make a decision, for example, can be put in several ways, such as made an important decision, decisions to be made, etc. The third type is characterised as long and domain-driven collocations. ‘Phrasal templates consist of idiomatic phrases containing one, several, or no empty slots’ (ibid: 149), such as the often repeated sentence in weather reports: temperatures indicate previous day’s high and overnight low to 8 a.m.

Emery (1988), following Cowie (1981), sees collocations as a scale at the end of which lie idioms. He observes three types of collocation: open collocation (which is free word combination), restricted, and bound (rigid). The last is considered ‘a bridge category between collocations and idioms’ (Cowie 1983: 228). Open collocations contain elements which can be used with different words without a big difference, such as bada’at/intahat alh{arb/almacraka ‘the war/the battle began/ended’ (Emery 1988). On the other hand, Emery does not count such combinations as collocations because they are unrestricted by usage (ibid.: 27). Combinations of words that select each other not only in terms of semantics (as in open collocation) but also by usage are called restricted collocations (Aisenstadt, 1979 quoted in Emery, 1988). Cowie (1983: xiii quoted in Emery: ibid) puts a condition on such restricted collocations: one item of the combination should have a figurative sense. Compromising between Cowie’s and Aisenstadt’s views, Emery says, ‘in a restricted collocation, one (but not more) of the elements may be either literal or figurative’ (ibid: 27). For example, the verb in explodes a myth/a belief has a figurative sense whereas that in clench one’s teeth is literal. Finally, in bound collocation ‘one of the elements is uniquely selective of the other’ (ibid.: 29), such as foot the bill.

This discussion of collocation is apparently semantically based, which can give intuition a free hand to identify it. In section 5.8 we will embark on a more methodical way of identifying collocation using corpus-based methodology.

In this study I will take the view that the phenomenon of collocation should be understood as a gradual cline along which we may locate different degrees of collocation: from fixed (rigid/bound) through semi-fixed (restricted) to free (flexible). In fixed collocations the replacement of individual words is not allowed, whereas the collocations in which individual words may be replaced by certain other words are called free collocations. For example, in the free collocation أمر القاضي amara alqaad}i ‘the judge commanded’, أمر ‘amara’ can be replaced by certain other verbs such as حكم h}akama ‘sentenced’ and قضى qad}a ‘made a judgement’19. All three possibilities are collocations and have the same meaning. But in the fixed collocation تربت يداك taribat yadaak ‘may your hands become dusty’20 there is no alternative to the noun يداك yadaak ‘hands’. The sounds of animals and birds, in Arabic or in English, can be subsumed under fixed collocations. For example, the sounds made by dogs or donkeys, نباح nubaah} ‘barking’ and نهيق nahiiq ‘braying’, have a strong bond to appear in such a context, i.e. in non-metaphorical expressions, there is no other word to describe the dog’s or the donkey’s sound in normal language use. On the other hand, the free collocations are words that are most likely to co-occur in infinitely creative ways (Lewis 1993). The third type is restricted collocation, which constitutes the majority of Arabic collocations and falls halfway between fixed and free collocation. It is a combination of two or more words which attract one another syntactically, semantically and by usage, such as sakaraatu al-mawt ‘death throes’, al-dunya wa al-‘aakhirah ‘this world and the hereafter’ and al-ghiiybah wa al-namiimah ‘backbiting’.

19 To show the differences between near synonyms I translate them literally.
20 This means that you will have nothing to sleep or sit on and will become poor.

Unlike in Arabic, strong (fixed) collocations in English are relatively few (Lewis & Hill 1998, quoted in Nelson 2001). This is because Classical Arabic has a very rich and varied vocabulary with highly specific meanings. It is also remarkable for its abundance of near synonyms. While some languages have a single word to describe one thing, Arabic has hundreds. For example, there are over 500 words for ‘lion’ and 200 for ‘snake’, each with a specific connotation (Ibn Faris, al-s}aahibi, p. 21). Such abundance in vocabulary is a treasure trove that can let words select particular words without repetition. Therefore, it is expected for any researcher on Arabic collocations to be swamped by a huge amount of collocations varying from free to fixed. In his investigation of Arabic collocation, Hoogland (1993) concentrated on restricted collocation because, as he argued, it constitutes a large and unpredictable category.

5.5 Spans
Jones and Sinclair (1974: 21) use the term ‘span’ to refer to the number of lexical items on each side of the word under investigation (the node). They prefer that span to consist of four words to the right of the node and four to the left. Later, Sinclair (1991) proposes a short span of no more than five on each side of the search term. Others like Martin et al (1983 quoted in Kenny 1999: 70) think that five words to the left and five to the right are enough. More practically, Berry-Rogghe (1970) made some experiments to arrive at an optimal span on a corpus which consists of three works: A Christmas Carol by Charles Dickens, Each in his own wilderness by Doris Lessing and Everything in the Garden by Giles Cooper. Berry-Rogghe found that four words are the optimal span as it is long enough to produce an optimal number of relevant counts. For example, the collocations of the word house include words like: sold, decorate, this, empty, buying, painting, opposite, loves, outside, full, my, etc. Three words were also tried as the span size; increasing the span to six words, irrelevant words such as Bernard and God found their way in as standard collocates.

To limit the span to the number of orthographic words, as proposed by Sinclair, does not work in Arabic all the time. In Arabic, which is different from English in terms of grammatical structure, we would need a careful treatment of the question of span. For example, we may find a big distance, which may extend over a number of lines, between a verb and its subject or complement, provided the verb contains a referential pronoun, irrespective of how many words intervene between them. Moreover, the range of the Arabic sentence could be a bit bigger than in English. However, there might still be plenty of occasions where such two items appear close enough to each other. Therefore, a span of five will be sufficient for such a study. At the same time, I would like to utilise all the corpus results as much as I can because my corpus is not so big and I do not want to miss the occurrences of a particular word because of the distance between them. So, a flexible span, which ranges from two to seven, based on the particular item we investigate, would be more realistic.

We can decide the size of the span according to the grammatical position of the category to be examined. For example, prepositions in Arabic show a tendency to precede their objects without any sort of interruption, so a span of two words on each side of the search term would be enough. We can fix the span to two or more according to the mobility of our linguistic items. For instance, to examine idiomatic verbs, i.e. transitive verbs, we can easily search for their immediate constituents to study what particles can follow such verbs. In some cases we might extend the span to involve as many items as we can from the concordance line, such as in studying nouns, which are more likely to have relations across the text. It all depends on the first reading of the concordance lines before getting to any analysis.

One disadvantage of MonoConc is that it is not possible to capture the frequency of collocations consisting of more than three words. As mentioned before in chapter three, MonoConc can maximally provide three words on either side, while in other programs like Wordsmith it is possible to have up to 25 words on each side of the search-term. Therefore, for a longer span, I have to use Microsoft Word to process it, i.e. to save the concordance file into a Word document. Even so, I cannot examine all the occurrences that extend over a concordance line. For example, some items could be modified through the text in a way which might extend over the concordance line.
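The idea of a flexible span can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `collocates_in_span`, its `left`/`right` parameters and the toy English sentence are hypothetical and are not taken from the concordancers discussed in this section.

```python
from collections import Counter


def collocates_in_span(tokens, node, left=4, right=4):
    """Count the words found within `left` tokens before and `right`
    tokens after every occurrence of `node` (hypothetical helper)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)                      # do not run past the text start
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts


# Toy data, invented purely for illustration.
tokens = "he sold the house and she was buying the house opposite".split()
span2 = collocates_in_span(tokens, "house", left=2, right=2)
span4 = collocates_in_span(tokens, "house", left=4, right=4)
```

Widening the span from two to four brings in more distant words (here `he` and `was`), which is the trade-off between missing relevant collocates and admitting irrelevant ones.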
The collocational range is defined as the whole set of collocates of a single node grouped together in a particular text or corpus, i.e. all collocates that a given search-term has across a particular text. McIntosh (1966), on the other hand, proposes that items like chair, seat, and sofa are all likely to occur, or collocate with, the items sit and comfortable and so they are all members of the same class, which have the same range of collocations and share the same probability of occurrence. From this collocational range, one or more of the collocates can be used as a semantic category label for the others (comfortable for example).

5.6 Semantic Prosody
As words, on the one hand, collocate with a particular grammatical class, i.e. colligation, they, on the other hand, collocate with a semantic class of words, either with positive or negative connotations, which is called semantic prosody, later termed discourse prosody (Stubbs, 2001a). Sinclair (1987b: 322) noted that ‘many uses of words and phrases show a tendency to occur in a certain semantic environment.’ For instance, the verb happen collocates with unpleasant things such as accidents etc. (Sinclair 1991:112) and the phrasal verb set in occurs primarily with words which refer to unpleasant states of affairs, such as rot, decay, malaise, despair, ill-will and decadence (1991: 70ff). Louw (1993: 157) defines semantic prosody as a ‘consistent aura of meaning with which a form is imbued by its collocates.’ It is Louw who gives this phenomenon its name, although the idea of semantic prosody was known for a long time before he coined it.

The notion of semantic prosody, as mentioned above, is further enhanced by Stubbs (1995a), where he highlighted a similar tendency towards negative or positive semantic prosody of collocates, describing the phenomenon as the connotations that words have when they occur together (1996: 172). He further argued that words seem to co-occur in a certain semantic profile. He also noted that collocation can be simply defined as the semantic feature which stretches over several units (2001b). Hence, semantic prosody is the phenomenon for which a common semantic feature among the collocates provides evidence. We can conclude that the study of semantic prosody is more or less a useful way of employing pragmatic information in collocational analysis.

In Arabic I studied the lemmas سنه sanah and عام caam by looking at their occurrences in CAC. It turned out that سنه sanah ‘year’ and عام caam ‘a year’, which are widely regarded as synonyms, are used in different contexts. The corpus provides lots of unpleasant examples for sanah and only one pleasant example, as in (a) below:

a) punishment, infertility, drought, hardship, destruction, wars, epidemics, inflation, weakness, worse, fertile

In (a) above the examples show the most frequently recurring left collocates for the word sanah, which seem negative21. These collocations can be summarised as follows:
• To refer to a bad experience that happened during this year, like a drought, a plague or common crisis.

On the other hand there is a considerable shortage of negative examples for caam. The only real negative word that collocates with caam is drought, which is also shared by sanah. For the positive collocates the corpus shows the following examples as in (b) below:

b) goodness, bride, provision, support

There are some neutral collocates, shared by sanah and caam. Such collocates are considered neutral as they refer to a certain historical incident such as عام الطاعون caam al-t}acuun ‘the year of Plague’ and عام الحزن caam al-h{uzn ‘the year of sadness’. Such incidents became milestones in Muslim history, so they designate a period of time and do not carry any positive or negative sense. The corpus-based analysis shows us how each word has its own preferred collocates and relatively different distributions.

21 Following Stubbs we will judge negative or positive collocates intuitively.

One clear piece of evidence does come from the Qur’anic verse that states,

(7) وَلَقَدْ أَرْسَلْنَا نُوحًا إِلَى قَوْمِهِ فَلَبِثَ فِيهِمْ أَلْفَ سَنَةٍ إِلَّا خَمْسِينَ عَامًا فَأَخَذَهُمُ الطُّوفَانُ وَهُمْ ظَالِمُونَ
And indeed We sent Nuh (Noah) to his people, and he stayed among them a thousand (sanah) years less fifty caam (years) [inviting them to believe in the Oneness of Allah (Monotheism), and discard the false gods and other deities], and the Deluge overtook them while they were Z{aalimuun (wrong-doers, polytheists, disbelievers, etc.). (Qur’an: 29: 14)

where sanah and caam are used together to refer to different stages of the life of the Prophet Noah22, who suffered a lot to call his people to belief in God until God destroyed them by flood. Hence, the word sanah is used with reference to the first stage of his life, which was full of hardships, and caam is used for the rest of his life.

22 According to the Muslim literature, the Prophet Noah lived among his people for 950 years working hard to guide them to Allah. Unfortunately, they did not believe, so Allah destroyed them by flood and saved Noah and the believers. So Noah lived a tiring life, before the flood, for 950 years. Then, Noah with the believers repopulated the earth in peace and serenity.

In Modern Standard Arabic the frequent use of caam on happy occasions is quite evident. Egyptians are very likely to say when congratulating one another on a new year: ‘caam saciid’ (happy new year), but much less likely to say: ‘sanah saciidah’.

More interestingly, sanah could encapsulate the meaning of its collocation, which could be infertility of the soil, drought or famine, as we can drop that collocation and use sanah to give the same meaning. This use found its way to the Arabic lexicons as an equivalent to infertility of the soil, drought or famine (cf. Lisaan Al-Arab and Al-Muheet). Consider the following example in (8) below:

(8) أصابت الناس سنه
as}aabat al-naasa sanah
befell the-people year
The people went through an infertility of the soil, drought or a famine.

In (8) sanah is used to describe a hard experience that happened to people.
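The intuitive tallying of negative and positive collocates described in this section could be made explicit along the following lines. This is a minimal sketch: the polarity sets are hand-assigned labels, and the frequencies are invented placeholders, not counts from CAC; the function name `prosody_profile` is likewise hypothetical.

```python
from collections import Counter

# Hand-assigned polarity labels (an assumption made for illustration only;
# the judgement of polarity remains intuitive).
NEGATIVE = {"drought", "punishment", "hardship", "wars", "epidemics"}
POSITIVE = {"goodness", "provision", "support", "fertile"}


def prosody_profile(collocate_counts):
    """Sum collocate frequencies per polarity class."""
    profile = Counter()
    for word, freq in collocate_counts.items():
        if word in NEGATIVE:
            profile["negative"] += freq
        elif word in POSITIVE:
            profile["positive"] += freq
        else:
            profile["neutral"] += freq
    return profile


# Invented frequencies, NOT corpus results.
example_counts = {"drought": 6, "hardship": 4, "fertile": 1, "plague": 2}
profile = prosody_profile(example_counts)
```

A profile dominated by one class would then count as evidence of a negative or positive prosody for the node.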
5.7 Extraction of Collocation
Collocations can be identified intuitively, semantically, lexically or quantitatively. I gave a detailed account of the credibility of intuition vs. empiricism in Chapter Three. Firth views such a phenomenon as a relation of mutual expectancy and as an inseparable part of the native speaker’s knowledge of his own language, i.e. competence (Emery 1986). McIntosh (1966: 194) says that our experience of the meanings that a given word has in a certain context sheds light on what words it collocates with and what range of collocations they have. For example, the lexical items chair, seat, and sofa are all likely to occur, or collocate with, the items sit and comfortable and so they are all members of the same class which have the same range of collocations. This is due to our experience with such items in a variety of contexts. However, such an approach cannot figure out what is more frequent or typical in language use. In addition, by using advanced technology in the field of corpus linguistics we can assess the problem more accurately and quickly, and we can discover interesting aspects of our language which could not be found by introspection.

To assess a given collocation, we can also resort to semantics. In short, the semantic approach tries to define collocations by the actual meanings they have and by the usefulness of combinations of words in different contexts. Cruse (1986) makes a distinction between two types of semantic co-occurrence restrictions: (1) selectional restrictions, which can be defined as ‘semantic co-occurrence restrictions which are logically necessary’ (p. 278). For example, the verb die in John died, the tree leaves died and *the book died needs to be preceded by a (+animate) grammatical subject; this is called selectional restriction. (2) collocational restriction, which is defined as ‘co-occurrence restrictions that are irrelevant to truth conditions’ (p. 279). Further semantic requirements are needed in sentences like John kicked the bucket, *the cow kicked the bucket and *the tree kicked the bucket. The lexical item kick the bucket requires, in addition to the (+animate) feature, another restriction, which is (+human). Restrictions of this type are called collocational restrictions.

The lexical approach23 concentrates on the language as a complete unit; it does not make a distinction between grammar and vocabulary. This approach differs from the semantic one in that the latter tends to account for all the relations that hold among lexical occurrences ‘in a semantically motivated way’ (as in Cruse’s collocational restriction) (Emery, 1988: ch. 2.3). The lexical approach, on the other hand, looks at collocation as a matter of combinatorial process without giving any explanation. It does not explain why a given lexical item collocates with another lexical item (Lehrer, 1974: 176). This approach tries to define collocations by the frequency of certain word combinations in a text. However, in this approach we can easily make use of computer analysis of large corpora to focus on high frequency language and to highlight typical patterns of language use. Lehrer (1974: 173) criticised both approaches: the lexical approach does not give an explanation for the co-occurrence of lexical items whereas the semantic approach cannot account for the combinations that are arbitrarily restricted. Therefore, she argued for an eclectic view that combines aspects from both approaches.

23 The lexical approach not only deals with individual words, as might be understood, but also with larger units, i.e. the word combinations that we store in our minds.

5.7.1 Using statistics in collocation extraction
It is not our goal to discuss in detail the various methods of extracting collocations, i.e. intuitively, semantically or lexically. Since the main goal of my study is to use a corpus to investigate language use in Arabic and to demonstrate the potential impact of computational methods on Arabic linguistic studies, I am rather more concerned with applying the most commonly used methodology, statistics, in extracting collocations from Arabic corpora. The whole corpus is too large to deal with in its entirety. Therefore, it is not feasible to study all the texts manually. This is because ‘the unaided human mind simply cannot discover all the significant patterns, let alone group them and rank them in order of importance’ (Church et al 1990: 2627).

In Chapter Three we talked about the concordance as a means of processing corpora. The available concordancing programs can do lots of applications, for example frequency lists, word associations, etc. (cf. Barnbrook 1996 for more details). Concordances can only help us find the words under examination in their environments, as shown in figure (5.2) below. However, human intervention is needed to run, edit and analyse such concordances.

5.7.1.1 Lemmatisation
When examining a word, it is often useful to consider the different forms of the word altogether. In doing so, I faced some problems when searching for words as base-forms (lemmas). It would be difficult to search for the lemmas without some sort of human intervention such as editing our automatic counts. This is because Arabic is an inflected (synthetic) language where affixes have a different function from non-synthetic languages like English. In English we can, to a great extent, search for a word irrespective of its grammatical change, such as tense or plurality, by the wild card search, which can provide all possible forms of a given word. For example, if you search for the lemma ‘play’ using wild card, the output will include words like ‘plays’, ‘played’, ‘playing’ and so on. On the other hand, using wild cards with Arabic to get all related word classes (nouns, verbs, adverbs, etc.) reveals that the output needs exhausting hand editing before proceeding further to any assessment.

In Arabic a lemma is actually a stem of a set of forms (hundreds or thousands of forms in each set) that share the same morphological, syntactic or semantic features (Dichy, 2001 and Kamir, 2002). For example, if we search for the word سنة sanah ‘a year’ we will have many forms such as سنوات sanawaat and سنين siniin (fem. & masc. pl. ‘years’), سنتان sanataan (dual in nominative case ‘two years’), سنتين sanatayn (dual in accusative and genitive case ‘two years’), سني saniy (pl. ‘years’ in genitive case), السنة al-sanah ‘the year’, سنته sanatahu ‘his year’, سنتها sanataha ‘her year’, سنينه siniinahu ‘his years’, سنينها siniinaha ‘her years’, سنينهم siniinahum (masc. ‘their years’) and سنينهن siniinahun (fem. ‘their years’). Although this seems a very simplistic search, we could not find at the time of writing this thesis a program which can combine the features of a concordancer and an Arabic stemmer. Apart from Xerox’s morphological analyser24 and Buckwalter’s25 we do not have at the time of writing a public domain lemmatiser which can work on Arabic, because Arabic has a nonconcatenative word formation system and the other idiosyncrasies mentioned above.

24 (http://www.xrce.xerox.com/competencies/content-analysis/arabic/)
25 (http://www.qamus.org/morphology.htm)
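The contrast between wild card search in English and in unvowelled Arabic can be sketched as follows. This is a hedged illustration only: the helper names `wildcard_to_regex` and `wildcard_search`, the token lists and the pattern `*ك*ت*ب*` are assumptions made for the example and do not reproduce the search syntax of any particular concordancer.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a simple wild card pattern ('*' = any run of word
    characters) into a compiled regular expression (hypothetical helper)."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\w*".join(parts))


def wildcard_search(tokens, pattern):
    """Return the tokens that match the wild card pattern in full."""
    rx = wildcard_to_regex(pattern)
    return [tok for tok in tokens if rx.fullmatch(tok)]


# English: the hits stay close to the lemma, although 'player' still
# needs to be edited out by hand.
english = ["play", "plays", "played", "playing", "display", "player", "pray"]
english_hits = wildcard_search(english, "play*")

# Unvowelled Arabic: a pattern built on the letters k-t-b also returns
# forms that are not under examination, e.g. يكتسب (yaktasib) and كتيبة (kateebah).
arabic = ["كتب", "كتابه", "كاتب", "يكتسب", "كتيبة", "ملك"]
arabic_hits = wildcard_search(arabic, "*ك*ت*ب*")
```

The Arabic result illustrates why the automatic counts have to be edited by hand before any assessment.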
Xerox’s morphological analyser was first made for the company’s research purposes in 1997. Only in 2002 were they able to produce an improved commercial version for teaching purposes and as a component in larger natural language processing systems. This program, which supports Arabic script, works on a two-level morphological analysis: (1) roots and patterns, (2) affixes, enclitics and function words which are normally attached to words as prefixes. It can deal with all words with or without diacritics. However, the program only deals with Modern Standard Arabic. The program uses a very limited Arabic dictionary of 4930 roots. Secondly, it analyses words separately from their contexts, which might produce some ambiguous forms. For example, if you search for a word like ktb, it shows all the meanings of the root ktb without a specific reference to the word in context (http://www.xrce.xerox.com/competencies/content-analysis/arabic/).

Tim Buckwalter managed, using Perl, to write a morphological analyser for Arabic. Buckwalter’s Arabic Morphological Analyser was created for POS-tagging Arabic text. The analyser consists primarily of three Arabic-English lexicon files: prefixes, suffixes, and stems. Also, contrary to the mainstream of the Arabic writing system, which ignores diacritics, Buckwalter included diacritics in his lexicons. To him ignoring diacritics could lead to misinterpretation and misanalysis of Arabic lexemes. This program is now available through LDC.

The wild card search is useful sometimes with Arabic when the search-term is not polysemous or the base word has a limited potentiality for word building. For example, using wild card search with z{anna is not problematic because this word is not polysemous in the first place. Secondly, there are only a few irrelevant instances which contain the same root letters but do not belong to the search-term, such as lafaz{ani, ayqaz{ani, hafaz{ani and haz{una. The second letter, n, in these examples is not originally a root letter; it is rather a suffix added for a morphological reason. Otherwise, most Arabic words need extensive hand editing because of the absence of vowels in Arabic, which makes these forms morphologically identical. This makes the process of singling out the search-term quite complicated and sometimes ambiguous.

To guarantee the accuracy of this procedure, Biber et al. (1998: 91) propose a statistical way for editing such data26. Their procedure is meant to remedy tagged corpora, where we may meet irrelevant grammatical categories (i.e. which are not under examination). However, I think, this method is also useful for work on raw corpora to exclude the counts which are inaccurate morphologically27. These steps, with slight refinement, are as follows:
1. select a random sample of the counts of the word under investigation.
2. edit it by hand.
3. compute the proportional use of the irrelevant counts in the sample.
4. multiply the total number of your counts, in the corpus, by the proportion computed in step 3.

Biber et al. (ibid: 91) suggest ‘more than one random sample should be taken from each category in order to make sure that the proportions are similar across samples’.

Let us consider the following example. In CAC I have got 10447 instances of the base-form ktb, which involve a lot of hand-editing based on intuition, as some are derived from the same root while others do not belong to the root (see table 5.1 below). Let us select a random sample28 of 2402 including relevant and irrelevant hits. Having edited the sample, I found out that 290 hits are irrelevant.

26 By data I mean the total number of occurrences of the word under investigation.
27 This procedure is useful to know the frequency of a given word without editing the whole corpus, by identifying the wrong and correct forms through a small sample.
28 I counted just the hits with maximum frequency of 100 and minimum frequency of 5.
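The four steps can be sketched as a small calculation, using the figures reported for ktb in this section (10447 hits in total, a hand-edited sample of 2402, of which 290 are irrelevant). The function name `estimate_from_sample` is hypothetical; the percentage is rounded to a whole number and the estimates are truncated to whole hits, which are assumptions made to mirror a by-hand calculation.

```python
def estimate_from_sample(total_hits, sample_size, irrelevant_in_sample):
    """Scale the share of irrelevant hits found in a hand-edited sample
    up to the whole set of hits (steps 3 and 4 of the procedure)."""
    # Step 3: proportion of irrelevant hits in the sample, as a whole percentage.
    pct_irrelevant = round(100 * irrelevant_in_sample / sample_size)
    pct_relevant = 100 - pct_irrelevant
    # Step 4: multiply the total number of hits by that proportion
    # (integer division truncates to whole hits).
    est_irrelevant = total_hits * pct_irrelevant // 100
    est_relevant = total_hits * pct_relevant // 100
    return pct_irrelevant, est_irrelevant, est_relevant


pct, irrelevant, relevant = estimate_from_sample(10447, 2402, 290)
```

Because the proportion is rounded before scaling, the two estimates need not add up exactly to the total number of hits.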
The proportion of the irrelevant examples can be represented as follows:

290 x 100 / 2402 = 12%

Thus the total number of the irrelevant forms of ktb is:

12% x 10447 = 1253

On the other hand, the proportional total number of the relevant forms is:

88% x 10447 = 9193

CATEGORY                                    EXAMPLES              NUMBER OF WORDS    PERCENTAGE
Relevant counts                             kataba, yaktubu, …    2112               88%
Irrelevant counts based on the same root    kateebah              78                 3%
Irrelevant counts from other roots          yaktasib              212                9%
Table (5.1)

This table shows how misleading it can be to search an Arabic raw corpus without hand-editing. We can notice that the irrelevant counts can be calculated proportionally. In other words, instead of going through the whole corpus to eliminate the irrelevant forms of a given search-term, we can rather select a sample, edit it manually and run a proportional calculation as shown above. Using the above statistical methodology in editing our data will save time and effort in hand editing.

5.7.1.2 Concordances
With KWIC (key word in context) we can search the whole corpus in a way that saves time and effort, instead of looking up each occurrence of the word under investigation. Figure 5.2 below gives a list of the occurrences of the word ktb using wild card search.

[Figure (5.2): a sample of the concordance of the base-form (lemma) ktb in CAC: 30 numbered KWIC lines in which the node, e.g. [[كتب]], [[كتابه]], [[كاتب]], is enclosed in double square brackets.]

Such a facility is useful enough when there are only a few lines to look into. But with thousands of lines, the human mind could be overwhelmed by such large data. Statistical techniques can help us go deeper and reveal what we might not have observed with the naked eye.

Today, some software for analysing concordance lines statistically is available. One of the early attempts that used statistics to analyse corpora automatically was Choueka et al (1983). They proposed an algorithm to retrieve collocations automatically from texts. However, their work can only deal with a particular type of collocation: uninterrupted bigrams.
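The KWIC display discussed in this section can be sketched in a few lines. This is a minimal illustration, not the output format of any concordancer named in the text: the function name `kwic`, the `width` parameter and the toy English sentence are assumptions; only the double square brackets around the node follow the convention of figure (5.2).

```python
def kwic(tokens, node, width=3):
    """Return one key-word-in-context line per occurrence of `node`,
    with `width` tokens of context on either side (hypothetical helper)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Mark the node with double square brackets.
            lines.append(f"{left} [[{tok}]] {right}".strip())
    return lines


# Toy data, invented for illustration.
tokens = "the judge wrote a letter and the king wrote back".split()
concordance = kwic(tokens, "wrote", width=2)
```

Each returned line keeps the node in its immediate environment, which is what makes a first reading of the concordance lines possible before any counting is done.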
Church and Hanks (1990) proposed a measure to estimate collocations directly from electronic corpora. This measure, which is called the association ratio, is mainly based on the Mutual Information statistic. It is able to retrieve interrupted word pairs but is limited to collocations that contain no more than two words. To remedy such drawbacks, Smadja developed Xtract: 'Xtract retrieves interrupted as well as uninterrupted sequences of words and deals with collocations of arbitrary length' (Smadja, 1993: 150). In addition, there are some concordance programs consisting of a number of tools in one package, such as wordlist, concordance and key words. Examples are Wordsmith, which was designed by Scott (1996), and CobuildDirect, which uses mutual information and t-score (Oakes, 1998: 193). Moreover, Collocate is designed to assess the significance of collocations in a concordance file, as it calculates the actual frequency of a given collocation and normalises it with its expected frequency (ibid: 229-230).

These programs, however, raise problems of their own. In the first place, their output may need some sort of hand-editing. Secondly, there is the question of homonymy. For instance, words like bank could be disambiguated if we have, for example, an economic corpus, which helps disambiguate the seemingly homonymous words. To solve such a dilemma we need to disambiguate these senses semantically, though this is obviously tedious and time-consuming.

Homonyms are relatively uncommon in Arabic. Some learners of Arabic think that most Arabic words are mainly homonymous, which is not the case. Arabic is rather full of homographs which are distinguished in pronunciation. This is due to the absence of vowels in modern orthography. In other words, the vowels are rather predicted.[29] A change from one vowel to another makes a different base-form, and ignoring these vowels produces such homographs. For example, ورد wrd can be diacritised in the following diverse ways: وَرْدٌ wardun 'flowers', وِرْدٌ wirdun 'portion', وَرَدَ warada 'came', وَرَّدَ warrada 'flowerise', وَرَدَّ wa radda 'and replied' and وَرُدَّ wa rudda 'and was replied'. All of the above choices can fit in the text on the syntactic level, for example in ورد على الرجل wrd ala alrajul. Such a phenomenon can make problems both for human learners of Arabic as a foreign language and for electronic processing.

29 The vowels in Arabic are predicted according to personal intuition. Arabic is a language in which vowels are represented in diacritic form; an Arabic reader would predict a certain vowel to occur in a certain position according to his own mental lexicon.

This can be easily sorted out by inserting the diacritics when keying the corpus, providing the case endings according to their position in the sentence. Alternatively, one can use a tool to diacritise Arabic text, such as the Diacritiser produced by the Sakhr Company. The tool has not been personally assessed (see www.sakhr.com). In theory, diacritising Arabic texts automatically does not work all the time, since the program is expected to make a choice from a big list of probable words. Moreover, we may change the positions of the words inside the sentence for rhetorical reasons without breaching its meaning. Let us consider example (9):

(9) ابتلَى ابراهيمَ رَبُّهُ
ibtala ibraahiima rabbuhu
tested Abraham-Acc. his Lord-Nom.
'The Lord tested Abraham.'

In (9) the nominative رَبُّهُ rabbuhu 'his Lord', which would otherwise occur next to the verb ابتلَى ibtala 'tested', was interrupted by the accusative ابراهيمَ ibraahiima 'Abraham' to give precedence to Allah's name, because He is the Most High; this is a rhetorical device. In such sentences, a normal morphological parser will confuse the accusative with the nominative because of the absence of the diacritics which can distinguish between both of them. Therefore, proofreading and hand-editing are necessary to eliminate such discrepancies before doing any sort of statistics automatically. This is apparently tedious and time-consuming as well.

5.7.1.3 Frequency

When we have large masses of electronic data to analyse, we have to find a way to sort it out and simplify it in such a way that it would be easy to examine and manipulate. Statistics is considered a good way of simplifying and telling us what things we would like to highlight. Statistics has been a useful technique in all branches of language studies (cf. Miller, 1963 & Fasold, 1984). For corpus linguistics, it is particularly important (Allen, 1995; Charniak, 1993; Krenn and Samuelsson, 1997 and Oakes, 1998). Corpus-based statistical study, which is an extension of traditional descriptive linguistics, can shed light on some aspects of language which we might not be able to discern otherwise. In CAC, for instance, some combinations of words will tend to occur relatively often, while others are rare or impossible.

The starting point for analysing our corpus quantitatively to find collocations is counting. The more frequently the word under examination (the node) occurs with another word (or words), the surer we are that this combination has a significant pattern. Analysing our corpus in such a way does not work all the time, since much of the output we get may not be very interesting, as shown in figure (5.3) below. The figure shows the frequency of the top 10 trigrams with ktb (write, book), i.e. the most frequently occurring three-word phrases.

Gloss (R1 ... L1)      R1      L1      Frequency
in ... Allah           في      الله     110
from ... Allah         من      الله     53
what ... Allah         ما      الله     35
and in ... Allah       وفي     الله     32
from ... in            من      في      32
on ... like            على     كما     28
to ... Allah           إلى     الله     25
from ... this          من      هذا     24
harm ... and not       يضار    ولا     24
what ... to them       ما      لهن     23

Figure (5.3) the top 10 co-occurring trigrams of the base-form (lemma) ktb in CAC.

We can notice that five of the ten trigrams with ktb co-occur with God's name (Allah), referring to the Qur'an, whereas four are flanked with function words. In figure 5.3 above most of the patterns do not have a special justification to occur together.
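The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a word is taken to be whatever stands between spaces, the node is matched as a surface form rather than as the lemma ktb, and the token list is a made-up transliterated toy sample, not data from CAC. The function name flanking_pairs is hypothetical.

```python
from collections import Counter


def flanking_pairs(tokens, node):
    """Count the (preceding, following) neighbours of every occurrence of `node`.

    Each count corresponds to one trigram with the node in the middle.
    """
    counts = Counter()
    # Skip the first and last positions: they lack one of the two neighbours.
    for i in range(1, len(tokens) - 1):
        if tokens[i] == node:
            counts[(tokens[i - 1], tokens[i + 1])] += 1
    return counts


# Toy, invented token stream (NOT taken from CAC).
tokens = "fi kitab allah wa fi kitab allah min kitab hadha".split()
top = flanking_pairs(tokens, "kitab").most_common(2)
print(top)
```

Sorting by raw count in this way reproduces the kind of list given in figure (5.3); it says nothing yet about whether a frequent pattern is more than a by-product of very frequent function words.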
Frequency alone does not tell us very much; it may even be misleading because 'frequency-based search works well for fixed phrases. But many collocations consist of two words that stand in a more flexible relationship to one another' (Manning & Schütze, 1999: 147). To extract collocations statistically we need to examine how probable it is that a certain combination will occur. By using statistical tests we are more likely to get reliable results and to test how likely two words are to occur near each other. There are some interesting and useful statistics that one can use to assess and enhance such counts. The most prominent ones are z-score (Berry-Rogghe, 1970), mutual information (Church & Hanks, 1990) and t-score (Church, Gale, Hanks and Hindle, 1991).

Mutual Information can help us identify interesting patterns. For example, if a word or more shows up in our corpus a number of times around our search term, we can examine how far such a pattern is interesting by comparing their joint probability with chance. Mutual Information compares the probability of x and y occurring together with the probabilities of x and y occurring independently, i.e. it compares the number of occurrences of the combination with the number of occurrences of each word on its own. Church and Hanks (1990) argue that if p(x, y) is bigger than p(x) p(y), then it is evidence that there is more likely a genuine association. If p(x, y) equals or is less than p(x) p(y), then we can predict no interesting association. Words with large mutual information scores are likely to be more interesting (Church et al., 1991). The formula as introduced by Church et al. for two given words reads:

\[ I(x, y) = \log_2 \frac{p(x, y)}{p(x)\, p(y)} \]

With the probabilities estimated from corpus counts as p(x) = f(x)/n, p(y) = f(y)/n and p(x, y) = f(x, y)/n, this amounts to

\[ I(x, y) = \log_2 \frac{f(x, y)\, n}{f(x)\, f(y)} \]

Paul Johnston[30] designed a program on his web site that can do the calculation automatically, on condition that one has the number of each variable. To use these formulae to find collocations in CAC, let us take the word الدنيا al-dunya (the world) as our search-term and then carry out the calculations as shown in table (5.4). The word under investigation, al-dunya, is given the (x) value, whereas the corpus size is represented as (n).

30 http://lismore.ccl.umist.ac.uk/paulj/develop/mutual.html

f(x) = 1350, n = 5000000

(x, y)                         Arabic             f(x,y)   f(y)     MI
the world perishable           الدنيا الفانية      6        11       10.98
the world and its adornment    الدنيا وزينتها      7        17       10.57
the world and the hereafter    الدنيا والآخرة      79       571      9.00
the world good deed            الدنيا حسنة        14       398      7.02
the world and torture          الدنيا وعذاب       5        338      5.77
the world little               الدنيا قليل        6        673      5.04
the world house                الدنيا دار         6        551      5.33
the world and certifies        الدنيا ويشهد       9        1864     4.16
the world what                 الدنيا ما          24       6939     3.67
the world without              الدنيا دون         6        2150     3.36
the world means                الدنيا يعني        8        4462     2.73
the world except               الدنيا إلا         18       11157    2.57
the world mentioning           الدنيا ذكر         5        3853     2.26
the world from                 الدنيا من          45       34830    2.25
the world and for              الدنيا وأما        7        6987     1.89
the world in                   الدنيا في          19       24009    1.55
the world until                الدنيا حتى         10       13165    1.49
the world to                   الدنيا إلى         11       11214    1.86
the world namely               الدنيا أي          8        13356    1.14
the world then                 الدنيا ثم          11       22956    0.82
the world said                 الدنيا قال         15       32835    0.75
the world verily               الدنيا قد          7        24534    0.07
the world on                   الدنيا على         12       47664    -0.10
the world and not              الدنيا ولا         6        32000    -0.52
the world the statement        الدنيا القول       5        32835    -0.82

Table (5.4) the left collocates of the word al-dunya with maximum frequency of 100 and minimum frequency of 5.
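The Mutual Information calculation can be checked with a short Python sketch. It assumes the usual estimate of each probability as a count divided by the corpus size, and uses the counts quoted for the three strongest collocates of al-dunya (f(x) = 1350, n = 5000000). The function name and the dictionary layout are hypothetical, and this is not the program referred to in footnote 30.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """log2 of the observed joint probability over the product of the marginals.

    Probabilities are estimated as counts divided by the corpus size n,
    so the ratio simplifies to (f_xy * n) / (f_x * f_y).
    """
    return math.log2((f_xy * n) / (f_x * f_y))


N = 5_000_000   # corpus size n
F_X = 1350      # frequency of the node al-dunya

# collocate -> (f(x,y), f(y)), as quoted for the top three rows
rows = {
    "al-faaniyah": (6, 11),
    "ziinatahaa": (7, 17),
    "al-'aakhirah": (79, 571),
}

scores = {w: mutual_information(f_xy, F_X, f_y, N) for w, (f_xy, f_y) in rows.items()}
for word, score in scores.items():
    print(f"{word}: {score:.4f}")
```

The three scores agree, to two decimal places, with the values 10.98, 10.57 and 9.00 discussed in the text.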
It is the nature of Arabic orthography to attach some particles, personal pronouns in either the genitive or the accusative case, and the definite article to the following or preceding string of characters. For instance, conjunctions like و wa 'and' and فـ fa 'and, after', and the definite article al (the), are considered, in writing, as parts of the words that follow. To decompose such combinatory units automatically may lead to a serious problem of identifying what a word is. This is because such units can be kernel parts of base-forms in Arabic. For example, و wa 'and' could be a conjunction and could function as the initial letter of hundreds of Arabic words like وجد wajada 'he found', واحد waah{id 'one', وليد waliid 'newborn', etc. Accordingly, it would be more realistic if we stipulate from the very beginning what a word is. The word is defined as 1) a sequence of characters with spaces in between, 2) a minimal permutable unit, 3) maximally uninterruptible (Cruse 2000). To be more practical, I will consider the word as what is between spaces, as it is easier and more practical. In addition, in English, the mainstream is to count words like within, insofar and themselves as three words irrespective of how many units they contain. For my part, I would consider the particle as a part of the word, like an affix.

In table 5.4 the search term الدنيا al-dunyaa 'world' occurs 1350 times in CAC. The most significant left collocates, i.e. the pairs with the highest MI scores, are الفانية al-faaniyah 'perishable' at 10.98, زينتها ziinatahaa 'adornment' at 10.57 and الآخرة al-'aakhirah 'hereafter' at 9.00. These collocations describe the reality of this world according to the Muslim perspective. Muslims view the world as an adornment which will inevitably perish, whereas the genuine life will be in the Hereafter.

The pair الدنيا حسنة al-dunyaa h{asanatan 'the world a good deed' appears as a strong collocation. It is a part of an often-quoted prayer (supplication), given in (10) below, which is reiterated by Muslims in religious contexts.

(10) اللهم آتنا في الدنيا حسنة وفي الآخرة حسنة وقنا عذاب النار
allahumaa aatina fi al-dunyaa h{asanatan wa fi al-'aakhirati h{asanatan wa qinaa cadhaaba al-naar.
(trans.) 'O Allah give us a good deed in this world and next and protect us from the Hell-fire.'

The collocate h{asanatan does not modify al-dunyaa in the first place; it is rather a direct object of the verb aatinaa (give us). Moreover, the repeated citation of this prayer is not an independent occurrence of this collocation. So, we need a mechanism to single out real collocations from the apparent ones. The t-score, which is a general statistical measure that can compare two probabilities, is a useful statistic to assess the relative strength of collocation. This will be discussed in detail in the next chapter.

In addition to the usefulness of MI in finding a given collocation without any prior knowledge of its plausibility, it can also detect whether a given combination is really a collocation or not. Let us consider the following example. Table (5.5) shows that the search term رسول rasuul 'Messenger' collocates with الله 'Allah', مرسل mursal 'sent' and مصدق mos}addaq 'truthful', given their high MI scores.

f(x) = 11805, n = 5000000

(x, y)                  Arabic         f(x,y)   f(y)     MI
Messenger of Allah      رسول الله      11598    19246    7.99
Messenger from          رسول من       24       34830    -1.77
Messenger verily        رسول قد       14       24534    -2.04
Messenger truthful      رسول مصدق     9        67       5.83
Messenger prayed        رسول صلى      9        18843    -2.30
Messenger sent          رسول مرسل     6        34       6.22
Messenger with what     رسول بما      4        4566     -1.43
Messenger of the king   رسول الملك    4        5046     -1.57
Messenger except        رسول إلا      4        12261    -2.85

Table (5.5) the left collocates of the word رسول rasuul with minimum frequency of 4.

Of the three collocations, the first has a strong bond with our node, with MI at 7.99. This draws our attention to the strong bond between the pair رسول الله rasuul Allah 'Allah's Messenger', in addition to the main traits of this Messenger, which are 'truthful' and 'sent by Allah'. Because of the very strong bond between رسول rasuul 'Messenger' and الله Allah, the word الرسول al-rasuul 'the Messenger', with the definite article, can replace the whole pair.

The major problem with MI is that it does not work very well when there is not much data, i.e. sparse data problems; it is the problem of all statistical tests. We cannot calculate the probability of a given pair if one of the variables has the value zero, and with very low occurrences the measure does not work very well either. Manning and Schütze (1999: 169) calculated the MI scores of ten bigrams that occurred once to prove the invalidity of MI with sparse data. They found out that 'a large proportion of bigrams are not well characterised by corpus data (even for larger corpora) and that mutual information is particularly sensitive to estimates that are inaccurate due to sparseness'. For example, if one category of the formula happens not to occur in our corpus, i.e. it equals (0), the Mutual Information statistic will be inapplicable, giving no result. Obviously such a problem can be superficially avoided by using words with a frequency of at least four or three. Allen (1995: 194-5) proposes a more practical solution by adding a small amount to each count to guarantee that there will be no zero probabilities. This process is called the expected likelihood estimator (ELE). 'The ELE, however, gives an equally likely probability to each possible word class' (ibid: 195). Sinclair (1996) proposes a primitive test to measure the significance of a given pattern by looking into patterns with a minimum frequency of two. He noted that despite the insufficiency of such a condition it could guarantee that such a pattern is not accidental. He argues that for language, unlike many other areas of research, a single occurrence, no matter how unusual, is unremarkable in the first instance, and that only events that recur are worth assessing the significance of (ibid: 81). This is in conformity with the corpus linguistics methodology.

On the other hand, MI can reveal what is not expected or is often missed out of the obvious typical patterns, i.e. it can be used to investigate how typical a given pattern is. Let us now consider the following example to see how useful MI is in extracting collocations. When discussing the collocations of body parts in Classical Arabic, Emery (1988) argued that the adjectives in حرب ضروس h{arb d}aruus 'fierce war' and جيش جرار jayshun jarraar 'huge army' uniquely collocate with their preceding nouns. Having analysed CAC, I found out that neither of them co-occurs with such nouns even once. The adjective ضروس d}aruus 'fierce' rather co-occurs with مطر mat}arun 'rain'. As for the other adjective, جرار jarraar 'huge', it collocates with عسكر 'askarun 'soldiers'. On the other hand, he was successful in ascertaining that the verb أطرق at}raqa 'bowed' uniquely collocates with a particular body part: رأس ra's 'head'.[31]

31 Using corpus-based analysis to assess Emery's results is useful in supporting or invalidating his hypothesis.

Table (5.6) below shows that there are more categories, other than body parts, which أطرق at}raqa can collocate with, such as أطرق حياء at}raqa h{ayaa'an 'bowed out of shyness' and أطرق كرا at}raqa kara 'Kara bowed'.

f(x) = 57, n = 5000000

(x, y)                   Arabic         f(x,y)   f(y)     MI
bowed his head           أطرق رأسه      35       1106     11.43
bowed Kara               أطرق كرا       3        3        16.42
bowed so not             أطرق فلم       1        15175    2.53
bowed and thought        أطرق وفكر      1        261      8.39
bowed but                أطرق وإنما     1        5937     3.88
bowed then               أطرق فإذا      1        24107    1.86
bowed namely             أطرق أي        1        13356    2.71
bowed the young man      أطرق الشاب     1        370      7.88
bowed to                 أطرق إلى       1        11214    2.96
bowed out of shyness     أطرق حياء      1        32       11.42
bowed Hasan              أطرق حسن       1        2111     5.37

Table (5.6) the problem of sparse data. The left collocates of the word at}raqa 'bowed'.

Nevertheless, أطرق at}raqa 'bowed' in table 5.6 strongly collocates with رأسه ra'sahu 'his head' because of its high MI score. The second combination in the table seemingly appears as a strong collocation despite its low frequency, because the word it collocates with is rare. All three occurrences of this word occur only with أطرق at}raqa. This gives an indication that such a combination is more likely to be a cliché, an idiom or any other stereotyped phrase. In fact, all of them belong to a certain context or domain: proverbs. This leaves us in no doubt that the combination under investigation is part of a proverb.
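The sparse-data effect visible in the counts for at}raqa can be reproduced with a small Python sketch. The frequencies are those quoted for at}raqa (f(x) = 57, n = 5000000). The smoothing shown here, adding a constant to the joint count only, with 0.5 as its value, is an assumption made for illustration in the spirit of adding a small amount to each count; it is not presented as the estimator exactly as Allen defines it. The function names are hypothetical.

```python
import math


def mi(f_xy, f_x, f_y, n, delta=0.0):
    """Mutual Information from counts.

    `delta` is an optional constant added to the joint count (assumed
    smoothing choice). Returns None when the score is undefined.
    """
    joint = f_xy + delta
    if joint <= 0:
        return None  # log2(0) is undefined: MI is inapplicable
    return math.log2(joint * n / (f_x * f_y))


def frequent_enough(f_xy, min_freq=4):
    """Frequency threshold used to sidestep the least reliable pairs."""
    return f_xy >= min_freq


N, F_X = 5_000_000, 57

head = mi(35, F_X, 1106, N)                      # at}raqa + ra'sahu, 35 co-occurrences
shy = mi(1, F_X, 32, N)                          # at}raqa + h{ayaa'an, a single co-occurrence
unseen = mi(0, F_X, 32, N)                       # a pair that never co-occurs
unseen_smoothed = mi(0, F_X, 32, N, delta=0.5)   # same pair with the assumed smoothing

print(head, shy, unseen, unseen_smoothed)
print(frequent_enough(35), frequent_enough(1))
```

A pair seen once with a rare word scores almost as high as a pair seen 35 times, which is why a minimum frequency is applied before the MI scores are interpreted.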
MI is useful only for testing similarities, which is good for finding collocations, as it can calculate the probability of whether two words occur together very often in a text, i.e. if you find x, you are more or less likely to find y. It can give evidence for the closely related words. But we cannot use it to test the differences between words, which is necessary for assessing seemingly synonymous or collocated words. For testing differences, I need a different statistic, which is the t-score.

5.7.1.4 T-test: a measure of difference

Differences between items, particularly synonymous words, are not easy to identify on traditional syntactic or semantic grounds. The thesauri, which are introduced for practical or pedagogical reasons (cf. Ch. Six), are sometimes misleading, and through frequent use we may get used to accepting all the entries given as synonyms of a word as absolute synonyms. We have introduced the Mutual Information statistic, which is useful for detecting similarity between items. We will now use another statistical technique, the t-test, which is useful in assessing the significant differences between two groups of patterns, typically pairs of near synonyms. The main aim behind it, as suggested by Church et al (1991), is to see the more significant words that are more likely to appear with each item of the synonymous pair.

The t-test simply calculates the difference between two probabilities. Contrary to Mutual Information, which can only make positive statements[32], i.e. show what is more likely to occur after a given item, the t-test can work the other way around, i.e. highlight what is less likely to occur after that item. The t-test can make a negative statement by looking at items which are less likely to co-occur with either X or Y altogether. The formula as given in Church et al (1991) for the pair of words 'strong' and 'powerful' is represented as:

\[ t = \frac{P(w \mid \text{strong}) - P(w \mid \text{powerful})}{\sqrt{\sigma^2\big(P(w \mid \text{strong})\big) + \sigma^2\big(P(w \mid \text{powerful})\big)}} \]

where w stands for the collocate and σ for the standard deviation.

32 By positive statement I mean that in MI we can find the words which are more likely to co-occur after X, but we cannot account for the items which are more significant with Y or did not occur at all with either.

The example of strong and powerful, as given by Church et al., can show the importance of this test. In other words, the difference between powerful and strong in powerful support and strong support can be brought out by comparing the most significant right collocates of both of them. By analysing the significant collocates, Church et al (1991) managed to abstract an attribute that can differentiate between both words, namely intrinsic vs. extrinsic.
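A minimal Python sketch of the t-test as a measure of difference follows. It relies on a common simplification that is an assumption here, not something stated in the text: both counts come from the same corpus and the variance of each estimate is approximated by the count itself, so the score reduces to the difference of the two counts over the square root of their sum. The counts below are invented for illustration and the function name is hypothetical.

```python
import math


def t_score(f_w1, f_w2):
    """Approximate t for a collocate w after word 1 versus word 2.

    f_w1 and f_w2 are the frequencies of 'word1 w' and 'word2 w'.
    Positive values favour word 1, negative values favour word 2.
    """
    total = f_w1 + f_w2
    if total == 0:
        return 0.0  # the collocate occurs with neither word
    return (f_w1 - f_w2) / math.sqrt(total)


# Hypothetical counts: collocate -> (f(strong w), f(powerful w))
counts = {"support": (50, 10), "computer": (2, 30)}
scores = {w: t_score(a, b) for w, (a, b) in counts.items()}
for word, score in scores.items():
    print(f"{word}: {score:.3f}")
```

Sorting collocates by this score from both ends gives the words most typical of each member of the pair, which is the comparison the section describes for strong and powerful.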
Finally, we have to bear in mind that the statistical calculation is not an end in itself in linguistic analysis. As Sinclair (1996: 80-81) puts it, the use of numerical methods is normally only the first stage of a linguistic investigation, and this kind of work should be distinguished sharply from the heavy reliance on statistical methods in some styles of linguistic-analytical operations such as parsing or translation.

In conclusion, we can give a definition of collocation which is relatively an amalgam, with some refinement, of the above definitions. Collocation is the significantly frequent[33] co-occurrence of two or more words.

33 'Significantly frequent' here means the statistically significant combination of words.

Chapter Six: Synonymy: An overview

6.1 Introduction

As discussed earlier, synonymy is a paradigmatic relation that holds between words on the vertical axis (cf. Ch. 5). This type of sense relation simply means the sameness or similarity of meaning as defined in dictionaries. When asked about the meaning of a given word, people will intuitively provide more than one word as alternatives. In this respect, many dictionaries are assembled to fulfil that purpose, like Roget's Thesaurus, Webster's Synonym Dictionary and Crabb's English Synonyms. Such dictionaries provide for every entry a list of words that have a close meaning, or descriptive detail of the concept. But is that closeness of meaning considered synonymy? On the other hand, some linguists like Bloomfield (1935: 145) deny the existence of synonyms in natural languages. Through corpus analysis we can show whether two items are indeed absolute synonyms or not by checking their relations in all the available contexts.

As the subject matter of this thesis is to look into synonymy in a different way and to examine readily empirical issues that have interesting theoretical results, I will try in this chapter to review the phenomenon from the corpus linguistic perspective. Therefore we need from the very beginning to explain what exactly synonymy is and to give a systematic survey of the phenomenon, in order to put forward a more convincing explanatory hypothesis that will be statistically applicable.

6.2 Definition

Synonymy is defined as two or more expressions which are different in form but not in meaning (Harris, 1973: 6). Let us first talk about the expressions involved. It is important from the very beginning to distinguish two notions of semantic similarity: a) similarity between single words and b) paraphrase. For instance, commence and start are synonymous verbs, whereas human male is a paraphrase of man. We are concerned with the first type of similarity. Synonymy in this sense is defined as a relation of similarity in meaning between lexical items. To have two different phonological words of the same meaning can bring up some arguments as regards how much sameness both of them have. Should it be complete similarity, strong similarity or even a thin shade of similarity to be considered? Below in section (6.2.1) we are going to explain how similar a word has to be to another in meaning to be called a synonym.

Synonymy can also be defined as sameness of intension or extension[34] (Jones, 1986: 70). The extension of a word (denotation) is all the things referred to by the word. For instance, the extension of the word dog is the class of dogs. On the other hand, the intension is the word property (or, to put it in Lyons' words, 'the set of attributes which characterise any entity to which the term is correctly applied' (Lyons 1968: 454)). For example, dog entails an animal. Moreover, the intensional approach to meaning is more general than the extensional approach, since there are some words which do not have an extension, like unicorn. Unicorn entails animal; however, it does not refer to anything extensionally. Therefore, it is relatively easier to study words intensionally than extensionally.

34 The extensional approach is the only way to give all sorts of information; hyponymy will not come up if we do not take that approach (e.g. dog is a hyponym of animal).

A more restrictive definition of synonymy was put forward by Quine, who views synonymy as 'two forms are synonymous if their interchange leaves their contexts synonymous' (Jones, 1986: 66). This definition requires that the two forms under investigation are interchangeable in every possible context. The synonymy relation between a given pair of words can be ruled out if we spot any change in the context. Tests have been introduced to check the credibility of any seemingly synonymous pairs, as will be discussed in section 6.2.2 below.

Moreover, synonymy is considered a type of hyponymy. Palmer (1981: 88) defines synonymy as 'symmetric hyponymy', i.e. 'if X is a hyponym of Y and if Y is also a hyponym of X, then X and Y are synonymous' (Hurford & Heasley, 1983: 107). For example, if we take car and automobile as synonyms, then they have to be mutual hyponyms of each other, i.e. all cars are automobiles and all automobiles are cars.
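The idea of synonymy as symmetric hyponymy can be made concrete with a short Python sketch. It models hyponymy extensionally, as inclusion between the sets of things two words refer to; that modelling choice, the toy extensions and the function names are all assumptions made for illustration.

```python
def is_hyponym(ext_x, ext_y):
    """X is a hyponym of Y if everything X refers to is also referred to by Y."""
    return ext_x <= ext_y


def are_synonyms(ext_x, ext_y):
    """Symmetric hyponymy: each word is a hyponym of the other."""
    return is_hyponym(ext_x, ext_y) and is_hyponym(ext_y, ext_x)


# Toy extensions (invented): each set lists the entities a word refers to.
car = {"vehicle_1", "vehicle_2"}
automobile = {"vehicle_1", "vehicle_2"}
dog = {"rex", "fido"}
animal = {"rex", "fido", "tom_the_cat"}

print(are_synonyms(car, automobile))   # all cars are automobiles and vice versa
print(is_hyponym(dog, animal))         # dog is a hyponym of animal
print(are_synonyms(dog, animal))       # but animal is not a hyponym of dog
```

On this reading, hyponymy that holds in one direction only, as with dog and animal, never counts as synonymy.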
6.2.1 Synonymy - Four Approaches

According to the definitions given above we are left with four attitudes about the treatment of synonymy: one denying the existence of synonymy, and three others that differ as to how much similarity is to be considered synonymy.

First Approach: Denial of the existence of synonymy

We stated in the introduction of this chapter that some linguists deny the existence of synonymy. Bloomfield (1935: 145) argues, 'each linguistic form has a constant and specific meaning. If the forms are phonemically different, we suppose that their meanings are also different… We suppose, in short, that there are no actual synonyms'. This approach was also adopted by Palmer, who claimed that there are no real synonyms (1981: 89). This approach arises from the question of why natural languages tend to have two words which mean the same thing and are used in the same range of grammatical and lexical patterns. The principle of economy eliminates one of these two terms as redundant. Such an approach can be entertained or discredited in our theoretical treatment of synonymy according to what definition of synonymy we adopt. The definition of synonymy as complete interchangeability is partly in conformity with this approach. The three other approaches of looking at synonymy acknowledge it, but with some different treatments.

Second Approach: Strict definition of synonymy

Quine, Ullmann and Haas view synonymy as perfect interchangeability between the items under investigation in all possible contexts. As Jackson (1998: 65) put it, 'two words are synonymous if they can be used interchangeably in all sentence contexts.' This is also the requirement given by Ullmann (1962) when he talks about synonymy in that strict sense. However, as argued by Ullmann, this phenomenon of complete interchangeability without any sort of alteration in meaning is rare in natural language (1962: 142). Synonymy in this sense is called absolute synonymy, as will be explained in section 6.2.2.

Ullmann introduced a test for ruling out seemingly synonymous pairs called the substitution test. He (ibid: 143) states, 'The best method for the delimitation of synonyms is the substitution test … which is (considered) one of the fundamental procedures of modern linguistics, and in the case of synonyms it reveals at once whether, and how far, they are interchangeable'. It is enough to prove that a given pair of words is non-synonymous if any shade of meaning (increased or decreased) alters with the change of the context. He gave a few examples like broad and wide, which are used synonymously in broad sense and wide sense, but fail to keep that synonymy in five foot wide. He also stated that 'one can also distinguish between synonyms by finding their opposites (antonyms).' For instance, decline and reject are not synonymous if opposed to rise and accept. He did not go into more detail.

Haas (quoted in Cruse, 2000) comes up with a different test, i.e. the normality profile test. He argued that there is a normality profile for all possible words and sentences in a language, as every word can be more or less normal than the other. The meaning of a word is its normality profile across its grammatical occurrences. Haas used the notion of normality as a primitive intuition. 'Every difference of meaning between two expressions will show up as a difference of normality in some context' (Cruse, 2000: 12). One single occurrence where we find one item of a pair more or less normal than the other can undermine the synonymy relation between them. For example, illness and disease are not synonymous, as in (1a&b).

1. a. During his illness (normal)
   b. *During his disease (abnormal)

We can distinguish between words through the grammatical aspects of meanings.

Third Approach: a more lenient approach

This looks at synonymy in a broader sense and is adopted to accommodate as many synonyms as possible for each item. This approach is more likely entertained for practical purposes, such as in the pedagogical field and in the production of dictionaries. A closer look at dictionaries of synonyms reveals that the criterion considered in the definition of synonymy is to have two items which are similar in at least some context. For example, Roget's Thesaurus is based on concepts, as it gives for every concept a list or lists of terms which describe that concept. These terms are grouped according to the degree of their synonymy relation. Unlike Roget's Thesaurus, Webster's Synonymy Dictionary and Crabb's English Synonymy are word-based, i.e. they take each word and list its synonyms followed by antonyms.

Fourth Approach: halfway between the two extremes

The definitions given by Lyons (1968 and 1981) and Cruse (1986 and 2000) represent this attitude. They do not see interchangeability in all texts as a requirement for synonymy recognition. They rather made a distinction between different categories of synonymy. Lyons (1968: 450) defines synonymy as follows:

If one sentence, S1, implies another sentence, S2, and if the converse also holds, S1 and S2 are equivalent. … If now the two equivalent sentences have the same syntactic structure and differ from one another only in that where one has lexical item x, the other has y, then x and y are synonymous.

Later on, Lyons (1981: 50-51) drew a distinction between three types of synonymy:

1. synonyms are fully synonymous if, and only if, all their meanings are identical.
2. synonyms are totally synonymous if, and only if, they are synonymous in all contexts.
3. synonyms are completely synonymous if, and only if, they are identical on all (relevant) dimensions of meaning.

Absolute synonymy combines all these three categories. Lyons (ibid: 51) states, 'absolute synonyms are expressions that are fully, totally and completely synonymous'. If one of the above criteria were missed, synonymy would be partial (ibid). Lyons made a further distinction between partial synonymy and near-synonymy. The latter is defined as 'expressions that are more or less similar, but not identical, in meaning' (ibid: 50).

Cruse (1986: 292) viewed Lyons' distinction of partial and near-synonymy as one: 'By his (Lyons') definition near-synonyms qualify as incomplete synonyms, and therefore as partial synonyms (though, of course, they represent only one variety)'. But later on he regarded them as different degrees of similarity, as shown below. Cruse (2000: 156) proposes a different classification of synonymy. He defines synonymy as 'words whose semantic similarities are more salient than their differences'. He distinguishes three types of synonyms according to the degree of the similarity that holds between items: absolute, propositional, and near-synonymy. These types can be located on a scale at the end of which falls absolute synonymy.
6.2.2 Degrees of Synonymy

6.2.2.1 Absolute synonymy

Cruse (1986: 268) states that 'two lexical units would be absolute synonyms … if and only if all their contextual relations … were identical'. This is a very strict definition, where any difference in meaning will be reflected in a difference in contextual relations. Accordingly, there are rare examples that satisfy this strict definition. The over-quoted example is caecitis and typhlitis (which mean inflammation of the blind gut) (Ullmann (1963) and Lyons (1981b)). To have this sort of complete similarity is not motivated in natural languages, since one of the items would be redundant and accordingly undergo a shift of meaning or expire. The only possible reason why we have absolute synonyms is for avoiding repetition of forms, i.e. for more profound and elegant discourse.

To show that two items are not absolutely synonymous, Cruse (1986 and 2000) used the Normality Test, introduced by Haas as mentioned in the previous section. It is enough to rule out the synonymy relation between two items if you find a single context in which they differ. Accordingly, the two words have to have exactly the same normality in all cases. Let us now have a look at the following examples (taken from Cruse 2000: 157):

2. a. Little Billy was so brave at the dentist's this morning.
   b. ? Little Billy was so courageous at the dentist's this morning.
3. a. He is a big baby, isn't he?
   b. ? He is a large baby, isn't he?
4. a. Apparently he died in considerable pain.
   b. ? Apparently he kicked the bucket in considerable pain.

In the above examples the (a) sentences are more normal than their (b) counterparts. It should be noted that when doing the test we have to stick to one meaning of the word under investigation, especially with words of subtle differences.

The Haasian test is semantically based and does not work otherwise, for instance where the difference is one of collocational restriction. For example, consider sad and unhappy in (5) and (6) below; the latter can modify either animate or inanimate objects.

5. a. It is a sad story.
   b. It is an unhappy story.
6. a. ? It is a sad baby.
   b. It is an unhappy baby.

These are called idiosyncratic collocational restrictions. Cruse (1986: 281) gave some examples with no semantic explanation. For instance, one's record can be spotless, unblemished or impeccable, but not flawless, whereas one's credentials cannot be but the last. This is an important point that gives our premise more credence when we talk about the treatment of synonymy through collocations.
in addition to keeping the truth-condition of the substituted words. ` 93 . e. as Lyons put it ‘substitutability salva veritate’.Dialectal variations. while preserving the truth-value. 9. 2000: 158). The old man kicked the bucket. for instance. b. die: pass away and brave: courageous. whereas (8) has an additional meaning of respect and (9) has a sense of disrespect. More precisely.g. so it is less strict than absolute synonymy which requires.g. Propositional synonymy allows some differences of non-propositional meaning to occur between synonymous pairs (Cruse. Palmer (1981: 89) mentioned four facets that render differences between synonymous pairs as shown below.The old man died. 8. begin and commence (the latter is more formal). The sentence (7) above is neutral. e. mode: re: concerning: about. or arbitrary as in (11). ii) x is human. In (10.a). temporal: wireless: radio. (i) is a logical presupposition of (6. we have three components of meaning: i)x is an organism. In (11..•Differences of evoked meaning Dialect: different lexical items that are used in different dialects in the same range of references. Geographical: autumn: fall.a) x died. corn: wheat: oats.b). lavatory: toilet. 10. style: money: bread: dough: dosh: filthy lucre. i)x is an organism. Field: marriage: matrimony.a) above. etc.a). •Differences of presupposed field of discourse Two propositional synonyms can differ in respect of presupposed meaning. dead: deceased. we have two components of meaning: i)x is an organism ii) x became not-alive If you negate the sentence in (10. ` 94 . then you leave one meaning intact. as in (10) below. the audience or the speaker’s intention may bring up different lexical items with same range of reference. Therefore.a) x passed away. Presuppositions can be either logical.b) x did not die. social: sofa: settee. 10. ii) not-(x became not-alive). Register: the change of situation. swimming baths: swimming pool. 11. iii) x became not-alive.a) as in (10. 
Near synonyms must share central aspects of meaning but are allowed to differ in peripheral aspects. The above variations give rise to the significance of using synonyms in sensitive areas. Therefore (i) and (ii) are presuppositions (the latter is arbitrary because it depends on usage and collocation. pretty. it just means ‘die’ when speaking respectably of humans. i. 2000: 158). i)x is an organism ii) x is human iii) not-(x became not-alive) In (11) above.2. we do not use pass away with animals). The difference between it and propositional synonymy is that near-synonymy is not propositionally equivalent. 6. 1986) is the type commonly adopted by dictionary-makers. In other emotionally sensitive areas like death and money one can also make use of these variations to choose what is regarded as euphemistic (Cruse. handsome and beautiful are considered near synonyms because they share the same capital component. by central aspects we mean the capital components. we can divide word meanings into components or atoms. whereas peripheral means subordinate ones or the modifiers. the head is the first sense that comes to one’s mind about a given word.e. Thus. one item can highlight one aspect and the other highlight another. 11. the former is considered the head component and the latter is the subordinate one. defecation. For example.2.e. i. like taboo areas. In other words. urination. ` 95 .a) are left intact. pretty can be analysed as [GOOD LOOKING] [FEMALE]. the heads. So two synonyms can differ in respect of what is highlighting.3 Near-synonymy Near-synonymy (called plesionyms in Cruse. So. such as when talking about sex. etc. so can be called dictionary synonymy. when analysing sentences componentially. one or two of the meaning components in (11.b) x did not pass away. pass away is not a special way of dying.In negation. In more detail. 
Near-synonymy can be easily tested by expressions like or rather, and more exactly (Cruse 1986: 287), with which we can signal the minor differences between near synonyms. Let us consider the following example.

12. a) This is a lake, or rather a pond.
    b) ?This is a lake, or rather a tree.

In (12.b) above lake and tree are not near-synonyms because of the great differences between them.

In conclusion, in this study I will take the view that the phenomenon of synonymy should be understood as a gradual cline along which we may locate different degrees of synonymy: absolute synonymy, propositional synonymy and near synonymy. This view is consistent with the widely held opinion among semanticists that strict or absolute synonymy is rare in human languages (see Cruse: 1986). A further step is taken here in this study to demonstrate that absolute synonymy does not exist in Arabic. The study will argue that Arabic never has two words that mean nearly the same thing and are used in the same range of grammatical and lexical patterns. To prove the credibility of such hypothesis, we will apply corpus-based analysis methodology to a list of selected Arabic word pairs which are presumed by some Arabic linguists to be absolute synonyms to see how credible their presumption is.

6.3 Synonymy in Arabic

Synonymy was recognised early by Arab linguists, for instance in connection with rhetoric (balaaghah), though it did not get extensive study. The main contribution of Arab linguists was the collection of what is called lexicons nowadays. Some works on a large scale were based on the collection of all names, or rather descriptions, that a given word has, such as Khalawayh's The Names of the Lion, The Names of the Snake and The Wine's Names. More interestingly, Al-Fayrouzabadi produced a dictionary-like book called al-rawd}u al-masluuf fi-maa lahu ismaan ila uluuf (The Best Garden of Words (or Expressions) That Have Two to a Thousand Names).

Al-Iskafi's mabaadi' al-lughah (Principles of Language) is considered a classical work on Arabic synonymy. This was an Arabic dictionary based on a concept classification. It was arranged according to topics like stars, constellations, time, etc. Similar works were made by later writers; an example is al-alfaaz{ al-kitaabiyyah (Idiomatic Expressions). Furthermore, the best known of these classical thesauri is that of Thacalibi. Haywood (1965: 113) described it as follows:

It is a vast storehouse of vocabulary which sometimes gives synonyms, and at other times distinguishes between the finer shades of meaning of words which are roughly synonymous.

Another is nujcat al-rraa'id wa shircat al-waarid fi-l-mutaraadif wa-lmutaawarid (The Spring of the Seeker in Synonyms and Associations), in which Arabic words, including synonyms, were arranged under such headings as physical descriptions, senses, good and bad manners, human behaviour, food, clothes, weapons, etc. Generally speaking, such attempts were unsystematic by modern standards and cannot be regarded equivalent to the modern thesauri since they were not arranged alphabetically and lack comprehensiveness.

Al-Askari's Al-furuuq (The Differences) is another work on synonymy, where the author tried to pursue the finer shades of differences that hold between the seemingly synonymous words. They are all primarily concerned with distinguishing apparent synonyms.

However, from a theoretical point of view, synonymy was frequently discussed by early Arab linguists. Some linguists like Sibawayhi, Al-Mubarrad and Al-Siyuti stressed that synonymy is widespread in Arabic. On the other hand, Ibn Faris denied the existence of synonyms because this would contradict the wisdom of Arabs, who always used words for a reason. He argued that every word should have a specific meaning. Perhaps the idea of denying the existence of synonymy was introduced by Ibn Al-Arabi (d. 802), whose apprentice Thaclab reported him saying, 'any two forms used synonymously by Arabs, everyone of them has a specific meaning which is missing in its counterpart' (Al-Anbari, 1983, Al-Addad: 7). Furthermore, Thaclab argued that there is a difference of meaning between any given pairs of synonyms. For example, investigating the contexts of qacada and jalasa 'sit', which are commonly taken as synonyms, will show that they have different meaning from each other (Versteegh et al., p.174).

Thaclab investigated the differences between lemmas like qacada and jalasa manually and we can surely offer a more accurate analysis if we investigate the phenomena computationally. However, we will not investigate this very pair, because it has already been discussed by Thaclab. So, we will pay more attention to pairs which are still considered as absolute synonyms.

6.4 The Repetition of Synonyms in Arabic

A general look at prose in Modern Standard Arabic shows that Arabs tend to mention two synonyms following each other in most cases to give more rhetorical force to their expressions. Ullmann (1963: 193) called this phenomenon quasi-synonymy. It is customarily used in situations where the speaker's fluency is needed for convincing the addressees, especially in religious and political contexts. The speaker, therefore, tends to use adjacent terms which share some of the semantic properties for stylistic reasons. The repetition of synonyms in this fashion is widely used in Modern Standard Arabic. Dickins, Hervey and Higgins (2002: 59) noted that all major parts of speech (N, V, Adj and Adv) can undergo such a phenomenon, for example, safety and security in for the safety and security of this state. They also stated that the repetition of synonyms can be 'syndetic', when a connective is used, or 'asyndetic', without using connectives, particularly with the use of adjectives. Let us consider the following examples.

13) sharah}tu al-darsa wa fas}s}altuh
    I explained and elaborated the lesson.
14) takallama wa qaala
    He spoke and said.
15) yujaahidu wa yuh}aaribu fi sabiili-llaah
    He fights and battles for Allah's cause.

In the above examples it is obvious for Arabic speakers that the two different verbs in every sentence can be substituted for only one verb in English. This conjunction between seemingly synonymous words is not only acceptable in Modern Standard Arabic but is used frequently in the everyday language as well.

We may also find this phenomenon often used in Late Classical Arabic. Let us consider the following examples from Al-Hamadhani's Maqamat quoted by Tamas Ivanyi (1993: 52-53):

16) taraktuhu wa ins}araft
    I left him and departed.
17) fit}na wa dhakaa'
    Intelligence and cleverness.
18) h}iss wa shucuur
    Perception and consciousness.
19) ghumuud{ wa ibhaam
    Obscureness and ambiguity.

Ivanyi offered an explanation for how such pairs of conjoining seemingly synonymous words exist in Arabic. He argued that the synonymity of such pairs could be discredited by the virtue of semantic attributes like static and dynamic, i.e. one term of the pair tends to have more action than the other. This process is used merely for subtle discourse, as Ivanyi (1993: 53) put it:

The extended use of these and similar pairs of expressions in the classical and Modern Literary Arabic (and not only in the literature, but in everyday usage, too) indicates that this device may be more than simply a rhetoric device and also points to the basically linguistic (and not stylistic) roots of the phenomenon we called here semantic conjunction.

However, this is not an inclusive condition since we may have pairs in which one item can be regarded as more general than the other. In addition, in some cases the two terms of the pair could be dynamic or static, so a refinement of Ivanyi's proposition is needed. Accordingly, each item of the pairs in examples (16-19) can be either static or dynamic. To me that proposition can be restated as: the meaning of one of the two items of the pair may be more general than the other. In this way, in (16-19) above, we can regard one of the two synonymous words as emphasis, in the sense that one expects the addressee to understand the repetitive synonymous term.

6.5 Conclusion

This chapter discusses the various approaches and types of synonymy; this is very important for our research orientation to instigate the analysis of our data in the following chapter based on a detailed theoretical stance. With respect to absolute synonymy, the notion of substitutability in all contexts can easily be grounded on corpus evidence, by comparing the concordances of claimed synonymous items in order to point out all possible contextual overlaps or disparities.

Chapter Seven: Collocational Treatment of Synonymy in Arabic

7.1 Introduction

This chapter will discuss the semantic relation of synonymy, how synonyms behave in all contexts, to extract semantic features which can make distinctions between them and to explore the possibility of distinguishing such differences using statistical analysis of corpora. I will argue that collocation is very useful to describe word meaning and is a mechanism by which we account for seemingly synonymous pairs. Collocation is, therefore, a device with which words of multiple senses can be accounted for precisely. Through collocation we can distinguish one sense of a word from another and know whether a seemingly synonymous pair are real synonyms or not.

According to Lyons (1995), the collocational range of an expression can reveal the differences between apparent synonyms. So collocation is one of the conditions he gives to consider a pair of words absolute synonyms. Following Lyons, I will argue that absolute synonyms do not exist in terms of their collocational patterns. As mentioned in the previous chapter, absolute synonyms can be ruled out if we come across one context in which one of the synonymous pair carries more meaning, has a different distribution or is used in a different register. I propose that employing collocation in the analysis of synonyms can help distinguish their meanings and reveal the similarity and/or dissimilarity that hold between them. In order to prove that these subtle differences can be brought out by collocation, I will analyse the collocates for a list of synonymous pairs, as shown in table (7.1) below, in order to highlight the subtle differences that might occur between them. By this technique, it is possible to compare seemingly synonymous words to find out whether they are real synonyms or not.

7.2 Data Choice

The present study is restricted to a list of some selected lexemes, frequently used by Arab linguists when discussing synonymy, the most recent of them is Ghali (1998). The items in this list are also used in Al-Askari (non-dated), Al-Hamadhani (1991), Al-Yaziji (1970), and Leceibi (1980). This list is selected to be general words rather than genre-specific words whose usage and meanings may differ from one domain to another. For example, the word 'elements' in physics could mean 'the four natural elements' and in literary texts 'factors' or 'principles'. The meanings of these items were examined first in four Arabic dictionaries in order to arrive at the most seemingly synonymous pairs, which are presented in Table (7.1) below. These dictionaries are: Al-Fayruzabadi's Qaamuus al-Muh}iit} 'Al-Muh}iit} Lexicon', Al-Bustani's Muh}iit} Al-Muh}iit}, Majmac allughah al-carabiyyah's Al-Wasiit}, and Ibn Manz}ur's Lisaan Al-cArab.

Set    POS    Synonyms
1      V      أتى ata / جاء jaa'a (come)
2      N      ذنب dhanb / إثم ithm (sin)
3      V      حسب h}asiba / ظن z}anna (think)
4      N      حب h}ubb / ود wudd (love, affection)

Table (7.1): The sets of randomly selected synonyms for our analysis.

7.3 Data Analysis

The data for the study are taken from the CAC: all forms of the words to be examined were extracted. All irrelevant hits are eliminated manually. Then the words under investigation are categorised syntactically and according to their frequency. This helps us decide from the very beginning what to look for in the concordances.

For practical reasons I would suggest to use the distribution of the word under investigation represented in its collocates rather than using the whole concordance line. This is a good step in recognising what we are going to analyse as it summarises all the concordance lines and enables us to make comparison and contrast to bring out the subtle differences between seemingly synonymous items by examining their collocation. Following Barnbrook (1996: 90), I will consider words that occur at least three times within the span to be relevant for collocational analysis. This is because words that occur just once or twice can give spuriously high significance scores. The Mutual Information statistic can help us observe what patterns are most distinct.

The semantic feature that would distinguish the meaning of a given synonym can be discovered by dividing the collocations of each item into a distinct list according to their frequency. Then the word senses of both items are probed through their collocation to find out the semantic attribute that makes one item different from the other. A semantic feature is then identified, based on their collocational distribution, to show the difference between both items. For more explanation, we will use the t-test statistic. An independent t-test compares the averages of two samples that are selected independently of each other (the words in the two groups are not the same).

If the difference between a given pair of words is not brought out by a simple scrutiny of the MI results, we will apply the substitution test of one word for the other (Ullmann, 1962: 143) to see if any change happens in the meaning of the sentence based on intuition. If we can exchange one word for the other in all contexts without changing the meaning of the sentence to any extent, these two words are definitely eligible to be called absolute synonyms, i.e. in terms of collocation.

Collocation as defined in Chapter Five does not necessarily work on adjacent words; we may have collocation between interrupted words. Collocation can include items that habitually collocate with other items from a definable semantic set, i.e. semantic prosody.

The remaining part of this chapter will examine the four case studies presented in table (7.1) above.
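The extraction step and the t-test described in this section can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually run on the CAC: the function names span_collocates and independent_t are hypothetical, the corpus is assumed to be a pre-tokenised list of word-forms, the span is taken to cover the word-forms that follow the node in reading order, and the t statistic uses the textbook pooled-variance formula, which is an assumption here.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def span_collocates(tokens, node_forms, span=4, min_freq=3):
    """Count word-forms in the `span` tokens after each node occurrence.

    Collocates seen fewer than `min_freq` times are dropped, since words
    occurring once or twice can give spuriously high significance scores.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms:
            counts.update(tokens[i + 1 : i + 1 + span])
    return {word: freq for word, freq in counts.items() if freq >= min_freq}


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic for two independent samples (assumed formula)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


# Toy data only: "ata" stands for every extracted form of the node word.
tokens = ["ata", "x", "p", "q", "r", "s",
          "ata", "x", "t", "u", "v", "w",
          "ata", "x", "y", "z"]
collocates = span_collocates(tokens, {"ata"})
t_value = independent_t([1, 2, 3], [2, 3, 4])
print(collocates, round(t_value, 3))
```

In the toy data only the word-form that recurs three times inside the span survives the threshold; every word-form seen once is discarded before any significance score is computed.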
7.4 A case study: The word pair jaa'a and ata 'come'

To prove the credibility of our methodology let us take the first synonymous pair: jaa'a and ata, which are widely regarded as absolute synonyms, and then have a look at their contextual distribution. But before that we give the definitions of jaa'a and ata as provided in the most authentic Arabic dictionaries.

Table (7.2) Definitions of jaa'a and ata in Arabic dictionaries

In table (7.2) the words are defined in terms of each other. The dictionaries above distinguish three main meanings for jaa'a: (1) 'come', (2) 'arrive', (3) 'do'. The other meaning (4) 'bring' comes up because of the preposition bi 'with'. As for ata, it has the meanings (1), (2), (3), (4) in addition to (5) 'have sex', which is euphemistically related to the meaning in (3). The remaining meanings are mentioned because of the following prepositions: bi 'with' and cala 'on'. Al-Wasiit} gives one more sense for ata: 'approach', which is also related to the previous meanings; however, it is more or less closely related to the meaning (1).

In order to analyse significant collocations, we took a number of preliminary decisions. First, we will see how frequent every item of the pair is in the corpus as a whole before doing further analysis. Then we can compare that to the frequency of the words used with them. Secondly, because of the inapplicability of wild-card search with Arabic texts, hits are calculated first to include all possible syntactic forms of the pair under investigation. Thirdly, as mentioned in Chapter Five, we discarded all combinations with a frequency lower than three, as indicated in section 7.3. n will stand for the total size of our corpus, x for our search term and y for the collocate. The significant collocations for ata are shown in table 7.3 below whereas the collocations of jaa'a are represented in table 7.4.

n = 5000000, f(x) = 2219

y (gloss)    f(x, y)    F(y)    MI    y (Arabic)
mischief    48    139    9.60    الفاحشة
with sin    4    17    9.05    بذنب
unbelief    4    34    8.05    الكفر
torment    29    590    6.79    عذاب
soothsayer    3    64    6.72    كاهن
falsehood    4    94    6.58    الباطل
Allah    7    168    6.55    الله
the prophet    169    6777    5.81    النبي
his wife    6    401    5.07    امرأته
his family    8    537    5.06    أهله
Jibreel    5    386    4.86    جبريل
Syria    4    520    4.11    الشام
calamity    3    404    4.06    بأس
the good    15    2269    3.89    الخير
the mosque    5    924    3.60    المسجد
heaven    4    814    3.46    السماء
to    51    11214    3.35    إلى
messenger    46    11805    3.13    رسول
command    10    3002    2.90    أمر
Makkah    3    914    2.88    مكة
the night    4    1352    2.73    الليل
the truth    3    1171    2.52    الحق
Moses    5    1985    2.50    موسى
owner    5    2023    2.47    ذا
on    74    36416    2.19    على
women    4    2035    2.14    النساء
king    8    5046    1.83    ملك
people    7    4564    1.78    بني
Umar    4    3180    1.50    عمر
with it    35    28912    1.44    به
man    9    8646    1.22    رجل
no    6    6939    0.96    ما
from    24    34830    0.63    من
with him    5    8738    0.36    ومعه
day    3    5243    0.36    يوم
his tribe-men    4    7901    0.18    قومه
sin    7    19246    -0.28    معصية
son    5    16734    -0.57    ابن
in    6    24009    -0.82    في
father    4    17618    -0.96    أبا
he    3    15055    -1.15    هو
that    3    17455    -1.36    ذلك

Table (7.3) The immediate left collocates of ata in a span of four word-forms with minimum frequency of 3.
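The MI scores in Table (7.3) can be checked from the frequencies given with it. The sketch below assumes the usual formulation MI = log2(f(x, y) · n / (f(x) · F(y))), which is consistent with the tabulated figures; the function names mutual_information and cut2 are illustrative, and scores are cut to two decimal places rather than rounded because that is what appears to reproduce the tabulated values.

```python
from math import log2, trunc


def mutual_information(f_xy, f_x, f_y, n):
    """Mutual information of node x and collocate y from raw corpus frequencies."""
    return log2((f_xy * n) / (f_x * f_y))


def cut2(value):
    """Cut a score to two decimal places without rounding up."""
    return trunc(value * 100) / 100


N = 5_000_000   # n, total size of the corpus
F_ATA = 2219    # f(x), frequency of the node ata

# (gloss, f(x, y), F(y)) taken from three rows of Table (7.3)
rows = [("mischief", 48, 139), ("torment", 29, 590), ("the prophet", 169, 6777)]
for gloss, f_xy, f_y in rows:
    print(gloss, cut2(mutual_information(f_xy, F_ATA, f_y, N)))
```

A high score arises when a collocate is rare in the corpus as a whole but frequent next to the node, which is why a word with only 139 occurrences can outrank one with several thousand.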
94 386 3.51 1479 2.07 422 5.37 5243 1.45 385 4.75 1005 3.54 1334 4.54 1117 3.44 36416 1.33 1352 4.31 111 6.50 11805 1.39 3002 1.38 24009 3.81 6130 2.92 1011 3.41 29 7.97 11 9.24 706 3.19 11214 4.79 7 9.92 1171 4.79 11 9.96 2775 3.92 11 9.87 3347 1.65 244 6.52 4132 1.39 34830 2.37 .70 137 8.80 2537 1.06 1493 3.48 2154 1.08 1985 1.13 87 6.with fertility with good deed empty empty visiting clear proofs with lies nomad second dragging victory time the truth knowledge Ramdan the night man to Islam one Jibreel owner somebody The Qur’an in the day-time wants information the boy the Prophet wealth from Moses Umar explanation the country-men women other Messenger Abraham on command day ` ‫بالصب‬ ‫بالسنة‬ ‫فارغا‬ ‫فارغا‬ ‫زائرا‬ ‫البينات‬ ‫بالكذب‬ ‫أعراب‬ ‫ثانيا‬ ‫ير‬ ‫نصر‬ ‫وقت‬ ‫الق‬ ‫العلم‬ ‫رمضان‬ ‫الليل‬ ‫رجل‬ ‫إل‬ ‫السلم‬ ‫أحد‬ ‫جبيل‬ ‫صاحب‬ ‫ن‬ ٌ ‫فل‬ ‫القرآن‬ ‫ف‬ ‫النهار‬ ‫يريد‬ ‫الب‬ ‫الولد‬ ‫النب‬ ‫مال‬ ‫من‬ ‫موسى‬ ‫عمر‬ ‫تأويل‬ ‫القوم‬ ‫النساء‬ ‫آخر‬ ‫رسول‬ ‫ابراهيم‬ ‫على‬ ‫أمر‬ ‫يوم‬ 3 8 5 5 3 24 3 10 4 3 10 27 14 15 4 13 81 96 12 22 3 7 6 6 117 3 4 6 6 18 4 67 4 6 6 4 3 6 17 3 49 4 7 106 3 10.52 1726 4.97 3181 1.04 961 3.01 1486 2.97 828 2.22 8646 4.61 2035 1. 161.61 0.04 0. we have only given translation and transliteration for the information which are relevant to our discussion. as we can see in the examples below.40 0.330.481. person. But jaa’a tends to be more frequently used with time and. Analysing the concordances of ‘ata and jaa’a shows that there is a wide range of overlap between them.270.4) The immediate left collocates of jaa’a in a span of four with minimum frequency of 3. is always followed by the preposition ila ‘to’ before places as can be seen in the tables.88 0.b.171.010.60 0.) 
‫فسار حت أتى الشام فقتل أهلها‬ fasaara h}atta ata al-Shaam faqaatala ahlahaa.hitting after before Allah people already with father about except said that until this ‫يضرب‬ ‫بعد‬ ‫قبل‬ ‫ال‬ ‫ناس‬ ‫وقد‬ ‫مع‬ ‫أبو‬ ‫عن‬ ‫إل‬ ‫فقال‬ ‫ذلك‬ ‫حت‬ ‫هذا‬ 3 7 3 15 4 13 4 8 9 5 12 4 3 3 3007 7383 3825 19246 5882 24534 7901 17618 21153 12261 32835 17455 13165 15501 0.) ‫فسار يشي ويتتبع آثار الطريق حت جاء إل باب الدينة‬ fasaara yamshii wa yatatabbac aathaara al-t}ariiq h}atta jaaca ila baab almadiinah.170. there are several instances where both appear with words denoting place. then he killed its people. He went until he arrived at Syria. following the road signs until he arrived at the entrance of the city. (1. time or abstract object. (1.95 0. ` 107 . For the sake of brevity.a. unlike ata.40- Table (7. He kept going. And say: ‘Truth has come and falsehood has vanished’. (3. (2.‫ث جاء النب صلى ال عليه وسلم يشي ف الصفوف‬ thumma jaa’a al-nabiyyu …yamshii fii al-s}ufuuf.b.b) .‫أتى النب صلى ال عليه وسلم بيت فاطمة فلم يدخل‬ ata alnabiyyu … bayta faat}imah falam yadkhul.b.) .a.a) ‫فلما جاء الليل نام‬ falammaa jaa’a allaylu naama.a) ُ‫وقلْ َاء اْلح ّ وزَهقَ الَْاطل‬ ِ ‫َُ ج َق َ َ ب‬ wa qul jaa’a al-h}aqqu wa zahaqa al-baat}ilu. (3.) ‫وهل يأت الي بالشر ؟‬ wa hal ya’ti al-khayru bi-l-sharri? Does the good bring evil? ` 108 .) ‫ولا أتى الليل طلبته أمه فلم تده‬ wa lamma ata allaylu t}alabatuh ummuhu falam tajidhu. When the night came he slept. (4. (4. When the night came. Then came the Prophet …walking between rows. his mother looked for him but she could not find him.(2. The prophet came to Fatimah’s house but he did not enter. For example. we manually eliminated instances where ata means ‘commit’. namely ‘come’ to set it off against jaa’a which is mainly used in this sense. This gives another dimension for the use of both verbs. So we need to make a more precise analysis before coming to a conclusion. 
simply because we should be aware of the fact that words could have multiple senses and different syntactic forms could entail different senses.e. It can also be used metaphorically to refer to having sex. ‘come’. The first obvious point we can get is that in table (7. To look at this sense. collocates of highest MI scores. This particular use of ata and jaa’a is interesting to analyse because their meanings are so similar that native speakers of Arabic tend to use them interchangeably. Can we say then that the semantic feature which distinguishes between jaa’a and ata is positivity vs.60.e. i.3) shows that al-faah}ishah ‘mischievous deed’ is the strongest collocate with MI score at 9.3 & 7. It is important to mention that jaa’a and ‘ata followed by the proposition bi (with) are frequently used in CAC with the meaning ‘to bring’ but our pre-theoretical approach of what a word is does not count propositions or conjunctions that are attached to the root word. However we will be restricted to analysing only one sense of ata.Let us now study the statistics given in table (7. we cannot come up with an exclusive distinction between jaa’a and ata by making such a simple analysis. negativity? Actually.92. A closer look at the words reveals that the two words are not synonymous all the time. For example. We ` 109 . ata. is bi-l—khis}b ‘with fertility’ with an MI score at 10. table (7. in particular. ata followed by the proposition cala means ‘to finish off or destroy something’ ata cal al-t}acaam (he has finished all food).4) the most statistically significant collocation of jaa’a .4) above to see how similar or dissimilar the collocations of the word pair under examination are. (‘ata cala al-‘akhd}ar wa-l-yaabis he destroyed everything (literally: he destroyed the cultivated and non-cultivated land). which constitute about 3% of the whole occurrences of ata. ata imra’tahu/ahlahu (to have sex with his wife). has several meanings in different contexts. 
Using ata in this sense is called euphemism which is widely used in Qur’an. only we have to manually proofread our counts and exclude all the instances which have other meanings. For example. As for ata. in addition to the previous differences brought out between them. i. get into your habitations’ (Qur’an. when they came to the people of a town. We only focused on the phrases which contain the words under investigation.5) below. 35 The translation of Qur’anic verses are taken from Al-Hilali and Khan’s The Noble Qur’an. Now let us have a look at the following uses of both of them: i. but-refused-(they) to entertain-them Then they [Moses and Al-Khidr] both proceeded. (5.) ُ‫حَّى إِذَا أََتوْا علَى وَادِي الّنمْلِ قَاَلتْ َنملَ ٌ َا أَّيهَا الّنمْ ُ ادْخلُوا مسَاكَِنكمْ لَا َيحْطمَّنكمْ سلَْيمَان‬ ُ ُ ِ ُ َ ُ ‫ل‬ ‫ْةي‬ َ ‫ت‬ . (5.cannot always use the two words interchangeably. Al-Naml: 18). I made a further analysis of the concordances of jaa’a and ata with a minimum frequency of three. asked-foodthey-(dual) people-it.) ‫فَانطلقَا حّى إِ َا أَتََا َأهْلَ قَرْيَةٍ اسْتَطعَمَا َأهلَهَا فَأبَو َن ُيضَّ ُوهمَا‬ ُ ‫ْ ْ َ أ يف‬ ‫ََ َت ذ ي‬ fa-nt}alaqaa h}atta idha atayaa ahla qaryatin… Then-proceeded-they-(dual) till when came-they-(dual) people town. they asked them for food. ` 110 . so we deleted the transliterated glosses and all extra explanatory comments rendered by the translator for elucidation.b. I examined all the concordances of ata and jaa’a throughout CAC which enabled me to come up with the following three major distinctions between them. one of the ants said: ‘O you ants. The result of this further analysis will be tested later on by t-test statistic as shown in table (7.َ ‫َ ُُو ُ ُ َهمْ َا يش ُ ُو‬ ‫وجن ده و ُ ل َ ْعر ن‬ h}atta idhaa ataw cala waadi al-namli qaalat namlatun ya ayyuha al-namlu udkhulu masaakinakum… When they came to a valley of ants. AlKahf: 77) A full translation35 of the example in (5.a) can make the meaning clearer. (Qur’an. 
When ata is followed by a place it means that place is not a destination point.a. but they refused to entertain them. but it is slightly amended. to omit information which is irrelevant to the main discussion and the exegetical glosses included in the translation and marked by inverted commas or brackets. till. The Prophet did the evening prayer.” (An-Naml: 18) The ants’ colony was not meant to be the destination point for Sulayman and his army. so it gives a sense of stability.c) the prophet returned to his house after giving his prayers to sleep. nor did they stay there for a long time. The whole army was only passing by the colony when Sulayman heard the ant warning the rest of the colony of an imminent destruction by Sulayman and his army. where they got hungry. Then they passed by a town.c.) . lest Solomon and his troops crush you without knowing it. get into your habitations. His house is ` 111 . In (5. (5. thumma jaa’a ila manzilihi. the place that follows jaa’a is meant to be a destination point where one can stay for longer time or for ever. then he came to his house where he prayed four prostrations and slept.) ‫وسِيقَ اّلذِينَ اّتقوْا رَّبهُمْ ِإلَى اْلجَّةِ زُمَرًا حَّى إِذَا َا ُوهَا وُِتحتْ أَْبوَاُبهَا‬ َ ‫ج ؤ َف‬ ‫ت‬ ‫ن‬ َ َ …h}atta idha jaa’uha wa futih}at abwaabuha ‘And those who were pious to their Lord will be led to Paradise in groups.b) it is a part of Moses’ story with Al-Khidr when he set out on a journey searching for that knowledgeable person. so they asked them for food but the people of that town refused to host them.… thumma naama. After Moses had found him. Conversely.‫صلى النب صلى ال عليه وسلم العشاء، ث جاء إل منله، فصلى أربع ركعات، ث نام‬ s}alla al-nabiyyu… al-cishaa’. Al-Khidr started teaching him a series of lessons practically. one of the ants said: “O you ants. which was not their terminal point. till. In (5.d. when they reach it and its gates will be opened’ (Qur’an.When they [Solomon’s army] came to a valley of ants. (5. 
al-Zumar:73). Makkah. They were expelled from their own hometown. Likewise. freeing Jerusalem and the Al-Aqsa mosque is the dream of all Muslims. (6. In (5.) ‫ح‬ ُ ْ‫إِذَا َاء َنصْ ُ الّهِ وَاْلفَت‬ ‫ج ر ل‬ idhaa jaa’a nas}ru Allahi wa al-fath}u.therefore an end point as he did not mean to carry on going to any other place. they were prevented from performing their pilgrimage to the Holy House to fulfil the duty which Allah had imposed upon them. they are all waiting for Allah’s promise to come.a) the conquest of Makkah and the victory over the disbelievers of Makkah was something which the Prophet and all Muslims were longing for.b) is mentioned in the context of the conflict between Muslims and the Jews where Allah promises the Muslims to return to their mosque and defeat the Jews in the end. ii. al-Nas}r: 1). the example in (6. al-Isra: 7).) َ‫فإِ َا َاء وَع ُ الخِرَةِ ليسوءوا ُ ُوهكمْ ولِيَدْخلُواْ اْلمسجِد‬ ْ َ ُ َ ُ َ ‫وج‬ ‫ذ ج ْد‬ fa-idhaa jaa’a wacdu al-‘aakhirati… ‘Then. In addition.b. Actually. In (6.d) the paradise is the final abode of the pious people so when they come to it they will live therein forever. (they will make your faces sorrowful and enter the mosque (of Jerusalem))’ (Qur’an. ‘When come the victory of Allah and the conquest (of Makkah)’ (Qur’an. since the advent of Islam. jaa’a when followed by an event means that event has been waited for or expected. without a just cause and left behind everything. ` 112 .a. (6. when the second promise comes. would you then call upon any one other than Allah?’ (Qur’an. For example.) ْ‫حَّتىَ إِذَا أَخذَتِ الرْضُ زُخْرفهَا وَا َّّنتْ وظَ ّ أَهلهَا أَّنهمْ قَاد ُونَ علَْيهَا علَْيهَا أَتَا َا َأمْ ُنَا لَيْلً َأو‬ ‫ه ر‬ َ َ ‫َُ زي َ ن ُْ ُ ِر‬ َ َ . Al-Ancam: 40). or the Hour comes upon you. (7. jaa’a means ‘arrive’ as shown in (5. In (6.c & d) the events are not expected because Allah keeps such things hidden so that every person is rewarded for what he does.a. 
On the other hand, ata associates with things that happen unexpectedly.

(6.c.) حتى إذا أخذت الأرض زخرفها وازينت وظن أهلها أنهم قادرون عليها أتاها أمرنا ليلا أو نهارا فجعلناها حصيدا كأن لم تغن بالأمس
h}atta idhaa akhadhati al-ard}u zukhrufahaa… atahaa amrunaa…
‘When the earth is clad with its adornments and is beautified, and its people think that they have all the powers of disposal over it, Our Command reaches it by night or by day and We make it like a clean-mown harvest, as if it had not flourished yesterday’ (Qur’an, Yunus: 24).

(6.d.) قل أرأيتكم إن أتاكم عذاب الله أو أتتكم الساعة أغير الله تدعون
qul ara’ytakum in ataakum cadhaabu Allah…
‘Say: “Tell me if Allah’s Torment comes upon you, or the Hour comes upon you, would you then call upon any one other than Allah?”’ (Qur’an, Al-Ancam: 40).

In (6.c & d) the events are not expected, because Allah keeps such things hidden so that every person is rewarded for what he does, and the people are not aware of what is hidden for them.

iii. jaa’a means ‘arrive’ as shown in (5.c & d) above, whereas ata has a sense of approaching a place or a time.

(7.a.) أتى أمر الله فلا تستعجلوه
ata amru Allahi fala tastacjiluuh
‘(Inevitable) cometh (to pass) the Command of Allah: seek ye not then to hasten it’ (Qur’an, Al-Nah}l: 1).

Allah’s command here is the Last Day (the Day of Judgement). Reading ata as ‘arrived’ would apparently contradict the situation; it rather means ‘approached’. The evidence mentioned in (5.a) above can also be used here.

حتى إذا أتوا على وادي النمل قالت نملة يا أيها النمل ادخلوا مساكنكم لا يحطمنكم سليمان وجنوده وهم لا يشعرون
h}atta idha ataw cala waadi al-namli qaalat namlatun ya ayyuha al-namlu udkhulu masaakinakum…
‘When they came to a valley of ants, one of the ants said: “O you ants, get into your habitations”’ (Qur’an, An-Naml: 18).

Indeed, Sulayman and his army have not reached the ants’ colony yet; they were still at its outskirts, because one of the ants asked the rest of the ants to go inside their colony. The English translation by Dr. Muhsin Khan and Dr. Muhammad Al-Hilali given below translated ata to ‘at length … came’, which is a close interpretation of the meaning of ‘come’ in this context. Khan and Al-Hilali’s translation of the above verse:

‘At length, when they came to a valley of ants, one of the ants said: “O ye ants, get into your habitations, lest Solomon and his hosts crush you (under foot) without knowing it”’ (Qur’an, Al-Naml: 18).

More interestingly, the slight change in the contextual use between ata and jaa’a in the following three verses can bring out the subtle difference between them. Transliteration is provided for the underlined Arabic words. We also marked the similar parts throughout the following three examples with square brackets.

(8.a.) قال لأهله امكثوا إني آنست نارا لعلي آتيكم منها بخبر أو جذوة من النار لعلكم تصطلون. فلما أتاها نودي من شاطئ الوادي الأيمن في البقعة المباركة من الشجرة أن يا موسى إني أنا الله
qaala li-ahlihi imkuthuu innii aanastu naaran lacallii aatiikum minhaa bikhabarin… falammaa atahaa nuudiya…
‘[(i) He said to his family: “Tarry you;] [(ii) I perceive a fire;] [(iii) perhaps I can bring you from there some information, or a burning firebrand, that you may warm yourselves.”] [(iv) But when he came to the (fire)], [(v) he was called] from the right bank of the valley, from a tree in hallowed ground: “O Moses! Verily I am Allah”’ (Qur’an, Al-Qasas: 29-30).

(8.b.) فقال لأهله امكثوا إني آنست نارا لعلي آتيكم منها بقبس أو أجد على النار هدى. فلما أتاها نودي يا موسى إني أنا ربك فاخلع نعليك إنك بالواد المقدس طوى
faqaala liahlihi imkuthu innii aanastu naaran lacallii aatiikum minha bi-qabasin… falamma ‘aataaha nuudiya…
‘So [(i) he said to his family, “Tarry you;] [(ii) I perceive a fire;] [(iii) perhaps I can bring you some burning brand therefrom, or find some guidance at the fire.”] [(iv) But when he came to the fire], [(v) he was called:] “O Moses! Verily I am thy Lord! therefore put off thy shoes: thou art in the sacred valley Tuwaa”’ (Qur’an, Taha: 10-11).

(8.c.) قال موسى لأهله إني آنست نارا سآتيكم منها بخبر أو آتيكم بشهاب قبس لعلكم تصطلون. فلما جاءها نودي أن بورك من في النار ومن حولها وسبحان الله رب العالمين
qaala muusa liahlihi innii aanastu naaran sa’aatiikum minhaa bi-khabarin… falamma jaa’aha nuudiya…
‘[(i) Moses said to his family:] [(ii) “I perceive a fire;] [(iii) soon will I bring you from there some information, or I will bring you a burning brand, that you may warm yourselves.”] [(iv) But when he came to it], [(v) a voice was heard:] “Blessed are those in the fire and those around: and Glory to Allah, the Lord of the Worlds”’ (Qur’an, Al-Naml: 7-8).
As shown above, we have three verses from different surahs (chapters) relating the story of Moses when he saw the fire where Allah talked to him. The story is put in different wordings in these three surahs, because every verse tells one aspect of the story. We can notice that there are similar parts in each verse (as marked in i-v). The remaining parts make the meanings of the three verses different from one another.

It is interesting to note that the two verses (8.a) and (8.b) employ the verb ata ‘come’, marked [iv] in both, to refer to a degree of nearness to the fireplace, whereas their equivalent in (8.c) uses jaa’a to describe a state of absolute closeness. In (8.a & 8.b) Moses asked his family to wait until he goes and sees the fire. He has the intention to do his best to get some information from the people around the fire or to get a burning brand from it to warm themselves. The verb ata ‘bring’, marked [iii] in both verses, is used in the subjunctive form to express a wish, but it is uncertain. So, in (8.a) and (8.b) ata is used to indicate that Moses is still far from the actual fireplace, because the call following ata in (8.a) comes from the bank of the valley, and (8.b) mentions that he is in the sacred valley and has not arrived at the fireplace.

But in (8.c) Moses does not ask his family to wait, and the verb ata ‘bring’ is used in the near future, which expresses certainty. This is to reassure his family even if the fire is far or he takes a long time. The fireplace was so remote that he had to promise his family not to give up. Therefore he used the verb without a modal of probability. Most importantly, jaa’a ‘come’, marked [iv] in (8.c), is used to relate the final part of the story after Moses’ arrival at the fireplace, where he talked to Allah. For example, in (8.c) the call implies that Moses arrived at the fireplace, because Allah says ‘blessed are those in the fire and those around’; ‘those in the fire’ refers to Moses and ‘those around’ are angels, as al-Razi said. In Arabic one can use the preposition fi ‘in’ to mean absolute closeness.

One more piece of evidence that supports the above argument is that the word Allah occurred in object position with ata 7 times and did not occur at all with jaa’a. Let us consider the following example.
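يوم لا ينفع مال ولا بنون إلا من أتى الله بقلب سليم
yawma la yanfacu maalun wa la banuun illa man ata allaaha biqalbin saliim.
‘The Day whereon neither wealth nor sons will avail, except him who came to Allah with clean heart.’

ata is used in the above example because Allah is not limited to a place, nor can vision grasp Him, so no one can come to a point of closeness to Allah’s entity as to a physical object.

Let us now use the t-test to show what sort of differences hold between jaa’a and ata, as shown in table (7.5) below. To do the test it would be better to stick to one sense of the words under investigation, namely ‘come’ [36]. As mentioned earlier, we will be restricted to analysing only one sense of the pair. I will search the items whose MI scores are significant; that is, we will analyse the most significant left collocates, i.e. items with the highest MI scores, which co-occurred with jaa’a and ata in that particular sense.

[36] Customarily, the test can be done generally without restricting it to one sense of the words under investigation. To me, it would be easier if we chose to do the calculation inside a closed set for short cut and quick results.

W | Arabic | f(w) | f(jaa’a/w) | f(ata/w) | t | significance
time | وقت | 1726 | 27 | 1 | 4.91 | P < 0.0001
clear proofs | البينات | 137 | 24 | 0 | 4.89 | P < 0.0001
knowledge | العلم | 1334 | 15 | 0 | 3.87 | P < 0.0001
nomad | أعراب | 244 | 13 | 1 | 3.20 | P < 0.01
victory | نصر | 422 | 10 | 1 | 2.71 | P < 0.01
the truth | الحق | 1171 | 14 | 3 | 2.66 | P < 0.01
empty | فارغاً | 11 | 5 | 0 | 2.23 | P < 0.05
visiting | زائراً | 7 | 3 | 0 | 1.73 | P < 0.10
dragging | يجر | 87 | 3 | 0 | 1.73 | P < 0.10
second | ثانياً | 111 | 4 | 1 | 1.34 | P < 0.20
the Prophet | النبي | 6777 | 18 | 169 | 11.04 | P < 0.0001
torment | عذاب | 401 | 1 | 29 | 5.11 | P < 0.0001
Allah | الله | 168 | 0 | 7 | 2.64 | P < 0.01
disbelief | الكفر | 406 | 0 | 4 | 2.00 | P < 0.05
Syria | الشام | 520 | 0 | 4 | 2.00 | P < 0.05
calamity | بأس | 404 | 0 | 3 | 1.73 | P < 0.10
soothsayer | كاهن | 64 | 0 | 3 | 1.73 | P < 0.10
command | أمر | 537 | 4 | 10 | 1.60 | P < 0.20
falsehood | باطل | 94 | 1 | 4 | 1.34 | P < 0.20
Gabriel | جبريل | 386 | 5 | 5 | 0 | Not sig.

Table (7.5) the most significant ten left collocates [37] with jaa’a (the top ten words) and ata (the last ten words).

[37] These collocates are identified by MI statistic.

The t-scores in table (7.5) show the differences between jaa’a and ata. The bigger the t-score, the more different the pair under examination. The highest scores of jaa’a and ata in the table show that the items having these scores are the most likely to be different from each other.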
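The t-scores in table (7.5) can be checked with a short calculation. The following is a minimal sketch, not the program actually used for the thesis: it assumes the usual contrastive form t = |f1 - f2| / sqrt(f1 + f2), where f1 and f2 are the frequencies of one collocate with jaa’a and with ata. This assumed formula reproduces the scores quoted in the discussion for waqt ‘time’ (4.91) and al-nabiyy ‘the Prophet’ (11.04). The function name contrastive_t is our own.

```python
import math


def contrastive_t(freq_a: int, freq_b: int) -> float:
    """Assumed contrastive t-score for one collocate seen freq_a times
    with word A and freq_b times with word B."""
    total = freq_a + freq_b
    if total == 0:
        # The collocate occurs with neither word: nothing to contrast.
        return 0.0
    return abs(freq_a - freq_b) / math.sqrt(total)


# (f(jaa'a/w), f(ata/w)) for three collocates, as read from table (7.5).
counts = {
    "waqt 'time'": (27, 1),
    "al-nabiyy 'the Prophet'": (18, 169),
    "jibriil 'Gabriel'": (5, 5),
}

for collocate, (with_jaaa, with_ata) in counts.items():
    print(f"{collocate}: t = {contrastive_t(with_jaaa, with_ata):.2f}")
```

A collocate that is shared equally, such as ‘Gabriel’ (5 vs. 5), scores zero, which is why it falls in the area of overlap between the two verbs.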
jaa’a and ata, as shown in table (7.5) above, are not synonymous because they are used in a different range of contexts: jaa’a has a strong tendency to occur in positive contexts, whereas ata has a negative sense. For example, jaa’a gets the highest scores with the following positive items: alcilm ‘knowledge’, al-h}aq ‘the truth’, and albayyinat ‘clear proofs’. On the other hand, ata frequently co-occurs in negative contexts: cadhab ‘torment’, al-kufr ‘disbelief’, ba’s ‘calamity’, ‘amr ‘command’ (meaning difficulty or torment), kaahin ‘soothsayer’ [38], and baat}il ‘falsehood’ [39].

[38] Soothsaying is forbidden in Islamic religion and is classified as a major sin.
[39] We ignore all hits which have neutral senses.

Two points might seem to contradict the above conclusion. In the first place, the positive use of ata in table (7.3), as in ya’ti ‘comes’ followed by khayr ‘good’ or h}aqq ‘truth’, is not considered strong evidence, because these items are only used with ata in its present tense form. We think there might be a morphological reason why ata in its present simple form is used for both negative and positive senses: jaa’a in the present tense form, i.e. yajii’, is not as easy to pronounce as ata. For example, jaa’a in its present form occurs 181 times, whereas its corresponding ata occurs 980 times, so jaa’a in the present form is about five times less common in CAC than ata in the present form. Secondly, the high t-scores in table (7.5) with al-nabiyy ‘the prophet’ (11.04) and waqt ‘time’ (4.91) are not significant, because both items are neutral, so they fall in the area of overlap between ata and jaa’a as indicated in figure (7.6) below.

Further examples from CAC show that ata is overwhelmingly used in unpleasant contexts. The main collocates concern committing sins, trouble, and falsehood. Analysing the concordances of jaa’a and ata with a minimum frequency of 1 can show their tendency to occur in negative or positive contexts. Figure (7.6) below shows the contextual preference of both of them.

Figure (7.6) the collocational differences of jaa’a and ata with minimum frequency 1.

The native speakers of Arabic are themselves unaware of these collocational differences between jaa’a and ata. The only difference brought out by Al-Askary, who belongs to the Classical period, in Al-Furuuq is that ata requires a complement, which can differentiate between them. Otherwise they can replace each other without any loss of meaning. This is not consistent with Al-Askary’s own proposition that difference in form must produce difference in meaning, but that this difference was abandoned as time passed (Al-Askary, Al-Furuq: p. 9). For example,

9.a jaa’a alrajulu nafsuhu.
came the-man self-him
The man came himself.

9.b *ata alrajulu nafsahu.
came the-man self-him
The man arrived himself.
To me, the use of jaa’a in (9.a) involves some sort of directional motion which implies an action not toward a place but rather toward the speaker. Therefore, the missing preposition in (9.a) eliminates the possibility of a following category that refers to a place. So, the use of jaa’a in (9.a) above is consistent with our approach that jaa’a is always followed by the preposition ila ‘to’ before places. On the other hand, the multiplicity of the senses [40] with ata makes leaving the complement position empty, as in (9.b) above, ambiguous.

[40] The senses with jaa’a are all related to a directional motion, whereas the senses with ata are diverse and some of them are metaphorical or euphemistic.

7.4.1 Summary

To sum up, the analysis of the seemingly synonymous pair jaa’a and ata was carried out in three stages in order to highlight the subtle differences that occur between them. The first stage consisted of a lexical search for all occurrences in CAC of the tokens jaa’a and ata; this included manual elimination of all irrelevant hits. The second stage involved the categorisation of the tokens syntactically and according to their frequency. In the third stage, we used MI to highlight the collocations of both. Then we managed to highlight some distinctions between the two items by analysing their contexts. We finally used the T-Test to capture the subtle differences between the pair by extracting a semantic feature, i.e. negativity vs. positivity.

7.5 A case study: the word pair ithm and dhanb ‘sin’

ithm and dhanb ‘sin’ are commonly treated as synonymous, as shown in the dictionary definitions in table (7.7) below. A casual account of the two words reveals that the two Arabic nouns ithm and dhanb, which have a similar semantic and syntactic form and also a broadly similar frequency (645 vs. 917 word forms), have been used in CAC to mean ‘committing a bad deed’ in general.

Al-Muhiit

Table (7.7) Definitions of ithm & dhanb

In the last section we used a range of two words on either side of the node to get an understanding of the contextual distribution of a given pair. Indeed, some differences might be overlooked within that short range due to the syntactic structure of Arabic. In Arabic, as mentioned in chapter five, a small window seems not effective, as it is not for languages with many non-adjacent complements that result in non-adjacent collocations, as shown in table (7.8) below. We searched both ithm and dhanb in a span of 3:3 and the result was inconclusive with both. The span of 3:3 missed non-adjacent complements, which will be analysed in table (7.9) below.
So, let us consider the top ten collocates of the pair under investigation to see how insufficient a span of 3:3 is, as indicated in the following table.

Rank | 3L | Frq | 2L | Frq | 1L | Frq | Search term | 1R | Frq | 2R | Frq | 3R | Frq
1 | تأخر ‘delay’ | 41 | قال ‘say’ | 67 | عليه ‘on him’ | 169 | ithm | فلا ‘no’ | 181 | تأخر ‘stay’ | 60 | في ‘in’ | 62
2 | نفعهما ‘their usefulness’ | 20 | من ‘from’ | 46 | وإثمك ‘and your sin’ | 31 | ithm | أو ‘or’ | 54 | يومين ‘two days’ | 47 | ومن ‘and from’ | 60
3 | من ‘from’ | 18 | ومن ‘and from’ | 43 | كبير ‘much’ | 30 | ithm | تبوء ‘carry’ | 44 | أن ‘that’ | 46 | موص ‘the testator’ | 30
4 | للناس ‘for people’ | 14 | في ‘in’ | 26 | من ‘from’ | 26 | ithm | من ‘from’ | 44 | جنفا ‘unjust’ | 34 | قال ‘said’ | 27
5 | فلا ‘so not’ | 13 | يقول ‘says’ | 21 | على ‘on’ | 25 | ithm | بإثمي ‘with my sin’ | 31 | تبوء ‘carry’ | 28 | والميسر ‘and gambling’ | 14
6 | غفر ‘was forgiven’ | 12 | بينهم ‘between them’ | 16 | قال ‘said’ | 24 | ithm | فيهما ‘in them’ | 24 | غير ‘without’ | 28 | فأصلح ‘so he reconciled’ | 13
7 | في ‘in’ | 10 | يعني ‘means’ | 15 | يقول ‘says’ | 23 | ithm | باب ‘chapter’ | 19 | قل ‘say’ | 23 | ما ‘no’ | 13
8 | برئ ‘innocent’ | 9 | ومنافع ‘and benefits’ | 15 | أكبر ‘bigger’ | 23 | ithm | لا ‘no’ | 19 | قال ‘said’ | 19 | من ‘from’ | 11
9 | إثم ‘sin’ | 7 | أي ‘namely’ | 12 | في ‘in’ | 21 | ithm | متجانف ‘deliberately’ | 17 | قوله ‘his saying’ | 18 | أن ‘that’ | 8
10 | حرج ‘wrong’ | 7 | لا ‘no’ | 12 | ،عليه ‘on him’ | 18 | ithm | فإنما ‘verily’ | 15 | بينهم ‘between them’ | 15 | كبير ‘much’ | 8

Table (7.8): The top ten collocates of ithm in a span of 3:3.

In the table above we could not find any statistically significant collocation for ithm except for one item: al-maysir ‘gambling’. To take an example, the underlined words in the table above are part of a Qur’anic verse about performing pilgrimage, which reads:

فمن تعجل في يومين فلا إثم عليه ومن تأخر فلا إثم عليه
faman tacajjala fi yawmayni fala ithm calayhi wa man ta’akhar fala ithm calayh.
so-who haste-becomes in days-two so-no sin on-him and who late-becomes so-no sin on-him.
‘But whosoever hastens to leave in two days, there is no sin on him, and whosoever stays on, there is no sin on him.’ (Qur’an, al-Baqarah: 203)

The verb tacajjala ‘hastens’ does not appear in the table as a collocation for ithm because it comes fourth to the right, so the span of 3:3 failed to capture it.

As mentioned in section 5.2, we chose to work on flexible spans, since there might be some expressions in Arabic that stretch over the average span, 4:4 [41]. In the first place, it is hard to capture in that span the semantic features that stretch over several units not included in our span. Secondly, we would like to make use of all possible collocations in our relatively small corpus. Therefore the span size for this study is set to 7:7, i.e. seven word forms to the left and to the right [42].

[41] In fact, taking a span of 3 or 4 words can reveal some interesting differences between the collocates of ithm and dhanb. However, if we stick to such a span in this case, we will overlook many interesting collocations.
[42] The maximum span I can handle automatically using the Monoconc tool is 3:3. I will use Microsoft Word to capture the nodes that extend over that span. I run the concordance first and save the result into a text-only file; then I use Microsoft Word to count the hits which I see as relevant to my search term, such as an adjective modifying it which is located far apart in the line.
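The effect of the span size on the verse above can be illustrated with a small script. This is only a sketch under our own assumptions: the study itself relied on Monoconc and Microsoft Word, not on this code, and the function collocates_in_span and its parameters are hypothetical names. The verse is tokenised in transliteration simply by splitting on spaces.

```python
from collections import Counter


def collocates_in_span(tokens, node, left=7, right=7):
    """Count every word form found within `left` positions before and
    `right` positions after each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - left)
        end = min(len(tokens), i + right + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


# The verse from al-Baqarah: 203, in the transliteration given above.
verse = ("faman tacajjala fi yawmayni fala ithm calayhi "
         "wa man ta'akhar fala ithm calayh").split()

narrow = collocates_in_span(verse, "ithm", left=3, right=3)
wide = collocates_in_span(verse, "ithm", left=7, right=7)

print("tacajjala" in narrow)  # the 3:3 span misses the verb
print("tacajjala" in wide)    # the 7:7 span captures it
```

In this toy input tacajjala stands four positions away from the first occurrence of ithm, so it falls outside a 3:3 window and inside a 7:7 window, which is the situation described for table (7.8).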
On the other hand.10) below to support or eliminate that distinguishing feature.75 7. collocations of ithm do not reveal how sins are expiated.increase earn carry record incur ‫يزداد‬ ‫يكتسب‬ ‫احتمل‬ ‫يكتب‬ ‫يوجب‬ 12 5 15 5 4 200 179 542 928 615 8. tabuu’ ‘bear’. ` 124 . i. yaksib ‘gain’. except blasphemy against Allah. For example. there must be a difference in meaning between these two items because of their contextual differences. taqaddama ‘precede’. In the Islamic creed sins can be forgiven. hits with high MI scores as shown in the table above. The first question which can be raised. As for ithm. whether venial or deadly. yudhnib ‘sin’. There are sins which can only be forgiven through repentance when a fault is done against people. we can say that every repented sin can be forgiven: venial sins by the act of inner repentance alone (by asking for forgiveness or practically by doing good deeds and refraining from bad deeds).e. and yajtanib ‘avoid’ are the strongest collocates. Therefore. and mortal sins by repentance expressed through the compensation or reconciliation with those who you wronged. are astaghfiru ‘I ask for forgiveness’. Then there must be an extra action: compensating those who have been wronged or obtaining their forgiveness. ictarafa ‘confess’ and yatuub ‘repent’.83 5.9) the top ten verb collocates of ithm & dhanb in a span of 7 words. But how can one attain forgiveness? Sins can be forgiven by doing good deeds and/or repentance.86 7. Let us now examine all other types of collocations as shown in (7.74 5.65 Table (7. then. They rather refer to a state of accumulation of such sins. The most significant V+N collocation of dhanb. yaftari ‘allege’. is why such collocates appear more frequently with either item. According to our approach. ‫‘ منع الزكاة‬not paying charity.59 MI 8.46 8.01 10.67 10.99 8.86 8.99 6.78 3 3 3 3 5 9 38 MI 12.67 9.01 10. y 232 34 10 16 35 45 93 193 2643 )F(y 15 178 12. ` 125 . 
(y) (ithm/w) | Arabic | f(x, y) | F(y) | MI
changing | تبديل | … | 21 | …
staying on | التأخر | … | 99 | …
walking | المرور | 3 | 6 | …
charity | الزكاة | 6 | 6 | …
gambling | الميسر | 28 | 105 | 11.01
lying | الكذب | 4 | 19 | 10.67
property | أموال | 5 | 35 | 10.11
eating | الميتة | 8 | 88 | 9.46
drinking | الخمر | 28 | 542 | 8.64
usury | الربا | 3 | 240 | 6.59

(y) (dhanb/w) | Arabic | f(x, y) | F(y) | MI
fornication | الزنا [43] | 11 | 232 | 8.01
unbelief | الكفر | 11 | 34 | 10.78
treaties | الميثاق | 3 | 10 | 10.67
apostasy | الردة | 3 | 16 | 9.99
orphans | اليتامى | 3 | 35 | 8.86
theft | السرقة | 3 | 45 | 8.50
major sins | الكبائر | 5 | 93 | 8.19
oppression | الظلم | 9 | 193 | 7.99
murder | القتل | 38 | 2643 | 6.29

Table (7.10) the most significant noun collocates with ithm (the top ten words) and dhanb (the last nine words) with minimum frequency 3. n = 5000000, x(ithm/w) = 645, x(dhanb/w) = 917.

[43] ‘Adultery/fornication’ are referred to as faah}isha and zinaa in CAC.

The underlined words in the table above are parts of multi-word religious concepts which are frequently used in CAC. They read as follows: تبديل الوصية ‘changing the will’, التأخر أو التعجل في الحج ‘staying on or haste to leave in pilgrimage’, المرور بين يدي المصلي ‘passing in front of someone who is praying’, منع الزكاة ‘not paying charity’, الكذب على الله ‘lying to Allah’, نقض الميثاق ‘breaking treaties’.
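The MI scores can be related to the raw counts by a short calculation. The sketch below is an illustration under stated assumptions rather than the statistical program written for this research: it assumes MI is computed as log2(n * f(x, y) / (f(x) * F(y))), with n = 5000000, f(ithm) = 645 and f(dhanb) = 917 as given with table (7.10). The two collocate counts are the values we read from that table for al-maysir ‘gambling’ and al-qatl ‘murder’; the function name mutual_information is our own.

```python
import math


def mutual_information(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Assumed MI score: log2 of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))


CORPUS_SIZE = 5_000_000   # n, as given with table (7.10)
F_ITHM = 645              # frequency of ithm in CAC
F_DHANB = 917             # frequency of dhanb in CAC

# al-maysir 'gambling' with ithm: 28 co-occurrences, 105 occurrences overall.
mi_gambling = mutual_information(28, F_ITHM, 105, CORPUS_SIZE)

# al-qatl 'murder' with dhanb: 38 co-occurrences, 2643 occurrences overall.
mi_murder = mutual_information(38, F_DHANB, 2643, CORPUS_SIZE)

print(f"gambling/ithm: MI = {mi_gambling:.2f}")
print(f"murder/dhanb:  MI = {mi_murder:.2f}")
```

Under this assumption a rare collocate that nearly always appears next to the node scores much higher than a frequent word with more co-occurrences, which is why ‘murder’ has the lowest MI among the dhanb collocates despite its 38 hits.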
The table reveals that ithm is mainly used for sins that are personal or do not entail a punishment in this world, like missing some obligatory worshipping acts or doing a bad deed that recurs on oneself, like drinking, gambling, lying to Allah, etc. On the other hand, dhanb is used for sins that entail punishment in this world or the next, e.g. adultery, killing, theft, etc. The uses can be summarised as follows:

ithm
1) Missing an obligatory worshipping act.
2) Doing an act that causes harm to one’s religion.
3) Behaving or doing actions which are considered morally wrong.
4) Causing harm to one’s own self.
5) Secret bad deeds caused to others.

dhanb
1) Doing an act that causes harm to others.
2) Doing an act which is considered illegal.
3) Committing a major sin.
4) Doing an act that might entail punishment in this world.

The items of the pair are not absolute synonyms, though they share the same range of application (both refer to committing a bad deed in general). However, a subtle difference will emerge by using the t-score statistic. The t-score can tell us how much difference exists between ithm and dhanb by comparing the frequency of the co-occurrence of either word of the pair and its collocates with the other. This will help us find out each word’s preferential usage. Then we will be able to abstract, out of the differences that come up, the main attributes that distinguish them.

(y) | Arabic | f(ithm/w) | f(dhanb/w) | t | significance
changing | تبديل | … | … | … | …
staying on | التأخر | 99 | 0 | 9.94 | …
walking | المرور | 3 | 0 | 1.73 | …
charity | الزكاة | 6 | 0 | 2.44 | …
gambling | الميسر | 28 | 0 | 5.29 | …
lying | الكذب | 4 | 1 | 1.34 | …
property | أموال | 5 | 0 | 2.23 | …
eating | الميتة | 8 | 0 | 2.82 | …
drinking | الخمر | 28 | 1 | 5.01 | …
usury | الربا | 3 | 0 | 1.73 | …
fornication | الزنا [44] | 1 | 11 | 2.88 | …
unbelief | الكفر | 6 | 11 | 1.21 | …
treaties | الميثاق | 0 | 3 | 1.73 | …
apostasy | الردة | 0 | 3 | 1.73 | …
orphans | اليتامى | 1 | 3 | 1.00 | …
theft | السرقة | 0 | 3 | 1.73 | …
major sins | الكبائر | 2 | 5 | 1.13 | …
oppression | الظلم | 6 | 9 | 0.77 | …
murder | القتل | 11 | 38 | 3.85 | …

Table (7.11) the t-test scores of the most significant collocates of ithm and dhanb.

[44] ‘Adultery/fornication’ are referred to as faah}isha and zinaa in CAC.
As we can see in table (7.11), the high t-scores go with ithm when describing one’s own actions that bring harm to oneself, such as gambling, drinking wine, eating dead or unslaughtered animals or birds, or missing an obligatory worshipping act, like staying on or haste to leave in pilgrimage. In addition, it describes one’s actions whose results will only affect one’s abode in the hereafter, such as missing prayers, not paying charity, etc. On the other hand, dhanb gets the highest t-scores [45] with actions that bring harm to other people, like murder, theft, etc. So, we can say that ithm is intrinsic whereas dhanb is extrinsic.

[45] Although the scores are significant with only a few examples of dhanb, we can still draw a conclusion. This is the most we can get out of our corpus. Hopefully, by increasing the corpus in the future, more examples may appear.

7.5.1 A Few Remarks

If both words are used to describe the same bad action, it can be noticed that either one is often the preferred choice when a particular semantic attribute is involved. For example, murder is a major sin that entails punishment in this world and the next. It always collocates with dhanb, as mentioned in the Qur’an.

ولهم علي ذنب فأخاف أن يقتلون
wa lahum calayyi dhanb fa’akhaafu an yaqtluun
‘And they have a sin (a charge of crime) against me, and I fear that they will kill me.’ (Al-Shu’ara: 14)

This sin is explained in another verse as a charge of murder.

قال رب إني قتلت منهم نفسا فأخاف أن يقتلون
‘He said: My Lord! I have killed a man among them, and I fear they will kill me.’ (Al-Qasas: 33)

However, murder is mentioned twice with ithm in the following contexts: 1) when it refers to the murder of the son of Adam, Abel, and this is obviously logical because nobody was there to prosecute Cain for that murder; 2) with committing suicide, such as ‘the ithm ‘sin’ that recurs when some people starved themselves until they died’.

Al-khamr ‘drinking wine’ is considered a major sin in Islamic belief. However, it collocates with ithm, not dhanb. An interpretation given by Al-Asfhani in his Mucjam (Lexicon) explains why it is considered an ithm. He said that drinking might prevent the drinker from doing the obligatory acts he is required to do so as to enter Paradise, so he is harming himself when missing such acts, which are described as ithms accordingly.

Kufr (unbelief) is described at one time as ithm and at another time as dhanb. It is, in the first place, something between a man and his God, i.e. it is something that rests in one’s heart. But if this behaviour is publicised, it will then be called riddah ‘apostasy’; it contradicts the mainstream of the society. Therefore it becomes a public menace and should be controlled.
Zinaa ‘fornication/adultery’ co-occurs more often with dhanb than with ithm. This indicates that it is not a personal action which does not affect others, as some people might think. Indeed, the consequences of fornication are grievous and can harm the whole society through unwanted pregnancy or abstaining from marriage, the right soil for procreation; it will spoil the unity of this society.

7.5.2 Summary

The MI tests conducted in this section for ithm and dhanb show that these two words significantly collocate with negative actions. They both describe one’s bad deeds in religious terms. The T-test highlighted an interesting difference between the two words by comparing all occurrences of both words with high MI scores. The T-test tables affirmed that the semantic feature that distinguishes between ithm and dhanb is intrinsic vs. extrinsic. The most significant collocations of ithm refer to sins which involve harming oneself, such as drinking intoxicants, missing some obligatory prayers, etc. On the other hand, dhanb significantly collocates with sins which involve harming others, such as unjustly taking people’s property, killing, etc.

7.6 A case study: The word pair h}asiba and z}anna ‘think’

The seemingly synonymous pair, h}asiba and z}anna, which mean ‘think’, will be examined below to extract other semantic features. The dictionary meaning is given in table (7.12) before doing our corpus-based analysis.

Table (7.12) definitions of h}asiba and z}anna in four Arabic dictionaries

In the first place, the dictionary meanings of the pair as shown above give the denotation of the words under investigation, which simply refers to (1) uncertainty or probability, (2) suspicion and (3) certainty. The meanings (1) and (3) seem contradictory, because it would be confusing to have a word meaning something and its opposite.
In the second place, the near synonym pair h}asiba and z}anna are used to define each other in many dictionaries. In addition, the Al-Muhiit dictionaries presume that the pair is synonymous by not giving a definition of h}asiba but rather referring to z}anna as its sole synonym.

In addition to the similarities in meaning, these two verbs are seemingly syntactically parallel, since both are ditransitive verbs, i.e. they can have two direct objects (like give), and they may have an intransitive usage as well. In addition, both can occur with subordinate clauses, and they can be nominal modifiers and undergo nominalisation. Let us consider the following examples in (10a-12b) below.

(10.a.) ومن يغترب يحسب عدوا صديقه
wa man yaghtarib yah}sibu caduwan s}adiiqahu.
Whoever goes abroad will think an enemy a friend.

(10.b.) ظن عمرو بكرا خالدا
z}anna camrun Bakran Khaalidan.
camr thought Bakr to be Khalid.

(11.a.) يحسب أن ماله أخلده
yah}sabu anna maalahu akhladahu
He thinks that his wealth will make him eternal.

(11.b.) ظن قوم أن سم الأصلة خاصة بارد
z}anna qawmun anna summa al-’as}lah khaas}s}atan baarid
Some people think that the python’s poison is cold.

(12.a.) ...ما أعد من عقابه لأهل معصيته بحسبانهم أنهم فيما أتوا من معاصي الله مصلحون
ma acadda min ciqaabihi li’ahli macs}iyatihi bi-h}usbaanihim annahum fiimaa ataw min macaasi}i allaahi mus}lih}uun.
What (Allah) has prepared of torment for the sinners who think they are righteous despite the sins they committed.

(12.b.) وأكبر ظني أنه قال: “شهادة الزور”
wa akbaru z}anni annahu qaala shahaadata al-zuur.
I think it most likely that he said, “the false witness”.
In Arabic grammar books, h}asiba and z}anna are called ‘afcaal al-quluub’ (heart verbs) and they function as nawaasikh [46]. z}anna assigns the accusative case to the two objects that follow, apart from the subject, which is always in the nominative case. They are both used to mean certainty and probability, but probability is more likely to be the dominant case (Mubarak, 1982: 180).

[46] nawaasikh are verbs that assign case endings to the first two nouns that follow. Some verbs like kaana assign the nominative case to the first noun, the subject, and the accusative case to the second, the complement.

As shown above, it seems that z}anna and h}asiba can be used interchangeably. As corpus data revealed some important and subtle differences between the previous pairs of synonyms that are hard to recognise solely by intuition, we need to carry out the same methodology to examine whether the pair under investigation are synonyms or not. This can help us extract the contrasts or subtle differences that pertain to their collocational distribution.

The statistics [47] shown by the corpus demonstrate that for z}anna the most frequent word form is z}ann in nominal form. There are 1254 instances of that particular form, which is 53.58% (1254/2340) of the total. The second most frequent is the verb z}anna, with 1086 hits, which is 46.41% (1086/2340). That is a lot, and suggests that z}ann as a noun and z}anna as a verb are both central items to learn. But h}asiba is almost exclusively used as a verb, with only one occurrence in nominal form. We will examine the first left and right collocates of z}ann as nominal modifiers below, particularly those collocates that function as adjectives or genitives.

[47] We have carried out the statistics after singling out all possible forms of the search term. This could easily be done automatically if we had a tagged corpus. Preparing a tagged corpus would require a lot of time before conducting such lengthy research, so we searched the corpus separately for what we know as verbs and nouns.
In order to analyse the significant collocations of this word.75 F(x.03 5. Table 7.y) 10 10 7 6 6 MI 7. combinations with MI scores lower than 1. in the first place. z}ann (left collocates) true false invalid the era of ignorance suspicious (x) ً‫صادقا‬ ‫كاذب‬ ‫فاسد‬ ‫الجاهلية‬ ‫سيئ‬ z}ann (right collocates) (x) F(x.58% (1254/2340) of the total. That is a lot and suggests that z}ann as a noun and z}anna as a verb are both central items to learn. with only one occurrence in nominal form. We will examine the first left and right collocates of z}ann as nominal modifiers below. i. Secondly. false ‘kaadhib’.85 3.13).25 7. we cannot draw any conclusive description of z}anna before studying the other form (verb).79 Table (7. ` 133 . the era of ignorance ‘aljaahiliyyah’. Let us now examine the left collocates48 of z}anna (verb) with the same procedure taken above in table (7. So we examined the left collocates of z}anna (V) because (1) analysing the right collocates could mislead us by counting items relating to other verbs and (2) most of the items which modify the Arabic verb fall on the left-hand side. However. the table shows the following: good ‘h}asan’. true ‘s}aadiqa’.13) as shown in table (7. Examining the most significant collocates of z}ann (in nominal form) as represented in table (7.y MI 48 In Classical Arabic. bad ‘suu’’. certain ‘mu’akkad’. Also. For the negative collocates the table shows the following examples which occurred altogether 72 times in CAC: invalid ‘faasid’.bad good more likely much certain suspicious ‫سوء‬ ‫حسن‬ ‫أغلب‬ ‫كثير‬ ‫مؤكد‬ ‫سيئ‬ 40 31 18 6 4 3 7. A minority of examples are neutral. suspicious ‘sayyi’’.05 10. we find out that z}ann occurred more frequently with words of negative sense.92 6. the canonical structure of a sentence is VSO. nacbuduka ‘worship-You’ but a pronoun referring to Allah preceded the verb to exclude any other one from the act of worshipping. much ‘kathiir’). For the positive collocates which occurred 41 times. 
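As a minimal sketch of the first step of the search procedure described in this section, the Python fragment below counts the word that immediately follows a search-term and keeps only collocates that reach the minimum frequency of 3. Several things here are assumptions made for illustration and are not part of the study: the function name first_left_collocates, the toy token list, and the reading of 'left collocate' as the next word in reading order (Arabic being written from right to left). The MI filter and the manual selection of adjectives and genitives are not reproduced.

```python
from collections import Counter


def first_left_collocates(tokens, node_forms, min_freq=3):
    """Count the word that immediately follows each node form.

    Assumption: in right-to-left script the word printed to the left of
    the node is the next token in reading order, i.e. tokens[i + 1].
    """
    counts = Counter(
        tokens[i + 1]
        for i, token in enumerate(tokens[:-1])
        if token in node_forms
    )
    # Keep only collocates that reach the frequency threshold.
    return {word: freq for word, freq in counts.items() if freq >= min_freq}


# Hypothetical toy text in the thesis transliteration (not taken from CAC).
toy_tokens = (
    "z}ann kaadhib wa z}ann kaadhib thumma z}ann kaadhib wa z}ann s}aadiq"
).split()

result = first_left_collocates(toy_tokens, {"z}ann"})
print(result)
```

In the toy text 's}aadiq' follows the node only once, so it falls below the threshold and is dropped, which is the effect the frequency cut-off is meant to have on the real data.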
(x)            Arabic        F(x,y)
suspicions     الظنون        18
untrue         غير الحق49    14
false          كاذب          5
bad            سوء           3
death          الموت         3

[The MI column of this table could not be recovered from the source.]

Table (7.14) the immediate left collocates with z}anna (verb).

49 The construction ghayr al-h}aq 'not truth' is examined as a whole.

As shown above, z}anna is mainly used negatively. As for h}asiba (which is mainly used as a verb in CAC), we have collocates like the following:

(x)                          Arabic                      F(x,y)
eternity for the martyrs     قتلوا في سبيل الله أموات     18
entering Paradise            تدخلوا الجنة                9
safety from torment          مفازة من العذاب             6
good                         خير                         4
truth                        الحق                        3

[The MI column of this table could not be recovered from the source.]

Table (7.15) the right collocates of h}asiba with minimum frequency 3.

There are also many positive instances of h}asiba in CAC which occurred once or twice. For the sake of brevity I will give two examples only.

(13.a) ويطوف عليهم ولدان مخلدون إذا رأيتهم حسبتهم لؤلؤا منثورا
wa yat}uufu calayhim wildaanun mukhalladuun idha ra'aytahum h}asibtahum lu'lu'an manthuura.
And round about them (will serve) boys of everlasting youth. If you see them, you would think them scattered pearls.

(13.b) وهم يحسبون أنهم بفعلهم ذلك مصلحون
wa hum yah}sabuuna annahum bi-ficlihim dhaalika mus}lih}uun.
They think while doing that they are righteous.

We are now able, motivated by the collocational analysis above, to look at all the occurrences of z}anna and h}asiba (without designating a particular threshold) to draw some differences between them in terms of their semantic features. We can further support the hypothesis of negativity and positivity of z}anna and h}asiba by searching their occurrences in CAC with a frequency lower than 3. However, we will not be able to show all these occurrences for the sake of brevity. We would rather refer to their total number added to the total number of the negative and positive occurrences of z}anna and h}asiba with minimum frequency 3. The result is given in table (7.16) below.

z}anna & h}asiba (N), nominal
             Positive    Negative    Neutral    Total
z}ann        41          83          1130       1254
h}usbaan     1           0           0          1

z}anna & h}asiba (V), verbal
             Positive    Negative    Neutral    Total
z}anna       20          68          998        1086
h}asiba      61          25          301        387

Table (7.16) z}anna and h}asiba in terms of negativity and positivity.

We can notice that the neutral sense50 of z}anna and h}asiba is dominant. However, we can still say that z}anna is used more negatively than positively, whereas h}asiba shows a tendency to be more positive. Most of the negative occurrences of h}asiba, unlike z}anna, are negated51. For example:

50 Examples 10a and 10b mentioned earlier are good examples of the neutral sense of z}anna & h}asiba.
51 The negative forms are underlined in the Arabic text along with the transliteration.

(14.a) ولا تحسبن الله غافلاً عما يعمل الظالمون
wa la tah}sabanna Allaaha ghaafilan camma yacmalu al-z}aalimuun.
And not think Allah unaware of-what do-they the-wrongdoers.
Think not that Allah is unaware of that which wrongdoers do. (14: 42)

(14.b) فلا تحسبن الله مخلف وعده رسله
fala tah}sabanna Allaaha mukhlifa wacdihi rusulah.
So-not think Allah breaking promise-His Messengers-His.
So think not that Allah will fail to keep His Promise to His Messengers. (14: 47)

On the other hand, h}asiba is used in the context of praise, as in the following hadith:

(15) إذا كان أحدكم مادحا أحدا فليقل أحسب فلانا هكذا
Idha kaana ah}adukum maadih}an akhahu fa-l-yaqul: ah}sibu fulaanan haakadha.
Whoever was one-(of)-you praising brother-his so-say think-I somebody like-this.
Whoever amongst you has to praise his brother should say, 'I think that he is so and so' (Sahih Muslim: Volume 3, ch. 48, no. 830).

One more item of evidence is that one of the dictionary meanings of z}anna, as mentioned earlier, is 'suspicious', and this sense is mainly negative in Arabic.

We can also wonder why the difference in frequency between z}anna and h}asiba is so great. Two explanations can be given here. The first explanation is that the non-occurrence of h}asiba in the nominal form could be for morphological or phonological reasons, since z}anna can be used as a verb and as a noun alike. To change h}asiba into a noun, one morphological process (step 1 below) and two other phonological processes (steps 2 and 3) have to take place: 1) to add a suffix, which is (aan) in this case, 2) to change the first vowel into u and 3) to delete the second vowel. So the output form is h}usbaan.

Secondly, we can say that the less frequent word (h}asiba) is used in a more restricted sense, perhaps mainly in particular contexts only. To see whether this is the case, we can start by looking at the distribution of the pair across the Qur'an subcorpus, and compare the distribution. z}anna is mentioned in the Qur'an 55 times, of which 6 are in nominal form, whereas h}asiba occurs 43 times, entirely as a verb. This search in the Qur'an subcorpus shows us how similar z}anna and h}asiba are in terms of frequency, with 49 times for z}anna as a verb and 43 times for h}asiba. Such similarity in frequency in the Qur'an subcorpus can give equal data for analysis.

Sometimes z}anna is used in the Qur'an to mean 'certainty' and other times 'doubt'. All commentators of the Qur'an give two contradictory meanings to z}ann. They treat z}ann as a polyseme that has two different meanings, but different here means oppositeness. So it refers to two contradicting senses: the thing and its opposite. Ibn Al-Anbari compiled a book called Al-Ad}daad 'The Opposites' where he collected all homophones of opposite meanings, the top word of which was z}ann. He is not actually the only one who is in favour of this approach. One commentator, Al-Tabari, gives more examples from Arabic to strengthen his point of view. For example, he mentions al-sudfa to mean darkness and light, and al-s}ariikh to mean the rescued and the rescuer. On the other hand, some linguists denied the existence of this phenomenon in Arabic52, like Ibn Durustwayh, who compiled a book called Ibt}aal Al-Ad}daad (Refuting the book of Opposites) in which he denied that approach because it contradicts the wisdom of Arabs (Al-Suyuti, Al-Muzhir: 400). Therefore oppositeness can no longer hold between homophonous words.

52 Some other linguists give two explanations for the existence of such a phenomenon:
1. Dialectical variations: for instance, al-jawn means black in Tamim's dialect and white in Qays'.
2. Broadening of meaning, such as al-s}areem, which literally means (the separated), to mean the night because it is separated from the day, and the same applies to the day that is separated from the night. Al-sudfa, which means both light and darkness, can be explained in the same way: al-sudfa is originally put to mean to hide, so when darkness comes it hides the light of the day and when light comes it hides the darkness of the night.
One more reason can be added to the above explanations which is not mentioned in Al-Muzhir: narrowing. For example, al-ma'tam, which originally means a gathering of men and women for a sad or a merry occasion, is limited later on to the sad occasion.
Now let us come back to the subject matter of this chapter by looking at z}ann, which is often regarded as a polyseme that has two opposite meanings, 'doubt' and 'certainty'. The selection of meaning depends entirely, as they presume, on context. For example, z}ann in the following two verses means certainty in (16.a) and doubt in (16.b).

(16.a) واستعينوا بالصبر والصلاة وإنها لكبيرة إلا على الخاشعين، الذين يظنون أنهم ملاقو ربهم وأنهم إليه راجعون
wa istaciinuu bi-l-s}abri wa al-s}alaati wa innahaa la-kabiiratun illa cala al-khaashiciin, alladhiin yaz}unnuuna annahum mulaaquu rabbihim wa annahum ilayhi raajicuun.
And seek-help-you with-the-patience and the-prayer and verily-it very-big except on the humble who think that-they meeting Lord-their and that-they to-Him returning.
And seek help in patience and prayer, and truly it is hard except for the humble-minded. They are those who are certain that they are going to meet their Lord and that unto Him they are going to return.53 (The Noble Qur'an, trans by M. Khan, 2: 46)

(16.b) ومنهم أميون لا يعلمون الكتاب إلا أماني وإن هم إلا يظنون
wa minhum ummiyuuna la yaclamuun al-kitaaba illa amaaniyya wa in hum illaa yaz}unnuun.
And from-them unlettered not know-they the-book except wishes and but they think-they.
And there are among them unlettered people, who know not the Book, but they trust upon false desires and they but guess. (The Noble Qur'an, 2: 78)

53 The translation is slightly modified (cf. fn. 26).

Allah described the true believers as those who have z}ann that they will meet Allah and that they will return to Him. This is a matter of belief. If they were doubtful they would not be called believers. Some commentators, like Mujahid, who says that whenever z}ann is mentioned in the Qur'an it means certainty, nevertheless interpret z}ann in some verses as meaning doubt.

z}anna is translated above as guess. The verse in (16.b) talks about some Jews who are illiterate and do not know the reality of their book; however, they follow their scholars blindly and believe them. This is a different category from those who know the truth and falsify it, mentioned in the verse preceding it (2: 77). So if we interpret z}anna as doubt or guess, as commentators say, we presume that that second category of Jews who do not know the reality of their book do not believe in it. But this is not the case, since this category is blindly following their scholars and this is a type of belief. We would rather say there are some Jews who only know the false version of the Bible and they are certain about what they believe even if it is false.

The inconsistency of the interpreters of the Qur'an, and of the translators later on, created a big confusion when assessing the following verse.

(17) وذا النون إذ ذهب مغاضبا فظن أن لن نقدر عليه فنادى في الظلمات أن لا إله إلا أنت سبحانك
wa Dha An-Nuun idh dhahaba mughaad}iban fa-z}anna an lan naqdira calayhi.
And Dhun-Nuun when he went off in anger, and imagined that we shall not punish him! But he cried through the darkness: none has the right to be worshipped but You, Glorified are You. (The Noble Qur'an, 21: 87)

The first dictionary meaning of naqdira is 'be able'. Commentators on the Qur'an eliminated the possibility that the Prophet Jonah had doubt that Allah was not able to get him by explaining the meaning of qadara as to constrict. Ibn Katheer and Al-Qurtubi interpreted naqdira as 'to narrow' or 'constrict' as in (18):

(18) ومن قدر عليه رزقه فلينفق مما آتاه الله
wa man qudira calayhi rizquhu fa-l-yunfiq mimma 'aataahu Allaah.
And who was-restricted on-him livelihood-his so-spend-(he) of-what gives-him Allah.
And the man whose resources are restricted, let him spend according to what Allah has given him (Qur'an, 65: 7).

The meaning of the verse as presented by Ibn Katheer is 'So Jonah (Dhul-Nuun) thought that Allah might not constrict him in the belly of the fish'. But the question is still raised how Prophet Jonah, who is infallible according to the Islamic creed, thinks that Allah might not constrict him in the belly of the fish, while he went off in anger fleeing from his people without permission from Allah. If we interpret z}anna here as certain, the whole argument will be solved. So the meaning is that Jonah was certain that if he prayed to Allah he would be saved. The use of fa with the following verb naada clarifies this point, as fa introduces a result. In short, Jonah was certain he would not be constricted in the belly of the fish if he prayed to Allah.

So we can say that there is a subtle difference between z}anna and h}asiba because of the contextual variation that occurs with them. Still, we need to figure out what semantic features make them different in a more methodical way by means of corpus-based analysis. Let us try to study the whole environment of z}anna and h}asiba, particularly the first and second left collocates, to see what preferential distribution they appear in.

(y)                Arabic        F(x,y)    F(y)     MI
suspicions         الظنون        22        25       10.87
most               غالب          18        22       10.77
meeting Allah54    ملاقو الله     11        14       10.71
certainly          كل الظن       4         6        10.47
certainly          مؤكد          4         7        10.25
very               ظناً           17        64       9.14
that               أن            579       15537    6.31
in                 بـ            252       34845    3.94
much               كثير          6         1547     3.05
in                 في            48        24009    2.09

Table (7.17) the 1st & 2nd left collocates with z}anna with minimum frequency 3.

54 We searched this item plus the following one because they constitute one concept, which is resurrection.

(y)       Arabic    F(x,y)    F(y)     MI
that      أن        102       15537    6.40
in        في        5         24009    1.42
at        بـ        3         34845    0.15
Allah     الله      3         19246    1.00

Table (7.18) the 1st & 2nd left collocates with h}asiba with minimum frequency 3.

The first thing to notice from the above tables is the high frequency of an 'that' (an introducer of a subordinate clause), which occurred 579 times with z}anna, whereas it occurred just 102 times with h}asiba. However, an 'that' has almost the same percentage with both items: h}asiba (102/388, about 26%) and z}anna (579/2340, about 25%). So they both have the same proportion of occurrences with subordination.
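The MI scores and the proportions discussed here can be checked with a short calculation. The sketch below is only illustrative and rests on stated assumptions: MI is taken in its usual pointwise form, log2 of F(x,y) times the corpus size over F(x) times F(y); the corpus size is taken as 5,000,000 words; and the frequency of each search-term is taken as the total of all its forms (2340 for z}anna and 388 for h}asiba). The function name mutual_information is ours.

```python
import math


def mutual_information(f_xy, f_x, f_y, corpus_size):
    """Pointwise mutual information of a node x and a collocate y (base 2)."""
    return math.log2((f_xy * corpus_size) / (f_x * f_y))


CORPUS_SIZE = 5_000_000  # assumption: approximate size of CAC in words
F_ZANNA = 2340           # assumption: all forms of z}anna counted together
F_HASIBA = 388           # assumption: all forms of h}asiba counted together
F_AN = 15537             # corpus frequency of an 'that'

# an 'that' as a collocate of each verb
mi_zanna = mutual_information(579, F_ZANNA, F_AN, CORPUS_SIZE)
mi_hasiba = mutual_information(102, F_HASIBA, F_AN, CORPUS_SIZE)

# share of each verb's occurrences that are followed by an 'that'
share_zanna = 579 / F_ZANNA
share_hasiba = 102 / F_HASIBA

print(f"MI(z}}anna, an) = {mi_zanna:.3f}")
print(f"MI(h}}asiba, an) = {mi_hasiba:.3f}")
print(f"share with an: z}}anna {share_zanna:.1%}, h}}asiba {share_hasiba:.1%}")
```

Under these assumptions both scores come out a little above 6, and the two shares are close to a quarter, which is the point made about subordination: the raw counts differ widely, but the proportions do not.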
We can also see that z}anna collocates with the full range of intensifiers such as 'certainly, most, much, very', whereas none of these intensifiers occurs with h}asiba. So we can say that z}anna is something that can increase or become more certain. It can increase to reach a level of conviction, as mentioned above in example (16.a) (Qur'an: 2: 46). We then see that 'z}anna mulaaquu Allaah' (they believe they will meet Allah) has a high MI score at 10.71. We can say then that z}anna collocates with a word denoting belief in resurrection, and this involves certainty. Therefore, the use of z}anna to mean 'believe' reflects a faith-related commitment.

However, there are some occurrences of z}anna which are assumed to denote probability or doubt, as mentioned earlier. In fact, we can fit all these senses in an epistemic scale, as represented in the following scale.

doubt    possibility    probability    necessity    prediction    factuality
------- z}anna -------

'Factuality' in the above scale represents the highest degree of certainty, whereas 'possibility' and 'doubt' are the lowest.

certainty not    ------- z}anna -------    factuality (certainty)

In fact, we can easily include all senses of z}anna: probability, belief and certainty, i.e. a state of strong or weak possibility. For practical reasons, we can get the unanimity of all lexicographers by just sticking to one sense which resides halfway between 'doubt' and 'certainty' or between 'doubt' and 'certainty not'. Therefore, on the basis of the evidence given throughout, even after a further assessment of all possible occurrences of both items, the dominating sense for z}anna so far is to denote belief.

Accordingly, h}asiba should have another meaning, different from z}anna, which means something in between certainty and doubt. We would better define it as a verb that refers to the inclination of one's heart to think. However, this sense eliminates its use in relation to the prophet in the following verses:

(19.a) ولا تحسبن الله غافلاً عما يعمل الظالمون
wa la tah}sabanna Allaaha ghaafilan camma yacmalu al-z}aalimuun.
And not think Allah unaware of-what do-they the-wrongdoers.
Consider not that Allah is unaware of that which wrongdoers do. (14: 42)

(19.b) فلا تحسبن الله مخلف وعده رسله
fala tah}sabanna Allaaha mukhlifa wacdihi rusulah.
So-not think Allah breaking promise-His Messengers-His.
So think not that Allah will fail to keep His Promise to His Messengers. (14: 47)

Two explanations are given in Tabari's Tafseer (Commentary on the Qur'an) for h}asiba in this particular context: (1) to highlight the Prophet's belief that he does not consider Allah unaware of what the wrongdoers do; (2) to draw attention to the fact that Allah is aware of the wrongdoers' actions and that He will punish them accordingly, not just for the Prophet. In Al-Qurtubi's explanation, he said, 'this is to relieve the Prophet (Muhammad) after relating to him this sad story about the people of Abraham and how impudent they are in discrediting his religion.'

In the first place, to know that the addressee in the following verse (14: 44) is the Prophet shows that the addressee in the previous verse has to be him as well. Secondly, the addressee in the above verses can include all categories of the participants in the speech-act: the speakers, the listener/reader and the audience, because this is put in an admonishing style that suits all potential addressees. If the addressee is not the Prophet, the literal meaning will not be infringed. This is in conformity with the basic idea of prophethood and the revelation, which is for the good of the whole people. Similar phrasing can be earmarked in the Qur'an in more than one place. For instance, Allah says, 'O ye who believe, believe' (4: 136).

However, it is obvious that the two verses (19a & b) are imperative and negative at the same time. The negated imperative occurs 37 times with h}asiba (i.e. 9.56%) and only three times with z}anna (0.12%). This use is only typical with h}asiba. This is quite significant in drawing up the differences between z}anna and h}asiba. Let us now look at some possible explanations as to why this is so.

Basically, the literal meaning of the imperative mood is for direct instructions and admonition. Imperatives can be positive, meaning direct exhortations, or negative when connoting prohibitive warnings (ibid 110-111). All occurrences of h}asiba and z}anna in imperative mood are accompanied by the negative form. Therefore, they are used as prohibitive warnings. In this context I examined z}anna and h}asiba in verbal forms, and it turned out that all their occurrences in negative imperative are followed by clausal complements (subordinate clauses), and these clauses can function as subject, object or complement.

We have seen that h}asiba occurs as negative imperative more than z}anna. First of all, the pronouns used with h}asiba in imperative case must be second person, singular or plural, feminine or masculine. The personal pronoun you is used in 'a direct address language' (Leech, 1966: 34), where the addresser seems as if holding a conversation and talking to the addressee directly. The language of direct address is an appropriate vehicle for effective communication. This sense, namely the imperative mood, coupled with the language of direct address, is significant in religious discourse, where the speaker tries to remedy the defects of the listeners/hearers without any sort of sophisticated locution. The speaker only aims to touch the souls of his/her audience in a simple and short-cut way. In this case, the addressee, the person receiving the message, is the passive part of the speech-act. Is not that proof that h}asiba is a passive word? No, we cannot make that claim before we assess the other part of the description. Thus, the use of h}asiba in this way implies that the message to be delivered is enough to treat a superficial problem that did not find its way to the heart. Secondly, it can reduce the discourse complexity by expressing in just one or two sentences (as in example (19 a & b) above) what would otherwise have been expressed in a lengthy address with z}anna.

As for z}anna, as in (Qur'an 2:154-171), Allah, for example, gives an account, using z}anna, of the behaviour of some Muslims in the battlefield and the remedy of it. He, therefore, gave a lengthy treatment of such a problem, which is cowardice or fear of death, after it had found its way to their hearts. Therefore the main distinction between z}anna and h}asiba is that the former is used for deeply held belief or conviction whereas the latter is for superficial belief (i.e. belief about relatively unimportant issues). z}anna is used throughout the Qur'an subcorpus to mean a state of belief or disbelief that leads either to heaven or Hell-fire. Let us consider the following examples of z}anna:

(20.a) واستعينوا بالصبر والصلاة وإنها لكبيرة إلا على الخاشعين، الذين يظنون أنهم ملاقو ربهم وأنهم إليه راجعون
wa istaciinuu bi-l-s}abri wa al-s}alaati wa innahaa la-kabiiratun illa cala al-khaashiciin, alladhiin yaz}unnuuna annahum mulaaquu rabbihim wa annahum ilayhi raajicuun.
And seek-help-you with-the-patience and the-prayer and verily-it very-big except on the humble who think that-they meeting Lord-their and that-they to-Him returning.
And seek help in patience and prayer, and truly it is hard except for the humble-minded. They are those who are certain that they are going to meet their Lord and that unto Him they are going to return. (The Noble Qur'an, trans by M. Khan, 2: 46)

(20.b) وأما من أوتي كتابه بيمينه فيقول هاؤم اقرؤوا كتابيه إني ظننت أني ملاق حسابيه
wa ammaa man utiya kitaabahu bi-yamiinihi fa-yaquulu … inni z}anantu anni mulaaqin h}isaabiyah.
And as-for who given-(him) book-his in-right-hand-(his) so-says … verily-I thought that-I meeting reckoning-me.
Then as for him who will be given his Record in his right hand will say… Surely, I did believe that I shall meet my Account. (Qur'an, 69: 19-22)
(20.c) وأما من أوتي كتابه وراء ظهره فسوف يدعو ثبورا ويصلى سعيرا إنه كان في أهله مسرورا إنه ظن أن لن يحور
wa amma man uutiya kitaabahu waraa'a z}ahrihi fa-sawfa yadcu… innahu z}anna an lan yah}uura.
And as for who given-(him) book-his behind back-his so-will invoke-he destruction… Verily-he thought that not return.
But whosoever is given his Record behind his back, he will invoke (his) destruction, … Verily, he thought that he would never come back (to Us)! (Qur'an, 84: 10-14)

Now let us have a look at the following examples of h}asiba.

(21.a) قيل لها ادخلي الصرح فلما رأته حسبته لجة وكشفت عن ساقيها
qiila lahaa udkhuli al-s}arh}a fa-lammaa ra'athu h}asibathu lujjah.
Said to-her enter-you the-building so-when saw-it thought-it pool.
It was said to her: Enter the palace, but when she saw it, she thought it was a pool, and she (tucked up her clothes) uncovering her legs. (Qur'an, 27: 44)

(21.b) ويطوف عليهم ولدان مخلدون إذا رأيتهم حسبتهم لؤلؤا منثورا
wa yat}uufu calayhim wildaanun mukhalladuun idhaa ra'aytahum h}asibtahum lu'lu'an manthuuraa.
And round about-them boys everlasting if see-you-them think-you-them pearls scattered.
And round about them (will serve) boys of everlasting youth. If you see them, you would think them scattered pearls.

(21.c) يحسبهم الجاهل أغنياء من التعفف
yah}sabuhum al-jaahilu aghniyaa'a mina al-tacaffuf.
Thinks-them the-not-knower rich from modesty.
The one who knows them not thinks that they are rich because of their modesty. (Qur'an, 2: 273)

We can notice in the above examples that h}asiba is used to describe one's own impression of a particular situation, i.e. it expresses one's personal evaluation of the situation or state of affairs referred to. In (21a), when Queen Belqees visited Solomon, the latter wanted to impress her in a way that makes her believe in Allah. He asked her to enter a glass palace built on water. She had never seen such an edifice before, so she thought nothing was there and tucked up her clothes. She came to her decision just by mere sighting, without deep thinking. So the use of h}asiba here refers to a state of roughly-held perspectives based on non-methodical conception inducted to one's mind or heart through mere sighting, as in (21a-b), or hearing or by prediction, as in (21c).

We can eventually say that h}asiba and z}anna are verbs whose meanings imply a personal element, which is described by Badawi (2000) as an introducer for the relationship that holds between subject and predicate on the basis of one's own point of view. z}anna, as discussed above, is based on personal perspectives residing in one's own mind, with which one can believe in the validity or the invalidity of a given concept. These perspectives can be true with someone and false with another, according to how accurate or inaccurate his perception of something is. But h}asiba describes a personal state attained via feelings or mere senses rather than through facts and knowledge. So the semantic feature which can be deduced out of these differences between h}asiba and z}anna is that the former is immediate reaction (based on one's feelings or mere senses) whereas the latter is considered reaction (based on one's own ideas, obtained after long contemplation).

In conclusion, we have probed two different semantic features that distinguish between h}asiba and z}anna: positive vs. negative and immediate reaction vs. considered reaction. The two features, although apparently unrelated, complement each other. The discourse characterised by h}asiba tends to be an immediate reaction which is mainly positive in the sense that it represents only what is the case. With z}anna, by contrast, it gives the impression of a considered reaction which is mainly a negative report of the events. Therefore, it turns out that the synonymy relation can no longer hold between h}asiba and z}anna.

One more piece of evidence is that if we assume that h}asiba and z}anna are synonymous, we should be able to exchange one word for the other without changing the meaning of the sentence to any great extent. If we try that with the examples above, we get:

(22.a) ولا تحسبن الله غافلاً عما يعمل الظالمون
wa la tah}sabanna Allaaha ghaafilan cammaa yacmalu al-z}aalimuun.
And not think Allah unaware of-what do-they the-wrongdoers.
Think not that Allah is unaware of that which wrongdoers do. (14: 42)

* ولا تظنن الله غافلاً عما يعمل الظالمون
wa la taz}unnanna Allaaha ghaafilan cammaa yacmalu al-z}aalimuun.
And not think Allah unaware of-what do-they the-wrongdoers.
Believe not that Allah is unaware of that which wrongdoers do. (14: 42)

(22.b) أحسبك رجلاً عاقلاً
ah}sabuka rajulan caaqilan.
Think-you man rational.
I think you are a rational man.

أظنك رجلاً عاقلاً
az}unnuka rajulan caaqilan.
Believe-you man rational.
I believe you are a rational man.

The replacement seems to work for the second sentence but not for the first. This is because the addressee in (22a) is the Prophet, who basically believes in Allah's ultimate power and has no doubt that Allah is aware of everything, simply because he is a prophet.
simply because he is a prophet. positive and considered vs. So. 3) refuting the polysemous nature of z}anna as having two opposite senses. because lists of collocations with both are different.6. Then we have identified interesting differences between the pair of words by probing the semantic features of both: negative vs. we have only used MI to highlight how significant the collocations of both words are. Statistically. the difference between them can be brought about by MI. So z}anna which means something based on facts does not fit in here. will be examined below to see if they are absolute synonyms.7 A case study: The word pair h}bb and wdd ‘love’ The synonymous pair. h}bb and wdd .7. we will search all the word-forms that occur in CAC.) he loves him they love her love he beloved her love he loves him he loves her my love your love the lovers pl. Lexical Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ‫يحب‬ ‫المحبة‬ ‫الحب‬ ‫أحب‬ ‫أحب‬ ‫الحبيب‬ ‫تحب‬ ‫المحب‬ ‫تحبون‬ ‫حبه‬ ‫محبته‬ ‫الحباب‬ ‫حبيبي‬ ‫يحبه‬ ‫يحبون‬ ‫حبها‬ ‫المحبوب‬ ‫محبتها‬ ‫يحبه‬ ‫يحبها‬ ‫محبتي‬ ‫محبتك‬ ‫الحبة‬ ‫حبيبك‬ ‫حبي‬ ‫أحبها‬ POS V N N V V N V N V N N N V V N N V N V N N N N N V N Frequency 386 213 209 144 126 104 72 61 58 51 47 46 45 42 38 33 26 26 24 24 17 16 14 13 12 12 he loves the love the love I love he loved the lover (mas. Let us have a look first at the dictionary meaning in table (7. Searching every category separately takes a lot of time but it is more accurate.) love) his love his love the beloved persons my love (masc.19) definitions of h}bb and wdd in four dictionaries Al-Muhiit Having made a search for the exact match we found out that the output is dramatically less than the one reached by using a wild card although the results include all word classes of the above lexeme.) my love he loved her ` 150 . which are commonly taken to mean ‘love’. your lover (masc.) (you (sing. So.) love) the lover (they (Nom. 
The total number of the occurrences of the base-word h}bb in CAC is 1972 and the search result can be represented in the following table. This could leave some word-forms without analysis because they did not occur in our corpus. Table (7.19) below. fem. he loved them the lovers (nom) you love her love one another (pl. to discuss them all will be a tedious work and time-consuming. However.. Nominal Verbal Frequency 638 786 Percentage 45% 55% Table (7.) her beloved person (acc. fem. jussive/ acc..21) above shows that these top ten word forms comprise most of the overall occurrences of the base-word h}bb. h}bb.) N V N N N V N V N N N N V N N V N N V V V N N V V V V N N N N 9 9 9 8 7 7 6 5 5 4 4 4 4 4 4 4 4 3 2 2 2 2 2 1 1 1 1 1 1 1 1 Total: 1972 Table (7...) love (your love) their love the lovers pl. They altogether form 72% of the total frequency.) the beloved person (fem. jussive/ acc.) you (dual) love they (pl.) his beloved person (nom.) love they love one another (pl. Instead. fem.) they loved them love a beloved (acc.20): Lexical Frequency of h}bb in CAC The lexical items in the table above are all derived from the same root.) two lovers (acc.) their love (acc.) the (dual) lovers the lovers he loves them they love one another they love him they (dual) love his beloved (masc.. which ` 151 Total 1424 . we can choose the most frequent items from the above list to analyse and see if we can get a significant understanding of the whole scope.) your love (pl.) the lovers (acc.27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 ‫حبيبتي‬ ‫محب‬ ‫يحبونهم‬ ‫التحاب‬ ‫حبيبا‬ ‫الحبيبة‬ ‫محبتكم‬ ‫تحابا‬ ‫تحبوا‬ ‫محبتكم‬ ‫محبتهم‬ ‫الحباء‬ ‫أحبهم‬ ‫المحبون‬ ‫تحبها‬ ‫تحابوا‬ ‫المتحابان‬ ‫المتحابين‬ ‫يحبهم‬ ‫يتحابون‬ ‫يحبونها‬ ‫يحبان‬ ‫حبيبه‬ ‫تحبان‬ ‫يحبوا‬ ‫يتحابوا‬ ‫تحاببتم‬ ‫حبيبته‬ ‫حبيبتها‬ ‫محبتين‬ ‫محبتهن‬ my lover (fem.) they loved each other (dual) you (pl.) 
you loved one another (pl.21) Statistics of the top ten words of the base-word h}bb in CAC Table (7.. based on corpus linguistics techniques. fem. To be consistent.205 1.223 683.141 69. The list above shows that the base-word h}bb is more frequently used in Fiction. 029.387 V 43 105 107 196 9 23 10 36 26 231 N 7 28 301 82 8 49 11 70 18 64 Total 50 133 408 278 17 72 21 106 44 295 55 . ` 152 . 004. The occurrences of the words under examination in these sub-corpora exceed the overall occurrence of such words in the whole corpus. 070.22): Subcorpus frequencies of the top ten forms of the base-word h}bb in CAC. shows that it is more likely to be frequent in love stories and fiction.000. unlike wdd. Subcorpus The Holy Qur’an Biography Fiction Hadith Lexicons Philosophy Poetry Proverbs Science Theology Text size 88. Total 5. We can also notice that there is no big difference in frequency between verbal and nominal forms.is enough to work on for a realistic result. 004.000 786 640 1424 028.933 579. So let us see how often this word occurs in different kinds of texts. 028. Table (7.970 404.028%. we need either to avoid analysing this item or treat it as exceptional since we are working from the very beginning on just general words as mentioned in 7. the Holy Qur’an and Hadith than in any other text type.385 362. Initial observation of the base-word h}bb.622 393. 033. 040. one can see the distribution of the base-word h}bb (of the above top ten items) in the different subcorpora in the CAC.054 903. They are least frequent in the texts that are considered technical like 55 By percentage Imean the ratio of the item under examination ‘hbb’ per subcorpora. 015. which is .2. This can help us find out whether it is a general word or register-specific.037. In the examples below.080 478. related to love stories and fiction. 030.Perc 056. 
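The subcorpus figures in table (7.22) can be rechecked with a short script. The sketch below is illustrative only: it assumes that the verbal (V) and nominal (N) columns line up with the subcorpus names in the order in which they are listed, and it takes the size of the Qur'an subcorpus (88,387 words) from Appendix 3. The names `SUBCORPUS_COUNTS`, `totals` and `relative_frequency` are hypothetical helpers, not part of any tool used in the study.

```python
# Hits for the top ten forms of h}bb per subcorpus, as (verbal, nominal).
# Assumption: the V and N columns of table (7.22) follow the order in which
# the subcorpus names are listed.
SUBCORPUS_COUNTS = {
    "The Holy Qur'an": (43, 7),
    "Biography": (105, 28),
    "Fiction": (107, 301),
    "Hadith": (196, 82),
    "Lexicons": (9, 8),
    "Philosophy": (23, 49),
    "Poetry": (10, 11),
    "Proverbs": (36, 70),
    "Science": (26, 18),
    "Theology": (231, 64),
}


def totals(counts):
    """Return (verbal, nominal, overall) totals across all subcorpora."""
    verbal = sum(v for v, _ in counts.values())
    nominal = sum(n for _, n in counts.values())
    return verbal, nominal, verbal + nominal


def relative_frequency(hits, text_size):
    """Return the hits as a percentage of the subcorpus size in words."""
    if text_size <= 0:
        raise ValueError("text_size must be positive")
    return 100.0 * hits / text_size


if __name__ == "__main__":
    verbal, nominal, overall = totals(SUBCORPUS_COUNTS)
    print(verbal, nominal, overall)
    # Assumed size of the Qur'an subcorpus, taken from Appendix 3.
    quran_hits = sum(SUBCORPUS_COUNTS["The Holy Qur'an"])
    print(round(relative_frequency(quran_hits, 88_387), 4))
```

Summing the columns in this way reproduces the verbal and nominal totals reported in table (7.21), which is a useful consistency check before comparing registers.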
Freq 105 77 10 9 6 6 6 5 5 5 4 4 4 4 3 3 Right1 who Allah who people he man the son the father the lover the soul sins abandoning their hearts his Messenger repentant the king ‫من‬ ‫ال‬ ‫الذي‬ ‫الناس‬ ‫وهو‬ ‫رجل‬ ‫الولد‬ ‫الوالد‬ ‫البيب‬ ‫النفس‬ ‫الذنوب‬ ‫فراق‬ ‫قلوبم‬ ‫رسوله‬ ‫التوابي‬ ‫اللك‬ . it is probably important in that text. [al]h}ubb and [al]mah}abbah in a window of 2 items on either side of the search-term with a minimum frequency of 3. Therefore. Let us now search our corpus for the words in question to see which dictionary meaning mentioned in table (7. In table (7. This can be done by concordances. this could be an indication that the more general the text is the more likely the word love occurs.19) is the most common and what semantic feature/s are associated with it.23) below is a list of the immediate left and right collocates of the forms yuh}ibbu.Freq 57 47 26 15 12 12 12 11 11 11 11 11 9 8 7 7 Left1 Allah who righteous sexual intercourse repentant corruption Messenger for himself transgressors the pious the perseverant the purified Muhammad Ansar those who trust Allah oppressors ‫ال‬ ‫من‬ ‫الحسني‬ ‫الماع‬ ‫التوابي‬ ‫الفساد‬ ‫رسول‬ ‫لنفسه‬ ‫العتدين‬ ‫التقي‬ ‫الصابرين‬ ‫التطهرين‬ ‫ممد‬ ‫النصار‬ ‫التوكلي‬ ‫الظالي‬ ` 153 . Analysing the co-occurrences of h}bb shows that this word occurs in a pattern. which are able to detect patterns of usage in different contexts. if a word is frequently used in a specific text. . Secondly. but if it is frequently used in all texts. In the first place.linguistics. This can enable us to examine their collocation easily and discover what words they group with. science and philosophy. it is not important in any of them. h}bb is a general word because it occurred in all texts and is used frequently in general texts. however the most frequent left collocate in the list is the word Allah. man 6. son 6. simply because we filtered the results by removing adjunct56 examples. 
“What does X love?” Most of the objects listed in the table above are either good or bad qualities. It emerged that all of the right collocates which can stand for subjects are animate. 57 In Arabic relative pronouns are of three types: +human (e. general (such as alladhi ‘who’ for masculine and allati ‘who’ for feminine). We then can conclude that the base-word h}bb can describe someone’s strong feeling of liking towards something. That thing which is loved can either be animate 56 Non-nuclear elements in the sentence like adverbs. father 5. ` 154 . Also we have objects like Messenger. dog 3. The concordances show that the most frequent subject in the list is the relative pronoun ‘who’ 105 times57. woman.e. ma ‘which’).g.3 3 3 3 the world faith Zabeedah dog ‫الدنيا‬ ‫اليان‬ ‫زبيدة‬ ‫الكلب‬ 6 5 4 4 4 4 4 4 3 3 3 3 3 3 3 3 disbelievers his action the poor who man the soul woman praise optimism fun sleep traitors food people life ‫ أحدكم‬one of you ‫الكافرين‬ ‫عمله‬ ‫الفقراء‬ ‫الذين‬ ‫الرء‬ ‫النفس‬ ‫امرأة‬ ‫الدح‬ ‫التيمن‬ ‫اللهو‬ ‫النوم‬ ‫الائني‬ ‫الطعام‬ ‫الناس‬ ‫الياة‬ Table (7. did not constitute subjects or objects for the verbs or complement of the noun phrase. Studying the right and left collocates of h}bb (verbal and nominal) can reveal potential subjects. fun and sleep. -human (e. Those examples.23): The base-word h}bb in a window of two items on either side. people 9. man ‘who’). left or right collocates). adjectives etc. the base-word h}bb reflects one’s inner feeling of liking something. We can also examine the objects and then ask. the word Allah 76 times. man. soul 5. Not all hits are represented in this table or discussed below. food. although they contained the desired lexemes (i. So.g. ) their love his love you love him the lover 155 N N 37 V V V V N N V N N N N N N N V V N N N V N . (2) friendship.) your love (sing. sleep.) she loved love their love (mas. 
which is more problematic because there is no consistency in explaining its meaning in the Arabic Qur’anic exegeses and in translating it afterwards..) my love love (acc.) her love your love (mas. (3) sexuality. pl. man or inanimate such as food. and (5) non-human objects. fun.. Let us now have a look on the other item of the pair: wdd. acc. (4) family. Lexical Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ` ‫المودة‬ ‫الود‬ ‫يود‬ ‫ود‬ ‫تود‬ ‫أود‬ ‫مودته‬ ‫الودود‬ ‫يودون‬ ‫ودي‬ ‫وده‬ ‫مودتي‬ ‫ود‬ ‫مودتك‬ ‫مودتها‬ ‫مودتكم‬ ‫ودت‬ ‫تواد‬ ‫ودادهم‬ ‫ودهم‬ ‫وداده‬ ‫توده‬ ‫الودود‬ POS Frequency 78 32 23 13 11 10 8 7 7 5 6 5 3 3 3 3 3 2 2 2 2 2 the love the love he/they love he/they loved you love I love his love (mas. kindness or friendship as will be discussed below.) the lover they love I like his love (fem. The collocates of h}bb can be summarised in terms of frequency in the following domains: (1) religious experience.such as people. It is sometimes translated as affection. is used to refer to humans only. The only semantic feature that can be extracted out of these examples is that wdd co-occurs with +human lexemes whereas h}bb is more general as it can co-occur with +animate lexemes. .Freq 12 11 10 4 4 3 58 Left1 one of them ‫أحدهم‬ who ‫الذين‬ many of ‫ كثي‬them somebody ‫فلن‬ the friend ‫الصديق‬ family ‫أهل‬ Table (7. Examining the first node on the left and right hand side of wdd. attached or detached.) my love love him the love they love they love them her love he loves you he loves him N N N N N N V N V V V V V 1 1 1 1 1 1 1 1 1 1 1 1 1 Total: 280 Table (7. the frequency of wdd constitutes only . ` 156 . 58 In Arabic.005% (280 occurrences) in the whole corpus whereas h}bb is .24): Lexical Frequency of wdd in CAC We can notice that the overall frequency of wdd in the whole corpus is far less than h}bb. as represented in table (7.25): The base-word wdd in a window of two items on either side. 
does not give as much significant information about collocation as h}bb. the personal pronoun in plural masculine position. This can be represented in the following table.24 25 26 27 28 29 30 31 32 33 34 35 36 ‫مودتنا‬ ‫وداد‬ ‫ودادي‬ ‫مودتهم‬ ‫ودادكم‬ ‫ودادتي‬ ‫أوده‬ ‫التواد‬ ‫يودوا‬ ‫يوادونهم‬ ‫ودها‬ ‫يودك‬ ‫يوده‬ our love love my love their love (mas.039% (1972 occurrences).25) above.) your love (mas.Freq 15 who Right1 ‫من‬ . Al-Nisa’:102) (25) ` 157 . if you were negligent of your arms and your baggage. law (if) followed wdd (verbal) 27 times.ً‫.. Al-Baqarah: 96) (24) . We can say then that wdd behaves like verbs of imagination such as hope and wish because of the similarity between wdd as a verb having if-clauses following it and these verbs having the same function. to attack you in a single rush. Let us have a look at the following examples... (23) . و ّ اّل ِينَ كَفَرُواْ َلوْ َتغْفُونَ عنْ َأسلِحتِكمْ وأَمِتعتِكمْ فَيمِيُونَ عليْ ُم مْيلةً وَاحدَة‬ ِ َ ّ ‫ُل َ ْ َ ُ َ ْ َ ُ َ ل ََ ك‬ ‫َد ذ‬ wadda alladhiina kafaruu law taghfluuna… fa-yamiiluuna calaykum Wish those disbelieve if neglect-you about arms-your and baggage-you so-attackyou attack one.26) Left and right collocates of h}bb and wdd wdd (R) X X √ √ X X X X X Further analysis of the same collocates without applying a filter may reveal other semantic features invisible to us within a span of two..h}bb (L) h}bb (R) wdd (L) Allah √ √ X Messenger √ √ X Man √ √ √ Woman √ √ √ Food √ X X Sleep √ X X Dog X √ X Sexual intercourse √ X X Fun √ X X Table (7. (Qur’an. Everyone of them wishes that he could be given a life of a thousand years. (Qur’an. For example. ‫َيو ّ أَحدهُمْ َلوْ ُيع ّرُ أَلْفَ سنة‬ ‫َم‬ ُ َ ‫َد‬ yawaddu ah}aduhum law yucammaru alfa sanatin wish one-of-them if live-long thousand years. law did not co-occur at all with h}bb. on the other hand. Those who disbelieve wish. 
In a span of five on each side of the search-term we found out that that word tends to occur in a certain semantic profile different from h}bb. However. 55 of which are followed by law (if). This apparently applies to wdd in verbal forms. (Qur’an. This leaves 57 examples and after examining them we do not get any interesting collocation either. as shown in table (7. So we will exclude these 55 examples to get a reliable comparison between h}bb and wdd meaning affection. he will wish that there were a great distance between him and his evil.‫. wdd in verbal form occurred 112 times in CAC.Freq Left1 . Below we will examine the ` 158 . Right 1 . َيوْمَ َتجِ ُ ك ّ َنفسٍ ّا عمِلتْ منْ خْيرٍ محْضَرًا وَمَا عَملتْ مِن سوَءٍ َتو ّ َلوْ أَ ّ َبْيَنهَا وَبْينَ ُ َأمدًا َبعِيدًا‬ َ ‫َ ه‬ ‫ُ َد ن‬ َِ ّ َ ِ َ َ ‫د ُل ْ م‬ yawma tajidu kullu nafsin maa camilat min khayrin muh}d}ara wa ma c amilat min suu’ tawaddu law anna baynaha wa baynahu amadan baciidaa day find every soul what did of good present and what did-it of evil wish-it if that between-it and between-it distant time.27) below. and all the evil he has done..Freq 5 5 5 2 ‫ومن‬ ‫أن‬ and who 5 that ‫ما‬ ‫كثي‬ not 4 many ‫َد‬ ّ‫و‬ in his ‫ بقوله‬saying 3 wished ‫قوله‬ ‫الذين‬ His saying 3 who Table (7. Therefore we will exclude examples containing if-clauses. That extra sense (wish) is obviously a good distinction between h}bb and wdd. this is mentioned in the dictionary meanings. to make the analysis to find subtle differences between h}bb and wdd we need to stick only to one side of the meaning: affection. Al-Imran 30) The use of wdd followed by an if-clause in the above examples sheds light on the possibility of using this word to mean either love or wish.27): Collocates of wdd after excluding the instances followed by if. because none of the if-clauses occurred after wdd in nominal form in CAC. On the Day when every person will be confronted with all the good he has done. 
As looking into the concordances of wdd in a small span does not show any significant collocation we need to increase the span to see whether we can get any particular distribution of that word. tadnu ‘come closer’. We found this test useful in summarising the whole data which we can use for further analysis. yabdhul ‘give’.word wdd meaning affection in nominal forms. wdd occurs more frequently with verbs that mainly describe a concrete or observable action such as ta’ti ‘come’. ` 159 . yud}mir ‘hide’. We only selected the verbs with minimum frequency 3 for the test below. yaddaci ‘claim’. In case of a limited list like the one we have in table 7. we found the following results: 1) None of the intensifiers or adverbs of degrees. 3) The verbs that co-occur with h}bb mainly describe an abstract or unobservable action: tahakkama fi ‘control’. 4) The preposition fi ‘IN’ occurs 37 times with the verbs that precede h}bb. yussir ‘does discretely’. tanqatic ‘cut off’. kathrat ‘much’ and zaa’idah ‘exceedingly’. yufrit} ‘exaggerate’. yarzuq ‘bless’. yuksib ‘cause to gain’. zaa’idah ‘excessively’ (5). sakanat fi ‘rest in’. yas’al ‘request’. such as shadiid ‘very’. yucadhib ‘torture’. The last four verbs tend to be concrete. did occur with wdd. ra’a ‘see’. The last verb is the only example which describes an abstract action. mazaja ‘establish’. kathrah ‘much’ (4). 2) Some verbs occur more often with wdd than with h}bb. whereas h}bb occurred with intensifiers like shadiid or shiddah ‘very or strong’ (17 times). ad}aaca ‘waste’. zaada fi ‘increase’.29 we prefer to run the t-test statistic only. whereas it occurs twice only with wdd. incaqad ‘interlink’ and jacala ‘make’. and adverbs like fart}u ‘exceedingly’ (6). waqaca fi ‘fall in’. yaquum cala ‘maintain’. We then need to compare the two sets of verbs and determine how likely the difference between the two sets occurred by chance. yajlub ‘bring’. For instance. yu’thir ‘prefer’. 
59 I used to only search the items whose MI scores are significant. this can be done by the t-test59. tashtadd ‘strengthen’. yuz}hir ‘disclose’. yunaasih} ‘does sincerely. Having searched the concordances of wdd and h}bb in that bigger span. abana ‘show’. tarjuu ‘wish’. ‘alqa fi ‘put in’. 4 . taghalghala ‘establish’..7 P< 0.20 2. 0.02 P< 0.Not sig 1.3 P< 0.V fall in increase establish bring keep come cut does discretely claim request give disclose ‫ف يقع‬ ‫يزداد‬ ‫تغلغل‬ ‫يلب‬ ‫يفظ‬ ‫أتى‬ ‫قطع‬ ‫يسر‬ ‫يدعي‬ ‫يطلب‬ ‫يبذل‬ ‫أظهر‬ f(h}bb /w) 21 10 5 5 0 0 0 0 3 0 1 2 f(wdd/w) 1 0 0 0 5 4 4 3 0 3 3 3 Gram. In the table above the higher the t-score the more different the pair under examination. it co-occurs with verbs that describe an abstract action.20 P< 0.29) T-score of h}bb and wdd (nouns).1 2.3 2.7 1.7 1.0 1. yajlub ‘bring’ and yaddaci ‘claim’.0 2. Function O S S O O S O O O O O O T 4.20 .. We can notice that h}bb gets the higher t-score in the context of verbs like waqaca ‘fall’.20 P< 0.20 2.26 3.3 Significance P < 0.0001 P < 0.Not sig . zaada ‘increase’.0 Not sig.Not sig Table (7. On the other hand wdd gets higher t-score when co-occurring ` 160 .Not sig . and have (on the contrary) driven out the Messenger and yourselves (from your homes). even though they have rejected the Truth that has come to you. cut. in a span of five on both sides.th. the following Qur’anic verse can be a piece of evidence in favour of the above conclusion as shown (26): (26) O ye who believe! Take not my enemies and yours as friends (or protectors). we can conclude that wdd (as in result 1) is more emphatic than hbb. the frequent use of motion verbs with wdd (as in 2 and 3 & table (7. wdd is used with verbs that express a practical action which affects somebody else. In other words. s. which might be an indication that h}bb tends to be contained or lying in a particular place. the preposition IN. it expresses an abstract action like X falls in love. 
with verbs that refer to a concrete action, such as ‘come, keep, cut, request and give’.

On the basis of the above results (1-4) and table (7.29), we can conclude, first, that wdd (as in result 1) is more emphatic than h}bb, because intensifiers are superfluous items used to amplify actions, so the absence of intensifiers often indicates more emphasis. Secondly, the frequent use of motion verbs with wdd (as in 2 and 3 and table (7.29)) shows that wdd is more concrete than h}bb. In other words, wdd is used with verbs that express a practical action which affects somebody else, such as cutting a relation with him, maintaining a relation with him, asking him, giving him something, etc. So we can conclude that wdd is +emphatic and +concrete. As for h}bb, it expresses an abstract action like X falls in love, love increases, love is established in his heart, X brings love, X claims love. By abstract action, I mean a private action which does not necessarily affect the recipient. Thirdly, the preposition IN, which means containment or inclusion, i.e. locating or limiting the activities of the contained entity, occurs more frequently with h}bb (as in 4), which might be an indication that h}bb tends to be contained or lying in a particular place. Finally, a further look at the concordances of the pair, in a span of five on both sides, reveals that qalb ‘heart’ co-occurs 79 times with h}bb and only once with wdd. Because the word heart co-occurs more frequently with h}bb, this indicates that there is a strong bond between them and that the heart is traditionally and psychologically connected to feelings like h}bb. This gives further evidence that h}bb is an abstract feeling.

Moreover, the following Qur’anic verse can be a piece of evidence in favour of the above conclusion, as shown in (26):

(26) O ye who believe! Take not my enemies and yours as friends (or protectors), offering them (your) mawaddah (love), even though they have rejected the Truth that has come to you, and have (on the contrary) driven out the Messenger and yourselves (from your homes), (simply) because ye believe in Allah your Lord! (Qur’an, Al-Mumtahinah: 1)

This verse was revealed about a man (Hatib ibn Abi Baltacah) who was in the Muslim army heading towards Makkah to liberate it from Pagans. He sent a message to the pagans of Quraysh requesting protection for his children and relatives left behind in Makkah in return for information about the Muslims’ strategy and weaponry being prepared to conquer Makkah. When the man was caught he declared that he hated those people to whom he sent the message and he was truthful about his feeling. The Prophet said he was truthful. He only intended to do the Makkah people a favour by virtue of which his family and property in Makkah may be protected. This story is recorded in the Qur’an where Allah described the favour he did towards the People of Makkah as wudd. Therefore, based on the above remarks, the semantic feature that can be extracted to differentiate between h}bb and wdd is abstract vs. concrete.

Secondly, wdd is more general than h}bb. In other words, h}bb is devoted to particular persons, i.e. the pious who are real true believers. This is because all occurrences of h}bb in verbal forms with Allah show that Allah loves particular people who are righteous and does not like the wrongdoers. Thirdly, one of Allah’s names is al-waduud (the Loving). This is because, if Allah named Himself Al-h}abiib, this would exclude some people from His bounties and blessings, which are available to all people. So if He named Himself Al-h}abiib this would be a static attribute that eliminates some people forever, which could be in Allah’s sight. Fourthly, h}bb (love) is commonly understood, in the first place, as a bond between two entities and some kind of need. Secondly, it is a state of lack of control. These apparently do not fit with Allah’s perfection.

7.8 Conclusion

First, the widely claimed four synonymous pairs discussed above can be summarised as follows:
•intrinsic vs. extrinsic (as between ithm & dhanb)
•closer vs. further (as between ata & jaa’a)
•negative vs. positive (as between ata & jaa’a, h}asiba & z}anna)
•considered vs. immediate reaction (as between z}anna & h}asiba)
•abstract vs. concrete action (as between h}bb & wdd)

Secondly, we used the following methodology to test the synonymy between two items of a given pair:
•Using a corpus to help get hold of all the occurrences of the pair under investigation quickly and accurately.
•Identifying the word class of a given item. This is important in looking for collocation because it enables us to know which word is more significant. For example, the collocation of a word which is a verb is more likely to be found on the right-hand side in Arabic.
•Determining the syntactic function of the term under investigation. This is done manually. It is also important because sometimes we need to look at the complement of an item.
•Analysing collocation.
•Applying statistics to find anything interesting about their distribution.
•Analysing the context to understand how/when the variants are used (semantic prosody).
•Identifying a semantic feature of the search term according to its contextual use.
•Substituting one word for the other to see if any change happens in the meaning of the sentence.

Chapter Eight: Conclusion

Arabic corpus linguistics is a very active area, yielding various contributions. I had to rework what I have done several times because of the incessant contributions in this field, especially when discussing the available corpora and tools that work on the Arabic language.

One of the main contributions this study made is providing a computational Arabic corpus of early classical Arabic. This corpus will be available for research purposes to be exploited in NLP applications for Arabic and for more accurate analysis of Arabic linguistic phenomena. With regard to size, Arabic corpora should be big enough to be reliable for generalisation. This is because of the inflectional nature of Arabic and the abundance of its vocabulary (cf. p. 99). For example, due to the richness of Arabic vocabulary, in Arabic a given word is expected to appear less often than in an English text of the same length (Goweder and de Roeck (2001)).

There seems to be no corpus-based research directly analysing synonymous words in Arabic, classical in particular. Corpus-based analysis of items which are often regarded as roughly synonymous in Arabic can highlight subtle differences in meaning among such items. This can be done by abstracting semantic features through comparing differences observed in their contextual idiosyncrasies and examining practical examples of the usage of such items. In this way, absolute synonyms can be ruled out if we come across one context in which one of the synonymous pair carries more meaning, has a different distribution or is used in a different register. Also, with the aid of statistical techniques we can have an accurate account of whether there are systematic differences in the use of certain types of seemingly synonymous words by summarising their distribution in the corpus.

The corpus-based analysis can be used as a successful methodology for testing what has been introduced by early linguists on all linguistic levels (morphology, syntax, semantics, etc.). More than that, it can give new insights and introduce rules and models which have not been previously discussed.

Final findings suggest that applying corpus linguistics methodology to Arabic can help us improve lexical awareness and choice, as most Arabic linguists are unaware of the collocational differences between synonymous pairs, let alone ordinary native speakers of Arabic.

The results given throughout my work imply a need for a fresh look at Arabic studies. Nonetheless, I do not claim that my analysis is correct or privileged, but rather that it is more methodical and systematic than one based on intuition. The new and unexpected shades of meanings will raise lots of questions about the credibility of most old and modern Arab contributions in the following fields:
1) Lexicons
2) Interpretation of the Holy Qur’an
3) Translation of the Qur’an
4) Jurisprudence
5) Prophetic Traditions (Hadith)
6) Poetry
7) Linguistics

In lexicography, for example, had the dictionary-makers been aware of the subtle differences and uses of seemingly synonymous words they would have made more accurate definitions. Suppose we use the corpus-based methodology to build up an Arabic lexicon. Although Nijmegen University has recently managed to create that kind of corpus-based lexicon, it is restricted to texts written in Modern Standard Arabic. As mentioned elsewhere, the macro structure of an Arabic-Dutch dictionary contains 24,000 words. Since the prevailing view is that the Arabic vocabulary is very extensive, we might ask ourselves if a dictionary containing 24,000 words will serve the user sufficiently when reading or listening to Arabic.

In the field of Qur’an exegesis lots of work has been done but based on the old perspectives: non-corpus-based. The outcome was huge; however, some verses are left either vague or misinterpreted because of the vagueness of some lexemes, as in verse 2:78:

wa minhum ummiyuuna la yaclamuun al-kitaaba illa amaaniyya wa in hum illaa yaz}unnuun.
And from-them unlettered not know-they the-book except wishes and but they think-they.
And there are among them unlettered people, who know not the Book, but they trust upon false desires and they but guess. (Qur’an, 2: 78)

The verse above talks about some Jews who are illiterate and do not know the reality of their book; they follow their scholars blindly and believe them. This is a different category from those who know the truth and falsify it, mentioned in the verse preceding it (2: 75). So if we interpret z}anna as doubt, as commentators like Al-Tabari and ibn Katheer say, we presume that that second category of Jews who do not know the reality of their book do not believe in it. But this is not the case, since this category is blindly following their scholars and this is a type of belief. We would rather say there are some Jews who only know the false version of the Bible and they are certain about what they know even if it is false. This meaning cannot be attained by simple study of the word; it rather requires an accurate probing of the whole senses of the word based on the corpus methodology.

As for the translation of the Qur’an, it is basically based on its exegesis. It depends on the same methodological approach of the author of the exegesis.

In jurisprudence, many of the arguments between Muslim scholars and schools of thought arise from their own understanding of the language of the Qur’an and Hadith. One of the main reasons for such differences is their linguistic differences concerning some texts of the Holy Qur’an and Prophetic traditions on the syntactic or semantic level. This sometimes leads to differences in understanding and formulating laws derived from such texts. For example, s}uurah as used in hadiths is interpreted as ‘picture’. This is the opinion of a big group of Muslims nowadays called Salafis, who understand s}uurah as a picture. Such an interpretation could lead to forbidding all types of painted pictures or photographs. Another group of Muslim scholars interpret s}uurah as statue because this is the meaning which was current in the Prophet’s lifetime. They further argue that this ruling is only applicable to statues which are made to be respected and worshipped.

Corpus-based analysis can distinguish between the different senses of a given word synchronically or diachronically. With this methodology, a particular sense of a word is clarified.
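As an illustration of the kind of procedure this methodology relies on, the collocation and statistics steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the programs written for this study: `window_collocates` and `contrast_t_score` are hypothetical names, the corpus is assumed to be already tokenised, and the score shown is one common approximation for contrasting how often a collocate occurs with each member of a pair. The chapters above do not spell out the exact formula behind the reported t-scores, so the formula here is an assumption.

```python
from collections import Counter
from math import sqrt


def window_collocates(tokens, node_forms, span=2, min_freq=1):
    """Count the words found within `span` tokens of any form in `node_forms`.

    Returns two dicts: collocates that precede the node in reading order and
    collocates that follow it. Words below `min_freq` are dropped.
    """
    before, after = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token not in node_forms:
            continue
        for j in range(max(0, i - span), i):
            before[tokens[j]] += 1
        for j in range(i + 1, min(len(tokens), i + span + 1)):
            after[tokens[j]] += 1

    def keep(counter):
        return {word: freq for word, freq in counter.items() if freq >= min_freq}

    return keep(before), keep(after)


def contrast_t_score(freq_with_a, freq_with_b):
    """Approximate t-score for one collocate seen with word A versus word B.

    Assumed formula: (f_a - f_b) / sqrt(f_a + f_b). A large positive value
    means the collocate favours A; a large negative value means it favours B.
    """
    total = freq_with_a + freq_with_b
    if total == 0:
        return 0.0
    return (freq_with_a - freq_with_b) / sqrt(total)


if __name__ == "__main__":
    # Toy token list standing in for a tokenised corpus.
    toy = "a b X c d X e".split()
    print(window_collocates(toy, {"X"}, span=2))
    print(round(contrast_t_score(21, 1), 2))
```

For a collocate seen 21 times with one word and once with the other, this approximation gives roughly 4.26. Collocates are labelled by reading order rather than by left and right, because the two are easily confused when right-to-left Arabic text is displayed.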
Appendix 1: Copyrights

Muhaddith website

Conditions for copying books from our site (taken from www.muhaddith.org):

You may copy books from our site to another, according to the following conditions:
•The usage of the book on your site must be for non-commercial purposes.
•Next to each book you copy, mentioning that your source is “Al Muhaddith Project”, and adding a link to our site.
•Give proper notice concerning books that are not permitted to use for commercial purposes. As an example, refer to our note concerning Ibn Katheer’s summary by Sabooni.
•Upon completion, informing us and sending us a link to [email protected]

Alwaraq website

From: “Moutasem Zakkar” <moutasem@cosmos-software.com>
To: <Abdel-Hamid.Elewa@student.umist.ac.uk>
Subject: Re: I need permission for downloading
Date sent: Sat, 2 Mar 2002 10:30:37 +0400

Dear Sir : Thank you for your message . You can get any page you want from any book by pressing the button whose hint is “Send me this page” , and you will have to put your Email address and then you will recive it immidieatly . In the future you will be able to Download any book after paying a fee .
Regards
Moutasem Zakkar
Technical manager

----- Original Message -----
From: “Abdel-Hamid Elewa” <Abdel-Hamid.Elewa@student.umist.ac.uk>
To: <moutasem@cosmos-software.com>
Sent: Tuesday, February 12, 2002 5:37 PM
Subject: I need permission for downloading

> as-salamu alaykum.
> thx alot for the big effort you have done in Alwarak
> project. Can I get hold of some books from that project for
> the sake of research as my Ph.D project is on Arabic
> linguistics and I need to work on computerised Arabic
> texts. Just give me permission and I can download some
> pages from the your site.
> yours
> Elewa
> Department of language engineering
> Centre for Computational Linguistics
> UMIST
> Manchester
> UK
Genre: Thought and Belief Subgenre Texts belief & linguistics The Holy Qur’an The Holy Qur’an thought Prophetic 1.Ara’ Ahl Al-Madinah Al-Fadilah by Farabi 2.970 393.Al-Mutanabbi poems 1.5 362.6 7.Majmac Al-Amthaal 2.Arabian Nights 2.037.141 Perc.Jamharat Al-Mthaal science Text Size 88.579.385 collected 3.387 20.seerah of Ibn Hisham 2.Logic by Ibn Sina 3.Jurisprudence (fiqh): Al-Risalah by Al-Shafi’i 3.622 683.Appendix 2: mathematics The contents of the CAC are summarised in the following charts (1) & (2): physics geography lexicons proverbs fiction poetry Qur'an Hadith (1) Chart biography philosophy theology (2) Chart medicine Appendix 3: Genres and texts included in CAC.Al-Mucallakt 2.The Misers 1.2 ` 170 .8 9.5 Theology 1.223 1. 1.054 7.Dogmatics (al-’Aqeedah): AlIbanah by Al-Ash’ari 1.Exegesis of the Qur’an (Tafseer): by Al-Tabari) 2.933 478.7 Literature: Poetry Fiction Linguistics Proverbs 69.Sahih Al-Bukhari literature Tradition (Hadith) 2.Al-Akhbar Al-Tiwal 1.Sahih Muslim Biography Philosophy 1.Al-Falsafa Al-Ula by Al-Kindi 1.8 13.3 11. 15 14.Lexicons Science Geography Physics Medicine Mathematics 1. ` 171 .46 1.72 .684 1.469 26.Al-’Ayn (Al-Khalil 2.553 736.0 82.080 8.53 Al-Jamaher Fi Ma’rifat al-Jawahir by Albiruni Al-Qanun fi Al-Tib by ibn Sina Mafatih Al-’Ulum by Al-Khawarizmi Appendix: 4 A sample of concordances as appearing on the Monoconc window.Fiqh al-Lughah (Al-Tha’alibi Ahsan al-Taqaseem fi Aqaleem by Al-Maqdisi ma’rifat al- 404.499 57. ` 172 .Appendix 5: A picture of the concordance lines run by Monoconc and then saved to an only-text file. ` 173 .. ) Imad al-Barudi. M. A. (ed. Cairo. K. Damascus. Cairo. A Detailed Dictionary of the Arabic Language). Al-Tabari. Al-Ashcari. (1991).Bibliography Aijmer. Almuhanna. (b. A.) Abu al-Fadl Ibrahim. (2003). Al-Hamadhani. Longman. Daar Al-Kitaab Al-cArabi. Slatkine. R. Scientific and Technological Term Transfer into Arabic: A CorpusBased Study of Arabic Noun + Noun and Noun + Adjective Compounds. 
