Multi-category Words of Verb-Noun in Chinese Learner’s Mobile App Dictionary and Their Applications for Dictionary Compilation

Abstract: One of the most important and hotly debated topics in modern Chinese studies is the classification of word classes. Among all word types, multi-category words are a special and difficult case, especially verb-noun multi-category words (VNs) for students learning Chinese as a second language. To help them learn this kind of word, several teaching methods and textbook compilation methods have been proposed, but only a few studies have focused on dictionary compilation. Yet, as a significant tool for second language learning, the dictionary plays an important role in helping learners improve their productive ability in using multi-category words. This paper examined the definitions of verb-noun multi-category words given by different scholars and adopted Yu's criteria in view of the needs of learning Chinese as a second language. Common colligations and collocations of five VNs in native Chinese were extracted from two corpora and the Online Chinese Collocation Assistant tool. Then, taking 说明 as an example, this paper analyzed the differences in its use by native Chinese speakers and Chinese learners, with a particular focus on its colligations and collocations. The results reveal that Chinese learners overused it as a verb and underused it as a noun, so that their use of the two parts of speech was imbalanced. Finally, this study proposed a better way to represent 说明 in a learner's dictionary to help Chinese learners use this word.


Introduction
The multi-category word is a special type of word in Modern Chinese that has received a great deal of scholarly attention. Although definitions of multi-category words differ, there is a consensus that such words belong to two or more parts of speech simultaneously. Verb-noun, verb-adjective and adjective-noun words are the three common types of multi-category words, with verb-noun words being the most frequently used two-category words [1]. According to Hu, 12.91%-19.33% of multi-category words are verb-noun words (VNs), which is a large proportion of all multi-category words [2]. However, no unified criteria for VNs in modern Chinese have been agreed upon, so it is necessary to settle on an operational definition that is useful for learning Chinese as a second language. Because VNs are both frequent and special, Chinese learners face many difficulties in using them in written and spoken Chinese. They cannot fully master both categories of a word and tend to use only one part of speech. Furthermore, they cannot use the common colligations or collocations accurately.
To help Chinese learners improve their ability to use VNs, researchers have proposed a series of methods for teaching these words [3]. In addition to these teaching suggestions, the dictionary is also an important tool for language learning. However, even though there are some classical Chinese learner's dictionaries, their compilation still has many problems, such as the lack of frequency-based representation, easily understandable definitions, and learner-supportive explanations. Therefore, this article aims to find a suitable definition of VNs for Chinese language learning, to investigate the use of VNs by Chinese learners, and to propose a solution to their problems in using VNs through a mobile app dictionary method. This method features a detailed corpus-based comparison of the colligations and collocations of VNs used by native Chinese speakers and by Chinese learners.

Multi-category words of verb-noun
Multi-category parts of speech are one of the most urgent and complex problems in modern Chinese [4]. Multi-category words are words that regularly take on the grammatical functions of two or more parts of speech and whose uses are related in meaning. Accordingly, verb-noun multi-category words (VNs) are words that can be used both as a verb and as a noun. Take the word 说明 shuo1ming2 as an example: in "说明情况 shuo1ming2 qing2kuang4" (explain the situation), it is used as a verb; in "药物说明 yao4wu4 shuo1ming2" (the drug instructions), it is used as a noun. Zhu proposed two criteria for identifying VNs: 1) they can directly modify nouns without the word "的" de, or be modified by other nouns; 2) they can be the object of words such as "有 you3, 进行 jin4xing2, 受到 shou4dao4, 加以 jia1yi3, 予以 yu3yi3" [4]. Similarly, Hu [5] proposed three standards for identification: 1) the word can be directly modified by nominal measure words; 2) it can directly be the object of "有 you3"; 3) it can be directly modified by a noun.
The criteria proposed by Zhu and Hu are both based on grammatical function. But the identification of words cannot be separated from their meaning, and definitions may vary from case to case. Another scholar, Lu, gave three definitions of VNs from different perspectives: 1) for the needs of ontological research, words that are homophonic and homonymous but have different parts of speech; 2) for the needs of Chinese teaching, words with the same word form, the same sound, and a very close relationship in meaning, but with different parts of speech; 3) for the needs of Chinese information processing, words with the same word form and the same sound but different meanings or different parts of speech [6].
However, the standard of "very close relationship" is vaguely defined, so Yu proposed that the verb and the noun of a dual-category word need to be semantic roles of each other [7]. For example, in the sentence "用锁2锁1门 yong4 suo3 suo3 men2" (use the lock2 to lock1 the door), the noun "lock2" is the tool of the verb "lock1", and the verb "lock1" is the function and role of the noun "lock2".
In conclusion, taking grammatical function, the real needs of teaching Chinese, and meaning into consideration, I adopt Yu's criteria as the definition of VNs.

The difficulties of VNs use for Chinese learners
As a very special kind of word, the multi-category word comes in complex types and large numbers, which is why it has always been a key and difficult point in Teaching Chinese as a Second Language (TCSL) [8]. For foreign students, especially those whose native language belongs to the Indo-European family, multi-category words are very difficult to learn, because Indo-European languages have explicit morphological markers to indicate a word's part of speech, whereas Chinese does not and tends to have words with multiple categories. Among multi-category words, VNs are the most frequent and deserve sufficient attention in teaching, and a great number of researchers have therefore studied the errors Chinese learners make in learning VNs.
Long found two types of errors in the use of VNs: non-part-of-speech errors and part-of-speech errors [9]. He pointed out that the former can be divided into three types, of which the most common is the substitution of VNs by other words. From his analysis, he concluded that one cause of such errors is improper collocation, but he offered only several examples and not enough data to support this idea. Building on his study, Wang made a corpus-based comparative analysis of VN collocations between native Chinese speakers and Chinese learners using the CCL corpus, the Modern Chinese Corpus of the State Language Commission, and the HSK Dynamic Composition Corpus [8]. She found that most of the typical colligations frequently used by native Chinese speakers are underused by Chinese learners: they account for a low proportion, or do not appear at all, under the different parts of speech in the interlanguage corpus, while a few typical colligations are used too frequently. As a result, learners' colligations are limited in both frequency and variety. In addition, many typical collocations under the different parts of speech of each multi-category word are used with low frequency, or the collocations that learners use are not typical enough.
Although Chinese learners have a number of problems in using VNs, there is limited research on helping them reduce these mistakes. Considering that collocation is an important factor in learning VNs and that dictionaries usually present collocations as illustrative examples, I offer some suggestions from the perspective of dictionary compilation.

The Learners' dictionary compilation system
Learners' dictionaries should differ considerably from those for native speakers in compiling style, word entries, lexical definitions, and illustrative examples in order to help language learners use sentences accurately. With this aim in mind, Laufer suggested a special dictionary for productive purposes, in which an entry should include three parts: (1) L1-L2 translations, followed by information about the L2 word (definitions, examples, etc.); (2) semantically related words; (3) additional L1 meanings of the L2 translations [10].
These requirements can be fulfilled by an electronic dictionary, since such a dictionary can combine the features of an L2-L1 bilingual dictionary, an L1-L2 bilingual dictionary and an L2 monolingual dictionary [11]. Hurskainen later proposed a new corpus-based approach to dictionary compilation whose core features include (a) single-word headwords; (b) multiword headwords; (c) various types of cross-references; and (d) a user-defined selection of use examples in context, including controlled random selection and selection based on frequent contexts [12].
But the electronic dictionary can easily fall into the trap of the "corpus only theory" [13], which is divorced from the real needs of ordinary learners. As learners' dictionaries serve users of different levels of proficiency, the examples they provide cannot always be cited exactly in the form in which they appear in the corpus, but need to be carefully edited before being placed in the dictionary [14].

The compilation system of Chinese learner's dictionary
Cheng proposed a "QEI" principle, Q (Quick), E (Easy), I (Immediate), under which the compilation method must consider three elements: (1) the time pressure that non-native speakers face in learning Chinese; (2) an entry arrangement based on syntactic-semantic units that are easy to remember and use as a whole; (3) the setting and matching of term clues for syntactic distribution patterns in the translation of bilingual illustrative examples [15].
Furthermore, since dictionaries for English learners have been very successful worldwide, their compilation methods are worth learning from. For example, Zhang noticed that mainstream contemporary English learner's dictionaries pay great attention to the application of linguistics and show the semantic structure of the defined words from various aspects of the language, including morphology, syntactic collocation and semantic discrimination [16]. Consequently, their microstructure contains more information items, generally around 25-30, while TCSL dictionaries contain far fewer, generally no more than 10. Although Chinese learner's dictionaries have paid attention to the communicative mode of words and try to express the rules of word use through style and interpretation, the way of expression is vague and the syntactic pattern is separated from the concept, so that ordinary users can hardly understand the words, let alone imitate and use them. Zhang therefore suggested incorporating information that reflects the distribution characteristics of words, such as syntactic structure, collocation, or colligation, into the dictionary definition [16].
As technology develops, how to make use of the advantages of the electronic dictionary is also an important issue. Ni summarized that colorful icons, multimedia-assisted paraphrase, and multi-level display benefit learners greatly in a Chinese learner's mobile app dictionary [17].
Overall, the consensus reached in academic circles is that TCSL dictionaries must reflect the characteristics of "extroversion", adhere to the principle of "practicality", meet the needs of non-native Chinese learners, and improve the efficiency and quality of Chinese learning.

3. Chinese learners' difficulties in their use of multi-category words of verb-noun

3.1. Methods

Target words
Target words were chosen according to the following four criteria: 1) a medium level of difficulty, so that they are easy for learners to acquire; 2) frequent use in both native and non-native Chinese; 3) frequent use in classical textbooks; 4) inclusion in the classical Chinese learners' dictionaries. Since this paper mainly focuses on VNs of a medium difficulty level, 27 VNs that are required of intermediate Chinese learners in HSK4 were extracted from Long's [9] research.
According to the first criterion and previous research [9], the VNs required of students in HSK4 were identified (N = 27) from the HSK Test Syllabus, following Yu's definition of VNs. Then I counted each word's frequency in the following four corpora: the CCL Corpus, the BCC corpus, the HSK Dynamic Composition Corpus, and the Corpus of Teaching Chinese as Second Language. Finally, according to frequency and the criteria above, 5 VNs were chosen as the target words: 保证 bao3zheng4, 感觉 gan3jue2, 经历 jing1li4, 说明 shuo1ming2, and 规定 gui1ding4.

Corpus-based analysis of the use between native Chinese and Chinese learners
As mentioned before, many errors in the use of VNs result from improper use or misuse of colligations and collocations. Besides, collocations are indispensable for language learners and second language teaching [18]. Therefore, the main focus of this research is the use of the colligations and collocations of VNs.
To analyze the differences in the use of VNs between native Chinese speakers and Chinese learners, typical colligations and collocations of the five target words were extracted from the BCC corpus, the CCL corpus, and the Corpus of Teaching Chinese as Second Language (native Chinese corpora), and from the HSK Dynamic Composition Corpus (interlanguage corpus).
High-frequency colligations and collocations of native Chinese were extracted with the help of the Online Chinese Collocation Assistant tool. However, this tool identifies colligations and collocations from only two corpora (the Corpus of Teaching Chinese as Second Language and the Chinese Wikipedia Corpus), so it cannot fully represent typical usage by native Chinese speakers. Therefore, as a supplement, the use of colligations and collocations by native Chinese speakers was further examined in the BCC and CCL corpora. As for the use of VNs by Chinese learners, the word collocation retrieval function of the HSK Dynamic Composition Corpus was used to extract their colligations and collocations.
Taking 保证 bao3zheng4 as an example, typical colligations and collocations were first extracted with the Online Chinese Collocation Assistant (see Table 1). Then I checked the BCC corpus and the CCL corpus for any colligations or collocations not identified by the Online Chinese Collocation Assistant. Finally, the typical colligations and collocations of 保证 were summarized in Table 2. Following this procedure, the typical colligations and collocations of the remaining four target words were also extracted and are presented in Table 3.

Results
Since 说明 is the most frequently used of the five target words, it was analyzed in depth in terms of its colligations and collocations. There are 168,448 concordance lines of 说明 in the BCC corpus and 401 concordance lines in the HSK Dynamic Composition Corpus. For each corpus, the proportion was calculated by dividing the total number of raw occurrences of a collocation or colligation by the number of concordance lines of 说明 in that corpus, and is reported per thousand (‰). As shown in Table 4, among the collocations of 说明, 按说明 and 使用说明 are underused by Chinese learners (p < .05), while 举例说明, 就说明, 向 X 说明 and 说明一下 are overused (p < .01). Table 5 shows that the verb usage of 说明 by native Chinese speakers is widely distributed across various colligations, such as ~+Noun, Noun+~, ~+Verb, and ~+Classifier, whose proportions are 35.05‰, 32.96‰, 19.72‰, and 4.85‰ respectively, while the verb usage of 说明 by Chinese learners in those colligations is much more frequent, with proportions of 825.44‰, 147.13‰, 152.12‰, and 52.37‰. Besides, the verb usage of 说明 by Chinese learners occurs much more frequently than the noun usage. As for the noun usage of 说明, Chinese learners clearly underuse it, and two colligations (i.e., Noun+说明 and VN+说明) are not used at all.
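The proportion described above can be sketched in a few lines of Python. This is only an illustration: the function name `per_mille` is hypothetical, and the raw count of 331 is an assumed value chosen because it is consistent with the reported learner proportion of 825.44‰; the paper itself reports only the two corpus sizes and the resulting proportions.

```python
def per_mille(raw_count: int, concordance_lines: int) -> float:
    """Occurrences of a pattern per 1,000 concordance lines of the headword."""
    if concordance_lines <= 0:
        raise ValueError("concordance_lines must be positive")
    return 1000 * raw_count / concordance_lines


# Corpus sizes for 说明 as stated in the text.
BCC_LINES = 168_448   # native Chinese corpus
HSK_LINES = 401       # interlanguage corpus

# ASSUMPTION: 331 is an illustrative raw count, not a figure from the paper.
learner_rate = per_mille(331, HSK_LINES)
print(f"说明+Noun in learner data: {learner_rate:.2f}‰")
```

Normalising by the number of concordance lines makes the two corpora comparable even though the native corpus is several hundred times larger than the learner corpus.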
In terms of the verb usage, the colligations 说明+Noun, Verb+说明, Noun+说明, Preposition+说明, and 说明+Classifier are greatly overused by Chinese learners (p < .001, absolute value of Log Ratio > 2). In terms of the noun usage, the colligations Noun+说明 and VN+说明 are greatly underused by Chinese learners (p < .01, Log Ratio > 2). Overall, there is a huge difference between the usage of 说明 by native Chinese speakers and by Chinese learners.
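For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a two-corpus log-likelihood test and a Log Ratio effect size. It assumes the commonly used two-cell log-likelihood formula and a binary-log ratio of relative frequencies; the paper does not state which exact implementation was used. The function names, the 0.5 zero-frequency correction, the sign convention (positive means overuse in the second corpus), and the example counts are all assumptions for illustration.

```python
import math


def log_likelihood(o1: int, n1: int, o2: int, n2: int) -> float:
    """Two-corpus log-likelihood (G2) for one item.

    o1, o2: observed counts in corpus 1 and corpus 2
    n1, n2: sizes of corpus 1 and corpus 2
    """
    total = o1 + o2
    e1 = n1 * total / (n1 + n2)  # expected count in corpus 1
    e2 = n2 * total / (n1 + n2)  # expected count in corpus 2
    ll = 0.0
    for observed, expected in ((o1, e1), (o2, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def log_ratio(o1: int, n1: int, o2: int, n2: int, zero_correction: float = 0.5) -> float:
    """Binary log of the ratio of relative frequencies (corpus 2 over corpus 1).

    ASSUMPTION: a zero count is replaced by `zero_correction` to avoid log(0).
    """
    f1 = (o1 if o1 > 0 else zero_correction) / n1
    f2 = (o2 if o2 > 0 else zero_correction) / n2
    return math.log2(f2 / f1)


def stars(ll: float) -> str:
    """Significance markers from chi-square critical values with 1 degree of freedom."""
    if ll >= 10.83:
        return "***"  # p < .001
    if ll >= 6.63:
        return "**"   # p < .01
    if ll >= 3.84:
        return "*"    # p < .05
    return ""


# Illustrative counts only; these are NOT figures from the paper.
ll = log_likelihood(100, 1000, 400, 1000)
lr = log_ratio(100, 1000, 400, 1000)
print(f"LL = {ll:.2f}{stars(ll)}, Log Ratio = {lr:+.2f}")
```

With these toy counts the item is four times as frequent in the second corpus, so the Log Ratio is 2, which illustrates why each additional point corresponds to a doubling of the difference.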

Discussions
The present study examined the differences in the use of VNs between native Chinese speakers and Chinese learners. As the above tables show, Chinese learners underuse the noun part of speech and overuse the verb part of speech; that is, they use the verb part of speech of VNs more frequently than the noun part of speech. These findings are consistent with Wang's study [8] of VN use. She found that Chinese learners tend to use a limited range of colligations and collocations, and she also observed an imbalance between the use of the verb and the noun parts of speech, which the current study corroborates [8]. To acquire knowledge of word usage, language learners usually depend on classroom teaching and dictionaries. Therefore, if textbooks or dictionaries do not represent the usage of VNs properly, Chinese learners may easily adopt negative learning strategies, that is, they will underuse difficult or unfamiliar patterns to avoid making mistakes.
Moreover, this study shows that corpus-based analysis of colligations and collocations can be of great help to Chinese learners. Although there are only a few corpus-based studies of the colligations and collocations of Chinese VNs (e.g., [8]), studies of other languages demonstrate the importance of colligations and collocations in language acquisition [18][19][20][21]. Collocations can be a source of difficulty for non-native speakers of a language [22]. A common collocation in the target language sometimes cannot be translated word for word, and its meaning depends on the specific context [21]. Besides, colligations and collocations are essential for fluency and accuracy in spoken and written language: learning typical colligations and collocations rather than individual words helps learners understand the meaning of lengthy utterances effectively [23]. Therefore, colligations and collocations play a major role in second language acquisition and call for more attention in language teaching [18]. The identification and extraction of colligations and collocations have been facilitated by corpus linguistics, which has proposed a variety of methods for automatically extracting collocations from text corpora. Collocation coding based on corpus linguistics can effectively provide lexical information about language habits, which is useful for second language acquisition [21].
Collocations and colligations have been considered an important component of dictionaries since the publication of the Collins Cobuild English Language Dictionary [24], the earliest learner's dictionary compiled with a corpus-driven approach, which pays special attention to the typical colligations and collocations revealed by an authentic collection of texts. Since then, learners' dictionaries have entered the era of corpus-based compilation. However, there is still a lack of classical Chinese learner's dictionaries that systematically represent typical colligations and collocations based on real corpora. Considering the lack of specialized dictionaries in certain fields, especially for teaching Chinese as a second language, a new way of compiling dictionaries through corpus-based analysis is proposed in the following section.

4. Suggestions for the Compilation of Chinese learner's Dictionary for VNs

Introduction to the target dictionary app Pleco
Pleco (official website: http://www.pleco.com/) is one of the best-known and most frequently used mobile app dictionaries for learning Chinese [25]. The dictionary was developed by Michael Love, a foreign student studying in China. Currently, the Pleco dictionary has more than one million users in 180 countries.

Analysis of the problem in dictionary compilation
In the era of the Internet and smartphones, Chinese learner's dictionary apps have come into being and are favored by many learners. However, these dictionary apps still have some shortcomings, so this article analyzes some of the problems in dictionary app compilation, taking Pleco as an example, with a particular focus on the illustrative information.
Collocation examples and example sentences are mixed together. As shown in Figure 1, the first four entries are phrases or collocations, and the fifth entry is an example sentence. In addition, colligation patterns are not included in the compilation system of the Pleco dictionary app.
Word collocations are not arranged hierarchically, and the order of the entries follows no clear logic, neither difficulty nor frequency. In Figure 1, for example, there are four collocations, 说明理由, 说明真相, 说明用法, and 举例说明; however, 说明用法 and 举例说明 are easier than the first two phrases according to the HSK Test Syllabus. Nor are the entries ordered by frequency, since the concordance counts of the four collocations in the BCC corpus are 471, 122, 18, and 518 respectively.
Some of the entries are not accompanied by example sentences, and some example sentences in Pleco lack context. As shown in Figure 1, the collocations 说明理由, 说明真相 and 举例说明 appear without example sentences, so learners may still have difficulty knowing how to use them. Besides, the usage of 说明 in the sentence "代表团认为有必要说明自己的立场" is not very typical, and the context of this sentence is not clear enough, so learners may be confused about when and where they should use 说明 in such a sentence.
The example sentences are not very practical, and some of those presented in the dictionary are rarely used even by native Chinese speakers. For example, the collocation 说明立场 occurs only once in the BCC corpus, so this sentence is not very useful for Chinese learners.
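The ordering problem noted above can be made concrete with a short sketch. The four collocations and their BCC concordance counts are the ones reported in this section; the variable names are hypothetical, and the sketch only illustrates what a frequency-based arrangement of the same four items would look like, not how Pleco or any other app is implemented.

```python
# Collocations in the order they appear in the Pleco entry (Figure 1),
# paired with their concordance counts in the BCC corpus as reported in the text.
pleco_order = [
    ("说明理由", 471),
    ("说明真相", 122),
    ("说明用法", 18),
    ("举例说明", 518),
]

# Rearrange from most to least frequent; sorted() is stable, so ties
# would keep their original relative order.
by_frequency = sorted(pleco_order, key=lambda item: item[1], reverse=True)

for rank, (collocation, count) in enumerate(by_frequency, start=1):
    print(f"{rank}. {collocation} ({count})")
```

Under a frequency-based arrangement, 举例说明 would move from last place to first, while 说明用法 would move to the end of the list.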

The principle of compilation
There are two kinds of dictionary for language learners. One is the introverted dictionary, which serves native language learners, and the other is the extroverted dictionary, which is aimed at second language learners. For example, Pleco is an extroverted dictionary aimed at Chinese learners. Introverted dictionaries focus on language understanding, mainly on interpretation, and are supplemented by collocation information. Extroverted learners' dictionaries focus on language production, so their examples should strive to reflect the more comprehensive and typical usage of words, rather than provide evidence and supplements for interpretation as introverted dictionaries do. Therefore, how to present collocation information should be considered first when compiling collocation examples for extroverted learners' dictionaries.
1. Orderliness: The arrangement of examples should show a hierarchical order from frequently used to infrequently used, from simple to complex, and from concrete to abstract. Orderliness lets learners see the most common uses clearly and directly, and it should be combined with the laws of learners' cognitive development [26].
2. Representativeness: The colligations and collocations must be retrieved from each type of source corpus, and every collocation should be accompanied by a representative example sentence, so that the examples show its most typical usage [26].
3. Generativity: If the dictionary is well compiled, the example sentences should enable its users to produce sentences generatively [27]. Learners should be able to form correct sentences in their minds by combining the examples with their existing knowledge of the language.
4. Practicality: Language learning cannot be divorced from real life, so the examples should be practical and useful to Chinese learners in daily life, and should be placed in a context that learners find natural and understandable and are willing to apply [28].
5. Context: The meaning of each example should be easy to grasp, with a complete and clear context [29]. If example sentences lack context, learners can hardly understand how to use them in their writing or speaking.

A sample
Based on the above principles, I propose a sample entry for dictionary compilation (Table 6). The colligations and collocations were extracted from the BCC corpus, the CCL corpus, and the Online Chinese Collocation Assistant tool, and were ordered by frequency. The example sentences were adapted for Chinese learners from the HSK Dynamic Composition Corpus. The entries were compiled according to the five principles: orderliness, representativeness, generativity, practicality, and context. Each part of speech of the word is accompanied by its most frequently used colligations and collocations, ordered by frequency of use, which is rarely done in the classical Chinese learner's dictionaries. Every word in the example sentences is consistent with the difficulty level of the HSK syllabus, so that learners can easily understand the collocations and the example sentences without having to consult the dictionary again and again. Besides, unlike in traditional dictionaries, the definition of an entry is not a list of synonyms but a detailed description of its semantics and usage.
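One way to picture such an entry as data is sketched below. This is a hypothetical structure, not the format of the sample in Table 6 or of any existing app: the class names `Collocation`, `Sense` and `Entry` are invented for illustration, the glosses follow the examples given earlier in the paper, the counts 518 and 471 are the BCC figures reported above, and the counts marked as placeholders are not corpus figures. The sketch assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Collocation:
    text: str
    frequency: int          # corpus count used only for ordering
    example: str = ""       # learner-oriented example sentence
    translation: str = ""


@dataclass
class Sense:
    part_of_speech: str
    gloss: str
    collocations: list[Collocation] = field(default_factory=list)

    def ordered(self) -> list[Collocation]:
        """Collocations from most to least frequent (orderliness principle)."""
        return sorted(self.collocations, key=lambda c: c.frequency, reverse=True)


@dataclass
class Entry:
    headword: str
    pinyin: str
    senses: list[Sense] = field(default_factory=list)

    def render(self) -> str:
        """Plain-text view: one block per part of speech, examples under each pattern."""
        lines = [f"{self.headword} ({self.pinyin})"]
        for sense in self.senses:
            lines.append(f"  [{sense.part_of_speech}] {sense.gloss}")
            for c in sense.ordered():
                lines.append(f"    {c.text}")
                if c.example:
                    lines.append(f"      {c.example} {c.translation}".rstrip())
        return "\n".join(lines)


entry = Entry(
    headword="说明",
    pinyin="shuōmíng",
    senses=[
        Sense("verb", "to explain", [
            Collocation("说明理由", 471),
            Collocation("举例说明", 518),
            # PLACEHOLDER count, not a corpus figure; the example is the one in Table 6.
            Collocation(
                "向 X 说明", 100,
                "他不好意思向父母说明他点的都是这个饭店最便宜的菜。",
                "He was ashamed to explain to his parents that all he ordered "
                "were the cheapest dishes in this restaurant.",
            ),
        ]),
        Sense("noun", "instructions", [
            Collocation("使用说明", 1),  # PLACEHOLDER count, not a corpus figure
        ]),
    ],
)
print(entry.render())
```

Keeping the frequency on each collocation, rather than fixing the order by hand, means the same data could be re-sorted by another criterion, such as HSK difficulty level, without rewriting the entry.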

Conclusion
In this article, a new corpus-based method for compiling VN entries in a Chinese learner's mobile app dictionary was presented. First, I reviewed scholars' discussions of the definition of verb-noun multi-category words and adopted Yu's [7] criteria. Then, taking 说明 as the target word, I summarized the differences between native Chinese speakers and Chinese learners in their use of VNs according to native Chinese corpora and an interlanguage corpus. Next, I analyzed some shortcomings of the widely used mobile app dictionary Pleco and summed up five principles of dictionary compilation. Finally, I proposed a sample dictionary entry for the VN 说明 based on the extracted common colligations and collocations. However, my research still has some limitations: 1) the representativeness of the corpora used: the interlanguage corpus contains only sentences from HSK tests, so its themes and topics are limited and cannot fully represent the usage of Chinese learners. For example, the collocation 药品说明 is used in a specialized theme or topic, whereas the HSK is a general test rather than a test oriented to a specific area; 2) the results of the study should be discussed with reference to Markedness Theory in second language acquisition [30]. For example, the underuse of the noun usage may result from learners' overgeneralization of the target language. It is possible for English speakers to overemphasize the characteristics of Chinese verb usage and thus neglect the noun usage.
Despite these limitations, this article contributes to learner's dictionary compilation: it offers a new way of dealing with problems that traditional dictionaries cannot solve, and suggestions for compiling a dictionary that is production-oriented, linguistically accurate, and user-friendly, for example by simplifying explanations through colligations and collocations. In future work, expanding the size of the corpora and collecting elicited sentences through questionnaires or homework would probably be a good way to improve the representativeness and comprehensiveness of the corpora. Besides, due to limited time and energy, the study analyzed only one word in detail; the analysis can be extended to other words in the future.

Figure 1: Screenshot of the Entry for 说明 in Pleco

Table 1: Typical Colligations and Collocations of 保证 in Online Chinese Collocation Assistant.

Table 3: Typical Colligations and Collocations of the Other Four VNs.

Table 4: The Log-likelihood and Effect Size of Collocations of 说明.
Note. O1 means the BCC corpus and O2 means the HSK Dynamic Composition Corpus. "*" means p < .05; "**" means p < .01; "***" means p < .001. The Log Ratio statistic is an "effect-size" statistic, not a significance statistic. It represents the magnitude of the difference between two corpora for a particular keyword. Every extra point of Log Ratio score represents a doubling in the size of the difference between the two corpora for the keyword under consideration. "-" means underuse, "+" means overuse, "/" means equal use.

Table 5: The Log-likelihood and Effect Size of Colligations of 说明.

Table 6: The Sample of 说明 for Dictionary Compilation.
向 X 说明 (xiàng X shuōmíng): 他不好意思向父母说明他点的都是这个饭店最便宜的菜。 He was ashamed to explain to his parents that all he ordered were the cheapest dishes in this restaurant.