T.Thanga pandiyan,III BCA
K.Srikanth,III BCA
SBK COLLEGE
Abstract
One can access all the tools prepared under the project “Computing Tools for Tamil Language teaching and learning” either through the TVA (Tamil Virtual Academy) website or through Amrita Vishwa Vidyapeetham. One of the main focuses was to develop computing tools that help Tamils and non-Tamils, including children, to learn Tamil and develop skills in the language.
Keywords
Developing NLP tools, NLP tools for Verb Conjugation, Onto-thesaurus for Tamil
Introduction
finalize the futuristic road map for taking Tamil computing, as well as the online promotion of the Tamil language, to the next level so as to fulfil the aspirations of Tamils across the globe. Tamil scholars, linguistics experts, renowned authors, software industry representatives and experts from abroad were invited to participate in the workshop. It was decided in the workshop that TVA would collaborate with a group of institutes, industries and individuals to develop computing tools for Tamil to meet the immediate needs of Tamils around the world. One of the main focuses was to develop computing tools that help Tamils and non-Tamils, including children, to learn Tamil and develop skills in the language.
Tamil Language teaching and learning”. Under this project, the following Tamil computing tools have been developed by the Centre for Excellence in Computational Engineering and Networking (CEN), Amrita University, Coimbatore, with the financial and physical support of TVA. The tools developed are classified into two types:
- Natural language processing (NLP) tools
- Teaching and learning tools.
The following graphical interface is used to access all the tools listed above.
1. Developing NLP tools
Four NLP tools have been developed. They are:
- verb inflection generator
- noun inflection generator
- word generator
- Onto-thesaurus.
1.1. NLP tools for Verb Conjugation, Noun Declension and word generation
components of the present project. Tamil is a morphologically rich language. Because it is agglutinative, most grammatical information is expressed by suffixes. For example, nouns are inflected for number and case, and verbs are inflected for tense, mood and aspect and carry subject agreement markers. A morphological generator capturing the conjugation of verbs has been developed as one of the components of the present project. The morphological generator takes a lemma and grammatical information as input and returns the inflected forms of the given word; it is the reverse process of a morphological analyzer. The system implemented here is rule based: it applies morpheme concatenation rules and produces all the conjugated forms of a given verb and all the declension forms of a given noun. This tool will help Tamil learners understand verb conjugation and noun declension in Tamil.
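The idea of a rule-based generator that concatenates morphemes can be illustrated with a minimal Python sketch. This is not the CEN implementation. The function names, the feature labels and the rough romanization are assumptions made for illustration, the suffix tables cover only a handful of forms, the tense markers shown hold for only one verb class, and the sandhi (junction) rules that a real Tamil generator needs are ignored.

```python
# Toy suffix tables in a rough romanization (hypothetical, heavily simplified).
NOUN_NUMBER = {"sg": "", "pl": "kaL"}
NOUN_CASE = {"nom": "", "acc": "ai", "dat": "ukku", "ins": "aal", "gen": "in"}
# Tense markers valid for one verb class only (e.g. "paTi", 'to read').
VERB_TENSE = {"past": "tt", "pres": "kkiR", "fut": "pp"}
# Person-number-gender (subject agreement) suffixes, a small subset.
VERB_PNG = {"1sg": "een", "2sg": "aay", "3sg_m": "aan", "3sg_f": "aaL", "1pl": "oom"}


def decline_noun(lemma, number, case):
    """Concatenate lemma + number suffix + case suffix (no sandhi handling)."""
    return lemma + NOUN_NUMBER[number] + NOUN_CASE[case]


def conjugate_verb(lemma, tense, png):
    """Concatenate lemma + tense marker + agreement suffix (no sandhi handling)."""
    return lemma + VERB_TENSE[tense] + VERB_PNG[png]


def noun_paradigm(lemma):
    """Generate every number/case combination for a noun lemma."""
    return {
        (number, case): decline_noun(lemma, number, case)
        for number in NOUN_NUMBER
        for case in NOUN_CASE
    }


def verb_paradigm(lemma):
    """Generate every tense/agreement combination for a verb lemma."""
    return {
        (tense, png): conjugate_verb(lemma, tense, png)
        for tense in VERB_TENSE
        for png in VERB_PNG
    }


if __name__ == "__main__":
    for features, form in noun_paradigm("paiyan").items():
        print(features, form)
    for features, form in verb_paradigm("paTi").items():
        print(features, form)
```

A lemma plus a feature bundle goes in and a surface form comes out, which is the reverse of what an analyzer does. A production system would add verb classes, oblique noun stems and junction rules on top of this table-driven core.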
1.2. Onto-thesaurus for Tamil
semantics of Tamil vocabulary. It went through several stages before culminating in the Tamil onto-thesaurus, and it reflects our journey from a Tamil thesaurus to a Tamil wordnet. It is a lexical resource that amalgamates all the kinds of information available in a dictionary, a thesaurus and a wordnet. A paper thesaurus for Tamil was prepared in 1990, based on the principles of componential analysis of meaning propounded by Nida (1975), and was published in 2001 (Rajendran, 2001), nearly a decade later. Following the paper thesaurus, an electronic thesaurus for Tamil was attempted, and a book on the Tamil electronic thesaurus was published in 2006 (Rajendran and Baskaran, 2006). The preparation of a wordnet for Tamil was undertaken (2001-2003) with financial assistance from Tamil Virtual University (now renamed Tamil Virtual Academy), and a crude version of it, based on the ontology developed by Rajendran (Rajendran, 2001), was submitted to the institute in 2003. From 2009 onwards, with funds received from MHRD and the Department of Electronics and Information Technology, Govt. of India, the Dravidian wordnet was built on the basis of the Hindi wordnet; nearly 30000 synsets (concepts) have been completed. We still have a long way to go to reach the desired target.
At present, the onto-thesaurus for Tamil has been completed with a vocabulary of 50000 words as part of the project entitled “Computing Tools for Tamil Language teaching and learning”. The onto-thesaurus is a knowledge representation, and this knowledge is visualized in the form of clusters instead of a hierarchical tree. Each parent entity and all of its available sibling entities are regrouped into a cluster, and all clusters generated during search patterns are linked with the label entities.
entities are nothing but nodes, and each node is dominated by its parent node in the hierarchical structure. The nodes are related by semantic and lexical relations such as synonymy, homonymy, meronymy and antonymy. A hierarchical structure is created for each type of semantic domain. The top domains are entities, events, abstracts and relationals. Entities consist of concrete nouns; events consist mostly of verbs and verbal nouns; abstracts consist of abstract nouns, adjectives and adverbs; and relationals consist of prepositions/postpositions, connectives and some functional words or units. The hierarchical structure is converted into a visual representation using a tree viewer. Here we discuss in detail how the ontological structure for Tamil vocabulary is built and how it is converted into a visual knowledge representation and a user-friendly retrieval system. All types of lexical and semantic relationships between lexical items are captured using unique notations. The overall system is an ontology-based intelligent system for information retrieval. The ontology-based thesaurus, called the onto-thesaurus, is also described here.
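The node hierarchy, the lexical relations and the cluster view described above can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class and function names are hypothetical, the English labels are placeholders rather than entries from the actual onto-thesaurus, and a "cluster" is read here as a parent node together with all the nodes directly under it.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four top domains named in the text.
TOP_DOMAINS = ("entities", "events", "abstracts", "relationals")


@dataclass(eq=False)
class Node:
    label: str
    # repr=False avoids infinite recursion when printing parent <-> child links.
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)
    # Relation name (e.g. "synonymy") -> list of related labels.
    relations: dict = field(default_factory=dict)

    def add_child(self, label):
        """Create a node dominated by this node and return it."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child

    def relate(self, relation, other_label):
        """Record a lexical or semantic relation to another label."""
        self.relations.setdefault(relation, []).append(other_label)


def cluster(node):
    """Return (parent label, labels of the node and its siblings)."""
    if node.parent is None:
        return (node.label, [])
    return (node.parent.label, [c.label for c in node.parent.children])


def path_to_root(node):
    """Labels from the node up to its top domain."""
    path = []
    while node is not None:
        path.append(node.label)
        node = node.parent
    return path


# Hypothetical placeholder vocabulary, not taken from the real resource.
entities = Node(TOP_DOMAINS[0])
animal = entities.add_child("animal")
dog = animal.add_child("dog")
cat = animal.add_child("cat")
dog.relate("synonymy", "hound")

print(cluster(dog))
print(path_to_root(dog))
```

Storing a parent link on every node keeps both views cheap: the tree viewer can walk up with `path_to_root`, while the cluster display only needs the parent's list of children.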
CONCLUSION
foundation for enhancement in future Tamil computing. The efforts will take us further into Tamil-oriented language processing. We express our gratitude to the Tamil Virtual Academy, Govt. of Tamil Nadu, which supported us with financial assistance to make this possible. Our sincere thanks go to the project review committee members for their guidance and support. Our thanks are due to the director and assistant directors of Tamil Virtual Academy for the initiatives they have taken for the enhancement of Tamil language computing. Our sincere thanks also go to the NLP team of Tamil Virtual Academy, who were with us throughout this project.
FINAL DELIVERABLES:
- Language learning Tool for Verb Conjugation.
- Language learning Tool for Noun Declension.
- GUI oriented exercises for Language Learning (using Verb Conjugation and Noun Declension)
- Tamil Ontology Data and Tamil Ontology Tool. (vocabulary development)
- Learning English-Tamil Machine Translation system for Simple Sentences.
- English-Tamil Bi-lingual Dictionary.
FUTURE ENHANCEMENT
- Visual Onto Thesaurus with more Coverage
- Augmenting Conjugation and Declension module with more exercises
- Creation of Word Sense Disambiguation System using Ontology
- Linking Onto Thesaurus with English-Tamil Dictionary
- Augmenting translation module with Complex sentence patterns so as to enable the learners to achieve more skills in English-Tamil translation
NOTE:
One can access all the tools prepared under the project “Computing Tools for Tamil Language teaching and learning” either through the TVA website or through Amrita Vishwa Vidyapeetham.