Enhancement of Arabic Text Classification Using Semantic Relations with Part of Speech Tagger


Text Categorization (TC) is challenging for Arabic text documents. TC is needed for clustering purposes in order to complete Text Mining (TM). Because of the nature of the Arabic language, extracting roots or stems from Arabic words and phrases is an important task before applying TC. The results obtained by applying the proposed algorithm are compared with the results of three popular algorithms: the Khoja stemmer, the Light stemmer, and the Root extractor. The performance of these three techniques is evaluated and compared based on the accuracy of a Naive Bayesian classifier. The results demonstrate that these techniques are not as promising as expected. Therefore, we consider the part-of-speech (POS) tagger and conceptual representation to answer the question: which approach enhances Arabic TC performance? Arabic WordNet (AWN) is used as a lexical and semantic resource. The performance of the new relation "Has-hyponym", suggested in this work, is compared with that of previously used relations such as Synset, term + Synset, and all Synsets, and with the Bag of words representation, to demonstrate its effectiveness. The experimental results show that the newly suggested relation improved Arabic text classification, raising the macro-average F1 to 0.75437 compared with the performance of the other approaches.
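
The macro-average F1 reported above is the unweighted mean of the per-class F1 scores, so every category counts equally regardless of how many documents it contains. A minimal sketch of that computation follows; the function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; not the authors' code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct assignment to the class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy category labels, assumed for the example only.
    gold = ["sport", "sport", "politics", "politics"]
    predicted = ["sport", "politics", "politics", "politics"]
    print(round(macro_f1(gold, predicted), 4))
```

Because each class contributes equally, a representation that helps only the largest category moves this score less than it would move plain accuracy.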


El Kabani T.I.


Yousif S.A., Samawi V.W., Zantout R.

Journal/Conference Information

14th International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases (AIKED '15)