Research Article Open Access

A Multi Layer Perceptron Along with Memory Efficient Feature Extraction Approach for Bengali Document Categorization

Quazi Ishtiaque Mahmud¹, Noymul Islam Chowdhury¹ and Md Masum¹
  • ¹ Shahjalal University of Science and Technology, Bangladesh

Abstract

By total number of speakers, Bengali ranks seventh among the world's languages and is used by approximately 265 million people worldwide. Every day, more people express their views and opinions in Bengali on digital platforms such as blogs and social media, across a wide range of topics. Despite this, very little work has been done to organize these electronic documents by category. In this paper, we develop a methodology for automatically categorizing Bengali news into twelve predefined categories using a Multi Layer Perceptron (MLP) model. We also explore the optimization opportunities within the feature space and illustrate the difficulties that arise when handling large feature spaces in neural networks. We show that the feature space can be optimized to achieve better accuracy. Using our modified feature extraction technique, we reduced the feature space and achieved an accuracy of 93.3%.

Journal of Computer Science
Volume 16 No. 3, 2020, 378-390

DOI: https://doi.org/10.3844/jcssp.2020.378.390

Submitted On: 5 January 2020
Published On: 28 March 2020

How to Cite: Mahmud, Q. I., Chowdhury, N. I. & Masum, M. (2020). A Multi Layer Perceptron Along with Memory Efficient Feature Extraction Approach for Bengali Document Categorization. Journal of Computer Science, 16(3), 378-390. https://doi.org/10.3844/jcssp.2020.378.390

Keywords

  • Document Categorization
  • TF-IDF
  • Multi Layer Perceptron
  • Activation Functions