In a world where roughly 2.5 quintillion bytes of data are generated every day, sentiment analysis has become a key tool for systematically extracting, identifying, and quantifying the opinions in that data. Sentiment analysis for Nepali is still rare, yet it has immense potential: it could change how surveys and reviews are collected in Nepal, with applications ranging from customer service to marketing. Nepali sentiment analysis generally consists of sample data collection, data processing, feature extraction, and classification. Of these steps, feature extraction aims to detect and extract features that can be used to determine the meaning of a given piece of Nepali text. The extracted features should make it possible to classify the data reliably as positive, negative, or neutral.
मलाई यो मन परे न। [ Negative ]
आहा कती राम्रो घडी। [ Positive ]
Three thousand Nepali text samples were collected from Facebook pages (Routine of Nepal Banda, Meme Nepal, Amazon) and from various news portal sites. Each sentence was then converted into phrase-level data. Find the data here.
User-supplied data may not be clean, so preprocessing was necessary. First, the data were split into individual tokens, and the model checked whether each token was processable. The system then removed noise such as unnecessary punctuation, emojis, and other symbols.
Eg. मलाई यो घडी सान्चिकै सुहाँउछ ।:)
मलाई यो घडी सान्चिकै सुहाँउछ
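The cleaning step above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual code: the names `NOISE`, `clean`, and `tokenize` are hypothetical, and it assumes that everything outside the Devanagari letters, signs, and digits counts as noise, which means Latin text and ASCII digits are dropped too.

```python
import re

# Assumption: keep Devanagari letters/signs (U+0900-U+0963), Devanagari
# digits (U+0966-U+096F), zero-width joiners, and whitespace.
# The danda "।" (U+0964), ASCII punctuation, and emojis count as noise.
NOISE = re.compile(r"[^\u0900-\u0963\u0966-\u096F\u200c\u200d\s]")

def clean(text):
    """Replace noise characters with spaces and collapse whitespace."""
    return " ".join(NOISE.sub(" ", text).split())

def tokenize(text):
    """Split a cleaned sentence into whitespace-separated tokens."""
    return clean(text).split()

cleaned = clean("मलाई यो घडी सान्चिकै सुहाँउछ ।:)")
tokens = tokenize("मलाई यो घडी सान्चिकै सुहाँउछ ।:)")
print(cleaned)
print(tokens)
```

A real system would probably want a more careful tokenizer, but a character filter like this is enough to reproduce the example above.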
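For feature extraction, we use Term Frequency-Inverse Document Frequency (TF-IDF). TF-IDF gives the relative weight of each word in the given input: the weight grows with how often the word appears in the sentence and shrinks with how many sentences in the dataset contain it.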
Eg. मलाई यो घडी सान्चिकै सुहाँउछ
[0.82, 0, 0, 0, 0, 0.23, 0, 0.56, 0………………0, 0.86, 0, 0, 0, 0.65]
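To make the vector above less abstract, here is a small TF-IDF sketch in plain Python. It is an illustration under assumptions, not the project's implementation: the function name `tfidf` is hypothetical, the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` is one common variant chosen here because the article does not say which one was used, and the three-sentence corpus simply reuses the example sentences from this post.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns (vocab, one weight vector per doc)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumption: smoothed IDF, so no token gets a zero or undefined weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors

docs = [
    "मलाई यो मन परे न".split(),
    "आहा कती राम्रो घडी".split(),
    "मलाई यो घडी सान्चिकै सुहाँउछ".split(),
]
vocab, vectors = tfidf(docs)
for tok, weight in zip(vocab, vectors[2]):
    print(tok, round(weight, 3))
```

Each sentence becomes a vector as long as the vocabulary, with zeros for every word it does not contain. That is why the real vector is mostly zeros.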
Implementation: Naive Bayes Algorithm
The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with a strong independence assumption between features. Depending on the precise nature of the probability model, a Naive Bayes classifier can be trained very efficiently in a supervised learning setting.
sent = मलाई यो घडी सान्चिकै सुहाँउछ
P(sent/positive) = P(यो/positive) * P(घडी/positive) * P(मलाई/positive) * P(सुहाँउछ/positive)
P(sent/negative) = P(यो/negative) * P(घडी/negative) * P(मलाई/negative) * P(सुहाँउछ/negative)
P(sent/neutral) = P(यो/neutral) * P(घडी/neutral) * P(मलाई/neutral) * P(सुहाँउछ/neutral)
Compare the three probabilities: "sent" belongs to the class with the greatest probability. For example, if P(sent/positive) is greater than the other two, the sentence belongs to the positive class.
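The comparison above can be sketched as a tiny word-count Naive Bayes classifier. Treat it as a rough illustration rather than the project's code: `train` and `predict` are hypothetical names, the three training sentences are a toy set (the neutral one is made up for this example), and two details go beyond the formulas above. The sketch multiplies in the class prior, and it uses Laplace (add-one) smoothing so that an unseen word does not force a probability to zero. It also sums logarithms instead of multiplying raw probabilities, which avoids underflow on longer sentences.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (tokens, label) pairs."""
    class_docs = Counter()              # number of sentences per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for tokens, label in samples:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict(tokens, model):
    """Return (best_label, log-score per label) for a tokenized sentence."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        # Assumption: include the class prior P(label).
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            # Assumption: Laplace smoothing over the training vocabulary.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get), scores

# Toy training data; the neutral sentence is invented for illustration.
samples = [
    ("मलाई यो मन परे न".split(), "negative"),
    ("आहा कती राम्रो घडी".split(), "positive"),
    ("यो घडी हो".split(), "neutral"),
]
model = train(samples)
label, scores = predict("आहा राम्रो घडी".split(), model)
print(label, scores)
```

With only three training sentences the scores mean little; the point is just to show how the per-word probabilities are combined and compared across the three classes.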
Using this Nepali sentiment analysis system, we can effectively evaluate people's opinions on products and political affairs in the Nepali context. This classification can help companies collect feedback and build better products and services. To some extent, political sentiment and public opinion can also be extracted with it.
Find the project code here.