Chatbot-in-Nepali

How to Make a Chatbot in the Nepali Language?

Let me make it easy to understand what a chatbot is. It is Chat + Bot: chat means messaging, and a bot is a robot (or machine). When we automate a bot for messaging, we call it a chatbot. Siri, Alexa, and Google Assistant are some common chatbots you have seen. Although chatbots seem new, they are not: the first chatbot, ELIZA, was built in 1966.

You can use chatbots to reply to your customers. They are generally used to handle orders, hotel bookings, etc.

How Do Chatbots Work?

Chatbots generally work with one of two approaches: Rule-Based and Self-Learning.

In the Rule-Based approach, we define a set of questions and the relevant answers to those questions. When the user's query matches a question in the list, the bot returns the mapped answer; otherwise it does not respond.
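The rule-based idea can be sketched as a simple lookup table. This is a minimal illustration, not part of the Yourkoseli bot; the questions and answers are invented examples:

```python
# A minimal rule-based bot: exact-match lookup in a question -> answer table.
# The Roman Nepali phrases here are illustrative examples only.
RULES = {
    "k cha khabar": "Thik cha! Tapailai kasto cha?",
    "delivery kaha samma huncha": "Kathmandu valley bhitra delivery huncha.",
    "price kati ho": "Cake ko price Rs. 1000 dekhi suru huncha.",
}

def rule_based_reply(query: str) -> str:
    """Return the mapped answer if the query is in the rule table, else a fallback."""
    return RULES.get(query.strip().lower(), "Maaf garnuhos, maile bujhina.")

print(rule_based_reply("Price kati ho"))
```

The obvious weakness is that only exact matches work; any rephrasing of the question falls through to the fallback, which is why the self-learning approach below is preferred.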

In the Self-Learning approach, we use machine learning algorithms to answer the user's query. It is more effective than the rule-based approach.

Building a Chatbot in Nepali

Standard NLP tooling does not handle the Nepali (Devanagari) script well, so we simply used Romanized Nepali instead.

Actually, there are no Python libraries that support the Nepali language, so we would have to build the functionality on our own. However, there is a trick: we generally write in Roman script while messaging, which reuses English letters. Hence, it is possible to make a chatbot that answers queries in Roman Nepali. For this you must have text from a particular domain. If you build a Nepali chatbot for online shopping, create a text file with the company's details written in Roman script.

Before building the chatbot, we assume you have knowledge of the scikit-learn library and NLTK. We will be using concepts from Natural Language Processing (NLP), the field that studies the interaction between human language and computers.

Step-by-Step Guide to Building a Nepali Chatbot

HOW WE CREATED A ROMAN NEPALI BOT FOR YOURKOSELI

Step 1: Create a Roman-script text about the company you want to build the chatbot for. We did this for Yourkoseli, inserting about 1,000 words of detailed company information into a text corpus called “yk.txt”.
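Creating the corpus file is just writing plain sentences to a text file. The sentences below are invented placeholder text, not Yourkoseli's actual details:

```python
# Illustrative sketch: writing a small Roman Nepali corpus file.
# The sentences are invented placeholders, not Yourkoseli's real details.
corpus = (
    "Yourkoseli le Nepal ma online cake delivery garcha. "
    "Hamro cake haru Kathmandu valley bhitra same-day deliver huncha. "
    "Order garna hamro website ma janus."
)

with open("yk.txt", "w", encoding="utf8") as f:
    f.write(corpus)
```

The richer and more domain-specific this text is, the better the bot can match user queries to sentences in it.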

Step 2: The main problem is that the text is in string form. We need to convert it to numbers to apply machine learning algorithms and mathematics. So we first did text processing of the given text corpus (the company details).

  • Lowercasing: change all characters to either UPPERCASE or lowercase, so that the system does not produce different vectors (numbers) for the same word.
  • Tokenization: convert the whole text into smaller tokens, first sentences and then words.
  • Noise removal: strip characters other than letters and numbers, such as #, %, ^, -, etc.
  • Stemming: convert plays, playing, played -> play
  • Lemmatization: convert better, best -> good (however, stemming and lemmatization do not work well on fully Romanized text)
  • Term Frequency (TF): how frequent a word is in the current document.
  • Inverse Document Frequency (IDF): how rare the word is across documents.
  • Cosine similarity: a similarity measure between two vectors.
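The TF-IDF and cosine-similarity steps above can be sketched on a toy corpus. The sentences are made-up examples standing in for the yk.txt corpus:

```python
# Sketch of the retrieval idea: TF-IDF vectors + cosine similarity.
# The sentences are toy examples standing in for the real yk.txt corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "yourkoseli delivers cakes in kathmandu",
    "hamro delivery kathmandu valley bhitra huncha",
    "cake order garna website use garnuhos",
]

# TF-IDF turns each sentence into a vector: term frequency weighted by
# how rare the term is across all documents (inverse document frequency).
vectorizer = TfidfVectorizer(lowercase=True)
tfidf = vectorizer.fit_transform(sentences + ["kathmandu ma delivery huncha?"])

# Cosine similarity between the query (last row) and every corpus sentence;
# the highest-scoring sentence is the bot's answer.
scores = cosine_similarity(tfidf[-1], tfidf[:-1])
best = scores.argmax()
print(sentences[best])  # -> "hamro delivery kathmandu valley bhitra huncha"
```

This is exactly the mechanism the `response()` function in Step 3 uses, except there the query is appended to the corpus itself before vectorizing.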

Step 3: Coding

# import necessary libraries
import random
import string  # to process standard Python strings
import warnings

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

warnings.filterwarnings('ignore')

import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True)  # downloads punkt, wordnet and other packages


# Reading in the corpus (the Roman Nepali company text, e.g. yk.txt)
with open('chatbot.txt', 'r', encoding='utf8', errors='ignore') as fin:
    raw = fin.read().lower()

# Tokenization
sent_tokens = nltk.sent_tokenize(raw)  # converts the corpus to a list of sentences
word_tokens = nltk.word_tokenize(raw)  # converts the corpus to a list of words

# Preprocessing
lemmer = WordNetLemmatizer()
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))


# Keyword Matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey",)
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    """If user's input is a greeting, return a greeting response"""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)


# Generating a response
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Cosine similarity between the user's query (last row) and every sentence
    vals = cosine_similarity(tfidf[-1], tfidf)
    # Index of the most similar sentence; -1 is the query itself, so take -2
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        robo_response = "I am sorry! I don't understand you"
    else:
        robo_response = sent_tokens[idx]
    return robo_response


flag = True
print("YourKoseli BOT: My name is YK. I will answer your queries about Yourkoseli - Online Cake Delivery. If you want to exit, type Bye!")
while flag:
    user_response = input().lower()
    if user_response == 'bye':
        flag = False
        print("YourKoseli BOT: Bye! take care..")
    elif user_response in ('thanks', 'thank you'):
        flag = False
        print("YourKoseli BOT: You are welcome..")
    else:
        greet = greeting(user_response)
        if greet is not None:
            print("YourKoseli BOT: " + greet)
        else:
            print("YourKoseli BOT: ", end="")
            print(response(user_response))
            # remove the query we appended inside response()
            sent_tokens.remove(user_response)

RESULT:

[Screenshot: Nepali-Bot-Yourkoseli]

Related: Sentiment Analysis in Nepali
