Let me make it easy for you to understand what a chatbot is. The word is Chat + Bot: "chat" means messaging and "bot" means a robot (or machine). When we automate a bot for messaging, we call it a chatbot. Siri, Alexa, and Google Assistant are some common chatbots you have already seen. Although chatbots seem new, they are not: the first chatbot, ELIZA, was built in 1966.
You can use chatbots to reply to your customers. They are generally used to handle orders, hotel bookings, and similar tasks.
How Do Chatbots Work?
Chatbots generally work with one of two approaches: Rule Based and Self Learning.
In the Rule Based approach, we define a set of questions and the relevant answers to those questions. When a user's query matches a question in the list, the bot returns the corresponding response; otherwise it does not.
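As a minimal sketch of the rule-based approach, a bot can be nothing more than a lookup table from known questions to canned answers (the questions and answers here are made up for illustration):

```python
# Hypothetical question/answer pairs - a rule-based bot only answers
# queries that exactly match one of these predefined questions.
RULES = {
    "what are your opening hours": "We are open 9 AM to 6 PM, Sunday to Friday.",
    "how can i order a cake": "You can place an order on our website or by phone.",
}

def rule_based_reply(query):
    """Return the canned answer if the query matches a known question."""
    key = query.strip().lower().rstrip("?")
    return RULES.get(key, "Sorry, I don't have an answer for that.")

print(rule_based_reply("What are your opening hours?"))
print(rule_based_reply("Do you sell flowers?"))
```

Any query that is not in the list gets the fallback reply, which is exactly the limitation the self-learning approach below is meant to address.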
In the Self Learning approach, we use machine learning algorithms to answer the user's query. It is more effective than the rule-based approach.
Building a Chatbot in Nepali
The bot will not work well with the Nepali script itself. Hence, we simply used Roman Nepali.
Actually, there are no Python libraries that support the Nepali language, so we would have to build those functionalities on our own. However, there is a trick: we generally use Roman script while messaging, which is written with English letters. Hence, it is possible to make a chatbot answer queries in Roman Nepali. For this you must have text from a particular domain. For example, if you build a Nepali chatbot for online shopping, create a text file providing details of the company in Roman script.
Before building the chatbot, we assume you have knowledge of the scikit-learn library and NLTK. We will be using concepts from Natural Language Processing (NLP), the field concerned with the interactions between human language and computers, to build the bot.
Step By Step Guide to Build Nepali Chat Bot
HOW WE CREATED A ROMAN NEPALI BOT FOR YOURKOSELI
Step 1: Create a Roman-script text about the company you want to make the chatbot for. We did this for Yourkoseli, so we inserted about 1,000 words of detailed information about the company into a text corpus called “yk.txt”.
Step 2: The main problem is that the text is in string form. We need to convert it to numbers in order to apply machine learning algorithms/mathematics. So first we did text processing on the given text corpus (the company details):
- Case folding: change all characters to either UPPERCASE or lowercase, so that the system does not produce different vectors (numbers) for the same word.
- Tokenization: split the whole text into smaller tokens, first into sentences and then into words.
- Noise removal: strip characters other than letters and numbers, like #, %, ^, -, etc.
- Stemming: reduce plays, playing, played -> play.
- Lemmatization: map a word to its dictionary base form, e.g. better -> good. (However, stemming and lemmatization won't work well on fully Roman Nepali text.)
- Term Frequency (TF): how frequent a word is in the current document.
- Inverse Document Frequency (IDF): how rare the word is across all documents.
- Cosine similarity: a similarity measure between two vectors.
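The steps above can be sketched on a tiny made-up corpus. For brevity this sketch tokenizes with `str.split()` and stems with NLTK's `PorterStemmer`; the full code in Step 3 uses NLTK's tokenizer and `WordNetLemmatizer` instead.

```python
import string
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Case folding + noise removal: lowercase and strip punctuation
def normalize(text):
    return text.lower().translate(str.maketrans('', '', string.punctuation))

# Tokenization + stemming: plays / playing / played -> play
stemmer = PorterStemmer()
tokens = normalize("Plays, playing and played!").split()
print([stemmer.stem(t) for t in tokens])  # ['play', 'play', 'and', 'play']

# TF-IDF + cosine similarity: pick the corpus sentence closest to the query
# (these sentences are invented for illustration, not from yk.txt)
sentences = [
    "yourkoseli delivers cakes in kathmandu",
    "you can order flowers and gifts online",
    "delivery is available inside the valley",
]
query = "do you deliver cakes"
tfidf = TfidfVectorizer().fit_transform(sentences + [query])
sims = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
print(sentences[sims.argmax()])  # the cake sentence scores highest
```

This is exactly the matching strategy the bot uses in Step 3: the user's query is appended to the corpus, everything is vectorized with TF-IDF, and the corpus sentence with the highest cosine similarity is returned as the answer.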
Step 3: Coding
# Import the necessary libraries
import random
import string   # to process standard python strings
import warnings
warnings.filterwarnings('ignore')

import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download('popular', quiet=True)  # for downloading packages
# Uncomment the following only the first time:
# nltk.download('punkt')    # first-time use only
# nltk.download('wordnet')  # first-time use only

# Reading in the corpus
with open('chatbot.txt', 'r', encoding='utf8', errors='ignore') as fin:
    raw = fin.read().lower()

# Tokenisation
sent_tokens = nltk.sent_tokenize(raw)  # converts to a list of sentences
word_tokens = nltk.word_tokenize(raw)  # converts to a list of words

# Preprocessing: lemmatize tokens and strip punctuation
lemmer = WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

# Keyword matching for greetings
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello",
                      "I am glad! You are talking to me"]

def greeting(sentence):
    """If the user's input is a greeting, return a greeting response."""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

# Generating a response: TF-IDF over the corpus plus the query,
# then return the corpus sentence most similar to the query
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]  # second-best match (the best is the query itself)
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        robo_response = robo_response + "I am sorry! I don't understand you"
    else:
        robo_response = robo_response + sent_tokens[idx]
    return robo_response

# Main chat loop
flag = True
print("YourKoseli BOT: My name is YK. I will answer your queries about "
      "Yourkoseli - Online Cake Delivery. If you want to exit, type Bye!")
while flag:
    user_response = input().lower()
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print("YourKoseli BOT: You are welcome..")
        else:
            if greeting(user_response) is not None:
                print("YourKoseli BOT: " + greeting(user_response))
            else:
                print("YourKoseli BOT: ", end="")
                print(response(user_response))
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("YourKoseli BOT: Bye! take care..")
Related: Sentiment Analysis in Nepali