1. What do you understand by Natural Language Processing?
Ans: Natural Language Processing is a field of computer science that deals with communication between computer systems and humans. It is a technique used in Artificial Intelligence and Machine Learning. It is used to create automated software that helps understand human spoken languages to extract useful information from the data it gets in the form of audio. Techniques in NLP allow computer systems to process and interpret data in the form of natural languages.
2. List any two real-life applications of Natural Language Processing.
Ans: Two real-life applications of Natural Language Processing are as follows:
- Google Translate: Google Translate is one of the famous applications of Natural Language Processing. It helps convert written or spoken sentences into any language. Also, we can find the correct pronunciation and meaning of a word by using Google Translate. It uses advanced techniques of Natural Language Processing to achieve success in translating sentences into various languages.
- Chatbots: To provide better customer support, companies have started using chatbots for 24/7 service. Chatbots help resolve the basic queries of customers. If a chatbot is not able to resolve a query, it forwards it to the support team while still engaging the customer. This makes customers feel that the customer support team is attending to them quickly. With the help of chatbots, companies have become capable of building cordial relations with customers. This is only possible with the help of Natural Language Processing.
3. What is TF-IDF?
Ans: TF-IDF, or Term Frequency-Inverse Document Frequency, indicates the importance of a word within a collection of documents. It helps in information retrieval by providing a numerical statistic. For a specific document, TF-IDF yields a frequency-based score that helps identify the keywords of that document. The major use of TF-IDF in NLP is the extraction of useful information from crucial documents by statistical means. It is ideally used to classify and summarize the text in documents and to filter out stop words.
TF is the ratio of the frequency of a term in a document to the total number of terms in that document, whereas IDF denotes how important (i.e., how rare across documents) the term is.
The formula for calculating TF-IDF:
TF(W) = (Frequency of W in a document)/(The total number of terms in the document)
IDF(W) = log_e(The total number of documents/The number of documents having the term W)
A high TF-IDF score means the term is frequent in the given document but rare across the set of documents; a low score means the opposite.
Google uses TF-IDF to decide the index of search results according to the relevancy of pages. The design of the TF-IDF algorithm helps optimize the search results in Google. It helps quality content rank up in search results.
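The TF and IDF formulas above can be translated directly into code. Below is a minimal sketch in plain Python; the toy corpus is invented for illustration, and it shows that a rare term like "cat" scores higher than a common one like "the":

```python
import math

def tf(term, doc):
    # TF(W) = frequency of W in the document / total terms in the document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # IDF(W) = log_e(total documents / documents containing W)
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]
# "the" appears in 2 of the 3 documents; "cat" in only 1, so "cat" scores higher
print(round(tf_idf("cat", docs[0], docs), 3))  # -> 0.183
print(round(tf_idf("the", docs[0], docs), 3))  # -> 0.135
```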
4. What does an NLP pipeline consist of?
Ans: Any typical NLP problem can be approached as follows:
- Text gathering (web scraping or available datasets)
- Text cleaning (stemming, lemmatization)
- Feature generation (bag of words)
- Embedding and sentence representation (word2vec)
- Training the model by leveraging neural nets or regression techniques
- Model evaluation
- Making adjustments to the model
- Deployment of the model.
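The steps above can be sketched end to end in a toy example. Everything here (the corpus, the labels, and the simple word-overlap scoring) is invented for illustration and stands in for real feature generation and model training:

```python
import re
from collections import Counter

# Step 1: a toy corpus standing in for gathered text
train = [("great movie loved it", "pos"),
         ("terrible plot waste of time", "neg"),
         ("loved the acting great fun", "pos"),
         ("boring terrible acting", "neg")]

def clean(text):
    # Step 2: a stand-in for cleaning (lowercase + strip punctuation)
    return re.sub(r"[^a-z\s]", "", text.lower()).split()

# Step 3: bag-of-words counts per class (a stand-in for feature generation)
counts = {"pos": Counter(), "neg": Counter()}
for text, label in train:
    counts[label].update(clean(text))

def predict(text):
    # Steps 5-6: score each class by word overlap with its training counts
    tokens = clean(text)
    return max(counts, key=lambda c: sum(counts[c][t] for t in tokens))

print(predict("loved it, great!"))      # classified as positive
print(predict("what a terrible bore"))  # classified as negative
```

A real pipeline would replace the overlap scoring with a trained model (e.g. a neural network or logistic regression) and add a proper evaluation step.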
5. What is Parsing in the context of NLP?
Ans: Parsing a document means working out the grammatical structure of its sentences, for instance, which groups of words go together (as “phrases”) and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences.
6. How is feature extraction done in NLP?
Ans: The features of a sentence can be used to conduct sentiment analysis or document classification. For example, if a product review on Amazon or a movie review on IMDB contains words like ‘good’ or ‘great’ frequently, it can be concluded/classified that the review is positive.
Bag of words is a popular model used for feature generation. A sentence is tokenized, and a group or category can be formed out of the individual words, which can then be explored for certain characteristics (e.g., the number of times a certain word appears).
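A minimal bag-of-words sketch using Python’s `collections.Counter`; the review text is invented for illustration:

```python
from collections import Counter

def bag_of_words(sentence):
    # Tokenize by whitespace, then count occurrences of each word
    return Counter(sentence.lower().split())

review = "good plot good acting a great movie"
bow = bag_of_words(review)
print(bow["good"])  # the word 'good' appears twice -> 2
```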
7. What are the metrics used to test an NLP model?
Ans: Accuracy, precision, recall, and the F1 score. Accuracy is the usual ratio of correct predictions to the total number of predictions. But going just by accuracy is naive considering the complexities involved; precision and recall account for false positives and false negatives, making them more reliable metrics.
And the F1 score is the sweet spot between precision and recall: their harmonic mean.
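These metrics can be computed directly from the counts of true/false positives and negatives. A minimal sketch for a binary task (the labels are invented for illustration):

```python
def metrics(y_true, y_pred):
    # Counts over a binary task, where 1 marks the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(metrics(y_true, y_pred))
```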
8. What are some popular Python libraries used for NLP?
Ans: Stanford’s CoreNLP, spaCy, NLTK, and TextBlob.
There is more to explore in NLP, such as advancements like Google’s BERT, where a Transformer network is preferred to a CNN or RNN. A Transformer network applies a self-attention mechanism that scans through every word and assigns attention scores (weights) to the words. For example, homonyms are given higher scores because of their ambiguity, and these weights are used to calculate a weighted average, which gives a context-dependent representation of the same word.
9. List some Components of NLP?
Ans: Below are the few major components of NLP.
- Entity extraction: It involves segmenting a sentence to identify and extract entities, such as a person (real or fictional), organization, geographies, events, etc.
- Syntactic analysis: It refers to analyzing the grammatical structure and proper ordering of words in a sentence.
- Pragmatic analysis: It is the part of the information-extraction process that interprets text using real-world knowledge beyond the text itself.
10. List some areas of NLP?
Ans: Natural Language Processing can be used for
- Semantic Analysis
- Automatic summarization
- Text classification
- Question Answering
Some real-life examples of NLP are Apple’s iOS Siri, the Google Assistant, and Amazon Echo.
11. Define the NLP Terminology?
Ans: NLP Terminology is based on the following factors:
- Weights and Vectors: TF-IDF, length(TF-IDF, doc), Word Vectors, Google Word Vectors
- Text Structure: Part-Of-Speech Tagging, Head of sentence, Named entities
- Sentiment Analysis: Sentiment Dictionary, Sentiment Entities, Sentiment Features
- Text Classification: Supervised Learning, Train Set, Dev (validation) Set, Test Set, Text Features, LDA.
- Machine Reading: Entity Extraction, Entity Linking, DBpedia, FRED (lib) / Pikes
12. What is an n-gram in NLP?
Ans: An n-gram in NLP is simply a sequence of n words. We can also observe which sequences appear more frequently than others. For example, consider these three word sequences:
- (a) New York (2-gram)
- (b) The Golden Compass (3-gram)
- (c) She was there in the hotel (4-gram)
From the above sequences, we can conclude that (a) appears more frequently than the other two, and the last sequence (c) is not seen that often. Assigning a probability to the occurrence of an n-gram is advantageous: it helps in making next-word predictions and in correcting spelling errors.
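Extracting n-grams is a simple sliding window over the tokens of a sentence; a minimal sketch:

```python
def ngrams(sentence, n):
    # Slide a window of n words across the tokenized sentence
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("She was there in the hotel", 2))  # all bigrams of the sentence
```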
13. What is perplexity in NLP?
Ans: The word “perplexed” means “puzzled” or “confused”; thus, perplexity in general means the inability to tackle something complicated or underspecified. In NLP, perplexity is a way to measure the extent of uncertainty in predicting some text.
Perplexity is a standard way of evaluating language models. It can be high or low: a low perplexity is good, because the model is less uncertain when predicting the next word, while a high perplexity is bad, because the model’s uncertainty is high.
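Concretely, perplexity is the exponential of the average negative log-probability the model assigns to each word. A minimal sketch in plain Python, with the probability lists invented for illustration:

```python
import math

def perplexity(probabilities):
    # PP = exp(-(1/N) * sum(log p_i)) over the model's per-word probabilities
    n = len(probabilities)
    return math.exp(-sum(math.log(p) for p in probabilities) / n)

# A model that is confident about every word scores a lower (better) perplexity
print(perplexity([0.9, 0.8, 0.9]))  # low perplexity
print(perplexity([0.1, 0.2, 0.1]))  # high perplexity
```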
14. What is pragmatic ambiguity in NLP?
Ans: Pragmatic ambiguity arises when the words of a sentence have multiple interpretations, i.e., when the meaning of the words is not specific and the sentence admits different meanings. In various sentences the intended sense cannot be determined from the grammatical formation alone; this multiple interpretation of the sentence gives rise to ambiguity.
For example, in “Do you want a cup of coffee?”, the sentence can be either an informative question or a formal offer to make a cup of coffee.
15. What is pragmatic analysis in NLP?
Ans: Pragmatic analysis deals with outside-world knowledge, which means knowledge that is external to the documents and/or queries. Pragmatic analysis reinterprets what was described in terms of what was actually meant, deriving the various aspects of language that require real-world knowledge.
16. What is the difference between NLP and CI (Conversational Interfaces)?
Ans: The differences between NLP and CI (Conversational Interfaces) are:

| Natural Language Processing | Conversational Interfaces |
| --- | --- |
| NLP is a kind of artificial intelligence technology that identifies, understands, and interprets the requests of users expressed in natural language. | CI is a user interface that mixes voice, chat, and other natural language with images, videos, or buttons. |
| NLP aims to make machines understand the concepts users express. | A Conversational Interface provides only what the users need and not more than that. |
17. What is the difference between NLP and NLU?
Ans: The differences between NLP and NLU are:

| Natural Language Processing | Natural Language Understanding |
| --- | --- |
| NLP is the system that works end to end to manage conversations between computers and humans. | NLU helps solve the complicated challenges of Artificial Intelligence. |
| NLP is related to both humans and machines. | NLU converts unstructured inputs into structured text for easy understanding by machines. |
18. What is Lemmatization in NLP?
Ans: Lemmatization means reducing a word to its base form properly, with the use of a vocabulary and morphological analysis of words. In this process, the inflectional endings of words are removed to return the base word, which is known as the lemma.
Example: boy’s = boy, cars = car, colors = color.
So, the main aim of lemmatization, as well as of stemming, is to identify and return the root words of a sentence so that additional information can be explored.
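For illustration only, a toy lemmatizer can be sketched with a small hand-made exception table plus naive suffix rules; real systems such as NLTK’s WordNetLemmatizer consult a full vocabulary and proper morphological analysis:

```python
# A toy lemmatizer: a small exception dictionary plus naive suffix rules.
# The exception entries are invented examples of irregular forms.
IRREGULAR = {"mice": "mouse", "geese": "goose", "better": "good"}

def lemmatize(word):
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("'s"):
        word = word[:-2]          # boy's -> boy
    if word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]          # cars -> car, colors -> color
    return word

print([lemmatize(w) for w in ["boy's", "cars", "colors", "mice"]])
```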
19. What is tokenization in NLP?
Ans: Natural Language Processing aims to program computers to process large amounts of natural language data. Tokenization in NLP is the method of dividing text into tokens. You can think of a token as a word, just as a word forms part of a sentence. Splitting the text into these minimal units is an important step in NLP.
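A minimal word tokenizer can be sketched with a regular expression; this is a simplification, as real tokenizers also handle contractions, hyphens, numbers, and so on:

```python
import re

def tokenize(text):
    # Keep runs of letters as tokens, dropping punctuation and whitespace
    return re.findall(r"[A-Za-z]+", text)

print(tokenize("Tokenization splits text into tokens."))
```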
20. Tell me the steps involved in solving an NLP problem?
Ans: The steps involved are as follows:
- Gather the text from the available dataset or by web scraping
- Apply stemming and lemmatization for text cleaning
- Apply feature engineering techniques
- Embed using word2vec
- Train the model using neural networks
- Evaluate the model’s performance
- Make suitable changes to the model
- Deploy the model
21. What is the F1 score in NLP?
Ans: The F1 score is the harmonic mean of recall and precision. It considers both false negative and false positive instances while evaluating the model. The F1 score is more reliable than accuracy for an NLP model when there is a skewed class distribution.
22. What are bigrams, unigrams and n-grams in NLP?
Ans: When we parse a sentence one word at a time, it is called a unigram. A sentence parsed two words at a time is a bigram. When the sentence is parsed three words at a time, it is a trigram. Likewise, an n-gram refers to parsing n words at a time.
23. Which one of the following is not a pre-processing technique in NLP?
- Stemming and lemmatization
- Converting to lowercase
- Removing punctuation
- Removal of stop words
- Sentiment analysis
Ans: Sentiment analysis is not a pre-processing technique. It is done after pre-processing and is an NLP use case. All the others listed are used as part of text pre-processing.
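The pre-processing steps listed above (lowercasing, punctuation removal, stop-word removal) can be sketched in plain Python; the stop-word list here is a small invented sample, not a complete one:

```python
import string

# A small illustrative stop-word list; real pipelines use a much fuller set
STOP_WORDS = {"the", "is", "a", "an", "of", "and"}

def preprocess(text):
    # Lowercase, strip punctuation, then remove stop words
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]

print(preprocess("The quick brown fox, and the lazy dog!"))
```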
24. There are several kinds of tagging used for processing natural languages; among these, part-of-speech (POS) tagging is one of the most popular in our industry. Please explain part-of-speech (POS) tagging in detail, and how it can be used properly?
Ans: A part-of-speech (POS) tagger is a very useful and important tool for processing natural language properly. A POS tagger is software that reads text, independent of the language, and assigns a part of speech to each word (or other token defined by the software’s tokenization logic), such as adjective, verb, or noun.
It normally relies on a specific algorithm to label the terms in the text body. There are varieties of taggers that are more complex than the utility described above; the functionality described here is one of the most basic features of a POS tagger.
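For illustration, a toy rule-based tagger can be sketched with a tiny hand-made lexicon and suffix heuristics (all invented for the example); real taggers are trained on large annotated corpora:

```python
# A toy rule-based POS tagger: a tiny lexicon plus suffix heuristics.
LEXICON = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
           "runs": "VERB", "barks": "VERB"}

def tag(sentence):
    tags = []
    for word in sentence.lower().split():
        if word in LEXICON:
            tags.append((word, LEXICON[word]))   # known word: look it up
        elif word.endswith("ly"):
            tags.append((word, "ADV"))           # e.g. "loudly" (heuristic)
        elif word.endswith("ing"):
            tags.append((word, "VERB"))          # e.g. "running" (heuristic)
        else:
            tags.append((word, "NOUN"))          # default guess
    return tags

print(tag("The dog barks loudly"))
```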
25. There are several classification models defined in NLP. What kinds of features can be used in NLP to improve the accuracy of a classification model?
Ans: Several kinds of features can be used, as explained below:
- Frequency counts of defined terms.
- Vector notation for every sentence.
- Part-of-speech (POS) tagging.
- Grammatical dependencies, or features from a defined dictionary or library.