1 Introduction
Natural language processing (NLP) is the field of study concerned with the interaction between human language and computers. It sits at the intersection of artificial intelligence, computer science, and computational linguistics. NLP uses computers to examine and understand human language and to derive meaning from it in a useful way. With NLP, knowledge can be structured and analyzed to perform tasks such as translation, automatic summarization, sentiment analysis, speech recognition, and topic segmentation. NLP is needed to analyze text so that machines can understand how humans speak, and it underpins machine translation, automatic question answering, and text mining. Human language is rarely exact, and this is the most difficult problem for NLP: communication between human and machine requires knowing what is meant, not simply recognizing the words. It is this ill-defined quality of language, rather than the learning of a language itself, which individuals find quite easy, that makes NLP a hard task for computers to master. NLP is built on machine learning algorithms. Rather than hand-coding a large set of rules, NLP can rely on machine learning to learn rules automatically by examining a set of examples, ranging from a collection of sentences to a large corpus, and to make predictions statistically. In general, the more data is examined, the more precise the model becomes.
1.1 Applications of NLP
Machine translation
Automatic summarization
Sentiment analysis
Text classification
Text classification assigns predefined categories to a document, which makes it possible to retrieve significant information or to simplify later processing.
Question answering
A question answering system is one capable of answering requests posed by humans. Much of its popularity is owed to Siri, OK Google, and chatbots. Such systems offer authenticity and are expected to remain in use for a long time, so question answering will remain a challenging task for search engines and a crucial topic of NLP research.
1.2 Introduction to Sentiment Analysis
Sentiment analysis is the process of extracting valuable information, or sentiment, from data. It uses techniques from text processing, text analysis, natural language processing, and computational linguistics. The aim is to determine the polarity of a document by analyzing its contents. The polarity reflects the opinion expressed in the document and can be positive, negative, or neutral. Sentiment analysis is categorized into three main levels, shown below.

Different sentiment analysis levels
1.3 Introduction to Sarcasm Detection
Sarcasm is a verbal device in which a speaker says one thing while meaning the opposite, often with the intention of putting someone down. It is widely used on social media, where people make remarks that mean the opposite of what they say in order to hurt someone’s feelings. Sarcasm also reverses the polarity of a statement. For instance: “You have been working hard,” he said with heavy sarcasm as he looked at the empty page.
Dataset Formation: This is the first step, in which the dataset is collected from different sources, e.g., tweets from Twitter or posts from Facebook.
Data Preprocessing: In this step, the data is cleaned by removing URLs, hashtags, user tags of the form @user, and unnecessary symbols.
Sarcasm Identification: This step involves two phases, feature selection and feature extraction. Feature extraction uses part of speech (POS), term presence, term frequency, inverse document frequency, negation, and opinion expressions to extract the features. Feature selection, on the other hand, uses the lexicon method and the statistical method.
1.4 Sarcasm Classification Approaches
- i. Machine Learning Approach
- ii. Lexicon Based Approach
- iii. Hybrid Approach
1.4.1 Machine Learning
Machine learning is a field of artificial intelligence in which a model is trained on existing data in order to predict future outcomes, trends, and behaviors for new test data. Machine learning is categorized into supervised and unsupervised learning.
Supervised Learning
Supervised learning is used when there is a finite set of classes. In this method, labeled data is needed to train classifiers. In a machine learning based classifier, a training set is used by an automatic classifier to learn the distinguishing characteristics of documents, and a test set is used to validate the performance of the classifier. Two steps are involved, i.e., training and testing.
Unsupervised Learning
This method is used when labeled training documents are hard to find. It does not depend on prior training to mine the data. At the document level, sentiment analysis (SA) is based on deciding the semantic orientation (SO) of particular phrases within the document. If the average semantic orientation of these phrases is above some predefined threshold, the document is classified as positive; otherwise it is deemed negative.
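As a minimal sketch of the threshold rule described above, the following Python snippet averages phrase-level semantic orientation scores and compares the result with a threshold. The scores, the default threshold of 0.0, and the function name are illustrative assumptions, not values taken from any cited work.

```python
from statistics import mean


def classify_by_semantic_orientation(phrase_scores, threshold=0.0):
    """Label a document from the average semantic orientation (SO) of its phrases."""
    if not phrase_scores:
        raise ValueError("at least one phrase score is required")
    # Above the threshold the document is positive, otherwise negative.
    return "positive" if mean(phrase_scores) > threshold else "negative"


# Hypothetical SO scores for the phrases of two documents.
print(classify_by_semantic_orientation([0.8, 0.4, -0.2]))
print(classify_by_semantic_orientation([-0.5, 0.1]))
```

How the phrase scores themselves are obtained is outside the scope of this sketch.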
1.4.2 Lexicon Based Techniques
The lexicon based technique is one of the unsupervised techniques of sentiment analysis, and a lot of work has been based on it. In this technique, classification is performed by comparing the features of a given text against sentiment lexicons whose sentiment values are determined prior to their use. A sentiment lexicon consists of lists of words and expressions that are used to convey people’s subjective feelings and opinions. There are three methods to construct a sentiment lexicon:
Manual Method
In this approach, each opinion word, such as nice (adjective), fast (adverb), or love (verb), is selected manually and the corresponding polarity is assigned. Because the manual approach is time consuming, it is never used alone.
Dictionary Based Method
This approach has three steps. In the first step, a seed list of opinion words with their sentiment orientations is constructed manually. In the second step, the seed list is grown by searching for synonyms and antonyms of the seed words in an online dictionary such as WordNet. Each newly found word is added to the seed list with the same polarity as its synonym in the list, or the opposite polarity of its antonym, and the search is repeated until no new word is found in the dictionary. In the third step, a manual correction pass removes any remaining errors. By using machine learning techniques and additional information in WordNet, such as hyponyms, it is possible to generate better and richer opinion word lists.
The most important drawback of this simple approach is that it is unable to distinguish between opinion words with respect to their domains. For example, “quiet” is expressing positive sentiment in the context of a car but a negative sentiment for a speakerphone.
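The seed expansion loop of the dictionary based method can be sketched as follows. This is an illustrative sketch only: the two toy dictionaries stand in for a resource such as WordNet, the function name is hypothetical, and the manual correction step is not modeled.

```python
def expand_seed_lexicon(seeds, synonyms, antonyms):
    """Grow a {word: polarity} seed list until no new word is found.

    `synonyms` and `antonyms` are toy dictionaries standing in for an
    online dictionary; polarity is +1 (positive) or -1 (negative).
    """
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:
        word = frontier.pop()
        polarity = lexicon[word]
        # Synonyms inherit the polarity; antonyms receive the opposite one.
        candidates = [(s, polarity) for s in synonyms.get(word, [])]
        candidates += [(a, -polarity) for a in antonyms.get(word, [])]
        for candidate, candidate_polarity in candidates:
            if candidate not in lexicon:
                lexicon[candidate] = candidate_polarity
                frontier.append(candidate)
    return lexicon


toy_synonyms = {"good": ["nice", "fine"], "bad": ["poor"]}
toy_antonyms = {"good": ["bad"], "nice": ["nasty"]}
lexicon = expand_seed_lexicon({"good": 1}, toy_synonyms, toy_antonyms)
print(lexicon)
```

Because the loop assigns polarity from the dictionary alone, it reproduces the drawback noted above: a word such as “quiet” receives one polarity regardless of domain.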
Corpus-Based Method
This method is intended to solve the problem of the dictionary based approach. It consists of two steps. The first step is constructing a seed list of opinion words that carry adjective part-of-speech tags, together with their polarities. In the second step, a set of linguistic constraints is introduced to search the existing corpus for additional opinion words and their sentiment orientations.
These linguistic constraints are based on the idea of “Sentiment Consistency.” According to sentiment consistency, people usually express the same opinion on both sides of a conjunction (for instance, “and”) and the opposite opinion around a disjunction (for instance, “but”). This idea helps to discover new sentiment words in a collection. For instance, take the sentence “This house is lovely and big.” If “big” is not in our seed list, we can conclude from “lovely” and the conjunction “and” that “big” has the same polarity as “lovely,” and we can therefore extend our list.
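The sentiment consistency rule can be illustrated with a short sketch. It is a simplification under stated assumptions: it matches single words on either side of “and” or “but” with a regular expression and does not check part-of-speech tags; the function name and the toy seed lists are hypothetical.

```python
import re


def propagate_polarity(sentence, lexicon):
    """Infer polarities for unknown words joined to known ones by 'and' or 'but'."""
    learned = {}
    # Match patterns such as "lovely and big" or "cheap but noisy".
    for left, conj, right in re.findall(r"(\w+) (and|but) (\w+)", sentence.lower()):
        sign = 1 if conj == "and" else -1  # "and" keeps polarity, "but" flips it
        if left in lexicon and right not in lexicon:
            learned[right] = sign * lexicon[left]
        elif right in lexicon and left not in lexicon:
            learned[left] = sign * lexicon[right]
    return learned


print(propagate_polarity("This house is lovely and big.", {"lovely": 1}))
print(propagate_polarity("The room is cheap but noisy.", {"noisy": -1}))
```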
1.4.3 Hybrid Based Techniques
The hybrid approach combines the other approaches, namely the machine learning and lexicon based approaches.
2 Literature Survey
In “Multimedia big data computing and Internet of Things applications: A taxonomy and process model,” Tanwar et al. [2] observe that a huge amount of multimedia data, termed multimedia big data (MMBD), is being produced as the supply of multimedia devices rises rapidly across the Internet of Things (IoT). Current research and development activities focus on scalar sensor data and do not consider the complexity of MMBD over IoT. Their process model addresses a number of research challenges, such as accessibility, scalability, QoS, and reliability requirements.
In “Text Analytical Models for Data Collected from Micro-blogging Portal—A Review,” Jasandeep Kaur et al. survey the phases of sarcasm detection and discuss various approaches that combine multiple features to classify text. Various classification algorithms are deployed in text analytics systems, and they are shortlisted on the basis of the feature engineering mechanism and the type of data. For data collected from Twitter, Random Forest, SVM, and KNN are used with punctuation-related, syntax-based, and other features for sarcasm detection. The Random Forest classifier is found to be the best of the classification models, outperforming the others by a margin of at least 1.6%, with KNN the closest.
Shubhodip Saha et al. proposed an approach for sarcasm detection on Twitter in which TextBlob is used for preprocessing, including tokenization, part-of-speech tagging, and parsing, and stop words are removed using Python. RapidMiner is used to compute the polarity and subjectivity of tweets, and the Weka tool is used to measure classification accuracy with two classifiers, Naïve Bayes and SVM. Naïve Bayes is found to be more accurate than SVM.
V. Haripriya et al. survey various methodologies for sarcasm detection in Twitter data and analyze classifiers such as Naïve Bayes, lexicon based, and Support Vector Machine. Sarcasm can be detected efficiently only if an approach can deal with large datasets, but most existing approaches handle only small ones. A deep learning approach is therefore considered efficient for detecting sarcasm in large datasets.
Aditya Joshi et al. described datasets, approaches, trends, and issues in sarcasm detection. Datasets are divided into three classes: short text, long text, and other datasets. The approaches discussed include rule-based, statistical, and deep learning-based methods, as well as shared tasks. The issues identified for sarcasm detection are issues in data, issues with features, and dealing with dataset skews.
In “Expect the unexpected: Harnessing Sentence Completion for Sarcasm Detection,” Aditya Joshi et al. present two approaches that use sentence completion for sarcasm detection: an all-words approach and an incongruous words-only approach. Two datasets are used for the evaluation: (i) tweets by [3], comprising 2278 tweets of which 506 are manually annotated as sarcastic, and (ii) discussion forum posts by [4], comprising 752 sarcastic and 752 non-sarcastic manually annotated posts. Word2Vec and WordNet similarities are used as similarity measures. The evaluation covers overall performance and two-fold cross-validation. In overall performance, the all-words approach with Word2Vec similarity obtains an F-score of 54%, whereas the incongruous words-only approach with WordNet similarity obtains an F-score of 80.24%. Under two-fold cross-validation, the incongruous words-only approach with WordNet similarity obtains an F-score of 80.28%.
Mondher Bouazizi et al. propose a pattern-based approach for sarcasm detection on Twitter, together with four sets of features: sentiment-related features, punctuation-related features, syntactic and semantic features, and pattern-based features. To obtain more efficient and reliable patterns, the authors divide words into two classes, “CI” and “GFI.” The approach achieved 83.1% accuracy and 91.1% precision.
In “A Comprehensive Study of Classification Techniques for Sarcasm Detection on Textual Data,” Anandkumar D. Dave et al. identify different supervised classification techniques for sarcasm detection. They also train an SVM classifier with 10-fold cross-validation, using a simple bag-of-words as features and TF-IDF to measure feature frequency. Two datasets were collected (Amazon product reviews and tweets), and preprocessing was performed to remove the noise (spelling mistakes, slang words, user-defined labels, etc.) present in the datasets.
In “Detecting Irony and Sarcasm in Microblogs: The Role of Expressive Signals and Ensemble Classifiers,” Elisabetta Fersini et al. introduce an ensemble approach in which Bayesian Model Averaging (BMA) combines different classifiers on the basis of their marginal probability predictions and reliabilities. Two main ensemble approaches, Majority Voting and Bayesian Model Averaging, are considered for detecting sarcasm and irony. To evaluate the proposed BMA approach, Fersini et al. [5] take the classifier with the highest accuracy as the baseline and consider four configurations: BOW, PP, POS, and PP & POS. The experimental results show that the proposed solution outperforms both the traditional classifiers and the well-known Majority Voting mechanism. The paper also finds that sarcasm is better characterized by PoS tags, whereas ironic statements are captured by pragmatic particles.
In “Sarcasm detection on Czech and English twitter,” Tomas Ptacek et al. present the first attempt at sarcasm detection in two different languages, Czech and English. Two datasets were collected for the evaluation, 140,000 tweets in Czech and 780,000 tweets in English, using the Twitter Search API and Java Language Detection. Two classifiers, Maximum Entropy and Support Vector Machine, were used for classification. Tests were run with 5-fold cross-validation, and the approach achieved F-measures of 0.947 and 0.924 on the balanced and imbalanced English datasets, respectively. On the Czech dataset, SVM achieved good results, with an F-measure of 0.582, when the feature set was extended with patterns.
In “Indonesian Social Media Sentiment Analysis with Sarcasm Detection,” Edwin Lunando et al. propose two additional features for detecting sarcasm after a common sentiment analysis has been conducted: the number of interjection words and negativity information. Three types of experiments were conducted: experiments on the sentiment score, experiments on the classification method, and experiments on sarcasm detection. In the last experiment, the additional features were evaluated in terms of sarcasm classification accuracy, which showed that they are effective in sarcasm detection.
In “Sarcasm as Contrast between a Positive Sentiment and Negative Situation,” Ellen Riloff et al. present a novel bootstrapping algorithm that automatically learns lists of positive sentiment phrases and negative situation phrases from sarcastic tweets. Two baseline systems are created; the LIBSVM library is used to train SVM classifiers, and 10-fold cross-validation is used to evaluate them. The SVM achieved 64% precision and 39% recall with both unigram and bigram features, and the hybrid approach, which applies the contrast method with only positive verb phrases, raises the recall from 39% to 42%.
Bruno Ohana et al. present sentiment analysis of film reviews using a hybrid approach that combines a machine learning algorithm, the support vector machine (SVM), with a semantic orientation approach based on SentiWordNet. Features are extracted from SentiWordNet, an SVM classifier is trained on these features, and the film reviews are then classified by the SVM. To determine the sentiment orientation of the film reviews, the positive and negative term scores are counted.
2.1 Research Gaps
- 1. The word compression method used in the existing model can lower the performance of sentiment analysis by removing necessary bias and altering the overall emotion of the text data [6].
- 2. The existing model offers an accuracy of nearly 83%, which leaves room for improvement. The accuracy of the system can be raised by making various improvements to the existing model [6].
- 3. The existing model requires high computational power, which slows down the process of sentiment analysis. It works at various levels and uses multivariate feature descriptors along with the classifier, which adds to the overall elapsed time of the sentiment analysis system. The proposed model can be extended to increase the execution speed of the process [6].
- 4. Only sentiment and emotion clues, which include the polarity and emoticon features, are used to detect sarcasm in the existing scheme. It analyzes the presence of both positive and negative sentiment-related features, which may lead to false results in many cases [7].
- 5. The existing approach is best suited to smaller text datasets, where its results have proved efficient for Twitter with 140-character tweets. As Twitter has raised the allowed tweet length from 140 to 280 characters, this scheme is no longer efficient for Twitter data. It must be improved for larger text databases [8].
- 6. In the existing scheme, sarcasm detection is based on different levels of sarcastic tweet. Sarcasm cannot be properly described with a particular predefined set of rules; hence this scheme cannot meet such a requirement. A more generalized model can be a better option for sarcasm detection [9].
3 Proposed Methodology
3.1 To Create a Dataset for Sarcasm Detection
The dataset has been collected from Twitter using the REST API. Sarcastic tweets are captured from streams that include sarcasm keywords, such as #sarcasm, #sarcastic, etc. Normal tweets are collected from natural discussion threads with keywords such as #happy, #good, etc. A total of 25,000 tweets extracted from the Twitter API are used as training data, and 609 tweets are used for testing.
3.2 Implementation
The complete sarcasm detection model is implemented in the Anaconda environment. The system runs Windows 8 (64-bit operating system) on an Intel i3 processor with 3 GB of RAM. The implementation is explained in detail in this section.
3.2.1 Feature Comparison Model
The new model is designed to classify tweet data obtained from Twitter, which contains both sarcastic and non-sarcastic tweets, into categories. It is based on a mixture of knowledge-based sarcasm detection and feature amalgamation, exploring various aspects of the text in order to recognize the correct type of tweet. N-gram analysis techniques are used to extract tokens from the message data, and tokenization is performed at the word level. The tokenization process relies mainly on simple heuristics: tokens are separated by whitespace characters (such as spaces and line breaks) or by punctuation characters, and the whitespace and punctuation may or may not be included in the resulting list of tokens. There are, however, many special cases, such as hyphenated words, contractions, emoticons, and larger constructs such as URIs. The sarcasm detection technique relies on a tweet category database that uses n-gram analysis of the message data. The model is built from several components, each with its own function and design, including tokenization, feature extraction, a classification density estimator, and a stop word filter. Together, these components form the final model of the proposed work, which detects sarcasm using feature engineering (or amalgamation) over various aspects of the text data.
3.2.2 Tokenization
Tokenization is the method used to extract keyword data from the input message string. It also enables the automatic sarcasm detection algorithms to find the category of the input tweet, giving better results in less time and with a smaller dictionary than a complete phrase dictionary would require. The proposed model has been analyzed under the N-gram model, which is capable of extracting word combinations of higher influence rather than words of lesser influence.
- 1. Acquire the string from the message body
- 2. Split the string into the word list
- 3. Count the number of words in the split string
- 4. Load the STOPWORD data
- 5. Start the iteration for each word (index)
  - a. Check the word (index) against the STOPWORD list
  - b. If the word (index) matches, return true
    - i. Filter the word out of the list
  - c. Otherwise, match the word (index) with the supervised data provided for the tokenization
  - d. If the token matches the data in the supervised lists
    - i. Add to the output token list
  - e. Check the token relation with the next word against the phrase data
  - f. If relation found
    - i. Pair both of the words word (index) and word (index + 1)
  - g. Otherwise, return the singular word (index)
  - h. If it’s the last word
    - i. Return the word list
  - i. Otherwise GOTO 5(a)
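The tokenization steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the stop word set and the phrase pairs are hypothetical stand-ins for the STOPWORD data and the phrase data, and the match against the supervised lists (steps c and d) is omitted for brevity.

```python
def tokenize(message, stopwords, phrases):
    """Split a message into tokens, dropping stop words and pairing known phrases.

    `stopwords` is a set of words to filter out; `phrases` is a set of
    (word, next_word) pairs that should be kept together as one token.
    """
    words = message.lower().split()
    tokens = []
    index = 0
    while index < len(words):
        word = words[index]
        if word in stopwords:
            index += 1  # filter the stop word out of the list
            continue
        next_word = words[index + 1] if index + 1 < len(words) else None
        if next_word is not None and (word, next_word) in phrases:
            tokens.append(f"{word} {next_word}")  # pair word(index) and word(index + 1)
            index += 2
        else:
            tokens.append(word)  # keep the singular word
            index += 1
    return tokens


print(tokenize("I love being robbed during the holidays",
               stopwords={"i", "the", "during"},
               phrases={("being", "robbed")}))
```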
Feature 1 Contrasting features: The first feature is based entirely on contrasting connotations, which are the most prominent indicator of sarcastic expression. Sarcasm in a sentence is mainly conveyed through contrasting combinations of emotion-bearing or meaning-bearing words and phrases. Expressions such as “I love being robbed during holidays” or “I enjoy being cheated by businesses” are highly sarcastic phrases that illustrate the most common form of sarcastic sentence. Two primary quantities emerge from these combinations: the affection score and the sentiment score. The sentiment score is calculated with the following algorithm, which is a kind of supervised sentiment analysis method.
- 1. Perform the data acquisition
- 2. Perform user list extraction over the input data acquired from the social thread
- 3. Perform the message level extraction from the input data
- 4. Apply the supervised tokenization with the localized dictionary to extract each message M out of the total messages N
- 5. Apply the STOPWORD filtering over the message data denoted by M
- 6. Apply the polarization method over the filtered message in step 5
- 7. Return the polarization value to the decision maker method under the proposed sentiment analysis algorithm
- 8. Classify the message polarity according to the computed weight
  - a. If the computed weight is less than 0
    - i. mark the message as negative
  - b. If the weight is higher than 0
    - i. mark the message as positive
  - c. If the weight equals zero
    - i. mark the message as neutral
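The decision rule in step 8 can be sketched as below. The polarization method itself is not specified in the text, so summing per-word weights from a hypothetical dictionary is an assumption made purely for illustration.

```python
def message_polarity(tokens, word_weights):
    """Sum per-word weights and map the total to a polarity label."""
    # Words missing from the (hypothetical) weight dictionary contribute 0.
    weight = sum(word_weights.get(token, 0) for token in tokens)
    if weight < 0:
        return "negative"
    if weight > 0:
        return "positive"
    return "neutral"


toy_weights = {"love": 2, "great": 1, "robbed": -3, "late": -1}
print(message_polarity(["love", "great"], toy_weights))
print(message_polarity(["love", "robbed"], toy_weights))
print(message_polarity(["train", "station"], toy_weights))
```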




Here, t denotes the text contained in the tweet, and w represents a word in the tweet. The function affect() accepts each word in turn and returns its matching affection in the form of an affection weight, also known as a special sentiment weight. The sentiment is computed by the sentiment() function, which works in the same way as affect(). The contrast in affection and in sentiment is denoted by ∆affect and ∆sentiment, respectively. The minimum affection score is subtracted from the maximum affection score, and the same step is performed for the sentiment score vector. The contrasting weight is returned to the program.
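Read from the description above, each contrast is the maximum per-word score minus the minimum per-word score over the words of the tweet. A minimal sketch follows; the lookup tables that stand in for affect() and sentiment() are hypothetical, as are the scores in them.

```python
def contrast(scores):
    """Return max(scores) - min(scores), or 0.0 for an empty tweet."""
    return max(scores) - min(scores) if scores else 0.0


def contrast_features(words, affect, sentiment):
    """Compute (delta_affect, delta_sentiment) for the words of one tweet."""
    affect_scores = [affect(w) for w in words]
    sentiment_scores = [sentiment(w) for w in words]
    return contrast(affect_scores), contrast(sentiment_scores)


# Hypothetical per-word scores; unknown words score 0.0.
toy_affect = {"love": 1.0, "robbed": -0.75}
toy_sentiment = {"love": 0.5, "robbed": -0.5}

delta_affect, delta_sentiment = contrast_features(
    ["love", "being", "robbed"],
    affect=lambda w: toy_affect.get(w, 0.0),
    sentiment=lambda w: toy_sentiment.get(w, 0.0),
)
print(delta_affect, delta_sentiment)
```

A large contrast indicates that strongly positive and strongly negative words occur in the same tweet, which is the pattern this feature is meant to capture.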
Feature 2 Affection analysis: The tweet data evaluation algorithm is based on a combination of the above techniques, such as sentiment analysis, tokenization, and affection. The new model is designed to collect data directly from an online or offline data source. The following algorithm explains the design of the affection model for the proposed model:
- 1. Acquire the dataset and choose the raw data form of the read CSV file
- 2. Count the number of rows in the raw data
- 3. Load the STOPWORD list
- 4. Run the iteration for each message in the raw data
  - a. Extract the current message in the raw data
  - b. Filter the STOPWORDS from the input message data
  - c. Extract the tokens from the input message data
  - d. Evaluate the affection score of the overall message
  - e. Return the message score to determine the degree of affection
  - f. Add to the detected polarity list of positive, negative or neutral
  - g. If the message is negative
    - i. Acquire the deep emotion supervised lists
    - ii. Determine the message under anger module
    - iii. Determine the message under disgust module
  - h. Return the deep sentiment results
Feature 3 Punctuation: This is a detailed feature that accounts for the various terms and their individual weights in order to understand the composition of the sentences in the given tweets. It is considered very important, as there is always a distinctive pattern behind every kind of phrase or sentence. The composition covers various elements together, such as special characters, punctuation, verbs, and adverbs. The following algorithm is used to determine this feature:
- 1. Acquire the tweet data obtained from the API
- 2. Count the rows in the tweet data matrix
- 3. Iterate for every row in the tweet data matrix
  - a. Read the current tweet from the tweet data matrix
  - b. Convert the tweet string to lowercase
  - c. Normalize the string to make it processable through NLP processors
  - d. Replace the URL with the word “url”
  - e. Replace the string “@username” with the word “at_user”
  - f. Remove the hashtags from the string
  - g. Remove the numeric values from the input string
  - h. Remove the special characters from the input string
  - i. Convert the string to Unicode string
  - j. Apply the tokenization on the string
  - k. Replace the internet slang with the original syntactic replacements in the tokens array
  - l. Convert the tokens to the string
  - m. Reapply the tokenization on the re-prepared string
  - n. Remove the stopwords from the extracted keywords under N-gram analysis
  - o. Extract the subjective words
  - p. Add the output to the processed array
- 4. Acquire the training data
- 5. Process the training data
- 6. Apply the classification and return the classification results
- 7. Compute the classification performance
- 8. Return the performance parameters.
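The per-tweet cleaning steps 3(b) to 3(j) can be sketched with regular expressions as follows. This is an illustrative sketch, not the authors' implementation: the exact patterns are assumptions, hashtags are removed together with their text (removing only the “#” symbol is an equally plausible reading), and slang replacement, stop word removal, and subjective word extraction are left out.

```python
import re


def preprocess_tweet(tweet):
    """Apply the basic cleaning steps to one tweet and return its tokens."""
    text = tweet.lower()                                   # lowercase
    text = re.sub(r"https?://\S+|www\.\S+", "url", text)   # replace URLs
    text = re.sub(r"@\w+", "at_user", text)                # replace user mentions
    text = re.sub(r"#\w+", "", text)                       # remove hashtags
    text = re.sub(r"\d+", "", text)                        # remove numeric values
    text = re.sub(r"[^a-z_\s]", "", text)                  # remove special characters
    return text.split()                                    # whitespace tokenization


print(preprocess_tweet("Great, @bob! I LOVE waiting 3 hours http://t.co/xyz #sarcasm"))
```

The underscore is kept in the allowed character set so that the “at_user” placeholder survives the special character filter.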
3.3 Main News Classification Algorithm

Generalized supervised classification model for sarcasm detection
4 Results
The proposed model has been designed for sarcasm classification using text analytical methods over the Twitter dataset. The data includes various features, such as affection, sentiment, and punctuation-related features, syntactic features, and pattern-related features. In this work, the SVM, Maximum Entropy, KNN, and Random Forest classifiers are applied to the dataset in order to obtain the results.
Afterward, the data is divided into training and testing datasets by random selection, using a generated series of random numbers. The cross-validation split is performed at different ratios, namely 10, 20, 30, 40, and 50%, dividing the samples into random groups of training and testing signatures within the prepared sub-datasets.
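A random split of this kind can be sketched as below. The sketch assumes that the split ratio is the fraction of samples held out for testing, which the text does not state explicitly; the function name and the fixed seed are illustrative.

```python
import random


def random_split(samples, test_ratio, seed=0):
    """Shuffle samples reproducibly and hold out `test_ratio` of them for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Split 100 placeholder samples at the ratios named in the text.
for ratio in (0.1, 0.2, 0.3, 0.4, 0.5):
    train, test = random_split(range(100), ratio)
    print(ratio, len(train), len(test))
```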
4.1 Performance Parameters
The performance of the proposed model is evaluated using the following parameters:
4.1.1 Accuracy

4.1.2 Recall

4.1.3 Precision

4.1.4 F1-Measure

4.1.5 True Positive
The true positive reading is observed when the target tweet belongs to the sarcastic category and the classification result also indicates that it is sarcastic after evaluating the tweet text.
4.1.6 True Negative
The true negative reading is observed when the target tweet is not sarcastic and the classification also confirms its non-sarcastic nature.
4.1.7 False Positive
The false positive reading is observed when the target tweet is not sarcastic, but the classification shows it as a sarcastic tweet.
4.1.8 False Negative
The false negative reading is observed when the target tweet belongs to the sarcastic category, but the classification result indicates it as non-sarcastic after evaluating the tweet text.
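In terms of the four counts defined above (TP, TN, FP, FN), the standard definitions of accuracy, recall, precision, and F1-measure are given below. They are stated here as the conventional formulas; they are consistent with the values reported in Section 4.4.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{F1-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```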
4.2 Confusion Matrix
Confusion matrix
Predicted condition | True condition: positive | True condition: negative |
---|---|---|
Positive | True positive | False positive (type 1 error) |
Negative | False negative (type 2 error) | True negative |
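As a worked check, the performance parameters can be computed directly from the four cells of a confusion matrix. The sketch below uses the SVM counts reported in Section 4.4, reading the first row as the predicted-positive row (TP = 272, FP = 50, FN = 80, TN = 206); the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F1 from confusion matrix counts."""
    def ratio(numerator, denominator):
        # Guard against empty classes to avoid division by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


svm = classification_metrics(tp=272, fp=50, fn=80, tn=206)
for name, value in svm.items():
    print(f"{name}: {100 * value:.2f}%")
```

The computed values agree with the SVM row of the result table to within rounding.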
4.3 Four Different Classifiers for the Classification
SVM (Support Vector Machine)
MaxEnt (Maximum Entropy)
KNN (K Nearest Neighbor)
Random Forest
4.3.1 SVM
SVM is a method used for classification and regression, in which data is examined and patterns are identified. It is also used for outlier detection. The technique uses the concept of decision planes, which define decision boundaries. It is a classification method in which a hyperplane is constructed in a multidimensional space to separate the data into different label classes. The main task of SVM is to identify the right hyperplane to segregate the classes. One of the most important techniques in SVM is the kernel, a function that transforms a low dimensional input space into a higher dimensional space and thereby converts a non-separable problem into a separable one. SVM performs well when the margin of separation is clear and is also effective in high dimensional spaces.
4.3.2 Maximum Entropy
This classifier is commonly used in speech and information retrieval problems in NLP. Unlike naïve Bayes, MaxEnt does not assume that the features are conditionally independent of each other. It is based on the principle of maximum entropy: from all the models that fit the training data, it selects the one with the largest entropy. The classifier can be applied to a large number of text classification problems, such as sentiment analysis and topic classification. Estimating the parameters of the model requires solving an optimization problem, which is the main reason it takes more time to train than naïve Bayes. However, in terms of CPU and memory consumption it is quite competitive, as it provides robust results once the parameters mentioned earlier have been computed.
4.3.3 KNN
K-nearest neighbor is among the simplest of all machine learning algorithms. An object is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors, where k is typically a small positive integer. If k = 1, the object is simply assigned to the class of its closest neighbor. Choosing an odd value of k is helpful in binary classification problems, as it avoids tied votes.
The same method can be applied to regression by taking the average value of the k nearest neighbors as the property value for the object. Nearest neighbor classifiers are instance based, or lazy, learners: they store all the training samples and do not build a classifier until a new sample needs to be categorized. The method can also be used for making predictions.
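The majority vote described above can be sketched in a few lines of Python. The two-dimensional points, the labels, and the use of Euclidean distance are illustrative assumptions; the features used in this work are not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: all work happens at query time, over the stored samples.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["non-sarcastic", "non-sarcastic", "non-sarcastic", "sarcastic", "sarcastic"]
print(knn_predict(points, labels, (0.5, 0.5), k=3))
print(knn_predict(points, labels, (5, 6), k=1))
```

With k = 1 the query simply takes the label of its closest neighbor, and an odd k avoids tied votes in the two-class case.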
4.3.4 Random Forest
The paper [10] was the first to introduce the concept of an ensemble of decision trees, known as a random forest, which is composed by combining multiple decision trees. With a single tree classifier, noise or outliers may affect the result of the overall classification, whereas a random forest is very robust to noise and outliers because of the randomness it introduces. The random forest classifier provides two types of randomness: the first with respect to the data and the second with respect to the features. The random forest classifier uses the concepts of bagging and bootstrapping.
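The two sources of randomness can be illustrated with a short sketch. It shows only the sampling that would precede the training of each tree, not tree construction or voting; the function names, sizes, and seed are illustrative assumptions.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (randomness with respect to data)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, n_selected, rng):
    """Pick feature indices without replacement (randomness with respect to features)."""
    return sorted(rng.sample(range(n_features), n_selected))


rng = random.Random(42)
rows = list(range(10))  # placeholder row identifiers
for tree in range(3):
    sample = bootstrap_sample(rows, rng)
    features = random_feature_subset(n_features=8, n_selected=3, rng=rng)
    print(tree, sample, features)
```

Each tree would then be trained on its own bootstrap sample and feature subset, and the forest would combine the individual predictions.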
4.4 Result Evaluation
Result analysis of 10% split ratio with statistical accuracy based parameters
Classification algorithms | Precision (%) | Recall (%) | Accuracy (%) | F1-measure (%) |
---|---|---|---|---|
SVM | 84.4 | 77.2 | 78.6 | 80.7 |
MaxEnt | 81.0 | 82.0 | 80.5 | 81.5 |
KNN | 77.9 | 73.1 | 73.1 | 75.4 |
Random forest | 78.5 | 92.3 | 85.1 | 84.8 |
The accuracy-based analysis shows the dominance of the random forest classifier over all other classification options. The random forest-based model achieves 92.3% recall, 85.2% overall accuracy, and 84.9% F1-measure, which are the highest among the options. The only exception is precision, where SVM achieves the highest value (84.4%) compared with random forest (78.5%).
Confusion matrix for SVM classifier of 10% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 272 | 50 |
| | 80 | 206 |
Confusion matrix for maximum entropy classifier of 10% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 261 | 61 |
| | 57 | 229 |
Confusion matrix for KNN classifier of 10% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 251 | 71 |
| | 92 | 194 |
Confusion matrix for random forest classifier of 10% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 253 | 69 |
| | 21 | 265 |
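The reported precision, recall, accuracy, and F1-measure can be recomputed from the four confusion matrices above. The sketch below assumes that the first row is the predicted positive class and the first column is the true positive class; under this reading the computed values agree with the 10% split results table to within about 0.1 percentage points. The function and variable names are illustrative only.

```python
def metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Counts for the 10% split, read as (TP, FP, FN, TN):
# first row = predicted positive, first column = true positive (assumed).
matrices = {
    "SVM": (272, 50, 80, 206),
    "MaxEnt": (261, 61, 57, 229),
    "KNN": (251, 71, 92, 194),
    "Random forest": (253, 69, 21, 265),
}

results = {name: metrics(*counts) for name, counts in matrices.items()}

for name, m in results.items():
    # Print each metric as a percentage with one decimal place.
    print(name, {key: round(100 * value, 1) for key, value in m.items()})
```

The same function can be applied to the confusion matrices of the other split ratios by replacing the counts.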
The following line graphs have two axes. The x-axis lists the four supervised algorithms used to classify the data, and the y-axis gives the percentage values of precision, recall, accuracy, and F1-measure.

Result analysis of 10% split ratio with precision and recall based parameters

Result analysis of 10% split ratio with accuracy and F1-measure based parameters
For the 20% split ratio, the accuracy-based analysis again shows the dominance of random forest, which achieves the maximum recall (93.2%), overall accuracy (84.9%), and F1-measure (85.4%), all higher than those of the other classifiers. In contrast, the SVM classifier achieves the highest precision, 83.9%, compared with 78.8% for random forest.
Result analysis of 20% split ratio with statistical accuracy based parameters
Classification algorithms | Precision (%) | Recall (%) | Accuracy (%) | F1-measure (%) |
---|---|---|---|---|
SVM | 83.9 | 79.1 | 78.6 | 81.4 |
MaxEnt | 83.0 | 85.4 | 82.6 | 84.2 |
KNN | 76.1 | 76.3 | 73.4 | 76.2 |
Random forest | 78.7 | 93.2 | 84.9 | 85.3 |

Result analysis of 20% split ratio with precision and recall based parameters

Result analysis of 20% split ratio with accuracy and F1-measure based parameters
Confusion matrix for SVM classifier of 20% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 570 | 109 |
| | 150 | 386 |
Confusion matrix for maximum entropy classifier of 20% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 564 | 115 |
| | 96 | 440 |
Confusion matrix for KNN classifier of 20% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 517 | 162 |
| | 160 | 376 |
Confusion matrix for random forest classifier of 20% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 535 | 144 |
| | 39 | 497 |
Result analysis of 30% split ratio with statistical accuracy based parameters
Classification algorithms | Precision (%) | Recall (%) | Accuracy (%) | F1-measure (%) |
---|---|---|---|---|
SVM | 83.9 | 78.7 | 78.7 | 81.2 |
MaxEnt | 81.9 | 85.7 | 82.6 | 83.7 |
KNN | 75.0 | 74.2 | 72.0 | 74.6 |
Random forest | 78.6 | 93.4 | 85.2 | 85.3 |
Confusion matrix for SVM classifier of 30% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 839 | 161 |
| | 226 | 597 |
Confusion matrix for maximum entropy classifier of 30% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 819 | 181 |
| | 136 | 687 |
Confusion matrix for KNN classifier of 30% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 750 | 250 |
| | 260 | 563 |
Confusion matrix for random forest classifier of 30% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 786 | 214 |
| | 55 | 768 |

Result analysis of 30% split ratio with precision and recall based parameters

Result analysis of 30% split ratio with accuracy and F1-measure based parameters
Result analysis of 40% split ratio with statistical accuracy based parameters
Classification algorithms | Precision (%) | Recall (%) | Accuracy (%) | F1-measure (%) |
---|---|---|---|---|
SVM | 84.3 | 76.7 | 77.6 | 80.3 |
MaxEnt | 83.0 | 84.7 | 82.7 | 83.9 |
KNN | 75.2 | 72.9 | 71.4 | 74.0 |
Random forest | 79.1 | 92.1 | 85.0 | 85.1 |
The accuracy-based analysis shows the dominance of random forest, which achieves the maximum recall (92.1%), overall accuracy (85.0%), and F1-measure (85.1%), all higher than those of the other classifiers. In contrast, the SVM classifier achieves the highest precision, 84.3%, compared with 79.1% for random forest.
Confusion matrix for SVM classifier of 40% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 1109 | 206 |
| | 336 | 779 |
Confusion matrix for maximum entropy classifier of 40% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 1092 | 223 |
| | 196 | 919 |
Confusion matrix for KNN classifier of 40% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 989 | 326 |
| | 367 | 748 |
Confusion matrix for random forest classifier of 40% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 1041 | 274 |
| | 89 | 1026 |

Result analysis of 40% split ratio with precision and recall based parameters

Result analysis of 40% split ratio with accuracy and F1-measure based parameters
Result analysis of 50% split ratio with statistical accuracy based parameters
Classification algorithms | Precision (%) | Recall (%) | Accuracy (%) | F1-measure (%) |
---|---|---|---|---|
SVM | 85.2 | 75.4 | 77.1 | 80.0 |
MaxEnt | 83.3 | 84.7 | 82.9 | 84.0 |
KNN | 74.3 | 72.9 | 71.3 | 73.6 |
Random forest | 80.0 | 91.9 | 85.4 | 85.5 |
Confusion matrix for SVM classifier of 50% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 1394 | 242 |
| | 453 | 948 |
Confusion matrix for maximum entropy classifier of 50% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 1363 | 273 |
| | 246 | 1155 |
Confusion matrix for KNN classifier of 50% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 1216 | 420 |
| | 451 | 950 |
Confusion matrix for random forest classifier of 50% split ratio
| | True condition | |
|---|---|---|
| Predicted condition | 786 | 214 |
| | 55 | 768 |

Result analysis of 50% split ratio with precision and recall based parameters

Result analysis of 50% split ratio with accuracy and F1-measure based parameters
5 Conclusion and Future Scope
The proposed model has been designed to evaluate tweet data obtained from Twitter, containing both sarcastic and non-sarcastic tweets, using a unique combination of feature descriptors. The first feature is contrasting sentiment, which is based entirely on contrasting connotations, the most prominent indicator of sarcastic expressions. The second feature is affection analysis, which the evaluation algorithm uses in combination with the above techniques, such as sentiment analysis and tokenization. The third feature is punctuation, a detailed feature that counts the various terms and their individual weights in order to understand the composition of the sentences in the given tweets. Among the tested classification algorithms, the supervised model based on random forest performed best, with an overall accuracy of 84.7%, compared with SVM (78.6%), MaxEnt, i.e., logistic regression (80.5%), and KNN (73.1%).
In the future, the proposed model can be further improved by using a more advanced and/or compact feature set, which can provide information more specific to sarcastic expressions than the approach used in this paper. Feature selection based on effective algorithms such as particle swarm optimization (PSO) and genetic algorithms (GA) can be applied to attain higher accuracy for sarcasm detection.