Skip to content

rupav/Predict-Happiness

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

20 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Predict-Happiness

Hackerearth Challenge( deadline- 30th nov. 2017)

Smileys

Stats of 70% of the test dataset, as checked on Hackerearth:

Submission date My Score Leaderboard Max score Approach
18th oct. 17 86.781 90.624 multinomialNB with standard stoplist
20th oct. 17 86.363 90.624 using MultinomialNB with TF1 stoplist only
20th oct. 17 84.070 90.624 using MultinomialNB with TF1 stoplist only and TFIDF approach
20th oct. 17 86.300 90.624 using MultinomialNB with tf_high(thresh 7500) stoplist in addition to tf1
20th oct. 17 86.630 90.624 using MultinomialNB with tf_high(thresh 7500) stoplist in addition to tf1 and standard stoplist
23rd oct. 17 80.878 90.624 Random Forest Classiffier
28th oct. 17 86.668 90.624 Removed hyphens and used Lemmatizer, used MultinomialNB
  • My Final Private Leaderboard Ranking and score : 177/554 and 86.781
  • Private Leaderboard top score : 91.051
  • My Final Public Leaderboard Ranking and score :
  • Public Leaderboard top score :

Comments:

This repository is made to accumulate and test various techniques in sentiment analysis. Couldnt make any submissions in nov. because of college exams :( . Should have tried Deep Learning.

References:

Stopwords analysis : For 2nd approach- key points :

  • TF1 outperforms other stoplists.
  • standard stoplist has a negative impact on sentiment analysis.
  • NB is more sensitive to stopwords removal than MaxEntropy.
  • Two more approaches- TBRS and Mutual Information can be explored!

TO DO after challenge deadline:

  • To explore Deep Learning Techniques on sentiment Analysis.
    • CBOG (continuous bag of words) techniques
    • skip grams with negative sampling.
  • Other techniques with different preprocessing.

Contribution:

Please create an issue first, and then make a relevant PR for it.