FILM REVIEW SENTIMENT CLASSIFIER

Supervised training dataset

Project description

This project classifies movie review comments into two categories: positive and negative. It combines data preprocessing, text vectorization, and supervised classification, and tests two models: Naive Bayes (BernoulliNB) and a Support Vector Machine (SVM, via SVC). Each model is evaluated with a classification report and a confusion matrix.
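To make the evaluation step concrete, here is a minimal, dependency-free sketch of what a binary confusion matrix counts and how the precision, recall, and F1 figures in a classification report follow from it. The label strings, the function names, and the sample predictions are hypothetical and are not taken from the project.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives for a two-class task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, really positive
            else:
                tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive the per-class scores reported for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels and predictions, for illustration only.
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print("confusion matrix rows (true negative/positive):", [[tn, fp], [fn, tp]])
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same counts computed for the negative class, together with the overall accuracy, make up the rest of a typical classification report.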

The project has three main goals. First, the comments are cleaned and lemmatized to remove irrelevant elements such as punctuation and stop-words. Second, the cleaned texts are converted into numerical vectors using TF-IDF. Third, the SVM and Naive Bayes models are trained on those vectors and their performance is compared using standard classification metrics and confusion matrices.
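The vectorize, train, and compare steps could look roughly like the sketch below, which assumes scikit-learn (the source of the BernoulliNB and SVC class names). The ten example reviews, the default hyperparameters, and the random seed are assumptions made for illustration; the project's actual dataset and settings are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Hypothetical, already-cleaned reviews (stand-ins for the real dataset).
texts = [
    "love movie great acting", "wonderful story brilliant cast",
    "enjoy film beautiful scene", "great fun excellent plot",
    "amazing performance love ending", "boring movie bad acting",
    "terrible story awful cast", "hate film dull scene",
    "bad plot waste time", "awful performance boring ending",
]
labels = ["positive"] * 5 + ["negative"] * 5

# 70% training, 30% test, keeping the class balance in both parts.
train_txt, test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

# Fit TF-IDF on the training texts only, then reuse it for the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_txt)
X_test = vectorizer.transform(test_txt)

# Default settings are an assumption; the project may tune these.
models = {"BernoulliNB": BernoulliNB(), "SVC": SVC()}
matrices = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    matrices[name] = confusion_matrix(
        y_test, y_pred, labels=["negative", "positive"]
    )
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
    print(matrices[name])
```

One detail worth knowing: with its default settings, BernoulliNB binarizes its input at zero, so it effectively sees only whether a term is present in a review, while SVC uses the TF-IDF weights themselves.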

Data preprocessing tokenizes each comment and tags every token with its part of speech, lemmatizes the words to their base forms, removes punctuation and stop-words, and joins the remaining lemmatized tokens back into a cleaned text. The cleaned data is then split into a training set (70%) and a test set (30%).
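The order of the preprocessing steps can be illustrated with a toy, dependency-free sketch. The tiny lookup tables below are hypothetical stand-ins for a real POS tagger, lemmatizer, and stop-word list (for example, those provided by an NLP library); they exist only to show how the stages chain together and are not the project's implementation.

```python
import re

# Hypothetical miniature resources standing in for real NLP models.
STOP_WORDS = {"i", "the", "a", "and", "be", "it", "this"}
TOY_POS = {"loved": "v", "was": "v", "movies": "n", "acting": "n"}
TOY_LEMMAS = {("loved", "v"): "love", ("was", "v"): "be", ("movies", "n"): "movie"}


def tokenize(text):
    """Split lowercase text into word tokens and single punctuation marks."""
    return re.findall(r"[a-z]+|[^\sa-z]", text.lower())


def pos_tag(tokens):
    """Attach a coarse POS tag to each token (unknown words default to noun)."""
    return [(tok, TOY_POS.get(tok, "n")) for tok in tokens]


def lemmatize(tagged):
    """Map each (token, tag) pair to its base form when one is known."""
    return [TOY_LEMMAS.get((tok, tag), tok) for tok, tag in tagged]


def clean(lemmas):
    """Drop punctuation, digits, and stop-words."""
    return [lem for lem in lemmas if lem.isalpha() and lem not in STOP_WORDS]


def preprocess(text):
    """Tokenize, tag, lemmatize, clean, then join back into one string."""
    return " ".join(clean(lemmatize(pos_tag(tokenize(text)))))


print(preprocess("I loved the acting, and the movies!"))
```

Filtering stop-words after lemmatization, as described above, means inflected forms such as "was" are first reduced to "be" and only then removed, so the stop-word list needs to contain base forms.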

Overall, the project applies natural language processing and supervised classification techniques to sentiment analysis and provides a side-by-side comparison of how SVM and Naive Bayes models perform on this task.

