Unsupervised Lexical Sentiment Analysis on Shopee Product Reviews

Figures: Word Cloud, Slide Show, Sentiment Bar Chart

Project:

Web Scraping & Sentiment Analysis

Category:

User Project

Tools:

Web Scraper, GColab, & VSCode

Skills & Tools Used:

✔ Web Scraping: Extracted customer reviews from Shopee using Web Scraper & Python (API).
✔ Data Preprocessing: Cleaned and structured text data using Pandas, NumPy, and Regex.
✔ Text Mining & Feature Engineering: Tokenization, stopword removal, and stemming with NLTK.
✔ Sentiment Analysis (Unsupervised, Lexicon-Based): Classified reviews using a custom sentiment dictionary.
✔ Word Frequency & Trend Analysis: Identified key topics using word clouds (WordCloud library).
✔ Data Visualization: Created bar plots, pie charts, and word clouds using Matplotlib & Seaborn.
✔ Evaluation & Insights: Analyzed sentiment distribution and identified key areas for product improvement.

This project analyzes customer reviews from Shopee Madevine Official using web scraping and sentiment analysis to identify key trends, customer satisfaction levels, and areas for product/service improvement.

Important

In the competitive world of e-commerce, understanding customer opinions is crucial for improving satisfaction and fostering loyalty. By leveraging web scraping and sentiment analysis, we analyzed thousands of customer reviews from Shopee Madevine Official to uncover trends, preferences, and challenges in their shopping experience.

Site URL: Shopee Store

Script Code: GitHub Profile


Unveiling Customer Satisfaction on Shopee Madevine Official Through Sentiment Analysis

Understanding Customer Keywords

Every customer review holds valuable insights that help brands better understand their audience. Our word frequency analysis revealed the most commonly mentioned words: "cocok" (suitable), "rambut" (hair), "tekstur" (texture), "bagus" (good), and "wangi" (fragrant).

This indicates that customers are highly concerned with how well the product suits their hair, its texture, and its scent.

Sentiment Trends: A Majority of Positive Reviews!

Our analysis of thousands of reviews yielded impressive results:

  • More than 50% of reviews express positive sentiment.

  • A small portion of reviews are neutral.

  • Only 1.7% of reviews contain negative sentiment.

This suggests that, overall, products sold by Shopee Madevine Official successfully meet customer expectations.
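The percentages above are shares of labeled reviews. A minimal sketch of that calculation follows; the function name and the sample labels are illustrative assumptions, not the project's code or data.

```python
from collections import Counter


def sentiment_distribution(labels):
    """Return each label's share of all reviews as a percentage (one decimal)."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up labels for illustration only; these are not the Shopee results.
sample_labels = ["positive"] * 6 + ["neutral"] * 3 + ["negative"] * 1
distribution = sentiment_distribution(sample_labels)
```

The resulting dictionary can be passed directly to a bar or pie chart to reproduce the kind of sentiment distribution plot shown on this page.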

Addressing Customer Complaints

Although the number of negative reviews is minimal, identifying common issues is key to continuous improvement. Our deeper analysis found that the most frequent complaints revolve around:
📦 Shipping – Some customers reported delays or damaged packaging.
🧴 Product texture – A few customers felt that the texture did not match their expectations.

However, a major challenge in analyzing negative sentiment is the lack of clear word patterns in these reviews. Further analysis may be needed to pinpoint specific customer concerns in more detail.


Data Analysis Process

1️⃣ Data Collection (Web Scraping)

  • Customer reviews were extracted from Shopee using Web Scraper and Python with the Shopee API.

  • The dataset contained thousands of reviews in Bahasa Indonesia, requiring preprocessing for analysis.

2️⃣ Data Preprocessing

  • Text Cleaning: Splitting run-together words at capital letters (capital split), lowercasing, and removing special characters, URLs, and stopwords.

  • Tokenization: Breaking text into individual words.

  • Filtering: Removing irrelevant words using a sentiment lexicon.

  • Word Frequency Analysis: Identifying the most commonly used words.
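The preprocessing steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual script: the stopword list is a tiny hypothetical sample, "capital split" is interpreted as inserting a space between a lowercase letter and a following uppercase letter, and all function names are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a full
# Indonesian stopword list.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "untuk"}


def clean_text(text):
    """Remove URLs, split joined words at capitals, lowercase, drop non-letters."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text, flags=re.IGNORECASE)
    # Assumed meaning of "capital split": "BagusBanget" -> "Bagus Banget".
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text on whitespace and drop stopwords."""
    return [tok for tok in clean_text(text).split() if tok not in STOPWORDS]


def word_frequencies(reviews):
    """Count how often each remaining token occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts


tokens = tokenize("BagusBanget!! Cocok di rambut https://shopee.co.id/x 123")
freqs = word_frequencies(["wangi bagus", "Wangi dan cocok"])
```

URLs are removed before the capital split so that mixed-case links are not broken into stray tokens, and the split runs before lowercasing because it depends on the original casing.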

3️⃣ Sentiment Classification (Lexicon-Based, Unsupervised Approach)

  • Used a custom sentiment dictionary for classification, categorizing words into positive, negative, or neutral sentiment.

  • Applied lexicon-based sentiment scoring, where:

    • Positive words increase sentiment score.

    • Negative words decrease sentiment score.

    • Neutral words have no effect.

  • Visualized results using word clouds, sentiment distribution charts, and bar plots.
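The scoring rule described above can be sketched as a sum of word polarities. The mini-lexicon below is a hypothetical stand-in for the project's custom dictionary, and the thresholds (score above zero is positive, below zero is negative) are an assumption, since the page does not state them.

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
# Words missing from the lexicon are treated as neutral (score 0).
LEXICON = {
    "bagus": 1,    # good
    "cocok": 1,    # suitable
    "wangi": 1,    # fragrant
    "rusak": -1,   # damaged
    "lambat": -1,  # slow
    "kecewa": -1,  # disappointed
}


def sentiment_score(tokens, lexicon=LEXICON):
    """Sum the polarity of every token; unknown tokens contribute nothing."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def classify(tokens, lexicon=LEXICON):
    """Map the summed score to a label (assumed thresholds: >0, <0, ==0)."""
    score = sentiment_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [
    classify(["wangi", "bagus"]),   # two positive words
    classify(["paket", "rusak"]),   # one negative word
    classify(["rambut"]),           # no lexicon words
    classify(["bagus", "lambat"]),  # mixed review cancels out
]
```

The last example shows how a mixed review can cancel out to a neutral score, which is one reason summed lexicon scores lose detail.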


Challenges & Limitations

  1. Limited Context Understanding

    A lexicon-based approach scores words in isolation, so it struggles with sarcasm or irony (e.g., "wow, amazing... not!") and with mixed sentiment in a single review (e.g., "great product, but delivery was terrible").


  2. Sentiment Dictionary Limitations

    The custom dictionary needs constant updates to adapt to new words, slang, and informal expressions used in customer reviews.
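One way to support such updates is to list frequent review words that the dictionary does not yet cover, so they can be reviewed by hand. The sketch below is a suggestion, not part of the original project; the function name, the `min_count` threshold, and the sample slang tokens are assumptions.

```python
from collections import Counter


def lexicon_candidates(tokenized_reviews, lexicon, stopwords=frozenset(), min_count=2):
    """Return (word, count) pairs for frequent tokens missing from the lexicon."""
    counts = Counter(
        tok
        for review in tokenized_reviews
        for tok in review
        if tok not in lexicon and tok not in stopwords
    )
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Illustrative tokens only; "mantul" and "gercep" are informal Indonesian slang.
sample_reviews = [
    ["mantul", "bagus"],
    ["mantul", "cepat"],
    ["gercep", "mantul"],
    ["gercep"],
]
candidates = lexicon_candidates(sample_reviews, lexicon={"bagus": 1})
```

The output is only a review queue: each candidate still needs a human decision on whether it carries positive, negative, or no sentiment.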


  3. Negation Handling Issues

    The model does not effectively detect negations (e.g., "not bad" may be classified as negative instead of positive).
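A common mitigation, shown here only as a sketch and not as something the project implements, is to flip the polarity of the word that directly follows a negator. The negator list and mini-lexicon are hypothetical examples.

```python
# Hypothetical negators and mini-lexicon for illustration.
NEGATORS = {"tidak", "nggak", "gak", "bukan"}  # informal and formal "not"
LEXICON = {"bagus": 1, "cocok": 1, "jelek": -1, "rusak": -1}


def score_with_negation(tokens, lexicon=LEXICON, negators=NEGATORS):
    """Sum word polarities, inverting the word right after a negator."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in negators:
            negate = True
            continue
        polarity = lexicon.get(tok, 0)
        if negate:
            polarity = -polarity
        score += polarity
        negate = False  # the negator only reaches the next token
    return score


not_bad = score_with_negation(["tidak", "jelek"])   # "not bad"
not_good = score_with_negation(["tidak", "bagus"])  # "not good"
plain = score_with_negation(["bagus"])              # "good"
```

Because the flip only reaches the next token, a phrase such as "tidak terlalu bagus" (not too good) would still be scored as positive; a wider negation window would be needed to cover that case.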


  4. No Machine Learning Model (Rule-Based Only)

    Unlike supervised ML models (e.g., Naïve Bayes, BERT), this approach does not learn from data and relies only on predefined word lists.