Sentiment Analysis
Target: 83% accuracy
Pipeline steps: Data Preview → Features → Scaling → Train
Dataset Overview
• 25,000 rows
• 19 total columns
• 13 numeric features
• 3 columns with missing values
• 7 columns with outliers
Data Quality Issues Detected
• Review Text: 0.2% missing values
• Avg Word Sentiment: 2.1% missing values
• User Rating: 15.5% missing values
• Word Count: contains outliers
• Exclamation Count: contains outliers
• Question Count: contains outliers
• ALL CAPS Ratio: contains outliers
• Review Age (Days): contains outliers
• Helpful Votes: contains outliers
• Total Votes: contains outliers
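The report does not say how it decides that a column "contains outliers". A minimal sketch, assuming Tukey's rule (a value lying more than 1.5 × IQR below the first quartile or above the third) and treating `None` as a missing value. The helper names `missing_pct` and `has_iqr_outliers` are hypothetical, and the toy columns are made up rather than taken from the 25,000-row dataset.

```python
import statistics
from typing import Optional, Sequence


def missing_pct(values: Sequence[Optional[float]]) -> float:
    """Percentage of entries that are None (treated as missing)."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


def has_iqr_outliers(values: Sequence[Optional[float]], k: float = 1.5) -> bool:
    """Assumed rule: flag the column if any observed value lies outside
    [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    present = [v for v in values if v is not None]
    if len(present) < 4:
        return False  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return any(v < low or v > high for v in present)


# Toy columns (hypothetical values, not the real dataset)
columns = {
    "Exclamation Count": [0, 1, 0, 2, 0, 1, 37, 0],
    "User Rating": [5, None, 4, 1, None, 3, 5, 4],
}
for name, col in columns.items():
    print(f"{name}: {missing_pct(col):.1f}% missing, outliers={has_iqr_outliers(col)}")
```

The single 37 in the toy exclamation column trips the upper fence, while the star ratings stay inside theirs; a different rule (z-scores, percentile cut-offs) could flag a different set of columns.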
| Column | Description | Type | Sample Values | Distribution | Missing | Outliers | Importance |
|---|---|---|---|---|---|---|---|
| Sentiment (target) | 0 = Negative, 1 = Positive | Target | 1, 0 | n/a | None | No | n/a |
| Review Text | Raw movie review text (bag-of-words TF-IDF) | Text | "loved it", "boring" | n/a | 0.2% | No | 92% |
| Word Count | Number of words in the review | Numeric | 12864245 | μ=230, σ=174 | None | Yes | 41% |
| TF-IDF Top Features | Top 5,000 TF-IDF-weighted word features | Numeric | 0.42, 0, 0.71 | μ=0.08, σ=0.21 | None | No | 87% |
| Avg Word Sentiment | Average lexicon sentiment score of the review's words | Numeric | 0.78, -0.62, 0.85 | μ=0.12, σ=0.54 | 2.1% | No | 69% |
| Exclamation Count | Number of '!' in the review | Numeric | 0, 3, 7 | μ=0.8, σ=2.3 | None | Yes | 28% |
| Question Count | Number of '?' in the review | Numeric | 0, 1, 2 | μ=0.4, σ=1.1 | None | Yes | 15% |
| ALL CAPS Ratio | Share of words written in ALL CAPS (0 to 1) | Numeric | 0.01, 0.15, 0.05 | μ=0.04, σ=0.08 | None | Yes | 34% |
| Positive Adjectives | Count of positive adjectives | Numeric | 3, 0, 5 | μ=2.5, σ=2.1 | None | No | 72% |
| Negative Adjectives | Count of negative adjectives | Numeric | 0, 4, 1 | μ=2.1, σ=2.4 | None | No | 76% |
| User Rating | Optional star rating left by the user | Numeric | 5, 1, 4 | μ=3.4, σ=1.4 | 15.5% | No | 95% |
| Is Verified Purchase | Whether the reviewer bought the product | Binary | 1, 0 | n/a | None | No | 11% |
| Review Age (Days) | Days since the review was posted | Numeric | 1436542 | μ=420, σ=650 | None | Yes | 2% |
| Helpful Votes | Number of people who found the review helpful | Numeric | 12045 | μ=5.2, σ=45.3 | None | Yes | 25% |
| Total Votes | Total votes on the review | Numeric | 15250 | μ=8.4, σ=52.1 | None | Yes | 18% |
| Has Spoilers | User flagged the review as containing spoilers | Binary | 0, 1 | n/a | None | No | 8% |
| Flesch Reading Ease | Readability score of the text | Numeric | 75, 82, 65 | μ=72, σ=15 | None | No | 14% |
| Random Tag 1 | Meaningless metadata tag | Categorical | X, Y, Z | n/a | None | No | 0% |
| Random Tag 2 | Meaningless metadata tag | Categorical | A, B, C | n/a | None | No | 0% |
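Several columns in the table are derived from the review text itself. A minimal standard-library sketch of how the count features and a TF-IDF block could be computed. Assumptions, not taken from the page: the tokenizer regex, the rule that an ALL CAPS word has at least two characters, and the TF-IDF variant (smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalized rows, vocabulary limited to the `max_features` terms most frequent across the corpus), which is similar in spirit to scikit-learn's `TfidfVectorizer` defaults but is not guaranteed to match how this dataset was built. `extract_features` and `tfidf` are hypothetical names.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: a "word" is a run of letters or apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


@dataclass
class TextFeatures:
    word_count: int
    exclamation_count: int
    question_count: int
    all_caps_ratio: float  # fraction of words in ALL CAPS, 0 to 1


def extract_features(review: str) -> TextFeatures:
    """Hand-crafted count features for a single review."""
    words = WORD_RE.findall(review)
    # Assumption: ALL CAPS means 2+ characters with all letters uppercase,
    # so single letters like "I" or "A" are not counted.
    caps = [w for w in words if len(w) >= 2 and w.isupper()]
    return TextFeatures(
        word_count=len(words),
        exclamation_count=review.count("!"),
        question_count=review.count("?"),
        all_caps_ratio=len(caps) / len(words) if words else 0.0,
    )


def tfidf(docs: list[str], max_features: int) -> tuple[list[str], list[list[float]]]:
    """Smoothed TF-IDF restricted to the max_features most frequent terms."""
    tokenised = [[w.lower() for w in WORD_RE.findall(d)] for d in docs]
    doc_sets = [set(toks) for toks in tokenised]
    corpus_counts = Counter(t for toks in tokenised for t in toks)
    # Most frequent terms first; ties broken alphabetically for determinism.
    vocab = sorted(corpus_counts, key=lambda t: (-corpus_counts[t], t))[:max_features]
    n = len(docs)
    idf = {}
    for t in vocab:
        df = sum(t in s for s in doc_sets)  # number of documents containing t
        idf[t] = math.log((1 + n) / (1 + df)) + 1
    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by 0
        rows.append([x / norm for x in row])
    return vocab, rows


reviews = ["I LOVED it!! Why did it end so soon?", "Boring. I left early.", "loved it"]
print(extract_features(reviews[0]))
vocab, X = tfidf(reviews, max_features=5)  # the dataset's column uses 5,000
print(vocab)
```

Returning the ALL CAPS value as a fraction matches the sample values in the table (0.01 to 0.15) rather than a 0 to 100 percentage.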
💡 Review the data carefully: understanding your features helps you make better preprocessing choices.
Pipeline Score: 71/100
Accuracy modifier: ×1.05
• Features: 100
• Scaling: 65
• Outliers: 30
• Architect: 75
⚡ Remove low-importance features (<25%) to reduce noise.
⚡ Some features are highly skewed (for example Helpful Votes: μ=5.2, σ=45.3); try a log or square-root transform.
⚡ You have outlier columns: consider clipping or imputing them.
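The three hints above can be sketched together. This is a minimal illustration under stated assumptions: the importance scores are copied from the table, but Tukey's 1.5 × IQR fences for clipping, median imputation for gaps, and `log1p` as the log transform are common defaults chosen here, not rules the page says it uses. `select_features`, `skew_transform`, `clip_iqr` and `impute_median` are hypothetical names, and the vote counts are toy values.

```python
import math
import statistics
from typing import Optional

# Importance scores (%) copied from the Data Preview table.
IMPORTANCE = {
    "Review Text": 92, "Word Count": 41, "TF-IDF Top Features": 87,
    "Avg Word Sentiment": 69, "Exclamation Count": 28, "Question Count": 15,
    "ALL CAPS Ratio": 34, "Positive Adjectives": 72, "Negative Adjectives": 76,
    "User Rating": 95, "Is Verified Purchase": 11, "Review Age (Days)": 2,
    "Helpful Votes": 25, "Total Votes": 18, "Has Spoilers": 8,
    "Flesch Reading Ease": 14, "Random Tag 1": 0, "Random Tag 2": 0,
}


def select_features(importance: dict[str, int], threshold: int = 25) -> list[str]:
    """Drop features whose importance is below `threshold` percent."""
    return [name for name, imp in importance.items() if imp >= threshold]


def skew_transform(values: list[float], method: str = "log") -> list[float]:
    """Compress a long right tail; assumes non-negative inputs such as counts."""
    if method == "log":
        return [math.log1p(v) for v in values]  # log(1 + x) keeps 0 at 0
    if method == "sqrt":
        return [math.sqrt(v) for v in values]
    raise ValueError(f"unknown method: {method}")


def clip_iqr(values: list[float], k: float = 1.5) -> list[float]:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the median of the observed values."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]


kept = select_features(IMPORTANCE)
print(len(kept), "features kept:", kept)

helpful_votes = [0, 1, 1, 2, 3, 4, 5, 120]  # toy values, not real data
print(clip_iqr(helpful_votes))
print(skew_transform(helpful_votes, "log"))
print(impute_median([5, None, 4, 1, 3]))  # e.g. User Rating with a gap
```

With a `>= 25` cutoff, Helpful Votes (exactly 25%) survives while Question Count, Total Votes and both random tags are dropped, leaving 10 of the 18 features; the page does not say how the tool treats the boundary value.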