Titanic Survival
🚢 · Target: 80% accuracy
Data Preview
Features
Scaling
Train
Dataset Overview
891 rows
Total columns: 23
Numeric features: 10
Columns with missing values: 7
Columns with outliers: 7
Data Quality Issues Detected
• Age: 19.9% missing values
• Fare: 0.2% missing values
• Embarked: 0.2% missing values
• Cabin: 77.1% missing values
• Fare per Person: 0.2% missing values
• Deck: 77.1% missing values
• Age Group: 19.9% missing values
• Age: contains outliers
• SibSp: contains outliers
• Parch: contains outliers
• Fare: contains outliers
• Family Size: contains outliers
• Fare per Person: contains outliers
• Name Length: contains outliers
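The page does not say how these percentages and outlier flags are computed. As a minimal sketch, the missing share can be taken as the fraction of empty entries, and outliers can be flagged with the common 1.5 × IQR rule. The IQR rule, the helper names `missing_pct` and `iqr_outliers`, and the toy column are all assumptions made for illustration.

```python
import statistics

def missing_pct(values):
    """Percentage of entries that are None, rounded to one decimal."""
    return round(100 * sum(v is None for v in values) / len(values), 1)

def iqr_outliers(values, k=1.5):
    """Return the present values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

# Toy column for illustration only; not taken from the dataset.
fares = [7.25, 71.28, 7.92, 8.05, None, 13.0, 26.0, 512.33, 10.5, 7.9]

print(missing_pct(fares))   # share of missing entries, in percent
print(iqr_outliers(fares))  # values flagged by the IQR rule
```

A column would then be counted under "contains outliers" whenever `iqr_outliers` returns a non-empty list, though the tool may well use a different criterion.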
| Column | Description | Type | Sample Values | Distribution | Missing | Outliers | Importance |
|---|---|---|---|---|---|---|---|
| Survived | Target: 0 = No, 1 = Yes | Target | 0, 1 | - | None | No | - |
| Pclass | Ticket class (1st, 2nd, 3rd) | Categorical | 1, 3 | μ=2.3 σ=0.84 | None | No | 72% |
| Sex | Passenger gender | Binary | male, female | - | None | No | 88% |
| Age | Age in years | Numeric | 22, 38, 26 | μ=29.7 σ=14.5 | 19.9% | Yes | 58% |
| SibSp | # siblings / spouses aboard | Numeric | 0, 1 | μ=0.52 σ=1.1 | None | Yes | 31% |
| Parch | # parents / children aboard | Numeric | 0, 1 | μ=0.38 σ=0.81 | None | Yes | 22% |
| Fare | Passenger fare (USD) | Numeric | 7.25, 71.28, 7.92 | μ=32.2 σ=49.7 | 0.2% | Yes | 65% |
| Embarked | Port of embarkation (C/Q/S) | Categorical | S, C, Q | - | 0.2% | No | 19% |
| Cabin | Cabin number (mostly missing) | Categorical | C85, C123 | - | 77.1% | No | 8% |
| Ticket | Ticket number (high cardinality) | Text | A/5 21171, PC 17599, STON/O2 | - | None | No | 5% |
| Is Alone | 1 if traveling alone, 0 otherwise | Binary | 0, 1 | - | None | No | 15% |
| Family Size | Total family members aboard | Numeric | 0, 1, 2 | μ=1.9 σ=1.6 | None | Yes | 35% |
| Title | Passenger title (Mr, Mrs, Miss, etc.) | Categorical | Mr, Mrs, Miss | - | None | No | 65% |
| Fare per Person | Fare divided by family size | Numeric | 3.6, 35.6, 7.9 | μ=19.9 σ=35.6 | 0.2% | Yes | 45% |
| Deck | Extracted from Cabin | Categorical | ?, C | - | 77.1% | No | 25% |
| Ticket Length | Length of ticket string | Numeric | 9, 8, 7 | μ=6.8 σ=2.7 | None | No | 5% |
| Name Length | Length of passenger name | Numeric | 23, 51, 22 | μ=26.9 σ=9.2 | None | Yes | 12% |
| Random Noise 1 | Randomly generated noise feature | Numeric | 0.4, 0.1, 0.9 | μ=0.5 σ=0.28 | None | No | 1% |
| Random Noise 2 | Randomly generated noise feature | Numeric | 12, 45, 88 | μ=50 σ=28 | None | No | 2% |
| Random Noise 3 | Random categorical feature | Categorical | A, B, C | - | None | No | 0% |
| Age Group | Age binned into categories | Categorical | Young Adult, Adult | - | 19.9% | No | 45% |
| Is Child | 1 if Age < 16, else 0 | Binary | 0, 1 | - | None | No | 38% |
| Has Cabin | 1 if Cabin is known, else 0 | Binary | 0, 1 | - | None | No | 28% |
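Several of the columns above are derived from the raw passenger fields. The sketch below shows one plausible way to compute them. The table gives only one-line descriptions, so the details are assumptions: `Family Size` is taken as `SibSp + Parch + 1`, `Title` is pulled from the name with a regular expression, `Deck` is the first character of `Cabin`, and a missing `Age` is treated as "not a child". The function name `engineer` and the sample record are hypothetical.

```python
import re

def engineer(row):
    """Derive engineered columns from one raw passenger record (a dict)."""
    # Assumption: the passenger counts as a member of their own family.
    family_size = row["SibSp"] + row["Parch"] + 1
    fare, age, cabin = row.get("Fare"), row.get("Age"), row.get("Cabin")
    # Assumption: the title sits between the comma and the first period.
    title = re.search(r",\s*([^.]+)\.", row["Name"])
    return {
        "Family Size": family_size,
        "Is Alone": int(family_size == 1),
        "Title": title.group(1).strip() if title else None,
        "Fare per Person": None if fare is None else round(fare / family_size, 1),
        "Deck": cabin[0] if cabin else None,
        "Has Cabin": int(bool(cabin)),
        "Is Child": int(age is not None and age < 16),
        "Name Length": len(row["Name"]),
        "Ticket Length": len(row["Ticket"]),
    }

# Hypothetical record shaped like the sample values in the table.
passenger = {"Name": "Braund, Mr. Owen Harris", "Age": 22, "SibSp": 1,
             "Parch": 0, "Fare": 7.25, "Cabin": None, "Ticket": "A/5 21171"}
features = engineer(passenger)
print(features)
```

Missing inputs propagate to the derived columns here (`Fare per Person` and `Deck` become `None`), which is consistent with those columns sharing the missing rates of `Fare` and `Cabin` in the table.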
💡 Review the data carefully: understanding your features helps you make better preprocessing choices.
── PIPELINE SCORE ──
71/100
Accuracy modifier: ×1.05
Features: 100
Scaling: 65
Outliers: 30
Architect: 75
⚡ Remove low-importance features (<25%) to reduce noise.
⚡ Some features are highly skewed: try Log or Sqrt normalization.
⚡ You have outlier columns: consider clipping or imputing them.
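The three tips above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the tool's own implementation: the helper names are hypothetical, `log(1 + x)` stands in for the "Log" option, and the clipping bounds are arbitrary. The importance values are copied from a few rows of the table.

```python
import math

def select_features(importance, threshold=0.25):
    """Keep features whose importance is at or above the threshold."""
    return [name for name, score in importance.items() if score >= threshold]

def log_transform(values):
    """Compress a right-skewed, non-negative column with log(1 + x)."""
    return [math.log1p(v) for v in values]

def clip(values, low, high):
    """Limit each value to the range [low, high]."""
    return [min(max(v, low), high) for v in values]

# Importance values from a few rows of the table, as fractions.
importance = {"Sex": 0.88, "Pclass": 0.72, "Fare": 0.65, "Parch": 0.22,
              "Ticket": 0.05, "Random Noise 1": 0.01}
kept = select_features(importance)

# Illustrative fares; the bounds 0 and 100 are arbitrary assumptions.
fares = [7.25, 71.28, 7.92, 512.33]
clipped = clip(fares, 0.0, 100.0)
logged = log_transform(clipped)

print(kept)
print(clipped)
print(logged)
```

Clipping before the log transform keeps one extreme fare from dominating the scaled column. In practice the bounds would more likely come from percentiles of the training data than from fixed constants.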
Step 1 of 3