Hands-on Practice: Checking Missing Values
In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
In [ ]:
# Titanic data
titanic = pd.read_csv('C:/Users/LOVE/Downloads/vscode/ML/titanic/train.csv')
titanic.head(3)
Out[ ]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
In [ ]:
titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
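The `info()` output gives non-null counts, so the number of missing values has to be worked out by subtraction (for example 891 - 714 = 177 for `Age`). As a minimal sketch, the missing counts can also be computed directly with `isna().sum()`. The small `df` below is a hypothetical stand-in for the Titanic frame, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.93, 8.46],
})

missing_count = df.isna().sum()             # number of missing values per column
missing_ratio = df.isna().mean().round(2)   # share of missing values per column

# Combine both views into one summary table
summary = pd.DataFrame({"missing": missing_count, "ratio": missing_ratio})
print(summary)
```

Running the same two lines on `titanic` would list the missing count and ratio for every column at once.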
In [ ]:
# Drop rows that contain missing values
titanic.dropna(axis = 0).info()
<class 'pandas.core.frame.DataFrame'>
Index: 183 entries, 1 to 889
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  183 non-null    int64
 1   Survived     183 non-null    int64
 2   Pclass       183 non-null    int64
 3   Name         183 non-null    object
 4   Sex          183 non-null    object
 5   Age          183 non-null    float64
 6   SibSp        183 non-null    int64
 7   Parch        183 non-null    int64
 8   Ticket       183 non-null    object
 9   Fare         183 non-null    float64
 10  Cabin        183 non-null    object
 11  Embarked     183 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 18.6+ KB
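`dropna(axis = 0)` keeps only rows with no missing value in any column, which is why just 183 of the 891 rows survive (`Cabin` alone has only 204 non-null entries). A minimal sketch of two gentler options, `subset` and `thresh`, follows. The toy `df` is hypothetical and only illustrates how the parameters behave.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in several columns
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [None, "C85", None, "C123"],
    "Embarked": ["S", "C", None, "S"],
})

all_complete = df.dropna(axis=0)        # rows with no missing value at all
age_known = df.dropna(subset=["Age"])   # only require Age to be present
mostly_full = df.dropna(thresh=2)       # rows with at least 2 non-null values

print(len(all_complete), len(age_known), len(mostly_full))
```

Restricting the check to the columns that matter for the analysis usually discards far fewer rows than dropping on every column.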
In [ ]:
cond = (titanic['Age'].isna())
titanic[cond].head(5)
Out[ ]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
In [ ]:
cond2 = (titanic['Age'].notna())
titanic[cond2].info()
<class 'pandas.core.frame.DataFrame'>
Index: 714 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  714 non-null    int64
 1   Survived     714 non-null    int64
 2   Pclass       714 non-null    int64
 3   Name         714 non-null    object
 4   Sex          714 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        714 non-null    int64
 7   Parch        714 non-null    int64
 8   Ticket       714 non-null    object
 9   Fare         714 non-null    float64
 10  Cabin        185 non-null    object
 11  Embarked     712 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 72.5+ KB
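The two conditions used here, `isna()` and `notna()`, are complementary boolean masks: every row falls into exactly one of them. A minimal sketch on a hypothetical four-row frame shows that inverting one mask with `~` gives the other.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the Age column has gaps
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Age": [22.0, np.nan, 26.0, np.nan],
})

missing = df["Age"].isna()    # True where Age is missing
present = df["Age"].notna()   # True where Age is recorded

rows_missing = df[missing]
rows_present = df[~missing]   # ~ inverts the mask, same rows as df[present]

print(missing.sum(), present.sum())
```

Summing a boolean mask counts its `True` values, so `missing.sum()` is a quick way to get the number of rows a filter would select.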
Imputation with fillna
In [ ]:
# Take the scalar mean of the Age column. titanic[['Age']].mean() would return a
# Series, and fillna would then align on index labels and fill nothing.
age_mean = titanic['Age'].mean().round(2)
titanic['Age_mean'] = titanic['Age'].fillna(age_mean)
titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
 12  Age_mean     891 non-null    float64
dtypes: float64(3), int64(5), object(5)
memory usage: 90.6+ KB
Imputation with SimpleImputer
In [ ]:
from sklearn.impute import SimpleImputer
# The default strategy is 'mean'
si = SimpleImputer()
si.fit(titanic[['Age']])
Out[ ]:
SimpleImputer()
In [ ]:
# The fitted fill value: the mean of Age
si.statistics_
Out[ ]:
array([29.69911765])
In [ ]:
titanic['Age_si_mean'] = si.transform(titanic[['Age']])
titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
 12  Age_mean     891 non-null    float64
 13  Age_si_mean  891 non-null    float64
dtypes: float64(4), int64(5), object(5)
memory usage: 97.6+ KB
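`fillna` accepts either a scalar or a Series, and the two behave differently: a scalar replaces every `NaN`, while a Series is aligned on index labels first. The sketch below, on a hypothetical four-row frame, contrasts the two and checks that a scalar mean fill matches what `SimpleImputer(strategy="mean")` produces. It assumes scikit-learn is installed.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame; the mean of the known ages is 30.0
df = pd.DataFrame({"Age": [20.0, np.nan, 40.0, np.nan]})

mean_series = df[["Age"]].mean()   # Series indexed by the column name 'Age'
mean_scalar = df["Age"].mean()     # plain float

# A Series fill value is aligned on index labels (0..3 vs 'Age'), so nothing matches
not_filled = df["Age"].fillna(mean_series)
# A scalar fill value replaces every NaN
filled = df["Age"].fillna(mean_scalar)

# SimpleImputer learns the same mean in fit and applies it in transform
imputer = SimpleImputer(strategy="mean")
imputed = imputer.fit_transform(df[["Age"]]).ravel()

print(not_filled.isna().sum(), filled.isna().sum())
print(imputer.statistics_, imputed)
```

One reason to prefer `SimpleImputer` in a modeling workflow is that the fitted statistic can be reused: calling `fit` on training data and `transform` on new data applies the same fill value to both.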