I kicked off a project on building a financial CSS (credit scoring model). I did some light preprocessing and EDA, and by the time I had worked through all the categorical columns, the evening had already slipped away.
One thing is still bugging me. I forgot to write it down in the Jupyter notebook, but I am undecided whether to leave the credit grade (A, B, …, G) as it is or convert it to a numeric (int) type.
I also have no feel yet for where to start with the numeric columns or how to check them. I would like to see how they relate to the credit grade, but there is too much data to just throw everything at a pairplot. I will stop here for today and dig in again tomorrow.
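I have not been getting much coding-test practice in lately, and the project period happens to overlap with my ADsP exam prep. With a mountain of things to do I have no slack at all, but my sense of self-efficacy feels like it is at an all-time high (haha).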
Oh, and I created a project repository on GitHub. I meant to at least back up the code I wrote today, but sorting things into folders felt like too much effort, so I put it off until next time. Below is what I chewed on, tore into, tasted, and enjoyed today 😎
PJT: Financial CSS (Credit Scoring Model) Development
์ดํด์
Function list
- get_loan_month: preprocesses 대출기간 (loan term)
- get_work_year: preprocesses 근로기간 (years of employment)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
plt.rcParams['font.family'] ='D2Coding'
train_df = pd.read_csv('C:/Users/LOVE/Downloads/vscode/CSS/train.csv')
test_df = pd.read_csv('C:/Users/LOVE/Downloads/vscode/CSS/test.csv')
train_df
 | ID | 대출금액 | 대출기간 | 근로기간 | 주택소유상태 | 연간소득 | 부채_대비_소득_비율 | 총계좌수 | 대출목적 | 최근_2년간_연체_횟수 | 총상환원금 | 총상환이자 | 총연체금액 | 연체계좌수 | 대출등급 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | TRAIN_00000 | 12480000 | 36 months | 6 years | RENT | 72000000 | 18.90 | 15 | 부채 통합 | 0 | 0 | 0.0 | 0.0 | 0.0 | C |
1 | TRAIN_00001 | 14400000 | 60 months | 10+ years | MORTGAGE | 130800000 | 22.33 | 21 | 주택 개선 | 0 | 373572 | 234060.0 | 0.0 | 0.0 | B |
2 | TRAIN_00002 | 12000000 | 36 months | 5 years | MORTGAGE | 96000000 | 8.60 | 14 | 부채 통합 | 0 | 928644 | 151944.0 | 0.0 | 0.0 | A |
3 | TRAIN_00003 | 14400000 | 36 months | 8 years | MORTGAGE | 132000000 | 15.09 | 15 | 부채 통합 | 0 | 325824 | 153108.0 | 0.0 | 0.0 | C |
4 | TRAIN_00004 | 18000000 | 60 months | Unknown | RENT | 71736000 | 25.39 | 19 | 주요 구매 | 0 | 228540 | 148956.0 | 0.0 | 0.0 | B |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
96289 | TRAIN_96289 | 14400000 | 36 months | 10+ years | MORTGAGE | 210000000 | 9.33 | 33 | 신용 카드 | 0 | 974580 | 492168.0 | 0.0 | 0.0 | C |
96290 | TRAIN_96290 | 28800000 | 60 months | 10+ years | MORTGAGE | 132000000 | 5.16 | 25 | 주택 개선 | 0 | 583728 | 855084.0 | 0.0 | 0.0 | E |
96291 | TRAIN_96291 | 14400000 | 36 months | 1 year | MORTGAGE | 84000000 | 11.24 | 22 | 신용 카드 | 0 | 1489128 | 241236.0 | 0.0 | 0.0 | A |
96292 | TRAIN_96292 | 15600000 | 36 months | 5 years | MORTGAGE | 66330000 | 17.30 | 21 | 부채 통합 | 2 | 1378368 | 818076.0 | 0.0 | 0.0 | D |
96293 | TRAIN_96293 | 8640000 | 36 months | 10+ years | RENT | 50400000 | 11.80 | 14 | 신용 카드 | 0 | 596148 | 274956.0 | 0.0 | 0.0 | C |
96294 rows × 15 columns
print(train_df.shape, test_df.shape)
(96294, 15) (64197, 14)
1. EDA & Preprocessing
train_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96294 entries, 0 to 96293
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   ID                     96294 non-null  object
 1   대출금액                96294 non-null  int64
 2   대출기간                96294 non-null  object
 3   근로기간                96294 non-null  object
 4   주택소유상태            96294 non-null  object
 5   연간소득                96294 non-null  int64
 6   부채_대비_소득_비율     96294 non-null  float64
 7   총계좌수                96294 non-null  int64
 8   대출목적                96294 non-null  object
 9   최근_2년간_연체_횟수    96294 non-null  int64
 10  총상환원금              96294 non-null  int64
 11  총상환이자              96294 non-null  float64
 12  총연체금액              96294 non-null  float64
 13  연체계좌수              96294 non-null  float64
 14  대출등급                96294 non-null  object
dtypes: float64(4), int64(5), object(6)
memory usage: 11.0+ MB
(1) Checking descriptive statistics
train_df.describe(include='all')
 | ID | 대출금액 | 대출기간 | 근로기간 | 주택소유상태 | 연간소득 | 부채_대비_소득_비율 | 총계좌수 | 대출목적 | 최근_2년간_연체_횟수 | 총상환원금 | 총상환이자 | 총연체금액 | 연체계좌수 | 대출등급 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 96294 | 9.629400e+04 | 96294 | 96294 | 96294 | 9.629400e+04 | 96294.000000 | 96294.000000 | 96294 | 96294.000000 | 9.629400e+04 | 9.629400e+04 | 96294.000000 | 96294.000000 | 96294 |
unique | 96294 | NaN | 2 | 16 | 4 | NaN | NaN | NaN | 12 | NaN | NaN | NaN | NaN | NaN | 7 |
top | TRAIN_00000 | NaN | 36 months | 10+ years | MORTGAGE | NaN | NaN | NaN | 부채 통합 | NaN | NaN | NaN | NaN | NaN | B |
freq | 1 | NaN | 64479 | 31585 | 47934 | NaN | NaN | NaN | 55150 | NaN | NaN | NaN | NaN | NaN | 28817 |
mean | NaN | 1.830400e+07 | NaN | NaN | NaN | 9.392672e+07 | 19.379590 | 25.304827 | NaN | 0.345681 | 8.225035e+05 | 4.282282e+05 | 54.380584 | 0.005805 | NaN |
std | NaN | 1.032908e+07 | NaN | NaN | NaN | 9.956871e+07 | 33.569559 | 12.088566 | NaN | 0.919119 | 1.027745e+06 | 4.402111e+05 | 1414.769218 | 0.079966 | NaN |
min | NaN | 1.200000e+06 | NaN | NaN | NaN | 0.000000e+00 | 0.000000 | 4.000000 | NaN | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | NaN |
25% | NaN | 1.020000e+07 | NaN | NaN | NaN | 5.760000e+07 | 12.650000 | 17.000000 | NaN | 0.000000 | 3.075720e+05 | 1.346160e+05 | 0.000000 | 0.000000 | NaN |
50% | NaN | 1.680000e+07 | NaN | NaN | NaN | 7.800000e+07 | 18.740000 | 24.000000 | NaN | 0.000000 | 5.976960e+05 | 2.870040e+05 | 0.000000 | 0.000000 | NaN |
75% | NaN | 2.400000e+07 | NaN | NaN | NaN | 1.128000e+08 | 25.540000 | 32.000000 | NaN | 0.000000 | 1.055076e+06 | 5.702160e+05 | 0.000000 | 0.000000 | NaN |
max | NaN | 4.200000e+07 | NaN | NaN | NaN | 1.080000e+10 | 9999.000000 | 169.000000 | NaN | 30.000000 | 4.195594e+07 | 5.653416e+06 | 75768.000000 | 4.000000 | NaN |
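Memo
- Categorical
  - 대출기간 (loan term): 2 values (36m, 60m). Strip 'months'.
  - 근로기간 (years of employment): strip '+', 'years', etc.
  - 주택소유상태 (home ownership): 4 values. One-hot encode (as a matrix?)?
  - 대출목적 (loan purpose): 12 values. One-hot encode (as a matrix?)? That seems like too many.
- Numeric
  - 최근_2년간_연체_횟수 (delinquencies in the last 2 years): MAX 30, MIN 0, MEAN 0.35, STD 0.92 -> check outliers
  - 총연체금액 (total delinquent amount): MAX 75768, MIN 0, MEAN 54.4, STD 1414.8 -> check outliers
  - 연체계좌수 (delinquent accounts): MAX 4, MEAN 0.006, STD 0.080 -> check outliers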
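The one-hot encoding question in the memo can be sketched as follows. This is a minimal illustration on toy rows, not the notebook's actual pipeline: `lump_rare`, the `OTHER` label and the `min_count` threshold are all hypothetical names and values introduced here. The idea is to fold very rare categories into one bucket before calling `pd.get_dummies`, which keeps the number of new columns down when a variable like 대출목적 has 12 levels.

```python
import pandas as pd

# Toy stand-in for train_df (values are illustrative, not real rows).
df = pd.DataFrame({
    "주택소유상태": ["RENT", "MORTGAGE", "OWN", "ANY", "RENT", "MORTGAGE"],
    "대출목적": ["부채 통합", "신용 카드", "부채 통합", "재생 에너지", "이사", "부채 통합"],
})

def lump_rare(series: pd.Series, min_count: int, other: str = "OTHER") -> pd.Series:
    """Replace categories seen fewer than min_count times with a single label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; everything else becomes `other`.
    return series.where(~series.isin(rare), other)

# Hypothetical threshold: anything seen fewer than 2 times is lumped together.
df["대출목적"] = lump_rare(df["대출목적"], min_count=2)

# One 0/1 column per remaining category.
encoded = pd.get_dummies(df, columns=["주택소유상태", "대출목적"], dtype=int)
print(encoded.columns.tolist())
```

On the real data a sensible threshold would have to be picked by looking at the counts. For example, 주택소유상태 has a single `ANY` row, which would otherwise get a column of its own.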
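Tentative hypotheses
- 총계좌수-연체계좌수-대출등급 (total accounts, delinquent accounts, loan grade)
- 부채_대비_소득_비율-대출등급 (debt-to-income ratio, loan grade)
- 대출금액-연간소득-총상환원금-대출등급 (loan amount, annual income, total principal repaid, loan grade)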
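One way to put rough numbers on hypotheses like these is to treat the grade as ordinal (A best, G worst) and look at rank correlation with each numeric column. The sketch below rests on that assumption. `GRADE_TO_INT` and `spearman_with_grade` are hypothetical helpers, and the toy frame is made up purely to show the mechanics. The Spearman coefficient is computed as a Pearson correlation on ranks, which avoids any extra dependency.

```python
import pandas as pd

GRADE_ORDER = ["A", "B", "C", "D", "E", "F", "G"]
# Ordinal encoding: A -> 0, B -> 1, ..., G -> 6.
GRADE_TO_INT = {g: i for i, g in enumerate(GRADE_ORDER)}

def spearman_with_grade(df: pd.DataFrame, feature: str, grade_col: str = "대출등급") -> float:
    """Spearman rank correlation between a numeric feature and the ordinal grade."""
    grade_num = df[grade_col].map(GRADE_TO_INT)
    # Pearson correlation of the ranks equals the Spearman coefficient.
    return df[feature].rank().corr(grade_num.rank())

# Toy data: the ratio rises and the income falls as the grade worsens.
toy = pd.DataFrame({
    "대출등급": ["A", "B", "C", "D", "E", "F", "G"],
    "부채_대비_소득_비율": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0],
    "연간소득": [90, 80, 70, 60, 50, 40, 30],
})
rho_dti = spearman_with_grade(toy, "부채_대비_소득_비율")
rho_income = spearman_with_grade(toy, "연간소득")
```

The same mapping also answers the question of how to turn the grade into a number if a model needs it. Whether equal spacing between grades is appropriate is a modelling decision this sketch does not settle.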
(2) Inspecting & transforming categorical variables
for i in ['대출기간', '근로기간', '주택소유상태', '대출목적', '대출등급']:
    print(train_df[i].value_counts())
대출기간
36 months    64479
60 months    31815
Name: count, dtype: int64
근로기간
10+ years    31585
2 years       8450
< 1 year      7774
3 years       7581
1 year        6249
Unknown       5671
5 years       5665
4 years       5588
8 years       4888
6 years       3874
7 years       3814
9 years       3744
10+years       896
<1 year        370
3               89
1 years         56
Name: count, dtype: int64
주택소유상태
MORTGAGE    47934
RENT        37705
OWN         10654
ANY             1
Name: count, dtype: int64
대출목적
부채 통합      55150
신용 카드      24500
주택 개선       6160
기타           4725
주요 구매       1803
의료           1039
자동차          797
소규모 사업      787
이사            506
휴가            466
주택            301
재생 에너지       60
Name: count, dtype: int64
대출등급
B    28817
C    27623
A    16772
D    13354
E     7354
F     1954
G      420
Name: count, dtype: int64
# 대출기간 (loan term): drop the 'months' suffix and convert to int
def get_loan_month(mt):
    return int(mt.strip().replace('months', ''))

train_df['대출기간'] = train_df['대출기간'].apply(get_loan_month)
train_df[['대출기간']]
 | 대출기간 |
---|---|
0 | 36 |
1 | 60 |
2 | 36 |
3 | 36 |
4 | 60 |
... | ... |
96289 | 36 |
96290 | 60 |
96291 | 36 |
96292 | 36 |
96293 | 36 |
96294 rows × 1 columns
# 근로기간 (years of employment): normalize the inconsistent labels to a number
def get_work_year(yr):
    if yr in ['<1 year', '< 1 year']:
        return 0
    elif yr in ['1 year', '1 years']:
        return 1
    elif yr in ['10+ years', '10+years']:
        return 10
    # Replace 'Unknown' with NaN
    elif yr == 'Unknown':
        return np.nan
    # Otherwise keep only the digits
    else:
        return int(''.join(filter(str.isdigit, yr)))

train_df['근로기간'] = train_df['근로기간'].apply(get_work_year)
train_df[['근로기간']]
 | 근로기간 |
---|---|
0 | 6.0 |
1 | 10.0 |
2 | 5.0 |
3 | 8.0 |
4 | NaN |
... | ... |
96289 | 10.0 |
96290 | 10.0 |
96291 | 1.0 |
96292 | 5.0 |
96293 | 10.0 |
96294 rows × 1 columns
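For comparison, the same label cleanup can be written without a row-by-row `apply`. The sketch below is an alternative, not the notebook's code: `parse_work_years` is a hypothetical name, and it assumes the only label variants are the ones that appear in the `value_counts()` output ('10+ years', '10+years', '< 1 year', '<1 year', '1 years', a bare '3', 'Unknown', and so on).

```python
import numpy as np
import pandas as pd

def parse_work_years(s: pd.Series) -> pd.Series:
    """Vectorized version of the label cleanup: digits -> number, '<' -> 0, no digits -> NaN."""
    # Labels such as '< 1 year' and '<1 year' mean less than one year.
    is_under_one = s.str.contains("<", regex=False, na=False)
    # Pull out the first run of digits; labels without digits ('Unknown') become NaN.
    years = pd.to_numeric(s.str.extract(r"(\d+)", expand=False), errors="coerce")
    return years.mask(is_under_one, 0.0)

raw = pd.Series(["10+ years", "10+years", "< 1 year", "<1 year",
                 "1 years", "3", "Unknown", "6 years"])
parsed = parse_work_years(raw)
print(parsed.tolist())
```

Because the rule is "first run of digits" rather than an explicit lookup, an unexpected label would be parsed silently, so it is worth re-checking `value_counts()` on the test set before reusing it there.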
# Visualization
ctgr_list = ['대출기간', '근로기간', '주택소유상태', '대출목적']

def ctgr_value_counts(column_name):
    # Print the 대출등급 counts for each unique value of the given column.
    # Note: NaN never compares equal to itself, so the NaN group prints an empty Series.
    for uniq in list(train_df[column_name].unique()):
        cond_unique = (train_df[column_name] == uniq)
        print(uniq)
        print(train_df.loc[cond_unique, '대출등급'].value_counts())
        print()

for i in ctgr_list:
    plt.figure(figsize=(10, 6))
    sns.countplot(data=train_df, x=i, hue='대출등급', hue_order=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
    plt.legend(loc='best')
    plt.show()
    ctgr_value_counts(column_name=i)
대출등급 counts by 대출기간
대출기간 | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
36 | 15952 | 22883 | 16935 | 6485 | 1895 | 270 | 59 |
60 | 820 | 5934 | 10688 | 6869 | 5459 | 1684 | 361 |

대출등급 counts by 근로기간
근로기간 | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
0.0 | 1317 | 2361 | 2447 | 1173 | 658 | 162 | 26 |
1.0 | 1031 | 1867 | 1860 | 927 | 481 | 110 | 29 |
2.0 | 1454 | 2493 | 2495 | 1107 | 695 | 170 | 36 |
3.0 | 1315 | 2302 | 2225 | 1028 | 608 | 153 | 39 |
4.0 | 969 | 1659 | 1577 | 796 | 440 | 122 | 25 |
5.0 | 986 | 1679 | 1646 | 736 | 475 | 119 | 24 |
6.0 | 650 | 1157 | 1124 | 543 | 296 | 85 | 19 |
7.0 | 619 | 1138 | 1103 | 535 | 318 | 79 | 22 |
8.0 | 867 | 1417 | 1364 | 707 | 385 | 128 | 20 |
9.0 | 621 | 1137 | 1049 | 545 | 289 | 85 | 18 |
10.0 | 6009 | 9865 | 9105 | 4385 | 2331 | 649 | 137 |
nan: Series([], Name: count, dtype: int64)

대출등급 counts by 주택소유상태
주택소유상태 | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
RENT | 5268 | 11200 | 11478 | 5653 | 3056 | 855 | 195 |
MORTGAGE | 9640 | 14518 | 13106 | 6163 | 3452 | 883 | 172 |
OWN | 1864 | 3099 | 3038 | 1538 | 846 | 216 | 53 |
ANY | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

대출등급 counts by 대출목적
대출목적 | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
부채 통합 | 8036 | 15680 | 16349 | 8529 | 4979 | 1311 | 266 |
주택 개선 | 1225 | 1845 | 1689 | 805 | 425 | 145 | 26 |
주요 구매 | 372 | 479 | 503 | 238 | 150 | 53 | 8 |
휴가 | 40 | 119 | 196 | 78 | 27 | 4 | 2 |
의료 | 104 | 238 | 375 | 197 | 91 | 27 | 7 |
자동차 | 174 | 258 | 213 | 87 | 46 | 15 | 4 |
신용 카드 | 6424 | 8917 | 6036 | 2092 | 877 | 137 | 17 |
소규모 사업 | 19 | 75 | 249 | 192 | 158 | 64 | 30 |
기타 | 356 | 1080 | 1705 | 916 | 478 | 147 | 43 |
이사 | 13 | 86 | 203 | 131 | 50 | 18 | 5 |
주택 | 8 | 36 | 85 | 68 | 63 | 29 | 12 |
재생 에너지 | 1 | 4 | 20 | 21 | 10 | 4 | 0 |
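Two things make the raw counts above hard to compare: the groups differ hugely in size, and the missing 근로기간 group comes out empty because an equality filter never matches NaN. A row-normalized crosstab with the missing values given an explicit label handles both. The sketch below uses toy rows, and the `"Unknown"` label and string keys are choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_df after get_work_year has been applied.
df = pd.DataFrame({
    "근로기간": [10.0, 10.0, np.nan, np.nan, 0.0, 0.0],
    "대출등급": ["A", "B", "C", "C", "B", "B"],
})

# Give missing values their own label so they form a visible group.
key = df["근로기간"].map(lambda v: "Unknown" if pd.isna(v) else str(int(v)))

# normalize="index" turns each row into shares that sum to 1,
# so groups of very different sizes can be compared directly.
share = pd.crosstab(key, df["대출등급"], normalize="index")
print(share)
```

An alternative that keeps NaN as NaN is `groupby(..., dropna=False)`, if a separate label is not wanted.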
(3) Inspecting & transforming numeric variables
train_df.describe()
 | 대출금액 | 대출기간 | 근로기간 | 연간소득 | 부채_대비_소득_비율 | 총계좌수 | 최근_2년간_연체_횟수 | 총상환원금 | 총상환이자 | 총연체금액 | 연체계좌수 |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 9.629400e+04 | 96294.000000 | 90623.000000 | 9.629400e+04 | 96294.000000 | 96294.000000 | 96294.000000 | 9.629400e+04 | 9.629400e+04 | 96294.000000 | 96294.000000 |
mean | 1.830400e+07 | 43.929466 | 6.007791 | 9.392672e+07 | 19.379590 | 25.304827 | 0.345681 | 8.225035e+05 | 4.282282e+05 | 54.380584 | 0.005805 |
std | 1.032908e+07 | 11.288582 | 3.728511 | 9.956871e+07 | 33.569559 | 12.088566 | 0.919119 | 1.027745e+06 | 4.402111e+05 | 1414.769218 | 0.079966 |
min | 1.200000e+06 | 36.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 4.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 |
25% | 1.020000e+07 | 36.000000 | 2.000000 | 5.760000e+07 | 12.650000 | 17.000000 | 0.000000 | 3.075720e+05 | 1.346160e+05 | 0.000000 | 0.000000 |
50% | 1.680000e+07 | 36.000000 | 6.000000 | 7.800000e+07 | 18.740000 | 24.000000 | 0.000000 | 5.976960e+05 | 2.870040e+05 | 0.000000 | 0.000000 |
75% | 2.400000e+07 | 60.000000 | 10.000000 | 1.128000e+08 | 25.540000 | 32.000000 | 0.000000 | 1.055076e+06 | 5.702160e+05 | 0.000000 | 0.000000 |
max | 4.200000e+07 | 60.000000 | 10.000000 | 1.080000e+10 | 9999.000000 | 169.000000 | 30.000000 | 4.195594e+07 | 5.653416e+06 | 75768.000000 | 4.000000 |
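The memo flags several columns for an outlier check. A common first pass is the 1.5×IQR rule, sketched below. This is one possible approach, not something the notebook does: `iqr_outlier_mask` and the multiplier `k` are names introduced here, and the sample values are toy numbers loosely shaped like 부채_대비_소득_비율, including its maximum of 9999.

```python
import pandas as pd

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies more than k * IQR outside the middle 50%."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Toy values; only the last one is far from the rest.
dti = pd.Series([12.65, 18.74, 25.54, 15.0, 20.0, 22.0, 9999.0])
mask = iqr_outlier_mask(dti)
print(int(mask.sum()), "flagged value(s)")
```

One caveat follows directly from the table above: for 최근_2년간_연체_횟수, 총연체금액 and 연체계좌수 the 25%, 50% and 75% quantiles are all 0, so the IQR is 0 and this rule would flag every non-zero row. For such zero-heavy columns a binary "any delinquency" flag or a separate look at the non-zero rows may be more informative.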
# Visualization
num_list = ['대출금액', '연간소득', '부채_대비_소득_비율', '총계좌수', '최근_2년간_연체_횟수', '총상환원금', '총상환이자', '총연체금액', '연체계좌수']

for i in num_list:
    fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(14, 6))
    sns.histplot(data=train_df, x=i, hue='대출등급', ax=ax[0])
    sns.boxplot(data=train_df, y=i, x='대출등급', order=['A', 'B', 'C', 'D', 'E', 'F', 'G'], ax=ax[1])
    plt.show()
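On the worry from the intro that a pairplot over all 96,294 rows is too heavy: one option is to draw a small sample per grade first, so that rare grades such as G stay represented. The sketch below covers only the sampling step. `stratified_sample`, `n_per_group` and the seed are hypothetical choices, and the toy frame stands in for `train_df`.

```python
import pandas as pd

def stratified_sample(df: pd.DataFrame, by: str, n_per_group: int, seed: int = 0) -> pd.DataFrame:
    """Take up to n_per_group random rows from each group of `by`."""
    parts = [
        g.sample(n=min(n_per_group, len(g)), random_state=seed)
        for _, g in df.groupby(by)
    ]
    return pd.concat(parts)

# Toy frame with one common, one medium and one rare grade.
toy = pd.DataFrame({
    "대출등급": ["A"] * 50 + ["B"] * 30 + ["G"] * 3,
    "대출금액": range(83),
})
sample = stratified_sample(toy, "대출등급", n_per_group=10)
print(sample["대출등급"].value_counts())
```

The result could then be passed to `sns.pairplot(sample, hue='대출등급', vars=[...])` with a handful of numeric columns. Equal-sized groups distort the grade proportions, so the plot is for spotting shapes and separation rather than for reading off frequencies.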