🔥 내일배움캠프 DA

231215 FRI Data analysis workflow, Python basics

행팽 2023. 12. 17. 01:26

I started taking the Python course. I like that the lectures use a follow-along format, but the explanations feel too thin, which left me a little flustered.

 


 

Liked

 

  • Getting to tinker with Python again after a long time.

 

Lacked

 

  • I seem to write this in every TIL, but: not enough theory…

 

Learned

 

1. Data analysis workflow

 

  • Define the problem and set a hypothesis
  • Set up the basics for data analysis
  • Analyze the data
  • Visualize the analysis results
  • Draw the final conclusion

 

2. Python basics

 

  • Python: a programming language
  • Library: a collection of subprograms used in software development (a collection of code)
    • pandas: a library used for data analysis
    • matplotlib: a visualization library

 

3. Python syntax basics

 

  • Variable: a container that holds data
  • List: a collection of data items with an index (order)
# Variables
student_1 = "김철수"
student_2 = "김영희"
# …
student_3 = "홍길동"

# List
students_list = ["김철수", "김영희", …, "홍길동"]
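Because a list keeps its items in order, each item can be reached by its position. A minimal sketch, assuming a three-name roster (the list contents here are placeholders for illustration, not the full list from the lecture):

```python
# Hypothetical three-item roster used only for this sketch.
students_list = ["김철수", "김영희", "홍길동"]

first_student = students_list[0]    # indexing starts at 0
last_student = students_list[-1]    # negative indices count from the end
student_count = len(students_list)  # number of items in the list

# enumerate() yields each position together with its item.
for index, name in enumerate(students_list):
    print(index, name)
```

With separate variables such as student_1 and student_2, a loop like this would not be possible without listing every variable by hand.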

 

  • Dictionary: a collection of data made up of name (key) and value pairs
# Assigning each value to its own variable
김철수_height = 180
김철수_weight = 70
김철수_room = "room1"
# …

# Dictionary
김철수 = {'height': 183, 'weight': 68, …}
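Values in a dictionary are looked up by key rather than by position. A minimal sketch, assuming a record named `student` (a name and a `room` entry chosen here for illustration only):

```python
# Hypothetical record used only for this sketch.
student = {"height": 183, "weight": 68}

height = student["height"]      # look up a value by its key
student["room"] = "room1"       # add a new key-value pair
room = student.get("room")      # .get() also looks up by key
missing = student.get("age")    # no 'age' key, so .get() returns None

# .items() yields each key together with its value.
for key, value in student.items():
    print(key, value)
```

Using `student["age"]` instead of `.get("age")` would raise a KeyError, so `.get()` is the safer choice when a key may be absent.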

 

 

 

4. Basics of data analysis and visualization with Pandas and Matplotlib

 

  • Set up the basics for data analysis → analyze the data
import pandas as pd                                     # import pandas
titanic = pd.read_table('/content/train.csv', sep=',')  # load the Titanic table
titanic = titanic.dropna()                              # drop rows that contain null values
titanic.head()
# Pearson correlation coefficients; numeric_only=True (pandas 1.5+) skips text columns such as Name
corr = titanic.corr(method='pearson', numeric_only=True)
corr = corr[corr.Survived != 1]                         # keep only rows whose correlation with Survived is not 1 (the maximum)
corr                                                    # display the result
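To see what the `corr.Survived != 1` filter is doing, here is a small sketch on made-up numbers (the DataFrame below is invented for illustration and is not the Titanic data). It also shows dropping the row by label, which is an alternative I am assuming is equivalent here, not something from the lecture:

```python
import pandas as pd

# Made-up numeric data for illustration only (not the Titanic dataset).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Fare": [7.0, 71.0, 8.0, 8.5, 53.0],
    "Age": [22, 38, 26, 35, 35],
})

corr = df.corr(method="pearson")

# Approach from the notes: the Survived row holds Survived's correlation
# with itself, so filtering on != 1 is meant to remove that row.
filtered = corr[corr.Survived != 1]

# Alternative: drop the row by label, which does not depend on the
# self-correlation comparing exactly equal to 1.
by_label = corr.drop("Survived", axis="rows")

print(by_label["Survived"])
```

Either way, what remains is each other variable's correlation with Survived, which is the column that gets plotted afterwards.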

 

 

  • Visualize the analysis results
import matplotlib.pyplot as plt                  # import matplotlib
corr = corr.drop(['PassengerId'], axis='rows')   # drop the PassengerId row
corr['Survived'].plot.bar()                      # select the Survived column and draw a bar chart
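A correlation matrix has the same labels on its rows and its columns, so it is easy to mix up which axis `drop()` acts on. A minimal sketch with a hand-written 2×2 table (the numbers are placeholders, not real correlations):

```python
import pandas as pd

# A small hand-written table standing in for a correlation matrix.
matrix = pd.DataFrame(
    [[1.0, 0.1], [0.1, 1.0]],
    index=["PassengerId", "Survived"],
    columns=["PassengerId", "Survived"],
)

# axis='rows' removes the row labelled PassengerId; all columns remain.
rows_dropped = matrix.drop(["PassengerId"], axis="rows")

# axis='columns' removes the column labelled PassengerId; all rows remain.
cols_dropped = matrix.drop(["PassengerId"], axis="columns")

print(rows_dropped)
print(cols_dropped)
```

For the bar chart of `corr['Survived']`, it is the row that matters: each remaining row becomes one bar.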

 

 

  • Result
import pandas as pd
import matplotlib.pyplot as plt
titanic = pd.read_table('train.csv', sep=',')

# 1. Check for null (blank) data
print(titanic.isnull().sum())

# 2. Remove rows with blank data
titanic = titanic.dropna()

# Compute the correlation coefficients
# (numeric_only=True, pandas 1.5+, skips text columns such as Name)
corr = titanic.corr(method='pearson', numeric_only=True)

# Exclude the row where the correlation with Survived is 1
corr = corr[corr.Survived != 1]

# Drop the PassengerId row
corr = corr.drop(['PassengerId'], axis='rows')

# Draw a bar chart of each variable's correlation with survival
corr['Survived'].plot.bar()

# Rotate the x-axis labels by 45 degrees
plt.xticks(rotation=45)



 

5. NumPy, Seaborn

 

  • NumPy: helps with numerical operations on data.
  • Seaborn: makes matplotlib visualization easier.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

titanic = pd.read_table('/content/train.csv', sep=',')  # load the Titanic table
titanic = titanic.dropna()                              # drop rows that contain null values
titanic.head()
titanic.describe()   # summary statistics

# Histogram of ages (first graph)
titanic['Age'].hist(bins=40, figsize=(18, 8), grid=True)

# Bin the ages into groups so the survival rate of each group can be checked
titanic['Age_cat'] = pd.cut(titanic['Age'], bins=[0, 3, 7, 15, 30, 60, 100], include_lowest=True, labels=['baby', 'children', 'teenage', 'young', 'adult', 'old'])

# Mean values per age group (numeric_only=True skips text columns such as Name)
titanic.groupby('Age_cat').mean(numeric_only=True)

# Set the figure size
plt.figure(figsize=(14, 5))

# Draw a bar chart (x-axis = Age_cat, y-axis = Survived)
sns.barplot(x='Age_cat', y='Survived', data=titanic)

# Show the graph (second graph)
plt.show()
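The lecture did not explain how `pd.cut` decides which group an age falls into, so here is a small sketch on made-up ages (the `Age` and `Survived` values below are invented for illustration). Each bin includes its right edge, and `include_lowest=True` also lets the first bin include 0:

```python
import pandas as pd

# Made-up ages and outcomes for illustration only.
df = pd.DataFrame({
    "Age": [0, 3, 5, 10, 25, 45, 80],
    "Survived": [1, 0, 1, 0, 1, 0, 0],
})

# Same bins and labels as in the notes above.
df["Age_cat"] = pd.cut(
    df["Age"],
    bins=[0, 3, 7, 15, 30, 60, 100],
    include_lowest=True,
    labels=["baby", "children", "teenage", "young", "adult", "old"],
)

# Mean of the 0/1 Survived column per group = survival rate per group.
rate = df.groupby("Age_cat", observed=True)["Survived"].mean()

print(df[["Age", "Age_cat"]])
print(rate)
```

Age 3 lands in 'baby' rather than 'children' because the 0 to 3 bin includes its upper edge.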

 

 

 

 

 

Longed for

 

  • The main camp starts next week, so I want to recharge my energy over the weekend.