웹크롤링과 데이터분석 : 전세계 축구 선수 몸값 분석 - 4 (Pandas로 데이터 분석하기)
2021. 11. 13. 13:12ㆍ빅데이터 스터디
데이터 프레임 만들기¶
In [39]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
In [40]:
#첫번째 방식
df1 = pd.DataFrame({'이름':['손흥민','메시','호날두'],
'나이':[28,33,35],
'소속':['토트넘','바르셀로나','유벤투스']},
index = [1,2,3])
# 딕셔너리의 key = column
df1
Out[40]:
이름 | 나이 | 소속 | |
---|---|---|---|
1 | 손흥민 | 28 | 토트넘 |
2 | 메시 | 33 | 바르셀로나 |
3 | 호날두 | 35 | 유벤투스 |
In [41]:
#두번째 방식
#리스트 안에 리스트
player_list=[
['손흥민',28,'토트넘'],
['메시',33,'바르셀로나'],
['호날두',35,'유벤투스']
]
df2 = pd.DataFrame(player_list,columns = ['이름','나이','소속'])
df2
Out[41]:
이름 | 나이 | 소속 | |
---|---|---|---|
0 | 손흥민 | 28 | 토트넘 |
1 | 메시 | 33 | 바르셀로나 |
2 | 호날두 | 35 | 유벤투스 |
In [4]:
headers = {'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"}
player_lst = []
for i in range(1,3):
url = f'https://www.transfermarkt.com/marktwertetop/wertvollstespieler?page={i}'
r = requests.get(url,headers = headers)
soup = BeautifulSoup(r.content,'html.parser')
player_info = soup.find_all('tr',{'class':['odd','even']})
for info in player_info:
player = info.find_all('td')
number = player[0].text
name = player[3].text
position = player[4].text
age = player[5].text
nation = player[6].img['alt']
team = player[7].img['alt']
value = player[8].text.strip()
player_lst.append([number,name,position,age,nation,team,value])
#두번째 형태
time.sleep(1)
df2 = pd.DataFrame(player_lst,
columns = ['number','name','position','age','nation','team','value'])
df2
Out[4]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
5 | 6 | Romelu Lukaku | Centre-Forward | 28 | Belgium | Chelsea FC | €100.00m |
6 | 7 | Kevin De Bruyne | Attacking Midfield | 30 | Belgium | Manchester City | €100.00m |
7 | 8 | Neymar | Left Winger | 29 | Brazil | Paris Saint-Germain | €100.00m |
8 | 9 | Jadon Sancho | Left Winger | 21 | England | Manchester United | €90.00m |
9 | 10 | Frenkie de Jong | Central Midfield | 24 | Netherlands | FC Barcelona | €90.00m |
10 | 11 | Bruno Fernandes | Attacking Midfield | 27 | Portugal | Manchester United | €90.00m |
11 | 12 | Joshua Kimmich | Defensive Midfield | 26 | Germany | Bayern Munich | €90.00m |
12 | 13 | Raheem Sterling | Left Winger | 26 | England | Manchester City | €90.00m |
13 | 14 | Marcus Rashford | Left Winger | 24 | England | Manchester United | €85.00m |
14 | 15 | Sadio Mané | Left Winger | 29 | Senegal | Liverpool FC | €85.00m |
15 | 16 | Heung-min Son | Left Winger | 29 | Korea, South | Tottenham Hotspur | €85.00m |
16 | 17 | Pedri | Central Midfield | 18 | Spain | FC Barcelona | €80.00m |
17 | 18 | Phil Foden | Central Midfield | 21 | England | Manchester City | €80.00m |
18 | 19 | Lautaro Martínez | Centre-Forward | 24 | Argentina | Inter Milan | €80.00m |
19 | 20 | Marcos Llorente | Central Midfield | 26 | Spain | Atlético de Madrid | €80.00m |
20 | 21 | Lionel Messi | Right Winger | 34 | Argentina | Paris Saint-Germain | €80.00m |
21 | 22 | Mason Mount | Attacking Midfield | 22 | England | Chelsea FC | €75.00m |
22 | 23 | Trent Alexander-Arnold | Right-Back | 23 | England | Liverpool FC | €75.00m |
23 | 24 | Rúben Dias | Centre-Back | 24 | Portugal | Manchester City | €75.00m |
24 | 25 | Marquinhos | Centre-Back | 27 | Brazil | Paris Saint-Germain | €75.00m |
25 | 26 | Jude Bellingham | Central Midfield | 18 | England | Borussia Dortmund | €70.00m |
26 | 27 | João Félix | Second Striker | 22 | Portugal | Atlético de Madrid | €70.00m |
27 | 28 | Alphonso Davies | Left-Back | 21 | Canada | Bayern Munich | €70.00m |
28 | 29 | Achraf Hakimi | Right-Back | 23 | Morocco | Paris Saint-Germain | €70.00m |
29 | 30 | Declan Rice | Defensive Midfield | 22 | England | West Ham United | €70.00m |
30 | 31 | Rodri | Defensive Midfield | 25 | Spain | Manchester City | €70.00m |
31 | 32 | Mikel Oyarzabal | Left Winger | 24 | Spain | Real Sociedad | €70.00m |
32 | 33 | Federico Chiesa | Right Winger | 24 | Italy | Juventus FC | €70.00m |
33 | 34 | Matthijs de Ligt | Centre-Back | 22 | Netherlands | Juventus FC | €70.00m |
34 | 35 | Kai Havertz | Attacking Midfield | 22 | Germany | Chelsea FC | €70.00m |
35 | 36 | Sergej Milinković-Savić | Central Midfield | 26 | Serbia | SS Lazio | €70.00m |
36 | 37 | Bernardo Silva | Attacking Midfield | 27 | Portugal | Manchester City | €70.00m |
37 | 38 | Raphaël Varane | Centre-Back | 28 | France | Manchester United | €70.00m |
38 | 39 | Serge Gnabry | Right Winger | 26 | Germany | Bayern Munich | €70.00m |
39 | 40 | Leon Goretzka | Central Midfield | 26 | Germany | Bayern Munich | €70.00m |
40 | 41 | Jan Oblak | Goalkeeper | 28 | Slovenia | Atlético de Madrid | €70.00m |
41 | 42 | Casemiro | Defensive Midfield | 29 | Brazil | Real Madrid | €70.00m |
42 | 43 | Florian Wirtz | Attacking Midfield | 18 | Germany | Bayer 04 Leverkusen | €65.00m |
43 | 44 | Bukayo Saka | Left Midfield | 20 | England | Arsenal FC | €65.00m |
44 | 45 | Federico Valverde | Central Midfield | 23 | Uruguay | Real Madrid | €65.00m |
45 | 46 | Gianluigi Donnarumma | Goalkeeper | 22 | Italy | Paris Saint-Germain | €65.00m |
46 | 47 | Nicolò Barella | Central Midfield | 24 | Italy | Inter Milan | €65.00m |
47 | 48 | Andrew Robertson | Left-Back | 27 | Scotland | Liverpool FC | €65.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
49 | 50 | Jules Koundé | Centre-Back | 23 | France | Sevilla FC | €60.00m |
In [5]:
df2.to_csv('transfermarkt50_2.csv',index=False , encoding='UTF-8')
CSV 파일 불러오기¶
In [6]:
pd.read_csv('transfermarkt50.csv') #저장된 파일 불러오기
Out[6]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
5 | 6 | Romelu Lukaku | Centre-Forward | 28 | Belgium | Chelsea FC | €100.00m |
6 | 7 | Kevin De Bruyne | Attacking Midfield | 30 | Belgium | Manchester City | €100.00m |
7 | 8 | Neymar | Left Winger | 29 | Brazil | Paris Saint-Germain | €100.00m |
8 | 9 | Jadon Sancho | Left Winger | 21 | England | Manchester United | €90.00m |
9 | 10 | Frenkie de Jong | Central Midfield | 24 | Netherlands | FC Barcelona | €90.00m |
10 | 11 | Bruno Fernandes | Attacking Midfield | 27 | Portugal | Manchester United | €90.00m |
11 | 12 | Joshua Kimmich | Defensive Midfield | 26 | Germany | Bayern Munich | €90.00m |
12 | 13 | Raheem Sterling | Left Winger | 26 | England | Manchester City | €90.00m |
13 | 14 | Marcus Rashford | Left Winger | 24 | England | Manchester United | €85.00m |
14 | 15 | Sadio Mané | Left Winger | 29 | Senegal | Liverpool FC | €85.00m |
15 | 16 | Heung-min Son | Left Winger | 29 | Korea, South | Tottenham Hotspur | €85.00m |
16 | 17 | Pedri | Central Midfield | 18 | Spain | FC Barcelona | €80.00m |
17 | 18 | Phil Foden | Central Midfield | 21 | England | Manchester City | €80.00m |
18 | 19 | Lautaro Martínez | Centre-Forward | 24 | Argentina | Inter Milan | €80.00m |
19 | 20 | Marcos Llorente | Central Midfield | 26 | Spain | Atlético de Madrid | €80.00m |
20 | 21 | Lionel Messi | Right Winger | 34 | Argentina | Paris Saint-Germain | €80.00m |
21 | 22 | Mason Mount | Attacking Midfield | 22 | England | Chelsea FC | €75.00m |
22 | 23 | Trent Alexander-Arnold | Right-Back | 23 | England | Liverpool FC | €75.00m |
23 | 24 | Rúben Dias | Centre-Back | 24 | Portugal | Manchester City | €75.00m |
24 | 25 | Marquinhos | Centre-Back | 27 | Brazil | Paris Saint-Germain | €75.00m |
25 | 26 | Jude Bellingham | Central Midfield | 18 | England | Borussia Dortmund | €70.00m |
26 | 27 | João Félix | Second Striker | 21 | Portugal | Atlético de Madrid | €70.00m |
27 | 28 | Alphonso Davies | Left-Back | 21 | Canada | Bayern Munich | €70.00m |
28 | 29 | Achraf Hakimi | Right-Back | 23 | Morocco | Paris Saint-Germain | €70.00m |
29 | 30 | Declan Rice | Defensive Midfield | 22 | England | West Ham United | €70.00m |
30 | 31 | Rodri | Defensive Midfield | 25 | Spain | Manchester City | €70.00m |
31 | 32 | Mikel Oyarzabal | Left Winger | 24 | Spain | Real Sociedad | €70.00m |
32 | 33 | Federico Chiesa | Right Winger | 24 | Italy | Juventus FC | €70.00m |
33 | 34 | Matthijs de Ligt | Centre-Back | 22 | Netherlands | Juventus FC | €70.00m |
34 | 35 | Kai Havertz | Attacking Midfield | 22 | Germany | Chelsea FC | €70.00m |
35 | 36 | Sergej Milinković-Savić | Central Midfield | 26 | Serbia | SS Lazio | €70.00m |
36 | 37 | Bernardo Silva | Attacking Midfield | 27 | Portugal | Manchester City | €70.00m |
37 | 38 | Raphaël Varane | Centre-Back | 28 | France | Manchester United | €70.00m |
38 | 39 | Serge Gnabry | Right Winger | 26 | Germany | Bayern Munich | €70.00m |
39 | 40 | Leon Goretzka | Central Midfield | 26 | Germany | Bayern Munich | €70.00m |
40 | 41 | Jan Oblak | Goalkeeper | 28 | Slovenia | Atlético de Madrid | €70.00m |
41 | 42 | Casemiro | Defensive Midfield | 29 | Brazil | Real Madrid | €70.00m |
42 | 43 | Florian Wirtz | Attacking Midfield | 18 | Germany | Bayer 04 Leverkusen | €65.00m |
43 | 44 | Bukayo Saka | Left Midfield | 20 | England | Arsenal FC | €65.00m |
44 | 45 | Federico Valverde | Central Midfield | 23 | Uruguay | Real Madrid | €65.00m |
45 | 46 | Gianluigi Donnarumma | Goalkeeper | 22 | Italy | Paris Saint-Germain | €65.00m |
46 | 47 | Nicolò Barella | Central Midfield | 24 | Italy | Inter Milan | €65.00m |
47 | 48 | Andrew Robertson | Left-Back | 27 | Scotland | Liverpool FC | €65.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
49 | 50 | Jules Koundé | Centre-Back | 22 | France | Sevilla FC | €60.00m |
In [7]:
#파일이 웹에 있는 경우, 웹에 있는 파일 불러오기
# pd.read_csv()웹의 주소 그대로 넣기
In [8]:
#데이터프레임을 df에 저장하기
#상대경로 이해 : 하위 폴더에 있는 자료 "./하위폴더/파일이름"
df = pd.read_csv('./data/transfermarkt50_2.csv')
df
Out[8]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
5 | 6 | Romelu Lukaku | Centre-Forward | 28 | Belgium | Chelsea FC | €100.00m |
6 | 7 | Kevin De Bruyne | Attacking Midfield | 30 | Belgium | Manchester City | €100.00m |
7 | 8 | Neymar | Left Winger | 29 | Brazil | Paris Saint-Germain | €100.00m |
8 | 9 | Jadon Sancho | Left Winger | 21 | England | Manchester United | €90.00m |
9 | 10 | Frenkie de Jong | Central Midfield | 24 | Netherlands | FC Barcelona | €90.00m |
10 | 11 | Bruno Fernandes | Attacking Midfield | 27 | Portugal | Manchester United | €90.00m |
11 | 12 | Joshua Kimmich | Defensive Midfield | 26 | Germany | Bayern Munich | €90.00m |
12 | 13 | Raheem Sterling | Left Winger | 26 | England | Manchester City | €90.00m |
13 | 14 | Marcus Rashford | Left Winger | 24 | England | Manchester United | €85.00m |
14 | 15 | Sadio Mané | Left Winger | 29 | Senegal | Liverpool FC | €85.00m |
15 | 16 | Heung-min Son | Left Winger | 29 | Korea, South | Tottenham Hotspur | €85.00m |
16 | 17 | Pedri | Central Midfield | 18 | Spain | FC Barcelona | €80.00m |
17 | 18 | Phil Foden | Central Midfield | 21 | England | Manchester City | €80.00m |
18 | 19 | Lautaro Martínez | Centre-Forward | 24 | Argentina | Inter Milan | €80.00m |
19 | 20 | Marcos Llorente | Central Midfield | 26 | Spain | Atlético de Madrid | €80.00m |
20 | 21 | Lionel Messi | Right Winger | 34 | Argentina | Paris Saint-Germain | €80.00m |
21 | 22 | Mason Mount | Attacking Midfield | 22 | England | Chelsea FC | €75.00m |
22 | 23 | Trent Alexander-Arnold | Right-Back | 23 | England | Liverpool FC | €75.00m |
23 | 24 | Rúben Dias | Centre-Back | 24 | Portugal | Manchester City | €75.00m |
24 | 25 | Marquinhos | Centre-Back | 27 | Brazil | Paris Saint-Germain | €75.00m |
25 | 26 | Jude Bellingham | Central Midfield | 18 | England | Borussia Dortmund | €70.00m |
26 | 27 | João Félix | Second Striker | 22 | Portugal | Atlético de Madrid | €70.00m |
27 | 28 | Alphonso Davies | Left-Back | 21 | Canada | Bayern Munich | €70.00m |
28 | 29 | Achraf Hakimi | Right-Back | 23 | Morocco | Paris Saint-Germain | €70.00m |
29 | 30 | Declan Rice | Defensive Midfield | 22 | England | West Ham United | €70.00m |
30 | 31 | Rodri | Defensive Midfield | 25 | Spain | Manchester City | €70.00m |
31 | 32 | Mikel Oyarzabal | Left Winger | 24 | Spain | Real Sociedad | €70.00m |
32 | 33 | Federico Chiesa | Right Winger | 24 | Italy | Juventus FC | €70.00m |
33 | 34 | Matthijs de Ligt | Centre-Back | 22 | Netherlands | Juventus FC | €70.00m |
34 | 35 | Kai Havertz | Attacking Midfield | 22 | Germany | Chelsea FC | €70.00m |
35 | 36 | Sergej Milinković-Savić | Central Midfield | 26 | Serbia | SS Lazio | €70.00m |
36 | 37 | Bernardo Silva | Attacking Midfield | 27 | Portugal | Manchester City | €70.00m |
37 | 38 | Raphaël Varane | Centre-Back | 28 | France | Manchester United | €70.00m |
38 | 39 | Serge Gnabry | Right Winger | 26 | Germany | Bayern Munich | €70.00m |
39 | 40 | Leon Goretzka | Central Midfield | 26 | Germany | Bayern Munich | €70.00m |
40 | 41 | Jan Oblak | Goalkeeper | 28 | Slovenia | Atlético de Madrid | €70.00m |
41 | 42 | Casemiro | Defensive Midfield | 29 | Brazil | Real Madrid | €70.00m |
42 | 43 | Florian Wirtz | Attacking Midfield | 18 | Germany | Bayer 04 Leverkusen | €65.00m |
43 | 44 | Bukayo Saka | Left Midfield | 20 | England | Arsenal FC | €65.00m |
44 | 45 | Federico Valverde | Central Midfield | 23 | Uruguay | Real Madrid | €65.00m |
45 | 46 | Gianluigi Donnarumma | Goalkeeper | 22 | Italy | Paris Saint-Germain | €65.00m |
46 | 47 | Nicolò Barella | Central Midfield | 24 | Italy | Inter Milan | €65.00m |
47 | 48 | Andrew Robertson | Left-Back | 27 | Scotland | Liverpool FC | €65.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
49 | 50 | Jules Koundé | Centre-Back | 23 | France | Sevilla FC | €60.00m |
DataFrame 뜯어보기와 인덱싱¶
In [9]:
#shape은 함수가 아닌 변수 -> shape 뒤에 괄호 없다.
df.shape
Out[9]:
(50, 7)
In [10]:
#언패킹 ,데이터를 처음 가져왔을 경우, 데이터의 크기 알아보기
(rows,columns) = df.shape
In [11]:
rows
Out[11]:
50
In [12]:
columns
Out[12]:
7
In [13]:
df.info()
#non-null = 결측치가 없음, object = 문자
<class 'pandas.core.frame.DataFrame'> RangeIndex: 50 entries, 0 to 49 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 number 50 non-null int64 1 name 50 non-null object 2 position 50 non-null object 3 age 50 non-null int64 4 nation 50 non-null object 5 team 50 non-null object 6 value 50 non-null object dtypes: int64(2), object(5) memory usage: 2.9+ KB
In [14]:
df.head() #앞 5개
Out[14]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
In [15]:
df.tail() #뒤 5개
Out[15]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
45 | 46 | Gianluigi Donnarumma | Goalkeeper | 22 | Italy | Paris Saint-Germain | €65.00m |
46 | 47 | Nicolò Barella | Central Midfield | 24 | Italy | Inter Milan | €65.00m |
47 | 48 | Andrew Robertson | Left-Back | 27 | Scotland | Liverpool FC | €65.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
49 | 50 | Jules Koundé | Centre-Back | 23 | France | Sevilla FC | €60.00m |
In [16]:
#df[:]인덱싱
df[0:5] #==df.head()
Out[16]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
In [17]:
df[10:21] #11위부터 21위
Out[17]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
10 | 11 | Bruno Fernandes | Attacking Midfield | 27 | Portugal | Manchester United | €90.00m |
11 | 12 | Joshua Kimmich | Defensive Midfield | 26 | Germany | Bayern Munich | €90.00m |
12 | 13 | Raheem Sterling | Left Winger | 26 | England | Manchester City | €90.00m |
13 | 14 | Marcus Rashford | Left Winger | 24 | England | Manchester United | €85.00m |
14 | 15 | Sadio Mané | Left Winger | 29 | Senegal | Liverpool FC | €85.00m |
15 | 16 | Heung-min Son | Left Winger | 29 | Korea, South | Tottenham Hotspur | €85.00m |
16 | 17 | Pedri | Central Midfield | 18 | Spain | FC Barcelona | €80.00m |
17 | 18 | Phil Foden | Central Midfield | 21 | England | Manchester City | €80.00m |
18 | 19 | Lautaro Martínez | Centre-Forward | 24 | Argentina | Inter Milan | €80.00m |
19 | 20 | Marcos Llorente | Central Midfield | 26 | Spain | Atlético de Madrid | €80.00m |
20 | 21 | Lionel Messi | Right Winger | 34 | Argentina | Paris Saint-Germain | €80.00m |
In [18]:
#컬럼 이름 선택하기 df['컬럼이름']
#'number','name','nation' 정보를 보여주기
df[['name','name','nation']].head()
Out[18]:
name | name | nation | |
---|---|---|---|
0 | Kylian Mbappé | Kylian Mbappé | France |
1 | Erling Haaland | Erling Haaland | Norway |
2 | Harry Kane | Harry Kane | England |
3 | Jack Grealish | Jack Grealish | England |
4 | Mohamed Salah | Mohamed Salah | Egypt |
iloc와 loc실습¶
In [19]:
df.iloc[0:2]
#df[0:2]와 같다
Out[19]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
In [20]:
#loc는 인덱스 숫자(문자)를 기준으로 한다. 즉, 마지막 숫자(문자)도 포함
df.loc[0:2]
Out[20]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
In [21]:
#쉼표를 기준으로 행과 열 표시 df.loc[행이름, 열이름]
#첫번째 행의 , 이름이 'name'인 값
df.loc[0,'name'] #0행에서 name이라는 열을 가져옴
Out[21]:
'Kylian Mbappé'
In [22]:
#행은 처음부터 5까지, 열은 'name', 'team' , 'value'인 값
df.loc[0:5,['name','team','value']]
Out[22]:
name | team | value | |
---|---|---|---|
0 | Kylian Mbappé | Paris Saint-Germain | €160.00m |
1 | Erling Haaland | Borussia Dortmund | €150.00m |
2 | Harry Kane | Tottenham Hotspur | €120.00m |
3 | Jack Grealish | Manchester City | €100.00m |
4 | Mohamed Salah | Liverpool FC | €100.00m |
5 | Romelu Lukaku | Chelsea FC | €100.00m |
In [23]:
#조건 인덱싱
df['age']<=20 #age 칼럼 이하인 것들중 20이하인 것
Out[23]:
0 False 1 False 2 False 3 False 4 False 5 False 6 False 7 False 8 False 9 False 10 False 11 False 12 False 13 False 14 False 15 False 16 True 17 False 18 False 19 False 20 False 21 False 22 False 23 False 24 False 25 True 26 False 27 False 28 False 29 False 30 False 31 False 32 False 33 False 34 False 35 False 36 False 37 False 38 False 39 False 40 False 41 False 42 True 43 True 44 False 45 False 46 False 47 False 48 True 49 False Name: age, dtype: bool
In [24]:
#값을 보기위해선
df[df['age']<=20] #나이가 20이하인 선수
Out[24]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
16 | 17 | Pedri | Central Midfield | 18 | Spain | FC Barcelona | €80.00m |
25 | 26 | Jude Bellingham | Central Midfield | 18 | England | Borussia Dortmund | €70.00m |
42 | 43 | Florian Wirtz | Attacking Midfield | 18 | Germany | Bayer 04 Leverkusen | €65.00m |
43 | 44 | Bukayo Saka | Left Midfield | 20 | England | Arsenal FC | €65.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
In [25]:
#소속팀이 토트넘인 선수
df[df['team']=='Tottenham Hotspur']
Out[25]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
15 | 16 | Heung-min Son | Left Winger | 29 | Korea, South | Tottenham Hotspur | €85.00m |
돌발퀴즈) loc조건으로 나이가 30 이상인 선수의 'name'과 'value'가져오기¶
In [26]:
df.loc[df['age']>=30,['name','value']]
Out[26]:
name | value | |
---|---|---|
6 | Kevin De Bruyne | €100.00m |
20 | Lionel Messi | €80.00m |
DataFrame 정렬하기와 컬럼바꾸기¶
In [27]:
#인덱스로 정렬하기 df.sort_index()
df.head()
df.sort_index().head()
Out[27]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
In [28]:
# 내림차 순으로 정렬하기 (ascending = False)
df.sort_index(ascending=False)[:5]
Out[28]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
49 | 50 | Jules Koundé | Centre-Back | 23 | France | Sevilla FC | €60.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
47 | 48 | Andrew Robertson | Left-Back | 27 | Scotland | Liverpool FC | €65.00m |
46 | 47 | Nicolò Barella | Central Midfield | 24 | Italy | Inter Milan | €65.00m |
45 | 46 | Gianluigi Donnarumma | Goalkeeper | 22 | Italy | Paris Saint-Germain | €65.00m |
In [29]:
# sort_values로 정렬하기 df.sort_values(칼럼이름)
# 나이 많은 선수 10명 보여주기
df.sort_values('age').head()
Out[29]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
16 | 17 | Pedri | Central Midfield | 18 | Spain | FC Barcelona | €80.00m |
42 | 43 | Florian Wirtz | Attacking Midfield | 18 | Germany | Bayer 04 Leverkusen | €65.00m |
25 | 26 | Jude Bellingham | Central Midfield | 18 | England | Borussia Dortmund | €70.00m |
48 | 49 | Ansu Fati | Left Winger | 19 | Spain | FC Barcelona | €60.00m |
43 | 44 | Bukayo Saka | Left Midfield | 20 | England | Arsenal FC | €65.00m |
In [30]:
df.sort_values('age',ascending=False).head()
#내림차순
Out[30]:
number | name | position | age | nation | team | value | |
---|---|---|---|---|---|---|---|
20 | 21 | Lionel Messi | Right Winger | 34 | Argentina | Paris Saint-Germain | €80.00m |
6 | 7 | Kevin De Bruyne | Attacking Midfield | 30 | Belgium | Manchester City | €100.00m |
15 | 16 | Heung-min Son | Left Winger | 29 | Korea, South | Tottenham Hotspur | €85.00m |
14 | 15 | Sadio Mané | Left Winger | 29 | Senegal | Liverpool FC | €85.00m |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
In [31]:
#인덱스를 컬럼 이름으로 바꾸기 df.set_index('칼럼이름')
#number로 인덱스 바꿔보기
df.set_index('number').head()
Out[31]:
name | position | age | nation | team | value | |
---|---|---|---|---|---|---|
number | ||||||
1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
4 | Jack Grealish | Left Winger | 26 | England | Manchester City | €100.00m |
5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | €100.00m |
In [32]:
#컬럼이름 바꾸고 저장하기 : 'team'을 'club'으로 바꾸기 위해 cheat cheet에서 검색
df.rename(columns = {'team':'club'},inplace = True)
#inplace = True 하면 저장이 된다
df.head(3)
Out[32]:
number | name | position | age | nation | club | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | €160.00m |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | €150.00m |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | €120.00m |
데이터 전처리 (pre-processing)¶
In [33]:
# value 값에서 불필요한 문자는 없애고 데이터 타입을 숫자로 바꾸기
# €160.00m -> 160.00
# 어떻게? 검색해서... 예) 판다스 특수문자 제거
df['value'] = df['value'].str.replace('€','')
df['value'] = df['value'].str.replace('m','').astype('float64')
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 50 entries, 0 to 49 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 number 50 non-null int64 1 name 50 non-null object 2 position 50 non-null object 3 age 50 non-null int64 4 nation 50 non-null object 5 club 50 non-null object 6 value 50 non-null float64 dtypes: float64(1), int64(2), object(4) memory usage: 2.9+ KB
In [34]:
df.head()
Out[34]:
number | name | position | age | nation | club | value | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | 160.0 |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | 150.0 |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | 120.0 |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | 100.0 |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | 100.0 |
In [35]:
#컬럼 생성
#시장가치는 단위가 백만유로인데... 13을 곱해 한화로 억원으로 만들기
df['시장가치(억)'] = df['value']*13 #새로운 column 생성
df.head()
Out[35]:
number | name | position | age | nation | club | value | 시장가치(억) | |
---|---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | 160.0 | 2080.0 |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | 150.0 | 1950.0 |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | 120.0 | 1560.0 |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | 100.0 | 1300.0 |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | 100.0 | 1300.0 |
In [36]:
#column 삭제 df.drop(coulmns = ['칼럼이름'])
df.drop(columns = ['value'], inplace = True)
df.head()
Out[36]:
number | name | position | age | nation | club | 시장가치(억) | |
---|---|---|---|---|---|---|---|
0 | 1 | Kylian Mbappé | Centre-Forward | 22 | France | Paris Saint-Germain | 2080.0 |
1 | 2 | Erling Haaland | Centre-Forward | 21 | Norway | Borussia Dortmund | 1950.0 |
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | 1560.0 |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | 1300.0 |
4 | 5 | Mohamed Salah | Right Winger | 29 | Egypt | Liverpool FC | 1300.0 |
In [37]:
#값만 가져오기
for info in player_info:
player = info.find_all('td')
number = player[0].text
name = player[3].text
position = player[4].text
age = player[5].text
nation = player[6].img['alt']
team = player[7].img['alt']
value = player[8].text.strip()
value = value[1:-1]
print(value)
70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 70.00 65.00 65.00 65.00 65.00 65.00 65.00 60.00 60.00
DataFrame 통계분석과 groupby()¶
In [38]:
#숫자형 데이터에 대한 통계
df.describe()
Out[38]:
number | age | 시장가치(억) | |
---|---|---|---|
count | 50.00000 | 50.000000 | 50.000000 |
mean | 25.50000 | 24.540000 | 1049.100000 |
std | 14.57738 | 3.477097 | 258.673401 |
min | 1.00000 | 18.000000 | 780.000000 |
25% | 13.25000 | 22.000000 | 910.000000 |
50% | 25.50000 | 24.000000 | 942.500000 |
75% | 37.75000 | 27.000000 | 1153.750000 |
max | 50.00000 | 34.000000 | 2080.000000 |
In [42]:
# df[칼럼이름].mean()
# 나이평균 구하기
df['age'].mean()
Out[42]:
24.54
In [43]:
# 몸값 합계 구하기
df['시장가치(억)'].sum()
Out[43]:
52455.0
In [45]:
#선수들이 속한 가장 많은 나라는? 최빈값 mode()
df['nation'].mode()
Out[45]:
0 England dtype: object
In [48]:
#국적이 Brazil인 선수들은?
df[df['nation']=='Brazil']
Out[48]:
number | name | position | age | nation | club | 시장가치(억) | |
---|---|---|---|---|---|---|---|
7 | 8 | Neymar | Left Winger | 29 | Brazil | Paris Saint-Germain | 1300.0 |
24 | 25 | Marquinhos | Centre-Back | 27 | Brazil | Paris Saint-Germain | 975.0 |
41 | 42 | Casemiro | Defensive Midfield | 29 | Brazil | Real Madrid | 910.0 |
In [49]:
#국적이 England인 선수들은?
df[df['nation']=='England']
Out[49]:
number | name | position | age | nation | club | 시장가치(억) | |
---|---|---|---|---|---|---|---|
2 | 3 | Harry Kane | Centre-Forward | 28 | England | Tottenham Hotspur | 1560.0 |
3 | 4 | Jack Grealish | Left Winger | 26 | England | Manchester City | 1300.0 |
8 | 9 | Jadon Sancho | Left Winger | 21 | England | Manchester United | 1170.0 |
12 | 13 | Raheem Sterling | Left Winger | 26 | England | Manchester City | 1170.0 |
13 | 14 | Marcus Rashford | Left Winger | 24 | England | Manchester United | 1105.0 |
17 | 18 | Phil Foden | Central Midfield | 21 | England | Manchester City | 1040.0 |
21 | 22 | Mason Mount | Attacking Midfield | 22 | England | Chelsea FC | 975.0 |
22 | 23 | Trent Alexander-Arnold | Right-Back | 23 | England | Liverpool FC | 975.0 |
25 | 26 | Jude Bellingham | Central Midfield | 18 | England | Borussia Dortmund | 910.0 |
29 | 30 | Declan Rice | Defensive Midfield | 22 | England | West Ham United | 910.0 |
43 | 44 | Bukayo Saka | Left Midfield | 20 | England | Arsenal FC | 845.0 |
groupby()¶
In [50]:
#데이터를 그룹으로 묶어 분석
df.groupby('nation')
Out[50]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000194834874C0>
In [51]:
g = df.groupby('nation')
g.size()
Out[51]:
nation Argentina 2 Belgium 2 Brazil 3 Canada 1 Egypt 1 England 11 France 3 Germany 5 Italy 3 Korea, South 1 Morocco 1 Netherlands 2 Norway 1 Portugal 4 Scotland 1 Senegal 1 Serbia 1 Slovenia 1 Spain 5 Uruguay 1 dtype: int64
In [54]:
# count로도 가능
g.count()
Out[54]:
number | name | position | age | club | 시장가치(억) | |
---|---|---|---|---|---|---|
nation | ||||||
Argentina | 2 | 2 | 2 | 2 | 2 | 2 |
Belgium | 2 | 2 | 2 | 2 | 2 | 2 |
Brazil | 3 | 3 | 3 | 3 | 3 | 3 |
Canada | 1 | 1 | 1 | 1 | 1 | 1 |
Egypt | 1 | 1 | 1 | 1 | 1 | 1 |
England | 11 | 11 | 11 | 11 | 11 | 11 |
France | 3 | 3 | 3 | 3 | 3 | 3 |
Germany | 5 | 5 | 5 | 5 | 5 | 5 |
Italy | 3 | 3 | 3 | 3 | 3 | 3 |
Korea, South | 1 | 1 | 1 | 1 | 1 | 1 |
Morocco | 1 | 1 | 1 | 1 | 1 | 1 |
Netherlands | 2 | 2 | 2 | 2 | 2 | 2 |
Norway | 1 | 1 | 1 | 1 | 1 | 1 |
Portugal | 4 | 4 | 4 | 4 | 4 | 4 |
Scotland | 1 | 1 | 1 | 1 | 1 | 1 |
Senegal | 1 | 1 | 1 | 1 | 1 | 1 |
Serbia | 1 | 1 | 1 | 1 | 1 | 1 |
Slovenia | 1 | 1 | 1 | 1 | 1 | 1 |
Spain | 5 | 5 | 5 | 5 | 5 | 5 |
Uruguay | 1 | 1 | 1 | 1 | 1 | 1 |
In [55]:
# 수치형 데이터 총하 알아보기
g.sum()
Out[55]:
number | age | 시장가치(억) | |
---|---|---|---|
nation | |||
Argentina | 40 | 58 | 2080.0 |
Belgium | 13 | 58 | 2600.0 |
Brazil | 75 | 85 | 3185.0 |
Canada | 28 | 21 | 910.0 |
Egypt | 5 | 29 | 1300.0 |
England | 206 | 251 | 11960.0 |
France | 89 | 73 | 3770.0 |
Germany | 169 | 118 | 4745.0 |
Italy | 126 | 70 | 2600.0 |
Korea, South | 16 | 29 | 1105.0 |
Morocco | 29 | 23 | 910.0 |
Netherlands | 44 | 46 | 2080.0 |
Norway | 2 | 21 | 1950.0 |
Portugal | 99 | 100 | 3965.0 |
Scotland | 48 | 27 | 845.0 |
Senegal | 15 | 29 | 1105.0 |
Serbia | 36 | 26 | 910.0 |
Slovenia | 41 | 28 | 910.0 |
Spain | 149 | 112 | 4680.0 |
Uruguay | 45 | 23 | 845.0 |
In [57]:
# 나라별 시장가치 총합
g['시장가치(억)'].sum()
Out[57]:
nation Argentina 2080.0 Belgium 2600.0 Brazil 3185.0 Canada 910.0 Egypt 1300.0 England 11960.0 France 3770.0 Germany 4745.0 Italy 2600.0 Korea, South 1105.0 Morocco 910.0 Netherlands 2080.0 Norway 1950.0 Portugal 3965.0 Scotland 845.0 Senegal 1105.0 Serbia 910.0 Slovenia 910.0 Spain 4680.0 Uruguay 845.0 Name: 시장가치(억), dtype: float64
퀴즈) 선수들의 몸값의 합이 큰 클럽별로 정렬해서 보여주기¶
In [67]:
club_group = df.groupby('club')
club_group['시장가치(억)'].sum().sort_values(ascending=False)
Out[67]:
club Manchester City 7605.0 Paris Saint-Germain 7150.0 Manchester United 4355.0 Liverpool FC 4225.0 Bayern Munich 3900.0 Chelsea FC 3185.0 FC Barcelona 2990.0 Borussia Dortmund 2860.0 Atlético de Madrid 2860.0 Tottenham Hotspur 2665.0 Inter Milan 1885.0 Juventus FC 1820.0 Real Madrid 1755.0 SS Lazio 910.0 West Ham United 910.0 Real Sociedad 910.0 Bayer 04 Leverkusen 845.0 Arsenal FC 845.0 Sevilla FC 780.0 Name: 시장가치(억), dtype: float64
'빅데이터 스터디' 카테고리의 다른 글
빅데이터 스터디 마지막 챕터 - 미니프로젝트 - 넷플릭스 드라마 순위 크롤링 후 분석 (0) | 2021.11.29 |
---|---|
웹크롤링과 데이터분석 : 전세계 축구 선수 몸값 분석-3 (실전 크롤링) (0) | 2021.11.08 |
웹크롤링과 데이터분석 : 전세계 축구 선수 몸값 분석-2 (웹 크롤링 연습) (0) | 2021.11.08 |
웹크롤링과 데이터분석 : 전세계 축구 선수 몸값 분석-1 (기초 개념) (0) | 2021.11.08 |
프로세스 마이닝을 활용한 고객여정분석 - 4 (0) | 2021.10.28 |