Pandas Data Analysis Basics: Hands-on Practice - 3
2021. 9. 22. 17:16 · Big Data Study
Saving a DataFrame to a File
In [1]:
import pandas as pd
In [3]:
friends = [
{'name':'Jone','age':20,'job':'student'},
{'name':'Jenny','age':30,'job':None},
{'name':'Nate','age':30,'job':'teacher'}
]
df = pd.DataFrame(friends)
df = df[['name','age','job']]
df.head()
Out[3]:
| | name | age | job |
---|---|---|---|
0 | Jone | 20 | student |
1 | Jenny | 30 | None |
2 | Nate | 30 | teacher |
In [6]:
df.to_csv('friends.csv', index=True, header=True)
# index=True and header=True are the defaults
In [8]:
df.to_csv('friends.csv', index=True, header=False)
# save without the header row
In [9]:
df.to_csv('friends.csv', index=False)
# save without the index column
In [11]:
df.to_csv('friends.csv', index=False, header=False, na_rep='-')
# na_rep='-' writes missing values (None/NaN) as '-'
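The calls above differ only in their keyword arguments, and each one overwrites the same `friends.csv`. A minimal sketch for comparing the results without opening the file: when no path is given, `to_csv` returns the CSV text as a string. The variable names (`csv_default`, `csv_no_index`, `csv_bare`) are only illustrative.

```python
import pandas as pd

friends = [
    {'name': 'Jone', 'age': 20, 'job': 'student'},
    {'name': 'Jenny', 'age': 30, 'job': None},
    {'name': 'Nate', 'age': 30, 'job': 'teacher'},
]
df = pd.DataFrame(friends)[['name', 'age', 'job']]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_default = df.to_csv()               # index and header are both written
csv_no_index = df.to_csv(index=False)   # header row only, no index column
csv_bare = df.to_csv(index=False, header=False, na_rep='-')  # data rows only

print(csv_default)
print(csv_no_index)
print(csv_bare)
```

In the default output the first header cell is empty because the index has no name, and Jenny's missing job is written as an empty field unless `na_rep` is set.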
Selecting and Filtering DataFrame Rows and Columns
In [14]:
import pandas as pd
from collections import OrderedDict
In [24]:
friend_ordered_dict =OrderedDict([
('name',['John','Jenny','Nate']),
('age',[20,30,30]),
('job',['student','developer','teacher'])
])
df = pd.DataFrame.from_dict(friend_ordered_dict)
df.head()
Out[24]:
| | name | age | job |
---|---|---|---|
0 | John | 20 | student |
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
In [25]:
# extract only Jenny and Nate (rows at positions 1 and 2; the end, 3, is excluded)
df[1:3]
Out[25]:
| | name | age | job |
---|---|---|---|
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
In [26]:
df = df[1:3]
df  # the slice is now assigned back to df
Out[26]:
| | name | age | job |
---|---|---|---|
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
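A plain integer slice such as `df[1:3]` selects rows by position. The sketch below, which rebuilds the same three-row frame, compares it with the explicit `iloc` and `loc` indexers; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jenny', 'Nate'],
    'age': [20, 30, 30],
    'job': ['student', 'developer', 'teacher'],
})

by_slice = df[1:3]      # positional slice: rows at positions 1 and 2
by_iloc = df.iloc[1:3]  # same rows through the position-based indexer
by_loc = df.loc[1:2]    # label-based slice: the end label 2 is included

print(by_slice.equals(by_iloc))
print(by_slice.equals(by_loc))
```

The three results match here only because the default index labels happen to equal the row positions. Once the index is something else, `iloc` and `loc` make the intent explicit.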
In [28]:
friend_ordered_dict =OrderedDict([
('name',['John','Jenny','Nate']),
('age',[20,30,30]),
('job',['student','developer','teacher'])
])
df = pd.DataFrame.from_dict(friend_ordered_dict)
Selecting only John and Nate with the loc indexer
In [30]:
df.loc[[0,2]]
Out[30]:
| | name | age | job |
---|---|---|---|
0 | John | 20 | student |
2 | Nate | 30 | teacher |
In [32]:
df = df.loc[[0,2]]
df
Out[32]:
| | name | age | job |
---|---|---|---|
0 | John | 20 | student |
2 | Nate | 30 | teacher |
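As the output shows, rows picked with `loc` keep their original index labels (0 and 2). A small sketch of two follow-ups that are often useful, assuming the same three-row frame: renumbering the index with `reset_index(drop=True)`, and passing a second list to `loc` to pick columns at the same time.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jenny', 'Nate'],
    'age': [20, 30, 30],
    'job': ['student', 'developer', 'teacher'],
})

picked = df.loc[[0, 2]]                        # rows labelled 0 and 2
print(picked.index.tolist())                   # original labels are kept

renumbered = picked.reset_index(drop=True)     # relabel rows as 0, 1
print(renumbered.index.tolist())

names_jobs = df.loc[[0, 2], ['name', 'job']]   # row labels and column names together
print(names_jobs)
```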
By column condition
In [34]:
friend_ordered_dict =OrderedDict([
('name',['John','Jenny','Nate']),
('age',[20,30,30]),
('job',['student','developer','teacher'])
])
df = pd.DataFrame.from_dict(friend_ordered_dict)
In [36]:
df[df.age > 25]
Out[36]:
| | name | age | job |
---|---|---|---|
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
In [39]:
df.query('age > 25')  # same result as df[df.age > 25]
Out[39]:
| | name | age | job |
---|---|---|---|
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
In [43]:
df[(df.age > 25) & (df.name == 'Nate')]  # older than 25 and named Nate
Out[43]:
| | name | age | job |
---|---|---|---|
2 | Nate | 30 | teacher |
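Conditions can be combined in other ways besides `&`. The sketch below, on the same three-row frame, shows `|` (or), `~` (not), `isin`, and the `query` equivalent of the combined filter above. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons. Bracket access (`df['age']`) is used here instead of attribute access as a matter of preference; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jenny', 'Nate'],
    'age': [20, 30, 30],
    'job': ['student', 'developer', 'teacher'],
})

# OR: older than 25, or a student
older_or_student = df[(df['age'] > 25) | (df['job'] == 'student')]

# NOT: everyone who is not a teacher
not_teacher = df[~(df['job'] == 'teacher')]

# Membership test against a list of values
john_or_nate = df[df['name'].isin(['John', 'Nate'])]

# query() version of: (age > 25) & (name == 'Nate')
nate_over_25 = df.query("age > 25 and name == 'Nate'")

print(older_or_student, not_teacher, john_or_nate, nate_over_25, sep='\n\n')
```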
Filtering Columns
By index
In [52]:
friend_list = [
['John',20,'student'],
['Jenny',30,'developer'],
['Nate',30,'teacher']
]
df = pd.DataFrame.from_records(friend_list)
df
Out[52]:
| | 0 | 1 | 2 |
---|---|---|---|
0 | John | 20 | student |
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
In [47]:
df.iloc[:, 0:2]  # all rows, columns 0 and 1 (the end of the slice, 2, is excluded)
Out[47]:
| | 0 | 1 |
---|---|---|
0 | John | 20 |
1 | Jenny | 30 |
2 | Nate | 30 |
In [54]:
df.iloc[0:2,0:2]
Out[54]:
| | 0 | 1 |
---|---|---|
0 | John | 20 |
1 | Jenny | 30 |
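`iloc` takes positions for rows first and columns second, and accepts more than slices. A brief sketch on the same unnamed-column frame, showing a negative position, explicit position lists, and a single cell; the variable names are only illustrative.

```python
import pandas as pd

friend_list = [
    ['John', 20, 'student'],
    ['Jenny', 30, 'developer'],
    ['Nate', 30, 'teacher'],
]
df = pd.DataFrame.from_records(friend_list)

last_col = df.iloc[:, -1]           # all rows, last column (returned as a Series)
corners = df.iloc[[0, 2], [0, 2]]   # rows 0 and 2, columns 0 and 2
cell = df.iloc[1, 0]                # single value at row 1, column 0

print(last_col)
print(corners)
print(cell)
```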
By column name
In [57]:
df = pd.read_csv('data/friend_list_no_head.csv', header=None, names=['name', 'age', 'job'])
df
Out[57]:
| | name | age | job |
---|---|---|---|
0 | John | 20 | student |
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
3 | Julia | 40 | dentist |
4 | Brian | 45 | manager |
5 | Chris | 25 | intern |
6 | BoBo | 6 | Dog |
7 | Sol | 1 | Dog |
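The file `data/friend_list_no_head.csv` is not included in this post. To follow along without it, the sketch below assumes the file holds exactly the eight rows shown in the output above, with no header line, and reads the same text from an in-memory buffer instead. The `csv_text` variable is a stand-in for the file, not the original data file.

```python
import io

import pandas as pd

# Assumed contents of data/friend_list_no_head.csv, reconstructed from the output above.
csv_text = """John,20,student
Jenny,30,developer
Nate,30,teacher
Julia,40,dentist
Brian,45,manager
Chris,25,intern
BoBo,6,Dog
Sol,1,Dog
"""

# header=None: the first line is data, not column names.
# names=[...]: supply the column names ourselves.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=['name', 'age', 'job'])
print(df)
```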
In [60]:
# when the job column is not needed, filter it out by selecting the other columns
df_filtered = df[['name','age']]
In [62]:
df_filtered
Out[62]:
| | name | age |
---|---|---|
0 | John | 20 |
1 | Jenny | 30 |
2 | Nate | 30 |
3 | Julia | 40 |
4 | Brian | 45 |
5 | Chris | 25 |
6 | BoBo | 6 |
7 | Sol | 1 |
In [64]:
df.filter(items=['age', 'job'])  # keep only the age and job columns
Out[64]:
| | age | job |
---|---|---|
0 | 20 | student |
1 | 30 | developer |
2 | 30 | teacher |
3 | 40 | dentist |
4 | 45 | manager |
5 | 25 | intern |
6 | 6 | Dog |
7 | 1 | Dog |
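Selecting by name can also be written as dropping the unwanted column. The sketch below, on a shortened three-row frame, compares `df[[...]]`, `drop(columns=...)`, and `filter(items=...)`. The column name `salary` is hypothetical and used only to show how `filter` treats a name that does not exist.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jenny', 'Nate'],
    'age': [20, 30, 30],
    'job': ['student', 'developer', 'teacher'],
})

kept = df[['name', 'age']]           # list the columns to keep
dropped = df.drop(columns=['job'])   # or list the columns to remove
print(kept.equals(dropped))

# filter(items=...) silently skips names that are not columns,
# whereas df[['age', 'salary']] would raise a KeyError.
lenient = df.filter(items=['age', 'salary'])
print(lenient.columns.tolist())

# None of these calls modify df itself.
print(df.columns.tolist())
```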
In [65]:
df
Out[65]:
| | name | age | job |
---|---|---|---|
0 | John | 20 | student |
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
3 | Julia | 40 | dentist |
4 | Brian | 45 | manager |
5 | Chris | 25 | intern |
6 | BoBo | 6 | Dog |
7 | Sol | 1 | Dog |
In [67]:
# keep only the columns whose name contains 'a'
df.filter(like='a', axis=1)
Out[67]:
| | name | age |
---|---|---|
0 | John | 20 |
1 | Jenny | 30 |
2 | Nate | 30 |
3 | Julia | 40 |
4 | Brian | 45 |
5 | Chris | 25 |
6 | BoBo | 6 |
7 | Sol | 1 |
In [68]:
df
Out[68]:
| | name | age | job |
---|---|---|---|
0 | John | 20 | student |
1 | Jenny | 30 | developer |
2 | Nate | 30 | teacher |
3 | Julia | 40 | dentist |
4 | Brian | 45 | manager |
5 | Chris | 25 | intern |
6 | BoBo | 6 | Dog |
7 | Sol | 1 | Dog |
In [72]:
# using a regex (regular expression)
# keep only the columns whose name ends with 'b'
df.filter(regex='b$', axis=1)
Out[72]:
| | job |
---|---|
0 | student |
1 | developer |
2 | teacher |
3 | dentist |
4 | manager |
5 | intern |
6 | Dog |
7 | Dog |
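The `regex` argument is matched against each column name with a search, so anchors such as `^` and `$` decide where the pattern must sit. A few more patterns on a shortened three-row frame, as an illustrative sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jenny', 'Nate'],
    'age': [20, 30, 30],
    'job': ['student', 'developer', 'teacher'],
})

ends_with_b = df.filter(regex='b$', axis=1)      # name ends with 'b'
starts_with_a = df.filter(regex='^a', axis=1)    # name starts with 'a'
starts_n_or_j = df.filter(regex='^[nj]', axis=1) # name starts with 'n' or 'j'
contains_a = df.filter(like='a', axis=1)         # plain substring match, no regex

for result in (ends_with_b, starts_with_a, starts_n_or_j, contains_a):
    print(result.columns.tolist())
```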