In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
In [46]:
insta = pd.read_csv("./Datas/instagram_global_top_1000.csv")
In [3]:
insta.head(5)
Out[3]:
| | Country | Rank | Account | Title | Link | Category | Followers | Audience Country | Authentic engagement | Engagement avg | Scraped |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | All | 1 | cristiano | Cristiano Ronaldo | https://www.instagram.com/cristiano/ | Sports with a ball | 400100000.0 | India | 7800000.0 | 9500000.0 | 2022-02-07 16:50:24.798803 |
1 | All | 2 | kyliejenner | Kylie 🤍 | https://www.instagram.com/kyliejenner/ | Fashion\|Modeling\|Beauty | 308800000.0 | United States | 6200000.0 | 10100000.0 | 2022-02-07 16:50:24.798803 |
2 | All | 3 | leomessi | Leo Messi | https://www.instagram.com/leomessi/ | Sports with a ball\|Family | 306300000.0 | Argentina | 4800000.0 | 6500000.0 | 2022-02-07 16:50:24.798803 |
3 | All | 4 | kendalljenner | Kendall | https://www.instagram.com/kendalljenner/ | Modeling\|Fashion | 217800000.0 | United States | 3400000.0 | 5400000.0 | 2022-02-07 16:50:24.798803 |
4 | All | 5 | selenagomez | Selena Gomez | https://www.instagram.com/selenagomez/ | Music\|Lifestyle | 295800000.0 | United States | 2700000.0 | 3600000.0 | 2022-02-07 16:50:24.798803 |
In [51]:
insta.count()
Out[51]:
Country 1000
Rank 1000
Account 1000
Title 983
Link 1000
Category 909
Followers 1000
Audience Country 993
Authentic engagement 1000
Engagement avg 1000
Scraped 1000
dtype: int64
Top 5 categories by total followers: which category is the most popular?
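The count() output shows gaps in Title (983), Category (909) and Audience Country (993). A minimal sketch of listing the missing values directly; `sample` is a three-row stand-in built from rows shown later in this post (nozeworld and yooncy1 have no Category). On the full `insta` frame, `insta.isna().sum()` should report 17, 91 and 7 for those three columns, i.e. 1000 minus each count.

```python
import numpy as np
import pandas as pd

# Stand-in for `insta`: three rows copied from the tables in this post.
# Category is NaN for nozeworld and yooncy1, as in the sorted output.
sample = pd.DataFrame({
    "Account": ["cristiano", "nozeworld", "yooncy1"],
    "Category": ["Sports with a ball", np.nan, np.nan],
    "Followers": [400100000.0, 3200000.0, 2800000.0],
})

# isna().sum() counts missing values per column directly,
# instead of reading them off as "row total minus count()".
missing = sample.isna().sum()
print(missing[missing > 0])  # only the columns that actually have gaps
```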
In [13]:
# Select Followers before summing so only the numeric column is aggregated
df = insta.groupby("Category")["Followers"].sum().sort_values(ascending=False)
df = df.head(5)
In [19]:
plt.bar(df.index, df.values)
plt.xticks(rotation=30)
plt.title("Top 5 Categories sorted by sum of total followers")
plt.show()
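The groupby above treats every distinct Category string as its own group, so "Music" and "Music|Lifestyle" are separate groups and a multi-tag account never adds to each of its individual tags. A minimal sketch of splitting the tags first, shown on the five rows from insta.head(5) (on the real data, replace `sample` with `insta`); the result on the full dataset is not shown here and may differ from the chart above.

```python
import pandas as pd

# Five rows copied from insta.head(5) above.
sample = pd.DataFrame({
    "Account": ["cristiano", "kyliejenner", "leomessi", "kendalljenner", "selenagomez"],
    "Category": ["Sports with a ball", "Fashion|Modeling|Beauty",
                 "Sports with a ball|Family", "Modeling|Fashion", "Music|Lifestyle"],
    "Followers": [400100000.0, 308800000.0, 306300000.0, 217800000.0, 295800000.0],
})

# Split "A|B|C" into a list, then give each tag its own row,
# so an account counts toward every category it is tagged with.
per_category = (
    sample.assign(Category=sample["Category"].str.split("|"))
          .explode("Category")
          .groupby("Category")["Followers"]
          .sum()
          .sort_values(ascending=False)
)
print(per_category.head(5))
```

Because an account now contributes its followers to every tag it carries, the per-category totals add up to more than the overall follower total.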
Which country's audience follows these Instagram accounts the most?
In [24]:
insta["Audience Country"].value_counts()
Out[24]:
United States 283
Brazil 161
India 143
Indonesia 130
Mexico 50
Spain 34
Russia 30
Argentina 24
Iran 17
United Kingdom 17
Turkey 16
Italy 15
South Korea 13
Colombia 9
Philippines 7
France 6
Egypt 6
Germany 5
Nigeria 4
Thailand 4
Iraq 4
Morocco 3
Japan 2
Saudi Arabia 2
Kazakhstan 2
Syria 1
China 1
Algeria 1
United Arab Emirates 1
Poland 1
Chile 1
Name: Audience Country, dtype: int64
The number of top-1000 accounts whose main audience is in each country already gives a rough picture. To be more certain, let's rank the countries by total follower count instead.
In [33]:
# Select Followers before summing so only the numeric column is aggregated
df2 = insta.groupby("Audience Country")["Followers"].sum().sort_values(ascending=False)
df2 = df2.head(5)
In [34]:
plt.bar(df2.index, df2.values)
plt.xticks(rotation=30)
plt.title("Top 5 influential Countries")
plt.show()
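Account counts (value_counts) and follower totals answer slightly different questions, so it can help to see them side by side. A minimal sketch using pandas named aggregation, again on the five rows from insta.head(5) as a stand-in for `insta`:

```python
import pandas as pd

# Five rows copied from insta.head(5) above.
sample = pd.DataFrame({
    "Account": ["cristiano", "kyliejenner", "leomessi", "kendalljenner", "selenagomez"],
    "Audience Country": ["India", "United States", "Argentina", "United States", "United States"],
    "Followers": [400100000.0, 308800000.0, 306300000.0, 217800000.0, 295800000.0],
})

# One row per country: how many accounts, and how many followers in total.
per_country = (
    sample.groupby("Audience Country")["Followers"]
          .agg(accounts="count", total_followers="sum")
          .sort_values("total_followers", ascending=False)
)
print(per_country.head(5))
```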
Conclusion: the most-followed category on Instagram is music, i.e. singers, and that influence shows up most strongly in the United States.
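The two charts rank categories and countries separately, so they do not directly show where music accounts in particular are followed. One way to check that link is to filter to accounts tagged "Music" and then group by audience country. A sketch on six rows copied from the tables in this post (a stand-in for `insta`):

```python
import pandas as pd

# Rows copied from the tables in this post.
sample = pd.DataFrame({
    "Account": ["cristiano", "selenagomez", "wilbursoot", "taeyeon_ss", "hyunah_aa", "hyeri_0609"],
    "Category": ["Sports with a ball", "Music|Lifestyle", "Music", "Music", "Music", "Music"],
    "Audience Country": ["India", "United States", "United States",
                         "South Korea", "South Korea", "South Korea"],
    "Followers": [400100000.0, 295800000.0, 3200000.0, 17800000.0, 17200000.0, 6900000.0],
})

# Keep accounts tagged Music anywhere in the pipe-separated Category;
# na=False treats a missing Category as "not Music".
is_music = sample["Category"].str.contains("Music", na=False, regex=False)
music_by_country = (
    sample[is_music]
    .groupby("Audience Country")["Followers"]
    .sum()
    .sort_values(ascending=False)
)
print(music_by_country)
```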
In [45]:
df3 = insta.sort_values(by="Followers", ascending=False)
df3
Out[45]:
| | Country | Rank | Account | Title | Link | Category | Followers | Audience Country | Authentic engagement | Engagement avg | Scraped |
---|---|---|---|---|---|---|---|---|---|---|---|
28 | All | 29 | | | https://www.instagram.com/instagram/ | Photography | 469600000.0 | India | 490400.0 | 608100.0 | 2022-02-07 16:50:24.798803 |
0 | All | 1 | cristiano | Cristiano Ronaldo | https://www.instagram.com/cristiano/ | Sports with a ball | 400100000.0 | India | 7800000.0 | 9500000.0 | 2022-02-07 16:50:24.798803 |
1 | All | 2 | kyliejenner | Kylie 🤍 | https://www.instagram.com/kyliejenner/ | Fashion\|Modeling\|Beauty | 308800000.0 | United States | 6200000.0 | 10100000.0 | 2022-02-07 16:50:24.798803 |
2 | All | 3 | leomessi | Leo Messi | https://www.instagram.com/leomessi/ | Sports with a ball\|Family | 306300000.0 | Argentina | 4800000.0 | 6500000.0 | 2022-02-07 16:50:24.798803 |
4 | All | 5 | selenagomez | Selena Gomez | https://www.instagram.com/selenagomez/ | Music\|Lifestyle | 295800000.0 | United States | 2700000.0 | 3600000.0 | 2022-02-07 16:50:24.798803 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
836 | All | 837 | sound_of_coups | COUPS | https://www.instagram.com/sound_of_coups/ | NaN | 3200000.0 | Indonesia | 727300.0 | 932900.0 | 2022-02-07 16:50:24.798803 |
971 | All | 972 | nozeworld | no:ze \| 노제 | https://www.instagram.com/nozeworld/ | NaN | 3200000.0 | South Korea | 637000.0 | 779700.0 | 2022-02-07 16:50:24.798803 |
835 | All | 836 | wilbursoot | Wilbur Soot | https://www.instagram.com/wilbursoot/ | Music | 3200000.0 | United States | 733800.0 | 942600.0 | 2022-02-07 16:50:24.798803 |
808 | All | 809 | for_everyoung10 | 장원영 WONYOUNG | https://www.instagram.com/for_everyoung10/ | NaN | 3100000.0 | Indonesia | 801100.0 | 1000000.0 | 2022-02-07 16:50:24.798803 |
747 | All | 748 | yooncy1 | 윤찬영 Yoon chanyoung | https://www.instagram.com/yooncy1/ | NaN | 2800000.0 | South Korea | 1100000.0 | 1300000.0 | 2022-02-07 16:50:24.798803 |
1000 rows × 11 columns
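Browsing the list by follower count, I noticed that Korean Instagram stars appear in it too. To see where they rank, I attached a follower-based rank to the full dataset and then picked out the accounts whose main audience is in South Korea.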
In [52]:
# df3 is sorted by Followers (descending), so row position equals follower rank
df3["Follower_Rank"] = range(1, len(df3) + 1)
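Assigning `range(...)` only works because df3 was sorted beforehand; on an unsorted frame it would silently give wrong ranks. A sketch of an order-independent alternative with `Series.rank`, on the five rows from insta.head(5) left in their original order:

```python
import pandas as pd

# Five rows from insta.head(5), deliberately left unsorted.
sample = pd.DataFrame({
    "Account": ["cristiano", "kyliejenner", "leomessi", "kendalljenner", "selenagomez"],
    "Followers": [400100000.0, 308800000.0, 306300000.0, 217800000.0, 295800000.0],
})

# rank() ranks by value, so the frame does not need to be sorted first.
# method="first" breaks ties by order of appearance, giving unique ranks 1..n.
sample["Follower_Rank"] = (
    sample["Followers"].rank(ascending=False, method="first").astype(int)
)
print(sample.sort_values("Follower_Rank"))
```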
In [58]:
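# .copy() makes df4 an independent DataFrame, which avoids the SettingWithCopyWarning later on
df4 = df3[df3["Audience Country"] == "South Korea"].copy()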
df4
Out[58]:
| | Country | Rank | Account | Title | Link | Category | Followers | Audience Country | Authentic engagement | Engagement avg | Scraped | Follower_Rank |
---|---|---|---|---|---|---|---|---|---|---|---|---|
58 | All | 59 | dlwlrma | 이지금 IU | https://www.instagram.com/dlwlrma/ | Art\|Artists | 25000000.0 | South Korea | 2700000.0 | 3200000.0 | 2022-02-07 16:50:24.798803 | 258 |
90 | All | 91 | hoooooyeony | Hoyeon | https://www.instagram.com/hoooooyeony/ | Lifestyle | 23500000.0 | South Korea | 1700000.0 | 2200000.0 | 2022-02-07 16:50:24.798803 | 276 |
269 | All | 270 | taeyeon_ss | TaeYeon | https://www.instagram.com/taeyeon_ss/ | Music | 17800000.0 | South Korea | 648700.0 | 789600.0 | 2022-02-07 16:50:24.798803 | 403 |
246 | All | 247 | hyunah_aa | Hyun Ah | https://www.instagram.com/hyunah_aa/ | Music | 17200000.0 | South Korea | 754100.0 | 894100.0 | 2022-02-07 16:50:24.798803 | 413 |
112 | All | 113 | hi_high_hiy | 황인엽 | https://www.instagram.com/hi_high_hiy/ | Cinema\|Actors/actresses | 12300000.0 | South Korea | 2700000.0 | 3200000.0 | 2022-02-07 16:50:24.798803 | 597 |
199 | All | 200 | wi__wi__wi | 위하준 Wi Ha Jun | https://www.instagram.com/wi__wi__wi/ | NaN | 10000000.0 | South Korea | 1600000.0 | 2000000.0 | 2022-02-07 16:50:24.798803 | 689 |
966 | All | 967 | miyayeah | SUNMI | https://www.instagram.com/miyayeah/ | Photography\|Lifestyle | 7900000.0 | South Korea | 267300.0 | 321200.0 | 2022-02-07 16:50:24.798803 | 806 |
421 | All | 422 | leeyoum262 | 이유미 | https://www.instagram.com/leeyoum262/ | Photography\|Humor\|Fun\|Happiness | 7400000.0 | South Korea | 843400.0 | 1000000.0 | 2022-02-07 16:50:24.798803 | 837 |
805 | All | 806 | hyeri_0609 | 혜리 | https://www.instagram.com/hyeri_0609/ | Music | 6900000.0 | South Korea | 381100.0 | 459000.0 | 2022-02-07 16:50:24.798803 | 867 |
956 | All | 957 | masijacoke850714 | 이광수 | https://www.instagram.com/masijacoke850714/ | Cinema\|Actors/actresses | 4700000.0 | South Korea | 443000.0 | 549200.0 | 2022-02-07 16:50:24.798803 | 967 |
680 | All | 681 | dear.zia | 𝑓𝑟𝑒𝑒𝑧𝑖𝑎 .🌼 | https://www.instagram.com/dear.zia/ | Lifestyle | 3700000.0 | South Korea | 956900.0 | 1100000.0 | 2022-02-07 16:50:24.798803 | 987 |
971 | All | 972 | nozeworld | no:ze \| 노제 | https://www.instagram.com/nozeworld/ | NaN | 3200000.0 | South Korea | 637000.0 | 779700.0 | 2022-02-07 16:50:24.798803 | 997 |
747 | All | 748 | yooncy1 | 윤찬영 Yoon chanyoung | https://www.instagram.com/yooncy1/ | NaN | 2800000.0 | South Korea | 1100000.0 | 1300000.0 | 2022-02-07 16:50:24.798803 | 1000 |
In [63]:
plt.bar(df4.Title, df4.Followers)
plt.xticks(rotation = 90)
plt.show()
Plotting this shows that the Korean text is broken (the characters do not render). Let's fix that.
In [81]:
from matplotlib import font_manager, rc
font_path = r"C:\Windows\Fonts\gulim.ttc"  # raw string so the backslashes are not treated as escapes
font = font_manager.FontProperties(fname=font_path).get_name()
rc('font', family=font)
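The hard-coded `C:\Windows\Fonts\gulim.ttc` path (and the Malgun Gothic font used further down) only exists on Windows. A sketch of choosing the first installed Korean font instead; `pick_font` is a hypothetical helper, and the candidate family names are assumptions for Windows (Malgun Gothic), macOS (AppleGothic) and systems with the Nanum fonts installed (NanumGothic).

```python
import matplotlib
from matplotlib import font_manager

def pick_font(candidates, available=None):
    """Return the first family in `candidates` that matplotlib knows about, else None."""
    if available is None:
        # Family names of every font file matplotlib has indexed on this machine
        available = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in available:
            return name
    return None

# Assumed family names, in order of preference.
KOREAN_FONTS = ["Malgun Gothic", "AppleGothic", "NanumGothic"]

font = pick_font(KOREAN_FONTS)
if font is not None:
    matplotlib.rcParams["font.family"] = font
# Draw minus signs as ASCII "-" so they do not depend on the font having U+2212
matplotlib.rcParams["axes.unicode_minus"] = False
print("Using font:", font)
```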
In [83]:
from matplotlib import font_manager, rc
font_path = r"C:\Windows\Fonts\gulim.ttc"  # raw string so the backslashes are not treated as escapes
font = font_manager.FontProperties(fname=font_path).get_name()
rc('font', family=font)
plt.bar(df4.Title, df4.Followers)
plt.xticks(rotation = 90)
plt.show()
After some fiddling it more or less works, but some characters are still broken... Is there another way?
In [84]:
lst = np.array([1,2,3,4,5])
In [88]:
plt.scatter(lst,lst)
plt.title("산점도")  # "scatter plot"
plt.xlabel("변수1")  # "variable 1"
plt.ylabel("변수2")  # "variable 2"
plt.grid(True)
plt.show()
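As in the example I followed (https://bskyvision.com/1133), ordinary Korean text renders correctly. So the problem does not seem to be the font setting itself; rather, some names in the data are written with unusual characters that the font cannot display.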
In [103]:
plt.pie(df4.Followers, labels=df4.Title)
plt.show()
In [96]:
import matplotlib
matplotlib.rcParams["font.family"] = "Malgun Gothic"
matplotlib.rcParams['axes.unicode_minus'] = False
In [104]:
plt.plot(df4.Title, df4.Followers)
plt.show()
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 119891 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 119903 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 119890 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 119911 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 119894 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 119886 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 127804 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 119891 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 119903 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 119890 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 119911 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 119894 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 119886 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\se99a\anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:203: RuntimeWarning: Glyph 127804 missing from current font.
  font.set_text(s, 0, flags=flags)
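The glyph numbers in these warnings are decimal Unicode code points. A quick way to see which characters they are, using Python's standard `unicodedata` module:

```python
import unicodedata

# Code points from the "Glyph N missing from current font" warnings above
missing_glyphs = [119891, 119903, 119890, 119911, 119894, 119886, 127804]

for cp in missing_glyphs:
    ch = chr(cp)
    # unicodedata.name() returns the official Unicode character name
    print(cp, f"U+{cp:X}", ch, unicodedata.name(ch, "<no name>"))
```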
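I tried other approaches as well, without success. The warnings above do identify the culprits, though: code points 119886 to 119911 are Unicode "Mathematical Italic" letters (U+1D44E to U+1D467) and 127804 is the 🌼 emoji (U+1F33C), all from the title "𝑓𝑟𝑒𝑒𝑧𝑖𝑎 .🌼"; the current font (Malgun Gothic) has no glyphs for them. So what if I pull out the names and reset them by hand?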
In [105]:
df4.head()
Out[105]:
| | Country | Rank | Account | Title | Link | Category | Followers | Audience Country | Authentic engagement | Engagement avg | Scraped | Follower_Rank |
---|---|---|---|---|---|---|---|---|---|---|---|---|
58 | All | 59 | dlwlrma | 이지금 IU | https://www.instagram.com/dlwlrma/ | Art\|Artists | 25000000.0 | South Korea | 2700000.0 | 3200000.0 | 2022-02-07 16:50:24.798803 | 258 |
90 | All | 91 | hoooooyeony | Hoyeon | https://www.instagram.com/hoooooyeony/ | Lifestyle | 23500000.0 | South Korea | 1700000.0 | 2200000.0 | 2022-02-07 16:50:24.798803 | 276 |
269 | All | 270 | taeyeon_ss | TaeYeon | https://www.instagram.com/taeyeon_ss/ | Music | 17800000.0 | South Korea | 648700.0 | 789600.0 | 2022-02-07 16:50:24.798803 | 403 |
246 | All | 247 | hyunah_aa | Hyun Ah | https://www.instagram.com/hyunah_aa/ | Music | 17200000.0 | South Korea | 754100.0 | 894100.0 | 2022-02-07 16:50:24.798803 | 413 |
112 | All | 113 | hi_high_hiy | 황인엽 | https://www.instagram.com/hi_high_hiy/ | Cinema\|Actors/actresses | 12300000.0 | South Korea | 2700000.0 | 3200000.0 | 2022-02-07 16:50:24.798803 | 597 |
In [145]:
df5 = df4.Title
df5
Out[145]:
58 이지금 IU
90 Hoyeon
269 TaeYeon
246 Hyun Ah
112 황인엽
199 위하준 Wi Ha Jun
966 SUNMI
421 이유미
805 혜리
956 이광수
680 𝑓𝑟𝑒𝑒𝑧𝑖𝑎 .🌼
971 no:ze | 노제
747 윤찬영 Yoon chanyoung
Name: Title, dtype: object
In [148]:
name_list = df5.to_list()
In [150]:
name_list = ['이지금 IU', 'Hoyeon', 'TaeYeon', 'Hyun Ah', '황인엽', '위하준 Wi Ha Jun', 'SUNMI', '이유미', '혜리', '이광수', 'freezia','no:ze | 노제','윤찬영 Yoon chanyoung']
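Typing the cleaned list by hand works for 13 names, but not for all 1000 rows. A sketch of doing the same cleanup automatically with Unicode NFKC normalization, which folds "styled" letters such as 𝑓 (U+1D453) back to a plain f. `clean_title` is a hypothetical helper, and dropping every character above U+FFFF is a blunt assumption: it removes emoji, but would also remove rare CJK characters outside the Basic Multilingual Plane.

```python
import unicodedata

def clean_title(title):
    """Fold styled letters to plain ones and drop characters above U+FFFF (hypothetical helper)."""
    # NFKC maps compatibility characters like MATHEMATICAL ITALIC SMALL F to "f"
    text = unicodedata.normalize("NFKC", title)
    # Emoji such as U+1F33C have no plain form; drop everything above U+FFFF.
    # Hangul syllables (U+AC00 to U+D7A3) are below that limit and are kept.
    text = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return text.strip()

titles = [
    "이지금 IU",
    "Hoyeon",
    "\U0001D453\U0001D45F\U0001D452\U0001D452\U0001D467\U0001D456\U0001D44E .\U0001F33C",
    "no:ze | 노제",
]
print([clean_title(t) for t in titles])
# On the notebook's data: df4["Title"] = df4["Title"].map(clean_title)
```

Unlike the hand-written list, this keeps the trailing " ." of the dear.zia title.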
In [152]:
df4["Title"] = name_list  # replace the titles with the hand-cleaned names
C:\Users\se99a\anaconda3\lib\site-packages\pandas\core\generic.py:5516: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self[name] = value
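The SettingWithCopyWarning above appears because df4 was created by boolean indexing on df3, so pandas cannot tell whether assigning to df4 is meant to change df3 as well. A minimal sketch of the usual fix, taking an explicit copy, on a hypothetical three-row miniature of df3:

```python
import pandas as pd

# Hypothetical miniature of df3: two Korean-audience rows and one other row
df3 = pd.DataFrame({
    "Title": ["이지금 IU", "Cristiano Ronaldo",
              "\U0001D453\U0001D45F\U0001D452\U0001D452\U0001D467\U0001D456\U0001D44E .\U0001F33C"],
    "Audience Country": ["South Korea", "India", "South Korea"],
})

# .copy() makes df4 its own DataFrame rather than a possible view of df3,
# so the column assignment below is unambiguous and does not warn
df4 = df3[df3["Audience Country"] == "South Korea"].copy()
df4["Title"] = ["이지금 IU", "freezia"]

print(df4)
```

df3 keeps its original titles; only df4 is changed.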
In [202]:
plt.plot(df4.Title, df4.Followers, color='g')
plt.ylabel("팔로워 수 (단위 : 천 만)", fontsize= 15)  # "Followers (unit: 10 million)"
plt.yticks(rotation = 90)
plt.xticks(rotation = 50, fontsize=20)
plt.title("한국 스타들의 인스타 팔로워 수")  # "Instagram followers of Korean stars"
plt.show()
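The y-axis label says the unit is ten million, but the values themselves are not rescaled; matplotlib typically just shows a "1e7" multiplier above the axis. A sketch of making the tick labels match the stated unit with `FuncFormatter`, using a bar chart since the names are separate categories rather than points on a line. `in_ten_millions` is a hypothetical helper, and the four rows are copied from the df4 table above (Latin-script titles so the sketch does not need a Korean font).

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Four rows from the df4 table above
titles = ["Hoyeon", "TaeYeon", "Hyun Ah", "SUNMI"]
followers = [23500000.0, 17800000.0, 17200000.0, 7900000.0]

def in_ten_millions(value, pos):
    """Tick label in units of 10,000,000, e.g. 23,500,000 -> '2.35'."""
    return f"{value / 1e7:g}"

fig, ax = plt.subplots()
# Bars treat each name as its own category; a line would imply an order between them
ax.bar(titles, followers, color="g")
ax.yaxis.set_major_formatter(FuncFormatter(in_ten_millions))
ax.set_ylabel("Followers (unit: 10 million)")
plt.show()
```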