In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
In [2]:
comments = pd.read_csv("./Datas/top-american-universities-on-reddit-comments.csv")
In [3]:
posts = pd.read_csv("./Datas/top-american-universities-on-reddit-posts.csv")
In [4]:
posts.head(5)
Out[4]:
| | type | id | subreddit.id | subreddit.name | subreddit.nsfw | created_utc | permalink | domain | url | selftext | title | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | post | sxx8xn | 2sbc1 | brownu | False | 1645459938 | https://old.reddit.com/r/BrownU/comments/sxx8x... | self.brownu | NaN | [removed] | Transfer As A Junior or Wait For Grad School? | 1 |
1 | post | sxqx1s | 2sbc1 | brownu | False | 1645441249 | https://old.reddit.com/r/BrownU/comments/sxqx1... | self.brownu | NaN | \n\n[View Poll](https://www.reddit.com/poll/sx... | Who would you rather date? | 1 |
2 | post | sx0wgj | 2sbc1 | brownu | False | 1645362476 | https://old.reddit.com/r/BrownU/comments/sx0wg... | self.brownu | NaN | I don't know if this counts as an admission qu... | I'm slowly fading away... | 6 |
3 | post | swma7o | 2sbc1 | brownu | False | 1645311432 | https://old.reddit.com/r/BrownU/comments/swma7... | self.brownu | NaN | [removed] | Favorite class | 1 |
4 | post | sw0t10 | 2sbc1 | brownu | False | 1645242904 | https://old.reddit.com/r/BrownU/comments/sw0t1... | self.brownu | NaN | I stg its so dead, there is no sense of commun... | Yall we need to revive this subreddit | 46 |
In [5]:
comments.head()
Out[5]:
| | type | id | subreddit.id | subreddit.name | subreddit.nsfw | created_utc | permalink | body | sentiment | score |
---|---|---|---|---|---|---|---|---|---|---|
0 | comment | hxu14uh | 2sbc1 | brownu | False | 1645453230 | https://old.reddit.com/r/BrownU/comments/sx0wg... | Will do that surely! Thanks!! | 0.7701 | 1 |
1 | comment | hxtwdi1 | 2sbc1 | brownu | False | 1645450748 | https://old.reddit.com/r/BrownU/comments/sx0wg... | It looks like a very small and intimate progra... | 0.8932 | 1 |
2 | comment | hxtrd2v | 2sbc1 | brownu | False | 1645447789 | https://old.reddit.com/r/BrownU/comments/sw0t1... | I will revive this subreddit | 0.3400 | 1 |
3 | comment | hxnwarx | 2sbc1 | brownu | False | 1645329323 | https://old.reddit.com/r/BrownU/comments/sg1co... | Be a chad, take both | 0.0000 | 1 |
4 | comment | hxmxrob | 2sbc1 | brownu | False | 1645312358 | https://old.reddit.com/r/BrownU/comments/swma7... | I’ll go first. This semester I’m taking cs030... | 0.9100 | 1 |
In [7]:
comments["subreddit.name"].value_counts()
Out[7]:
upenn        90142
stanford     66114
harvard      42043
mit          39059
duke         28972
brownu       21405
yale         20709
dartmouth    13656
princeton    10162
caltech       5969
Name: subreddit.name, dtype: int64
In [40]:
# Number of comments per subreddit, stored in a one-column DataFrame
df = comments["subreddit.name"].value_counts().to_frame("frequency")
In [54]:
df.index
Out[54]:
Index(['upenn', 'stanford', 'harvard', 'mit', 'duke', 'brownu', 'yale', 'dartmouth', 'princeton', 'caltech'], dtype='object')
In [74]:
plt.bar(df.index, df.frequency, width = 0.5, color='g')
plt.title("Best University by reddit comments 2021")
plt.xticks(rotation=30)
plt.show()
University ranking as seen from Reddit comments
In [80]:
# Number of posts per subreddit, stored in a one-column DataFrame
df2 = posts["subreddit.name"].value_counts().to_frame("frequency")
df2
Out[80]:
| | frequency |
---|---|
upenn | 19514 |
stanford | 15814 |
harvard | 9541 |
mit | 8446 |
duke | 6628 |
yale | 5042 |
brownu | 4823 |
princeton | 3665 |
dartmouth | 3100 |
caltech | 1540 |
In [84]:
plt.bar(df2.index, df2.frequency, color='maroon')
plt.title("Best University by reddit posts 2021")
plt.xticks(rotation = 30)
plt.show()
In [147]:
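# Draw comments first, then overlay the (smaller) post counts on the same axes
plt.bar(df.index, df.frequency, width = 0.5, color='lightblue', label='comments')
plt.bar(df2.index, df2.frequency, width = 0.5, color='blue', label='posts')
plt.ylim(0,100000)
plt.title("Best University by reddit comments and posts 2021")
plt.xticks(rotation=30)
plt.legend()  # tell the two bar series apart
plt.show()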
Comparing the comment and post rankings
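The overlaid bars make it hard to see where the two rankings actually differ. A minimal sketch of one way to check this: rank each subreddit by both counts and keep only the rows where the ranks disagree. The counts are copied from the outputs above; the names `comment_counts`, `post_counts`, `ranks` and `rank_shift` are made up for this example and are not part of the original notebook.

```python
import pandas as pd

# Counts copied from the value_counts() outputs shown earlier in the post.
comment_counts = pd.Series({
    "upenn": 90142, "stanford": 66114, "harvard": 42043, "mit": 39059,
    "duke": 28972, "brownu": 21405, "yale": 20709, "dartmouth": 13656,
    "princeton": 10162, "caltech": 5969,
})
post_counts = pd.Series({
    "upenn": 19514, "stanford": 15814, "harvard": 9541, "mit": 8446,
    "duke": 6628, "yale": 5042, "brownu": 4823, "princeton": 3665,
    "dartmouth": 3100, "caltech": 1540,
})

# Rank 1 = most comments / most posts.
ranks = pd.DataFrame({
    "comment_rank": comment_counts.rank(ascending=False).astype(int),
    "post_rank": post_counts.rank(ascending=False).astype(int),
})

# Negative shift: ranked higher by comments than by posts.
ranks["rank_shift"] = ranks["comment_rank"] - ranks["post_rank"]

# Keep only the subreddits whose position differs between the two rankings.
changed = ranks[ranks["rank_shift"] != 0]
print(changed)
```

With these numbers only two neighbouring pairs swap places (brownu/yale and dartmouth/princeton); the top five are identical in both rankings.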
In [148]:
df3 = df2+df
df3
Out[148]:
| | frequency |
---|---|
brownu | 26228 |
caltech | 7509 |
dartmouth | 16756 |
duke | 35600 |
harvard | 51584 |
mit | 47505 |
princeton | 13827 |
stanford | 81928 |
upenn | 109656 |
yale | 25751 |
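The sum comes out in alphabetical order because `df2 + df` matches rows by index label rather than by position, and the two frames list the subreddits in different orders. A small sketch of that behaviour, using three counts copied from the outputs above; the frame names and the `example_sub` row are hypothetical and only there to show what happens when a label is missing on one side.

```python
import pandas as pd

# Three counts copied from the outputs above, deliberately stored
# in different row orders.
comments_freq = pd.DataFrame({"frequency": [90142, 21405, 20709]},
                             index=["upenn", "brownu", "yale"])
posts_freq = pd.DataFrame({"frequency": [5042, 19514, 4823]},
                          index=["yale", "upenn", "brownu"])

# `+` aligns rows on the index label, so each subreddit gets its own sum.
total = posts_freq + comments_freq
print(total.sort_values(by="frequency", ascending=False))

# Hypothetical case: a subreddit that appears in only one frame.
extra_row = pd.DataFrame({"frequency": [10]}, index=["example_sub"])
posts_extra = pd.concat([posts_freq, extra_row])

# Plain `+` leaves NaN for the unmatched label ...
total_nan = posts_extra + comments_freq
# ... while add(..., fill_value=0) treats the missing side as zero.
total_filled = posts_extra.add(comments_freq, fill_value=0)
print(total_filled)
```

In this dataset both frames contain the same ten subreddits, so the plain `+` is enough; `fill_value=0` would only matter if one file lacked a subreddit.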
In [151]:
df3 = df3.sort_values(by="frequency", ascending=False)
In [152]:
plt.bar(df3.index, df3.frequency, color="hotpink")
plt.title("Sum of reddit comments and posts by university")
plt.xticks(rotation=30)
plt.show()
University ranking by combined comments and posts