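I scraped 200,000+ Bilibili danmaku and learned how to become a Bilibili veteran
This article contains 3,420 characters and 27 charts and screenshots
Suggested reading time: 10 minutes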
Preface
Danmaku analysis
Outdoors section - 华农兄弟 (Huanong Brothers)
![](https://filescdn.proginn.com/6c37676db398b3acd294bce73bbb6110/229d2f01e0fd9cb48537a4d432758f6b.webp)
![](https://filescdn.proginn.com/39b970348caf1eb63bb4c910827fd49f/2d331265078c62d6b879197d6c89ecd1.webp)
![](https://filescdn.proginn.com/71116c5be219a283bbdfccec6009651e/47721384265f5ebf1836e3e0ccbb5fd4.webp)
![](https://filescdn.proginn.com/550ade06a782f927f15a5259ae9ad35d/f784c1be75c7177b1333a4058b159548.webp)
Knowledge section - 罗翔 (Luo Xiang)
![](https://filescdn.proginn.com/ae551e68b52704e237ec7fdef3e6b002/610f7113bda1547d4649d139a0fcabad.webp)
![](https://filescdn.proginn.com/0ee9f187f2909538e1ff8dab6c13bd09/149f7f2df5356604dfcd00c4442a8839.webp)
![](https://filescdn.proginn.com/7d7c42caded31e66a64b3411b7e0703b/5595660a4b7923470727b1cc581ecfe8.webp)
![](https://filescdn.proginn.com/98c08694cf596104b9fc1f838ea79105/753f8e343e11a3a33ba4159622871a81.webp)
![](https://filescdn.proginn.com/fcfa525a3cf30e84995cff5129ca8abc/8d9ff8750d691acb4a07e3dcf9f32d17.webp)
![](https://filescdn.proginn.com/3b4c22d3bc7de2f805136f781b4098e5/139e68f373de58c7ac0154cd6048dd47.webp)
![](https://filescdn.proginn.com/7f857751453878a0ce7e4089d3275aa2/9e177b83590909275ac16471eb43e8bc.webp)
Lifestyle section - 手工耿 (Handy Geng)
![](https://filescdn.proginn.com/d62e434dbe05c62344df48c7d9ccbbca/7cb48a7036d9f350e02ef08409bb6665.webp)
![](https://filescdn.proginn.com/54959a753be78d376359aeabbe318149/a7b3e2519d3da4c96eef478ad9c752ef.webp)
Food section - 我是郭杰瑞 (Guo Jierui)
![](https://filescdn.proginn.com/33ba935f2eb9344563f3227ecfc58156/ec15df184927c96c5cddb06f1f90bb9c.webp)
![](https://filescdn.proginn.com/951a162ef23f575fe73e6fa259e01d99/48aa0ace87bc7471c791d8d6ae902fe4.webp)
![](https://filescdn.proginn.com/5e80b9744fb6b2f2cad634af69d35647/551d6c022c40a708ad3e6e320183ad39.webp)
Kichiku (remix) section
![](https://filescdn.proginn.com/1cae0e5053e39515bba432527c4f13e5/ffafc0a35f0e1aaf5eb8967ad5ddd432.webp)
![](https://filescdn.proginn.com/d0b63ffbb86729fe9b34ef8b24d35b79/7441405546e6fcb2889daa7dd4233b6d.webp)
Technical breakdown
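Fetching the data with `requests`. Taking a Huanong Brothers video as the example, first open the video whose danmaku you want to collect, then press F12 and open the Network tab:
![](https://filescdn.proginn.com/77ab68ab0e146e4066e2b71f592e1081/14e27d39ae059b7ea14d085a3f158e43.webp)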
![](https://filescdn.proginn.com/30d862c36bfac5392d88af4c315559c6/5d36a8f3bb1307b99a5c96f51e839832.webp)
![](https://filescdn.proginn.com/1c00145691e95d83f0fcaa71a6bf7c13/c099f39edacb7b1c00807c1292cc96cd.webp)
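In the Request URL, the key is the two parameters `oid` and `date`. `date` is simply the date and needs no explanation. As for `oid`, I'm not sure what it stands for, but many of the packets in the list carry one:
![](https://filescdn.proginn.com/0483ef5a239c271fd78317add5b4ba9b/861dbc2a3d655a36ce28b40cc79d1e9a.webp)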
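Instead of reading `oid` and `date` off the Request URL by eye, you can pull them out programmatically. The following is a minimal standard-library sketch. The function name `extract_params` and the `oid` value in the example URL are hypothetical placeholders, and the URL shape is assumed to match the one used in `get_url` below.

```python
from urllib.parse import urlparse, parse_qs

def extract_params(request_url: str) -> dict:
    """Pull the oid and date query parameters out of a Request URL copied from the Network tab."""
    query = parse_qs(urlparse(request_url).query)
    # parse_qs maps each key to a list of values; keep the first one
    return {key: query[key][0] for key in ("oid", "date") if key in query}

# Hypothetical example URL: the oid value is made up for illustration
example = "https://api.bilibili.com/x/v2/dm/history?type=1&oid=123456&date=2020-08-01"
print(extract_params(example))  # {'oid': '123456', 'date': '2020-08-01'}
```

Parameters missing from the URL are simply left out of the result, so a request without an `oid` is easy to spot.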
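```python
import pandas as pd

def get_url(oid, start, end):
    '''
    Build the history-danmaku request URL for each date in a range
    oid: the video's oid
    start, end: start and end dates (both inclusive)
    '''
    url_list = []
    # pd.date_range yields every day from start to end; format each as YYYY-MM-DD
    date_list = list(pd.date_range(start, end).strftime('%Y-%m-%d'))
    for date in date_list:
        url = f"https://api.bilibili.com/x/v2/dm/history?type=1&oid={oid}&date={date}"
        url_list.append(url)
    return url_list
```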
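`pd.date_range(start, end)` includes both endpoints, and a range that crosses a month boundary simply rolls over to the next month. If pandas isn't otherwise needed, you can build the same list of URLs with the standard library. The sketch below swaps `datetime` in for the pandas version; `build_history_urls` is a hypothetical name, and the endpoint string is copied from `get_url`.

```python
from datetime import date, timedelta

# Endpoint copied from get_url above
HISTORY_API = "https://api.bilibili.com/x/v2/dm/history?type=1&oid={oid}&date={day}"

def build_history_urls(oid, start: str, end: str) -> list:
    """Return one history-danmaku URL per day from start to end, both inclusive."""
    day = date.fromisoformat(start)
    last = date.fromisoformat(end)
    urls = []
    while day <= last:
        # date.isoformat() gives the same 'YYYY-MM-DD' string as strftime('%Y-%m-%d')
        urls.append(HISTORY_API.format(oid=oid, day=day.isoformat()))
        day += timedelta(days=1)
    return urls

urls = build_history_urls(123456, "2020-08-30", "2020-09-02")
print(len(urls))  # 4
print(urls[-1])   # https://api.bilibili.com/x/v2/dm/history?type=1&oid=123456&date=2020-09-02
```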
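The `date_range` function in `pandas` is very handy, and interested readers can look it up. We now have the danmaku URLs for the chosen dates. Next, use `requests` to fetch each one, parse the data with `bs4`, and finally write the results to a TXT file:

```python
import time

import requests
from bs4 import BeautifulSoup
from tqdm import trange

def get_danmu(url_list, name):
    '''
    Download the danmaku and save them to a local txt file
    '''
    headers = {"cookie": "replace with your own cookie",
               "origin": "https://www.bilibili.com",
               "referer": "https://www.bilibili.com/video/BV1gW411b735",
               "sec-fetch-dest": "empty",
               "sec-fetch-mode": "cors",
               "sec-fetch-site": "same-site",
               "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"}
    # Write as UTF-8 so the Chinese text is saved correctly on every platform
    with open(f"{name}.txt", 'w', encoding='utf-8') as file:
        for i in trange(len(url_list)):  # trange shows a progress bar
            url = url_list[i]
            res = requests.get(url, headers=headers)
            res.encoding = 'utf-8'  # the reported encoding is ISO-8859-1; without this the text is garbled
            soup = BeautifulSoup(res.text, 'html.parser')
            data = soup.find_all("d")  # each danmaku is a <d> element
            danmu = [d.text for d in data]
            for items in danmu:
                file.write(items)
                file.write("\n")
            time.sleep(2)  # pause between requests so the API isn't hammered
```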
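Build the request headers from the `cookie` and the other parameters, then request the data in a loop. The only thing to watch out for is that the response's encoding is reported as ISO-8859-1, so set `res.encoding = 'utf-8'` first. Otherwise the text comes out garbled. I also used `tqdm` to add a progress bar.
![](https://filescdn.proginn.com/a4540216b276915cabcb83ab3a89032f/4e0ffe71e96ea9d9eec14b170b5a44fe.webp)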
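The next sketch shows why the encoding step matters and what `find_all("d")` extracts. It uses only the standard library, with `xml.etree.ElementTree` in place of `bs4`. It assumes the response body is XML with one `<d>` element per danmaku. The sample body (including its `p` attribute values) and the helper `parse_danmu` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, assuming one <d> element per danmaku (values made up)
sample_xml = '<i><chatid>1</chatid><d p="1.0,1,25,16777215">前方高能</d><d p="2.5,1,25,16777215">哈哈哈</d></i>'
raw = sample_xml.encode("utf-8")  # what arrives over the wire: UTF-8 bytes

# Decoding as ISO-8859-1 turns each UTF-8 byte into its own Latin-1 character (mojibake)
garbled = raw.decode("iso-8859-1")
# Decoding as UTF-8, which is what res.encoding = 'utf-8' makes res.text do, restores the text
fixed = raw.decode("utf-8")

def parse_danmu(xml_text: str) -> list:
    """Collect the text of every <d> element, similar to soup.find_all('d')."""
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]

print(garbled == sample_xml)  # False
print(parse_danmu(fixed))     # ['前方高能', '哈哈哈']
```

ISO-8859-1 maps every byte to exactly one character, so already-garbled text can still be recovered with `garbled.encode("iso-8859-1").decode("utf-8")`. Setting the encoding before reading `res.text` avoids that extra step.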