HTML Text Parsing
Tool: BeautifulSoup
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
Installation
pip install beautifulsoup4
Parser:
pip install lxml
Import
from bs4 import BeautifulSoup
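The lxml parser installed above is optional: BeautifulSoup can also use Python's built-in "html.parser". The following is a minimal sketch of falling back to the built-in parser when lxml is not available. The helper name make_soup is hypothetical and is not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse html with lxml if installed, else with the built-in parser.

    make_soup is a hypothetical helper name used only for this sketch.
    """
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is missing
        return BeautifulSoup(html, "html.parser")


soup = make_soup("<p class='note'>hello</p>")
print(soup.p.get_text())
```

The two parsers can build slightly different trees for malformed HTML, so it is probably best to stick to one parser within a project.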
Simple example
import requests
from bs4 import BeautifulSoup
url = "http://www.shixiaolei.com/posts/1/"
r = requests.get(url)
r.text
# Parse the HTML text into a BeautifulSoup object
soup = BeautifulSoup(r.text, 'lxml')
soup
# CSS selector: every <a> inside an element with class "title"
data = soup.select(".title a")
for d in data:
    print(d.get_text())
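To try the same CSS selector without a network request, the sketch below runs it against a small inline HTML string. The markup, link texts, and paths are made up for illustration and are not taken from the site above. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<div class="title"><a href="/posts/1/">First post</a></div>
<div class="title"><a href="/posts/2/">Second post</a></div>
<div class="footer"><a href="/about/">About</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of all matching tags
links = soup.select(".title a")
titles = [a.get_text(strip=True) for a in links]
hrefs = [a.get("href") for a in links]

# select_one() returns only the first match, or None if nothing matches
first = soup.select_one(".title a")

print(titles)
print(hrefs)
```

Only the links inside elements with class "title" are matched. The footer link is skipped because its parent has a different class.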