Scraping Web Page Data Automatically with Selenium

Published: 2025-05-30 | Updated: 2025-05-31 | Tags: programming, python, pytest

Repository: https://github.com/XuWink/webapp.git

Setting up the environment

python -m venv venv
.\venv\Scripts\activate
pip install selenium

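Downloading the browser driver

See this blog post: "Selenium+WebDriver 各浏览器驱动下载与使用" (downloading and using the Selenium WebDriver drivers for each browser) by 苏念雨 on 博客园 (cnblogs).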
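As an aside to the manual download: Selenium 4.6 and later bundle Selenium Manager, which can locate or download a matching driver when no driver path is passed to `Service`. The sketch below is not part of the original post. It checks the installed version to suggest which route applies, and the helper names `installed_version` and `has_selenium_manager` are hypothetical.

```python
import re
from importlib import metadata


def installed_version(package="selenium"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_selenium_manager(version):
    """Return True if `version` is 4.6 or newer (assumed cutoff for Selenium Manager)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # unrecognised version format: fall back to the manual route
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 6)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("selenium is not installed in this environment")
    elif has_selenium_manager(version):
        print(f"selenium {version}: the driver can be resolved automatically")
    else:
        print(f"selenium {version}: download the driver manually")
```

With a recent enough version, `webdriver.Edge(options=edge_options)` without a `service` argument is usually sufficient. A manually downloaded driver remains useful on machines without internet access.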

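Scraping a web page table automatically

Target site: 电影票房排行榜|全球电影票房排行榜 (Movie Box Office Rankings | Global Movie Box Office Rankings)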

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service
from selenium.webdriver.edge.options import Options
import time

# Configure EdgeDriver
edge_options = Options()
edge_options.add_argument("--headless")  # headless mode: no browser window is opened
service = Service(r'asserts/edgedriver_win64/msedgedriver.exe')  # replace with the path to your msedgedriver

# Start the browser
driver = webdriver.Edge(service=service, options=edge_options)

try:
    # Open the target site
    url = 'http://www.piaofang.biz/'  # replace with your target URL
    driver.get(url)

    # Wait for the page to finish loading
    time.sleep(5)  # crude fixed wait; in real use, WebDriverWait allows more precise waiting

    # Locate the table element
    table = driver.find_element(By.TAG_NAME, 'table')  # adjust the selector to the actual page

    # Extract the table data
    rows = table.find_elements(By.TAG_NAME, 'tr')
    data = []

    for row in rows:
        cols = row.find_elements(By.TAG_NAME, 'td')
        row_data = [col.text for col in cols]
        data.append(row_data)
finally:
    # Close the browser even if scraping fails
    driver.quit()

# Print the data
for row in data:
    print(row)

https://www.bytecanvas.top/archives/HllkWa2T

Author: 禧语许
