Batch-Collecting Aizhan Keyword Search Volumes with Python

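This post shows how to batch-collect keyword search volumes from Aizhan with Python, along with some simple keyword mining. All of the data sits in the page's HTML source, so it can be matched with regular expressions; if you are not familiar with them, read up on them first. The code below includes the complete regular expressions. Almost any scraping job where the data is in the page source follows roughly these steps: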

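Request the URL and get the HTML source.

Fill in the variable part of the URL by looping over a local file with a for loop, so that many keywords are queried in one batch.

Extract the important information with regular expressions or XPath.

Save the data to Excel, a database, and so on.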
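
The four steps above can be sketched as a small Python 3 pipeline. This is only a minimal sketch under stated assumptions, not the script used in this post: `build_url`, `extract_rows`, `collect`, `fake_fetch`, the placeholder URL template, and the toy row pattern are all hypothetical names chosen for illustration. The fetch function is passed in as an argument so the flow can be tried without touching the network.

```python
import csv
import io
import re
from urllib.parse import quote

# Hypothetical URL template; the real site URL would go here.
URL_TEMPLATE = "https://example.com/{}/"
# Toy pattern for illustration: one (keyword, volume) pair per table row.
ROW_RE = re.compile(r"<td>(.*?)</td>\s*<td>(\d+)</td>")


def build_url(word):
    """Step 2: substitute one keyword into the URL, percent-encoding it."""
    return URL_TEMPLATE.format(quote(word))


def extract_rows(html):
    """Step 3: pull (keyword, volume) tuples out of the HTML with a regex."""
    return ROW_RE.findall(html)


def collect(words, fetch, out):
    """Steps 1-4: fetch each keyword page, extract rows, write them as CSV."""
    writer = csv.writer(out)
    writer.writerow(["keyword", "search_volume"])
    for word in words:
        word = word.strip()
        if not word:
            continue  # ignore blank lines in the keyword list
        html = fetch(build_url(word))  # step 1: request the URL, get the HTML
        writer.writerows(extract_rows(html))


def fake_fetch(url):
    """Stand-in for a real HTTP request so the sketch runs offline."""
    return "<tr><td>python seo</td> <td>120</td></tr>"


buffer = io.StringIO()
collect(["python\n", "\n"], fake_fetch, buffer)
print(buffer.getvalue())
```

In real use, `fetch` could wrap `urllib.request.urlopen(url).read().decode("utf-8")`, and `out` could be a file opened with `open("ciku.csv", "a", newline="", encoding="utf-8")`.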

Source code for the part that requires logging in to Aizhan is provided as well.

(Screenshot: the output printed when the script is run in Sublime Text.)

(Screenshot: the exported CSV file.)

Python source for collecting Aizhan keywords with their search volumes:

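# -*- coding: utf-8 -*-
# Python 2 script: read seed keywords from word.txt, query ci.aizhan.com,
# and append "keyword,search volume" rows to ciku.csv.
import re
import urllib

# One result row: the keyword link text (group 1) and its search volume (group 2).
row_re = re.compile(r'<td class="blue t_l"><a href="https://www\.baidu\.com/baidu.*?" target="_blank" rel="nofollow">(.*?)</a></td>[\s\S]*?<td align="right">(\d+)</td>')
# Highlight tags that wrap the seed word inside each keyword.
font_re = re.compile('<font color="#FF0000">|</font>')

op_csv_write = open('ciku.csv', 'a')
op_csv_write.write('关键词,搜索量\n')  # CSV header: keyword, search volume
for keyword in open('word.txt'):
    word = keyword.strip()
    if not word:
        continue  # skip blank lines
    # Percent-encode the keyword so non-ASCII words form a valid URL.
    url = 'https://ci.aizhan.com/%s/' % urllib.quote(word)
    html = urllib.urlopen(url).read()
    # The page shows this message ("no related keywords") when nothing is found.
    if '没有相关的关键词' in html:
        continue
    for i in re.findall(row_re, html):
        b = font_re.sub('', ','.join(i))  # "keyword,volume" without the highlight tags
        print b
        op_csv_write.write(b + '\n')
op_csv_write.close()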
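
To see what the two regular expressions in the script do, here is a minimal Python 3 sketch that applies them to a single table row. The `SAMPLE` fragment is hand-written to fit the pattern and is an assumption, not real output from the site; `parse_rows` is a hypothetical helper name.

```python
import re

# The same two patterns as in the script above.
ROW_RE = re.compile(
    r'<td class="blue t_l"><a href="https://www\.baidu\.com/baidu.*?" '
    r'target="_blank" rel="nofollow">(.*?)</a></td>[\s\S]*?'
    r'<td align="right">(\d+)</td>'
)
FONT_RE = re.compile('<font color="#FF0000">|</font>')

# Hypothetical fragment, hand-written to match the pattern; not real site output.
SAMPLE = (
    '<tr><td class="blue t_l"><a href="https://www.baidu.com/baidu?word=x" '
    'target="_blank" rel="nofollow"><font color="#FF0000">seo</font> tools</a></td>\n'
    '<td align="right">320</td></tr>'
)


def parse_rows(html):
    """Return (keyword, volume) pairs with the highlight tags stripped."""
    return [(FONT_RE.sub("", kw), int(vol)) for kw, vol in ROW_RE.findall(html)]


print(parse_rows(SAMPLE))  # [('seo tools', 320)]
```

The first group captures the link text, including the red `<font>` tags around the seed word, which is why the second pattern is needed to clean each keyword before it is written out.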

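Also, with enough seed words you can collect a great many keywords. Then there is pagination: Aizhan only returns data for further pages once you are logged in, so simulating a login lets you page through the results and collect more keyword data. Here is the code:
Logging in to Aizhan with a POST request in Python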

# -*- coding: utf-8 -*-
import urllib
import urllib2
import cookielib
import re

hosturl = 'https://www.aizhan.com/'
posturl = 'https://www.aizhan.com/login.php'

# Keep cookies in a cookie jar so the login session carries over to later requests
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)

# Request the home page first; any cookies it sets are stored in cj
h = urllib2.urlopen(hosturl)

headers = {
    "Host": "www.aizhan.com",
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "Accept-Charset": "GBK,utf-8;q=0.7,*;q=0.3"
}

# Replace the placeholders with your own username and password
postData = {"email": "your-username", "password": "your-password"}
postData = urllib.urlencode(postData)

# Send the POST request with the form data and headers built above
request = urllib2.Request(posturl, postData, headers)
response = urllib2.urlopen(request)
text = response.read()

# Fetch the pagination links to test whether the login succeeded;
# when not logged in, only "2" is returned
url = "https://baidurank.aizhan.com/baidu/anjuke.com/position/"
req = re.compile(r'<a href="/baidu/anjuke\.com/[0-9]+/position/">(.*?)</a>')
html = urllib2.urlopen(url).read()
page_data = re.findall(req, html)
print page_data
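
For readers on Python 3, where `urllib2` and `cookielib` no longer exist, the same idea can be sketched with `urllib.request` and `http.cookiejar`. This is a hedged sketch rather than a tested login: the URL and the `email`/`password` field names are taken from the script above, while `make_opener`, `build_login_request`, `page_numbers`, the placeholder credentials, and the sample pagination fragment are hypothetical. The request is only built here, not sent. Unlike the script above, the pattern captures the page number from the link's `href` rather than the link text.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

LOGIN_URL = "https://www.aizhan.com/login.php"  # taken from the script above


def make_opener():
    """Build an opener whose cookie jar keeps the session across requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(email, password):
    """Build (but do not send) the form-encoded POST login request."""
    data = urllib.parse.urlencode({"email": email, "password": password})
    return urllib.request.Request(
        LOGIN_URL,
        data=data.encode("utf-8"),  # supplying a body makes this a POST
        headers={"User-Agent": "Mozilla/5.0"},
    )


# \d+ matches any page number; the dot in the domain is escaped.
PAGE_RE = re.compile(r'<a href="/baidu/anjuke\.com/(\d+)/position/">')


def page_numbers(html):
    """Return the page numbers linked from a results page."""
    return [int(n) for n in PAGE_RE.findall(html)]


opener, jar = make_opener()
request = build_login_request("user@example.com", "secret")  # placeholder credentials
print(request.get_method(), request.full_url)
# To actually log in: opener.open(request), then opener.open(results_url).read()

# Hypothetical pagination fragment, as a logged-in session might see it.
sample = (
    '<a href="/baidu/anjuke.com/2/position/">2</a>'
    '<a href="/baidu/anjuke.com/3/position/">3</a>'
)
print(page_numbers(sample))  # [2, 3]
```

Because every request goes through the same `opener`, cookies set by the login response are sent automatically with the later page requests.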

With that, we have collected Aizhan's keyword data without much effort. More on Python and SEO to come…

5 thoughts on “Batch-Collecting Aizhan Keyword Search Volumes with Python”

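Anonymous

March 28, 2018 at 4:49 PM

Python + Selenium: I want to batch-search keywords on Google and then match results out of the pages. How would I do that with Selenium? I am stuck in a few places and it is doing my head in.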
import re
import time
from selenium import webdriver
from time import sleep
from bs4 import BeautifulSoup
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors"])
browser = webdriver.Chrome(chrome_options=options)
browser.get('https://www.google.co.th/')
time.sleep(3)
search_box = browser.find_element_by_name('q') # get the search box, looked up in the DOM by name
search_box.send_keys('美国华人旅行社邮箱') # type the query ("email addresses of Chinese travel agencies in the US") into the search box
search_box.submit() # have Chrome press the submit button
time.sleep(5) # wait 5 seconds
html = browser.page_source
soup = BeautifulSoup(html,"html.parser")
print(soup.find_all(re.compile("[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")))
print(soup.find_all(text="r'([\w-]+@[\w-]+\.[\w-]+?[\.\w-]*)'"))
browser.quit() # close chromedriver (closes the browser)
The output is
[]
[]
Two empty lists.
Could you add me on QQ and give me some pointers: 203451260
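王露

January 6, 2017 at 7:50 PM

Hello. Suppose I pick 10 keywords of my own and want to get the Baidu search volume for each of them. I am a beginner and don't really know how; could you give me some guidance?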
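- Bigway
  
  January 9, 2017 at 10:27 AM
  
  I don't know how strong your programming skills are, so I would suggest simply looking the keywords up directly on platforms such as Aizhan, 5118, 牛佬工具, or Baidu Index.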
Anonymous

October 8, 2015 at 10:47 AM

Nice one!
毛驴哥

October 7, 2015 at 10:38 AM

That's exactly how it is. Python really is handy!