Simulating a Login to Aizhan with Python's urllib2

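Collecting data sometimes requires a simulated login. For example, when gathering keywords from the Baidu paid-search backend, or when an e-commerce platform offers no way to export keywords, search volume and similar data, a small Python script can collect and export everything quickly and in bulk. The urllib2 library bundled with Python 2 can submit form data to a web page and log in that way. A urllib2 login script looks a bit long, but this kind of syntax simply has to be memorized; when the need comes up you can copy it and adapt it. I am recording it here for my own reference: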
Simulated login in Python

#-*-coding:utf-8-*-
import urllib  
import urllib2  
import cookielib  
import re  
 
hosturl = 'https://www.aizhan.com/'    
posturl = 'https://www.aizhan.com/login.php' 
 
# Keep cookies in a cookie jar so the login session carries over to later requests
cj = cookielib.LWPCookieJar()  
cookie_support = urllib2.HTTPCookieProcessor(cj)  
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)  
urllib2.install_opener(opener)  
 
# Visit the home page first so the server can set its initial cookies
h = urllib2.urlopen(hosturl)

headers = {
"Host":"www.aizhan.com",
"Connection":"keep-alive",
"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
"Content-Type":"application/x-www-form-urlencoded",
"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
# "Accept-Encoding" is left out on purpose: urllib2 does not decompress gzip responses automatically
"Accept-Language":"zh-CN,zh;q=0.8",
"Accept-Charset":"GBK,utf-8;q=0.7,*;q=0.3"
}  

postData = {"email":"your_username","password":"your_password"}
 
postData = urllib.urlencode(postData)  
 
# Build the request with the encoded form data and headers, then send it
request = urllib2.Request(posturl, postData, headers)  
response = urllib2.urlopen(request)  
text = response.read()  
 
# Scrape the pagination links to test whether the login succeeded;
# when not logged in, only "2" is returned
url = "https://baidurank.aizhan.com/baidu/anjuke.com/position/"
req = re.compile(r'<a href="/baidu/anjuke\.com/[0-9]+/position/">(.*?)</a>')
html = urllib2.urlopen(url).read()
page_data = re.findall(req,html)
print page_data
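
The pagination pattern can be tried out offline before pointing the script at the live site. The snippet below is only a sketch: `SAMPLE_HTML` is a made-up fragment that imitates the link markup the script expects, and `extract_page_labels` is a hypothetical helper name; the real page layout may differ. Note that a character class such as `[0-99]` matches exactly the same characters as `[0-9]`, so the plainer form is used here.

```python
import re

# Hypothetical fragment imitating the pagination links the script looks for.
# The markup on the real site may be different.
SAMPLE_HTML = (
    '<a href="/baidu/anjuke.com/2/position/">2</a>'
    '<a href="/baidu/anjuke.com/3/position/">3</a>'
    '<a href="/baidu/anjuke.com/10/position/">10</a>'
)

# Same idea as the pattern in the script: one or more digits as the page
# number, with the dot in the domain escaped so it matches a literal ".".
PAGE_LINK = re.compile(r'<a href="/baidu/anjuke\.com/[0-9]+/position/">(.*?)</a>')


def extract_page_labels(html):
    """Return the visible text of every pagination link found in html."""
    return PAGE_LINK.findall(html)


print(extract_page_labels(SAMPLE_HTML))
```

With this sample input the helper returns the three link labels; a result containing only "2" on the live page would suggest the login did not take effect.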

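The login script itself was lifted straight from GOGO闯 (blog: https://www.kaopuseo.com/), since I was too lazy to type it out myself. The syntax is mostly something you memorize by rote, and there is no deep logic behind it. Just copy and adapt as needed; the main point of this post is that I can come back to the blog and look it up when I forget.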
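
For anyone on Python 3, where `urllib2` and `cookielib` were folded into `urllib.request` and `http.cookiejar`, the same flow might look roughly like the sketch below. It is not a tested login for the live site: the URLs and the `email`/`password` field names are simply carried over from the script above and may no longer match the site, and the function names are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Taken from the original script; whether they are still valid is an assumption.
HOST_URL = "https://www.aizhan.com/"
POST_URL = "https://www.aizhan.com/login.php"


def build_opener_with_cookies():
    """Create an opener that stores cookies in memory between requests."""
    jar = http.cookiejar.LWPCookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(email, password):
    """Build the POST request carrying the url-encoded login form."""
    # Field names follow the original script (assumption about the form).
    body = urllib.parse.urlencode({"email": email, "password": password})
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # In Python 3 the request body must be bytes, not str.
    return urllib.request.Request(POST_URL, data=body.encode("utf-8"), headers=headers)


def login(email, password):
    """Visit the home page for initial cookies, then submit the login form."""
    opener, _jar = build_opener_with_cookies()
    opener.open(HOST_URL).close()
    with opener.open(build_login_request(email, password)) as response:
        response.read()
    # Reuse the returned opener for later requests so the session cookies apply.
    return opener
```

Calling `login("your_username", "your_password")` performs the network requests; the returned opener can then fetch the pagination page with `opener.open(url).read().decode("utf-8")`, assuming the page is served as UTF-8.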
