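Here is a practical crawling tool that rarely gets mentioned: tornado, a Python web framework that ships with its own HTTP client.
On Ubuntu, install it with pip install tornado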
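The code below downloads a page's source; the sample output further down was captured from Baidu's home page.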
# encoding=utf-8
import tornado.httpclient


def getpage(url):
    http_header = {'User-Agent': 'Chrome'}
    http_request = tornado.httpclient.HTTPRequest(
        url=url, method='GET', headers=http_header,
        connect_timeout=20, request_timeout=600)
    http_client = tornado.httpclient.HTTPClient()
    # Single-argument print(...) calls run on both Python 2 and Python 3
    print('Start downloading data...')
    http_response = http_client.fetch(http_request)
    print('Finish downloading data')
    # Print the status code
    print(http_response.code)
    # Print every response header as a (name, value) pair
    all_fields = http_response.headers.get_all()
    for x in all_fields:
        print(x)
    # Print the page source (a bytes object on Python 3)
    print(http_response.body)
    # Release the client's resources once the response has been read
    http_client.close()


if __name__ == '__main__':
    getpage('https://bigwayseo.com')
Here is the printed output:
Start downloading data...
Finish downloading data
200
('X-Consumed-Content-Encoding', 'gzip')
('Bduserid', '0')
('Bdqid', '0x8619bea400001fa6')
('X-Powered-By', 'HPHP')
('Transfer-Encoding', 'chunked')
('Set-Cookie', 'BAIDUID=6074595641285942B3B28F52889C30CC:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com')
('Set-Cookie', 'BIDUPSID=6074595641285942B3B28F52889C30CC; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com')
('Set-Cookie', 'PSTM=1453696013; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com')
('Set-Cookie', 'BDSVRTM=0; path=/')
('Set-Cookie', 'BD_HOME=0; path=/')
('Set-Cookie', 'H_PS_PSSID=18286_1434_13549_18879_17949_18205_18964_18778_17000_18782_17072_15444_12239; path=/; domain=.baidu.com')
('Set-Cookie', '__bsi=12997306107329212412_00_11_N_N_2_0303_C02F_N_N_N_0; expires=Mon, 25-Jan-16 04:26:58 GMT; domain=www.baidu.com; path=/')
('Expires', 'Mon, 25 Jan 2016 04:26:30 GMT')
('Vary', 'Accept-Encoding')
('Server', 'bfe/1.0.8.13')
('Connection', 'close')
('Cxy_all', 'baidu+1d7ede856c7c96380845666fcd8157ce')
('Cache-Control', 'private')
('Date', 'Mon, 25 Jan 2016 04:26:53 GMT')
('P3p', 'CP=" OTI DSP COR IVA OUR IND COM "')
('Content-Type', 'text/html; charset=utf-8')
('Bdpagetype', '1')
('X-Ua-Compatible', 'IE=Edge,chrome=1')
<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="百度搜索" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>百度一下,你就知道</title>
<-------- only part of the source is shown ------>
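In the output above, Set-Cookie shows up several times, because get_all() yields one (name, value) pair per header line. If you want the repeated values together, here is a minimal sketch that groups the pairs by name. The helper name group_headers is my own invention for illustration, not part of tornado, and the sample pairs are copied from the output above.

```python
from collections import defaultdict


def group_headers(pairs):
    # Hypothetical helper: collect (name, value) pairs into
    # {name: [value, ...]}, keeping values in the order they arrived.
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


# A few pairs copied from the printed output above.
sample = [
    ('Set-Cookie', 'BDSVRTM=0; path=/'),
    ('Set-Cookie', 'BD_HOME=0; path=/'),
    ('Server', 'bfe/1.0.8.13'),
]
headers = group_headers(sample)
print(len(headers['Set-Cookie']))  # 2
print(headers['Server'])           # ['bfe/1.0.8.13']
```

In the real script you would probably pass http_response.headers.get_all() instead of the sample list. tornado's header object also offers get_list(name) if you only need the values for one header.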
In my use, tornado is noticeably faster than urllib; run your own test to see the difference!
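If you want to run that test yourself, here is one rough way to time the two, written for Python 3. This is only a sketch: the function names (fetch_with_urllib, fetch_with_tornado, average_seconds, compare) are hypothetical helpers I made up, the 20-second timeout is an arbitrary choice, and wall-clock timings over a network vary a lot from run to run, so treat the numbers as a hint rather than a benchmark.

```python
import time
import urllib.request


def fetch_with_urllib(url):
    # Standard-library baseline: download the body and return it as bytes.
    request = urllib.request.Request(url, headers={'User-Agent': 'Chrome'})
    with urllib.request.urlopen(request, timeout=20) as response:
        return response.read()


def fetch_with_tornado(url):
    # Imported lazily so the timing helper also works without tornado installed.
    import tornado.httpclient
    client = tornado.httpclient.HTTPClient()
    try:
        response = client.fetch(url, headers={'User-Agent': 'Chrome'},
                                request_timeout=20)
        return response.body
    finally:
        client.close()


def average_seconds(fetch, url, repeats=5):
    # Call fetch(url) `repeats` times and return the mean wall-clock time.
    start = time.perf_counter()
    for _ in range(repeats):
        fetch(url)
    return (time.perf_counter() - start) / repeats


def compare(url, repeats=5):
    # Return {'urllib': seconds, 'tornado': seconds} for the given URL.
    return {
        'urllib': average_seconds(fetch_with_urllib, url, repeats),
        'tornado': average_seconds(fetch_with_tornado, url, repeats),
    }
```

For example, print(compare('https://bigwayseo.com')) runs both clients against the same URL used in the script above and prints the average seconds per request for each.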
Official documentation: https://www.tornadoweb.org/en/stable/