Batch-fetching Taobao related searches and drop-down suggestion keywords with Python

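First, what is Taobao SEO? Taobao SEO means optimizing keyword rankings within Taobao's own search. Taobao ranks results on three main factors: text relevance, commercial factors, and user preference. (An amateur summary, so go easy on me.)
1. Text relevance: at the very least, the keyword should appear in the title.
2. Commercial factors: ad placements, Zhitongche (Taobao's pay-per-click ads), and so on.
3. User preference: sales volume, the review system, Wangwang (Taobao's chat tool), and so on.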
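In fact, Baidu SEO can draw keywords from Taobao too, because the same customers may well search on Baidu. Collect a larger set of root words (long-tail terms are fine as well) and harvest keywords from Taobao or other e-commerce platforms.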
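Several root words often return overlapping suggestions, so the harvested list usually needs deduplicating before you reuse it. A minimal sketch, assuming one keyword per line in UTF-8, which is how the script below appends to sword.txt; `dedupe_keywords` and `clean_file` are hypothetical helper names:

```python
def dedupe_keywords(lines):
    """Return keywords in first-seen order, dropping blanks and duplicates."""
    seen = set()
    result = []
    for line in lines:
        kw = line.strip()
        if kw and kw not in seen:
            seen.add(kw)
            result.append(kw)
    return result


def clean_file(path='sword.txt'):
    # Hypothetical helper: rewrite the output file in place, deduplicated.
    with open(path, encoding='utf-8') as f:
        keywords = dedupe_keywords(f)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(''.join(kw + '\n' for kw in keywords))
    return keywords
```

Keeping first-seen order means the suggestions for your earliest root words stay at the top of the file.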
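My script below is fairly simple, though this time it again uses a bit of object-oriented programming. The approach is much the same as others you will find: capture the traffic to locate Taobao's API endpoints, send requests with modules such as requests and urllib, and extract the keywords from the data that comes back.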
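The extraction step can be sketched offline as two small parsers: one unwraps the JSONP returned by the suggest endpoint and reads the first element of each `result` entry (as the script's `n[0]` does), the other applies the script's related-search regex to raw page bytes. The function names and the sample responses in the usage note are made up for illustration; the real response format may differ or change.

```python
import json
import re


def parse_suggest(raw):
    """Extract suggestion keywords from a JSONP response body (bytes)."""
    # Greedy: from the first '(' to the LAST ')', so parentheses inside
    # keywords do not truncate the JSON payload.
    m = re.search(rb'\((.*)\)', raw, re.S)
    if not m:
        return []
    data = json.loads(m.group(1))  # json.loads accepts bytes (Python 3.6+)
    # Assumed shape: each entry is a list whose first element is the keyword.
    return [entry[0] for entry in data.get('result', [])]


# The related-search pattern used by the script, applied to page bytes.
RELATED_RE = re.compile(rb'"text":"(.+?)","isHighlight":false,"href":"/search\?q.+?"}')


def parse_related(page):
    """Return the raw byte strings captured by RELATED_RE."""
    return RELATED_RE.findall(page)
```

For example, `parse_suggest(b'jsonp3339({"result":[["a(b)","1"]]})')` returns `['a(b)']`. A non-greedy `\((.+?)\)` would stop at the first `)` inside that keyword and hand invalid JSON to `json.loads`.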
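A note on the chardet and multiprocessing modules: chardet guesses the character encoding of a byte string, and multiprocessing runs work in separate processes. multiprocessing ships with Python; chardet and requests are third-party, so install them with "pip install <package name>". For anything else, just Baidu it!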
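For the specific job the script gives chardet (deciding between UTF-8 and GBK), a standard-library alternative is to try a strict UTF-8 decode first and fall back to GBK. This swaps chardet's statistical guess for a deterministic decode attempt, which can be more predictable on very short inputs such as single keywords. It is a sketch, not what the original script does, and `to_text` is a hypothetical name:

```python
def to_text(raw):
    """Decode bytes as UTF-8, falling back to GBK, then to replacement chars."""
    try:
        # Strict UTF-8 usually rejects GBK byte sequences, so try it first.
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode('gbk')
    except UnicodeDecodeError:
        # Neither encoding fits: keep the crawl going instead of crashing.
        return raw.decode('utf-8', errors='replace')
```

If you prefer to keep chardet, `chardet.detect(raw)['encoding']` gives its best guess, which you can check before decoding.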
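The program is rather long-winded; I mainly wrote it for practice.
Below is the Taobao SEO query tool from the Douyu (篼雨) SEO blog, long-winded as promised, which batch-fetches related searches and drop-down keywords: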

# encoding=utf-8
# Python 3 version. Third-party packages: pip install requests chardet
import json
import multiprocessing
import re
import urllib.parse

import chardet
import requests

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
}


class TaobaoSpider(multiprocessing.Process):
    def __init__(self, infile='word.txt', outfile='sword.txt'):
        # The original defined __int__ (typo), so this constructor never ran.
        super().__init__()
        self.infile = infile
        self.outfile = outfile

    def run(self):
        # Process.start() calls run() in the child process.
        self.words()

    def words(self):
        # word.txt: one seed keyword per line; results are appended to sword.txt.
        with open(self.infile, encoding='utf-8') as f, \
                open(self.outfile, 'a', encoding='utf-8') as out:
            self.op_txt = out
            for line in f:
                keyword = line.strip()
                if keyword:
                    self.taobao(keyword)
                    self.staobao(keyword)

    def taobao(self, keyword):
        # Drop-down suggestions. The suggest API answers with JSONP: jsonp3339({...})
        url = ('https://suggest.taobao.com/sug?code=utf-8&q=%s'
               '&_ksTS=1388978237516_3338&callback=jsonp3339&k=1&area=c2c&bucketid=11'
               % urllib.parse.quote_plus(keyword))
        html = requests.get(url, timeout=10).content
        # Greedy match from the first '(' to the last ')', so a ')' inside a
        # keyword cannot cut the JSON short.
        m = re.search(rb'\((.*)\)', html, re.S)
        if not m:
            return
        for n in json.loads(m.group(1)).get('result', []):
            self.op_txt.write('%s\n' % n[0])
            print(n[0])

    def staobao(self, keyword):
        # Related searches embedded in the search results page.
        url = 'https://s.taobao.com/search?q=' + urllib.parse.quote(keyword)
        # An ordinary page load, so GET rather than POST.
        page = requests.get(url, headers=HEADERS, timeout=10).content
        c = re.compile(rb'"text":"(.+?)","isHighlight":false,"href":"/search\?q.+?"}')
        for i in c.findall(page):
            # chardet guesses the encoding of each match; anything not detected
            # as UTF-8 is treated as GBK, as in the original script.
            bianma = chardet.detect(i)['encoding']
            if bianma and bianma.lower() == 'utf-8':
                data = i.decode('utf-8')
            else:
                data = i.decode('gbk', errors='replace')
            self.op_txt.write('%s\n' % data)
            print(data)


if __name__ == '__main__':
    p = TaobaoSpider()
    # start() runs run() -> words() in a child process. The original called
    # words() in the parent and then start() on a process with nothing to do.
    p.start()
    p.join()
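The script uses a single Process subclass, so one process handles every keyword in turn. To crawl seeds in parallel, one option is `multiprocessing.Pool`. This is a different structure from the `Process` subclass above, and `fetch_keywords` and `crawl_all` are hypothetical names, with the network calls replaced by a placeholder so the sketch runs offline:

```python
from multiprocessing import Pool


def fetch_keywords(keyword):
    """Hypothetical placeholder for the per-keyword work.

    In the script this would be the suggest-API and search-page requests;
    here it only echoes the keyword so the sketch runs offline.
    """
    return [keyword]


def crawl_all(keywords, processes=4):
    """Fan the seed keywords out over a process pool and flatten the results."""
    with Pool(processes=processes) as pool:
        # map() blocks until all work is done and preserves input order.
        per_keyword = pool.map(fetch_keywords, keywords)
    return [kw for batch in per_keyword for kw in batch]
```

Call `crawl_all()` from inside an `if __name__ == '__main__':` guard, for example with the seeds read from word.txt; the guard is required where workers are started with spawn (the default on Windows and macOS). Returning results to the parent and writing sword.txt there avoids several processes appending to the same file at once.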
