
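Complete guide to the WebScraper crawler for Amazon: scraping product Q&A, scraping a specified range of pages, scraping competitor reviews, scraping the first few pages of products for a search keyword, and scraping Best Sellers lists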

First, make sure the Web Scraper extension is installed in your browser.

Now, straight to the sitemap code for each case.

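1. Scraping product Q&A

There are two approaches: scrape every page, or scrape a specified range of pages.

a. Have Web Scraper click through the pagination and scrape every page. The drawback is that some competitor listings have hundreds of Q&A pages, and there is no need to scrape that many.
Code: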
{"_id":"amz-qa","startUrl":["https://www.amazon.com/ask/que ... ot%3B],"selectors":[{"id":"contents","parentSelectors":["_root","nextpage"],"type":"SelectorElement","selector":".a-section > div.a-spacing-base > div > div.a-col-right","multiple":true,"delay":null},{"id":"question","parentSelectors":["contents"],"type":"SelectorText","selector":".a-spacing-small div.a-col-right","multiple":false,"delay":0,"regex":""},{"id":"answer","parentSelectors":["contents"],"type":"SelectorText","selector":".a-col-right > span:nth-of-type(1)","multiple":false,"delay":0,"regex":""},{"id":"buyer","parentSelectors":["contents"],"type":"SelectorText","selector":"span.a-profile-name","multiple":false,"delay":0,"regex":""},{"id":"date","parentSelectors":["contents"],"type":"SelectorText","selector":"span.a-color-tertiary","multiple":false,"delay":0,"regex":""},{"id":"nextpage","parentSelectors":["_root","nextpage"],"type":"SelectorLink","selector":".a-last a","multiple":true,"delay":0}]} 
 
Import the code as shown in the screenshots below.
 
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/a7614c2ae3a69eb85af2c0da1fc9b956.png
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/5e53974557c3739300077f3582ff3e3d.png
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/015626e7d9dbcc8e3280c4a1b3225a55.png
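
Before importing, it can help to check that the pasted sitemap is valid JSON and that every `parentSelectors` entry points at a selector that exists, since forum software sometimes mangles quotes in pasted code. The following is a minimal sketch, not part of Web Scraper itself; the function name `find_dangling_parents` and the `example.com` start URL are hypothetical and used only for illustration.

```python
import json


def find_dangling_parents(sitemap_json):
    """Return (selector_id, missing_parent) pairs found in a sitemap."""
    # json.loads raises json.JSONDecodeError if the pasted text is malformed.
    sitemap = json.loads(sitemap_json)
    known_ids = {selector["id"] for selector in sitemap["selectors"]}
    known_ids.add("_root")  # "_root" is the implicit top-level parent
    dangling = []
    for selector in sitemap["selectors"]:
        for parent in selector.get("parentSelectors", []):
            if parent not in known_ids:
                dangling.append((selector["id"], parent))
    return dangling


# Hypothetical sitemap: example.com stands in for a real start URL.
good_sitemap = json.dumps({
    "_id": "demo",
    "startUrl": ["https://example.com/qa"],
    "selectors": [
        {"id": "contents", "parentSelectors": ["_root"], "type": "SelectorElement"},
        {"id": "question", "parentSelectors": ["contents"], "type": "SelectorText"},
    ],
})
# Same sitemap with a misspelled parent id.
bad_sitemap = good_sitemap.replace('["contents"]', '["content"]')

print(find_dangling_parents(good_sitemap))
print(find_dangling_parents(bad_sitemap))
```

An empty result means every selector has a resolvable parent; any pair in the result names a selector whose parent id is misspelled or missing.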
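To scrape a different competitor, replace the ASIN. Edit the sitemap here and change the ASIN at the end of the URL to the other product's ASIN.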
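
If you reuse one sitemap for many competitors, the ASIN swap can also be done on the JSON text before importing. This is a rough sketch under two assumptions: the ASIN appears in `startUrl` as a 10-character token of uppercase letters and digits, and nothing else in the URL has that shape. The `example.com` URL and both ASINs below are placeholders, not real listings.

```python
import json
import re

# Assumption: an ASIN is a standalone 10-character uppercase alphanumeric token.
ASIN_PATTERN = re.compile(r"\b[A-Z0-9]{10}\b")


def replace_asin(sitemap_json, new_asin):
    """Return the sitemap JSON with the first ASIN in each start URL replaced."""
    sitemap = json.loads(sitemap_json)
    sitemap["startUrl"] = [
        ASIN_PATTERN.sub(new_asin, url, count=1) for url in sitemap["startUrl"]
    ]
    return json.dumps(sitemap)


# Placeholder sitemap: the URL and ASINs are made up for illustration.
template = json.dumps({
    "_id": "amz-qa2",
    "startUrl": ["https://example.com/qa/B000000001/[1-2]"],
    "selectors": [],
})

updated = json.loads(replace_asin(template, "B000000002"))
print(updated["startUrl"])
```

The rest of the sitemap (selectors, `_id`) is passed through unchanged, so the result can be pasted into the import dialog the same way as the original.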

 
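b. Scraping a specified range of pages:

After importing the code, to specify the page range, open the sitemap editor and change the number at the end of the URL.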
 
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/7d7a657ac3bc799e0848d5739317f481.png
To scrape 8 pages, change it to [1-8], and so on.
Code:
{"_id":"amz-qa2","startUrl":["https://www.amazon.com/ask/que ... 8QQP/[1-2]"],"selectors":[{"delay":null,"id":"contents","multiple":true,"parentSelectors":["_root"],"selector":".a-section > div.a-spacing-base > div > div.a-col-right","type":"SelectorElement"},{"delay":0,"id":"question","multiple":false,"parentSelectors":["contents"],"regex":"","selector":".a-spacing-small div.a-col-right","type":"SelectorText"},{"delay":0,"id":"answer","multiple":false,"parentSelectors":["contents"],"regex":"","selector":".a-col-right > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"buyer","multiple":false,"parentSelectors":["contents"],"regex":"","selector":"span.a-profile-name","type":"SelectorText"},{"delay":0,"id":"date","multiple":false,"parentSelectors":["contents"],"regex":"","selector":"span.a-color-tertiary","type":"SelectorText"}]}
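
To make the `[1-8]` notation concrete, the sketch below expands a start URL containing one numeric range into the individual page URLs it stands for. It only illustrates the idea; it is not Web Scraper's own code, it handles just the plain `[start-end]` form, and the `example.com` URL is a placeholder.

```python
import re

# Matches one numeric range such as [1-8].
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range(url):
    """Expand the first [start-end] range in a URL into a list of URLs."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]  # no range present: the URL stands for itself
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


# Placeholder URL for illustration only.
for page_url in expand_range("https://example.com/qa/B000000001/[1-3]"):
    print(page_url)
```

Changing `[1-3]` to `[1-8]` in the input yields eight URLs, which is the effect described above for the sitemap's start URL.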
 
Scraping other data works the same way: just import the code.

2. Scraping competitor reviews

Code:
{"_id":"review","startUrl":["https://www.amazon.com/Insulat ... ot%3B],"selectors":[{"clickElementSelector":".a-last a","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickMore","delay":3000,"discardInitialElements":"do-not-discard","id":"info","multiple":true,"parentSelectors":["_root"],"selector":".a-row div.celwidget","type":"SelectorElementClick"},{"delay":0,"id":"name","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-profile-name","type":"SelectorText"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"> div:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"status","multiple":false,"parentSelectors":["info"],"regex":"","selector":"div.a-spacing-mini.review-data","type":"SelectorText"},{"delay":0,"id":"time","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-color-secondary","type":"SelectorText"},{"delay":0,"id":"content","multiple":false,"parentSelectors":["info"],"regex":"","selector":"div.a-spacing-small","type":"SelectorText"}]} 
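
The `score` field comes out as raw text, so you will probably want to convert it to a number after exporting. The sketch below assumes the text contains a phrase of the form "4.0 out of 5 stars", which is how amazon.com usually words ratings; if your export looks different, the pattern needs adjusting. `parse_star_rating` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: the scraped text contains "<number> out of 5 stars".
RATING_PATTERN = re.compile(r"(\d(?:\.\d)?) out of 5 stars")


def parse_star_rating(text: str) -> Optional[float]:
    """Return the star rating found in the text, or None if there is none."""
    match = RATING_PATTERN.search(text)
    return float(match.group(1)) if match else None


print(parse_star_rating("4.0 out of 5 stars Works well"))
print(parse_star_rating("no rating here"))
```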

3. Scraping the first few pages of products for a search keyword

Code for unlimited pagination:
{"_id":"wxsearch","startUrl":["https://www.amazon.com/s%3Fk%3 ... ot%3B],"selectors":[{"delay":0,"id":"info","multiple":true,"parentSelectors":["_root","panination"],"selector":"div.s-expand-height","type":"SelectorElement"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["info"],"selector":".a-size-mini a","type":"SelectorLink"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"i.a-icon-star-small","type":"SelectorText"},{"delay":0,"id":"reviews","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-size-base","type":"SelectorText"},{"delay":0,"id":"panination","multiple":true,"parentSelectors":["_root","panination"],"selector":".a-last a","type":"SelectorLink"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["info"],"regex":"","selector":"[data-a-size='l'] span[aria-hidden]","type":"SelectorText"}]} 

Code for a specified number of pages (first 5 pages):
{"_id":"search","startUrl":["https://www.amazon.com/s?k=lunch+box&page=[1-5]"],"selectors":[{"delay":0,"id":"info","multiple":true,"parentSelectors":["_root"],"selector":"div.s-expand-height","type":"SelectorElement"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["info"],"selector":".a-size-mini a","type":"SelectorLink"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"i.a-icon-star-small","type":"SelectorText"},{"delay":0,"id":"reviews","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-size-base","type":"SelectorText"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["info"],"regex":"","selector":"[data-a-size='l'] span[aria-hidden]","type":"SelectorText"}]} 
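
To reuse the sitemap above for another keyword or page count, only the start URL has to change. The following sketch builds that URL in the same shape as the one in the sitemap (`k` for the keyword, `page=[1-N]` for the range); the function name `build_search_start_url` is hypothetical.

```python
from urllib.parse import quote_plus


def build_search_start_url(keyword, pages):
    """Build a search start URL covering pages 1..pages for the keyword."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # quote_plus turns spaces into "+", matching the "lunch+box" form above.
    return f"https://www.amazon.com/s?k={quote_plus(keyword)}&page=[1-{pages}]"


print(build_search_start_url("lunch box", 5))
```

Paste the result into the `startUrl` field of the sitemap in place of the existing URL.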

4. Code for scraping Best Sellers lists
 
{"_id":"amazon-com-best-sellers","startUrl":["https://www.amazon.com/Best-Se ... pg%3D[1-8]"],"selectors":[{"id":"info","parentSelectors":["_root"],"type":"SelectorElement","selector":"div.zg-grid-general-faceout","multiple":true,"delay":null},{"id":"name","parentSelectors":["info"],"type":"SelectorText","selector":"div._cDEzb_p13n-sc-css-line-clamp-3_g3dy1","multiple":false,"delay":0,"regex":""},{"id":"score","parentSelectors":["info"],"type":"SelectorText","selector":"div.a-icon-row","multiple":false,"delay":0,"regex":""},{"id":"price","parentSelectors":["info"],"type":"SelectorText","selector":"div:nth-of-type(2)","multiple":false,"delay":0,"regex":""}]} 

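If the scraped data is empty, there are two possible causes:

1. The URL needs to be changed.

2. The category's page structure is different, so the selectors need to be re-picked.
In this case, follow the screenshots below to reselect the data.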
 https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/73de8debc9088c201abb4b6127faf07b.png
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/f24c989bec9e03c7d32bdef75222ba34.png
 
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/b4f6227323d204b558753e9d937b62bd.png
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/2c74168608360013f351681e751a833f.png
https://wearesellers.oss-cn-shenzhen.aliyuncs.com/questions/20220525/80eae19a79733103495b8395083c594a.png
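
To find out which selectors need re-picking, one option is to export the results as CSV and list the columns that are empty in every row. This is a minimal sketch assuming the CSV column names match the selector ids in the sitemap (`name`, `score`, `price`); the sample data is invented.

```python
import csv
import io


def empty_columns(csv_text):
    """Return the names of columns that are blank in every data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [
        name
        for name in (reader.fieldnames or [])
        if all(not (row.get(name) or "").strip() for row in rows)
    ]


# Invented sample export: the "price" selector matched nothing.
sample_export = "name,score,price\nBox,4.5,\nBag,4.0,\n"
print(empty_columns(sample_export))
```

Each column reported here corresponds to a selector worth reselecting on the page; if every column is reported, the start URL is the more likely culprit.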
 
Anyone who has patiently read this far should now be comfortable with the Web Scraper crawler.
 
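I use Web Scraper too. I suggest other sellers search for 少数派-卤蛋实验室, which has seven or eight tutorials on this scraper, going from basic to advanced. They are worth 1-2 hours of study and make the tool much more comfortable to use.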
