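How to get the final URL a given URL redirects to in Python
Problem description
In Python, how can I get the final URL that a short link resolves to? I have a lot of Taobao short links and need the final URL each one redirects to. Is there a good way to do this? Some use a 302 redirect, and others redirect directly in the page with JavaScript. How can I handle both cases?
Answers
Answer 1: Use selenium + PhantomJS...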
http://stackoverflow.com/ques...
#!/usr/bin/python2.7
from twisted.internet import reactor
from twisted.internet.defer import Deferred, DeferredList, DeferredLock
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, HTTPConnectionPool
from twisted.web.http_headers import Headers
from pprint import pprint
from collections import defaultdict
from urlparse import urlparse
from random import randrange
import fileinput

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = 16
agent = Agent(reactor, pool)
locks = defaultdict(DeferredLock)
locations = {}

def getLock(url, simultaneous = 1):
    return locks[urlparse(url).netloc, randrange(simultaneous)]

@inlineCallbacks
def getMapping(url):
    # Limit ourselves to 4 simultaneous connections per host
    # Tweak this as desired, but make sure that it is no larger than
    # pool.maxPersistentPerHost
    lock = getLock(url, 4)
    yield lock.acquire()
    try:
        resp = yield agent.request('HEAD', url)
        locations[url] = resp.headers.getRawHeaders('location', [None])[0]
    except Exception as e:
        locations[url] = str(e)
    finally:
        lock.release()
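The snippet above sends a HEAD request and reads the Location header, which gives one hop of a redirect. A short link may pass through several hops, so here is a minimal Python 3 sketch of the same idea using only the standard library. The names `head_location`, `resolve_final_url`, `fetch`, and `max_hops` are made up for this illustration and do not come from the answer, and some servers may respond to HEAD differently than to GET.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_location(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve_final_url(url, fetch=head_location, max_hops=10):
    """Follow Location headers hop by hop until a non-redirect response.

    `fetch` is injectable (an assumption of this sketch) so the loop can be
    exercised without touching the network.
    """
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

Calling `resolve_final_url(short_url)` with the default `fetch` performs real HEAD requests. This only covers HTTP-level redirects such as 302, not redirects performed by JavaScript in the page.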
You can also try this pip package:
https://pypi.python.org/pypi/...
from urlunshort import resolve
resolve('http://bit.ly/qlKaI')
Result: 'http://bitbucket.org/runeh/urlunshort/'
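For the pages that redirect with JavaScript instead of a 302, driving a real browser as suggested in Answer 1 is the general approach. For the simplest cases only (a meta refresh tag, or a literal string assigned to `location`), a rough Python 3 sketch could scan the downloaded HTML directly. The function name `find_client_redirect` and the regular expressions are assumptions made for this illustration; they will miss any redirect whose target is computed at runtime.

```python
import re
from urllib.parse import urljoin

# <meta http-equiv="refresh" content="0; url=...">
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=["']?([^"'>\s]+)""",
    re.IGNORECASE,
)
# location = "..." / window.location.href = '...'
JS_ASSIGN = re.compile(
    r"""location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)
# location.replace("...") / location.assign('...')
JS_CALL = re.compile(
    r"""location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)


def find_client_redirect(html, base_url):
    """Return the absolute redirect target found in `html`, or None."""
    for pattern in (META_REFRESH, JS_ASSIGN, JS_CALL):
        match = pattern.search(html)
        if match:
            # Targets may be relative, so resolve them against the page URL.
            return urljoin(base_url, match.group(1))
    return None
```

One way to use it is to fetch the body of the URL reached after the HTTP redirects and pass it in together with that URL; a `None` result means no simple client-side redirect was recognized.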
