
Getting a ValueError while extracting links

I am extracting URL links from a Wiki page and get a ValueError when some of the links are parsed. I am looking for a way to either ignore the error or fix the underlying problem. It seems that as the loop extracts links, it eventually runs into one it cannot recognize as a URL and the program ends with a traceback.

from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error
import ssl
import re

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input("Enter First Link: ")
if len(url)<1: url = "https://www.bing.com/search?q=k+means+wiki&src=IE-SearchBox&FORM=IENAD2"

position = 18   # position of the link to follow on each page
process = 7     # number of times to repeat the process

for i in range(process):
    html = urllib.request.urlopen(url, context=ctx)
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    count = 0
    for tag in tags:
        count = count + 1
        # stop once the link at the target position has been read
        if count > position:
            break
        url = tag.get('href', None)

    print(url)

Raises:
ValueError Traceback (most recent call last)

ValueError: unknown url type: '/search?q=Cluster+analysis%20wikipedia&FORM=WIKIRE'
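
The ValueError is raised because the href pulled from the page ('/search?q=...') is site-relative: it has no scheme, so urlopen cannot tell what kind of URL it is. If the goal is to fix the cause rather than skip the error, one common approach (not part of the original code) is to resolve each href against the page it was found on with urllib.parse.urljoin, for example:

import urllib.parse

page_url = "https://www.bing.com/search?q=k+means+wiki&src=IE-SearchBox&FORM=IENAD2"
href = "/search?q=Cluster+analysis%20wikipedia&FORM=WIKIRE"   # the relative link from the traceback

# urljoin keeps the scheme and host of the page and swaps in the relative path,
# producing an absolute URL that urlopen can open.
print(urllib.parse.urljoin(page_url, href))
# https://www.bing.com/search?q=Cluster+analysis%20wikipedia&FORM=WIKIRE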

一码平川MACHEL  2019-01-23 14:28:45
1 Answer
  • You can wrap the line of code that causes the problem in a try/except inside the loop:

    for i in range(process):
        try:
            # the line that raises the error, i.e. the urlopen call
            html = urllib.request.urlopen(url, context=ctx)
        except ValueError:
            print("invalid url")
    
    2019-07-17 23:26:37