python - How to modify this script to check for HTTP status (404, 200) -


I'm currently using the following script to load a list of URLs and check the page source of each one for a list of error strings. If none of the error strings appear in the source, the URL is considered valid and is written to a text file.

How can I modify this script to check the HTTP status instead? If a URL returns 404 it should be ignored; if it returns 200 it should be written to the text file. Any help would be much appreciated.

  import urllib2
  import sys

  # Lowercase, because the page source is lowercased before matching.
  error_strings = ['invalid product number',
                   'specification is not available. please contact customer service.']

  def check_link(url):
      if not url:
          return False
      f = urllib2.urlopen(url)
      html = f.read()
      result = False
      if html:
          result = True
          html = html.lower()
          # Any error string in the source marks the URL as invalid.
          for s in error_strings:
              if s in html:
                  result = False
                  break
      return result

  if __name__ == '__main__':
      if len(sys.argv) == 1:
          print 'Usage: %s <file_containing_urls>' % sys.argv[0]
      else:
          output = open('valid_links.txt', 'w+')
          for url in open(sys.argv[1]):
              if check_link(url.strip()):
                  output.write('%s\n' % url.strip())
                  output.flush()
          output.close()

You can change your call slightly, like this:

  >>> try:
  ...     f = urllib2.urlopen(url)
  ... except urllib2.HTTPError, e:
  ...     print e.code
  ...
  404

Using e.code, you can check whether the URL 404s or not. If you don't hit the except block, you can use f as you currently do.
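Put together, the status check could look something like the sketch below. It is only an illustration under a few assumptions: it targets Python 3, where urllib2 was split into urllib.request and urllib.error, and the names url_status and filter_valid, as well as the opener parameter, are hypothetical helpers introduced here rather than anything from the original script. The opener argument exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no HTTP response
    was received (malformed URL, DNS failure, refused connection, ...)."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as e:
        # Raised for error responses such as 404; e.code holds the status.
        # HTTPError subclasses URLError, so it must be caught first.
        return e.code
    except (urllib.error.URLError, ValueError):
        # The request never produced an HTTP response.
        return None
    try:
        return response.getcode()
    finally:
        response.close()


def filter_valid(urls, opener=urllib.request.urlopen):
    """Yield the stripped URLs that answer with status 200."""
    for url in urls:
        url = url.strip()
        if url and url_status(url, opener) == 200:
            yield url
```

In the original script, the loop over open(sys.argv[1]) could then write each URL produced by filter_valid to valid_links.txt instead of calling check_link. Note that urlopen follows redirects by default, so a URL that redirects to a working page would also be reported as 200.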

