Need help with Python?
Hi,
I'm writing a program in Python, and I need a little help.
First of all, my code fetches a Web page:
import urllib.request
# Python 3 version; the page is assumed to be UTF-8 encoded
with urllib.request.urlopen("http://www.poil.ca/") as f:
    s = f.read().decode("utf-8")
print(s)
This prints the source code of the Web page.
Then, I need to find all the occurrences of id="voteup?????" in the source code, such as id="voteup25051":
import re
m = re.findall('id="voteup....."', s)
print(m)
This prints this list:
['id="voteup20524"', 'id="voteup20516"', 'id="voteup20517"', 'id="voteup20526"', 'id="voteup20525"', 'id="voteup20528"', 'id="voteup20527"', 'id="voteup20518"', 'id="voteup20512"', 'id="voteup20511"', 'id="voteup20510"', 'id="voteup20509"', 'id="voteup20507"', 'id="voteup20501"', 'id="voteup20496"', 'id="voteup20495"', 'id="voteup20492"', 'id="voteup20494"', 'id="voteup20488"', 'id="voteup20484"', 'id="voteup20483"', 'id="voteup20485"', 'id="voteup20480"', 'id="voteup20482"', 'id="voteup20481"', 'id="voteup20471"', 'id="voteup20461"', 'id="voteup20470"', 'id="voteup20466"', 'id="voteup20469"']
Now, I need something to keep only the numbers in this list. I want the output to be:
['20524', '20516', '20517', '20526', '20525', '20528', '20527', '20518', '20512', '20511', '20510', '20509', '20507', '20501', '20496', '20495', '20492', '20494', '20488', '20484', '20483', '20485', '20480', '20482', '20481', '20471', '20461', '20470', '20466', '20469']
And I want to store that in a variable.
How could I do this?
Thanks!
1 answer
Favorite answer, from om (1 decade ago):
The simplest way is to do something brutal and ad hoc like just throwing away everything that isn't a digit:
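nums = []
for mstr in m:
    # Collect only the digit characters of this matched string
    nstr = ''
    for mchr in mstr:
        if mchr.isdigit():
            nstr += mchr
    # One number string per match, in the same order as m
    nums.append(nstr)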
The list 'nums' will now contain the numbers you want.
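The same "keep only the digits" idea can be written more compactly with a list comprehension and str.join(). This is just a sketch of the identical brute-force approach; the two sample strings are copied from the findall() result in the question and stand in for the full m list.

```python
# Sample input: two entries copied from the question's findall() result
m = ['id="voteup20524"', 'id="voteup20516"']

# For each matched string, join together only its digit characters
nums = [''.join(ch for ch in mstr if ch.isdigit()) for mstr in m]

print(nums)  # ['20524', '20516']
```

Like the loop, this would also pick up any stray digits elsewhere in the matched text, which is why the group approach below is more robust.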
A more general approach would be to use the "match group" feature of the regular expression module. If you enclose part of the regular expression in parentheses, findall() returns just the portion of the matched string that this part matched. Portions marked off in this way are called "groups". An expression can contain multiple groups (you don't need that here), and groups also work with match() and search(), although in a slightly more complex way than with findall(). Read the 're' section of the Python Library Manual at the link below for details; it has some helpful examples towards the bottom of the page. (If you're familiar with 'sed' and other regexp tools on Unix, groups are the same parenthesized subexpressions that backreferences such as \1 refer to.)
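To make the group behavior concrete, here is a small sketch. The HTML fragment in s is made up for illustration and stands in for the real page source; the two-group pattern is likewise only an example of what findall() returns when there is more than one group.

```python
import re

# Hypothetical HTML fragment standing in for the fetched page
s = '<a id="voteup20524">+</a> <a id="voteup20516">+</a>'

# One group: findall() returns just the captured text of each match
one = re.findall(r'id="voteup(\d+)"', s)
print(one)  # ['20524', '20516']

# Two groups: findall() returns one tuple of captures per match
two = re.findall(r'id="([a-z]+)(\d+)"', s)
print(two)  # [('voteup', '20524'), ('voteup', '20516')]

# search() returns a match object (or None) for the first match only
mo = re.search(r'id="voteup(\d+)"', s)
if mo:
    print(mo.group(0))  # id="voteup20524"  (the whole match)
    print(mo.group(1))  # 20524  (the first group)
```

With search() and match() you get a match object and pull each group out with group(n), which is the "slightly more complex" part mentioned above.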
You could either use the group technique on each of the strings in your 'm' list, or you could use it in your original findall() call to get the list of numbers in one shot, like this:
m = re.findall(r'id="voteup(\d+)"', s)
'\d+' matches a run of one or more digits. Since it's in parentheses, it's the group that findall() will return. Your original pattern contains no groups, and in that case findall() returns the entire matched string; this version returns only the part of each match captured by the group, i.e. the number. Obviously this is a more flexible and powerful technique than a hard-coded loop that drops everything except digits.
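For comparison, here is a sketch of both options from above side by side: running a group over each string already in a findall() result, and using the group directly in findall(). It also shows one concrete way '\d+' is more flexible than the original five dots. The page fragment is made up, and its six-digit ID voteup123456 is hypothetical, included only to show that the five-dot pattern silently misses IDs of a different length.

```python
import re

# Hypothetical page source; the second ID has six digits instead of five
s = '<a id="voteup20524">+</a> <a id="voteup123456">+</a>'

# Original pattern: exactly five characters between "voteup" and the quote
five_dots = re.findall('id="voteup....."', s)

# Grouped pattern: one or more digits, returned without the surrounding text
grouped = re.findall(r'id="voteup(\d+)"', s)

# Per-string option: apply a grouped pattern to each full match in a list
pattern = re.compile(r'id="voteup(\d+)"')
per_string = [pattern.match(x).group(1) for x in five_dots]

print(five_dots)   # ['id="voteup20524"']
print(grouped)     # ['20524', '123456']
print(per_string)  # ['20524']
```

The per-string version works, but it can only recover what the first findall() call already found; putting the group in the original findall() call both extracts the numbers and avoids the fixed-length assumption.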
Source(s): http://docs.python.org/library/re.html