Scanning all the links on a page in Python

Labels: source code, programming, python

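This little urllib library is really handy!

    import urllib.request  # Python 3; in Python 2 this was urllib.urlopen

    def getLink(url):
        links = []
        # Read at most 200000 bytes of the page and decode them to text
        htmlSource = urllib.request.urlopen(url).read(200000).decode('utf-8', errors='replace')
        # Every chunk after an 'href=' starts with the value of a link
        for chunk in htmlSource.lower().split('href=')[1:]:
            # The link ends at the closing quote, a '>' or a space, whichever comes first
            indexes = [i for i in [chunk.find('"', 1), chunk.find('>'), chunk.find(' ')] if i > -1]
            # Keep the whole chunk if no terminator is found (min() would fail on an empty list)
            links.append(chunk[:min(indexes)] if indexes else chunk)
        return links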

This code returns a list of all the links found on the page.
Problem: it also returns links to images and leaves the opening quote on each link. A better version would look like this:

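    import urllib.request  # Python 3; in Python 2 this was urllib.urlopen

    def getLink2(url):
        links = []
        ext = ['.gif', '.png', '.jpg', '.bmp', '.css']
        htmlSource = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
        # Make sure the base URL ends with a slash before joining links to it
        if url[-1] != "/":
            url += "/"
        for chunk in htmlSource.lower().split('href=')[1:]:
            indexes = [i for i in [chunk.find('"', 1), chunk.find('>'), chunk.find(' ')] if i > -1]
            link = chunk[:min(indexes)] if indexes else chunk
            # Strip the surrounding quotes, single or double
            link = link.replace("'", "").strip('"')
            # Skip empty links and bare anchors
            if not link or link == "#":
                continue
            # Turn a link starting with '/' into an absolute one
            if link[0] == "/":
                link = url + link[1:]
            # Drop images and stylesheets
            if link[-4:] not in ext:
                links.append(link)
        return links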

This version also skips anchors, empty links, and so on.
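
Splitting on 'href=' by hand is fragile (unquoted attributes, mixed case, relative paths), so here is a rough sketch of the same idea using the standard library's HTML parser instead. This is a different technique from the code above, not a drop-in rewrite: the names LinkCollector, extract_links and SKIPPED_EXTENSIONS are made up for this example, and I am assuming that only the href of <a> tags matters, whereas the functions above look at every href in the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Same extensions as the ext list in getLink2 (hypothetical constant name)
SKIPPED_EXTENSIONS = ('.gif', '.png', '.jpg', '.bmp', '.css')


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Assumption: only <a> tags are of interest
        if tag != 'a':
            return
        for name, value in attrs:
            # Skip other attributes, missing or empty hrefs, and pure anchors
            if name != 'href' or not value or value.startswith('#'):
                continue
            # urljoin resolves both '/path' and 'relative/path' links
            absolute = urljoin(self.base_url, value)
            # Check the extension on the path only, ignoring case
            path = urlparse(absolute).path.lower()
            if not path.endswith(SKIPPED_EXTENSIONS):
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the filtered, absolute links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

To use it on a live page, fetch the HTML with urllib.request.urlopen(url).read().decode('utf-8', errors='replace') and pass the result to extract_links(html, url). Unlike the versions above, the page is not lowercased first, so case-sensitive URLs come out unchanged.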