Extracting Links from an HTML document using a Script

Discussion in 'Scripting' started by JoJo, Aug 28, 2009.

  1. JoJo (Guest)


    I have an HTML document that is about 100 pages long. I assembled this
    document from the "Articles By This Author" section of the following
    web page:

    Scattered throughout this document are many links to the web. The links of
    interest to me are all preceded by the ">>" characters, as seen at
    TigerSharkTrading, and the name of the article is then given as a link.

    * How can I quickly extract these links and transfer them to a new file?
    * Is there some type of script that can quickly accomplish this task?

    JoJo, Aug 28, 2009

  2. Hi JoJo,

    I suggest using the "all" collection (of the document object).

    Let's say that your links appear in an "anchor" (A) tag.

    Then you could get your collection of anchor tags like this:

    document.all.tags("A")

    To get the tags you want, you could "walk-the-list" with
    some sort of a loop (your choice, try "For Each").

    The individual items would be addressed as:

    document.all.tags("A")(i) ' where i is your index

    And the number of items would be:

    document.all.tags("A").length

    In your discussion, you mentioned the URLs, which are
    probably appearing as the "href" attribute of the "A"
    tag. My guess is that you can get the URL as:

    document.all.tags("A")(i).href

    cheers, jw

    You got questions? WE GOT ANSWERS!!! ..(but, no guarantee
    the answers will be applicable to the questions)
    mr_unreliable, Aug 28, 2009
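
The anchor-walking idea in the reply above can also be tried outside a browser. The following is a minimal sketch in Python, using the standard library's html.parser rather than the document.all collection discussed in the thread. The names AnchorCollector and extract_hrefs are made up for illustration, and the sample HTML is invented.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so "A" and "HREF" match too.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip named anchors that carry no URL
                self.hrefs.append(href)


def extract_hrefs(html_text):
    """Return a list with the href of each anchor found in html_text."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    # Invented sample input; a real run would read the saved HTML file instead.
    sample = '<p><a href="http://example.com/a">A</a> <a name="x">no href</a></p>'
    links = extract_hrefs(sample)
    print(len(links), links)
```

Here len(links) plays the role of the item count and each list entry corresponds to the "href" attribute of one "A" tag.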

  3. As indicated by mr_unreliable, you will probably want to use the DOM
    objects to parse the document. I was just going to add that it appears
    all the links of interest are contained in SPAN objects that have a class name
    of 'title'. So, instead of grabbing 'all' anchors, you could grab all 'SPAN'
    objects and check for a className of 'title', and then do another grab
    within that object for all anchors (of which there is only one, the one you
    want).

    Something like: (warning - air code)

    For Each sp In document.all.tags("SPAN")
        If sp.className = "title" Then
            For Each ref In sp.all.tags("A")
                ' Save the href to the new file, e.g. ...
                AppendToFile ref.href
            Next
        End If
    Next

    Your own AppendToFile routine might as well make the file an HTML
    document, so you can load it in a browser and click on any interesting
    links.

    Have fun!
    Larry Serflaten, Aug 29, 2009
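
For anyone who would rather run this outside a browser, here is a rough Python sketch of the same idea: keep only anchors that sit inside a SPAN with class 'title', then build a small HTML page of clickable links. It uses the standard library's html.parser instead of the DOM calls above. TitleLinkCollector, extract_title_links and links_to_html are hypothetical names, and the assumption that the wanted links live in span.title comes from the reply above, not from inspecting the actual page.

```python
from html import escape
from html.parser import HTMLParser


class TitleLinkCollector(HTMLParser):
    """Collect (href, text) for <a> tags nested inside <span class="title">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._span_stack = []  # one bool per open <span>: does it have class "title"?
        self._href = None      # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            # Unlike the exact className comparison, this also accepts class="title x".
            classes = (attrs.get("class") or "").split()
            self._span_stack.append("title" in classes)
        elif tag == "a" and any(self._span_stack) and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._span_stack:
            self._span_stack.pop()
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_title_links(html_text):
    """Return [(href, link_text), ...] for anchors inside span.title elements."""
    parser = TitleLinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def links_to_html(links):
    """Build a minimal HTML page listing the links, escaping URLs and text."""
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(href, quote=True), escape(text or href))
        for href, text in links
    )
    return "<html><body><ul>\n{}\n</ul></body></html>\n".format(items)


if __name__ == "__main__":
    # Invented sample; a real run would read the saved 100-page document instead.
    sample = '<span class="title">&gt;&gt; <a href="http://example.com/art1">Article One</a></span>'
    print(links_to_html(extract_title_links(sample)))
```

Writing the string returned by links_to_html to a file gives a page you can open in a browser, which is the role the AppendToFile routine plays in the air code.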