Perl Script to Automatically Submit URL(s) to AltaVista
Alma Mater Software

Alma Mater Software, Inc.

Automatically Submit URL to AltaVista

The Perl script below submits one or more URLs to AltaVista for indexing. You can also use it to delete dead URLs that AltaVista once indexed but that no longer exist on your site, bringing the AltaVista index back into "sync."

To use the script, save this HTML file and extract the Perl code with a text editor. You may want to schedule a batch job (e.g., a cron job on Unix) to submit a very volatile page on your site for indexing every one or two weeks.

Here is a sample from a "crontab" file that will run the script every two weeks:

# Submit 
0 2 1,15 * * /usr/local/bin/submiturl http://www.mycompany.com 2>&1 | /bin/mail [email protected]
This will run at 2:00 am on the first and fifteenth of each month, and mail the output to the Webmaster.

Etiquette

Because it undermines the resource scheduling and load-balancing policies of the AltaVista servers, Digital (and Alma Mater Software, for that matter) considers it extremely bad taste to submit all of a site's URLs, or a large portion of them, individually. Submit only the top-level URL, or perhaps one or two URLs with late-breaking news. Let the AltaVista crawler come back and index the rest of your site in due time.

Things to avoid:

Things to do:

Digital's AltaVista site suggests further etiquette for submitting URLs for indexing or deletion.
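
One way to keep a scheduled job within these guidelines is to cap how many URLs a single run will accept. The sketch below is only an illustration: the helper name check_etiquette and the limit of three (one top-level URL plus up to two news pages) are assumptions drawn from the advice above, not part of the original script or of AltaVista's published rules.

```perl
#!/usr/local/bin/perl
use strict;

# Assumed limit: one top-level URL plus one or two late-breaking pages
my $max_urls = 3;

# Hypothetical helper: return 1 if the URL list is small enough to
# submit in one run, 0 otherwise
sub check_etiquette {
    my (@urls) = @_;
    return (@urls <= $max_urls) ? 1 : 0;
}

# When run with arguments, refuse an over-long list before any
# network traffic is generated
if ( @ARGV && !check_etiquette(@ARGV) ) {
    print STDERR "Refusing to submit ", scalar(@ARGV),
                 " URLs (limit is $max_urls)\n";
    exit 1;
}
```

A check like this could be placed at the start of the main routine so that an accidental bulk submission fails before any host is contacted.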


#!/usr/local/bin/perl

# Copyright (C) 1997 Alma Mater Software, Inc.

# submiturl - Submit a URL to AltaVista for indexing or deleting

# See the AltaVista page on etiquette for submitting:
#    http://altavista.digital.com/cgi-bin/query?pg=tmpl&v=addurl.html

# Host to contact via HTTP to submit URL
$hostname   = "add-url.altavista.digital.com";

# CGI script to call to submit
$scripturl  = "/cgi-bin/newurl";

# HTTP standard port
$http_port  = 80;

use Socket;
&main(@ARGV);

# main routine
sub main {
    
    local(@args) = @_;

    &usage if !@args;
    local($arg);
    for $arg ( @args ) {
        &submiturl($arg);
    }
    exit 0;
}

# Submit URL
sub submiturl {
    local($url) = @_;

    local($handle) = "SUBMITSOCK";
    local($success) = open_socket($handle, $hostname, $http_port);
    &fatal("Cannot contact host '$hostname': $!") if ! $success;
    
    # Escape special characters in URL
    local($urlesc) = $url;
    $urlesc =~ s/(\W)/sprintf("%%%02x", ord($1))/ge;

    # Request line uses single spaces and CRLF line endings, as HTTP expects
    &send($handle, "GET $scripturl?q=" . $urlesc . " HTTP/1.0\r\n\r\n");
    local($response) = &drain($handle);
    &close_socket($handle);

    # Make sure AltaVista could get URL and said how many seconds it took
    if ( $response =~ m/fetched in (.+) seconds/ ) {
        printf("%-50s SUBMITTED (%d seconds)\n", $url, $1);
    }
    elsif ( $response =~ m/this page is no longer valid/ ) {
        printf("%-50s %s\n", $url, "DELETED");
    }
    elsif ( $response =~ m/is not a valid URL/ ) {
        printf("%-50s %s\n", $url, "BAD URL!");
    }
    else {
        print STDERR "ERROR SUBMITTING URL '$url' ... Received reply:\n", $response, "\n\n";
    }
}

# Show usage and exit
sub usage {
    print STDERR "Usage: submiturl <url> ...\n";
    exit 1;
}


# Send message to socket
sub send {
    local($handle, @list) = @_;
    print $handle @list;
}

# Drain data from handle until EOF
sub drain {
    local($handle) = @_;
    local($buf, $data) = ("", "");
    while ( read($handle, $buf, 1024) > 0 ) {
        $data .= $buf;
    }
    return $data;
}

# Open socket on $targhost to $port
sub open_socket {

    local($handle, $targhost, $port) = @_;

    # Get protocol number for TCP
    local($name, $aliases, $proto) = getprotobyname("tcp");

    # Resolve host name to a packed IP address
    local($hname, $haliases, $type, $len, $thataddr) = gethostbyname($targhost);
    &fatal("Cannot resolve host '$targhost'") if !$thataddr;

    # Build socket addresses with Socket's sockaddr_in() rather than a
    # hand-written pack template, whose layout differs between platforms
    local($this) = sockaddr_in(0, INADDR_ANY);
    local($that) = sockaddr_in($port, $thataddr);

    # Open socket()
    return 0 if ! socket($handle, AF_INET, SOCK_STREAM, $proto);
    return 0 if ! bind($handle, $this);
    return 0 if ! connect($handle, $that);

    # Set socket for immediate flush of writes
    select($handle);
    $| = 1;
    select(STDOUT);

    return 1;
}
    
# Close a socket
sub close_socket {
    local($handle) = @_;
    return close($handle);
}

# Fatal error
sub fatal {
    local(@stuff) = @_;
    print STDERR "*** FATAL ERROR:\n", @stuff, "\n";
    exit 2;
}
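
To see what the script sends without opening a socket, the escaping step and the request can be tried on their own. This is a minimal sketch: escape_url and build_request are hypothetical helper names, the substitution is the one used in submiturl, the request is built in standard HTTP/1.0 form (single spaces, CRLF line endings), and the sample URL is the one from the crontab example.

```perl
#!/usr/local/bin/perl
use strict;

# CGI path, as configured at the top of the script
my $scripturl = "/cgi-bin/newurl";

# Percent-encode every non-word character (anything other than
# letters, digits and underscore), as submiturl does
sub escape_url {
    my ($url) = @_;
    $url =~ s/(\W)/sprintf("%%%02x", ord($1))/ge;
    return $url;
}

# Build the HTTP/1.0 request that would be written to the socket
sub build_request {
    my ($url) = @_;
    return "GET $scripturl?q=" . escape_url($url) . " HTTP/1.0\r\n\r\n";
}

print escape_url("http://www.mycompany.com"), "\n";
# prints: http%3a%2f%2fwww%2emycompany%2ecom
```

Because the pattern escapes every non-word character, it also encodes characters such as "." and "-" that do not strictly need encoding in a query string. That is harmless, since the server decodes them back to the same URL.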


Copyright © 1997 Alma Mater Software, Inc.