Perl Script to Automatically Submit URL(s) to AltaVista
Alma Mater Software, Inc.
To use the script, save this HTML file and extract the Perl code with a text editor. You may want to schedule a batch job (e.g., a cron job on Unix) to resubmit a frequently changing page on your site for indexing every one or two weeks.
Here is a sample from a "crontab" file that runs the script twice a month, roughly every two weeks:
```
# Submit
0 2 1,15 * * /usr/local/bin/submiturl http://www.mycompany.com 2>&1 | /bin/mail [email protected]
```

This will run at 2:00 am on the first and fifteenth of each month and mail the output to the Webmaster.
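The five fields before the command are minute, hour, day of month, month, and day of week, so `0 2 1,15 * *` fires at 02:00 on the 1st and 15th. The sketch below is a hypothetical Perl helper (not part of submiturl) that checks a given time against such a schedule. It assumes the simple field syntax used above (`*`, single numbers, comma lists; no ranges, steps, or names) and follows cron's rule that when both day fields are restricted, a match on either one is enough.

```perl
#!/usr/bin/perl
# Hypothetical sketch: does a time match a crontab schedule?
# Supports only "*", single numbers, and comma-separated lists.
use strict;
use warnings;

# Does one crontab field match a value?
sub field_matches {
    my ($field, $value) = @_;
    return 1 if $field eq '*';
    for my $item (split /,/, $field) {
        return 1 if $item == $value;
    }
    return 0;
}

# Fields: minute hour day-of-month month day-of-week (extra words are ignored).
# $mon is 1-12 and $wday is 0-6 with 0 = Sunday, as in crontab.
sub cron_matches {
    my ($spec, $min, $hour, $mday, $mon, $wday) = @_;
    my ($f_min, $f_hour, $f_mday, $f_mon, $f_wday) = split ' ', $spec;
    return 0 unless field_matches($f_min, $min)
                 && field_matches($f_hour, $hour)
                 && field_matches($f_mon, $mon);
    # When both day fields are restricted, cron runs if EITHER one matches
    if ($f_mday ne '*' && $f_wday ne '*') {
        return field_matches($f_mday, $mday) || field_matches($f_wday, $wday) ? 1 : 0;
    }
    return field_matches($f_mday, $mday) && field_matches($f_wday, $wday) ? 1 : 0;
}
```

For example, `cron_matches('0 2 1,15 * *', 0, 2, 15, 6, 3)` returns 1, while the same call with day of month 14 returns 0. Because the entry is tied to calendar days, the gap between runs varies from 14 to 17 days rather than being exactly two weeks.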
Things to avoid:
Things to do:
Digital's AltaVista suggests further etiquette for submitting URLs for indexing or deletion on its site.
```perl
#!/usr/local/bin/perl
# Copyright (C) 1997 Alma Mater Software, Inc.
# submiturl - Submit a URL to AltaVista for indexing or deleting
# See the AltaVista page on etiquette for submitting:
#   http://altavista.digital.com/cgi-bin/query?pg=tmpl&v=addurl.html

use Socket;

# Host to contact via HTTP to submit URL
$hostname = "add-url.altavista.digital.com";
# CGI script to call to submit
$scripturl = "/cgi-bin/newurl";
# HTTP standard port
$http_port = 80;

&main(@ARGV);

# Main routine: submit each URL named on the command line
sub main {
    my (@args) = @_;
    &usage if !@args;
    for my $arg (@args) {
        &submiturl($arg);
    }
    exit 0;
}

# Submit one URL and report AltaVista's verdict
sub submiturl {
    my ($url) = @_;
    my $handle = "SUBMITSOCK";
    my $success = &open_socket($handle, $hostname, $http_port);
    &fatal("Cannot contact host '$hostname': $!") if !$success;

    # Escape special characters in URL (every non-word character becomes %xx)
    my $urlesc = $url;
    $urlesc =~ s/(\W)/sprintf("%%%02x", ord($1))/ge;

    # HTTP header lines end in CRLF; a blank line ends the request
    &send($handle, "GET $scripturl?q=$urlesc HTTP/1.0\r\n",
                   "Host: $hostname\r\n",
                   "\r\n");
    my $response = &drain($handle);
    &close_socket($handle);

    # Make sure AltaVista could get URL and said how many seconds it took
    if ($response =~ m/fetched in (.+) seconds/) {
        printf("%-50s SUBMITTED (%d seconds)\n", $url, $1);
    } elsif ($response =~ m/this page is no longer valid/) {
        printf("%-50s %s\n", $url, "DELETED");
    } elsif ($response =~ m/is not a valid URL/) {
        printf("%-50s %s\n", $url, "BAD URL!");
    } else {
        print STDERR "ERROR SUBMITTING URL '$url' ...\nReceived reply:\n",
                     $response, "\n\n";
    }
}

# Show usage and exit
sub usage {
    print STDERR "Usage: submiturl...\n";
    exit 1;
}

# Send message to socket (called as &send so Perl's built-in send() is not used)
sub send {
    my ($handle, @list) = @_;
    print $handle @list;
}

# Drain data from handle until EOF
sub drain {
    my ($handle) = @_;
    my ($buf, $data) = ("", "");
    while (read($handle, $buf, 1024) > 0) {
        $data .= $buf;
    }
    return $data;
}

# Open a TCP socket to $port on $targhost
sub open_socket {
    my ($handle, $targhost, $port) = @_;

    # Get protocol number
    my $proto = getprotobyname("tcp");

    # Resolve host name to a packed IP address
    my $thataddr = (gethostbyname($targhost))[4];
    &fatal("Cannot resolve host '$targhost'") if !$thataddr;

    # socket() then connect(); sockaddr_in() packs the address portably
    return 0 if !socket($handle, PF_INET, SOCK_STREAM, $proto);
    return 0 if !connect($handle, sockaddr_in($port, $thataddr));

    # Set socket for immediate flush of writes
    select($handle); $| = 1; select(STDOUT);
    return 1;
}

# Close a socket
sub close_socket {
    my ($handle) = @_;
    return close($handle);
}

# Fatal error
sub fatal {
    my (@stuff) = @_;
    print STDERR "*** FATAL ERROR:\n", @stuff, "\n";
    exit 2;
}
```
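The two steps that decide what submiturl sends and prints, percent-escaping the URL and reading AltaVista's reply, can be tried without a network connection. The sketch below pulls them into two hypothetical functions, `escape_url` and `classify_reply`; the regular expressions and status strings match the script, but the function names and the `ERROR` fallback label are illustrative only.

```perl
#!/usr/bin/perl
# Sketch: two steps of submiturl as standalone, testable functions.
# escape_url and classify_reply are hypothetical names, not part of the script.
use strict;
use warnings;

# Percent-escape every non-word character, as submiturl does for the q= value
sub escape_url {
    my ($url) = @_;
    (my $esc = $url) =~ s/(\W)/sprintf("%%%02x", ord($1))/ge;
    return $esc;
}

# Classify the reply text using the same patterns submiturl checks, in order
sub classify_reply {
    my ($response) = @_;
    return sprintf("SUBMITTED (%d seconds)", $1)
        if $response =~ m/fetched in (.+) seconds/;
    return "DELETED"  if $response =~ m/this page is no longer valid/;
    return "BAD URL!" if $response =~ m/is not a valid URL/;
    return "ERROR";   # anything else is reported as an error
}
```

For example, `escape_url('http://www.mycompany.com')` returns `http%3a%2f%2fwww%2emycompany%2ecom`. Because `\W` matches every character outside `[A-Za-z0-9_]`, even `:`, `/`, and `.` are escaped, which is safe for a query-string value.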