Perl Script to Automatically Submit URL(s) to AltaVista
Alma Mater Software, Inc.
To use the script, save this HTML file and extract the Perl code with a text editor. You may want to schedule a batch job (e.g., a cron job on Unix) to resubmit a frequently changing page on your site for indexing every one or two weeks.
Here is a sample from a "crontab" file that will run the script twice a month, roughly every two weeks:
# Submit
0 2 1,15 * * /usr/local/bin/submiturl http://www.mycompany.com 2>&1 | /bin/mail [email protected]
This will run at 2:00 am on the first and fifteenth of each month, and mail the output to the Webmaster.
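For readers unfamiliar with the crontab layout, the following minimal Perl sketch splits a schedule line like the one above into its five time fields and the command. The helper name `parse_cron_line` is hypothetical and exists only for illustration; it does not validate the fields, and the mail pipe is left off the sample command for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of submiturl): split one crontab entry into
# its five schedule fields and the command that follows them.
sub parse_cron_line {
    my ($line) = @_;
    # Limit of 6 keeps the whole command, spaces included, in the last field
    my ($minute, $hour, $mday, $month, $wday, $command) = split ' ', $line, 6;
    return {
        minute  => $minute,
        hour    => $hour,
        mday    => [ split /,/, $mday ],   # "1,15" becomes days 1 and 15
        month   => $month,
        wday    => $wday,
        command => $command,
    };
}

my $entry = parse_cron_line(
    '0 2 1,15 * * /usr/local/bin/submiturl http://www.mycompany.com 2>&1'
);
printf "Runs at %02d:%02d on day(s) %s\n",
    $entry->{hour}, $entry->{minute}, join(' and ', @{ $entry->{mday} });
print "Command: $entry->{command}\n";
```

The order of the fields is minute, hour, day of month, month, and day of week, so `0 2 1,15 * *` reads as "minute 0 of hour 2, on days 1 and 15, in any month, on any weekday".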
Things to avoid:
Things to do:
Digital AltaVista suggests additional etiquette for submitting URLs for indexing or deletion on its site.
#!/usr/local/bin/perl
# Copyright (C) 1997 Alma Mater Software, Inc.
# submiturl - Submit a URL to AltaVista for indexing or deleting
# See the AltaVista page on etiquette for submitting:
# http://altavista.digital.com/cgi-bin/query?pg=tmpl&v=addurl.html
# Host to contact via HTTP to submit URL
$hostname = "add-url.altavista.digital.com";
# CGI script to call to submit
$scripturl = "/cgi-bin/newurl";
# HTTP standard port
$http_port = 80;
use Socket;
&main(@ARGV);
# main routine
sub main {
    local(@args) = @_;
    &usage if !@args;
    local($arg);
    for $arg ( @args ) {
        &submiturl($arg);
    }
    exit 0;
}
# Submit URL
sub submiturl {
    local($url) = @_;
    local($handle) = "SUBMITSOCK";
    local($success) = open_socket($handle, $hostname, $http_port);
    &fatal("Cannot contact host '$hostname': $!") if ! $success;
    # Escape special characters in URL
    local($urlesc) = $url;
    $urlesc =~ s/(\W)/sprintf("%%%02x", ord($1))/ge;
    # HTTP request lines are terminated by CRLF, and a blank line ends the request
    &send($handle, "GET $scripturl?q=" . $urlesc . " HTTP/1.0\r\n\r\n");
    local($response) = &drain($handle);
    &close_socket($handle);
    # Make sure AltaVista could get URL and said how many seconds it took
    if ( $response =~ m/fetched in (.+) seconds/ ) {
        # Print the time as reported (%s), so a fractional value is not truncated
        printf("%-50s SUBMITTED (%s seconds)\n", $url, $1);
    }
    elsif ( $response =~ m/this page is no longer valid/ ) {
        printf("%-50s %s\n", $url, "DELETED");
    }
    elsif ( $response =~ m/is not a valid URL/ ) {
        printf("%-50s %s\n", $url, "BAD URL!");
    }
    else {
        print STDERR "ERROR SUBMITTING URL '$url' ... Received reply:\n", $response, "\n\n";
    }
}
# Show usage and exit
sub usage {
    print STDERR "Usage: submiturl URL ...\n";
    exit 1;
}
# Send message to socket
sub send {
    local($handle, @list) = @_;
    print $handle @list;
}
# Drain data from handle until EOF
sub drain {
    local($handle) = @_;
    local($buf, $data) = ("", "");
    while ( read($handle, $buf, 1024) > 0 ) {
        $data .= $buf;
    }
    return $data;
}
# Open socket on $targhost to $port
sub open_socket {
    local($handle, $targhost, $port) = @_;
    # Get protocol data
    local($name,$aliases,$proto) = getprotobyname("tcp");
    # Resolve host names to IP addr's
    $thisaddr = "\0\0\0\0";
    ($name,$aliases,$type,$len,$thataddr) = gethostbyname($targhost);
    &fatal("Cannot resolve host '$targhost'") if !$thataddr;
    # Build the socket addresses with Socket's pack_sockaddr_in instead of a
    # hand-written pack template, whose layout is not the same on every platform
    local($this) = pack_sockaddr_in(0, $thisaddr);
    local($that) = pack_sockaddr_in($port, $thataddr);
    # Open socket()
    return 0 if ! socket($handle, AF_INET, SOCK_STREAM, $proto);
    return 0 if ! bind($handle, $this);
    return 0 if ! connect($handle, $that);
    # Set socket for immediate flush of writes
    select($handle);
    $| = 1;
    select(STDOUT);
    return 1;
}
# Close a socket
sub close_socket {
    local($handle) = @_;
    return close($handle);
}
# Fatal error
sub fatal {
    local(@stuff) = @_;
    print STDERR "*** FATAL ERROR:\n", @stuff, "\n";
    exit 2;
}
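
To experiment with the script's escaping and reply-matching steps without contacting AltaVista, the two steps can be pulled out into standalone functions. This is only a sketch: the function names `escape_url` and `classify_response` and the sample reply strings are made up for illustration. Only the regular expressions are taken from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-escape every non-word character, as submiturl does before
# placing the URL in the query string.
sub escape_url {
    my ($url) = @_;
    (my $escaped = $url) =~ s/(\W)/sprintf("%%%02x", ord($1))/ge;
    return $escaped;
}

# Map a reply body to a status string similar to what the script prints.
# The patterns are checked in the same order as in submiturl.
sub classify_response {
    my ($response) = @_;
    return "SUBMITTED ($1 seconds)" if $response =~ m/fetched in (.+) seconds/;
    return "DELETED"  if $response =~ m/this page is no longer valid/;
    return "BAD URL!" if $response =~ m/is not a valid URL/;
    return "ERROR";
}

print escape_url("http://www.mycompany.com/"), "\n";

# The sample replies below are invented, not real AltaVista output.
print classify_response("The page was fetched in 3 seconds"), "\n";
print classify_response("foo is not a valid URL"), "\n";
print classify_response("something unexpected"), "\n";
```

Because `\W` matches anything other than letters, digits, and underscore, the escaping is deliberately aggressive: the colon, slashes, and dots of the URL are all encoded, which is harmless inside a query-string value.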