How to Search List Archives the Easy Way
Hugh Jarvis (ANTOWNER@UBVM.BITNET)
Thu, 26 May 1994 13:32:42 EDT
database, such as a list archive. It is drawn from two files,
both available from the listserver:
zenhause v2n2 available from listserv@kentvm.kent.edu
listdb memo available from listserv@searn.bitnet
by sending the command "info database"
First you will need software. You can get it from the
listserver. Then, every time you begin an interactive search,
you start the session with the command shown below. For
batch searching, you simply send a file, with the commands as the
only text, to the listserver. Most of what is described below is
written from the perspective of interactive searching, but will
also apply to batch searching. Send all listserver commands to
listserv@ubvm.cc.buffalo.edu or listserv@ubvm.bitnet
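As a rough illustration of batch searching, the sketch below assembles the
text of a batch file in Python. It assumes, as described above, that the
message holds only the commands; a particular listserver may expect extra
framing that is not covered here. The function name build_batch_job and its
parameters are hypothetical.

```python
def build_batch_job(target, lists, want_print=False):
    """Return the body of a batch search message as plain text.

    target     -- the search target, e.g. "*" or a word
    lists      -- one or more list (database) names
    want_print -- add a "print" line to retrieve the full hits
    """
    # One search line naming every database, then an index of the hits.
    lines = [f"search {target} in {' '.join(lists)}", "index"]
    if want_print:
        lines.append("print")
    # Commands are the only text in the message, one per line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_batch_job("*", ["anthro-l", "arch-l"]), end="")
```

The resulting text would be mailed to one of the listserver addresses given
above.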
On the CMS system, there are two command files. Send
get ldbase exec
get lsviucv module
and then start each interactive session with the command
ldbase
On the VMS system, send the message
get ldbase com
and then start each interactive session with
@ldbase
A search instructs the listserver to look through all files in
a particular database for a particular set of parameters. The
basic search command is a line like one of these:
search * in listname
search _string_ in listname
In searches, "*" targets the whole database, while the "string" is
a target word, series of words, or a fixed phrase. Words can be
any character string. However, if you want to include spaces or
characters that the listserver treats as special, or to match a
fixed case, you will need to follow the procedures noted below.
There are several commands you will use while searching. They
are "search", "index", "print", and "sendback". "search" alone
will target part or all of a database, but will not do anything
with it. Once you have selected your target, or "hits", you then
run "index" and "print" (or "sendback print").
You can search more than one database at a time, e.g.
search * in anthro-l arch-l
The "search" command targets a word, a series of words, or a
fixed phrase. If you search for a series of words, the search is
made independent of word order. Use single quotes (' ') to
maintain word order, and double quotes (" ") to maintain order and
case. Searches produce one or more "hits", which can then be
operated on further. For example:
'TEXT' will find TEXT, Text, text, or teXt
"TEXT" will find only TEXT
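The quoting rules can be pictured with a small Python sketch. This is only
a model of the behaviour described above, not the listserver's own code: it
assumes plain substring matching, and the function name matches is
hypothetical.

```python
def matches(target, text):
    """Model how a search target is compared against a piece of text."""
    if len(target) >= 2 and target[0] == target[-1] == '"':
        # Double quotes: word order and case must both match.
        return target[1:-1] in text
    if len(target) >= 2 and target[0] == target[-1] == "'":
        # Single quotes: word order must match, case is ignored.
        return target[1:-1].lower() in text.lower()
    # No quotes: every word must appear, in any order, ignoring case.
    lowered = text.lower()
    return all(word.lower() in lowered for word in target.split())


if __name__ == "__main__":
    print(matches("'TEXT'", "some teXt here"))
    print(matches('"TEXT"', "some teXt here"))
```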
Certain words and symbols are reserved for the listserver, so
they must be enclosed in quotes, either single or double. Some of
these special words and symbols are:
in, from, since, to, until, where, with, *()<>=|&^/,
or a series of blanks
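If you build search strings with a program, a helper along these lines can
add the quotes for you. It is a minimal sketch in Python covering only the
words and symbols listed above (the list is described as partial); the name
quote_if_needed is hypothetical, and the comma is treated here as a list
separator rather than a special character, which is an assumption.

```python
# Reserved words and symbols taken from the list above (not exhaustive).
RESERVED_WORDS = {"in", "from", "since", "to", "until", "where", "with"}
SPECIAL_CHARACTERS = set("*()<>=|&^/")


def quote_if_needed(term):
    """Wrap a search term in single quotes when it would be misread."""
    needs_quotes = (
        term.lower() in RESERVED_WORDS
        or any(ch in SPECIAL_CHARACTERS for ch in term)
        or " " in term
    )
    # Single quotes keep word order but still ignore case.
    return f"'{term}'" if needs_quotes else term


if __name__ == "__main__":
    for term in ["from", "chair", "input/output", "stone tools"]:
        print(quote_if_needed(term))
```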
Parentheses and Boolean keywords can be used to ensure you are
targeting what you want. Remember that the logic used is
mathematical, and not regular English syntax. When no quotes are
used, "and" is implicit between words in a target string. Some
equivalent examples:
(wooden chair) or (plastic chair not blue)
chair (wooden or (plastic not blue))
chair (wooden or (plastic but not blue))
chair and (wooden or (plastic and not blue))
Other logical operators you may need are listed below, grouped as
synonyms:
=, is; &, and, but; ^=, <>, is not; |, /, or; ^, not
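To see why the example expressions are equivalent, you can check every
combination of true and false. The Python sketch below does this for the
first and last forms, reading the first as "(wooden and chair) or (plastic
and chair and not blue)" with the implicit "and" written out. The function
names are hypothetical.

```python
from itertools import product


def first_form(chair, wooden, plastic, blue):
    # (wooden chair) or (plastic chair not blue)
    return (wooden and chair) or (plastic and chair and not blue)


def last_form(chair, wooden, plastic, blue):
    # chair and (wooden or (plastic and not blue))
    return chair and (wooden or (plastic and not blue))


def equivalent(f, g, n_args=4):
    """Compare two expressions over every true/false combination."""
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=n_args)
    )


if __name__ == "__main__":
    print(equivalent(first_form, last_form))
```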
You can use dates in your searches. A complete date has the
following syntax "31 July 1985 23:59:59", but abbreviations and
partial dates are possible too. With the date, you can use
keywords such as "since", "until", and "from ... to". Examples:
since dec since 12/28
since today 11:53 since dec 85
until dec 85 from 12 aug to dec
Example searches:
search string in listname until _date_
search string in listname since _date_
search string in listname from _date_ to _date_
Date defaults will set "year" to the current year, and the rest
to include as much material as possible. For example, if you use
until july, it will use 31 july 23:59:59. If you use since july,
it will use 1 july 00:00:00. Months can be shortened to any
length, and ambiguities will be resolved to the first possibility
(e.g. j will be set as january, not june etc.). Case does not
matter.
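The default rules for partial dates can be sketched in Python as follows.
This only models the behaviour described above for a bare month name; the
function names are hypothetical, and the year is passed in explicitly
instead of being read from the clock so the result is repeatable.

```python
import calendar
from datetime import datetime

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def resolve_month(prefix):
    """Return the number of the first month whose name starts with prefix."""
    wanted = prefix.lower()  # case does not matter
    for number, name in enumerate(MONTHS, start=1):
        if name.startswith(wanted):
            return number
    raise ValueError(f"no month starts with {prefix!r}")


def default_bound(keyword, month, year):
    """Fill in a bare month so that as much material as possible is kept."""
    number = resolve_month(month)
    if keyword.lower() == "since":
        # Earliest moment of the month.
        return datetime(year, number, 1, 0, 0, 0)
    if keyword.lower() == "until":
        # Latest moment of the month.
        last_day = calendar.monthrange(year, number)[1]
        return datetime(year, number, last_day, 23, 59, 59)
    raise ValueError(f"unsupported keyword {keyword!r}")


if __name__ == "__main__":
    print(default_bound("until", "july", 1994))
    print(default_bound("since", "july", 1994))
```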
You can aim your search at list header keywords, such as the
"subject", "from", and "reply-to" lines. To do so, use "where"
plus the keyword. You may also use parentheses for clarity, as
well as "is", "contains", "all", arithmetical operators, and more.
For example:
search * in anthro-l where sender is John Doe
search * in anthro-l where sender contains John Doe
search * in anthro-l where sender does not contain John Doe
Phonetic parameters are possible, and can be useful when you are
not sure of the exact spelling, or to cope with typos. Use "sounds
like" or "does not sound like". (This facility is set up as rather
quick and dirty to reduce computer time, so the quality of results
may be variable. It also works best for English.) An example is:
search * in anthro-l where subject sounds like arkeologee
If you need more than one line for a command sequence, you must
end the line with a dash. But you should try to keep quoted target
strings on one line so extra blanks do not get added by mistake.
For example:
search archaeology -
in anthro-l -
where sender sounds like (jon dough) -
since jul 1985
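The effect of the trailing dash can be pictured with a short Python sketch
that joins continued lines back into single commands. It is an illustration
of the rule above, not the listserver's own parser, and the name
join_continuations is hypothetical.

```python
def join_continuations(lines):
    """Join lines that end with a dash into complete commands."""
    commands = []
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("-"):
            # Drop the dash and keep collecting this command.
            parts.append(line[:-1].rstrip())
        else:
            parts.append(line)
            commands.append(" ".join(parts))
            parts = []
    if parts:
        # The last line ended with a dash but nothing followed it.
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    example = [
        "search archaeology -",
        "in anthro-l -",
        "where sender sounds like (jon dough) -",
        "since jul 1985",
    ]
    for command in join_continuations(example):
        print(command)
```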
As a general strategy, it is better to start with a broad search
and then narrow it down. Once you have reduced the number of hits,
you cannot get back any that you have excluded without
starting from scratch. Once you run the first search, you can then
keep slimming the returns down until you have what you want,
whereupon you can retrieve your output.
The "index" command retrieves a list of your "hits" with
associated info, such as a "hit number", "from", "date", and
"subject". Index follows the search command on the next line.
While a whole bunch of parameters are possible, the most basic
syntax is simply
index
The "print" command retrieves the full contents of your hits;
i.e. the whole files. In interactive mode, you are limited to 30
lines maximum from a remote terminal. (In batch mode, there is no
limit.) "print" follows the search command on a fresh line. In
interactive mode, you will probably do a search, an index, then a
print once you see what is in the index. However, it is usually
better to leave interactive mode and then send the full sequence
again, with the print command, in batch mode, or, use the
"sendback" command described in the next section. (This reduces
the load on the net. Interactive mode takes up a lot of cpu time.)
With "print", you can list several hits in a row. For example:
print 10 106 (the #'s are provided by the index data)
The "sendback" command tells the listserver to send you your
results as a file instead of to your screen. "sendback" is only
used in interactive mode, and, once you use it, you can end your
session. Use "sendback" to retrieve any large blocks of data which
will exceed the maximum allowed for remote users. Sendback
immediately precedes the command whose output you want to receive.
Examples are:
sendback print
sendback index
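The choice between "print" and "sendback print" can be summed up in a small
Python sketch. The 30-line limit comes from the description of "print"
above; the function name and the idea of passing in an estimated line count
are assumptions made for illustration.

```python
INTERACTIVE_LINE_LIMIT = 30  # maximum lines shown on a remote terminal


def retrieval_command(hits, estimated_lines, interactive=True):
    """Pick the command used to retrieve the given hit numbers."""
    command = "print" + "".join(f" {number}" for number in hits)
    if interactive and estimated_lines > INTERACTIVE_LINE_LIMIT:
        # Too long for the screen: have the result sent back as a file.
        return "sendback " + command
    return command


if __name__ == "__main__":
    print(retrieval_command([10, 106], estimated_lines=250))
```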
Comments and corrections on this information are, as usual,
welcomed. Send them to me at
antowner@ubvm.cc.buffalo.edu or antowner@ubvm.bitnet
_________________ANTHRO-L LISTOWNER_________________
ANTOWNER@UBVM