HRAF

Barry Lewis (blewis@UX1.CSO.UIUC.EDU)
Wed, 27 Apr 1994 15:16:29 -0500

My thanks to Marlene Martin, HRAF bibliographer and Librarian, and my
colleague Tom Riley, the new Chairman of the HRAF Board of Directors for
responding to my posting concerning the HRAF self-interest group. I am
somewhat disappointed to see that the only two comments are from persons
who are HRAF personnel or effectively so. Although they each raise
excellent points, they do what I'd do in their place and take issue with my
"vision" on e-text developments over the short run rather than consider
that there may be multiple valid perspectives on these developments. On
the other hand, to agree with a scenario that is at odds with the direction
in which HRAF has "bet the ranch" would be a little foolish. Whatever...we
see different futures for the growth of text-based anthropological
archives.

To touch briefly on several points raised by Ms. Martin. First, many
thanks for the INFORMATION TECHNOLOGY AND LIBRARIES reference. I'll be sure
to read it. While we're trading references, she and others might find the
1994 Conference on Computing for the Social Sciences of interest this year.
The theme is "Information Society: Superhighways or Gridlock?"; it'll be
held at the University of Maryland at College Park on May 31-June 3 (e-mail
CSS($@bss1.umd.edu for details).

Second, on the issue of scanning texts, I'm sorry that Ms. Martin thought I
was talking about low-end hand-held scanners. I wasn't. (Beats me what
they're good for except to make it easier for the manufacturer to extract
money from the wallets of some consumers). The scanner issue is one of
those topics in which we all could benefit from HRAF, Inc.'s experience.
Since Ms. Martin comes across as being so vehemently opposed to scanning, I
asked Michael Hart, Executive Director of Project Gutenberg, a major e-text
conversion and distribution (for free) project, his views on scanning.
When I asked if most Project Gutenberg e-texts were keyed in by hand or
scanned in and then massaged into postable form, Hart responded "We have
about 500 volunteers, and we don't really keep track of how they input
their etexts, but I would say at least half are scanned, tho many of the
professionals now have them typed overseas." When I further asked about
the viability of scanning for getting texts online, he said, "Scanning is
getting better and better, and even including the proofing, is much easier
than typing. If your proofers are terrible, then they won't catch enough
of the errors to equal the quality of the typing. However, the great
advantage of etext is that it only takes a minute to correct an error, so
they will eventually surpass all paper texts in accuracy."

The point of my discussion of the scanning issue is to assert, once again,
that volunteers with the necessary hardware and software _do_ have a
potential role to play in bringing more anthropological primary texts and
data archives online where we can all use them at low or no cost. Whether or
not this is a viable alternative remains to be demonstrated, but what good
comes from denying its viability out of hand?

Third, Ms. Martin writes:

>HRAF differs significantly from the Ad-Hoc E-Texts in that what we basically
>do at HRAF and have always done is to add value to texts. This value that we
>add consists most significantly in the selection of texts and the application
>at a very specific level, i.e., each paragraph of the text, of analytic,
>controlled-vocabulary indexing. I am referring to the use of the OUTLINE OF
>CULTURAL MATERIALS (OCM) and the OUTLINE OF WORLD CULTURES (OWC). The
>development and application of these indexing systems is a significant task
>and much of the cost of the HRAF archive subscription pays for these value-
>added features.

Sounds good at first glance, but, as a user of this archive, when I look
closer I see in the OCM, the OWC, and the data archive itself, a strong,
inescapable commitment to design principles that Murdock articulated in the
1930s, a commitment that is necessarily handicapped by a 1930s notion of
culture, technology, and research. This is not to say that many of the
theoretical and methodological criticisms leveled at HRAF in the past have
not been fixed or at least mitigated to some extent. They have. From my
perspective, however, the fixes and mitigations have come less from
inspired work by HRAF researchers than from the simple availability of
coded data, advances in multivariate data analysis, and the advent of cheap
computing. Given a choice, I'd rather have the HRAF e-texts without the
burden of Murdock's legacy.

Finally, I'm glad to hear that part of the BAEB is available in cd-rom
format. Nevertheless, $695 a pop just isn't good enough. I still hope to
see these records and others like them (there's also the Ohio Valley-Great
Lakes Ethnohistory archive at Indiana University; it would be a nice one)
available online or at low or no cost on cd-rom.

For the sake of the "silent majority" who may be following this thread,
I'll clarify where I'm coming from on the "HRAF interest group" question.
First, I am a HRAF supporter, not an opponent (hard to believe, isn't it?).
Second, I'm a regular user of HRAF archive materials in paper, microfiche,
and cd-rom form. Third, I'm a member of the Society for Cross-Cultural
Research. Fourth, one of my main professional interests (teaching and
research) is research methods. In addition to my existing mix of UIUC
methods classes, I also introduced and taught an upper-level
undergrad/graduate seminar here this past semester entitled "Cross-Cultural
Research using the HRAF Archive". To my knowledge this was the first UIUC
course to be focused specifically on this data archive. I'm now drafting a
proposal to put it on the books as a regular course offering -- but
expanded to text-based data archives in general, including HRAF. Finally,
I make certain that the UIUC social science librarians know that, in spite
of its cost, inherent data biases, archaic format, and the 20-30 or so
filing cabinets stuffed full of little paper slips that intimidate the hell
out of most prospective users, we definitely need HRAF. Other UIUC
researchers must be making the same case for keeping it. Otherwise, we'd
have parted ways with HRAF and used the several thousands of dollars that
it costs each year from the serial publications budget to get more social
science journals. (Regardless, we may be fighting a losing battle. Unless
the abysmally low user load for HRAF materials turns around soon at UIUC, I
suspect it will be dropped).

Does, however, the future of text-based data archives rest mainly in the
hands of the value-added resellers? I sure hope not. HRAF belittles the
potential contributions of individuals and volunteers to get texts into
machine-readable form, but such efforts can do a lot to get research
materials to people who can make use of them. Consider, for example, the
important roles played by volunteers in the Internet.

To repeat and rephrase a key assertion from my initial posting: the
discipline needs leadership in the area of developing, maintaining,
disseminating, and analyzing large scale text databases. The HRAF staff is
an obvious potential source of this leadership, but, given the
inflexibility I see in Ms. Martin's response, I'm apparently wrong. It
doesn't look like they're interested in anything other than their project.

If any of the AAA leadership are following this thread, please consider the
following:

1. Create an AAA interest group that strives to discover innovative,
feasible ways to get more text-based data archives online at low or no
cost.

2. Create one that helps to define development priorities and develops
standards to ensure that the maximum number of researchers can get these
data regardless of their hardware platforms and software tools.

3. Do _not_ create a _HRAF_ interest group. To do so will serve only the
needs of HRAF, Inc., and a small pool of researchers. The scope of such an
interest group will necessarily (but appropriately) be limited to HRAF's
vision of the future of text-based data archives. There is a bigger need
out there.

What do the rest of you think? The dialog to this point has been HRAF and
me needling each other. Your opinions could help to influence the
direction of AAA leadership in data archive development.