GENIRE-L Archives

From: Don Moody <>
Subject: Re: Dear Dr. Doody - search engines
Date: Mon, 3 Mar 2003 10:25:35 +0000
References: <001c01c2e120$e3106220$f18491c2@computer>


In message <001c01c2e120$e3106220$>, Jane Lyons
<> writes
>Dear Dr. Doody,
>
>As I have already said - thank you for your constructive criticism of my
>lists - and in that I assume my web site also.
I wasn't criticising either. I was writing on generalities of
information retrieval which apply regardless of the subject, the medium,
and the 'lister'.

>
>I would appreciate any guidance if you can give it to me - please and
>thanks.
Information Science is a subject in its own right and has an extensive
literature. Librarians tend to know rather a lot about it, and have done
so for centuries. There are librarians and a good library at TCD.


>
>It is my belief that all search engines will only return the exact spelling
>of any word to you
Not true. The spell-checker in a common word-processor is an example of
a search engine where putting one spelling in will result in suggestions
of other spellings. Usually, 'spell-checking' in a search engine is
fairly unsophisticated, but all search engines I've come across, all the
way back to valves, mercury delay lines, and teleprinters, have had the
facility to use the * to mean 'any letter(s)'.
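
As a minimal sketch of the trailing-wildcard behaviour described above,
the following uses the fnmatch module from Python's standard library.
The function name wildcard_search and the surname list are invented for
illustration; real search engines implement wildcards in their own ways.

```python
from fnmatch import fnmatchcase

def wildcard_search(pattern, names):
    """Return the names matching a pattern in which * means 'any letter(s)'."""
    pattern = pattern.upper()
    # Compare in upper case so that 'Bain' and 'BAIN' are treated alike.
    return [name for name in names if fnmatchcase(name.upper(), pattern)]

# Hypothetical surname list, for illustration only.
surnames = ["Bain", "Baine", "Bayne", "Whitehead", "Weisskopf"]

print(wildcard_search("BAIN*", surnames))
```

Under these assumptions BAIN* picks up Bain and Baine but not Bayne,
because the wildcard only stands in for letters after the fixed prefix.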

However there is a problem more common in genealogy than in most other
subjects. BAIN* will find BAIN and BAINE. It will not find WHITEHEAD
(English), WEISSKOPF (German), or any other translation in any other
language of the fundamental concept of 'bheinn'. Search engines do not
(yet?) have the sophistication to search for meanings expressed in very
different ways in different circumstances. This is the bit where human
input still has to occur. Incidentally, chemistry and medicine are two
other fields in which one thing can have a plethora of names; and that
always has been part of the difficulty of those subjects. Difficulty
squared for those of us who were drug researchers working on the
interface between those subjects.
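
One way to picture the 'human input' mentioned above is a hand-compiled
table of equivalent names that is consulted before the search is run.
The sketch below is only an assumption about how such a table might be
used; CONCEPT_GROUPS and expand_query are hypothetical names, and the
single group is taken from the example in this message.

```python
# Hand-curated groups of surnames said to share one underlying meaning.
# The one group here comes from the example in this message; a real table
# would have to be compiled by people who know the names and languages.
CONCEPT_GROUPS = [
    {"BAIN", "BAINE", "WHITEHEAD", "WEISSKOPF"},
]

def expand_query(surname):
    """Return the surname plus any hand-listed equivalents, sorted."""
    key = surname.upper()
    for group in CONCEPT_GROUPS:
        if key in group:
            return sorted(group)
    # No curated group knows this name, so search for it alone.
    return [key]

print(expand_query("Bain"))
```

The program does no translating itself. It can only return what a person
has already put in the table.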

>Each page with names - that's indexed by surname and first
>name............in this way - anyone who goes through that page may possibly
>spot the surnames that are phonetically similar to theirs - e.g. Hiland,
>Highland, Hyland...........
One point I made was that 'who goes through that page' is likely to be
minute in number if the search engine does not cause the searched-for
term to be highlighted in some way in a long list. I gave an example of
where this is not done. Some lists on GENUKI might as well not be there.
There is no point in telling ordinary people that, say, a lady's name is
present in a list of, say, 10,000 marriages indexed by the men's names.
Fast scan is impossible (and usually made worse by incompetent choice of
typeface and size). Whereas highlighting that one name enables it to be
found as fast as the page can be scrolled without reading individual
words.
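
The highlighting idea can be sketched in a few lines. This is an
illustration only, not a description of how any particular site works:
the function highlight, the >>> marker, and the marriage entries are all
made up for the example.

```python
import re

def highlight(term, lines, marker=">>>"):
    """Flag each line containing the term and capitalise the match itself."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    out = []
    for line in lines:
        if pattern.search(line):
            # Upper-case the matched text and prefix the line with a marker.
            line = marker + " " + pattern.sub(lambda m: m.group(0).upper(), line)
        out.append(line)
    return out

# Hypothetical marriage entries, indexed by the man's name.
marriages = [
    "Adams, John = Mary Kelly",
    "Byrne, Patrick = Honora Hyland",
    "Carroll, Thomas = Ellen Doyle",
]

for line in highlight("Hyland", marriages):
    print(line)
```

The flagged line stands out while scrolling, even though the list is
still sorted by the men's names.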

>
>I had assumed that I had in some ways managed to index my site to such an
>extent, that if any person came to one of my county pages then all the most
>they had to do was go through each page individually and open the page, go
>to their browser search facility and key in the word they were looking for -
>if that word was on my page then they would find it without having to read
>through the whole document...............
See above.

>Yet, you have pointed out to me that all my pages and all my work, this
>search engine that I have on my web site are of absolutely no value at
>all..............
I did not say that. You criticised the people who criticised the act of
listing. I pointed out, from general information science theory, that -
in effect - both you and your critics were 'wrong'. The 'wrong' is to
focus on the reformatting of information as a worthwhile or not
worthwhile activity in itself. Reformatting is, and always will be,
valueless unless it is done to suit the ordinary questions likely to be
asked by the ordinary users of the data.

A long time ago a famous company had 320 filing cabinet drawers full of
reports of new chemical compounds made anywhere in their worldwide
laboratories. They realised the data store was not being used. Chemists
were finding ways to make 'new' compounds which had been made already
within the company. They asked me whether microfilming the records, and
distributing copies of the microfilm to each lab round the world would
liberate value from this enormous database. I asked them how the
information was filed. 'By date of the report recording the synthesis,
of course.' I handed the MD a box of matches and suggested he burn all
the paper and flog the 80 filing cabinets as secondhand furniture if he
wanted to recover any value. Why? No chemist I've ever known or heard of
searches for a specific compound by the date of its first synthesis.
They all search by chemical structure, or even substructure. Dates may
be important to patent lawyers but they are of no interest to chemists
who want a sample of a compound. The famous company refused to change
its system. The system does not exist any more, because the company
does not exist any more. Effectively it went bust. There is a case study
of the wrongness of focussing on input and not thinking about output
first.
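
To make the 'output first' point concrete, here is a small sketch in
which records filed by date are re-indexed by the key people actually
search on. The reports are invented, and a molecular formula is used as
a crude stand-in for a chemical structure; a real structure or
substructure search is far more involved than this.

```python
from collections import defaultdict

# Hypothetical reports, stored in the order they were filed (by date).
reports = [
    ("1958-04-11", "C9H8O4", "Lab A"),
    ("1960-09-30", "C8H9NO2", "Lab B"),
    ("1963-01-17", "C9H8O4", "Lab C"),
]

def index_by_formula(reports):
    """Group the date-ordered reports under the formula they describe."""
    index = defaultdict(list)
    for date, formula, lab in reports:
        index[formula].append((date, lab))
    return index

index = index_by_formula(reports)

# A chemist asks 'has this compound been made before?', not 'what was
# reported on 11 April 1958?'.
print(index["C9H8O4"])
```

With the data keyed this way, the second synthesis of the same compound
shows up next to the first instead of being five years further down the
drawer.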

>Could you please advise me of your URL, as I would like to see how you
>format data.
I don't have a URL. The database I have for disabled and disadvantaged
professional people is never going to be available via a URL. The
database of my family is not going to be available via URL either. It
will be on CDs restricted to some bloodkin. The same reason applies to
both sets of data. Individual people could be hurt, very greatly, if
some facts about them became known. Therefore those facts will be held
off-line until all the participants are dead.

Seeing where you are, and in respect of the family database only, I
invite you to consider the potential hassles and religious intolerances
between rabbis of ultra-orthodox Jewish persuasion and Irish catholic
priests of a charismatic persuasion. Bigotry and hate are alive and
well. Merely to add spice, you can stir in a bunch of hard-line
Presbyterians. Hopefully, in another two or three generations, the
descendants will be so assimilated that they will look back to the
1930s with utter amazement at the sheer idiocy of hate arising out of
different descriptions of one God. That is when my researches will be
published. If the bigotry goes on, the publication will be delayed.

Don

who, by the way, is not likely to be found by a search engine looking
for DOODY.


>
>Thank you.
>
>Jane Lyons, B.Sc. Ph.D. D.E.E.
>Dublin, Ireland
>

--
Dr D P Moody, Ashwood, Exeter Cross, Liverton, Newton Abbot, Devon,
England TQ12 6EY
Tel: +44(0) 1626 821725 Fax: +44(0) 1626 824912

