TMG-L Archives


From: "Mary D. Taffet"
Subject: Re: [TMG] OT NGS Conference
Date: Wed, 02 May 2001 22:56:06 -0400
References: <5.0.2.1.2.20010501144349.00a11290@mail.cwnet.com> <5.0.2.1.2.20010502091857.00a1acd0@mail.cwnet.com>


Patricia,

You and I just might be able to help each other a bit. And I'm copying
Lee on this message.

I'm fairly new to this mailing list, having only acquired TMG in
February, so I don't yet know who the regular contributors to this list
are. Lee, I know you are one of them, but I'm not sure that I really
know what your position is.

Here's the situation:

1) Patricia, you have lots and lots of text files that have things like
names, dates and places in them, along with lots of other information in
what I gather is a rather unstructured manner. You need access to this
data, but it's buried, and getting it out of your files will be rather
painstaking and laborious, with lots of room for error along the way.

2) I have proposed a project for which I just happen to have a need for
data of the very nature that you describe. The project that I proposed
for the GENTECH 2001 Scholarship won, and now I actually have to do the
project! For that I need data. The lack of good data has been a
sticking point for me (well, not a sticking point exactly; I just
haven't gotten started yet). You can read about the project that I
proposed, and now have to do, on the following page:

http://web.syr.edu/~mdtaffet/GENTECH_Scholarship_Proposal.htm


While I understand that your issue is getting data into your program,
the type of approach I have in mind for this project would be very
beneficial to you in your need to extract the data from the text files
that you have accumulated. Extraction of names, dates and places, and
potentially the relationships between them, is what I have proposed to
do for my project.

An index built from the kind of tagging that I want to do could be
manipulated fairly easily to pull out for you the data that you want.
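
To give a rough idea of what such an index might look like, here is a
minimal Python sketch. It is only an illustration under stated
assumptions, not the method from the proposal: it uses two simple
regular expressions in place of real linguistic tagging, and the names
DATE_RE, NAME_RE, build_index and the sample file are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical patterns, for illustration only; real tagging would need
# far more than two regular expressions.
# Dates written like "12 March 1845" or "3 Jun 1870".
DATE_RE = re.compile(
    r"\b(\d{1,2}\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"[a-z]*\.?\s+\d{4})\b"
)
# Two or more capitalized words in a row, treated as a candidate name.
NAME_RE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b")


def build_index(files):
    """Map each tagged string to the (file, line number) places it occurs."""
    index = {"date": defaultdict(list), "name": defaultdict(list)}
    for filename, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in DATE_RE.findall(line):
                index["date"][match].append((filename, lineno))
            for match in NAME_RE.findall(line):
                index["name"][match].append((filename, lineno))
    return index


# Made-up sample text standing in for one of the text files.
sample = {
    "smith.txt": (
        "John Smith was born 12 March 1845.\n"
        "He married Mary Jones on 3 Jun 1870."
    ),
}
index = build_index(sample)
```

Looking up a name in index["name"] then gives every file and line where
that string was tagged, which is the kind of listing that could be
exported for review before anything is entered into a program.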

I realize that my timeframe (completion by January in order to present
my results at GENTECH 2002 in Boston) may not fit yours exactly, but I
think I would be able to provide you with some preliminary results from
your data that would show you the kind of information which could be
extracted.

If you're willing to help me by providing me copies of your text files
(at least some of them), I would be more than happy to help you by
providing you with hopefully useful and usable results in turn.

Would you care to discuss this possibility further?

As a graduate of two Linguistics programs (into language), a former
computer programmer with over 10 years of full-time programming
experience (into computers and programming), a graduate of a Library
Science program (into organization of information & information
retrieval), and an extremely avid genealogy hobbyist, not to mention the
fact that I am currently working in the field of corpus
linguistics/computational linguistics/natural language processing, I am
uniquely qualified to be of assistance to you.

Additionally, I plan to do my dissertation in an area that will be
beneficial to genealogists (among others). I plan to do my dissertation
on a methodology and system that will be useful for determining when two
references to an individual are or are not talking about the same
person, either within a document or across documents.

Assuming that more than one person in your data shares the same name,
it would also make very good data for my dissertation.

-- Mary Taffet
Syracuse University
Ph.D. Student/School of Information Studies
Research Analyst/Center for Natural Language Processing
4-230 Center for Science & Technology
Syracuse, NY 13244-4100
WWW:http://web.syr.edu/~mdtaffet/



maverik wrote:
>
> At 09:14 PM 5/1/01 -0400, you wrote:
> >If you can give us more detail of your situation, some of us here may be
> >able to tell you what we have done in similar situations.
> >
> >----------
> >Lee Hoffman/KY
>
> Hi Lee, I have 3000+ original text files that have been created over the
> past 25 years by the family association. There are probably close to 40 to
> 50 thousand names in these files. I started moving these text files into
> TMG about 2 years ago and have only made a dent in the data. What I would
> like to do is be able to have the members help me by putting the basic
> data, name, birth, place, death, place, father, mother, and spouse into an
> Excel spreadsheet and be able to bring that into TMG.
>
> I would then be able to search for the names we have and fill in the data
> as needed and as time permits. Over the past 2 years data has continued to
> come in and the text files are now approaching 5000 and I feel like I am
> spinning my wheels.
>
> I know that they could input into TMG. However, I would lose consistency of
> data input and source documentation. I have been working with a couple of
> members trying to set up guidelines for data entry and source entry
> according to Mills. It isn't easy to do when one lives on the west coast
> and the other lives on the east coast and we are all very busy people.
>
> What I really need at this point is to know all of the names and basic data
> about the people in the text files.
>
> I know that I can get help from the members to input into Excel. Most of
> the members have it and can be brought up to speed on how to use it. I can
> set up a template for inputting the data. Most of them have their own
> preferences for a genealogy program so I couldn't expect them to buy
> another genealogy program just to do this work, no matter how much I love it.
>
> I would welcome any suggestions on how to make it easier and quicker to get
> the basics into TMG.
>
> Patricia

