Using GeoNames for Genealogical Research

We’ve talked about the importance of normalizing place names in your family tree. You may find the same state name written “Massachusetts”, “Massachusetts, USA”, “Mass.”, or just “MA”, and that’s just the beginning. Within states there are counties, boroughs, cities, towns, and local geographic references. The same places will frequently be referred to differently in different records, and it is important to know when two records refer to the same geographic location. There are several reasons for this, the most obvious being that you need to know whether the records refer to the same person. As an example, I have a grand aunt who seemed, according to some records, to have died in Florida. I was pretty sure this was not the case, but I thought there was a small possibility that she had moved there late in life without my knowing. It turns out that another woman with the same first name and (married) last name, born on the very same day, did die and was buried in Florida. But my grand aunt was buried in Utah, just as I suspected. In this case, the place names were so different that Citrus, Florida stuck out like a sore thumb, and I didn’t miss it. But it could have been different. What if my aunt had been buried in a different part of Florida, possibly somewhere with a name I didn’t know? That’s the kind of discrepancy that would be easy to miss, and I could very well have added quite a bit more inaccurate data to my family tree before the error was discovered.

Software can be very helpful in working with place names, but it can also make us vulnerable to other errors. As a simple example, I once encountered references to a colonial ancestor living in Germany. How could that be? In case you haven’t already guessed, she lived in Delaware, and someone recorded it as DE, the standard U.S. state abbreviation for Delaware. But DE is also the international two-letter code for Germany (Deutschland in German). International standards (such as ISO 3166 for country codes) are indispensable in representing place names unambiguously and, if you’re like me, you use state abbreviations without giving them much thought. Unfortunately, computer applications often simply digitize paper forms, and since people have been writing addresses on one or a few lines for ages, computer programs tend to provide so-called free text fields for place names. And that’s where the trouble starts. In order to compare place names, software needs to parse these fields into their constituent components. This is often done heuristically, so “Dover, DE” will be correctly interpreted as Dover, Delaware, and “Hamburg, DE” will be correctly interpreted as Hamburg, Germany. But in this case, something went wrong. Most likely, the software was unable to identify the name of the town or settlement after consulting a geographic database, so it fell back on the interpretation of DE as referring to Germany. The moral of the story is that you should always manually review place names before storing them in your family tree or database.
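To make the heuristic concrete, here is a minimal sketch in Python of how a program might interpret a “Town, CODE” string. The lookup tables and the function name are hypothetical and tiny; a real application would consult a full gazetteer. Unlike the software in my story, this sketch returns every possible reading when it cannot confirm the town, so that a person can review the ambiguity instead of silently ending up in Germany.

```python
# Hypothetical, deliberately tiny lookup tables (assumptions for illustration).
US_STATES = {"DE": "Delaware", "MA": "Massachusetts", "FL": "Florida", "UT": "Utah"}
COUNTRIES = {"DE": "Germany", "GB": "United Kingdom"}
KNOWN_TOWNS = {("Dover", "Delaware"), ("Hamburg", "Germany")}


def interpret(place):
    """Return the possible (town, region) readings of a 'Town, CODE' string."""
    town, _, code = place.partition(",")
    town, code = town.strip(), code.strip().upper()

    # Collect every reading of the code: as a U.S. state and as a country.
    candidates = []
    for table in (US_STATES, COUNTRIES):
        region = table.get(code)
        if region is not None:
            candidates.append((town, region))

    # Keep only the readings that the (toy) gazetteer can confirm.
    confirmed = [c for c in candidates if c in KNOWN_TOWNS]

    # One confirmed reading: use it. Otherwise return all readings
    # so the ambiguity is visible and can be reviewed manually.
    return confirmed if len(confirmed) == 1 else candidates


print(interpret("Dover, DE"))
print(interpret("Hamburg, DE"))
print(interpret("Lewes, DE"))
```

The last call is the interesting one: the town is not in the toy gazetteer, so both readings come back and the decision is left to you.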

Of course, there is a problem: there are a lot of place names, and no matter how extensive our geographic knowledge, we are likely to encounter names we don’t recognize. Worse, we may think a place name is correct or complete when, in fact, it is not. This is where tool support comes in. Popular genealogy applications such as Family Tree Maker include geographic databases and tools that allow you to validate and correct place names. But what if you don’t use one of these tools? GeoNames is an open source database (licensed under the Creative Commons 3.0 Attribution license) that includes over eight million records, and it is freely available on the web. You can use the web-based interface, download the database, or make use of the web service interface. Most of the time, you’ll probably want to use the form on the web site to look up place names using your browser. The other options are primarily of interest to application developers.
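If you are one of those application developers, a lookup through the web service might look roughly like the following Python sketch. The endpoint (searchJSON), the parameter names, and the response fields reflect my understanding of the GeoNames web service documentation, and you need your own registered GeoNames user name; treat all of these as assumptions to check against the current documentation. The function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the GeoNames JSON search service.
BASE_URL = "http://api.geonames.org/searchJSON"


def build_search_url(name, country_code, username, max_rows=10):
    """Build the query URL for a place-name search limited to one country."""
    params = {
        "q": name,
        "country": country_code,  # ISO 3166 two-letter code, e.g. "GB"
        "maxRows": max_rows,
        "username": username,     # your own GeoNames account name
    }
    return BASE_URL + "?" + urlencode(params)


def search(name, country_code, username):
    """Run the search (requires network access) and return simple tuples."""
    with urlopen(build_search_url(name, country_code, username)) as response:
        payload = json.load(response)
    # Field names below are assumed from the GeoNames JSON format.
    return [
        (place.get("name"), place.get("fcodeName"), place.get("lat"), place.get("lng"))
        for place in payload.get("geonames", [])
    ]


# Building the URL needs no network access; "your_username" is a placeholder.
print(build_search_url("Adwick le Street", "GB", "your_username"))
```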

So, how does it work? Let’s suppose that the place name Adwick Le Street, Yorkshire, England is unfamiliar to us, or we are unsure it is spelled correctly. Head over to GeoNames at http://www.geonames.org, type “Adwick Le Street” in the search box, and select “United Kingdom” from the drop-down box to the right of it. Press Search. You will see something like this:

2 records found for “Adwick Le Street”

   Name                            Country                    Feature class     Latitude        Longitude
1  Adwick le Street                United Kingdom, England    populated place   N 53° 34′ 14″   W 1° 11′ 4″
                                   Doncaster > Brodsworth
2  Adwick le Street Castle Hills   United Kingdom, England    castle            N 53° 33′ 14″   W 1° 10′ 9″
                                   Doncaster

In this case, there is no need to use the advanced search option. If you like, you can click on the hyperlink to see Adwick Le Street on a map. This can help to resolve apparent ambiguities. For example, in the case of my ancestor John Woodhouse, I had seen him described as living in Doncaster as well as Adwick Le Street. As it happens, Doncaster is the nearest town. That tells me that I’m not looking at two different places (well, not distant ones, anyway), and I don’t have to worry about having made a mistake. But if one source said he was born in London and another in Doncaster, I would have a problem, and would need to do further research to resolve the ambiguity.
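If you prefer a number to a glance at the map, the coordinates in the search results can be turned into a distance. The sketch below converts degrees, minutes and seconds to decimal degrees and applies the standard haversine formula. The coordinates for Adwick le Street come from the search results above; the coordinates for Doncaster are my own rough approximation, not a GeoNames value, and the function names are just illustrative.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers using the haversine formula."""
    radius_km = 6371.0  # mean radius of the Earth
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# From the search results: N 53° 34′ 14″, W 1° 11′ 4″
adwick = (dms_to_decimal(53, 34, 14, "N"), dms_to_decimal(1, 11, 4, "W"))
# Approximate position of Doncaster (an assumption for illustration).
doncaster = (53.52, -1.13)

print("Distance: %.1f km" % distance_km(*adwick, *doncaster))
```

A result of only a few kilometers supports treating the two descriptions as consistent; a result of hundreds of kilometers would be a signal to do more research.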

Who’s afraid of GEDCOM?

If you’ve spent any time using computer applications in genealogy, including web-based applications, you will probably have heard of a data format known as Genealogical Data Communication, or just GEDCOM. It is a format developed by the Church of Jesus Christ of Latter-day Saints (often called the LDS or Mormon church), but it is available for anyone to use, and for this reason, it is supported by pretty much all genealogical software. There’s a good reason for this, too. Even if you always use the same application to do your work, there will likely come a time when you want to share data with someone else, and if you do switch to another program, you will need a vendor-neutral way of storing your data. As of today, GEDCOM 5.5 is the only format to have gained sufficient traction to work for this purpose.

But if GEDCOM is so great, why don’t applications just use it as their standard data format? There are a few reasons for this. First of all, GEDCOM is a text-based format that is designed for relatively straightforward representation of data. It is not designed for efficient storage and manipulation of data. In other words, it is good for moving data from one application to another, but it doesn’t provide efficient indexing or other features you might expect in a format meant to support frequent updates. Nor does it provide the flexibility you might want in areas such as internationalization and representation of complex relationships. To put it simply, it is primarily a submission format, one that provides a standard way of uploading data to FamilySearch.org.

But how does it work? Regardless of what software you use (or no software at all), you are probably intuitively familiar with the basic concepts. Your family tree consists of:

  • Individuals, organized into families
  • Events associated with one or more individuals, such as birth, death or marriage
  • Other facts or attributes, such as name or sex
  • Relationships between people such as parent, child, or sibling
  • Documentation for facts or events

GEDCOM provides a way of representing each of these. To see how, let’s look at an excerpt from an actual GEDCOM file:

0 @I1@ INDI
1 NAME Matthew /Cooper/
1 SEX M
1 BIRT
2 DATE 12 NOV 1925
2 PLAC Pittsburgh, Allegheny, Pennsylvania
1 DEAT
2 DATE 03 FEB 1976
2 PLAC Cupertino, Santa Clara, California
1 FAMC @F1@

This is a representation of information about a single person. The digit at the beginning of each line is a level in a hierarchy. The individual appears at level 0, his name at level 1, and for his birth and death, the date and place occur at level 2. On the first line, INDI tells us that we are about to see a representation of an individual (as opposed to a family) and @I1@ is an index that can be used to refer to that individual elsewhere in the file. These indices always occur between “at” signs. The final line is a pointer to the family structure in which Matthew Cooper is a child. Before moving on, I should note that Matthew Cooper’s surname appears between slashes. This is done so that names like de Silva will be treated as a unit. Other data formats split the name into multiple fields (e.g., surname and given names), but GEDCOM does not do this. It should be noted that this is another weakness of GEDCOM: it fares poorly in treating naming conventions used in other languages or other parts of the world in a consistent manner. But my intent here is not to criticize GEDCOM so much as explain how it works.
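To show how little machinery is needed to read these lines, here is a minimal Python sketch that splits a GEDCOM line into its level, optional index, tag and value, and pulls the surname out from between the slashes. It is only an illustration based on the excerpt above, not a complete GEDCOM parser, and the function names are my own.

```python
import re

# level, optional @INDEX@, tag, optional value
LINE_PATTERN = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, index, tag, value)."""
    match = LINE_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a GEDCOM line: %r" % line)
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def surname_of(name_value):
    """Return the part of a NAME value that appears between slashes."""
    match = re.search(r"/([^/]*)/", name_value)
    return match.group(1) if match else ""


EXCERPT = """\
0 @I1@ INDI
1 NAME Matthew /Cooper/
1 SEX M
1 BIRT
2 DATE 12 NOV 1925
2 PLAC Pittsburgh, Allegheny, Pennsylvania
1 FAMC @F1@
"""

# Print the excerpt indented by level to make the hierarchy visible.
for line in EXCERPT.splitlines():
    level, xref, tag, value = parse_line(line)
    print("  " * level + tag, value)

print(surname_of("Matthew /Cooper/"))
```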

Let’s press forward:

0 @I9@ INDI
1 NAME Herman /Grimes/
1 SEX M
1 FAMS @F5@
0 @I10@ INDI
1 NAME Priscilla /Richardson/
1 SEX F
1 FAMS @F5@

Here, we have two individuals, Herman Grimes and Priscilla Richardson. Notice that each of them is associated with the same family, but this time using the FAMS tag. As you might expect, this is a pointer to the family in which the given person is a spouse or a parent. The family itself is defined later on in the document as follows:

0 @F5@ FAM
1 HUSB @I9@
1 WIFE @I10@
1 CHIL @I8@
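Following these pointers is straightforward. The Python sketch below loads the records from the two excerpts above into a dictionary keyed by index and then resolves the HUSB, WIFE and CHIL pointers of family @F5@ to names. It is a simplified illustration that only looks at level 0 and level 1 lines, and the function names are hypothetical. Since @I8@ is not part of the excerpt, the sketch simply reports the pointer itself for the child.

```python
GEDCOM_TEXT = """\
0 @I9@ INDI
1 NAME Herman /Grimes/
1 SEX M
1 FAMS @F5@
0 @I10@ INDI
1 NAME Priscilla /Richardson/
1 SEX F
1 FAMS @F5@
0 @F5@ FAM
1 HUSB @I9@
1 WIFE @I10@
1 CHIL @I8@
"""


def load_records(text):
    """Collect level-0 records and their level-1 (tag, value) fields."""
    records = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.strip().split(" ", 2)
        level = int(parts[0])
        if level == 0:
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "fields": []}
                records[parts[1]] = current
            else:
                current = None  # e.g. a header or trailer line
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["fields"].append((parts[1], value))
    return records


def name_of(records, xref):
    """Return a person's name without slashes, or the pointer if unknown."""
    record = records.get(xref)
    if record is not None:
        for tag, value in record["fields"]:
            if tag == "NAME":
                return value.replace("/", "").strip()
    return xref


def family_members(records, family_xref):
    """List (role, name) pairs for the members of one family record."""
    roles = {"HUSB", "WIFE", "CHIL"}
    return [
        (tag, name_of(records, value))
        for tag, value in records[family_xref]["fields"]
        if tag in roles
    ]


for role, name in family_members(load_records(GEDCOM_TEXT), "@F5@"):
    print(role, name)
```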

But what about source citations? If we include a birth certificate for George Cooper, we will find a few extra lines in the INDI record:

0 @I2@ INDI
1 NAME George /Cooper/
2 SOUR @S1@
3 PAGE document number ABC123

and a record for the source itself:

0 @S1@ SOUR
1 TITL Life on Triton birth certificates, 1925
1 NOTE
2 CONC Life on Triton birth certificates, 1925.  TRITON microfilm publication 
2 CONC A1.  TARA Archives and Records Service, 1925.

We can, of course, include other events, either standard ones such as emigration (to Saturn in this example):

1 EMIG
2 DATE 1950
2 PLAC Saturn

or custom events, such as an Invention:

1 EVEN
2 TYPE Invention
2 DATE 04 MAR 1971

I have not covered all the details of the GEDCOM 5.5 standard, nor have I discussed any of the features needed specifically for LDS temple work, but I hope I have given you an idea of how it works, and demonstrated that the basic concepts and constructs are similar to what you find in other applications. There is no reason to feel intimidated by GEDCOM. If you want to learn more, the actual specification is online in a number of places, such as the GEDCOM 5.5.1 specification.

Using task lists to stay organized

Family history research is a lot like detective work. Sometimes, you will be able to quickly find the information you need about a person or a family, but more often than not, you have to work to find the information you’re looking for. In fact, except for the obvious case of vital dates and place names (and they’re not as simple as they appear on the surface), you may not know what you’re looking for until you find it. Instead, in the process of reading through the documents available to you, you find little intriguing details, such as an allusion to a girl trapped in a mine shaft, a brief mention of someone having contracted smallpox or cholera, or Quaker meeting minutes in which a person you are studying is voted out of the meeting. In fact, these are examples of tantalizing details I’ve come across in my own research. Usually, I would have no idea what to do with them at the time, so all I could do was record them and (temporarily) move on. In reporter or detective jargon, these are “leads”: hints or details that can lead us to more information as we chase them down, even if at the start we have no idea how helpful they’ll be. I started to say important, but that’s not really right. If a girl is trapped in a mine shaft, that is important. It may be that we will have a hard time fitting it into a coherent narrative, or even finding any more details than we have. It may be that we can find no corroboration or additional sources, but if we can, then we’ve taken a big step towards writing an interesting chapter of our family history.

Different people will have their own preferred method of organizing leads and preliminary information, and I do not pretend that there is any one best solution for everyone. Instead, I’ll briefly consider a feature of both Family Tree Maker and Ancestry.com: task lists. When you come across a piece of information that you need to investigate further, you need to record that somewhere. At first, you may just be able to remember it, or write it down on a notepad, but as your family tree grows, and as the number of documents and photographs you’ve collected grows, a more systematic approach can really be helpful. In Family Tree Maker, there is a task pane in Plan View, and it is likely the first place you will see a task list.

[Screenshot: task list in FTM3]

There are several things to notice here. First of all, the task has a priority. I set it to low because this is an item I want to come back to when I have time. It’s not keeping me from making progress in other areas, nor is there much of a risk of making mistakes if I don’t investigate this lead right away. It’s an interesting story, one I really want to investigate, but it’s not a vital date or other data element that is central to genealogy. So, by setting the priority to low, I’m not saying it’s unimportant or uninteresting. Rather, I’m setting my own priorities. Next, note that the task is associated with a specific person. You can create general tasks, and this is normally the only type of task you will create in Plan View, but most tasks will be associated with a specific person. This is important because, most of the time, you will move from one person to another while doing research, and you need to be able to keep track of what you want to do when you come back to the person you’re working on. Finally, the task has a creation date, and this can be important. You may wish to filter by task age so that old tasks aren’t simply forgotten. Or, on the other hand, you may write something down and come back to it thinking, “Why did I want to do that?” It may be that a task simply isn’t relevant anymore, and knowing its age can help you decide whether or not to keep it.

If you look at the toolbar above the tasks, you will see several buttons. One allows you to create new tasks. This is the only one that is enabled (not grayed out) because no tasks are selected. There are also buttons that allow you to edit tasks (for example, to change the wording), delete them, remove completed tasks, or apply a filter. A filter is a rule that can be used to select a subset of tasks, making it easier for you to see what you need to focus on. You can also print your task list in Plan View.

Most of the time, though, you will be working on specific people and will want to work with tasks associated with people. In Person View, select a particular person and be sure the Tree tab is selected (not Details). Then what you see will be something like this.

[Screenshot: person tasks in FTM3]

Notice that there are two toolbars. The one on top allows you to select Facts (the birthdates, marriage dates and so on that you usually think of when you think of genealogy), Media (usually photographs and scanned documents), Notes (additional details that you want to keep track of in your family tree; think of these as marginal notes), Web Links (bookmarks for websites that contain further information or are helpful in the context of this person), and Tasks. Since the last of these (Tasks) is selected, we have a secondary toolbar just below it which is very much like the toolbar we just saw in Plan View. It is here that you can create tasks linked to a particular person. You don’t have to go back here to view these tasks, since you see all of them in Plan View (that way you don’t forget!), but you can edit tasks, delete them, or mark them complete here in the same way.

What if you don’t use Family Tree Maker or similar software? You can still maintain task lists using paper or your favorite note management tool. Evernote is quite popular among genealogists; it is device independent and has some nice features for sorting notes. If you want to associate tasks with specific people, you probably want a method of assigning unique identifiers to people, and then you can use that identifier to tag the task. It would take us too far afield to discuss schemes for assigning identifiers to people right now, but if you use a software tool, it will probably do this work for you. There are several numbering schemes you can use, or you can use names and sequence numbers for uniqueness.

A final option worth considering is a general purpose database. It requires more work to set up the database schemas, but you can store information about people using the database management system (DBMS) of your choice: MySQL, PostgreSQL, Microsoft Access, FileMaker Pro, etc. Realistically, though, this is a lot of work, and unless you’re a computer programmer or database administrator, you might not want to take it on. On the other hand, if you want to go this route, there’s a real opportunity to develop a tool that will benefit the genealogical community.
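To give a sense of what the database route involves, here is a minimal sketch using Python and SQLite (chosen only because it needs no server; it is not one of the systems named above). The table and column names are my own invention. Each task has a priority, a creation date and a completed flag, and may be linked to a person by identifier, mirroring the task features described earlier.

```python
import sqlite3

# Hypothetical schema: one table for people, one for research tasks.
SCHEMA = """
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE task (
    id          INTEGER PRIMARY KEY,
    person_id   INTEGER REFERENCES person(id),  -- NULL for a general task
    description TEXT NOT NULL,
    priority    TEXT NOT NULL CHECK (priority IN ('high', 'medium', 'low')),
    created     TEXT NOT NULL,                  -- ISO 8601 date
    completed   INTEGER NOT NULL DEFAULT 0
);
"""


def open_database(path=":memory:"):
    """Create the tables in a new database (in memory by default)."""
    connection = sqlite3.connect(path)
    connection.executescript(SCHEMA)
    return connection


def add_task(connection, description, priority, created, person_id=None):
    """Record a task, optionally linked to a person."""
    connection.execute(
        "INSERT INTO task (person_id, description, priority, created) VALUES (?, ?, ?, ?)",
        (person_id, description, priority, created),
    )


def open_tasks_for(connection, person_id):
    """A simple filter: unfinished tasks for one person, oldest first."""
    rows = connection.execute(
        "SELECT description, priority, created FROM task "
        "WHERE person_id = ? AND completed = 0 ORDER BY created",
        (person_id,),
    )
    return rows.fetchall()


db = open_database()
db.execute("INSERT INTO person (id, name) VALUES (1, 'John Woodhouse')")
add_task(db, "Look for accounts of the girl trapped in the mine shaft", "low", "2015-03-01", person_id=1)
add_task(db, "Reorganize scanned photographs", "medium", "2015-02-15")
print(open_tasks_for(db, 1))
```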

The very least you need to know about sources and documentation

[Image: sample chart]

When we think about genealogy we tend to think about the various types of charts and diagrams that are used to present results. What we tend not to think of, at least at first, are source citations, annotations, and supporting documents. This is natural enough. For one thing, we always start with a certain amount of information we just know, and which we want to record right away, without stopping to record references for every data item. Another reason is that it is just human nature to think in terms of the finished product, rather than the process we go through to arrive at our final product, be it a chart, a narrative or something else. And this is fine. I started out by recording what I already knew on a pedigree chart. Everybody does. But presentation is only part of the story: we want our genealogy to look good, but we also want it to be accurate and as complete as possible.

Notice that I’m not just talking about persuading your audience that your conclusions are correct. That’s important. Anyone could write down a few names and dates chosen at random (and that’s basically what I did to produce the above chart), but you want people to have confidence in your conclusions. You want to be systematic and precise, and not leave important information out through a simple oversight. Of course, your readers will appreciate your thoroughness, and they will be more likely (and able) to make use of your work if they have confidence in it. Even if your readers are not planning on doing any research of their own, they’ll still want to know if the information you present is accurate.

So, how do you get there? Genealogical research is very much like doing research for a term paper, and preparing charts and other diagrams is very much like writing the paper. You no doubt remember being expected to document your sources in the form of footnotes or endnotes. Why do you do this (except, of course, for the obvious reason that your teacher requires it)? The obvious answer is that by doing so, you provide support for your conclusions. Your paper becomes more persuasive. Certainly, this is what my teachers told me when I was in school, and it’s true, of course. But it’s only part of the story. You also want to have confidence in your conclusions independent of any consideration of how persuasive the paper you write may or may not be. You also want to be systematic in your research: you don’t want to omit important aspects of the story. Conversely, you don’t want to get bogged down in details or lines of research that ultimately don’t really matter, or which detract from other aspects of your work. In short, you need a systematic way of keeping track of the main points of your presentation, how solid they are, what needs more work, and what you have pretty much nailed down. In the context of genealogy, this can mean not spending a lot of time compiling information about people who are not actually ancestors, but who have similar names and vital dates (such as birthdates) and were mistakenly included in one of your charts. To put it succinctly, proper documentation is not only important as a means of proving or demonstrating the correctness of your conclusions, it is equally important as a research tool. It helps you to guide and organize your research, even if relatively few people will actually want to check your sources.

In spite of the benefits of citing sources in genealogical research, many people do not include extensive documentation. Instead, explanatory footnotes will be included only for selected entries, or there will be no documentation at all. Why is this? Two obvious reasons are lack of tool support and lack of familiarity with the process of writing proper source citations. I find that having tools that help me to record, organize and manage information is indispensable. It’s not my goal here to advocate the use of any particular product, and I certainly don’t mean to imply that there is only one software tool that everyone should use. But whether you use a specialized tool such as Family Tree Maker, general purpose tools such as spreadsheets and word processors, or paper and a physical filing system, you still need to keep track of your sources and document your charts by citing sources as appropriate. You can start out with a paper record, but the further you go, the more you will benefit from using a specialized tool. Fortunately, there are a number of genealogy tools available, and web sites such as Ancestry.com provide much of the same functionality. But to be blunt about it, even though you can do everything with paper and pencil if you want to, automated tools will make life much easier for you, and decrease the likelihood that you’ll leave out crucial documentation for lack of time. I strongly recommend using a specialized tool or web site (or both).

That takes us to the matter of mechanics. The basic data of genealogy are people, attributes or “facts” (which I place in quotes because the information we record may be incorrect or incomplete, or simply not proven), and relationships such as parent-child relationships, marriage, and so forth. As a matter of terminology, things like birth, death, marriage, residence, national identifier and the like are called facts or events. We tend to say that things like birth, christening, immigration, marriage and the like are events. An identifier such as a social security number or a national identification number is a fact. But in genealogy, facts and events are generally treated together and are essentially the same sort of thing. It is facts and events that we document. This is crucial. One might think that relationships such as “A is married to B” or “A is the son of B and C” would be documented directly, but instead we document the marriage of A and B, or the birth of A to B and C. Now, events have attributes, usually a date and a location, and sometimes others, but it is the events that we document. This is really a design decision. Tools and standards (such as GEDCOM) could have been designed to document relationships (or even people) directly, but if you think about it, documenting events makes sense, and you’ll want to follow this approach.

Okay, so how do you document an event? By referring to a source. What is a source? It can be something concrete like a birth or death certificate, a parish registry, a book, or something a little more abstract like the 1860 census of the United States. A source can also be an index such as a listing of the people buried in a cemetery. What a source is typically not is a page in a book, a particular roll of microfilm, or a record in an online database (such as findagrave.com). As a rule of thumb, if it’s a part that can be separately indexed, it’s not an independent source. You should try to include one or more source citations for each fact in your database. A source citation refers to a source (of course), but includes additional indexical information such as page numbers or record numbers where this makes sense. In Family Tree Maker and at Ancestry.com, this is called the citation detail. In addition, you can add citation text, which is generally freer in form, and generally includes a transcription of the cited text or other details. If you are using one of these tools, two additional details you can add are a URL for an online resource and a multimedia item such as a scanned document.
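The relationship between events, sources and source citations can be sketched as a small data model. The Python classes below are purely illustrative (the class and field names are my own, not those of Family Tree Maker, Ancestry.com or GEDCOM), but they capture the idea: a citation points to a source and adds the citation detail and citation text, and citations are attached to events. A helper then lists the events that still lack documentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """Something like a census, a parish register or a book."""
    title: str


@dataclass
class Citation:
    """A reference to a source plus where in it the information was found."""
    source: Source
    detail: str = ""           # citation detail, e.g. page or record number
    text: str = ""             # citation text, e.g. a transcription
    url: Optional[str] = None  # optional link to an online resource


@dataclass
class Event:
    """A fact or event such as a birth, marriage or residence."""
    kind: str
    date: str = ""
    place: str = ""
    citations: List[Citation] = field(default_factory=list)

    def is_documented(self):
        return len(self.citations) > 0


def undocumented(events):
    """Return the events that have no source citation yet."""
    return [event for event in events if not event.is_documented()]


# Hypothetical example data.
census = Source("1860 census of the United States")
residence = Event("Residence", "1860", "Pittsburgh, Allegheny, Pennsylvania")
residence.citations.append(Citation(census, detail="page 12"))
birth = Event("Birth", "1855")

for event in undocumented([residence, birth]):
    print("Needs a source:", event.kind, event.date)
```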

Finally, I should say a bit about Ancestry.com and the hint (or leaf) mechanism it uses to help you discover (and sometimes document) information. If you use Family Tree Maker (in this case, Family Tree Maker 3 for the Macintosh) and it is linked to Ancestry, you may see a leaf next to a person in your tree. If you click on the leaf, you will be given the opportunity to review information and decide if you want to add it to your tree. If you use the web site instead, you will see the same leaf icon, but the interface is a little different. If you use this mechanism, Ancestry will add source citations for you, citing the resources it matched against your family tree. At this time, they can be any of a number of things, including census records, indexed birth and death certificates, immigration records, land grants and other types of records. This doesn’t mean you shouldn’t review the source citations, and you shouldn’t limit yourself to using this search tool. Add your own sources and source citations as appropriate. And if you use public member trees in Ancestry as sources, be sure to click on the tree names and review their sources. Member trees can be a great search tool, but they are not what you want to use as information sources in your final product. Different genealogists will tell you different things here, but my position is that it’s fine to use member trees as a tool as long as you’re careful to document your sources and you don’t treat them as the last word.