Who’s afraid of GEDCOM?

If you’ve spent any time using computer applications in genealogy, including web-based applications, you will probably have heard of a data format known as Genealogical Data Communication, or just GEDCOM. It is a format developed by the Church of Jesus Christ of Latter-day Saints (often called the LDS or Mormon church), but it is available for anyone to use, and for this reason, it is supported by pretty much all genealogical software. There’s a good reason for this, too. Even if you always use the same application to do your work, there will likely come a time when you want to share data with someone else, and if you do switch to another program, you will need a vendor-neutral way of storing your data. As of today, GEDCOM 5.5 is the only format to have gained sufficient traction to work for this purpose.

But if GEDCOM is so great, why don’t applications just use it as their standard data format? There are a few reasons for this. First of all, GEDCOM is a text-based format that is designed for relatively straightforward representation of data. It is not designed for efficient storage and manipulation of data. In other words, it is good for moving data from one application to another, but it doesn’t provide efficient indexing or other features you might expect in a format meant to support frequent updates. Nor does it provide the flexibility you might want in areas such as internationalization and representation of complex relationships. To put it simply, it is primarily a submission format, one that provides a standard way of uploading data to FamilySearch.org.

But how does it work? Regardless of what software you use (or no software at all), you are probably intuitively familiar with the basic concepts. Your family tree consists of:

  • Individuals, organized into families
  • Events associated with one or more individuals, such as birth, death or marriage
  • Other facts or attributes, such as name or sex
  • Relationships between people such as parent, child, or sibling
  • Documentation for facts or events

GEDCOM provides a way of representing each of these. To see how, let’s look at an excerpt from an actual GEDCOM file:

0 @I1@ INDI
1 NAME Matthew /Cooper/
1 SEX M
1 BIRT
2 DATE 12 NOV 1925
2 PLAC Pittsburgh, Allegheny, Pennsylvania
1 DEAT
2 DATE 03 FEB 1976
2 PLAC Cupertino, Santa Clara, California
1 FAMC @F1@

This is a representation of information about a single person. The digit at the beginning of each line is a level in a hierarchy. The individual appears at level 0, his name at level 1, and for his birth and death, the date and place occur at level 2. On the first line, INDI tells us that we are about to see a representation of an individual (as opposed to a family) and @I1@ is an index that can be used to refer to that individual elsewhere in the file. These indices always occur between “at” signs. The final line is a pointer to the family structure in which Matthew Cooper is a child. Before moving on, I should note that Matthew Cooper’s surname appears between slashes. This is done so that names like de Silva will be treated as a unit. Other data formats split the name into multiple fields (e.g., surname and given names), but GEDCOM does not do this. It should be noted that this is another weakness of GEDCOM: it fares poorly at treating naming conventions used in other languages or other parts of the world in a consistent manner. But my intent here is not to criticize GEDCOM so much as explain how it works.
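
To make the line structure concrete, here is a minimal Python sketch of how a program might pull apart the lines shown above. It is only an illustration under simple assumptions (one space between fields, no validation), and the helper names parse_line and split_name are made up for this post, not taken from any GEDCOM library.

```python
import re

# Hypothetical pattern: level, optional @XREF@ identifier, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")

def parse_line(line):
    """Return (level, xref_id, tag, value) for one GEDCOM line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref_id, tag, value = match.groups()
    return int(level), xref_id, tag, value or ""

def split_name(value):
    """Split a NAME value into (given names, surname) using the slashes."""
    given, _, rest = value.partition("/")
    surname, _, _suffix = rest.partition("/")
    return given.strip(), surname.strip()

record = """\
0 @I1@ INDI
1 NAME Matthew /Cooper/
1 SEX M
1 BIRT
2 DATE 12 NOV 1925
2 PLAC Pittsburgh, Allegheny, Pennsylvania
1 FAMC @F1@"""

for line in record.splitlines():
    level, xref_id, tag, value = parse_line(line)
    # Indent by level to make the hierarchy visible.
    print("  " * level + tag, value, f"(id {xref_id})" if xref_id else "")

print(split_name("Matthew /Cooper/"))
```

Indenting each tag by its level number is all it takes to see the hierarchy described above, and the slashes are what let split_name keep a surname like de Silva in one piece.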

Let’s press forward:

0 @I9@ INDI
1 NAME Herman /Grimes/
1 SEX M
1 FAMS @F5@
0 @I10@ INDI
1 NAME Priscilla /Richardson/
1 SEX F
1 FAMS @F5@

Here, we have two individuals, Herman Grimes and Priscilla Richardson. Notice that each of them is associated with the same family, but this time using the FAMS tag. As you might expect, this is a pointer to the family in which the given person is a spouse or a parent. The family itself is defined later on in the document as follows:

0 @F5@ FAM
1 HUSB @I9@
1 WIFE @I10@
1 CHIL @I8@
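
Following these pointers is just a matter of looking records up by their identifiers. The Python sketch below shows one way this might be done; the function names index_records and describe_family are invented for illustration, and the code assumes well-formed lines with single spaces between fields. Since the record for @I8@ is not part of the excerpts shown here, the sketch simply reports it as missing rather than guessing.

```python
def index_records(text):
    """Map each level-0 identifier (e.g. '@F5@') to its level-1 (tag, value) pairs."""
    records = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # Records with an identifier look like: 0 @F5@ FAM
            if parts[1].startswith("@"):
                current = records.setdefault(parts[1], [])
            else:
                current = None  # e.g. a header or trailer line without an identifier
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current.append((parts[1], value))
    return records

def describe_family(records, family_id):
    """Return (role, name) pairs for a FAM record by following its pointers."""
    members = []
    for tag, value in records[family_id]:
        if tag in ("HUSB", "WIFE", "CHIL"):
            person = dict(records.get(value, []))
            members.append((tag, person.get("NAME", "(record not shown)")))
    return members

excerpt = """\
0 @I9@ INDI
1 NAME Herman /Grimes/
1 SEX M
1 FAMS @F5@
0 @I10@ INDI
1 NAME Priscilla /Richardson/
1 SEX F
1 FAMS @F5@
0 @F5@ FAM
1 HUSB @I9@
1 WIFE @I10@
1 CHIL @I8@"""

records = index_records(excerpt)
for role, name in describe_family(records, "@F5@"):
    print(role, name)
```

The same index works in the other direction: the FAMS value on Herman’s record is the key of the family in which he appears as HUSB.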

But what about source citations? If we include a birth certificate for George Cooper, we will find a few extra lines in the INDI record

0 @I2@ INDI
1 NAME George /Cooper/
2 SOUR @S1@
3 PAGE document number ABC123

and a record for the source citation itself

0 @S1@ SOUR
1 TITL Life on Triton birth certificates, 1925
1 NOTE
2 CONC Life on Triton birth certificates, 1925.  TRITON microfilm publication 
2 CONC A1.  TARA Archives and Records Service, 1925.
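
The CONC lines deserve a word: they continue the text of the line above without adding anything, so any space between words has to be carried by the lines themselves (a related tag, CONT, starts a new line of text instead). Here is a small Python sketch of how a reader might reassemble the note; note_text is a name I made up, and the code assumes single spaces between the level, the tag and the value.

```python
def note_text(lines):
    """Rebuild the text of a NOTE from its own value plus CONC/CONT lines."""
    text = ""
    for line in lines:
        parts = line.split(" ", 2)
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag == "NOTE":
            text = value          # the NOTE line itself may carry the first chunk
        elif tag == "CONC":
            text += value         # concatenate with no separator added
        elif tag == "CONT":
            text += "\n" + value  # continue on a new line
    return text

note_lines = [
    "1 NOTE",
    "2 CONC Life on Triton birth certificates, 1925.  TRITON microfilm publication ",
    "2 CONC A1.  TARA Archives and Records Service, 1925.",
]

print(note_text(note_lines))
```

Because the first CONC line ends with a space, the two pieces join as “publication A1.” rather than running together.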

We can, of course, include other events, either standard ones such as emigration (to Saturn in this example):

1 EMIG
2 DATE 1950
2 PLAC Saturn

or custom events, such as Invention:

1 EVEN
2 TYPE Invention
2 DATE 04 MAR 1971
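
From a program’s point of view, standard and custom events look almost the same; the only difference is that a custom event gets its label from the TYPE line. The following Python sketch collects both kinds from the level-1 and level-2 lines of a record. The function name collect_events and the short list of event tags are my own assumptions for the sake of the example, not a complete list from the standard.

```python
# A small illustrative subset of event tags, not the full list from the standard.
EVENT_TAGS = {"BIRT", "DEAT", "EMIG", "EVEN"}
DETAIL_KEYS = {"DATE": "date", "PLAC": "place"}

def collect_events(lines):
    """Gather events and their details from the lines of one individual record."""
    events = []
    current = None
    for line in lines:
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 1:
            current = None
            if tag in EVENT_TAGS:
                current = {"kind": tag}
                events.append(current)
        elif level == 2 and current is not None:
            if tag == "TYPE" and current["kind"] == "EVEN":
                current["kind"] = value  # a custom event is named by its TYPE
            elif tag in DETAIL_KEYS:
                current[DETAIL_KEYS[tag]] = value
    return events

event_lines = [
    "1 EMIG",
    "2 DATE 1950",
    "2 PLAC Saturn",
    "1 EVEN",
    "2 TYPE Invention",
    "2 DATE 04 MAR 1971",
]

for event in collect_events(event_lines):
    print(event)
```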

I have not covered all the details of the GEDCOM 5.5 standard, nor have I discussed any of the features needed specifically for LDS temple work, but I hope I have given you an idea of how it works and demonstrated that the basic concepts and constructs are similar to what you find in other applications. There is no reason to feel intimidated by GEDCOM. If you want to learn more, the actual specification is online in a number of places, such as GEDCOM 5.5.1.

Can it really be that easy?

If you’re like me, you have watched commercials for Ancestry.com with considerable skepticism. After all, genealogy is supposed to be hard work, involving countless hours digging through library stacks and perusing microfilms. On television, on the other hand, we see people entering only a small amount of information and then seeing a leaf appear, indicating that a clue to further information is available. Can it really be that easy? I thought the obvious answer was that it wasn’t; it couldn’t be. So, for the longest time, I just ignored these commercials and didn’t even consider trying it. Well, of course, family history research is not easy. It can be hard work, and it can be frustrating. But I eventually decided to try it out and was surprised at how fast I was provided with information about my ancestors. Now, it really isn’t my intention to make this into a testimonial, but suffice it to say that Family Tree Maker and Ancestry.com are now among my primary research tools. To be fair, I’m part of a Mormon family that goes back to pioneer days, and I have many ancestors from Colonial America (one sort of implies the other), so there are a lot of people who have been looking into my ancestors’ family lines for years. So, it might be argued that I’m not a very good test case: of course the information will be out there for the taking, and of course a family tree on Ancestry.com will only grow, and grow quickly.

But is that really right? First of all, we should note that data in the form of pedigree charts, family records and published genealogies is out there, but how much information does that raw data really provide us with? Data can be thought of as a collection of statements, the source and reliability of which may or may not be known. Of course, data comes from somewhere, and designers of the data repositories known as databases are usually careful to record the source of the recorded data. This information (known to database professionals as metadata) can be very valuable to us as we analyze the data and try to glean useful information from it.

Okay, that’s a lot of terminology. Let’s break it down. First off, data is pretty much anything that can be written down. It can be unstructured (like a journal) or structured (like a list of names and birth dates in a parish registry). What we know about the data is called metadata (or “data about data”). Before I go on, I should make it clear that this terminology is taken from information science, not genealogy. So, if you get funny looks from other researchers when you ask about metadata, you’ll know why. That’s actually not quite the whole story: metadata is generally a systematic recording of such things as authorship, language, time recorded and so forth. One thing that metadata is not, though, is evaluation or even interpretation.

A document may list 3 Jan 1840 as the birthdate of Mary Williams, but how confident are we? This is a bit of a digression, but if the information source is a birth certificate, we can be fairly confident; if it’s a death certificate, the date has to come from some other source. The type of document from which we get our data is yet another issue to consider when interpreting that data. As an aside, I’ll note that it’s not uncommon to see the date of birth and the christening date for a person listed as the very same day. How likely is that? It is certainly possible, but it is certainly not expected. If the date doesn’t come from a primary source, it is possible that, somewhere along the way, someone needed a birthdate but the only information they had was the date of christening. We also need to consider the cultural context. Was the christening considered the more important event? Was it more important to record that information correctly or the date of birth? The point is that there is a necessary analysis phase that may be described as trying to ascertain what the available data is telling us. At this point, the question isn’t whether it’s true or false, but simply what it means. When we have performed this task, the analyzed data becomes information.

But to return to the topic of online genealogical research, we’re likely to find ourselves confronted with a wealth of often conflicting information (at this point, I’ll stop being pedantic about the distinction between data and information). My genealogical database is full of alternative dates and places for what should be a single event, and I imagine yours is, too. Of course, we work hard to resolve these discrepancies, but that’s not always easy. This is why a technology can seem to give us a lot of answers very quickly, only for us to find out that we’re not as sure of the information in our charts (or electronic databases) as we thought. Does that mean that online search tools and social media just give us a false sense of knowledge, leading us to believe things that have yet to be proven? It would be easy for a cynic to take this position, but I think it is a mistake. The Internet is a powerful tool, and can be extremely helpful to us as we dig into our family history. It’s not a panacea, though, and we have to be critical of the information we’re able to find using search tools. We just need to do the work of evaluating, cross-referencing and verifying that information. This isn’t really so different from genealogy in the pre-Internet days. The difference is that instead of getting our raw data from microfilm or microfiche, we are likely to get it from a web-based tool. The amount of time we spend performing these various tasks may change, but the tasks themselves do not. Modern technology really can help us to find information more quickly, but we do need to be careful and methodical in our analysis if we want to avoid mistakes.