Who’s afraid of GEDCOM?

If you’ve spent any time using computer applications in genealogy, including web-based applications, you have probably heard of a data format known as Genealogical Data Communication, or just GEDCOM. It was developed by the Church of Jesus Christ of Latter-day Saints (often called the LDS or Mormon church), but it is available for anyone to use, and for this reason it is supported by pretty much all genealogical software. There’s a good reason for this, too. Even if you always use the same application to do your work, there will likely come a time when you want to share data with someone else, and if you do switch to another program, you will need a vendor-neutral way of storing your data. As of today, GEDCOM 5.5 is the only format to have gained sufficient traction to work for this purpose.

But if GEDCOM is so great, why don’t applications just use it as their standard data format? There are a few reasons for this. First of all, GEDCOM is a text-based format that is designed for relatively straightforward representation of data. It is not designed for efficient storage and manipulation of data. In other words, it is good for moving data from one application to another, but it doesn’t provide efficient indexing or other features you might expect in a format meant to support frequent updates. Nor does it provide the flexibility you might want in areas such as internationalization and representation of complex relationships. To put it simply, it is primarily a submission format, one that provides a standard way of uploading data to FamilySearch.org.

But how does it work? Regardless of what software you use (or even if you use no software at all), you are probably intuitively familiar with the basic concepts. Your family tree consists of:

  • Individuals, organized into families
  • Events associated with one or more individuals, such as birth, death or marriage
  • Other facts or attributes, such as name or sex
  • Relationships between people such as parent, child, or sibling
  • Documentation for facts or events

GEDCOM provides a way of representing each of these. To see how, let’s look at an excerpt from an actual GEDCOM file:

0 @I1@ INDI
1 NAME Matthew /Cooper/
1 BIRT
2 DATE 12 NOV 1925
2 PLAC Pittsburgh, Allegheny, Pennsylvania
1 DEAT
2 DATE 03 FEB 1976
2 PLAC Cupertino, Santa Clara, California
1 FAMC @F1@

This is a representation of information about a single person. The digit at the beginning of each line is a level in a hierarchy. The individual appears at level 0, his name at level 1, and for his birth and death, the date and place occur at level 2. On the first line, INDI tells us that we are about to see a representation of an individual (as opposed to a family) and @I1@ is an index that can be used to refer to that individual elsewhere in the file. These indices always occur between “at” signs. The final line is a pointer to the family structure in which Matthew Cooper is a child. Before moving on, I should note that Matthew Cooper’s surname appears between slashes. This is done so that names like de Silva will be treated as a unit. Other data formats split the name into multiple fields (e.g., surname and given names), but GEDCOM keeps the full name on a single NAME line. This is another weakness of GEDCOM: it fares poorly at treating naming conventions used in other languages or other parts of the world in a consistent manner. But my intent here is not to criticize GEDCOM so much as to explain how it works.
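If you are curious how a program might read these lines, here is a minimal sketch in Python. It is only an illustration, not how any particular genealogy application does it: the names parse_line, GedcomLine, and split_name are my own, and the regular expression assumes the simple "level, optional index, tag, optional value" layout seen in the excerpt above rather than every rule in the specification.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed layout: level, optional @INDEX@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]   # the @I1@-style index, if the line defines one
    tag: str
    value: Optional[str]


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into its level, index, tag, and value."""
    match = LINE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


def split_name(value: str) -> Tuple[str, str]:
    """Return (given names, surname); the surname is the part between slashes."""
    match = re.search(r"/([^/]*)/", value)
    if match is None:
        return value.strip(), ""
    given = (value[:match.start()] + value[match.end():]).strip()
    return given, match.group(1)


print(parse_line("0 @I1@ INDI"))
print(parse_line("1 NAME Matthew /Cooper/"))
print(split_name("Maria /de Silva/"))
```

Because the surname is taken as everything between the slashes, a multi-word surname such as de Silva stays in one piece.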

Let’s press forward:

0 @I9@ INDI
1 NAME Herman /Grimes/
1 FAMS @F5@
0 @I10@ INDI
1 NAME Priscilla /Richardson/
1 FAMS @F5@

Here, we have two individuals, Herman Grimes and Priscilla Richardson. Notice that each of them is associated with the same family, but this time using the FAMS tag. As you might expect, this is a pointer to the family in which the given person is a spouse or a parent. The family itself is defined later on in the document as follows:

0 @F5@ FAM
1 HUSB @I9@
1 WIFE @I10@
1 CHIL @I8@
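To make the pointer idea concrete, here is a rough Python sketch that groups lines into records keyed by their index and then follows the HUSB, WIFE, and CHIL pointers of a family. The function names (read_records, first_value, describe_family) and the SAMPLE text are my own for illustration, and the code assumes well-formed input like the excerpts above. Note that @I8@ is not defined in the sample, so it is left as an unresolved pointer.

```python
SAMPLE = """\
0 @I9@ INDI
1 NAME Herman /Grimes/
1 FAMS @F5@
0 @I10@ INDI
1 NAME Priscilla /Richardson/
1 FAMS @F5@
0 @F5@ FAM
1 HUSB @I9@
1 WIFE @I10@
1 CHIL @I8@
"""


def read_records(text):
    """Map each level-0 index (e.g. @I9@) to its list of (level, tag, value) lines."""
    records = {}
    current = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            if parts[1].startswith("@"):
                # "0 @I9@ INDI": the index comes before the tag
                tag = parts[2] if len(parts) > 2 else ""
                current = records.setdefault(parts[1], [])
                current.append((0, tag, ""))
            else:
                current = None  # level-0 line without an index; ignored here
        elif current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current.append((level, parts[1], value))
    return records


def first_value(record, tag):
    """Return the value of the first line in the record with the given tag."""
    for _level, line_tag, value in record:
        if line_tag == tag:
            return value
    return None


def describe_family(records, fam_id):
    """Follow the pointers in a FAM record and return the names they refer to."""
    fam = records[fam_id]

    def name_of(pointer):
        person = records.get(pointer)
        if person is None:
            return pointer  # pointer to a record we have not seen
        return (first_value(person, "NAME") or pointer).replace("/", "")

    return {
        "husband": name_of(first_value(fam, "HUSB")),
        "wife": name_of(first_value(fam, "WIFE")),
        "children": [name_of(v) for _l, t, v in fam if t == "CHIL"],
    }


print(describe_family(read_records(SAMPLE), "@F5@"))
```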

But what about source citations? If we include a birth certificate for George Cooper, we will find a few extra lines in the INDI record:

0 @I2@ INDI
1 NAME George /Cooper/
2 SOUR @S1@
3 PAGE document number ABC123

and a record for the source citation itself:

0 @S1@ SOUR
1 TITL Life on Triton birth certificates, 1925
2 CONC Life on Triton birth certificates, 1925.  TRITON microfilm publication 
2 CONC A1.  TARA Archives and Records Service, 1925.

We can, of course, include other events, either standard ones such as emigration (to Saturn in this example):

1 EMIG
2 DATE 1950
2 PLAC Saturn

or custom events, such as Invention:

1 EVEN
2 TYPE Invention
2 DATE 04 MAR 1971
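The level numbers are what tie a date or place to its event, and a short Python sketch may help show this. It nests each line under the closest preceding line with a lower level. The function name build_tree and the dictionary layout are my own choices, and the sample assumes the events are introduced by level-1 EMIG and EVEN lines, which are the GEDCOM 5.5 tags for emigration and for a generic event.

```python
EVENTS = """\
1 EMIG
2 DATE 1950
2 PLAC Saturn
1 EVEN
2 TYPE Invention
2 DATE 04 MAR 1971
"""


def build_tree(lines):
    """Nest GEDCOM lines so each one hangs under the nearest line above it
    that has a lower level number."""
    root = {"tag": "ROOT", "value": "", "children": []}
    stack = [(-1, root)]  # (level, node) pairs from the root down to the latest line
    for raw in lines:
        if not raw.strip():
            continue
        parts = raw.strip().split(" ", 2)
        level = int(parts[0])
        node = {
            "tag": parts[1],
            "value": parts[2] if len(parts) > 2 else "",
            "children": [],
        }
        # Discard siblings and deeper lines until the parent is on top.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


tree = build_tree(EVENTS.splitlines())
for event in tree["children"]:
    details = {child["tag"]: child["value"] for child in event["children"]}
    print(event["tag"], details)
```

Using a stack of (level, node) pairs means the sketch does not care which level the excerpt starts at, which is convenient when looking at a fragment of a record rather than a whole file.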

I have not covered all the details of the GEDCOM 5.5 standard, nor have I discussed any of the features needed specifically for LDS temple work, but I hope I have given you an idea of how it works and shown that the basic concepts and constructs are similar to what you find in other applications. There is no reason to feel intimidated by GEDCOM. If you want to learn more, the actual specification is online in a number of places, such as GEDCOM 5.5.1.
