Introducing GENEALOGIX: An Open Format for the Family Archives We Actually Have
A portable, extensible archive format built on YAML and Git—where oral traditions, photographs, and vital records carry equal evidentiary weight, and every archive defines its own rules.
The format gap
Think about what you actually have. A shoebox of photographs. A phone recording of your grandmother telling a story she’ll only tell once. Three conflicting spellings of your great-grandfather’s name across a census, a ship manifest, and a baptismal record. A cousin in another state who insists the family came from a different village entirely.
This is real family history. It’s messy, contradictory, multilingual, and stubbornly human. It does not fit neatly into rows and columns.
For decades, the dominant exchange format has been GEDCOM, first drafted in 1984.[1] It’s been remarkably successful, but its architecture is conclusion-first rather than evidence-first, and extending it for diverse research domains or richer evidence models means working around the format rather than with it. GEDCOM 7.0 (2021) made meaningful improvements, including UTF-8, same-sex relationship support, and GEDZip multimedia bundling, but adoption has been slow and the underlying model remains largely unchanged.[2]
We think family data deserves better infrastructure. Today we’re publicly introducing GENEALOGIX (GLX)—an open specification for version-controlled, evidence-first family archives. GLX is the underlying data model for Oracynth—it’s the foundation everything we build sits on top of. The specification is currently in beta, and we’re developing it in the open.
What GENEALOGIX is
GENEALOGIX is a permanent, human-readable archive format for genealogical research and related domains. Each entity in your archive—a person, an event, a relationship, a source, a place—lives in a plain YAML file. Those files live in a standard Git repository. That’s it. No proprietary database. No vendor lock-in. No format you can’t open in a text editor.
Here’s what a person looks like:
persons:
  person-margaret-chen:
    properties:
      name:
        value: "Margaret Mei-Ling Chen"
        fields:
          given: "Mei-Ling"
          surname: "Chen"
          prefix: "Margaret"
      gender: female
      born_on: "ABT 1923"
      born_at: place-guangzhou
    notes: |
      Name romanization varies across documents.
      Immigration records use "Mei Ling Chan."
Custom, human-readable identifiers. If you can read a text file, you can read your family history.
Evidence first, conclusions second
Most genealogy software stores conclusions: “Margaret Chen was born about 1923 in Guangzhou.” That feels clean. But it hides the most important part of family research—the reasoning. Which source said 1923? Was it primary or derivative? Does another document say 1921? Who decided which date to trust, and why?
GLX is built on an assertion-aware data model inspired by prior work in genealogical and archival standards.[3] Instead of attaching facts directly to a person, GLX separates the claim from the evidence through a five-level chain:
Repository → Source → Citation → Assertion → Property
Each level serves a distinct role. A repository is the institution or location holding original materials—an archive, a church, a government office. A source is a specific document or record held by that repository—a birth register, a census volume, a family Bible. A citation pinpoints exactly where within a source the relevant information appears—a page number, an entry reference, a transcribed passage. An assertion connects that cited evidence to a specific claim about an entity, recording the concluded value alongside the researcher’s confidence in it. And a property is the researcher’s current accepted conclusion—what you believe to be true based on the evidence you’ve gathered so far.
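To make the first three levels concrete, here is a minimal sketch of how a repository, a source, and a citation might point at one another. Only the chain itself comes from the description above. The top-level keys, the field names (name, title, repository, source, locator), and every identifier except citation-hk-birth-cert are assumptions made for illustration, so check the specification for the actual schema.

```yaml
# Illustrative sketch only: keys and field names are assumptions,
# not taken from the GLX specification.
repositories:
  repository-civil-registry:          # hypothetical identifier
    name: "Civil registry office"     # assumed field name

sources:
  source-birth-register:              # hypothetical identifier
    title: "Register of births"       # assumed field name
    repository: repository-civil-registry

citations:
  citation-hk-birth-cert:             # identifier referenced by an assertion
    source: source-birth-register
    locator: "Page and entry reference go here"   # assumed field name
```

Each level refers to the one before it by identifier, so an assertion that lists citation-hk-birth-cert can be traced back through the source to the institution holding the original.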
The power of this separation is that you can record properties quickly during data entry and add rigorous assertions later during focused research. Multiple assertions can coexist—even contradictory ones. Your archive doesn’t force you to pick a winner until you’re ready:
assertions:
  assertion-margaret-birth-cert:
    subject:
      person: person-margaret-chen
    property: born_on
    value: "1923-05-12"
    citations: [citation-hk-birth-cert]
    confidence: high
    notes: "Primary direct evidence."
  assertion-margaret-birth-bible:
    subject:
      person: person-margaret-chen
    property: born_on
    value: "1923-03-10"
    citations: [citation-chen-family-bible]
    confidence: medium
    notes: |
      Family Bible entry conflicts with certificate.
      Likely recorded from memory years later.
This isn’t scholarly rigor for its own sake. It’s practical: when your cousin sends a document that contradicts your timeline, your archive holds both versions without losing either. When AI tools improve at reading handwriting or translating documents, your original sources are still there to reprocess. And when you return to a research question after years away, the reasoning is right there in the file.
Your archive, your rules
Archive-owned controlled vocabularies are GENEALOGIX’s most powerful feature—and the one that most directly challenges how existing formats work.
Traditional formats ship with fixed type systems. You get the event types, relationship types, and property categories that a standards committee decided were universal. If your research domain doesn’t fit—too bad.
GLX inverts this. Each archive defines its own valid types in vocabulary files that ship alongside your data. The glx init command seeds standard vocabulary files covering common genealogy types, and then you extend them however your research demands:[4]
# vocabularies/relationship-types.glx: your archive, your types
relationship_types:
  marriage:
    label: "Marriage"
    description: "Legal or religious union"
  parent_child:
    label: "Parent-Child"
    description: "Biological, adoptive, or legal"
  # Your additions
  compadrazgo:
    label: "Compadrazgo"
    description: "Godparent/co-parent ritual kinship"
  chosen_family:
    label: "Chosen Family"
    description: "Enduring ties forged by care and obligation"
  mentor:
    label: "Mentor-Mentee"
    description: "Intellectual or professional mentorship"
This means GLX adapts to research domains far beyond traditional genealogy. Studying colonial history? Define indenture and manumission events. Building an academic prosopography? Add doctoral_advisor relationships. Researching maritime history? Create ship_departure and port_arrival types. The standard vocabularies give you a starting point; you’re never limited by them.[5]
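As a sketch of the colonial-history case, custom event types could follow the same label and description shape as the relationship types shown earlier. The file name and the event_types top-level key are assumed by analogy with relationship-types.glx rather than taken from the specification.

```yaml
# vocabularies/event-types.glx (file name assumed by analogy)
event_types:                # top-level key is an assumption
  indenture:
    label: "Indenture"
    description: "Contract binding a person to a fixed term of labor"
  manumission:
    label: "Manumission"
    description: "Formal release of a person from enslavement"
```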
No central registry. No approval process. No committee. Your vocabularies are Git-versioned alongside your data, and the CLI validates everything against them:
glx validate
# ✓ All entity references valid
# ✓ All vocabulary types defined
# ⚠ Unknown property "clan_affiliation" (not in person-properties.glx)
That last line is a warning, not an error—GLX won’t block you from working just because you haven’t formalized a new property yet. Strictness where it matters (broken references), flexibility where you need it (emerging properties during active research).
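Once a property such as clan_affiliation has proven useful, formalizing it would presumably clear the warning. The sketch below assumes that person-properties.glx lives in the vocabularies directory, uses a person_properties top-level key, and accepts the same label and description fields as the relationship vocabulary. None of these details are confirmed here.

```yaml
# vocabularies/person-properties.glx (path assumed; file name from the warning)
person_properties:          # top-level key is an assumption
  clan_affiliation:
    label: "Clan Affiliation"
    description: "Clan or lineage group the person is identified with"
```

Because the vocabulary file is versioned with the data, the commit that adds the definition also records when the archive began treating the property as formal.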
Oral traditions and multimedia as first-class evidence
We’ve written before about the politics of preservation and about FAN networks and the future of family history. GLX puts those critiques into practice.
In the assertion model, an audio recording of your grandmother describing her childhood carries the same structural weight as a certified birth certificate. Both are sources. Both produce citations. Both support assertions with explicit confidence levels. The format doesn’t rank them—the researcher does, transparently:
assertions:
  assertion-family-migration:
    subject:
      person: person-abuela-rosa
    property: residence
    value: place-san-juan
    date: "FROM 1952 TO 1960"
    media: [media-oral-history-rosa-2024]
    confidence: medium
    notes: |
      Based on oral history interview recorded 2024-01-15.
      Rosa describes arriving "when Eisenhower was president"
      and leaving "before Kennedy." No documentary confirmation
      yet, but consistent with Operation Bootstrap migration patterns.
This matters for communities whose histories were never prioritized by official record-keeping systems—a reality we explored in depth in our piece on the destruction of Palestinian archives. If your family’s story lives in oral tradition, in community memory, in photographs and home videos rather than government files, GLX doesn’t make you work around the format. It was designed with you in mind from the start.
Git is the archive
We chose Git as the storage layer for reasons that go beyond developer familiarity. Git provides the properties that archival data needs: a complete history of every committed change, branching for collaborative research, and cryptographic hashing that makes silent alteration detectable.[6]
When your family archive is a Git repository:
- Every committed edit stays in the history. You can see exactly what your archive looked like at any point in time, who changed what, and when. No more wondering which copy of a file is current.
- Collaboration scales. Multiple family members or researchers can work on separate branches and merge their contributions—the same workflow that millions of software teams already trust with mission-critical work.
- You own your data. The repository is the archive. Back it up anywhere. Host it anywhere. Move it anywhere. There is no export button because there’s nothing to export from.
- Offline works by default. Git doesn’t need a network connection. Edit on a plane. Sync when you land. This isn’t a feature we had to build—it’s a consequence of the architecture.
You don’t need to be a developer to use Git. Tools like GitHub Desktop provide a visual interface, and the glx CLI handles the genealogy-specific parts for you.
What GLX is not
GLX is not a visualization tool. It doesn’t draw family trees or generate reports. It’s the data layer—the foundation that visualization tools, AI assistants, and research platforms build on top of.
And GLX is not finished. The specification is at v0.0.0-beta.3—expect changes as the format matures. We’re developing it in the open because we believe formats this fundamental shouldn’t be designed behind closed doors.
GEDCOM interoperability
GLX is not a replacement for GEDCOM in every scenario. GEDCOM has forty years of ecosystem momentum, and interoperability matters. That’s why we’re building bidirectional conversion: glx import, which is already implemented, translates GEDCOM files into GLX archives and absorbs four decades of vendor extensions and encoding quirks, while GEDCOM export is scheduled to ship later in 2026. The export is inherently lossy because GLX captures richer data than GEDCOM can represent, but we believe in meeting people where they are:
# Import your existing GEDCOM file
glx import family.ged -o my-family-archive
# Creates a full GLX archive with vocabularies,
# entities, and source citations preserved
If you have decades of research in GEDCOM files, you don’t have to start over. Import what you have, and GLX preserves your sources, citations, and relationships—then gives you room to enrich them with the evidence model described above.
Open tooling, not just an open spec
A specification without tooling is a whitepaper. We’re shipping the glx-go reference implementation alongside the format—parser, validator, GEDCOM importer, and CLI. It handles the gnarliest interoperability work and provides the glx init, glx validate, and glx import commands.
We think the pattern of open spec, open tooling, and shared conformance tests is how you build trust in a format that’s going to hold families’ most precious data. The specification, JSON schemas, examples, and test suites are available on GitHub under the Apache 2.0 license.[7]
Get involved
The GENEALOGIX specification repository is live at github.com/genealogix/glx, with documentation at genealogix.io. You’ll find the complete specification, JSON schemas for validation, working examples from minimal archives to fully-cited family histories, and a test suite covering every entity type.
If you’re a genealogist frustrated by the limitations of existing formats, a developer interested in building tools for family history, a historian working in domains that existing software can’t accommodate, or someone who wants to ensure your family’s stories survive the next platform migration—we’d love to hear from you. File issues, start discussions, or just read the spec and tell us what we got wrong.
Your family’s history is too important to be trapped in someone else’s database.
Footnotes
1. GEDCOM was developed by The Church of Jesus Christ of Latter-day Saints beginning in 1984. Version 5.5.1 (1999) remains the most widely implemented version. See FamilySearch, “The FamilySearch GEDCOM Specification.”
2. GEDCOM 7.0 was released June 2021. As of early 2026, major vendors including Ancestry and MyHeritage have not fully implemented 7.0 support. See the GEDCOM 7.0 registry of implementations.
3. The assertion model is inspired by: the GENTECH Genealogical Data Model (2000) and its three-tiered evidence architecture; CIDOC-CRM E13 Attribute Assignment (ISO 21127) for formal assertion modeling; and GEDCOM-X for its Information → Evidence → Hypothesis → Conclusion → Proof hierarchy.
4. The standard vocabulary files cover: event types, relationship types, place types, source types, repository types, media types, participant roles, confidence levels, and property vocabularies for persons, events, relationships, places, media, repositories, sources, and citations.
5. GLX’s extensibility makes it suitable for traditional genealogy, biographical research, prosopography (collective biography), colonial and local history, maritime research, religious studies, and historical demography: any domain involving people, events, and relationships.
6. You don’t need to use Git to use GLX. The files are valid YAML regardless of whether they’re in a repository. But Git integration is where the format’s collaborative and archival properties come alive. See the specification’s Archive Organization section.
7. GENEALOGIX is licensed under the Apache License 2.0. Copyright 2025-2026 Oracynth, Inc.