The Setup
So there I was, trying to live my life and not spend all of it at a computer, when I noticed that I initially created this site with 11 separate Schema.org include files. Eleven! Like some kind of schema hoarder who couldn't let go of a single `<script type="application/ld+json">` tag.
I restarted writing this blog in July, when I was still just sorting out what was available to me in the open-source world for big projects, as opposed to the Facebook core stack I'd gotten used to over 8 years. You know that feeling when you look at code you wrote years ago and think "what was I even doing here?" Or you just shudder in horror and want to look away? Yeah. That. Except it only took 3-4 months.
When I was first researching, Schema.org seemed like the best open-source equivalent of a universal taxonomy/language for 'things' and 'things related to those things' and 'how the things are related' that one of my personal heroes, Oliver Dodd, designed at Facebook and called EntSchema. So I started using it for this blog in July.
But I forgot to give them entity IDs! When I looked recently, half of them were duplicating the same entities. I had my Person schema defined in like 4 different places. I just threw Schema.org at my site and gave it an identity crisis more severe than mine during my imposter-syndrome-soaked first six months at Facebook, where, every time my ID badge didn't immediately click green, I was convinced I had been discovered as a fraud and immediately fired. (Spoiler: still convinced, just better at hiding it. Actually, I just wrote that down, so worse at hiding it. Actively, publicly bad at hiding it now.)
The Problem
Here’s what was wrong:
- 11 fragmented schema files (why?)
- Duplicate Person entities everywhere (apparently I really wanted Google to know who I am)
- Nested objects instead of proper @id references (because who needs best practices?)
- Organizations pointing to external URLs
- No unified entity relationships (just vibes)
It was like I’d read the Schema.org documentation, nodded thoughtfully, and then done the exact opposite of everything it recommended.
The Solution (Or: How I Turned One Problem Into Several Problems)
I read an article from Momentic Marketing about using @id attributes to build knowledge graphs. And look, the term 'knowledge graph' makes me cringe a bit. It's used by a lot of companies out there branding themselves as AI data companies and pitching it as the solution for everything: context windows, reducing hallucinations, making sure your printer never annoys you again. It's just a graph. You know, a graph? But graphs are very useful. And you do need to identify the vertices (nodes) in them, and aggregate data about those vertices so that the edges function correctly. So you need IDs for your vertices. And your edges too, actually, but that's a little less straightforward than the n00b mistake I made by not including them in the vertices (or, ahem, 'things'). Oliver Dodd had held my hand for too many years with his beautiful EntSchemas, and his underlying framework handled all of that identification/deduplication nonsense for me. So 4 months later, I'm writing a quick fix myself, and then writing a better ID algorithm that will be more robust as our databases and data sources grow.
Phase 1: Unified (ugh, Knowledge) Graph
First, I needed to understand what I had:
- 1 Person (me, allegedly)
- 1 WebSite (this one)
- 1 Blog (the thing you’re reading)
- 2 Organizations (Integrity Studios and InventoryAI.io)
So I built a graph. Using actual @id references. Like a responsible adult developer, going back and fixing my open-source n00b mistakes. Manually, because I hadn't written automation for this yet. (I'm as shocked as you are.) Even this blog post is (for the most part) hand-written; at this point, a hand-written developer blog post (written into an .md file via vim, no less) makes me feel a little bit like that one asshole who always brought a typewriter into Flightpath to work when I was in college. Yes, we all see how terribly seriously you're taking writing your novel, and we're all very proud of you. Now please go buy a computer like the rest of us, because those keys are so loud.
Annddd anyway, getting back to my point:
The format: `{canonical_url}#{entity_type}`
```json
{
  "@type": "Person",
  "@id": "https://www.aledlie.com#person",
  "worksFor": [
    {
      "@id": "https://www.aledlie.com/organizations/integrity-studios#organization"
    }
  ]
}
```
Look at that. It's clean. It has relationships. It follows best practices for search engine and LLM data discovery. Side note: I also refuse to use the term 'SEO' for anything I do. Like 'knowledge graph' or 'I'm super into crypto,' I've always considered it a red flag, even if there are a few legitimate cases out there in the sea of scammy fluff and get-rich-quick schemes.
Results:
- 11 files → 1 file (that’s a 91% reduction, in case you’re counting)
- 15 bidirectional entity relationships (Person ↔ WebSite, Person → Organizations, Blog ↔ WebSite)
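"Bidirectional" here just means each side of a relationship points at the other by @id instead of nesting a copy of the whole entity. So the WebSite side of the Person ↔ WebSite edge looks roughly like this (a sketch; the exact property names in my real markup may differ):

```json
{
  "@type": "WebSite",
  "@id": "https://www.aledlie.com#website",
  "author": { "@id": "https://www.aledlie.com#person" }
}
```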
Phase 2: Enhanced Blog Schemas (Because One Success Wasn’t Enough)
But of course, blog posts actually have text in them, and styles, and types. And according to Schema.org and pydantic, mine had 3 types:
- Technical guides (like that Jekyll update nightmare)
- Performance analysis (with actual data and everything)
- Personal narratives (like this one, which is definitely getting too meta)
TechArticle Schema
For posts like “Updating Jekyll in 2025” where I explain technical stuff while complaining about M2 chips.
Properties:
- dependencies: "Ruby 3.x, Jekyll 4.x, Bundler 2.x, pain"
- proficiencyLevel: "Intermediate (I hope)"
- articleSection: "Jekyll"
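In markup form, that ends up looking roughly like this (a trimmed sketch, not the verbatim file):

```json
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Updating Jekyll in 2025",
  "dependencies": "Ruby 3.x, Jekyll 4.x, Bundler 2.x, pain",
  "proficiencyLevel": "Intermediate",
  "articleSection": "Jekyll",
  "author": { "@id": "https://www.aledlie.com#person" }
}
```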
AnalysisNewsArticle Schema
For my Wix performance post where I actually included data and metrics like a real engineer.
Properties:
- dateline: "November 2025"
- backstory: "Performance analysis based on real-world production data (I swear)"
- about: "Web Performance Optimization"
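Again, roughly (the headline here is illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "AnalysisNewsArticle",
  "headline": "Wix Performance Analysis",
  "dateline": "November 2025",
  "backstory": "Performance analysis based on real-world production data",
  "about": "Web Performance Optimization",
  "author": { "@id": "https://www.aledlie.com#person" }
}
```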
HowTo Schema
For future step-by-step tutorials. Because apparently I’m planning to write more of these.
Properties:
- step: an array of actual HowToStep items (revolutionary!)
- tool: what you'll need
- totalTime: "PT2H" (2 hours in ISO 8601, because we're fancy)
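A future tutorial would then get something like this (a hypothetical example, since those posts don't exist yet):

```json
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Over-Engineer a Personal Blog",
  "totalTime": "PT2H",
  "tool": ["vim", "a Jekyll site", "stubbornness"],
  "step": [
    { "@type": "HowToStep", "name": "Audit your schemas", "text": "Count your duplicate entities. Feel shame." },
    { "@type": "HowToStep", "name": "Unify the graph", "text": "Give every entity a stable @id and reference it; don't nest copies." }
  ],
  "author": { "@id": "https://www.aledlie.com#person" }
}
```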
All of them reference the unified knowledge graph via @id. It’s knowledge graphs all the way down.
The Documentation (Or: Future Me Will Thank Present Me)
- Testing procedures (3 phases, very official)
- Search Console monitoring (daily/weekly/monthly schedules)
- Analysis reports (with statistics)
- Before/after comparisons (because I needed validation)
I even created a decision tree for choosing the right schema type. A DECISION TREE. For a personal blog.
```
Is it technical documentation?
├─ Yes → Does it have numbered steps?
│   ├─ Yes → HowTo
│   └─ No  → TechArticle
└─ No → Does it analyze data?
    ├─ Yes → AnalysisNewsArticle
    └─ No  → BlogPosting (boring)
```
The Front Matter (Or: How to Use This Madness)
Now I just add this to my blog post front matter:
```yaml
---
title: "My Post Title"
schema_type: TechArticle
schema_dependencies: "Ruby, sanity (optional)"
schema_proficiency: "Intermediate"
---
```
And I get more-or-less proper Schema.org markup. Ready for our new LLM overlords to index my thoughts.
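Under the hood, a Jekyll include reads those front matter keys and emits the JSON-LD. Something like this (a simplified sketch; the real include handles more types and more properties, and the structure here is my illustration rather than the actual file):

```liquid
{% if page.schema_type == "TechArticle" %}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": {{ page.title | jsonify }},
  "dependencies": {{ page.schema_dependencies | jsonify }},
  "proficiencyLevel": {{ page.schema_proficiency | jsonify }},
  "author": { "@id": "{{ site.url }}#person" }
}
</script>
{% endif %}
```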
Expected Benefits (According to The Internet)
Apparently this should give me:
- Better technical documentation indexing
- Rich results in Google (the fancy kind)
- Knowledge panel data (if Google deems me worthy)
- Improved CTR on enhanced posts
Will any of this actually matter? Probably not. But my vertices will have IDs, and not a bunch of duplicates, so I feel slightly less shame.
The Irony
The absolute best part about all of this? I spent an entire session optimizing Schema.org markup for better data discovery (ugh, SEO) on a personal blog that gets like… maybe 12 visitors a month (7 of them are probably me checking if my changes deployed).
But you know what? Those 12 visitors are going to have the BEST structured data experience of their lives.
Lessons Learned
- @id Best Practices Matter: The current format is `{canonical_url}#{entity_type}`. Stable IDs only. No timestamps or query parameters, you monster. But I am going to make this way more robust later.
- Validation Saves Lives: Or at least saves you from deploy-time errors. A 100% validation pass rate is achievable and feels amazing. (A rough sketch of the check follows this list.)
- Sometimes Over-Engineering is Fun: Sure, I could've just left it alone. But where's the fun in that?
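For the curious, the quick fix really is this simple. A minimal sketch of the idea in Python (this is illustrative, not my actual tooling, and it assumes the whole graph lives in one JSON-LD file with an `@graph` array):

```python
import json
from collections import Counter
from urllib.parse import urlparse

def entity_id(canonical_url: str, entity_type: str) -> str:
    """Build a stable @id of the form {canonical_url}#{entity_type}."""
    parsed = urlparse(canonical_url)
    # No timestamps, no query parameters, you monster.
    assert not parsed.query, "query parameters make IDs unstable"
    return f"{canonical_url.rstrip('/')}#{entity_type.lower()}"

def find_duplicate_ids(graph_path: str) -> list[str]:
    """Flag any @id that is *defined* (not just referenced) more than once."""
    with open(graph_path) as f:
        graph = json.load(f).get("@graph", [])
    # A node with only an "@id" key is a reference; anything with more
    # properties counts as a definition, and definitions must be unique.
    defined = Counter(
        node["@id"] for node in graph
        if "@id" in node and len(node) > 1
    )
    return [eid for eid, count in defined.items() if count > 1]

print(entity_id("https://www.aledlie.com", "Person"))
# -> https://www.aledlie.com#person
```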
The Stats (For My Fellow Nerds)
- 17 files changed
- 4,388 insertions (Thanks, Claude)
- 45 deletions (minimal destruction)
- 2 blog posts enhanced with fancy new schemas
- 91% file reduction (concise + precise -> beautiful)
- 100% @id validation (✨ perfection ✨)
In Conclusion
Was this necessary? No. Is it over-engineered? Probably. Do my schemas now follow best practices for SEO, LLMs, and knowledge graphs? Absolutely. Am I proud of this? Sort of. I am proud of the 91% file reduction.
P.S. If you actually read this entire post about Schema.org optimization on a personal blog, we should be friends. Or you should seek professional help. Possibly both. Definitely both.
P.P.S. Yes, I did use the new TechArticle schema for this post. Meta? Absolutely. Appropriate? Debatable. Do I care? Not even a little bit.
This post brought to you by an unhealthy obsession with optimization, and my lingering mild resentment about writing half of my college papers while trying to drown out typewriter noise.