Thoughts on using Schema.org for data modelling — common properties

David Janes
Nov 4, 2020
Wikimedia Commons, for some reason.

Schema.org is “…a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines…”.

I almost always use the schemas from schema.org as the starting point for modelling data in my projects. One issue I usually run into is that it’s difficult to quickly summarize a record (or object, if you prefer) without introspecting it or understanding the data type you’re looking at.

My solution is to use a couple of “common definitions” for all records, even if schema.org suggests this is unnecessary or redundant.

I use JSON-LD notation for defining records, so we need to get a little of this out of the way first:

  • @type defines the node type of the record, basically the “class”
  • @id is the “node identifier” of the record, usually a URL, basically the “id”

We then try to preferentially add the following properties to records:

  • name — “The name of the item”
  • identifier — “The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc”

We “preferentially” add them so that when we’re confronted with any random record, we can display them to a user without having to do any introspection to figure out “what it really is”. If a record has an @id, it should have a name and an @type; if it makes sense, it should have an identifier too.
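
The rule in that last sentence can be checked mechanically. Here is a minimal sketch in Python that treats records as plain dicts. The function name `missing_common_properties` is hypothetical, and identifier is deliberately left unchecked because it is only added where it makes sense.

```python
def missing_common_properties(record: dict) -> list[str]:
    """Return the common properties a record should have but lacks.

    Convention: a record with an @id should also carry an @type and a name.
    identifier is optional, so it is not checked here.
    """
    if "@id" not in record:
        # Records without an @id (e.g. embedded ones) are exempt.
        return []
    return [key for key in ("@type", "name") if not record.get(key)]
```

A record that returns an empty list can be shown to a user without any further inspection of its type.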

So for example, here’s me:

If we’re just browsing this record, we can display that it’s “David Janes (DPJ-0001)” or whatever and this is close enough for show business. The address doesn’t need an @id or a name, as it’s baked into the parent record.
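
As a rough sketch of what such a record and its summary could look like in Python: only the name and the identifier come from the text above. The @id URL and the address value are placeholders assumed for illustration, and `summarize` is a hypothetical helper, not part of schema.org or any JSON-LD library.

```python
person = {
    "@type": "Person",
    "@id": "https://example.com/people/DPJ-0001",  # placeholder URL (assumption)
    "name": "David Janes",
    "identifier": "DPJ-0001",
    # Embedded record: no @id or name, since it belongs to the parent.
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Example City",  # placeholder value (assumption)
    },
}


def summarize(record: dict) -> str:
    """Build a one-line label using only name and identifier."""
    name = record.get("name", "(unnamed)")
    identifier = record.get("identifier")
    return f"{name} ({identifier})" if identifier else name


print(summarize(person))  # David Janes (DPJ-0001)
```

Note that `summarize` never looks at @type: the same two properties work for a Person, a NewsArticle, or anything else that follows the convention.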

Note how this slightly augments what schema.org specifies: e.g. NewsArticle specifies headline for the “Headline of the article”; I would always use name, at least in addition to that.
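
One way to apply this to records that arrive with only a headline is to copy it into name when name is absent. Here is a minimal Python sketch. `with_name` is a hypothetical helper, and the sample article is made up for illustration.

```python
def with_name(record: dict) -> dict:
    """Return a copy of the record, filling name from headline if needed."""
    if record.get("name") or not record.get("headline"):
        # Already named, or nothing to fall back on: return an unchanged copy.
        return dict(record)
    return {**record, "name": record["headline"]}


article = {
    "@type": "NewsArticle",
    "@id": "https://example.com/articles/1",  # placeholder URL (assumption)
    "headline": "Example headline",
}
print(with_name(article)["name"])  # Example headline
```

The original headline is kept, so consumers that expect the schema.org property still find it.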
