Version: v2.0

Canonical JSON Schema

Every certification is derived from a canonical JSON representation of the track's rights-determinative metadata. The canonical form is deterministic: given identical input data, the same byte-exact JSON string is always produced, yielding the same SHA-256 hash.

This determinism is the foundation of the entire certification system. If you can reproduce the canonical JSON, you can reproduce the hash, and therefore independently verify any certification.

Certified fields

The canonical JSON includes only fields that are relevant to rights determination. Transient data, engagement metrics, and internal workflow state are excluded.

Field reference

| Field | Type | Sort order | Description |
| --- | --- | --- | --- |
| certification_version | string | N/A | Schema version (e.g. "2.0"). |
| certification_tier | string | N/A | Certification tier: "gold", "silver", "bronze", or "declared". Null or absent in legacy v1.0 records, and therefore omitted from their canonical JSON. |
| isrc | string | N/A | International Standard Recording Code. |
| iswc | string | N/A | Primary ISWC for the underlying work. |
| title | string | N/A | Track title. |
| artist | string | N/A | Primary artist name. |
| release_date | string | N/A | Release date (ISO 8601 date). |
| duration_ms | integer | N/A | Duration in milliseconds. |
| source_count | integer | N/A | Number of independent authoritative sources that corroborated the metadata. |
| performers | array | Sorted by (role, name) | Performing artists with roles. |
| writers | array | Sorted by (ipi, name) | Songwriter/composer credits. |
| pro_work_ids | object | Sorted by key | PRO-specific work identifiers (e.g. {"prs": "12345", "ascap": "98765"}). |
| territory_registrations | object | Sorted by key | Territory-level registration data. |
| publisher_chains | array | Sorted by publisher name | Publishing chain information. |
| iswcs | object | Sorted by key | Additional ISWC mappings. |

New in v2.0

The following fields were added in schema version 2.0:

  • certification_tier: Indicates the verification depth applied. Values: "gold", "silver", "bronze", "declared". Legacy certifications produced under v1.0 have no tier; because null values are removed before serialisation, the field is absent from their canonical JSON.
  • source_count: The number of independent authoritative sources that corroborated the metadata. This is relevant to the Gold tier requirement of 2+ source corroborations. For Declared tier, this will be 0 or absent.
  • pro_work_ids: An object mapping PRO names to their work identifiers for the track. Keys are lowercase PRO abbreviations (e.g. "prs", "ascap", "bmi", "gema", "sacem"). Values are the work identifier strings assigned by each PRO.

v2.0 hashes differ from v1.0

Because the canonical JSON now includes additional fields (certification_tier, source_count, pro_work_ids), a v2.0 certification of the same track will produce a different hash from a v1.0 certification. This is expected and correct — the certification_version field distinguishes the two schema versions, and older certifications remain independently verifiable against their original v1.0 schema.
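
The divergence can be sketched in a few lines. This is an illustration only: it assumes the v1.0 canonical form used the same serialisation call (the v1.0 rules are not restated here), and the two records are cut-down examples rather than complete certifications. The helper name record_hash is hypothetical.

```python
import hashlib
import json


def record_hash(fields: dict) -> str:
    """Serialise with the documented json.dumps call and hash the UTF-8 bytes."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Cut-down illustrative records, not complete certifications.
v1_fields = {"certification_version": "1.0", "isrc": "GBAYE0100538", "title": "The Wicker Man"}
v2_fields = {
    **v1_fields,
    "certification_version": "2.0",
    "certification_tier": "gold",
    "source_count": 3,
    "pro_work_ids": {"ascap": "894523170", "prs": "30118462"},
}

# Different canonical strings give different digests.
print(record_hash(v1_fields) != record_hash(v2_fields))  # True
```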

Performers

Each performer entry contains:

{
  "name": "Bruce Dickinson",
  "role": "vocals"
}

Performers are sorted lexicographically by (role, name) — role takes priority, then name within the same role.
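
A minimal sketch of that ordering, assuming plain Python string comparison (code point order, case-sensitive); the text above says "lexicographically" without specifying case folding or locale. The second guitarist entry is added purely to show the tie-break on name.

```python
performers = [
    {"name": "Bruce Dickinson", "role": "vocals"},
    {"name": "Janick Gers", "role": "guitar"},  # extra entry, for illustration only
    {"name": "Dave Murray", "role": "guitar"},
]

# Sort by role first, then by name within the same role.
sorted_performers = sorted(performers, key=lambda p: (p["role"], p["name"]))

print([p["name"] for p in sorted_performers])
# ['Dave Murray', 'Janick Gers', 'Bruce Dickinson']
```

The array order has to be fixed before serialisation, because sort_keys only reorders object keys and leaves list elements where they are.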

Writers

Each writer entry contains:

{
  "name": "Steve Harris",
  "ipi": "00026781433",
  "role": "composer",
  "share": 33.34
}

Writers are sorted lexicographically by (ipi, name) — IPI takes priority, then name for writers sharing the same IPI.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Writer's name. |
| ipi | string | IPI number (Interested Party Information). |
| role | string | Writer role (e.g. "composer", "lyricist", "composer/lyricist"). |
| share | float | Ownership share as a percentage. |
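
The writer ordering can be sketched the same way. This assumes every writer carries an ipi value (how a missing IPI is handled is not stated here) and that IPIs are compared as strings, which keeps their leading zeros intact.

```python
writers = [
    {"name": "Steve Harris", "ipi": "00026781433", "role": "composer", "share": 33.34},
    {"name": "Bruce Dickinson", "ipi": "00026781422", "role": "composer", "share": 33.33},
    {"name": "Adrian Smith", "ipi": "00026781411", "role": "composer", "share": 33.33},
]

# Sort by IPI string first; name only matters when two entries share an IPI.
sorted_writers = sorted(writers, key=lambda w: (w["ipi"], w["name"]))

print([w["name"] for w in sorted_writers])
# ['Adrian Smith', 'Bruce Dickinson', 'Steve Harris']
```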

PRO work IDs

The pro_work_ids object maps PRO names to their work identifiers:

{
  "ascap": "894523170",
  "prs": "30118462"
}

Keys are sorted lexicographically. Only PROs where a work identifier has been confirmed are included.
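
As a rough illustration, the object could be built from lookup results like this. The raw_ids input shape (uppercase names, None for an unconfirmed PRO) is an assumption made for the example, not a documented format.

```python
import json

# Hypothetical raw lookup results; None marks a PRO with no confirmed work ID.
raw_ids = {"PRS": "30118462", "BMI": None, "ASCAP": "894523170"}

# Lowercase the keys and keep only confirmed identifiers.
pro_work_ids = {pro.lower(): work_id for pro, work_id in raw_ids.items() if work_id is not None}

# sort_keys puts "ascap" before "prs" regardless of insertion order.
print(json.dumps(pro_work_ids, sort_keys=True, separators=(",", ":"), ensure_ascii=True))
# {"ascap":"894523170","prs":"30118462"}
```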

Excluded fields

The following categories of data are never included in the canonical JSON, even if present in the underlying record:

| Category | Examples | Reason |
| --- | --- | --- |
| Engagement metrics | Play counts, skip rates, playlist adds | Volatile; not rights-determinative. |
| Source identifiers | Spotify URI, MusicBrainz MBID, Discogs ID | Platform-specific; not rights-determinative. |
| Confidence scores | Enrichment confidence, match scores | Internal quality metrics. |
| Workflow state | Pipeline stage, operator assignment | Internal process data. |
| Enrichment metadata | Source timestamps, API response logs | Audit trail data, not certified content. |
| Timestamps | Created/updated timestamps | Only the certification date is recorded (outside the canonical JSON, in the certification record). |

This separation ensures that the certification hash depends solely on rights-critical data. Changes to engagement metrics, internal workflow, or enrichment logs do not invalidate an existing certification.
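
One way to picture this separation is an allowlist built from the field reference. The sketch below is an assumption about how such a filter might look, not the certifier's actual code; the excluded key names (play_count, spotify_uri, pipeline_stage) are hypothetical.

```python
# Field names taken from the field reference table.
CERTIFIED_FIELDS = frozenset({
    "certification_version", "certification_tier", "isrc", "iswc", "title",
    "artist", "release_date", "duration_ms", "source_count", "performers",
    "writers", "pro_work_ids", "territory_registrations", "publisher_chains", "iswcs",
})


def select_certified_fields(record: dict) -> dict:
    """Keep only the keys that belong in the canonical JSON."""
    return {k: v for k, v in record.items() if k in CERTIFIED_FIELDS}


record = {
    "isrc": "GBAYE0100538",
    "title": "The Wicker Man",
    "play_count": 1204331,                   # engagement metric (hypothetical key)
    "spotify_uri": "spotify:track:example",  # source identifier (hypothetical key)
    "pipeline_stage": "review",              # workflow state (hypothetical key)
}

certified = select_certified_fields(record)
print(sorted(certified))  # ['isrc', 'title']
```

An allowlist fails safe: a new internal field added to the underlying record stays out of the hash until it is explicitly listed.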

Serialisation rules

The canonical JSON is produced using the following deterministic serialisation:

json.dumps(data, sort_keys=True, separators=(',', ':'), ensure_ascii=True)

Rules in detail

  1. Key sorting — All object keys are sorted lexicographically (sort_keys=True).
  2. No whitespace — No spaces after colons or commas (separators=(',', ':')).
  3. ASCII-safe — All non-ASCII characters are escaped to \uXXXX sequences (ensure_ascii=True).
  4. UTF-8 encoding — The resulting string is encoded as UTF-8 bytes before hashing.
  5. No trailing newline — The serialised string has no trailing newline or whitespace.
  6. Null/empty removal — Fields with None, empty list ([]), or empty dict ({}) values are removed before serialisation.
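
Taken together, the six rules fit in one small function. This is a sketch under the rules as listed, with a hypothetical function name (to_canonical_bytes) and a made-up input record chosen to show the non-ASCII escaping.

```python
import json


def to_canonical_bytes(certified_fields: dict) -> bytes:
    # Rule 6: drop top-level fields whose value is None, [] or {}.
    cleaned = {
        k: v
        for k, v in certified_fields.items()
        if v is not None and v != [] and v != {}
    }
    # Rules 1-3: sorted keys, compact separators, \uXXXX escapes for non-ASCII.
    text = json.dumps(cleaned, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    # Rules 4-5: encode as UTF-8 and append nothing.
    return text.encode("utf-8")


example = {"title": "Für Elise", "artist": "Example Artist", "iswcs": {}}
print(to_canonical_bytes(example).decode("ascii"))
# {"artist":"Example Artist","title":"F\u00fcr Elise"}
```

Because ensure_ascii=True leaves only ASCII characters in the string, the UTF-8 encoding step produces exactly one byte per character.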

Null/empty value removal

Before serialisation, the canonicaliser strips any field whose value is None, [], or {}:

certified_fields = {
    k: v
    for k, v in certified_fields.items()
    if v is not None and v != [] and v != {}
}

This ensures that the absence of optional data does not affect the hash. A track with no territory_registrations produces the same hash regardless of whether the field was None, {}, or simply absent.
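
The equivalence is easy to check. The sketch below wraps the filter shown above in two hypothetical helpers (strip_empty, serialise) and serialises four variants of the same cut-down record.

```python
import json


def strip_empty(fields: dict) -> dict:
    """Apply the null/empty filter to top-level fields."""
    return {k: v for k, v in fields.items() if v is not None and v != [] and v != {}}


def serialise(fields: dict) -> str:
    return json.dumps(strip_empty(fields), sort_keys=True, separators=(",", ":"), ensure_ascii=True)


base = {"isrc": "GBAYE0100538", "title": "The Wicker Man"}
variants = [
    base,                                       # field absent
    {**base, "territory_registrations": None},  # field is None
    {**base, "territory_registrations": {}},    # field is an empty object
    {**base, "publisher_chains": []},           # a different field, empty list
]

# All four collapse to one canonical string.
print(len({serialise(v) for v in variants}))  # 1

# The filter as written only matches None, [] and {}; a value of 0 is kept.
print(serialise({"source_count": 0}))  # {"source_count":0}
```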

Example canonical JSON

For a fictional track with three writers and two performers, certified at Gold tier:

{"artist":"Iron Maiden","certification_tier":"gold","certification_version":"2.0","duration_ms":437000,"isrc":"GBAYE0100538","iswc":"T-010.466.720-3","performers":[{"name":"Dave Murray","role":"guitar"},{"name":"Bruce Dickinson","role":"vocals"}],"pro_work_ids":{"ascap":"894523170","prs":"30118462"},"release_date":"2000-05-29","source_count":3,"title":"The Wicker Man","writers":[{"ipi":"00026781411","name":"Adrian Smith","role":"composer","share":33.33},{"ipi":"00026781422","name":"Bruce Dickinson","role":"composer","share":33.33},{"ipi":"00026781433","name":"Steve Harris","role":"composer","share":33.34}]}

Note the characteristics:

  • No whitespace between tokens.
  • Keys in alphabetical order (artist before certification_tier before certification_version before duration_ms).
  • certification_tier is "gold" and certification_version is "2.0".
  • pro_work_ids keys sorted alphabetically: "ascap" before "prs".
  • source_count is 3, indicating three independent sources corroborated the metadata.
  • Performers sorted by (role, name): Dave Murray (guitar) appears before Bruce Dickinson (vocals) because "guitar" sorts before "vocals".
  • Writers sorted by (ipi, name): Adrian Smith (00026781411) before Bruce Dickinson (00026781422) before Steve Harris (00026781433).
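
These characteristics can also be checked mechanically. The sketch below parses the example string and re-serialises it with the documented call; if the output is identical, key order, spacing and escaping are already canonical. The helper names is_canonical and arrays_sorted are hypothetical, and array order is checked separately because sort_keys does not reorder list elements.

```python
import json

EXAMPLE = (
    '{"artist":"Iron Maiden","certification_tier":"gold","certification_version":"2.0",'
    '"duration_ms":437000,"isrc":"GBAYE0100538","iswc":"T-010.466.720-3",'
    '"performers":[{"name":"Dave Murray","role":"guitar"},{"name":"Bruce Dickinson","role":"vocals"}],'
    '"pro_work_ids":{"ascap":"894523170","prs":"30118462"},'
    '"release_date":"2000-05-29","source_count":3,"title":"The Wicker Man",'
    '"writers":[{"ipi":"00026781411","name":"Adrian Smith","role":"composer","share":33.33},'
    '{"ipi":"00026781422","name":"Bruce Dickinson","role":"composer","share":33.33},'
    '{"ipi":"00026781433","name":"Steve Harris","role":"composer","share":33.34}]}'
)


def is_canonical(text: str) -> bool:
    """True if re-serialising the parsed text reproduces it exactly."""
    reserialised = json.dumps(
        json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return reserialised == text


def arrays_sorted(doc: dict) -> bool:
    """Check the documented array orderings for performers and writers."""
    performers = doc.get("performers", [])
    writers = doc.get("writers", [])
    return (
        performers == sorted(performers, key=lambda p: (p["role"], p["name"]))
        and writers == sorted(writers, key=lambda w: (w["ipi"], w["name"]))
    )


print(is_canonical(EXAMPLE), arrays_sorted(json.loads(EXAMPLE)))  # True True
print(is_canonical(EXAMPLE + "\n"))  # False: a trailing newline breaks byte-exactness
```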

Producing the record hash

The record hash is the SHA-256 digest of the canonical JSON string, encoded as UTF-8:

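import hashlib
import json

# Illustrative subset; real input is the full set of certified fields.
data = {"certification_version": "2.0", "title": "The Wicker Man"}
canonical_json = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
record_hash = hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()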

The result is a 64-character lowercase hexadecimal string. This hash is the leaf value used in the Merkle tree.

Verification procedure

To verify that a certified track's metadata has not been altered:

  1. Obtain the canonical JSON — Either from the track certificate (canonical_json field) or by re-serialising the metadata using the rules above.
  2. Compute the SHA-256 hash — Hash the canonical JSON string (UTF-8 encoded) to produce a 64-character hex digest.
  3. Compare — The computed hash must match the record_hash stored in the certification.
  4. Verify via API — Submit the canonical JSON to the public POST /verify endpoint. The response confirms whether the hash matches a certified record and returns blockchain anchor details.

If any field in the canonical JSON differs — even by a single character — the hash will be completely different, and verification will fail. This is the fundamental property that makes the certification tamper-evident.
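
Steps 1 to 3 can be run locally, as in the sketch below. The function names are hypothetical, the canonical string is a shortened stand-in for a real certificate, and the stored hash is computed on the spot in place of a record_hash read from a certification. Step 4 is left out because the request and response format of POST /verify is not described here.

```python
import hashlib


def compute_record_hash(canonical_json: str) -> str:
    """SHA-256 of the canonical JSON string, encoded as UTF-8."""
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()


def matches_record_hash(canonical_json: str, record_hash: str) -> bool:
    """Compare the recomputed digest with the stored one."""
    return compute_record_hash(canonical_json) == record_hash


# Shortened stand-in for a certificate's canonical_json field.
canonical_json = '{"certification_version":"2.0","title":"The Wicker Man"}'
# Stands in for the record_hash stored in the certification.
stored_hash = compute_record_hash(canonical_json)

print(matches_record_hash(canonical_json, stored_hash))  # True

# A one-character change produces a different digest, so the check fails.
tampered = canonical_json.replace("Wicker", "Wicked")
print(matches_record_hash(tampered, stored_hash))  # False
```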

Certification version

The certification_version field (currently "2.0") is included in the canonical JSON itself. If the schema changes in a future version (e.g. adding new certified fields or changing sort orders), the version number will increment. This ensures that certifications produced under different schema versions are distinguishable and that older certifications remain independently verifiable against their original schema.

Legacy certifications produced under version "1.0" will not have the certification_tier, source_count, or pro_work_ids fields. They remain valid and verifiable against the v1.0 schema specification.
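
A verifier that handles both generations might start by reading the version out of the canonical JSON before choosing which schema to check against. The sketch below is one possible shape for that step; schema_summary and the two sample strings are hypothetical and shortened for the example.

```python
import json

# Fields introduced in schema version 2.0.
V2_ONLY_FIELDS = ("certification_tier", "pro_work_ids", "source_count")


def schema_summary(canonical_json: str) -> dict:
    """Report the declared schema version and which v2.0-only fields are present."""
    doc = json.loads(canonical_json)
    return {
        "version": doc.get("certification_version"),
        "v2_fields_present": [name for name in V2_ONLY_FIELDS if name in doc],
    }


legacy = '{"certification_version":"1.0","title":"The Wicker Man"}'
current = (
    '{"certification_tier":"gold","certification_version":"2.0",'
    '"source_count":3,"title":"The Wicker Man"}'
)

print(schema_summary(legacy))
# {'version': '1.0', 'v2_fields_present': []}
print(schema_summary(current))
# {'version': '2.0', 'v2_fields_present': ['certification_tier', 'source_count']}
```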