Canonical JSON Schema
Every certification is derived from a canonical JSON representation of the track's rights-determinative metadata. The canonical form is deterministic: given identical input data, the same byte-exact JSON string is always produced, yielding the same SHA-256 hash.
This determinism is the foundation of the entire certification system. If you can reproduce the canonical JSON, you can reproduce the hash, and therefore independently verify any certification.
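As a minimal sketch of this property, the snippet below serialises two dictionaries that hold the same data in a different key order and checks that both give the same digest. The helper name `canonical_hash` and the sample values are illustrative assumptions; the `json.dumps` arguments are the ones listed under Serialisation rules.

```python
import hashlib
import json


def canonical_hash(fields: dict) -> str:
    """Serialise with the documented rules and return the SHA-256 hex digest."""
    canonical_json = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()


# Same data, different insertion order (sample values only).
record_a = {"title": "The Wicker Man", "artist": "Iron Maiden", "duration_ms": 437000}
record_b = {"duration_ms": 437000, "artist": "Iron Maiden", "title": "The Wicker Man"}

hash_a = canonical_hash(record_a)
hash_b = canonical_hash(record_b)
print(hash_a == hash_b)  # True: input key order does not affect the hash
```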
Certified fields
The canonical JSON includes only fields that are relevant to rights determination. Transient data, engagement metrics, and internal workflow state are excluded.
Field reference
| Field | Type | Sort Order | Description |
|---|---|---|---|
| certification_version | string | N/A | Schema version (e.g. "2.0"). |
| certification_tier | string | N/A | Certification tier: "gold", "silver", "bronze", "declared", or null for legacy v1.0 certifications. |
| isrc | string | N/A | International Standard Recording Code. |
| iswc | string | N/A | Primary ISWC for the underlying work. |
| title | string | N/A | Track title. |
| artist | string | N/A | Primary artist name. |
| release_date | string | N/A | Release date (ISO 8601 date). |
| duration_ms | integer | N/A | Duration in milliseconds. |
| source_count | integer | N/A | Number of independent authoritative sources that corroborated the metadata. |
| performers | array | Sorted by (role, name) | Performing artists with roles. |
| writers | array | Sorted by (ipi, name) | Songwriter/composer credits. |
| pro_work_ids | object | Sorted by key | PRO-specific work identifiers (e.g. {"prs": "12345", "ascap": "98765"}). |
| territory_registrations | object | Sorted by key | Territory-level registration data. |
| publisher_chains | array | Sorted by publisher name | Publishing chain information. |
| iswcs | object | Sorted by key | Additional ISWC mappings. |
New in v2.0
The following fields were added in schema version 2.0:
- certification_tier: Indicates the verification depth applied. Values: "gold", "silver", "bronze", "declared". Legacy certifications produced under v1.0 will have this field set to null or absent.
- source_count: The number of independent authoritative sources that corroborated the metadata. This is relevant to the Gold tier requirement of 2+ source corroborations. For Declared tier, this will be 0 or absent.
- pro_work_ids: An object mapping PRO names to their work identifiers for the track. Keys are lowercase PRO abbreviations (e.g. "prs", "ascap", "bmi", "gema", "sacem"). Values are the work identifier strings assigned by each PRO.
Because the canonical JSON now includes additional fields (certification_tier, source_count, pro_work_ids), a v2.0 certification of the same track will produce a different hash from a v1.0 certification. This is expected and correct — the certification_version field distinguishes the two schema versions, and older certifications remain independently verifiable against their original v1.0 schema.
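To illustrate why the hashes diverge, the sketch below hashes a reduced record once in a v1.0 shape and once with the v2.0 fields added. The helper `record_hash_of` and the trimmed-down field set are assumptions made for brevity, not the full schema.

```python
import hashlib
import json


def record_hash_of(fields: dict) -> str:
    """Hash a dict using the documented serialisation rules."""
    canonical_json = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()


# Reduced sample records; a real certification carries more fields.
v1_fields = {
    "certification_version": "1.0",
    "isrc": "GBAYE0100538",
    "title": "The Wicker Man",
}
v2_fields = {
    **v1_fields,
    "certification_version": "2.0",
    "certification_tier": "gold",
    "source_count": 3,
    "pro_work_ids": {"ascap": "894523170", "prs": "30118462"},
}

hashes_differ = record_hash_of(v1_fields) != record_hash_of(v2_fields)
print(hashes_differ)  # True: different canonical JSON, different hash
```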
Performers
Each performer entry contains:
{
  "name": "Bruce Dickinson",
  "role": "vocals"
}
Performers are sorted lexicographically by (role, name) — role takes priority, then name within the same role.
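A minimal sketch of this ordering, assuming "lexicographically" means plain string comparison by code point with no case folding or locale-aware collation (the source does not say which):

```python
# Sample performers in arbitrary input order.
performers = [
    {"name": "Bruce Dickinson", "role": "vocals"},
    {"name": "Dave Murray", "role": "guitar"},
    {"name": "Adrian Smith", "role": "guitar"},
]

# Sort by role first, then by name within the same role.
sorted_performers = sorted(performers, key=lambda p: (p["role"], p["name"]))

print([p["name"] for p in sorted_performers])
# ['Adrian Smith', 'Dave Murray', 'Bruce Dickinson']
```

Both guitarists come first because "guitar" sorts before "vocals"; within that role, the names decide the order.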
Writers
Each writer entry contains:
{
  "name": "Steve Harris",
  "ipi": "00026781433",
  "role": "composer",
  "share": 33.34
}
Writers are sorted lexicographically by (ipi, name) — IPI takes priority, then name for writers sharing the same IPI.
| Field | Type | Description |
|---|---|---|
| name | string | Writer's name. |
| ipi | string | IPI number (Interested Party Information). |
| role | string | Writer role (e.g. "composer", "lyricist", "composer/lyricist"). |
| share | float | Ownership share as a percentage. |
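The writer ordering can be sketched the same way. Because ipi is typed as a string, this sketch compares IPI values as strings rather than as numbers, so leading zeros take part in the comparison; that reading is an assumption drawn from the type column, not something the source states outright.

```python
# Sample writers in arbitrary input order.
writers = [
    {"name": "Steve Harris", "ipi": "00026781433", "role": "composer", "share": 33.34},
    {"name": "Adrian Smith", "ipi": "00026781411", "role": "composer", "share": 33.33},
    {"name": "Bruce Dickinson", "ipi": "00026781422", "role": "composer", "share": 33.33},
]

# Sort by IPI (as a string), then by name for writers sharing an IPI.
sorted_writers = sorted(writers, key=lambda w: (w["ipi"], w["name"]))

print([w["name"] for w in sorted_writers])
# ['Adrian Smith', 'Bruce Dickinson', 'Steve Harris']
```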
PRO work IDs
The pro_work_ids object maps PRO names to their work identifiers:
{
  "ascap": "894523170",
  "prs": "30118462"
}
Keys are sorted lexicographically. Only PROs where a work identifier has been confirmed are included.
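One way to build such an object is sketched below. The input shape (a list of lookup results with a `confirmed` flag) is hypothetical and only stands in for however confirmation is actually tracked; the lowercase keys and the omission of unconfirmed PROs follow the description above.

```python
import json

# Hypothetical lookup results; the "confirmed" flag is an assumed input shape.
lookups = [
    {"pro": "PRS", "work_id": "30118462", "confirmed": True},
    {"pro": "ASCAP", "work_id": "894523170", "confirmed": True},
    {"pro": "BMI", "work_id": None, "confirmed": False},
]

# Keep only confirmed identifiers and lowercase the PRO abbreviation.
pro_work_ids = {
    item["pro"].lower(): item["work_id"]
    for item in lookups
    if item["confirmed"] and item["work_id"]
}

print(json.dumps(pro_work_ids, sort_keys=True, separators=(",", ":")))
# {"ascap":"894523170","prs":"30118462"}
```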
Excluded fields
The following categories of data are never included in the canonical JSON, even if present in the underlying record:
| Category | Examples | Reason |
|---|---|---|
| Engagement metrics | Play counts, skip rates, playlist adds | Volatile; not rights-determinative. |
| Source identifiers | Spotify URI, MusicBrainz MBID, Discogs ID | Platform-specific; not rights-determinative. |
| Confidence scores | Enrichment confidence, match scores | Internal quality metrics. |
| Workflow state | Pipeline stage, operator assignment | Internal process data. |
| Enrichment metadata | Source timestamps, API response logs | Audit trail data, not certified content. |
| Timestamps | Created/updated timestamps | Only the certification date is recorded (outside the canonical JSON, in the certification record). |
This separation ensures that the certification hash depends solely on rights-critical data. Changes to engagement metrics, internal workflow, or enrichment logs do not invalidate an existing certification.
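A simple way to enforce this separation is an allow-list, sketched below under the assumption that the certified fields are exactly those in the field reference table. The function name and the excluded keys in the sample record (`play_count`, `spotify_uri`, `updated_at`) are hypothetical.

```python
# Field names taken from the field reference table.
CERTIFIED_FIELDS = frozenset({
    "certification_version", "certification_tier", "isrc", "iswc", "title",
    "artist", "release_date", "duration_ms", "source_count", "performers",
    "writers", "pro_work_ids", "territory_registrations", "publisher_chains",
    "iswcs",
})


def select_certified_fields(record: dict) -> dict:
    """Keep only allow-listed fields; everything else is dropped."""
    return {k: v for k, v in record.items() if k in CERTIFIED_FIELDS}


record = {
    "isrc": "GBAYE0100538",
    "title": "The Wicker Man",
    "play_count": 1234567,                   # engagement metric (hypothetical key)
    "spotify_uri": "spotify:track:example",  # source identifier (hypothetical key)
    "updated_at": "2024-01-01T00:00:00Z",    # timestamp (hypothetical key)
}

certified = select_certified_fields(record)
print(sorted(certified))  # ['isrc', 'title']
```

An allow-list fails safe here: a newly introduced internal field stays out of the hash unless it is added deliberately.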
Serialisation rules
The canonical JSON is produced using the following deterministic serialisation:
json.dumps(data, sort_keys=True, separators=(',', ':'), ensure_ascii=True)
Rules in detail
- Key sorting: All object keys are sorted lexicographically (sort_keys=True).
- No whitespace: No spaces after colons or commas (separators=(',', ':')).
- ASCII-safe: All non-ASCII characters are escaped to \uXXXX sequences (ensure_ascii=True).
- UTF-8 encoding: The resulting string is encoded as UTF-8 bytes before hashing.
- No trailing newline: The serialised string has no trailing newline or whitespace.
- Null/empty removal: Fields with None, empty list ([]), or empty dict ({}) values are removed before serialisation.
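The effect of these rules on a small record can be sketched as follows. The sample values are invented; the title contains a non-ASCII character to show the escaping.

```python
import json

# Invented sample data with one non-ASCII character in the title.
data = {"title": "Café Noir", "artist": "Example Artist", "duration_ms": 180000}

canonical_json = json.dumps(
    data, sort_keys=True, separators=(",", ":"), ensure_ascii=True
)
print(canonical_json)
# {"artist":"Example Artist","duration_ms":180000,"title":"Caf\u00e9 Noir"}

# The string is pure ASCII after escaping, so its UTF-8 and ASCII bytes match.
canonical_bytes = canonical_json.encode("utf-8")
print(canonical_bytes == canonical_json.encode("ascii"))  # True
```

Note that `sort_keys=True` only orders object keys. Arrays are emitted in the order given, which is why performers, writers, and publisher chains need to be sorted before serialisation.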
Null/empty value removal
Before serialisation, the canonicaliser strips any field whose value is None, [], or {}:
# Drop top-level fields whose value is None, an empty list, or an empty dict.
certified_fields = {
    k: v
    for k, v in certified_fields.items()
    if v is not None and v != [] and v != {}
}
This ensures that the absence of optional data does not affect the hash. A track with no territory_registrations produces the same hash regardless of whether the field was None, empty, or simply absent.
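The sketch below checks this behaviour on a reduced record. The helper names are illustrative. It also shows a consequence of the filter as written: a value of 0 is not None, [], or {}, so a source_count of 0 is kept rather than stripped.

```python
import hashlib
import json


def strip_empty(fields: dict) -> dict:
    """Apply the documented filter: drop None, [] and {} values."""
    return {k: v for k, v in fields.items() if v is not None and v != [] and v != {}}


def record_hash_of(fields: dict) -> str:
    canonical_json = json.dumps(
        strip_empty(fields), sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()


base = {"isrc": "GBAYE0100538", "title": "The Wicker Man", "source_count": 0}
with_none = {**base, "territory_registrations": None}
with_empty = {**base, "territory_registrations": {}, "publisher_chains": []}

same_hash = record_hash_of(base) == record_hash_of(with_none) == record_hash_of(with_empty)
print(same_hash)  # True: None, empty, and absent all hash identically

print("source_count" in strip_empty(base))  # True: 0 is kept by this filter
```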
Example canonical JSON
For an illustrative track record with three writers and two performers, certified at Gold tier:
{"artist":"Iron Maiden","certification_tier":"gold","certification_version":"2.0","duration_ms":437000,"isrc":"GBAYE0100538","iswc":"T-010.466.720-3","performers":[{"name":"Dave Murray","role":"guitar"},{"name":"Bruce Dickinson","role":"vocals"}],"pro_work_ids":{"ascap":"894523170","prs":"30118462"},"release_date":"2000-05-29","source_count":3,"title":"The Wicker Man","writers":[{"ipi":"00026781411","name":"Adrian Smith","role":"composer","share":33.33},{"ipi":"00026781422","name":"Bruce Dickinson","role":"composer","share":33.33},{"ipi":"00026781433","name":"Steve Harris","role":"composer","share":33.34}]}
Note the characteristics:
- No whitespace between tokens.
- Keys in alphabetical order (artist before certification_tier before certification_version before duration_ms).
- certification_tier is "gold" and certification_version is "2.0".
- pro_work_ids keys sorted alphabetically: "ascap" before "prs".
- source_count is 3, indicating three independent sources corroborated the metadata.
- Performers sorted by (role, name): Dave Murray (guitar) appears before Bruce Dickinson (vocals) because "guitar" sorts before "vocals".
- Writers sorted by (ipi, name): Adrian Smith (00026781411) before Bruce Dickinson (00026781422) before Steve Harris (00026781433).
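These characteristics can be checked mechanically. The sketch below parses the example string, re-serialises it with the documented `json.dumps` arguments, and confirms the result is byte-identical; it also checks the two array orderings. The string is split across several literals purely for readability.

```python
import json

# The example canonical JSON from above, split across lines for readability.
example = (
    '{"artist":"Iron Maiden","certification_tier":"gold","certification_version":"2.0",'
    '"duration_ms":437000,"isrc":"GBAYE0100538","iswc":"T-010.466.720-3",'
    '"performers":[{"name":"Dave Murray","role":"guitar"},'
    '{"name":"Bruce Dickinson","role":"vocals"}],'
    '"pro_work_ids":{"ascap":"894523170","prs":"30118462"},'
    '"release_date":"2000-05-29","source_count":3,"title":"The Wicker Man",'
    '"writers":[{"ipi":"00026781411","name":"Adrian Smith","role":"composer","share":33.33},'
    '{"ipi":"00026781422","name":"Bruce Dickinson","role":"composer","share":33.33},'
    '{"ipi":"00026781433","name":"Steve Harris","role":"composer","share":33.34}]}'
)

parsed = json.loads(example)
reserialised = json.dumps(parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
round_trips = reserialised == example
print(round_trips)  # True: the example is already in canonical form

# Array orderings match the documented sort keys.
performers_sorted = parsed["performers"] == sorted(
    parsed["performers"], key=lambda p: (p["role"], p["name"])
)
writers_sorted = parsed["writers"] == sorted(
    parsed["writers"], key=lambda w: (w["ipi"], w["name"])
)
print(performers_sorted, writers_sorted)  # True True
```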
Producing the record hash
The record hash is the SHA-256 digest of the canonical JSON string, encoded as UTF-8:
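import hashlib
import json

# certified_fields is the dict of certified fields after null/empty removal.
canonical_json = json.dumps(certified_fields, sort_keys=True, separators=(',', ':'), ensure_ascii=True)
record_hash = hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()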
The result is a 64-character lowercase hexadecimal string. This hash is the leaf value used in the Merkle tree.
Verification procedure
To verify that a certified track's metadata has not been altered:
- Obtain the canonical JSON: Either from the track certificate (canonical_json field) or by re-serialising the metadata using the rules above.
- Compute the SHA-256 hash: Hash the canonical JSON string (UTF-8 encoded) to produce a 64-character hex digest.
- Compare: The computed hash must match the record_hash stored in the certification.
- Verify via API: Submit the canonical JSON to the public POST /verify endpoint. The response confirms whether the hash matches a certified record and returns blockchain anchor details.
If any field in the canonical JSON differs — even by a single character — the hash will be completely different, and verification will fail. This is the fundamental property that makes the certification tamper-evident.
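The local part of this procedure (compute and compare) might look like the sketch below. The function name `verify_record` and the reduced sample record are assumptions. The API step is left out because the request and response shapes of POST /verify are not specified here.

```python
import hashlib
import hmac


def verify_record(canonical_json: str, record_hash: str) -> bool:
    """Recompute the SHA-256 digest and compare it with the stored record hash."""
    computed = hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()
    # compare_digest is used here as a habit for comparing digests.
    return hmac.compare_digest(computed, record_hash)


# Reduced sample record; the stored hash is derived here only for the demo.
canonical_json = '{"isrc":"GBAYE0100538","title":"The Wicker Man"}'
stored_hash = hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()

original_ok = verify_record(canonical_json, stored_hash)
print(original_ok)  # True

# A single-character change produces a different digest.
tampered = canonical_json.replace("Wicker", "Wicked")
tampered_ok = verify_record(tampered, stored_hash)
print(tampered_ok)  # False
```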
Certification version
The certification_version field (currently "2.0") is included in the canonical JSON itself. If the schema changes in a future version (e.g. adding new certified fields or changing sort orders), the version number will increment. This ensures that certifications produced under different schema versions are distinguishable and that older certifications remain independently verifiable against their original schema.
Legacy certifications produced under version "1.0" will not have the certification_tier, source_count, or pro_work_ids fields. They remain valid and verifiable against the v1.0 schema specification.
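A verifier that handles both versions could start with a sanity check like the sketch below, which flags v2.0-only fields appearing in a record that declares version "1.0". The function name and the decision to treat this as an error condition are assumptions, not documented behaviour.

```python
# Fields introduced in schema version 2.0, per the list above.
V2_ONLY_FIELDS = frozenset({"certification_tier", "source_count", "pro_work_ids"})


def unexpected_fields(fields: dict) -> set:
    """Return v2.0-only fields present in a record that declares version 1.0."""
    if fields.get("certification_version") == "1.0":
        return set(V2_ONLY_FIELDS.intersection(fields))
    return set()


legacy = {"certification_version": "1.0", "isrc": "GBAYE0100538"}
mixed = {"certification_version": "1.0", "isrc": "GBAYE0100538", "source_count": 2}

print(unexpected_fields(legacy))  # set()
print(unexpected_fields(mixed))   # {'source_count'}
```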