Version: v2.0

Canonicalisation

Methodology Version

Methodology Version 1.0 — Effective February 2026

Once a track has been enriched, validated, and confirmed as a Golden Record, its metadata must be transformed into a canonical form — a single, deterministic representation that always produces the same output for the same underlying data, regardless of when or how the data was collected.

Canonicalisation is the prerequisite for hashing. Without a deterministic representation, identical metadata could produce different hashes depending on field ordering, whitespace, or serialisation choices, undermining the entire certification chain.

Field selection

Only rights-determinative fields are included in the canonical form. Fields that relate to processing history, engagement metrics, or internal workflow state are excluded.

| Included | Excluded |
| --- | --- |
| ISRC | Engagement metrics (play counts, listener statistics) |
| ISWC(s) | Source-specific identifiers (Spotify URI, MusicBrainz MBID) |
| Title | Confidence scores from enrichment |
| Artist | Internal workflow state |
| Writers (name, IPI, role, share %) | Enrichment log metadata |
| Performers (name, role) | Processing timestamps (except certification date) |
| Release date | Source provenance records |
| Duration (milliseconds) | |
| Territory registrations | |
| Publisher chains | |

This separation ensures that the hash represents the substance of the rights claim — the data that determines who is owed what — rather than artefacts of how the data was assembled. Two tracks with identical rights-critical data will always produce the same hash, even if they were enriched at different times or through different source orderings.
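The field-selection step can be sketched as a simple filter. This is an illustrative sketch only: the field names and the exact allow-list are assumptions, not the authoritative TrackForge schema.

```python
# Hypothetical allow-list of rights-determinative fields; names are
# illustrative, not the authoritative schema.
RIGHTS_CRITICAL_FIELDS = {
    "isrc", "iswcs", "title", "artist", "writers", "performers",
    "release_date", "duration_ms", "territory_registrations",
    "publisher_chains",
}

def select_rights_critical(record: dict) -> dict:
    """Keep only rights-determinative fields and drop null/empty values."""
    return {
        key: value
        for key, value in record.items()
        if key in RIGHTS_CRITICAL_FIELDS and value not in (None, "", [], {})
    }
```

Because excluded fields never enter the canonical form, a change to (say) a play count or an enrichment confidence score cannot alter the hash.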

Deterministic serialisation

The canonical form is constructed through a strictly defined serialisation process:

Step-by-step process

  1. Extract rights-critical fields — Only the fields listed in the "Included" column above are retained. All other fields are discarded. Null or empty values are removed.

  2. Sort all keys alphabetically — The top-level keys of the data structure are sorted in lexicographic order. This eliminates variation caused by different insertion orderings in the source data.

  3. Normalise writer arrays — Writers are sorted first by IPI number (lexicographic), then by name. Each writer entry is normalised to a consistent structure: {name, ipi, role, share}.

  4. Normalise performer arrays — Performers are sorted by role, then by name.

  5. Sort nested structures — Territory registrations, publisher chains, and ISWC collections are sorted by their respective keys.

  6. Serialise as compact JSON — The data is serialised using the following parameters:

```python
json.dumps(data, sort_keys=True, separators=(',', ':'), ensure_ascii=True)
```

This produces compact JSON with no whitespace between elements, no trailing newlines, and all non-ASCII characters escaped as `\uXXXX` Unicode escape sequences. The sort_keys=True parameter provides a secondary guarantee of key ordering.

The resulting string is encoded as UTF-8 before being passed to the hashing step.
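The six steps above can be sketched end to end as follows. The field names (`writers`, `performers`, `iswcs`) and entry keys are assumptions for illustration; the sketch assumes the input already contains only the "Included" fields.

```python
import json

def canonicalise(record: dict) -> bytes:
    """Produce the deterministic canonical form of a rights-critical record.

    Illustrative sketch: assumes `record` holds only the "Included" fields,
    with hypothetical key names rather than the authoritative schema.
    """
    data = dict(record)

    # Steps 3-4: sort writer and performer arrays deterministically.
    if "writers" in data:
        data["writers"] = sorted(
            data["writers"], key=lambda w: (w.get("ipi", ""), w.get("name", ""))
        )
    if "performers" in data:
        data["performers"] = sorted(
            data["performers"], key=lambda p: (p.get("role", ""), p.get("name", ""))
        )

    # Step 5: sort simple nested collections such as ISWC lists.
    if "iswcs" in data:
        data["iswcs"] = sorted(data["iswcs"])

    # Steps 2 and 6: compact, key-sorted, ASCII-only JSON, then UTF-8 bytes
    # ready for the hashing step.
    canonical = json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return canonical.encode("utf-8")
```

Two records with the same rights-critical content, supplied in different insertion orders, yield byte-identical output from this function.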

Why these choices matter

| Decision | Rationale |
| --- | --- |
| `sort_keys=True` | Eliminates key-ordering variation across different systems and languages |
| `separators=(',', ':')` | Removes all optional whitespace, ensuring byte-identical output |
| `ensure_ascii=True` | Escapes all non-ASCII characters to `\uXXXX` sequences, so the output is pure ASCII and independent of output-encoding choices |
| UTF-8 encoding | Universally supported encoding standard |
| Writer sort by IPI, then name | IPI is the most stable identifier; name provides a tiebreaker |
| Null/empty removal | Prevents spurious hash differences from absent-vs-null distinctions |
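The effect of these parameters can be demonstrated directly. The identifier and title values below are illustrative only.

```python
import json

def dump(d: dict) -> str:
    # The exact serialisation parameters specified by the methodology.
    return json.dumps(d, sort_keys=True, separators=(",", ":"), ensure_ascii=True)

# Same fields, different insertion order, plus a non-ASCII character
# (values are illustrative, not real identifiers).
a = {"title": "Caf\u00e9 Song", "isrc": "USX1A2400001"}
b = {"isrc": "USX1A2400001", "title": "Caf\u00e9 Song"}
```

Both inputs serialise to the identical ASCII string `{"isrc":"USX1A2400001","title":"Caf\u00e9 Song"}`, regardless of insertion order or the platform's default encoding.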

Verification

The canonical form is included in the certification proof bundle, allowing any party to:

  1. Inspect the exact data that was certified
  2. Re-run the serialisation process independently
  3. Confirm that the canonical JSON produces the expected SHA-256 hash

Because the serialisation parameters are fully specified and use standard library functions available in every major programming language, independent reproduction does not require any TrackForge software.
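A third-party check therefore reduces to a few lines of standard-library code. The function name here is a hypothetical helper, not part of any TrackForge API.

```python
import hashlib

def verify_record_hash(canonical_json: str, expected_hash: str) -> bool:
    """Recompute the SHA-256 digest of the canonical JSON from a proof
    bundle and compare it with the published record hash.

    Uses only the Python standard library; no TrackForge software required.
    """
    digest = hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()
    return digest == expected_hash
```

If the digest matches, the verifier knows the certified rights-critical data is exactly what the proof bundle contains.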

Methodology hash and record hash

The methodology_hash field is included in certification metadata (e.g., in the proof bundle and API responses) but is not part of the canonical JSON or the record_hash. This is intentional: the record hash proves data integrity (the rights-critical metadata has not changed), while the methodology hash proves rule integrity (the evaluation criteria have not changed). These are independent verification dimensions — changing the methodology version does not alter record hashes for unchanged data.
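The independence of the two dimensions can be illustrated with a hypothetical sketch: the record hash is computed solely over the canonical JSON, while the methodology hash is computed over the rules text, so revising the methodology leaves record hashes untouched. All inputs below are illustrative.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Illustrative inputs -- not real TrackForge artefacts.
canonical_json = b'{"isrc":"USX1A2400001","title":"Example"}'
methodology_v1 = b"methodology rules, version 1.0"
methodology_v2 = b"methodology rules, version 2.0"

record_hash = sha256_hex(canonical_json)          # data integrity
methodology_hash_v1 = sha256_hex(methodology_v1)  # rule integrity
methodology_hash_v2 = sha256_hex(methodology_v2)
```

The two methodology hashes differ, but `record_hash` depends only on the canonical JSON and is therefore identical under either methodology version.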

  • Golden Record Selection — The completeness criteria that must be met before canonicalisation
  • Hashing — The next step: computing a SHA-256 digest of the canonical form
  • Independent Verification — How third parties can reproduce the canonical form and verify the hash