feedpak Format Specification¶
- Specification version: 1.14.0
- Format major version: 1
- Status: Draft
- Date: 2026-07-02
- Editors: The feedpak authors
- License: CC0 1.0 Universal (this document)
- Machine-readable schemas:
schemas/(MIT)
Abstract¶
feedpak is an open, hand-editable package format for interactive music notation — guitar/bass tab, standard notation, drum tabs, lyrics, vocal pitch, beats, sections, and the audio stems that go with them. A feedpak holds the authored data for a single song. The format is plain-text-first (YAML index + JSON data), self-describing, and designed to be extended without breaking existing readers.
This document is the authoritative reference for the on-disk format. It is independent of any single implementation: anyone may read, write, implement, or extend feedpak under the terms above, with no obligation to any project.
1. Conformance¶
1.1. Requirement keywords¶
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119 and RFC 8174 when, and only when, they appear in all capitals, as shown here.
1.2. Roles¶
- A Reader is any program that opens a feedpak and consumes its data.
- A Writer is any program that produces a feedpak.
- A conformant feedpak is a package that satisfies every MUST in this document.
A conformant Reader MUST NOT reject a feedpak solely because it carries keys, files, or fields the Reader does not recognise (see §4, Versioning and §9, Extending the format). Forward-compatibility through ignored-but- preserved unknown data is a load-bearing property of this format.
1.3. What this document does and does not govern¶
This specification governs the bytes of the package on disk — its structure, its files, and the schemas of those files. It does not govern any runtime transport, API, or rendering behaviour an application layers on top. Where such implementation detail is informative, it is confined to a clearly-labelled non-normative appendix (see Appendix A).
2. Format at a glance¶
A feedpak exists in two interchangeable forms that hold identical contents:
| Form | What it is | Used for |
|---|---|---|
| Directory | A folder named *.feedpak/ containing the files below |
Authoring, hand editing, tooling development |
| Zip archive | A .feedpak file (a ZIP archive with the same files inside) |
Distribution |
A Reader MUST accept both forms and SHOULD treat them as equivalent. A Reader that unpacks the zip form to a cache SHOULD key that cache on the archive's size and modification time so a re-packed archive is re-read.
2.1. Directory layout¶
my-song.feedpak/
├── manifest.yaml # REQUIRED — all metadata + the file index
├── arrangements/
│ ├── lead.json # One JSON per playable arrangement
│ ├── rhythm.json
│ └── bass.json
├── stems/
│ ├── full.ogg # Mixed audio (single-stem packs); may be absent after stem splitting
│ ├── guitar.ogg # Optional individual stems
│ ├── bass.ogg
│ ├── drums.ogg
│ ├── vocals.ogg
│ └── other.ogg
├── lyrics.json # OPTIONAL — syllable-level lyrics (the primary track)
├── lyrics_romaji.json # OPTIONAL — a transliteration track (lyric_tracks, §5.5)
├── lyrics_en.json # OPTIONAL — a translation track (lyric_tracks, §5.5)
└── cover.jpg # OPTIONAL — album art
2.2. Three core rules¶
manifest.yamlis the index. Nothing inside a feedpak is auto-discovered by scanning — every consumed file is referenced from the manifest. Readers MUST NOT rely on filename conventions to locate data; they MUST resolve files through the manifest. (Writers MAY use the conventional filenames shown here, but Readers MUST follow the manifest's pointers.)- Paths in
manifest.yamlare POSIX-style relative paths from the feedpak root: forward slashes, no leading/, no..segments, no empty segments (//), no colon (:) — which excludes drive letters and alternate-data-stream paths — and no backslashes. Readers MUST reject paths that escape the package root. - YAML for the manifest, JSON (or JSONC) for data files. The manifest is YAML so it is
comfortable to hand-edit; all structured data files are JSON (or JSONC, where the
.jsoncextension signals that the file MAY contain//line comments and/* */block comments). A YAML manifest is, by YAML 1.2 rules, valid against the JSON-Schema definitions inschemas/(YAML is a JSON superset for this purpose).
3. Encodings and conventions¶
- Text files (
manifest.yaml, all*.json, all*.jsonc) MUST be UTF-8 encoded. A Writer SHOULD NOT emit a byte-order mark. - Time is always expressed in seconds as a JSON number (floating point), measured from
the start of the song's audio. Time fields are named
tortime. Writers MUST NOT use milliseconds, ticks, or sample counts. - Field names in hot-path data (notes, chords, hits, beats) are deliberately short because they may be repeated tens of thousands of times in one file. Writers MUST NOT expand them. One-off metadata MAY use long, descriptive names.
- Audio stems default to OGG (Vorbis) at a quality around
-q:a 5(size/fidelity balance). OGG and WAV together form the baseline every Reader MUST decode; other formats MAY be carried and are dispatched by file extension (or an explicitcodecoverride); see §5.3.2 for the full decoder baseline, codec-resolution precedence, and portability rule. Cover art is JPEG or PNG, conventionally square, 500–1500 px on a side. - Unknown keys / files / fields are reserved for forward-compatibility. Readers MUST ignore them rather than error, and Writers that copy or re-emit a feedpak SHOULD preserve them verbatim.
4. Versioning¶
feedpak distinguishes three independent version axes. Conflating them is the most common mistake when implementing the format, so they are defined separately here.
4.1. Format version — feedpak_version¶
The manifest SHOULD carry a top-level feedpak_version key whose value is a
Semantic Versioning 2.0.0 string (for example "1.0.0"). It declares
which version of this format the package conforms to.
- A Writer producing a feedpak that conforms to this document SHOULD set
feedpak_version: "1.14.0". (The optional fields added since 1.0.0 —authorsin 1.1.0; the song-leveltempos/time_signaturesplus the per-arrangementtemposoverride in 1.2.0; the per-note bend shapebt/bnvin 1.4.0; the per-note teaching marksfg/ch/sdin 1.5.0; and the per-chord harmony annotationsfn/voicingin 1.7.0; and the template harmony descriptorscaged/guideTonesin 1.8.0; and the engine-agnostic rig library (rigsside-file plus the arrangementtones.base_rig/tones.changes[].rigreferences) in 1.13.0 — are all additive, so an older Reader simply ignores what it does not recognise. 1.3.0 added no on-disk field. 1.6.0 adds no new manifest key or note/side-file field either, but it does newly permit the.jsoncdata-file extension (see §8): such files MAY contain comments that a Reader MUST strip, so — unlike the additive fields above — a comment-bearing.jsoncfile requires a JSONC-aware Reader rather than being silently ignorable. 1.9.0 adds the optional stemcodechint and broadens audio beyond OGG behind a MUST-decode baseline; a pack that stays within the baseline is unaffected, while one that carries a non-baseline stem format requires a Reader that supports it — the same opt-in shape as.jsonc, per the §4.2 carve-out. 1.10.0 adds the optional song-levelharmonyside-file (intended chord progression) — additive, an older Reader simply ignores the manifest key and file. 1.11.0 adds the optional multi-lingual keys — top-levellanguage, the per-stemlanguagehint, and thelyric_trackslist (original / transliteration / translation lyric tracks) — all additive, an older Reader ignores them and still loads the singlelyricspointer as before. 1.12.0 adds the optional album-grouping / genre keys —album_artist,track,disc, andgenres— all additive, an older Reader ignores them. 1.14.0 adds the optional recording-identity keys —mbidandisrc— additive, an older Reader ignores them.) - If
feedpak_versionis absent, a Reader MUST treat the package as"1.0.0". (This makes every package authored before the field existed a valid 1.0.0 package.) - The value MUST be a valid semver string when present. A Reader MUST reject a value
that is not parseable as semver, or MAY fall back to treating the package as
"1.0.0"with a logged warning.
4.2. Compatibility policy¶
Changes to the format are released under semver semantics, applied to feedpak_version:
| Change | Bump | Meaning |
|---|---|---|
MAJOR (2.0.0) |
breaking | Required structure or semantics changed. Existing packages may need regeneration. |
MINOR (1.1.0) |
additive | New OPTIONAL manifest keys or side-files. Older Readers keep working by ignoring them. |
PATCH (1.0.1) |
editorial | Clarifications, fixed wording, metadata-only adjustments. No schema change. |
Reader rules. A Reader that supports feedpak major version X:
- MUST accept any package whose declared major version is X, regardless of its minor or patch (it ignores unknown minor additions per §1.2) — with the single exception that a package which uses an opt-in file-format relaxation the Reader does not implement (see the carve-out below) MAY be rejected with a clear error. This MUST-accept guarantee otherwise holds for all ordinary additive minor/patch changes.
- SHOULD warn, and MAY refuse with a clear error, when a package declares a major version greater than X.
- MAY accept a package whose major version is less than X, applying the backward-compatibility provisions this document defines.
Writer rules. A Writer MUST NOT introduce a backward-incompatible structural change without bumping the major version. Additive changes (new optional keys/files) MUST be released as a minor bump and MUST be safe for an older Reader to ignore.
Carve-out: opt-in file-format relaxations. One narrow class of change is released as a minor bump even though an older Reader cannot transparently ignore it: an opt-in relaxation of how a file is encoded — a data file's text, or a stem's audio codec — that a pack uses only if it chooses to. Two such relaxations exist in this document:
- The
.jsoncextension (1.6.0), whose files may contain comments a Reader MUST strip. - Audio stem formats beyond OGG (1.9.0): any stem not encoded as OGG (Vorbis). 1.9.0 both widens the MUST-decode baseline from OGG alone to OGG + WAV and allows further formats above it (FLAC, Opus, …). Because OGG was the only format guaranteed before 1.9.0, a Reader predating 1.9.0 may not decode even a baseline WAV stem, and no Reader is ever required to decode the above-baseline formats.
The justification for keeping each minor rather than major is that they are strictly opt-in and
per-file: a pack that does not adopt .jsonc — and a .jsonc file with no comments — is
byte-for-byte ordinary JSON every Reader handles; likewise a pack whose stems are all OGG is
decodable by every Reader regardless of version. Only a pack that actually uses a relaxation (a
comment-bearing .jsonc file, or a non-OGG stem with no OGG alternative) requires a Reader
supporting a later version — a 1.9.0 Reader for a WAV-only pack, or a Reader that opted into the
above-baseline format otherwise. A Writer wanting maximum cross-version compatibility therefore
SHOULD include an OGG stem. This is therefore an explicit, bounded exception both to the "older Readers keep working by
ignoring them" rule and to the Reader rule that a major-X Reader MUST accept any X.y.z
package (a Reader MAY reject a package that uses a relaxation it does not implement). It is
not a general license to add un-ignorable changes under a minor bump; any
future relaxation of this kind MUST be opt-in and per-file in the same way, or else be released
as a major bump. A Writer that needs maximum Reader compatibility SHOULD NOT rely on the
relaxation (for .jsonc: keep data files as comment-free .json).
4.3. Side-file schema versions¶
Several side-files carry their own integer version field at their top level
(lyrics/vocal_pitch/song_timeline/drum_tab/notation_<id> — see §7).
These are per-file schema versions and are orthogonal to feedpak_version:
- They are plain integers (
1,2, …), not semver, and start at1. - They are bumped only when that file's entry shape changes incompatibly.
- A Reader MUST NOT assume any relationship between a side-file's
versionand the package'sfeedpak_version. Every new object-shaped side-file SHOULD include"version": 1.lyrics.jsonis the one exception — it is a flat array retained for backward compatibility and carries noversionfield (see §7.1).
4.4. Specification-document version¶
This published document has its own version (shown in the header, e.g. 1.0.0) tracking the
prose, schemas, and examples. It matches the format major.minor it describes. Major revisions
of the format are published as new documents (spec/feedpak-v2.md); minor and patch revisions
update this document in place and are recorded in CHANGELOG.md. Each
published version is frozen with a git tag (v1.0.0).
5. manifest.yaml¶
The manifest is the only REQUIRED file. It carries the song's metadata and indexes every other file in the package.
Minimal valid manifest:
feedpak_version: "1.0.0"
title: "Black Hole Sun"
artist: "Soundgarden"
duration: 320.5
arrangements:
- id: lead
name: Lead
file: arrangements/lead.json
tuning: [0, 0, 0, 0, 0, 0]
capo: 0
stems:
- id: full
file: stems/full.ogg
default: true
5.1. Top-level keys¶
| Key | Type | Required | Description |
|---|---|---|---|
feedpak_version |
string (semver) | SHOULD | Format version the package conforms to (see §4.1). Absent ⇒ "1.0.0". |
title |
string | yes | Song title. |
artist |
string | yes | Artist name. |
album |
string | no | Album. |
album_artist |
string | no | Album artist for the release. Used to group multi-artist / compilation albums under one album identity. Absent ⇒ falls back to artist. |
year |
int | no | Release year. |
track |
int | no | 1-based track number within the album, for album playback order. Absent ⇒ order is unspecified (a consumer MAY fall back to a filename/title heuristic or the user's own ordering). |
disc |
int | no | 1-based disc number for a multi-disc release. Absent ⇒ 1. |
genres |
list | no | Zero or more genre labels (strings), most specific first; a consumer treats genres[0] as the primary genre. Free-form, but authors SHOULD prefer common, normalized names. Absent ⇒ unspecified. |
mbid |
string (UUID) | no | MusicBrainz Recording MBID of the recording this pack transcribes, in lowercase canonical UUID form — an author-set, open, exact identity key. A consumer SHOULD prefer it over free-text artist/title matching when grouping charts of the same recording or looking up external metadata. Absent ⇒ identity is text-derived. |
isrc |
string | no | International Standard Recording Code of the recording (12 characters, e.g. USABC2400001) — an alternate author-set exact identity key for when the MBID is unknown. The stored form is the bare 12-character code: the hyphens in the common display form (US-ABC-24-00001) are presentation only and not part of the code, so authors SHOULD strip separators when writing the key and consumers SHOULD strip them before comparing. When both are present and disagree, a consumer SHOULD trust mbid. Absent ⇒ unspecified. |
language |
string | no | BCP 47 tag for the song's primary sung language (e.g. en, ja, ko, zh-Hans). Absent ⇒ unspecified (treat as und). See §5.5. |
authors |
list | no | Human contributors who authored or edited this feedpak (see §5.4). Distinct from artist. |
duration |
number | yes | Song length in seconds. |
arrangements |
list | yes | Playable arrangements (see §5.2). MUST be non-empty. A notation-only entry still counts as an arrangement — it may omit file (see §5.2), but it is not an exception to the non-empty rule. |
stems |
list | yes | Audio stems (see §5.3). MUST be non-empty. |
stem_separation |
object | no | Provenance metadata when stems were produced by an automated separation engine. See §5.3.1. |
lyrics |
string (path) | no | Path to a lyrics JSON file (see §7.1). |
lyrics_source |
string | no | Origin of the lyrics: authored, transcribed, or user. Absent ⇒ treat as authored. See §7.1. |
lyric_transcription |
object | no | Provenance metadata when lyrics came from an automated transcription engine. See §7.1.1. |
lyric_tracks |
list | no | Additional lyric representations — original-script, transliteration, and translation tracks, plus per-language sung originals. See §5.5. |
vocal_pitch |
string (path) | no | Path to per-syllable vocal pitch JSON (see §7.2). |
pitch_extraction |
object | no | Provenance metadata when vocal pitch was extracted by an automated engine. See §7.2.1. |
vocal_pitch_contour |
string (path) | no | Path to a fine-grained pitch-contour JSON (see §7.3). |
cover |
string (path) | no | Path to cover image (JPEG/PNG). |
preview |
string (path) | no | Path to a short preview audio clip for hover-to-listen UIs. Same audio-format rules as stems (§5.3.2) — OGG/WAV baseline, other formats allowed behind it. |
song_timeline |
string (path) | no | Path to a song-wide beats/sections file (see §7.4). Takes priority over beats/sections embedded in arrangement JSON. |
drum_tab |
string (path) | no | Path to a drum-tab file (see §7.5). |
keys |
string (path) | no | Path to a key/scale-annotation file (see §7.7). |
harmony |
string (path) | no | Path to a song-level harmony track (intended chord progression; see §7.8). |
rigs |
string (path) | no | Path to an engine-agnostic rig library (amps/cabs/pedals as portable effect graphs; see §7.9). |
Any key not listed here is an extension key and MUST be ignored by Readers that do not understand it (see §9).
5.2. arrangements[]¶
Each entry describes one playable arrangement and points at its JSON file:
arrangements:
- id: lead # filesystem-safe stable id, used for filenames
name: Lead # display name
file: arrangements/lead.json
tuning: [0, 0, 0, 0, 0, 0] # six semitone offsets from E A D G B E
capo: 0
centOffset: 0.0 # OPTIONAL float, cents; default 0.0
| Field | Type | Default | Notes |
|---|---|---|---|
id |
string | — | REQUIRED. Stable, filesystem-safe, lowercase identifier; used in filenames and referenced by consumers. |
name |
string | id |
Display name. |
file |
string (path) | — | Path to the arrangement JSON (see §6). MAY be omitted only when notation is present (see below). |
tuning |
int[] | [0,0,0,0,0,0] |
Semitone offsets from standard E2 A2 D2 G3 B3 E4. Six elements is the standard 6-string-guitar convention; lengths 4–8 are accepted (4–6 = bass, 6–8 = extended-range guitar; length 6 is shared). Readers MUST NOT hard-code length 6. |
capo |
int | 0 |
Capo fret. |
centOffset |
number | 0.0 |
Pitch-shift in cents. Common values: -1200.0 (one octave down for extended-range bass), small non-zero values for non-A440 reference pitch (e.g. A443 ≈ +11.8). |
type |
string | — | OPTIONAL instrument hint (guitar, bass, piano, violin, …). |
notation |
string (path) | — | OPTIONAL path to a standard-notation file for this arrangement (see §7.6). When present, file MAY be omitted and the arrangement is notation-only. |
Manifest-level tuning, capo, and centOffset override any equivalents embedded inside
the arrangement JSON; the in-JSON values are fallbacks.
5.3. stems[]¶
stems:
- id: full
file: stems/full.ogg
default: true # plays by default when the song opens
- id: drums
file: stems/drums.ogg
default: false
| Field | Type | Default | Notes |
|---|---|---|---|
id |
string | — | REQUIRED. Stable identifier referenced by consumers. |
file |
string (path) | — | REQUIRED. Path to the audio file. |
codec |
string | — | OPTIONAL codec hint (e.g. "vorbis", "opus", "pcm", "mp3", "flac"). When absent, the codec is inferred from the file extension; when present it overrides the extension (see §5.3.2). Its main use is disambiguating an extension that doesn't pin the codec (container ≠ codec), but it MAY be set redundantly to be explicit. |
language |
string | — | OPTIONAL BCP 47 tag for a language-specific vocal stem (e.g. a vocals_ja stem with language: ja and a vocals_en stem with language: en for a song with two sung-language recordings). Absent ⇒ untagged — an instrumental stem or a stem whose language is unspecified. A lyric_tracks entry's stem pointer pairs lyrics with the stem they were sung on. |
default |
boolean | false |
Whether this stem is enabled when the song opens. |
default is logically boolean. For hand-edited convenience, Readers MUST also accept the
case-insensitive strings "true"/"false", "on"/"off", and "yes"/"no", mapping them to
the obvious boolean. Writers SHOULD emit a real boolean or "on"/"off".
stems MUST be non-empty. There is no required id or filename: a freshly converted pack
typically has a single {id: full, file: stems/full.ogg} entry; after stem splitting, full
is commonly replaced by per-instrument entries (guitar, bass, drums, vocals, other,
piano, …). No entry is guaranteed to be present beyond the non-empty requirement.
5.3.1. stem_separation¶
When stems were produced by an automated source-separation engine, an OPTIONAL top-level
stem_separation object records the provenance:
stem_separation:
engine: demucs # stable engine id
model: htdemucs_6s # engine-specific model name
version: "1.0.0" # semver for the stem-artifact contract
| Field | Type | Notes |
|---|---|---|
engine |
string | Stable identifier for the separation engine. |
model |
string | Engine-specific model id used for this split. |
version |
string (semver) | Version of the producer's stem-artifact contract, independent of the upstream engine version. Bump: patch = metadata-only, minor = backward-compatible additions, major = stem set / packaging / post-processing changed and existing splits should be regenerated. |
Omitted for single-stem packs and for hand-recorded or hand-edited stems. The three fields together form a natural cache key: a consumer regenerating stems can treat any change among them as a cache miss.
5.3.2. Audio formats — baseline, dispatch, and portability¶
Codec resolution (precedence). A Reader determines a stem's codec in this order:
- the explicit
codecfield, if present (it overrides the extension); else - the file extension (
.ogg→ Vorbis,.opus→ Opus,.wav→ PCM,.mp3→ MP3,.flac→ FLAC).
codec exists precisely for the cases where the extension is ambiguous or doesn't pin the codec
(a container that can hold more than one codec). A Reader decodes the formats it supports; for one
it does not, it MUST surface a clear error or fall back to another stem — it MUST NOT fail
silently into dead air.
Decoder baseline. A conformant Reader MUST decode:
- OGG (Vorbis) —
.ogg - WAV (PCM) —
.wav
and SHOULD decode MP3 (.mp3), FLAC (.flac), and Opus (.opus). These are
the recommended, broadly-portable formats; FLAC/WAV suit lossless masters, Opus/Vorbis/MP3 suit
compressed delivery.
Portability rule. A pack intended for distribution MUST include at least one stem whose
resolved codec (per the precedence above — codec override, else extension) is in the
MUST-decode baseline (in practice, an OGG or WAV full mix). The resolved codec, not the file
extension alone, is what counts: a stem declaring file: stems/full.wav with codec: mp3 is an
MP3 stem and does not satisfy the baseline. Stems in any other
format are opportunistic enhancements layered on top of that guaranteed-playable baseline, so
a leaner Reader always has something to play. A Writer that controls its target environment MAY
omit the baseline stem, but such a pack is not portable and a Reader MAY reject it with a
clear error (see the §4.2 opt-in carve-out).
Other / proprietary formats (e.g. WEM). The format is codec-agnostic enough that a Reader
MAY decode an extension outside the recommended set (for instance a desktop build that bundles
a decoder for Wwise .wem) — but such formats are NOT part of the baseline, MUST NOT be
the only stem in a distributable pack (the portability rule still applies), and this document
ships no reference decoder for them. Relying on a non-recommended format makes a pack
playable only on Readers that opted into it.
5.4. authors[]¶
An OPTIONAL top-level authors list credits the people who authored or edited this feedpak —
whoever transcribed the chart, arranged it, mixed the stems, or hand-corrected the data. This is
distinct from artist, which names the recording artist / musician who performed the song;
both MAY be present.
authors:
- name: Jane Smith # display name or handle
role: transcriber
email: jane@example.com
url: https://janesmith.dev
- name: Bob Lee
role: editor
| Field | Type | Default | Notes |
|---|---|---|---|
name |
string | — | REQUIRED. Contributor's name or handle. |
role |
string | — | OPTIONAL. What they contributed. The value is free-form and a Reader MUST accept any string; the RECOMMENDED vocabulary is transcriber, arranger, charter, editor, mixer, engineer, and proofreader. |
email |
string | — | OPTIONAL. Contact email. |
url |
string | — | OPTIONAL. Profile or homepage URL. |
When an entry omits role, a consumer SHOULD treat the contribution as unspecified rather
than infer one. The authors key is OPTIONAL: a pack with no contributor metadata simply omits
it, and a Reader that does not understand it ignores it per §1.2.
5.5. lyric_tracks[]¶
An OPTIONAL top-level lyric_tracks list carries multiple lyric representations of a song so a
global audience can read along in the original script, a transliteration (romaji, pinyin, …), or a
translation — and so a song released with more than one sung-language recording can pair each
vocal with its own lyrics. Each entry points at a lyrics file that uses the same flat-array shape
as lyrics.json (the file schema is unchanged; only the manifest grows):
language: ja # the song's primary sung language (§5.1)
lyrics: lyrics.json # the legacy single pointer — kept for old Readers
lyric_tracks:
- id: ja
file: lyrics.json # may reuse the same file the `lyrics` pointer names
language: ja
kind: original
stem: vocals_ja
- id: romaji
file: lyrics_romaji.json
language: ja-Latn
kind: transliteration
- id: en
file: lyrics_en.json
language: en
kind: translation
| Field | Type | Default | Notes |
|---|---|---|---|
id |
string | — | REQUIRED. Stable, filesystem-safe, lowercase identifier for this track. |
file |
string (path) | — | REQUIRED. Path to a lyrics file in the lyrics.json flat-array shape. Two entries MAY name the same file (e.g. a kind: original track and the legacy lyrics pointer). |
language |
string | — | REQUIRED. BCP 47 tag for this track's text. Use a script subtag for a transliteration (e.g. ja-Latn romaji, zh-Latn pinyin). |
kind |
string | — | REQUIRED. One of original (the sung lyrics in their own language/script), transliteration (a phonetic re-spelling of an original, typically sharing its timing 1:1), or translation (the meaning in another language, for reading rather than precise sing-along). A Reader MUST accept these three and SHOULD treat any other value as translation. |
lyrics_source |
string | authored |
OPTIONAL origin of this track: authored, transcribed, or user — same vocabulary as the manifest lyrics_source. |
lyric_transcription |
object | — | OPTIONAL {engine, model, version} provenance when this track was machine-produced (transcribed or machine-translated) — same shape and semver semantics as stem_separation and lyric_transcription. |
stem |
string | — | OPTIONAL id of the stems[] entry this track's lyrics were sung on — pairs a per-language sung original with its vocal recording. Omitted for transliteration/translation tracks, which ride the original performance. |
name |
string | id |
OPTIONAL display label. |
Relationship to the single lyrics pointer. The lyrics /
lyrics_source / lyric_transcription keys are
unchanged and remain valid on their own. When lyric_tracks is absent, a Reader behaves
exactly as before. When lyric_tracks is present it is authoritative for multi-track display,
and a Writer SHOULD also set lyrics to point at the file of the kind: original track whose
language matches the song's primary language — pointing at the same file,
not a copy — so a Reader predating 1.11.0 still shows lyrics. A Reader that understands
lyric_tracks MAY ignore the lyrics pointer when the list is present.
No new file schema. Each track is an ordinary lyrics.json array: a
transliteration track typically mirrors the original's syllable timing 1:1, while a translation
track carries its own (looser, reading-oriented) timing. The §7.1 rule that
lyrics.json is a versionless flat array is preserved for every track. Per-track vocal pitch is
out of scope for this version (the melody is largely language-independent); a future version MAY
add it.
lyric_tracks is OPTIONAL; a Reader that does not understand it ignores it per §1.2
and falls back to the single lyrics pointer.
6. Arrangement JSON¶
Arrangement files (arrangements/<id>.json, or arrangements/<id>.jsonc for hand-edited
packs) carry the playable note/chord data for one arrangement — the wire format. This
document defines that format; it is the authoritative reference for it.
6.1. Top-level shape¶
{
"name": "Lead",
"tuning": [0, 0, 0, 0, 0, 0],
"capo": 0,
"centOffset": 0.0, // OPTIONAL, float cents, default 0.0
"notes": [ /* §6.2 */ ],
"chords": [ /* §6.3 */ ],
"anchors": [ /* §6.4 */ ],
"handshapes": [ /* §6.5 */ ],
"templates": [ /* §6.6 */ ],
"phrases": [ /* OPTIONAL, §6.7 */ ],
"tones": { /* OPTIONAL, §6.9 */ },
"tempos": [ /* OPTIONAL, §6.10 — per-chart tempo override */ ],
"beats": [ /* §6.8 — see note */ ],
"sections": [ /* §6.8 — see note */ ]
}
beats and sections are song-level data. For historical reasons they MAY appear inside
the first arrangement's JSON, in which case a Reader MUST hoist them to the song. New
packages SHOULD instead use a dedicated song_timeline.json. When
both are present, the song_timeline.json values MUST take priority.
6.2. Notes¶
Field names are short on purpose. A Writer MUST NOT expand them.
{
"t": 12.345, // time (s)
"s": 2, // string (0 = lowest)
"f": 7, // fret (0 = open, 24 = max)
"sus": 0.5, // sustain (s, 0 = none)
"sl": 9, // pitched slide-to fret (-1 = no slide)
"slu": -1, // unpitched slide-to fret (-1 = no slide)
"bn": 1.0, // bend amount in semitones (peak)
"bt": 4, // OPTIONAL bend intent (§6.2.1): 0 up, 1 release, 2 pre-bend, 3 pre-bend-release, 4 round-trip
"bnv": [{ "t": 0, "v": 0 }, { "t": 0.25, "v": 1.0 }, { "t": 0.5, "v": 0 }], // OPTIONAL time-stamped bend curve
"ho": false, // hammer-on
"po": false, // pull-off
"hm": false, // natural harmonic
"hp": false, // pinch harmonic
"pm": false, // palm mute
"mt": false, // string mute
"vb": false, // vibrato
"tr": false, // tremolo
"ac": false, // accent
"tp": false, // tap
"ln": false, // link-next (chord-linking metadata; renderers may ignore)
"fhm": false, // fret-hand mute
"plk": false, // pluck / pop (bass)
"slp": false, // slap (bass)
"rh": -1, // right-hand fingering (-1 = unset)
"pkd": -1, // pick direction (-1 = unset, 0 = down, 1 = up)
"fg": -1, // OPTIONAL fret-hand finger, teaching mark (§6.2.2): -1 unset, 0 thumb, 1–4 index→pinky
"ch": -1, // OPTIONAL strum-group key, teaching mark (§6.2.2): notes sharing a value ≥ 0 are one strum gesture
"sd": -1, // OPTIONAL scale degree, teaching mark (§6.2.2): chromatic offset 0–11 above the active key's tonic
"ig": false // ignore (authoring flag — rendered but not scored / sequenced)
}
Only t, s, and f are REQUIRED. Numeric defaults are 0, or -1 for the slide / rh /
pkd / fg / ch / sd fields; boolean defaults are false. When authoring by hand, a Writer
MAY omit any field equal to its default; a Reader MUST fill in defaults for absent fields.
6.2.1. Bend shape (bt, bnv)¶
bn is the bend's peak magnitude in semitones. Two OPTIONAL fields describe its shape over
time; both were added in 1.4.0 and an older Reader simply ignores them, treating the note as a
plain bend-up to bn.
bt— bend intent, an integer (default0):
bt |
meaning |
|---|---|
0 |
bend up (default) |
1 |
release (a held bend let down) |
2 |
pre-bend (already bent at the note's onset) |
3 |
pre-bend-and-release |
4 |
round-trip (bend up then back down within the note) |
bnv— a time-stamped bend curve: an array of{ "t": <seconds from the note's onset>, "v": <semitones> }points, in non-descendingtorder. When present,bnvis the authoritative shape andbtis advisory;bnremains the peak for Readers that don't interpolate the curve. Omitbnvfor a simple bend.
A note SHOULD NOT carry bnv or a non-zero bt when bn is 0.
6.2.2. Teaching marks (fg, ch, sd)¶
Three OPTIONAL fields added in 1.5.0 annotate how a note is taught or displayed. They are teaching marks, not performance data: a renderer MAY show them, but a grader MUST NOT use them to decide whether a note was played correctly. An older Reader ignores them.
fg— the fret-hand finger the chart prescribes, an integer (default-1):
fg |
finger |
|---|---|
-1 |
unset — no prescription (default) |
0 |
thumb |
1 |
index |
2 |
middle |
3 |
ring |
4 |
pinky |
This is the same convention as a chord template.fingers entry, lifted to the
single note. fg = -1 (or absent) means the chart does not prescribe a finger — it does
not mean "wrong finger". A grader MUST NOT penalise a performance for the finger used.
-
ch— a strum-group key, an integer (default-1). Notes within one arrangement that share the samechvalue≥ 0are performed as a single strum or rake gesture and MAY be drawn with a strum indicator;pkdgives the gesture's direction. The value is an opaque grouping label, not an index or an ordering.ch = -1(or absent) means ungrouped. Whereas a chord groups notes sounded simultaneously,chgroups notes — possibly slightly staggered int— that the player executes as one motion. -
sd— the note's scale degree, an integer0–11(default-1). It is the note's pitch class expressed as a chromatic offset in semitones above the tonic of the key/scale active at the note's time (seekeys.json):0= tonic,2= major second,3= minor third,7= perfect fifth, and so on. A display layer maps it to a conventional degree label (1,♭3,5, …) using the active scale. Becausesdis fully derivable fromkeys.jsonand the note's pitch, a Reader MAY compute it itself and a Writer MAY omit it; when present it is a cached value or an author override.sd = -1(or absent) means unset — for instance when no key is in effect.
6.3. Chords¶
A chord groups note-shaped objects under a single time:
{
"t": 30.0,
"id": 12, // index into templates[]
"hd": false, // high-density flag
"notes": [
{"s": 0, "f": 3, "sus": 0.0},
{"s": 1, "f": 5, "sus": 0.0}
],
"fn": {"rn": "ii7", "q": "m7", "deg": 2} // OPTIONAL harmonic function (§6.3.1)
}
Chord notes use the same field set as standalone notes except that t is omitted — the
chord carries the time. id indexes into templates[] for the fingering/shape.
6.3.1. Harmonic function (fn)¶
fn is an OPTIONAL object describing the chord's harmonic function in the active key. Added in
1.7.0, it is a teaching annotation, not performance data: a renderer MAY display it, but a
grader MUST NOT use it to score whether the chord was played correctly. An older Reader ignores
it.
| key | type | meaning |
|---|---|---|
rn |
string | Roman-numeral label, for display (e.g. "ii7", "V7", "♭VII"). |
q |
string | chord quality token (e.g. "maj7", "m7", "7", "m7b5", "sus4"). |
deg |
integer 0–11 |
chromatic offset in semitones of the chord root above the tonic of the key/scale active at the chord's time (0 = tonic), mirroring the per-note sd convention. |
fn itself is OPTIONAL, but when present all three keys (rn, q, deg) are REQUIRED — a
half-specified function is ambiguous. fn describes the chord's as-played function and is fully derivable from the chord root and the
key active at its time (see keys.json), so a Reader MAY compute it and a
Writer MAY omit it; when present it is a cached value or an author override. It lives on the
chord instance (§6.3), not the template — the same shape is reused across
keys, so its function is not a property of the shape. The song-level harmony track
(harmony.json) instead carries the song's intended chord progression;
fn describes each chord as actually voiced, and the two are deliberately distinct (a Reader MAY
derive one from the other, but neither is authoritative).
The q token is drawn from a shared chord-quality vocabulary — a recommended (not closed) set
including "maj", "m" (minor triad; "min" is an accepted synonym), "maj7", "m7", "7",
"m7b5", "dim", "aug", "sus2", "sus4", "6", "9", … — the same set
harmony.json's quality uses, so the two stay interoperable.
6.4. Anchors¶
Where the fretting hand sits; drives a renderer's zoom/focus box.
6.5. Hand shapes¶
Spans during which a chord shape is held:
| Field | Type | Default | Notes |
|---|---|---|---|
chord_id |
int | 0 |
Index into templates[]. |
start_time |
number | 0.0 |
Span start (s). |
end_time |
number | 0.0 |
Span end (s). |
arp |
boolean | false |
Treat as an arpeggio span rather than a strummed hold. |
6.6. Chord templates¶
Named shapes referenced by chord.id and handshape.chord_id:
{
"name": "Em7",
"displayName": "Em7",
"arp": false,
"fingers": [-1, 2, 1, -1, -1, -1],
"frets": [ 0, 2, 2, 0, 0, 0],
"voicing": "open",
"caged": "E",
"guideTones": [3, 10]
}
| Field | Type | Default | Notes |
|---|---|---|---|
name |
string | "" |
Canonical template name. |
displayName |
string | name |
UI label (MAY carry display-only variants). |
arp |
boolean | false |
Whether the template is arpeggiated. |
fingers |
int[] | all -1 |
Fretting-hand fingers, lowest string first. -1 = unused, 0 = open / no finger, 1..4 = index/middle/ring/pinky. Length matches the string count. |
frets |
int[] | all -1 |
Fret numbers, lowest string first. -1 = unused, 0 = open, positive = fretted. |
voicing |
string | "" |
OPTIONAL voicing type of the shape (1.7.0) — e.g. "open", "triad", "shell", "drop2", "drop3", "barre". A teaching annotation (display only — a grader MUST NOT score it); free-form, but a Writer SHOULD prefer the listed tokens. It describes the key-independent shape, which is why it lives on the template rather than the chord instance (cf. fn). |
caged |
string | "" |
OPTIONAL CAGED-system shape (1.8.0) the fingering derives from — one of "C", "A", "G", "E", "D". A teaching annotation (display only — a grader MUST NOT score it). Like voicing, it describes the key-independent shape, so it rides the template. |
guideTones |
int[] | [] |
OPTIONAL guide tones (1.8.0) — chromatic semitone offsets 0–11 above the chord root marking the tones that most define the chord's quality (typically the 3rd and 7th); e.g. a dominant-7 → [4, 10] (major 3rd, minor 7th). A teaching annotation (display only — a grader MUST NOT score it). Root-relative, hence key-independent, so it rides the template. |
6.7. Phrases (OPTIONAL — multi-difficulty data)¶
Sources that carry a per-phrase difficulty ladder include this:
"phrases": [
{
"start_time": 0.0,
"end_time": 12.5,
"max_difficulty": 4,
"levels": [
{ "difficulty": 0, "notes": [], "chords": [], "anchors": [], "handshapes": [] },
{ "difficulty": 1, "notes": [], "chords": [], "anchors": [], "handshapes": [] }
]
}
]
A Writer with no multi-difficulty data MUST omit the phrases key entirely rather than
emit "phrases": []. A missing key means "single fixed difficulty"; the distinction is
normative because consumers gate a difficulty control on the key's presence.
6.8. Beats and sections¶
"beats": [{"time": 0.5, "measure": 1}, {"time": 1.0, "measure": -1}],
"sections": [{"name": "verse", "number": 1, "time": 12.5}]
measure: -1 marks a sub-beat (not a downbeat). Section name follows the usual
song-structure vocabulary (intro, verse, chorus, bridge, solo, outro, …). See the
note in §6.1 regarding placement and the song_timeline.json priority
rule.
6.9. Tones (OPTIONAL)¶
tones carries the arrangement's gear (amp/pedal/cabinet) and in-song tone switches. Omitted
when the source carries no tone data.
"tones": {
"base": "Clean Rhythm",
"base_rig": "clean-rhythm",
"changes": [
{"t": 12.5, "name": "Lead Drive", "rig": "lead-drive"},
{"t": 48.0, "name": "Clean Rhythm", "rig": "clean-rhythm"}
],
"definitions": [
{"name": "Clean Rhythm", "id": "tone-1", "gear": { /* opaque, source-specific gear data */ }}
]
}
tones binds gear to time; the actual rigs are defined in one of two ways:
- Structured, portable — the
rigs.jsonlibrary (§7.9).base_rigand eachchanges[].rigreference a rigidthere. This is the recommended, engine-agnostic representation and the only one a Reader can portably render. - Legacy opaque —
definitions, a list of raw tone objects copied verbatim from the source (e.g. an opaque object from an external source). It is source-specific and not portably interpretable; Readers that do not interpret it MUST preservetonesverbatim.
A pack MAY carry both; a rig-aware Reader SHOULD prefer the rig/base_rig references over
definitions.
base(string) — human-readable name of the tone in effect before the first change.base_rig(string, OPTIONAL) — rigidinrigs.jsonfor the base tone.changes(list, time-sorted) —{"t": seconds, "name": str, "rig"?: str}switches.rigreferences a rigidinrigs.json.definitions(list, OPTIONAL) — raw, opaque tone objects copied from the source.
All sub-keys are individually OPTIONAL.
6.10. Per-arrangement tempo (OPTIONAL)¶
A chart MAY declare its own tempo map, overriding the song-wide tempo for that arrangement only.
This supports an arrangement authored at a different tempo from the rest of the song (a half-time
feel, a slowed practice chart, …). The shape is identical to the song-level
tempos:
| Field | Type | Notes |
|---|---|---|
time |
number | Event time (s). REQUIRED. |
bpm |
number | Beats per minute, > 0. REQUIRED. Applies until the next event. |
Priority. When the consumer plays this arrangement and the arrangement carries a non-empty
tempos, a Reader MUST use it and MUST ignore the song-level
song_timeline.json tempos for this chart. When the arrangement omits
tempos, the song-level tempo applies. A Writer MUST omit the key entirely when the chart
follows the song tempo rather than emit "tempos": []. time_signatures are song-level only
and are not overridden per arrangement.
7. Side-files¶
Each side-file is referenced from the manifest by a pointer key (the "manifest opt-in, file off
to the side" pattern; see §9.1).
Side-files use the .json extension by convention; hand-edited side-files MAY use .jsonc to
signal that the file contains comments. Every side-file that is a JSON object SHOULD carry a
top-level integer version (see §4.3). The one exception is
lyrics.json, which is a flat JSON array kept for backward compatibility: it carries no
version field, and its origin and revision are tracked at the manifest level instead
(lyrics_source, lyric_transcription).
7.1. lyrics.json¶
Referenced by the manifest lyrics key. A flat JSON list of syllable objects. Unlike the
object-shaped side-files, lyrics.json carries no top-level version field — it is an
array kept in this shape for backward compatibility, and its origin/revision is tracked at the
manifest level (lyrics_source, lyric_transcription). This is the one documented exception to
the side-file version guidance in §4.3 and §9.3.
[
{"t": 12.34, "d": 0.18, "w": "Hel"},
{"t": 12.52, "d": 0.22, "w": "lo-"},
{"t": 13.10, "d": 0.30, "w": "world"}
]
| Field | Type | Notes |
|---|---|---|
t |
number | Time (s). |
d |
number | Duration (s). |
w |
string | Syllable text. A trailing - joins to the next syllable as one word; a trailing + marks the last syllable of a line. Both are suffixes on a real syllable, not standalone entries. |
The manifest lyrics_source key records the origin: authored (from an authored chart),
transcribed (machine-generated by an automated engine — speech transcription, and, for
lyric_tracks, machine translation or transliteration), or user
(hand-edited). Absent ⇒ authored. A UI MAY use this to badge machine-generated lyrics
differently.
A song MAY carry more than one lyric track — an original-script track plus a transliteration
and/or translation, or per-language sung originals. These additional tracks are listed in the
manifest lyric_tracks key (§5.5); each one is a file in this same flat-array
shape. The single lyrics pointer remains the backward-compatible primary track.
7.1.1. lyric_transcription¶
When lyrics_source is transcribed (or any future automated origin), an OPTIONAL top-level
lyric_transcription object records {engine, model, version} with the same shape and semver
semantics as stem_separation, identifying the automated engine that
produced the text — speech transcription, or (for a lyric_tracks entry)
machine translation or transliteration. Omitted for authored/user lyrics.
7.2. vocal_pitch.json¶
Referenced by the manifest vocal_pitch key — the per-syllable karaoke companion to lyrics:
{
"version": 1,
"notes": [
{"t": 12.34, "d": 0.40, "midi": 64},
{"t": 12.78, "d": 0.55, "midi": 67}
]
}
| Field | Type | Notes |
|---|---|---|
version |
int | Schema version of this file (currently 1). |
notes |
list | One entry per syllable that could be pitched. t/d mirror the matching lyrics.json entry; midi is the MIDI note number (60 = middle C). Unpitched syllables are omitted, so this list MAY be shorter than lyrics.json. |
7.2.1. pitch_extraction¶
When pitch came from an automated engine, an OPTIONAL top-level pitch_extraction object
records {engine, model, version} — same shape and semver-string semantics as
stem_separation. This is distinct from the in-file integer version
above. Omitted for hand-edited pitch tracks.
7.3. vocal_pitch_contour.json¶
Referenced by the manifest vocal_pitch_contour key — a fine-grained pitch contour (one sample
every ~20 ms, Hz rather than MIDI). It is a different shape from vocal_pitch.json and rides
its own manifest key so the two never collide:
7.4. song_timeline.json¶
Referenced by the manifest song_timeline key. Holds the song-wide rhythmic grid — beats,
sections, and (since 1.2.0) the song's tempo and time-signature maps — in one dedicated,
instrument-independent file, out of the first arrangement JSON:
{
"version": 1,
"tempos": [
{"time": 0.000, "bpm": 120.0},
{"time": 64.000, "bpm": 90.0}
],
"time_signatures": [
{"time": 0.000, "ts": [4, 4]},
{"time": 32.000, "ts": [6, 8]}
],
"beats": [
{"time": 0.500, "measure": 1},
{"time": 1.000, "measure": -1},
{"time": 2.000, "measure": 2}
],
"sections": [
{"name": "intro", "number": 1, "time": 0.0},
{"name": "verse", "number": 1, "time": 16.0},
{"name": "chorus", "number": 1, "time": 32.0}
]
}
tempos and time_signatures are each OPTIONAL, time-ordered, and follow the same
"each entry applies until the next" rule as keys.json. Adding them did not
change the file's version, which stays 1.
tempos[] field |
Type | Notes |
|---|---|---|
time |
number | Event time (s). REQUIRED. |
bpm |
number | Beats per minute, > 0. REQUIRED. In effect until the next tempos entry. |
time_signatures[] field |
Type | Notes |
|---|---|---|
time |
number | Event time (s). REQUIRED. |
ts |
int[2] | [numerator, denominator], each ≥ 1 (same convention as notation ts, §7.6). REQUIRED. In effect until the next entry. |
beats[] field |
Type | Notes |
|---|---|---|
time |
number | Beat timestamp (s). |
measure |
int | 1-based downbeat number; -1 = sub-beat. |
sections[] field |
Type | Notes |
|---|---|---|
name |
string | Song-structure name (intro, verse, chorus, …). |
number |
int | Repeat number. |
time |
number | Section start (s). |
The song-level tempos map is the song-wide playback/timing tempo, distinct from the
per-measure tempo inside notation_<id>.json (§7.6), which exists for
staff rendering. An arrangement MAY override tempos for its own chart — see
§6.10.
When song_timeline.json is present and valid, its data MUST take priority over any
beats/sections embedded in arrangement JSON. When absent, the Reader MUST fall back to the
arrangement-embedded data per §6.1. No migration is required for older
packages.
7.5. drum_tab.json¶
Referenced by the manifest drum_tab key — per-piece drum hits:
{
"version": 1,
"name": "Drums",
"kit": [
{"id": "kick", "name": "Kick"},
{"id": "snare", "name": "Snare"},
{"id": "hh_closed", "name": "Hi-hat (closed)"},
{"id": "ride", "name": "Ride"}
],
"hits": [
{"t": 0.500, "p": "kick", "v": 110},
{"t": 0.750, "p": "snare", "v": 92},
{"t": 0.750, "p": "hh_closed", "v": 70},
{"t": 1.000, "p": "snare", "v": 60, "g": true},
{"t": 1.250, "p": "snare", "v": 105, "f": true},
{"t": 4.000, "p": "crash_r", "v": 120, "k": 0.080}
]
}
hits[] field |
Type | Notes |
|---|---|---|
t |
number | Hit time (s), REQUIRED, monotonic across hits[]. |
p |
string | Piece-id from the vocabulary below, REQUIRED. |
v |
int 1–127 | Velocity (default 100). |
g |
boolean | Ghost note. |
f |
boolean | Flam. |
k |
number | Cymbal-choke tail duration (s). |
kit[] is the legend (stable metadata); hits[] is the hot path (short field names). Open and
closed hi-hat are distinct piece-ids, not articulation flags — a scorer can only reject a
wrong-articulation strike if the articulation is part of the id.
Canonical piece-id vocabulary. This is a closed list for v1; unknown piece-ids MUST round-trip (a Reader renders them with a sensible default rather than erroring).
| Piece-id | Category | Default GM MIDI |
|---|---|---|
kick |
kick | 35, 36 |
snare |
drum | 38, 40 |
snare_xstick |
drum | 37 |
tom_hi |
drum | 50, 48 |
tom_mid |
drum | 47, 45 |
tom_low |
drum | 43 |
tom_floor |
drum | 41 |
hh_closed |
cymbal | 42 |
hh_open |
cymbal | 46 |
hh_pedal |
cymbal | 44 |
stack |
cymbal | 30 |
crash_l |
cymbal | 49 |
crash_r |
cymbal | 57 |
splash |
cymbal | 55 |
china |
cymbal | 52 |
ride |
cymbal | 51, 59 |
ride_bell |
cymbal | 53 |
bell |
cymbal | 80 |
7.6. notation_<id>.json¶
Referenced by an arrangement entry's notation key (per-arrangement, not song-wide — a
song MAY carry both notation_keys.json and notation_violin.json). Promotes staff-notation
instruments (keys, piano, strings, …) to first-class data, separate from the guitar wire
format.
{
"version": 1,
"instrument": "piano",
"staves": [
{"id": "rh", "clef": "G2", "label": "Right Hand"},
{"id": "lh", "clef": "F4", "label": "Left Hand"}
],
"measures": [
{
"idx": 1,
"t": 0.0,
"ts": [4, 4],
"ks": 0,
"tempo": 120.0,
"staves": {
"rh": {"voices": [{"v": 1, "beats": [
{"t": 0.000, "dur": 4, "notes": [{"midi": 64}]},
{"t": 0.500, "dur": 4, "notes": [{"midi": 67}]}
]}]},
"lh": {"voices": [{"v": 1, "beats": [
{"t": 0.000, "dur": 1, "notes": [{"midi": 52}, {"midi": 60}]}
]}]}
}
}
]
}
Top-level fields
| Field | Type | Notes |
|---|---|---|
version |
int | Always 1 for this schema. |
instrument |
string | piano, violin, guitar, … Mirrors the arrangement type; makes the file self-describing. |
rights / lyricist / arranger |
string | OPTIONAL credits. Omit when absent. |
staves |
list | Static staff definitions: id (stable, referenced by measures[].staves keys), clef, OPTIONAL label. |
measures |
list | Ordered measure data — the hot path. |
Clef vocabulary: G2 (treble), F4 (bass), C3 (alto), C4 (tenor), neutral
(unpitched/percussion).
Measure fields: idx (1-based number), t (downbeat time, s), ts ([num, den], omit if
unchanged), beat_groups (int[] for compound/irregular meters; the sum MUST equal the
numerator; omit for simple meters), ks (key signature, −7..+7 semitones from C; negative =
flats; omit if unchanged), tempo (BPM, omit if unchanged), pickup (bool anacrusis flag, omit
when false), staves (keyed by staff id; each has OPTIONAL clef and voices).
Beat fields (inside staves → voices → beats): t (required), dur (required denominator:
1=whole … 32=thirty-second), dot, rest, tu ([num, den] tuplet), beat_pos
([num, den] exact position within the measure), notes, dyn, slr/slre, grace
("a"/"p"), arp, ferm, sustain-pedal spd/sph/spu, and additional optional effects
(cre, dec, vib, vibw, fade, pm, lr, slap, pop, tap, su, sd, rasg,
golpe, wah, txt, chrd).
Note fields (inside beats → notes): midi (required, 0–127), tied, acc (accidental
override: 0=force natural, −2..2 = double-flat..double-sharp), stem ("up"/"down"), and
additional optional effects (stc, ten, ac, hac, vib, vibw, dead, ghost, fng,
rfng, str, harm, bend, slide, trill, ho, po, tp, barre).
v1 non-features (out of scope; ship later only as additive v1.x optional fields): microtonal
pitch finer than acc; figured bass; mid-measure key/time/clef changes (all three are
measure-granular in v1); ottava lines, repeat/volta barline semantics, ornaments beyond trills,
notated tremolo, and glissando lines. A Writer MUST drop a source feature on this list (with
a logged warning) rather than approximate it into wrong notation; a Reader MUST NOT invent
semantics for an unspecified field name before a future version defines it.
7.7. keys.json¶
Referenced by the manifest keys key — key/scale annotations for theory-aware visualisations.
Each entry applies until the next:
{
"version": 1,
"events": [
{"t": 0.0, "key": "Em", "scale": "natural_minor"},
{"t": 64.5, "key": "G", "scale": "major"}
]
}
Note —
keys.jsonis not a keyboard part. Despite the name, this file is a song-level musical-key/scale track and is instrument-independent: it has nothing to do with a keyboard or piano arrangement (an arrangementtype: keys, or anotation_<id>.json). A guitar-only, drums-only, or stems-only pack useskeys.jsonto tell the player "the song is in Em, then changes to G major at 64.5 s" with no keyboard part anywhere in the package. Thekeysmanifest pointer (this file) and thekeysinstrument hint are unrelated.
7.8. harmony.json¶
Referenced by the manifest harmony key — the song's intended chord progression: the chords
the song is, independent of what any one arrangement plays. This is the reference/teaching
track a fretboard-theory overlay, a "chords" lyric line, or a practice-along reads; a simplified
or solo arrangement won't carry it per-chord, so it lives once at the song level. Time-ordered
events, each applying until the next (the same shape as keys.json):
{
"version": 1,
"events": [
{"t": 0.0, "root": "G", "quality": "maj7", "rn": "I"},
{"t": 2.0, "root": "E", "quality": "m7", "rn": "vi", "bass": "G"},
{"t": 4.0, "root": "C", "quality": "maj7", "rn": "IV"},
{"t": 6.0, "root": null}
]
}
| Field | Type | Req | Meaning |
|---|---|---|---|
t |
number | yes | Event time (s); applies until the next event (matches keys.json's t). |
root |
string | null | — | Chord root note name — "G", "F#", "Bb" (same note-name convention as keys.json's key). null marks a no-chord (N.C.) event — a rest/breakdown with no harmony until the next event; without it the apply-until-next rule would wrongly sustain the previous chord. Omitting root entirely is equivalent to null. |
quality |
string | — | Quality token (e.g. "maj7", "m7", "7", "dim", "sus4"). Drawn from the shared chord-quality vocabulary (see §6.3.1); the set is recommended, not closed. |
rn |
string | — | Roman numeral relative to the active keys.json key, for display (like fn.rn). Meaningful only when a key is active at this t — omit it when keys.json is absent or has no event covering this time. |
bass |
string | — | Slash-chord bass note when different from root (e.g. G/B → root: "G", bass: "B"). |
Spelling. root and bass SHOULD follow the active key's spelling convention (e.g. Bb
in F major, A# in B major) so a derived rn/notation reads correctly; the absolute spelling is
authoritative and the relative view is derived from it.
Relationship to fn. harmony.json is the song's intended progression (one per song,
reference data); per-chord fn is the as-played function on a
specific arrangement's chords. They may agree or deliberately diverge (a simplified arrangement
plays a subset). A Reader MAY derive fn from harmony.json + the played chord, but neither
is authoritative over the other.
No voicing. harmony.json is abstract harmony only — it carries no fingering and does not
index the arrangement chord templates; voicings are an arrangement concern.
Honesty rule. The harmony track is reference/teaching data: a renderer/teacher MAY display it; a grader MUST NOT use it to score the player's notes. (A backing engine that plays the progression is a separate, dependent proposal, not part of this track.)
Note — repeats are flattened. Events are a flat, time-ordered list, so a progression that recurs (e.g. a chorus) is repeated in full at each occurrence — the same tradeoff
keys.jsonandsong_timeline.jsonmake. A section-relative encoding is out of scope for this version.
7.9. rigs.json¶
Referenced by the manifest rigs key — a pack-level library of engine-agnostic rigs (guitar
/ bass tones as portable effect graphs). Arrangements bind rigs to time via
tones.base_rig / tones.changes[].rig (§6.9), which
reference a rig's id here. Rigs live pack-level (not per-arrangement) so one rig can be shared
across arrangements and tone changes.
This is the portable successor to the opaque, source-specific tones.definitions
passthrough. It is deliberately not modelled on any one source or engine (a
single serial chain, a fixed gear catalog, or one plugin format): a rig is an ordered set
of effect blocks, each of which separates what it is (intent, engine-independent) from how
to render it (realizations, an ordered preference list).
{
"version": 1,
"rigs": [
{
"id": "lead-drive", // referenced by tones.base_rig / changes[].rig
"name": "Lead Drive",
"instrument": "guitar", // OPTIONAL hint
"channels": 1, // OPTIONAL; default 1 (mono)
"blocks": [
{
"id": "amp", // OPTIONAL; required only if referenced by `graph`
"role": "amp", // open vocabulary (gain/drive/amp/cab/eq/delay/reverb/…)
"name": "British Hi-Gain",
"bypassed": false, // OPTIONAL
"mix": 1.0, // OPTIONAL wet/dry, 0..1
"intent": { // engine-independent: what the block IS
"kind": "amp", "family": "british-hi-gain", "tags": ["crunch", "lead"]
},
"realizations": [ // ordered; reader renders the FIRST engine it supports
{"engine": "nam", "ref": "captures/british-higain.nam", "sha256": "…"},
{"engine": "plugin", "format": "clap", "id": "com.vendor.amp",
"name": "AmpX", "vendor": "Vendor", "state": "<base64>", "params": {"gain": 0.72}},
{"engine": "builtin", "id": "feedback.amp.tubestage.v1", "params": {"gain": 0.72}}
],
"params": {"gain": 0.72, "bass": 0.5, "mid": 0.62, "treble": 0.58, "master": 0.5},
"automation": [ // OPTIONAL time-varying params
{"param": "gain", "points": [{"t": 48.0, "v": 0.72}, {"t": 60.0, "v": 0.85}]}
]
}
],
"graph": { // OPTIONAL; non-serial topology (parallel/stereo/wet-dry)
"nodes": ["input", "amp", "output"],
"edges": [["input", "amp"], ["amp", "output"]]
}
}
]
}
| Top-level field | Type | Notes |
|---|---|---|
version |
integer ≥ 1 | REQUIRED. Side-file schema version (independent of feedpak_version). |
rigs |
array | REQUIRED. The rig library. |
Rig (rigs[]):
| Field | Type | Notes |
|---|---|---|
id |
string | REQUIRED. Stable id referenced by tones.base_rig / changes[].rig. |
name |
string | OPTIONAL, human-readable. |
instrument |
string | OPTIONAL hint (guitar, bass, …). Non-normative. |
channels |
integer ≥ 1 | OPTIONAL. Channels the rig is authored for; default 1. |
blocks |
array | REQUIRED. Ordered serial signal chain (input → blocks[0] → … → output) unless graph overrides. |
graph |
object | OPTIONAL. Explicit {nodes, edges} topology for non-serial rigs. |
Block (blocks[]) — every field except at least one of intent/realizations is OPTIONAL:
| Field | Type | Notes |
|---|---|---|
id |
string | REQUIRED only if the block is referenced from graph. |
role |
string | Open vocabulary (gain, drive, amp, cab, eq, dynamics, modulation, delay, reverb, pitch, filter, utility, …). Unknown values MUST be preserved. |
name |
string | Human-readable. |
bypassed |
boolean | Default false. |
mix |
number 0..1 | Wet/dry mix; default 1. |
intent |
object | Engine-independent description — kind, family, model (non-normative hints), tags. Lets a Reader with no matching realization still display / approximate / remap the block. |
realizations |
array | Ordered by preference. A Reader renders the first realization whose engine it supports; if it supports none, it falls back to intent. |
params |
object | Open map of named controls: a number (normalized 0..1), a string (discrete/enum), a boolean, or {value, unit} (e.g. {"value": 320, "unit": "ms"}). Not limited to any fixed knob set. |
automation |
array | OPTIONAL. {param, points: [{t, v}]} lanes; time-sorted, value held to the next point. |
Realization (realizations[]):
| Field | Type | Notes |
|---|---|---|
engine |
string | REQUIRED. Open vocabulary. Reserved: nam (Neural Amp Modeler capture), ir (impulse response), plugin (host plugin), builtin (host DSP by id). |
ref |
string | For nam/ir: pack-relative path (no leading /, no .. — see §8) or absolute URI. |
sha256 |
string | OPTIONAL integrity hash of the referenced asset. |
format |
string | For plugin: vst3 | au | lv2 | clap | … (open). |
id |
string | For plugin: bundle/unique id; for builtin: the host DSP id. |
name, vendor, preset |
string | OPTIONAL plugin identity/preset hints. |
state |
string (base64) | OPTIONAL opaque plugin state for exact recall. MUST NOT be the only representation — pair with intent/params so a Reader without that plugin can still render something. |
params |
object | Same shape as block params; realization-specific control values. |
Conformance.
- A Writer SHOULD give every block an
intenteven when it also emits astateblob, so the rig stays renderable by Readers that lack the exact plugin/capture. - A Reader that renders rigs SHOULD select, per block, the first
realizations[]entry whoseengineit supports; if none match, it MAY synthesize fromintent/paramsor skip the block, and MUST NOT fail the whole pack. - A Reader that does not interpret rigs MUST ignore
rigs.jsonand preserve it verbatim on round-trip, along with any unknownrole/engine/kindvalues andextnamespaces.
8. Reading and writing¶
This section is informative — it describes the typical lifecycle, not additional requirements.
Reading. Parse manifest.yaml; read title/artist/duration/feedpak_version for a
quick metadata view. For a full load, resolve each manifest pointer to its file, validate it
against the relevant schema, and merge per the priority rules in §6.1 and
§7.4. Unknown manifest keys and files are retained, not discarded.
JSONC comments (normative). The requirements in this paragraph are binding, notwithstanding
the otherwise-informative framing of §8. When a manifest pointer resolves to a .jsonc file, a
Reader MUST strip C-style comments (// line comments and /* */ block comments) before
parsing the JSON content.
Comments are the only relaxation permitted: trailing commas, single-quoted strings, and other
JSON5-style extensions are NOT allowed — after comment removal the content MUST be strict,
valid JSON. A Writer that preserves edits to a .jsonc file SHOULD leave the original comments
intact. For new hand-edited packs, Writers MAY write .jsonc data files and MAY include
comments in them.
Note on compatibility. .jsonc is governed by the opt-in file-format relaxation carve-out in
§4.2, not by the ordinary "older Readers keep working by ignoring them"
rule. Two facts must be kept separate:
- No existing pack breaks. Packs that do not adopt
.jsoncare entirely unchanged, and a comment-free.jsoncfile is byte-for-byte valid JSON. - A
.jsoncfile that actually contains comments is NOT readable by a strict-JSON-only Reader — such a Reader errors on//and/* */. Only a Reader that implements the comment-stripping step above can read it. Unlike the optional fields added under a normal minor bump, comments cannot be safely ignored by an unaware Reader — which is exactly why §4.2 lists this as a bounded, opt-in exception.
A Writer that needs a pack to be readable by the broadest range of Readers therefore SHOULD
keep its data files as comment-free .json.
Writing. Build the package in a working directory: write arrangements/<id>.json per
arrangement; encode audio into stems/; write any side-files; compose manifest.yaml last so
its index matches what was written. Emit YAML with block style and preserve key order for
human readability (do not sort keys). Then either keep the directory form or zip it into a
.feedpak archive. A round-trip (write → read → write) SHOULD be lossless for all
recognised fields and MUST preserve unrecognised ones.
9. Extending the format¶
feedpak is designed to be extended without breaking older Readers. The conventions below are
how lyrics, stems, drum_tab, song_timeline, and notation were each added.
9.1. The golden rule: manifest opt-in, file off to the side¶
- Drop a new file alongside the standard ones (
drums.json,keys.json, …). - Add a manifest key that points at that file (
drum_tab: drum_tab.json). - Gate consumers on the manifest key. If the key is absent, do nothing. Never auto-discover by filename — that breaks the "manifest is the index" rule (§2.2).
Older Readers ignore the unknown key; new consumers act on it. No coordination with other implementations is required.
9.2. Naming conventions¶
- Manifest keys:
snake_case, descriptive; singular for one value (lyrics,cover,drum_tab), plural for lists (stems,arrangements). - Filenames: lowercase; JSON (
.json) or JSONC (.jsonc) for structured data, audio in a baseline-or-recommended format (.ogg/.wav, or.mp3/.flac/.opus— see §5.3.2), JPEG/PNG for images. - JSON fields: short names for hot-path data streamed many times (
t,s,f); long names for one-off metadata. - Time fields: always
tortime, always seconds as floats. - Ids/indexes: stable, filesystem-safe, lowercase.
9.3. Always include version¶
Every new object-shaped side-file SHOULD carry "version": 1. It is free insurance: a
later incompatible change becomes version: 2, and old consumers can branch or fall back
gracefully. (The legacy lyrics.json array is the lone exception — see
§7.1.)
9.4. Stay backward-compatible¶
- Adding fields is always safe (older Readers ignore them) — release as a minor format bump.
- Removing fields breaks older Readers — do not; release only under a major bump.
- Repurposing fields (changing meaning or units) is the most dangerous — bump the side-file
version(and the format major if it is structural) and branch. Prefer adding a new field and sunsetting the old one over a version or two.
9.5. What does NOT belong in a feedpak¶
A feedpak holds the song's authored data only. The following MUST NOT be written into a package, because they vary by user or machine:
- Per-machine settings (device picks, network addresses, hardware universes).
- UI state (zoom level, panel sizes).
- User progress, scores, or play counts.
10. Conformance checklist¶
A conformant feedpak (normative summary):
- [ ] Has a
manifest.yamlat the root, UTF-8, parseable as YAML. - [ ] Manifest has
title,artist,duration, a non-emptyarrangements, and a non-emptystems. (A notation-only arrangement counts towardarrangements; it may omitfile— see §5.2 — but does not relax the non-empty rule.) - [ ] Every manifest pointer resolves to an existing file via a safe POSIX relative path.
- [ ]
feedpak_version, if present, is a valid semver string. - [ ] Every side-file validates against its schema and carries a
versionwhere this document requires one. - [ ] Times are seconds; hot-path field names are unexpanded.
A conformant Reader:
- [ ] Accepts both directory and zip forms.
- [ ] Resolves all data through the manifest, never by filename scanning.
- [ ] Ignores (and, when re-emitting, preserves) unknown keys/files/fields.
- [ ] Applies the
feedpak_versionreader rules (§4.2) and thesong_timelinepriority rule (§7.4). - [ ] Rejects paths that escape the package root.
Appendix A (non-normative) — streaming and runtime notes¶
An application MAY transport feedpak data to a renderer over a streaming protocol (e.g. a WebSocket) rather than handing over the on-disk files directly. Such protocols typically reuse the same per-object field names defined here, often chunked into typed messages (song info, notes, chords, beats, sections, lyrics, drum hits, notation measures, …). This is an implementation choice and is not part of the format: the on-disk package defined by §2–§7 is the sole normative artifact, and the byte-for-byte layout of any wire protocol is out of scope.
Appendix B (non-normative) — media conventions¶
- Stems: OGG (Vorbis), ~
-q:a 5, is the recommended default and baseline; other formats are allowed behind the decoder baseline + portability rule (§5.3.2). Mono or stereo as the source dictates. - Preview clip: a short clip (a few seconds) for hover-to-listen UIs; same format guidance as stems.
- Cover art: JPEG or PNG, square, 500–1500 px on a side.
References¶
- RFC 2119 — Key words for use in RFCs.
- RFC 8174 — Ambiguity of uppercase vs lowercase.
- Semantic Versioning 2.0.0.
- YAML 1.2.2.
- JSON Schema 2020-12.
- General MIDI percussion key map (drum piece-id GM mapping).