diff --git a/docs/OSL.md b/docs/OSL.md
index 9fa6bfd..498abf2 100644
--- a/docs/OSL.md
+++ b/docs/OSL.md
@@ -1,120 +1,402 @@
-# OSL JSON Format (As Used By This App)
+# OSL JSON Format
 
-This page describes the OSL-style structure expected and produced by the current Video Annotation Tool.
+This page describes the OSL-style JSON files loaded, edited, and written by the
+Video Annotation Tool.
 
-## Top-Level Structure
+An OSL JSON file is a single JSON object with dataset metadata, a label schema,
+and a `data` array of samples. Each sample points to one or more media inputs and
+can carry task-specific annotations.
 
-Required/standard fields:
+## Top-Level Object
 
-- `version` (string)
-- `date` (string)
-- `dataset_name` (string)
-- `description` (string)
-- `modalities` (array, usually `["video"]`)
-- `metadata` (object)
-- `labels` (object)
-- `data` (array)
+The smallest useful file is a JSON object with `data` as a list. When loading,
+the app fills missing standard fields with defaults. When saving, it writes the
+standard project fields back out.
 
-Unknown root keys are preserved.
+| Field | Type | Notes |
+|---|---|---|
+| `version` | string | Current app default is `"2.0"`. |
+| `date` | string | Usually an ISO date such as `"2026-05-19"`. |
+| `dataset_name` | string | Human-readable project name. |
+| `description` | string | Free-text dataset description. Empty string is allowed. |
+| `modalities` | array | Input types present in the dataset, for example `["video"]`. The app recomputes this from sample inputs on save. |
+| `metadata` | object | Dataset-level custom metadata. |
+| `labels` | object | Label schema shared by classification and localization heads. |
+| `data` | array | List of sample objects. Must be a JSON array. |
 
-## Labels Schema (`labels`)
+Unknown root keys are preserved, except retired legacy keys documented below.
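+
+The loading rules above can be sketched in Python. This is a minimal
+illustration, not the app's loader. `load_osl` is a hypothetical helper, and
+every default value except `"2.0"` is an assumption.
+
+```python
+import copy
+import json
+from datetime import date
+
+# Assumed defaults for missing root fields; only "2.0" comes from this page.
+DEFAULTS = {
+    "version": "2.0",
+    "dataset_name": "",
+    "description": "",
+    "modalities": [],
+    "metadata": {},
+    "labels": {},
+}
+
+
+def load_osl(text):
+    """Parse OSL JSON text and fill in missing standard root fields."""
+    doc = json.loads(text)
+    if not isinstance(doc, dict) or not isinstance(doc.get("data"), list):
+        raise ValueError("expected a JSON object whose 'data' is a list")
+    for key, default in DEFAULTS.items():
+        # Deep-copy so loaded documents never share mutable defaults.
+        doc.setdefault(key, copy.deepcopy(default))
+    doc.setdefault("date", date.today().isoformat())  # assumed default
+    return doc  # unknown root keys pass through unchanged
+
+
+doc = load_osl('{"data": [], "source": "export-v1"}')
+print(doc["version"], doc["source"])  # 2.0 export-v1
+```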
 
-Each head is a key under `labels`:
+## Label Schema
+
+The root `labels` object defines annotation heads. Each head name is a key, and
+each definition should include:
+
+- `type`: `single_label` or `multi_label`.
+- `labels`: list of allowed label strings.
 
 ```json
-"labels": {
-  "action": {
-    "type": "single_label",
-    "labels": ["pass", "shot"]
-  },
-  "attributes": {
-    "type": "multi_label",
-    "labels": ["left_foot", "header"]
+{
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "foul"]
+    },
+    "attributes": {
+      "type": "multi_label",
+      "labels": ["left_foot", "header", "set_piece"]
+    }
   }
 }
 ```
 
-## Sample Structure (`data[]`)
+Classification and localization annotations should reference these same head
+names. For example, `data[].labels.action` and `data[].events[].head == "action"`
+both point at the root `labels.action` schema.
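+
+A hypothetical `check_sample` helper shows how a script might verify that a
+sample's classification labels and localization events only use heads and
+labels declared in the root schema. It is a sketch that assumes the schema
+shape above, not a validator shipped with the app.
+
+```python
+def check_sample(sample, schema):
+    """Return human-readable problems for one sample; an empty list means OK."""
+    problems = []
+    for head, payload in sample.get("labels", {}).items():
+        spec = schema.get(head)
+        if spec is None:
+            problems.append(f"labels: unknown head {head!r}")
+            continue
+        # single_label heads carry "label"; multi_label heads carry "labels".
+        if spec["type"] == "single_label":
+            chosen = [payload.get("label")]
+        else:
+            chosen = payload.get("labels", [])
+        for label in chosen:
+            if label not in spec["labels"]:
+                problems.append(f"labels.{head}: {label!r} not allowed")
+    for i, event in enumerate(sample.get("events", [])):
+        head = event.get("head")
+        spec = schema.get(head)
+        if spec is None:
+            problems.append(f"events[{i}]: unknown head {head!r}")
+        elif event.get("label") not in spec["labels"]:
+            problems.append(f"events[{i}]: {event.get('label')!r} not allowed")
+    return problems
+
+
+schema = {"action": {"type": "single_label", "labels": ["pass", "shot", "foul"]}}
+sample = {
+    "labels": {"action": {"label": "shot"}},
+    "events": [{"head": "action", "label": "dribble", "position_ms": 900}],
+}
+print(check_sample(sample, schema))  # ["events[0]: 'dribble' not allowed"]
+```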
+
+## Sample Objects
 
-Each sample typically contains:
+Each entry in `data` is one sample.
 
-- `id` (string)
-- `inputs` (array of input objects, each usually has `type` + `path`)
-- Optional task blocks:
-  - `labels` (classification)
-  - `events` (localization)
-  - `captions` (description)
-  - `dense_captions` (dense description)
-  - `answers` (Q/A)
-- Optional `metadata`
-- Any additional custom keys are preserved.
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string | Stable sample ID. Missing or duplicate IDs are normalized on load/save. Duplicates receive suffixes such as `__2`. |
+| `inputs` | array | Media or feature files for this sample. Multi-view samples use multiple input entries. |
+| `metadata` | object | Optional sample-level metadata. Empty metadata is removed on save. |
+| `labels` | object | Classification payload for this sample. |
+| `events` | array | Timestamped localization events. |
+| `captions` | array | Clip-level description captions. |
+| `dense_captions` | array | Timestamped dense descriptions. |
+| `answers` | array | Grouped question/answer annotations. |
 
-### `inputs`
+Unknown sample keys are preserved.
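+
+The ID rule can be sketched as follows. Only the `__2` suffix style comes from
+this page. The `__3`, `__4`, ... continuation and the `sample_<n>` fallback for
+missing IDs are assumptions, and `dedupe_ids` is a hypothetical name.
+
+```python
+def dedupe_ids(samples):
+    """Give every sample a unique string ID, suffixing repeats with __2, __3, ..."""
+    seen = set()
+    for index, sample in enumerate(samples):
+        # Hypothetical fallback name for samples with a missing or empty ID.
+        base = str(sample.get("id") or f"sample_{index + 1}")
+        candidate, n = base, 2
+        while candidate in seen:
+            candidate = f"{base}__{n}"
+            n += 1
+        sample["id"] = candidate
+        seen.add(candidate)
+    return samples
+
+
+samples = dedupe_ids([{"id": "a"}, {"id": "a"}, {}, {"id": "a"}])
+print([s["id"] for s in samples])  # ['a', 'a__2', 'sample_3', 'a__3']
+```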
 
-Example:
+## Input Objects
+
+Each sample should include `inputs`, even if the sample has only one media file.
 
 ```json
-"inputs": [
-  {"type": "video", "path": "test/action_0/clip_0.mp4", "fps": 25.0}
-]
+{
+  "inputs": [
+    {
+      "type": "video",
+      "path": "clips/clip_0001.mp4",
+      "fps": 25.0
+    }
+  ]
+}
 ```
 
-Multi-view samples can include multiple input entries.
+Supported input types:
 
-### Classification payload (`labels` per sample)
+| Type | Typical path | Notes |
+|---|---|---|
+| `video` | `clips/clip_0001.mp4` | Default when `type` is missing and the path extension does not identify another input type. |
+| `frames_npy` | `frames/clip_0001.npy` | Uses `fps` for playback timing. The legacy alias `frame_npy` is normalized to `frames_npy`. |
+| `tracking_parquet` | `tracking/clip_0001.parquet` | Uses parquet timestamps when available. Optional `fps` is a fallback. |
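+
+Type normalization might look like this. The `frame_npy` alias is documented
+above. The extension table used when `type` is missing is an assumption about
+which extensions count as special, and `input_type` is a hypothetical helper.
+
+```python
+from pathlib import PurePath
+
+ALIASES = {"frame_npy": "frames_npy"}  # documented legacy alias
+# Assumed mapping for entries without "type"; not confirmed against the app.
+EXTENSION_TYPES = {".npy": "frames_npy", ".parquet": "tracking_parquet"}
+
+
+def input_type(entry):
+    """Return the normalized type for one input entry."""
+    kind = entry.get("type")
+    if kind:
+        return ALIASES.get(kind, kind)
+    suffix = PurePath(entry.get("path", "")).suffix.lower()
+    return EXTENSION_TYPES.get(suffix, "video")
+
+
+print(input_type({"type": "frame_npy", "path": "frames/clip_0001.npy"}))  # frames_npy
+print(input_type({"path": "clips/clip_0001.MP4"}))  # video
+```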
 
-- single-label head: `{"label": "shot"}`
-- multi-label head: `{"labels": ["header", "left_foot"]}`
-- smart predictions may include `confidence_score`
+Input paths can be relative or absolute when loading. On save, input paths are
+rewritten relative to the saved JSON file location when possible.
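+
+A sketch of the path handling, assuming relative input paths are resolved
+against the directory of the JSON file being loaded. `resolve_on_load` and
+`rewrite_for_save` are hypothetical names, and writing forward slashes is an
+assumption.
+
+```python
+import os
+
+
+def resolve_on_load(path, in_json):
+    """Turn an input path into an absolute path (assumed: relative to in_json)."""
+    if os.path.isabs(path):
+        return path
+    base = os.path.dirname(os.path.abspath(in_json))
+    return os.path.normpath(os.path.join(base, path))
+
+
+def rewrite_for_save(abs_path, out_json):
+    """Express abs_path relative to the output JSON's directory when possible."""
+    out_dir = os.path.dirname(os.path.abspath(out_json))
+    try:
+        return os.path.relpath(abs_path, out_dir).replace(os.sep, "/")
+    except ValueError:
+        # e.g. a different drive on Windows: keep the absolute path.
+        return abs_path
+
+
+p = resolve_on_load("clips/a.mp4", "/data/v1/ann.json")
+print(rewrite_for_save(p, "/data/out/ann.json"))  # ../v1/clips/a.mp4 (POSIX)
+```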
 
-### Localization payload (`events`)
+Multi-view samples use more than one input:
 
 ```json
-"events": [
-  {"head": "action", "label": "pass", "position_ms": 1234}
-]
+{
+  "id": "play_0001",
+  "inputs": [
+    {"type": "video", "path": "wide/play_0001.mp4", "fps": 25.0},
+    {"type": "video", "path": "close/play_0001.mp4", "fps": 25.0}
+  ]
+}
 ```
 
-Smart localization events may include `confidence_score`.
+## Task Payloads
 
-### Description payload (`captions`)
+### Classification
+
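+Sample-level `labels` uses the head names defined in the root `labels` schema.
+A `single_label` head stores one `label` string; a `multi_label` head stores a
+`labels` list.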
 
 ```json
-"captions": [
-  {"lang": "en", "text": "A short caption."}
-]
+{
+  "labels": {
+    "action": {
+      "label": "shot"
+    },
+    "attributes": {
+      "labels": ["left_foot", "set_piece"]
+    }
+  }
+}
 ```
 
-### Dense payload (`dense_captions`)
+For smart predictions, a head payload may include `confidence_score` as a float
+from `0.0` to `1.0`:
 
-The current dense editor uses point timestamps:
+```json
+{
+  "labels": {
+    "action": {
+      "label": "shot",
+      "confidence_score": 0.91
+    }
+  }
+}
+```
+
+Confirming a smart prediction removes only `confidence_score`; the chosen label
+stays as the manual annotation.
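+
+In code, confirming a classification prediction amounts to deleting one key.
+This minimal sketch assumes a payload counts as a smart prediction exactly when
+it carries `confidence_score`; `is_smart` and `confirm_prediction` are
+hypothetical helpers.
+
+```python
+def is_smart(payload):
+    """Assumed rule: a payload is a smart prediction while it has a score."""
+    return "confidence_score" in payload
+
+
+def confirm_prediction(sample, head):
+    """Keep the predicted label for `head` as manual by dropping its score."""
+    payload = sample.get("labels", {}).get(head)
+    if payload is not None:
+        payload.pop("confidence_score", None)  # the label itself is untouched
+    return sample
+
+
+sample = {"labels": {"action": {"label": "shot", "confidence_score": 0.91}}}
+print(confirm_prediction(sample, "action"))  # {'labels': {'action': {'label': 'shot'}}}
+```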
+
+### Localization
+
+Localization annotations live in `events`. Each event is a point timestamp in
+milliseconds.
 
 ```json
-"dense_captions": [
-  {"position_ms": 4567, "lang": "en", "text": "Dense description."}
-]
+{
+  "events": [
+    {
+      "head": "action",
+      "label": "pass",
+      "position_ms": 1240
+    },
+    {
+      "head": "action",
+      "label": "shot",
+      "position_ms": 4320,
+      "confidence_score": 0.84
+    }
+  ]
+}
 ```
 
-### Q/A payload (`answers`)
+`head` should match a root label head. Smart localization predictions use the
+same optional `confidence_score` convention as classification.
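+
+A script that reviews predictions might separate manual events from smart ones
+and filter by score. `split_events` and its `min_score` threshold are
+illustrative assumptions, not app settings.
+
+```python
+from operator import itemgetter
+
+
+def split_events(events, min_score=0.0):
+    """Split events into (manual, smart) lists, each sorted by position_ms.
+
+    Events with a confidence_score are treated as smart predictions; those
+    scoring below min_score are left out of the smart list.
+    """
+    by_time = itemgetter("position_ms")
+    manual = [e for e in events if "confidence_score" not in e]
+    smart = [
+        e for e in events
+        if "confidence_score" in e and e["confidence_score"] >= min_score
+    ]
+    return sorted(manual, key=by_time), sorted(smart, key=by_time)
+
+
+events = [
+    {"head": "action", "label": "shot", "position_ms": 4320, "confidence_score": 0.84},
+    {"head": "action", "label": "pass", "position_ms": 1240},
+]
+manual, smart = split_events(events, min_score=0.8)
+print([e["label"] for e in manual], [e["label"] for e in smart])  # ['pass'] ['shot']
+```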
 
-Per-sample grouped answers keep the question text next to one or more answers:
+### Description
+
+Description annotations live in `captions`. The app writes one English caption
+for manual description edits, but additional caption fields are preserved.
 
 ```json
-"answers": [
-  {
-    "question": "How are you?",
-    "answers": ["I am fine.", "I am good."]
-  }
-]
+{
+  "captions": [
+    {
+      "lang": "en",
+      "text": "A player receives the pass and shoots from the edge of the box."
+    }
+  ]
+}
+```
+
+### Dense Description
+
+Dense description annotations live in `dense_captions`. The current dense editor
+uses point timestamps in milliseconds.
+
+```json
+{
+  "dense_captions": [
+    {
+      "position_ms": 1200,
+      "lang": "en",
+      "text": "The midfielder receives the ball."
+    },
+    {
+      "position_ms": 4300,
+      "lang": "en",
+      "text": "The forward takes a shot."
+    }
+  ]
+}
+```
+
+### Question/Answer
+
+Q/A annotations live in grouped per-sample `answers`. Each group stores the
+question text and one or more non-empty answers.
+
+```json
+{
+  "answers": [
+    {
+      "question": "What happens after the pass?",
+      "answers": ["The receiving player shoots."]
+    }
+  ]
+}
+```
+
+Legacy top-level `questions` and per-answer `question_id` entries are not
+persisted. Convert old VQA files with `tools/convert_legacy_vqa_to_grouped.py`.
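+
+The non-empty rule for grouped answers can be sketched as below. Trimming
+whitespace and dropping groups left without a question or without answers are
+assumptions, and `normalize_answers` is a hypothetical helper.
+
+```python
+def normalize_answers(groups):
+    """Return grouped Q/A entries whose question and answers are non-empty text."""
+    normalized = []
+    for group in groups:
+        question = group.get("question")
+        question = question.strip() if isinstance(question, str) else ""
+        # Keep only string answers that are not blank.
+        answers = [
+            a.strip() for a in group.get("answers", [])
+            if isinstance(a, str) and a.strip()
+        ]
+        # Assumption: groups with no question or no answers are dropped.
+        if question and answers:
+            normalized.append({"question": question, "answers": answers})
+    return normalized
+
+
+raw = [
+    {"question": " What happens after the pass? ", "answers": ["The receiving player shoots.", "  "]},
+    {"question": "Who wins?", "answers": []},
+]
+print(normalize_answers(raw))
+# [{'question': 'What happens after the pass?', 'answers': ['The receiving player shoots.']}]
+```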
+
+## Complete Examples
+
+### Classification JSON
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "dataset_name": "soccer-classification-demo",
+  "description": "Clip-level action labels.",
+  "modalities": ["video"],
+  "metadata": {
+    "sport": "soccer",
+    "split": "train"
+  },
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "foul"]
+    },
+    "attributes": {
+      "type": "multi_label",
+      "labels": ["left_foot", "header", "set_piece"]
+    }
+  },
+  "data": [
+    {
+      "id": "clip_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/clip_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "labels": {
+        "action": {
+          "label": "shot"
+        },
+        "attributes": {
+          "labels": ["left_foot"]
+        }
+      },
+      "metadata": {
+        "match_id": "match_01"
+      }
+    }
+  ]
+}
+```
+
+### Localization and Dense Description JSON
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "dataset_name": "soccer-timeline-demo",
+  "description": "Timestamped events and dense captions.",
+  "modalities": ["video"],
+  "metadata": {},
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "save"]
+    }
+  },
+  "data": [
+    {
+      "id": "attack_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/attack_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "events": [
+        {
+          "head": "action",
+          "label": "pass",
+          "position_ms": 1100
+        },
+        {
+          "head": "action",
+          "label": "shot",
+          "position_ms": 3650
+        }
+      ],
+      "captions": [
+        {
+          "lang": "en",
+          "text": "A quick attack ends with a shot on goal."
+        }
+      ],
+      "dense_captions": [
+        {
+          "position_ms": 1100,
+          "lang": "en",
+          "text": "The midfielder plays a forward pass."
+        },
+        {
+          "position_ms": 3650,
+          "lang": "en",
+          "text": "The striker shoots from inside the area."
+        }
+      ]
+    }
+  ]
+}
+```
+
+### Multi-Input Q/A JSON
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "dataset_name": "multi-view-qa-demo",
+  "description": "Two synchronized views with question/answer labels.",
+  "modalities": ["video"],
+  "metadata": {
+    "sport": "basketball"
+  },
+  "labels": {},
+  "data": [
+    {
+      "id": "possession_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "broadcast/possession_0001.mp4",
+          "fps": 30.0
+        },
+        {
+          "type": "video",
+          "path": "baseline/possession_0001.mp4",
+          "fps": 30.0
+        }
+      ],
+      "answers": [
+        {
+          "question": "Which team ends the possession?",
+          "answers": ["The home team."]
+        },
+        {
+          "question": "How does the possession end?",
+          "answers": ["A made three-point shot."]
+        }
+      ]
+    }
+  ]
+}
 ```
 
 ## Save-Time Behavior
 
 On save/export, the app:
 
-- ensures unique sample IDs
-- normalizes/filters invalid or empty answer entries
-- drops legacy top-level `questions` and `question_id` answers; convert old VQA files with `tools/convert_legacy_vqa_to_grouped.py`
-- removes empty optional task blocks
-- rewrites input paths relative to the output JSON location
-- preserves unknown root/sample fields
+- Ensures unique sample IDs.
+- Normalizes input types, including `frame_npy` to `frames_npy`.
+- Rewrites input paths relative to the output JSON location when possible.
+- Recomputes `modalities` from `data[].inputs[]`.
+- Removes empty optional sample fields such as `labels`, `events`, `captions`,
+  `dense_captions`, `answers`, and `metadata`.
+- Normalizes Q/A answers to grouped `{"question": ..., "answers": [...]}` entries
+  with non-empty text.
+- Drops legacy top-level `questions` and `question_id` answer entries.
+- Drops retired sample smart keys such as `smart_labels` and `smart_events`.
+- Does not persist localization `label_colors`; label colors live in app
+  settings.
+- Preserves unknown root and sample fields where possible.
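+
+A partial Python sketch of several of these rules: retired keys, empty optional
+fields, input type aliases, and `modalities`. Path rewriting and ID
+deduplication are omitted, `prepare_for_save` is a hypothetical name, and the
+first-seen order of `modalities` is an assumption.
+
+```python
+OPTIONAL_SAMPLE_FIELDS = (
+    "labels", "events", "captions", "dense_captions", "answers", "metadata",
+)
+# Possibly incomplete: the page lists these retired keys only as examples.
+RETIRED_SAMPLE_KEYS = ("smart_labels", "smart_events")
+TYPE_ALIASES = {"frame_npy": "frames_npy"}
+
+
+def prepare_for_save(doc):
+    """Apply a subset of the save-time rules to an in-memory OSL document."""
+    modalities = []
+    for sample in doc["data"]:
+        for key in RETIRED_SAMPLE_KEYS:
+            sample.pop(key, None)
+        for key in OPTIONAL_SAMPLE_FIELDS:
+            if key in sample and not sample[key]:
+                del sample[key]  # empty list or dict
+        for entry in sample.get("inputs", []):
+            # Simplification: ignores extension-based type inference.
+            kind = entry.get("type", "video")
+            kind = TYPE_ALIASES.get(kind, kind)
+            entry["type"] = kind
+            if kind not in modalities:
+                modalities.append(kind)
+    doc["modalities"] = modalities
+    doc.pop("questions", None)  # legacy top-level VQA key
+    return doc
+
+
+doc = {
+    "questions": [],
+    "data": [{
+        "id": "clip_0001",
+        "inputs": [{"type": "video", "path": "a.mp4"}, {"type": "frame_npy", "path": "a.npy"}],
+        "events": [],
+        "metadata": {},
+        "smart_events": [],
+    }],
+}
+prepare_for_save(doc)
+print(doc["modalities"], sorted(doc["data"][0]))  # ['video', 'frames_npy'] ['id', 'inputs']
+```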