LanceDB Documentation

Home of the LanceDB documentation. Built using Mintlify.

Development

Install the Mintlify CLI to preview documentation changes locally. To install it, run the following command:

npm i -g mintlify

Run the following commands at the root of the documentation (/docs/ in this repo, where docs.json is located).

cd docs
mint dev

Check for broken links (this applies only to internal links within this docs site):

mint broken-links

Generate snippets

To generate snippets, use uv to sync your local Python environment so that you can run the Python script described below.

uv sync

The Python, TypeScript and Rust code snippets used in the documentation are tested prior to use in the docs. These tests are located in the tests/ directory. Run the tests locally for each language when building the docs locally.

MDX snippets are generated by running a separate script scripts/mdx_snippets_gen.py, as Mintlify cannot scan the contents of raw code files -- it requires that the snippets are in MDX files under the snippets directory.

A Makefile is provided with convenience targets that run the snippet generation for each language:

# Generate snippets for each language, one by one
make py
make ts
make rs

# Or, generate them for all languages in one command
make snippets

The generated snippets are placed in the appropriate file in the /docs/snippets/ directory, making them available for import in the corresponding docs page.

The overall workflow is:

  1. Run tests for py, ts, rs files that contain new code you added, and verify that the tests pass locally
  2. Generate MDX snippets via the make snippets command
  3. Import MDX snippets in the corresponding MDX docs page
  4. Include the MDX snippet as a parameter inside a <CodeBlock> JSX component in Mintlify

Creating and using snippets for code blocks in the MDX files helps ensure that we are placing code that's been tested (per recent LanceDB releases) in the hands of users.

Note

As far as possible, do not add code snippets manually inside triple-backticks! Write the tests for the required language in the tests/ directory, then generate the snippets programmatically via the Makefile commands.

Sync Hugging Face dataset pages

The Datasets tab is populated from lance-format/lance-huggingface, the master repository where each Lance dataset published under the lance-format Hugging Face organization has its own directory with an HF_DATASET_CARD.md. That same file is what gets pushed to the Hub as the dataset's README.md via the hf CLI, so the GitHub repo is the single source of truth for the content of every dataset card.

To avoid maintaining the same content in two places, the per-dataset MDX pages under docs/datasets/ are generated from those upstream cards via scripts/sync_hf_datasets.py. The script:

  1. Reads scripts/hf_datasets.yaml, which lists every dataset to publish and maps the upstream directory name, the URL slug, the HF Hub repo, and the human-readable title.
  2. Fetches each HF_DATASET_CARD.md from lance-format/lance-huggingface on GitHub.
  3. Rewrites the frontmatter for Mintlify (sets title, sidebarTitle, description), strips the upstream H1, injects a "View on Hugging Face" card at the top, and sanitizes known MDX hazards (bibtex citations outside code fences, literal <> in prose).
  4. Writes docs/datasets/<slug>.mdx, regenerates the card grid in docs/datasets/index.mdx between the HF_SYNC:START / HF_SYNC:END markers, and updates the Datasets tab in docs/docs.json to keep the sidebar in sync.
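
The marker-based regeneration in step 4 can be illustrated with a minimal sketch. This is not the code in scripts/sync_hf_datasets.py: the function name replace_between_markers is hypothetical, and the example assumes each marker sits on its own line inside an MDX comment, which the text above does not confirm.

```python
def replace_between_markers(
    text: str,
    new_block: str,
    start: str = "HF_SYNC:START",
    end: str = "HF_SYNC:END",
) -> str:
    """Replace the lines between the start and end marker lines.

    The marker lines themselves are kept, so the function can be run
    repeatedly on its own output. Hypothetical helper, for illustration.
    """
    lines = text.splitlines()
    start_idx = next((i for i, line in enumerate(lines) if start in line), None)
    end_idx = next((i for i, line in enumerate(lines) if end in line), None)
    if start_idx is None or end_idx is None or end_idx <= start_idx:
        raise ValueError("marker pair not found or out of order")
    merged = lines[: start_idx + 1] + new_block.splitlines() + lines[end_idx:]
    return "\n".join(merged) + "\n"


# Assumed page layout: markers wrapped in MDX comments, one per line.
page = (
    "# Datasets\n"
    "{/* HF_SYNC:START */}\n"
    "old card\n"
    "{/* HF_SYNC:END */}\n"
    "footer\n"
)
updated = replace_between_markers(page, '<Card title="A" />\n<Card title="B" />')
print(updated)
```

Keeping the marker lines in place means hand-written content above and below the grid survives every regeneration.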

Run it from the repo root:

make hf-sync

Adding a new dataset

  1. Author the new dataset's HF_DATASET_CARD.md upstream in lance-format/lance-huggingface (and push it to the Hub as usual).
  2. Add a single line for the dataset under the appropriate category in scripts/hf_datasets.yaml. The four fields (dir, slug, hf, title) are explicit because the GitHub directory name, the HF Hub repo slug, and the desired URL slug don't follow a derivable convention.
  3. Run make hf-sync. The script will fetch the new card, generate docs/datasets/<slug>.mdx, refresh the landing-page card grid, and add the new page to the Datasets tab in docs/docs.json.
  4. Preview locally with mint dev and commit the changes (the MDX page, the regenerated index.mdx, the updated docs.json, and the new yaml entry).
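
Because the four fields are not derivable from one another, a quick sanity check of the entries before syncing can catch mistakes early. The sketch below is an illustration only: validate_entries is a hypothetical helper, the entries are assumed to be already parsed into dictionaries, and the sample values are made up rather than taken from scripts/hf_datasets.yaml.

```python
REQUIRED_FIELDS = ("dir", "slug", "hf", "title")


def validate_entries(entries: list[dict[str, str]]) -> None:
    """Raise ValueError if an entry lacks a field or reuses a URL slug."""
    seen_slugs: set[str] = set()
    for entry in entries:
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            raise ValueError(f"entry {entry!r} is missing fields: {missing}")
        if entry["slug"] in seen_slugs:
            raise ValueError(f"duplicate slug: {entry['slug']}")
        seen_slugs.add(entry["slug"])


# Hypothetical entry; the real names live in scripts/hf_datasets.yaml.
example_entries = [
    {
        "dir": "example_dir",
        "slug": "example-dataset",
        "hf": "lance-format/example-dataset",
        "title": "Example Dataset",
    },
]
validate_entries(example_entries)  # passes silently when all fields are present
```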

If you remove a dataset from the yaml, the next make hf-sync will delete its MDX file and drop the sidebar entry. The script hard-fails on any fetch error, because partial regeneration would be worse than a clear error.
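
One way to get both behaviors (pruning removed datasets and never leaving a half-regenerated site) is to fetch every card before touching any file. The sketch below assumes that ordering; the text above only states that the script hard-fails on fetch errors. The function sync_pages, its parameters, and the rule that index.mdx is never pruned are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def sync_pages(
    slug_to_source: dict[str, str],
    fetch: Callable[[str], str],
    datasets_dir: Path,
) -> list[str]:
    """Write one MDX page per slug and delete pages no longer listed.

    Returns the names of the deleted files. Illustrative sketch only.
    """
    # Phase 1: fetch everything first. Any exception raised by `fetch`
    # propagates here, before a single file has been written or deleted.
    pages = {slug: fetch(source) for slug, source in slug_to_source.items()}

    # Phase 2: write the freshly fetched pages.
    for slug, body in pages.items():
        (datasets_dir / f"{slug}.mdx").write_text(body, encoding="utf-8")

    # Phase 3: prune pages whose slug is no longer configured.
    # Assumption: index.mdx is the landing page and is always kept.
    removed = []
    for page in sorted(datasets_dir.glob("*.mdx")):
        if page.stem not in pages and page.stem != "index":
            page.unlink()
            removed.append(page.name)
    return removed
```

Passing the fetcher in as a parameter keeps the sketch free of network code and makes the failure path easy to exercise with a stub.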

About

Documentation for LanceDB, the Multimodal Lakehouse for AI
