Blog with Preprocessors
- preprocessors
- blog
- markdown
- code highlighting
- DX
18 min read 3957 words
Preprocessing Code Decoration
01. Write:
<!-- shiki-start
l"2" d"highlight"
s"hello" c"underline"
```ts
console.log('hello world');
const highlightedLine = true;
```
shiki-end -->
02. Processed (sans whitespace):
<pre style="--h-daffodil:#575279;--h-dark:#D4D4D4;--h-bellflower:#4c4f69;--h-daffodil-bg:#faf4ed;--h-dark-bg:#000;--h-bellflower-bg:#eff1f5" data-shiki="" data-shiki-lang-ts="" data-shiki-t-block="">
<code>
<span data-line="">
<span style="--h-daffodil:#575279;--h-dark:#9CDCFE;--h-bellflower:#4C4F69;--h-daffodil-font-style:italic;--h-dark-font-style:inherit;--h-bellflower-font-style:inherit">console</span>
<span style="--h-daffodil:#286983;--h-dark:#D4D4D4;--h-bellflower:#179299">.</span>
<span style="--h-daffodil:#D7827E;--h-dark:#DCDCAA;--h-bellflower:#1E66F5;--h-daffodil-font-style:inherit;--h-dark-font-style:inherit;--h-bellflower-font-style:italic">log</span>
<span style="--h-daffodil:#575279;--h-dark:#D4D4D4;--h-bellflower:#4C4F69">(</span>
<span style="--h-daffodil:#EA9D34;--h-dark:#CE9178;--h-bellflower:#40A02B">'</span>
<span style="--h-daffodil:#EA9D34;--h-dark:#CE9178;--h-bellflower:#40A02B" class="underline">hello</span>
<span style="--h-daffodil:#EA9D34;--h-dark:#CE9178;--h-bellflower:#40A02B"> world'</span>
<span style="--h-daffodil:#575279;--h-dark:#D4D4D4;--h-bellflower:#4C4F69">)</span>
<span style="--h-daffodil:#797593;--h-dark:#D4D4D4;--h-bellflower:#7C7F93">;</span>
</span>
<span data-line="" data-line-highlight="">
<span style="--h-daffodil:#286983;--h-dark:#569CD6;--h-bellflower:#8839EF;--h-daffodil-font-style:inherit;--h-dark-font-style:italic;--h-bellflower-font-style:inherit">const</span>
<span style="--h-daffodil:#575279;--h-dark:#4FC1FF;--h-bellflower:#4C4F69;--h-daffodil-font-style:italic;--h-dark-font-style:inherit;--h-bellflower-font-style:inherit"> highlightedLine</span>
<span style="--h-daffodil:#286983;--h-dark:#D4D4D4;--h-bellflower:#179299"> =</span>
<span style="--h-daffodil:#D7827E;--h-dark:#569CD6;--h-bellflower:#FE640B;--h-daffodil-font-style:inherit;--h-dark-font-style:italic;--h-bellflower-font-style:inherit"> true</span>
<span style="--h-daffodil:#797593;--h-dark:#D4D4D4;--h-bellflower:#7C7F93">;</span>
</span>
</code>
</pre>
03. Rendered:
console.log('hello world');
const highlightedLine = true;
Preprocessing Markdown
01. Write:
<!-- md-start
| Heading 1 | Heading 2 | Heading 3 |
| --------- | --------- | --------- |
| cell 1 | cell 2 | cell 3 |
| cell 4 | cell 5 | cell 6 |
md-end -->
02. Processed (sans whitespace):
<table>
<thead>
<tr>
<th>Heading 1</th>
<th>Heading 2</th>
<th>Heading 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>cell 1</td>
<td>cell 2</td>
<td>cell 3</td>
</tr>
<tr>
<td>cell 4</td>
<td>cell 5</td>
<td>cell 6</td>
</tr>
</tbody>
</table>
03. Rendered:
| Heading 1 | Heading 2 | Heading 3 |
| --------- | --------- | --------- |
| cell 1 | cell 2 | cell 3 |
| cell 4 | cell 5 | cell 6 |
Preprocessing Math
01. Write:
<!-- \[
\begin{align}
\dot{x} & = \sigma(y-x) \\
\dot{y} & = \rho x - y - xz \\
\dot{z} & = -\beta z + xy
\end{align}
\] -->
02. Processed (sans whitespace):
<span class="katex-display">
<span class="katex">
<span class="katex-mathml">
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mtable rowspacing="0.25em" columnalign="right left" columnspacing="0em">
<mtr>
<mtd class="mtr-glue">
</mtd>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mover accent="true">
<mi>x</mi>
<mo>˙</mo>
</mover>
</mstyle>
</mtd>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<mrow>
</mrow>
<mo>=</mo>
<mi>σ</mi>
<mo stretchy="false">(</mo>
<mi>y</mi>
<mo>−</mo>
<mi>x</mi>
<mo stretchy="false">)</mo>
</mrow>
</mstyle>
</mtd>
<mtd class="mtr-glue">
</mtd>
<mtd class="mml-eqn-num">
</mtd>
</mtr>
<mtr>
<mtd class="mtr-glue">
</mtd>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mover accent="true">
<mi>y</mi>
<mo>˙</mo>
</mover>
</mstyle>
</mtd>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<mrow>
</mrow>
<mo>=</mo>
<mi>ρ</mi>
<mi>x</mi>
<mo>−</mo>
<mi>y</mi>
<mo>−</mo>
<mi>x</mi>
<mi>z</mi>
</mrow>
</mstyle>
</mtd>
<mtd class="mtr-glue">
</mtd>
<mtd class="mml-eqn-num">
</mtd>
</mtr>
<mtr>
<mtd class="mtr-glue">
</mtd>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mover accent="true">
<mi>z</mi>
<mo>˙</mo>
</mover>
</mstyle>
</mtd>
<mtd>
<mstyle scriptlevel="0" displaystyle="true">
<mrow>
<mrow>
</mrow>
<mo>=</mo>
<mo>−</mo>
<mi>β</mi>
<mi>z</mi>
<mo>+</mo>
<mi>x</mi>
<mi>y</mi>
</mrow>
</mstyle>
</mtd>
<mtd class="mtr-glue">
</mtd>
<mtd class="mml-eqn-num">
</mtd>
</mtr>
</mtable>
<annotation encoding="application/x-tex">\begin{align}
\dot{x} & = \sigma(y-x) \\
\dot{y} & = \rho x - y - xz \\
\dot{z} & = -\beta z + xy
\end{align}</annotation>
</semantics>
</math>
</span>
<span class="katex-html" aria-hidden="true">
<span class="base">
<span class="strut" style="height:4.5em;vertical-align:-2em;">
</span>
<span class="mtable">
<span class="col-align-r">
<span class="vlist-t vlist-t2">
<span class="vlist-r">
<span class="vlist" style="height:2.5em;">
<span style="top:-4.66em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord">
<span class="mord accent">
<span class="vlist-t">
<span class="vlist-r">
<span class="vlist" style="height:0.6679em;">
<span style="top:-3em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord mathnormal">x</span>
</span>
<span style="top:-3em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="accent-body" style="left:-0.1111em;">
<span class="mord">˙</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
<span style="top:-3.16em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord">
<span class="mord accent">
<span class="vlist-t vlist-t2">
<span class="vlist-r">
<span class="vlist" style="height:0.6679em;">
<span style="top:-3em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord mathnormal" style="margin-right:0.03588em;">y</span>
</span>
<span style="top:-3em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="accent-body" style="left:-0.0833em;">
<span class="mord">˙</span>
</span>
</span>
</span>
<span class="vlist-s">
</span>
</span>
<span class="vlist-r">
<span class="vlist" style="height:0.1944em;">
<span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
<span style="top:-1.66em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord">
<span class="mord accent">
<span class="vlist-t">
<span class="vlist-r">
<span class="vlist" style="height:0.6679em;">
<span style="top:-3em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord mathnormal" style="margin-right:0.04398em;">z</span>
</span>
<span style="top:-3em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="accent-body" style="left:-0.0833em;">
<span class="mord">˙</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
<span class="vlist-s">
</span>
</span>
<span class="vlist-r">
<span class="vlist" style="height:2em;">
<span>
</span>
</span>
</span>
</span>
</span>
<span class="col-align-l">
<span class="vlist-t vlist-t2">
<span class="vlist-r">
<span class="vlist" style="height:2.5em;">
<span style="top:-4.66em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord">
<span class="mord">
</span>
<span class="mspace" style="margin-right:0.2778em;">
</span>
<span class="mrel">=</span>
<span class="mspace" style="margin-right:0.2778em;">
</span>
<span class="mord mathnormal" style="margin-right:0.03588em;">σ</span>
<span class="mopen">(</span>
<span class="mord mathnormal" style="margin-right:0.03588em;">y</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mbin">−</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mord mathnormal">x</span>
<span class="mclose">)</span>
</span>
</span>
<span style="top:-3.16em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord">
<span class="mord">
</span>
<span class="mspace" style="margin-right:0.2778em;">
</span>
<span class="mrel">=</span>
<span class="mspace" style="margin-right:0.2778em;">
</span>
<span class="mord mathnormal">ρ</span>
<span class="mord mathnormal">x</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mbin">−</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mord mathnormal" style="margin-right:0.03588em;">y</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mbin">−</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mord mathnormal">x</span>
<span class="mord mathnormal" style="margin-right:0.04398em;">z</span>
</span>
</span>
<span style="top:-1.66em;">
<span class="pstrut" style="height:3em;">
</span>
<span class="mord">
<span class="mord">
</span>
<span class="mspace" style="margin-right:0.2778em;">
</span>
<span class="mrel">=</span>
<span class="mspace" style="margin-right:0.2778em;">
</span>
<span class="mord">−</span>
<span class="mord mathnormal" style="margin-right:0.05278em;">β</span>
<span class="mord mathnormal" style="margin-right:0.04398em;">z</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mbin">+</span>
<span class="mspace" style="margin-right:0.2222em;">
</span>
<span class="mord mathnormal">x</span>
<span class="mord mathnormal" style="margin-right:0.03588em;">y</span>
</span>
</span>
</span>
<span class="vlist-s">
</span>
</span>
<span class="vlist-r">
<span class="vlist" style="height:2em;">
<span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
<span class="tag">
<span class="vlist-t vlist-t2">
<span class="vlist-r">
<span class="vlist" style="height:2.5em;">
<span style="top:-4.5em;">
<span class="pstrut" style="height:2.84em;">
</span>
<span class="eqn-num">
</span>
</span>
<span style="top:-3em;">
<span class="pstrut" style="height:2.84em;">
</span>
<span class="eqn-num">
</span>
</span>
<span style="top:-1.5em;">
<span class="pstrut" style="height:2.84em;">
</span>
<span class="eqn-num">
</span>
</span>
</span>
<span class="vlist-s">
</span>
</span>
<span class="vlist-r">
<span class="vlist" style="height:2em;">
<span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
03. Rendered:
import { join } from 'path';
import { createKatexLogger, processKatex } from '@samplekit/preprocess-katex';
import { createMdLogger, processMarkdown } from '@samplekit/preprocess-markdown';
import { createShikiLogger, processCodeblockSync } from '@samplekit/preprocess-shiki';
import adapter from '@sveltejs/adapter-auto';
import { vitePreprocess } from '@sveltejs/vite-plugin-svelte';
import { opts } from './src/lib/shiki/index.js';
const root = join(new URL(import.meta.url).pathname, '..');
const src = join(root, 'src');
const articleRoot = join(src, 'routes/articles');
const formatFilename = (/** @type {string} */ filename) => filename.replace(src, '');
const include = (/** @type {string} */ filename) => filename.startsWith(articleRoot) || filename.endsWith('pp.svelte');
/** @type {import('@sveltejs/kit').Config} */
const config = {
preprocess: [
processCodeblockSync({
include,
logger: createShikiLogger(formatFilename),
opts,
}),
processMarkdown({
include,
logger: createMdLogger(formatFilename),
}),
processKatex({
include,
logger: createKatexLogger(formatFilename),
}),
vitePreprocess(),
],
kit: {
adapter: adapter(),
},
};
export default config;
When creating this website, I had a few DX requirements for the articles:
- preserve all the tooling Svelte offers
- write tables in Markdown
- style dark code blocks with my personal VS Code theme
- add new features (like Math) as needed
- transform everything on the server
- write the code as if it were on the client
- not have to wait for compatibility when Svelte 5 is released
Evaluating the Options
0.1: Unified
Unified is an ecosystem of packages that all accept a standardized abstract syntax tree (AST). The uniformity means
you can convert your language into an AST, run a pipeline of transformations, and then transform it into some other
language. Originally, I thought of using the unified ecosystem to convert Markdown to an AST, use rehypePrettyCode
in the middle for styling, and then convert it to HTML. This is a great solution for technical
and non-technical users alike because of how easy Markdown is to write.
Something like this would work:
// https://github.com/syntax-tree/unist#list-of-utilities unified AST utilities
// https://github.com/syntax-tree/mdast#list-of-utilities markdown AST utilities
// https://github.com/syntax-tree/hast#list-of-utilities html AST utilities
// https://github.com/remarkjs/remark/blob/main/doc/plugins.md remark: markdown AST plugins
// https://github.com/rehypejs/rehype/blob/main/doc/plugins.md rehype: html AST plugins
import matter from 'gray-matter';
import { rehypeAddCopyBtnToCodeTitle, remarkReadingTime, type RemarkReadingTimeData } from './plugins.js';
import fs from 'fs/promises';
import path from 'path';
import rehypeCodeTitles from 'rehype-code-titles';
import { default as rehypePrettyCode, type Theme } from 'rehype-pretty-code';
import rehypeSlug from 'rehype-slug';
import rehypeStringify from 'rehype-stringify';
import remarkGfm from 'remark-gfm';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import remarkSmartypants from 'remark-smartypants';
import remarkTableOfContents from 'remark-toc';
import { unified } from 'unified';
import { darker } from './code-themes/darker.js';
import type { Parsed } from './types.js';
type PluginData = RemarkReadingTimeData;
/** @throws Error */
export const mdToHTML = async <T extends object>(markdown: string) => {
const split = matter(markdown);
const rawMd = split.content;
const frontMatter = split.data as T;
const result = await unified()
.use(remarkParse) // Convert Markdown string to Markdown AST
.use(remarkGfm) // Use GitHub flavored Markdown
.use(remarkSmartypants) // Convert ASCII to Unicode punctuation: “ ” – — …
.use([[remarkTableOfContents, { tight: true }]]) // Generate TOC list from headings (tight removes <p> from <li> when nested)
.use(remarkReadingTime) // Add reading time to result.data
.use(remarkRehype) // Convert Markdown AST to HTML AST
.use(rehypeSlug) // Add IDs to headings
.use(rehypeCodeTitles) // Add titles to code blocks
.use(rehypePrettyCode, { theme: { light: 'rose-pine-dawn', dark: darker as unknown as Theme } }) // Add code syntax, line/word highlighting, line numbers
.use(rehypeAddCopyBtnToCodeTitle) // Add copy button to code blocks
.use(rehypeStringify) // Convert HTML AST to HTML string
.process(rawMd);
const pluginData = result.data as PluginData;
const readingTime = Math.ceil(pluginData.readingTime.minutes);
return {
rawHTML: result.value as string,
data: { ...frontMatter, readingTime },
};
};
/** @throws Error if the file cannot be read or parsed */
export async function parsePath<T extends object>(mdPath: { slug: string; path: string }): Promise<Parsed<T>> {
const rawMdContent = await fs.readFile(mdPath.path, 'utf-8').catch((_err) => {
throw new Error(`Unable to readFile ${mdPath.path}`);
});
return await mdToHTML<T>(rawMdContent).catch((_err) => {
throw new Error(`Unable to parse ${mdPath.slug}`);
});
}
/** @throws fs.readdir Error */
export async function parseDir<T extends object>(inDir: string): Promise<Parsed<T>[]> {
const mdPaths = (await fs.readdir(inDir, { withFileTypes: true })).map((dirent) => ({
slug: dirent.name,
path: path.join(dirent.path, dirent.name, `${dirent.name}.md`),
}));
const res = await Promise.all(
mdPaths.map(async (p) => {
try {
return await parsePath<T>(p);
} catch (err) {
console.error(`\n${(err as Error).message}`);
return null;
}
}),
);
return res.filter((r) => r !== null) as Parsed<T>[];
}
The largest drawback is that we'd be writing Markdown and forgoing the Svelte ecosystem entirely. Furthermore, even if
that were acceptable, this makes inlining Svelte components cumbersome. Would we write multiple .md files, import and
process them separately, and embed them into .svelte components? Or would we write Markdown and inject Svelte
components in between?
0.2: MDsveX
MDsveX – a Markdown preprocessor for Svelte – solves the embedded Svelte problem. Preprocessors transform the input files before passing them to the Svelte compiler, and this particular preprocessor enables writing Markdown and Svelte in the same file.
The biggest concern at the time of my decision was that it hadn't been updated in 7 months and had more than 150 open issues. That didn't inspire confidence that it would support Svelte 5 upon release, which weighed heavily on my choice of method. Ideally, the technique we use would be resistant to changes in Svelte. The second concern was that MDsveX is a separate language, so it wouldn't just work with other Svelte tooling.
I really only want Markdown for three things: tables, code blocks, and math. The general idea of a preprocessor, however, is very appealing.
0.3: Preprocessors
We can easily add a new feature to Svelte by writing a simple preprocessor wrapping a dedicated package for each new functionality. There's Marked for Markdown, Shiki for code highlighting, and KaTeX for math.
These preprocessors need to work in .svelte files alongside the Svelte language server, Prettier,
ESLint, etc. This means we need syntax that is ignored by all tooling. Luckily, we already have such a thing: the HTML
comment: <!-- -->. Bonus points that it already has a shortcut keybinding and
simply turns into a comment if the preprocessor isn't present. With this, we can decide upon some delimiters.
<!-- shiki-start
const foo = "bar";
shiki-end -->
<!-- md-start
# Markdown
md-end -->
<!--\[ V={4 \over 3}\pi r^{3} \] -->
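As a rough illustration of how these delimiters can be told apart once a comment's inner text is in hand, here is a minimal sketch. The function name classifyComment and the returned labels are hypothetical and not part of the packages; only the delimiter strings come from the examples above.

```typescript
// Hypothetical helper: the name and the returned labels are illustrative only.
type CommentKind = 'shiki' | 'md' | 'math-display';

const delimiters: { kind: CommentKind; start: string; end: string }[] = [
  { kind: 'shiki', start: 'shiki-start', end: 'shiki-end' },
  { kind: 'md', start: 'md-start', end: 'md-end' },
  { kind: 'math-display', start: String.raw`\[`, end: String.raw`\]` },
];

/** Takes the text between `<!--` and `-->` and reports which preprocessor, if any, should claim it. */
function classifyComment(commentData: string): CommentKind | null {
  const trimmed = commentData.trim();
  for (const { kind, start, end } of delimiters) {
    if (trimmed.startsWith(start) && trimmed.endsWith(end)) return kind;
  }
  return null; // an ordinary comment: leave it untouched
}
```

Because each preprocessor only claims comments that carry its own start and end delimiter, ordinary comments pass through unchanged and the three preprocessors can run in any order.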
Writing the Preprocessors
Markdown
We'll start with the Markdown preprocessor because it does nothing but wrap Marked. Writing a
preprocessor that wraps an existing package is dead simple. We just have to remove the code between our delimiters,
process it, and put it back in. There are two obvious choices for how to pull out the code. We can either loop over
the raw string content with indexOf or we can walk the Svelte AST. Here are both implementations, the AST walk first
and the indexOf loop second, followed by the shared logger.
import { walk } from 'estree-walker';
import MagicString from 'magic-string';
import { marked as hostedMarked } from 'marked';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';
const delimiter = { start: 'md-start', end: 'md-end' };
const delimLoc = { start: delimiter.start.length + 1, end: -delimiter.end.length - 1 };
export function processMarkdown({
include,
logger,
marked = hostedMarked,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
marked?: typeof hostedMarked;
} = {}) {
return {
name: 'md',
markup({ content, filename }) {
if (!filename) return;
if (include && !include(filename)) return;
try {
const s = new MagicString(content);
const ast = parse(content, { filename, modern: true });
let count = 0;
walk(ast.fragment, {
enter(node: (typeof ast.fragment.nodes)[number]) {
if (node.type !== 'Comment') return;
const trimmed = node.data.trim();
if (!trimmed.startsWith(delimiter.start)) return;
if (!trimmed.endsWith(delimiter.end)) return;
s.remove(node.start, node.end);
s.appendLeft(node.start, marked(trimmed.slice(delimLoc.start, delimLoc.end), { async: false }) as string);
count++;
},
});
if (count) logger?.info?.({ count }, filename);
return { code: s.toString() };
} catch (err) {
if (err instanceof Error) logger?.error?.(err, filename);
else logger?.error?.(Error('Failed to render Markdown.'), filename);
}
},
} satisfies PreprocessorGroup;
}
import { marked as hostedMarked } from 'marked';
import type { Logger } from './logger.js';
import type { PreprocessorGroup } from 'svelte/compiler';
const delimiter = { start: 'md-start', end: 'md-end' };
/**
* @throws {Error} If the comment block is incomplete.
*/
const replaceComments = (content: string, replacer: (comment: string) => null | string): string => {
let resultContent = content;
let startIdx = resultContent.indexOf('<!--');
while (startIdx !== -1) {
const endIdx = resultContent.indexOf('-->', startIdx + 4);
if (endIdx === -1) throw new Error(`Incomplete comment block at start ${startIdx}. Aborting.`);
const extractedContent = resultContent.substring(startIdx + 4, endIdx);
const replaced = replacer(extractedContent);
if (replaced === null) {
startIdx = resultContent.indexOf('<!--', endIdx + 3);
continue;
}
const before = resultContent.substring(0, startIdx);
const after = resultContent.substring(endIdx + 3);
resultContent = before + replaced + after;
startIdx = resultContent.indexOf('<!--', startIdx + replaced.length);
}
return resultContent;
};
export function processMarkdown({
include,
logger,
marked = hostedMarked,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
marked?: typeof hostedMarked;
} = {}) {
return {
name: 'md',
markup({ content, filename }) {
if (!filename) return;
if (include && !include(filename)) return;
try {
let count = 0;
const code = replaceComments(content, (comment) => {
let trimmed = comment.trim();
if (!trimmed.startsWith(delimiter.start)) return null;
count++;
if (!trimmed.endsWith(delimiter.end)) throw new Error(`Incomplete md (count: ${count}). Aborting.`);
trimmed = trimmed.substring(delimiter.start.length + 1, trimmed.length - delimiter.end.length);
return marked(trimmed, { async: false }) as string;
});
if (count) logger?.info?.({ count }, filename);
return { code };
} catch (err) {
if (err instanceof Error) logger?.error?.(err, filename);
else logger?.error?.(Error('Failed to render Markdown.'), filename);
}
},
} satisfies PreprocessorGroup;
}
export type Logger = {
error?: (e: Error, filename: string) => void;
info?: (a: { count: number }, filename: string) => void;
};
export const createMdLogger = (formatFilename: (filename: string) => string = (filename: string) => filename): Logger => {
return {
error: (e, filename) => console.error(`[PREPROCESS] | Mark | Error | ${formatFilename(filename)} | ${e.message}`),
info: (detail, filename) =>
console.info(`[PREPROCESS] | Mark | Info | ${formatFilename(filename)} | count: ${detail.count}`),
};
};
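The delimLoc arithmetic in the AST version is easy to get wrong by one, so here is the slice worked through on a concrete comment. This is only a sketch: extractMarkdown is a hypothetical helper name, and it assumes exactly one whitespace character separates each delimiter from the Markdown body.

```typescript
const delimiter = { start: 'md-start', end: 'md-end' };
// +1 / -1 skip the single whitespace character (usually a newline) next to each delimiter.
const delimLoc = { start: delimiter.start.length + 1, end: -delimiter.end.length - 1 };

/** Hypothetical helper: returns the Markdown between the delimiters, or null for ordinary comments. */
function extractMarkdown(commentData: string): string | null {
  const trimmed = commentData.trim();
  if (!trimmed.startsWith(delimiter.start) || !trimmed.endsWith(delimiter.end)) return null;
  return trimmed.slice(delimLoc.start, delimLoc.end);
}

// The comment data for `<!-- md-start\n# Markdown\nmd-end -->` is everything between `<!--` and `-->`.
const body = extractMarkdown(' md-start\n# Markdown\nmd-end ');
```

Here body ends up as the string "# Markdown", which is the text that would be handed to marked.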
And we can now register it with Svelte.
import { processMarkdown, createMdLogger } from '@samplekit/preprocess-markdown';
import adapter from '@sveltejs/adapter-auto';
import { vitePreprocess } from '@sveltejs/vite-plugin-svelte';
const preprocessorRoot = `${import.meta.dirname}/src/routes/`;
const formatFilename = (/** @type {string} */ filename) => filename.replace(preprocessorRoot, '');
const include = (/** @type {string} */ filename) => filename.startsWith(preprocessorRoot);
/** @type {import('@sveltejs/kit').Config} */
const config = {
preprocess: [
processMarkdown({
include,
logger: createMdLogger(formatFilename),
}),
vitePreprocess(),
],
kit: {
adapter: adapter(),
},
};
export default config;
Hopefully it's clear how powerful this simple idea is. We could write some complicated logic to parse a file and determine where a table might start or end, but that would necessarily require us to work in a different language without the Svelte ecosystem. In my opinion sacrificing Svelte tooling to use these preprocessors would be the wrong tradeoff.
Math
We'll do the same thing for math, but in this version we'll want two delimiters: \( ... \) for inline math and \[ ... \] for display math.
It's very similar to the AST approach above.
import { walk } from 'estree-walker';
import katex from 'katex';
import MagicString from 'magic-string';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';
const display = { start: String.raw`\[`, end: String.raw`\]` };
const inline = { start: String.raw`\(`, end: String.raw`\)` };
const delimLoc = { start: 3, end: -3 };
type RenderToString = (
tex: string,
options: {
displayMode?: boolean | undefined;
throwOnError: true;
strict?: (errorCode: string, errorMsg: string) => 'ignore' | undefined;
},
) => string;
export function processKatex({
include,
logger,
renderToString = katex.renderToString,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
renderToString?: RenderToString;
} = {}) {
return {
name: 'katex',
markup({ content, filename }) {
if (!filename) return;
if (include && !include(filename)) return;
const s = new MagicString(content);
const ast = parse(content, { filename, modern: true });
let count = 0;
walk(ast.fragment, {
enter(node: (typeof ast.fragment.nodes)[number]) {
if (node.type !== 'Comment') return;
const trimmed = node.data.trim();
let displayMode: boolean | undefined = undefined;
if (trimmed.startsWith(display.start) && trimmed.endsWith(display.end)) displayMode = true;
else if (trimmed.startsWith(inline.start) && trimmed.endsWith(inline.end)) displayMode = false;
if (displayMode === undefined) return;
s.remove(node.start, node.end);
let parsed;
try {
const rawInput = String.raw`${trimmed.slice(delimLoc.start, delimLoc.end)}`;
parsed = renderToString(rawInput, {
displayMode,
throwOnError: true,
});
} catch (err) {
logger?.error?.(err instanceof Error ? err : Error('Failed to render KaTeX.'), filename);
return;
}
const content = displayMode
? `<div class="overflow-x-auto">{@html \`${parsed}\`}</div>`
: `{@html \`${parsed}\`}`;
s.appendLeft(node.start, content);
count++;
},
});
if (count) logger?.info?.({ count }, filename);
return { code: s.toString() };
},
} satisfies PreprocessorGroup;
}
Annoyingly, KaTeX logs directly to the console. To prevent this, let's wrap the call to KaTeX in a trap function.
import { walk } from 'estree-walker';
import katex from 'katex';
import MagicString from 'magic-string';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';
const display = { start: String.raw`\[`, end: String.raw`\]` };
const inline = { start: String.raw`\(`, end: String.raw`\)` };
const delimLoc = { start: 3, end: -3 };
// https://github.com/KaTeX/KaTeX/issues/3720
const catchStdErr = ({ tmpWrite, trappedFn }: { trappedFn: () => void; tmpWrite: (str: string) => boolean }) => {
const write = process.stderr.write;
try {
process.stderr.write = tmpWrite;
trappedFn();
} finally {
process.stderr.write = write;
}
};
type RenderToString = (
tex: string,
options: {
displayMode?: boolean | undefined;
throwOnError: true;
strict?: (errorCode: string, errorMsg: string) => 'ignore' | undefined;
},
) => string;
export function processKatex({
include,
logger,
renderToString = katex.renderToString,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
renderToString?: RenderToString;
} = {}) {
return {
name: 'katex',
markup({ content, filename }) {
if (!filename) return;
if (include && !include(filename)) return;
const s = new MagicString(content);
const ast = parse(content, { filename, modern: true });
let count = 0;
walk(ast.fragment, {
enter(node: (typeof ast.fragment.nodes)[number]) {
if (node.type !== 'Comment') return;
const trimmed = node.data.trim();
let displayMode: boolean | undefined = undefined;
if (trimmed.startsWith(display.start) && trimmed.endsWith(display.end)) displayMode = true;
else if (trimmed.startsWith(inline.start) && trimmed.endsWith(inline.end)) displayMode = false;
if (displayMode === undefined) return;
s.remove(node.start, node.end);
let parsed;
try {
const rawInput = String.raw`${trimmed.slice(delimLoc.start, delimLoc.end)}`;
const warns: Error[] = [];
catchStdErr({
trappedFn: () => {
parsed = renderToString(rawInput, {
displayMode,
throwOnError: true,
});
},
tmpWrite: (str) => {
if (!str.startsWith('No character metrics for ')) warns.push(Error(str));
return true;
},
});
if (logger?.warn) {
warns.forEach((err) => logger.warn?.(err, filename));
}
} catch (err) {
logger?.error?.(err instanceof Error ? err : Error('Failed to render KaTeX.'), filename);
return;
}
const content = displayMode
? `<div class="overflow-x-auto">{@html \`${parsed}\`}</div>`
: `{@html \`${parsed}\`}`;
s.appendLeft(node.start, content);
count++;
},
});
if (count) logger?.info?.({ count }, filename);
return { code: s.toString() };
},
} satisfies PreprocessorGroup;
}
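The trap pattern itself is independent of KaTeX. The sketch below reproduces it against a tiny stand-in stream so the save, swap, and restore steps are easy to see. WritableLike, trapWrite, and fakeStderr are hypothetical names; in the preprocessor the target is process.stderr.

```typescript
// Hypothetical minimal shape of a stream; process.stderr exposes a compatible `write`.
type WritableLike = { write: (str: string) => boolean };

function trapWrite(stream: WritableLike, tmpWrite: (str: string) => boolean, trappedFn: () => void): void {
  const original = stream.write; // save the method of the same stream we are about to patch
  try {
    stream.write = tmpWrite;
    trappedFn();
  } finally {
    stream.write = original; // restored even if trappedFn throws
  }
}

const printed: string[] = [];
const fakeStderr: WritableLike = {
  write: (str) => {
    printed.push(str);
    return true;
  },
};

const warns: string[] = [];
trapWrite(
  fakeStderr,
  (str) => {
    // Same filter as in the preprocessor: drop the font-metric messages, keep everything else.
    if (!str.startsWith('No character metrics for ')) warns.push(str);
    return true;
  },
  () => {
    fakeStderr.write('No character metrics for some glyph');
    fakeStderr.write('some other warning');
  },
);
fakeStderr.write('after the trap');
```

Only 'some other warning' lands in warns, and only 'after the trap' reaches the original write, since the stream is restored in the finally block.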
Looks good. But what about reactivity? It's quite possible you'll want to use handlebar substitution inside the
equations. If we simply tried to use it as is, KaTeX would choke on the syntax. We'll have to make a special TeX-like
macro for Svelte. \s seems as good as any. The preprocessor should take the Svelte content out, replace it with unique
single-character placeholders, process the TeX, and then put the Svelte content back in.
// the nuts and bolts of it
const { svelteFreeString, extractedSvelteContent } = replaceSvelteAndStore(rawInput);
const mathString = katex.renderToString(svelteFreeString);
const parsed = restoreSvelte(mathString, extractedSvelteContent);
We'll use some Unicode characters as our storage placeholders.
import { walk } from 'estree-walker';
import katex from 'katex';
import MagicString from 'magic-string';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';
const display = { start: String.raw`\[`, end: String.raw`\]` };
const inline = { start: String.raw`\(`, end: String.raw`\)` };
const delimLoc = { start: 3, end: -3 };
// https://github.com/KaTeX/KaTeX/issues/3720
const catchStdErr = ({ tmpWrite, trappedFn }: { trappedFn: () => void; tmpWrite: (str: string) => boolean }) => {
const write = process.stderr.write;
try {
process.stderr.write = tmpWrite;
trappedFn();
} finally {
process.stderr.write = write;
}
};
const unicodeInsertionPlaceholders = [
'␇',
'␈',
'␉',
'␊',
'␋',
'␌',
'␍',
'␎',
'␏',
'␐',
'␑',
'␒',
'␓',
'␔',
'␕',
'␖',
'␗',
'␘',
'␙',
'␚',
'␛',
'␜',
'␝',
'␞',
'␟',
'␠',
];
function replaceSvelteAndStore(input: string): { svelteFreeString: string; extractedSvelteContent: string[] } {
const extractedSvelteContent: string[] = [];
let index = 0;
const svelteFreeString = input.replace(/\\s\{([^}]*)\}/g, (_match, p1) => {
if (index >= unicodeInsertionPlaceholders.length) throw new Error('Too many variable substitutions.');
extractedSvelteContent.push(p1);
const unicodePlaceholder = unicodeInsertionPlaceholders[index];
index++;
return `{` + unicodePlaceholder + `}`;
});
return { svelteFreeString, extractedSvelteContent };
}
function restoreSvelte(mathString: string, extractedSvelteContent: string[]): string {
if (!extractedSvelteContent.length) return mathString;
const unicodeMap = new Map();
extractedSvelteContent.forEach((content, i) => {
unicodeMap.set(unicodeInsertionPlaceholders[i], content);
});
const unicodePlaceholderRegex = new RegExp(`(${unicodeInsertionPlaceholders.join('|')})`, 'g');
return mathString.replaceAll(unicodePlaceholderRegex, (placeholder) => {
const svelteContent = unicodeMap.get(placeholder);
return `\${${svelteContent}}`;
});
}
type RenderToString = (
tex: string,
options: {
displayMode?: boolean | undefined;
throwOnError: true;
strict: (errorCode: string, errorMsg: string) => 'ignore' | undefined;
},
) => string;
export function processKatex({
include,
logger,
renderToString = katex.renderToString,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
renderToString?: RenderToString;
} = {}) {
return {
name: 'katex',
markup({ content, filename }) {
if (!filename) return;
if (include && !include(filename)) return;
const s = new MagicString(content);
const ast = parse(content, { filename, modern: true });
let count = 0;
walk(ast.fragment, {
enter(node: (typeof ast.fragment.nodes)[number]) {
if (node.type !== 'Comment') return;
const trimmed = node.data.trim();
let displayMode: boolean | undefined = undefined;
if (trimmed.startsWith(display.start) && trimmed.endsWith(display.end)) displayMode = true;
else if (trimmed.startsWith(inline.start) && trimmed.endsWith(inline.end)) displayMode = false;
if (displayMode === undefined) return;
s.remove(node.start, node.end);
let parsed;
try {
const rawInput = String.raw`${trimmed.slice(delimLoc.start, delimLoc.end)}`;
const { svelteFreeString, extractedSvelteContent } = replaceSvelteAndStore(rawInput);
const warns: Error[] = [];
catchStdErr({
trappedFn: () => {
const mathString = renderToString(svelteFreeString, {
displayMode,
throwOnError: true,
strict: (errorCode: string, errorMsg: string) => {
if (errorCode === 'unknownSymbol' && errorMsg.startsWith('Unrecognized Unicode character'))
return 'ignore';
},
});
parsed = restoreSvelte(mathString, extractedSvelteContent);
},
tmpWrite: (str) => {
if (!str.startsWith('No character metrics for ')) warns.push(Error(str));
return true;
},
});
if (logger?.warn) {
warns.forEach((err) => logger.warn?.(err, filename));
}
} catch (err) {
logger?.error?.(err instanceof Error ? err : Error('Failed to render KaTeX.'), filename);
return;
}
const content = displayMode
? `<div class="overflow-x-auto">{@html \`${parsed}\`}</div>`
: `{@html \`${parsed}\`}`;
s.appendLeft(node.start, content);
count++;
},
});
if (count) logger?.info?.({ count }, filename);
return { code: s.toString() };
},
} satisfies PreprocessorGroup;
}
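To see the Svelte-expression round trip from replaceSvelteAndStore and restoreSvelte in isolation, here is a minimal sketch that runs without KaTeX. The placeholder list, the function names extractSvelte and restore, and the fake render step are assumptions made for illustration; the real unicodeInsertionPlaceholders list is not shown in this article.

```typescript
// Assumed placeholder set: private-use code points unlikely to appear in real TeX.
const placeholders = ['\uE000', '\uE001', '\uE002'];

// Swap each \s{expr} for {placeholder} and remember the expression.
function extractSvelte(input: string) {
  const extracted: string[] = [];
  const svelteFree = input.replace(/\\s\{([^}]*)\}/g, (_match, expr: string) => {
    if (extracted.length >= placeholders.length) throw new Error('Too many variable substitutions.');
    extracted.push(expr);
    return `{${placeholders[extracted.length - 1]}}`;
  });
  return { svelteFree, extracted };
}

// Swap each placeholder in the rendered output for a ${expr} interpolation.
function restore(rendered: string, extracted: string[]): string {
  let out = rendered;
  extracted.forEach((expr, i) => {
    const placeholder = placeholders[i];
    if (placeholder !== undefined) out = out.split(placeholder).join('${' + expr + '}');
  });
  return out;
}

const { svelteFree, extracted } = extractSvelte('x^\\s{count}');
// Stand-in for katex.renderToString: this sketch assumes the renderer
// consumes the TeX braces and keeps the placeholder character.
const fakeRendered = `<span>${svelteFree.replace(/[{}]/g, '')}</span>`;
const restored = restore(fakeRendered, extracted);
console.log(restored);
```

Because the restored string ends up inside a template literal within {@html ...}, the ${count} interpolation is evaluated by Svelte at runtime rather than at preprocessing time.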
Easy – two down and one to go!
Code Decoration
<pre>
tag, line ranges, index ranges, substrings, etc.
We'll need a place for our preprocessor to look for the options. If we look back at our delimiter syntax, we have an obvious place: between the front delimiter and code fence. For convenience, let's also allow line options to be at the end of the line they're scoped to.
<!-- shiki-start
s"foo" c"border border-accent-9"
```ts
const foo = "bar";
const added = true;//! d"diff-add"
```
shiki-end -->
const foo = "bar";
const added = true;
Because we're supporting options, this preprocessor will be more involved than the other two. We'll first need to split the raw string into the global options, inline options, code fence, language, and code. Then we can pass everything off to a function that calls Shiki's codeToHtml and applies Shiki decorations based on the extracted options.
import { walk } from 'estree-walker';
import MagicString from 'magic-string';
import { parse } from 'svelte/compiler';
import { getOrLoadOpts } from './defaultOpts.js';
import { codeToDecoratedHtmlSync } from './highlight.js';
import { stripOptions } from './strip-options/index.js';
import type { PreprocessOpts, Logger, PreprocessorGroup } from './types.js';
export function processCodeblockSync({
opts,
include,
logger,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
opts: PreprocessOpts;
}) {
return {
name: 'codeblock',
markup({ content, filename }) {
if (!filename) return;
if (include && !include(filename)) return;
const s = new MagicString(content);
const ast = parse(content, { filename, modern: true });
let count = 0;
walk(ast.fragment, {
enter(node) {
if (node.type !== 'Comment') return;
const trimmed = node.data.trim();
if (!trimmed.startsWith(opts.delimiters.common)) return;
if (trimmed.endsWith(opts.delimiters.fenced.end)) {
s.remove(node.start, node.end);
const prepared = stripOptions(
// escapePreprocessor allows us to write things like --> which would
// otherwise terminate the html comment surrounding our preprocessor
opts.escapePreprocessor({
code: trimmed.slice(opts.delimiters.fenced.startLoc, opts.delimiters.fenced.endLoc),
}),
(e: Error) => logger?.warn?.(e, filename),
);
if (prepared instanceof Error) {
return logger?.error?.(prepared, filename);
}
const {
lang,
lineToProperties,
tranName, // we haven't discussed this, but this allows us to register custom transform functions
preProperties,
strippedCode,
windowProperties,
allLinesProperties,
} = prepared;
if (!opts.highlighterCore.getLoadedLanguages().includes(lang)) {
return logger?.error?.(
Error(
lang
? `Language ${lang} not loaded. Hint: try \`opts: await getOrLoadOpts({ langNames: ['${lang}'] })\` and restart the server.`
: 'No lang provided.',
),
filename,
);
}
let transformName = 'block';
if (tranName && opts.transformMap[tranName]) {
transformName = tranName;
} else if (tranName !== null) {
const keys = Object.keys(opts.transformMap);
logger?.warn?.(
Error(`${tranName} not in opts.transformMap. Defaulting to 'block'. Options: (${keys.join(', ')}).`),
filename,
);
}
const { error, data } = codeToDecoratedHtmlSync({
opts,
lineToProperties,
allLinesProperties,
preProperties,
windowProperties,
code: strippedCode,
transformName,
lang,
});
if (error) {
return logger?.error?.(error, filename);
}
s.appendLeft(node.start, data);
count++;
return;
}
for (const { delimLoc, delimiter, lang } of opts.delimiters.inline) {
if (trimmed.endsWith(delimiter)) {
s.remove(node.start, node.end);
const { error, data } = codeToDecoratedHtmlSync({
code: opts.escapePreprocessor({ code: trimmed.slice(delimLoc.start, delimLoc.end) }),
lang,
opts,
transformName: 'inline',
});
if (error) {
return logger?.error?.(error, filename);
}
s.appendLeft(node.start, data);
count++;
break;
}
}
},
});
if (count) {
logger?.info?.({ count }, filename);
}
return { code: s.toString() };
},
} satisfies PreprocessorGroup;
}
Loading the Shiki highlighter is async, so we'll wrap the previous function with another that awaits the default options containing the Shiki Highlighter.
export async function processCodeblock({
include,
logger,
}: {
include?: (filename: string) => boolean;
logger?: Logger;
}) {
return processCodeblockSync({ include, logger, opts: await getOrLoadOpts() });
}
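The wrapper pattern can be sketched on its own. Everything here is hypothetical: the Opts type, loadOpts, getOrLoadOptsSketch, and the two factory functions are stand-ins, and the caching shown is only a guess at what a name like getOrLoadOpts implies, not the package's actual behavior.

```typescript
// Hypothetical stand-in for the options object; the real one holds a Shiki highlighter.
type Opts = { loadedAt: number };

let loadCount = 0;
let cached: Promise<Opts> | undefined;

// Simulates an expensive async setup step, such as loading a highlighter.
async function loadOpts(): Promise<Opts> {
  loadCount++;
  return { loadedAt: Date.now() };
}

// Caches the promise itself, so concurrent callers share a single load.
function getOrLoadOptsSketch(): Promise<Opts> {
  if (!cached) cached = loadOpts();
  return cached;
}

// Sync factory that needs ready-made options.
function makePreprocessorSync({ opts }: { opts: Opts }) {
  return { name: 'sketch', opts };
}

// Async wrapper: await the options, then delegate to the sync factory.
async function makePreprocessor() {
  return makePreprocessorSync({ opts: await getOrLoadOptsSketch() });
}

const bothReady = Promise.all([makePreprocessor(), makePreprocessor()]);
```

Keeping the sync factory separate means callers that already hold loaded options, such as tests, can skip the async step entirely.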
For brevity, the details behind the options extraction have been omitted, but the actual preprocessor part is very similar to the other two. Full functionality details are available in the docs.
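To give a feel for the omitted step, here is a minimal sketch of splitting a raw block into global options, language, code, and inline options. It is not the stripOptions implementation: splitFencedBlock and its return shape are hypothetical, it only recognizes the //! marker, and it leaves the option strings unparsed.

```typescript
// Hypothetical, simplified stand-in for the omitted option extraction.
type SplitResult = {
  globalOptions: string[]; // raw option lines found before the code fence
  lang: string; // language named on the opening fence
  code: string; // code with inline option comments stripped
  inlineOptions: Map<number, string>; // 1-based line number -> raw option string
};

const FENCE = '`'.repeat(3);

function splitFencedBlock(raw: string): SplitResult | Error {
  const lines = raw.trim().split('\n');
  const fenceStart = lines.findIndex((line) => line.trimStart().startsWith(FENCE));
  if (fenceStart === -1) return new Error('No opening code fence found.');
  const fenceEnd = lines.findIndex((line, i) => i > fenceStart && line.trim() === FENCE);
  if (fenceEnd === -1) return new Error('No closing code fence found.');

  // Everything above the fence is treated as global options.
  const globalOptions = lines
    .slice(0, fenceStart)
    .map((line) => line.trim())
    .filter(Boolean);
  const lang = (lines[fenceStart] ?? '').trim().slice(FENCE.length).trim();

  // Strip a trailing "//! ..." from each code line and record it by line number.
  const inlineOptions = new Map<number, string>();
  const codeLines = lines.slice(fenceStart + 1, fenceEnd).map((line, i) => {
    const marker = line.lastIndexOf('//!');
    if (marker === -1) return line;
    inlineOptions.set(i + 1, line.slice(marker + 3).trim());
    return line.slice(0, marker).trimEnd();
  });

  return { globalOptions, lang, code: codeLines.join('\n'), inlineOptions };
}

const result = splitFencedBlock(
  [
    's"foo" c"border border-accent-9"',
    FENCE + 'ts',
    'const foo = "bar";',
    'const added = true;//! d"diff-add"',
    FENCE,
  ].join('\n'),
);
console.log(result);
```

Returning an Error instead of throwing mirrors the prepared instanceof Error check in the preprocessor above, which lets the caller log the problem and move on to the next comment node.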
Code Editor Support
We've finished our preprocessors, but they're hard to use without syntax highlighting in our code editor. Everything is formatted like a comment. Let's fix that by writing a VS Code extension!
Overview
The best way to do this would be to follow the official VS Code extension docs. They show how to use Yeoman to scaffold an extension. These are simple extensions though, so we'll just write them directly instead.
We need to write a package.json with a contributes field. In it, we'll have grammars and snippets. When we're ready, we'll run vsce package to package the extension and metadata. Then, we can install it with code --install-extension <extension-name>. This will add it to the ~/.vscode/extensions folder (or wherever your install location is).
Syntaxes
Let's use the Markdown preprocessor as an example. We'll create a repo for the VS Code extension and add a package.json. In it, we'll add a section to point VS Code to the grammar declaration file.
{
"contributes": {
"grammars": [
{
"scopeName": "md.pp-svelte",
"injectTo": [
"source.svelte"
],
"path": "./syntaxes/md.pp-svelte.tmLanguage.json"
}
]
}
}
Using the VS Code command Developer: Inspect Editor Tokens and Scopes, you can see how things are highlighted. Everything exists in a TextMate scope, and that scope is targeted by your VS Code theme for decorating.
Our job is to tell VS Code, "when you see <!-- md-start md-end --> in a Svelte template, open up our custom scope, highlight the delimiter like a control keyword, and highlight the internals like Markdown." This is how we do it:
{
"name": "Svelte Component Markdown Injection",
// the name matches what we put in the package.json
"scopeName": "md.pp-svelte",
// only match within the svelte scope
"injectionSelector": "L:source.svelte",
"fileTypes": [],
"patterns": [
{
// start with the HTML comment opening. Use an oniguruma lookahead to only match if the next thing is "md-start"
"begin": "<!--(?=\\s*md-start)",
// end with the HTML comment closing. Use an oniguruma lookbehind to only match if the last thing matched was "md-end"
"end": "(?<=.*?md-end)\\s*-->",
// give the entire match our custom scope
"name": "source.pp-svelte",
"patterns": [
{
// inside the match, mark the md-start as a keyword
"begin": "md-start",
"beginCaptures": { "0": { "name": "keyword.control.pp-svelte" } },
// also match the end as a keyword
"end": ".*?(md-end)",
"endCaptures": { "1": { "name": "keyword.control.pp-svelte" } },
// give everything within the <!-- and --> our custom markdown scope
"name": "markdown.pp-svelte",
// give everything within the md-start and md-end delimiters the real markdown scope
"contentName": "text.html.markdown",
// actually use the markdown scope for everything between the delimiters
"patterns": [{ "include": "text.html.markdown" }]
}
]
}
]
}
Although this particular file is fairly straightforward, Oniguruma syntax can be... difficult. Writing it in JSON is even more tedious because it requires double escaping. regex101 and this decade-old blog post by Matt Neuburg are useful shields to defend yourself against the beast that is Oniguruma regex.
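The double escaping is easy to check outside of VS Code. The sketch below parses a one-line JSON fragment holding the begin pattern from the grammar and tries it against two comments. It uses JavaScript's RegExp engine, which is not Oniguruma, so treat it as a sanity check for simple patterns like this lookahead, not a faithful test of the grammar.

```typescript
// The grammar file stores each pattern as a JSON string, so a regex
// backslash must itself be escaped: \s in the regex is \\s in the JSON.
// String.raw keeps the text exactly as it would appear in the file.
const grammarJson = String.raw`{ "begin": "<!--(?=\\s*md-start)" }`;

// After JSON parsing, the string holds a single backslash: <!--(?=\s*md-start)
const { begin } = JSON.parse(grammarJson) as { begin: string };

// JavaScript's RegExp stands in for Oniguruma here (an approximation).
const beginRegex = new RegExp(begin);

const matchesMdBlock = beginRegex.test('<!-- md-start\n# Title\nmd-end -->');
const matchesPlainComment = beginRegex.test('<!-- just a comment -->');

console.log({ begin, matchesMdBlock, matchesPlainComment });
```

The lookahead means only the <!-- is consumed by the begin match, which leaves md-start available for the inner pattern to capture as a keyword.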
Snippets
We don't want to type <!-- md-start md-end --> all the time. Let's write a VS Code snippet so we can write mds instead. Back in our package.json we can point to a snippets file.
{
"contributes": {
"grammars": [
{
"scopeName": "md.pp-svelte",
"injectTo": [
"source.svelte"
],
"path": "./syntaxes/md.pp-svelte.tmLanguage.json"
}
],
"snippets": [
{
"language": "svelte",
"path": "./snippets/md.pp-svelte.code-snippets"
}
]
}
}
And write it just like a normal VS Code snippets file.
{
"New Markdown block": {
"scope": "text.svelte",
"prefix": "md-start",
"description": "Creates a new markdown block.",
"body": ["<!-- md-start", "$1", "md-end -->", "$0"],
},
}
Voilà. Now rinse and repeat for the other two – though they're all subtly different. Find the source code here.
Conclusion
| NPM Package | VS Code Extension |
|---|---|
| @samplekit/preprocess-katex | samplekit.svelte-pp-katex |
| @samplekit/preprocess-markdown | samplekit.svelte-pp-markdown |
| @samplekit/preprocess-shiki | samplekit.svelte-pp-shiki |
Svelte preprocessors are quite powerful, and by using the HTML comment delimiters, we can have our preprocessors and standard tooling too. I'm grateful to MDsveX, svelte-put, and Melt UI for introducing them to me, and I hope this article helps you get started with your own preprocessors. If you have a question or want to share your preprocessor, share it in the GitHub discussions!
Happy coding!