Blog with Preprocessors

  • preprocessors
  • blog
  • markdown
  • code highlighting
  • DX

18 min read 3957 words

Preprocessing Code Decoration

01. Write:

<!-- shiki-start
l"2" d"highlight"
s"hello" c"underline"
```ts
console.log('hello world');
const highlightedLine = true;
```
shiki-end -->

02. Processed (whitespace added here for readability):

<pre style="--h-daffodil:#575279;--h-dark:#D4D4D4;--h-bellflower:#4c4f69;--h-daffodil-bg:#faf4ed;--h-dark-bg:#000;--h-bellflower-bg:#eff1f5" data-shiki="" data-shiki-lang-ts="" data-shiki-t-block="">
	<code>
		<span data-line="">
			<span style="--h-daffodil:#575279;--h-dark:#9CDCFE;--h-bellflower:#4C4F69;--h-daffodil-font-style:italic;--h-dark-font-style:inherit;--h-bellflower-font-style:inherit">console</span>
			<span style="--h-daffodil:#286983;--h-dark:#D4D4D4;--h-bellflower:#179299">.</span>
			<span style="--h-daffodil:#D7827E;--h-dark:#DCDCAA;--h-bellflower:#1E66F5;--h-daffodil-font-style:inherit;--h-dark-font-style:inherit;--h-bellflower-font-style:italic">log</span>
			<span style="--h-daffodil:#575279;--h-dark:#D4D4D4;--h-bellflower:#4C4F69">(</span>
			<span style="--h-daffodil:#EA9D34;--h-dark:#CE9178;--h-bellflower:#40A02B">'</span>
			<span style="--h-daffodil:#EA9D34;--h-dark:#CE9178;--h-bellflower:#40A02B" class="underline">hello</span>
			<span style="--h-daffodil:#EA9D34;--h-dark:#CE9178;--h-bellflower:#40A02B"> world'</span>
			<span style="--h-daffodil:#575279;--h-dark:#D4D4D4;--h-bellflower:#4C4F69">)</span>
			<span style="--h-daffodil:#797593;--h-dark:#D4D4D4;--h-bellflower:#7C7F93">;</span>
		</span>
		<span data-line="" data-line-highlight="">
			<span style="--h-daffodil:#286983;--h-dark:#569CD6;--h-bellflower:#8839EF;--h-daffodil-font-style:inherit;--h-dark-font-style:italic;--h-bellflower-font-style:inherit">const</span>
			<span style="--h-daffodil:#575279;--h-dark:#4FC1FF;--h-bellflower:#4C4F69;--h-daffodil-font-style:italic;--h-dark-font-style:inherit;--h-bellflower-font-style:inherit"> highlightedLine</span>
			<span style="--h-daffodil:#286983;--h-dark:#D4D4D4;--h-bellflower:#179299"> =</span>
			<span style="--h-daffodil:#D7827E;--h-dark:#569CD6;--h-bellflower:#FE640B;--h-daffodil-font-style:inherit;--h-dark-font-style:italic;--h-bellflower-font-style:inherit"> true</span>
			<span style="--h-daffodil:#797593;--h-dark:#D4D4D4;--h-bellflower:#7C7F93">;</span>
		</span>
	</code>
</pre>

03. Rendered:

console.log('hello world');
const highlightedLine = true;

When creating this website, I had a few DX requirements for the articles:

  • preserve all the tooling Svelte offers
  • write tables in Markdown
  • style dark code blocks with my personal VS Code theme
  • add new features (like Math) as needed
  • transform everything on the server
  • write the code as if it were on the client
  • not have to wait for compatibility when Svelte 5 is released

Evaluating the Options

0.1: Unified

Unified is an ecosystem of packages that all accept a standardized abstract syntax tree (AST). The uniformity means you can convert your language into an AST, run a pipeline of transformations, and then transform it into some other language. Originally, I thought of using the unified ecosystem to convert Markdown to an AST, use rehypePrettyCode in the middle for styling, and then convert it to HTML. This is a great solution for technical and non-technical users alike because of how easy Markdown is to write.

Something like this would work:

Unified Markdown Parser
// https://github.com/syntax-tree/unist#list-of-utilities unified AST utilities
// https://github.com/syntax-tree/mdast#list-of-utilities markdown AST utilities
// https://github.com/syntax-tree/hast#list-of-utilities html AST utilities
// https://github.com/remarkjs/remark/blob/main/doc/plugins.md remark: markdown AST plugins
// https://github.com/rehypejs/rehype/blob/main/doc/plugins.md rehype: html AST plugins

import matter from 'gray-matter';
import { rehypeAddCopyBtnToCodeTitle, remarkReadingTime, type RemarkReadingTimeData } from './plugins.js';
import fs from 'fs/promises';
import path from 'path';
import rehypeCodeTitles from 'rehype-code-titles';
import { default as rehypePrettyCode, type Theme } from 'rehype-pretty-code';
import rehypeSlug from 'rehype-slug';
import rehypeStringify from 'rehype-stringify';
import remarkGfm from 'remark-gfm';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import remarkSmartypants from 'remark-smartypants';
import remarkTableOfContents from 'remark-toc';
import { unified } from 'unified';
import { darker } from './code-themes/darker.js';
import type { Parsed } from './types.js';

type PluginData = RemarkReadingTimeData;

/** @throws Error */
export const mdToHTML = async <T extends object>(markdown: string) => {
	const split = matter(markdown);
	const rawMd = split.content;
	const frontMatter = split.data as T;

	const result = await unified()
		.use(remarkParse) // Convert Markdown string to Markdown AST
		.use(remarkGfm) // Use GitHub flavored Markdown
		.use(remarkSmartypants) // Convert ASCII to Unicode punctuation: “ ” – — …
		.use([[remarkTableOfContents, { tight: true }]]) // Generate TOC list from headings (tight removes <p> from <li> when nested)
		.use(remarkReadingTime) // Add reading time to result.data
		.use(remarkRehype) // Convert Markdown AST to HTML AST
		.use(rehypeSlug) // Add IDs to headings
		.use(rehypeCodeTitles) // Add titles to code blocks
		.use(rehypePrettyCode, { theme: { light: 'rose-pine-dawn', dark: darker as unknown as Theme } }) // Add code syntax, line/word highlighting, line numbers
		.use(rehypeAddCopyBtnToCodeTitle) // Add copy button to code blocks
		.use(rehypeStringify) // Convert HTML AST to HTML string
		.process(rawMd);

	const pluginData = result.data as PluginData;
	const readingTime = Math.ceil(pluginData.readingTime.minutes);

	return {
		rawHTML: result.value as string,
		data: { ...frontMatter, readingTime },
	};
};

/** @throws Error if the file cannot be read or parsed */
export async function parsePath<T extends object>(mdPath: { slug: string; path: string }): Promise<Parsed<T>> {
	const rawMdContent = await fs.readFile(mdPath.path, 'utf-8').catch((_err) => {
		throw new Error(`Unable to readFile ${mdPath.path}`);
	});
	return await mdToHTML<T>(rawMdContent).catch((_err) => {
		throw new Error(`Unable to parse ${mdPath.slug}`);
	});
}

/** @throws fs.readdir Error */
export async function parseDir<T extends object>(inDir: string): Promise<Parsed<T>[]> {
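	const mdPaths = (await fs.readdir(inDir, { withFileTypes: true }))
		.filter((dirent) => dirent.isDirectory()) // each post lives in its own <slug>/ folder
		.map((dirent) => ({
			slug: dirent.name,
			// join with inDir: dirent.path is deprecated in recent Node releases
			path: path.join(inDir, dirent.name, `${dirent.name}.md`),
		}));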

	const res = await Promise.all(
		mdPaths.map(async (p) => {
			try {
				return await parsePath<T>(p);
			} catch (err) {
				console.error(`\n${(err as Error).message}`);
				return null;
			}
		}),
	);
	return res.filter((r) => r !== null) as Parsed<T>[];
}

The largest drawback is that we'd be writing Markdown and forgoing the Svelte ecosystem entirely. Even if that were acceptable, inlining Svelte components would be cumbersome. Would we write multiple .md files, import and process them separately, and embed them into .svelte components? Or would we write Markdown and inject Svelte components in between?

0.2: MDsveX

MDsveX – a Markdown preprocessor for Svelte – solves the embedded Svelte problem. Preprocessors transform the input files before passing them to the Svelte compiler, and this particular preprocessor enables writing Markdown and Svelte in the same file.

The biggest concern at the time of my decision was that it hadn't been updated in 7 months and had over 150 open issues. That didn't inspire confidence that it would support Svelte 5 upon release, which weighed heavily on my choice: ideally, the technique we use would be resistant to changes in Svelte. The second concern was that MDsveX is a separate language, so it wouldn't just work with other Svelte tooling.

I really only want Markdown for three things: tables, code blocks, and math. The general idea of a preprocessor, however, is very appealing.

0.3: Preprocessors

We can easily add new features to Svelte by writing a small preprocessor for each one that wraps a dedicated package: Marked for Markdown, Shiki for code highlighting, and KaTeX for math.

The main requirement is to work inside .svelte files alongside the Svelte language server, Prettier, ESLint, etc. This means we need syntax that is ignored by all tooling. Luckily, we already have such a thing: the HTML comment, <!--  -->. Bonus points that it already has a shortcut keybinding and simply turns into a comment if the preprocessor isn't present. With this, we can decide upon some delimiters.
<!-- shiki-start
const foo = "bar";
shiki-end -->
<!-- md-start
# Markdown
md-end -->
<!--\[ V={4 \over 3}\pi r^{3} \] -->

Writing the Preprocessors

Markdown

We'll start with the Markdown preprocessor because it does nothing but wrap Marked. Writing a preprocessor that wraps an existing package is dead simple: we remove the code between our delimiters, process it, and put the result back in. There are two obvious ways to pull out the code: loop over the raw string content with indexOf, or walk the Svelte AST. Here is the AST version.

import { walk } from 'estree-walker';
import MagicString from 'magic-string';
import { marked as hostedMarked } from 'marked';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';

const delimiter = { start: 'md-start', end: 'md-end' };
const delimLoc = { start: delimiter.start.length + 1, end: -delimiter.end.length - 1 };

export function processMarkdown({
	include,
	logger,
	marked = hostedMarked,
}: {
	include?: (filename: string) => boolean;
	logger?: Logger;
	marked?: typeof hostedMarked;
} = {}) {
	return {
		name: 'md',
		markup({ content, filename }) {
			if (!filename) return;
			if (include && !include(filename)) return;

			try {
				const s = new MagicString(content);
				const ast = parse(content, { filename, modern: true });
				let count = 0;

				walk(ast.fragment, {
					enter(node: (typeof ast.fragment.nodes)[number]) {
						if (node.type !== 'Comment') return;
						const trimmed = node.data.trim();
						if (!trimmed.startsWith(delimiter.start)) return;
						if (!trimmed.endsWith(delimiter.end)) return;
						s.remove(node.start, node.end);
						s.appendLeft(node.start, marked(trimmed.slice(delimLoc.start, delimLoc.end), { async: false }) as string);
						count++;
					},
				});

				if (count) logger?.info?.({ count }, filename);
				return { code: s.toString() };
			} catch (err) {
				if (err instanceof Error) logger?.error?.(err, filename);
				else logger?.error?.(Error('Failed to render Markdown.'), filename);
			}
		},
	} satisfies PreprocessorGroup;
}
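For comparison, here is a minimal sketch of the indexOf approach. It is not the package's code: replaceDelimitedComments is a hypothetical name, and transform stands in for Marked. The catch is that plain string scanning can't tell a real template comment from a <!-- that happens to appear inside a <script> block or a string literal, which is exactly what walking the Svelte AST gives us for free.

```typescript
// Hypothetical indexOf-based extraction (a sketch, not @samplekit's implementation).
// `transform` stands in for marked; it receives the text between the delimiters.
const OPEN = '<!--';
const CLOSE = '-->';

export function replaceDelimitedComments(
	content: string,
	transform: (inner: string) => string,
	delims = { start: 'md-start', end: 'md-end' },
): { code: string; count: number } {
	let out = '';
	let cursor = 0;
	let count = 0;

	while (cursor < content.length) {
		const open = content.indexOf(OPEN, cursor);
		if (open === -1) break; // no more comments
		const close = content.indexOf(CLOSE, open + OPEN.length);
		if (close === -1) break; // unterminated comment: leave the rest untouched

		out += content.slice(cursor, open); // copy everything before the comment
		const inner = content.slice(open + OPEN.length, close).trim();
		if (inner.startsWith(delims.start) && inner.endsWith(delims.end)) {
			out += transform(inner.slice(delims.start.length, inner.length - delims.end.length).trim());
			count++;
		} else {
			out += content.slice(open, close + CLOSE.length); // unrelated comment: keep as-is
		}
		cursor = close + CLOSE.length;
	}

	return { code: out + content.slice(cursor), count };
}
```

Unrelated comments are copied through untouched, and an unterminated comment stops the scan instead of throwing.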

And we can now register it with Svelte.

svelte.config.js
import { processMarkdown, createMdLogger } from '@samplekit/preprocess-markdown';
import adapter from '@sveltejs/adapter-auto';
import { vitePreprocess } from '@sveltejs/vite-plugin-svelte';

const preprocessorRoot = `${import.meta.dirname}/src/routes/`;
const formatFilename = (/** @type {string} */ filename) => filename.replace(preprocessorRoot, '');
const include = (/** @type {string} */ filename) => filename.startsWith(preprocessorRoot);

/** @type {import('@sveltejs/kit').Config} */
const config = {
	preprocess: [
		processMarkdown({
			include,
			logger: createMdLogger(formatFilename),
		}),
		vitePreprocess(),
	],
	kit: {
		adapter: adapter(),
	},
};

export default config;

Hopefully it's clear how powerful this simple idea is. We could write some complicated logic to parse a file and determine where a table might start or end, but that would necessarily require us to work in a different language without the Svelte ecosystem. In my opinion, sacrificing Svelte tooling to use these preprocessors would be the wrong tradeoff.

Math

We'll do the same thing for math as we did for Markdown, but this time we'll want two delimiters: one for inline math like \( V={4 \over 3}\pi r^{3} \) and one for display math like this:

\begin{align}
\dot{x} & = \sigma(y-x) \\
\dot{y} & = \rho x - y - xz \\
\dot{z} & = -\beta z + xy
\end{align}

It's very similar to the AST approach above.

preprocess-katex
import { walk } from 'estree-walker';
import katex from 'katex';
import MagicString from 'magic-string';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';

const display = { start: String.raw`\[`, end: String.raw`\]` };
const inline = { start: String.raw`\(`, end: String.raw`\)` };
const delimLoc = { start: 3, end: -3 };

type RenderToString = (
	tex: string,
	options: {
		displayMode?: boolean | undefined;
		throwOnError: true;
		strict?: (errorCode: string, errorMsg: string) => 'ignore' | undefined;
	},
) => string;

export function processKatex({
	include,
	logger,
	renderToString = katex.renderToString,
}: {
	include?: (filename: string) => boolean;
	logger?: Logger;
	renderToString?: RenderToString;
} = {}) {
	return {
		name: 'katex',
		markup({ content, filename }) {
			if (!filename) return;
			if (include && !include(filename)) return;
			const s = new MagicString(content);
			const ast = parse(content, { filename, modern: true });
			let count = 0;

			walk(ast.fragment, {
				enter(node: (typeof ast.fragment.nodes)[number]) {
					if (node.type !== 'Comment') return;
					const trimmed = node.data.trim();

					let displayMode: boolean | undefined = undefined;
					if (trimmed.startsWith(display.start) && trimmed.endsWith(display.end)) displayMode = true;
					else if (trimmed.startsWith(inline.start) && trimmed.endsWith(inline.end)) displayMode = false;
					if (displayMode === undefined) return;

					s.remove(node.start, node.end);

					let parsed;
					try {
						const rawInput = String.raw`${trimmed.slice(delimLoc.start, delimLoc.end)}`;
						parsed = renderToString(rawInput, {
							displayMode,
							throwOnError: true,
						});
					} catch (err) {
						logger?.error?.(err instanceof Error ? err : Error('Failed to render KaTeX.'), filename);
						return;
					}

					const content = displayMode
						? `<div class="overflow-x-auto">{@html \`${parsed}\`}</div>`
						: `{@html \`${parsed}\`}`;
					s.appendLeft(node.start, content);
					count++;
				},
			});

			if (count) logger?.info?.({ count }, filename);

			return { code: s.toString() };
		},
	} satisfies PreprocessorGroup;
}
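One thing to watch: delimLoc = { start: 3, end: -3 } assumes exactly one whitespace character between each delimiter and the TeX, so <!--\[V\]--> would lose its content entirely. A slightly more forgiving sketch (extractMath is a hypothetical helper, not part of the package) strips the two-character delimiters and then trims whatever whitespace is left:

```typescript
// Hypothetical helper: classify a comment as display or inline math and extract the TeX.
// Unlike a fixed slice(3, -3), it tolerates any amount of whitespace (including none)
// between the delimiters and the TeX.
type MathMatch = { displayMode: boolean; tex: string };

const mathDelims: { start: string; end: string; displayMode: boolean }[] = [
	{ start: String.raw`\[`, end: String.raw`\]`, displayMode: true },
	{ start: String.raw`\(`, end: String.raw`\)`, displayMode: false },
];

export function extractMath(commentData: string): MathMatch | null {
	const trimmed = commentData.trim();
	for (const { start, end, displayMode } of mathDelims) {
		if (trimmed.startsWith(start) && trimmed.endsWith(end)) {
			return { displayMode, tex: trimmed.slice(start.length, trimmed.length - end.length).trim() };
		}
	}
	return null; // an ordinary comment
}
```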

Annoyingly, KaTeX logs directly to the console. To prevent this, let's wrap the call to KaTeX in a trap function.

preprocess-katex
import { walk } from 'estree-walker';
import katex from 'katex';
import MagicString from 'magic-string';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';

const display = { start: String.raw`\[`, end: String.raw`\]` };
const inline = { start: String.raw`\(`, end: String.raw`\)` };
const delimLoc = { start: 3, end: -3 };

// https://github.com/KaTeX/KaTeX/issues/3720
const catchStdErr = ({ tmpWrite, trappedFn }: { trappedFn: () => void; tmpWrite: (str: string) => boolean }) => {
	const write = process.stderr.write; // save the original stderr writer so `finally` can restore it
	try {
		process.stderr.write = tmpWrite;
		trappedFn();
	} finally {
		process.stderr.write = write;
	}
};

type RenderToString = (
	tex: string,
	options: {
		displayMode?: boolean | undefined;
		throwOnError: true;
		strict?: (errorCode: string, errorMsg: string) => 'ignore' | undefined;
	},
) => string;

export function processKatex({
	include,
	logger,
	renderToString = katex.renderToString,
}: {
	include?: (filename: string) => boolean;
	logger?: Logger;
	renderToString?: RenderToString;
} = {}) {
	return {
		name: 'katex',
		markup({ content, filename }) {
			if (!filename) return;
			if (include && !include(filename)) return;
			const s = new MagicString(content);
			const ast = parse(content, { filename, modern: true });
			let count = 0;

			walk(ast.fragment, {
				enter(node: (typeof ast.fragment.nodes)[number]) {
					if (node.type !== 'Comment') return;
					const trimmed = node.data.trim();

					let displayMode: boolean | undefined = undefined;
					if (trimmed.startsWith(display.start) && trimmed.endsWith(display.end)) displayMode = true;
					else if (trimmed.startsWith(inline.start) && trimmed.endsWith(inline.end)) displayMode = false;
					if (displayMode === undefined) return;

					s.remove(node.start, node.end);

					let parsed = ''; // assigned inside trappedFn below, so give it an explicit string type
					try {
						const rawInput = String.raw`${trimmed.slice(delimLoc.start, delimLoc.end)}`;
						const warns: Error[] = [];
						catchStdErr({
							trappedFn: () => {
								parsed = renderToString(rawInput, {
									displayMode,
									throwOnError: true,
								});
							},
							tmpWrite: (str) => {
								if (!str.startsWith('No character metrics for ')) warns.push(Error(str));
								return true;
							},
						});
						if (logger?.warn) {
							warns.forEach((err) => logger.warn?.(err, filename));
						}
					} catch (err) {
						logger?.error?.(err instanceof Error ? err : Error('Failed to render KaTeX.'), filename);
						return;
					}

					const content = displayMode
						? `<div class="overflow-x-auto">{@html \`${parsed}\`}</div>`
						: `{@html \`${parsed}\`}`;
					s.appendLeft(node.start, content);
					count++;
				},
			});

			if (count) logger?.info?.({ count }, filename);

			return { code: s.toString() };
		},
	} satisfies PreprocessorGroup;
}
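Patching process.stderr.write works, but it intercepts everything written to stderr during the call, not just KaTeX's warnings, and it depends on Node's process object. Assuming KaTeX emits these warnings through console.warn (which is how they end up on stderr in Node), a narrower alternative is to intercept console.warn itself. captureWarnings is a hypothetical helper sketching that idea:

```typescript
// Hypothetical helper: run `fn` with console.warn temporarily replaced, collecting
// warnings instead of printing them. Assumes the library warns via console.warn.
export function captureWarnings<T>(
	fn: () => T,
	ignore: (msg: string) => boolean = () => false,
): { result: T; warnings: string[] } {
	const original = console.warn;
	const warnings: string[] = [];
	console.warn = (...args: unknown[]) => {
		const msg = args.map(String).join(' ');
		if (!ignore(msg)) warnings.push(msg); // e.g. drop "No character metrics for ..." lines
	};
	try {
		return { result: fn(), warnings };
	} finally {
		console.warn = original; // restore even if fn throws
	}
}
```

In the preprocessor, the trapped call would become something like captureWarnings(() => renderToString(rawInput, { displayMode, throwOnError: true }), (msg) => msg.startsWith('No character metrics for ')), with each collected string wrapped in an Error before being handed to logger.warn.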

Looks good. But what about reactivity? It's quite possible you'll want to use Svelte's {expression} substitution inside the equations. If we simply used it as is, KaTeX would choke on the syntax, so we'll have to make a special TeX-like macro for Svelte. \s seems as good as any: writing \s{expression} marks a Svelte expression. The preprocessor should take the Svelte content out, replace it with unique single-character placeholders, process the TeX, and then put the Svelte content back in.

Svelte Reactivity
// the nuts and bolts of it
const { svelteFreeString, extractedSvelteContent } = replaceSvelteAndStore(rawInput);
const mathString = katex.renderToString(svelteFreeString)
const parsed = restoreSvelte(mathString, extractedSvelteContent);

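We'll use characters from Unicode's Control Pictures block (␇ through ␠) as our storage placeholders, which allows up to 26 substitutions per equation. KaTeX doesn't recognize these characters, which is why the strict handler below ignores its "Unrecognized Unicode character" complaints.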

preprocess-katex
import { walk } from 'estree-walker';
import katex from 'katex';
import MagicString from 'magic-string';
import { parse, type PreprocessorGroup } from 'svelte/compiler';
import type { Logger } from './logger.js';

const display = { start: String.raw`\[`, end: String.raw`\]` };
const inline = { start: String.raw`\(`, end: String.raw`\)` };
const delimLoc = { start: 3, end: -3 };

// https://github.com/KaTeX/KaTeX/issues/3720
const catchStdErr = ({ tmpWrite, trappedFn }: { trappedFn: () => void; tmpWrite: (str: string) => boolean }) => {
	const write = process.stderr.write; // save the original stderr writer so `finally` can restore it
	try {
		process.stderr.write = tmpWrite;
		trappedFn();
	} finally {
		process.stderr.write = write;
	}
};

const unicodeInsertionPlaceholders = [
	'␇',
	'␈',
	'␉',
	'␊',
	'␋',
	'␌',
	'␍',
	'␎',
	'␏',
	'␐',
	'␑',
	'␒',
	'␓',
	'␔',
	'␕',
	'␖',
	'␗',
	'␘',
	'␙',
	'␚',
	'␛',
	'␜',
	'␝',
	'␞',
	'␟',
	'␠',
];

function replaceSvelteAndStore(input: string): { svelteFreeString: string; extractedSvelteContent: string[] } {
	const extractedSvelteContent: string[] = [];
	let index = 0;
	const svelteFreeString = input.replace(/\\s\{([^}]*)\}/g, (_match, p1) => {
		if (index >= unicodeInsertionPlaceholders.length) throw new Error('Too many variable substitutions.');
		extractedSvelteContent.push(p1);
		const unicodePlaceholder = unicodeInsertionPlaceholders[index];
		index++;
		return `{` + unicodePlaceholder + `}`;
	});

	return { svelteFreeString, extractedSvelteContent };
}

function restoreSvelte(mathString: string, extractedSvelteContent: string[]): string {
	if (!extractedSvelteContent.length) return mathString;

	const unicodeMap = new Map();
	extractedSvelteContent.forEach((content, i) => {
		unicodeMap.set(unicodeInsertionPlaceholders[i], content);
	});

	const unicodePlaceholderRegex = new RegExp(`(${unicodeInsertionPlaceholders.join('|')})`, 'g');

	return mathString.replaceAll(unicodePlaceholderRegex, (placeholder) => {
		const svelteContent = unicodeMap.get(placeholder);
		return `\${${svelteContent}}`;
	});
}

type RenderToString = (
	tex: string,
	options: {
		displayMode?: boolean | undefined;
		throwOnError: true;
		strict: (errorCode: string, errorMsg: string) => 'ignore' | undefined;
	},
) => string;

export function processKatex({
	include,
	logger,
	renderToString = katex.renderToString,
}: {
	include?: (filename: string) => boolean;
	logger?: Logger;
	renderToString?: RenderToString;
} = {}) {
	return {
		name: 'katex',
		markup({ content, filename }) {
			if (!filename) return;
			if (include && !include(filename)) return;
			const s = new MagicString(content);
			const ast = parse(content, { filename, modern: true });
			let count = 0;

			walk(ast.fragment, {
				enter(node: (typeof ast.fragment.nodes)[number]) {
					if (node.type !== 'Comment') return;
					const trimmed = node.data.trim();

					let displayMode: boolean | undefined = undefined;
					if (trimmed.startsWith(display.start) && trimmed.endsWith(display.end)) displayMode = true;
					else if (trimmed.startsWith(inline.start) && trimmed.endsWith(inline.end)) displayMode = false;
					if (displayMode === undefined) return;

					s.remove(node.start, node.end);

					let parsed = ''; // assigned inside trappedFn below, so give it an explicit string type
					try {
						const rawInput = String.raw`${trimmed.slice(delimLoc.start, delimLoc.end)}`;
						const { svelteFreeString, extractedSvelteContent } = replaceSvelteAndStore(rawInput);
						const warns: Error[] = [];
						catchStdErr({
							trappedFn: () => {
								const mathString = renderToString(svelteFreeString, {
									displayMode,
									throwOnError: true,
									strict: (errorCode: string, errorMsg: string) => {
										if (errorCode === 'unknownSymbol' && errorMsg.startsWith('Unrecognized Unicode character'))
											return 'ignore';
									},
								});
								parsed = restoreSvelte(mathString, extractedSvelteContent);
							},
							tmpWrite: (str) => {
								if (!str.startsWith('No character metrics for ')) warns.push(Error(str));
								return true;
							},
						});
						if (logger?.warn) {
							warns.forEach((err) => logger.warn?.(err, filename));
						}
					} catch (err) {
						logger?.error?.(err instanceof Error ? err : Error('Failed to render KaTeX.'), filename);
						return;
					}
					const content = displayMode
						? `<div class="overflow-x-auto">{@html \`${parsed}\`}</div>`
						: `{@html \`${parsed}\`}`;
					s.appendLeft(node.start, content);
					count++;
				},
			});

			if (count) logger?.info?.({ count }, filename);

			return { code: s.toString() };
		},
	} satisfies PreprocessorGroup;
}
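To make the round trip concrete, here is a self-contained sketch of the same idea with a stand-in renderer instead of KaTeX. toSvelteMarkup and the three-entry placeholder list are hypothetical simplifications; the point is the shape of the output. Each \s{expr} comes back as ${expr} inside the {@html `...`} template literal, so Svelte re-evaluates it when its dependencies change.

```typescript
// Simplified, self-contained version of the \s{...} round trip (hypothetical names).
// `render` stands in for katex.renderToString.
const placeholders = ['␇', '␈', '␉'];

export function toSvelteMarkup(tex: string, render: (tex: string) => string): string {
	const stored: string[] = [];
	// 1. Swap each \s{expr} for a TeX group holding a unique placeholder character.
	const svelteFree = tex.replace(/\\s\{([^}]*)\}/g, (_match, expr: string) => {
		if (stored.length >= placeholders.length) throw new Error('Too many variable substitutions.');
		const placeholder = placeholders[stored.length];
		stored.push(expr);
		return `{${placeholder}}`;
	});
	// 2. Render the Svelte-free TeX.
	let html = render(svelteFree);
	// 3. Put each expression back as ${expr}; a replacer function avoids `$` patterns in expr.
	stored.forEach((expr, i) => {
		html = html.replaceAll(placeholders[i], () => `\${${expr}}`);
	});
	// 4. Wrap in a template literal so Svelte re-evaluates the expressions reactively.
	return `{@html \`${html}\`}`;
}
```

Because the capture group is [^}]*, an expression can't contain a closing brace, so something like \s{fmt({ digits: 2 })} would be cut short.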

Easy – two down and one to go!

Code Decoration

We've saved the best for last. Fundamentally, this preprocessor doesn't need to differ from the other two, but simply wrapping Shiki without the ability to customize decorations would be an injustice. So we'll add the ability to apply data attributes and classes using Shiki's decorations API. Ideally, we could target the <pre> tag, line ranges, index ranges, substrings, etc.
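As a rough illustration of what a substring option boils down to, the sketch below turns every occurrence of a substring into a decoration item. The Decoration type is a local stand-in modeled on the start/end character offsets plus properties that Shiki's decorations option (Shiki 1.x) accepts, and substringDecorations is a hypothetical helper, not the package's implementation.

```typescript
// Local type modeled on a Shiki decoration item: character offsets into the code
// plus properties for the wrapping element. Defined here to keep the sketch self-contained.
type Decoration = { start: number; end: number; properties: Record<string, string> };

/** Hypothetical helper: decorate every non-overlapping occurrence of `substring` with `className`. */
export function substringDecorations(code: string, substring: string, className: string): Decoration[] {
	const out: Decoration[] = [];
	if (!substring) return out; // an empty needle would match everywhere
	for (let start = code.indexOf(substring); start !== -1; start = code.indexOf(substring, start + substring.length)) {
		out.push({ start, end: start + substring.length, properties: { class: className } });
	}
	return out;
}
```

The resulting array could then be passed as decorations to codeToHtml. Matches never overlap, so searching for "aa" in "aaa" yields a single item.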

We'll need a place for our preprocessor to look for the options. If we look back at our delimiter syntax, we have an obvious place: between the front delimiter and code fence. For convenience, let's also allow line options to be at the end of the line they're scoped to.

<!-- shiki-start
s"foo" c"border border-accent-9"
```ts
const foo = "bar";
const added = true;//! d"diff-add"
```
shiki-end -->
const foo = "bar";
const added = true;
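The option syntax itself is a run of key"value" pairs. A minimal tokenizer could look like the sketch below; it assumes keys are lowercase letters and values never contain a double quote, and tokenizeOptions is hypothetical. It deliberately leaves the meaning of each key to the real parser.

```typescript
// Hypothetical tokenizer for option strings such as `s"foo" c"border border-accent-9"`.
// Assumes keys are lowercase letters and values contain no double quotes; interpreting
// each key (substring, class, data attribute, line, ...) is left to the real parser.
export type OptionPair = { key: string; value: string };

export function tokenizeOptions(optionString: string): OptionPair[] {
	const pairs: OptionPair[] = [];
	for (const match of optionString.matchAll(/([a-z]+)"([^"]*)"/g)) {
		pairs.push({ key: match[1], value: match[2] });
	}
	return pairs;
}
```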

Because we're supporting options, this preprocessor will be more involved than the other two. We'll first need to split the raw string into the global options, inline options, code fence, language, and code. Then we can pass everything off to a function that calls Shiki's codeToHtml and applies Shiki decorations based on the extracted options.

import { walk } from 'estree-walker';
import MagicString from 'magic-string';
import { parse } from 'svelte/compiler';
import { getOrLoadOpts } from './defaultOpts.js';
import { codeToDecoratedHtmlSync } from './highlight.js';
import { stripOptions } from './strip-options/index.js';
import type { PreprocessOpts, Logger, PreprocessorGroup } from './types.js';

export function processCodeblockSync({
	opts,
	include,
	logger,
}: {
	include?: (filename: string) => boolean;
	logger?: Logger;
	opts: PreprocessOpts;
}) {
	return {
		name: 'codeblock',
		markup({ content, filename }) {
			if (!filename) return;
			if (include && !include(filename)) return;
			const s = new MagicString(content);
			const ast = parse(content, { filename, modern: true });
			let count = 0;

			walk(ast.fragment, {
				enter(node) {
					if (node.type !== 'Comment') return;
					const trimmed = node.data.trim();
					if (!trimmed.startsWith(opts.delimiters.common)) return;

					if (trimmed.endsWith(opts.delimiters.fenced.end)) {
						s.remove(node.start, node.end);

						const prepared = stripOptions(
							// escapePreprocessor allows us to write things like --> which would
							// otherwise terminate the html comment surrounding our preprocessor
							opts.escapePreprocessor({
								code: trimmed.slice(opts.delimiters.fenced.startLoc, opts.delimiters.fenced.endLoc),
							}),
							(e: Error) => logger?.warn?.(e, filename),
						);
						if (prepared instanceof Error) {
							return logger?.error?.(prepared, filename);
						}

						const {
							lang,
							lineToProperties,
							tranName, // we haven't discussed this, but this allows us to register custom transform functions
							preProperties,
							strippedCode,
							windowProperties,
							allLinesProperties,
						} = prepared;
						if (!opts.highlighterCore.getLoadedLanguages().includes(lang)) {
							return logger?.error?.(
								Error(
									lang
										? `Language ${lang} not loaded. Hint: try \`opts: await getOrLoadOpts({ langNames: ['${lang}'] })\` and restart the server.`
										: 'No lang provided.',
								),
								filename,
							);
						}

						let transformName = 'block';
						if (tranName && opts.transformMap[tranName]) {
							transformName = tranName;
						} else if (tranName !== null) {
							const keys = Object.keys(opts.transformMap);
							logger?.warn?.(
								Error(`${tranName} not in opts.transformMap. Defaulting to 'block'. Options: (${keys.join(', ')}).`),
								filename,
							);
						}

						const { error, data } = codeToDecoratedHtmlSync({
							opts,
							lineToProperties,
							allLinesProperties,
							preProperties,
							windowProperties,
							code: strippedCode,
							transformName,
							lang,
						});
						if (error) {
							return logger?.error?.(error, filename);
						}

						s.appendLeft(node.start, data);
						count++;
						return;
					}

					for (const { delimLoc, delimiter, lang } of opts.delimiters.inline) {
						if (trimmed.endsWith(delimiter)) {
							s.remove(node.start, node.end);

							const { error, data } = codeToDecoratedHtmlSync({
								code: opts.escapePreprocessor({ code: trimmed.slice(delimLoc.start, delimLoc.end) }),
								lang,
								opts,
								transformName: 'inline',
							});
							if (error) {
								return logger?.error?.(error, filename);
							}

							s.appendLeft(node.start, data);
							count++;
							break;
						}
					}
				},
			});

			if (count) {
				logger?.info?.({ count }, filename);
			}

			return { code: s.toString() };
		},
	} satisfies PreprocessorGroup;
}

Creating the Shiki highlighter is async, so we'll wrap the previous function in another that awaits the default options containing the Shiki highlighter.

export async function processCodeblock({
	include,
	logger,
}: {
	include?: (filename: string) => boolean;
	logger?: Logger;
}) {
	return processCodeblockSync({ include, logger, opts: await getOrLoadOpts() });
}
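getOrLoadOpts isn't shown here. Judging by its name, it loads the highlighter once and reuses it afterwards; a generic sketch of that caching pattern (getOrLoad is hypothetical, not the package's code) looks like this:

```typescript
// Hypothetical "get or load" cache: the first call starts the async load, and later
// calls (even concurrent ones) reuse the same promise instead of loading again.
export function getOrLoad<T>(load: () => Promise<T>): () => Promise<T> {
	let cached: Promise<T> | undefined;
	return () => {
		if (!cached) {
			cached = load().catch((err) => {
				cached = undefined; // allow a retry after a failed load
				throw err;
			});
		}
		return cached;
	};
}
```

Caching the promise rather than the resolved value is what makes concurrent callers share a single load.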

For brevity, the details behind the options extraction have been omitted, but the actual preprocessor part is very similar to the other two. Full functionality details are available in the docs.
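To give a feel for the omitted part, here is a rough sketch of a first splitting step under assumed rules: option lines come before the opening fence, the language follows the fence, and per-line options follow a trailing //!. splitBlock is hypothetical and much simpler than the real stripOptions (no escaping, no validation of option keys); like stripOptions in the code above, it returns an Error instead of throwing.

```typescript
// Hypothetical first pass of option extraction (much simpler than the real stripOptions).
// Assumed rules: option lines precede the opening fence, the language follows the fence,
// and per-line options follow a trailing `//!` on a code line.
const FENCE = '`'.repeat(3); // built at runtime to avoid a literal fence in this listing

export type SplitBlock = {
	optionLines: string[];
	lang: string;
	lines: { code: string; rawOptions: string | null }[];
};

export function splitBlock(inner: string): SplitBlock | Error {
	const rows = inner.split('\n');
	const open = rows.findIndex((row) => row.trimStart().startsWith(FENCE));
	if (open === -1) return Error('No opening code fence.');
	const close = rows.findIndex((row, i) => i > open && row.trim() === FENCE);
	if (close === -1) return Error('No closing code fence.');

	const lines = rows.slice(open + 1, close).map((row) => {
		const idx = row.lastIndexOf('//!'); // naive: a `//!` inside a string literal would also match
		return idx === -1
			? { code: row, rawOptions: null }
			: { code: row.slice(0, idx).trimEnd(), rawOptions: row.slice(idx + 3).trim() };
	});

	return {
		optionLines: rows.slice(0, open).map((row) => row.trim()).filter(Boolean),
		lang: rows[open].trim().slice(FENCE.length).trim(),
		lines,
	};
}
```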

Code Editor Support

We've finished our preprocessors, but they're hard to use without syntax highlighting in our code editor. Everything is formatted like a comment. Let's fix that by writing a VS Code extension!

Overview

The best way to do this would be to follow the official VS Code extension docs, which show how to use Yeoman to scaffold an extension. These are simple extensions, though, so we'll just write them directly instead.

We need to write a package.json with a contributes field containing our grammars and snippets. When we're ready, we'll run vsce package to bundle the extension and its metadata into a .vsix file. Then we can install it with code --install-extension <extension-name>.vsix, which adds it to the ~/.vscode/extensions folder (or wherever your install location is).

Syntaxes

Let's use the Markdown preprocessor as an example. We'll create a repo for the VS Code extension and add a package.json. In it, we'll add a section to point VS Code to the grammar declaration file.

package.json
{
	"contributes": {
		"grammars": [
			{
				"scopeName": "md.pp-svelte",
				"injectTo": [
					"source.svelte"
				],
				"path": "./syntaxes/md.pp-svelte.tmLanguage.json"
			}
		]
	}
}
VS Code uses TextMate grammars. If you open the VS Code command palette and run Developer: Inspect Editor Tokens and Scopes, you can see how things are highlighted. Every token lives in a TextMate scope, and your VS Code theme targets those scopes for decorating. Our job is to tell VS Code, "When you see <!-- md-start md-end --> in a Svelte template, open our custom scope, highlight the delimiters like control keywords, and highlight the internals like Markdown." This is how we do it:
./syntaxes/md.pp-svelte.tmLanguage.json
{
	"name": "Svelte Component Markdown Injection",
	// the name matches what we put in the package.json
	"scopeName": "md.pp-svelte",
	// only match within the svelte scope
	"injectionSelector": "L:source.svelte",
	"fileTypes": [],
	"patterns": [
		{
			// start with the HTML comment opening. Use an oniguruma lookahead to only match if the next thing is "md-start"
			"begin": "<!--(?=\\s*md-start)",
			// end with the HTML comment closing. Use an oniguruma lookbehind to only match if the last thing matched was "md-end"
			"end": "(?<=.*?md-end)\\s*-->",
			// give the entire match our custom scope
			"name": "source.pp-svelte",
			"patterns": [
				{
					// inside the match, mark the md-start as a keyword
					"begin": "md-start",
					"beginCaptures": { "0": { "name": "keyword.control.pp-svelte" } },
					// also match the end as a keyword
					"end": ".*?(md-end)",
					"endCaptures": { "1": { "name": "keyword.control.pp-svelte" } },
					// give everything within the <!-- and --> our custom markdown scope
					"name": "markdown.pp-svelte",
					// give everything within the md-start and md-end delimiters the real markdown scope
					"contentName": "text.html.markdown",
					// actually use the markdown scope for everything between the delimiters
					"patterns": [{ "include": "text.html.markdown" }]
				}
			]
		}
	]
}

Although this particular file is fairly straightforward, Oniguruma syntax can be... difficult. Writing it in JSON is even more tedious because it requires double escaping. regex101 and this decade-old blog post by Matt Neuburg are useful shields to defend yourself against the beast that is Oniguruma regex.
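One cheap smoke test before packaging is to run the begin and end patterns through JavaScript's RegExp, which supports the lookahead and lookbehind used here. This is only an approximation (JavaScript is not Oniguruma), it assumes the end pattern is meant to close on a literal -->, and checkGrammar is a hypothetical helper that, like TextMate, matches one line at a time.

```typescript
// Smoke test for the injection grammar's patterns using JavaScript's RegExp.
// JS is not Oniguruma; this only checks the lookahead/lookbehind logic roughly.
// JSON's "\\s" is the regex \s, so the literals below use single backslashes.
const beginRe = /<!--(?=\s*md-start)/;
const endRe = /(?<=.*?md-end)\s*-->/;

/** Hypothetical helper: first line matching `begin`, then first line from there matching `end`. */
export function checkGrammar(lines: string[]): { beginLine: number; endLine: number } {
	// TextMate applies patterns one line at a time, so test line by line.
	const beginLine = lines.findIndex((line) => beginRe.test(line));
	if (beginLine === -1) return { beginLine, endLine: -1 };
	const endLine = lines.findIndex((line, i) => i >= beginLine && endRe.test(line));
	return { beginLine, endLine };
}
```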

Snippets

It would get very old very fast if we needed to write <!-- md-start md-end --> all the time. Let's write a VS Code snippet so we can write mds instead. Back in our package.json, we can point to a snippets file.
package.json
{
	"contributes": {
		"grammars": [
			{
				"scopeName": "md.pp-svelte",
				"injectTo": [
					"source.svelte"
				],
				"path": "./syntaxes/md.pp-svelte.tmLanguage.json"
			}
		],
		"snippets": [
			{
				"language": "svelte",
				"path": "./snippets/md.pp-svelte.code-snippets"
			}
		]
	}
}

And write it just like a normal VS Code snippets file.

./snippets/md.pp-svelte.code-snippets
{
	"New Markdown block": {
		"scope": "text.svelte",
		"prefix": "md-start",
		"description": "Creates a new markdown block.",
		"body": ["<!-- md-start", "$1", "md-end -->", "$0"],
	},
}

Voilà. Now rinse and repeat for the other two – though they're all subtly different. Find the source code here.

Conclusion

These three preprocessors and their VS Code extensions are available on npm and the VS Code Marketplace. Full documentation here.

| NPM Package | VS Code Extension |
| --- | --- |
| @samplekit/preprocess-katex | samplekit.svelte-pp-katex |
| @samplekit/preprocess-markdown | samplekit.svelte-pp-markdown |
| @samplekit/preprocess-shiki | samplekit.svelte-pp-shiki |

Svelte preprocessors are quite powerful, and by using the HTML comment delimiters, we can have our preprocessors and standard tooling too. I'm grateful to MDsveX, svelte-put, and Melt UI for introducing them to me, and I hope this article helps you get started with your own preprocessors. If you have a question or want to share your preprocessor, share it in the GitHub discussions!

Happy coding!

Changelog

  • Update to use @samplekit preprocessors.

  • Add formatLogFilename.

  • Expand processor syntax beyond highlighting.

Have a suggestion? File an issue.