SwiftMarkItDown is the start of a native Swift/iOS document-to-Markdown pipeline inspired by Microsoft MarkItDown. The repository is structured around a Swift Package so the same core can be embedded in an iOS app, a macOS utility, or a server-side Swift service. The repo also includes a minimal SwiftUI demo app that exercises the package on iOS.
The current MVP is intentionally small and deterministic. These formats have converters in the default MarkItDown pipeline:

| Input family | Extensions / aliases | Content-type hints | Conversion behavior | Platform availability |
|---|---|---|---|---|
| Plain text | `txt`, `text` | `text/plain` | Decodes text and normalizes blank lines. | All package platforms. |
| Markdown | `md`, `markdown` | `text/markdown`, `text/x-markdown` | Treats Markdown as text-like input and normalizes blank lines. | All package platforms. |
| HTML | `html`, `htm` | `text/html`, `application/xhtml+xml` | Converts common headings, inline emphasis, links, code, paragraphs, and list items; ignores document `<head>`, script, and style content. | All package platforms. |
| CSV | `csv` | `text/csv`, `application/csv` | Converts rows to GitHub-Flavored Markdown tables, including quoted fields and escaped pipes. | All package platforms. |
| JSON | `json` | `application/json`, `text/json` | Converts objects and arrays to nested Markdown bullets with stable key ordering. | All package platforms. |
| Images | `png`, `jpg`, `jpeg`, `heic`, `heif`, `tif`, `tiff`, `gif` | `image/png`, `image/jpeg`, `image/heic`, `image/heif`, `image/tiff`, `image/gif` | Uses Apple Vision OCR and returns recognized text lines as Markdown text. GIF OCR uses the decoded first image. | Apple platforms that provide Vision, CoreGraphics, and ImageIO. Other platforms recognize the formats but return `unsupportedFormat`. |
| PDF | `pdf` | `application/pdf` | Uses Apple PDFKit to extract embedded page text into Markdown paragraphs and records page-count metadata. | Apple platforms that provide PDFKit. Other platforms recognize PDF but return `unsupportedFormat`. |

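
The extension and content-type columns above amount to a lookup from a file name or MIME hint to an input family. The following is a minimal sketch of such a lookup, not the package's actual format model: `InputFamily`, `familiesByExtension`, `familiesByContentType`, and `inputFamily(fileName:contentType:)` are hypothetical names, and checking the extension before the content type is an assumption.

```swift
import Foundation

// Hypothetical enum for illustration; the package's real format model may differ.
enum InputFamily: String {
    case plainText, markdown, html, csv, json, image, pdf
}

// Extension aliases copied from the support table.
let familiesByExtension: [String: InputFamily] = [
    "txt": .plainText, "text": .plainText,
    "md": .markdown, "markdown": .markdown,
    "html": .html, "htm": .html,
    "csv": .csv,
    "json": .json,
    "png": .image, "jpg": .image, "jpeg": .image, "heic": .image,
    "heif": .image, "tif": .image, "tiff": .image, "gif": .image,
    "pdf": .pdf,
]

// Content-type hints copied from the support table.
let familiesByContentType: [String: InputFamily] = [
    "text/plain": .plainText,
    "text/markdown": .markdown, "text/x-markdown": .markdown,
    "text/html": .html, "application/xhtml+xml": .html,
    "text/csv": .csv, "application/csv": .csv,
    "application/json": .json, "text/json": .json,
    "image/png": .image, "image/jpeg": .image, "image/heic": .image,
    "image/heif": .image, "image/tiff": .image, "image/gif": .image,
    "application/pdf": .pdf,
]

// Try the file extension first, then fall back to the content-type hint.
// This precedence is an assumption, not documented package behavior.
func inputFamily(fileName: String?, contentType: String?) -> InputFamily? {
    if let fileName = fileName {
        let ext = URL(fileURLWithPath: fileName).pathExtension.lowercased()
        if let family = familiesByExtension[ext] { return family }
    }
    if let contentType = contentType {
        // Drop parameters such as "; charset=utf-8" before the lookup.
        let mime = contentType.split(separator: ";").first
            .map { $0.trimmingCharacters(in: .whitespaces).lowercased() } ?? ""
        return familiesByContentType[mime]
    }
    return nil
}
```

A call such as `inputFamily(fileName: "scan.JPEG", contentType: nil)` resolves to the image family in this sketch, while an unlisted extension with no content type yields `nil`.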
The package also includes:
- `txt` and `md` passthrough with text decoding and blank-line cleanup.
- `html` to Markdown for common headings, inline emphasis, links, code, paragraphs, and list items, with document `<head>`, script, and style content ignored.
- `csv` to GitHub-Flavored Markdown tables, including quoted fields and escaped pipes.
- `json` to nested Markdown bullets with stable key ordering.
- Apple-platform image OCR for `png`, `jpg`/`jpeg`, `heic`, `tiff`, and `gif` inputs using Vision text recognition, returning recognized lines as Markdown text.
- Apple-platform PDF text extraction for embedded-text PDFs using PDFKit.
- A CLI wrapper for local/manual conversion checks.
- A SwiftUI iOS demo app for editing sample input and converting it to Markdown in the simulator.
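
To illustrate what "quoted fields and escaped pipes" means for the CSV converter listed above, here is a small self-contained sketch. It is not the package's implementation: `parseCSV` and `markdownTable(fromCSV:)` are hypothetical helpers, the first row is assumed to be the header, and newlines embedded inside quoted fields are not treated specially when rendering.

```swift
import Foundation

// Split CSV text into rows of fields, honoring double-quoted fields
// and doubled quotes ("") inside them.
func parseCSV(_ text: String) -> [[String]] {
    var rows: [[String]] = []
    var row: [String] = []
    var field = ""
    var inQuotes = false
    let chars = Array(text)
    var i = 0

    while i < chars.count {
        let char = chars[i]
        if inQuotes {
            if char == "\"" {
                // A doubled quote is a literal quote; a single one closes the field.
                if i + 1 < chars.count, chars[i + 1] == "\"" {
                    field.append("\"")
                    i += 1
                } else {
                    inQuotes = false
                }
            } else {
                field.append(char)
            }
        } else if char == "\"" {
            inQuotes = true
        } else if char == "," {
            row.append(field)
            field = ""
        } else if char.isNewline {
            row.append(field)
            field = ""
            rows.append(row)
            row = []
        } else {
            field.append(char)
        }
        i += 1
    }
    // Flush the final row when the input lacks a trailing newline.
    if !field.isEmpty || !row.isEmpty {
        row.append(field)
        rows.append(row)
    }
    return rows
}

// Render parsed rows as a GitHub-Flavored Markdown table,
// treating the first row as the header (an assumption of this sketch).
func markdownTable(fromCSV text: String) -> String {
    let rows = parseCSV(text)
    guard let header = rows.first else { return "" }
    let columnCount = header.count

    func tableLine(_ cells: [String]) -> String {
        // Pad or truncate so every line has the header's cell count,
        // and escape pipes so they do not split a cell.
        let padding = Array(repeating: "", count: max(0, columnCount - cells.count))
        let escaped = (cells + padding).prefix(columnCount).map {
            $0.replacingOccurrences(of: "|", with: "\\|")
        }
        return "| " + escaped.joined(separator: " | ") + " |"
    }

    let separator = tableLine(Array(repeating: "---", count: columnCount))
    let body = rows.dropFirst().map(tableLine)
    return ([tableLine(header), separator] + body).joined(separator: "\n")
}
```

In this sketch a quoted field such as `"Smith, J"` stays in one cell despite its comma, and a literal `|` inside a field is emitted as `\|`.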
Image OCR is available when the package is built on platforms that provide Vision, CoreGraphics, and ImageIO. On other platforms, image formats are recognized but return unsupportedFormat.
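
The platform gating described above can be sketched with Swift's `#if canImport(...)` conditions. This is an illustration under stated assumptions rather than the package's code: `SketchError` and `recognizedText(inImage:)` are hypothetical names (only the case name `unsupportedFormat` comes from this README), and the Vision branch assumes a recent Apple SDK and a deployment target where `VNRecognizeTextRequest` is available.

```swift
import Foundation
#if canImport(Vision) && canImport(CoreGraphics) && canImport(ImageIO)
import Vision
#endif

// Hypothetical error type for this sketch; only the case name
// `unsupportedFormat` is taken from the README text.
enum SketchError: Error {
    case unsupportedFormat
}

// Returns recognized text lines joined with newlines on platforms that
// ship the required frameworks, and throws `unsupportedFormat` elsewhere.
func recognizedText(inImage data: Data) throws -> String {
    #if canImport(Vision) && canImport(CoreGraphics) && canImport(ImageIO)
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(data: data, options: [:])
    // Vision throws here if the data cannot be decoded as an image.
    try handler.perform([request])
    // Keep the best candidate string for each recognized text observation.
    let lines = (request.results ?? []).compactMap { observation in
        observation.topCandidates(1).first?.string
    }
    return lines.joined(separator: "\n")
    #else
    throw SketchError.unsupportedFormat
    #endif
}
```

Because the check happens at compile time, the same source builds on Linux or other non-Apple targets without referencing Vision symbols, and callers only need to handle the thrown error.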
DOCX, PPTX, and XLSX are represented in the format model but still return unsupportedFormat until their native converter modules are implemented. PDF conversion is implemented on Apple platforms that provide PDFKit; non-Apple platforms still return unsupportedFormat for PDF.

    Package.swift
    SwiftMarkItDownApp.xcodeproj/    Xcode project for the iOS demo app
    App/
      SwiftMarkItDownApp/            SwiftUI app target that imports the package
    Sources/
      SwiftMarkItDown/               Core library and converter protocols
      swift-markitdown/              Minimal CLI wrapper around the library
    Tests/
      SwiftMarkItDownTests/          Core conversion tests
      Fixtures/                      CLI smoke-test inputs
      Expected/                      CLI smoke-test expected Markdown


    import Foundation
    import SwiftMarkItDown

    let input = Data("<h1>Hello</h1><p>Native Swift</p>".utf8)
    let request = ConversionRequest(data: input, fileName: "example.html")
    let document = try MarkItDown().convert(request)
    print(document.markdown)

See `Docs/ImportButtonIntegration.md` for SwiftUI `fileImporter` and `PhotosPicker` examples that wire an app's Import button into SwiftMarkItDown, including image OCR inputs.


    swift run swift-markitdown path/to/file.html

Run the unit test suite and CLI fixture smoke tests before opening a PR:

    swift test
    Scripts/smoke-test.sh

GitHub Actions runs the same checks on pushes to `main`, pull requests, and manual workflow dispatches.

SwiftMarkItDown is available under the MIT License.
- Expand the text/HTML/CSV/JSON converters with richer Markdown normalization and metadata extraction.
- Improve OCR layout reconstruction for headings, lists, tables, and multi-column scans.
- Add OCR fallback for scanned/image-only PDF pages on Apple platforms.
- Add a ZIP/OpenXML package reader as shared infrastructure for DOCX, PPTX, and XLSX.
- Implement DOCX paragraph, heading, table, hyperlink, and image-reference extraction.
- Evolve the demo into a more complete iOS MVP with document picker import, share/export flows, progress reporting, and a pluggable backend escape hatch for heavyweight conversions.