User Guide

Basic Usage

List available scanners and scan a document:

import scanlib

# Discover scanners
scanners = scanlib.list_scanners()
for s in scanners:
    print(s)  # e.g. "2nd Floor" or "HP Officejet Pro 8500"

# Scan a document
with scanners[0] as scanner:
    doc = scanner.scan()

# doc.data contains PDF bytes
with open("output.pdf", "wb") as f:
    f.write(doc.data)

Scan Options

Customize the scan with keyword arguments:

from scanlib import ColorMode, ScanArea, ScanSource

with scanners[0] as scanner:
    doc = scanner.scan(
        dpi=600,
        color_mode=ColorMode.GRAY,
        scan_area=ScanArea(0, 0, 2100, 2970),  # full A4 in 1/10 mm
        source=ScanSource.FLATBED,
    )
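
Scan areas are expressed in tenths of a millimetre, as the A4 example above shows. The following sketch derives such values from paper dimensions in millimetres and estimates the pixel size of the result at a given resolution. The helper names `area_from_mm` and `pixel_size` are hypothetical and not part of scanlib, and the tuple order (x, y, width, height) is assumed to mirror the positional `ScanArea` call above.

```python
TENTHS_MM_PER_INCH = 254  # 1 inch = 25.4 mm = 254 tenths of a millimetre


def area_from_mm(width_mm: float, height_mm: float, x_mm: float = 0, y_mm: float = 0):
    """Return (x, y, width, height) in 1/10 mm (hypothetical helper)."""
    return tuple(round(v * 10) for v in (x_mm, y_mm, width_mm, height_mm))


def pixel_size(area, dpi: int):
    """Approximate pixel dimensions of an area given in 1/10 mm."""
    _, _, width, height = area
    return (round(width * dpi / TENTHS_MM_PER_INCH),
            round(height * dpi / TENTHS_MM_PER_INCH))


a4 = area_from_mm(210, 297)
print(a4)                   # (0, 0, 2100, 2970)
print(pixel_size(a4, 600))  # (4961, 7016)
```

Under the assumption about argument order, the tuple could be unpacked as `ScanArea(*a4)`.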

Color Modes

Three color modes are available: ColorMode.COLOR (24-bit RGB), ColorMode.GRAY (8-bit grayscale), and ColorMode.BW (1-bit black & white).

scanlib always returns pages in the mode you requested. Some scanners (notably over eSCL) ignore the requested mode and return a richer one — for example, RGB when you asked for grayscale. In that case scanlib down-converts the page to the requested mode (COLOR → GRAY via luminance, GRAY/COLOR → BW via threshold), so scan() and scan_pages() are consistent across backends. The conversion only runs when the scanner actually returns a richer mode than requested; a page already at (or below) the requested mode is passed through untouched.
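
To make the down-conversion rule concrete, here is a minimal sketch of the two ideas in the paragraph above: deciding whether a conversion is needed, and reducing RGB to grayscale by luminance. It is not scanlib's implementation. Plain strings stand in for the `ColorMode` members, the function names are hypothetical, and the ITU-R BT.601 luma weights are an assumption, since the guide does not say which weights scanlib uses.

```python
# Rank of each mode from least to most rich; plain strings stand in for
# the ColorMode members so the sketch runs without a scanner.
RICHNESS = {"BW": 0, "GRAY": 1, "COLOR": 2}


def needs_down_conversion(returned: str, requested: str) -> bool:
    """True only when the scanner returned a richer mode than requested."""
    return RICHNESS[returned] > RICHNESS[requested]


def rgb_to_gray(rgb: bytes) -> bytes:
    """Convert packed 24-bit RGB pixels to 8-bit grayscale.

    Uses ITU-R BT.601 luma weights (0.299, 0.587, 0.114) in integer
    arithmetic; this choice of weights is an assumption.
    """
    if len(rgb) % 3:
        raise ValueError("RGB data length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
        out.append((299 * r + 587 * g + 114 * b + 500) // 1000)
    return bytes(out)


pixels = bytes([255, 255, 255,  255, 0, 0,  0, 0, 0])  # white, red, black
print(list(rgb_to_gray(pixels)))               # [255, 76, 0]
print(needs_down_conversion("COLOR", "GRAY"))  # True
print(needs_down_conversion("GRAY", "GRAY"))   # False
```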

Black & White Threshold

When scanning in BW mode, grayscale pixels are converted to 1-bit black or white using a threshold. Pixels with a value ≥ threshold become white; pixels below it become black. The default threshold is 128:

with scanners[0] as scanner:
    # Lower threshold = more white (lighter output)
    doc = scanner.scan(color_mode=ColorMode.BW, bw_threshold=100)

    # Higher threshold = more black (darker output)
    doc = scanner.scan(color_mode=ColorMode.BW, bw_threshold=180)

The threshold applies both to scan()/scan_pages() and to build_pdf() when converting grayscale pages to BW.
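
The effect of the threshold can be checked on a handful of grayscale values. The sketch below applies the rule stated above (value ≥ threshold is white) to one row of pixels; `gray_to_bw` is a hypothetical helper for illustration, not a scanlib function.

```python
def gray_to_bw(gray: bytes, threshold: int = 128) -> list:
    """Return True for white, False for black, one entry per pixel."""
    return [value >= threshold for value in gray]


row = bytes([0, 99, 100, 128, 179, 180, 255])
print(sum(gray_to_bw(row, threshold=100)))  # 5 white pixels
print(sum(gray_to_bw(row)))                 # 4 white pixels (default 128)
print(sum(gray_to_bw(row, threshold=180)))  # 2 white pixels
```

Lowering the threshold moves more mid-gray pixels to white, which matches the "lighter output" comment in the example above.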

Opening a Scanner by ID

If you already know a scanner’s ID from a previous discovery, you can open it directly without running list_scanners() again:

import scanlib

scanner = scanlib.open_scanner("escl:192.168.1.5:443")
with scanner:
    doc = scanner.scan()

This skips the mDNS/platform discovery step and connects immediately. On macOS with native ImageCaptureCore, a quick targeted discovery is run behind the scenes to resolve the UUID to a device object.
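
If an application stores scanner IDs, it can be handy to show the host of a saved network scanner before trying to open it. The sketch below splits an ID of the `escl:IP:PORT` form used in the example above. `parse_escl_id` is a hypothetical helper, not part of scanlib, and IDs from other backends should be treated as opaque strings.

```python
def parse_escl_id(scanner_id: str) -> tuple:
    """Split an 'escl:IP:PORT' ID into (host, port). Hypothetical helper."""
    scheme, _, rest = scanner_id.partition(":")
    host, _, port = rest.rpartition(":")
    if scheme != "escl" or not host or not port.isdigit():
        raise ValueError(f"not an eSCL scanner ID: {scanner_id!r}")
    return host, int(port)


print(parse_escl_id("escl:192.168.1.5:443"))  # ('192.168.1.5', 443)
```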

Scanner Capabilities

After opening a scanner, you can query its capabilities:

with scanners[0] as scanner:
    for si in scanner.sources:
        print(si.type)           # ScanSource.FLATBED
        print(si.resolutions)    # [150, 300, 600, 1200]
        print(si.color_modes)    # [ColorMode.COLOR, ColorMode.GRAY, ColorMode.BW]
        print(si.max_scan_area)  # ScanArea(x=0, y=0, width=2159, height=2972)
    print(scanner.defaults)      # ScannerDefaults(dpi=300, ...)

The first entry in sources is the scanner’s primary source (typically flatbed). When scan() is called without an explicit source, the first entry is used for parameter validation.
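
One use for the capability data is to pick a resolution the scanner actually supports before calling scan(). The sketch below chooses the supported value closest to a requested DPI; `nearest_resolution` is a hypothetical helper, and the list literal stands in for a source's `resolutions` attribute.

```python
def nearest_resolution(supported: list, wanted: int) -> int:
    """Pick the supported DPI closest to `wanted`, preferring the lower on ties."""
    if not supported:
        raise ValueError("scanner reports no supported resolutions")
    return min(supported, key=lambda dpi: (abs(dpi - wanted), dpi))


resolutions = [150, 300, 600, 1200]  # e.g. the value of si.resolutions
print(nearest_resolution(resolutions, 400))  # 300
print(nearest_resolution(resolutions, 600))  # 600
```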

Feeder Scanning

When scanning from a document feeder, all pages are scanned automatically:

with scanners[0] as scanner:
    doc = scanner.scan(source=ScanSource.FEEDER)
    print(doc.page_count)  # Number of pages scanned from the feeder

If the feeder is empty, FeederEmptyError is raised (consistently across all backends).

Multi-Page Flatbed Scanning

Use the next_page callback to scan multiple pages one at a time on a flatbed scanner. The callback receives the number of pages scanned so far and returns True to continue or False to stop:

def prompt_next(pages_so_far: int) -> bool:
    return input(f"{pages_so_far} page(s) scanned. Add another? [y/n] ").strip().lower() == "y"

with scanners[0] as scanner:
    doc = scanner.scan(next_page=prompt_next)
    # doc is a single multi-page PDF
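
The callback does not have to be interactive. As a sketch, the factory below builds a next_page callback that stops after a fixed number of pages, relying only on the contract described above (the callback receives the number of pages scanned so far). The name `up_to` is hypothetical.

```python
from typing import Callable


def up_to(max_pages: int) -> Callable[[int], bool]:
    """Build a next_page callback that stops once max_pages are scanned."""
    def next_page(pages_so_far: int) -> bool:
        return pages_so_far < max_pages
    return next_page


callback = up_to(3)
print([callback(n) for n in (1, 2, 3)])  # [True, True, False]
```

It would be passed as `scanner.scan(next_page=up_to(3))`.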

Page-Level Scanning

Use scan_pages() to receive individual pages as they arrive. Each ScannedPage carries raw pixel data and can be encoded as JPEG or PNG for previewing. After reviewing and reordering, assemble a PDF with build_pdf():

import scanlib

with scanners[0] as scanner:
    pages = list(scanner.scan_pages())

# Preview each page
for i, page in enumerate(pages):
    with open(f"page_{i}.jpg", "wb") as f:
        f.write(page.to_jpeg())

# Rotate a page 90° clockwise
pages[0] = pages[0].rotate(90)

# Reorder (here: reverse the page order), then build the final PDF
pages.reverse()
doc = scanlib.build_pdf(pages, dpi=300)
with open("output.pdf", "wb") as f:
    f.write(doc.data)
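
Because the pages are held in an ordinary list, any reordering is plain list manipulation. One common case is manual duplex on a single-sided feeder: scan all front sides, flip the stack, then scan the back sides, which arrive in reverse order. The sketch below merges the two batches. This workflow is an illustrative assumption rather than a scanlib feature, `interleave_duplex` is a hypothetical helper, and strings stand in for ScannedPage objects.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def interleave_duplex(fronts: Sequence[T], backs: Sequence[T]) -> list:
    """Merge front sides with back sides that were scanned in reverse order."""
    if len(fronts) != len(backs):
        raise ValueError("front and back stacks must have the same length")
    merged = []
    for front, back in zip(fronts, reversed(backs)):
        merged.extend((front, back))
    return merged


# Strings stand in for ScannedPage objects.
ordered = interleave_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"])
print(ordered)  # ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

The merged list could then be handed to build_pdf() as in the example above.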

Progress Callback

Monitor scan progress with a callback. It receives an int from 0 to 100, or -1 while the scanner is working but no percentage is yet available (e.g. warming up or feeding a page). Return True (or None) to continue, or False to abort (which raises ScanAborted):

def on_progress(percent: int) -> bool:
    if percent < 0:
        print("Scanning...")       # indeterminate
    else:
        print(f"Scanning... {percent}%")
    return True  # return False to abort

with scanners[0] as scanner:
    doc = scanner.scan(progress=on_progress)

The same callback is accepted by Scanner.scan_pages(). Depending on the backend it may run on an internal worker thread, so marshal any GUI updates to your own UI thread.
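
One toolkit-neutral way to marshal updates is to have the callback do nothing but enqueue the value, and let the UI thread drain the queue on its own timer. The sketch below simulates the backend worker with a plain thread; `make_progress_callback` and `drain` are hypothetical names, and no scanlib calls are involved.

```python
import queue
import threading


def make_progress_callback(events: queue.Queue, stop: threading.Event):
    """Return a progress callback that only enqueues values (thread-safe)."""
    def on_progress(percent: int) -> bool:
        events.put(percent)
        return not stop.is_set()  # False asks the scan to abort
    return on_progress


def drain(events: queue.Queue) -> list:
    """Collect pending values; call this from the UI thread, e.g. on a timer."""
    values = []
    while True:
        try:
            values.append(events.get_nowait())
        except queue.Empty:
            return values


# Simulate a backend worker thread reporting progress.
events = queue.Queue()
stop = threading.Event()
on_progress = make_progress_callback(events, stop)
worker = threading.Thread(target=lambda: [on_progress(p) for p in (-1, 0, 50, 100)])
worker.start()
worker.join()
received = drain(events)
print(received)  # [-1, 0, 50, 100]
```

Setting `stop` from a Cancel button makes the next callback invocation return False, which requests an abort as described above.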

Aborting a Scan

Call abort() from any thread to cancel an in-progress scan. The running scan() or scan_pages() call will raise ScanAborted shortly after:

import threading
import scanlib

with scanners[0] as scanner:
    # Abort after 5 seconds from another thread
    threading.Timer(5, scanner.abort).start()
    try:
        doc = scanner.scan()
    except scanlib.ScanAborted:
        print("Scan was cancelled")

abort() is safe to call even when no scan is running.
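
A time limit can also be enforced without a second thread by returning False from a progress callback once a deadline has passed. The sketch below builds such a callback; `with_deadline` is a hypothetical helper, and the injectable `clock` parameter exists only so the example can run instantly with a fake clock.

```python
import time
from typing import Callable


def with_deadline(seconds: float, clock: Callable[[], float] = time.monotonic):
    """Build a progress callback that returns False once `seconds` have elapsed."""
    deadline = clock() + seconds

    def on_progress(percent: int) -> bool:
        return clock() < deadline
    return on_progress


# Demonstrate with a fake clock so the example runs instantly.
fake_now = [0.0]
callback = with_deadline(5, clock=lambda: fake_now[0])
print(callback(10))  # True: 0 s elapsed
fake_now[0] = 6.0
print(callback(20))  # False: deadline passed
```

This approach only takes effect when the backend reports progress, so the timer-based abort() shown above remains the more dependable option for a hard limit.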

Cancelling Discovery

Pass a threading.Event to list_scanners() to cancel a long-running discovery from another thread:

import threading
import scanlib

cancel = threading.Event()

# Cancel after 5 seconds from another thread
threading.Timer(5, cancel.set).start()

scanners = scanlib.list_scanners(timeout=120, cancel=cancel)

When the event is set, list_scanners() returns immediately with whatever scanners have been found so far (or an empty list if none have been found yet).

eSCL / AirScan Network Scanners

scanlib includes a built-in eSCL (AirScan) backend that discovers network scanners via mDNS and communicates with them directly over HTTP — no OS-level scanner drivers are needed.

Platform	eSCL status	Notes
Linux	Always enabled	Runs alongside SANE
Windows	Always enabled	Runs alongside WIA
macOS	Opt-in (`SCANLIB_ESCL=1`)	ImageCaptureCore already handles eSCL natively

To enable the eSCL backend on macOS:

export SCANLIB_ESCL=1

When the eSCL backend runs alongside a platform backend, discovery runs in parallel and the results are deduplicated: a network scanner seen by both backends (for example a WSD scanner configured in Windows that also advertises eSCL over mDNS) is reported only once. Two scanners are recognised as the same device when their uuid values match or their IP addresses match.

Which entry survives deduplication depends on the platform. On Linux and Windows the platform driver (SANE/WIA) is preferred, since it is always available; on macOS the eSCL entry is preferred, since setting SCANLIB_ESCL=1 signals that you want the eSCL driver. Setting SCANLIB_ESCL=1 on Linux or Windows flips the preference there too, so the eSCL entry wins everywhere.

Each scanner’s backend property indicates which backend discovered it:

`scanner.backend`	Description	Scanner ID format
`"sane"`	Linux SANE (USB)	SANE device URI
`"imagecapture"`	macOS ImageCaptureCore	ICC UUID
`"wia"`	Windows WIA 2.0 (USB + network)	WIA device ID
`"escl"`	eSCL / AirScan (network)	`escl:IP:PORT`
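
The deduplication and preference rules described in this section can be sketched as follows. This is an illustration of the stated rules (same uuid or same IP address means same device; the preferred backend's entry survives), not scanlib's actual code. The `Found` dataclass, `same_device`, and `dedupe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Found:
    """Hypothetical discovery record; not a scanlib type."""
    backend: str
    uuid: Optional[str] = None
    ip: Optional[str] = None


def same_device(a: Found, b: Found) -> bool:
    """Two entries match when their uuid values or IP addresses are equal."""
    return (a.uuid is not None and a.uuid == b.uuid) or (
        a.ip is not None and a.ip == b.ip
    )


def dedupe(found: list, prefer_escl: bool) -> list:
    """Keep one entry per device, choosing eSCL or the platform backend."""
    kept = []
    for cand in found:
        for i, existing in enumerate(kept):
            if same_device(existing, cand):
                cand_is_escl = cand.backend == "escl"
                existing_is_escl = existing.backend == "escl"
                # Swap only if the candidate comes from the preferred side.
                if cand_is_escl != existing_is_escl and cand_is_escl == prefer_escl:
                    kept[i] = cand
                break
        else:
            kept.append(cand)  # no match: a device not seen before
    return kept


found = [
    Found("wia", ip="192.168.1.5"),
    Found("escl", uuid="abc", ip="192.168.1.5"),  # same device, via mDNS
    Found("wia", uuid="usb-1"),                   # unrelated USB scanner
]
print([f.backend for f in dedupe(found, prefer_escl=False)])  # ['wia', 'wia']
print([f.backend for f in dedupe(found, prefer_escl=True)])   # ['escl', 'wia']
```

Per the text above, `prefer_escl` would correspond to SCANLIB_ESCL=1 being set.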

Error Handling

All scanlib exceptions derive from ScanLibError, so a single except scanlib.ScanLibError catches everything. The common subclasses are:

Exception	Raised when
`ScannerBusyError`	The scanner is already in use by another session or application (most scanners — network ones especially — allow only one scan session at a time). Subclass of `ScanError`.
`ScannerUnavailableError`	The scanner could not be reached — offline, asleep, or disconnected — rather than held by another session. Often transient (a network scanner that went to sleep) and worth retrying once it is reachable. Subclass of `ScanError`.
`FeederEmptyError`	A feeder scan was requested but the document feeder is empty. Subclass of `ScanError`.
`ScanAborted`	The scan was cancelled via `Scanner.abort()`, a `progress` callback returning `False`, or at the device.
`ScannerNotOpenError`	A scan/capability call was made before `Scanner.open()`.
`ScanError`	Any other scanning failure.

import scanlib

try:
    with scanners[0] as scanner:
        doc = scanner.scan()
except scanlib.ScannerBusyError:
    print("Scanner is busy — close any other scanning app and retry.")
except scanlib.ScannerUnavailableError:
    print("Scanner is offline or asleep — check it's on and reachable.")
except scanlib.ScanAborted:
    print("Scan was cancelled.")
except scanlib.ScanLibError as exc:
    print(f"Scan failed: {exc}")

Because ScannerBusyError, ScannerUnavailableError, and FeederEmptyError subclass ScanError, existing except ScanError handlers keep working; catch them explicitly only when you want to react differently.
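
Since ScannerUnavailableError is described above as often transient, a retry with a growing delay is one reasonable reaction. The sketch below is a generic retry helper, not a scanlib feature: `retry` is a hypothetical name, and a local `FakeUnavailable` class stands in for scanlib.ScannerUnavailableError so the example runs without a scanner.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on the given exceptions with a doubling delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


# Stand-in for scanlib.ScannerUnavailableError so the sketch runs anywhere.
class FakeUnavailable(Exception):
    pass


calls = []


def flaky_scan() -> str:
    """Fail twice, then succeed, to imitate a scanner waking up."""
    calls.append(1)
    if len(calls) < 3:
        raise FakeUnavailable("scanner asleep")
    return "document"


waits = []
result = retry(flaky_scan, retry_on=(FakeUnavailable,), sleep=waits.append)
print(result, len(calls), waits)  # document 3 [2.0, 4.0]
```

In real use, `action` would open the scanner and call scan(), with `retry_on=(scanlib.ScannerUnavailableError,)`. Other errors, such as ScanAborted, pass through untouched.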

Thread Safety

All scanlib operations can be called from any thread. The library internally dispatches operations to the correct thread for backends that require it (macOS ImageCaptureCore, Windows WIA).

Note that progress callbacks may execute on an internal thread. If your callback updates a GUI, dispatch to your UI thread accordingly. The next_page callback always runs on the caller’s thread.