mirror of
https://codeberg.org/listyantidewi/your-everyday-tools.git
synced 2026-07-01 23:17:37 +08:00
Improve local conversion fidelity and launchers
@@ -1,3 +1,6 @@
.claude
__pycache__
.venv/
*.pyc
.pytest_cache/
.gitignore
@@ -2,6 +2,51 @@

All notable changes to **Your Everyday Tools** are documented here. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project loosely follows [Semantic Versioning](https://semver.org/).

## [0.6.3] — 2026-06-06

### Added — Local conversion fidelity layer

- Added a shared local capability detector for LibreOffice, FFmpeg/ffprobe, Tesseract, ODA File Converter, PyMuPDF, pdf2docx, pdfplumber, Marker, pyzbar, rembg, pillow-heif, Whisper, and python-pptx.
- Added `GET /capabilities`, returning engine availability, detected paths/versions when known, route quality tier, missing engines, and install hints.
- Added standard conversion metadata on file responses: `X-Conversion-Engine`, `X-Conversion-Quality`, and `X-Fidelity-Warnings`. JSON responses can now include `engine`, `quality`, and `warnings`.
- Added a shared upload-page status banner showing **High fidelity**, **Basic fallback**, or **Unavailable** before users convert.
- Added a shared cross-platform launcher (`scripts/launcher.py`) used by `run.bat`, `run.command`, and `run.sh`. It creates a private `.venv`, installs core dependencies, best-effort installs optional Python packages, verifies PyMuPDF, opens the browser, and starts the app.
- Added PDF to Word **Exact visual copy** mode, which renders each PDF page into a Word page as a non-editable image for best appearance preservation.
- Added a browser-rendered SVG to PNG path for better SVG fidelity, with the existing local server renderer kept as fallback.
- Added a first `tests/fidelity` scaffold covering capabilities, fallback gating, metadata headers, and future golden-fixture strategy.
- Added a PyMuPDF import guard so the common wrong-package `fitz`/`frontend` install fails with clear setup instructions instead of a misleading Starlette traceback.
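
A client can read the new response metadata with a small helper. The sketch below is illustrative only: the header names come from the entry above, but the `;` separator for `X-Fidelity-Warnings`, the `ConversionMetadata` class, and the sample values are assumptions, not the app's documented wire format.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class ConversionMetadata:
    """Hypothetical container for the three conversion headers."""
    engine: Optional[str] = None
    quality: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


def read_conversion_metadata(headers: Mapping[str, str],
                             warning_sep: str = ";") -> ConversionMetadata:
    """Extract engine, quality, and warnings from response headers.

    Assumption: multiple warnings are joined with `warning_sep`.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw_warnings = lowered.get("x-fidelity-warnings", "")
    warnings = [w.strip() for w in raw_warnings.split(warning_sep) if w.strip()]
    return ConversionMetadata(
        engine=lowered.get("x-conversion-engine"),
        quality=lowered.get("x-conversion-quality"),
        warnings=warnings,
    )
```

Passing the headers of any file response (for example `response.headers` from an HTTP client) yields one object that a UI or test can inspect.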

### Improved — Document and layout conversions

- Hardened LibreOffice conversion with an isolated temporary user profile, `--headless`, `--nologo`, `--nofirststartwizard`, `--norestore`, safer timeout handling, and robust output-file detection.
- Word/HTML/Excel/PowerPoint to PDF now prefer LibreOffice when available for high-fidelity layout preservation.
- Excel to PDF now performs full-fidelity local conversion through LibreOffice; the older ReportLab table renderer remains available only as an explicit basic fallback.
- Layout-sensitive fallbacks are no longer silent. Word/HTML/Excel to PDF now return a clear error unless the user explicitly allows the basic fallback.
- PDF to Excel can use optional `pdfplumber` before falling back to PyMuPDF table detection.
- PDF to PowerPoint and PowerPoint to PDF now report conversion engine and quality metadata.
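
The hardened LibreOffice invocation can be pictured as a command line built from the flags listed above. This is a minimal sketch, not the code in `utils/capabilities.py`: the function name `build_soffice_command` and its parameters are hypothetical, and it assumes the isolated profile is passed through LibreOffice's `-env:UserInstallation=` bootstrap option.

```python
from pathlib import Path


def build_soffice_command(soffice, input_path, out_dir, profile_dir,
                          target_ext="pdf"):
    """Return an argv list for a headless, profile-isolated conversion."""
    # LibreOffice expects the profile location as a file:// URI.
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        str(soffice),
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--nologo",
        "--nofirststartwizard",
        "--norestore",
        "--convert-to", target_ext,
        "--outdir", str(out_dir),
        str(input_path),
    ]
```

The list would typically be handed to `subprocess.run(cmd, capture_output=True, timeout=...)` inside a `tempfile.TemporaryDirectory()`, so the throwaway profile and the output file are cleaned up together.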

### Improved — Images, SVG, media, and CAD

- Image tools now apply EXIF orientation before processing.
- Image saves preserve ICC profiles where supported.
- Compress Image now offers Auto, Photo/JPEG, Lossless PNG, and WebP modes instead of always forcing JPEG.
- Media conversion uses ffprobe metadata to preserve compatible audio/video streams when possible, and otherwise exposes re-encode warnings and quality presets.
- CAD outputs now include engine metadata and warnings about unsupported entities, fonts, and line styles.

### Changed — Offline UX

- Replaced the Bootstrap Icons CDN dependency with a local icon shim so the app UI keeps working offline.
- README now documents the local fidelity model, `/capabilities`, response metadata headers, explicit fallback behavior, and updated optional dependencies.
- One-click launchers are now isolated from global/user Python packages and no longer fail the whole app when an optional Python package cannot be installed.
- Optional heavy modules (`rembg`, Whisper, Marker, CAD/matplotlib stack) are now lazy-loaded only when their specific tool runs, reducing startup stalls before Flask prints its server banner.
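
The lazy-loading pattern separates a cheap presence check from the actual import. The sketch below uses only the standard library; the helper names `is_installed` and `load_lazily` are hypothetical and are not taken from the app.

```python
import importlib
import importlib.util
from functools import lru_cache


def is_installed(module_name: str) -> bool:
    """Check that a top-level package is present without importing it."""
    return importlib.util.find_spec(module_name) is not None


@lru_cache(maxsize=None)
def load_lazily(module_name: str):
    """Import the module on first use and reuse it afterwards."""
    return importlib.import_module(module_name)
```

A route would call `is_installed("rembg")` at startup to decide which banner to show, and `load_lazily("rembg")` only inside the request handler that needs it.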

### Dependencies

- Added optional `pdfplumber`.
- Added `pytest` for the test suite.
- Split dependency files into `requirements-core.txt`, `requirements-optional.txt`, and `requirements-dev.txt`; `requirements.txt` remains the full aggregate install.
- Optional background removal now installs/checks `rembg[cpu]` so incomplete `rembg` installs without ONNX Runtime do not block app startup.

## [0.6.2] — 2026-04-29

### Improved — Requirements & expectations on every tool page

@@ -32,16 +32,16 @@ See [CHANGELOG.md](CHANGELOG.md) for release history and recent fixes.
|
||||
|
||||
| Tool | Description |
|
||||
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| **Files to PDF** | Convert images (JPG, PNG, BMP, TIFF, WebP), Word documents (.docx, .doc, .odt), and text files to PDF. Word files use LibreOffice for full-fidelity layout when available, with a built-in fallback for `.docx`. |
|
||||
| **PDF to Word** | Convert PDF documents to `.docx`. Four modes: **Layout** (preserves tables, columns, figures), **Smart structure** (detects headings & lists for clean Word outline), **Flowing text** (always-clean paragraphs), **Marker** (optional ML engine, best fidelity). Page range supported on all modes. |
|
||||
| **Files to PDF** | Convert images (JPG, PNG, BMP, TIFF, WebP), Word documents (.docx, .doc, .odt), and text files to PDF. Word files prefer LibreOffice for full-fidelity layout; lower-fidelity `.docx` fallback must be explicitly allowed. |
|
||||
| **PDF to Word** | Convert PDF documents to `.docx`. Five modes: **Exact visual copy** (non-editable page images, best appearance), **Layout** (editable, lossy), **Smart structure**, **Flowing text**, and **Marker** (optional ML structure engine). Page range supported on all modes. |
|
||||
| **PDF to Images** | Export each PDF page as PNG or JPG (configurable DPI) |
|
||||
| **PDF to Text** | Extract all text content from a PDF |
|
||||
| **PDF to Excel** | Extract tables from a PDF into an `.xlsx` workbook — one sheet per table, per page, or all combined. Falls back to line-by-line text when no tables are detected. Uses PyMuPDF's native `find_tables()` (no extra dependencies). |
|
||||
| **HTML to PDF** | Convert HTML content to a PDF document. Uses LibreOffice for full CSS / table / image support when available; falls back to a minimal renderer otherwise. |
|
||||
| **PDF to Excel** | Extract tables from a PDF into an `.xlsx` workbook — one sheet per table, per page, or all combined. Uses optional `pdfplumber` first when available, with PyMuPDF as the built-in local fallback. |
|
||||
| **HTML to PDF** | Convert HTML content to a PDF document. Uses LibreOffice for full CSS / table / image support when available; lower-fidelity PyMuPDF fallback must be explicitly allowed. |
|
||||
| **Markdown to PDF** | Paste or upload Markdown (.md) and download a formatted PDF. Choose page size and base font size. Uses PyMuPDF's `Story` API for proper multi-page pagination. |
|
||||
| **Markdown to Word** | Convert Markdown to a `.docx` document with correct heading, list, quote, and code styles |
|
||||
| **PDF to PowerPoint** | Render each PDF page as an image and place it on its own slide in a `.pptx`. Choose 16:9 / 4:3 / A4 slide size, page range, and DPI. |
|
||||
| **PowerPoint to PDF** | Convert `.pptx` / `.ppt` / `.odp` presentations to PDF (requires LibreOffice on PATH) |
|
||||
| **PDF to PowerPoint** | Convert PDFs to `.pptx` using editable LibreOffice mode when available, or image-per-slide mode for non-editable visual preservation. Choose slide size, page range, and DPI in image mode. |
|
||||
| **PowerPoint to PDF** | Convert `.pptx` / `.ppt` / `.odp` presentations to PDF through the hardened LibreOffice wrapper (isolated profile, safer timeout, robust output detection). |
|
||||
| **OCR PDF** | Make scanned PDFs searchable (image + hidden text layer) or extract text — 14 languages supported |
|
||||
| **CAD to PDF/Image** | Convert DXF drawings to PDF or PNG (DWG via optional ODA File Converter) |
|
||||
|
||||
@@ -51,7 +51,7 @@ See [CHANGELOG.md](CHANGELOG.md) for release history and recent fixes.
|
||||
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| **Excel to CSV / JSON** | Export sheets from `.xlsx` / `.xls` to CSV or JSON (array-of-objects or array-of-arrays). Single sheet or all sheets as ZIP. |
|
||||
| **CSV / JSON to Excel** | Build an `.xlsx` workbook from one or more CSV or JSON files — one sheet per file, optional bold/shaded header row |
|
||||
| **Excel to PDF** | Convert a workbook to PDF with one section per sheet. Configurable page size, orientation, and font size. Basic table rendering, not pixel-perfect. |
|
||||
| **Excel to PDF** | Convert workbooks to PDF with LibreOffice for high-fidelity print/layout preservation. The older ReportLab table renderer remains as an explicit basic fallback. |
|
||||
| **Merge Workbooks** | Combine multiple Excel files into a single workbook, optionally prefixing each sheet with its source filename |
|
||||
| **Split Sheets** | Export each sheet of a workbook as its own `.xlsx` (bundled as a ZIP if more than one) |
|
||||
| **Excel Info & Preview** | List sheet names, row/column counts, and preview the first N rows of every sheet |
|
||||
@@ -79,7 +79,7 @@ See [CHANGELOG.md](CHANGELOG.md) for release history and recent fixes.
|
||||
| Tool | Description |
|
||||
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| **Resize Image** | Resize by percentage or exact pixel dimensions (with aspect ratio lock) |
|
||||
| **Compress Image** | Reduce file size with adjustable quality slider (10–100%) |
|
||||
| **Compress Image** | Reduce file size with Auto, Photo/JPEG, Lossless PNG, and WebP modes. Applies EXIF orientation and preserves transparency/profile data where possible. |
|
||||
| **Convert Format** | Convert between PNG, JPG, WebP, BMP, and TIFF |
|
||||
| **Remove Background** | Automatically remove image backgrounds using AI |
|
||||
| **Crop Image** | Crop by aspect ratio (1:1, 4:3, 16:9, etc.) or custom coordinates |
|
||||
@@ -90,7 +90,7 @@ See [CHANGELOG.md](CHANGELOG.md) for release history and recent fixes.
|
||||
| **Image to Text (OCR)** | Extract text from images using optical character recognition |
|
||||
| **Animated WebP/GIF** | Convert between animated GIF and animated WebP (preserves per-frame timing) |
|
||||
| **Color Palette** | Extract a dominant color palette (2–16 colors) from an image via quantization or grid sampling. Includes swatch preview with hex codes. |
|
||||
| **SVG to PNG** | Rasterize SVG vectors to PNG at a chosen width, with optional transparent background |
|
||||
| **SVG to PNG** | Rasterize SVG vectors to PNG in the browser first for better SVG fidelity, with the existing local server renderer as fallback. |
|
||||
| **SVG Optimizer** | Strip comments, editor metadata (Inkscape/Sketch/Adobe namespaces), and round decimals to shrink SVG files |
|
||||
| **HEIC Converter** | Convert iPhone `.heic` / `.heif` photos to JPG, PNG, or WebP (single or bulk → ZIP). Once installed, all other image tools also accept HEIC inputs. |
|
||||
|
||||
@@ -173,7 +173,7 @@ See [CHANGELOG.md](CHANGELOG.md) for release history and recent fixes.
|
||||
| Tool | Description |
|
||||
| --------------------- | -------------------------------------------------------------------------------------------- |
|
||||
| **Convert Audio** | Convert between MP3, WAV, OGG, FLAC, AAC, M4A, and Opus with adjustable bitrate |
|
||||
| **Convert Video** | Convert between MP4, WebM, MKV, MOV, and AVI (uses sensible codec defaults per target) |
|
||||
| **Convert Video** | Convert between MP4, WebM, MKV, MOV, and AVI. Uses ffprobe metadata to preserve compatible streams, otherwise re-encodes with clear quality presets. |
|
||||
| **Extract Audio** | Pull the audio track out of a video file to MP3 / WAV / OGG / M4A |
|
||||
| **Trim Media** | Trim audio or video by start/end time (stream-copy first, re-encodes on keyframe mismatch) |
|
||||
| **Compress Video** | Re-encode video with H.264 at a chosen CRF and preset to shrink file size |
|
||||
@@ -197,7 +197,11 @@ Install [Python 3.10+](https://www.python.org/downloads/) once (on Windows, tick
| **macOS** | Double-click `run.command`. First time only, open Terminal in this folder and run `chmod +x run.command` (macOS strips the executable bit on downloads). |
| **Linux** | `chmod +x run.sh && ./run.sh` |

The launcher creates a virtual environment, installs dependencies, starts the server, and opens your browser automatically. Close the window to stop. Subsequent runs skip the setup step.
The launcher creates a private `.venv`, installs required Python packages, best-effort installs optional Python packages, starts the server, and opens your browser automatically. Close the window to stop. Subsequent runs skip completed setup steps unless the dependency files change.

The launchers are intentionally isolated from your global Python packages, so a broken system/user install (for example the unrelated `fitz` package that conflicts with PyMuPDF) will not break the app.

Optional native desktop engines such as LibreOffice, FFmpeg, Tesseract, and ODA File Converter are detected locally and used automatically when present. They are not bundled into the repository because they are large system apps with OS-specific installers, but the app shows clear install hints through the tool pages and `/capabilities`.

### Simple Use (Dockerfile + docker-compose)

@@ -222,7 +226,7 @@ python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate  # Windows

# Install dependencies
# Install the full dependency set for manual/developer use
pip install -r requirements.txt

# Run
@@ -233,24 +237,69 @@ Open **http://localhost:5000** in your browser.

---

## Troubleshooting

### `fitz` / PyMuPDF import error

PyMuPDF is installed with the package name `PyMuPDF`, but imported in Python as `fitz`. Do **not** install the unrelated package named `fitz`; it can cause startup errors involving `frontend` or Starlette.

Recommended fix:

```bash
python -m pip uninstall -y fitz frontend
python -m pip install --upgrade PyMuPDF
```

On Windows, the easiest path is to run `run.bat`, which creates a clean `.venv` and installs the correct dependencies there. If running manually, prefer:

```bash
.\.venv\Scripts\python.exe app.py
```
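
An import guard for this situation can check that the imported module looks like the expected package before the app uses it. The sketch below is an assumption about how such a guard could work, not the contents of `utils/pymupdf.py`; `import_checked`, `WrongPackageError`, and `import_pymupdf_sketch` are hypothetical names.

```python
import importlib


class WrongPackageError(RuntimeError):
    """Raised when a module is missing or is not the expected package."""


def import_checked(module_name, required_attrs, hint):
    """Import a module and verify that it exposes the expected attributes."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or errors raised by a broken package
        raise WrongPackageError(f"Could not import {module_name!r}. {hint}") from exc
    missing = [name for name in required_attrs if not hasattr(module, name)]
    if missing:
        raise WrongPackageError(
            f"{module_name!r} is installed but lacks {missing}; "
            f"it is probably the wrong package. {hint}"
        )
    return module


PYMUPDF_HINT = (
    "Run: python -m pip uninstall -y fitz frontend, "
    "then: python -m pip install --upgrade PyMuPDF"
)


def import_pymupdf_sketch():
    # PyMuPDF's `fitz` module exposes `open` and `Document`.
    return import_checked("fitz", ("open", "Document"), PYMUPDF_HINT)
```

Raising one clear error with the fix in the message replaces the confusing traceback from the unrelated package.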

---

## Local Conversion Fidelity

The app stays local/offline: no uploaded files are sent to cloud conversion APIs. Tools that need better layout fidelity use locally installed engines when available.

- `GET /capabilities` reports detected engines, paths/versions when known, quality tier, missing engines, and install hints.
- The shared upload UI shows **High fidelity**, **Basic fallback**, or **Unavailable** before conversion.
- File responses include `X-Conversion-Engine`, `X-Conversion-Quality`, and, when relevant, `X-Fidelity-Warnings`.
- Layout-sensitive fallbacks are no longer silent. Word/HTML/Excel to PDF require an explicit fallback checkbox when LibreOffice is unavailable or fails.
- LibreOffice conversions run with an isolated temporary user profile and headless-safe flags.
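
The explicit-fallback rule comes down to a small decision: use LibreOffice when it is present, use the basic renderer only with consent, and otherwise stop with an error. The sketch below illustrates that gate; the engine labels, quality strings, and `FallbackNotAllowed` are hypothetical placeholders, and it models only the "unavailable" case, not a failed LibreOffice run.

```python
class FallbackNotAllowed(Exception):
    """Raised when only a lower-fidelity engine exists and consent is missing."""


def choose_pdf_engine(libreoffice_available: bool, allow_basic_fallback: bool) -> dict:
    """Pick a conversion engine without ever degrading silently."""
    if libreoffice_available:
        return {"engine": "libreoffice", "quality": "high", "warnings": []}
    if allow_basic_fallback:
        return {
            "engine": "basic-python",
            "quality": "basic",
            "warnings": ["Layout may differ from the source document."],
        }
    raise FallbackNotAllowed(
        "LibreOffice is unavailable; enable the basic fallback option to continue."
    )
```

In a route, `allow_basic_fallback` would come from the fallback checkbox on the upload form, and the returned values would feed the response metadata.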

---

## Optional Dependencies

The core app works out of the box with the main dependencies. Some features require additional packages that may need system-level libraries:

| Package | Feature | Notes |
|
||||
| ------------------------------------- | ------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `rembg` | Remove Background | Installs ONNX Runtime (~500 MB). The app works without it and shows a helpful message if missing. |
|
||||
| `rembg[cpu]` | Remove Background | Installs rembg with the CPU ONNX Runtime backend. The app works without it and shows a helpful message if missing or incomplete. |
|
||||
| `pyzbar` | Read QR Code | Requires the [ZBar](https://github.com/NaturalHistoryMuseum/pyzbar#installation) shared library on your system. |
|
||||
| `pdf2docx` | PDF to Word | Pure Python, but conversion quality depends on PDF complexity. |
|
||||
| `LibreOffice` (external) | Word/HTML/Excel/PowerPoint to PDF, editable PDF to PowerPoint | Recommended for high-fidelity document/layout conversion. The app detects common install paths and uses an isolated temporary profile per conversion. |
|
||||
| `pdf2docx` | PDF to Word Layout mode | Pure Python, but conversion quality depends on PDF complexity. |
|
||||
| `pdfplumber` | PDF to Excel | Optional table extractor used before the built-in PyMuPDF fallback when available. |
|
||||
| `pytesseract` | Image to Text (OCR), OCR PDF | Requires the [Tesseract](https://github.com/tesseract-ocr/tesseract) binary installed on your system. For non-English OCR, download the matching `*.traineddata` language pack into your Tesseract `tessdata` folder. |
|
||||
| `ezdxf` + `matplotlib` | CAD to PDF/Image | Renders DXF drawings. For DWG support, also install the free [ODA File Converter](https://www.opendesign.com/guestfiles/oda_file_converter) and make sure it's on your `PATH`. |
|
||||
| `ffmpeg` (external) | All Audio & Video tools | Requires the [FFmpeg](https://ffmpeg.org/download.html) binary on your `PATH`. Each media tool page shows a green banner if FFmpeg is detected, with install instructions if not. |
|
||||
| `pillow-heif` | HEIC/HEIF image support | Enables iPhone `.heic` / `.heif` inputs across image tools. |
|
||||
| `openai-whisper` | Speech to Text | Local Whisper transcription. First use of a model may download model weights unless already cached locally. |
|
||||
| `pytest` | Test suite | Used for route and fidelity tests; optional for normal app use. |
|
||||
| `sqlparse`, `croniter`, `jsonpath-ng` | SQL Formatter, Cron Parser, JSONPath Tester | Small pure-Python packages included in `requirements.txt`. Everything else under _Developer Utilities_ runs entirely in the browser. |

If you only need the core tools, install the minimal set:
Dependency files are split for easier setup:

- `requirements-core.txt` starts the app and enables the primary built-in tools.
- `requirements-optional.txt` adds optional local packages for heavier/specialized tools.
- `requirements-dev.txt` adds test tooling.
- `requirements.txt` includes all of the above for full manual/developer installs.

If you only need the core tools, install:

```bash
pip install Flask Pillow PyMuPDF "qrcode[pil]" markdown reportlab img2pdf python-docx openpyxl xlrd
pip install -r requirements-core.txt
```

### Enabling DWG support (ODA File Converter)
@@ -294,8 +343,14 @@ DXF files work out of the box once you install `ezdxf` and `matplotlib`. For **D
|
||||
your-everyday-tools/
|
||||
├── app.py # Flask app, tool registry, blueprint registration
|
||||
├── requirements.txt
|
||||
├── requirements-core.txt
|
||||
├── requirements-optional.txt
|
||||
├── requirements-dev.txt
|
||||
├── utils/
|
||||
│ └── file_utils.py # Shared helpers (ZIP creation, file validation)
|
||||
│ ├── file_utils.py # Shared helpers (ZIP creation, file validation)
|
||||
│ └── capabilities.py # Local engine detection + conversion metadata helpers
|
||||
├── scripts/
|
||||
│ └── launcher.py # Cross-platform one-click setup/start helper
|
||||
├── routes/
|
||||
│ ├── convert_tools.py # Document conversion endpoints
|
||||
│ ├── pdf_tools.py # PDF manipulation endpoints
|
||||
@@ -307,7 +362,8 @@ your-everyday-tools/
|
||||
│ ├── spreadsheet_tools.py # Excel / CSV / JSON workbook tools
|
||||
│ ├── dev_tools.py # Developer utilities (UUID/JWT/UA/formatters/cron/jsonpath)
|
||||
│ ├── archive_tools.py # ZIP create / extract / info
|
||||
│ └── media_tools.py # FFmpeg-powered audio & video tools
|
||||
│ ├── media_tools.py # FFmpeg-powered audio & video tools
|
||||
│ └── capabilities.py # /capabilities endpoint
|
||||
├── templates/
|
||||
│ ├── base.html # Main layout (sidebar + content area)
|
||||
│ ├── index.html # Home page with tool cards
|
||||
@@ -336,7 +392,8 @@ your-everyday-tools/
|
||||
│ ├── password_generator.html
|
||||
│ └── hash_generator.html
|
||||
└── static/
|
||||
├── css/style.css # All styles (~400 lines, no framework)
|
||||
├── css/style.css # All styles, no framework
|
||||
├── css/icons.css # Local icon shim; no CDN required
|
||||
└── js/main.js # File upload, AJAX, sidebar, shared logic
|
||||
```
|
||||
|
||||
@@ -344,9 +401,9 @@ your-everyday-tools/
|
||||
|
||||
- **One universal template** — `upload_tool.html` powers all 25+ server-side tools. Each route passes title, description, accepted file types, and form options as template variables. No per-tool template duplication.
|
||||
- **Client-side tools** (text utilities, calculators, security tools) run entirely in the browser with vanilla JavaScript — zero server round-trips.
|
||||
- **In-memory processing** — all file operations use `BytesIO`. No temporary files are written to disk.
|
||||
- **No CSS framework** — custom CSS with CSS Grid, Flexbox, and CSS custom properties. The only external resource is Bootstrap Icons via CDN (~100 KB) for the icon set.
|
||||
- **Graceful degradation** — heavy optional packages (`rembg`, `pyzbar`, `pdf2docx`, `pytesseract`) and external binaries (ODA File Converter, FFmpeg) are probed at import time via `importlib` / `shutil.which`. If missing, the affected tool shows a clear install instruction instead of crashing.
|
||||
- **Local-first processing** — pure browser tools never leave the page; server routes process files locally. Some engines such as LibreOffice, FFmpeg, ODA, and pdf2docx use isolated temporary directories when their CLI/library workflow requires files.
|
||||
- **No CSS framework or CDN dependency** — custom CSS with CSS Grid, Flexbox, CSS custom properties, and a local icon shim.
|
||||
- **Graceful degradation** — optional packages and external binaries (`LibreOffice`, `FFmpeg`, `ffprobe`, `Tesseract`, ODA File Converter, `rembg`, `pyzbar`, `pdf2docx`, `pdfplumber`, `pytesseract`, `pillow-heif`, Whisper, etc.) are reported through `/capabilities` and tool-page status banners. Missing high-fidelity engines either show a clear unavailable state or require explicit basic fallback consent.
|
||||
|
||||
---
|
||||
|
||||
@@ -379,4 +436,3 @@ waitress-serve --port=8000 app:app
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -1,4 +1,3 @@
import os
from flask import Flask, render_template

app = Flask(__name__)
@@ -217,6 +216,7 @@ from routes.spreadsheet_tools import bp as spreadsheet_bp
from routes.dev_tools import bp as dev_bp
from routes.archive_tools import bp as archive_bp
from routes.media_tools import bp as media_bp
from routes.capabilities import bp as capabilities_bp

app.register_blueprint(convert_bp, url_prefix="/convert")
app.register_blueprint(pdf_bp, url_prefix="/pdf")
@@ -229,6 +229,7 @@ app.register_blueprint(spreadsheet_bp, url_prefix="/spreadsheet")
app.register_blueprint(dev_bp, url_prefix="/dev")
app.register_blueprint(archive_bp, url_prefix="/archive")
app.register_blueprint(media_bp, url_prefix="/media")
app.register_blueprint(capabilities_bp)

if __name__ == "__main__":
    app.run(debug=True, port=5000)

@@ -0,0 +1,18 @@
# Required for the app to start and for the primary built-in tools.
Flask
Pillow
PyMuPDF
qrcode[pil]
python-barcode
markdown
reportlab
svglib
img2pdf
python-docx
python-pptx
openpyxl
xlrd
sqlparse
croniter
jsonpath-ng
cryptography
@@ -0,0 +1,2 @@
# Developer/test tools. Not required for normal one-click app startup.
pytest
@@ -0,0 +1,11 @@
# Optional local packages. The launcher installs these best-effort so one
# missing wheel or heavy dependency does not prevent the app from starting.
rembg[cpu]
pyzbar
pdf2docx
pdfplumber
pytesseract
ezdxf
matplotlib
pillow-heif
openai-whisper
+8
-28
@@ -1,28 +1,8 @@
# Core
Flask
Pillow
PyMuPDF
qrcode[pil]
python-barcode
markdown
reportlab
svglib
img2pdf
python-docx
python-pptx
openpyxl
xlrd
sqlparse
croniter
jsonpath-ng
cryptography

# Optional (app works without these, shows install message if missing)
rembg
pyzbar
pdf2docx
pytesseract
ezdxf
matplotlib
pillow-heif
openai-whisper
# Full install for developers/manual setup.
#
# The one-click launchers install requirements-core.txt first, then install
# requirements-optional.txt best-effort so optional package failures do not
# prevent the app from starting.
-r requirements-core.txt
-r requirements-optional.txt
-r requirements-dev.txt

@@ -0,0 +1,10 @@
from flask import Blueprint, jsonify

from utils.capabilities import get_capabilities

bp = Blueprint("capabilities", __name__)


@bp.route("/capabilities")
def capabilities():
    return jsonify(get_capabilities())
+265
-153
@@ -1,7 +1,7 @@
import io
import fitz  # PyMuPDF
import importlib.util
from flask import Blueprint, render_template, request, send_file, jsonify
from PIL import Image
from PIL import Image, ImageOps
import img2pdf
from docx import Document as DocxDocument
from reportlab.lib.pagesizes import A4
@@ -10,6 +10,9 @@ from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
from reportlab.lib import colors
from reportlab.lib.enums import TA_LEFT, TA_CENTER, TA_RIGHT
from utils.pymupdf import import_pymupdf

fitz = import_pymupdf()

try:
    from pdf2docx import Converter as Pdf2DocxConverter
@@ -17,13 +20,9 @@ try:
except ImportError:
    HAS_PDF2DOCX = False

# Marker is loaded lazily inside the route to avoid the ~2GB model preload
# on server start. We only check importability here.
try:
    import marker  # type: ignore
    HAS_MARKER = True
except ImportError:
    HAS_MARKER = False
# Marker is loaded lazily inside the route to avoid heavy model/module work
# on server start. We only check package presence here.
HAS_MARKER = importlib.util.find_spec("marker") is not None

try:
    import pytesseract
@@ -32,64 +31,28 @@ except ImportError:
    HAS_TESSERACT = False

try:
    import ezdxf
    from ezdxf.addons.drawing import RenderContext, Frontend
    from ezdxf.addons.drawing.matplotlib import MatplotlibBackend
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    HAS_EZDXF = True
    import pdfplumber
    HAS_PDFPLUMBER = True
except ImportError:
    HAS_EZDXF = False
    HAS_PDFPLUMBER = False

HAS_EZDXF = (
    importlib.util.find_spec("ezdxf") is not None
    and importlib.util.find_spec("matplotlib") is not None
)

import shutil
from routes._helpers import safe_int, safe_float, log_error, NO_FILE_SINGLE, NO_FILE_MULTIPLE
from utils.capabilities import (
    QUALITY_BASIC,
    QUALITY_HIGH,
    find_soffice,
    set_conversion_metadata,
    soffice_convert,
)
import shutil

ODA_CONVERTER = shutil.which("ODAFileConverter") or shutil.which("oda_file_converter")
def _find_soffice() -> str | None:
    """Detect LibreOffice. PATH first, then common per-OS install locations.

    Most users — especially on Windows — install LibreOffice via the regular
    installer but never add it to PATH, so `shutil.which` fails to find it
    and the app silently falls back to a low-fidelity converter.
    """
    found = shutil.which("soffice") or shutil.which("libreoffice")
    if found:
        return found

    import os
    import sys

    candidates: list[str] = []
    if sys.platform == "win32":
        program_files = [
            os.environ.get("ProgramFiles", r"C:\Program Files"),
            os.environ.get("ProgramFiles(x86)", r"C:\Program Files (x86)"),
            os.environ.get("ProgramW6432", r"C:\Program Files"),
        ]
        for pf in program_files:
            if pf:
                candidates.append(os.path.join(pf, "LibreOffice", "program", "soffice.exe"))
                candidates.append(os.path.join(pf, "LibreOffice", "program", "soffice.com"))
    elif sys.platform == "darwin":
        candidates.append("/Applications/LibreOffice.app/Contents/MacOS/soffice")
    else:  # linux / other unix
        candidates.extend([
            "/usr/bin/soffice",
            "/usr/bin/libreoffice",
            "/usr/local/bin/soffice",
            "/usr/local/bin/libreoffice",
            "/opt/libreoffice/program/soffice",
            "/snap/bin/libreoffice",
        ])

    for c in candidates:
        if c and os.path.isfile(c):
            return c
    return None


SOFFICE = _find_soffice()
SOFFICE = find_soffice()

try:
    from pptx import Presentation
@@ -101,6 +64,16 @@ except ImportError:
bp = Blueprint("convert", __name__)


def _load_cad_modules():
    import ezdxf
    from ezdxf.addons.drawing import RenderContext, Frontend
    from ezdxf.addons.drawing.matplotlib import MatplotlibBackend
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    return ezdxf, RenderContext, Frontend, MatplotlibBackend, plt


# ── LibreOffice availability note (PPT/ODP/DOC conversion) ──────

def _soffice_available_notes():
@@ -131,41 +104,8 @@ def _soffice_available_notes():

def _soffice_convert(file_data: bytes, source_ext: str, target_ext: str = "pdf",
                     timeout: int = 180):
    """Run LibreOffice's headless converter on the given bytes.

    Returns the converted file as bytes on success, or None if soffice is not
    available / the conversion failed (caller falls back to a different engine).

    `source_ext` is used for the temp filename (e.g. "docx", "html", "xlsx").
    """
    if not SOFFICE:
        return None
    import os
    import subprocess
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, f"input.{source_ext}")
        with open(in_path, "wb") as fp:
            fp.write(file_data)
        try:
            proc = subprocess.run(
                [SOFFICE, "--headless", "--convert-to", target_ext,
                 "--outdir", tmp, in_path],
                capture_output=True, timeout=timeout,
            )
        except (subprocess.TimeoutExpired, FileNotFoundError) as e:
            log_error(e, f"soffice {source_ext}->{target_ext}")
            return None
        if proc.returncode != 0:
            err = proc.stderr.decode("utf-8", errors="replace")[:200]
            log_error(RuntimeError(err), f"soffice {source_ext}->{target_ext}")
            return None
        out_path = os.path.join(tmp, f"input.{target_ext}")
        if not os.path.exists(out_path):
            return None
        with open(out_path, "rb") as fp:
            return fp.read()
    """Compatibility wrapper around the shared hardened LibreOffice converter."""
    return soffice_convert(file_data, source_ext, target_ext, timeout)


# ── Page Routes ──────────────────────────────────
@@ -205,7 +145,12 @@ def to_pdf_page():
|
||||
endpoint="/convert/to-pdf",
|
||||
accept=".jpg,.jpeg,.png,.bmp,.tiff,.webp,.txt,.docx,.doc,.odt",
|
||||
multiple=True,
|
||||
options=[])
|
||||
options=[
|
||||
{"type": "checkbox", "name": "use_basic_fallback",
|
||||
"label": "Fallback",
|
||||
"check_label": "Allow basic Python fallback if LibreOffice is unavailable or fails",
|
||||
"default": False},
|
||||
])
|
||||
|
||||
|
||||
@bp.route("/pdf-to-word")
|
||||
@@ -241,10 +186,13 @@ def pdf_to_word_page():
|
||||
{"type": "select", "name": "mode", "label": "Mode", "default": "layout",
|
||||
"choices": [
|
||||
{"value": "layout", "label": "Layout — preserve tables, columns, figures"},
|
||||
{"value": "exact", "label": "Exact visual copy — non-editable page images"},
|
||||
{"value": "structure", "label": "Smart structure — detect headings & lists"},
|
||||
{"value": "text", "label": "Flowing text — clean paragraphs, no structure"},
|
||||
{"value": "marker", "label": "Marker (ML) — best fidelity, slow, needs install"},
|
||||
]},
|
||||
{"type": "number", "name": "exact_dpi", "label": "Exact visual copy DPI",
|
||||
"default": 180, "min": 96, "max": 300, "depends_on": {"mode": "exact"}},
|
||||
{"type": "text", "name": "pages", "label": "Pages (blank = all)",
|
||||
"placeholder": "e.g. 1-3, 5, 8-10"},
|
||||
{"type": "checkbox", "name": "extract_tables",
|
||||
@@ -348,6 +296,12 @@ def pdf_to_excel_page():
|
||||
{"value": "lines", "label": "Lines only (ruled tables)"},
|
||||
{"value": "text", "label": "Text alignment only (borderless tables)"},
|
||||
]},
|
||||
{"type": "select", "name": "table_engine", "label": "Table engine", "default": "auto",
|
||||
"choices": [
|
||||
{"value": "auto", "label": "Auto — pdfplumber if installed, then PyMuPDF"},
|
||||
{"value": "pymupdf", "label": "PyMuPDF built-in"},
|
||||
{"value": "pdfplumber", "label": "pdfplumber (optional, often better on borderless tables)"},
|
||||
]},
|
||||
{"type": "select", "name": "mode", "label": "Extraction mode", "default": "tables",
|
||||
"choices": [
|
||||
{"value": "tables", "label": "Tables only (recommended)"},
|
||||
@@ -514,7 +468,12 @@ def html_to_pdf_page():
|
||||
text_placeholder="<h1>Hello World</h1>\n<p>Paste your HTML here...</p>",
|
||||
accept="",
|
||||
multiple=False,
|
||||
options=[],
|
||||
options=[
|
||||
{"type": "checkbox", "name": "use_basic_fallback",
|
||||
"label": "Fallback",
|
||||
"check_label": "Allow basic PyMuPDF fallback if LibreOffice is unavailable or fails",
|
||||
"default": False},
|
||||
],
|
||||
button_text="Convert to PDF")
|
||||
|
||||
|
||||
@@ -692,10 +651,13 @@ def to_pdf():
|
||||
if not files or not files[0].filename:
|
||||
return jsonify(error=NO_FILE_MULTIPLE), 400
|
||||
|
||||
allow_basic_fallback = request.form.get("use_basic_fallback") == "on"
|
||||
pdf_doc = fitz.open()
|
||||
# Track which engine ran on Word docs so the response can advertise it
|
||||
# (helps users diagnose "why is my output low-fidelity" without log access).
|
||||
word_engine_used: str | None = None
|
||||
engine_used = "pymupdf"
|
||||
quality = QUALITY_HIGH
|
||||
warnings: list[str] = []
|
||||
|
||||
for f in files:
|
||||
name = f.filename.lower()
|
||||
@@ -709,8 +671,14 @@ def to_pdf():
|
||||
try:
|
||||
pdf_bytes = _soffice_convert(data, ext, "pdf")
|
||||
if pdf_bytes is not None:
|
||||
word_engine_used = "libreoffice"
|
||||
engine_used = "libreoffice"
|
||||
quality = QUALITY_HIGH
|
||||
else:
|
||||
if not allow_basic_fallback:
|
||||
return jsonify(error=(
|
||||
f"High-fidelity conversion for '{f.filename}' requires LibreOffice. "
|
||||
"Tick 'Allow basic Python fallback' to continue with lower layout fidelity."
|
||||
)), 400
|
||||
if ext != "docx":
|
||||
return jsonify(error=(
|
||||
f"'{f.filename}' requires LibreOffice (soffice) on PATH. "
|
||||
@@ -718,7 +686,11 @@ def to_pdf():
|
||||
"Install LibreOffice for full layout fidelity."
|
||||
)), 400
|
||||
pdf_bytes = _docx_to_pdf(data)
|
||||
word_engine_used = "fallback"
|
||||
engine_used = "python-docx/reportlab"
|
||||
quality = QUALITY_BASIC
|
||||
warnings.append(
|
||||
"Word document used basic fallback; headers, custom layout, and precise positioning may differ."
|
||||
)
|
||||
with fitz.open(stream=pdf_bytes, filetype="pdf") as docx_pdf:
|
||||
pdf_doc.insert_pdf(docx_pdf)
|
||||
except Exception as e:
|
||||
@@ -734,6 +706,7 @@ def to_pdf():
|
||||
# Image → PDF page
|
||||
try:
|
||||
with Image.open(io.BytesIO(data)) as pil_img:
|
||||
pil_img = ImageOps.exif_transpose(pil_img)
|
||||
if pil_img.mode in ("RGBA", "P"):
|
||||
pil_img = pil_img.convert("RGB")
|
||||
buf = io.BytesIO()
|
||||
@@ -755,9 +728,7 @@ def to_pdf():
|
||||
|
||||
resp = send_file(output, mimetype="application/pdf",
|
||||
as_attachment=True, download_name="converted.pdf")
|
||||
if word_engine_used:
|
||||
resp.headers["X-Conversion-Engine"] = word_engine_used
|
||||
return resp
|
||||
return set_conversion_metadata(resp, engine_used, quality, warnings)
|
||||
|
||||
|
||||
@bp.route("/pdf-to-word", methods=["POST"])
|
||||
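The `to_pdf` changes above gate the basic fallback behind an explicit opt-in. A minimal sketch of that decision order follows. The function name `word_to_pdf_plan` and its return values are hypothetical and do not appear in the commit; only the ordering of the checks and the engine labels are taken from the diff.

```python
# Illustrative sketch only: `word_to_pdf_plan` is a hypothetical helper,
# not a function from the repository.

def word_to_pdf_plan(soffice_ok: bool, form_value, ext: str):
    """Return (action, detail) for a Word-like upload.

    soffice_ok  -- True when the LibreOffice conversion returned bytes
    form_value  -- raw value of the "use_basic_fallback" form field
    ext         -- lowercase file extension without the dot
    """
    # An HTML checkbox without a value attribute submits "on" when ticked
    # and is absent from the form when unticked.
    allow_basic_fallback = form_value == "on"

    if soffice_ok:
        return "convert", "libreoffice"
    if not allow_basic_fallback:
        return "reject", "fallback not allowed"
    if ext != "docx":
        # In the diff, only .docx has a pure-Python fallback path.
        return "reject", "requires LibreOffice"
    return "convert", "python-docx/reportlab"


if __name__ == "__main__":
    print(word_to_pdf_plan(False, "on", "docx"))
```

The point of the ordering is that a missing LibreOffice never silently degrades output: the caller has to tick the box before the lower-fidelity engine runs.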
@@ -798,8 +769,10 @@ def pdf_to_word():
|
||||
except Exception as e:
|
||||
log_error(e, "pdf-to-word text")
|
||||
return jsonify(error="Could not extract text from the PDF (it may be a scan — try OCR PDF first)."), 400
|
||||
return send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
resp = send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
as_attachment=True, download_name=f"{base}.docx")
|
||||
return set_conversion_metadata(resp, "pymupdf/python-docx", QUALITY_BASIC,
|
||||
"Flowing text prioritizes clean editable text over visual layout.")
|
||||
|
||||
if mode == "structure":
|
||||
try:
|
||||
@@ -809,8 +782,22 @@ def pdf_to_word():
|
||||
except Exception as e:
|
||||
log_error(e, "pdf-to-word structure")
|
||||
return jsonify(error="Smart-structure analysis failed. Try Flowing text mode instead."), 400
|
||||
return send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
resp = send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
as_attachment=True, download_name=f"{base}.docx")
|
||||
return set_conversion_metadata(resp, "pymupdf/python-docx", "medium",
|
||||
"Smart structure is editable but drops precise layout, figures, and tables.")
|
||||
|
||||
if mode == "exact":
|
||||
dpi = safe_int(request.form.get("exact_dpi"), 180, min_val=96, max_val=300)
|
||||
try:
|
||||
buf = _pdf_to_docx_exact_visual(pdf_data, target_pages, dpi)
|
||||
except Exception as e:
|
||||
log_error(e, "pdf-to-word exact")
|
||||
return jsonify(error="Exact visual copy failed. The PDF may be corrupted or password-protected."), 400
|
||||
resp = send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
as_attachment=True, download_name=f"{base}.docx")
|
||||
return set_conversion_metadata(resp, "pymupdf/python-docx", QUALITY_HIGH,
|
||||
"Exact visual copy preserves appearance by embedding page images; text is not editable.")
|
||||
|
||||
if mode == "marker":
|
||||
if not HAS_MARKER:
|
||||
@@ -824,8 +811,10 @@ def pdf_to_word():
|
||||
log_error(e, "pdf-to-word marker")
|
||||
return jsonify(error="Marker conversion failed. Check the server log; "
|
||||
"first run downloads ~2 GB and may need extra time."), 400
|
||||
return send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
resp = send_file(io.BytesIO(buf), mimetype=docx_mime,
|
||||
as_attachment=True, download_name=f"{base}.docx")
|
||||
return set_conversion_metadata(resp, "marker-pdf/python-docx", QUALITY_HIGH,
|
||||
"Marker output is editable structured content, not pixel-perfect layout.")
|
||||
|
||||
# ── Layout mode (default) ──────────────────────────────
|
||||
if not HAS_PDF2DOCX:
|
||||
@@ -863,8 +852,10 @@ def pdf_to_word():
|
||||
|
||||
result.seek(0)
|
||||
name = files[0].filename.rsplit(".", 1)[0] + ".docx"
|
||||
return send_file(result, mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
|
||||
resp = send_file(result, mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
|
||||
as_attachment=True, download_name=name)
|
||||
return set_conversion_metadata(resp, "pdf2docx", "medium",
|
||||
"Layout mode is editable but PDF-to-Word conversion is inherently lossy.")
|
||||
|
||||
|
||||
# ── PDF → Word helpers (one per non-pdf2docx mode) ─────────
|
||||
@@ -895,6 +886,48 @@ def _pdf_to_docx_flowing_text(pdf_data: bytes, target_pages: list[int]) -> bytes
     return buf.getvalue()
 
 
+def _pdf_to_docx_exact_visual(pdf_data: bytes, target_pages: list[int], dpi: int) -> bytes:
+    """Render PDF pages into a DOCX as full-page images.
+
+    This mode is intentionally non-editable. It exists for users who care more
+    about visual fidelity than editable Word content.
+    """
+    from docx import Document as DocxDocument
+    from docx.shared import Inches
+
+    with fitz.open(stream=pdf_data, filetype="pdf") as src:
+        doc = DocxDocument()
+        section = doc.sections[0]
+        section.top_margin = Inches(0)
+        section.bottom_margin = Inches(0)
+        section.left_margin = Inches(0)
+        section.right_margin = Inches(0)
+
+        mat = fitz.Matrix(dpi / 72, dpi / 72)
+        first = True
+        for pno in target_pages:
+            page = src[pno]
+            page_w_in = page.rect.width / 72
+            page_h_in = page.rect.height / 72
+            if first:
+                section.page_width = Inches(page_w_in)
+                section.page_height = Inches(page_h_in)
+                first = False
+            else:
+                doc.add_page_break()
+
+            pix = page.get_pixmap(matrix=mat, alpha=False)
+            png_bytes = pix.tobytes("png")
+            paragraph = doc.add_paragraph()
+            paragraph.paragraph_format.space_before = 0
+            paragraph.paragraph_format.space_after = 0
+            paragraph.add_run().add_picture(io.BytesIO(png_bytes), width=Inches(page_w_in))
+
+    buf = io.BytesIO()
+    doc.save(buf)
+    return buf.getvalue()
+
+
def _pdf_to_docx_smart_structure(pdf_data: bytes, target_pages: list[int]) -> bytes:
|
||||
"""Detect headings (by font size), bullet/numbered lists (by line prefix),
|
||||
and paragraphs. Emit a .docx with proper Word heading and list styles.
|
||||
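The exact-visual helper above scales each page by `dpi / 72` because PDF page sizes are expressed in points (1/72 inch). A minimal sketch of that arithmetic, using hypothetical helper names (`zoom_for_dpi`, `approx_pixel_size`, `page_size_inches`) that are not part of the commit. The real pixmap size is decided by PyMuPDF and may differ by a pixel because of rounding.

```python
# Illustrative only: these helpers are hypothetical and not in the repository.
POINTS_PER_INCH = 72  # PDF user-space units per inch


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor a render matrix needs to reach the requested DPI."""
    return dpi / POINTS_PER_INCH


def approx_pixel_size(width_pt: float, height_pt: float, dpi: int):
    """Approximate raster size (width, height) for a page given in points."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


def page_size_inches(width_pt: float, height_pt: float):
    """Physical page size, as used for the Word section and picture width."""
    return width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH


if __name__ == "__main__":
    # US Letter is 612 x 792 pt; 180 is the route's default exact_dpi.
    print(approx_pixel_size(612, 792, 180))
    print(page_size_inches(612, 792))
```

Raising the DPI grows the embedded PNG quadratically while the picture's size on the Word page stays fixed at the page width in inches, which is presumably why the route clamps `exact_dpi` to 96 through 300.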
@@ -1167,21 +1200,41 @@ def pdf_to_excel():
|
||||
strategy = request.form.get("strategy", "auto")
|
||||
if strategy not in ("auto", "lines", "text"):
|
||||
strategy = "auto"
|
||||
table_engine = request.form.get("table_engine", "auto")
|
||||
if table_engine not in ("auto", "pymupdf", "pdfplumber"):
|
||||
table_engine = "auto"
|
||||
if table_engine == "pdfplumber" and not HAS_PDFPLUMBER:
|
||||
return jsonify(error="pdfplumber is not installed. Install it or choose Auto/PyMuPDF."), 400
|
||||
pages_spec = request.form.get("pages", "").strip()
|
||||
pdf_data = files[0].read()
|
||||
|
||||
try:
|
||||
doc = fitz.open(stream=files[0].read(), filetype="pdf")
|
||||
doc = fitz.open(stream=pdf_data, filetype="pdf")
|
||||
except Exception as e:
|
||||
log_error(e, "pdf-to-excel open")
|
||||
return jsonify(error="Could not open PDF (the file may be corrupted or password-protected)."), 400
|
||||
|
||||
plumber_doc = None
|
||||
if table_engine in ("auto", "pdfplumber") and HAS_PDFPLUMBER:
|
||||
try:
|
||||
plumber_doc = pdfplumber.open(io.BytesIO(pdf_data))
|
||||
except Exception as e:
|
||||
log_error(e, "pdf-to-excel pdfplumber open")
|
||||
if table_engine == "pdfplumber":
|
||||
doc.close()
|
||||
return jsonify(error="pdfplumber could not open this PDF. Try Auto or PyMuPDF."), 400
|
||||
|
||||
try:
|
||||
target_pages = parse_page_ranges(pages_spec, len(doc))
|
||||
except (ValueError, IndexError):
|
||||
doc.close()
|
||||
if plumber_doc:
|
||||
plumber_doc.close()
|
||||
return jsonify(error="Invalid page range. Use e.g. '1-3, 5, 8-10'."), 400
|
||||
if not target_pages:
|
||||
doc.close()
|
||||
if plumber_doc:
|
||||
plumber_doc.close()
|
||||
return jsonify(error="No valid pages selected."), 400
|
||||
|
||||
wb = Workbook()
|
||||
@@ -1189,6 +1242,8 @@ def pdf_to_excel():
|
||||
used_names: set[str] = set()
|
||||
total_tables = 0
|
||||
total_text_pages = 0
|
||||
warnings: list[str] = []
|
||||
table_engines_used: set[str] = set()
|
||||
|
||||
def _safe_name(base: str) -> str:
|
||||
name = re.sub(r"[\[\]\*\?\/\\:]", "_", base)[:31] or "Sheet"
|
||||
@@ -1253,6 +1308,50 @@ def pdf_to_excel():
|
||||
log_error(e, f"find_tables strategy={strategy}")
|
||||
return []
|
||||
 
+    def _clean_table_rows(rows) -> list[list[str]]:
+        cleaned = []
+        for row in rows or []:
+            normalized = ["" if cell is None else str(cell) for cell in row]
+            if any(cell.strip() for cell in normalized):
+                cleaned.append(normalized)
+        return cleaned
+
+    def _pymupdf_table_rows(page) -> list[list[list[str]]]:
+        rows_list = []
+        for table in _find_tables_robust(page):
+            try:
+                rows = _clean_table_rows(table.extract())
+            except Exception as e:
+                log_error(e, "pdf-to-excel table extract")
+                continue
+            if rows:
+                rows_list.append(rows)
+        if rows_list:
+            table_engines_used.add("pymupdf")
+        return rows_list
+
+    def _table_rows_for_page(page, pno: int) -> list[list[list[str]]]:
+        if plumber_doc is not None:
+            try:
+                rows_list = [
+                    rows for rows in
+                    (_clean_table_rows(rows) for rows in (plumber_doc.pages[pno].extract_tables() or []))
+                    if rows
+                ]
+            except Exception as e:
+                log_error(e, f"pdfplumber extract page {pno + 1}")
+                rows_list = []
+            if rows_list:
+                table_engines_used.add("pdfplumber")
+                return rows_list
+            if table_engine == "pdfplumber":
+                return []
+
+        rows_list = _pymupdf_table_rows(page)
+        if rows_list and plumber_doc is not None and table_engine == "auto":
+            warnings.append("pdfplumber found no table on at least one page; PyMuPDF fallback was used.")
+        return rows_list
+
# ── "combined" — stream everything into a single sheet ────────────
|
||||
if organize == "combined":
|
||||
ws = wb.create_sheet(_safe_name("Extracted"))
|
||||
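The per-page engine selection added in this hunk tries pdfplumber first and falls back to PyMuPDF only in Auto mode. The sketch below restates that control flow with plain callables in place of the real libraries. The names `clean_rows` and `tables_for_page` and the `"primary"`/`"fallback"` labels are assumptions for illustration, not identifiers from the commit.

```python
# Illustrative sketch: extractors are plain callables standing in for
# pdfplumber (primary) and PyMuPDF (fallback). Names are hypothetical.

def clean_rows(rows):
    """Stringify cells, map None to "", and drop rows that are entirely blank."""
    cleaned = []
    for row in rows or []:
        normalized = ["" if cell is None else str(cell) for cell in row]
        if any(cell.strip() for cell in normalized):
            cleaned.append(normalized)
    return cleaned


def tables_for_page(pno, engine, primary, fallback):
    """Return (tables, engine_label) for one page.

    engine   -- "auto" or "primary" (the user explicitly chose the primary)
    primary  -- callable(pno) -> list of tables, or None when not installed
    fallback -- callable(pno) -> list of tables
    """
    if primary is not None:
        try:
            tables = [t for t in (clean_rows(r) for r in primary(pno)) if t]
        except Exception:
            tables = []  # a failing primary engine is treated as "found nothing"
        if tables:
            return tables, "primary"
        if engine == "primary":
            return [], None  # explicit choice: do not silently switch engines
    tables = [t for t in (clean_rows(r) for r in fallback(pno)) if t]
    return tables, ("fallback" if tables else None)


if __name__ == "__main__":
    found = lambda pno: [[["a", None], [None, " "]]]
    nothing = lambda pno: []
    print(tables_for_page(0, "auto", nothing, found))
```

Returning the engine label next to the tables is one way to feed a set like `table_engines_used` without relying on closure state.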
@@ -1262,9 +1361,8 @@ def pdf_to_excel():
|
||||
page_had_content = False
|
||||
|
||||
if mode in ("tables", "tables_text"):
|
||||
tables = _find_tables_robust(page)
|
||||
for t in tables:
|
||||
rows = t.extract()
|
||||
tables = _table_rows_for_page(page, pno)
|
||||
for rows in tables:
|
||||
if not rows:
|
||||
continue
|
||||
ws.cell(row=next_row, column=1, value=f"Page {pno + 1} – table").font = Font(bold=True, italic=True)
|
||||
@@ -1277,6 +1375,7 @@ def pdf_to_excel():
|
||||
if mode == "text" or (mode == "tables_text" and not page_had_content):
|
||||
text_rows = _text_rows(page)
|
||||
if text_rows:
|
||||
table_engines_used.add("pymupdf")
|
||||
ws.cell(row=next_row, column=1, value=f"Page {pno + 1} – text").font = Font(bold=True, italic=True)
|
||||
next_row += 1
|
||||
next_row = _write_rows(ws, text_rows, start_row=next_row, header=False)
|
||||
@@ -1290,8 +1389,7 @@ def pdf_to_excel():
|
||||
tables_rows = [] # list of (label, rows)
|
||||
|
||||
if mode in ("tables", "tables_text"):
|
||||
for tidx, t in enumerate(_find_tables_robust(page), start=1):
|
||||
rows = t.extract()
|
||||
for tidx, rows in enumerate(_table_rows_for_page(page, pno), start=1):
|
||||
if rows:
|
||||
tables_rows.append((f"Table {tidx}", rows))
|
||||
total_tables += 1
|
||||
@@ -1299,6 +1397,7 @@ def pdf_to_excel():
|
||||
if mode == "text" or (mode == "tables_text" and not tables_rows):
|
||||
text_rows = _text_rows(page)
|
||||
if text_rows:
|
||||
table_engines_used.add("pymupdf")
|
||||
tables_rows.append(("Text", text_rows))
|
||||
total_text_pages += 1
|
||||
|
||||
@@ -1321,6 +1420,8 @@ def pdf_to_excel():
|
||||
next_row += 1
|
||||
|
||||
doc.close()
|
||||
if plumber_doc:
|
||||
plumber_doc.close()
|
||||
|
||||
if not wb.sheetnames:
|
||||
msg = "No tables found on the selected pages."
|
||||
@@ -1347,12 +1448,15 @@ def pdf_to_excel():
|
||||
output.seek(0)
|
||||
|
||||
name = files[0].filename.rsplit(".", 1)[0] + ".xlsx"
|
||||
return send_file(
|
||||
engine = "+".join(sorted(table_engines_used)) or "pymupdf"
|
||||
quality = "medium" if total_tables else QUALITY_BASIC
|
||||
resp = send_file(
|
||||
output,
|
||||
mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
|
||||
as_attachment=True,
|
||||
download_name=name,
|
||||
)
|
||||
return set_conversion_metadata(resp, engine, quality, warnings)
|
||||
|
||||
|
||||
@bp.route("/md-to-pdf", methods=["POST"])
|
||||
@@ -1559,6 +1663,7 @@ def html_to_pdf():
|
||||
html = request.form.get("text", "").strip()
|
||||
if not html:
|
||||
return jsonify(error="Please enter some HTML content."), 400
|
||||
allow_basic_fallback = request.form.get("use_basic_fallback") == "on"
|
||||
|
||||
# Wrap in basic structure if no <html> tag present
|
||||
if "<html" not in html.lower():
|
||||
@@ -1566,8 +1671,19 @@ def html_to_pdf():
 
     # Prefer LibreOffice for proper CSS rendering
     pdf_bytes = _soffice_convert(html.encode("utf-8"), "html", "pdf")
+    engine = "libreoffice"
+    quality = QUALITY_HIGH
+    warnings: list[str] = []
 
     if pdf_bytes is None:
+        if not allow_basic_fallback:
+            return jsonify(error=(
+                "High-fidelity HTML to PDF requires LibreOffice. Tick "
+                "'Allow basic PyMuPDF fallback' to continue with limited CSS/layout support."
+            )), 400
+        engine = "pymupdf"
+        quality = QUALITY_BASIC
+        warnings.append("Basic HTML fallback supports only simple markup and may not preserve CSS/layout.")
         # Fallback: PyMuPDF's minimal HTML rendering
         doc = fitz.open()
         try:
@@ -1588,8 +1704,9 @@ def html_to_pdf():
|
||||
output = io.BytesIO(pdf_bytes)
|
||||
output.seek(0)
|
||||
|
||||
return send_file(output, mimetype="application/pdf",
|
||||
resp = send_file(output, mimetype="application/pdf",
|
||||
as_attachment=True, download_name="converted.pdf")
|
||||
return set_conversion_metadata(resp, engine, quality, warnings)
|
||||
|
||||
|
||||
@bp.route("/ocr-pdf", methods=["POST"])
|
||||
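Every route in this diff now ends with `set_conversion_metadata(resp, engine, quality, warnings)`, where `warnings` is sometimes a list, sometimes a single string, and sometimes omitted. The implementation in `utils/capabilities.py` is not shown here, so the following is only a guess at its shape. `conversion_headers` is a hypothetical stand-in that builds a plain dict rather than touching a Flask response, and the `" | "` separator is an assumption. The header names come from the changelog entry.

```python
# Hypothetical sketch: NOT the repository's set_conversion_metadata().
# It only shows one way to normalise the three call shapes seen in the diff.

def conversion_headers(engine, quality, warnings=None):
    """Build the X-Conversion-* headers as a plain dict."""
    if warnings is None:
        items = []
    elif isinstance(warnings, str):
        items = [warnings]
    else:
        items = [w for w in warnings if w]

    headers = {
        "X-Conversion-Engine": engine,
        "X-Conversion-Quality": quality,
    }
    if items:
        # Header values must stay on one line, so collapse whitespace.
        # The " | " separator is an assumption, not taken from the source.
        headers["X-Fidelity-Warnings"] = " | ".join(" ".join(w.split()) for w in items)
    return headers


if __name__ == "__main__":
    print(conversion_headers("pdf2docx", "medium", "Layout mode is lossy."))
```

A real helper would copy these values onto `resp.headers` and return `resp`, which is what lets the routes write `return set_conversion_metadata(resp, ...)` in one line.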
@@ -1667,16 +1784,19 @@ def cad_to_pdf():
|
||||
|
||||
target = request.form.get("format", "pdf")
|
||||
dpi = safe_int(request.form.get("dpi"), 150, min_val=72, max_val=600)
|
||||
ezdxf, RenderContext, Frontend, MatplotlibBackend, plt = _load_cad_modules()
|
||||
|
||||
filename = files[0].filename
|
||||
ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
|
||||
file_data = files[0].read()
|
||||
engine_used = "ezdxf/matplotlib"
|
||||
|
||||
import tempfile, os, subprocess
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
if ext == "dwg":
|
||||
if not ODA_CONVERTER:
|
||||
return jsonify(error="DWG support requires ODA File Converter. Download it free from https://www.opendesign.com/guestfiles/oda_file_converter and ensure it is on your PATH. Or convert your DWG to DXF first."), 400
|
||||
engine_used = "oda/ezdxf/matplotlib"
|
||||
|
||||
in_dir = os.path.join(tmpdir, "in")
|
||||
out_dir = os.path.join(tmpdir, "out")
|
||||
@@ -1734,14 +1854,26 @@ def cad_to_pdf():
|
||||
fig.savefig(buf, format="pdf", bbox_inches="tight", pad_inches=0.2)
|
||||
plt.close(fig)
|
||||
buf.seek(0)
|
||||
return send_file(buf, mimetype="application/pdf",
|
||||
resp = send_file(buf, mimetype="application/pdf",
|
||||
as_attachment=True, download_name=base_name + ".pdf")
|
||||
return set_conversion_metadata(
|
||||
resp,
|
||||
engine_used,
|
||||
"medium",
|
||||
"CAD rendering may omit or simplify unsupported entities, fonts, and line styles.",
|
||||
)
|
||||
else:
|
||||
fig.savefig(buf, format="png", dpi=dpi, bbox_inches="tight", pad_inches=0.2)
|
||||
plt.close(fig)
|
||||
buf.seek(0)
|
||||
return send_file(buf, mimetype="image/png",
|
||||
resp = send_file(buf, mimetype="image/png",
|
||||
as_attachment=True, download_name=base_name + ".png")
|
||||
return set_conversion_metadata(
|
||||
resp,
|
||||
engine_used,
|
||||
"medium",
|
||||
"CAD rendering may omit or simplify unsupported entities, fonts, and line styles.",
|
||||
)
|
||||
|
||||
|
||||
# ── PDF → PowerPoint ─────────────────────────────────────
|
||||
@@ -1866,8 +1998,9 @@ def pdf_to_pptx():
|
||||
"use features LibreOffice's PDF importer can't handle. Try Image mode instead."
|
||||
)), 400
|
||||
|
||||
return send_file(io.BytesIO(pptx_bytes), mimetype=PPTX_MIME,
|
||||
resp = send_file(io.BytesIO(pptx_bytes), mimetype=PPTX_MIME,
|
||||
as_attachment=True, download_name=f"{base}.pptx")
|
||||
return set_conversion_metadata(resp, "libreoffice", QUALITY_HIGH)
|
||||
|
||||
# ── Image mode (page-image-per-slide) ─────────────────
|
||||
if not HAS_PPTX:
|
||||
@@ -1921,8 +2054,14 @@ def pdf_to_pptx():
|
||||
finally:
|
||||
doc.close()
|
||||
|
||||
return send_file(output, mimetype=PPTX_MIME,
|
||||
resp = send_file(output, mimetype=PPTX_MIME,
|
||||
as_attachment=True, download_name=f"{base}.pptx")
|
||||
return set_conversion_metadata(
|
||||
resp,
|
||||
"pymupdf/python-pptx",
|
||||
QUALITY_HIGH,
|
||||
"Image mode preserves appearance but slides are not editable.",
|
||||
)
|
||||
|
||||
|
||||
# ── PowerPoint → PDF ─────────────────────────────────────
|
||||
@@ -1942,10 +2081,7 @@ def pptx_to_pdf_page():
|
||||
|
||||
@bp.route("/pptx-to-pdf", methods=["POST"])
|
||||
def pptx_to_pdf():
|
||||
import os
|
||||
import tempfile
|
||||
import subprocess
|
||||
from routes._helpers import log_error, NO_FILE_SINGLE
|
||||
from routes._helpers import NO_FILE_SINGLE
|
||||
|
||||
if not SOFFICE:
|
||||
return jsonify(error="LibreOffice (soffice) is not installed or not on PATH. Install LibreOffice and restart the server."), 400
|
||||
@@ -1959,35 +2095,11 @@ def pptx_to_pdf():
|
||||
if ext not in ("pptx", "ppt", "odp"):
|
||||
return jsonify(error="Unsupported format. Upload .pptx, .ppt, or .odp."), 400
|
||||
|
||||
safe_name = "input." + ext
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
in_path = os.path.join(tmp, safe_name)
|
||||
f.save(in_path)
|
||||
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
[SOFFICE, "--headless", "--convert-to", "pdf",
|
||||
"--outdir", tmp, in_path],
|
||||
capture_output=True, timeout=300,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
return jsonify(error="Conversion timed out (file is too complex or too large)."), 400
|
||||
except Exception as e:
|
||||
log_error(e, "pptx-to-pdf subprocess")
|
||||
return jsonify(error="LibreOffice failed to launch."), 400
|
||||
|
||||
if proc.returncode != 0:
|
||||
err = proc.stderr.decode("utf-8", errors="replace")[:200] or "unknown error"
|
||||
log_error(RuntimeError(err), "pptx-to-pdf")
|
||||
data = soffice_convert(f.read(), ext, "pdf", timeout=300)
|
||||
if data is None:
|
||||
return jsonify(error="LibreOffice could not convert this file."), 400
|
||||
|
||||
out_pdf = os.path.join(tmp, "input.pdf")
|
||||
if not os.path.exists(out_pdf):
|
||||
return jsonify(error="Conversion produced no output (file may be corrupted or empty)."), 400
|
||||
|
||||
with open(out_pdf, "rb") as fp:
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(io.BytesIO(data), mimetype="application/pdf",
|
||||
resp = send_file(io.BytesIO(data), mimetype="application/pdf",
|
||||
as_attachment=True, download_name=f"{base}.pdf")
|
||||
return set_conversion_metadata(resp, "libreoffice", QUALITY_HIGH)
|
||||
|
||||
+59
-42
@@ -1,15 +1,17 @@
|
||||
import io
|
||||
import importlib.util
|
||||
from flask import Blueprint, render_template, request, send_file, jsonify
|
||||
from PIL import Image, ImageDraw, ImageFont
|
||||
from PIL import Image, ImageDraw, ImageFont, ImageOps
|
||||
from PIL.ExifTags import TAGS
|
||||
|
||||
from routes._helpers import safe_int, safe_float, log_error, NO_FILE_SINGLE
|
||||
from utils.capabilities import QUALITY_BASIC, QUALITY_HIGH, set_conversion_metadata
|
||||
|
||||
try:
|
||||
from rembg import remove as rembg_remove
|
||||
HAS_REMBG = True
|
||||
except ImportError:
|
||||
HAS_REMBG = False
|
||||
HAS_REMBG = (
|
||||
importlib.util.find_spec("rembg") is not None
|
||||
and importlib.util.find_spec("onnxruntime") is not None
|
||||
)
|
||||
REMBG_IMPORT_ERROR = "" if HAS_REMBG else "Install rembg with CPU support: pip install \"rembg[cpu]\""
|
||||
|
||||
try:
|
||||
import pytesseract
|
||||
@@ -38,7 +40,7 @@ def get_pil_image(file):
|
||||
Used by routes that need a single in-memory PIL.Image. Routes that should
|
||||
properly close the image on error paths use _safe_open_image() instead.
|
||||
"""
|
||||
return Image.open(io.BytesIO(file.read()))
|
||||
return ImageOps.exif_transpose(Image.open(io.BytesIO(file.read())))
|
||||
|
||||
|
||||
def _safe_open_image(file):
|
||||
@@ -47,7 +49,7 @@ def _safe_open_image(file):
|
||||
Returns the opened image (caller should close or use as a context manager).
|
||||
"""
|
||||
try:
|
||||
return Image.open(io.BytesIO(file.read()))
|
||||
return ImageOps.exif_transpose(Image.open(io.BytesIO(file.read())))
|
||||
except Exception as e:
|
||||
log_error(e, "Image.open")
|
||||
raise ValueError("Could not read image (file may be corrupted or not an image).")
|
||||
@@ -56,6 +58,7 @@ def _safe_open_image(file):
|
||||
def image_to_bytes(img, fmt, quality=85):
|
||||
buf = io.BytesIO()
|
||||
save_kwargs = {"format": fmt}
|
||||
icc_profile = img.info.get("icc_profile")
|
||||
|
||||
if fmt.upper() == "JPEG":
|
||||
if img.mode in ("RGBA", "P", "LA"):
|
||||
@@ -66,6 +69,8 @@ def image_to_bytes(img, fmt, quality=85):
|
||||
save_kwargs["optimize"] = True
|
||||
elif fmt.upper() == "WEBP":
|
||||
save_kwargs["quality"] = quality
|
||||
if icc_profile and fmt.upper() in ("JPEG", "PNG", "WEBP"):
|
||||
save_kwargs["icc_profile"] = icc_profile
|
||||
|
||||
img.save(buf, **save_kwargs)
|
||||
buf.seek(0)
|
||||
@@ -116,10 +121,9 @@ def compress_page():
|
||||
title="Compress Image",
|
||||
description="Reduce image file size while controlling quality",
|
||||
notes=(
|
||||
'<p><strong>Output is always JPG</strong> regardless of the input format — '
|
||||
'JPG is the most compressible format and best for photos. If you need to keep '
|
||||
'transparency or sharp text/diagrams, use <a href="/image/convert">Convert Format</a> '
|
||||
'with PNG or WebP instead.</p>'
|
||||
'<p><strong>Auto mode</strong> keeps transparency lossless and uses photo compression '
|
||||
'for opaque images. Choose Photo/JPEG for smaller photos, Lossless PNG for diagrams '
|
||||
'or transparent artwork, or WebP for modern lossy compression.</p>'
|
||||
'<p><strong>Quality guide:</strong> 70–80% is the sweet spot for photos (large '
|
||||
'savings, no visible loss). Below 50% you\'ll start seeing JPEG artefacts. '
|
||||
'Above 90% gives diminishing returns.</p>'
|
||||
@@ -128,6 +132,13 @@ def compress_page():
|
||||
accept=IMAGE_ACCEPT,
|
||||
multiple=False,
|
||||
options=[
|
||||
{"type": "select", "name": "compression_mode", "label": "Mode", "default": "auto",
|
||||
"choices": [
|
||||
{"value": "auto", "label": "Auto"},
|
||||
{"value": "photo", "label": "Photo/JPEG"},
|
||||
{"value": "lossless", "label": "Lossless PNG"},
|
||||
{"value": "webp", "label": "WebP"},
|
||||
]},
|
||||
{"type": "range", "name": "quality", "label": "Quality",
|
||||
"default": 70, "min": 10, "max": 100, "step": 5, "suffix": "%"},
|
||||
])
|
||||
@@ -166,7 +177,7 @@ def remove_bg_page():
|
||||
status = (
|
||||
'<p><i class="bi bi-exclamation-triangle-fill" style="color:#ffb703"></i> '
|
||||
'<strong>Background removal is unavailable.</strong> Install with '
|
||||
'<code>pip install rembg</code> and restart the server. First use will '
|
||||
'<code>pip install "rembg[cpu]"</code> and restart the server. First use will '
|
||||
'download the AI model (~170 MB) automatically.</p>'
|
||||
)
|
||||
return render_template("upload_tool.html",
|
||||
@@ -355,28 +366,7 @@ def palette_page():
|
||||
|
||||
@bp.route("/svg-to-png")
|
||||
def svg_to_png_page():
|
||||
return render_template("upload_tool.html",
|
||||
title="SVG to PNG",
|
||||
description="Rasterise an SVG file to a PNG image",
|
||||
notes=(
|
||||
'<p><strong>Renders via <code>svglib</code> + reportlab.</strong> Supports the '
|
||||
'common SVG features: paths, shapes, basic styling, embedded raster images. '
|
||||
'<strong>Limitations:</strong> some advanced SVG 2 features (filters, masks, '
|
||||
'animations, web fonts loaded via <code>@font-face</code>) may render incorrectly '
|
||||
'or not at all. For pixel-perfect rendering of complex SVGs, open the SVG in a '
|
||||
'browser and use Print → Save as PDF, then convert that PDF to PNG.</p>'
|
||||
'<p style="font-size:.9em;color:var(--muted)"><strong>No external dependencies '
|
||||
'beyond <code>svglib</code></strong> (already installed).</p>'
|
||||
),
|
||||
endpoint="/image/svg-to-png",
|
||||
accept=".svg",
|
||||
multiple=False,
|
||||
options=[
|
||||
{"type": "number", "name": "width", "label": "Output width (pixels, 0 = native size)",
|
||||
"default": 0, "min": 0, "max": 8192},
|
||||
{"type": "checkbox", "name": "transparent", "label": "Background",
|
||||
"default": True, "check_label": "Transparent (otherwise white)"},
|
||||
])
|
||||
return render_template("tools/svg_to_png.html")
|
||||
|
||||
|
||||
@bp.route("/svg-optimize")
|
||||
@@ -486,16 +476,35 @@ def compress():
|
||||
return jsonify(error=NO_FILE_SINGLE), 400
|
||||
|
||||
quality = safe_int(request.form.get("quality"), 70, min_val=1, max_val=100)
|
||||
mode = request.form.get("compression_mode", "auto")
|
||||
if mode not in ("auto", "photo", "lossless", "webp"):
|
||||
mode = "auto"
|
||||
try:
|
||||
img = _safe_open_image(files[0])
|
||||
except ValueError as e:
|
||||
return jsonify(error=str(e)), 400
|
||||
|
||||
# Always output as JPEG for best compression
|
||||
buf = image_to_bytes(img, "JPEG", quality=quality)
|
||||
has_alpha = img.mode in ("RGBA", "LA") or (img.mode == "P" and "transparency" in img.info)
|
||||
if mode == "auto":
|
||||
mode = "lossless" if has_alpha else "photo"
|
||||
|
||||
name = files[0].filename.rsplit(".", 1)[0] + "_compressed.jpg"
|
||||
return send_file(buf, mimetype="image/jpeg", as_attachment=True, download_name=name)
|
||||
if mode == "lossless":
|
||||
png_img = img.convert("RGBA") if has_alpha else img.convert("RGB")
|
||||
buf = image_to_bytes(png_img, "PNG", quality=quality)
|
||||
mime, ext, quality_label = "image/png", "png", QUALITY_HIGH
|
||||
elif mode == "webp":
|
||||
buf = image_to_bytes(img, "WEBP", quality=quality)
|
||||
mime, ext, quality_label = "image/webp", "webp", "medium"
|
||||
else:
|
||||
buf = image_to_bytes(img, "JPEG", quality=quality)
|
||||
mime, ext, quality_label = "image/jpeg", "jpg", QUALITY_BASIC
|
||||
|
||||
name = files[0].filename.rsplit(".", 1)[0] + f"_compressed.{ext}"
|
||||
resp = send_file(buf, mimetype=mime, as_attachment=True, download_name=name)
|
||||
warnings = []
|
||||
if mode == "photo" and has_alpha:
|
||||
warnings.append("Photo/JPEG mode flattens transparency.")
|
||||
return set_conversion_metadata(resp, "pillow", quality_label, warnings)
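The mode resolution in the hunk above can be read as a small decision table. The sketch below restates it as a pure function so the branches are easier to follow; `CompressionPlan` and `plan_compression` are hypothetical names introduced only for illustration and are not part of the commit.

```python
from typing import NamedTuple


class CompressionPlan(NamedTuple):
    """Hypothetical container for the values the route derives per mode."""
    mode: str
    pil_format: str
    mimetype: str
    extension: str
    warnings: list


VALID_MODES = ("auto", "photo", "lossless", "webp")


def plan_compression(requested: str, has_alpha: bool) -> CompressionPlan:
    # Unknown modes fall back to "auto", as in the route.
    mode = requested if requested in VALID_MODES else "auto"
    # "auto" keeps transparency by choosing PNG, otherwise JPEG.
    if mode == "auto":
        mode = "lossless" if has_alpha else "photo"

    warnings = []
    if mode == "lossless":
        return CompressionPlan(mode, "PNG", "image/png", "png", warnings)
    if mode == "webp":
        return CompressionPlan(mode, "WEBP", "image/webp", "webp", warnings)
    # Explicit photo mode on an image with alpha loses the alpha channel.
    if has_alpha:
        warnings.append("Photo/JPEG mode flattens transparency.")
    return CompressionPlan(mode, "JPEG", "image/jpeg", "jpg", warnings)
```

Because `auto` already routes images with alpha to PNG, the warning can only appear when the user explicitly selects photo mode for a transparent image.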
|
||||
|
||||
|
||||
@bp.route("/convert", methods=["POST"])
|
||||
@@ -521,7 +530,8 @@ def convert():
|
||||
@bp.route("/remove-bg", methods=["POST"])
|
||||
def remove_bg():
|
||||
if not HAS_REMBG:
|
||||
return jsonify(error="Background removal requires the 'rembg' package. Install with: pip install rembg"), 400
|
||||
detail = f" Details: {REMBG_IMPORT_ERROR[:180]}" if REMBG_IMPORT_ERROR else ""
|
||||
return jsonify(error="Background removal requires rembg with an ONNX Runtime backend. Install with: pip install \"rembg[cpu]\"." + detail), 400
|
||||
|
||||
files = request.files.getlist("files")
|
||||
if not files or not files[0].filename:
|
||||
@@ -529,10 +539,11 @@ def remove_bg():
|
||||
|
||||
input_data = files[0].read()
|
||||
try:
|
||||
from rembg import remove as rembg_remove
|
||||
output_data = rembg_remove(input_data)
|
||||
except Exception as e:
|
||||
log_error(e, "remove_bg")
|
||||
return jsonify(error="Background removal failed (the file may not be a valid image)."), 400
|
||||
return jsonify(error="Background removal failed. If this is a setup issue, install with: pip install \"rembg[cpu]\""), 400
|
||||
|
||||
name = files[0].filename.rsplit(".", 1)[0] + "_nobg.png"
|
||||
return send_file(io.BytesIO(output_data), mimetype="image/png",
|
||||
@@ -997,8 +1008,14 @@ def svg_to_png():
|
||||
out_bytes = png_bytes
|
||||
|
||||
name = files[0].filename.rsplit(".", 1)[0] + ".png"
|
||||
return send_file(io.BytesIO(out_bytes), mimetype="image/png",
|
||||
resp = send_file(io.BytesIO(out_bytes), mimetype="image/png",
|
||||
as_attachment=True, download_name=name)
|
||||
return set_conversion_metadata(
|
||||
resp,
|
||||
"svglib/reportlab",
|
||||
QUALITY_BASIC,
|
||||
"Server fallback may miss advanced SVG filters, masks, animations, and web fonts.",
|
||||
)
|
||||
|
||||
|
||||
@bp.route("/svg-optimize", methods=["POST"])
|
||||
|
||||
+115
-59
@@ -1,10 +1,13 @@
|
||||
import os
|
||||
import json
|
||||
import importlib.util
|
||||
import shutil
|
||||
import subprocess
|
||||
import tempfile
|
||||
from flask import Blueprint, render_template, request, send_file, jsonify
|
||||
|
||||
from routes._helpers import safe_int, safe_float, log_error, NO_FILE_SINGLE
|
||||
from utils.capabilities import QUALITY_HIGH, set_conversion_metadata
|
||||
|
||||
bp = Blueprint("media", __name__)
|
||||
|
||||
@@ -64,6 +67,65 @@ def _save_upload(file_storage, tmpdir: str) -> str:
|
||||
return path
|
||||
|
||||
|
||||
def _probe_media(path: str) -> dict | None:
    if not FFPROBE:
        return None
    try:
        proc = subprocess.run(
            [
                FFPROBE,
                "-v", "error",
                "-print_format", "json",
                "-show_streams",
                "-show_format",
                path,
            ],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except Exception:
        return None
    if proc.returncode != 0:
        return None
    try:
        return json.loads(proc.stdout or "{}")
    except json.JSONDecodeError:
        return None


def _first_codec(probe: dict | None, codec_type: str) -> str | None:
    for stream in (probe or {}).get("streams", []):
        if stream.get("codec_type") == codec_type:
            return stream.get("codec_name")
    return None


def _can_copy_video(probe: dict | None, target_fmt: str) -> bool:
    video = _first_codec(probe, "video")
    audio = _first_codec(probe, "audio")
    if target_fmt == "mp4":
        return video in {"h264", "hevc", "mpeg4"} and (audio in {None, "aac", "mp3", "alac"})
    if target_fmt == "webm":
        return video in {"vp8", "vp9", "av1"} and (audio in {None, "vorbis", "opus"})
    if target_fmt == "mkv":
        return True
    if target_fmt == "mov":
        return video in {"h264", "hevc", "prores"} and (audio in {None, "aac", "pcm_s16le", "alac"})
    return False


def _media_response(data: bytes, *, mimetype: str | None = None,
                    download_name: str, warnings: list[str] | None = None):
    resp = send_file(
        _bytes_io(data),
        mimetype=mimetype,
        as_attachment=True,
        download_name=download_name,
    )
    return set_conversion_metadata(resp, "ffmpeg", QUALITY_HIGH, warnings or [])
|
||||
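To make the stream-copy check easier to reason about, here is a minimal, table-driven sketch of the same decision. It assumes the probe dictionary has the `streams` / `codec_type` / `codec_name` shape that the helpers above read; the `sample_probe` value is hand-written for illustration and is not real ffprobe output, and the function names are hypothetical.

```python
# Container -> (video codecs, audio codecs) treated as safe to stream-copy,
# mirroring the sets used by the helper in the diff.
COPY_COMPATIBLE = {
    "mp4": ({"h264", "hevc", "mpeg4"}, {"aac", "mp3", "alac"}),
    "webm": ({"vp8", "vp9", "av1"}, {"vorbis", "opus"}),
    "mov": ({"h264", "hevc", "prores"}, {"aac", "pcm_s16le", "alac"}),
}


def first_codec(probe, codec_type):
    """Return the codec name of the first stream of the given type, if any."""
    for stream in (probe or {}).get("streams", []):
        if stream.get("codec_type") == codec_type:
            return stream.get("codec_name")
    return None


def can_copy(probe, target_fmt):
    # Matroska is treated as accepting any stream, even without probe data.
    if target_fmt == "mkv":
        return True
    if target_fmt not in COPY_COMPATIBLE:
        return False
    video_ok, audio_ok = COPY_COMPATIBLE[target_fmt]
    video = first_codec(probe, "video")
    audio = first_codec(probe, "audio")
    # A missing audio stream is fine; a missing video stream is not.
    return video in video_ok and (audio is None or audio in audio_ok)


# Hand-written sample in the assumed probe shape.
sample_probe = {
    "streams": [
        {"codec_type": "video", "codec_name": "h264"},
        {"codec_type": "audio", "codec_name": "aac"},
    ]
}
```

With this sample, `can_copy(sample_probe, "mp4")` is true and `can_copy(sample_probe, "webm")` is false, so only the WebM target would need a re-encode.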
|
||||
|
||||
# ── Audio convert ──────────────────────────────────────
|
||||
|
||||
@bp.route("/convert-audio", methods=["GET", "POST"])
|
||||
@@ -128,12 +190,7 @@ def convert_audio():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype=f"audio/{fmt}",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}.{fmt}",
|
||||
)
|
||||
return _media_response(data, mimetype=f"audio/{fmt}", download_name=f"{base}.{fmt}")
|
||||
|
||||
|
||||
# ── Video convert ──────────────────────────────────────
|
||||
@@ -157,6 +214,17 @@ def convert_video():
|
||||
"default": "mp4",
|
||||
"choices": [{"value": f, "label": f.upper()} for f in VIDEO_FORMATS],
|
||||
},
|
||||
{
|
||||
"name": "quality",
|
||||
"label": "Quality",
|
||||
"type": "select",
|
||||
"default": "auto",
|
||||
"choices": [
|
||||
{"value": "auto", "label": "Auto preserve when compatible"},
|
||||
{"value": "high", "label": "High quality re-encode"},
|
||||
{"value": "standard", "label": "Standard re-encode"},
|
||||
],
|
||||
},
|
||||
],
|
||||
button_text="Convert",
|
||||
)
|
||||
@@ -167,16 +235,29 @@ def convert_video():
|
||||
fmt = request.form.get("format", "mp4")
|
||||
if fmt not in VIDEO_FORMATS:
|
||||
return jsonify({"error": "Unsupported target format."}), 400
|
||||
quality = request.form.get("quality", "auto")
|
||||
if quality not in ("auto", "high", "standard"):
|
||||
quality = "auto"
|
||||
warnings = []
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
in_path = _save_upload(f, tmp)
|
||||
out_path = os.path.join(tmp, f"output.{fmt}")
|
||||
|
||||
probe = _probe_media(in_path)
|
||||
args = ["-i", in_path]
|
||||
if quality == "auto" and _can_copy_video(probe, fmt):
|
||||
args += ["-c", "copy", out_path]
|
||||
else:
|
||||
if quality == "auto" and probe is None:
|
||||
warnings.append("ffprobe metadata was unavailable; FFmpeg re-encoded streams instead of attempting stream copy.")
|
||||
elif quality == "auto":
|
||||
warnings.append("Input streams were not compatible with the target container; FFmpeg re-encoded them.")
|
||||
crf = "20" if quality == "high" else "23"
|
||||
if fmt == "webm":
|
||||
args += ["-c:v", "libvpx-vp9", "-c:a", "libopus", out_path]
|
||||
args += ["-c:v", "libvpx-vp9", "-crf", "30" if quality == "standard" else "24", "-b:v", "0", "-c:a", "libopus", out_path]
|
||||
elif fmt == "mp4":
|
||||
args += ["-c:v", "libx264", "-c:a", "aac", "-preset", "medium", out_path]
|
||||
args += ["-c:v", "libx264", "-crf", crf, "-c:a", "aac", "-preset", "medium", out_path]
|
||||
else:
|
||||
args += [out_path]
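The argument selection above mixes removed and added lines, so the resulting behaviour can be hard to read from the diff alone. The following sketch reconstructs what the new code appears to do as a standalone function; `build_video_args` and its keyword parameters are hypothetical, and the warning texts are shortened.

```python
def build_video_args(in_path, out_path, fmt, quality, *, can_copy, probe_available=True):
    """Return (ffmpeg_args, warnings) for a container conversion."""
    args = ["-i", in_path]
    warnings = []

    # "auto" prefers a lossless stream copy when the codecs fit the container.
    if quality == "auto" and can_copy:
        return args + ["-c", "copy", out_path], warnings

    if quality == "auto":
        if probe_available:
            warnings.append("Input streams were not compatible with the target container; re-encoding.")
        else:
            warnings.append("ffprobe metadata was unavailable; re-encoding.")

    if fmt == "webm":
        # Constant-quality VP9: -crf together with -b:v 0.
        crf = "30" if quality == "standard" else "24"
        args += ["-c:v", "libvpx-vp9", "-crf", crf, "-b:v", "0", "-c:a", "libopus"]
    elif fmt == "mp4":
        crf = "20" if quality == "high" else "23"
        args += ["-c:v", "libx264", "-crf", crf, "-c:a", "aac", "-preset", "medium"]
    # Other containers rely on FFmpeg's default encoders.
    return args + [out_path], warnings
```

Note that an explicit `high` or `standard` choice always re-encodes, even when a stream copy would have been possible.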
|
||||
|
||||
@@ -188,12 +269,7 @@ def convert_video():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype=f"video/{fmt}",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}.{fmt}",
|
||||
)
|
||||
return _media_response(data, mimetype=f"video/{fmt}", download_name=f"{base}.{fmt}", warnings=warnings)
|
||||
|
||||
|
||||
# ── Extract audio from video ───────────────────────────
|
||||
@@ -252,12 +328,7 @@ def extract_audio():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype=f"audio/{fmt}",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}.{fmt}",
|
||||
)
|
||||
return _media_response(data, mimetype=f"audio/{fmt}", download_name=f"{base}.{fmt}")
|
||||
|
||||
|
||||
# ── Trim media ─────────────────────────────────────────
|
||||
@@ -322,11 +393,7 @@ def trim():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
as_attachment=True,
|
||||
download_name=f"{base}_trimmed.{ext}",
|
||||
)
|
||||
return _media_response(data, download_name=f"{base}_trimmed.{ext}")
|
||||
|
||||
|
||||
# ── Compress video ─────────────────────────────────────
|
||||
@@ -396,12 +463,7 @@ def compress_video():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype="video/mp4",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}_compressed.mp4",
|
||||
)
|
||||
return _media_response(data, mimetype="video/mp4", download_name=f"{base}_compressed.mp4")
|
||||
|
||||
|
||||
# ── Video to GIF ───────────────────────────────────────
|
||||
@@ -458,12 +520,7 @@ def video_to_gif():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype="image/gif",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}.gif",
|
||||
)
|
||||
return _media_response(data, mimetype="image/gif", download_name=f"{base}.gif")
|
||||
|
||||
|
||||
# ── Subtitle convert / shift ───────────────────────────
|
||||
@@ -618,12 +675,13 @@ def subtitle_convert():
|
||||
|
||||
out_text = _write_srt(cues) if target == "srt" else _write_vtt(cues)
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
resp = send_file(
|
||||
_bytes_io(out_text.encode("utf-8")),
|
||||
mimetype="text/plain",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}.{target}",
|
||||
)
|
||||
return set_conversion_metadata(resp, "python", QUALITY_HIGH)
|
||||
|
||||
|
||||
# ── Burn subtitles ─────────────────────────────────────
|
||||
@@ -710,12 +768,7 @@ def burn_subtitles():
|
||||
data = fp.read()
|
||||
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype="video/mp4",
|
||||
as_attachment=True,
|
||||
download_name=f"{base}_subs.mp4",
|
||||
)
|
||||
return _media_response(data, mimetype="video/mp4", download_name=f"{base}_subs.mp4")
|
||||
|
||||
|
||||
# ── Audio Normalize (FFmpeg loudnorm, EBU R128) ────────
|
||||
@@ -808,24 +861,16 @@ def normalize_audio():
|
||||
mime_map = {"mp3": "audio/mpeg", "wav": "audio/wav", "flac": "audio/flac",
|
||||
"ogg": "audio/ogg", "m4a": "audio/mp4", "opus": "audio/opus"}
|
||||
mime = mime_map.get(out_ext, "application/octet-stream")
|
||||
return send_file(
|
||||
_bytes_io(data),
|
||||
mimetype=mime,
|
||||
as_attachment=True,
|
||||
download_name=f"{base}_normalized.{out_ext}",
|
||||
)
|
||||
return _media_response(data, mimetype=mime, download_name=f"{base}_normalized.{out_ext}")
|
||||
|
||||
|
||||
# ── Speech to Text (Whisper, optional) ──────────────
|
||||
|
||||
try:
|
||||
import whisper as _whisper # type: ignore
|
||||
HAS_WHISPER = True
|
||||
except ImportError:
|
||||
HAS_WHISPER = False
|
||||
HAS_WHISPER = importlib.util.find_spec("whisper") is not None
|
||||
|
||||
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large"]
|
||||
_whisper_model_cache: dict = {}
|
||||
_whisper_module = None
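The change above replaces an eager `import whisper` with an availability check plus a deferred import. A minimal sketch of that pattern follows, using the standard-library `json` module as a stand-in because Whisper may not be installed; `has_module` and `lazy_import` are hypothetical helper names.

```python
import importlib
import importlib.util

_module_cache = {}


def has_module(name: str) -> bool:
    # find_spec locates a top-level module without executing it,
    # so startup stays cheap even for heavy packages.
    return importlib.util.find_spec(name) is not None


def lazy_import(name: str):
    # Import on first use and reuse the module object afterwards.
    module = _module_cache.get(name)
    if module is None:
        module = importlib.import_module(name)
        _module_cache[name] = module
    return module
```

The trade-off is that a package which is present but broken is only discovered at first use, which is presumably why the route wraps the import and model load in a `try` block.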
|
||||
|
||||
|
||||
@bp.route("/transcribe", methods=["GET", "POST"])
|
||||
@@ -913,9 +958,13 @@ def transcribe():
|
||||
in_path = _save_upload(f, tmp)
|
||||
|
||||
try:
|
||||
global _whisper_module
|
||||
if _whisper_module is None:
|
||||
import whisper as whisper_module # type: ignore
|
||||
_whisper_module = whisper_module
|
||||
model = _whisper_model_cache.get(model_size)
|
||||
if model is None:
|
||||
model = _whisper.load_model(model_size)
|
||||
model = _whisper_module.load_model(model_size)
|
||||
_whisper_model_cache[model_size] = model
|
||||
result = model.transcribe(in_path, language=language, verbose=False)
|
||||
except Exception as e:
|
||||
@@ -926,7 +975,12 @@ def transcribe():
|
||||
base = f.filename.rsplit(".", 1)[0]
|
||||
|
||||
if out_fmt == "txt":
|
||||
return jsonify({"text": (result.get("text") or "").strip() or "(no speech detected)"})
|
||||
return jsonify({
|
||||
"text": (result.get("text") or "").strip() or "(no speech detected)",
|
||||
"engine": "whisper",
|
||||
"quality": QUALITY_HIGH,
|
||||
"warnings": [],
|
||||
})
|
||||
|
||||
# Build SRT / VTT from segments
|
||||
segments = result.get("segments") or []
|
||||
@@ -946,8 +1000,9 @@ def transcribe():
|
||||
lines.append((seg.get("text") or "").strip())
|
||||
lines.append("")
|
||||
body = "\n".join(lines).rstrip() + "\n"
|
||||
return send_file(_bytes_io(body.encode("utf-8")), mimetype="text/plain",
|
||||
resp = send_file(_bytes_io(body.encode("utf-8")), mimetype="text/plain",
|
||||
as_attachment=True, download_name=f"{base}.srt")
|
||||
return set_conversion_metadata(resp, "whisper", QUALITY_HIGH)
|
||||
|
||||
# vtt
|
||||
lines = ["WEBVTT", ""]
|
||||
@@ -958,8 +1013,9 @@ def transcribe():
|
||||
lines.append((seg.get("text") or "").strip())
|
||||
lines.append("")
|
||||
body = "\n".join(lines).rstrip() + "\n"
|
||||
return send_file(_bytes_io(body.encode("utf-8")), mimetype="text/plain",
|
||||
resp = send_file(_bytes_io(body.encode("utf-8")), mimetype="text/plain",
|
||||
as_attachment=True, download_name=f"{base}.vtt")
|
||||
return set_conversion_metadata(resp, "whisper", QUALITY_HIGH)
|
||||
|
||||
|
||||
# ── helpers ────────────────────────────────────────────
|
||||
|
||||
+3
-1
@@ -1,9 +1,11 @@
|
||||
import io
|
||||
import fitz # PyMuPDF
|
||||
from flask import Blueprint, render_template, request, send_file, jsonify
|
||||
from utils.file_utils import make_zip
|
||||
from utils.pymupdf import import_pymupdf
|
||||
from routes._helpers import safe_int, safe_float, log_error, NO_FILE_SINGLE, NO_FILE_MULTIPLE
|
||||
|
||||
fitz = import_pymupdf()
|
||||
|
||||
bp = Blueprint("pdf", __name__)
|
||||
|
||||
|
||||
|
||||
@@ -19,12 +19,20 @@ from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak, Table, T
|
||||
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
|
||||
|
||||
from utils.file_utils import make_zip
|
||||
from utils.capabilities import (
|
||||
QUALITY_BASIC,
|
||||
QUALITY_HIGH,
|
||||
find_soffice,
|
||||
set_conversion_metadata,
|
||||
soffice_convert,
|
||||
)
|
||||
|
||||
bp = Blueprint("spreadsheet", __name__)
|
||||
|
||||
EXCEL_ACCEPT = ".xlsx,.xlsm,.xls"
|
||||
EXCEL_EXTS = {"xlsx", "xlsm", "xls"}
|
||||
MAX_PDF_ROWS_PER_SHEET = 5000
|
||||
SOFFICE = find_soffice()
|
||||
|
||||
|
||||
# ── Reader helpers ─────────────────────────────
|
||||
@@ -145,6 +153,8 @@ def excel_to_pdf_page():
|
||||
title="Excel to PDF",
|
||||
description="Convert an Excel workbook to PDF (one section per sheet)",
|
||||
notes=(
|
||||
'<p><strong>High fidelity path:</strong> LibreOffice is used when installed. '
|
||||
'The basic Python table renderer is only used when you explicitly allow fallback.</p>'
|
||||
'<p><strong>This is a basic renderer, not a pixel-perfect Excel print.</strong> '
|
||||
'It reads cell values and renders each sheet as a simple table. <strong>Not preserved:</strong></p>'
|
||||
'<ul style="margin:.4rem 0 .6rem 1.2rem">'
|
||||
@@ -153,10 +163,8 @@ def excel_to_pdf_page():
|
||||
'<li>Merged cells, frozen panes, charts, images, embedded objects</li>'
|
||||
'<li>Print areas, page setup, headers/footers</li>'
|
||||
'</ul>'
|
||||
'<p><strong>For full-fidelity Excel→PDF</strong> (matching what Excel itself prints), '
|
||||
'install LibreOffice and we\'ll route through it automatically — the same approach '
|
||||
'used by <a href="/convert/pptx-to-pdf">PowerPoint to PDF</a>. '
|
||||
'<em>(Coming soon — track progress in the CHANGELOG.)</em></p>'
|
||||
'<p><strong>For full-fidelity Excel to PDF</strong> (matching print output), '
|
||||
'install LibreOffice and this route will use it automatically.</p>'
|
||||
'<p style="font-size:.9em;color:var(--muted)">Output uses one PDF page per sheet '
|
||||
'(or splits across pages if the table is too wide/tall). Auto-fits column widths '
|
||||
'to content.</p>'
|
||||
@@ -179,6 +187,8 @@ def excel_to_pdf_page():
|
||||
]},
|
||||
{"type": "number", "name": "fontsize", "label": "Font Size",
|
||||
"default": 8, "min": 5, "max": 14},
|
||||
{"type": "checkbox", "name": "use_basic_fallback", "label": "Fallback",
|
||||
"check_label": "Allow basic table fallback if LibreOffice is unavailable or fails"},
|
||||
])
|
||||
|
||||
|
||||
@@ -449,6 +459,32 @@ def excel_to_pdf():
|
||||
if not files or not files[0].filename:
|
||||
return jsonify(error=NO_FILE_SINGLE), 400
|
||||
|
||||
file_data = files[0].read()
|
||||
filename = files[0].filename
|
||||
ext = _ext(filename)
|
||||
allow_basic_fallback = request.form.get("use_basic_fallback") == "on"
|
||||
|
||||
if ext not in EXCEL_EXTS:
|
||||
return jsonify(error="Unsupported file type. Upload .xlsx, .xlsm, or .xls."), 400
|
||||
|
||||
try:
|
||||
pdf_bytes = soffice_convert(file_data, ext, "pdf", timeout=300)
|
||||
except Exception as e:
|
||||
log_error(e, "excel-to-pdf libreoffice")
|
||||
pdf_bytes = None
|
||||
|
||||
if pdf_bytes is not None:
|
||||
base = filename.rsplit(".", 1)[0]
|
||||
resp = send_file(io.BytesIO(pdf_bytes), mimetype="application/pdf",
|
||||
as_attachment=True, download_name=f"{base}.pdf")
|
||||
return set_conversion_metadata(resp, "libreoffice", QUALITY_HIGH)
|
||||
|
||||
if not allow_basic_fallback:
|
||||
return jsonify(error=(
|
||||
"High-fidelity Excel to PDF requires LibreOffice. "
|
||||
"Tick 'Allow basic table fallback' to continue with lower layout fidelity."
|
||||
)), 400
|
||||
|
||||
size_name = request.form.get("size", "A4")
|
||||
orientation = request.form.get("orientation", "landscape")
|
||||
fontsize = safe_int(request.form.get("fontsize"), 8, min_val=4, max_val=24)
|
||||
@@ -459,7 +495,7 @@ def excel_to_pdf():
|
||||
page_size = landscape(page_size)
|
||||
|
||||
try:
|
||||
sheets = read_workbook(files[0].read(), files[0].filename)
|
||||
sheets = read_workbook(file_data, filename)
|
||||
except Exception as e:
|
||||
log_error(e, "excel-to-pdf read")
|
||||
return jsonify(error="Could not read workbook (file may be corrupted or unsupported format)."), 400
|
||||
@@ -523,9 +559,15 @@ def excel_to_pdf():
|
||||
return jsonify(error=f"PDF layout failed (table too wide?). Try a larger page size or smaller font. Details: {str(e)[:150]}"), 400
|
||||
|
||||
buf.seek(0)
|
||||
base = files[0].filename.rsplit(".", 1)[0]
|
||||
return send_file(buf, mimetype="application/pdf",
|
||||
base = filename.rsplit(".", 1)[0]
|
||||
resp = send_file(buf, mimetype="application/pdf",
|
||||
as_attachment=True, download_name=f"{base}.pdf")
|
||||
return set_conversion_metadata(
|
||||
resp,
|
||||
"openpyxl/reportlab" if ext != "xls" else "xlrd/reportlab",
|
||||
QUALITY_BASIC,
|
||||
"Basic fallback used; formatting, charts, print areas, and page setup are not preserved.",
|
||||
)
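The Excel route now tries the high-fidelity engine first and only uses the basic renderer when the user has opted in. The sketch below isolates that gating logic; `convert_with_fallback` is a hypothetical helper, and the `"high"` / `"basic"` strings are placeholders, since the actual values of `QUALITY_HIGH` and `QUALITY_BASIC` are not shown in this diff.

```python
def convert_with_fallback(primary, fallback, *, allow_fallback):
    """Return (result, quality_label).

    `primary` and `fallback` are zero-argument callables. `primary` may
    return None or raise to signal that the engine is unavailable.
    """
    try:
        result = primary()
    except Exception:
        result = None

    if result is not None:
        return result, "high"

    # Never downgrade silently: the caller must have opted in.
    if not allow_fallback:
        raise RuntimeError("High-fidelity engine unavailable and fallback not allowed.")
    return fallback(), "basic"
```

In the route, the `RuntimeError` branch corresponds to the HTTP 400 response that asks the user to tick the fallback checkbox.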
|
||||
|
||||
|
||||
def _pdf_cell(v):
|
||||
|
||||
@@ -3,55 +3,44 @@ setlocal
|
||||
|
||||
cd /d "%~dp0"
|
||||
|
||||
set "EVERYTOOLS_PY="
|
||||
|
||||
where py >nul 2>nul
|
||||
if not errorlevel 1 (
|
||||
py -3 -c "import sys; raise SystemExit(0 if sys.version_info >= (3, 10) else 1)" >nul 2>nul
|
||||
if not errorlevel 1 set "EVERYTOOLS_PY=py -3"
|
||||
)
|
||||
|
||||
if not defined EVERYTOOLS_PY (
|
||||
where python >nul 2>nul
|
||||
if errorlevel 1 (
|
||||
if not errorlevel 1 (
|
||||
python -c "import sys; raise SystemExit(0 if sys.version_info >= (3, 10) else 1)" >nul 2>nul
|
||||
if not errorlevel 1 set "EVERYTOOLS_PY=python"
|
||||
)
|
||||
)
|
||||
|
||||
if not defined EVERYTOOLS_PY (
|
||||
echo.
|
||||
echo Python 3.10 or newer is required, but was not found on PATH.
|
||||
echo Python 3.10 or newer is required, but was not found.
|
||||
echo.
|
||||
echo Download and install it from:
|
||||
echo Download and install Python from:
|
||||
echo https://www.python.org/downloads/
|
||||
echo.
|
||||
echo During install, make sure to check "Add Python to PATH".
|
||||
echo During install, make sure to check "Add python.exe to PATH".
|
||||
echo.
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
if not exist ".venv\Scripts\python.exe" (
|
||||
%EVERYTOOLS_PY% "scripts\launcher.py" %*
|
||||
set "EXITCODE=%ERRORLEVEL%"
|
||||
|
||||
echo.
|
||||
echo First-time setup: creating virtual environment...
|
||||
echo (this only happens once and takes about a minute)
|
||||
echo.
|
||||
python -m venv .venv
|
||||
if errorlevel 1 (
|
||||
echo.
|
||||
echo Failed to create the virtual environment.
|
||||
pause
|
||||
exit /b 1
|
||||
if not "%EXITCODE%"=="0" (
|
||||
echo EveryTools stopped with an error. See the messages above.
|
||||
) else (
|
||||
echo EveryTools stopped.
|
||||
)
|
||||
)
|
||||
|
||||
call ".venv\Scripts\activate.bat"
|
||||
|
||||
echo.
|
||||
echo Checking dependencies...
|
||||
pip install --quiet --disable-pip-version-check -r requirements.txt
|
||||
if errorlevel 1 (
|
||||
echo.
|
||||
echo Dependency install failed. Check your internet connection and try again.
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo.
|
||||
echo ============================================================
|
||||
echo EveryTools is starting at http://localhost:5000
|
||||
echo Your browser will open automatically in a moment.
|
||||
echo Close this window to stop the server.
|
||||
echo ============================================================
|
||||
echo.
|
||||
|
||||
start "" cmd /c "timeout /t 2 /nobreak >nul && start http://localhost:5000"
|
||||
python app.py
|
||||
|
||||
pause
|
||||
exit /b %EXITCODE%
|
||||
|
||||
+34
-38
@@ -1,52 +1,48 @@
|
||||
#!/usr/bin/env bash
|
||||
# Double-click launcher for macOS (also works as ./run.sh on Linux).
|
||||
# Double-click launcher for macOS. Also used by run.sh on Linux.
|
||||
|
||||
set -e
|
||||
set -u
|
||||
cd "$(dirname "$0")"
|
||||
|
||||
if ! command -v python3 >/dev/null 2>&1; then
|
||||
find_python() {
|
||||
for cmd in python3 python; do
|
||||
if command -v "$cmd" >/dev/null 2>&1; then
|
||||
if "$cmd" -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 10) else 1)' >/dev/null 2>&1; then
|
||||
printf '%s\n' "$cmd"
|
||||
return 0
|
||||
fi
|
||||
fi
|
||||
done
|
||||
return 1
|
||||
}
|
||||
|
||||
PYTHON_CMD="$(find_python || true)"
|
||||
|
||||
if [ -z "$PYTHON_CMD" ]; then
|
||||
echo
|
||||
echo " Python 3.10 or newer is required, but was not found."
|
||||
echo
|
||||
echo " macOS: brew install python (install Homebrew from https://brew.sh)"
|
||||
echo " Linux: sudo apt install python3 python3-venv (Debian/Ubuntu)"
|
||||
echo " sudo dnf install python3 (Fedora)"
|
||||
echo " macOS: install Python from https://www.python.org/downloads/"
|
||||
echo " or install Homebrew from https://brew.sh and run: brew install python"
|
||||
echo
|
||||
read -p "Press Enter to close..." _
|
||||
echo " Linux: Debian/Ubuntu: sudo apt install python3 python3-venv"
|
||||
echo " Fedora: sudo dnf install python3"
|
||||
echo
|
||||
printf "Press Enter to close..."
|
||||
read -r _
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ ! -d ".venv" ]; then
|
||||
"$PYTHON_CMD" scripts/launcher.py "$@"
|
||||
EXITCODE=$?
|
||||
|
||||
echo
|
||||
echo " First-time setup: creating virtual environment..."
|
||||
echo " (this only happens once and takes about a minute)"
|
||||
echo
|
||||
python3 -m venv .venv
|
||||
if [ "$EXITCODE" -ne 0 ]; then
|
||||
echo " EveryTools stopped with an error. See the messages above."
|
||||
else
|
||||
echo " EveryTools stopped."
|
||||
fi
|
||||
|
||||
# shellcheck disable=SC1091
|
||||
source .venv/bin/activate
|
||||
|
||||
echo
|
||||
echo " Checking dependencies..."
|
||||
pip install --quiet --disable-pip-version-check -r requirements.txt
|
||||
|
||||
echo
|
||||
echo " ============================================================"
|
||||
echo " EveryTools is starting at http://localhost:5000"
|
||||
echo " Your browser will open automatically in a moment."
|
||||
echo " Press Ctrl+C in this window to stop the server."
|
||||
echo " ============================================================"
|
||||
echo
|
||||
|
||||
# Open the browser after a brief delay, in the background.
|
||||
(
|
||||
sleep 2
|
||||
if command -v open >/dev/null 2>&1; then
|
||||
open http://localhost:5000
|
||||
elif command -v xdg-open >/dev/null 2>&1; then
|
||||
xdg-open http://localhost:5000
|
||||
fi
|
||||
) &
|
||||
|
||||
python app.py
|
||||
printf "Press Enter to close..."
|
||||
read -r _
|
||||
exit "$EXITCODE"
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
#!/usr/bin/env bash
|
||||
# Launcher for Linux. Run with: ./run.sh
|
||||
# (macOS users: use run.command — identical content, different filename
|
||||
# so Finder recognises it as double-clickable.)
|
||||
|
||||
exec "$(dirname "$0")/run.command" "$@"
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
exec "$SCRIPT_DIR/run.command" "$@"
|
||||
|
||||
@@ -0,0 +1,262 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
import venv
|
||||
import webbrowser
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
ROOT = Path(__file__).resolve().parent.parent
|
||||
VENV_DIR = ROOT / ".venv"
|
||||
STATE_DIR = VENV_DIR / ".everytools"
|
||||
CORE_REQ = ROOT / "requirements-core.txt"
|
||||
OPTIONAL_REQ = ROOT / "requirements-optional.txt"
|
||||
APP_URL = "http://localhost:5000"
|
||||
|
||||
|
||||
def venv_python() -> Path:
|
||||
if os.name == "nt":
|
||||
return VENV_DIR / "Scripts" / "python.exe"
|
||||
return VENV_DIR / "bin" / "python"
|
||||
|
||||
|
||||
def env() -> dict[str, str]:
|
||||
result = os.environ.copy()
|
||||
result["PYTHONNOUSERSITE"] = "1"
|
||||
result.setdefault("PIP_DISABLE_PIP_VERSION_CHECK", "1")
|
||||
return result
|
||||
|
||||
|
||||
def run(cmd: list[str | Path], *, check: bool = True, log_file=None) -> subprocess.CompletedProcess:
|
||||
text_cmd = " ".join(str(part) for part in cmd)
|
||||
print(f" > {text_cmd}")
|
||||
stdout = log_file if log_file else None
|
||||
stderr = subprocess.STDOUT if log_file else None
|
||||
proc = subprocess.run(
|
||||
[str(part) for part in cmd],
|
||||
cwd=ROOT,
|
||||
env=env(),
|
||||
stdout=stdout,
|
||||
stderr=stderr,
|
||||
)
|
||||
if check and proc.returncode != 0:
|
||||
raise subprocess.CalledProcessError(proc.returncode, [str(part) for part in cmd])
|
||||
return proc
|
||||
|
||||
|
||||
def file_hash(paths: list[Path]) -> str:
    digest = hashlib.sha256()
    for path in paths:
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def stamp_path(name: str) -> Path:
    return STATE_DIR / f"{name}.sha256"


def stamp_matches(name: str, expected: str) -> bool:
    path = stamp_path(name)
    return path.exists() and path.read_text(encoding="utf-8").strip() == expected


def write_stamp(name: str, value: str) -> None:
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    stamp_path(name).write_text(value + "\n", encoding="utf-8")
|
||||
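The stamp helpers let the launcher skip `pip install` when the requirements file has not changed since the last successful run. A self-contained sketch of that idea, run against a temporary directory, is shown below; `needs_install`, `mark_installed`, and `demo` are hypothetical names, and the file contents are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(paths):
    # Hash file names and contents, separated by NUL bytes.
    digest = hashlib.sha256()
    for path in paths:
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_install(req_file: Path, stamp_file: Path) -> bool:
    expected = file_hash([req_file])
    if stamp_file.exists() and stamp_file.read_text(encoding="utf-8").strip() == expected:
        return False
    return True


def mark_installed(req_file: Path, stamp_file: Path) -> None:
    stamp_file.parent.mkdir(parents=True, exist_ok=True)
    stamp_file.write_text(file_hash([req_file]) + "\n", encoding="utf-8")


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        req = Path(tmp) / "requirements-core.txt"
        stamp = Path(tmp) / ".state" / "core.sha256"
        req.write_text("flask\n", encoding="utf-8")
        first = needs_install(req, stamp)     # no stamp yet
        mark_installed(req, stamp)
        second = needs_install(req, stamp)    # stamp matches
        req.write_text("flask\npillow\n", encoding="utf-8")
        third = needs_install(req, stamp)     # file changed
        return [first, second, third]
```

Keeping the stamp inside the virtual environment directory, as the launcher does with `STATE_DIR`, means that deleting `.venv` also clears the stamps.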
|
||||
|
||||
def check_python_version() -> None:
|
||||
if sys.version_info < (3, 10):
|
||||
print()
|
||||
print(" Python 3.10 or newer is required.")
|
||||
print(f" Current Python: {sys.version.split()[0]}")
|
||||
raise SystemExit(1)
|
||||
|
||||
|
||||
def create_venv() -> None:
|
||||
if venv_python().exists():
|
||||
return
|
||||
|
||||
print()
|
||||
print(" First-time setup: creating a private .venv for EveryTools...")
|
||||
print(" This keeps the app away from broken global Python packages.")
|
||||
print()
|
||||
try:
|
||||
venv.EnvBuilder(with_pip=True, clear=False).create(VENV_DIR)
|
||||
except Exception as exc:
|
||||
print()
|
||||
print(" Could not create the virtual environment.")
|
||||
print(" On Linux, install the venv package first, for example:")
|
||||
print(" sudo apt install python3-venv")
|
||||
print()
|
||||
print(f" Details: {exc}")
|
||||
raise SystemExit(1) from exc
|
||||
|
||||
|
||||
def pip_install_core(*, force: bool = False) -> None:
|
||||
expected = file_hash([CORE_REQ])
|
||||
if not force and stamp_matches("core", expected):
|
||||
print(" Core Python dependencies are already installed.")
|
||||
return
|
||||
|
||||
print()
|
||||
print(" Installing core Python dependencies...")
|
||||
print()
|
||||
py = venv_python()
|
||||
|
||||
run([py, "-m", "ensurepip", "--upgrade"], check=False)
|
||||
run([py, "-m", "pip", "install", "--upgrade", "pip", "setuptools", "wheel"], check=False)
|
||||
|
||||
# Remove the unrelated PyPI packages that can shadow PyMuPDF's `fitz`.
|
||||
run([py, "-m", "pip", "uninstall", "-y", "fitz", "frontend"], check=False)
|
||||
|
||||
run([py, "-m", "pip", "install", "-r", CORE_REQ])
|
||||
write_stamp("core", expected)
|
||||
|
||||
|
||||
def parse_requirements(path: Path) -> list[str]:
    requirements = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements
|
||||
|
||||
|
||||
def pip_install_optional(*, force: bool = False) -> None:
|
||||
expected = file_hash([OPTIONAL_REQ])
|
||||
if not force and stamp_matches("optional", expected):
|
||||
print(" Optional Python packages were already attempted.")
|
||||
return
|
||||
|
||||
requirements = parse_requirements(OPTIONAL_REQ)
|
||||
if not requirements:
|
||||
write_stamp("optional", expected)
|
||||
return
|
||||
|
||||
print()
|
||||
print(" Installing optional Python packages best-effort...")
|
||||
print(" If one optional package fails, the app will still start.")
|
||||
print()
|
||||
|
||||
log_path = STATE_DIR / "optional-install.log"
|
||||
STATE_DIR.mkdir(parents=True, exist_ok=True)
|
||||
failures: list[str] = []
|
||||
with log_path.open("w", encoding="utf-8") as log:
|
||||
for requirement in requirements:
|
||||
log.write(f"\n\n=== {requirement} ===\n")
|
||||
log.flush()
|
||||
proc = run(
|
||||
[venv_python(), "-m", "pip", "install", requirement],
|
||||
check=False,
|
||||
log_file=log,
|
||||
)
|
||||
if proc.returncode != 0:
|
||||
failures.append(requirement)
|
||||
print(f" optional install skipped/failed: {requirement}")
|
||||
|
||||
write_stamp("optional", expected)
|
||||
if failures:
|
||||
print()
|
||||
print(" Some optional packages could not be installed:")
|
||||
print(" " + ", ".join(failures))
|
||||
print(f" Details were saved to: {log_path}")
|
||||
print(" The app will still run; affected tools will show install hints.")
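The optional-package step installs each requirement separately so that one failure does not block the others. A minimal sketch of that loop, with the actual `pip` call replaced by an injectable function so it can run without network access, might look like this; `parse_requirements_text`, `install_best_effort`, and the fake installer are hypothetical.

```python
from typing import Callable


def parse_requirements_text(text: str) -> list:
    # Keep non-empty lines that are not comments, as the launcher does.
    requirements = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


def install_best_effort(requirements, install: Callable[[str], int]) -> list:
    """Try every requirement; return the ones whose install exit code was non-zero."""
    failures = []
    for requirement in requirements:
        if install(requirement) != 0:
            failures.append(requirement)
    return failures


# Stand-in for running pip: pretends that one package fails to build.
def fake_install(requirement: str) -> int:
    return 1 if requirement.startswith("pyzbar") else 0


sample = "# optional extras\nrembg\n\npyzbar\npillow-heif\n"
failed = install_best_effort(parse_requirements_text(sample), fake_install)
```

In the launcher the stamp is written even when some packages fail, so failed packages are not retried on every start unless the requirements file changes or `--repair` is passed.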
|
||||
|
||||
|
||||
def verify_core_imports() -> bool:
|
||||
code = (
|
||||
"import flask, PIL, fitz\n"
|
||||
"assert hasattr(fitz, 'open'), getattr(fitz, '__file__', 'unknown')\n"
|
||||
)
|
||||
proc = subprocess.run(
|
||||
[str(venv_python()), "-c", code],
|
||||
cwd=ROOT,
|
||||
env=env(),
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
if proc.returncode == 0:
|
||||
return True
|
||||
print(proc.stdout.strip())
|
||||
print(proc.stderr.strip())
|
||||
return False
|
||||
|
||||
def native_engine_note() -> None:
    try:
        from utils.capabilities import find_soffice
    except Exception:
        find_soffice = lambda: None

    missing = []
    if not find_soffice():
        missing.append("LibreOffice")
    if not shutil.which("ffmpeg"):
        missing.append("FFmpeg")
    if not shutil.which("tesseract"):
        missing.append("Tesseract")
    if not (shutil.which("ODAFileConverter") or shutil.which("oda_file_converter")):
        missing.append("ODA File Converter")

    if missing:
        print()
        print(" Optional native engines not detected:")
        print(" " + ", ".join(missing))
        print(" EveryTools will still start. The app shows install hints and uses")
        print(" these engines automatically when they are installed locally.")


def open_browser_later() -> None:
    time.sleep(2)
    webbrowser.open(APP_URL)


def start_app() -> int:
    print()
    print(" ============================================================")
    print(f" EveryTools is starting at {APP_URL}")
    print(" Close this window or press Ctrl+C to stop the server.")
    print(" ============================================================")
    print()

    threading.Thread(target=open_browser_later, daemon=True).start()
    try:
        return subprocess.call([str(venv_python()), str(ROOT / "app.py")], cwd=ROOT, env=env())
    except KeyboardInterrupt:
        return 0


def main() -> int:
    repair = "--repair" in sys.argv
    check_python_version()
    create_venv()
    pip_install_core(force=repair)

    if not verify_core_imports():
        print()
        print(" Core dependency check failed. Repairing the virtual environment...")
        pip_install_core(force=True)
        if not verify_core_imports():
            print()
            print(" EveryTools could not install the required Python packages.")
            print(" Check your internet connection, then run the launcher again.")
            return 1

    pip_install_optional(force=repair)
    native_engine_note()
    return start_app()


if __name__ == "__main__":
    raise SystemExit(main())
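
The launcher skips the optional install when a stored stamp matches the hash of the requirements file, but `file_hash`, `stamp_matches`, and `write_stamp` are defined outside this excerpt. The following is a minimal sketch of how such helpers could work, not the project's implementation: the explicit `state_dir` parameter, the `.stamp` file suffix, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_hash(paths):
    """Return one SHA-256 hex digest over the contents of all given files."""
    digest = hashlib.sha256()
    for path in paths:
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def stamp_matches(state_dir, name, expected):
    """True when a stamp named `name` exists and holds the expected digest."""
    # Hypothetical layout: one "<name>.stamp" text file per install step.
    stamp = Path(state_dir) / f"{name}.stamp"
    return stamp.is_file() and stamp.read_text(encoding="utf-8").strip() == expected


def write_stamp(state_dir, name, value):
    """Record the digest so the next launch can skip the same work."""
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / f"{name}.stamp").write_text(value, encoding="utf-8")
```

Because the stamp stores a content hash rather than a timestamp, editing the requirements file changes the digest and triggers a fresh install attempt, while an unchanged file is skipped even if some optional packages failed last time.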
@@ -0,0 +1,62 @@
/* Local Bootstrap Icons shim.
   The original project used the Bootstrap Icons CDN. This lightweight local
   fallback keeps the app offline without changing every template. */
.bi {
  display: inline-flex;
  align-items: center;
  justify-content: center;
  width: 1.15em;
  height: 1.15em;
  line-height: 1;
  vertical-align: -0.125em;
}

.bi::before {
  content: "";
  display: inline-block;
  width: .78em;
  height: .78em;
  border: 1.8px solid currentColor;
  border-radius: 2px;
}

.bi-tools::before,
.bi-rulers::before { border-radius: 0; transform: rotate(45deg); }
.bi-chevron-down::before { content: "v"; border: 0; width: auto; height: auto; font-size: .78em; }
.bi-list::before { content: "="; border: 0; width: auto; height: auto; font-weight: 700; }
.bi-check-circle-fill::before,
.bi-check-circle::before { border-radius: 50%; background: currentColor; box-shadow: inset 0 0 0 3px #fff; }
.bi-exclamation-circle-fill::before,
.bi-exclamation-triangle-fill::before { border-radius: 50%; border-width: 2px; }
.bi-arrow-repeat::before,
.bi-arrow-clockwise::before { border-radius: 50%; border-right-color: transparent; }
.bi-download::before,
.bi-cloud-arrow-up::before,
.bi-box-arrow-up::before { border-top: 0; border-left: 0; transform: rotate(45deg); }
.bi-scissors::before { border-radius: 50%; box-shadow: .35em .35em 0 -.15em currentColor; }
.bi-lock-fill::before,
.bi-unlock-fill::before,
.bi-shield-lock-fill::before { border-radius: 2px 2px 4px 4px; }
.bi-file-pdf-fill::before,
.bi-file-word-fill::before,
.bi-file-image-fill::before,
.bi-file-text-fill::before,
.bi-file-earmark-spreadsheet-fill::before,
.bi-file-earmark-slides-fill::before,
.bi-file-earmark-pdf-fill::before,
.bi-file-earmark-text-fill::before,
.bi-file-zip-fill::before,
.bi-file-earmark-play-fill::before,
.bi-file-earmark::before,
.bi-file-pdf::before,
.bi-file-zip::before { border-radius: 1px; border-top-right-radius: 5px; }
.bi-qr-code::before,
.bi-qr-code-scan::before,
.bi-upc-scan::before { box-shadow: inset .28em .28em 0 currentColor, inset -.28em -.28em 0 currentColor; }
.bi-calculator::before,
.bi-calculator-fill::before { box-shadow: inset 0 .25em 0 currentColor; }
.bi-percent::before,
.bi-123::before,
.bi-hash::before { content: "#"; border: 0; width: auto; height: auto; font-weight: 700; }
.bi-code-slash::before,
.bi-braces::before { content: "{}"; border: 0; width: auto; height: auto; font-size: .78em; font-weight: 700; }
@@ -165,6 +165,8 @@ a { color: var(--primary); text-decoration: none; }
 .btn:active { transform: scale(.97); }
 .btn-primary { background: var(--primary); color: #fff; }
 .btn-primary:hover { background: var(--primary-dark); }
+.btn-secondary { background: var(--bg); color: var(--text); border: 1px solid var(--border); }
+.btn-secondary:hover { background: var(--border); }
 .btn-success { background: var(--success); color: #fff; }
 .btn-success:hover { background: #27b0a3; }
 .btn-small { padding: .3rem .7rem; font-size: .8rem; background: var(--bg); color: var(--text); }
@@ -253,6 +255,46 @@ a { color: var(--primary); text-decoration: none; }
 .tool-notes li { margin-bottom: .2rem; }
 .tool-notes a { color: #b26f00; }

+.capability-status {
+  border: 1px solid var(--border);
+  border-radius: var(--radius);
+  background: var(--surface);
+  padding: .75rem 1rem;
+  margin-bottom: 1rem;
+  font-size: .88rem;
+  line-height: 1.45;
+}
+.capability-status.high {
+  border-left: 3px solid var(--success);
+}
+.capability-status.basic {
+  border-left: 3px solid var(--warning);
+}
+.capability-status.unavailable {
+  border-left: 3px solid var(--danger);
+}
+.capability-status strong {
+  display: inline-flex;
+  align-items: center;
+  gap: .35rem;
+  margin-right: .35rem;
+}
+.capability-status small {
+  display: block;
+  color: var(--text-light);
+  margin-top: .25rem;
+}
+.result-meta {
+  width: 100%;
+  color: var(--text-light);
+  font-size: .8rem;
+}
+.result-meta span {
+  display: inline-flex;
+  align-items: center;
+  margin-right: .5rem;
+}
+
 /* ── Upload Zone ──────────────────────────────── */
 .upload-zone {
   border: 2px dashed var(--border);
+67
-3
@@ -33,8 +33,43 @@ document.addEventListener("DOMContentLoaded", () => {
   initUploadZone();
   initToolForm();
   initDependentOptions();
+  initCapabilityStatus();
 });

+async function initCapabilityStatus() {
+  const box = document.getElementById("capability-status");
+  if (!box) return;
+  const endpoint = box.dataset.endpoint;
+  if (!endpoint) return;
+  try {
+    const resp = await fetch("/capabilities");
+    if (!resp.ok) return;
+    const data = await resp.json();
+    const status = data.routes && data.routes[endpoint];
+    if (!status) return;
+    box.className = "capability-status " + status.quality;
+    box.style.display = "block";
+
+    const missing = (status.missing_engines || [])
+      .map(id => data.engines[id]?.label || id)
+      .join(", ");
+    const engines = (status.required_engines || [])
+      .map(id => data.engines[id]?.label || id)
+      .join(", ");
+    const detail = status.quality === "high"
+      ? `Using local high-fidelity engine${engines ? ": " + engines : ""}.`
+      : status.quality === "basic"
+        ? `High-fidelity engine missing${missing ? ": " + missing : ""}. ${status.fallback || ""}`
+        : `Required local engine missing${missing ? ": " + missing : ""}.`;
+
+    box.innerHTML = `
+      <strong><i class="bi ${status.quality === "high" ? "bi-check-circle-fill" : "bi-exclamation-triangle-fill"}"></i> ${status.status}</strong>
+      <span>${status.label}</span>
+      <small>${detail}</small>
+    `;
+  } catch (_) {}
+}
+

 /* ── Upload Zone ──────────────────────────────── */
 let selectedFiles = [];
@@ -176,10 +211,16 @@ function initToolForm() {
     const url = URL.createObjectURL(blob);

     // If image, show preview
+    const meta = {
+      engine: resp.headers.get("X-Conversion-Engine") || "",
+      quality: resp.headers.get("X-Conversion-Quality") || "",
+      warnings: resp.headers.get("X-Fidelity-Warnings") || ""
+    };
+
     if (ct.startsWith("image/")) {
-      showFileResult(url, filename, true);
+      showFileResult(url, filename, true, meta);
     } else {
-      showFileResult(url, filename, false);
+      showFileResult(url, filename, false, meta);
     }
   }
 } catch (err) {
@@ -202,7 +243,7 @@ function showError(msg) {
   document.getElementById("error-message").textContent = msg;
 }

-function showFileResult(url, filename, isImage) {
+function showFileResult(url, filename, isImage, meta = {}) {
   const area = document.getElementById("result-area");
   area.style.display = "block";
   document.getElementById("result-error").style.display = "none";
@@ -225,6 +266,19 @@ function showFileResult(url, filename, isImage) {
   } else {
     preview.style.display = "none";
   }
+
+  const oldMeta = success.querySelector(".result-meta");
+  if (oldMeta) oldMeta.remove();
+  if (meta.engine || meta.quality || meta.warnings) {
+    const div = document.createElement("div");
+    div.className = "result-meta";
+    const parts = [];
+    if (meta.engine) parts.push(`<span>Engine: ${escapeHtml(meta.engine)}</span>`);
+    if (meta.quality) parts.push(`<span>Quality: ${escapeHtml(meta.quality)}</span>`);
+    if (meta.warnings) parts.push(`<span>Warnings: ${escapeHtml(meta.warnings)}</span>`);
+    div.innerHTML = parts.join("");
+    success.appendChild(div);
+  }
 }

 function showTextResult(text) {
@@ -245,6 +299,16 @@ function copyResult() {
   if (text) navigator.clipboard.writeText(text);
 }

+function escapeHtml(text) {
+  return String(text).replace(/[&<>"']/g, c => ({
+    "&": "&amp;",
+    "<": "&lt;",
+    ">": "&gt;",
+    '"': "&quot;",
+    "'": "&#39;"
+  }[c]));
+}
+

 /* ── Dependent Options ────────────────────────── */
 function initDependentOptions() {
+1
-1
@@ -4,7 +4,7 @@
   <meta charset="UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   <title>{% block title %}Your Everyday Tools{% endblock %}</title>
-  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.3/font/bootstrap-icons.min.css">
+  <link rel="stylesheet" href="{{ url_for('static', filename='css/icons.css') }}">
   <link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
 </head>
 <body>
@@ -0,0 +1,197 @@
{% extends "base.html" %}
{% block title %}SVG to PNG - EveryTools{% endblock %}
{% block top_title %}SVG to PNG{% endblock %}

{% block content %}
<div class="tool-page">
  <div class="tool-header">
    <h1>SVG to PNG</h1>
    <p>Rasterise an SVG file to a PNG image locally in your browser.</p>
  </div>

  <div id="capability-status" class="capability-status" data-endpoint="/image/svg-to-png" style="display:none"></div>

  <form id="svg-png-form">
    <div class="upload-zone" id="svg-upload-zone">
      <input type="file" id="svg-file-input" accept=".svg,image/svg+xml">
      <div class="upload-prompt" id="svg-upload-prompt">
        <i class="bi bi-cloud-arrow-up"></i>
        <p>Drag &amp; drop SVG here</p>
        <span>or click to browse</span>
        <small>Accepted: .svg</small>
      </div>
    </div>
    <div class="file-list" id="svg-file-list"></div>

    <div class="tool-options">
      <div class="form-group">
        <label for="svg-width">Output width (pixels, 0 = native size)</label>
        <input type="number" id="svg-width" value="0" min="0" max="10000" step="1">
      </div>
      <div class="form-group">
        <label>Background</label>
        <label class="checkbox-label">
          <input type="checkbox" id="svg-transparent" checked>
          <span>Transparent (otherwise white)</span>
        </label>
      </div>
    </div>

    <button type="submit" class="btn btn-primary" id="svg-render-btn">
      <i class="bi bi-image"></i> Render in browser
    </button>
    <button type="button" class="btn btn-secondary" id="svg-server-btn">
      <i class="bi bi-arrow-repeat"></i> Use server fallback
    </button>
  </form>

  <div id="result-area" class="result-area" style="display:none">
    <div id="result-success" class="result-success" style="display:none">
      <i class="bi bi-check-circle-fill"></i>
      <span id="result-message">Done!</span>
      <a id="download-btn" class="btn btn-success" download>
        <i class="bi bi-download"></i> Download
      </a>
      <div id="result-preview" style="display:none"></div>
    </div>
    <div id="result-error" class="result-error" style="display:none">
      <i class="bi bi-exclamation-circle-fill"></i>
      <span id="error-message"></span>
    </div>
  </div>
</div>

<script>
(() => {
  const zone = document.getElementById("svg-upload-zone");
  const input = document.getElementById("svg-file-input");
  const list = document.getElementById("svg-file-list");
  const prompt = document.getElementById("svg-upload-prompt");
  const form = document.getElementById("svg-png-form");
  const fallbackBtn = document.getElementById("svg-server-btn");
  let selectedFile = null;

  function setFile(file) {
    selectedFile = file || null;
    if (!selectedFile) {
      list.innerHTML = "";
      prompt.style.display = "";
      return;
    }
    prompt.style.display = "none";
    list.innerHTML = `<div class="file-item"><span><i class="bi bi-file-earmark"></i> ${escapeHtml(selectedFile.name)} <small>(${formatSize(selectedFile.size)})</small></span><button type="button" class="remove-file" id="svg-clear-file">×</button></div>`;
    document.getElementById("svg-clear-file").addEventListener("click", () => setFile(null));
  }

  function showError(message) {
    document.getElementById("result-area").style.display = "block";
    document.getElementById("result-success").style.display = "none";
    document.getElementById("result-error").style.display = "flex";
    document.getElementById("error-message").textContent = message;
  }

  function showResult(url, filename, metaText) {
    document.getElementById("result-area").style.display = "block";
    document.getElementById("result-error").style.display = "none";
    document.getElementById("result-success").style.display = "flex";
    const link = document.getElementById("download-btn");
    link.href = url;
    link.download = filename;
    const preview = document.getElementById("result-preview");
    preview.style.display = "block";
    preview.innerHTML = `<img src="${url}" alt="PNG preview" style="max-width:100%;max-height:420px;border-radius:6px;margin-top:1rem">${metaText ? `<div class="result-meta">${metaText}</div>` : ""}`;
  }

  function intrinsicSize(svgText, img) {
    const parsed = new DOMParser().parseFromString(svgText, "image/svg+xml");
    const svg = parsed.documentElement;
    const viewBox = (svg.getAttribute("viewBox") || "").trim().split(/\s+/).map(Number);
    if (viewBox.length === 4 && viewBox.every(Number.isFinite) && viewBox[2] > 0 && viewBox[3] > 0) {
      return { width: viewBox[2], height: viewBox[3] };
    }
    return {
      width: img.naturalWidth || 1024,
      height: img.naturalHeight || 1024,
    };
  }

  async function renderClient() {
    if (!selectedFile) {
      showError("Please select an SVG file first.");
      return;
    }
    const svgText = await selectedFile.text();
    const blobUrl = URL.createObjectURL(new Blob([svgText], { type: "image/svg+xml" }));
    const img = new Image();
    img.decoding = "async";
    await new Promise((resolve, reject) => {
      img.onload = resolve;
      img.onerror = () => reject(new Error("Browser could not render this SVG."));
      img.src = blobUrl;
    });

    const size = intrinsicSize(svgText, img);
    const targetWidth = Math.max(0, Math.min(10000, Number(document.getElementById("svg-width").value) || 0));
    const scale = targetWidth > 0 ? targetWidth / size.width : 1;
    const canvas = document.createElement("canvas");
    canvas.width = Math.max(1, Math.round(size.width * scale));
    canvas.height = Math.max(1, Math.round(size.height * scale));
    const ctx = canvas.getContext("2d");
    const transparent = document.getElementById("svg-transparent").checked;
    if (!transparent) {
      ctx.fillStyle = "#fff";
      ctx.fillRect(0, 0, canvas.width, canvas.height);
    }
    ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
    URL.revokeObjectURL(blobUrl);

    const pngBlob = await new Promise(resolve => canvas.toBlob(resolve, "image/png"));
    const url = URL.createObjectURL(pngBlob);
    const base = selectedFile.name.replace(/\.[^.]+$/, "") || "image";
    showResult(url, `${base}.png`, "Engine: browser canvas | Quality: high");
  }

  async function renderServerFallback() {
    if (!selectedFile) {
      showError("Please select an SVG file first.");
      return;
    }
    const fd = new FormData();
    fd.append("files", selectedFile);
    fd.append("width", document.getElementById("svg-width").value || "0");
    if (document.getElementById("svg-transparent").checked) fd.append("transparent", "on");
    const resp = await fetch("/image/svg-to-png", { method: "POST", body: fd });
    if (!resp.ok) {
      let message = "Server fallback failed.";
      try {
        const json = await resp.json();
        message = json.error || message;
      } catch (_) {}
      showError(message);
      return;
    }
    const blob = await resp.blob();
    const url = URL.createObjectURL(blob);
    const base = selectedFile.name.replace(/\.[^.]+$/, "") || "image";
    showResult(url, `${base}.png`, "Engine: svglib/reportlab | Quality: basic fallback");
  }

  zone.addEventListener("click", () => input.click());
  zone.addEventListener("dragover", e => { e.preventDefault(); zone.classList.add("dragover"); });
  zone.addEventListener("dragleave", () => zone.classList.remove("dragover"));
  zone.addEventListener("drop", e => {
    e.preventDefault();
    zone.classList.remove("dragover");
    setFile(e.dataTransfer.files[0]);
  });
  input.addEventListener("change", () => setFile(input.files[0]));
  form.addEventListener("submit", e => {
    e.preventDefault();
    renderClient().catch(err => showError(err.message || "Browser rendering failed."));
  });
  fallbackBtn.addEventListener("click", () => {
    renderServerFallback().catch(err => showError(err.message || "Server fallback failed."));
  });
})();
</script>
{% endblock %}
@@ -15,6 +15,8 @@
 </div>
 {% endif %}

+<div id="capability-status" class="capability-status" data-endpoint="{{ endpoint }}" style="display:none"></div>
+
 <form id="tool-form" data-endpoint="{{ endpoint }}">
   {% if text_input %}
   <div class="form-group">
@@ -0,0 +1,17 @@
# Fidelity Test Fixtures

This suite is for local/offline conversion fidelity checks. Tests should skip
cleanly when an optional local engine is unavailable.

Planned fixture groups:

- `documents`: DOCX, PPTX, XLSX, HTML, and representative PDFs.
- `images`: JPEG with EXIF orientation, transparent PNG, WebP, HEIC when available.
- `svg`: filters, masks, viewBox-only sizing, embedded raster images.
- `media`: short audio/video clips with known codecs for stream-copy checks.
- `cad`: small DXF/DWG samples with common entities and unsupported-entity warnings.

PDF-like output checks should rasterize pages with PyMuPDF and compare page
count, dimensions, and pixel similarity within fixed thresholds. Office outputs
can assert structure directly and optionally round-trip through LibreOffice to
PDF when LibreOffice is installed.
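
The page comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names, the `(width, height, samples)` tuple layout, and the default thresholds (`tolerance=8`, `min_similarity=0.98`) are hypothetical and not taken from the project. With PyMuPDF, each tuple could be built from `pix = page.get_pixmap()` as `(pix.width, pix.height, pix.samples)`; the sketch itself uses only raw bytes so it runs without any optional engine.

```python
def pixel_similarity(a: bytes, b: bytes, tolerance: int = 8) -> float:
    """Fraction of channel values that differ by at most `tolerance`."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    if not a:
        return 1.0
    close = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return close / len(a)


def pages_match(expected, actual, *, tolerance=8, min_similarity=0.98) -> bool:
    """Compare two rendered pages given as (width, height, samples) tuples."""
    exp_w, exp_h, exp_samples = expected
    act_w, act_h, act_samples = actual
    if (exp_w, exp_h) != (act_w, act_h):
        return False  # dimensions must agree before pixels are compared
    return pixel_similarity(exp_samples, act_samples, tolerance) >= min_similarity


def documents_match(expected_pages, actual_pages, **thresholds) -> bool:
    """Check page count first, then every page pair against the thresholds."""
    if len(expected_pages) != len(actual_pages):
        return False
    return all(
        pages_match(exp, act, **thresholds)
        for exp, act in zip(expected_pages, actual_pages)
    )
```

A per-channel tolerance absorbs small anti-aliasing differences between renderer versions, while the similarity floor still catches missing text or shifted layout.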
@@ -0,0 +1,75 @@
import io

import pytest


def _load_app_or_skip():
    try:
        from app import app
    except ImportError as exc:
        pytest.skip(f"App dependencies are not installed: {exc}")
    return app


def test_docx_to_pdf_requires_explicit_basic_fallback(monkeypatch):
    app = _load_app_or_skip()
    from routes import convert_tools

    monkeypatch.setattr(convert_tools, "_soffice_convert", lambda *args, **kwargs: None)

    with app.test_client() as client:
        resp = client.post(
            "/convert/to-pdf",
            data={"files": (io.BytesIO(b"not-a-real-docx"), "sample.docx")},
            content_type="multipart/form-data",
        )

    assert resp.status_code == 400
    assert b"Allow basic Python fallback" in resp.data


def test_excel_to_pdf_requires_explicit_basic_fallback(monkeypatch):
    app = _load_app_or_skip()
    from routes import spreadsheet_tools

    monkeypatch.setattr(spreadsheet_tools, "soffice_convert", lambda *args, **kwargs: None)

    with app.test_client() as client:
        resp = client.post(
            "/spreadsheet/excel-to-pdf",
            data={"files": (io.BytesIO(b"not-a-real-xlsx"), "sample.xlsx")},
            content_type="multipart/form-data",
        )

    assert resp.status_code == 400
    assert b"Allow basic table fallback" in resp.data


def test_excel_to_pdf_high_fidelity_metadata(monkeypatch):
    app = _load_app_or_skip()
    from routes import spreadsheet_tools

    monkeypatch.setattr(spreadsheet_tools, "soffice_convert", lambda *args, **kwargs: b"%PDF-1.4\n%%EOF\n")

    with app.test_client() as client:
        resp = client.post(
            "/spreadsheet/excel-to-pdf",
            data={"files": (io.BytesIO(b"fake-xlsx"), "sample.xlsx")},
            content_type="multipart/form-data",
        )

    assert resp.status_code == 200
    assert resp.headers["X-Conversion-Engine"] == "libreoffice"
    assert resp.headers["X-Conversion-Quality"] == "high"


def test_capabilities_endpoint_returns_routes():
    app = _load_app_or_skip()

    with app.test_client() as client:
        resp = client.get("/capabilities")

    assert resp.status_code == 200
    data = resp.get_json()
    assert data["offline"] is True
    assert "/convert/to-pdf" in data["routes"]
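
The tests above assert on the `X-Conversion-Engine` and `X-Conversion-Quality` response headers, and the front end also reads `X-Fidelity-Warnings`. A small sketch of collecting the three headers into one dict on the Python side might look like this; the function name is hypothetical, and the warnings value is passed through as a raw string because its encoding is not shown in this excerpt.

```python
from typing import Mapping


def conversion_metadata(headers: Mapping[str, str]) -> dict:
    """Collect the conversion metadata headers into a plain dict."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "engine": lowered.get("x-conversion-engine", ""),
        "quality": lowered.get("x-conversion-quality", ""),
        "warnings": lowered.get("x-fidelity-warnings", ""),
    }
```

In a test this could be called as `conversion_metadata(dict(resp.headers))`, mirroring the `meta` object that the browser code builds from the same headers.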
@@ -0,0 +1,12 @@
from utils.capabilities import QUALITY_HIGH, get_capabilities


def test_capabilities_shape():
    data = get_capabilities()

    assert data["offline"] is True
    assert "engines" in data
    assert "routes" in data
    assert "libreoffice" in data["engines"]
    assert data["routes"]["/spreadsheet/excel-to-pdf"]["label"] == "Excel to PDF"
    assert data["routes"]["/image/svg-to-png"]["quality"] == QUALITY_HIGH
@@ -0,0 +1,385 @@
"""Local engine detection and conversion metadata helpers.

The app is offline-first, so high-fidelity conversion depends on tools that
are installed on the user's machine. This module centralizes that discovery so
routes and the UI agree on what is high fidelity, basic fallback, or missing.
"""

from __future__ import annotations

import importlib.util
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Iterable


QUALITY_HIGH = "high"
QUALITY_BASIC = "basic"
QUALITY_UNAVAILABLE = "unavailable"


def find_soffice() -> str | None:
    """Detect LibreOffice. PATH first, then common per-OS install locations."""
    found = shutil.which("soffice") or shutil.which("libreoffice")
    if found:
        return found

    import sys

    candidates: list[str] = []
    if sys.platform == "win32":
        program_files = [
            os.environ.get("ProgramFiles", r"C:\Program Files"),
            os.environ.get("ProgramFiles(x86)", r"C:\Program Files (x86)"),
            os.environ.get("ProgramW6432", r"C:\Program Files"),
        ]
        for pf in program_files:
            if pf:
                candidates.append(os.path.join(pf, "LibreOffice", "program", "soffice.exe"))
                candidates.append(os.path.join(pf, "LibreOffice", "program", "soffice.com"))
    elif sys.platform == "darwin":
        candidates.append("/Applications/LibreOffice.app/Contents/MacOS/soffice")
    else:
        candidates.extend([
            "/usr/bin/soffice",
            "/usr/bin/libreoffice",
            "/usr/local/bin/soffice",
            "/usr/local/bin/libreoffice",
            "/opt/libreoffice/program/soffice",
            "/snap/bin/libreoffice",
        ])

    for candidate in candidates:
        if candidate and os.path.isfile(candidate):
            return candidate
    return None


def _package_available(import_name: str) -> bool:
    return importlib.util.find_spec(import_name) is not None


def _binary_version(path: str | None, args: Iterable[str]) -> str | None:
    if not path:
        return None
    try:
        proc = subprocess.run(
            [path, *args],
            capture_output=True,
            text=True,
            timeout=3,
        )
    except Exception:
        return None
    text = (proc.stdout or proc.stderr or "").strip()
    return text.splitlines()[0][:160] if text else None


def _binary_engine(engine_id: str, label: str, path: str | None,
                   version_args: Iterable[str], install_hint: str,
                   quality: str = QUALITY_HIGH) -> dict:
    version_args = list(version_args)
    return {
        "id": engine_id,
        "label": label,
        "available": bool(path),
        "path": path,
        "version": _binary_version(path, version_args) if path and version_args else None,
        "quality": quality if path else QUALITY_UNAVAILABLE,
        "install_hint": install_hint,
        "kind": "binary",
    }


def _package_engine(engine_id: str, label: str, import_name: str,
                    install_hint: str, quality: str = QUALITY_HIGH) -> dict:
    available = _package_available(import_name)
    return {
        "id": engine_id,
        "label": label,
        "available": available,
        "path": None,
        "version": None,
        "quality": quality if available else QUALITY_UNAVAILABLE,
        "install_hint": install_hint,
        "kind": "python-package",
    }


def _combined_package_engine(engine_id: str, label: str, import_names: Iterable[str],
                             install_hint: str, quality: str = QUALITY_HIGH) -> dict:
    missing = [name for name in import_names if not _package_available(name)]
    return {
        "id": engine_id,
        "label": label,
        "available": not missing,
        "path": None,
        "version": None,
        "quality": quality if not missing else QUALITY_UNAVAILABLE,
        "install_hint": install_hint,
        "kind": "python-package",
        "missing_packages": missing,
    }


def _oda_path() -> str | None:
    return shutil.which("ODAFileConverter") or shutil.which("oda_file_converter")


def get_capabilities() -> dict:
    soffice = find_soffice()
    ffmpeg = shutil.which("ffmpeg")
    ffprobe = shutil.which("ffprobe")
    tesseract = shutil.which("tesseract")
    oda = _oda_path()

    engines = {
        "libreoffice": _binary_engine(
            "libreoffice",
            "LibreOffice",
            soffice,
            ["--version"],
            "Install LibreOffice locally, then restart this app.",
        ),
        "ffmpeg": _binary_engine(
            "ffmpeg", "FFmpeg", ffmpeg, ["-version"],
            "Install FFmpeg locally and make sure it is on PATH.",
        ),
        "ffprobe": _binary_engine(
            "ffprobe", "FFprobe", ffprobe, ["-version"],
            "Install FFmpeg locally; ffprobe ships with it.",
        ),
        "tesseract": _binary_engine(
            "tesseract", "Tesseract OCR", tesseract, ["--version"],
            "Install the Tesseract binary and required language packs.",
        ),
        "oda": _binary_engine(
            "oda", "ODA File Converter", oda, [],
            "Install ODA File Converter for DWG support.",
        ),
        "pymupdf": _package_engine(
            "pymupdf", "PyMuPDF", "fitz",
            "Install PyMuPDF with pip install PyMuPDF.",
        ),
        "pdf2docx": _package_engine(
            "pdf2docx", "pdf2docx", "pdf2docx",
            "Install pdf2docx with pip install pdf2docx.",
            quality="medium",
        ),
        "pdfplumber": _package_engine(
            "pdfplumber", "pdfplumber", "pdfplumber",
            "Install pdfplumber with pip install pdfplumber.",
            quality="medium",
        ),
        "marker": _package_engine(
            "marker", "Marker PDF", "marker",
            "Install marker-pdf locally; first use downloads local model weights.",
        ),
        "pytesseract": _package_engine(
            "pytesseract", "pytesseract", "pytesseract",
            "Install pytesseract with pip install pytesseract.",
        ),
        "pyzbar": _package_engine(
            "pyzbar", "pyzbar", "pyzbar",
            "Install pyzbar and the local ZBar shared library.",
        ),
        "rembg": _combined_package_engine(
            "rembg", "rembg", ["rembg", "onnxruntime"],
            'Install rembg with CPU support: pip install "rembg[cpu]".',
        ),
        "pillow-heif": _package_engine(
            "pillow-heif", "pillow-heif", "pillow_heif",
            "Install pillow-heif with pip install pillow-heif.",
        ),
        "whisper": _package_engine(
            "whisper", "Whisper", "whisper",
            "Install Whisper with pip install openai-whisper.",
        ),
        "python-pptx": _package_engine(
            "python-pptx", "python-pptx", "pptx",
            "Install python-pptx with pip install python-pptx.",
            quality="medium",
        ),
    }

    return {
        "offline": True,
        "engines": engines,
        "routes": _route_statuses(engines),
    }


|
||||
ROUTE_REQUIREMENTS = {
    "/convert/to-pdf": {
        "label": "Files to PDF",
        "primary": ["libreoffice"],
        "fallback": "Basic Python renderer for images, text, and simple DOCX.",
    },
    "/convert/html-to-pdf": {
        "label": "HTML to PDF",
        "primary": ["libreoffice"],
        "fallback": "Basic PyMuPDF HTML renderer.",
    },
    "/spreadsheet/excel-to-pdf": {
        "label": "Excel to PDF",
        "primary": ["libreoffice"],
        "fallback": "Basic ReportLab table renderer.",
    },
    "/convert/pdf-to-word": {
        "label": "PDF to Word",
        "primary_any": ["pdf2docx", "marker", "pymupdf"],
        "fallback": "Visual-copy and flowing-text modes remain local fallbacks.",
    },
    "/convert/pdf-to-excel": {
        "label": "PDF to Excel",
        "primary_any": ["pdfplumber", "pymupdf"],
        "fallback": "PyMuPDF table detection.",
    },
    "/convert/pdf-to-pptx": {
        "label": "PDF to PowerPoint",
        "primary": ["libreoffice"],
        "fallback": "Image-per-slide PowerPoint output.",
    },
    "/convert/pptx-to-pdf": {
        "label": "PowerPoint to PDF",
        "primary": ["libreoffice"],
        "fallback": None,
    },
    "/convert/ocr-pdf": {
        "label": "OCR PDF",
        "primary": ["tesseract", "pytesseract"],
        "fallback": None,
    },
    "/image/svg-to-png": {
        "label": "SVG to PNG",
        "primary": [],
        "fallback": "Browser canvas renderer; server svglib renderer remains available as fallback.",
    },
    "/image/ocr": {
        "label": "Image OCR",
        "primary": ["tesseract", "pytesseract"],
        "fallback": None,
    },
    "/media/convert-audio": {"label": "Convert Audio", "primary": ["ffmpeg"], "fallback": None},
    "/media/convert-video": {"label": "Convert Video", "primary": ["ffmpeg"], "fallback": None},
    "/media/extract-audio": {"label": "Extract Audio", "primary": ["ffmpeg"], "fallback": None},
    "/media/trim": {"label": "Trim Media", "primary": ["ffmpeg"], "fallback": None},
    "/media/compress-video": {"label": "Compress Video", "primary": ["ffmpeg"], "fallback": None},
    "/media/video-to-gif": {"label": "Video to GIF", "primary": ["ffmpeg"], "fallback": None},
    "/media/burn-subtitles": {"label": "Burn Subtitles", "primary": ["ffmpeg"], "fallback": None},
    "/media/normalize-audio": {"label": "Normalize Audio", "primary": ["ffmpeg"], "fallback": None},
    "/media/transcribe": {"label": "Speech to Text", "primary": ["ffmpeg", "whisper"], "fallback": None},
}


def _route_statuses(engines: dict) -> dict:
    """Map each route in ROUTE_REQUIREMENTS to a quality tier and status label."""
    statuses = {}
    for endpoint, req in ROUTE_REQUIREMENTS.items():
        primary = req.get("primary", [])
        primary_any = req.get("primary_any", [])
        # "primary" needs every listed engine; "primary_any" needs at least one.
        # Engine names that are not keys of `engines` are skipped by these checks.
        if primary:
            available = all(engines[e]["available"] for e in primary if e in engines)
        elif primary_any:
            available = any(engines[e]["available"] for e in primary_any if e in engines)
        else:
            available = True

        if available:
            quality = QUALITY_HIGH
            status = "High fidelity"
        elif req.get("fallback"):
            quality = QUALITY_BASIC
            status = "Basic fallback"
        else:
            quality = QUALITY_UNAVAILABLE
            status = "Unavailable"

        # Known engines for this route that were detected as unavailable.
        missing = [
            e for e in [*primary, *primary_any]
            if e in engines and not engines[e]["available"]
        ]
        statuses[endpoint] = {
            "label": req["label"],
            "quality": quality,
            "status": status,
            "required_engines": primary or primary_any,
            "missing_engines": missing,
            "fallback": req.get("fallback"),
        }
    return statuses


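The per-route dictionaries built by `_route_statuses` lend themselves to a short report of routes that are not running at high fidelity. The following is a minimal sketch, assuming the `status` and `missing_engines` keys shown above; `degraded_routes` and the sample data are hypothetical and not part of this commit.

```python
def degraded_routes(statuses: dict) -> list[str]:
    """Hypothetical helper: describe routes that are not at high fidelity."""
    lines = []
    for endpoint, info in sorted(statuses.items()):
        if info["status"] == "High fidelity":
            continue
        # An empty list means no known engine was reported missing.
        missing = ", ".join(info["missing_engines"]) or "none"
        lines.append(f"{endpoint}: {info['status']} (missing: {missing})")
    return lines


# Hand-written sample in the shape produced by _route_statuses (illustrative only).
sample = {
    "/convert/to-pdf": {"status": "Basic fallback", "missing_engines": ["libreoffice"]},
    "/image/svg-to-png": {"status": "High fidelity", "missing_engines": []},
    "/media/trim": {"status": "Unavailable", "missing_engines": ["ffmpeg"]},
}
report = degraded_routes(sample)
```

With the sample above, `report` contains one line for `/convert/to-pdf` and one for `/media/trim`; the SVG route is skipped because it is already at high fidelity.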
def set_conversion_metadata(response, engine: str, quality: str,
                            warnings: str | Iterable[str] | None = None):
    """Attach engine, quality, and optional fidelity warnings as response headers."""
    response.headers["X-Conversion-Engine"] = engine
    response.headers["X-Conversion-Quality"] = quality
    if warnings:
        if isinstance(warnings, str):
            warning_text = warnings
        else:
            # Join multiple warnings into one header value, dropping empty entries.
            warning_text = "; ".join(str(w) for w in warnings if w)
        if warning_text:
            # Cap the header value at 1000 characters.
            response.headers["X-Fidelity-Warnings"] = warning_text[:1000]
    return response


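The three headers set by `set_conversion_metadata` can be read back on the client side. The following is a minimal sketch, assuming the caller already has the response headers as a plain mapping with exactly these header names; `parse_conversion_headers` is a hypothetical helper, not part of this commit, and it relies on the `"; "` separator used when warnings are joined.

```python
from typing import Mapping


def parse_conversion_headers(headers: Mapping[str, str]) -> dict:
    """Hypothetical client-side helper: read the conversion metadata headers."""
    raw = headers.get("X-Fidelity-Warnings", "")
    # Warnings are joined with "; " on the server, so split on ";" and trim.
    warnings = [part.strip() for part in raw.split(";") if part.strip()]
    return {
        "engine": headers.get("X-Conversion-Engine"),
        "quality": headers.get("X-Conversion-Quality"),
        "warnings": warnings,
    }


# Illustrative header values; the real quality strings are defined elsewhere.
parsed = parse_conversion_headers({
    "X-Conversion-Engine": "libreoffice",
    "X-Conversion-Quality": "example-quality",
    "X-Fidelity-Warnings": "Fonts substituted; Images downsampled",
})
```

Two caveats follow from the server code: a warning that itself contains a semicolon will be split in two, and the last warning may be cut short because the header value is capped at 1000 characters.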
def metadata_payload(data: dict | None = None, *, engine: str, quality: str,
                     warnings: Iterable[str] | str | None = None) -> dict:
    """Return a shallow copy of `data` with engine, quality, and optional warnings added."""
    payload = dict(data or {})
    payload["engine"] = engine
    payload["quality"] = quality
    if warnings:
        # A single string becomes a one-item list so "warnings" is always a list.
        payload["warnings"] = [warnings] if isinstance(warnings, str) else list(warnings)
    return payload


def soffice_convert(file_data: bytes, source_ext: str, target_ext: str = "pdf",
                    timeout: int = 180) -> bytes | None:
    """Run LibreOffice headless conversion with an isolated user profile."""
    soffice = find_soffice()
    if not soffice:
        return None

    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Throwaway profile directory inside the temp dir, separate from the user's own profile.
        profile_dir = tmp_path / "lo-profile"
        profile_dir.mkdir(parents=True, exist_ok=True)
        in_path = tmp_path / f"input.{source_ext.lstrip('.').lower()}"
        in_path.write_bytes(file_data)

        profile_uri = profile_dir.resolve().as_uri()
        cmd = [
            soffice,
            f"-env:UserInstallation={profile_uri}",
            "--headless",
            "--nologo",
            "--nofirststartwizard",
            "--norestore",
            "--convert-to",
            target_ext,
            "--outdir",
            str(tmp_path),
            str(in_path),
        ]
        try:
            proc = subprocess.run(
                cmd,
                capture_output=True,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return None
        if proc.returncode != 0:
            return None

        # Pick the most recently modified file with the target extension.
        candidates = [
            p for p in tmp_path.iterdir()
            if p.is_file() and p.suffix.lower() == f".{target_ext.lower()}"
        ]
        if not candidates:
            return None
        candidates.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        return candidates[0].read_bytes()
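`soffice_convert` returns `None` when LibreOffice is not found, times out, exits with a non-zero code, or produces no output file, so callers need a fallback path. The following is a minimal sketch of that pattern, assuming hypothetical stand-in converters and placeholder quality strings (the real `QUALITY_*` constants are defined elsewhere in the module); `convert_with_fallback` is not part of this commit.

```python
from typing import Callable, Optional

Converter = Callable[[bytes], Optional[bytes]]


def convert_with_fallback(data: bytes, primary: Converter, fallback: Converter,
                          primary_name: str = "libreoffice",
                          fallback_name: str = "basic-renderer"):
    """Try the high-fidelity converter first; use the fallback if it returns None.

    Returns (output_bytes, engine_name, quality), or (None, None, "unavailable").
    The quality strings here are placeholders, not the project's constants.
    """
    result = primary(data)
    if result is not None:
        return result, primary_name, "high"
    result = fallback(data)
    if result is not None:
        return result, fallback_name, "basic"
    return None, None, "unavailable"


# Stand-in converters for illustration only.
def missing_engine(data: bytes) -> Optional[bytes]:
    return None  # mimics soffice_convert when LibreOffice is not installed


def basic_renderer(data: bytes) -> Optional[bytes]:
    return b"%PDF-stub " + data


output, engine, quality = convert_with_fallback(b"hello", missing_engine, basic_renderer)
```

The returned engine name and quality are the kind of values a route could then pass on to `set_conversion_metadata` or `metadata_payload`.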
@@ -0,0 +1,28 @@
"""PyMuPDF import guard.

PyMuPDF's import name is `fitz`, but there is also an unrelated PyPI package
named `fitz` that imports `frontend`/Starlette and crashes at startup. Keep the
diagnostic in one place so users get actionable setup instructions.
"""


def import_pymupdf():
    try:
        import fitz  # type: ignore
    except Exception as exc:
        raise RuntimeError(
            "PyMuPDF is required, but Python could not import the correct 'fitz' module. "
            "Use the project virtual environment (run run.bat on Windows), or fix this "
            "Python environment with: python -m pip uninstall -y fitz frontend && "
            "python -m pip install --upgrade PyMuPDF"
        ) from exc

    if not hasattr(fitz, "open") or not hasattr(fitz, "Document"):
        path = getattr(fitz, "__file__", "unknown location")
        raise RuntimeError(
            "Python imported a package named 'fitz', but it is not PyMuPDF "
            f"({path}). Uninstall the wrong package and install PyMuPDF: "
            "python -m pip uninstall -y fitz frontend && "
            "python -m pip install --upgrade PyMuPDF"
        )
    return fitz
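The guard above combines two checks: the import must succeed, and the imported module must expose the attributes the caller relies on. The same idea can be sketched generically; `import_checked` below is a hypothetical helper, not part of this commit, and it is demonstrated with the standard-library `json` module so that it runs without PyMuPDF installed.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_checked(module_name: str, required_attrs: Iterable[str]) -> ModuleType:
    """Import a module and verify it exposes the attributes the caller relies on."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        raise RuntimeError(f"Could not import {module_name!r}.") from exc

    # A same-named but unrelated package would typically lack these attributes.
    missing = [name for name in required_attrs if not hasattr(module, name)]
    if missing:
        path = getattr(module, "__file__", "unknown location")
        raise RuntimeError(
            f"Imported {module_name!r} from {path}, but it lacks {missing}; "
            "a different package may be shadowing the expected one."
        )
    return module


# Demonstrated with the standard library so the sketch runs without PyMuPDF.
json_module = import_checked("json", ["loads", "dumps"])
```

Reporting the module's `__file__` in the error, as the original guard does, tells the user which installed package actually answered the import.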