Commit Graph

7 Commits

Author SHA1 Message Date
Maurice 093e069ccc Backend/frontend hardening & consistency cleanups (#1113)
* refactor(auth): session token validation and password-change consistency

* refactor(journey): entry field allow-list and public share-link consistency

* refactor(mcp): align tool authorization with the REST permission checks

* chore: input validation and sanitisation touch-ups (uploads, pdf, maps, backup, csp)
2026-06-06 16:37:03 +02:00
Maurice 20791a29a7 Migrate TREK 3 to NestJS + React 19 (shared Zod contracts) (#1087)
* Migrate TREK 3 to NestJS + React 19 with a shared Zod contract layer

Brownfield strangler migration of the backend onto NestJS modules
(auth, trips, days, places, assignments, packing, todo, budget,
reservations, collab, files, photos, journey, share, settings, backup,
oidc, oauth, admin, atlas, vacay, weather, airports, maps, categories,
tags, notifications, system-notices) served through a per-prefix
dispatcher, keeping the existing SQLite/better-sqlite3 DB and JWT
httpOnly cookie auth, with behavioural parity for every route.

Client: React 19 upgrade, "page = wiring container + data hook"
pattern across all pages, per-domain Zustand stores bound to
@trek/shared contracts, and decomposition of the large components
(DayPlanSidebar, PackingListPanel, CollabNotes, FileManager,
MemoriesPanel, PlacesSidebar, CollabChat, SystemNoticeModal,
BudgetPanel, PlaceFormModal, ...) into focused render units backed by
in-file hooks.

Apply the shared global request pipeline (helmet/CSP, CORS, HSTS,
forced HTTPS, the global MFA policy and request logging) to the NestJS
instance as well, so a migrated route is protected identically to the
legacy fallback rather than bypassing it.

* Finish the NestJS migration — drop the legacy Express app

NestJS now serves the whole surface: every /api domain plus the platform
routes (uploads, /mcp, the OAuth/MCP SDK + /.well-known metadata and the
production SPA fallback). Removed server/src/app.ts, all of
server/src/routes/* and the strangler dispatcher; index.ts and the
integration suite share a single buildApp() bootstrap so prod and tests
can't drift.

- Platform/transport routes extracted to nest/platform/platform.routes.ts
  and mounted before app.init() — Nest's router answers an unmatched
  request with a 404, so a route registered after init is never reached.
  The SPA fallback is a NotFoundException filter and the catch-all uses a
  RegExp (Express 5's path-to-regexp rejects a bare '*').
- New modules: memories (/api/integrations/memories — the Journey
  gallery's Immich/Synology proxy), addons (GET /api/addons) and the
  cross-trip GET /api/reservations/upcoming.
- TrekExceptionFilter reproduces the old multer / err.statusCode handling
  so upload rejections keep their 400/413 { error } body and non-ASCII
  filenames survive (defParamCharset).
- addTripToJourney and the MCP get_journey_share_link tool gained the
  trip-access check they were missing.
- Re-pointed the 34 integration tests + the websocket test onto the Nest
  app; removed the now-meaningless Express-vs-Nest parity tests and a few
  orphaned client components.

* Restore the reset-password rate limit and fix copyTrip reservation links

Two correctness/security gaps the NestJS migration introduced:

- POST /api/auth/reset-password lost its per-IP rate limiter. Restore it
  (5 attempts / 15 min on a dedicated bucket, same as the old resetLimiter)
  so reset tokens can't be brute-forced unthrottled. Covered by AUTH-019.
- copyTripById did not copy reservations.end_day_id (a day reference — now
  remapped through dayMap like day_id) or needs_review, so a duplicated trip
  lost multi-day transport end-day links and reset the review flag.

* Clean up dead code, dedupe helpers, fix the reset-password contract

- Remove server exports orphaned by the Express removal: the immich
  album-link helpers, seven route-only service exports, getFileByIdFull;
  de-export internal-only helpers (utcSuffix).
- De-duplicate verifyTripAccess (9 identical copies -> services/tripAccess.ts)
  and avatarUrl (3 -> services/avatarUrl.ts); name the bcrypt cost
  (BCRYPT_COST) and the email regex (EMAIL_REGEX). Public API unchanged.
- resetPasswordRequestSchema declared `password`, but the client sends and
  the service reads `new_password` — rename it so the contract matches and
  the client types resolve.
- Make ATLAS-013 deterministic: stub the admin-1 GeoJSON download instead of
  fetching ~4600 features from GitHub during the test (it hung the suite).

* Make the client typecheck runnable (vitest/vite ambient types)

The client had no `typecheck` script and tsc couldn't even start (the
baseUrl deprecation errored out, same as server/shared already silence).
Add `ignoreDeprecations: "6.0"` to match the other workspaces, a `typecheck`
npm script, and a src/vite-env.d.ts referencing vite/client + vitest/globals
so tsc knows the test globals (describe/it/expect/vi). This turns ~3600
phantom "Cannot find name" errors into a real, measurable count (~590 actual
type errors remain, to be worked down). Type-only; no runtime change.

* Derive client domain types from the shared schema contracts

Add entity/response Zod schemas to @trek/shared (place, trip, assignment, day, budget, packing, reservation), each matched against the producing server service, and re-export them from client types.ts instead of the hand-written duplicates that had drifted (name/title, amount/total_price, owner_id/user_id, cover_url/cover_image, ...). Updates the call sites and test fixtures the corrected types surfaced; type-only, no runtime behaviour change.

* chore(db): log swallowed errors in addon-disable migration + guard against destructive migrations

The migration that disables the legacy "memories" addon swallowed any
error in an empty catch, as did ~30 other catch blocks in the migration
runner (column adds, the journey rebuild, index probes). Replace each
silent catch with the existing console.warn('[migrations] ...') log so
failures are visible. Control flow is unchanged: every step stays
non-fatal, nothing new is thrown.

Add a static guardrail test that scans the migration source and fails
when a new destructive statement (DROP TABLE / DROP COLUMN / TRUNCATE /
DELETE FROM / ALTER ... DROP) appears outside a reviewed allowlist, and
when an empty/silent catch block is reintroduced. The existing
destructive statements are all legitimate table rebuilds or
bounded cleanups and are recorded in the allowlist with a reason.

* Re-check SSRF on every redirect hop when resolving short links

Replace the one-shot checkSsrf + fetch(redirect:'follow') in the maps and place short-link resolvers with safeFetchFollow, which follows redirects manually and re-runs checkSsrf against the DNS-pinned IP of each hop (max 5). A redirect to an internal/loopback address is now blocked even when the initial URL is public, while legitimate cross-host redirects (goo.gl -> maps.google.com) still resolve.

* Reject WebSocket tokens minted before a password change

Stamp the user's password_version onto the ephemeral ws token and verify it on connect, closing the socket (4001) when it no longer matches, so a token issued before a password reset can't be replayed. Tokens minted without a version are treated as version 0, matching the JWT pv-claim semantics.

* fix(i18n): guard locale key parity and finish the OAuth consent page strings

Every non-en locale now exposes the exact same flat key set as en. Keys that
had drifted out of sync are backfilled with the English source value (tagged
en-fallback) so t() resolves a real string instead of relying on the silent
runtime fallback; no existing translation was touched and no key was removed.

Add a parity test that imports each aggregated locale bundle and asserts its
key set matches en, with a diagnostic listing of any missing/extra keys. This
complements the file-level check in shared/scripts by guarding the merged
export the app actually serves.

Finish internationalising OAuthAuthorizePage: the ~15 remaining hardcoded
English chrome strings now go through oauth.authorize.* keys (English source
in en, en-fallback placeholders elsewhere). Markup and behaviour are unchanged.

* Add semantic theme color tokens to Tailwind

Map the CSS theme variables from src/index.css (:root light / .dark dark) to named Tailwind utilities — bg-surface, text-content, border-edge, bg-accent and their variants. This gives components a Tailwind-native target for the theme colors so we can replace inline `style={{ ... 'var(--...)' }}` with utility classes without changing the rendered values.

* Surface silent store failures to the user and validate API responses in dev

Reservation toggle, todo/packing toggle and budget reorder were swallowing API errors after rolling back, so the user saw the change silently snap back with no explanation. Route those failures through the existing toast channel (new store/notify.ts bridges to window.__addToast, the same channel SystemNoticeBanner uses); the reservation toggle re-throws so ReservationsPanel's own translated toast finally fires. Also wire the existing parseInDev/checkInDev response validation into the maps and notification-test endpoints to catch contract drift in dev.

* Migrate static theme inline styles to Tailwind utilities and extract page sub-components

Replace the static, color-only inline `style={{ ... 'var(--bg-primary)' ... }}` props with the new semantic Tailwind utilities (bg-surface, text-content, border-edge, ...) wherever the result is byte-identical; dynamic/conditional theme styles and hardcoded status colors are left inline. Extract the Atlas country-search autocomplete, the Admin update banner, and two Journey dialogs into their own presentational components to shrink the oversized page files, keeping behaviour and markup identical.

* Remove the unrouted photos page and its dead photo components

PhotosPage was never wired into the router and its usePhotos hook read a tripStore photos slice that was never implemented; the Photos gallery, lightbox and upload components were only reachable through it. Per-trip photos now live in the Journey gallery (Immich/Synology). Removed the dead page, hook and components — the live Journey PhotoLightbox is a separate component and stays.

* Resolve the remaining client type errors and the trip.title navbar bug

Drive the client typecheck to zero without any/ts-ignore: convert the tripId route param to a number once at the page boundary so it matches the numeric props and store actions it feeds, fix trip.name -> trip.title (the wire field is title, so the old read rendered blank in the files/offline views), and tighten the scattered handler-arity, DOM-cast and untyped-payload sites. No runtime behaviour change.

* Convert the remaining dynamic and hardcoded inline styles to Tailwind utilities

Second styling pass over the components and pages: move conditional theme colors into className ternaries (bg-accent / bg-surface-hover etc.), turn reused CSSProperties constants into className constants, and express static hardcoded hex/rgba colors as Tailwind arbitrary values so the exact rendered colour is preserved. Truly dynamic styling (computed geometry, gradients, multi-part shadows, data-driven colours, the undefined --sidebar/--nav layout vars) stays inline as it cannot be expressed as a static class. Updated three component tests that asserted the old inline active-state styles to assert the equivalent utility class instead.

Verified: client typecheck 0, full client suite green, and a live light/dark render check in the dev server confirms the semantic theme tokens resolve correctly (the earlier 'transparent popups' were a stale dev server that pre-dated the tailwind.config token addition, not a code issue).

* Add eslint flat-config for client and server and gate typecheck, lint and pages in CI

client and server had lint scripts but no eslint config (only shared was linted in CI). Add flat configs mirroring shared's stack (js + typescript-eslint recommended + eslint-config-prettier) plus the client's react-hooks/react-refresh plugins. Pre-existing patterns in this never-linted code (explicit any, require() in the CommonJS server, empty catches, exhaustive-deps) are set to 'warn' rather than 'error' so the gate passes at 0 errors without a repo-wide reformat — these can be ratcheted to errors over time. Wire blocking typecheck + lint + lint:pages steps into the client and server CI jobs (now that both typechecks are clean) and promote the server typecheck from informational to blocking.

* Decompose the remaining God Components into hooks, helpers and sub-components

FE6: split the oversized page and panel components into thin layout shells plus colocated use<Component> hooks, .constants.ts, .helpers.ts (with tests) and presentational sub-components, following the established 'logic in a hook, render in slices' pattern. Behaviour, markup, classes and effect order are unchanged. Largest reductions: PackingListPanel 1598->42, FileManager 1055->36, AdminPage 1525->167, BudgetPanel 1266->146, JourneyDetailPage 2822->547, PlacesSidebar 945->66, CollabChat 861->106, CollabNotes 1417->532. DayPlanSidebar's drag-and-drop render body was left intact (ref-identity sensitive) and only its toolbar/modals/constants were extracted.

* Fix duplicate React keys in the file-assign place list

When a place is assigned to the same day more than once it appeared twice in a day's list, so the place-button key={p.id} collided and React warned about duplicate keys. Key by place id + render index so siblings stay unique. Pre-existing in the old FileManager; behaviour unchanged.

* Format the shared package and drop an unused import to satisfy the lint gate

The i18n and schema changes added code that wasn't prettier-formatted, and place.schema.ts imported categorySchema without using it. Run prettier over shared and remove the import so 'npm run lint' + 'format:check' pass.

* Install all workspaces in the server CI job so SWC's native binary is present

The server vitest config transforms via unplugin-swc, which needs @swc/core's platform-specific native binary. A workspace-scoped 'npm ci --workspace server' skips that optional dependency, so vitest failed to load the config on the Linux runner. Use a full 'npm ci'.

* Re-resolve dependencies with npm install in the server CI job for SWC

Full 'npm ci' still skipped @swc/core's Linux native binary because the committed lockfile was generated on Windows and lacks the Linux optional-dep install metadata. 'npm install' re-resolves and fetches the platform-matching binary, which the server's unplugin-swc transform needs to load vitest.config.ts.

* Install @swc/core's Linux binary explicitly in the server CI job

Neither npm ci nor npm install fetched @swc/core-linux-x64-gnu on the Linux runner because the lockfile was generated on Windows and lacks the Linux optional-dep metadata. Add a step that installs the matching @swc/core-linux-x64-gnu version (no-save, no-lockfile) so unplugin-swc can load the server's vitest config.

* Use legacy-peer-deps when installing the SWC Linux binary in CI

The explicit @swc/core-linux-x64-gnu install re-resolved the tree and hit the pre-existing lucide-react/react-19 peer conflict that the lockfile was generated around. Add --legacy-peer-deps so the step matches the project's resolution and installs the binary.

* Keep the lockfile when installing the SWC binary so other deps stay pinned

Dropping --no-package-lock made npm re-resolve the whole tree and upgrade eslint, whose newer recommended config flagged no-useless-assignment as an error in the server lint step. Keep the lockfile so only @swc/core-linux-x64-gnu is added and every other dependency (incl. eslint) stays at its locked version.
2026-05-31 21:10:00 +02:00
jubnl 5eaf7492dc fix(backups,files): auto-backups rejected by validator; trip file download broken after cookie migration
Fixes #773: isValidBackupFilename regex anchored to ^backup- rejected all
auto-backup-* filenames, causing 400 on download/restore/delete. Broadened
to ^(?:auto-)?backup-.

Fixes #774: three regressions in the trip Files tab —
- openFile import shadowed by a local function of the same name inside
  FileManager; PDF preview modal was calling the local with a URL string,
  corrupting state and crashing on the second click (mime_type read on
  undefined). Fixed by aliasing the import as openFileUrl.
- GET /:id/download used a bespoke authenticateDownload that checked only
  Bearer header and ?token= query param, ignoring the trek_session cookie.
  After the JWT-to-cookie migration the client sends cookies only, so every
  download silently 401-ed. Extended authenticateDownload to accept req and
  check cookie → Bearer → query token in priority order.
- files.download and files.openError translation keys were missing from all
  15 locale files; t() was returning the raw key as a truthy string,
  defeating the || 'Download' fallback.
2026-04-21 11:18:17 +02:00
Maurice 292e443dbe security: address silent-failure review findings on top of batch 1
Second-pass fixes caught by a self-review after the initial commit — each
one would have undermined a fix from the previous commit.

- mfaPolicy now goes through `verifyJwtAndLoadUser` too. Without this,
  a JWT stolen before a password reset still satisfied `require_mfa`
  until its natural 24h expiry, defeating the whole point of the
  password_version bump.
- Drop the `?? keys[0]` fallback in OIDC JWKS key selection. When the
  token carries a `kid` that is not in the current JWKS, refuse
  outright instead of picking an arbitrary key and letting the
  signature check produce a generic failure — the real failure mode
  deserves a specific error code.
- Tighten OAuth DCR custom-scheme rule so `javascript:`, `data:`,
  `vbscript:`, `file:`, `blob:`, `about:`, `chrome:` are all rejected.
  Previously the catch-all "not http/https" check admitted them; the
  authorize flow later 302s the browser to whatever is registered,
  which with a `javascript:` URI would execute attacker script on
  redirect. Also require the private-use scheme body to be reverse-DNS
  (contain a dot), matching RFC 8252 §7.1.
- permanentDeleteFile / emptyTrash only delete the trip_files row when
  the on-disk unlink actually succeeded. Previously Promise.all
  swallowed individual unlink failures and DELETE ran unconditionally,
  so a permission / ENOSPC failure would orphan bytes on disk.
- restoreFromZip also invalidates the permissions cache in the outer
  catch. If extraction threw before the DB swap even started, the
  cache wasn't stale, but belt-and-braces is cheap and guarantees no
  failed-restore path leaves stale cache behind.
2026-04-20 20:44:57 +02:00
Maurice 2d0414b4a3 security: internal audit — batch 1
Fixes the critical + high + medium findings from our internal security
review. Bundled into one PR because the changes overlap heavily (JWT
verification unifies across three call sites; backup-code hashing and
demo-email handling cross-cut several services); splitting them out
would mean redundant reviews of the same files.

Critical
- CI-C1 — .github/workflows/test.yml: restore actions/{checkout,setup-
  node,upload-artifact} to @v4. The @v6 refs don't exist, so the test
  workflow was errorring before a single test ran.
- SEC-C1 — mfaPolicy now extracts the token via extractToken() (cookie-
  first, Bearer fallback). Previously it only read Authorization, so
  every cookie-authenticated SPA session bypassed require_mfa entirely.
- SEC-C2/C4/C6 — all JWT verification paths (MCP bearer, file download,
  photo route) now go through the shared verifyJwtAndLoadUser that
  checks password_version. resetPassword additionally deletes every
  mcp_tokens row and marks outstanding oauth_tokens revoked, so a
  password reset invalidates ALL credential classes — not just the
  cookie JWT.

High
- SEC-H2 — reset email URL is built from server-side APP_URL /
  ALLOWED_ORIGINS (via existing getAppUrl()), not request headers.
  Closes the host-header-injection vector into reset links.
- SEC-H3 — OIDC findOrCreateUser wraps the invite-redemption UPDATE +
  user INSERT in a transaction. The UPDATE is the capacity check; if
  a concurrent callback takes the last slot, the whole transaction
  aborts with registration_disabled instead of double-creating users.
- SEC-H4 — new verifyIdToken() performs full JWT signature
  verification via the provider's JWKS (Node's crypto.createPublicKey
  accepts JWK directly — no extra dependency), plus iss/aud/exp
  checks. The callback also rejects the login when userinfo.sub does
  not match id_token.sub.
- SEC-H5 — OAuth DCR now validates redirect_uris against an allowlist
  of schemes: https, http-loopback, or a private custom scheme. Plain
  http://non-loopback is rejected.
- SEC-H6 — oauthService audience defaults to mcpResource when the
  `resource` parameter is missing, so tokens are always audience-bound
  to /mcp instead of being issued with audience=null.
- SEC-H7 — HSTS is enabled any time NODE_ENV=production (previously
  required FORCE_HTTPS=true), includeSubDomains defaults on and can
  be disabled with HSTS_INCLUDE_SUBDOMAINS=false.
- SEC-H8 — trek_session cookie Secure flag is also driven by
  req.secure (which Express resolves from X-Forwarded-Proto once
  trust proxy is set), so instances behind a TLS-terminating proxy
  get Secure cookies without needing FORCE_HTTPS.

Medium
- SEC-M1 — permanentDeleteFile / emptyTrash / avatar unlink now use
  fs.promises.rm with { force: true } (one async op vs the previous
  existsSync + unlinkSync pair per file).
- SEC-M2 — invalidatePermissionsCache() is called inside restoreFromZip
  so a restored DB with different permission rows is honoured
  immediately.
- SEC-M3 + C1 — idempotency store bounds the key at 128 chars, caches
  only responses ≤ 256 KiB, and scopes the lookup by (key, user_id,
  method, path) rather than (key, user_id). Same key replayed against
  a different endpoint no longer returns a stale unrelated body.
- SEC-M4 — share_tokens gets an expires_at column; new tokens default
  to 90-day TTL, expired tokens are denied at lookup. Existing tokens
  stay NULL = no expiry so already-published links don't break.
- SEC-M5 — /uploads/photos/:filename now resolves the photo to its
  trip_id and requires the share token to cover THAT trip. Previously
  any share token for any trip would unlock any photo filename.
- SEC-M6 — BLOCKED_EXTENSIONS is the single source of truth shared
  between fileService and collab uploads. The '*' allowed_file_types
  wildcard now still rejects executables/scripts.
- SEC-M7 — single DEMO_EMAILS constant (services/demo.ts) used by
  demoUploadBlock, mfaPolicy, and every demo-mode guard in
  authService. The old demoUploadBlock only matched 'demo@nomad.app'
  so the seed 'demo@trek.app' could in fact upload in demo mode.
- SEC-M8 — MFA backup codes are now bcrypt-hashed at rest
  (hashBackupCodeBcrypt). matchBackupCode accepts both bcrypt and
  legacy SHA-256 hex hashes, so existing installs keep working until
  the user regenerates codes via enableMfa.
- SEC-M9 — document the "security via UUID v4 filename" model for
  /uploads/avatars|covers|journey. Requires no code change but
  captures the decision so future reviewers don't re-flag it.
- SEC-M10 — already covered by the resetPassword revocation logic
  above: mcp_tokens DELETE + oauth_tokens UPDATE … SET revoked_at.

Performance
- PERF-H1 — new migration adds the indexes flagged in the audit:
  trips(user_id), trips(created_at DESC), photos(day_id),
  photos(place_id), reservations(day_id), share_tokens(token), plus
  conditional day_accommodations and notifications indexes depending
  on which columns are present.

Tests
- tests/integration/oidc.test.ts now mocks verifyIdToken and passes
  an id_token in the exchangeCodeForToken stub for the three flows
  that exercise a successful callback. The three remaining failures
  tests pointed out were all pre-existing (file-upload flakes +
  notificationPreferences event_types count drift), none introduced
  by this PR.
2026-04-20 20:36:52 +02:00
jubnl 6a23118342 fix(notifications): fix SMTP error surfacing, webhook button label, backup timestamp
- testSmtp now surfaces real nodemailer error instead of generic 'SMTP not configured' on send failure
- admin webhook test button uses correct i18n key (was showing 'Test-E-Mail senden' in all languages)
- backup created_at uses stat.mtime instead of unreliable stat.birthtime on Linux
2026-04-14 16:20:52 +02:00
Maurice 979322025d refactor: extract business logic from routes into reusable service modules 2026-04-02 17:14:53 +02:00