Files
TREK/wiki/Troubleshooting.md
T
Julien G. 25f326a659 v3.0.16 — bug fixes (#964)
* fix(mcp): MCP RFC compliant for more strict clients

* fix(mcp): serve flat /.well-known/oauth-protected-resource for ChatGPT reconnect

Clients such as ChatGPT probe the flat well-known URL on every fresh discovery
cycle (i.e. after a full disconnect/reconnect where cached OAuth state is cleared).
The SDK's mcpAuthMetadataRouter only serves the path-based form
/.well-known/oauth-protected-resource/mcp, so the flat probe returned 404.

Without the resource metadata, ChatGPT fell back to the issuer URL as the
resource parameter (https://…/ instead of https://…/mcp). The authorize handler
then rejected it with invalid_target and redirected back to ChatGPT's callback
with an error — showing the user the TREK home page instead of the consent form.

Add an explicit GET handler for the flat URL that returns the same protected
resource metadata, so the resource URI is discovered correctly on the first probe.

* fix(mcp): fix OAuth popup blank page — SW denylist and COOP header

Service worker was intercepting /oauth/authorize navigate requests
(not in denylist), serving index.html, and React Router's catch-all
redirected to / instead of the SDK authorize handler.

Helmet's default COOP: same-origin isolated the /oauth/consent popup
from its cross-origin opener, making window.opener null and breaking
the popup-based OAuth completion signal for ChatGPT and similar clients.

* fix(ntfy): encode non-Latin-1 header values with RFC 2047 to prevent ByteString crash

Todo/trip names containing chars like → or € (and non-Latin-1 locale templates
for Czech, Chinese, Russian, etc.) caused the Fetch API to throw when setting
the ntfy Title header. Apply RFC 2047 base64 encoded-word encoding for any
header value containing chars above U+00FF; ntfy decodes this automatically.

* docs(mcp): document Cloudflare bot detection blocking ChatGPT MCP requests

Add Cloudflare WAF note to MCP-Setup and a full troubleshooting entry covering
root cause (IP reputation + UA heuristics), free-plan limitation (disable Bot
Fight Mode entirely, with explicit warning), and paid-plan WAF skip rule with
the full expression syntax and path table for all MCP/OAuth/.well-known routes.

* fix(pwa): detect upstream proxy auth challenges and recover gracefully

Behind Cloudflare Zero Trust or Pangolin, cross-origin auth redirects on
/api/* calls surface as CORS errors (error.response === undefined) that
the existing 401 interceptor never catches, leaving the PWA stuck with
network-error toasts instead of re-authenticating.

New connectivity module probes /api/health every 30s using fetch with
cache:no-store and inspects Content-Type to reliably detect whether the
server is reachable vs intercepted by an upstream proxy.

axios interceptor changes:
- On !error.response + navigator.onLine: run probeNow(); if the health
  probe also fails (proxy is intercepting all requests), trigger a guarded
  window.location.reload() so the edge proxy can intercept the top-level
  navigation and run its auth flow (covers CF Access and Pangolin 302 mode)
- On error.response status 401 with text/html body: same reload path,
  covering Pangolin header-auth extended compatibility mode which returns
  401+HTML instead of a 302 redirect. TREK own 401s are always JSON so
  there is no collision with the existing AUTH_REQUIRED branch.
- sessionStorage flag prevents reload loops; cleared on any successful
  response so the guard resets after re-auth.

/api/health excluded from SW NetworkFirst cache (vite.config.js regex)
and Cache-Control: no-store added server-side so probes always hit the
network and cannot be served stale from the 24h api-data cache.

LoginPage caches last-known appConfig in localStorage so the SSO button
renders in OIDC+UN/PW dual mode even when the config fetch is intercepted
by the proxy. Auto-redirect to IdP skipped when config comes from cache
to avoid redirect loops while the proxy is challenging.

Fixes discussion #836.

* fix(files): add bottom-nav padding to files tab wrapper on mobile

* fix(budget): expose toolbar on mobile so users can add budget categories

* fix(pwa): unregister SW before proxy-reauth reload so Pangolin can challenge

WorkBox's NavigationRoute served the cached SPA shell on window.location.reload(),
meaning Pangolin/CF Access never saw the navigation and the app was left stuck
showing stale offline data. Unregistering the SW first lets the navigation reach
the network so the upstream proxy can run its auth flow.

Also rebuilds server/public with corrected sw.js (health excluded from
NetworkFirst, /oauth/ and /.well-known/ added to NavigationRoute denylist).

* chore: remove committed build artifacts from server/public

Dockerfile and Proxmox community script both rebuild client/dist and copy
it into server/public at build time — committed artifacts were never used.
Replace with .gitkeep and add server/public/* to .gitignore.

* chore: add build-from-sources script
2026-05-06 21:38:40 +02:00

306 lines
15 KiB
Markdown

# Troubleshooting
## "Access token required" when changing password on first login
**Cause:** The session cookie has the `Secure` flag set, which means the browser will only send it over HTTPS. When accessing TREK over plain HTTP (e.g. `http://192.168.1.x:3000`), the browser silently drops the cookie and the server sees no session — returning "Access token required".
**Fix:** Choose one of the following options:
**Option 1 — Use HTTPS.** Access TREK via HTTPS with a valid SSL certificate.
**Option 2 — Disable the Secure flag.** Set `COOKIE_SECURE=false` in your Docker environment to allow the session cookie to be sent over plain HTTP:
```yaml
environment:
- COOKIE_SECURE=false
```
> **Note:** Option 2 is only recommended for internal/home-lab deployments that do not use HTTPS. Do not use it on a publicly accessible instance. See [Environment Variables](Environment-Variables).
---
## WebSocket not connecting / real-time sync broken
**Cause:** Your reverse proxy is not forwarding WebSocket upgrade headers on the `/ws` path.
**Fix:** Add the following to your proxy config for the `/ws` location:
```nginx
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
```
Without these headers, the WebSocket handshake fails and real-time sync will not work. See [Reverse Proxy](Reverse-Proxy) for a complete nginx and Caddy configuration. Caddy handles WebSocket upgrades automatically.
---
## HTTPS redirect loop
**Cause:** `FORCE_HTTPS=true` is set but your reverse proxy is not forwarding the `X-Forwarded-Proto: https` header, so every request looks like plain HTTP and gets redirected indefinitely.
**Fix:** Ensure your proxy passes the `X-Forwarded-Proto` header to TREK. Also set `TRUST_PROXY=1` so that Express uses the forwarded IP for rate limiting and audit logs:
```yaml
environment:
- FORCE_HTTPS=true
- TRUST_PROXY=1
```
> **Note:** The `/api/health` endpoint is always exempt from the HTTPS redirect so that Docker health checks continue to work over plain HTTP.
If you are accessing TREK directly on `http://<host>:3000` with no proxy, remove `FORCE_HTTPS` entirely. See [Environment Variables](Environment-Variables).
---
## Encrypted settings lost / API keys not working after migration
**Cause:** The `ENCRYPTION_KEY` was changed or lost. All API keys, SMTP passwords, OIDC client secrets, and MFA TOTP secrets are encrypted at rest using this key. Without the original key, decryption fails.
**Fix:** See [Encryption Key Rotation](Encryption-Key-Rotation) for the migration script that re-encrypts data under a new key. If the original key is gone entirely, the encrypted values are unrecoverable and must be re-entered in the admin panel.
> **Note:** If you upgraded from an older version without setting `ENCRYPTION_KEY`, the server uses the following resolution order on startup: (1) `ENCRYPTION_KEY` env var, (2) `data/.encryption_key` file, (3) one-time fallback to `data/.jwt_secret` for legacy upgrades — the value is immediately written to `data/.encryption_key` so JWT rotation cannot break decryption later, (4) auto-generated fresh key for brand-new installs. Check `data/.encryption_key` for the key currently in use.
---
## Locked out of MFA / lost authenticator
**Fix:** If you still have access to your account, use one of the 10 backup codes generated during MFA setup to complete login. After signing in, go to **Settings > Security** to disable or reconfigure MFA.
If you no longer have access to backup codes and cannot log in, an admin must disable MFA for your account directly in the database, or use the `reset-admin.js` script to regain access to an admin account. There is no per-user MFA reset in the Admin Panel UI — the Admin Panel only controls the global "require MFA for all users" policy. See [Admin: Users and Invites](Admin-Users-and-Invites).
---
## Demo user cannot edit or create
**Cause:** The instance is running with `DEMO_MODE=true`. All write operations are blocked for the demo account by design.
**Fix:** This is intentional behavior for public demo deployments. If you are self-hosting and want full access, remove the `DEMO_MODE` variable (or set it to `false`). See [Demo Mode](Demo-Mode).
---
## Backup restore fails with "file too large"
**Cause:** Your reverse proxy has a default body size limit (commonly 1 MB or 10 MB) that is smaller than the backup ZIP. Backup archives include the full uploads directory and can be large.
**Fix:** Raise the body size limit in your proxy config. TREK's own backup upload cap is 500 MB. For nginx:
```nginx
client_max_body_size 500m;
```
Add this to the `location /` block (or the specific backup route). See [Reverse Proxy](Reverse-Proxy) and [Backups](Backups).
---
## "Cannot find module" on startup
**Likely cause:** A Docker volume mount is missing or the `/app/data` and `/app/uploads` directories are not writable by the container process. TREK automatically creates all required subdirectories on startup (`data/logs`, `data/backups`, `data/tmp`, `uploads/files`, `uploads/covers`, `uploads/avatars`, `uploads/photos`) — if this fails because the volume is read-only or owned by the wrong user, startup will abort.
**Fix:** Check your Docker volume configuration. Both `./data:/app/data` and `./uploads:/app/uploads` must be mounted and writable. Run `docker inspect <container> --format '{{json .Mounts}}'` to verify the mounts are present and point to valid host paths. If the host directories are owned by root, the container's `chown` step (which runs as root before dropping to `node`) should correct permissions automatically — but if your host filesystem is read-only or permissions are locked down, grant write access manually:
```bash
sudo chown -R 1000:1000 ./data ./uploads
```
---
## Encryption key regenerated on restart — stored secrets stop working
**Cause:** On every startup, TREK resolves its encryption key in this order: (1) `ENCRYPTION_KEY` env var, (2) `data/.encryption_key` file, (3) legacy `data/.jwt_secret` fallback, (4) auto-generate a fresh key. If neither the env var nor the `data/` volume is persisted — for example after recreating a container without a volume mount — a new random key is generated and all stored secrets (SMTP password, OIDC client secret, API keys, MFA TOTP seeds) become unrecoverable.
**Fix:** Ensure `./data:/app/data` is mounted as a persistent volume so `data/.encryption_key` survives restarts. Alternatively, pin the key explicitly:
```yaml
environment:
- ENCRYPTION_KEY=<your-key>
```
See [Encryption Key Rotation](Encryption-Key-Rotation) for how to retrieve or rotate the key.
---
## OIDC login returns "APP_URL is not configured"
**Cause:** When OIDC is enabled, TREK needs to know its own public URL to build the redirect URI. It resolves this from (1) `APP_URL` env var, (2) the first entry in `ALLOWED_ORIGINS`, (3) `http://localhost:<PORT>` as a last resort. If none of these are set and the request is not coming from localhost, TREK returns a 500 error.
**Fix:** Set `APP_URL` to the public URL of your instance:
```yaml
environment:
- APP_URL=https://trek.example.com
```
---
## OIDC login fails with issuer mismatch
**Cause:** TREK validates that the `issuer` field in the provider's discovery document exactly matches the configured `OIDC_ISSUER`. A trailing-slash difference (e.g. `https://auth.example.com` vs `https://auth.example.com/`) is enough to fail.
**Fix:** Check the exact issuer value your provider advertises and match it:
```bash
curl -s https://<your-oidc-issuer>/.well-known/openid-configuration | jq .issuer
```
Set `OIDC_ISSUER` to that exact string.
---
## OIDC login fails when provider is on a private/internal network
**Cause:** TREK's SSRF guard blocks outbound requests to private IP ranges by default. If your OIDC provider (e.g. Keycloak, Authentik) is running on an internal address, the discovery document fetch will be blocked with: `Requests to private/internal network addresses are not allowed.`
**Fix:**
```yaml
environment:
- ALLOW_INTERNAL_NETWORK=true
```
---
## Password reset emails are not delivered / SMTP is silent
**Cause:** SMTP failures are logged but do not surface as errors to the end user — the "reset email sent" message appears regardless. Common causes: wrong `SMTP_HOST` or `SMTP_PORT`, bad credentials, firewall blocking outbound on the SMTP port, or a self-signed certificate on the SMTP server.
**Fix:**
1. Check server logs for `Email send failed`:
```bash
docker logs <container> 2>&1 | grep "Email send failed"
```
2. If the error mentions TLS or certificate, set `SMTP_SKIP_TLS_VERIFY=true`.
3. Verify the port: `587` for STARTTLS, `465` for implicit TLS, `25` for plain SMTP.
4. Test connectivity from the container:
```bash
docker exec <container> nc -zv <SMTP_HOST> <SMTP_PORT>
```
> **Note:** If no SMTP is configured at all, TREK prints the reset link directly to the server logs (`===== PASSWORD RESET LINK =====`). This is useful for initial setup or self-hosted installs without email.
---
## CORS error — API requests blocked in the browser
**Cause:** If `ALLOWED_ORIGINS` is set, only those origins are permitted. Any request from a different origin is rejected with a CORS error visible in the browser console.
**Fix:** Add your origin to the comma-separated list:
```yaml
environment:
- ALLOWED_ORIGINS=https://trek.example.com,https://other.example.com
```
If `ALLOWED_ORIGINS` is not set, TREK allows all origins (development default). See [Environment Variables](Environment-Variables).
---
## WebSocket closes immediately after connecting (codes 4001 / 4403)
**Cause:** The `/ws` endpoint requires an ephemeral token generated by the client immediately before connecting. If the token is missing, expired, or the user's session state changed, the server closes the connection with a specific code:
| Code | Reason |
|------|--------|
| `4001` | No token, expired/invalid token, or user not found — re-login required |
| `4403` | MFA is required globally but the user has not enabled it |
**Fix:**
- Code `4001`: Log out and log back in. If it persists, check that your reverse proxy is not stripping the `token` query parameter from the WebSocket upgrade request.
- Code `4403`: The user must enable MFA in **Settings > Security**, or an admin can disable the global MFA requirement in **Admin > Settings**.
---
## Clipboard features not working (copy link, share, etc.)
**Cause:** The browser Clipboard API (`navigator.clipboard`) is only available in a [secure context](https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts). When accessing TREK over plain HTTP on a non-localhost address, the API is unavailable and clipboard operations silently fail or show an error.
**Fix:** The only supported options are:
- Access TREK over HTTPS with a valid SSL certificate.
- Access TREK directly from `http://localhost:<port>` — browsers treat `localhost` as a secure context for the Clipboard API (unlike the session cookie, which always requires HTTPS regardless of hostname).
---
## MCP OAuth flow does not initiate / "Connect" redirects but authentication never starts
**Cause:** TREK builds the OAuth 2.1 redirect URI from `APP_URL`. If `APP_URL` is not set, the authorization URL is constructed from a localhost fallback that external clients (Claude.ai, Claude Desktop) cannot reach, so the OAuth handshake never completes.
**Fix:** Set `APP_URL` to the public URL of your instance:
```yaml
environment:
- APP_URL=https://trek.example.com
```
Restart the container after adding the variable. Once set, clicking **Connect** in the MCP client should redirect to your TREK instance and complete the OAuth flow normally.
> **Note:** `APP_URL` is required for any MCP OAuth integration. Without it, the authorization endpoint resolves to `http://localhost:<PORT>`, which is unreachable from external MCP clients.
---
## MCP integration: "Too many requests" or "Session limit reached"
**Cause:** Each user is limited to 300 MCP requests per minute and 20 concurrent sessions by default. Exceeding either limit returns a `429` response.
**Fix:** Increase the limits via environment variables:
```yaml
environment:
- MCP_RATE_LIMIT=600 # requests per minute per user (default: 300)
- MCP_MAX_SESSION_PER_USER=50 # concurrent sessions per user (default: 20)
```
---
## MCP requests blocked by Cloudflare WAF (Bot Fight Mode)
**Cause:** When TREK is proxied through Cloudflare, **Bot Fight Mode** and **Super Bot Fight Mode** classify requests from ChatGPT as bots and block them at the WAF level — before the request ever reaches TREK. This is specific to ChatGPT; Claude.ai is not affected. ChatGPT's exit node IPs have low reputation scores in Cloudflare's threat intelligence and the User-Agent matches Cloudflare's automated-traffic heuristics. TREK itself never receives the request, so there is nothing in TREK's logs; the block is silent from TREK's perspective.
Symptoms:
- ChatGPT shows a connection error or times out immediately after OAuth completes.
- Cloudflare's Security → Events log shows blocked requests to `/mcp` with action `block` and source `bfm` (Bot Fight Mode) or `managed_rule`.
**Fix — Option 1: Disable Bot Fight Mode (free plan and paid plan)**
In the Cloudflare dashboard for your zone: **Security → Bots → Bot Fight Mode → Off** (or Super Bot Fight Mode → Off).
This is the only option available on the **free plan**. It disables bot blocking for the entire zone — all probe bots, scrapers, and crawlers that Cloudflare would otherwise block will reach your server. Only use this if you have no alternative.
**Fix — Option 2: WAF skip rule for MCP paths (paid plan only)**
> WAF custom rules require a **paid Cloudflare plan** (Pro or above). This option is not available on the free plan.
Create a WAF skip rule that bypasses bot management only for the MCP and OAuth paths, leaving protection in place for the rest of the site:
1. Go to **Security → WAF → Custom rules** and click **Create rule**.
2. Enter the following expression (replace `trek.example.com` with your domain):
```
(http.host eq "trek.example.com") and (
http.request.uri.path eq "/mcp" or
http.request.uri.path starts_with "/oauth/" or
http.request.uri.path starts_with "/.well-known/"
)
```
This covers all paths that ChatGPT's servers hit during discovery, OAuth, and MCP calls:
| Path | Purpose |
|---|---|
| `/mcp` | MCP endpoint (GET, POST, DELETE) |
| `/oauth/authorize` | OAuth authorization handler |
| `/oauth/register` | Dynamic client registration |
| `/oauth/token` | Token issuance |
| `/oauth/userinfo` | User info (for domain claiming) |
| `/oauth/revoke` | Token revocation |
| `/.well-known/oauth-authorization-server` | RFC 8414 AS metadata |
| `/.well-known/oauth-protected-resource` | RFC 9728 flat resource metadata |
| `/.well-known/openid-configuration` | OIDC discovery |
3. Set the action to **Skip** and check **Bot Fight Mode** (and/or **Super Bot Fight Mode**) under the skip options.
4. Save and deploy.
This allows MCP and OAuth traffic through while keeping Cloudflare bot protection active for all other paths.