
Privacy-First Analytics: Setting Up Self-Hosted Plausible with Google Search Console

The analytics tracking landscape has shifted considerably since GDPR took effect in 2018. Even so, most organisations still run Google Analytics 4 on their websites, not because it is the best tool for the job, but because migration appears costly and the alternatives seem niche. We moved from Google Analytics to self-hosted Plausible CE in mid-2024, and the results are worth documenting in detail.

Why we left Google Analytics

Google Analytics 4 is a capable platform. The compliance overhead, however, is substantial. Running GA4 in a genuinely GDPR-compliant configuration requires a cookie consent banner, a consent management platform, data processing agreements, and careful regional routing to prevent IP addresses and user identifiers from leaving the EU. The consent banner itself introduces a systematic gap in your data: independent studies suggest between 30% and 50% of visitors decline consent entirely, leaving a blind spot in every metric you collect.

This creates an unworkable tradeoff. Either you accept data from only the consenting segment of your audience, or you run GA4 without consent and accept the legal exposure. Neither is a clean answer for an organisation that takes both data quality and compliance seriously.

Plausible CE is designed around a different premise. The tracking script is cookieless by default, collects no personally identifiable information, and requires no consent banner. This is not marketing language; it is a specific technical claim. Plausible sets no cookies, generates no persistent user identifier between sessions, and, when self-hosted, transmits no data to third-party servers. The ICO guidance on PECR (the UK's implementation of the ePrivacy Directive) and the Article 29 Working Party opinions on analytics both support the position that cookieless, non-fingerprinting analytics do not trigger the consent requirement under the ePrivacy Directive.

The architecture: ClickHouse as the analytics engine

Plausible CE ships with two databases: ClickHouse for event storage and PostgreSQL for configuration. ClickHouse is a column-oriented database built for analytical workloads; it was originally developed at Yandex to power the Yandex.Metrica web analytics service. For a site receiving tens of thousands of monthly page views, the performance advantage over a row-based database is imperceptible. But the architecture means Plausible can handle sites with hundreds of millions of events without schema changes or infrastructure redesign, which matters if you expect growth.

The Docker Compose setup is straightforward. Plausible provides an official configuration with all required services. Our deployment runs alongside Traefik on a Hetzner CX21 instance (2 vCPUs, 4 GB RAM) in Helsinki:

services:
  # PostgreSQL holds configuration data: accounts, sites and settings
  plausible_db:
    image: postgres:16-alpine
    restart: always
    volumes:
      - db-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}

  # ClickHouse holds the analytics events themselves
  plausible_events_db:
    image: clickhouse/clickhouse-server:24.3.3.102-alpine
    restart: always
    volumes:
      - event-data:/var/lib/clickhouse
      # Logging overrides copied from Plausible's repository (see below)
      - ./clickhouse/clickhouse-config.xml:/etc/clickhouse-server/config.d/logging.xml:ro
      - ./clickhouse/clickhouse-user-config.xml:/etc/clickhouse-server/users.d/logging.xml:ro
    ulimits:
      # ClickHouse keeps many files open; raise the open-file limit
      nofile:
        soft: 262144
        hard: 262144

  plausible:
    image: ghcr.io/plausible/community-edition:v2.1.4
    restart: always
    # depends_on only orders start-up; the sleep gives both databases time
    # to accept connections before the schema is created and migrated
    command: sh -c "sleep 10 && /entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh run"
    depends_on:
      - plausible_db
      - plausible_events_db
    environment:
      - BASE_URL=https://plausible.yourdomain.com
      # POSTGRES_PASSWORD and SECRET_KEY_BASE are interpolated from .env
      - SECRET_KEY_BASE=${SECRET_KEY_BASE}
      - DATABASE_URL=postgres://postgres:${POSTGRES_PASSWORD}@plausible_db:5432/plausible_db
      - CLICKHOUSE_DATABASE_URL=http://plausible_events_db:8123/plausible_events_db
      # Block public sign-ups on an internet-facing instance
      - DISABLE_REGISTRATION=true
    labels:
      # Traefik must share a Docker network with this container to route to it
      - "traefik.enable=true"
      - "traefik.http.routers.plausible.rule=Host(`plausible.yourdomain.com`)"
      - "traefik.http.routers.plausible.entrypoints=websecure"
      - "traefik.http.routers.plausible.tls.certresolver=letsencrypt"
      # Port the Plausible app listens on inside the container
      - "traefik.http.services.plausible.loadbalancer.server.port=8000"

volumes:
  db-data:
  event-data:

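The DISABLE_REGISTRATION=true flag is not optional. Without it, a public-facing instance remains open to registration by anyone who finds the URL, even after you have created your initial account. Generate SECRET_KEY_BASE with openssl rand -base64 48, which yields a 64-character value on a single line (openssl wraps base64 output at 64 characters, so -base64 64 would split the key across two lines), and keep it in a .env file next to docker-compose.yml that is excluded from version control.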
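If you prefer to script the .env file, the sketch below prints both values the Compose file interpolates. It assumes Python 3.9 or later. The 48-byte key length yields 64 base64 characters on one line, and the database password uses `secrets.token_urlsafe` because it is embedded in DATABASE_URL, where characters such as `/` or `@` would need percent-encoding. It writes to standard output rather than to a file, so nothing is overwritten by accident.

```python
"""Print the two secrets that docker-compose.yml reads from .env.

A sketch assuming Python 3.9+; variable names match the ${...}
references in the Compose file.
"""
import base64
import secrets


def render_env() -> str:
    # token_urlsafe() emits only A-Z, a-z, 0-9, "-" and "_", so the password
    # can sit inside DATABASE_URL without percent-encoding
    postgres_password = secrets.token_urlsafe(32)
    # 48 random bytes -> 64 base64 characters; b64encode() never inserts
    # line breaks, unlike openssl's default base64 output
    secret_key_base = base64.b64encode(secrets.token_bytes(48)).decode("ascii")
    return (
        f"POSTGRES_PASSWORD={postgres_password}\n"
        f"SECRET_KEY_BASE={secret_key_base}\n"
    )


if __name__ == "__main__":
    print(render_env(), end="")
```

Saved under a name of your choice (for example make_env.py), it would be run once as `python make_env.py > .env && chmod 600 .env`. Generate the values before the first start: the postgres image only applies POSTGRES_PASSWORD when it initialises an empty data volume.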

The two ClickHouse XML configuration files referenced in the volumes section control logging verbosity. Without them, ClickHouse logs at a level that fills disk on a small instance within days. Plausible’s repository includes these files; copy them into a clickhouse/ subdirectory alongside your docker-compose.yml before starting the stack.
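A missing file is easy to overlook: with the short volume syntax, Docker can create an empty directory at a missing host path instead of refusing to start. A small pre-flight check run from the directory containing docker-compose.yml catches this. The sketch below only confirms that each file exists and is well-formed XML; it does not inspect what the files configure, and the paths simply mirror the volume mounts above.

```python
"""Pre-flight check for the ClickHouse override files mounted by Compose.

A sketch: it verifies presence and XML well-formedness only.
"""
import xml.etree.ElementTree as ET
from pathlib import Path

# Relative paths taken from the volume mounts in docker-compose.yml
REQUIRED_FILES = (
    Path("clickhouse/clickhouse-config.xml"),
    Path("clickhouse/clickhouse-user-config.xml"),
)


def check_files(base: Path) -> list:
    """Return a list of problems; an empty list means both files look usable."""
    problems = []
    for rel in REQUIRED_FILES:
        path = base / rel
        if not path.is_file():
            # Also catches the case where Docker already created a directory here
            problems.append(f"missing: {path}")
            continue
        try:
            ET.parse(path)
        except ET.ParseError as exc:
            problems.append(f"invalid XML in {path}: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_files(Path.cwd())
    print("\n".join(issues) if issues else "ClickHouse config files look fine")
```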

Deploying the tracking script

Once the stack is running, adding tracking to any site is a single tag in the HTML <head>:

<script defer data-domain="yourdomain.com"
  src="https://plausible.yourdomain.com/js/script.js"></script>
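The script reports each page view by POSTing a small JSON payload to the instance's /api/event endpoint, which Plausible also documents as its Events API. Sending one event by hand is a quick way to confirm the stack accepts data before touching any site templates. The sketch below only builds the request; the instance URL and domain are the placeholders used throughout this article.

```python
"""Build a test pageview for a self-hosted Plausible instance.

A sketch using Plausible's Events API (POST /api/event). PLAUSIBLE_URL
and the example domain are placeholders, not real endpoints.
"""
import json
import urllib.request

PLAUSIBLE_URL = "https://plausible.yourdomain.com"


def build_pageview(domain: str, page_url: str, user_agent: str) -> urllib.request.Request:
    """Build, but do not send, a pageview request."""
    payload = {"name": "pageview", "url": page_url, "domain": domain}
    return urllib.request.Request(
        f"{PLAUSIBLE_URL}/api/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Plausible relies on the User-Agent when counting visitors and
            # filtering bots, so send a realistic browser string
            "User-Agent": user_agent,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` sends it; an accepted event should come back with HTTP status 202 and appear in the dashboard's realtime view.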

Plausible ships optional script extensions that bundle into a single request. We use the variant that includes file download tracking, hash-based routing support for single-page applications, outbound link tracking, custom page view properties, revenue tracking, and tagged events:

<script defer data-domain="yourdomain.com"
  src="https://plausible.yourdomain.com/js/script.file-downloads.hash.outbound-links.pageview-props.revenue.tagged-events.js"></script>
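The extension names are dot-separated segments between `script` and `.js` in the filename. When several sites use different combinations, generating the tag avoids hand-editing long filenames. A minimal sketch, assuming the extensions are kept in the order you list them, as in the URL above; it does not check that a given extension actually exists in Plausible.

```python
"""Generate a Plausible script tag for a chosen set of extensions.

A sketch: names are joined with dots in the order given; validation of
extension names is left to Plausible.
"""
from html import escape


def script_src(host: str, extensions: list) -> str:
    # "script" + extensions + "js", e.g. script.outbound-links.js
    filename = ".".join(["script", *extensions, "js"])
    return f"{host.rstrip('/')}/js/{filename}"


def script_tag(domain: str, host: str, extensions: list) -> str:
    # Escape both attribute values so stray quotes cannot break the tag
    src = escape(script_src(host, extensions), quote=True)
    return f'<script defer data-domain="{escape(domain, quote=True)}" src="{src}"></script>'
```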

The combined script weighs approximately 2.4 KB uncompressed; the Google Analytics 4 script is approximately 45 KB. This is not a marginal difference: at 45 / 2.4 ≈ 19, it is roughly 19 times less JavaScript to parse on every page load, which measurably affects Core Web Vitals scores on low-powered devices and slow connections.

Google Search Console integration

The strongest argument for keeping GA4 is its connection to Google Search Console, which allows you to correlate organic search query data with on-site behaviour. Plausible supports this integration natively, and it is the feature that makes the switch viable for content-driven sites.

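The setup requires a verified Search Console property and an OAuth authorisation in Plausible’s site settings. On a self-hosted instance, that authorisation works only after you create an OAuth client in Google Cloud, enable the Search Console API for it, and pass its credentials to the plausible container as GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET. Once connected, GSC data appears as a separate breakdown within the traffic sources section, showing search queries, impressions, click-through rates, and average position alongside your standard page view and session data.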

One configuration detail that consistently causes problems: the domain in Plausible must match the GSC property type exactly. A GSC property verified as https://www.yourdomain.com using the URL-prefix method will not connect to a Plausible site tracking yourdomain.com without the subdomain. Use the Domain property type in GSC where possible — it covers all subdomains and protocols and makes the Plausible connection straightforward.
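The matching rule can be stated precisely. The helper below is hypothetical and only restates the rule above; it is not Plausible's code. It assumes property strings in the forms the Search Console API uses: `sc-domain:yourdomain.com` for a Domain property and a full URL such as `https://www.yourdomain.com/` for a URL-prefix property.

```python
"""Check whether a Search Console property lines up with a Plausible domain.

A hypothetical helper illustrating the rule in the text, assuming the
"sc-domain:" and full-URL property formats used by Search Console.
"""
from urllib.parse import urlsplit


def property_covers(gsc_property: str, plausible_domain: str) -> bool:
    domain = plausible_domain.lower().rstrip(".")
    if gsc_property.startswith("sc-domain:"):
        root = gsc_property.removeprefix("sc-domain:").lower()
        # A Domain property covers the root domain and every subdomain
        return domain == root or domain.endswith("." + root)
    # A URL-prefix property is tied to one exact host (and one protocol)
    host = urlsplit(gsc_property).hostname or ""
    return host == domain
```

With this rule, `https://www.yourdomain.com/` does not match a Plausible site registered as `yourdomain.com`, while `sc-domain:yourdomain.com` matches both the apex and `www`.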

GSC data in Plausible carries the same 48-hour lag as the Search Console API itself. This is a Google constraint, not a Plausible limitation. For operational analytics review, this is rarely a practical obstacle.

GDPR compliance without cookie banners

The compliance position is worth stating with precision. Under Article 5(3) of the ePrivacy Directive, which applies alongside GDPR, consent is required when you store or access information on a user’s device, unless doing so is strictly necessary for a service the user has requested. Cookies and localStorage both trigger this requirement. Plausible stores nothing on the user’s device: no cookies, no localStorage entries, no IndexedDB writes.

Plausible also does not generate a persistent cross-session user identifier. The closest approximation to a user identifier is a daily hash derived from the website domain, the visitor’s IP address, the user agent string, and a rotating server-side salt. The salt changes every 24 hours and the old one is discarded, so the hash cannot be linked across days or reversed to recover the original IP address, and the raw IP address itself is never stored. This design keeps the data Plausible retains outside the definition of personal data in Article 4(1) GDPR.
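Plausible's data policy describes the identifier as a hash of a daily salt, the website domain, the IP address and the user agent. The sketch below illustrates that idea rather than reproducing Plausible's implementation: keyed BLAKE2b from Python's standard library stands in for whatever hash function Plausible actually uses, and the salt lives only in memory. It shows the two properties that matter for compliance: the same visitor maps to the same value within a day, and a new salt severs the link.

```python
"""Illustrative daily visitor hash in the spirit of Plausible's description.

Assumptions: keyed BLAKE2b replaces Plausible's own hash, and salts are
held in memory only. Nothing is read from or written to the visitor's device.
"""
import hashlib
import secrets
from datetime import date
from typing import Optional


class DailySalt:
    """Keeps one random salt for the current day and discards older ones."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt = b""

    def for_day(self, day: date) -> bytes:
        if day != self._day:
            # A new day replaces the salt; the previous value is not kept
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt


def visitor_hash(salt: bytes, domain: str, ip: str, user_agent: str) -> str:
    """Same salt and inputs give the same value; a new salt breaks the link."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding
    message = "\x00".join((domain, ip, user_agent)).encode("utf-8")
    return hashlib.blake2b(message, key=salt, digest_size=16).hexdigest()
```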

The practical result: no consent banner, no consent management platform, no consent rate optimisation, and no gap in your data from declined consent. The analytics data you see reflects your actual audience rather than the subset willing to accept tracking.

Our privacy policy now requires a single concise paragraph on analytics, accurately describing what Plausible collects and where the data resides. The equivalent GA4 section previously ran to several paragraphs and required specific references to Google’s Standard Contractual Clauses and data processing terms. The reduction in legal overhead is material for a small organisation.

Infrastructure overhead and performance comparison

Running both tools in parallel for 60 days before the GA4 cutover provided a direct comparison. Page view counts differed by less than 3% — consistent with Plausible’s own published accuracy figures comparing cookieless deployments against consented GA4.

The more instructive difference was in unique visitor counts. Plausible recorded approximately 12% more unique visitors than consented GA4 over the same period. This is expected: the consent-accepting subset of GA4 users skews toward more engaged visitors who are less likely to be using ad blockers or privacy-focused browsers. Plausible, which counts every visitor whose browser loads its script without any consent step, gives a more accurate picture of total reach. GA4 in consented mode gives a behavioural portrait of your most engaged users. Neither is wrong; they are answering slightly different questions.
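With GA4 as the baseline, "12% more" means (Plausible − GA4) / GA4 = 0.12. To reproduce this kind of comparison from your own parallel run, a minimal sketch is below. It assumes each export has been parsed into a mapping of ISO date to count and compares only dates present in both, so a gap in either export does not skew the result. Summing per-day totals is sound for page views; for unique visitors, compare each tool's own total for the whole period instead, because summing daily uniques counts a returning visitor once per day.

```python
"""Relative difference between two analytics exports over shared dates.

A sketch: inputs are assumed to be {ISO date: count} mappings, e.g. parsed
from each tool's CSV export of daily page views.
"""


def relative_difference(plausible: dict, ga4: dict) -> float:
    """Return (Plausible - GA4) / GA4 over the dates both exports cover."""
    shared_days = plausible.keys() & ga4.keys()
    if not shared_days:
        raise ValueError("the exports share no dates")
    p_total = sum(plausible[d] for d in shared_days)
    g_total = sum(ga4[d] for d in shared_days)
    if g_total == 0:
        raise ValueError("GA4 total is zero; the ratio is undefined")
    return (p_total - g_total) / g_total
```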

The infrastructure footprint is modest. ClickHouse consumes approximately 400 MB RAM at steady state. PostgreSQL adds around 80 MB. CPU usage is negligible during normal operation and briefly elevated only during bulk historical data imports. For organisations already running a Hetzner server with spare capacity, the marginal cost of adding Plausible to an existing stack is effectively zero.

Ultimately, the decision to switch is not primarily about infrastructure cost or feature parity. It is about what kind of data you are willing to base decisions on. Analytics that requires consent produces a systematically biased sample. Cookieless analytics produces a complete picture with less detail per visitor. For a marketing website where the primary question is “which pages and sources drive enquiries,” the complete picture is more useful than the detailed portrait of a self-selected subset.
