Why does attribution fail at companies your size?
You're running paid search, display, email, and organic—but you can't prove which channel earned the revenue. You buy a SaaS platform promising a model, spend $2–5k/month, and realize it can't answer a single question without a PhD in attribution theory. Meanwhile, your CFO wants to know why you spent $50k on LinkedIn and Google won't tell you.
The reason attribution fails is structural. Most platforms are black boxes. They either claim a single source of truth (usually last-click, which lies), or they charge so much per event that you have to downsample your data. You end up chasing proxy metrics instead of real revenue impact.
The fix: Build your model in-house, on your own data, in BigQuery. You own the logic, you can iterate in days, and the infrastructure costs less than a single paid tool license.
What does a working attribution setup require?
Three things: GA4 events flowing into BigQuery, revenue data (Shopify, Stripe, CRM, or internal database), and a SQL model that connects sessions to transactions. That's it.
First, enable Google Analytics 4 BigQuery export. You get daily tables of raw events—no sampling, no aggregation, just the source truth. Set it up in GA4 settings, link your GCP project, and wait 24 hours for the first load. Cost: $6 per GB of data storage. A typical D2C company with 100k monthly visitors stores roughly 100–200 GB/year. That's $600–1,200/year for storage.
Second, get your revenue data into BigQuery. If you use Shopify, Stripe, or Klaviyo, use a managed connector (Airbyte, Fivetran, or even Zapier to Cloud Storage). If you use a CRM, export weekly CSVs or write a light ETL. Cost depends on your tool: Airbyte's free tier covers most small businesses. Stripe webhooks to a Cloud Function are free.
Third, join the tables. A GA4 session becomes a user_id → session_id → revenue record. You flatten the events, group by session, annotate the touchpoints in order, and assign credit. A basic first-touch model takes 50 lines of SQL. A time-decay model takes 80.
How long does it take to ship your first model?
Three days to a week for a working attribution table, another week to build dashboards you'll actually use.
Day 1: Enable BigQuery export in GA4. Link your revenue system. Write a query to join events to transactions by user_id and date. Sanity-check the join rate (you should see 40–80% of revenue attributed, depending on tracking setup). This query takes 2 hours if your revenue source is clean.
Day 2–3: Build the attribution logic. Start with first-touch, which is one GROUP BY with a WINDOW FUNCTION to rank events by timestamp. Test it against a small date range (e.g., one week). Expected time: 4–8 hours. You'll hit bugs: missing user IDs, events without session context, revenue records that don't match GA4 timestamps. Fix them incrementally. Every fix teaches you about your data.
Day 4–7: Materialize the table as a scheduled query. Run it nightly. Layer on Looker Studio (free) or Metabase (self-hosted) dashboards. Build three views: revenue by channel, revenue by campaign, and cohort-level LTV by source. Your finance team can self-serve in 5 minutes.
Do not overthink the model on week one. First-touch or last-touch gets you 80% of the insight for 5% of the work. Iterate to time-decay or Shapley once you trust the data layer.
What's the actual cost breakdown?
BigQuery storage: $6/GB/month. Expect 10–20 GB/month for typical D2C. Cost: $60–120/month.
BigQuery queries: $6.25 per 1 TB of data scanned. Your attribution query scans maybe 50–100 GB/month. Cost: $0.30–0.60/month. Negligible.
Data ingestion (Airbyte free tier, Zapier, or webhooks): $0–300/month depending on your revenue system and row volume. Most startups stay free.
BI tool: Looker Studio is free. Metabase is free to self-host. Tableau/Power BI are $70+/user/month if you want them, but unnecessary.
Total for a 100k monthly visitor D2C brand with Shopify or Stripe: $200–300/month. Add $100–200 if you need a paid data connector or run higher query volume. You'll never breach $500/month unless you scale past a million monthly visitors.
What does a real attribution query look like?
Here's a first-touch model in readable SQL:
WITH events_clean AS (
SELECT
user_pseudo_id,
session_id,
event_timestamp,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium
FROM `project.analytics.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 30) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND event_name IN ('purchase', 'add_to_cart', 'view_item')
),
first_touch AS (
SELECT
user_pseudo_id,
MIN(event_timestamp) AS first_touch_time,
FIRST_VALUE(source) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS first_source,
FIRST_VALUE(medium) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS first_medium
FROM events_clean
GROUP BY user_pseudo_id
),
revenue_joined AS (
SELECT
ft.first_source,
ft.first_medium,
SUM(o.revenue) AS attributed_revenue,
COUNT(DISTINCT o.order_id) AS order_count
FROM first_touch ft
LEFT JOIN `project.ecommerce.orders` o
ON ft.user_pseudo_id = o.user_id
AND DATE(o.created_at) >= DATE(ft.first_touch_time)
AND DATE(o.created_at) <= DATE(ft.first_touch_time + INTERVAL 30 DAY)
GROUP BY first_source, first_medium
ORDER BY attributed_revenue DESC
)
SELECT * FROM revenue_joined;
This query finds the first touchpoint for every user, then matches it to revenue within 30 days. It runs in under 30 seconds on typical volumes and costs about $0.06 per run. You run it once daily as a scheduled query, write results to a table, and query that table in your dashboard for instant load times.
What's the real gotcha?
Your GA4 data isn't tracking user IDs by default. You have user_pseudo_id (a client ID, resets per domain), but revenue systems need real user identifiers. If you're not passing a hashed user_id` to GA4 on every event, your join rate will be 30–50% at best. Fix this first.
Second, your revenue system's timestamps might not align with GA4. Stripe fires webhooks in UTC. Your Shopify order timestamps are in your store's timezone. Your CRM backfills historical data hours later. Build a reconciliation query that compares daily revenue totals between systems. Find the offset. Document it. Test quarterly.
Start without worrying about cross-device users or view-through attribution. Those are refinements for later. You need a clean, auditable data layer first. Then iterate.
Once you own your attribution model in BigQuery, you'll ship answers instead of waiting for tools. Your finance team will know exactly which channels drive margin, not just traffic. You'll cut poor performers in weeks instead of debating them for months. That's not a small thing—it's the difference between marketing spending and marketing strategy.


