Building a Data Agency

Enrollment is closed

$999 or 3 monthly payments of $333

How you can offer "data pipelines as a service" on Google BigQuery, following our end-to-end process

What You'll Learn

Down with chaos!

At CIFL, we've built data pipelines (mostly in BigQuery) for a wide array of businesses.

What we've found is that coding is the easy part.

The hard part, as usual, is getting your team (or your client's teams) on the same page throughout the development process - so that what gets built actually gets used when it's done.

This course focuses on the processes that can ensure a good result - whether you're building a data pipeline internally for your company, or offering data pipelining as a service to your clients.

We'll dive into:

How to decide if you even need a data pipeline at all
How to roadmap a pipeline, to get buy-in + validation upfront
How to break a pipeline roadmap into bite-sized development sprints to deliver small wins
What roles are required to actually build a reporting pipeline, and how to hire for them
How to validate data to build trust in the outcome

This course is a labor of love - these are the lessons learned over a few years deep in the weeds of building data pipelines for clients.

Hope you enjoy it as much as we enjoy sharing it.

Many thanks to Supermetrics and Stitch for providing demo accounts for use in building the course.

Mahalo,
David
Commissioner, Coding is for Losers

Contents

Getting Started + FAQs

All the background info you'll need to dive into building data pipelines.

***ALL THE TEMPLATE LINKS***

**GETTING HELP**

The business of data, end-to-end

Why'd we republish this course?

Who is this for, and what will you learn?

What is a data pipeline?

[THROWBACK ALERT] WTF is ADP?

Additional (FREE) CIFL Courses

Recommended technical trainings to get your team going.

Getting Started with BigQuery SQL

Data Studio the Lazy Way

0.1 The Sales Flow

How to repeatably welcome new prospects as clients.

0.1.1 Finding your niche

0.1.2 Building an inbound content strategy

0.1.3 Why we arrived at the sprint pricing model

0.1.4 Why we publish pricing

0.1.5 The initial sales call

0.1.6 The roadmapping process

0.1.7 Deal closing + contracts

0.2 Staffing & Resourcing

How we recommend hiring + planning staffing to execute sprints effectively.

0.2.1 The sprint flow + roles

0.2.2 Hiring reporting + data modeling analysts

COMING SOON - Making a staffing plan & budget

1.1 Planning Development Sprints

An ounce of planning is worth a pound of building.

Before you write any code, you'll want to translate your Roadmap into action items across data feeds, data models and visualizations.

Only after those are planned do we recommend putting pen to paper.

1.1.0 Meet the Tracking Plan

1.1.1 Breaking the roadmap into a Tracking Plan

1.1.2 Mapping our raw data source requirements

1.1.3 What you'll do with data source schemas

1.1.4 Mapping out data source schemas

1.1.5 Populating key starter questions for reporting

1.1.6 Scoping out each Site or Client

2.1 Data Feeds - Getting Started

Getting data flowing into your BigQuery database, using Stitch, Supermetrics, or custom API connections in the Tracking Plan.

2.1.1 Intro to data feeds

2.1.2 BigQuery initial setup

2.1.3 Setting up your BigQuery tables

2.1.4 Pushing data from Sheets to BigQuery

2.1.5 Supermetrics quickstart for beginners

2.1.6 Pulling data from unsupported APIs into the Tracking Plan

2.2 Data Feeds - Stitch

Stitch is the preferred ETL (extract, transform, load) platform used by CIFL to pull data from APIs like Google Analytics, FB Ads and more into BigQuery.

2.2.1 Stitch initial setup

2.2.2 Pulling GA data using Stitch

2.2.3 Pulling Adwords data using Stitch

2.2.4 Pulling FB Ads data using Stitch

3.1 Intro to dbt

Getting up and running with dbt, the open-source SQL modeling framework we highly recommend over building saved queries directly in BigQuery.

3.1.1 Intro to dbt for SQL data modeling

3.1.2 Planning your Data Models

3.1.3 Creating your dbt project

3.1.4 Connect your BigQuery database to dbt

3.1.5 Managing your dbt project via GitHub

3.2 Data modeling with dbt

This will likely be the most challenging section of the course, but the payoff in terms of your data analysis powers is worth the squeeze, I promise.

3.2.1 Writing your 'processing' level SQL queries

3.2.2 Writing your 'join' level SQL models

3.2.3 Sidenote: on debugging dbt models

3.2.4 Pro tip: standardizing URL structure

3.2.5 Pro tip: using dbt macros

3.2.6 Writing your 'admin' level SQL models

3.2.7 Writing your 'math' and 'visualization' level SQL models

COMING SOON - Data documentation in dbt

COMING SOON - Data + schema testing in dbt

3.3 Productionalizing your dbt project

When moving from building your pipeline to *using* your pipeline, you'll want to put some production guardrails in place to make sure things run smoothly.

In this section, we'll learn about using GitHub to store your SQL models, running your data models on a schedule using dbt Cloud (formerly Sinter), and generally keeping your pipeline running smoothly.

3.3.1 Intro to productionalizing your pipeline

3.3.2 Using dbt Cloud to run your SQL models on a schedule

3.3.3 Scheduling your data pipeline orchestrations

3.3.4 Testing changes to your data pipeline

3.3.5 QCing data using Supermetrics as a check

4.1 Visualizations in Data Studio

Getting your reporting workflow up and running in Google Data Studio.

You can port this methodology into any reporting tool - the fundamentals of designing, building and reviewing your visualizations will remain the same.

4.1.1 The "PDA" reporting design framework

4.1.2 Designing reports in the Tracking Plan

4.1.3 Executing the reporting build

4.1.4 Reviewing reporting

4.1.5 Pulling data from BigQuery into Sheets

5.1 Sprint wrapup + review

Transitioning to the next sprint, or into maintenance mode.

5.1.1 Conditions for closing out a sprint

5.1.2 Wiring up reporting with live data

5.1.3 Sharing draft models + visualizations with clients

5.1.4 Transitioning to support mode

Congrats!

Next steps on making sure your pipeline is built and fulfilling its duty in life.

Wapow! You made it.

Interested in working with CIFL?

Mastering Google Sheets, Data Studio and BigQuery

Helping you wrangle your data + automate your work, without (hardly ever) leaving the Google stack.

FAQ

How much do the tools needed to build a pipeline cost?

The tools required to build data pipelines are:

Stitch ($100 / month for base plan) or Supermetrics ($89 / month) for pulling data from APIs
Google Sheets + Apps Script (Free)
Google BigQuery (or similar database) for data warehousing (free 12-month trial + $300 credit, cost varies by usage after that)
dbt to model data using SQL (Free + open-source)
dbt Cloud to run dbt models on a schedule (free for one seat, $50/seat after that)

So the bottom line is - the cost of your pipeline depends on the size of your pipeline. If you're pulling + storing a small amount of data, it may be completely free.

If you're pulling a large amount of data, it will be more expensive. The raw tool cost for most of our data pipeline clients is around $400-500 per month.
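
To make that arithmetic concrete, here is a minimal sketch of a monthly cost estimate. It uses only the prices listed in the FAQ above, and it assumes the dbt Cloud seat price is per month. The function name `monthly_tool_cost` and the BigQuery usage figure in the example are hypothetical, since BigQuery cost varies by usage and you would plug in your own number.

```python
# Monthly prices as quoted in the FAQ above (USD).
STITCH_BASE = 100               # Stitch base plan
SUPERMETRICS = 89               # Supermetrics
DBT_CLOUD_PER_EXTRA_SEAT = 50   # first dbt Cloud seat is free; assumed monthly


def monthly_tool_cost(use_stitch, use_supermetrics, dbt_cloud_seats, bigquery_usage):
    """Estimate the monthly tool cost of a pipeline in USD.

    bigquery_usage is a figure you supply yourself (hypothetical here),
    because BigQuery cost depends on how much data you store and query.
    """
    if dbt_cloud_seats < 0 or bigquery_usage < 0:
        raise ValueError("seats and usage must be non-negative")

    cost = 0
    if use_stitch:
        cost += STITCH_BASE
    if use_supermetrics:
        cost += SUPERMETRICS

    # Only seats beyond the first free one are billed.
    extra_seats = max(dbt_cloud_seats - 1, 0)
    cost += extra_seats * DBT_CLOUD_PER_EXTRA_SEAT

    cost += bigquery_usage
    return cost


# A small pipeline: Sheets + Apps Script only, one dbt Cloud seat, trial credit.
small = monthly_tool_cost(False, False, 1, 0)

# A larger pipeline: both connectors, three seats, an illustrative BigQuery bill.
large = monthly_tool_cost(True, True, 3, 150)
```

With these illustrative inputs the small pipeline comes out free and the larger one lands inside the $400-500 range mentioned above, but the BigQuery figure is a placeholder, not a quoted price.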