Adobe Experience Platform – Query Service & Data Distiller

1. Overview

1. What is Query Service?

Query Service is Adobe Experience Platform’s SQL engine that lets you query and transform data stored in the AEP Data Lake.

  • Uses ANSI SQL
  • Works on datasets (Parquet files + XDM schemas)
  • Designed for analytics and transformation, not transactions

Important mental model: AEP is a data lake, not a traditional database.


2. Key Doubt #1

“If data already exists, why do we need CREATE TABLE?”

Answer:

A simple SELECT:

  • Scans raw data files
  • Returns temporary results
  • Nothing is stored or reused
SELECT * FROM web_events;

A CTAS (CREATE TABLE AS SELECT):

  • Runs the query
  • Materializes the result
  • Saves it as a new dataset in AEP
CREATE TABLE derived_events AS
SELECT * FROM web_events;

This is required because AEP does not store query results automatically.


3. What is CTAS?

CTAS (Create Table As Select) is the mechanism used to create Derived Datasets.

  • Output = new dataset
  • Stored in the Data Lake
  • Reusable, faster, cheaper for repeated queries

Derived Dataset = Result of a CTAS query


4. What is a Derived Dataset?

A Derived Dataset is:

  • A dataset created from other datasets
  • Cleaned, flattened, aggregated
  • Used for analytics, BI, ML, or Profile ingestion

Why it exists:

  • Raw XDM data is large, nested, and expensive to query repeatedly
  • Derived datasets optimize performance and cost

5. What Query Service Can Do (Alone)

With Query Service only, you can:

  • Explore data (SELECT)
  • Clean, shape, and manipulate data
  • Create one-time derived datasets using CTAS

✔️ Technically capable
❌ Not operational or automated


6. What is Data Distiller?

Data Distiller is a licensed add-on that builds on Query Service to support production-grade data preparation.

Data Distiller does not add new SQL.
It adds automation, scheduling, and incremental processing.


7. Key Doubt #2

“Can’t we do all this with Query Service?”

Correct answer:

  • Yes, functionally
  • No, operationally
CapabilityQuery ServiceData Distiller
SQL transformations
CTAS✅ (manual)
Scheduled execution
Incremental processing
Production pipelines
Reliable Profile feeding

8. Why Data Distiller Exists

Adobe introduced Data Distiller because customers needed:

  • Repeated transformations
  • Cost-efficient processing
  • Reliable datasets for Profile & Analytics
  • Automated pipelines

Query Service alone is interactive
Data Distiller is operational


9. Relationship Summary

Query Service = SQL engine
CTAS = Mechanism
Derived Dataset = Output
Data Distiller = Automation + Scheduling + Incremental logic


10. One-Line Takeaways (Most Important)

  • Query Service answers questions
  • CTAS saves answers
  • Derived datasets avoid repeated heavy queries
  • Data Distiller productionizes Query Service
  • AEP is a data lake, not a traditional database

11. When to Use What

Use Query Service only when:

  • Exploring data
  • Debugging ingestion
  • Running one-off analysis

Use Data Distiller when:

  • Data feeds Profile or analytics continuously
  • Queries must run repeatedly
  • Cost, scale, and reliability matter
ScenarioUse Query Service
SQL-based analysis
Large-scale transformations
Real-time personalization
Marketing dashboards⚠️ (Prefer CJA)

Data Sources
   ↓
AEP Ingestion
   ↓
AEP Data Lake (Parquet + XDM)
   ↓
Query Service (SQL Engine)
   ↓
Results / Derived Datasets / BI Tools

How it relates to Query Service

Query ServiceData Distiller
Core SQL query engineAdd-on on top of Query Service
Ad-hoc analysisProduction transformations
Manual CTASScheduled & incremental CTAS
One-time queriesContinuous data pipelines

👉 Data Distiller uses Query Service under the hood

Leave a Reply

Your email address will not be published. Required fields are marked *