Adobe Experience Platform – Query Service & Data Distiller

February 6, 2026 February 6, 2026

mosesdinakaran@gmail.com

1. Overview

1. What is Query Service?

Query Service is Adobe Experience Platform’s SQL engine that lets you query and transform data stored in the AEP Data Lake.

Uses ANSI SQL
Works on datasets (Parquet files + XDM schemas)
Designed for analytics and transformation, not transactions

Important mental model: AEP is a data lake, not a traditional database.

2. Key Doubt #1

“If data already exists, why do we need CREATE TABLE?”

Answer:

A simple SELECT:

Scans raw data files
Returns temporary results
Nothing is stored or reused

SELECT * FROM web_events;

A CTAS (CREATE TABLE AS SELECT):

Runs the query
Materializes the result
Saves it as a new dataset in AEP

CREATE TABLE derived_events AS
SELECT * FROM web_events;

This is required because AEP does not store query results automatically.

3. What is CTAS?

CTAS (Create Table As Select) is the mechanism used to create Derived Datasets.

Output = new dataset
Stored in the Data Lake
Reusable, faster, cheaper for repeated queries

Derived Dataset = Result of a CTAS query

4. What is a Derived Dataset?

A Derived Dataset is:

A dataset created from other datasets
Cleaned, flattened, aggregated
Used for analytics, BI, ML, or Profile ingestion

Why it exists:

Raw XDM data is large, nested, and expensive to query repeatedly
Derived datasets optimize performance and cost

5. What Query Service Can Do (Alone)

With Query Service only, you can:

Explore data (SELECT)
Clean, shape, and manipulate data
Create one-time derived datasets using CTAS

✔️ Technically capable
❌ Not operational or automated

6. What is Data Distiller?

Data Distiller is a licensed add-on that builds on Query Service to support production-grade data preparation.

Data Distiller does not add new SQL.
It adds automation, scheduling, and incremental processing.

7. Key Doubt #2

“Can’t we do all this with Query Service?”

Correct answer:

Yes, functionally
No, operationally

Capability	Query Service	Data Distiller
SQL transformations	✅	✅
CTAS	✅ (manual)	✅
Scheduled execution	❌	✅
Incremental processing	❌	✅
Production pipelines	❌	✅
Reliable Profile feeding	❌	✅

8. Why Data Distiller Exists

Adobe introduced Data Distiller because customers needed:

Repeated transformations
Cost-efficient processing
Reliable datasets for Profile & Analytics
Automated pipelines

Query Service alone is interactive
Data Distiller is operational

9. Relationship Summary

Query Service = SQL engine
CTAS = Mechanism
Derived Dataset = Output
Data Distiller = Automation + Scheduling + Incremental logic

10. One-Line Takeaways (Most Important)

Query Service answers questions
CTAS saves answers
Derived datasets avoid repeated heavy queries
Data Distiller productionizes Query Service
AEP is a data lake, not a traditional database

11. When to Use What

Use Query Service only when:

Exploring data
Debugging ingestion
Running one-off analysis

Use Data Distiller when:

Data feeds Profile or analytics continuously
Queries must run repeatedly
Cost, scale, and reliability matter

Scenario	Use Query Service
SQL-based analysis	✅
Large-scale transformations	✅
Real-time personalization	❌
Marketing dashboards	⚠️ (Prefer CJA)

Data Sources
   ↓
AEP Ingestion
   ↓
AEP Data Lake (Parquet + XDM)
   ↓
Query Service (SQL Engine)
   ↓
Results / Derived Datasets / BI Tools

How it relates to Query Service

Query Service	Data Distiller
Core SQL query engine	Add-on on top of Query Service
Ad-hoc analysis	Production transformations
Manual CTAS	Scheduled & incremental CTAS
One-time queries	Continuous data pipelines

👉 Data Distiller uses Query Service under the hood

Categories: AEP and Tutorials