Adobe Experience Platform – Query Service & Data Distiller
1. Overview

1. What is Query Service?
Query Service is Adobe Experience Platform’s SQL engine that lets you query and transform data stored in the AEP Data Lake.
- Uses ANSI SQL
- Works on datasets (Parquet files + XDM schemas)
- Designed for analytics and transformation, not transactions
Important mental model: AEP is a data lake, not a traditional database.
2. Key Doubt #1
“If data already exists, why do we need CREATE TABLE?”
Answer:
A simple SELECT:
- Scans raw data files
- Returns temporary results
- Nothing is stored or reused
SELECT * FROM web_events;
A CTAS (CREATE TABLE AS SELECT):
- Runs the query
- Materializes the result
- Saves it as a new dataset in AEP
CREATE TABLE derived_events AS
SELECT * FROM web_events;
This is required because AEP does not store query results automatically.
3. What is CTAS?
CTAS (Create Table As Select) is the mechanism used to create Derived Datasets.
- Output = new dataset
- Stored in the Data Lake
- Reusable, faster, cheaper for repeated queries
Derived Dataset = Result of a CTAS query
4. What is a Derived Dataset?
A Derived Dataset is:
- A dataset created from other datasets
- Cleaned, flattened, aggregated
- Used for analytics, BI, ML, or Profile ingestion
Why it exists:
- Raw XDM data is large, nested, and expensive to query repeatedly
- Derived datasets optimize performance and cost
5. What Query Service Can Do (Alone)
With Query Service only, you can:
- Explore data (
SELECT) - Clean, shape, and manipulate data
- Create one-time derived datasets using CTAS
✔️ Technically capable
❌ Not operational or automated
6. What is Data Distiller?
Data Distiller is a licensed add-on that builds on Query Service to support production-grade data preparation.
Data Distiller does not add new SQL.
It adds automation, scheduling, and incremental processing.
7. Key Doubt #2
“Can’t we do all this with Query Service?”
Correct answer:
- Yes, functionally
- No, operationally
| Capability | Query Service | Data Distiller |
|---|---|---|
| SQL transformations | ✅ | ✅ |
| CTAS | ✅ (manual) | ✅ |
| Scheduled execution | ❌ | ✅ |
| Incremental processing | ❌ | ✅ |
| Production pipelines | ❌ | ✅ |
| Reliable Profile feeding | ❌ | ✅ |
8. Why Data Distiller Exists
Adobe introduced Data Distiller because customers needed:
- Repeated transformations
- Cost-efficient processing
- Reliable datasets for Profile & Analytics
- Automated pipelines
Query Service alone is interactive
Data Distiller is operational
9. Relationship Summary
Query Service = SQL engine
CTAS = Mechanism
Derived Dataset = Output
Data Distiller = Automation + Scheduling + Incremental logic
10. One-Line Takeaways (Most Important)
- Query Service answers questions
- CTAS saves answers
- Derived datasets avoid repeated heavy queries
- Data Distiller productionizes Query Service
- AEP is a data lake, not a traditional database
11. When to Use What
Use Query Service only when:
- Exploring data
- Debugging ingestion
- Running one-off analysis
Use Data Distiller when:
- Data feeds Profile or analytics continuously
- Queries must run repeatedly
- Cost, scale, and reliability matter
| Scenario | Use Query Service |
|---|---|
| SQL-based analysis | ✅ |
| Large-scale transformations | ✅ |
| Real-time personalization | ❌ |
| Marketing dashboards | ⚠️ (Prefer CJA) |
Data Sources
↓
AEP Ingestion
↓
AEP Data Lake (Parquet + XDM)
↓
Query Service (SQL Engine)
↓
Results / Derived Datasets / BI Tools
How it relates to Query Service
| Query Service | Data Distiller |
|---|---|
| Core SQL query engine | Add-on on top of Query Service |
| Ad-hoc analysis | Production transformations |
| Manual CTAS | Scheduled & incremental CTAS |
| One-time queries | Continuous data pipelines |
👉 Data Distiller uses Query Service under the hood

