Table of Contents
- Introduction
- Client Overview
- Technical Challenge
- Our Technical Solution
  - 1. Kafka-Based Producers on ECS for Discord Bot Events
  - 2. YouTube Data Collection Using Apache NiFi
  - 3. Twitch Stream Monitoring & Enrichment with NiFi
- Implementation Details
  - 1. Optimizing Kafka-Based Producers on ECS
  - 2. YouTube Data Ingestion via Apache NiFi
  - 3. Twitch Data Ingestion via Apache NiFi
- Results
- Conclusion

Introduction
A growing gaming and content creator platform continues to expand its data infrastructure to power real-time experiences across services like Discord, YouTube, Twitch, and Spotify. To support these diverse integrations, a scalable, event-driven architecture was designed to ensure data is ingested, enriched, and stored efficiently using technologies like Kafka, ECS, Redshift, TimescaleDB, and Apache NiFi.
Client Overview
The client is a leader in the gaming and content creator ecosystem, collaborating with top influencers and communities. Their primary focus is on building tools and data infrastructure to support creator engagement, live content tracking, and platform analytics.
Technical Challenge
The key challenge was building a robust, real-time data pipeline that could:
- Handle high-volume streaming data from Discord, YouTube, and Twitch.
- Enrich data before storing it in analytical databases.
- Feed two different storage systems (Redshift and TimescaleDB) simultaneously.
- Scale seamlessly across multiple ECS services and NiFi workflows.
Our Technical Solution
We implemented a modular solution, structured around three core components:
1. Kafka-Based Producers on ECS for Discord Bot Events
Tech Stack: Java, Apache Kafka, AWS ECS, Redshift, TimescaleDB
- Modular Producers: Developed and deployed three separate Java-based producers on ECS, each tailored to a specific data domain:
  - Game Events Producer
  - Spotify Events Producer
  - Miscellaneous Events Producer (e.g., messages, reactions, voice states)
- Kafka Integration: Each producer sends events to dedicated Kafka topics corresponding to its data type, ensuring modular and scalable ingestion.
- Data Processing and Insertion:
  - For Miscellaneous events, Kafka Sink processors directly insert the data into TimescaleDB and Redshift (dual feeding).
  - For Game and Spotify events, data is first routed through enricher services that apply domain-specific transformations before inserting into both databases.
- Multi-Bot Support: The architecture is designed to support multiple Discord bots. By reusing the same producer codebase with a different bot token, multiple bot instances can be deployed and configured to send events to the same Kafka topics. This allows all bot-generated data to flow into common Redshift and TimescaleDB tables, maintaining a unified and scalable event tracking system.
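The routing described above can be sketched roughly as follows. This is a minimal illustration, not the client's code: the `EventRouter` class, the topic names, and the event-type labels are all hypothetical. It only shows the idea that each event type maps to one domain, and that the domain decides both the Kafka topic and whether the event passes through an enricher before reaching the databases.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: maps a Discord event type to a Kafka topic and a processing path. */
final class EventRouter {

    /** The three data domains named in the case study. Topic names are assumptions. */
    enum Domain {
        GAME("discord.game-events", true),
        SPOTIFY("discord.spotify-events", true),
        MISC("discord.misc-events", false);

        final String topic;
        final boolean needsEnrichment; // Game and Spotify go through an enricher first

        Domain(String topic, boolean needsEnrichment) {
            this.topic = topic;
            this.needsEnrichment = needsEnrichment;
        }
    }

    // Assumed event-type labels; the real bot payloads may use different names.
    private static final Set<String> GAME_TYPES =
            new HashSet<>(Arrays.asList("GAME_START", "GAME_STOP"));
    private static final Set<String> SPOTIFY_TYPES =
            new HashSet<>(Arrays.asList("SPOTIFY_TRACK"));

    /** Returns the domain for an event type; anything unrecognized is treated as Miscellaneous. */
    static Domain domainFor(String eventType) {
        if (GAME_TYPES.contains(eventType)) return Domain.GAME;
        if (SPOTIFY_TYPES.contains(eventType)) return Domain.SPOTIFY;
        return Domain.MISC; // messages, reactions, voice states, and so on
    }

    public static void main(String[] args) {
        for (String type : Arrays.asList("GAME_START", "SPOTIFY_TRACK", "MESSAGE_CREATE")) {
            Domain d = domainFor(type);
            String path = d.needsEnrichment ? "enricher, then sinks" : "direct sink";
            System.out.println(type + " -> " + d.topic + " (" + path + ")");
        }
    }
}
```

Because the routing depends only on the event type and not on which bot produced it, a second bot instance started with a different token would publish to the same topics, which is the property the multi-bot design relies on.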
2. YouTube Data Collection Using Apache NiFi
Tech Stack: Apache NiFi, YouTube Data API v3, Redshift
- Deployed Apache NiFi to orchestrate the daily ingestion of YouTube data.
- Built pipelines to fetch:
  - Channel-level metadata (e.g., subscriber count, total videos)
  - Video-level metrics (e.g., views, likes, comments)
- The service runs daily, fetching:
  - Updated data for existing YouTube channels and videos
  - Newly published videos from those channels
- All collected data is stored in Redshift, enabling centralized access for analysis and reporting.
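As an illustration of the daily video-metrics refresh, the sketch below batches already-tracked video IDs into `videos.list` request URLs for the YouTube Data API v3. The `YouTubeVideoBatcher` class is hypothetical, API-key authentication is an assumption, and the cap of 50 IDs per call is the limit we assume for the `id` parameter. In a NiFi flow, an HTTP processor such as InvokeHTTP would issue requests like these; the Java here only shows the batching logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns a list of tracked video IDs into batched videos.list URLs. */
final class YouTubeVideoBatcher {

    // Assumed cap on the comma-separated id list per videos.list call.
    static final int MAX_IDS_PER_CALL = 50;
    static final String VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos";

    /** Builds one URL per batch, requesting only the statistics part (views, likes, comments). */
    static List<String> buildRequestUrls(List<String> videoIds, String apiKey) {
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < videoIds.size(); start += MAX_IDS_PER_CALL) {
            int end = Math.min(start + MAX_IDS_PER_CALL, videoIds.size());
            String ids = String.join(",", videoIds.subList(start, end));
            urls.add(VIDEOS_ENDPOINT + "?part=statistics&id=" + ids + "&key=" + apiKey);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            ids.add("video" + i); // placeholder IDs, not real videos
        }
        // "API_KEY" is a placeholder; a real key would come from configuration.
        for (String url : buildRequestUrls(ids, "API_KEY")) {
            System.out.println(url.length() + " chars: " + url.substring(0, 70) + "...");
        }
    }
}
```

Batching this way keeps the number of API calls proportional to the number of tracked videos divided by the batch size, which matters when the same daily run also has to discover newly published videos.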
3. Twitch Stream Monitoring & Enrichment with NiFi
Tech Stack: Apache NiFi, Twitch API, TimescaleDB
- Built a NiFi processor pipeline that:
  - Pulls live stream data every 10 minutes.
  - Inserts raw data into TimescaleDB.
- A second enrichment pipeline:
  - Updates the stream records with associated channel and game data.
- Additional pipelines perform derived metric calculations and write to 8–10 summary tables for analytics and reporting.
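The two-stage pattern above (insert raw rows first, enrich them afterwards) can be illustrated with an in-memory stand-in. This is only a sketch: the real pipeline works against TimescaleDB from NiFi, and the `StreamEnricher` and `StreamRecord` names, along with the lookup maps, are hypothetical. The field names loosely follow the `user_id`, `game_id`, and `viewer_count` fields returned by the Twitch streams endpoint.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: fills in channel and game details on a raw stream record. */
final class StreamEnricher {

    /** A raw row as first inserted; the name fields stay null until enrichment runs. */
    static final class StreamRecord {
        final String streamId;
        final String userId;
        final String gameId;
        final int viewerCount;
        String channelName; // set by enrich()
        String gameName;    // set by enrich()

        StreamRecord(String streamId, String userId, String gameId, int viewerCount) {
            this.streamId = streamId;
            this.userId = userId;
            this.gameId = gameId;
            this.viewerCount = viewerCount;
        }
    }

    /**
     * Looks up channel and game names by ID and writes them onto the record.
     * Returns true only when both lookups succeed, so callers can retry incomplete rows later.
     */
    static boolean enrich(StreamRecord record,
                          Map<String, String> channelNamesByUserId,
                          Map<String, String> gameNamesByGameId) {
        record.channelName = channelNamesByUserId.get(record.userId);
        record.gameName = gameNamesByGameId.get(record.gameId);
        return record.channelName != null && record.gameName != null;
    }

    public static void main(String[] args) {
        Map<String, String> channels = new HashMap<>();
        channels.put("u1", "example_channel");
        Map<String, String> games = new HashMap<>();
        games.put("g1", "Example Game");

        StreamRecord record = new StreamRecord("s1", "u1", "g1", 1200);
        boolean complete = enrich(record, channels, games);
        System.out.println(record.streamId + " enriched=" + complete
                + " channel=" + record.channelName + " game=" + record.gameName);
    }
}
```

Keeping the raw insert separate from enrichment means a slow or failed lookup does not block the 10-minute fetch; incomplete rows can simply be picked up on a later pass.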
Implementation Details
1. Optimizing Kafka-Based Producers on ECS
- Modular Design: Separated producers for Game, Spotify, and Miscellaneous data ensured clearer routing, scaling, and debugging across ECS services.
- Dual DB Feeding: Kafka Sink processors insert events from certain topics (e.g., messages, reactions) directly into both TimescaleDB and Redshift.
- Enrichment Layer: For complex datasets like Game and Spotify, enrichment services were introduced before database insertion, ensuring the data is cleaned and normalized.
- Connection Scaling: Producer throughput was optimized by tuning Kafka batch sizes and compression settings and by managing DB connections to handle high-frequency event ingestion.
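To make the producer tuning concrete, the sketch below builds a configuration using the standard Kafka producer settings `batch.size`, `linger.ms`, and `compression.type`. The specific values and the `ProducerTuning` class are illustrative assumptions, not the values used in this project. The right numbers depend on message size and traffic shape.

```java
import java.util.Properties;

/** Hypothetical sketch: producer settings that trade a little latency for throughput. */
final class ProducerTuning {

    /** Returns producer properties with illustrative batching and compression values. */
    static Properties tunedProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Larger batches amortize per-request overhead (value is an assumption).
        props.setProperty("batch.size", "65536");
        // Wait briefly so batches can fill before sending (value is an assumption).
        props.setProperty("linger.ms", "20");
        // Compress whole batches on the producer side (codec choice is an assumption).
        props.setProperty("compression.type", "lz4");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = tunedProducerProps("localhost:9092");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

With the `kafka-clients` library on the classpath, these properties would be passed to the `KafkaProducer` constructor. Batching and compression work together here, since compression is applied per batch and larger batches usually compress better.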
2. YouTube Data Ingestion via Apache NiFi
- Daily Scheduled Pipelines: NiFi pipelines were configured to run once daily, automatically fetching both updated channel/video data and any newly published videos.
- Quota & API Management: Pagination and token refresh logic were added so that high-volume fetches page through results in a controlled way and stay within YouTube API quotas.
- Centralized Storage: Data was stored in Redshift, enabling unified access for analytics across all creator platforms.
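The pagination logic mentioned above can be sketched as a loop that follows `nextPageToken` until the listing ends or a per-run call budget is spent. The `PagedFetcher`, `Page`, and `PageSource` names and the `maxCalls` budget are hypothetical, and the real flow is built from NiFi processors rather than Java. The sketch only shows the control flow that keeps a single run from consuming the whole daily quota.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: token-based pagination with a hard cap on API calls per run. */
final class PagedFetcher {

    /** One page of results plus the token for the next page (null on the last page). */
    static final class Page {
        final List<String> items;
        final String nextPageToken;

        Page(List<String> items, String nextPageToken) {
            this.items = items;
            this.nextPageToken = nextPageToken;
        }
    }

    /** Abstraction over a single API call; pageToken is null for the first page. */
    interface PageSource {
        Page fetch(String pageToken);
    }

    /** Collects items page by page, stopping at the last page or after maxCalls calls. */
    static List<String> fetchAll(PageSource source, int maxCalls) {
        List<String> all = new ArrayList<>();
        String token = null;
        int calls = 0;
        do {
            if (calls >= maxCalls) {
                break; // budget spent; remaining pages wait for the next run
            }
            Page page = source.fetch(token);
            calls++;
            all.addAll(page.items);
            token = page.nextPageToken;
        } while (token != null);
        return all;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the API: two pages, linked by the token "p2".
        PageSource fake = token -> token == null
                ? new Page(Arrays.asList("video-a", "video-b"), "p2")
                : new Page(Arrays.asList("video-c"), null);
        System.out.println(fetchAll(fake, 10));
    }
}
```

Passing the call budget in as a parameter makes it straightforward to split a daily quota across channels, with any unfetched pages deferred to the next scheduled run.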
3. Twitch Data Ingestion via Apache NiFi
- Real-Time Constraints: Twitch live stream data is fetched every 10 minutes; processor concurrency and flow tuning ensured that all data is processed and written to TimescaleDB within this interval.
- Data Enrichment and Aggregation: Enrichment pipelines update stream records with the corresponding channel and game details. Additional pipelines calculate derived metrics and write the results to 8–10 summary tables for analytics.
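As an example of the kind of derived metric a summary table might hold, the sketch below computes sample count, average viewers, and peak viewers per game from enriched stream samples. The metrics themselves, and the `ViewerSummary`, `Sample`, and `Stats` names, are assumptions for illustration; the source does not list which metrics the summary tables contain, and in practice this aggregation would more likely run as SQL inside the database.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-game viewer statistics from 10-minute stream samples. */
final class ViewerSummary {

    /** One enriched stream observation. */
    static final class Sample {
        final String gameName;
        final int viewerCount;

        Sample(String gameName, int viewerCount) {
            this.gameName = gameName;
            this.viewerCount = viewerCount;
        }
    }

    /** Running totals for one game. */
    static final class Stats {
        int samples;
        long totalViewers;
        int peakViewers;

        double average() {
            return samples == 0 ? 0.0 : (double) totalViewers / samples;
        }
    }

    /** Groups samples by game and accumulates count, total, and peak in one pass. */
    static Map<String, Stats> summarize(List<Sample> samples) {
        Map<String, Stats> byGame = new LinkedHashMap<>();
        for (Sample s : samples) {
            Stats stats = byGame.computeIfAbsent(s.gameName, k -> new Stats());
            stats.samples++;
            stats.totalViewers += s.viewerCount;
            stats.peakViewers = Math.max(stats.peakViewers, s.viewerCount);
        }
        return byGame;
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
                new Sample("Example Game", 100),
                new Sample("Example Game", 300),
                new Sample("Another Game", 50));
        for (Map.Entry<String, Stats> e : summarize(samples).entrySet()) {
            Stats st = e.getValue();
            System.out.println(e.getKey() + ": samples=" + st.samples
                    + " avg=" + st.average() + " peak=" + st.peakViewers);
        }
    }
}
```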
Results
- Achieved reliable ingestion across three major content platforms: event-driven for Discord, every 10 minutes for Twitch, and daily for YouTube.
- Reduced processing time per Twitch pipeline cycle, so that each fetch is fully processed and written within the 10-minute polling window.
- Implemented a dual-database write strategy that serves both OLAP (Redshift) and time-series (TimescaleDB) use cases.
- Improved performance and stability of NiFi pipelines by resolving heap memory issues and scaling flow concurrency.
- Enabled scalable deployment using ECS and NiFi across multiple services and use cases.
Conclusion
This case study outlines how we designed and implemented a scalable, real-time data platform for a leader in the gaming and content creator ecosystem. By combining Kafka-based producers, NiFi pipelines, enrichment services, and dual-database integration, the system enables unified analytics and real-time tracking across platforms like Discord, YouTube, and Twitch.