Engineering
April 10, 2025
12 min read

How we built a sync engine that handles 48M products

A technical deep-dive into the architecture behind SyncTec's sync pipeline — from the first version that ran on a single server to the distributed system we run today.

James Okafor

Last month, SyncTec processed 48 million product syncs. Our median sync time was 1.8 seconds. Our 99th percentile was under 4 seconds.

This is the story of how we built a sync engine that scales.

Version 1: The Single Server (January 2024)

Our first version was embarrassingly simple:

  • One Node.js server
  • In-memory queue
  • Synchronous Shopify API calls
  • SQLite database

It worked for our first 50 customers. Each customer synced maybe 100-500 products per day. The server handled ~5,000 syncs per day total.

Then we hit our first scaling wall.

The First Bottleneck: API Rate Limits

Shopify has strict API rate limits: 2 requests per second per store.

When a customer tried to sync 1,000 products at once, one API call per product at 2 requests per second meant 500 seconds of waiting. Over 8 minutes.

Our solution: batch requests.

Instead of syncing products one at a time, we batched them into groups of 10 and used Shopify's bulk operations API. This cut sync time for 1,000 products from over 8 minutes to about 45 seconds.
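
The batching step itself is simple. Here's a minimal sketch in TypeScript; `Product` and `submitBulkOperation` are illustrative stand-ins, not our actual internals:

```typescript
interface Product {
  id: string;
  title: string;
}

// Hypothetical stand-in for one call to Shopify's bulk operations API.
async function submitBulkOperation(batch: Product[]): Promise<void> {
  // ... send a single bulk mutation covering the whole batch ...
}

// Split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One bulk request per 10 products instead of one request per product.
async function syncProducts(products: Product[]): Promise<void> {
  for (const batch of chunk(products, 10)) {
    await submitBulkOperation(batch);
  }
}
```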

Version 2: The Worker Pool (March 2024)

By March, we had 300 customers. The single server couldn't keep up.

We rebuilt with:

  • Redis queue (instead of in-memory)
  • 5 worker servers (instead of 1)
  • PostgreSQL (instead of SQLite)
  • Load balancer in front

This got us to 50,000 syncs per day. But we were still synchronous: each worker handled one sync at a time.
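
Each v2 worker was a loop that blocked on the Redis queue. A minimal sketch of that loop, assuming the ioredis client and JSON-encoded jobs on a single `syncs` list:

```typescript
import Redis from "ioredis";

const redis = new Redis(); // defaults to localhost:6379

// Hypothetical stand-in for the actual sync work.
async function processSync(job: unknown): Promise<void> {
  // ... run the batched Shopify calls for this job ...
}

async function workerLoop(): Promise<void> {
  while (true) {
    // Block for up to 5 seconds waiting for the next job.
    const result = await redis.blpop("syncs", 5);
    if (!result) continue; // timed out; go back to waiting
    const [, payload] = result; // blpop returns [listName, value]
    await processSync(JSON.parse(payload)); // one sync at a time: the v2 bottleneck
  }
}

workerLoop();
```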

The Second Bottleneck: Waiting on Shopify

Most of a sync is waiting for Shopify to respond. Our workers spent 80% of their time idle, waiting for API responses.

Solution: async workers.

We rewrote the workers to handle multiple syncs concurrently. Each worker now processes 10-20 syncs at once; while one sync waits on Shopify, the worker makes progress on the others.

This 10x'd our throughput without adding servers.
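
The concurrent version keeps a fixed number of syncs in flight per worker. A sketch of the idea (the job-fetching and sync functions are hypothetical):

```typescript
// Hypothetical stand-ins for the queue pull and the actual sync work.
async function fetchNextJob(): Promise<unknown> {
  return {}; // ... pop the next job from Redis ...
}
async function processSync(job: unknown): Promise<void> {
  // ... await batched Shopify API calls ...
}

async function runConcurrent(limit: number): Promise<void> {
  const inFlight = new Set<Promise<void>>();
  while (true) {
    // Top up to `limit` syncs; while one awaits Shopify, the
    // event loop makes progress on the others.
    while (inFlight.size < limit) {
      const job = await fetchNextJob();
      const task = processSync(job)
        .catch((err) => console.error("sync failed", err))
        .finally(() => inFlight.delete(task));
      inFlight.add(task);
    }
    await Promise.race(inFlight); // resume when any sync finishes
  }
}

runConcurrent(20); // 10-20 concurrent syncs per worker
```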

Version 3: The Distributed System (June 2024)

By June, we were at 1,000 customers and 500,000 syncs per day. Still running on 5 workers.

But we had a new problem: customers with huge catalogs (10,000+ products) were blocking the queue for everyone else.

We needed priority queues.

**The architecture:**

  • High-priority queue: Small syncs (< 100 products)
  • Medium-priority queue: Medium syncs (100-1,000 products)
  • Low-priority queue: Large syncs (1,000+ products)

Workers pull from high-priority first. If empty, check medium. If empty, check low.

This kept small syncs fast even when large syncs were running.
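
If all three queues live in one Redis instance (today we run one cluster per priority level, where you'd poll each in turn instead), a single blocking call gives you this ordering for free, because BLPOP checks its keys in the order you pass them:

```typescript
import Redis from "ioredis";

const redis = new Redis();

// BLPOP scans q:high, then q:medium, then q:low, and pops from the
// first non-empty list: exactly the priority order described above.
async function nextJob(): Promise<string | null> {
  const result = await redis.blpop("q:high", "q:medium", "q:low", 5);
  return result ? result[1] : null; // result is [queueName, payload]
}
```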

Version 4: The Current System (January 2025)

Today's architecture:

  • 20 worker servers (autoscale to 50 during peak)
  • 3 Redis clusters (one per priority level)
  • PostgreSQL with read replicas (primary + 2 replicas)
  • CDN for webhooks (Cloudflare)
  • Monitoring (Datadog)

**The sync flow:**

1. Shopify sends a webhook to Cloudflare

2. Cloudflare routes it to our API server

3. API server validates webhook, writes to database, pushes job to Redis

4. Worker picks up job from Redis

5. Worker makes batch request to Shopify API

6. Shopify processes batch asynchronously

7. Shopify sends completion webhook

8. We mark sync complete in database
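
Steps 1 through 3 are where correctness matters most: we have to verify a webhook actually came from Shopify before trusting it. Shopify signs each webhook with an HMAC-SHA256 of the raw body, sent in the `X-Shopify-Hmac-Sha256` header. A minimal sketch of that handler (the route name and queue layout are illustrative, and the database write is elided):

```typescript
import crypto from "node:crypto";
import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis();
const SECRET = process.env.SHOPIFY_WEBHOOK_SECRET ?? "";

// Validate the webhook signature, then queue the job (steps 1-3 of the flow).
app.post("/webhooks/products", express.raw({ type: "application/json" }), async (req, res) => {
  const sent = req.get("X-Shopify-Hmac-Sha256") ?? "";
  const computed = crypto.createHmac("sha256", SECRET).update(req.body).digest("base64");
  const valid =
    sent.length === computed.length &&
    crypto.timingSafeEqual(Buffer.from(sent), Buffer.from(computed));
  if (!valid) {
    res.status(401).end();
    return;
  }
  // In production, write the sync record to Postgres before enqueueing.
  await redis.rpush("q:high", req.body.toString());
  res.status(200).end(); // ack quickly so Shopify doesn't retry
});

app.listen(3000);
```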

**Average latency:**

  • Webhook received to job queued: 50ms
  • Job queued to worker pickup: 200ms
  • Worker processes sync: 1,500ms
  • Total: ~1.8 seconds

What We Learned

**1. Batch everything**

Individual API calls don't scale. Batching 10x'd our throughput.

**2. Async is mandatory**

When you're I/O bound (waiting on external APIs), async processing is the difference between 10 syncs/second and 100 syncs/second.

**3. Priority matters**

Without priority queues, one customer's 10,000-product sync blocks everyone else. Priority queues keep small syncs fast.

**4. Webhooks are hard**

Shopify sends thousands of webhooks per second. Handling them reliably is harder than processing syncs. We use Cloudflare to buffer and deduplicate webhooks before they hit our servers.
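
Deduplication itself is conceptually simple. Ours happens at the Cloudflare layer, but here's what the idea looks like at the application level, assuming each webhook carries a unique ID (Shopify sends one in the `X-Shopify-Webhook-Id` header):

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Remember each webhook ID for 24 hours. SET ... NX returns null
// when the key already exists, i.e. we've seen this webhook before.
async function isDuplicate(webhookId: string): Promise<boolean> {
  const stored = await redis.set(`webhook:${webhookId}`, "1", "EX", 60 * 60 * 24, "NX");
  return stored === null;
}
```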

**5. Retries are essential**

Shopify APIs fail sometimes. Networks fail. Our workers retry failed syncs with exponential backoff. Current retry strategy: 3 attempts over 10 minutes.
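
A sketch of the retry wrapper; the exact delays here are illustrative, chosen so three attempts span roughly ten minutes:

```typescript
// Retry a failing async operation with exponential backoff.
async function withRetries<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait 2 minutes, then 8: roughly 10 minutes across 3 attempts.
        const delayMs = 2 * 60_000 * 4 ** i;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage: wrap the sync call, e.g. withRetries(() => processSync(job)),
// so transient Shopify or network failures don't mark the sync as failed.
```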

What's Next

We're working on:

  • Edge workers: Deploy workers closer to customers for lower latency
  • Smarter batching: Group products by similarity to reduce API calls further
  • Predictive scaling: Auto-scale based on queue depth + time of day

The Numbers Today

  • 48M syncs per month
  • 1.8s median sync time
  • 99.9% success rate
  • 20 workers (50 during peak)
  • 2,800+ active merchants

From a single server to a distributed system in 12 months. We learned a lot. We'll keep learning.
