Why multi-location inventory sync took 6 months to build
Location Groups looks simple from the outside: link stores, sync inventory. But underneath it's one of the hardest features we've ever shipped. Here's what we learned building real-time inventory sync that actually works.
Location Groups launched in May 2025. We started building it in November 2024.
Six months for a feature that looks simple: link stores, sync inventory counts. Why did it take so long?
Because inventory sync is hard. Really hard. Here's what we learned.
The Core Problem: Race Conditions
Imagine this scenario:
- You have 1 unit of Product A in stock
- Customer 1 on Store A adds it to cart at 2:00:00.000pm
- Customer 2 on Store B adds it to cart at 2:00:00.001pm
- Both customers click 'Buy' within milliseconds of each other
Who gets the product?
Without careful engineering, both customers might successfully check out. You've now oversold. One customer will be disappointed.
This is called a race condition, and it's the central challenge of inventory sync.
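To see the hazard in code, here's a minimal sketch of the unsafe read-modify-write, with an in-memory map standing in for the real inventory store (all names here are illustrative):

```typescript
// In-memory stand-in for the inventory database.
const stock = new Map<string, number>([["product-a", 1]]);

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Naive checkout: read, check, write. Nothing stops two customers from
// both reading 1 before either write lands.
async function naiveCheckout(productId: string, qty: number): Promise<boolean> {
  const current = stock.get(productId) ?? 0; // both customers read 1
  if (current < qty) return false;           // both pass the check
  await sleep(10);                           // database/network latency window
  stock.set(productId, current - qty);       // both write 0: oversold
  return true;
}

async function main() {
  // Two customers race for the last unit, milliseconds apart.
  const results = await Promise.all([
    naiveCheckout("product-a", 1),
    naiveCheckout("product-a", 1),
  ]);
  console.log(results); // [true, true]: both checkouts succeeded
}
main();
```

Both calls read the same count before either one writes, so both succeed.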
Attempt 1: Last Write Wins (Failed)
Our first approach: whenever inventory changes on any store, immediately update all other stores.
This failed immediately. Here's why:
- Both stores start with 10 units
- Store A sells 1 unit. New count: 9
- We start syncing that 9 to Store B
- Meanwhile, Store B sells 2 units. New count: 8
- Store B syncs its 8 to Store A, while Store A's 9 lands on Store B
- Now the stores show 8 and 9. Three units were sold, so the true count is 7. Neither store is right
Last write wins means... nobody wins. Inventory drifts constantly.
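Here's the drift in miniature, with local counters and a simulated sync delay (illustrative names, not our actual code):

```typescript
const counts = { A: 10, B: 10 };

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// "Last write wins": after a sale, a store pushes its own count to the
// other store once the webhook/API round trip completes.
async function sell(store: "A" | "B", qty: number, syncDelayMs: number) {
  counts[store] -= qty;
  const snapshot = counts[store];           // count at the moment of sale
  await sleep(syncDelayMs);                 // sync is in flight...
  const other = store === "A" ? "B" : "A";
  counts[other] = snapshot;                 // ...and blindly overwrites
}

async function main() {
  // Store A sells 1 (slow sync) while Store B sells 2 (fast sync).
  await Promise.all([sell("A", 1, 50), sell("B", 2, 10)]);
  console.log(counts); // { A: 8, B: 9 }, but the true remaining stock is 7
}
main();
```

Each store overwrites the other with a stale snapshot, so sales simply vanish from the counts.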
Attempt 2: Centralized Counter (Too Slow)
Next attempt: maintain a single source of truth in SyncTec's database.
- All stores query SyncTec for current inventory
- When a sale happens, update SyncTec first
- Then sync to Shopify
This solved race conditions but created a new problem: latency.
At checkout, Shopify had to check inventory by calling our API and waiting for the response. That added 200-300ms to every checkout. Not acceptable.
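Correctness-wise, though, attempt 2 was sound: with one authoritative counter, the check and the decrement collapse into a single atomic statement. A simplified sketch, assuming a Postgres-style `inventory` table (illustrative, not our production schema):

```typescript
import { Pool } from "pg";

const db = new Pool();

// One authoritative counter: the WHERE clause makes check-and-decrement
// a single atomic statement, so two checkouts can't both take the last unit.
async function centralCheckout(productId: string, qty: number): Promise<boolean> {
  const result = await db.query(
    `UPDATE inventory
        SET count = count - $2
      WHERE product_id = $1 AND count >= $2`,
    [productId, qty]
  );
  return result.rowCount === 1; // 0 rows updated: not enough stock
}
```

The problem wasn't the logic. It was where this call had to sit: synchronously in the checkout path, for every buyer.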
Attempt 3: Optimistic Locking (Getting Closer)
Third attempt: optimistic locking with version numbers.
- Each inventory record has a version number
- When updating, check if version matches
- If it does, update and increment version
- If it doesn't, someone else modified it first — retry
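In code, the compare-and-swap looks like this (a simplified sketch against a Postgres-style table with a `version` column, not our production code):

```typescript
import { Pool } from "pg";

const db = new Pool();

// Optimistic locking: the UPDATE succeeds only if the version we read is
// still current. Zero rows updated means someone got there first: retry.
async function adjustInventory(
  productId: string,
  delta: number,
  maxRetries = 5
): Promise<boolean> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { rows } = await db.query(
      "SELECT count, version FROM inventory WHERE product_id = $1",
      [productId]
    );
    const { count, version } = rows[0];
    const result = await db.query(
      `UPDATE inventory
          SET count = $2, version = version + 1
        WHERE product_id = $1 AND version = $3`,
      [productId, count + delta, version]
    );
    if (result.rowCount === 1) return true; // our read was still current
    // Lost the race: loop re-reads the new version and tries again.
  }
  return false; // gave up under high contention
}
```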
This worked... most of the time. But:
- High contention (multiple stores selling fast) caused lots of retries
- Retries meant slower syncs
- Still couldn't prevent overselling in edge cases
Attempt 4: Distributed Locking (The Solution)
Final approach: distributed locks with Redis.
**How it works:**
1. Customer adds product to cart on Store A
2. Shopify reserves inventory on Store A
3. Store A sends webhook to SyncTec
4. SyncTec acquires distributed lock for this product
5. SyncTec reads current inventory from all stores
6. SyncTec calculates new inventory (current - amount sold)
7. SyncTec updates all stores
8. SyncTec releases lock
The lock ensures only one process can modify inventory at a time. No race conditions.
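Here are steps 4-8 as a simplified sketch. `withLock`, `readCount`, and `writeCount` are illustrative helpers, not our production code; one way to implement `withLock` is shown under "Why Redis?" below.

```typescript
// Illustrative helpers: a distributed lock wrapper (sketched under
// "Why Redis?") and thin wrappers around Shopify's inventory API.
declare function withLock(key: string, fn: () => Promise<void>): Promise<void>;
declare function readCount(store: string, productId: string): Promise<number>;
declare function writeCount(store: string, productId: string, count: number): Promise<void>;

// Steps 4-8: one lock per product, so only one sync for that product
// can read and write inventory at a time.
async function syncSale(productId: string, stores: string[], sold: number) {
  await withLock(`lock:inventory:${productId}`, async () => {
    const counts = await Promise.all(
      stores.map((store) => readCount(store, productId))
    );
    // Stores should agree because they're kept in sync; real code would
    // also reconcile any drift it finds here.
    const newCount = counts[0] - sold;
    await Promise.all(
      stores.map((store) => writeCount(store, productId, newCount))
    );
  });
}
```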
**Why Redis?**
Redis gives us atomic operations that make acquiring a lock a single, race-free step (SET with the NX and PX flags). On top of that, we use the Redlock algorithm for high availability.
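A single-node sketch of the pattern (Redlock generalizes this across multiple Redis nodes so one failed node can't lose the lock). Simplified, not our production code:

```typescript
import Redis from "ioredis";
import { randomUUID } from "crypto";

const redis = new Redis();

// Release atomically, and only if we still hold the lock: the token check
// and the delete must happen as one step, hence the Lua script.
const RELEASE = `
  if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
  end
  return 0
`;

async function withLock(key: string, fn: () => Promise<void>): Promise<void> {
  const token = randomUUID();
  // SET NX PX takes the lock atomically, with a TTL so a crashed worker
  // can't hold it forever. Production code would bound this wait.
  while ((await redis.set(key, token, "PX", 30_000, "NX")) !== "OK") {
    await new Promise((r) => setTimeout(r, 20));
  }
  try {
    await fn();
  } finally {
    await redis.eval(RELEASE, 1, key, token);
  }
}
```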
**How fast is it?**
Lock acquisition: ~10ms. Total sync time: ~150ms. Fast enough that customers never notice.
Challenge 2: Location Mapping
Shopify lets each store have multiple locations (warehouse, retail store, pop-up, etc.).
When you say 'sync inventory for Product A', which location are we syncing?
We needed location mapping: let merchants tell us which location in each store corresponds to which physical warehouse.
This took 6 weeks to build because:
- The UI had to be simple (the problem underneath isn't)
- The data model had to support many-to-many relationships (complex)
- Error handling had to be bulletproof (lots of edge cases)
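The shape of the mapping, as a simplified sketch (field names are illustrative, not our actual schema):

```typescript
// Each physical warehouse maps to one Shopify location per store, but a
// warehouse can back locations in many stores and a store can have several
// mapped locations: many-to-many overall.
interface LocationMapping {
  warehouseId: string;        // the physical place the inventory lives
  storeId: string;            // which Shopify store
  shopifyLocationId: string;  // that store's location for this warehouse
}

// "Sync inventory for Product A" then means: for each warehouse, sync the
// (store, location) pairs mapped to it, never the store as a whole.
function locationsForWarehouse(
  mappings: LocationMapping[],
  warehouseId: string
): Array<Pick<LocationMapping, "storeId" | "shopifyLocationId">> {
  return mappings
    .filter((m) => m.warehouseId === warehouseId)
    .map(({ storeId, shopifyLocationId }) => ({ storeId, shopifyLocationId }));
}
```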
Challenge 3: Partial Failures
What happens if we sync to 3 stores successfully, but the 4th store's API times out?
We can't roll back the first 3 stores — customers might have already seen the updated inventory.
Our solution: retry queue.
- If a sync fails, push to retry queue
- Retry with exponential backoff (1s, 2s, 4s, 8s, 16s)
- After 5 attempts, alert the merchant
- Merchant can manually re-sync or investigate
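The retry loop, sketched with stand-ins for the real sync call and merchant alert:

```typescript
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Illustrative stand-ins for the real Shopify sync call and merchant alert.
async function syncStore(storeId: string, productId: string, count: number) {}
async function alertMerchant(storeId: string, productId: string, err: unknown) {}

// Retry with exponential backoff; once the final attempt fails, stop
// retrying and alert the merchant instead.
async function syncWithRetry(
  storeId: string,
  productId: string,
  newCount: number,
  maxAttempts = 5
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await syncStore(storeId, productId, newCount);
      return; // success
    } catch (err) {
      if (attempt === maxAttempts) {
        await alertMerchant(storeId, productId, err);
        return;
      }
      await sleep(1000 * 2 ** (attempt - 1)); // 1s, 2s, 4s, 8s, ...
    }
  }
}
```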
Challenge 4: Testing
How do you test race conditions?
We built a chaos testing framework:
- Spin up 4 test stores
- Simulate 100 customers buying simultaneously
- Randomly inject failures (API timeouts, webhook delays, network errors)
- Verify inventory never oversells
We ran this test 10,000 times before we were confident it worked.
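The invariant at the heart of every run is simple to state. A simplified sketch, where `checkout` stands in for placing a real order against a test store:

```typescript
// Stand-in: place one real order against a test store. Injected failures
// (timeouts, dropped webhooks) surface here as rejections.
async function checkout(productId: string, qty: number): Promise<boolean> {
  return false; // placeholder for a real test-store order
}

// Fire `customers` concurrent checkouts at a product with `stock` units,
// then assert the one invariant that matters: we never oversell.
async function chaosRun(productId: string, stock: number, customers: number) {
  const attempts = Array.from({ length: customers }, () =>
    checkout(productId, 1).catch(() => false) // a failed buy is just a no-sale
  );
  const sold = (await Promise.all(attempts)).filter(Boolean).length;
  if (sold > stock) {
    throw new Error(`Oversold: ${sold} sales for ${stock} units`);
  }
}
```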
What We Learned
**1. Distributed systems are hard**
Syncing data across multiple systems in real-time is fundamentally difficult. No shortcuts.
**2. Locks are essential**
Without distributed locking, you can't prevent race conditions. Period.
**3. Test everything**
We found 23 edge cases during testing that we never would have thought of. Chaos testing saved us.
**4. User experience matters**
The technical solution is only half the battle. Making it simple for merchants to configure is just as hard.
The Result
Location Groups is our most complex feature. It took 6 months.
But it works. Merchants love it. And we haven't had a single overselling incident since launch.
Sometimes the best features are the ones that look simple.