
Multi-Marketplace Data Pipelines: Managing Seller Accounts Across Regions

Scaling an Amazon data pipeline from one marketplace to three is not a matter of running the same code three times. The US, Japanese, and German marketplaces differ in ways that are more fundamental than timezone offsets: report formats vary by marketplace for the same report type, currency handling requires explicit decisions about where conversion happens (and at what rate), date conventions are inconsistent across both the API responses and the downloaded files, and the account structures — unified versus regional — affect which credentials provide access to which data.

Most teams discover these differences sequentially: they build for one marketplace, then encounter each discrepancy individually when extending to the next. The more productive approach is to design for multi-marketplace from the start, even if the initial deployment only uses one. The marginal cost of a config-driven architecture is low. The cost of refactoring a hardcoded single-marketplace implementation to support additional regions is high.


Where Marketplaces Actually Differ

Report availability varies more than the SP-API documentation suggests. Certain report types are available on all marketplaces; others are North America-only, EU-only, or available only on specific regional endpoints. The reports covering VAT compliance, for example, are specific to EU marketplaces and have no equivalent in North America. Conversely, some FBA inventory reports have different column structures on Amazon.co.jp than on Amazon.com — the same report type ID returns different schemas depending on the marketplace endpoint.

Date and time handling is a persistent source of subtle bugs. Amazon's API returns timestamps in UTC, but downloaded report files often use the marketplace's local timezone for date columns without explicitly stating this. A settlement report for the Japanese marketplace will have dates in JST; the same report for Germany will use CET or CEST depending on the time of year. Code that treats all date columns as UTC will produce off-by-one errors on date aggregations that are difficult to detect because the data looks superficially correct.
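A minimal sketch of the fix: interpret naive date strings from report files in the marketplace's local timezone, then normalize to UTC. The timezone mapping and the column format are assumptions for illustration; the actual format depends on the report type.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed mapping of marketplace to the local timezone its report
# files use for date columns. Europe/Berlin covers both CET and CEST.
MARKETPLACE_TZ = {
    "us": ZoneInfo("America/Los_Angeles"),
    "jp": ZoneInfo("Asia/Tokyo"),
    "de": ZoneInfo("Europe/Berlin"),
}

def parse_report_datetime(value: str, marketplace: str) -> datetime:
    """Treat a naive timestamp from a report file as marketplace-local
    time, then convert it to UTC for storage."""
    local = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=MARKETPLACE_TZ[marketplace]
    )
    return local.astimezone(ZoneInfo("UTC"))
```

Note that 09:00 JST and 09:00 CEST are different instants; treating both as UTC is exactly the off-by-one described above.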

Currency is the most operationally consequential difference. Amazon settles in the local marketplace currency: USD for North America, JPY for Japan, EUR for Germany. A pipeline that stores raw monetary values without explicit currency tagging creates a data quality problem that compounds over time — by the time the error is noticed, months of data may need to be corrected. The right design stores the currency code alongside every monetary value, and defers conversion to analysis time rather than hardcoding it into the extraction layer.
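One way to sketch that design, with an illustrative `Money` record (the names are hypothetical): every monetary value carries its ISO 4217 currency code from the moment it is parsed, and no conversion happens in the extraction layer.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical record type: the currency code travels with the amount,
# so conversion can be deferred to analysis time.
@dataclass(frozen=True)
class Money:
    amount: Decimal   # Decimal avoids float rounding on monetary values
    currency: str     # ISO 4217 code: "USD", "JPY", "EUR"

MARKETPLACE_CURRENCY = {"us": "USD", "jp": "JPY", "de": "EUR"}

def tag_amount(raw: str, marketplace: str) -> Money:
    """Parse a raw report value and tag it with the settlement currency."""
    return Money(Decimal(raw), MARKETPLACE_CURRENCY[marketplace])
```
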


Config-Driven Architecture

The core of a maintainable multi-marketplace pipeline is a configuration layer that captures all marketplace-specific properties in one place, rather than scattering conditional logic throughout the extraction code. Each marketplace gets a configuration block that specifies its endpoint, credential references, local timezone, currency code, supported report types, and any format overrides for reports that differ from the baseline schema.

The extraction code itself becomes marketplace-agnostic: it reads the configuration for the target marketplace, selects the appropriate credentials, applies the correct timezone to date fields, and tags all monetary values with the configured currency code. Adding a new marketplace requires adding a configuration block and noting any format variations — not modifying the extraction logic.
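A sketch of what such a configuration block might look like. The endpoints and marketplace IDs are Amazon's published SP-API values; the field names and the `report_overrides` structure are illustrative assumptions.

```yaml
# Hypothetical marketplace configuration; field names are illustrative.
marketplaces:
  us:
    endpoint: https://sellingpartnerapi-na.amazon.com
    marketplace_id: ATVPDKIKX0DER
    credentials_ref: sp_api_na
    timezone: America/Los_Angeles
    currency: USD
  jp:
    endpoint: https://sellingpartnerapi-fe.amazon.com
    marketplace_id: A1VC38T7YXB528
    credentials_ref: sp_api_fe
    timezone: Asia/Tokyo
    currency: JPY
    report_overrides:
      GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA:
        schema: fba_inventory_jp   # column layout differs from the baseline
  de:
    endpoint: https://sellingpartnerapi-eu.amazon.com
    marketplace_id: A1PA6795UKMFR9
    credentials_ref: sp_api_eu
    timezone: Europe/Berlin
    currency: EUR
```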

This approach has a secondary benefit: it makes the data model explicit. When the currency code, timezone, and report format variations are documented in a configuration file rather than embedded in code comments, they become visible to everyone who works with the pipeline — not just the developer who originally handled each edge case.


Per-Marketplace YAML State

The YAML state tracking described for single-marketplace pipelines extends naturally to multi-marketplace operation, but the state file structure requires a deliberate choice: one state file per marketplace, or one file with marketplace-scoped sections.

Separate state files per marketplace are strongly preferable. They allow each marketplace's pipeline to run independently without contention on the state file, they make it straightforward to reset or replay one marketplace's extraction without affecting others, and they map naturally to a directory structure that organizes extracted files by marketplace. A state directory with one YAML file per marketplace — state/us.yaml, state/jp.yaml, state/de.yaml — is self-documenting in a way that a single combined file with nested sections is not.

The state file schema should include the marketplace identifier explicitly, even though it is implied by the filename. When state files are inspected during debugging or auditing, the redundancy eliminates ambiguity about which marketplace's data is being reviewed.
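A per-marketplace state file might look like the following sketch; the key names and values are illustrative, not a prescribed schema.

```yaml
# state/jp.yaml — hypothetical per-marketplace state; keys are illustrative.
marketplace: jp   # redundant with the filename, deliberately
reports:
  GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE:
    last_requested: 2024-06-01T20:05:00+09:00
    last_downloaded: 2024-06-02T07:12:00+09:00
    error_count: 0
```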


Registry-Based Report Definitions

A report registry — a structured definition of every supported report type, with its ID, marketplace availability, expected schema, and any format variations — is the highest-leverage investment in a multi-marketplace pipeline. Without it, the knowledge of which reports exist, how they differ across marketplaces, and what their column names mean lives in the heads of the developers who built the pipeline.

The registry serves several purposes simultaneously. It provides a single source of truth for report type IDs, which Amazon occasionally changes or deprecates. It documents schema variations by marketplace, making it straightforward to handle the differences programmatically rather than through scattered conditionals. It enables validation: every downloaded file can be checked against its expected schema, with discrepancies flagged for investigation rather than silently written to disk.
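A minimal sketch of such a registry in code, assuming the column names shown (a small illustrative subset, not the report's full schema):

```python
from dataclasses import dataclass, field

# Hypothetical registry entry; schema details are illustrative.
@dataclass(frozen=True)
class ReportDef:
    report_type: str
    marketplaces: frozenset           # where this report is available
    baseline_columns: tuple           # expected header row
    column_overrides: dict = field(default_factory=dict)  # per-marketplace

    def expected_columns(self, marketplace: str) -> tuple:
        return self.column_overrides.get(marketplace, self.baseline_columns)

REGISTRY = {
    "GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE": ReportDef(
        report_type="GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE",
        marketplaces=frozenset({"us", "jp", "de"}),
        baseline_columns=("settlement-id", "transaction-type", "amount"),
    ),
}

def validate_header(report_type: str, marketplace: str, header: tuple) -> bool:
    """Flag schema drift instead of silently writing the file to disk."""
    return header == REGISTRY[report_type].expected_columns(marketplace)
```
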

The registry also makes gap analysis tractable. Given a list of expected report types and the YAML state files for each marketplace, a simple script can identify which reports are missing, which are stale, and which have encountered repeated errors — across all marketplaces at once. Without the registry, this analysis requires understanding the extraction code rather than just querying a data structure.
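The gap check itself can be a few lines over plain data structures. In this sketch the expected-reports mapping stands in for the registry and the state is assumed to be already loaded from the per-marketplace YAML files; both shapes are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expected-reports mapping, derived from the registry.
EXPECTED = {
    "GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE": {"us", "jp", "de"},
    "GET_VAT_TRANSACTION_DATA": {"de"},  # EU-only report
}

def find_gaps(state_by_marketplace: dict, now: datetime,
              max_age: timedelta = timedelta(days=2)) -> list:
    """Return (marketplace, report_type, reason) tuples for reports
    that are missing entirely or older than max_age."""
    gaps = []
    for report_type, marketplaces in EXPECTED.items():
        for mp in sorted(marketplaces):
            entry = state_by_marketplace.get(mp, {}).get(report_type)
            if entry is None:
                gaps.append((mp, report_type, "missing"))
            elif now - entry["last_downloaded"] > max_age:
                gaps.append((mp, report_type, "stale"))
    return gaps
```
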


Operational Considerations at Scale

Running parallel extraction across three marketplaces introduces timing considerations that single-marketplace pipelines do not face. The two-pass workflow — request in the evening, download in the morning — needs to account for the fact that "evening" and "morning" mean different things in UTC for US, Japanese, and German marketplaces. A single scheduled run that works for Germany will request reports at an inconvenient time for Japan and may miss the business-day boundary for the US.

The practical solution is marketplace-local scheduling: each marketplace's pipeline runs on a schedule aligned to its own business day, rather than a single global schedule. This adds operational complexity but eliminates the timezone-driven race conditions that create intermittent data gaps.
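Computing each marketplace's next run in its own timezone is straightforward; the sketch below assumes a 20:00 local request time, which is an illustrative choice rather than a recommendation.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# Assumed local timezones; the 20:00 request time is illustrative.
LOCAL_TZ = {"us": "America/Los_Angeles", "jp": "Asia/Tokyo", "de": "Europe/Berlin"}

def next_request_run(marketplace: str, now_utc: datetime,
                     local_run: time = time(20, 0)) -> datetime:
    """Next occurrence of local_run in the marketplace's timezone,
    returned as a UTC instant for the scheduler."""
    tz = ZoneInfo(LOCAL_TZ[marketplace])
    local_now = now_utc.astimezone(tz)
    run = local_now.replace(hour=local_run.hour, minute=local_run.minute,
                            second=0, microsecond=0)
    if run <= local_now:
        run += timedelta(days=1)  # today's slot already passed locally
    return run.astimezone(ZoneInfo("UTC"))
```

The same UTC instant maps to different local runs per marketplace, which is exactly why a single global cron entry cannot serve all three.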

Rate limiting is the other operational consideration that becomes more salient with multiple marketplaces. The SP-API imposes rate limits per credential set, but credential sets can be shared across marketplaces in some configurations. A pipeline that requests reports aggressively for three marketplaces in parallel may exhaust shared quotas in ways that a single-marketplace pipeline does not. Explicit rate limit tracking — per credential set, not per marketplace — prevents the throttling errors that are otherwise difficult to diagnose.

