
Dozens of Backend Tests in One PR: Our Testing Strategy for FastAPI

The pull request description read: “Add test suite – 52 tests across auth, data layer, and API surface.” It landed three months after the FastAPI backend it covered had been in production. This is not how testing is supposed to work, and it is not something to celebrate. But it is worth examining honestly, because “tests added late” is a materially different situation than “tests never added,” and the distance between them is exactly the distance between a codebase you can maintain and one you eventually rewrite.

This post describes the test architecture we settled on, the specific patterns that proved durable across a real FastAPI project, and what the PR actually contains beyond its test count.

The project context

The backend is a FastAPI service handling customer-facing operations for an e-commerce operator: order management, inventory queries, pricing rules, and an integration with a third-party logistics API. The database is PostgreSQL via SQLAlchemy with async support. Authentication is JWT-based, with tokens validated against a separate auth service. Three external APIs are called during normal operation: the logistics provider, a currency conversion service, and a product data feed.

Three months of production operation meant three months of bug fixes, feature additions, and incremental changes – all without tests. The codebase was not untestable. It had been built with testability in mind: dependencies injected, no global state, database sessions created per-request via FastAPI’s dependency injection system. The tests had simply never been written. A focused PR was the right vehicle – not because shipping test coverage in bulk is a pattern to recommend, but because “one PR with tests” gets reviewed, gets merged, and gets maintained. “Tests added over time” tends to remain a backlog item indefinitely.

The fixture hierarchy

pytest’s fixture system rewards deliberate design. The natural hierarchy for a FastAPI project with a real database follows three scopes: session, module, and function.

Session-scoped fixtures run once per test run. Database creation, schema migration via Alembic, and table seeding with reference data that does not change between tests – these belong here. The session-scoped database engine is shared across all tests, which means the cost of applying migrations is paid once, not once per test file. Module-scoped fixtures occupy the middle ground: setup that is too expensive to repeat per test but specific to one file's tests – a seeded product catalogue for the pricing tests, say – runs once per module.

Function-scoped fixtures provide each test with its own isolated state. The transaction rollback fixture is the critical one: it opens a transaction at the start of each test and rolls it back at the end. Any writes the test makes – creating users, inserting orders, updating prices – are reversed automatically. Tests are isolated without truncating tables between runs.

import pytest_asyncio
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

@pytest_asyncio.fixture(scope="session")
async def engine():
    # settings and Base are the application's config object and declarative base
    engine = create_async_engine(settings.TEST_DATABASE_URL, echo=False)
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield engine
    await engine.dispose()

@pytest_asyncio.fixture
async def db_session(engine):
    """Each test gets a transaction that rolls back on completion."""
    async with engine.connect() as conn:
        trans = await conn.begin()   # outer transaction, never committed
        await conn.begin_nested()    # savepoint; in-test commits stay inside it
        session_factory = async_sessionmaker(bind=conn, expire_on_commit=False)
        async with session_factory() as session:
            yield session
        await trans.rollback()       # teardown: undo everything the test wrote

The begin_nested() call creates a savepoint within the outer transaction. Test code can commit freely within its session, but the outer transaction’s rollback at teardown undoes everything. This pattern works with SQLAlchemy’s async layer, though it requires attention when testing code that explicitly calls session.commit() – those commits complete within the savepoint and are still rolled back by the outer transaction.

Mocking external APIs

Three external APIs means three potential sources of flakiness in tests that hit the network. The standard solution – mocking the HTTP client – is straightforward with respx, which intercepts httpx requests at the transport layer.

import respx
import httpx
import pytest

@pytest.mark.asyncio
async def test_shipping_estimate_returns_carrier_data(test_client, valid_token):
    mock_response = {
        "carrier": "DHL",
        "estimated_days": 3,
        "cost_eur": 8.50
    }

    with respx.mock:
        respx.post("https://api.logistics-provider.com/v2/rates").mock(
            return_value=httpx.Response(200, json=mock_response)
        )
        response = await test_client.post(
            "/api/v1/shipping/estimate",
            json={"destination_zip": "10115", "weight_kg": 2.5},
            headers={"Authorization": f"Bearer {valid_token}"}
        )

    assert response.status_code == 200
    assert response.json()["estimated_days"] == 3

The respx.mock context manager routes all outbound httpx requests through the mock layer. Any request not explicitly registered raises an error – which is the correct default. Tests that accidentally reach production APIs are a category of bugs that is difficult to detect and expensive when it causes side effects.

For the currency conversion service, called on every order with a non-EUR denomination, a shared fixture provides a pre-configured respx router with standard exchange rates. Tests covering currency logic override specific rate pairs; tests covering unrelated behaviour that happens to trigger the currency dependency use the fixture’s defaults without thinking about it. This separation keeps test intent clear: tests about shipping are about shipping, not about whatever exchange rate happens to be configured.

Testing authentication flows

FastAPI’s dependency injection system is one of its strongest assets for testing. Authentication in this project is a dependency: every protected route declares current_user: User = Depends(get_current_user), where get_current_user validates a JWT and returns a user object. In tests, this dependency is overridden at the application level:

import pytest
from fastapi.testclient import TestClient

from app.main import app
from app.dependencies import get_current_user
from app.models import User

def make_user(role: str = "operator") -> User:
    return User(id="test-user-id", email="[email protected]", role=role)

@pytest.fixture
def operator_client(db_session):
    app.dependency_overrides[get_current_user] = lambda: make_user("operator")
    client = TestClient(app)
    yield client
    app.dependency_overrides.clear()

@pytest.fixture
def admin_client(db_session):
    app.dependency_overrides[get_current_user] = lambda: make_user("admin")
    client = TestClient(app)
    yield client
    app.dependency_overrides.clear()

Separate fixtures for separate roles make permission boundary tests explicit. A test that asserts a 403 response when an operator attempts an admin action reads as clearly as the business rule it enforces. When the fixtures are named operator_client and admin_client, the test’s intent is visible without reading its body.

A small set of tests exercise the JWT validation code path directly – expired tokens, malformed tokens, tokens with incorrect audience claims. These tests do not use the override fixture; they pass raw tokens through the test client and assert on 401 or 403 responses. The separation is deliberate: most tests care about behaviour after authentication, so they use the override. The authentication tests care about the authentication logic itself. Mixing the two obscures what each test is actually covering.
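Crafting the bad tokens for those tests can look like this – PyJWT is assumed here (the post does not name the project's JWT library), and the secret and claims are illustrative. In the real tests the token goes into the Authorization header and the assertion is on the 401 or 403 response; the direct `jwt.decode` calls below just show which failures each token triggers:

```python
import time

import jwt  # PyJWT, assumed; the project's actual JWT library is not named
import pytest

SECRET = "test-secret"  # illustrative signing key

def make_token(exp_offset: int, audience: str = "backend") -> str:
    """Craft an HS256 token whose expiry and audience we control."""
    claims = {"sub": "user-1", "aud": audience, "exp": int(time.time()) + exp_offset}
    return jwt.encode(claims, SECRET, algorithm="HS256")

def test_expired_token_is_rejected():
    token = make_token(exp_offset=-60)  # expired one minute ago
    with pytest.raises(jwt.ExpiredSignatureError):
        jwt.decode(token, SECRET, algorithms=["HS256"], audience="backend")

def test_wrong_audience_is_rejected():
    token = make_token(exp_offset=300, audience="some-other-service")
    with pytest.raises(jwt.InvalidAudienceError):
        jwt.decode(token, SECRET, algorithms=["HS256"], audience="backend")
```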

Testing error paths deliberately

Happy-path tests are straightforward to write and are often the only tests that get written under time pressure. The bugs that reach production are rarely in the happy path. For this codebase, the most consequential untested behaviour was in error handling: what happens when the logistics API returns 503? What does the API return when required request fields are missing? What happens when a database constraint is violated by a concurrent modification?

For each external API mock, we include at least one error case alongside the success case. The logistics API fixture has a variant that returns 503, confirming that the endpoint returns a structured error response with an appropriate HTTP status code rather than letting the exception propagate as a 500. Database constraint tests attempt conflicting inserts and confirm that the API surfaces a 409 rather than an unhandled integrity error.

This coverage does not emerge naturally from writing tests after the fact. It requires deliberate enumeration of failure modes during test design, which is one honest argument for writing tests alongside code rather than afterward – the failure modes are easier to enumerate when the implementation is fresh and the edge cases are still in working memory.

The ROI of testing late

Testing three months late is better than testing never, for reasons that are not philosophical. The PR caught three existing bugs during authoring: a missing null check on an optional address field that caused a 500 on orders without a billing address line two, an off-by-one in pagination logic that only manifested with empty result sets, and an incorrect HTTP status code (200 instead of 201) on a resource creation endpoint. These bugs had been in production for months without triggering a user complaint – which is precisely why they had gone undetected. The tests found them in an afternoon.

The more durable return is behavioural documentation. A test suite that covers authentication boundaries, external API interactions, and error conditions is a specification of what the API does that remains accurate as long as the tests pass. The test for “operator cannot access admin endpoint” does not just catch regressions – it communicates intent to whoever reads the codebase next. For a service that will be maintained by someone other than its original author – which is eventually true of every service – that specification is worth considerably more than the bugs it catches at the time of writing.

The argument against writing tests late is that the cost is front-loaded: a large PR, a concentrated review effort, time that could have gone to features. This is accurate. But the alternative – a codebase that remains untested because the right moment to add tests never quite arrives – carries a compounding cost that rarely gets measured because it is distributed across every subsequent change. A late test suite is expensive once. No test suite is expensive continuously.
