Source Priority at the AI Layer

The AI model is only as good as the data it receives. Source priority at the inference layer is the discipline that determines what data the model sees, in what order, and what to do when the preferred source is unavailable.

Key Takeaways

Source priority at the AI layer is about data quality, not just availability.
Cached results are the highest-priority source: a hit requires no fetch and no inference, and the stored result is correct for the inputs that produced it.
Transcript quality varies dramatically across video sources, and the priority chain reflects this variation.
The fallback chain is observable: every source attempt is a row in the database, with the reason for failure if it failed.

When the AI model processes a video, it does not see the video itself. The model sees text—a transcript, a description, a summary of the frames. The text is what the model reasons about. The quality of the model's output is bounded by the quality of the text it receives. A model that receives a perfect transcript will produce a better summary than a model that receives a noisy auto-generated caption. The source priority is, ultimately, a statement about which text sources are most likely to produce good model output.

The first version of the deepmox-worker sent the model whatever data was available. The order was: transcript, then description, then nothing. This worked for some videos and failed for others. The model would produce great summaries for videos with high-quality transcripts and poor summaries for videos with only descriptions. The bug was not in the model. The bug was in the source selection logic. The system was treating all transcripts as interchangeable, when in fact sources vary dramatically in quality.

The fix was to make source priority explicit. Each video source provides different kinds of data at different quality levels. The deepmox-worker encodes this as a priority chain, evaluated at inference time:

# Ordered from most to least preferred; the first entry whose condition holds wins.
priority_chain = [
    {"source": "cache", "condition": "cache_hit"},
    {"source": "human_transcript", "condition": "available"},
    {"source": "official_transcript", "condition": "available"},
    {"source": "auto_transcript", "condition": "available"},
    {"source": "description", "condition": "available"},
    {"source": "metadata_only", "condition": "always"},
]

The chain is evaluated in order. The first source whose condition is met is used. The cache is the highest priority because a hit requires no fetch and no inference, and the stored result is correct for the inputs that produced it. The human transcript is second because it is the highest-quality fresh data. The official transcript is third because it is professionally produced. The auto transcript is fourth because it is machine-generated and noisy. The description is fifth because it is a user-written summary, not a full transcript. The metadata-only fallback is the last resort: when there is no other data, the model works with what it has.
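The first-match rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the deepmox-worker's actual code: the function name `select_source` and the `cache_hit` and `available` arguments are assumptions introduced here, standing in for however the worker really checks each condition.

```python
# Hypothetical copy of the chain from the text, written as Python dicts.
PRIORITY_CHAIN = [
    {"source": "cache", "condition": "cache_hit"},
    {"source": "human_transcript", "condition": "available"},
    {"source": "official_transcript", "condition": "available"},
    {"source": "auto_transcript", "condition": "available"},
    {"source": "description", "condition": "available"},
    {"source": "metadata_only", "condition": "always"},
]


def select_source(chain, cache_hit, available):
    """Return the first source in the chain whose condition is met.

    cache_hit: bool, whether a cached result exists (assumed input).
    available: set of source names that exist for this video (assumed input).
    """
    for entry in chain:
        condition = entry["condition"]
        if condition == "always":
            return entry["source"]
        if condition == "cache_hit" and cache_hit:
            return entry["source"]
        if condition == "available" and entry["source"] in available:
            return entry["source"]
    # Only reachable if the chain has no 'always' entry.
    raise LookupError("no source matched and the chain has no 'always' fallback")
```

Because the last entry's condition is `always`, the loop returns for any input when given the full chain; the `LookupError` only matters for a custom chain that omits the fallback.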

flowchart TD
    Start[Inference Request] --> C{Cache hit?}
    C -->|Yes| Use[Use cached result]
    C -->|No| HT{Human transcript?}
    HT -->|Yes| Fetch1[Fetch human transcript]
    HT -->|No| OT{Official transcript?}
    OT -->|Yes| Fetch2[Fetch official transcript]
    OT -->|No| AT{Auto transcript?}
    AT -->|Yes| Fetch3[Fetch auto transcript]
    AT -->|No| D{Description?}
    D -->|Yes| Fetch4[Fetch description]
    D -->|No| Meta[Use metadata only]
    Fetch1 --> Infer[Run inference]
    Fetch2 --> Infer
    Fetch3 --> Infer
    Fetch4 --> Infer
    Meta --> Infer
    Infer --> Cache[Update cache]
    Cache --> Return[Return result]
    Use --> Return

The diagram shows the priority chain in action. The cache hit is the fastest path: no network calls, no inference, just a database lookup. The human transcript path is the highest-quality fresh path: one network call to fetch the transcript, then inference. The official transcript path is similar. The auto transcript path has the same shape but feeds the model noisier text. The description path is the lightest: short text, fast inference. The metadata-only path is the last resort: the model has very little to work with, and the result is necessarily weaker.

The cache is the most important source. Caching is the discipline that makes repeated inference economical. A popular video might be summarized thousands of times. Without caching, the system would pay the inference cost thousands of times. With caching, the system pays once, and subsequent requests get the cached result for free. The cache key is a hash of the source chain that produced the result. If the source chain is the same, the result is the same. The cache is invalidated only when the source chain changes (e.g., a new transcript is added).
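One way to derive such a key is to hash a canonical serialization of the inputs. The sketch below assumes the key covers a video ID, the source that was used, and a version marker for that source's content; these field names, and the choice of SHA-256 over JSON, are illustrative assumptions and not details taken from the deepmox-worker.

```python
import hashlib
import json


def cache_key(video_id, source, content_version):
    """Derive a stable cache key from the inputs that produced a result.

    All three parameters are hypothetical; the point is that identical
    inputs yield an identical key and any change yields a different one.
    """
    payload = json.dumps(
        {
            "video_id": video_id,
            "source": source,
            "content_version": content_version,
        },
        sort_keys=True,          # fixed key order, so serialization is stable
        separators=(",", ":"),   # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this scheme, adding a new transcript changes `source` or `content_version`, which changes the key, so the old entry is simply never looked up again.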

I started this analysis believing that caching was a performance optimization. After running the system in production, I now believe caching is a correctness feature. The cache is the only way to guarantee that two requests with the same inputs produce the same outputs. Without the cache, the model might produce slightly different summaries on different runs (due to temperature, model version, or prompt variations). With the cache, the same inputs always produce the same outputs. The cache is the system's memory. Without it, the system has amnesia.
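The correctness argument can be shown with a small get-or-compute wrapper. In this sketch, `ResultCache` and `noisy_model` are hypothetical names: the in-memory dictionary stands in for the database-backed cache, and the random suffix stands in for run-to-run variation in model output.

```python
import random


class ResultCache:
    """In-memory stand-in for the result cache (an assumption for illustration)."""

    def __init__(self):
        self._results = {}

    def get_or_compute(self, key, compute):
        # Run inference only on a miss; every later call returns the stored value.
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]

    def invalidate(self, key):
        # Called when the inputs behind the key change.
        self._results.pop(key, None)


def noisy_model():
    # Stand-in for a model whose output differs between runs.
    return f"summary-{random.random():.6f}"
```

Calling `get_or_compute` twice with the same key returns the same string even though `noisy_model` would not, which is the sense in which the cache provides repeatability and not only speed.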

Imagine you are a researcher. You write a paper. A colleague asks you to summarize it. You give them a summary. A second colleague asks you to summarize it. You give them the same summary. You do not re-read the paper each time. The first summary is the cache. Your subsequent answers are the cached results. If the paper changes, you re-read it and update the cache. If the paper does not change, the cache is correct. The system works the same way. The model has a memory, and the memory is the cache.

The fallback chain is observable. Every source attempt is a row in the inference_sources table, with the source type, the attempt time, the success or failure, and the reason for failure. This means the system can answer questions like: how often does the cache hit, how often does the human transcript succeed, how often does the system fall through to the metadata-only fallback, which sources fail most often, and why. These questions are hard to answer in systems where source selection happens in-process and leaves no record. The deepmox-worker answers them with a single query.

The query that reveals the source distribution is:

SELECT source, count(*) AS attempts, avg(quality_score) AS avg_quality
FROM inference_sources
WHERE created_at > now() - interval '24 hours'
GROUP BY source
ORDER BY attempts DESC;

The answer is a snapshot of the system's data quality. If the human transcript is the dominant source, the system's outputs are high quality. If the auto transcript is the dominant source, the outputs are noisy. If the metadata-only fallback is common, the system has a data problem. The query is a dashboard, a metric, and a diagnostic tool, all in one.
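To make the one-row-per-attempt idea concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The schema is an assumption: only the table name `inference_sources` and the idea of recording source, time, outcome, and failure reason come from the text; the column names `job_id`, `succeeded`, and `failure_reason` and the helper functions are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory database with an assumed inference_sources schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE inference_sources (
            id INTEGER PRIMARY KEY,
            job_id TEXT NOT NULL,
            source TEXT NOT NULL,
            succeeded INTEGER NOT NULL,
            failure_reason TEXT,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def record_attempt(conn, job_id, source, succeeded, failure_reason=None):
    """Write one row per source attempt, whether it succeeded or not."""
    conn.execute(
        "INSERT INTO inference_sources (job_id, source, succeeded, failure_reason) "
        "VALUES (?, ?, ?, ?)",
        (job_id, source, int(succeeded), failure_reason),
    )


def attempts_by_source(conn):
    """Return (source, attempts, failures) rows, most-attempted first."""
    return conn.execute(
        """
        SELECT source, COUNT(*) AS attempts, SUM(1 - succeeded) AS failures
        FROM inference_sources
        GROUP BY source
        ORDER BY attempts DESC, source ASC
        """
    ).fetchall()
```

A worker following this pattern would call `record_attempt` before moving to the next entry in the chain, so a fall-through from human transcript to auto transcript leaves two rows: one failure with its reason and one success.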

The most counterintuitive property of source priority is that it is policy, not mechanism. The priority chain is a statement about what data the user wants the model to see. Different users might want different priorities. A premium user might want human transcripts only. A free user might accept auto transcripts. A user with no transcript available might want the model to skip the video entirely. The deepmox-worker encodes this as a submission parameter, like the source priority for video. The worker uses the parameter at inference time. The policy is in the data, not in the code.

The shift in my thinking came when I realized that source priority is the same pattern as the foundation layer's state-in-the-database principle. The priority is data. The chain is data. The fallback logic is data. The worker's job is to read the data and act on it. This means the priority chain can be changed without redeploying the worker. It can be different per user, per job, per experiment. It can be A/B tested. It can be tuned based on quality metrics. The data is the policy. The worker is the mechanism. The pattern is consistent across the system.
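A sketch of policy-as-data, assuming the chain arrives as a list under a `source_priority` key in the job's parameters. That key name, `DEFAULT_CHAIN`, and both functions are hypothetical; the text only says the priority is a submission parameter read at inference time.

```python
# Assumed default order, matching the chain described earlier in the chapter.
DEFAULT_CHAIN = [
    "cache",
    "human_transcript",
    "official_transcript",
    "auto_transcript",
    "description",
    "metadata_only",
]


def resolve_chain(job_params):
    """Use the chain stored with the job if present, otherwise the default."""
    chain = job_params.get("source_priority")
    if not chain:
        return list(DEFAULT_CHAIN)
    unknown = [s for s in chain if s not in DEFAULT_CHAIN]
    if unknown:
        raise ValueError(f"unknown sources in priority chain: {unknown}")
    return list(chain)


def pick_source(chain, available):
    """Return the first usable source, or None to skip the video."""
    for source in chain:
        # metadata_only needs no fetched text, so it is always usable.
        if source == "metadata_only" or source in available:
            return source
    return None
```

With this shape, a premium job submitted with `{"source_priority": ["cache", "human_transcript"]}` yields `None` for a video that has only an auto transcript, while a job with no parameter falls back to the default chain and uses the auto transcript. Changing either behavior means changing a row, not redeploying the worker.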

The source priority at the AI layer is, ultimately, a statement about quality. The system prefers high-quality data, falls back to lower-quality data when necessary, and surfaces the data quality through the database. The cache is the highest-priority source because it is the most reliable. The human transcript is the highest-quality fresh source. The auto transcript is the lowest-quality fresh source. The metadata-only fallback is the last resort. The system is honest about what it has and what it does not have. The next chapter will look at output validation—the discipline that catches bad model output before it reaches the user.