The Science of AI Citation

AthenaHQ

Action on AI Search

Introducing the Athena Citation Engine (ACE)

Athena Labs Technical Note — May 2025

Executive Summary

As AI-generated answers increasingly replace traditional search results, citation has emerged as a new form of digital visibility.

Historically, brands competed for rankings. In AI search, brands increasingly compete to become the sources that AI systems cite, trust, and recommend.

This shift creates a new technical problem:

Which content is likely to be cited by AI systems?

To address this problem, Athena developed the Athena Citation Engine (ACE), a proprietary citation prediction model and optimization framework for AI search.

ACE estimates the likelihood that a piece of content will be cited by AI systems and uses that signal inside an iterative content generation and refinement workflow.

To evaluate ACE, Athena analyzed 1,761 published articles that had been live for at least 90 days, allowing sufficient time for AI systems to discover and cite them. Articles were grouped by ACE score and compared against observed citation outcomes.

The results demonstrate a strong relationship between ACE score and citation performance:

  • Articles in the lowest ACE decile were cited 38.6% of the time.
  • Articles in the highest ACE decile were cited 87.0% of the time.
  • Content in the highest ACE decile was 2.25× more likely to be cited than content in the lowest decile.
  • At the decile level, ACE score and observed citation rate exhibited a correlation of 0.90 (R² = 0.81).

These findings suggest that AI citation behavior is not random. It can be measured, modeled, and optimized.

Athena’s broader thesis is that AI search requires a new intelligence layer. ACE is the first model built toward that vision.

The Technical Problem: Citation Prediction

Most AI content tools focus on generation.

They help users create articles, summaries, product descriptions, landing pages, and other forms of content.

However, content generation alone does not answer the question that increasingly matters in AI search:

Will this content actually be cited?

A piece of content may be factually correct, well written, and aligned with a brand’s messaging while still failing to appear in AI-generated answers.

Conversely, some content consistently earns citations despite appearing similar to competing articles.

This observation suggests that citation behavior follows identifiable patterns.

Athena’s hypothesis is that AI citation can be treated as a prediction problem.

Given a content asset and its metadata, estimate the probability that the content will be cited by an AI system.

Formally:

Input: Article content and metadata

Output: Probability that the article will be cited by AI systems

The objective is to estimate:

P(Citation | Content, Metadata)

This framing transforms AI search optimization from a subjective content exercise into a measurable machine learning problem.
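As a rough illustration of this framing, the sketch below maps an article and its metadata to a probability between 0 and 1. Everything in it is an assumption made for illustration: the `Article` record, the two toy features, the weights, and the choice of a logistic model. The note does not disclose ACE's actual inputs or architecture.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical input record: article text plus free-form metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def extract_features(article: Article) -> list[float]:
    """Toy features chosen only for illustration, not ACE's real inputs."""
    words = article.content.split()
    return [
        min(len(words) / 1000.0, 1.0),                    # length, capped at 1.0
        float(len(article.metadata.get("sources", []))),  # number of listed sources
    ]


def predict_citation_probability(article: Article,
                                 weights: list[float],
                                 bias: float) -> float:
    """Estimate a citation probability with a logistic model (an assumption)."""
    z = bias + sum(w * x for w, x in zip(weights, extract_features(article)))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up weights; a real model would learn them from labeled citation outcomes.
article = Article("word " * 500, {"sources": ["https://example.com/a"]})
p = predict_citation_probability(article, weights=[0.8, 0.5], bias=-1.0)
```

The point of the sketch is only the shape of the problem: a fixed input (content plus metadata), a single probability as output, and parameters that are fit against observed outcomes rather than chosen by hand.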

The Athena Citation Engine

ACE is Athena’s proprietary citation prediction model and optimization framework for AI search.

The system consists of two primary components:

  1. Citation Prediction
  2. Citation Optimization

The prediction model estimates the likelihood that content will be cited.

The optimization framework uses that signal to generate, evaluate, refine, and verify content before publication.

Citation Prediction Model

ACE evaluates a content asset and produces a score between 0 and 1.

Higher scores indicate a greater predicted likelihood of citation.

Rather than optimizing for general writing quality, ACE is trained against historical citation outcomes.

This distinction is important.

Good content and citable content are not identical.

A piece of content can be well written but still fail to earn citations because it lacks source signals, specificity, factual grounding, or alignment with the way AI systems retrieve and synthesize information.

Conversely, content that is specific, attributable, well sourced, and semantically aligned with user intent may be significantly more likely to earn citations.

ACE is designed to model this difference directly.

Context Construction Layer

Before content generation or optimization begins, ACE assembles structured context from available brand and knowledge assets.

This context may include:

  • Brand documentation
  • Product information
  • Knowledge-base content
  • Canonical claims
  • Approved source URLs
  • Internal links
  • External references
  • Editorial guidelines
  • Brand voice requirements
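One way to picture this structured context is as a single record that later steps can read from. The sketch below is hypothetical: the class name, field names, and the `unapproved_urls` helper are illustrative and are not taken from Athena's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BrandContext:
    """Hypothetical container for the context items listed above."""
    brand_docs: list[str] = field(default_factory=list)
    product_info: list[str] = field(default_factory=list)
    knowledge_base: list[str] = field(default_factory=list)
    canonical_claims: list[str] = field(default_factory=list)
    approved_source_urls: list[str] = field(default_factory=list)
    internal_links: list[str] = field(default_factory=list)
    external_references: list[str] = field(default_factory=list)
    editorial_guidelines: str = ""
    brand_voice: str = ""

    def unapproved_urls(self, cited_urls: list[str]) -> list[str]:
        """Return the URLs a draft cites that are not on the approved list."""
        approved = set(self.approved_source_urls)
        return [url for url in cited_urls if url not in approved]


ctx = BrandContext(approved_source_urls=["https://example.com/a"])
flagged = ctx.unapproved_urls(["https://example.com/a", "https://example.com/b"])
```

Keeping approved sources and canonical claims in one place makes a later verification step straightforward, since a draft can be checked against the same record it was generated from.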

The goal is not merely to generate content that is likely to be cited.

The goal is to generate content that is:

  • Citable
  • Factually grounded
  • Brand aligned
  • Verifiable

Optimization Framework

ACE operates inside an iterative workflow.

The simplified process is:

  1. Build context from available brand assets
  2. Generate candidate drafts
  3. Score candidates using ACE
  4. Select the strongest candidate
  5. Generate improved variations
  6. Re-score variations
  7. Verify claims and brand alignment
  8. Publish optimized content

This workflow converts content generation into a search and optimization problem.

Rather than relying on a single generated draft, Athena evaluates multiple candidates against a learned citation objective.
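The workflow above can be sketched as a simple greedy search. This is a minimal sketch under stated assumptions, not Athena's implementation: the function name, the callables passed in (`generate`, `vary`, `score`, `verify`), and the fixed number of rounds are all hypothetical. Context building (step 1) is assumed to happen inside `generate`, and publishing (step 8) is left to the caller.

```python
from typing import Callable, Optional


def optimize_draft(
    generate: Callable[[], list[str]],
    vary: Callable[[str], list[str]],
    score: Callable[[str], float],
    verify: Callable[[str], bool],
    rounds: int = 3,
) -> Optional[str]:
    """Keep the highest-scoring draft across rounds, then verify it."""
    candidates = generate()                    # step 2: candidate drafts
    best = max(candidates, key=score)          # steps 3-4: score and select
    for _ in range(rounds):
        # Steps 5-6: generate variations and re-score them against the current best.
        challenger = max(vary(best) + [best], key=score)
        if score(challenger) <= score(best):
            break                              # no variation improved the score
        best = challenger
    return best if verify(best) else None      # step 7: verify claims and brand fit


# Toy stand-ins: draft length plays the role of the score, for illustration only.
result = optimize_draft(
    generate=lambda: ["a", "abc"],
    vary=lambda d: [d + "x"] if len(d) < 5 else [d],
    score=len,
    verify=lambda d: True,
)
```

Returning `None` when verification fails reflects the ordering in the list: a draft that scores well but cannot be verified is not published.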

Why Generic LLMs Are Not Enough

General-purpose models such as ChatGPT, Claude, Gemini, and other frontier systems are powerful generation engines.

However, they are not citation prediction systems.

They generally do not know:

  • Which prompts matter most for a specific category
  • Which sources are consistently cited
  • Which competitors are being referenced
  • Which content patterns correlate with citation
  • Whether a specific article is likely to be cited

Most importantly, they are not trained against a history of observed citation outcomes.

Generic models generate content.

ACE evaluates content against a citation objective.

This distinction is critical.

Athena is not attempting to replace frontier models.

Instead, Athena is building a specialized intelligence layer that helps brands understand and improve how those models perceive, cite, and recommend content.

Validation Methodology

To evaluate ACE, Athena conducted a retrospective analysis of published content.

Dataset

The evaluation included:

  • 1,761 published articles
  • Minimum publication age of 90 days
  • ACE score available for every article
  • Observed citation outcome available for every article

The 90-day threshold was chosen to allow sufficient time for AI systems to discover and cite content.

Outcome Definition

An article was considered cited if it received at least one citation within AI-generated responses during the observation period.

This produced a binary outcome:

  • Cited
  • Not cited

Evaluation Procedure

The validation process consisted of four steps:

  1. Score each article using ACE
  2. Sort articles by ACE score
  3. Divide articles into ten equal-sized groups (deciles)
  4. Calculate observed citation rate within each group

The goal was to determine whether higher ACE scores corresponded to higher observed citation rates.

A useful prediction model should produce increasing citation rates as scores increase.
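The four steps above can be sketched in a few lines. The function name and the input layout (parallel lists of scores and citation counts) are assumptions made for illustration, and the data in the usage lines is synthetic rather than Athena's. Because 1,761 is not divisible by ten, the groups can only be approximately equal in size; the integer boundaries below would give nine groups of 176 and one of 177.

```python
def decile_citation_rates(scores: list[float],
                          citation_counts: list[int]) -> list[float]:
    """Sort by score, split into ten groups, return the cited share per group."""
    # Binary outcome: an article counts as cited if it has at least one citation.
    labeled = sorted(
        (score, 1 if count >= 1 else 0)
        for score, count in zip(scores, citation_counts)
    )
    n = len(labeled)  # assumed to be at least 10
    rates = []
    for d in range(10):
        # Integer boundaries keep group sizes within one article of each other.
        group = labeled[d * n // 10:(d + 1) * n // 10]
        rates.append(sum(cited for _, cited in group) / len(group))
    return rates


# Synthetic data for illustration only (not Athena's dataset).
scores = [i / 100 for i in range(100)]
counts = [0 if i < 50 else 2 for i in range(100)]
rates = decile_citation_rates(scores, counts)
```

The returned list runs from the lowest-scoring decile to the highest, so a useful model should produce values that rise from left to right.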

Results

ACE Validation Results: Higher ACE Scores Correspond to Higher AI Citation Rates

Figure 1. ACE Validation Results. Articles were sorted by ACE score and divided into ten equal-sized groups. The observed citation rate represents the percentage of articles that received at least one AI citation after being live for at least 90 days. Higher ACE scores corresponded to higher observed citation rates across every decile.

Key Result

Content in the highest ACE decile was cited by AI systems 87.0% of the time.

Content in the lowest ACE decile was cited 38.6% of the time.

As a result, content in the highest ACE decile was:

2.25× more likely to be cited than content in the lowest ACE decile.
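The multiplier follows directly from the two decile citation rates reported above:

```latex
\[
\frac{\text{citation rate, highest decile}}{\text{citation rate, lowest decile}}
= \frac{87.0\%}{38.6\%} \approx 2.25
\]
```

This is a ratio of rates. In absolute terms, the gap between the two deciles is 87.0 − 38.6 = 48.4 percentage points.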

At the decile level, ACE score and observed citation rate exhibited a correlation of:

0.90 (R² = 0.81)

This indicates that ACE captures a meaningful predictive signal for AI citation behavior.
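For readers who want to reproduce this kind of statistic on their own data, the sketch below computes a Pearson correlation over ten decile-level points. The note does not state which variable was placed on the score axis, so pairing each decile's mean score with its citation rate is an assumption, and the numbers in the usage lines are synthetic.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic decile points for illustration only (not the reported results).
mean_scores = [0.05 + 0.1 * d for d in range(10)]
citation_rates = [0.3 + 0.5 * s for s in mean_scores]
r = pearson_r(mean_scores, citation_rates)
r_squared = r ** 2
```

R² is the square of the correlation coefficient, which is how a correlation of 0.90 corresponds to R² = 0.81. Because the calculation uses ten aggregated points, it describes how well decile averages line up, not how well individual articles are predicted.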

Results Analysis

The most important finding is not simply that high-scoring content outperformed low-scoring content.

The more important observation is that citation rates increased consistently across every decile.

This monotonic relationship indicates that ACE is effective as a ranking and prioritization mechanism.

Higher scores correspond to higher observed citation rates.

Lower scores correspond to lower observed citation rates.
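The monotonicity property described above is easy to check mechanically once decile rates are available. The helper below is a hypothetical sketch; it treats "increased consistently" as strictly increasing, which is an interpretation rather than a definition given in this note.

```python
def monotonic_violations(rates: list[float]) -> list[int]:
    """Return 1-based decile numbers whose rate does not exceed the previous one."""
    return [
        i + 2  # pair index 0 compares deciles 1 and 2, so the later decile is i + 2
        for i, (prev, cur) in enumerate(zip(rates, rates[1:]))
        if cur <= prev
    ]


# Synthetic rates for illustration only.
clean = monotonic_violations([0.1, 0.2, 0.3, 0.4])
flat_step = monotonic_violations([0.1, 0.2, 0.2, 0.4])
```

An empty result means every decile was cited more often than the one below it.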

This makes ACE useful for production workflows where teams must decide:

  • Which content to publish
  • Which content to revise
  • Which topics deserve investment
  • Which drafts should be expanded

The score becomes especially useful above approximately 0.73, where citation rates increase sharply.

This suggests that ACE can help identify content with substantially higher expected citation performance before publication.
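One way a team might act on this is a simple triage rule. The sketch below is hypothetical: the function name and queue labels are illustrative, and treating the approximate 0.73 figure as a hard cutoff is an assumption, not a recommendation made in this note.

```python
def triage(draft_scores: dict[str, float],
           threshold: float = 0.73) -> dict[str, list[str]]:
    """Split drafts into publish and revise queues, highest score first."""
    ranked = sorted(draft_scores, key=draft_scores.get, reverse=True)
    return {
        "publish": [d for d in ranked if draft_scores[d] >= threshold],
        "revise": [d for d in ranked if draft_scores[d] < threshold],
    }


# Made-up draft names and scores for illustration only.
queues = triage({"draft-a": 0.80, "draft-b": 0.50, "draft-c": 0.75})
```

In practice the threshold would be a tunable parameter, since the point at which citation rates rise sharply may differ by category or engine.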

Proprietary IP and Defensibility

Athena’s differentiation does not come from content generation alone.

Many systems can generate content.

Athena’s advantage comes from the combination of:

  • Citation outcome data
  • Citation prediction models
  • Optimization workflows
  • Continuous feedback loops

Citation Data

Athena observes citation behavior across AI systems.

These observations create labeled examples of citation and non-citation outcomes.

This dataset forms the foundation of ACE.

Citation Prediction

ACE is trained against historical citation outcomes rather than subjective quality measures.

This allows the model to learn patterns associated with real-world AI citation behavior.

Optimization Framework

The score is embedded inside the content workflow itself.

ACE is not a reporting metric.

It is an optimization objective used during generation and refinement.

Feedback Loop

Every workflow expands Athena’s understanding of:

  • Prompt behavior
  • Citation patterns
  • Source preferences
  • Content outcomes

This creates a compounding system:

More content → More observations → Better models → Better recommendations → Better outcomes

Limitations and Future Work

This study validates predictive signal rather than causality.

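The analysis demonstrates that higher ACE scores are associated with higher citation rates. It does not establish that raising an article's ACE score causes that article to be cited more often.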

Future work will focus on:

  • Controlled intervention experiments
  • Engine-specific citation prediction
  • Prompt-level citation analysis
  • Citation frequency prediction
  • Revenue attribution from AI visibility

Additional research will investigate how citation behavior differs across:

  • ChatGPT
  • Perplexity
  • Google AI Overviews
  • Gemini
  • Claude
  • Copilot

These systems may exhibit different citation preferences that can be modeled independently.

Toward AI Search Intelligence

ACE is Athena’s first proprietary model.

The broader opportunity is AI Search Intelligence.

As AI systems become the primary interface for information discovery, companies will need infrastructure to answer four questions:

  1. How do AI systems currently represent us?
  2. Which actions improve citation and visibility?
  3. What should we create or optimize?
  4. How do those actions affect business outcomes?

Athena’s long-term vision is to build the intelligence layer that answers those questions.

ACE is the first step.

Conclusion

AI search introduces a new competitive surface.

Visibility is no longer defined solely by rankings.

Increasingly, it is defined by citation.

The Athena Citation Engine was built to model that behavior directly.

Across 1,761 published articles, ACE demonstrated a strong relationship between predicted citation likelihood and observed citation outcomes, achieving a decile-level correlation of 0.90 (R² = 0.81).

These findings suggest that AI citation behavior can be measured, predicted, and optimized.

Generic models generate content.

ACE predicts whether AI systems are likely to cite it.

That distinction forms the foundation of Athena’s approach to AI Search Intelligence.