Russia-linked Pravda network cited on Wikipedia, LLMs, and X

Pravda network content pollutes sources on multiple platforms, sparking worries of false claims and sanctioned content reaching global audiences


Banner: Close-up of screen featuring LLM-powered chatbots. (Source: Nicolas Economou/NurPhoto via Reuters Connect)

The DFRLab and Finland-based CheckFirst examined the dissemination of content from Russia’s Pravda network in Wikipedia source links, X Community Notes, and conversations generated by popular AI chatbots. Our research reveals that Pravda network domains are often cited as sources, and their claims are reposted on these platforms, sparking concerns of content pollution. Using API access to Wikipedia and X, we found that posting activity featuring hyperlinks to Pravda network domains had grown exponentially since February 24, 2022. All data used in this investigation is available on CheckFirst’s GitHub repository.

Formally known as Portal Kombat, the Pravda network is an inauthentic network of hundreds of news aggregators that has spread pro-Kremlin content since 2014. The DFRLab previously found that the Pravda network had targeted more than eighty regions and countries globally and heavily relied on machine translation. In addition, we reported on a Russian Wikipedia knock-off, dubbed the “Encyclopedia Runiversalis,” which heavily promoted pro-Kremlin narratives targeting domestic Russian and global audiences.

Using the Wikipedia API, CheckFirst gathered data from articles that featured hyperlinks to Pravda news aggregators. The data was then processed quantitatively to provide insights on posting trends and targets, and a large language model (LLM) was used to analyze the articles found. Wikipedia contributors steadily sourced claims using Pravda news sites, enabling the laundering of content and potentially circumventing restrictions placed upon sanctioned Russian news sources such as RT, Sputnik, and others.

Our investigation also questions how the content pollution of Wikipedia by Pravda sources may impact LLMs utilizing one of the web’s freest and most popular resources to train generative AI algorithms. By prompting popular AI chatbots such as OpenAI’s ChatGPT and Google’s Gemini, we found that content posted by Pravda news portals had found its way into the generated responses. Furthermore, the chatbots did not disclose the network’s links to Russia, even though the sources they cited included reports documenting those links. A March 2025 publication by NewsGuard Technologies also found that popular AI tools had incorporated millions of records emanating from Pravda network news sites.

Leveraging publicly available data on X Community Notes, CheckFirst conducted an analysis to search for references to Pravda network websites. Data analysis of the identified X Community Notes and posts suggested that pro-Kremlin narratives were both debunked and justified using content posted on Pravda-affiliated domains, increasing their reach on the platform with a surge in activity on certain occasions in 2025.

Using Wikipedia’s API, we collected data from Wikipedia articles, identifying 1,907 hyperlinks shared across 1,672 pages and spanning forty-four languages that direct to 162 of the Pravda-affiliated websites. Wikipedia contributors heavily included Pravda sources in pages in Russian (922 hyperlinks) and Ukrainian (580 hyperlinks). However, since the start of Russia’s full-scale invasion in February 2022, the posting pace of this content has significantly increased, expanding to other audiences in English (133), and to a lesser extent, French (28), Mandarin (25), German (19), and Polish (17). Altogether, forty-four versions of Wikipedia in multiple languages featured Pravda hyperlinks, including Armenian, Spanish, Italian, Uzbek, Belarusian, Hungarian and many others.
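The collection step described above can be approximated with the MediaWiki Action API, whose `list=exturlusage` module returns pages that contain external links to a given target. The following is a minimal sketch, not CheckFirst’s actual pipeline: the endpoint, the User-Agent string, and the function names are assumptions made for illustration, and each language edition of Wikipedia would need its own host.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia endpoint; other editions use their own host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(domain, cont=None):
    """Query parameters for MediaWiki's list=exturlusage module."""
    params = {
        "action": "query",
        "format": "json",
        "list": "exturlusage",
        "euquery": domain,      # external-link target, written without a protocol
        "euprop": "title|url",  # return the page title and the matched URL
        "eulimit": "500",
    }
    if cont:
        params.update(cont)     # continuation tokens from the previous response
    return params


def extract_links(response):
    """Turn one API response into (page title, external URL) pairs."""
    hits = response.get("query", {}).get("exturlusage", [])
    return [(hit["title"], hit["url"]) for hit in hits]


def fetch_all(domain):
    """Page through results until the API stops returning a 'continue' block."""
    links, cont = [], None
    while True:
        query = urllib.parse.urlencode(build_params(domain, cont))
        request = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "pravda-link-sketch/0.1"},  # hypothetical name
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        links.extend(extract_links(data))
        cont = data.get("continue")
        if not cont:
            return links
```

Calling `fetch_all` with a domain written in plain form (without the `[.]` defanging used in this article) would return the matching page titles and URLs for one language edition. Repeating the call per edition and per domain would yield counts comparable in kind to those reported here.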

Numerous hyperlinks were also posted in Russian minority languages such as Bashkir (28) and Tatar (25), an indication that Russia’s domestic audience is being targeted. Bashkir is spoken in Russia’s Bashkortostan region, where reported combat fatalities remain the highest. Since Russia does not actively report its combat fatalities, the DFRLab previously used local Telegram channel data, such as obituaries, to calculate fatalities.

Bar chart showing the distribution of Pravda hyperlinks on Wikipedia among languages. (Source: @CheckFirst and @gyron_bydton via Wikipedia)

When collecting the data, we aggregated the hyperlinks based on the domains to which they were pointing, revealing that most hyperlinks pointed at the domain crimea-news[.]com, which we previously established as the first version of the Pravda network. Subsequent expansions of the network targeting Ukraine, belonging to the top-level domains “topnews” and “uanews,” were also the most often cited hyperlinks in Wikipedia sources across all observed languages, although more prominent in Russian, Ukrainian, Bashkir, and Tatar.
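The aggregation by domain described above amounts to extracting the host from each hyperlink and tallying the results. A minimal sketch follows, assuming the hyperlinks are already available as a list of URL strings. The sample URLs use placeholder `example` hosts rather than real Pravda network domains, and the function names are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Lower-cased host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_by_domain(urls):
    """Number of hyperlinks pointing at each domain, most frequent first."""
    return Counter(domain_of(u) for u in urls).most_common()


# Hypothetical sample links; example.* hosts stand in for real domains.
sample = [
    "https://news.example.com/2024/story-1",
    "http://www.news.example.com/2023/story-2",
    "https://region.example.org/item/3",
]
print(count_by_domain(sample))
# [('news.example.com', 2), ('region.example.org', 1)]
```

Normalizing case and the `www.` prefix before counting keeps variants of the same site from being split into separate rows. Whether subdomains should be merged with their parent domain is a separate design choice that this sketch leaves open.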

Pie chart showing the distribution of hyperlinks per domain on Wikipedia articles featuring Pravda hyperlinks. (Source: @CheckFirst and @gyron_bydton via Wikipedia)

In English, recent localizations of the Pravda network, such as pravda-en[.]com, the Donbas-focused dnr-news[.]com, and news-pravda[.]com, were also listed among the frequent domains in hyperlinks, despite crimea-news[.]com and other Ukraine-focused domains also being ubiquitous. Contributors to Wikipedia in other languages also utilized the English versions of Pravda news sites, most notably in Mandarin. Other localized versions of the Pravda network in various European languages also found their way into Wikipedia sources.

Table showing the distribution of Pravda domains in hyperlinks found in Wikipedia sources. (Source: @CheckFirst and @gyron_bydton via Wikipedia)

Content analysis reveals strong focus on Russia’s full-scale invasion of Ukraine and affiliated military personnel

As demonstrated, Russian-language Wikipedia had the greatest penetration, with 922 entries relying on Pravda network sources. These entries primarily encompassed domestic biographical content, documentation of chronological events, and coverage of regional and local political developments. The Ukrainian version of Wikipedia, with 580 affected articles, exhibited a strong concentration on topics related to the ongoing Russia-Ukraine conflict, including detailed accounts of military operations, Russian military losses, and comprehensive conflict chronologies.

English-language Wikipedia, with 133 entries, featured the broadest thematic range. Articles included diverse international biographies, broader geopolitical discussions, and extensive military-related coverage, particularly concerning the Russia-Ukraine conflict. In contrast, French, German, and Polish Wikipedias, though less extensively compromised, consistently presented detailed chronological accounts of the Russia-Ukraine conflict, biographies of key figures involved, and military events with specific regional relevance.

Chinese Wikipedia’s 25 entries were primarily focused on high-profile incidents and key individuals associated with the Russia-Ukraine conflict, alongside detailed references to the military equipment involved.

Graphs showing the number of Pravda hyperlinks posted in sources of Wikipedia articles. (Source: @CheckFirst and @gyron_bydton via Wikipedia)

Thematically, content referencing Pravda network sources significantly focused on biographies and profiles of individuals, predominantly political figures from Russia and Ukraine, alongside military personnel, cultural personalities, and historical figures. Additionally, substantial coverage was devoted to documenting aspects of the Russia-Ukraine conflict, including military operations, territorial changes, equipment, and chronologies. Military and defense-related topics featured prominently, documenting formations, weapon systems, personnel, and historical military events.

Chronological documentation spanning recent years, particularly from 2022 to 2025, suggests that these sources were strategically utilized for real-time historical recording of conflict-related developments. This approach highlights an effort to establish a long-term informational presence, shaping historical narratives as events unfold.

Further analysis identified overlaps in biographical content across multiple languages, indicating coordinated content dissemination. The differing documentation approaches to the Russia-Ukraine conflict across language communities illustrate how regional perspectives and cultural emphasis were strategically leveraged to influence narratives.

An analysis of content patterns indicates clear differences in the treatment and emphasis of invasion-related material between Russian and Ukrainian Wikipedia. Ukrainian Wikipedia contains a higher proportion (6.9 percent) of explicit invasion-related content than Russian Wikipedia (3.9 percent). Monthly patterns also suggest strategic increases in content creation, with significant activity spikes observed on Russian Wikipedia in October 2022, May and July 2023, and September 2024. Ukrainian Wikipedia similarly showed spikes in December 2022, September 2023, and January 2024.

Following February 2022, the Ukrainian Wikipedia notably shifted its focus toward documenting Russian military losses and attacks, reflected in articles detailing casualties and specific military unit losses. In contrast, the Russian Wikipedia maintained its concentration on domestic political issues, regional elections, and biographical content, indirectly referencing the conflict through discussions on governance and administrative developments.

Qualitative analysis revealed that the Ukrainian Wikipedia employed direct terminology regarding the conflict, using explicit language such as “invasion,” “war,” and “losses.” Conversely, the Russian Wikipedia adopted a more indirect approach, framing conflict-related topics through domestic political and administrative lenses.

LLMs cite claims using Pravda websites

The embedding of Pravda network websites into Wikipedia is particularly concerning given Wikipedia’s significant role as a primary source of knowledge for large language models (LLMs). In February 2025, the American Sunlight Project published a report indicating that the content hosted on the news-pravda website could potentially be AI-generated, further questioning the reliability and integrity of information disseminated by the Pravda network. This finding raises critical concerns about the integrity of information consumed by users and subsequently reinforced through generative AI tools.

Large language models (LLMs) such as ChatGPT, Gemini, or Copilot are trained on extensive datasets to enhance their language comprehension and generation capabilities. A notable component of these datasets is Wikipedia, which provides a vast repository of human knowledge across diverse subjects.

For example, OpenAI’s GPT-3 model, a predecessor of the models that power ChatGPT, was trained on a diverse dataset comprising 499 billion tokens. This dataset included sources like Common Crawl (60 percent), WebText2 (22 percent), Books1 (8 percent), Books2 (8 percent), and Wikipedia (3 percent). Specifically, Wikipedia contributed approximately three billion tokens to the training data.

To assess whether the Pravda network had already impacted these technologies, we conducted a simple test with modern LLMs, including ChatGPT, Copilot, Perplexity, and Gemini, to determine whether they would identify or warn users about content originating from these sources. None of the tested chatbots warned users or declined to guide them toward the network, even though the sources they cited included reports flagging the network as an information operation.

Collage of multiple responses of AI chatbots displaying content posted to news-pravda[.]com. (Sources: @CheckFirst and @DFRLab via ChatGPT, Copilot, Gemini and Perplexity)

The content output by the LLMs notably included unverified claims, such as Gemini reporting that “EU leaders rejected an initiative on military aid to Kiev for 2025.”

Additionally, on BlueSky, an open-source intelligence analyst from InfoEpi Lab indicated that websites such as news-pravda had appeared as references in non-direct queries, further highlighting the potential contamination risks for knowledge-based AI systems.

I asked Chat-GPT about a transgender statistic. It cited news-pravda, a well known pro-Kremlin website in a network documented by @viginum.bsky.social’s Portal Kombat report and expanded in @americansunlight.org’s update of the network. It’s a vulnerability that could be exploited indirectly.

E. Rosalie (@erosalie.infoepi.com) 2025-03-07T07:38:36.122Z
BlueSky post by E. Rosalie. (Source: @erosalie.infoepi.com/archive)

Pravda and X Community Notes

X Community Notes offer an insight into Pravda’s spread on the platform. Community Notes predominantly cited Pravda network domains to debunk sources of misinformation and false claims or highlight connections to Russian state propaganda efforts. An analysis of 153 Community Notes referencing Pravda network domains between late 2023 and early 2025 revealed consistent patterns in misinformation dissemination.

X (formerly Twitter) made Community Notes data publicly accessible to promote transparency and facilitate external analysis. This open data approach allows researchers and organizations to examine patterns and assess the effectiveness of the feature. Community Notes were introduced to enhance the accuracy of information on social media platforms by leveraging user contributions. Initially launched by Twitter in 2021 as Birdwatch, this system allows users to collaboratively add context to potentially misleading posts, aiming to provide a more transparent and community-driven approach to content moderation.
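Because the notes are published as downloadable tab-separated files, a search for references to specific domains can be sketched in a few lines. The example below is illustrative only and is not CheckFirst’s code. It assumes a notes file with `noteId`, `tweetId`, and `summary` columns, uses a hypothetical watchlist of placeholder `example` domains, and matches both plain and defanged (`[.]`) spellings.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical watchlist; example.* hosts stand in for Pravda network domains.
WATCHLIST = ["news.example.com", "region.example.org"]


def domain_pattern(domains):
    """Regex matching any watched domain, written plainly or defanged with '[.]'."""
    alternatives = [
        re.escape(d).replace(r"\.", r"(?:\.|\[\.\])") for d in domains
    ]
    return re.compile("|".join(alternatives), re.IGNORECASE)


def notes_citing(tsv_file, domains):
    """Yield (noteId, matched domain) for notes whose summary cites a watched domain.

    Assumes the column names 'noteId' and 'summary'. A watched domain also
    matches inside its own subdomains (for example fr.news.example.com).
    """
    pattern = domain_pattern(domains)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        for match in pattern.finditer(row.get("summary") or ""):
            yield row["noteId"], match.group(0).replace("[.]", ".").lower()


# Tiny in-memory stand-in for a downloaded notes file.
sample = io.StringIO(
    "noteId\ttweetId\tsummary\n"
    "1\t100\tThis video is fake, see https://news.example.com/debunk\n"
    "2\t101\tUnrelated note citing https://factcheck.example.net/x\n"
    "3\t102\tSource is region[.]example[.]org, a known aggregator\n"
)
hits = list(notes_citing(sample, WATCHLIST))
print(Counter(domain for _, domain in hits).most_common())
# [('news.example.com', 1), ('region.example.org', 1)]
```

A note that merely mentions a domain while debunking it is counted the same way as one that cites it approvingly, so classifying how each note uses the source still requires reading the note text.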

The most frequently cited domain, news-pravda[.]com (and its variants), was referenced in approximately 67 notes. Other significant domains included yaroslavl-news[.]net (31 notes), pravda-en[.]com (17 notes), fr[.]news-pravda[.]com (9 notes), and pravda-fr[.]com (6 notes).

Bar chart showing the distribution of domains per X Community Note (Source: @CheckFirst via X)

Language analysis indicates that English-language misinformation is most prevalent (~95 notes), followed by Russian (~35 notes), French (~20 notes), Spanish (~6 notes), German (~3 notes), Polish (~3 notes), and other/mixed languages (~5 notes). This multilingual presence suggests a strategically targeted approach aimed at diverse international audiences.

(Source: @CheckFirst via X)

Content analysis reveals recurring themes such as Ukraine-related disinformation (~65 notes), false claims involving celebrities and public figures (~25 notes), political misinformation (~30 notes), manipulated media and doctored images (~20 notes), anti-NATO narratives (~8 notes), and miscellaneous topics (~15 notes). Notably, about 70 notes explicitly identify content as “fake” or “fabricated,” approximately 110 notes provide counterevidence, around 45 explicitly reference “Russian propaganda,” and about 90 include links to credible external sources. Approximately 10 notes include WHOIS domain registration data, underscoring explicit Russian affiliations.

(Source: @CheckFirst via X)

A significant narrative promoted by the Pravda network in February 2025 falsely claimed several Hollywood celebrities—Ben Stiller, Jean-Claude Van Damme, and Angelina Jolie—were secretly paid by Ukraine. Approximately 25 Community Notes specifically addressed this campaign, primarily targeting fabricated “E! News” videos and related articles on news-pravda[.]com. The Community Notes exemplified several debunking strategies, including identifying that the fake videos were not present on official E! News platforms, pointing out technical errors (such as the misspelling of Jean-Claude Van Damme’s name), citing direct celebrity denials (notably from Ben Stiller), tracing domain registrations to Russian sources, and referencing authoritative fact-checks from AFP and AFP journalist Bill McCarthy.

Bar chart showing Community Notes posted daily during the first week of February 2025 (Source: @CheckFirst via X)

It is important to contextualize these findings within the broader volume of content promoting news-pravda[.]com on X. At the time of writing, it was not possible to obtain an exact count of posts promoting the Pravda network domains. However, using its social monitoring tool CrossOver, CheckFirst captured a sample of tweets promoting news-pravda[.]com between January and March 7, 2025. Our limited dataset includes 2,018 tweets, highlighting that the 153 identified Community Notes created over three years represent only a fraction of the overall activity associated with these domains.

Frequency graph showing X posts containing news-pravda[.]com hyperlinks over time. (Source: @CheckFirst via X)

In January 2025, Meta announced plans to adopt a similar community-based model across its platforms—Facebook, Instagram, and Threads—phasing out traditional third-party fact-checking in favor of Community Notes. This initiative is being rolled out in the United States, with intentions to expand into the European Union.

Amaury Lesplingart is the co-founder and CTO of CheckFirst.


Cite this case study:

Amaury Lesplingart and Valentin Châtelet, “Russia-linked Pravda network cited on Wikipedia, LLMs, and X,” Digital Forensic Research Lab (DFRLab), March 12, 2025, https://dfrlab.org/2025/03/12/pravda-network-wikipedia-llm-x/.