Build nuclear AI in six clicks instead of six months.

RegFind DataFabric™ is a service providing a continuously updated, citation-ready library of NRC, CNSC, IAEA, and FANR documents – delivered as raw text, clean chunks, and optional embeddings. Build your AI prototype in hours instead of quarters.

DIY Scraping

  • ❌ 6 months of dev time
  • ❌ Missing metadata and duplication errors
  • ❌ Weekly doc hunts and pipeline maintenance
  • ❌ Open-ended budget

RegFind DataFabric

  • ✅ 30-minute download
  • ✅ Cleaned and complete data
  • ✅ Regular update feed
  • ✅ Flat annual subscription

Focus your engineers on delivering promised AI value, not sourcing data.

How it works

We handle the data so you can focus on building your product.
 

Here's what we take care of to jumpstart your project and keep it running smoothly:

1. Ingest

NRC, CNSC, IAEA & FANR documents collected daily

2. Clean & Enrich

Text extraction, cleanup, and detailed metadata capture

3. Vectorize

Optional chunking and vectorization

4. Deliver

Complete or incremental JSONL files or Parquet vectors
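As a rough illustration of how a complete file and a later incremental file might be combined on your side, here is a minimal Python sketch. It assumes each JSONL record carries a stable ID in a field called `chunk_id` and its text in `text`; both field names, the sample IDs, and the upsert-only behavior (no deletions) are assumptions for illustration, not a description of the actual delivery schema.

```python
import json

def read_jsonl(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def apply_incremental(snapshot, increment, key="chunk_id"):
    """Upsert incremental records into a snapshot, keyed by a stable ID.

    `chunk_id` is an assumed field name; deletions are not modeled here.
    """
    merged = {rec[key]: rec for rec in snapshot}
    for rec in increment:
        merged[rec[key]] = rec  # a newer record replaces the older one
    return list(merged.values())

# Tiny in-memory stand-ins for a complete file and an incremental file.
full_lines = [
    '{"chunk_id": "doc1-0001", "text": "Original text."}',
    '{"chunk_id": "doc1-0002", "text": "Second chunk."}',
]
incremental_lines = [
    '{"chunk_id": "doc1-0001", "text": "Revised text."}',
    '{"chunk_id": "doc2-0001", "text": "New document chunk."}',
]

merged = apply_incremental(read_jsonl(full_lines), read_jsonl(incremental_lines))
for rec in merged:
    print(rec["chunk_id"], "->", rec["text"])
```

In practice the same pattern works with files on disk: pass an open file handle to `read_jsonl` instead of a list of strings.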

Streamline your AI data pipeline

Clean data

No noise, duplicates, or quirky formatting – just usable text that works out of the box.

Citation-perfect chunks

Structured for meaning, not just tokens. Chunks break cleanly at logical boundaries and include full source citations.

Plug into any stack

Built for teams building their own. Use our files or pre-chunked data with your preferred embedding model and vector store. No lock-in, no proprietary wrappers.

Always up to date

Regular feeds ensure your data stays in sync with the latest document releases.

Developer onboarding

We’ve built it, so we understand. We’ll guide your team through setup with direct support and proven AI integration know-how.

Document metadata

Structured fields like author, date, type, and more included for easier filtering and search.
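To give a feel for metadata-based filtering, the sketch below selects records by document type and date. The key names (`author`, `date`, `type`), the ISO date format, and the sample records are all made up for illustration; the real field names and values may differ.

```python
from datetime import date

# Made-up sample records; the keys are assumed, not the actual schema.
records = [
    {"title": "Example Inspection Report", "author": "Author A",
     "date": "2023-05-14", "type": "inspection_report"},
    {"title": "Example Guidance Document", "author": "Author B",
     "date": "2021-11-02", "type": "guidance"},
    {"title": "Older Inspection Report", "author": "Author A",
     "date": "2019-03-30", "type": "inspection_report"},
]

def filter_records(records, doc_type=None, since=None):
    """Keep records matching a document type and dated on or after `since`."""
    selected = []
    for rec in records:
        if doc_type is not None and rec.get("type") != doc_type:
            continue
        if since is not None and date.fromisoformat(rec["date"]) < since:
            continue
        selected.append(rec)
    return selected

recent_reports = filter_records(
    records, doc_type="inspection_report", since=date(2023, 1, 1)
)
for rec in recent_reports:
    print(rec["date"], rec["title"])
```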

Predictable and Flexible Pricing

RegFind DataFabric is a service that transforms fragmented public regulatory content into structured, searchable, and vectorized formats built for AI.

Starter Sample

For early exploration or proof-of-concept pilots.

Free

Starter Sample includes

  • 1,000 curated NRC documents with metadata
  • Document text & pre-split chunks (JSONL)
  • Embeddings sample (JSONL & Parquet)

Core U.S. Edition

For teams building GenAI tools using NRC data.

$60k / fleet / year

Core U.S. Edition includes

  • Full NRC corpus (3M+ docs) with full metadata
  • Regular updates
  • JSONL, chunks, or Parquet files
  • Optional embeddings (+$25k)

International Edition

For global organizations needing multi-jurisdictional coverage.

$75k / fleet / year

International Edition includes

  • Full NRC corpus plus IAEA documents (+3.7K docs), with full metadata
  • Regular updates
  • JSONL, chunks, or Parquet files
  • Optional embeddings (+$30k)


We're regularly expanding our available libraries and will update customers and this page as new data sources become available.

  • CNSC documents are also available: add the Canada service pack for $20k/fleet/yr.
  • FANR documents are also available: add the UAE service pack for $5k/fleet/yr.

Need something else? Contact us and we'll get it sorted.

Works with your stack, your way


Non-proprietary, no lock-in. Built for flexibility.

RegFind integrates cleanly with your infrastructure—cloud, hybrid, or air-gapped—and meets your data governance requirements from day one.

Ready to accelerate your nuclear AI projects?

Frequently asked questions

Can we use our own embedding model?

Yes. We provide raw text with metadata and stable chunk IDs in a JSONL file—so you can embed using OpenAI, BGE, e5, or any other model. You’re in full control.
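As a hedged sketch of what bringing your own model could look like, the Python below reads chunk records from JSONL lines, embeds them in batches, and keys the resulting vectors by chunk ID. The field names `chunk_id` and `text` and the sample IDs are assumptions, and `toy_embed_batch` is a hash-based placeholder with no semantic meaning: swap in a call to whichever embedding model you actually use.

```python
import hashlib
import json

def toy_embed_batch(texts, dim=8):
    """Placeholder embedder: deterministic pseudo-vectors from a hash.

    Stands in for a real model call; the values carry no meaning.
    """
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([byte / 255.0 for byte in digest[:dim]])
    return vectors

def embed_chunks(jsonl_lines, embed_batch, batch_size=2):
    """Embed chunk texts in batches and key each vector by its chunk ID."""
    chunks = [json.loads(line) for line in jsonl_lines if line.strip()]
    vectors_by_id = {}
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors = embed_batch([chunk["text"] for chunk in batch])
        for chunk, vector in zip(batch, vectors):
            vectors_by_id[chunk["chunk_id"]] = vector
    return vectors_by_id

# In-memory stand-in for a delivered JSONL file (assumed field names).
sample_lines = [
    '{"chunk_id": "doc1-0001", "text": "First sample chunk."}',
    '{"chunk_id": "doc1-0002", "text": "Second sample chunk."}',
    '{"chunk_id": "doc2-0001", "text": "Third sample chunk."}',
]

vectors_by_id = embed_chunks(sample_lines, toy_embed_batch)
print(len(vectors_by_id), "chunks embedded")
```

Keying vectors by a stable chunk ID means you can re-embed with a different model later and replace vectors in your store without re-linking them to their source text.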

How is the data delivered?

We support delivery via S3, Azure Blob, or direct file transfer. You’ll receive JSONL files, chunks, and optional vectors (JSONL or Parquet), all with full metadata and ready for immediate use in your stack.

Can we host everything ourselves?

Yes. You retain full control—store, search, and embed the data within your own environment, whether cloud, hybrid, or fully air-gapped.

How fresh is the data?

Source documents are collected daily, and new NRC and other library documents reach you through regular update feeds, so your copy stays in sync with the latest releases.

How does RegFind handle public regulatory documents?

RegFind DataFabric is a professional service that processes and structures publicly available regulatory documents for use in AI workflows, compliance research, and enterprise knowledge applications.

This service does not sell or republish the documents themselves. Instead, RegFind DataFabric extracts, transforms, and organizes content from publicly accessible sources (e.g., NRC ADAMS, IAEA publications, and others) to enable easier integration with machine learning systems or search tools. The underlying documents remain the intellectual property of their respective rights holders and are linked or cited for reference only.

RegFind is not affiliated with, endorsed by, or acting on behalf of the U.S. Nuclear Regulatory Commission (USNRC), the International Atomic Energy Agency (IAEA), or any other regulatory agency. All organization names, logos, and trademarks referenced are the property of their respective owners.