RegFind

Clean data

No noise, duplicates, or quirky formatting – just usable text that works out of the box.

Citation-perfect chunks

Structured for meaning, not just tokens. Chunks break cleanly at logical boundaries and include full source citations.

Plug into any stack

Built for teams that build their own stack. Use our files or pre-chunked data with your preferred embedding model and vector store. No lock-in, no proprietary wrappers.

Always up to date

Regular feeds ensure your data stays in sync with the latest document releases.

Developer onboarding

We’ve built it, so we understand. We’ll guide your team through setup with direct support and proven AI integration know-how.

Document metadata

Structured fields such as author, date, and type are included for easier filtering and search.
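As a rough illustration of how structured metadata can drive filtering, here is a minimal Python sketch. The field names (`doc_id`, `author`, `date`, `type`), the sample records, and the helpers `load_records` and `filter_records` are assumptions made up for this example, not the actual delivered schema.

```python
import io
import json
from datetime import date

# Hypothetical records: field names and values are assumptions, not the real schema.
SAMPLE_JSONL = """\
{"doc_id": "doc-001", "author": "Staff A", "date": "2021-03-15", "type": "Inspection Report"}
{"doc_id": "doc-002", "author": "Staff B", "date": "2019-11-02", "type": "Letter"}
{"doc_id": "doc-003", "author": "Staff A", "date": "2023-06-30", "type": "Inspection Report"}
"""

def load_records(stream):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]

def filter_records(records, doc_type=None, since=None):
    """Keep records matching an optional type and an optional earliest date."""
    kept = []
    for rec in records:
        if doc_type is not None and rec.get("type") != doc_type:
            continue
        if since is not None and date.fromisoformat(rec["date"]) < since:
            continue
        kept.append(rec)
    return kept

records = load_records(io.StringIO(SAMPLE_JSONL))
recent_reports = filter_records(
    records, doc_type="Inspection Report", since=date(2020, 1, 1)
)
print([rec["doc_id"] for rec in recent_reports])  # ['doc-001', 'doc-003']
```

The same predicate could be applied before embedding or indexing, so that only the relevant slice of the corpus is loaded into a vector store.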

Starter Sample

For early exploration or proof-of-concept pilots.

Free

Starter Sample includes

1,000 curated NRC documents with metadata
Document text & pre-split chunks (JSONL)
Embeddings sample (JSONL & Parquet)

Get a sample

Core U.S. Edition

For teams building GenAI tools using NRC data.

$60k / fleet / year

Core U.S. Edition includes

Full NRC corpus (3M+ docs) with full metadata
Regular updates
JSONL, chunks, or Parquet files
Optional embeddings (+$25k)

Get started

International Edition

For global organizations needing multi-jurisdictional coverage.

$75k / fleet / year

International Edition includes

NRC + IAEA (+3.7K docs) with full metadata
Regular updates
JSONL, chunks, or Parquet files
Optional embeddings (+$30k)

Get started

Can we use our own embedding model?

Yes. We provide raw text with metadata and stable chunk IDs in a JSONL file, so you can embed with OpenAI, BGE, e5, or any other model. You’re in full control.
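To make this concrete, here is a minimal Python sketch of embedding chunks from a JSONL file with a model of your choice. The field names `chunk_id` and `text`, the sample records, and the helpers `toy_embed` and `embed_chunks` are hypothetical; `toy_embed` is only a stand-in for a real embedding model and produces no meaningful semantics.

```python
import hashlib
import json
import math

# Hypothetical chunk records; "chunk_id" and "text" are assumed field names.
CHUNKS_JSONL = """\
{"chunk_id": "doc-001#0", "text": "Reactor coolant system inspection summary."}
{"chunk_id": "doc-001#1", "text": "Corrective actions and follow-up schedule."}
"""

def toy_embed(text, dim=8):
    """Placeholder embedder: hashes tokens into a fixed-size, L2-normalized vector.

    Swap in a call to your real embedding model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def embed_chunks(jsonl_text, embed_fn):
    """Return a dict mapping each chunk ID to its embedding vector."""
    index = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        index[record["chunk_id"]] = embed_fn(record["text"])
    return index

index = embed_chunks(CHUNKS_JSONL, toy_embed)
print(sorted(index))  # ['doc-001#0', 'doc-001#1']
```

Because the vectors are keyed by chunk ID, switching models later would only mean re-running `embed_chunks` with a different `embed_fn`; anything that refers to chunks by ID stays valid.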

How is the data delivered?

We support delivery via S3, Azure Blob, or direct file transfer. You’ll receive JSONL files, pre-split chunks, and optional vectors (JSONL or Parquet), along with full metadata, ready for immediate use in your stack.

Can we host everything ourselves?

Yes. You retain full control: store, search, and embed the data within your own environment, whether cloud, hybrid, or fully air-gapped.

How fresh is the data?

New NRC documents and documents from other libraries are added regularly to keep the corpus up to date.

How does RegFind handle public regulatory documents?

RegFind DataFabric is a professional service that processes and structures publicly available regulatory documents for use in AI workflows, compliance research, and enterprise knowledge applications.

This service does not sell or republish the documents themselves. Instead, RegFind DataFabric extracts, transforms, and organizes content from publicly accessible sources (e.g., NRC ADAMS, IAEA publications, and others) to enable easier integration with machine learning systems or search tools. The underlying documents remain the intellectual property of their respective rights holders and are linked or cited for reference only.

RegFind is not affiliated with, endorsed by, or acting on behalf of the U.S. Nuclear Regulatory Commission (USNRC), the International Atomic Energy Agency (IAEA), or any other regulatory agency. All organization names, logos, and trademarks referenced are the property of their respective owners.

Build nuclear AI in six clicks instead of six months.

DIY Scraping

RegFind DataFabric

Focus your engineers on delivering promised AI value, not sourcing data.

How it works

Here's what we take care of to jumpstart your project and keep it running smoothly:

1. Ingest

2. Clean & Enrich

3. Vectorize

4. Deliver

Streamline your AI data pipeline

Predictable and Flexible Pricing

Works with your stack, your way

Non-proprietary, no lock-in. Built for flexibility.

Ready to accelerate your nuclear AI projects?

Frequently asked questions