Starter Sample
For early exploration or proof-of-concept pilots.
Starter Sample includes
- 1,000 curated NRC documents with metadata
- Document text & pre-split chunks (JSONL)
- Embeddings sample (JSONL & Parquet)
RegFind DataFabric™ is a service providing a continuously updated, citation-ready library of NRC, CNSC, IAEA, and FANR documents – delivered as raw text, clean chunks, and optional embeddings. Build your AI prototype in hours instead of quarters.
NRC, IAEA & FANR docs collected daily
Text extraction, cleanup, and detailed metadata capture
Optional chunking and vectorization
Complete or incremental JSONL files or Parquet vectors
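As a rough illustration of how a complete file and a later incremental file might be combined on your side, here is a minimal Python sketch. It assumes each JSONL line is one record carrying a `chunk_id` field, and that an incremental file contains new or revised records that should overwrite older ones with the same ID. Both the field name and the overwrite semantics are assumptions for illustration, not a documented DataFabric schema, and deletions are not modeled.

```python
import json
from typing import Dict, Iterable


def load_jsonl(lines: Iterable[str]) -> Dict[str, dict]:
    """Index JSONL records by their (assumed) `chunk_id` field."""
    records = {}
    for line in lines:
        line = line.strip()
        if line:
            rec = json.loads(line)
            records[rec["chunk_id"]] = rec
    return records


def apply_incremental(snapshot: Dict[str, dict], incremental: Dict[str, dict]) -> Dict[str, dict]:
    """Return a new index where incremental records replace or extend the snapshot."""
    merged = dict(snapshot)      # leave the original snapshot untouched
    merged.update(incremental)   # same chunk_id -> newer record wins
    return merged


# Hypothetical sample data standing in for a complete and an incremental file.
complete_lines = [
    json.dumps({"chunk_id": "doc-a#0", "text": "Original text of chunk A."}),
    json.dumps({"chunk_id": "doc-b#0", "text": "Original text of chunk B."}),
]
incremental_lines = [
    json.dumps({"chunk_id": "doc-b#0", "text": "Revised text of chunk B."}),
    json.dumps({"chunk_id": "doc-c#0", "text": "Text of a newly added chunk C."}),
]

merged = apply_incremental(load_jsonl(complete_lines), load_jsonl(incremental_lines))
```

In practice the line iterables would be open file handles rather than in-memory lists.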
No noise, duplicates, or quirky formatting – just usable text that works out of the box.
Structured for meaning, not just tokens. Chunks break cleanly at logical boundaries and include full source citations.
Built for teams building their own. Use our files or pre-chunked data with your preferred embedding model and vector store. No lock-in, no proprietary wrappers.
Regular feeds ensure your data stays in sync with the latest document releases.
We’ve built it, so we understand. We’ll guide your team through setup with direct support and proven AI integration know-how.
Structured fields like author, date, type, and more included for easier filtering and search.
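To give a sense of how structured fields can drive filtering, the following Python sketch selects records by document type and date. The record layout (a `metadata` object with `author`, `date` in ISO format, and `type`) and the sample values are hypothetical placeholders chosen for illustration; the actual field names in a delivery may differ.

```python
from datetime import date
from typing import Iterable, Iterator, Optional

# Hypothetical records; real deliveries may use different field names and values.
RECORDS = [
    {"chunk_id": "a#0", "text": "Sample text A.",
     "metadata": {"author": "Author A", "date": "2023-05-01", "type": "inspection report"}},
    {"chunk_id": "b#0", "text": "Sample text B.",
     "metadata": {"author": "Author B", "date": "2024-03-15", "type": "inspection report"}},
    {"chunk_id": "c#0", "text": "Sample text C.",
     "metadata": {"author": "Author C", "date": "2024-06-30", "type": "generic letter"}},
]


def filter_records(
    records: Iterable[dict],
    doc_type: Optional[str] = None,
    since: Optional[date] = None,
) -> Iterator[dict]:
    """Yield records matching the given type and dated on or after `since`."""
    for rec in records:
        meta = rec.get("metadata", {})
        if doc_type is not None and meta.get("type") != doc_type:
            continue
        if since is not None and date.fromisoformat(meta["date"]) < since:
            continue
        yield rec


recent_reports = list(
    filter_records(RECORDS, doc_type="inspection report", since=date(2024, 1, 1))
)
```

The same predicate could be applied before embedding, so that only the relevant subset is sent to a vector store.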
RegFind DataFabric is a service that transforms fragmented public regulatory content into structured, searchable, and vectorized formats built for AI.
For early exploration or proof-of-concept pilots.
For teams building GenAI tools using NRC data.
For global organizations needing multi-jurisdictional coverage.
We're regularly expanding our available libraries and will update customers and this page as new data sources become available.
Need something else? Contact us and we'll get it sorted.
Yes. We provide raw text with metadata and stable chunk IDs in a JSONL file—so you can embed using OpenAI, BGE, e5, or any other model. You’re in full control.
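A minimal sketch of what that could look like in Python: read chunk records from JSONL and pass each chunk's text to whatever embedding function you choose, keyed by chunk ID. The field names `chunk_id` and `text` are assumed for illustration, and `toy_embed` is a deterministic stand-in so the example runs without any external model; you would replace it with a call to your own embedding model.

```python
import hashlib
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List


def read_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one chunk record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Placeholder embedder (NOT a real model): floats derived from a SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def embed_chunks(
    records: Iterable[dict],
    embed: Callable[[str], List[float]] = toy_embed,
) -> Dict[str, List[float]]:
    """Map each (assumed) `chunk_id` to the vector produced by `embed`."""
    return {rec["chunk_id"]: embed(rec["text"]) for rec in records}


# Hypothetical two-line JSONL sample; a real file would be opened with open(path).
SAMPLE = "\n".join([
    json.dumps({"chunk_id": "doc-001#0", "text": "First sample chunk."}),
    json.dumps({"chunk_id": "doc-001#1", "text": "Second sample chunk."}),
])

vectors = embed_chunks(read_chunks(io.StringIO(SAMPLE)))
```

Because the vectors are keyed by chunk ID, they can be re-joined with the original text and metadata in any vector store.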
We support delivery via S3, Azure Blob, or direct file transfer. You’ll receive document text and chunks as JSONL, optional vectors (JSONL or Parquet), and full metadata, all ready for immediate use in your stack.
Yes. You retain full control—store, search, and embed the data within your own environment, whether cloud, hybrid, or fully air-gapped.
Documents from the NRC and our other libraries are added regularly, so your data stays up to date.
RegFind DataFabric is a professional service that processes and structures publicly available regulatory documents for use in AI workflows, compliance research, and enterprise knowledge applications.
This service does not sell or republish the documents themselves. Instead, RegFind DataFabric extracts, transforms, and organizes content from publicly accessible sources (e.g., NRC ADAMS, IAEA publications, and others) to enable easier integration with machine learning systems or search tools. The underlying documents remain the intellectual property of their respective rights holders and are linked or cited for reference only.
RegFind is not affiliated with, endorsed by, or acting on behalf of the U.S. Nuclear Regulatory Commission (USNRC), the International Atomic Energy Agency (IAEA), or any other regulatory agency. All organization names, logos, and trademarks referenced are the property of their respective owners.