
NuExtract
About
NuExtract is a family of lightweight, open-source text-to-JSON large language models (LLMs) developed by NuMind for structured information extraction. The models convert unstructured text into structured JSON, which makes them well suited to a wide range of data extraction tasks. The family spans several sizes, from NuExtract-tiny with 0.5 billion parameters to NuExtract-large with 7 billion parameters. The latest iteration, NuExtract 1.5, adds multilingual support, can process documents of any length, and surpasses larger models such as GPT-4o on certain benchmarks. The models are trained on a proprietary, high-quality synthetic dataset and released under the MIT license. They can be used zero-shot or fine-tuned for a specific task.
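
To make the text-to-JSON idea concrete, here is a minimal Python sketch of how a JSON template and a passage of text might be combined into an extraction prompt, and how the generated JSON could be parsed back. The marker strings (`<|input|>`, `### Template:`, `### Text:`, `<|output|>`) are an assumption modeled on the published model cards and may differ between versions. The helper names `build_prompt` and `parse_output` are hypothetical, and the "generation" is a hand-written stand-in rather than real model output.

```python
import json


def build_prompt(template: dict, text: str) -> str:
    """Combine a JSON template and input text into a single prompt string.

    The marker layout is an assumption; confirm it against the model card
    of the NuExtract version you actually use.
    """
    template_json = json.dumps(template, indent=4)
    return (
        "<|input|>\n"
        f"### Template:\n{template_json}\n"
        f"### Text:\n{text}\n\n"
        "<|output|>"
    )


def parse_output(generated: str) -> dict:
    """Parse the JSON that follows the last output marker.

    If the marker is absent, the whole string is treated as JSON.
    """
    tail = generated.rpartition("<|output|>")[2]
    return json.loads(tail.strip())


# Empty strings mark the fields to be filled in from the text.
template = {"model": {"name": "", "parameters": "", "license": ""}}
text = (
    "NuExtract-tiny is a 0.5 billion parameter model "
    "released under the MIT license."
)
prompt = build_prompt(template, text)

# Hand-written stand-in for what a model might return (not real output).
fake_generation = prompt + (
    '\n{"model": {"name": "NuExtract-tiny", '
    '"parameters": "0.5 billion", "license": "MIT"}}'
)
result = parse_output(fake_generation)
print(result["model"]["name"])
```

In practice the prompt would be passed to the model through an inference library, and the parsing step would need error handling for malformed JSON.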