DeepSeek has introduced DeepSeekMath-V2, a new math-oriented model aimed at improving how AI systems approach formal reasoning. The release follows the company’s early-2025 debut of one of the first widely accessible “thinking models,” and reflects an ongoing effort across the field to move beyond shortcut pattern-matching and toward verifiable, logic-driven outputs.
Rather than optimizing only for a correct final answer, DeepSeekMath-V2 centers on the structure of mathematical reasoning itself, placing theorem proving and step-by-step derivation at the core of its design. According to the company, the model uses a generation-verification loop: an LLM-based verifier evaluates proofs, and a proof generator is trained using that verifier as a reward model. This setup encourages the generator to detect weaknesses in its own derivations and iterate toward more rigorous logic. Verification scaling is then used to automatically label difficult proofs, creating additional training data to further refine the verifier. The process is an attempt to push AI away from brittle reasoning patterns that can produce seemingly coherent but incorrect steps.
DeepSeekMath-V2 has already been tested in competitive environments. The model earned gold-level results on the 2025 International Mathematical Olympiad and the 2024 Chinese Mathematical Olympiad, and scored 118/120 on the 2024 Putnam exam when run with scaled test-time compute. While these benchmarks aren’t equivalent to solving open problems, they indicate a level of consistency in formal reasoning that earlier systems struggled to match.
Built on DeepSeek-V3.2-Exp-Base, the model is available on Hugging Face, and DeepSeek provides inference tools through its V3.2-Exp GitHub repository. The company frames the release as part of a long-term effort to expand the scientific utility of AI, particularly in mathematical domains where precision and verification matter more than fluent output. Although suggestions that AI could uncover solutions to major unsolved problems, such as the Millennium Prize Problems, remain speculative, developing tools that help researchers work through formal logic is a practical step forward.
The broader implication is that math-capable models may eventually support scientific research in areas such as physics, healthcare, and engineering, where the underlying questions require strict reasoning rather than loosely guided intuition. DeepSeekMath-V2 doesn't solve those challenges on its own, but its open-access availability means that researchers, students, and developers can test and refine it in real-world workflows rather than in isolated benchmarks.

