{"id":90,"date":"2025-02-16T04:23:35","date_gmt":"2025-02-16T04:23:35","guid":{"rendered":"https:\/\/tpbench.org\/?page_id=90"},"modified":"2026-05-02T06:14:58","modified_gmt":"2026-05-02T06:14:58","slug":"home","status":"publish","type":"page","link":"https:\/\/tpbench.org\/","title":{"rendered":"TP Bench"},"content":{"rendered":"\n<h3 class=\"wp-block-heading has-text-align-center\" style=\"font-size:clamp(22.041px, 1.378rem + ((1vw - 3.2px) * 1.454), 36px);\">TP Bench &#8211; Theoretical Physics Benchmark for AI<\/h3>\n\n\n\n<p class=\"has-text-align-left\">TPBench is a curated dataset and evaluation suite designed to measure the reasoning capabilities of AI models in theoretical physics. Our test problems span multiple difficulty levels\u2014from undergraduate to frontier research\u2014and cover topics such as cosmology, high-energy theory, general relativity, and more. By providing a unified framework for problem-solving and auto-verifiable answers, TPBench aims to drive progress in AI-based research assistance for theoretical physics.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/arxiv.org\/abs\/2502.15815\"><strong>Read the TPBench Paper on arXiv<\/strong><\/a> <a href=\"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adfcb0\"><strong>(MLST journal version)<\/strong><\/a><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/huggingface.co\/datasets\/ZhiqiGao\/TPBench\"><strong>Access the Public Dataset on Hugging Face<\/strong><\/a><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/arxiv.org\/abs\/2506.20729\"><strong>Read our Paper on Test-time Scaling Techniques in Theoretical Physics (NeurIPS 25 ML4PS)<\/strong><\/a><\/p>\n\n\n\n<div style=\"height:28px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-center\">Current Model Performance<\/h3>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" 
decoding=\"async\" width=\"1024\" height=\"395\" src=\"https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_flagship-1024x395.png\" alt=\"\" class=\"wp-image-498\" srcset=\"https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_flagship-1024x395.png 1024w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_flagship-300x116.png 300w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_flagship-768x296.png 768w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_flagship-1536x592.png 1536w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_flagship-2048x790.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"447\" src=\"https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_strong-1024x447.png\" alt=\"\" class=\"wp-image-497\" srcset=\"https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_strong-1024x447.png 1024w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_strong-300x131.png 300w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_strong-768x335.png 768w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_strong-1536x670.png 1536w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_strong-2048x894.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"607\" src=\"https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_weak-1-1024x607.png\" alt=\"\" class=\"wp-image-501\" 
srcset=\"https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_weak-1-1024x607.png 1024w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_weak-1-300x178.png 300w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_weak-1-768x455.png 768w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_weak-1-1536x910.png 1536w, https:\/\/tpbench.org\/wp-content\/uploads\/2026\/05\/difficulty_bar_chart_verified_weak-1-2048x1213.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:26px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-center\">Frequently Asked Questions<\/h3>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Where can I see the model solutions to the public problems?<\/summary>\n<p>You can access the problem content and the model solutions here: <a href=\"https:\/\/tpbench.org\/?page_id=2\" data-type=\"page\" data-id=\"2\">Public Problems and Model Solutions<\/a><\/p>\n<\/details>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Are you planning to evaluate more models?<\/summary>\n<p>Yes, we will add new state-of-the-art models to the evaluation as long as their developers release API access or model weights.<\/p>\n<\/details>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>How can I contact you?<\/summary>\n<p>Please email us at <strong>research@tpbench.org<\/strong> if you have questions or concerns. 
<\/p>\n<\/details>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Interested in contributing to the dataset?<\/summary>\n<p>We invite interested researchers to contribute new problems and collaborate on future TPBench updates. Feel free to contact us at <strong>research@tpbench.org<\/strong>.<\/p>\n\n\n\n<p><\/p>\n<\/details>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>How can I access the full dataset?<br><\/summary>\n<p>To prevent potential data leakage into future model training, we do not make the full dataset publicly available. However, if you wish to evaluate your model on our private dataset, please contact us. For collaborative projects, we can also provide full dataset access under controlled conditions that mitigate data leakage risks.<\/p>\n<\/details>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>TP Bench &#8211; Theoretical Physics Benchmark for AI TPBench is a curated dataset and evaluation suite designed to measure the reasoning capabilities of AI models in theoretical physics. Our test problems span multiple difficulty levels\u2014from undergraduate to frontier research\u2014and cover topics such as cosmology, high-energy theory, general relativity, and more. 
By providing a unified framework [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-90","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/tpbench.org\/index.php?rest_route=\/wp\/v2\/pages\/90","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tpbench.org\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/tpbench.org\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/tpbench.org\/index.php?rest_route=\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/tpbench.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=90"}],"version-history":[{"count":105,"href":"https:\/\/tpbench.org\/index.php?rest_route=\/wp\/v2\/pages\/90\/revisions"}],"predecessor-version":[{"id":502,"href":"https:\/\/tpbench.org\/index.php?rest_route=\/wp\/v2\/pages\/90\/revisions\/502"}],"wp:attachment":[{"href":"https:\/\/tpbench.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=90"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}