OnlyPlanz
    A new AI coding challenge just published its first results – and they aren’t pretty

    By onlyplanz_80y6mt, July 24, 2025

    A new AI coding challenge has revealed its first winner and set a new bar for AI-powered software engineers.

    On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

    “We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

    Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

    Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a test of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
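    The timed entry idea boils down to a date filter: freeze the submitted models first, then build the test only from issues that appeared afterward, so no entrant could have trained on them. Below is a minimal Python sketch of that filtering step under simplified assumptions. The `Issue` record, its `flagged_on` field, the `contamination_free_pool` helper, and the 2025 year attached to the March 12 deadline are all illustrative, not the K Prize's actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Issue:
    # Hypothetical, simplified record for a GitHub issue; a real pipeline
    # would populate this from the GitHub API.
    repo: str
    number: int
    flagged_on: date


# Assumed cutoff: round-one models were due March 12 (year assumed to be 2025).
SUBMISSION_DEADLINE = date(2025, 3, 12)


def contamination_free_pool(issues, deadline=SUBMISSION_DEADLINE):
    """Keep only issues flagged strictly after the model-submission deadline."""
    return [issue for issue in issues if issue.flagged_on > deadline]


if __name__ == "__main__":
    issues = [
        Issue("example/repo", 1, date(2025, 2, 1)),   # before the deadline
        Issue("example/repo", 2, date(2025, 3, 12)),  # on the deadline day
        Issue("example/repo", 3, date(2025, 3, 20)),  # after the deadline
    ]
    pool = contamination_free_pool(issues)
    print([issue.number for issue in pool])  # [3]
```

    The strict `>` comparison follows the article's "flagged after that date" and takes the conservative reading: an issue flagged on the deadline day itself is left out of the test pool.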

    The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

    “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”


    It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available, but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

    “I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

    For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
