Hi r/MachineLearning, wanted to share something I'm excited about. I've been fascinated by AlphaEvolve and its results for more than a year now, but using open source frameworks seems overwhelming because of the high costs. I can't really afford hundreds of Claude Opus calls every time I want to run it. I want to be able to try it out many times and in all sorts of unique domains. What if it was possible for AlphaEvolve to be much more affordable while getting better performance? Over the last six months or so, I've been working on LEVI, an open source AlphaEvolve-like system that can outperform existing open source frameworks at a fraction of the cost (up to 35x cheaper!). It can also run on Claude Code or Codex, making it even more accessible (I've mostly been using it with QWEN-30B). LEVI comes in two flavors where I felt it'll make the most difference: Code Optimization and Prompt Optimization (sorry math, you got a less direct path; workable through the code route). The core thesis behind LEVI is that with the right search architecture, smaller models can substitute for or outperform larger ones. This means it's much more economical to rely on smaller models for most of the work. That's the entire takeaway. Making this work in practice is a different problem, but if you forget everything else from this post, this is the only message I'm really trying to convey here. LEVI does it in three ways: 1) Invest in solution diversity from the start and ensure it's maintained. We don't want to converge to the same solution, especially with smaller models in the mix, and rely on large models to pull us out of the basin. 2) Use smarter routing across larger and smaller models (i.e. most mutations don't require a Claude Opus X). 3) For prompt optimization, not every rollout is equally important, so build a proxy subset to approximate the full evaluation.
I've tried LEVI on systems problems (like MoE scheduling or database transaction scheduling) and found that LEVI outperforms existing frameworks on almost every problem I threw at it while consistently using a smaller budget (up to 7x cheaper). For prompt optimization, across problems like IFBench and HotpotQA, LEVI reaches a score similar to or better than GEPA's while using less than half the rollouts! Happy to answer any questions or take any suggestions! If there are unexpected or niche domains where this can be applied, I would love to hear about them. Technical Blog: https://ttanv.github.io/levi/ GitHub: https://github.com/ttanv/levi submitted by /u/Longjumping-Music638
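To make point 2 concrete, here is a minimal sketch of what routing between a small and a large model could look like. This is not LEVI's actual policy: the function name `pick_model`, the `patience` and `large_share` parameters, and the "escalate after stagnation" rule are all assumptions made up for illustration.

```python
import random


def pick_model(stagnant_rounds, rng, patience=3, large_share=0.1):
    """Choose which model handles the next mutation.

    Hypothetical policy, not taken from LEVI:
    - if the best score has not improved for `patience` rounds,
      escalate to the large model;
    - otherwise use the large model only for a small random share
      of mutations and the small model for the rest.
    """
    if stagnant_rounds >= patience:
        return "large"
    return "large" if rng.random() < large_share else "small"


if __name__ == "__main__":
    rng = random.Random(0)
    # While the search is still improving, most calls go to the small model.
    picks = [pick_model(0, rng) for _ in range(1000)]
    print("large-model share while improving:", picks.count("large") / len(picks))
    # After several rounds without improvement, the large model is used.
    print("after stagnation:", pick_model(5, rng))
```

The point of a rule like this is only that the expensive model is reserved for the minority of steps where the cheap one appears stuck; the real system may decide this very differently.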
I've been building agents for about a year and recently shipped one for a client running ~140 MCP-exposed tools at peak. Along the way I made the canonical mistake. I used cosine similarity over tool description embeddings to pick which tools the model could see per turn. Worked great in demos. Was actively dangerous in production. Here's the problem. In a basic semantic-ranking setup you embed the user query, embed every tool description once, and rank by cosine similarity at runtime. That works for general document retrieval where chunks are paragraph-length, semantically rich, and roughly equal in form. Tool descriptions are not that. They are short (often
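The basic semantic-ranking setup described in this post (embed the query, embed every tool description once, rank by cosine similarity at runtime) can be sketched roughly as below. This is an illustration under assumptions, not the author's code: the tool names and descriptions are invented, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tool catalog; real MCP tool descriptions will differ.
TOOLS = {
    "create_invoice": "create a new invoice for a customer",
    "delete_invoice": "delete an existing invoice",
    "send_email": "send an email message to a recipient",
}

# Tool descriptions are embedded once, ahead of time.
TOOL_VECTORS = {name: embed(desc) for name, desc in TOOLS.items()}


def top_k_tools(query, k=2):
    """Rank tools by cosine similarity to the query and keep the top k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), name) for name, vec in TOOL_VECTORS.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(top_k_tools("send an email to the customer", k=2))
```

With descriptions this short, a single shared word can decide the ranking, which is one way to see why similarity over terse tool descriptions behaves differently from retrieval over paragraph-length chunks.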
Hi, where can I find the Maven "AI Evals for Engineers & PMs" and "End-to-End AI Engineering Bootcamp" videos? They are too costly and I can't afford them. Can anybody help me find resources for them? submitted by /u/Zestyclose_Block5381
arXiv has an endorsement system for a reason. I would only offer an endorsement to people with whom I have a direct academic collaboration or mentorship relationship, since I'm putting my own academic reputation at stake. This is also the standard of almost every serious academic researcher I am aware of. Now arXiv is making an effort to crack down on AI slop and is banning accounts that upload low-quality research papers, which is a great initiative. Given what an "endorsement" means, I wish arXiv would trace these cases back and at least issue warnings to the endorsers, and if this happens multiple times (let's say three), people giving out careless endorsements should also face consequences. submitted by /u/AffectionateLife5693
There are coordinated efforts in which people have shown favouritism and jeopardised the double-blind review process. No doubt there is great talent among these 80%, but we have to acknowledge that non-Chinese researchers have been sabotaged, and this was also reflected in the recent leaks of reviewer data from the top ML conferences (won't name them but they start with i). I have also personally faced such discrimination and had a discussion on the subreddit asking others if they have witnessed something similar. It was shocking to learn that this is occurring on a large scale. The question is how do we stop it, or highlight it? We have to preserve the sanctity of the research. submitted by /u/AppropriatePush6262
I run evaluations on generative image models as part of my workflow, mostly comparing coherence, prompt adherence, and compositional accuracy across different architectures. The consensus here seems to be that open models are still a generation behind closed APIs. Based on my recent benchmarks, that gap is way smaller than people assume. On compositional control specifically, the latest open checkpoints handle multi-object scenes with spatial relationships about as reliably as the paid endpoints I've tested. Not perfect, but close enough that the failure modes are comparable. The thing that surprised me was text rendering in images, which used to be a disaster on open models. Recent architectures actually get it right roughly 70-80% of the time on short strings. Generation speed is another misconception. People complain about inference time but I'm getting 2MP outputs in under two minutes on a single consumer GPU. Drop resolution and step count and you're at 30 seconds. Fine for iteration. The structured prompting argument also falls flat. Everyone acts like having explicit scene control is a downside when it's literally what production pipelines need. Unstructured text prompts are the hack, not the other way around. These models ship without community optimizations, no fine-tuning, no custom pipelines. The baseline is already competitive. submitted by /u/ProfessionalAnt7436
With more software engineers entering data science and AI, I feel it's equally important for a person with a data and AI background to dive into software development to survive and thrive in industry. I know it's a very broad question, so suggestions covering broad subjects and topics are welcome; for example, I often wonder how relevant DSA is. I totally understand that the skills needed are deeply coupled with the domain, industry, and specific problems, but unfortunately the industry doesn't understand this: it judges and rewards you based on what you already know or pretend to know rather than your ability to learn or adapt. submitted by /u/Dapper_Chance_2484
If an ICML conference paper is rejected and no one opts in or opts out of keeping the reviews visible, will the reviews be visible to everyone? There was a clear instruction that only papers with at least one opt-in AND zero opt-outs would be visible. None of the authors selected any option, but in my OpenReview profile it shows as visible to everyone. Please clarify. submitted by /u/Curious-Monitor497
I'm an undergraduate studying CS at a state school in the US. I'm interested in researching a specific style of self-supervised learning (JEPA) and want to eventually go to grad school to study it further. I have experience working in a lab on a similar topic, and I've become fairly comfortable with the literature and have a basic understanding of what's going on, but right now I'm only doing applied research in a specific domain (physics). My opportunities are kind of limited, as my school's CS department is pretty mid. I was wondering if y'all have any advice on how to approach things? I know I can perform research independently, but it's not ideal due to: 1. Limited compute and fewer resources compared to a proper lab 2. Lack of a supervisor/guidance on the nuances of the field. My current lab would be supportive if I do try to do things, but pure ML research is not really their main thing. I've heard people do REUs or cold email profs. But I'm not sure if I could find something that specific in an REU (also, I'm an international student). And the labs I have seen working on this are either private or quite prestigious, so I'm not sure how far cold emailing would take me. Sorry for the long post. TL;DR: I want to do pure ML research but there's no existing lab/professor at my current school who does something similar, and I'm wondering if any other pathways exist. Any advice would be appreciated, thanks. submitted by /u/QuickStar07
Hi folks, deciding between these two Mac options has been a challenge for me, so please help. I know a Mac is not even necessary for this, but just help me decide between these two options. For reference, I'm a SWE student looking to go deep into ML and data science in the near future… EDIT: it's the MacBook Pro M5 (base chip) that I'm referring to here. submitted by /u/Both-Hovercraft3161
Hi everyone, I'm an undergraduate student and ML researcher at UC Berkeley. My colleagues and I are working on a project that hopes to fix some of the problems users face with Colab. What are the features you wish it had as an ML professional, researcher, or enthusiast? What're the biggest problems you've faced while using it? Some of the issues that everyone feels (including us) are environment management and kernel persistence. But we would love to hear more from the community. submitted by /u/myplstn
Hey everyone, hope this is okay to post here. My co-author and I are currently between institutional affiliations, which means we don't have the academic email arXiv needs for an endorsement. We're hoping to find someone in cs.CV willing to take a quick look at our paper and endorse it if it meets your bar. The project: Locate-SAM2 We built a training-free pipeline connecting NVIDIA's LocateAnything-3B to Meta's SAM 2.1 through a lightweight adapter. The question we wanted to answer was simple: in a modular text-to-mask pipeline where everything is frozen, does the choice of grounder actually matter for the final mask? A few specifics, since the details are what tell you we're not just generating noise: On RefCOCO val, our system reaches 0.772 mIoU versus 0.717 for Grounding DINO Base, using the same SAM 2.1 backend throughout. RefCOCO appears in LocateAnything's training data, so we frame this honestly as in-domain benchmarking, not zero-shot transfer. We're not pretending otherwise. The paper has controlled comparisons across RefCOCO/+/g, adapter ablations, a ground-truth box oracle, a failure taxonomy, and a nonsense-prompt probe showing the pipeline needs abstention logic. Code is on GitHub and the paper is close to submission-ready. What we're hoping for Mainly an endorsement: someone to read the draft and, if they think it holds up, endorse us on arXiv. We'd acknowledge it and that's the whole ask. If anyone wants to get more involved, we're open to expanding the experiments or pointing the paper at a specific venue, and we'd talk co-authorship based on real contribution. We also have separate work in progress in physically-constrained DL, geospatial AI, and AI governance, in case any of that overlaps with what you do. We're not looking for a blind voucher. Drop a comment or a DM and we'll share the PDF and the repo. Happy to answer questions, and thanks for reading. submitted by /u/j_root_
@Disney: Disney's "Olaf" appears as an AI robot, showing the "lifelike presence of character AI" at the GTC exhibition hall
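The self-righting "AI bike": Yamaha's "MOTOROiD:Λ" breaks new ground. Yamaha's "MOTOROiD:Λ" AI bike that stands up on its own [Robosta] "MOTOROiD:Λ" (Motoroid Lambda), the unconventional motorcycle-type AI robot that gets back up even when it falls over: what is the future of "riding with AI and growing together" that Yamaha Motor envisions? ht...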
I read and collected arXiv whitepapers starting after the launch of ChatGPT. I copied and pasted excerpts into Word to track them, then migrated to Obsidian. That vault of some 1700 papers is now online. I figured it was time to see if others would find the collection useful. My whitepapers were organized into some 90 categories, all of which emerged from paper topics. New categories became necessary with the discussion of new methods, techniques, models etc. If I wanted to write about a topic, I'd upload an md file containing research excerpts on that topic to ChatGPT. This worked to a degree but maxed out context pretty quickly. And I always had related research in multiple categories, according to how the research was framed (personas research in Alignment, Psychology, HCI, etc.). So I used a plugin to create topic notes that built inbound and outbound wikilinks across the papers, centered on shared concepts. When I ported this all online I added another layer of synthesis: Inquiring Lines, as I call them. These cover cross-cutting, tension-surfacing, synthesizing, and frontier-opening research frames. There are 6,000 of them in my collection. Each has a page to itself with a useful description of a research line of inquiry. These now also have prompts you can run yourself that will find related (and more recent) research (I can't adequately maintain each topic with new research). It's all at https://inquiringlines.com/inquiring-lines/ if you want to poke around. As is everything in the age of AI, it's a work in progress. But there's a lot of rich material in there. Have a look. submitted by /u/Barton5877