Channels
As an ML researcher, how do you use AI tools in your daily work? Do you mostly use them to clean up grammar and wording, or also to rewrite, structure, or draft technical text? submitted by /u/Hope999991 [link] [comments]
The prediction that equivariance reduces sample complexity by a factor of |G| appears in roughly every paper on geometric deep learning and is measured as an actual scaling law in roughly none of them. This paper does the measurement.
The methodology is the interesting part. Naive estimators conflate group order with task difficulty (larger groups induce harder symmetry structure, not just more constraint), so the authors derive a relative exchange rate that cancels out the shared difficulty: roughly, how much less data the equivariant model needs than a vanilla baseline as a function of n, on a controlled C_n-symmetric task where n is a free knob. They also pre-specify a failure taxonomy: explicit conditions, fixed before seeing results, that would count as evidence against the hypothesis.
The headline number is beta_diff ~ 1.28, consistent with the theoretical 1.0. But the more durable finding is the wrong-group control: a model built with the wrong cyclic symmetry, with the same orbit size and the same compute budget, is actively worse than no constraint at all. This is not noise. The joint pairwise CI [+0.79, +3.26] excludes zero robustly across every estimator they run. Misalignment isn't just unhelpful; it is harmful.
There is also a clean mathematical result slipped into Sec. 4.3: augmentation plus test-time orbit averaging is exactly equivariant for output-pooling architectures, proven and verified to bit-identical training curves. The architecture-vs-augmentation gap collapses to whether you apply the orbit average at test time, not to anything structural. This seems underappreciated.
The paper is unusually transparent about what it didn't nail: the relative-rate estimator was adopted post-hoc, the two-level bootstrap CI (seeds x group sizes) includes zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive. They rank their findings explicitly by robustness. The wrong-group result is the one they would stake a claim on. The exchange rate is directionally probable. submitted by /u/AhmedMostafa16 [link] [comments]
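The orbit-averaging claim from Sec. 4.3 can be checked on a toy case. The following is a minimal sketch, not the paper's setup: it assumes C_n acts on a length-n vector by cyclic shifts (`np.roll`), uses a hypothetical unconstrained map `f` with random weights, and demonstrates only the invariant special case (trivial action on the output).

```python
import numpy as np

def orbit_average(f, x, n):
    """Average f over the C_n orbit of x, where C_n acts by cyclic shifts."""
    return np.mean([f(np.roll(x, k)) for k in range(n)], axis=0)

rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(3, n))  # hypothetical unconstrained weights (assumption)

def f(x):
    # An ordinary dense layer: not shift-invariant on its own.
    return np.tanh(W @ x)

x = rng.normal(size=n)
shifted = np.roll(x, 3)  # apply one group element to the input

# Gap between outputs on x and on its shifted copy, without and with averaging.
plain_gap = np.max(np.abs(f(x) - f(shifted)))
avg_gap = np.max(np.abs(orbit_average(f, x, n) - orbit_average(f, shifted, n)))

print(f"plain gap: {plain_gap:.3e}, orbit-averaged gap: {avg_gap:.3e}")
```

The averaged outputs agree up to floating-point summation order, because shifting the input only permutes the terms being averaged.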
Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a technical/scientific domain. The goal would be to improve and evaluate how well code-generation models can use this library correctly. I am trying to understand the legal / Terms of Service boundary around using OpenAI API outputs in two different scenarios:
Scenario 1: Silver dataset for fine-tuning an OSS model. Use the OpenAI API to generate programming tasks, reference solutions, and verification tests for the specific Python library. Then human-review, filter, and validate the generated examples. Then use this silver dataset to fine-tune an open-source code model, with the goal of improving its performance on this specific library. My question: would this violate OpenAI’s terms because the API outputs are being used to train/fine-tune another coding model, even if the scope is narrow and library-specific?
Scenario 2: Benchmark only, not training. Use the OpenAI API to generate programming tasks, reference solutions, and verification tests. Human-review and validate them. Then use the resulting dataset only as an evaluation benchmark to compare different models. The benchmark would not be used to fine-tune or train any model. My question: is this generally considered allowed under OpenAI’s terms, assuming the benchmark is properly reviewed and documented as AI-assisted?
I understand that Reddit is not legal advice, and I would still contact OpenAI or legal counsel for a definitive answer. However, I thought new ideas could come up from people who have already faced similar situations in practice. submitted by /u/ororo88 [link] [comments]
It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context: information that can't be reliably recovered post-hoc once the demonstration is recorded. Most current approaches either filter/clean after collection, or rely on simulation to compensate. But neither seems to close the semantic gap for contact-rich tasks in unstructured environments. Is anyone working on supervision at acquisition time, enriching the stream as it's captured rather than labeling after the fact? And if not, is this a real bottleneck or am I overestimating the problem? submitted by /u/Several-Many9101 [link] [comments]
About 10 years ago, I got into the basics of ML (like regression, KNNs, LVQs) and read a few papers before taking a break a few years back. It feels like there are a lot of researchers in AI now. How do you identify the ones who are actually solid vs. those who (forgive my phrasing) are researchers more for appearance/status (i.e., don't actually know what they're talking about)? Is the core filter h-index or where they work? How would you identify them? submitted by /u/roguejedi1 [link] [comments]
I have a paper accepted at a non-archival ICML workshop this year, and I am trying to decide whether it is worth registering and attending. By coincidence, I will already be in Seoul around that time, but I would have to pay the workshop registration fee (~$400) out of my own pocket. I would only be registering for the workshop day since I have other commitments during the rest of the conference. I am thinking of applying to PhD programs this fall (I applied this year too, but didn't get in), and the workshop speakers and panellists look genuinely great. I'm not sure what the real benefits are here or whether I should go for it. For context, I am also attending ACL 2026 this year, but that trip is fortunately sponsored, so this would be a separate personal expense. I would also appreciate guidance on how non-archival workshops work in general. Since the paper is non-archival and not formally published (at least to my understanding), is registration still expected or required for accepted papers? Do authors typically attend and present in person, or is it common to skip attendance and conference registration? Has anyone been in a similar situation? I want to understand the benefits of this. Any advice would be greatly appreciated because I honestly have no idea how to evaluate this. submitted by /u/YOYOBOYOO [link] [comments]
Most explanations of TPUs and systolic arrays are either hand-wavy diagrams or papers. I wanted to see the thing actually run, so I built it. TinyTPU is a 4×4 weight-stationary systolic array in real SystemVerilog, compiled to WebAssembly, with a step-by-step browser visualization. You enter two matrices, hit run, and watch the actual hardware execute: weights loading into PEs, matrix A streaming in diagonally (the "skew" that makes systolic arrays work), partial sums accumulating down the grid, results draining from the bottom. It has three levels:
L1 - isolate a single MAC cell, watch one multiply-accumulate happen
L2 - the full 4×4 array executing a real matmul
L3 - tiling: what happens when your matrix is bigger than the hardware
Nothing on screen is faked. The visualization reads state directly from compiled RTL. If you're trying to understand how matrix multiply maps to hardware (why TPUs are efficient, what "weight-stationary" actually means, why the diagonal stagger exists), this might make it click for you in a way papers don't. Repo: tiny-tpu Live demo: Live If this project interests you, please do star the repo; if you find something that needs improving, open a PR. I hope y'all check this out and give me some feedback 🙏 submitted by /u/Horror-Flamingo-2150 [link] [comments]
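The dataflow described in the post can be sketched as a cycle-by-cycle behavioral model. This is a Python illustration under stated assumptions, not TinyTPU's RTL: the function name `systolic_matmul`, the register layout, and the orientation (activations moving left to right, partial sums moving top to bottom) are all hypothetical choices made for the example.

```python
import numpy as np

def systolic_matmul(A, W):
    """Behavioral model of a weight-stationary systolic array computing A @ W.

    PE (k, j) permanently holds W[k, j]. Activations move right one PE per
    cycle, partial sums move down one PE per cycle (assumed orientation).
    """
    M, K = A.shape
    K2, N = W.shape
    assert K == K2, "inner dimensions must match"
    a_reg = np.zeros((K, N))  # activation register at each PE's output
    p_reg = np.zeros((K, N))  # partial-sum register at each PE's output
    C = np.zeros((M, N))
    for t in range(M + K + N - 2):
        # Skewed feed: array row k receives A[m, k] at cycle t = m + k.
        feed = np.array([A[t - k, k] if 0 <= t - k < M else 0.0
                         for k in range(K)])
        # Each PE sees its left neighbour's activation from the last cycle.
        a_in = np.column_stack([feed, a_reg[:, :-1]])
        # Each PE sees its upper neighbour's partial sum from the last cycle.
        p_in = np.vstack([np.zeros((1, N)), p_reg[:-1, :]])
        a_reg = a_in
        p_reg = p_in + a_in * W  # one multiply-accumulate per PE per cycle
        # Drain: the bottom of column j holds output row m = t - j - (K - 1).
        for j in range(N):
            m = t - j - (K - 1)
            if 0 <= m < M:
                C[m, j] = p_reg[K - 1, j]
    return C

A = np.arange(16, dtype=float).reshape(4, 4)
W = np.arange(16, dtype=float).reshape(4, 4).T
result = systolic_matmul(A, W)
print(np.allclose(result, A @ W))
```

The skew is what keeps the pipeline aligned: the activation `A[m, k]` and the partial sum for output `(m, j)` both arrive at PE `(k, j)` on cycle `m + k + j`.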
Hi all, Lately I have been working on creating a package for multi-agent RL drone environments with different objectives, all bundled into a single GitHub repository: https://github.com/tau-intelligence/MuJoCo-drones-gym I am currently trying to organize things for the RL community, with a couple more tools coming soon. Right now, I want to make it useful for the community, so I would love feedback on how I could improve it, incorporate more things into it, or fix any broken implementation. Everyone is also welcome to raise issues on the repo. Thank you for the support. PS: I have some research publications at RL and ML venues on RL, though I still consider myself a student of the field and hence would love your help here. submitted by /u/MT1699 [link] [comments]
Maybe this should be asked in the FC26 game subreddit, but I'm not sure. Anyway, I just saw a video of someone predicting the winner of the World Cup using the simulate match feature in the game, but he only did it once. Would running this feature 100-1000 times give a significant result? Or is that feature based only on luck? submitted by /u/Stillane [link] [comments]
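How much repetition helps can be sketched with a confidence interval on the observed win rate. The example below assumes each simulated tournament is an independent draw and uses a Wilson score interval; the win counts are hypothetical, and the result only describes the game's simulator, not the real World Cup.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win probability."""
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical counts: a team wins 20% of simulated tournaments.
lo100, hi100 = wilson_interval(20, 100)
lo1000, hi1000 = wilson_interval(200, 1000)

print(f"100 runs:  [{lo100:.3f}, {hi100:.3f}]")
print(f"1000 runs: [{lo1000:.3f}, {hi1000:.3f}]")
```

A single run gives one sample from the simulator's distribution. More runs narrow the interval around the simulator's win probability, with the width shrinking roughly as one over the square root of the number of runs.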
Do you think there is a possibility of using sewage water to cool AI servers? submitted by /u/TippaMyClit [link] [comments]
https://reddit.com/link/1ty3xhz/video/dzede49lhk5h1/player Link to the replay. What are everyone’s thoughts on this? I know the benchmark has gotten a lot of criticism for being “too difficult” from a scoring perspective, but after watching the replay, it honestly looks like the models just aren’t that close to solving it yet. I’m not saying the benchmark is perfect, but the failures don’t really look like minor scoring issues. They look more like the model still doesn’t understand the task well enough to complete it reliably. submitted by /u/ClickedMoss5 [link] [comments]
So I am trying to figure out what agent OS is. I am a layman, and a lot of the time the information I see comes off as very technical. However, I do like the idea of a dashboard because, for my neurodivergent brain, it would be nice to have all of the AI tools in one space. Can you all help me understand what agent OS is? submitted by /u/EducatedBrotha [link] [comments]
I've been building a content production tool for my company, which uses AI for things like structure and automatically inserting links with defined anchor text. Two days ago, I started testing the output in AI text detection scanners and kept getting inconsistent results, even when I knew my articles looked more natural than in a previous test. Revision after revision of code, 10 hours spent trying to get it right. And then I decided to pop in a few articles I had personally written, where I knew AI was not involved. Not a single one of the major scanners got it correct. Most of them flagged my original content as having more AI text than the articles my tool was producing. Now that I've gone down this rabbit hole and understand how AI writes and how the detectors work, I'm not sure that any tool is ever going to be able to do this correctly. For obviously AI-written articles, sure, it will catch those. But for original content, I just don't see how it's ever going to work. What are everyone's thoughts on this? Has anyone done the same experiment? submitted by /u/Sypheix [link] [comments]