What will be the next breakthrough in ASR? [D](reddit.com)
Hey all, I am currently working on ASR models and have gathered some recent literature. From my literature search, it seems that ASR models are getting more and more powerful for two main reasons:

- Pseudo-labelled data is growing, so supervised models are rising rapidly. Whisper-large-v3 was trained on 5M hours of weakly supervised data, and Nvidia Parakeet v3 was trained on 660k hours of labelled data (open-sourced). Funnily enough, Parakeet v3 beats Whisper-large-v3 on almost every benchmark, even though it has a smaller model size and a smaller data scale. So clearly, scale is not everything.
- New architectures are on the rise. We used to rely on self-supervised pre-training + CTC to solve the ASR task, but now Transducers and Token-and-Duration Transducers (TDT) seem to be taking off, along with attention encoder-decoder architectures (Qwen), all of which are trained in a supervised manner.

Now, given that labelled data is so plentiful and new architectures keep appearing, are we saying goodbye to self-supervised learning approaches like data2vec 2.0, WavLM, etc. for ASR, and will we only use them for general-purpose speech tasks?

This is quite different from how computer vision operates now. DINOv3 is a self-supervised approach that is extremely performant in segmentation, classification, depth estimation, etc., but I do not see anything comparable in the speech domain. ASR (which is a dense-prediction task) is dominated by these huge supervised architectures, and emotion recognition, diarization, and speech separation are also all dominated by supervised approaches.

Do you think we will have our DINO moment with a new self-supervised architecture? Or is supervised learning the way to go? How would these methods actually perform if we trained a self-supervised model on these huge datasets?

submitted by /u/ComprehensiveTop3297
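For readers less familiar with the CTC setup mentioned in the post, here is a minimal sketch of the greedy CTC collapse rule: merge consecutive repeated frame labels, then drop the blank symbol. The frame sequence and the `_` blank symbol are made up for illustration and do not come from any of the models named above. A real system would first take the argmax over per-frame logits.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels the way greedy CTC decoding does.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank symbols.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Toy per-frame argmax output (assumed, not from a real model).
frames = list("hh_e_ll_ll_oo")
print(ctc_greedy_collapse(frames))  # hello
```

The blank between the two `l` runs is what lets CTC emit a doubled letter. Each frame is labelled independently of the others here. Transducer-style models instead condition each output on the tokens emitted so far.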
Time Series Forecasting for Agriculture/Crop Volume & Pricing – Looking for Advice [D](reddit.com)
Hi everyone, I work for a major berry company, and a large part of my role involves forecasting total industry crop volumes (weekly harvest/production forecasts) as well as future pricing.

I'm relatively new to ML-based forecasting. This is only my second professional role, and I have a bachelor's degree in Information Systems with a few machine learning courses under my belt, but I'm definitely not a forecasting expert.

For crop forecasting, I've been working with USDA and other industry datasets. I started with SARIMA models and have recently been experimenting with XGBoost and Holt-Winters methods to compare performance.

I'm looking for recommendations on:

- Libraries/frameworks that are commonly used for production-grade time series forecasting
- Models that work well for agricultural production forecasting
- Approaches for forecasting commodity/produce pricing
- Feature engineering ideas (weather, seasonality, acreage, imports, etc.)
- Any papers, blogs, or resources that would be useful

Most of the data is weekly and highly seasonal, with weather and supply conditions playing a major role. Any suggestions, lessons learned, or pointers from people working in forecasting would be greatly appreciated.

submitted by /u/foreigneverythingg
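When comparing SARIMA, XGBoost, and Holt-Winters on weekly seasonal data like this, a common first step is a seasonal-naive baseline scored with walk-forward (rolling-origin) evaluation, so that every model is judged on the same out-of-sample weeks. The sketch below is only an illustration. The function names are hypothetical, the series is synthetic rather than USDA data, and `season_length=52` assumes one observation per week.

```python
def seasonal_naive_forecast(history, season_length=52):
    """Forecast the next point as the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    return history[-season_length]


def walk_forward_mae(series, season_length=52):
    """Rolling-origin evaluation of the seasonal-naive baseline.

    For each time step after the first full season, forecast one step
    ahead using only earlier data, then average the absolute errors.
    """
    if len(series) <= season_length:
        raise ValueError("series must be longer than one season")
    errors = []
    for t in range(season_length, len(series)):
        prediction = seasonal_naive_forecast(series[:t], season_length)
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


# Synthetic examples with a short "season" of 4 steps (assumed data).
periodic = [10, 20, 30, 20] * 3
trending = list(range(12))
print(walk_forward_mae(periodic, season_length=4))  # 0.0
print(walk_forward_mae(trending, season_length=4))  # 4.0
```

Any candidate model can be dropped into the same loop in place of `seasonal_naive_forecast`. If it cannot beat this baseline on held-out weeks, the extra complexity is probably not paying off yet.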
I built an agentic dataset creation platform for training and robotics(reddit.com)
I would love feedback on the data quality and the 3D renderings specifically, because the renderings were the hardest part of getting this to work.

Basically, Chaveta is an agentic dataset curation tool that allows you to submit a prompt and instantly receive a dataset for:

- World models
- Robotics (JSON Trajectories)
- LLM Fine Tuning
- Geological
- Synthetic Tool Calling / LLM flows
- Time series

For the robotics path, you can also download to MCAP or simple JSON, and we have a render tab that allows you to edit joints visually. We also provide copy/paste scripts for importing the dataset into things like Transformers.

Let me know what you think.

submitted by /u/ComradePampers
Built a URDF playground with 3D visualization, validation, and conversion tools(reddit.com)
Hi everyone, I've been working on a browser-based URDF playground aimed at making robot development a bit easier.

Steps:

i) Paste URDF or Xacro directly into the browser
ii) Instant 3D visualization
iii) Shareable robot links
iv) No ROS installation required

Playground: https://roboinfra-dashboard.azurewebsites.net/playground

Additional tooling:

- URDF/Xacro validation
- Auto-fix suggestions
- URDF → SDF conversion
- URDF → MJCF conversion
- URDF → USD conversion
- MoveIt configuration generation
- Mesh analysis
- GitHub Action integration
- Python SDK

The goal is to make robotics workflows feel a little more like modern web development: open a browser, paste your robot description, and start iterating immediately.

I'd really appreciate feedback from ROS, MoveIt, Isaac Sim, MuJoCo, and general robotics developers:

- What feature would make this genuinely useful in your workflow?
- What is currently missing from existing URDF tools?
- Any issues or suggestions after trying it?

Thanks!

submitted by /u/DateRealistic5066
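To give a sense of what one basic URDF validation check can look like, here is a minimal sketch using only the Python standard library. It verifies that every joint's parent and child refer to a link that is actually defined. This is not the playground's implementation or its Python SDK. The function name and the example robot are hypothetical, and a real validator would cover far more (joint limits, inertials, mesh paths, tree structure).

```python
import xml.etree.ElementTree as ET


def check_urdf_joints(urdf_text):
    """Return a list of problems found in the joints' parent/child links."""
    root = ET.fromstring(urdf_text.strip())
    link_names = {link.get("name") for link in root.findall("link")}
    problems = []
    for joint in root.findall("joint"):
        joint_name = joint.get("name", "<unnamed>")
        for role in ("parent", "child"):
            element = joint.find(role)
            if element is None:
                problems.append(f"joint '{joint_name}' has no <{role}> element")
            elif element.get("link") not in link_names:
                problems.append(
                    f"joint '{joint_name}' references unknown {role} link "
                    f"'{element.get('link')}'"
                )
    return problems


# Hypothetical two-link robot; the 'elbow' joint points at a missing link.
EXAMPLE_URDF = """
<robot name="demo_arm">
  <link name="base_link"/>
  <link name="upper_arm"/>
  <joint name="shoulder" type="revolute">
    <parent link="base_link"/>
    <child link="upper_arm"/>
  </joint>
  <joint name="elbow" type="revolute">
    <parent link="upper_arm"/>
    <child link="forearm"/>
  </joint>
</robot>
"""

for problem in check_urdf_joints(EXAMPLE_URDF):
    print(problem)  # joint 'elbow' references unknown child link 'forearm'
```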
Genesis launch video, watched by millions, inspired me to look into what's actually available for simulation asset generation. Compared 4 tools.(reddit.com)
The Genesis sim video got me thinking: what does it actually take to build scenes like that (apart from the Gaussian splat part) with such accuracy, at scale?

Asset and scene generation is one of the biggest bottlenecks in robot training. NVIDIA GR00T, Helix, HumanPlus, and ASAP all show the same pattern: more diverse scenarios lead to better sim-to-real transfer. But generating physically accurate objects and scenes takes time.

Four platforms are working on this in 2026. Here's how they compare:

1. Rigyd: Agentic pipeline, best for on-demand scale and new types of objects
Takes raw 3D (.glb, .fbx, .obj), images, or text and outputs calibrated OpenUSD + MJCF in ~2 minutes per asset, with a SimReady asset validator baked in. Generates full interactable scenes with per-object decomposition. Native Isaac Sim and MuJoCo support. Non-rigid and articulated objects are on the roadmap. The pipeline is agentic end-to-end, so there is no per-asset manual work. Good fit for teams that need to move fast with on-demand assets.

2. Lightwheel: High-fidelity articulated objects, SimReady catalog
Strong catalog of high-fidelity articulated assets and a SimReady library used by large enterprise customers. Per-asset visual and physical quality is high. USD and MJCF support via open-source converters. Good fit if you need a curated, validated catalog. Less flexible for new use cases or object categories outside their existing library. Catalog growth follows a curation model rather than an agentic pipeline.

3. NVIDIA Edify: Generative 3D, physics added separately
Generates high-quality 3D meshes from text or image in under 2 minutes. Trained on licensed data, enterprise-safe. Tight Omniverse integration. The gap: it produces visual geometry, not SimReady assets. Physics, collision geometry, and USDPhysics schemas need to be added downstream before the asset is usable for robot training. Works well as an upstream step paired with a SimReady pipeline.

4. Moonlake: World modeling agent approach
Acts directly inside Blender, automating the creation of articulated assets, physics-validated scenes, and complex environments rather than per-asset annotation. The approach is promising for research, but production-grade Isaac Sim / MuJoCo integration is not there yet. If successful, world models could collapse scene generation and policy training into a single learning loop.

What I think actually matters for sim-to-real transfer (ranked by impact):

1. Per-object physics accuracy within the domain-randomization band
2. Scene diversity (variation of scenes the policy sees during training)
3. Visual fidelity (matters most for camera-only policies, less for contact-rich manipulation)

How to choose:

- Need to scale across many object categories fast → Rigyd
- Need a validated catalog of articulated assets for known use cases → Lightwheel
- Need high-quality visual 3D in the NVIDIA ecosystem and will add physics downstream → Edify
- Researching end-to-end learned simulation → Moonlake

For most teams, the practical pattern is Rigyd for the long tail plus hand-authored or Lightwheel assets for the few hero objects your scenario depends on. Both output standard OpenUSD/MJCF, so they compose cleanly.

Questions for the community:

- What's missing from this comparison?
- For those running training: where does asset prep actually bottleneck you?

Image Credit: Genesis AI

submitted by /u/yektabasak
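One way to read "physics accuracy within the domain-randomization band" is that the asset's nominal parameters only need to be close enough for the real-world value to fall inside the range sampled during training. The sketch below illustrates that reading with a symmetric relative band. The function names, the 0.40 kg nominal mass, and the ±20% band are all hypothetical, and it assumes positive nominal values.

```python
import random


def band_limits(nominal, band_fraction):
    """Lower and upper bounds of nominal * (1 ± band_fraction)."""
    return nominal * (1.0 - band_fraction), nominal * (1.0 + band_fraction)


def sample_in_band(nominal, band_fraction, rng):
    """Draw one randomized parameter value uniformly from the band."""
    low, high = band_limits(nominal, band_fraction)
    return rng.uniform(low, high)


def covered_by_band(real_value, nominal, band_fraction):
    """True if the measured real-world value lies inside the sampled band."""
    low, high = band_limits(nominal, band_fraction)
    return low <= real_value <= high


rng = random.Random(0)  # seeded so repeated runs sample the same values
nominal_mass_kg = 0.40  # hypothetical asset annotation
band = 0.20             # hypothetical ±20% randomization

# One sampled mass per training episode.
episode_masses = [sample_in_band(nominal_mass_kg, band, rng) for _ in range(5)]

print(covered_by_band(0.45, nominal_mass_kg, band))  # True
print(covered_by_band(0.55, nominal_mass_kg, band))  # False
```

Under this reading, an asset whose nominal mass is off by 10% is still usable with a ±20% band, while one that is off by 40% leaves the real object outside everything the policy saw in training.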
Top 10 Robots Transforming the World in 2026: Humanoids, Warehouse Robots, Cobots, and Surgical Robotics(reddit.com)
We put together a robotics overview for business leaders, operators, procurement teams, investors, and executives who want to understand which robots are actually being deployed, which are still early, and where the industry is heading.

The goal is not to make a technical ranking or a hype list. It is to explain the major categories of real-world robotics in a way that can be shared with people outside the robotics field.

The overview covers:

- Boston Dynamics Spot: industrial inspection quadrupeds
- ANYbotics ANYmal: rugged inspection robots for energy, mining, chemicals, and heavy industry
- Agility Robotics Digit: logistics humanoids
- Figure 03: general-purpose humanoids and embodied AI
- Boston Dynamics Atlas: all-electric humanoid mobility and manipulation
- Tesla Optimus: vertically integrated humanoid robotics strategy
- Unitree G1: lower-cost humanoid research and education platform
- Universal Robots UR Series: collaborative robot arms for machine tending, packaging, assembly, and small manufacturers
- Amazon Proteus: autonomous mobile warehouse robots for logistics facilities
- Intuitive da Vinci 5: surgical robotics and robotic-assisted surgery

The main article is the general overview, and we are also building individual deep dives for each robot so non-technical readers can understand the business case, deployment maturity, pricing context, use cases, risks, and hardware/software stack behind each system.

The audience is intentionally non-technical. It is meant to be something robotics professionals, engineers, founders, or operators can share with leadership teams, clients, or colleagues who need a grounded introduction without reading a robotics textbook.

Disclosure: I’m affiliated with Black Scarab, where the article is published. The article is free to read and does not require signup. Most of the deep dives are already live. The Intuitive da Vinci 5 deep dive is still in progress and will complete the series.

Full overview: https://www.blackscarab.ai/insights/top-10-robots-edge-ai-automation-humanoid-robotics

submitted by /u/rgc4444
Looking for high-fidelity robotics simulators for MacBook M4 supporting RL/DL pipelines (since Isaac Sim is out)(reddit.com)
Hey everyone, I'm deep into robotics simulation, specifically focusing on Reinforcement Learning (RL) and Deep Learning (DL) workflows. My hardware setup is an M4 MacBook Air (16GB unified memory).

Initially, I wanted to use NVIDIA Isaac Sim/Isaac Lab because of its photorealistic graphics, advanced sensor simulation, and massive parallelized RL support. However, since Isaac Sim relies heavily on NVIDIA RTX hardware and CUDA, running it locally on Apple Silicon isn't feasible. I really want a local development environment rather than constantly relying on cloud instances.

I need simulation software that satisfies these core requirements:

- High-Quality Graphics: Clean rendering, realistic physics-based lighting, and solid sensor noise modeling for computer vision/DL perception models.
- Robust RL/DL Support: Seamless integration with Python ML ecosystems (like PyTorch, Stable-Baselines3, or JAX), OpenAI Gym/Gymnasium wrappers, and fast parallel simulation stepping.
- Apple Silicon friendly: Runs natively or optimized on macOS, making good use of the M4 chip and unified memory architecture without hitting x86_64 or CUDA bottlenecks.

What are the best alternatives for this exact setup?

I’ve looked into MuJoCo (especially with its native macOS build and the JAX-based MuJoCo XLA / MJX for acceleration, though I'm curious how well XLA handles Apple Silicon for parallel envs). I've also considered Unity with ML-Agents, which utilizes Apple's Metal API for incredible graphics and handles RL workflows beautifully on Mac.

Has anyone successfully built a high-graphics RL/DL robotics pipeline on an M4 Mac? Which simulator did you choose, and what did your Python bridge look like?

submitted by /u/Risheyyy
Understanding PyTorch better and moving forward from papers [D](reddit.com)
I'm moving into my final year of engineering. I'm panicking and scared of everything, but I'm confident in myself. I can read papers, understand the code, go through the architectures, and picture them at scale (in my head). I struggle to interpret all the dimensions and the coupled helper functions, but I somehow get by after spending an abnormal amount of time on it.

I don't know what I should be doing next. I aspire to combine encoders for vision, audio, and of course text to build a model, but I don't see how that happens overnight. I want to know what you experienced folks did after reading papers. It makes me curious about the implications and applications, and about how real researchers build on top of them. Somewhat like The Big Bang Theory, where all the scientists just discuss ideas, I wish to reach out to researchers too. Please leave any suggestions on what would help me stand out among all these AI proposals.

submitted by /u/EnchantedHawk
Are privacy-preserving techniques actually being used in production ML systems? [D](reddit.com)
I've been reading more about privacy-preserving ML approaches such as differential privacy, federated learning, and on-device inference. The research literature is fairly active, but I'm curious about real-world adoption.

For those working in industry:

- Are these techniques being deployed in production?
- What were the biggest engineering challenges?
- Did privacy requirements significantly impact model performance or infrastructure costs?
- Are there specific use cases where privacy-preserving approaches have proven especially valuable?

Interested in hearing both success stories and cases where the tradeoffs made adoption difficult.

submitted by /u/Electrical_Mine1912
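As a concrete anchor for the differential privacy part of the question, here is a toy sketch of the Laplace mechanism applied to a counting query: the true count plus Laplace noise with scale 1/epsilon, since adding or removing one record changes a count by at most 1. The function names and the data are hypothetical. This is an illustration of the accuracy/privacy tradeoff only, not production-grade code. Real deployments would rely on a vetted DP library, because naive floating-point noise sampling has known weaknesses.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(values, predicate, epsilon, rng):
    """Epsilon-DP count via the Laplace mechanism (sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # seeded only to make the demo repeatable
ages = [23, 35, 41, 29, 52, 38, 61, 27]  # made-up records

# Smaller epsilon means stronger privacy and noisier answers.
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda age: age >= 35, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f} (true count is 5)")
```

The loop makes the performance question in the post tangible. At small epsilon, the noise can swamp a count of 5, which is one reason these techniques tend to be easier to justify on large aggregates than on small cohorts.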
People used to praise computers over the human brain; now it is the reverse [D](reddit.com)
For the past few decades, people praised computers for being faster than the human brain: they can calculate and solve complex problems that the human brain never could. Then AI came in, and everybody thought it was the end of the human race. Until the context and memory problem hit!

Now we don’t have a single architecture or method to preserve memory, which a human brain can do easily (or with difficulty, depending on perspective). People are trying to solve the memory problem and end up creating another type of RAG, whereas the human brain collects only the context relevant to the problem and doesn’t hallucinate.

I mean, this is what I think the major issue currently is, and where humans win (no idea about the future). Do you have anything in mind where humans are far ahead?

submitted by /u/intellinker