Machine Learning Concepts [D](reddit.com)
Dear folks, I have created a set of free Machine Learning videos (a work in progress). I am a data scientist with a postgraduate degree in AI/ML from IIT. To help the machine learning community with important Machine Learning concepts, I have created multiple long-form videos and organized them into topic-wise playlists. The first two playlists are "Introductory Machine Learning Concepts" and "Probability Foundations: Univariate Models". You might find them helpful; I have tried to explain the material with intuitions and derivations. For code implementations, the scikit-learn website has great content as well. In total there are 60+ topic-wise videos so far, and I think they can help people who are starting out with the concepts, working through the mathematics, or preparing for AI/ML/data job interviews. When I sat for my interviews, I was grilled on my project, but the majority of the questions about my project tested foundational concepts and their know-how. This content is FREE on YouTube and is for the benefit of the learning community. Link: https://youtube.com/@aayushsugandh4036?si=w5MKORU2fWzLRrAJ submitted by /u/Negative_War_65
What should context compression keep? I looked at how six agents handle it [D](reddit.com)
I use Claude Code, Codex CLI, OpenCode, Cline, Cursor, and Amp enough to notice a pattern in how they handle long context. They are all converging on layered, progressive compression, but they disagree on what to protect. Most protect recent user messages as a first-class asset. That makes sense: the user said it, so it is the source of truth. Most also protect tool outputs that carry state. What surprised me was how differently they treat old assistant messages. Artifacts keeps recent tool calls verbatim but drops older context aggressively. Cursor starts pruning earlier design decisions once the window gets full. Codex CLI lets the model itself decide what to keep in the summary tier. The other axis is transparency: do you tell the model it was compressed? Some systems silently replace old tool results with a placeholder, which means the model is reasoning under the illusion that it never happened. Others make it explicit: "the previous 40 tool calls are summarized below." I lean explicit because the model needs to know its own context was degraded. Verdent's agent loop uses a similar tiered approach: snip first, prune second, summarize last, with a hard red line that protects user messages, stateful tool outputs, and anything the user explicitly flagged. The tradeoff is cost vs. accuracy. Aggressive compression saves tokens but degrades the plan. Under-compression hits the window limit and causes context rot. submitted by /u/Direct_Band896
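To make the "snip first, prune second, summarize last" idea concrete, here is a minimal Python sketch. It is not taken from any of the tools named above. The `Msg` type, the word-count `tokens` function, the meaning given to each tier, and the placeholder strings are all assumptions made for illustration. The one thing it takes directly from the post is the red line: user messages, stateful tool outputs, and flagged messages are never altered, and every compression step leaves an explicit marker.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Msg:
    role: str               # "user", "assistant", or "tool"
    text: str
    stateful: bool = False  # tool output that carries state
    flagged: bool = False   # explicitly flagged by the user


def tokens(msgs: List[Msg]) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return sum(len(m.text.split()) for m in msgs)


def protected(m: Msg) -> bool:
    # The "hard red line": these messages are never altered by any tier.
    return m.role == "user" or m.stateful or m.flagged


def snip(m: Msg, limit: int) -> Msg:
    words = m.text.split()
    if len(words) <= limit:
        return m
    return replace(m, text=" ".join(words[:limit]) + " [snipped]")


def compress(msgs: List[Msg], budget: int, keep_recent: int = 2,
             snip_to: int = 20,
             summarize: Optional[Callable[[List[Msg]], str]] = None) -> List[Msg]:
    if summarize is None:
        # Placeholder; a real system would ask a model for the summary.
        summarize = lambda dropped: f"{len(dropped)} earlier messages summarized"
    cut = max(len(msgs) - keep_recent, 0)
    old, recent = list(msgs[:cut]), list(msgs[cut:])  # recent stays verbatim

    def droppable(m: Msg) -> bool:
        return m.role == "tool" and not protected(m)

    # Tier 1 (snip): truncate long, unprotected tool outputs.
    if tokens(old + recent) > budget:
        old = [snip(m, snip_to) if droppable(m) else m for m in old]
    # Tier 2 (prune): replace them with an explicit placeholder.
    if tokens(old + recent) > budget:
        old = [Msg("tool", "[tool output pruned]") if droppable(m) else m for m in old]
    # Tier 3 (summarize): collapse everything unprotected into one explicit note.
    if tokens(old + recent) > budget:
        dropped = [m for m in old if not protected(m)]
        if dropped:
            note = Msg("assistant", f"[context compressed: {summarize(dropped)}]")
            old = [note] + [m for m in old if protected(m)]
    return old + recent


history = [
    Msg("user", "fix the login bug"),
    Msg("tool", "log " * 100),
    Msg("assistant", "I will inspect auth.py first"),
    Msg("tool", "branch=main dirty=true", stateful=True),
    Msg("assistant", "done"),
    Msg("user", "thanks now add tests"),
]
light = compress(history, budget=40)  # tier 1 alone gets under budget
heavy = compress(history, budget=15)  # falls through to tier 3
```

The budget is best-effort in this sketch: because protected messages are never cut, the result can still exceed it, which is the cost side of the tradeoff described above.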
Is Symbolic Regression still a thing, given LLMs' performance? [D](reddit.com)
I've been teaching myself about Symbolic Regression (SR), which looks like a super exciting field (a great intro resource is listed below [1]). But then I was wondering: given LLMs' growing power at generating code, which is in a way very similar to Symbolic Regression (and of course at directly tackling symbolic regression tasks), are existing SR techniques dead? Happy to hear your thoughts. [1] ETH Zürich AISE: Symbolic Regression and Model Discovery - YouTube submitted by /u/omomom42
[P] Extreme imbalance: a 100K-row dataset with only 56 failures [P](reddit.com)
As the title says, my goal is to predict failure and remaining useful life (RUL) of a machine. The dataset is timestamped, and rows where the machine fails are labeled 1; there are only 56 of them. https://preview.redd.it/plbydmenmm6h1.png?width=1205&format=png&auto=webp&s=2fefe3cc2e3fe554b81c9e0b4012c5345e73ec3f From this data I'm dropping operating hours and humidity because they didn't show any correlation with machine failure. What algorithm or deep learning approach would suit this? submitted by /u/False-Seesaw-1899
Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting [R](reddit.com)
Link: https://arxiv.org/abs/2606.06158 Abstract: Adaptive video tokenisation seeks to dynamically allocate token budgets based on the underlying visual complexity of a sequence. Current continuous-regime approaches achieve this via iterative binarised searches or trained neural regressors, while discrete methods often require a full-rate decoder pass to estimate information content. We demonstrate that such computational overheads are not strictly necessary. We show that the latent space of a frozen continuous video tokeniser inherently encodes temporal redundancy that can be exploited directly: spatial positions whose latent representations change minimally between consecutive frames carry near-zero additional information. We introduce a parameter-free adaptive token allocation mechanism that applies a fixed threshold to per-position temporal-L1 differences, identifying and dropping redundant latent positions. Consequently, the compression rate emerges naturally from the input content rather than being enforced top-down: static scenes get compressed aggressively, while highly dynamic sequences retain more tokens. To reconstruct the dropped positions, we propose the Latent Inpainting Transformer (LIT), a lightweight factorised spatial-temporal attention architecture. The resulting inference pipeline is highly efficient, requiring only a single encoder pass and one LIT forward pass, eliminating the need for auxiliary routing networks. Evaluations across TokenBench and DAVIS, which are the standard benchmarks used by recent tokenisers, indicate that our framework yields meaningful, content-driven token allocation while maintaining competitive reconstruction fidelity, and delivers a 31x inference-time speedup over the continuous adaptive baseline (ElasticTok-CV) and a 2x speedup over the discrete information-theoretic baseline (InfoTok). submitted by /u/chhaya_35
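The thresholding step in the abstract can be illustrated with a small NumPy sketch. This is not the paper's code. The `(T, H, W, C)` latent layout, averaging the absolute difference over channels, comparing each frame with the one immediately before it, always keeping the first frame, and the threshold value are all assumptions made for illustration. The inpainting model (LIT) is not shown.

```python
import numpy as np


def temporal_keep_mask(latents: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean (T, H, W) mask; True means the latent position is kept.

    latents: array of shape (T, H, W, C) from a frozen tokeniser (assumed layout).
    """
    # Per-position L1 change between consecutive frames, averaged over channels.
    diff = np.abs(latents[1:] - latents[:-1]).mean(axis=-1)  # shape (T-1, H, W)
    keep = np.ones(latents.shape[:3], dtype=bool)  # frame 0 has no predecessor
    keep[1:] = diff > threshold                    # drop near-static positions
    return keep


rng = np.random.default_rng(0)
# A perfectly static clip: one frame repeated six times.
static = np.repeat(rng.normal(size=(1, 4, 4, 8)), 6, axis=0)
# A highly dynamic clip: independent noise in every frame.
dynamic = rng.normal(size=(6, 4, 4, 8))

static_rate = temporal_keep_mask(static, threshold=0.1).mean()
dynamic_rate = temporal_keep_mask(dynamic, threshold=0.1).mean()
```

With the same fixed threshold, only the first frame of the static clip survives, while the noisy clip keeps far more of its positions. That is the sense in which the compression rate comes from the content and is not set in advance.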
Anthropic walks back policy on silent nerfing for AI/ML, will notify users [N](reddit.com)
From Wired: “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.” Anthropic now says it’s changing course, and that Claude Fable 5’s safeguards for AI development will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI, it will alert them that it’s either refusing the request or rerouting the user to a less capable model. Full article: https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/ submitted by /u/goldcakes
ICMI 2026 Reviews [D](reddit.com)
Did anyone else submit to ACM ICMI 2026? The reviews were recently released, and this is my first time submitting to ICMI, so I'm not very familiar with the acceptance patterns. I submitted a long paper and received the following overall ratings: 4 (Probably Accept), 3 (Borderline), 4 (Probably Accept). The reviewer with the highest stated expertise recommended acceptance, while the borderline reviewer had some concerns about soundness but still considered it a nice contribution. For those who have submitted to or reviewed for ICMI before, how would you interpret these scores? Is a 4/3/4 generally considered competitive after rebuttal, or is it still a long shot? Would appreciate any insights from past authors or reviewers. submitted by /u/kanishq95
Looking for papers/resources on AI responses to psychological distress prompts [P](reddit.com)
Hi everyone, I’m close to completing my degree in Psychology, and I’m also a Systems Engineering student; Systems Engineering is roughly comparable to Software Engineering / Computer Science outside Latin America. Although I study engineering, I’m still at an early stage with machine learning, LLMs, AI safety, and related technical topics. My research project is mainly psychology-oriented, but I’d really appreciate recommendations or warnings from a software/technical perspective.
I’m working on a project about how AI systems respond to prompts involving psychological distress at different levels of intensity. I’m currently considering ChatGPT, Gemini, Wysa, and Replika, and I’m interested in comparing general-purpose LLMs, mental-health-oriented chatbots, and AI companions. Some aspects I’m thinking about are:
- how each system handles mental health, self-harm, crisis situations, and psychological/medical advice;
- whether responses change as the prompt becomes more intense, for example when a normal generated response is replaced by a safety protocol, moderation layer, or crisis-resource response;
- whether systems respond differently to declarative prompts versus question-based prompts, such as “I feel emotionally overwhelmed” vs. “What should someone do if they feel emotionally overwhelmed?”;
- whether responses differ when distress is explicit, indirect, ambiguous, hypothetical, or written in third person;
- whether the system provides empathy, psychoeducation, referrals, crisis resources, refusal, redirection, or a combination of these;
- how to account for technical changes over time, such as model versions, neural network weights, safety layers, moderation classifiers, system prompts, memory/retrieval features, and product-level configurations;
- whether it is methodologically valid to compare systems with very different technical architectures.
I’m not trying to evaluate these systems as therapists or test clinical effectiveness with real patients. The focus is on how they respond linguistically, procedurally, and safety-wise when confronted with psychological distress. I’d appreciate recommendations for papers, benchmarks, datasets, evaluation frameworks, or common methodological mistakes to avoid. I’m especially interested in technical issues such as reproducibility, stochastic outputs, temperature/settings, hidden safety layers, system prompts, memory, retrieval mechanisms, and product updates. Thanks in advance! submitted by /u/dakartt
Pyrecall: open source tool for detecting catastrophic forgetting during LLM fine-tuning [P](reddit.com)
Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. It snapshots skill scores before and after fine-tuning, flags regressions, and rolls back LoRA adapters by name. Fully local, no external APIs. v0.1.0, MIT, pip install pyrecall. Curious if anyone has thoughts on the benchmark design; that's the part I'm least confident about. https://github.com/Arths17/Pyrecall submitted by /u/Level_Frosting_7950
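The snapshot-and-flag idea can be sketched in a few lines of plain Python. This is not Pyrecall's API. The function name `flag_regressions`, the `tolerance` parameter, the skill names, and the scores are all hypothetical, chosen only to show what "flag regressions" could mean in practice.

```python
from typing import Dict


def flag_regressions(before: Dict[str, float], after: Dict[str, float],
                     tolerance: float = 0.02) -> Dict[str, float]:
    """Return {skill: drop} for skills whose score fell by more than tolerance."""
    flagged = {}
    for skill, old_score in before.items():
        new_score = after.get(skill)
        if new_score is None:
            continue  # skill was not re-evaluated; nothing to compare
        drop = old_score - new_score
        if drop > tolerance:
            flagged[skill] = round(drop, 4)
    return flagged


# Made-up scores, for illustration only.
before = {"arithmetic": 0.82, "summarization": 0.74, "code": 0.61}
after = {"arithmetic": 0.65, "summarization": 0.75, "code": 0.60}

regressions = flag_regressions(before, after)
should_roll_back = bool(regressions)  # e.g. trigger an adapter rollback
```

Here only `arithmetic` is flagged. The 0.01 drop on `code` stays inside the tolerance, which is the kind of threshold choice the benchmark-design question is about.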
How common are TMLR desk rejections with "not a suitable venue"? [D](reddit.com)
Submitted a short theoretical paper to TMLR and got desk-rejected with "does not meet our editorial standards or allow us to assess claims and evidence" and "not a suitable venue for this work." Is this a common outcome for first submissions? Curious what typically drives this kind of rejection: scope mismatch, insufficient experiments, or something else. Not looking to appeal, just trying to understand the bar so I don't waste time on the wrong venue next time. Has anyone else gotten this and figured out what the actual issue was? submitted by /u/observer678
Analysis of the results of the "Transforming autoencoders" architecture proposed by Hinton, for my dissertation. [r](reddit.com)
Hello everyone, tomorrow I have a meeting with my dissertation supervisor and I wanted to have a dissertation proposal ready. Initially, I moved forward with the following proposal: "Interpreting the Routing Dynamics of Capsule Networks for Explainable AI." My first approach to this topic was to study the paper "Transforming autoencoders," which is the first paper about capsule networks. Next, I searched for the state of the art on transforming autoencoders and found only 2 papers since 2011. I think I should take advantage of the work I have done so far on transforming autoencoders and write a dissertation about them. If anyone could take a look at the readme and tell me what they think, I would appreciate it. What do you think? Should I propose a different topic centered on transforming autoencoders? There isn't much scientific research on them. The professor is approachable, and if I present a good new topic, he'll let me change it! submitted by /u/Future-Persimmon5393
Routing LLMs by task verifiability: a small experiment (n=120, 3 models) inspired by Karpathy's framework [D](reddit.com)
Full disclosure: this is directional, not a paper. n=120 tasks, one internal evaluator, not peer reviewed. I work at an LLM infrastructure company. This experiment was done on my own time and is not a company claim.
Karpathy's framework classifies tasks by verifiability: can the output be mechanically checked? High-verifiability tasks like code compilation and structured JSON extraction are safer because the verifier catches errors. Low-verifiability tasks like creative writing are riskier. I wondered if high-verifiability tasks are also easier in practice. Can a weaker model do them as well as a frontier model if the verifier catches mistakes?
The setup was 120 tasks across four categories: code unit tests, structured extraction, multi-hop reasoning, and creative summarization. Three models: Claude Sonnet 4.6, GPT 5.5, and a local Mistral 3 8B via vLLM 0.6.3. Pass rate for the first two, human rating 1 to 5 for the last two. Results were messy.
Code unit tests: Sonnet 4.6 94%, GPT 5.5 91%, Mistral 3 8B 87%. With one retry Mistral 3 hit 95%. That surprised me. I expected the gap to be bigger.
Structured extraction: Sonnet 4.6 97%, GPT 5.5 94%, Mistral 3 8B 89%. With retry 96%. Also closer than I expected. But here is where it got weird. Sonnet 4.6 initially scored worse than GPT 5.5 on structured extraction, which made no sense. It turns out our JSON schema had an ambiguous nested array that confused Claude's tool use parser. Fixing the schema brought Sonnet to 98%, but I kept the original numbers in the table because the mistake is part of the story. Your verifier is only as good as your schema.
Multi-hop reasoning: Sonnet 4.6 78%, GPT 5.5 71%, Mistral 3 8B 51%. Retry didn't help. The model would hallucinate reasoning paths consistently. This is where the capability gap was real.
Creative summarization: Sonnet 4.6 4.2 out of 5, GPT 5.5 3.9 out of 5, Mistral 3 8B 3.1 out of 5. Expected.
Interpretation: high-verifiability tasks seem simpler in the sense that a weaker model plus a verifier can approach frontier performance. Low-verifiability tasks show the expected gap.
Limitations: n=120 is tiny. I would need 10x for confidence. Our verifier is just JSON Schema plus regexes. Constrained decoding might change the calculus entirely. I also didn't control for prompt length well. Any prompt over 8k tokens was excluded because Mistral 3 8B degrades near its limit, which probably skewed the sample. submitted by /u/DragonfruitAlone4497
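The "weaker model plus verifier plus one retry" pattern described above can be sketched as follows. This is not the author's harness. The `verify` checks, the record fields `name` and `date`, the stub model functions, and the escalation to a stronger model after failed retries are all assumptions made for illustration. The post itself only reports retrying the weaker model.

```python
import json
import re
from typing import Callable, Optional, Tuple

DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # hypothetical field format


def verify(raw: str) -> Optional[dict]:
    """Return the parsed record if it passes the mechanical checks, else None."""
    try:
        rec = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(rec, dict) or not isinstance(rec.get("name"), str):
        return None
    if not isinstance(rec.get("date"), str) or not DATE.match(rec["date"]):
        return None
    return rec


def route(task: str, weak: Callable[[str], str], strong: Callable[[str], str],
          retries: int = 1) -> Tuple[Optional[dict], str]:
    # Try the cheap model first; the verifier decides whether to accept.
    for _ in range(1 + retries):
        rec = verify(weak(task))
        if rec is not None:
            return rec, "weak"
    # Assumed escalation step: fall back to the stronger model.
    return verify(strong(task)), "strong"


# Stub models standing in for real API calls.
calls = {"weak": 0}


def flaky_weak(task: str) -> str:
    calls["weak"] += 1
    if calls["weak"] == 1:
        return '{"name": "Ada", "date": "12/10/1815"}'  # fails the date regex
    return '{"name": "Ada", "date": "1815-12-10"}'


def strong(task: str) -> str:
    return '{"name": "Ada", "date": "1815-12-10"}'


result, used = route("extract name and date", flaky_weak, strong)
```

In this run the first weak attempt fails the date check and the retry passes, so the stronger model is never called. The sketch also shows why the schema matters: anything `verify` does not check gets accepted silently.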