64th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey there, Daniel here.
I’m an A.I. & Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
NVIDIA’s new cuML (CUDA ML) framework enables Scikit-Learn to run on NVIDIA GPUs. It comes built into Google Colab and offers a “no code change” setup. See the graphic below for an example of a 92x speedup using the RandomForestClassifier model, with all code available in an example notebook.
NVIDIA’s cuML speeds up Scikit-Learn on various tasks by up to 92x (potentially higher depending on the task).
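Here’s a minimal sketch of what the “no code change” setup looks like in a notebook, assuming an NVIDIA GPU runtime (e.g. in Google Colab); the dataset and model below are just placeholders:

```python
# Load cuML's accelerator extension, then use scikit-learn as normal.
# Supported estimators get dispatched to the GPU behind the scenes.
%load_ext cuml.accel

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=42)

# Same scikit-learn API as always, no code changes required.
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
print(model.score(X, y))
```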
Intel’s AutoRound framework helps to quantize (make smaller) LLMs and VLMs without losing large amounts of accuracy. Models quantized with AutoRound can retain 99%+ of their original accuracy whilst requiring 2-3x less memory. AutoRound is capable of running on a single GPU (e.g. A100 80G) in a couple of minutes (for smaller models) and a couple of hours (for larger models).
Retained performance (average across 13 tasks) of quantized models versus their original 16-bit implementations, as well as how long each model took to quantize under AutoRound’s “best”, “default” and “light” tuning settings.
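Here’s a hedged sketch of what quantizing a model with AutoRound can look like (the model choice and exact keyword arguments are illustrative, so check the AutoRound docs for the current API):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Example model only, swap in the model you want to quantize.
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weights with a group size of 128 is a common quantization recipe.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()

# Save the quantized model for later inference.
autoround.save_quantized("./Qwen2.5-7B-Instruct-int4")
```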
12 factor agents is the AI Agent version of the twelve-factor app. If you’re looking to build AI Agents, you should read through these twelve steps. The author, Dex, breaks Agents down into bite-sized components with the spirit of maximizing experimentation whilst retaining control.
My favourite is Factor 2: own your prompts. If prompts are your main entry point to an LLM, why wouldn’t you treat them like first-class code?
Factor 2 of 12: Own your prompts. If prompts are one of the main ways your application interacts with an LLM or AI model, they should be treated as first-class code. Source: 12 factor agents.
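Here’s a toy example of the “own your prompts” idea (my own sketch, not code from the 12 factor agents repo): keep prompts as version-controlled, testable functions instead of strings scattered through your app.

```python
def summarize_reviews_prompt(app_name: str, reviews: list[str]) -> str:
    """A prompt template treated as first-class code: typed inputs,
    a docstring and the ability to be unit tested."""
    joined = "\n".join(f"- {review}" for review in reviews)
    return (
        f"You are summarizing user reviews for the app '{app_name}'.\n"
        f"Reviews:\n{joined}\n"
        "Write a three-sentence summary of the main themes."
    )

# Prompts-as-functions can be tested like any other code.
assert "three-sentence" in summarize_reviews_prompt("DemoApp", ["Great app!"])
```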
Apple share how they use LLMs for App Store Review Summarization. The system combines three LLMs to create summaries tailored to a specific app.
A good case study in how multiple LLMs, each with a specialized function, can be combined to perform a task at scale.
Example of an LLM-generated review summary based on existing user reviews in the App Store. Source: Apple ML Research blog.
vLLM adds support for the Transformers backend and shares best practices for accelerating RLHF. vLLM is a serving and inference engine for LLMs. Meaning, if you want to serve and deploy your LLMs as well as have them run faster, vLLM is one of the best tools on the market. I’ve personally noticed speedups of 20x running LLMs such as Phi-4 with vLLM versus the native Hugging Face Transformers implementation. The good news is vLLM is expanding support to many more Transformers models via the --model-impl transformers flag. The vLLM team also share best practices for generating data (using a large amount of inference) for Reinforcement Learning from Human Feedback techniques.
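As a hedged sketch, the Python equivalent of the --model-impl transformers flag looks something like this (assuming a recent vLLM version and a GPU that fits the model; the model name is just an example):

```python
from vllm import LLM, SamplingParams

# Ask vLLM to run the model via the Hugging Face Transformers
# modelling code instead of a native vLLM implementation.
llm = LLM(model="microsoft/phi-4", model_impl="transformers")

params = SamplingParams(temperature=0.7, max_tokens=100)
outputs = llm.generate(["Explain vLLM in one sentence."], params)
print(outputs[0].outputs[0].text)
```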
Bespoke Labs show how to use Reinforcement Learning to improve Qwen2.5-7B-Instruct’s tool use by 23% with only 100 samples. If you’re building an AI Agent, one of the most important requirements is for the underlying LLM to be able to use tools (consider tools as structured outputs that call a certain function, e.g. “what is the weather?” → get_weather_tool).
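To make the tool idea concrete, here’s a minimal, hypothetical sketch of the pattern (names are illustrative, not from the article):

```python
# The model emits a structured tool call instead of free-form text...
model_output = {
    "tool": "get_weather",
    "arguments": {"location": "Sydney"},
}

def get_weather(location: str) -> str:
    # Stub standing in for a real weather API call.
    return f"22°C and sunny in {location}"

# ...and the application layer parses the call and runs the matching function.
tools = {"get_weather": get_weather}
result = tools[model_output["tool"]](**model_output["arguments"])
print(result)
```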
The article contains good tidbits and training recipe steps, such as filtering out responses with ultra-long outputs to prevent the model from getting stuck in recursive loops (a toy version of this filter is sketched below).
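A toy version of that filtering step might look like this (the threshold and word-based length proxy are my own assumptions):

```python
# Drop training samples whose responses are ultra-long, a common
# symptom of the model looping on itself.
MAX_WORDS = 2_048  # illustrative threshold

def keep(sample: dict) -> bool:
    return len(sample["response"].split()) <= MAX_WORDS

samples = [{"response": "a short, sensible answer"},
           {"response": "loop " * 5_000}]
filtered = [s for s in samples if keep(s)]
print(len(filtered))  # 1
```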
Bespoke Labs share a case study on how to scale up synthetic data to build a high-quality chart extraction model. Bespoke’s MiniChart-7B is a fine-tuned version of Qwen2.5-VL-7B-Instruct which is capable of performing on par with or better than models such as Gemini 1.5 Pro and Claude 3.5 at chart information extraction.
They achieved this thanks to a four-stage synthetic data curation pipeline involving 40k real-world images of charts: extract facts from the charts, generate questions about them, answer the questions, then augment the questions and regenerate answers.
Their final dataset ends up with 270k chart-question-CoT-answer (CoT = Chain of Thought, in other words, the model’s thinking steps outlined line by line) tuples (13k images, 91k curated unique QA pairs with 3 CoT traces each). This is a really cool example of how targeted synthetic data generation can get you outstanding results with a smaller, open model (the pipeline is sketched after the image below).
Example of a training data question and answer pair for creating Bespoke-MiniChart-7B. The chart image has a question related to the information contained within it and the associated answer. The text output of the model shows thinking steps as well as the final answer. These pairs of samples, chart, question and thinking trace are used to fine-tune the model. Source: Bespoke Labs blog.
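Here’s a runnable toy sketch of the four-stage pipeline described above (all functions are illustrative stubs, not Bespoke Labs’ actual code):

```python
def extract_facts(chart_image: str) -> list[str]:
    # Stage 1: a VLM reads raw facts off the chart.
    return [f"fact extracted from {chart_image}"]

def generate_questions(facts: list[str]) -> list[str]:
    # Stage 2: an LLM writes questions grounded in those facts.
    return [f"What does the chart say about: {fact}?" for fact in facts]

def answer_with_cot(question: str, facts: list[str]) -> str:
    # Stage 3: answer each question with a chain-of-thought trace.
    return f"Thinking: {facts[0]} -> Answer: ..."

def augment(question: str) -> str:
    # Stage 4: rephrase/augment the question, then regenerate the answer.
    return question.replace("What does", "Based on the chart, what does")

dataset = []
for image in ["chart_001.png"]:  # the article uses 40k real chart images
    facts = extract_facts(image)
    for question in map(augment, generate_questions(facts)):
        dataset.append((image, question, answer_with_cot(question, facts)))

print(dataset[0])
```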
Brakes on an intelligence explosion. Nathan Lambert, one of my favourite writers in the AI space and the post-training lead at the Allen Institute for AI, writes a series of inquiries about why he thinks AI 2027 (some form of Superhuman AI Researcher by 2027) won’t happen.
Points include: labs making progress on evaluations by bootstrapping similar problems; current AI being broad, not narrow, intelligence; data research being the foundation of algorithmic AI progress; and the over-optimism of RL training (the real world doesn’t have as many narrow objectives as most RL systems tend to optimize for).
There are no new ideas in AI… only new datasets. ImageNet, the Web, human preferences and verifiers: what do all of these have in common?
Jack Morris writes about how they’re all forms of datasets which led to the latest AI innovations. ImageNet led to superhuman computer vision systems, the Web (the whole internet of text) led to pretraining LLMs, human preferences led to ChatGPT (Reinforcement Learning from Human Feedback) and verifiers (such as calculators and problems with verifiable answers) led to reasoning models such as DeepSeek R1. The trend seems to be that various tricks and tips for ML models all end up at similar results (e.g. Transformers and CNNs perform on par on computer vision tasks); the main thing that drives significant progress is a high-quality dataset.
SAM 3 (Segment Anything 3) announced as ‘coming soon’ at LlamaCon 2025. Source: LlamaCon 2025 livestream.
Perception Encoder: A state-of-the-art language-aligned vision encoder which performs better than or on par with SigLIP2. Perception Encoder comes in three flavours: Core, Language-aligned (e.g. for use with a VLM) and Spatial-aligned (e.g. for use with object detection/segmentation).
Perception LM: An open data, open training VLM which performs on par with Qwen2.5-VL-7B, perfect for those looking to see how modern VLMs are trained.
Locate 3D: An open-vocabulary detection model capable of detecting items in 3D space. For example, you can query “bicycle” and the model will find where in a 3D scene a bicycle is located.
Byte Latent Transformer: A tokenizer-free language model that operates on raw bytes rather than tokens. It’s the first byte-level model of its kind to match the performance of token-based language models at scale.
Example of Locate 3D working on a 3D point cloud of a room, detecting the natural language query “bicycle”. Source: Locate 3D website.
You can add cues such as (laughs) and (sighs) in between the speech and the model will take those into account.

Example of selecting an object in an image (using a segmentation model) and then having the DAM model describe what’s there. Source: DAM demo.
What a massive month for the ML world in April!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.