vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
vLLM hits GitHub Trending on the strength of its LLM inference performance: PagedAttention-based KV-cache management, continuous batching, speculative decoding, FlashAttention integration, and support for multiple quantization schemes (AWQ, GPTQ, INT4, INT8, FP8) to raise throughput and cut memory costs. Originally developed at UC Berkeley, the project is now community-driven, with contributions spanning academia and industry.
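For readers who have not tried the engine, here is a minimal offline-inference sketch using vLLM's Python API, assuming the vllm package is installed; the model name is just an example AWQ checkpoint and not something the post itself recommends.

```python
# Minimal vLLM offline-generation sketch (assumes `pip install vllm`).
from vllm import LLM, SamplingParams

prompts = [
    "Explain continuous batching in one sentence.",
    "What problem does PagedAttention solve?",
]

# Basic sampling settings for generation.
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

# Example AWQ-quantized checkpoint (placeholder); quantization="awq" tells
# vLLM to load AWQ weights instead of full-precision ones.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

# generate() schedules all prompts together, so batching happens internally.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```

The same engine can be exposed as an OpenAI-compatible HTTP server via `vllm serve <model>`, which is how most deployments consume it.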
GitHub Trending · GitHub repo