🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segmentation, and autoencoders.
Images to inference with no labeling (use foundation models to train supervised models)
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
Sample to envision intelligent apps with Microsoft's Copilot stack for AI-infused product experiences.
Segment-anything related awesome extensions/projects/repos.
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...