Learn how to design, develop, deploy and iterate on production-grade ML applications.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
Always know what to expect from your data.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines.
lakeFS - Data version control for your data lake | Git for data
Open standard for metadata. A single place to discover, collaborate, and get your data right.
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Feathr – A scalable, unified data and AI engineering platform for enterprise
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
re_data - fix data issues before your users and CEO discover them 😊
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
A curated, but incomplete, list of data-centric AI resources.
Data quality assessment and metadata reporting for data frames and database tables
Automatically find issues in image datasets and practice data-centric computer vision.