📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
One-line data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Always know what to expect from your data.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines.
lakeFS - Data version control for your data lake | Git for data
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Feathr – A scalable, unified data and AI engineering platform for enterprise
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Data quality assessment and metadata reporting for data frames and database tables
Automatically find issues in image datasets and practice data-centric computer vision.