Tag: Data science
-
A serverless query engine from spare parts
Posted on May 7, 2023, Level intermediate Resource Length medium
An open-source implementation of a Data Lake with DuckDB and AWS Lambdas. In this post we will show how to build a simple end-to-end application in the cloud on a serverless infrastructure. The purpose is simple: we want to show that we can develop directly against the cloud while minimizing the cognitive overhead of designing and building infrastructure. By Ciro Greco.
Tags data-science streaming apis database serverless open-source
-
Moving beyond Google: Why ChatGPT is the search engine of the future
Posted on April 26, 2023, Level beginner Resource Length medium
I was thrilled when my school announced its new 1-to-1 technology program in my first year of teaching, a decade ago. This announcement meant that each of our students would now have a school-issued laptop in the classroom. Not only was it a welcome transition from traditional paper-based learning, but it also meant that I would be relieved from my daily tussles with the copy machine. By Zak Cohen.
Tags miscellaneous data-science big-data search learning
-
Organoid intelligence: Computing on the brain
Posted on April 25, 2023, Level beginner Resource Length short
Small spheres of neurons show promise for drug testing and computation. In parallel with recent developments in machine learning like GPT-4, a group of scientists has recently proposed the use of neural tissue itself, carefully grown to re-create the structures of the animal brain, as a computational substrate. By Michael Nolan.
Tags miscellaneous data-science big-data startups
-
Simplified data pipelines with Pulsar transformation functions
Posted on April 24, 2023, Level intermediate Resource Length medium
Pulsar transformation functions provide a low-code way to develop basic processing and routing of data using existing Pulsar features. Using functions in the cloud is a very efficient way of creating iterable workflows that can transform data, analyze source code, make platform configurations, and do many other useful jobs. As you develop a function, you will quickly realize the need for a solid foundation of utilities and formatting. By Christophe Bornet.
Tags app-development data-science apache big-data
-
Eying efficiency: This is data's opportunity
Posted on April 13, 2023, Level intermediate Resource Length long
Data is the biggest value driver for businesses, bringing positive change and enabling near-term efficiency. Maintaining an eye on the future is so important to businesses, and it's why data leadership is of rising value. Although there is currently a great deal of economic uncertainty, the data community should feel optimistic; this is the opportunity to demonstrate business value, benefit colleagues, and play a role in efficiency and sustainability initiatives. By Danielle McConville.
Tags data-science cio how-to big-data machine-learning
-
Real-time data linkage via Linked Data Event Streams
Posted on April 12, 2023, Level intermediate Resource Length long
Interchanging data in real time across domains and applications is challenging: data format incompatibility, latency and outdated data sets, quality issues, and a lack of metadata and context all get in the way. A Linked Data Event Stream (LDES) is a new data publishing approach that allows you to publish any dataset as a collection of immutable objects. The focus of an LDES is to allow clients to replicate the history of a dataset and efficiently synchronize with its latest changes. By towardsai.net.
Tags data-science streaming performance how-to big-data apache
-
Humanness in the age of AI
Posted on April 7, 2023, Level beginner Resource Length long
A path to an open and permissionless identity protocol. The Worldcoin project is initiating an open and permissionless identity protocol called World ID. It empowers individuals to verify their humanness online while maintaining their anonymity through zero-knowledge proofs. By @worldcoin.org.
Tags crypto big-data cloud cio data-science miscellaneous
-
Mastering weather predictions: AI with LSTM Deep Learning models for accurate temperature forecasts
Posted on April 1, 2023, Level beginner Resource Length long
Predicting temperature trends with advanced deep learning techniques using LSTM. Weather forecasting is one of the most important tools in the modern world, and developing a good temperature prediction model can be a huge competitive advantage for many businesses. Ambient temperature measurement is directly linked to several business areas such as agriculture, energy, trading, aviation, and many others. By Octavio Santiago.
Tags big-data data-science machine-learning app-development learning
-
From 50 ML projects, 48 made it to production within 2 weeks. How?
Posted on March 20, 2023, Level beginner Resource Length medium
Putting machine learning (ML) models in production is considered an operational challenge that is performed after all the hard work on training and optimizing the model is completed. In contrast, serverless ML starts with a minimal model, including the operational feature pipeline(s) and inference pipeline. By Jim Dowling.
Tags big-data data-science cloud cio devops
-
Pandas 2.0 and its ecosystem (Arrow, Polars, DuckDB)
Posted on March 15, 2023, Level intermediate Resource Length medium
Data manipulation and analysis can be challenging and often involve working with large datasets. Thankfully, a widely used Python library known as Pandas has become the go-to tool for processing and manipulating data. Pandas recently received an update, version 2.0. This article takes a closer look at what Pandas is, why it has been successful, and what the new version brings, including its ecosystem around Arrow, Polars, and DuckDB. By Simon Späti.
Tags big-data data-science python programming
-
A deep dive into AIOps and MLOps
Posted on March 14, 2023, Level intermediate Resource Length medium
Monitoring and managing a DevOps environment is complex. The volume of data generated by new distributed architectures (such as Kubernetes) makes it difficult for DevOps teams to effectively respond to customer requests. By Hicham Bouissoumer, Nicolas Giron.
Tags big-data data-science devops cloud
-
Deploy Apache Flink cluster on Kubernetes
Posted on March 11, 2023, Level intermediate Resource Length medium
When it comes to deploying Apache Flink on Kubernetes, you can do it in two modes: as a session cluster or as a job cluster. A session cluster is a running standalone cluster that can run multiple jobs, while a job cluster deploys a dedicated cluster for each job. By Elvis David.
Tags apache devops cloud data-science big-data