// CodeIsGo.com / codeisgo.com

Pandas 2.0 and its ecosystem (Arrow, Polars, DuckDB)

Posted on March 15, 2023, Level intermediate Resource Length medium

Tags big-data data-science python programming

Data manipulation and analysis can be challenging and often involve working with large datasets. Thankfully, a widely used Python library known as Pandas has become the go-to tool for processing and manipulating data. Pandas recently reached version 2.0. This article takes a closer look at what Pandas is, why it has been so successful, and what the new version brings, including its ecosystem around Arrow, Polars, and DuckDB. By Simon Späti.

The 2.0 release makes it an excellent time to reflect on what Pandas is and why it’s successful. Further in the article:

What is Pandas
How does Pandas work?
What are the highlights of version 2.0
What changes code-wise?
What is Apache Arrow?
Why Apache Arrow?
Interoperability
When not to use Pandas
The alternatives
Polars: Riding the fast train of Rust
DuckDB: The SQL version
What about Dask?
Others: Koalas, Vaex, VertiPaq

Apache Arrow sets the open standard for exchanging data in a heterogeneous pipeline, where different steps need to read and share the same data. Overall, this article explains the benefits of using Pandas, particularly version 2.0, and the exciting changes in its ecosystem around Arrow, Polars, and DuckDB. Excellent read!

[Read More]

A deep dive into AIOps and MLOps

Posted on March 14, 2023, Level intermediate Resource Length medium

Tags big-data data-science devops cloud

Monitoring and managing a DevOps environment is complex. The volume of data generated by new distributed architectures (such as Kubernetes) makes it difficult for DevOps teams to effectively respond to customer requests. By Hicham Bouissoumer, Nicolas Giron.

The future of DevOps must therefore be based on intelligent management systems. Since humans are not equipped to handle the massive volumes of data and computing in daily operations, artificial intelligence (AI) will become the critical tool for computing, analyzing, and transforming how teams develop, deliver, deploy, and manage applications.

Further in the article:

What are Machine Learning operations?
Lifecycle of a Machine Learning model
Core elements of MLOps
What are Artificial Intelligence operations?
Core elements of AIOps
AIOps toolset
What is the difference between MLOps and AIOps?

Combined with the increasing complexity of modern application architectures, the demands of the digital economy have made IT operations much more complex. As a result, ML and AI have emerged as ways to automate manual business processes. Organizations throughout the world are increasingly turning to automation technologies to improve operational efficiency, and tech leaders are becoming more and more interested in MLOps and AIOps. Good read!

[Read More]

Building serverless Java applications with the AWS SAM CLI

Posted on March 13, 2023, Level intermediate Resource Length medium

Tags apis devops aws app-development serverless

When using Java in the serverless environment, the AWS Serverless Application Model Command Line Interface (AWS SAM CLI) offers an easier way to build and deploy AWS Lambda functions. You can either use the default AWS SAM build mechanism or tailor the build behavior to your application needs. By Mehmet Nuri Deveci, Steven Cook, and Maximilian Schellhorn.

Since Java offers a variety of plugins and tools for building your application, builders usually have custom requirements for their build setup. In addition, when targeting GraalVM or non-LTS versions of the JVM, the build behavior requires additional configuration to build a Lambda custom runtime.

In this guide you will learn:

Building Uber-Jars with AWS SAM CLI
Running the build process inside a container
Using your own base build images for creating custom runtimes
Deploying the application without building with AWS SAM

This blog post shows how to build Java applications with the AWS SAM CLI. You will learn about the default build mechanisms, how to customize the build behavior, and how to run the build process inside a container environment. Visit the GitHub repository for the example code templates referenced in the post.

[Read More]

Introduction to Web Audio API

Posted on March 12, 2023, Level intermediate Resource Length medium

Tags apis devops web-development app-development miscellaneous browsers

A critical part of WebRTC is the transmission of audio. The Web Audio API is all about processing and synthesizing audio in web applications. It allows developers to build complex audio processing and synthesis using a set of high-level JavaScript objects and functions. By Madhu Balakrishna.

The Web Audio API provides several ways to capture and play back audio in web applications. The article then details:

Capturing and playing back audio
Autoplay
Codecs
Permissions
Audio processing

The Web Audio API provides a powerful set of tools for manipulating audio, including filtering, mixing, and processing. This allows developers to create web-based audio editing tools that can record, edit, and export audio. In this blog post, the author covers the basics of audio transmission in WebRTC and the Web Audio concepts around it. There is more to explore on this topic. Nice read!

[Read More]

Deploy Apache Flink cluster on Kubernetes

Posted on March 11, 2023, Level intermediate Resource Length medium

Tags apache devops cloud data-science big-data

When it comes to deploying Apache Flink on Kubernetes, you can choose between two modes: session cluster or job cluster. A session cluster is a long-running standalone cluster that can run multiple jobs, while a job cluster deploys a dedicated cluster for each job. By Elvis David.

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.

The article gives clear advice on setting up a session cluster, which consists of:

Deployment object which specifies the JobManager
Deployment object which specifies the TaskManagers
A service object exposing the JobManager’s REST APIs

Each TaskManager can be configured with a number of processing slots, which lets it execute several tasks at the same time. Running tasks side by side in this way is what we call parallelism.
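
To make the slot arithmetic concrete: the total number of slots in a session cluster is the number of TaskManager replicas multiplied by the per-TaskManager slot count (`taskmanager.numberOfTaskSlots` in `flink-conf.yaml`). With Flink’s default slot sharing, a job needs as many slots as its highest operator parallelism. The helper names below are hypothetical; this is a sizing sketch, not part of Flink.

```python
def task_managers_needed(job_parallelism: int, slots_per_task_manager: int) -> int:
    """Number of TaskManager replicas needed to provide enough task slots.

    Assumes Flink's default slot sharing, under which a job needs as many
    slots as its highest operator parallelism.
    """
    if job_parallelism < 1 or slots_per_task_manager < 1:
        raise ValueError("parallelism and slots per TaskManager must be positive")
    # Integer ceiling division: round up to a whole TaskManager.
    return -(-job_parallelism // slots_per_task_manager)


def total_slots(task_managers: int, slots_per_task_manager: int) -> int:
    """Slots available in the cluster: replicas x taskmanager.numberOfTaskSlots."""
    return task_managers * slots_per_task_manager


if __name__ == "__main__":
    replicas = task_managers_needed(job_parallelism=8, slots_per_task_manager=3)
    print(replicas, total_slots(replicas, 3))  # 3 9
```

The result would become the `replicas` value of the TaskManager Deployment; one slot is left idle in this example because 8 does not divide evenly by 3.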

Following this guide, you create a Deployment object that instantiates the JobManager. This Deployment creates a single JobManager from the Flink 1.10.0 (Scala) container image and exposes container ports for RPC communication, the blob server, the queryable state server, and the web UI. The files needed for the deployment are included. Nice one!
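
The JobManager Deployment described above can also be sketched programmatically. The Python below builds a single-replica Deployment as a plain dict and prints it as JSON, which `kubectl apply -f -` accepts just like YAML. The image tag `flink:1.10.0-scala_2.12`, the container argument `jobmanager`, the labels, and the port numbers (6123 RPC, 6124 blob server, 6125 queryable state, 8081 web UI) are assumptions modeled on Flink’s older example manifests; check them against your Flink version and `flink-conf.yaml` before relying on them.

```python
import json
from typing import Any, Dict, Mapping

# Assumed values: verify the image tag and ports against your Flink version.
FLINK_IMAGE = "flink:1.10.0-scala_2.12"
JOBMANAGER_PORTS: Mapping[str, int] = {
    "rpc": 6123,    # JobManager RPC
    "blob": 6124,   # blob server
    "query": 6125,  # queryable state server
    "ui": 8081,     # web UI / REST API
}


def jobmanager_deployment(
    image: str = FLINK_IMAGE, ports: Mapping[str, int] = JOBMANAGER_PORTS
) -> Dict[str, Any]:
    """Build a single-replica JobManager Deployment manifest as a plain dict."""
    labels = {"app": "flink", "component": "jobmanager"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "flink-jobmanager"},
        "spec": {
            "replicas": 1,  # one JobManager for the session cluster
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "jobmanager",
                            "image": image,
                            "args": ["jobmanager"],  # assumed entrypoint argument
                            "ports": [
                                {"name": name, "containerPort": port}
                                for name, port in ports.items()
                            ],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Pipe into kubectl, e.g.: python jobmanager.py | kubectl apply -f -
    print(json.dumps(jobmanager_deployment(), indent=2))
```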

[Read More]

How to orchestrate an ETL Data Pipeline with Apache Airflow

Posted on March 10, 2023, Level intermediate Resource Length medium

Tags apache database nosql data-science python big-data

Data orchestration involves using different tools and technologies together to extract, transform, and load (ETL) data from multiple sources into a central repository. By Aviator Ifeanyichukwu.

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. A web interface helps manage the state of your workflows. Airflow is deployable in many ways, varying from a single process on your laptop to a distributed setup to support even the biggest workflows.

Data orchestration typically involves a combination of technologies such as data integration tools and data warehouses. What you will learn in this article:

How to extract data from Twitter
How to write a DAG script
How to load data into a database
How to use Airflow Operators

Apache Airflow is an orchestration tool that makes it easy to schedule and monitor data pipelines. With your knowledge of Python, you can write DAG scripts to schedule and monitor your data pipeline. The Python code for the app is also included. Good read!
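
An Airflow DAG boils down to a set of tasks plus the dependencies between them. The plain-Python sketch below mimics that idea without Airflow itself: hypothetical `extract`, `transform`, and `load` functions pass data through a shared dict, `graphlib.TopologicalSorter` (Python 3.9+) derives the run order from the dependency map, and an in-memory SQLite database stands in for the target database. The sample records are made up and replace the Twitter extraction step. In a real DAG you would wrap each function in an operator (for example `PythonOperator`) and declare the order with `extract >> transform >> load`.

```python
import sqlite3
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# Made-up records standing in for data pulled from an external API.
SAMPLE_RECORDS = [
    {"id": 1, "author": "alice", "text": "  Scheduling with Airflow  "},
    {"id": 2, "author": "bob", "text": "ETL in Python"},
]

Context = Dict[str, Any]


def extract(ctx: Context) -> None:
    """Extract: fetch raw records (hardcoded here instead of an API call)."""
    ctx["raw"] = list(SAMPLE_RECORDS)


def transform(ctx: Context) -> None:
    """Transform: trim and lowercase the text, keep only the needed fields."""
    ctx["rows"] = [(r["id"], r["author"], r["text"].strip().lower()) for r in ctx["raw"]]


def load(ctx: Context) -> None:
    """Load: upsert the rows into the target table."""
    conn: sqlite3.Connection = ctx["conn"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?)", ctx["rows"])
    conn.commit()


TASKS: Dict[str, Callable[[Context], None]] = {
    "extract": extract,
    "transform": transform,
    "load": load,
}
# Each task maps to the set of upstream tasks that must finish before it runs.
DEPENDENCIES = {"transform": {"extract"}, "load": {"transform"}}


def run_pipeline(conn: sqlite3.Connection) -> List[str]:
    """Run every task once in dependency order and return that order."""
    ctx: Context = {"conn": conn}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](ctx)
    return order


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn))
    print(conn.execute("SELECT * FROM posts ORDER BY id").fetchall())
    conn.close()
```

Airflow adds what this sketch lacks: scheduling, retries, per-task logs, and the web UI for monitoring runs.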

[Read More]

Why and how to replace end-to-end tests with synthetic monitors

Posted on March 9, 2023, Level beginner Resource Length medium

Tags programming cloud tdd miscellaneous performance agile

An older article about a potential alternative to classic end-to-end tests: synthetic monitors. A thousand tests can’t prove your software works. They can only prove it doesn’t. When your code reaches production, even the most thorough end-to-end tests can’t prevent your users from seeing that “500 - Unexpected Server Error” screen that keeps you awake at night. By Lucas da Costa.

The author describes how you can use Elastic’s Synthetic Monitors to build software that works not only on your machine but on everyone else’s, both before and after your code reaches production. Further in the article:

Why can’t end-to-end tests prove your software works and what can you do about it?
Writing your first synthetic journeys
Running Synthetics Journeys on CI
Setting up Synthetic Monitors through Kibana
Automatically updating monitors and running tests against multiple environments

Kibana is shipping many improvements to Elastic’s Synthetics, including a script recorder that will make it easier for people other than developers to create journeys. Interesting read!
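
The core of a synthetic monitor is a scripted user journey that runs on a schedule against a live environment and reports the outcome of each step. Elastic’s journeys are written in JavaScript/TypeScript with its `@elastic/synthetics` package; the Python sketch below is only a generic illustration of the idea, not Elastic’s API. The step names, URLs, and expected texts are hypothetical, and `fetch` is injectable so the journey can be exercised without network access.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Fetch = Callable[[str], Tuple[int, str]]


@dataclass
class StepResult:
    step: str
    ok: bool
    detail: str


def http_get(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return (status, body); HTTP error responses become status codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def run_journey(steps: Sequence[Tuple[str, str, str]], fetch: Fetch = http_get) -> List[StepResult]:
    """Run (name, url, expected_text) steps in order, stopping at the first failure."""
    results: List[StepResult] = []
    for name, url, expected in steps:
        try:
            status, body = fetch(url)
        except OSError as err:  # DNS failures, refused connections, timeouts
            results.append(StepResult(name, False, f"request failed: {err}"))
            break
        if status != 200:
            results.append(StepResult(name, False, f"HTTP {status}"))
            break
        if expected not in body:
            results.append(StepResult(name, False, f"missing text {expected!r}"))
            break
        results.append(StepResult(name, True, "ok"))
    return results


if __name__ == "__main__":
    journey = [
        ("home page", "https://staging.example.com/", "Welcome"),
        ("product list", "https://staging.example.com/products", "Add to cart"),
    ]
    for result in run_journey(journey):
        print(result)
```

Stopping at the first failed step mirrors a real user, who cannot continue past a broken page. The same journey can run in CI before a deploy and on a schedule in production afterwards, alerting on any failed step.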

[Read More]

API rate limiting vs. API throttling: How are they different?

Posted on March 8, 2023, Level intermediate Resource Length medium

Tags apis cloud devops management

The explosive growth of digital services and mobile devices has created new challenges for developers trying to support users with different needs and usage patterns. High user demand, limited network data plans, and user frustration all combine to create a need for API throttling. By Vyom Srivastava.

Further in the article:

What is API throttling?
What is API rate limiting?
Similarities between API rate limiting and API throttling
Differences between API throttling and rate limiting
Common mistake: Misunderstanding throttling and rate limiting
When to implement API rate limiting vs. API throttling

APIs play a critical role in modern software development, so it is important to manage their usage and performance. In the simplest form of API throttling, the throttler is part of the API server and monitors the number of API requests per second and per minute, either per user (based on authentication) or per IP address. Rate limiting, on the other hand, is the practice of limiting the number of requests that can be made to an API within a specific time period. Good read!
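
Terminology varies between vendors, but one common way to draw the line is: a rate limiter rejects requests over the limit (typically with HTTP 429 Too Many Requests), while a throttler slows requests down by delaying or queuing them. The sketch below illustrates both under that assumption; the class names, limits, and per-client keys are hypothetical, and the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Dict, Tuple

Clock = Callable[[], float]


class FixedWindowRateLimiter:
    """Rate limiting: allow at most `limit` requests per client per window, reject the rest."""

    def __init__(self, limit: int, window_seconds: float, clock: Clock = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client_id -> (window index, requests counted in that window)
        self._state: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self._state.get(client_id, (window, 0))
        if stored_window != window:  # a new window started: reset the counter
            count = 0
        if count >= self.limit:
            return False  # the API would answer with HTTP 429
        self._state[client_id] = (window, count + 1)
        return True


class SpacingThrottler:
    """Throttling: never reject, but space each client's requests 1/rate seconds apart."""

    def __init__(self, max_per_second: float, clock: Clock = time.monotonic) -> None:
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self._next_free: Dict[str, float] = {}  # client_id -> earliest start of next request

    def delay_for(self, client_id: str) -> float:
        """Reserve the client's next slot and return how long to wait before serving it."""
        now = self.clock()
        start = max(now, self._next_free.get(client_id, now))
        self._next_free[client_id] = start + self.interval
        return start - now


if __name__ == "__main__":
    frozen_clock = lambda: 0.0  # fixed time so the output is deterministic
    limiter = FixedWindowRateLimiter(limit=2, window_seconds=60, clock=frozen_clock)
    print([limiter.allow("client-a") for _ in range(3)])  # [True, True, False]
    throttler = SpacingThrottler(max_per_second=2, clock=frozen_clock)
    print([throttler.delay_for("client-a") for _ in range(3)])  # [0.0, 0.5, 1.0]
```

The fixed window is the simplest counting scheme; sliding windows or token buckets smooth out bursts at window boundaries at the cost of a little more state per client.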

[Read More]

How IBM's new supercomputer is making AI foundation models more enterprise-budget friendly

Posted on March 7, 2023, Level beginner Resource Length medium

Tags ibm cloud management

Foundation models are changing the way artificial intelligence (AI) and machine learning (ML) can be used. All that power comes at a cost, though, as building AI foundation models is a resource-intensive task. By Sean Michael Kerner.

IBM announced that it has built out its own AI supercomputer to serve as the literal foundation for its foundation model–training research and development initiatives. Named Vela, it’s been designed as a cloud-native system that makes use of industry-standard hardware, including x86 silicon, Nvidia GPUs, and Ethernet-based networking.

IBM is no stranger to the world of high-performance computing (HPC) and supercomputers. One of the fastest supercomputers on the planet today is the Summit supercomputer built by IBM and currently deployed in the Oak Ridge National Laboratory.

The Vela system, however, isn’t like other supercomputer systems that IBM has built to date. For starters, the Vela system is optimized for AI and uses x86 commodity hardware, as opposed to the more exotic (and expensive) equipment typically found in HPC systems. Interesting read!

[Read More]

Write reusable code for AppSync JavaScript resolvers

Posted on March 6, 2023, Level beginner Resource Length short

Tags nodejs javascript aws frontend apis database

Learn how to share code between AppSync JS resolvers. AWS AppSync is a fully managed service that allows developers to build scalable and performant GraphQL APIs. It is also serverless, meaning that you will only pay for what you use. By Benoît Bouré.

This tutorial is straightforward and covers:

Pre-requisites and Assumptions
Let’s get started
Testing

The introduction of AppSync JavaScript resolvers was a game-changer. They let developers write code faster in a language they are more familiar with, and they make it easier to write reusable code. The article shows how to write simple helper functions that can be used in more than one resolver while still following AppSync’s rules. Code examples are included. Good read!

[Read More]