Welcome to a curated list of handpicked free online resources related to IT, cloud, Big Data, programming languages, and DevOps. Fresh news and a community-maintained list of links, updated daily.

Data integration vs. data ingestion: What are the differences?

Tags big-data cio data-science machine-learning

Data integration and data ingestion are two IT disciplines that are often confused with one another. Here’s how they differ and the challenges you may encounter. By Aminu Abdullahi.

Data integration combines data from different sources and transforms it into a unified view for easier access and analysis.
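As a rough illustration of "combining sources into a unified view", here is a minimal sketch. The two sources, their field names, and the `Customer` type are all hypothetical and only stand in for whatever systems a real pipeline would read from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Unified view of a customer, independent of the source system."""
    customer_id: str
    email: str
    total_spent: float


# Hypothetical source 1: a CRM export with its own field names.
crm_rows = [
    {"id": "c1", "Email": "Ada@Example.com"},
    {"id": "c2", "Email": "bob@example.com"},
]

# Hypothetical source 2: billing records keyed by customer, amounts in cents.
billing_rows = [
    {"customer": "c1", "amount_cents": 1250},
    {"customer": "c1", "amount_cents": 750},
    {"customer": "c2", "amount_cents": 300},
]


def integrate(crm, billing):
    """Combine both sources and transform them into one unified list."""
    totals = {}
    for row in billing:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount_cents"]
    return [
        Customer(
            customer_id=row["id"],
            email=row["Email"].lower(),  # normalize casing across sources
            total_spent=totals.get(row["id"], 0) / 100,  # cents -> currency units
        )
        for row in crm
    ]


unified = integrate(crm_rows, billing_rows)
```

Ingestion, by contrast, would only cover getting `crm_rows` and `billing_rows` into one place; the joining and normalizing in `integrate` is the integration step.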

Further in the article:

  • What is data integration?
  • What is data ingestion?
  • Common challenges of data integration and ingestion
  • Data integration and ingestion tools

With the increasing amount of data being produced, businesses need better ways to handle and use the information they collect. Data integration and data ingestion are essential components of a successful data strategy and help organizations make the most of their data assets. Good read!

[Read More]

8 most popular Python HTML web scraping packages with benchmarks

Tags python programming web-development app-development performance

This blog post will cover Python web scraping packages in terms of their speed, ease of use, and personal investigations. This blog post won’t cover what web scraping is and how parsers work. By Dmitriy Zub.

The article recommendations:

  • If you need to scrape data from a dynamic page that doesn’t require clicking, scrolling and similar things but still requires rendering JavaScript, try requests-html. It uses pure XPath, like lxml, and should be faster than the other two browser automation tools.

  • If you need to do complex page manipulation on the dynamic page, try to use playwright or selenium.

  • If you are scraping non-dynamic pages (not rendered via JavaScript), try selectolax over bs4, lxml or parsel. It’s a lot faster, uses less memory, and has almost identical syntax to parsel or bs4. A hidden gem I would say.

  • If you need to use XPath in your parser, try to use either lxml or parsel. parsel is built on top of lxml and translates every CSS query to XPath and can combine (chain) CSS and XPath queries. However, lxml is faster.
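To give a feel for XPath-based extraction, here is a minimal sketch. It deliberately uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup, as a stand-in for lxml or parsel. The markup and the `parse_products` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup. Real-world HTML usually needs a forgiving
# parser (lxml, parsel, selectolax) rather than the strict XML parser used here.
PAGE = """
<html>
  <body>
    <div class="product"><a href="/p/1">Keyboard</a><span class="price">49.90</span></div>
    <div class="product"><a href="/p/2">Mouse</a><span class="price">19.50</span></div>
    <div class="ad"><a href="/promo">Sale</a></div>
  </body>
</html>
"""


def parse_products(markup):
    """Select product blocks with an XPath predicate and extract their fields."""
    root = ET.fromstring(markup.strip())
    products = []
    # ".//div[@class='product']" matches every <div class="product"> descendant.
    for node in root.findall(".//div[@class='product']"):
        link = node.find("a")
        price = node.find("span[@class='price']")
        products.append(
            {"name": link.text, "url": link.get("href"), "price": float(price.text)}
        )
    return products


products = parse_products(PAGE)
```

The same path expressions should carry over to lxml's `xpath()` method, which supports full XPath 1.0; the benchmarks in the article are about those dedicated packages, not this stand-in.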

Excellent read with charts and code to complement the comparison of each package!

[Read More]

Why I selected Elixir and Phoenix as my main stack

Tags erlang programming career elixir

This is just a personal account of how I decided on my current tech stack. By Camilo.

Over the years I have tried different frameworks, mostly in PHP, like CodeIgniter (2010), ProcessWire (2014) and Laravel (2015). They helped me complete different projects with diverse complexity. They are wonderful tools. But sadly most of the jobs I managed to land were using legacy versions of PHP, and the codebase and developer experience were spartan to say the least.

I really loved the idea of LiveView and how a separate frontend framework wasn’t needed to achieve an SPA-like experience.

The author then dives into a comparison of the pros and cons of various frameworks and languages:

  • Vapor and Swift
  • Masonite and Python
  • Spring Boot and Java
  • Adonis.js and JavaScript
  • Elixir and Phoenix

The author also looked at the job market for each of the compared frameworks to assess his future job opportunities. Because Elixir is a functional language, it took him time to understand its conventions and workflows. It was like a whole new world. Interesting read!

[Read More]

2023 state of databases for serverless & edge

Tags serverless cio cloud database iot devops

There’s been massive innovation in the database and backend space for developers building applications with serverless and edge compute. There are new tools, companies, and even programming models that simplify how developers store data. By Lee Robinson.

This post will be an overview of databases that pair well with modern application and compute providers.

This comparison dives into:

  • A new programming model
  • Trends
  • Databases
    • Established serverless solutions
      • Firebase
      • MongoDB
      • MySQL
      • PostgreSQL
    • Rising serverless database solutions
      • Fauna
      • Convex
      • Grafbase
      • Neon
      • Supabase
    • Stateful backends and other solutions

While there are new companies creating serverless-first storage solutions, a new programming model is required for workloads to be compatible with serverless compute and modern runtimes. Interesting one!

[Read More]

Reducing Go execution tracer overhead with frame pointer unwinding

Tags golang programming microservices cloud performance

The Go Execution Tracer (aka runtime/trace) was designed to achieve low enough overhead to be usable on “a server in production serving live traffic”. This is achieved by writing events into per-P buffers, using RDTSC for timestamps, and encoding into a relatively efficient binary format. By Felix Geisendörfer.

Stack unwinding (aka stack walking) is part of the process for taking a stack trace. It involves iterating over all stack frames and collecting the return addresses (program counters) in each frame. It may also involve expanding this list if some of the program counters are part of inlined function calls.

CPU profile showing 94% of execution tracer overhead in gentraceback.

Source: https://blog.felixge.de/reducing-gos-execution-tracer-overhead-with-frame-pointer-unwinding/

So why is stack unwinding so expensive in Go? The short answer is because Go uses a form of asynchronous unwinding tables called gopclntab that require a relatively expensive lookup in order to traverse the stack frames. The gnarly details of this mechanism can be found in the gentraceback function. Follow the link to the full article to learn more!
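For contrast, frame pointer unwinding can be pictured as walking a linked list. The sketch below simulates it on a toy, word-addressed memory model: each frame stores the caller's frame pointer at `fp` and the return address at `fp + 1`. This layout, the addresses, and the `unwind` helper are simplifying assumptions for illustration, not the Go runtime's actual code.

```python
def unwind(memory, frame_pointer, max_depth=64):
    """Collect return addresses by following the chain of saved frame pointers.

    memory maps an address to the word stored there. A saved frame pointer
    of 0 marks the outermost frame in this toy model.
    """
    return_addresses = []
    while frame_pointer != 0 and len(return_addresses) < max_depth:
        # The return address sits right next to the saved frame pointer.
        return_addresses.append(memory[frame_pointer + 1])
        # Hop to the caller's frame: one memory load, no table lookup.
        frame_pointer = memory[frame_pointer]
    return return_addresses


# Three nested calls, innermost frame first (all values are made up).
memory = {
    0x100: 0x200, 0x101: 0x4010,  # innermost frame
    0x200: 0x300, 0x201: 0x4020,  # its caller
    0x300: 0x000, 0x301: 0x4030,  # outermost frame
}

trace = unwind(memory, 0x100)
```

Each step costs two memory reads, which is why this approach tends to be much cheaper than consulting an unwinding table for every frame.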

[Read More]

Micro frontends for Java microservices

Tags java programming microservices cloud serverless

Microservices have been quite popular in the Java ecosystem ever since Spring Boot and Spring Cloud made them easy to build and deploy. Things have gotten even easier in recent years with the proliferation of new Java frameworks built specifically for microservices: MicroProfile, Micronaut, Quarkus, and Helidon. By Matt Raible.

The tutorial contains info about:

  • Java microservices with Spring Boot and JHipster
  • A quick introduction to Module Federation
  • Why should Java developers care?
  • Micro frontends in action with JHipster
  • Micro frontend options: Angular, React, and Vue
  • Build Java microservices with Spring Boot and WebFlux
  • Switch identity providers

… and more. This tutorial explains how to use micro frontends within a Java microservices architecture. The author likes how micro frontends allow each microservice application to be self-contained and deployable, independent of the other microservices. It’s also pretty neat how JHipster generates Docker and Kubernetes configuration for you. Great read!

[Read More]

Scala: Implicit parameters when to use them?

Tags scala programming akka serverless

Implicits are one of the most feared features of the Scala programming language and for good reasons! By Julien Truffaut.

First, the concept of implicits is fairly specific to Scala. No other mainstream programming language has a similar concept. This means that new Scala developers have no patterns to rely on to use implicits correctly.

Second, the keyword implicit is overused in Scala 2 (similar to _). Therefore, it requires a lot of time and practice to distinguish between the various usages of implicits. On that point, Scala 3 has made great improvements by introducing dedicated syntax for each implicit’s use case.

The article then goes over:

  • Usages
  • The environment pattern
  • Bonus: Alternative implementations of the environment pattern
    • Class parameters
    • ThreadLocal
    • Reader
    • ZIO

The most important takeaway about implicit parameters is that values injected by the compiler should be obvious! If you need to check your imports or run a debugger to figure out which value was injected, then it is not obvious and you would be better off passing the values explicitly. Nice one!
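Python has no implicit parameters, but the trade-off behind the environment pattern can be loosely sketched by comparing an explicitly passed environment with an ambient one held in a `contextvars.ContextVar` (closest in spirit to the ThreadLocal alternative listed above). The `Env` type and all function names here are hypothetical.

```python
import contextvars
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Hypothetical environment shared by many functions."""
    user: str
    request_id: str


def audit_explicit(action, env):
    """Explicit style: the caller can see exactly which Env is used."""
    return f"[{env.request_id}] {env.user}: {action}"


# Ambient style: the value is looked up at call time, not passed in.
_current_env = contextvars.ContextVar("current_env")


def audit_ambient(action):
    env = _current_env.get()  # raises LookupError if no Env was set
    return f"[{env.request_id}] {env.user}: {action}"


def with_env(env, func, *args):
    """Run func with env installed as the ambient environment."""
    token = _current_env.set(env)
    try:
        return func(*args)
    finally:
        _current_env.reset(token)  # restore whatever was there before
```

Usage: `audit_explicit("login", Env("ada", "r1"))` and `with_env(Env("ada", "r1"), audit_ambient, "login")` produce the same string. The ambient variant saves an argument but fails at runtime when nothing was set, whereas a missing Scala implicit is reported at compile time.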

[Read More]

How Swift code runs on AWS Lambda

Tags aws swiftlang cloud serverless

A Swift binary doesn’t just run on AWS Lambda without some help. In comes the Swift AWS Lambda Runtime to abstract away all the complex interactions between the Lambda Runtime API and your code. By Kevin Hinkson.

AWS Lambda Service Architecture

Source: https://www.flew.cloud/blog/how-swift-code-runs-on-aws-lambda/

With the AWS Lambda Runtime API there is a clear separation of roles and responsibilities. For example, the part where your Swift code runs is highlighted in red in the image above (Runtime + Function), and it just needs to know how to talk to the Runtime API as described in the diagram.
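The conversation between a custom runtime and the Runtime API boils down to a polling loop: fetch the next invocation, run the function, post the result. Below is a minimal sketch of that loop, written in Python rather than Swift for brevity. The endpoint paths, the `Lambda-Runtime-Aws-Request-Id` header, and the `AWS_LAMBDA_RUNTIME_API` variable follow the AWS custom runtime documentation; the `handler`, `process_one`, and `main` names are hypothetical, and error reporting is left out.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def handler(event):
    """Hypothetical function code: return a greeting."""
    return {"message": f"Hello, {event.get('name', 'world')}!"}


def fetch_next(base_url):
    """Block until the Runtime API hands out the next invocation."""
    with urllib.request.urlopen(f"{base_url}/runtime/invocation/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        return request_id, json.loads(resp.read())


def post_response(base_url, request_id, result):
    """Send the function result back for the given invocation."""
    req = urllib.request.Request(
        f"{base_url}/runtime/invocation/{request_id}/response",
        data=json.dumps(result).encode(),
        method="POST",
    )
    urllib.request.urlopen(req).close()


def process_one(base_url, handler, fetch=fetch_next, respond=post_response):
    """Handle exactly one invocation; fetch/respond are injectable for tests."""
    request_id, event = fetch(base_url)
    respond(base_url, request_id, handler(event))
    return request_id


def main():
    # AWS_LAMBDA_RUNTIME_API is set by the Lambda service inside the sandbox.
    base_url = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/{API_VERSION}"
    while True:
        process_one(base_url, handler)
```

A production runtime would also report failures to the invocation's `/error` endpoint; the Swift AWS Lambda Runtime described in the article takes care of this plumbing so that only the function body needs to be written.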

The article further briefly describes:

  • AWS Lambda
  • AWS Lambda runtime API
  • Lambda Runtime + Swift Function
  • Runtime API Calls

Once you have read the article, it should not be very difficult to understand how the Lambda service and the Runtime API work together. The simple and elegant API provides an interface for managing how Swift code runs on Lambda. See the AWS documentation for more details on the Lambda Runtime API. Nice one!

[Read More]

Azure Confidential Computing on 4th gen Intel Xeon scalable processors with Intel TDX

Tags azure app-development infosec cloud servers

Intel TDX meets the Confidential Computing Consortium (CCC) standard for hardware-enforced memory protection not controlled by the cloud provider, all while delivering minimal performance impact with no code changes. By Mark Russinovich, Chief Technology Officer and Technical Fellow, Microsoft Azure.

The CCC defines confidential computing as the protection of data in use by performing computations in a hardware-based Trusted Execution Environment (TEE).

The article then dives into:

  • Azure and Intel enable innovative use cases
  • Intel TDX extends Azure’s existing confidential computing offerings
  • Removing trust in the hypervisor
  • Establishing trust via attestation
  • Confidential computing takes off
  • Azure’s vision for the confidential cloud

Customers use confidential computing with Intel processors to achieve higher levels of data privacy and mitigate risks associated with unauthorized access to sensitive data or intellectual property. They are leveraging innovative solutions such as data clean rooms to accelerate the development of new healthcare therapies, and privacy-preserving digital asset management solutions for the financial industry. Interesting read!

[Read More]

Using Apache Kafka to process 1 trillion inter-service messages

Tags event-driven apache apis app-development database

Cloudflare has been using Kafka in production since 2014. We have come a long way since then, and currently run 14 distinct Kafka clusters, across multiple data centers, with roughly 330 nodes. Between them, over a trillion messages have been processed over the last eight years. By Matt Boyle.

Cloudflare uses Kafka to decouple microservices and communicate the creation, change or deletion of various resources via a common data format in a fault-tolerant manner. This decoupling is one of many factors that enables Cloudflare engineering teams to work on multiple features and products concurrently.
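The idea of announcing resource changes through a common, strictly validated message format can be sketched without a Kafka cluster. In the example below, the `ResourceEvent` fields, the JSON encoding, and the `InMemoryTopic` class are all hypothetical stand-ins; they are not Cloudflare's actual schema or tooling.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_ACTIONS = {"created", "changed", "deleted"}


@dataclass(frozen=True)
class ResourceEvent:
    """Hypothetical common envelope shared by all producers and consumers."""
    resource_type: str
    resource_id: str
    action: str

    def __post_init__(self):
        # Strict schema: reject anything outside the agreed vocabulary.
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


def encode(event):
    return json.dumps(asdict(event), sort_keys=True).encode()


def decode(raw):
    return ResourceEvent(**json.loads(raw))


class InMemoryTopic:
    """Append-only log standing in for a Kafka topic (single partition)."""

    def __init__(self):
        self._log = []

    def publish(self, event):
        self._log.append(encode(event))

    def consume(self, offset=0):
        """Each consumer tracks its own offset, so readers stay independent."""
        return [decode(raw) for raw in self._log[offset:]]
```

Usage: a producer calls `topic.publish(ResourceEvent("zone", "z1", "created"))` and any number of consumers call `topic.consume(offset)` on their own schedule. Producers and consumers only share the envelope, which is what lets teams evolve their services separately.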

The article further describes:

  • Tooling
  • Connectors
  • Strict Schemas
  • Observability
  • A practical example
  • What’s next?

Making it easy for teams to observe Kafka is essential for Cloudflare’s decoupled engineering model to be successful. The company has therefore automated metrics and alert creation wherever it can, to ensure that all the engineering teams have a wealth of information available to them to respond to any issues that arise in a timely manner. Good read!

[Read More]