Transactional Writes in Spark
Covering Spark’s default and Databricks’ Transactional Write strategies used to write the result of a job to a destination and guarantee no partial results a...
Have you ever wondered how Spark uses Kerberos authentication? How and when are the credentials provided through the spark-submit --principal and --keytab options used? Th...
This post explains in detail a problem with HBase authentication in long-running Spark 2 applications and presents a workaround.
In this post I’m sharing my feedback and some preparation tips on the CRT020 - Databricks Certified Associate Developer for Apache Spark 2.4 certification ex...
Are you interested in taking the CCA175 certification? Here is my feedback on the exam structure, exam environment and practical exercises you can do to prepare...
If you are using version 4.5.2 of the HttpClient library to make HTTP requests to a backend server with SSL and SPNEGO, and the requests are unexpectedly fai...
In this post you will see how Kerberos authentication with pure Java Authentication and Authorization Service (JAAS) works and how to use the UserGroupInform...
A step-by-step guide on how to implement custom Spark Evaluators in StreamSets
A step-by-step guide on how to process change data capture (CDC) events with StreamSets, using its Oracle CDC Client and delivering to CRUD and non-CRUD dest...
Comparing standard and memory-optimized Excel generation using the Apache POI library
How to rescale a running Flink job? Rescaling is useful to better use computational resources when your application does not have the same workload at all ti...
How traditional data transfer works, what the Zero Copy optimization is, and how Kafka benefits from it when combined with the Page Cache.