The use of AI in financial services continues to grow. Particularly now, amid the global COVID-19 pandemic, industry take-up is increasing and use cases are expanding out of the back office and into customer-facing applications.

This presentation from UK Finance and EY seeks to advance the thinking on how financial services firms can implement a framework that supports explainable artificial intelligence (AI), building trust among consumers, shareholders and other stakeholders and helping to ensure compliance with emerging regulatory and ethical norms.

This expansion brings many opportunities for industry to improve efficiency, better manage risk and provide exciting new products and services to customers.

However, to take full advantage of this opportunity, there needs to be trust.

As with all innovations, ethical considerations must keep pace with technological development. Building trust requires transparency and communication, and this is a topic of growing regulatory and government interest in many countries. Transparency and communication with customers have long been key considerations for financial services, but AI will require new approaches and techniques if explanations are to be meaningful.

Effective explanations will also require a degree of subtlety; given the huge potential range of use cases, close attention to the context of each will be key.

Databricks shared this video from the Data & AI Summit Europe on enabling historical analysis of Salesforce data.

Fivetran makes it easy to automate data ingestion particularly for operational data sources such as Salesforce, Zendesk, and Oracle Eloqua, no matter how source schemas and APIs change.

Historical analysis is cumbersome, time-consuming, and costly to build and maintain manually. A common approach is to take periodic snapshots, which only capture the state of the data at the moment each snapshot is taken. Plus, the additional storage requirements can become unwieldy to manage.

A Type 2 Slowly Changing Dimension (SCD) instead lets you track every change and reconstruct the state of a record at any point in time.

This session shows how Fivetran History Mode, which uses Type 2 SCD, can be configured, switched on with one click, and synchronized for a desired time period. This accelerates time to insight by automating both data ingestion and historical analysis.
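
For readers new to the pattern, here is a minimal sketch of Type 2 SCD bookkeeping using Delta Lake’s MERGE API; it is not Fivetran’s implementation, and the table and column names (staging_customer, dim_customer, customer_id, plan, is_current, valid_from, valid_to) are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    updates = spark.table("staging_customer")        # latest extract
    dim = DeltaTable.forName(spark, "dim_customer")  # Type 2 dimension table

    # Step 1: expire the current version of any customer whose attributes changed.
    (dim.alias("t")
        .merge(updates.alias("s"),
               "t.customer_id = s.customer_id AND t.is_current = true")
        .whenMatchedUpdate(
            condition="t.plan <> s.plan",
            set={"is_current": "false", "valid_to": "current_timestamp()"})
        .execute())

    # Step 2: append the incoming records as the new current versions. A full
    # implementation would restrict this to rows that are new or changed.
    (updates
        .withColumn("is_current", F.lit(True))
        .withColumn("valid_from", F.current_timestamp())
        .withColumn("valid_to", F.lit(None).cast("timestamp"))
        .write.format("delta").mode("append").saveAsTable("dim_customer"))

History Mode performs this kind of bookkeeping automatically during each sync, which is what removes the manual build-and-maintain burden described above.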

It’s very easy to be distracted by the latest and greatest approaches in technology, but sometimes there’s a reason older approaches stand the test of time.

Kimball-style star schemas aren’t going anywhere, but as we move towards the “Data Lakehouse” paradigm, how appropriate is this modelling technique, and how can we harness the Delta Engine and Spark 3.0 to maximize its performance?

This session looks at the historical problems of attempting to build star schemas in a data lake and steps through a series of technical examples that use features such as the Delta file format, Dynamic Partition Pruning and Adaptive Query Execution to tackle these problems.
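
To make the mechanics concrete, below is a minimal sketch (not taken from the session) of a star-schema join over Delta tables with Adaptive Query Execution and Dynamic Partition Pruning enabled in Spark 3.0; the paths, table and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             # AQE is off by default in Spark 3.0; Dynamic Partition Pruning is on
             # by default but is shown explicitly here for clarity.
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
             .getOrCreate())

    # Fact table partitioned by date_key; the dimension is small enough to broadcast.
    fact = spark.read.format("delta").load("/mnt/lake/fact_sales")
    dim_date = spark.read.format("delta").load("/mnt/lake/dim_date")

    # A selective filter on the dimension lets DPP prune fact partitions at runtime,
    # while AQE can coalesce shuffle partitions and rewrite the join plan.
    result = (fact.join(dim_date, "date_key")
                  .where(dim_date["fiscal_quarter"] == "2020-Q4")
                  .groupBy("store_key")
                  .sum("sales_amount"))

    result.explain()  # look for a dynamic pruning expression in the partition filters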

Amidst rapidly changing conditions, many companies build ETL pipelines using an ad-hoc strategy.

However, this approach makes automated testing for data reliability almost impossible and leads to ineffective and time-consuming manual ETL monitoring.

Software engineering decouples code dependencies, enables automated testing, and empowers engineers to design, deploy, and serve reliable data in a modular manner.

As a consequence, the organization is able to easily reuse and maintain its ETL code base and, therefore, scale.

In this presentation, we discuss the challenges data engineers face when it comes to data reliability. Furthermore, we demonstrate how software engineering best practices help to build code modularity and automated testing for modern data engineering pipelines.
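
As a rough illustration of the idea (not code from the presentation), the sketch below factors a transformation into a pure function that an ordinary pytest unit test can cover; the function, table and column names are hypothetical.

    from pyspark.sql import SparkSession, DataFrame, functions as F

    def deduplicate_orders(orders: DataFrame) -> DataFrame:
        """Pure, reusable transformation: keep the latest record per order_id."""
        latest = (orders.groupBy("order_id")
                        .agg(F.max("updated_at").alias("updated_at")))
        return orders.join(latest, ["order_id", "updated_at"])

    def test_deduplicate_orders():
        # Runs locally with synthetic data, independent of production sources.
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        df = spark.createDataFrame(
            [(1, "2021-01-01"), (1, "2021-01-02"), (2, "2021-01-01")],
            ["order_id", "updated_at"])
        assert deduplicate_orders(df).count() == 2

Because the transformation takes and returns DataFrames rather than reading and writing fixed locations, it can be reused across pipelines and exercised in CI without touching production data.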

Community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements in the Spark 3.0 release.

Companies choose to run Spark on Kubernetes to use a single cloud-agnostic technology across their entire stack, and to benefit from improved isolation and resource sharing for concurrent workloads.

In this talk, the founders of Data Mechanics, a serverless Spark platform powered by Kubernetes, will show how to easily get started with Spark on Kubernetes.

We will go through an end-to-end example of building, deploying and maintaining a data pipeline. This will be a code-heavy session with many tips to help beginner and intermediate Spark developers be successful with Spark on Kubernetes, and live demos running on the Data Mechanics platform.

Included topics:
– Setting up your environment (data access, node pools)
– Sizing your applications (pod sizes, dynamic allocation)
– Boosting your performance through critical disk and I/O optimizations
– Monitoring your application logs and metrics for debugging and reporting
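
As a starting point, and not a substitute for the demos, here is a minimal sketch of a Spark 3.0 session configured to run on Kubernetes; the API server address, container image, namespace and sizing values are hypothetical placeholders rather than Data Mechanics defaults.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("k8s://https://my-cluster.example.com:443")
             .config("spark.kubernetes.container.image", "myrepo/spark-py:3.0.1")
             .config("spark.kubernetes.namespace", "spark-jobs")
             # Pod sizing: leave headroom on each node for the kubelet and daemons.
             .config("spark.executor.cores", "3")
             .config("spark.executor.memory", "7g")
             # Dynamic allocation without an external shuffle service (Spark 3.0+).
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
             .config("spark.dynamicAllocation.maxExecutors", "20")
             .getOrCreate())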

The health emergency underway worldwide has highlighted the need to strengthen the monitoring and care of patients at home, in order to avoid hospital overcrowding.

X-RAIS is an AI tool that acts as a third eye, supporting radiologists during the reporting of radiological images.

Within this context, we extended X-RAIS capabilities with ALFABETO (ALl FAster BEtter TOgether).

ALFABETO’s main objective is to assist healthcare personnel in the initial triage phase at the patient’s home. Using instrumental data, anamnestic data and other inputs, ALFABETO carries out an objective evaluation of the severity of the pathology and a predictive analysis of its possible evolution in the short to medium term, thus providing the essential elements for deciding which care strategy to implement (home care vs. hospitalization).

This talk from the Databricks YouTube channel is about date-time processing in Spark 3.0, its API, and the implementation changes made since Spark 2.4.

In particular, it covers the following topics:

  1. Definition and internal representation of dates/timestamps in Spark SQL. Comparisons of Spark 3.0 date-time API with previous versions and other DBMS.
  2. Date/timestamp functions of Spark SQL. Nuances of behavior and details of implementation. Use cases and corner cases of the date-time API.
  3. Migration from the hybrid calendar (Julian and Gregorian calendars) to Proleptic Gregorian calendar in Spark 3.0.
  4. Parsing of date/timestamp strings, saving and loading date/time data via Spark’s datasources.
  5. Support of Java 8 time API in Spark 3.0.
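
As a small, hedged example of the API under discussion (the column names and format strings are illustrative, not taken from the talk):

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             # Expose java.time types, which use the Proleptic Gregorian calendar.
             .config("spark.sql.datetime.java8API.enabled", "true")
             .config("spark.sql.session.timeZone", "UTC")
             .getOrCreate())

    df = spark.createDataFrame([("2020-11-17 10:30:00",)], ["ts_string"])

    result = (df
        .withColumn("ts", F.to_timestamp("ts_string", "yyyy-MM-dd HH:mm:ss"))
        .withColumn("d", F.to_date("ts"))
        .withColumn("day", F.date_trunc("day", "ts"))
        .withColumn("made", F.expr("make_date(2020, 11, 17)")))  # new in Spark 3.0

    result.show(truncate=False)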

One of the most significant benefits provided by Databricks Delta is the ability to use z-ordering and dynamic file pruning to significantly reduce the amount of data that is retrieved from blob storage and therefore drastically improve query times.

Taking advantage of this approach over petabytes of geospatial data requires specific techniques, both in how the data is generated, and in designing the SQL queries to ensure that dynamic file pruning is included in the query plan.

This presentation demonstrates these optimizations on real-world data, showing the pitfalls of the current implementation and the workarounds required, as well as the spectacular query performance that can be achieved when it works correctly.
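
For orientation, the following sketch shows the general shape of these two optimizations (Z-ordering a Delta table, then joining with a selective dimension filter so that file pruning can apply); the table and column names are hypothetical, and the geospatial-specific techniques are what the presentation itself covers.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Cluster the Delta table's data files by the columns used in query predicates.
    spark.sql("OPTIMIZE events ZORDER BY (latitude, longitude)")

    # Keep the selective filter on the small dimension side of the join so that
    # dynamic file pruning can skip fact-table files at runtime.
    result = spark.sql("""
        SELECT e.*
        FROM events e
        JOIN regions r ON e.region_id = r.region_id
        WHERE r.region_name = 'London'
    """)

    result.explain()  # check that the plan includes a dynamic pruning filter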

Privacy engineering is an emerging discipline within the software and data engineering domains that aims to provide methodologies, tools and techniques so that the engineered systems offer acceptable levels of privacy.

In this talk, learn about Databricks’ recent work on anonymization and privacy preserving analytics on large scale geo location datasets.

In particular, the focus is on how to scale anonymization and geospatial analytics workloads with Spark, maximizing performance by combining multi-dimensional spatial indexing with Spark in-memory computations.

In production, we have achieved improvements of more than 1,500x in geolocation anonymization and more than 10x in nearest-neighbour search on the anonymized geo datasets.
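
As a generic illustration of the kind of technique involved (not Databricks’ implementation, and with an arbitrary grid resolution and threshold), the sketch below generalizes coordinates to coarse grid cells and suppresses cells with too few distinct users:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    points = spark.table("geo_events")      # hypothetical columns: user_id, lat, lon

    K = 10        # minimum number of distinct users required to keep a cell
    CELL = 0.01   # grid resolution in degrees (roughly 1 km at mid latitudes)

    # Snap each point to a coarse grid cell: a simple stand-in for the
    # multi-dimensional spatial index mentioned above.
    cells = (points
        .withColumn("cell_lat", F.floor(F.col("lat") / CELL) * CELL)
        .withColumn("cell_lon", F.floor(F.col("lon") / CELL) * CELL))

    # Keep only cells that meet the k-anonymity-style threshold.
    anonymized = (cells
        .groupBy("cell_lat", "cell_lon")
        .agg(F.countDistinct("user_id").alias("users"))
        .where(F.col("users") >= K))

    anonymized.show()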