May 16, 2021

AI Blindspots — Part III

In the last post of the three-part series on AI Blindspots, we will discuss teamwork, culture and AI quarterbacks. Photo by Leon on Unsplash. Teamwork: One of the biggest blindspots that results in a failed AI initiative is the misconception that Data Scientists are solely responsible for the success of AI solutions. One of the most common indicators of failure for an AI initiative is misalignment between the owners of these milestones. An AI leader will inform and advise on what can be done with AI, what cannot be done with AI and what should not be done with AI. A healthy team dynamic, a conducive culture and quarterback AI leaders will invariably pave the path to successful AI initiatives.
AI Blindspots — Part II

Feeling Empowered. Shaheen Gauher, PhD. In the first part of the AI Blindspots series, we touched upon the standard risk and investment assessment to undertake before placing an order for an AI model. One of the important questions to ask AI vendors before ordering an AI model is: how was the data used to build the model acquired? AI requires data that is consistent, connected, complete and correct, a much higher bar than the traditional data requirements for dashboarding and reporting tools. The notion that using AI will fix past mistakes and bad practices is wishful thinking and harmful. Using AI responsibly requires being diligent, being aware of the AI blindspots and, above all, humility.
Question Answering with a fine-tuned BERT

Apart from the “Token Embeddings”, BERT internally also uses “Segment Embeddings” and “Position Embeddings”. Segment embeddings help BERT differentiate the question from the text. Please enter your text: The Vatican Apostolic Library, more commonly called the Vatican Library or simply the Vat, is the library of the Holy See, located in Vatican City. The Vatican Library is a research library for history, law, philosophy, science and theology. In March 2014, the Vatican Library began an initial four-year project of digitising its collection of manuscripts, to be made available online.
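As a rough sketch of how these three embedding tables combine (with toy sizes rather than BERT's real vocabulary or 768-dimensional vectors, and randomly initialized weights), BERT simply sums the three lookups element-wise per token:

```python
import numpy as np

# Toy dimensions for illustration only (BERT-base uses dim=768, 512 positions)
vocab_size, num_segments, max_pos, dim = 100, 2, 16, 8
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(vocab_size, dim))
segment_emb = rng.normal(size=(num_segments, dim))
position_emb = rng.normal(size=(max_pos, dim))

# A toy input: question tokens (segment 0) followed by context tokens (segment 1)
token_ids = np.array([5, 17, 42, 8, 99, 3])
segment_ids = np.array([0, 0, 0, 1, 1, 1])
positions = np.arange(len(token_ids))

# Each token gets one row from each table; the three rows are added element-wise
input_repr = token_emb[token_ids] + segment_emb[segment_ids] + position_emb[positions]
print(input_repr.shape)  # (6, 8)
```

In a real fine-tuned model these tables are learned; the point here is only the shape arithmetic before the encoder layers.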
How Data Answers the ‘Am I Too Old to Learn Coding’ Timeless Question

So I split the data into two groups: those who started coding before age 30 and those who started at 30 or later. Looking at the two charts, we see an increase in the proportion of Americans who started coding after age 30. We also note that Business-related disciplines come in strong at third place for those who started coding in their 30s. My third question looked into how life as a developer turned out for the late coders compared to the early coders. Types of skills acquired: Looking further at the data, I noticed that the developer types differed slightly for the early coders vs the late coders.
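The split itself is a one-liner in pandas. Here is a minimal sketch on a made-up frame (the column names `Age1stCode` and `Country` are assumptions standing in for the survey's actual schema):

```python
import pandas as pd

# A tiny stand-in for the StackOverflow survey data (column names assumed)
df = pd.DataFrame({
    "Age1stCode": [14, 35, 22, 41, 18, 33],
    "Country": ["US", "US", "KE", "DE", "US", "KE"],
})

early = df[df["Age1stCode"] < 30]   # started coding before 30
late = df[df["Age1stCode"] >= 30]   # started coding at 30 or later

print(len(early), len(late))  # 3 3
```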
Susan Maina – Medium

Exploring the StackOverflow 2020 survey dataset to tackle three questions. “…By the time I would finish school I’ll be fifty?” “You’re going to be fifty anyhow.” ― Edith Eva Eger, The Choice: Embrace the Possible. We all pass through different phases where we seek to reinvent ourselves or start something that might completely change our life’s direction. Naturally, a drastic change of such magnitude is not only mind-boggling but might require a lot of time and possibly hard work. As a woman in my thirties currently transitioning into data science, this question was especially relevant. The data: The StackOverflow survey is an annual event that targets developers…
How To Predict Customer Churn From Your Website Logs?

Lastly, we’ll extract meaningful features and select proper machine learning algorithms to predict customer churn. Note that the bold column names are related to customer churn; the others are about website logging information. First, we need to discard some columns that are not related to customer churn events, such as session logs and user names. Then, we can transform the data based on userId; there are two types of data: user information and user activities. The user information columns in our data are churn, gender, level, and locCity, which must be the same for each user.
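A minimal sketch of that per-user transformation on a toy event log (the column names and `page` values are illustrative assumptions, not the post's actual schema): constant per-user columns take their first value, while activity columns are aggregated by `userId`:

```python
import pandas as pd

# Toy website event log (schema is a made-up stand-in)
logs = pd.DataFrame({
    "userId": [1, 1, 2, 2, 2, 3],
    "page": ["Home", "Downgrade", "Home", "NextSong", "NextSong", "Cancel"],
    "gender": ["F", "F", "M", "M", "M", "F"],
})

# Per-user constants (e.g. gender) plus aggregated activity features
features = logs.groupby("userId").agg(
    gender=("gender", "first"),
    n_events=("page", "size"),
    churned=("page", lambda p: int((p == "Cancel").any())),
).reset_index()
print(features)
```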
AI Blindspots — Part I. Shared Responsibility

AI is also being touted as the elixir for all businesses, with huge initiatives starting across enterprises and millions being poured into creating ‘AI solutions’. In this series of posts, I will highlight some AI blindspots, how to spot them and how to avoid them. In Part 1 — Shared Responsibility, we will take a closer look at the discussions around Ethics and Responsibility in AI. As awareness about the adverse impacts of automatic decision-making using AI increases [3][4][5][6], there are growing concerns about its disproportionate impact on vulnerable populations [7][8]. What I propose in this post is to rephrase the discussion topic from “Responsible AI” to “Using AI responsibly”.
A Beginner’s Guide to Text Classification with Scikit-Learn

A Beginner’s Guide to Text Classification with Scikit-Learn. Photo by UliSchu on Pixabay. If you’re learning Python and would like to develop a machine learning model, then a library you should seriously consider is scikit-learn. Scikit-learn (also known as sklearn) is a machine learning library for Python that provides many unsupervised and supervised learning algorithms. In this simple guide, we’re going to create a machine learning model that will predict whether a movie review is positive or negative. This is known as binary text classification and will help us explore the scikit-learn library while building a basic machine learning model from scratch.

from sklearn.model_selection import train_test_split
train, test = train_test_split(df_review_bal, test_size=0.33, random_state=42)

Now we can set the independent and dependent variables within our train and test sets.
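A self-contained sketch of the whole idea on a four-review toy corpus (the post itself works with a larger movie-review dataset and a proper train/test split):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented corpus; real projects would hold out a test set
reviews = [
    "a wonderful, moving film", "great acting and a great story",
    "terrible plot and awful pacing", "boring, a complete waste of time",
]
labels = ["positive", "positive", "negative", "negative"]

# TF-IDF features piped into a linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)

pred = model.predict(["a great and wonderful film"])
print(pred[0])
```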
20x times faster Grid Search Cross-Validation

Grid Search CV: Grid search cross-validation is a technique to select the best machine learning model from a grid of hyperparameters. In this article, we will discuss a new algorithm, Halving Grid Search CV, that performs as well as Grid Search CV with a large decrease in time complexity. Halving Grid Search: While both Grid Search CV and Random Search CV train all the candidates (combinations from the parameter grid) on the entire data, Halving Grid Search CV and Halving Random Search CV follow a successive halving approach. (Image by Author), benchmark time constraints and performance (AUC-ROC score) for Grid Search (GS) and Halving Grid Search (HGS) cross-validation. Observing the above time numbers, for a parameter grid with 3125 combinations, Grid Search CV took 10856 seconds (~3 hrs) whereas Halving Grid Search CV took 465 seconds (~8 mins), which is approximately 23x faster. Conclusion: In this article, we have discussed an optimized variant of Grid Search CV, Halving Grid Search CV, which follows a successive halving approach to improve the time complexity.
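A minimal sketch of Halving Grid Search CV in scikit-learn (the estimator, grid, and synthetic data here are toy choices; note the API still requires the experimental enable import):

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, random_state=42)
param_grid = {"max_depth": [2, 4, 8], "n_estimators": [10, 20]}

# Successive halving: all candidates start on a small sample of the data,
# and only the best-scoring fraction survives each round with more resources
search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=42), param_grid,
    factor=2, random_state=42,
).fit(X, y)
print(search.best_params_)
```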
How to train your deep learning models in a distributed fashion.

How to train your deep learning models in a distributed fashion. But how do you do distributed training? If I have a model training in a Jupyter notebook, where do I start? Can I perform distributed training for any deep learning model? In distributed training using the data-parallel approach, the model parameters (weights and biases) can be updated in two ways. TensorFlow’s distributed training supports both centralized and decentralized training methods (more about it here); if you already have a notebook using distributed TF, you can easily import it into Azure ML. Here is the code snippet where I run a single-GPU training script on a Horovod-enabled distributed training cluster with 4 nodes.
Detecting deforestation from satellite images

The MVP: For the MVP, we decided on the following items: a model trained to acceptable performance on detecting deforestation (what counts as “acceptable performance” was determined by comparing with baselines). How. Datasets: In order to train a model that can detect deforestation from space, we need some labeled data, consisting of satellite images and labels related to the presence or absence of deforestation. Towards Detecting Deforestation [22] is a binary dataset for detecting coffee plantations in the Amazon rainforest; we’ll mostly refer to this as the coffee dataset. The images are given in two formats: the typical RGB images and TIFF files that have an additional near-infrared band.
Google AI Blog: Learning to Manipulate Deformable Objects

While the robotics research community has driven recent advances that enable robots to grasp a wide range of rigid objects, less research has been devoted to developing algorithms that can handle deformable objects. One of the challenges in deformable object manipulation is that it is difficult to specify such an object's configuration. Examples of scripted demonstrators for manipulation of 1D (cable), 2D (fabric), and 3D (bag) deformable structures in our simulator, using PyBullet. Specifying goal configurations for manipulation tasks can be particularly challenging with deformable objects. We also significantly extend prior results using Transporter Networks for manipulating deformable objects by testing on tasks with 2D and 3D deformables.
Recognizing 3D spaces without spatial labels

Training such systems to recognize 3D spaces usually involves capturing a scene using a sensor (often a 3D sensor), and then hand-labeling the spatial extents of objects in the scene, including marking their locations with 3D boxes. On average, it takes more than 20 minutes to label and draw boxes in a small indoor 3D scene. WyPR combines advances in 2D weakly supervised learning with unique properties of 3D point cloud data. Next, to obtain object bounding boxes, it leverages a novel 3D object proposal technique inspired by selective search, referred to as geometric selective search (GSS). WyPR gives models 3D spatial understanding without the need for point-level labels on training scenes, which are extremely time-consuming to produce.
Prepare data for predicting credit risk using Amazon SageMaker Data Wrangler and Amazon SageMaker Clarify

The result is an exportable data flow capturing the data preparation steps required to prepare the data for modeling. For instructions on getting started with Studio, see Onboard to Amazon SageMaker Studio or watch the video Onboard Quickly to Amazon SageMaker Studio. With Data Wrangler, switching between these tasks is as easy as adding a transform or analysis step into the data flow using the visual interface. We’re now ready to start exploring and transforming the data in our new Data Wrangler flow. Python code replicating the steps in the Data Wrangler data flow – Exporting as a Python file enables you to manually integrate the data processing steps defined in your flow into any data processing workflow.
Our Search for Demand: REX Real Estate Index

Therefore, this project aims to create a real estate index for predicting demand via home listings and sale transactions. The function f_(z_n) is a hedonic demand function of h_n determined by the home’s submarket membership. On the other hand, the model with the XGBoost hedonic demand function identified 7 submarkets. We note the following from our observations: Hedonic demand function: the use of a logistic regression hedonic demand function produces generally worse results than XGBoost, so we prefer the latter. However, there may be other hedonic demand functions that are interpretable (ones where we can understand the importance/weight of each home feature).
A Simple Outline of Reinforcement Learning

In this article, I aimed to explain what Reinforcement Learning is and cover the basics without going into too much detail. Along with Supervised Learning (SL) and Unsupervised Learning (UL), Reinforcement Learning (RL) forms the third and last major paradigm. Illustration of learning types, photo by IBM. Despite this sharp distinction, the learning methods do not need to be discriminated so strictly. In recent years, many successful AI applications have been hybrids of these three paradigms, like self-supervised learning, inverse reinforcement learning, etc. Since the next action is also required, the algorithm is named SARSA, indicating the state, action, reward, state, action sequence.
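The SARSA update the summary alludes to can be written in a few lines. This sketch uses made-up numbers for a single observed transition on a toy two-state, two-action problem:

```python
import numpy as np

# Tabular SARSA on an illustrative 2-state, 2-action problem
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9  # learning rate and discount factor

# One observed transition: (state, action, reward, next state, next action)
s, a, r, s_next, a_next = 0, 1, 1.0, 1, 0

# SARSA bootstraps from the action actually taken next, not the greedy maximum
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
print(Q[0, 1])  # 0.5
```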
Making Interactive Line Plots with Python Pandas and Altair

Making Interactive Line Plots with Python Pandas and Altair. Photo by Neven Krcmarek on Unsplash. The line plot is an essential part of data analysis. When working with time series, line plots become crucial. Trend, seasonality, and correlation are some features that can be observed on carefully generated line plots. In this article, we will create interactive line plots using two Python libraries: Pandas and Altair. Pandas provides the data and Altair makes beautiful and informative line plots.
Feature Selection — Exhaustive vs. Cherry-Picked

That doesn’t mean you should not apply exhaustive feature engineering techniques in your project. Suppose, however, that the entire machine learning pipeline is not yet ready and you must present preliminary results to other stakeholders. So, when you deal with Scenario 1, I recommend starting with the cherry-picked feature selection approach. You need to build a machine learning model that just works fine. The exhaustive approach to feature selection lets you push the model performance as far as the data allows.
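For contrast with the cherry-picked approach, an exhaustive search literally scores every feature subset. A small sketch on the iris dataset (feasible only because it has four features; the cost grows as 2^d with the number of features):

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Exhaustive search: cross-validate every non-empty feature subset
best_score, best_subset = 0.0, None
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(
            LogisticRegression(max_iter=500), X[:, list(subset)], y
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset
print(best_subset, round(best_score, 3))
```

With more features, the cherry-picked route (scoring only a handful of domain-chosen subsets) is the pragmatic starting point.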
XGBoost: A Complete Guide to Fine-Tune and Optimize your Model

XGBoost: A Complete Guide to Fine-Tune and Optimize your Model. Photo by @spacex on Unsplash. Why is XGBoost so popular? In most cases, data scientists use XGBoost with a “tree base learner”, which means that your XGBoost model is based on decision trees. The scikit-learn API allows using XGBoost in a scikit-learn-compatible way, the same way you would use any native scikit-learn model. Before going deeper into XGBoost model tuning, let’s highlight the reasons why you have to tune your model. Deep dive into XGBoost hyperparameters: a hyperparameter is a type of parameter, external to the model, set before the learning process begins.
A Hitchhiker’s Guide to Sentiment Analysis using Naive-Bayes Classifier

A Hitchhiker’s Guide to Sentiment Analysis using Naive-Bayes Classifier. Image from Unsplash. Classification lies at the heart of Machine Learning and Human Intelligence. Here the class will represent either positive (for positive sentiment) or negative (for negative sentiment). So finally we will get an input d, and our model has to learn to predict which class, ‘c’, it belongs to. If you recall, our main goal was to find the class (whether positive or negative sentiment) of a given sentence (document). Stop words: words like the, a, an, was, when, etc.
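A minimal sketch of a Naive Bayes sentiment classifier, including the stop-word removal the summary mentions (the four training documents are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["I loved this movie", "what a wonderful film",
        "I hated it", "an awful, boring film"]
labels = ["positive", "positive", "negative", "negative"]

# stop_words="english" drops filler words like "the", "a", "an", "was"
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(docs, labels)

pred = model.predict(["a wonderful movie"])
print(pred[0])
```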
How to use Datasets and DataLoader in PyTorch for custom text data

Install XGBoost and LightGBM on Apple M1 Macs

Install XGBoost and LightGBM on Apple M1 Macs. Photo by the author. In this previous article I explained how to install TensorFlow, Scikit-Learn and several other packages natively compiled for Apple M1 (arm64). Here I explain step by step how to install two of the most powerful gradient boosting packages: XGBoost and LightGBM.

conda create -n boost
conda activate boost
conda install python=3.8.8
conda install numpy scipy scikit-learn

Note that numpy and scipy are dependencies of XGBoost. Trying to install XGBoost directly from pip fails when it loads and compiles the pip version of scipy. All dependencies are already installed in native versions after Step 5.

pip install xgboost

This compiles and installs XGBoost within the environment.
Basic Text to Speech, Explained. Learn the cool technology behind Alexa…

As the title suggests, in this blog we are going to learn about text to speech (TTS) synthesis. What first comes to mind when you hear “text to speech”? The input to our model is text, which passes through several blocks and is eventually converted to audio. The embedding between the encoder and decoder is known as the latent feature. Latent features are crucial because other features, like speaker embeddings (to be explained in a coming blog), are concatenated with them and passed to the decoder.
Compression in the ImageNet Dataset

In all, it took around 4.5 hours to process the training set and around 10 minutes for the validation set. This is concerning because the size distribution in the validation set doesn’t reflect the training set. JPEG compression leverages this to save additional space by subsampling color information; in other words, it stores less color information than brightness information. Lower-quality images look worse but are considerably smaller than high-quality images. Next, let’s color the points by chroma subsampling scheme. Image space colored by chroma subsampling scheme.
Understand Differentiable Programming

With differentiable programming, we want to build a new kind of program using networks of weighted parameters, trained from examples with gradient-based optimization [2]. Under the hood, such programs can be seen as a complex composition of continuous, differentiable functions dealing with inputs and outputs. In effect, when writing a differentiable program, you’re writing a program A that builds another program B at runtime. Differentiable programming languages: although gradient-based optimization and automatic differentiation have been used for years, differentiable programming is a fairly recent idea, even if it is closely tied to these two techniques. Differentiable programming also sits at the frontier between Software Engineering and Data Science, and it wouldn’t be surprising to see more teams merging those two profiles in the future!
A Gentle Introduction to Ensemble Diversity for Machine Learning

Ensemble learning combines the predictions from multiple machine learning models for classification and regression. In this post, you will discover ensemble diversity in machine learning. Ensemble diversity is a property of a good ensemble where contributing models make different errors for the same input. Ensemble diversity, that is, the difference among the individual learners, is a fundamental issue in ensemble methods.
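One simple way to induce and then measure diversity is bagging: train each member on a different bootstrap sample, then count how often the members disagree on the same inputs. A sketch with two decision trees on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two trees trained on different bootstrap samples (bagging-style diversity)
rng = np.random.default_rng(0)
preds = []
for _ in range(2):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    preds.append(tree.predict(X_te))

# Diversity here: fraction of test points where the two members disagree
disagreement = float(np.mean(preds[0] != preds[1]))
print(round(disagreement, 3))
```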
Teaching AI how to forget at scale

With more memory space, AI systems can process information at drastically larger scales. Let’s say that we task an AI agent to navigate and find the yellow door. With Expire-Span, AI systems can gradually forget irrelevant information and continuously optimize such discrete operations in a highly efficient way. Similarly, Expire-Span helps AI keep data that’s useful for a given task and forgets the rest. As a next step in our research toward more humanlike AI systems, we’re studying how to incorporate different types of memories into neural networks.
Build BI dashboards for your Amazon SageMaker Ground Truth labels and worker metadata

This is the second in a two-part series on the Amazon SageMaker Ground Truth hierarchical labeling workflow and dashboards. Amazon SageMaker Ground Truth (Ground Truth) is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning (ML). Ground Truth reporting pipeline – a pipeline used to build BI dashboards with AWS Glue, Athena, and QuickSight to analyze and visualize the Ground Truth output data and metadata generated by the AWS Glue ETL job. The reporting pipeline is built on the Ground Truth outputs stored in Amazon S3 (referred to as the Ground Truth bucket). In this post, you learned how to generate data lakes for annotations and worker metadata from the Ground Truth output data generated in Part 1, using Ground Truth, Amazon S3, and AWS Glue.
Maximize TensorFlow performance on Amazon SageMaker endpoints for real-time inference

The business problem you want your ML model to solve determines the inferences or predictions that you want your model to generate. In this post, we describe the parameters that you can tune to maximize the performance of both CPU-based and GPU-based Amazon SageMaker real-time endpoints. SageMaker supports both real-time inference with SageMaker endpoints and offline, on-demand inference with SageMaker batch transform. Costs are calculated based on instance usage, and price/performance is calculated based on throughput and SageMaker ML instance cost per hour. For more information, see Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads, Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads, and the SageMaker inference API.
Revealing the Magic Behind t-SNE

Revealing the Magic Behind t-SNE. What you see below is a 2D representation of the MNIST dataset, containing handwritten digits between 0 and 9. t-SNE representation of the MNIST dataset. Source: Visualizing High-Dimensional Data Using t-SNE by Laurens van der Maaten and Geoffrey Hinton (https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf). In this post, we are going to dive deep into how this magic is done! t-SNE was introduced by Laurens van der Maaten and Geoffrey Hinton in their paper Visualizing High-Dimensional Data Using t-SNE. Source: Visualizing High-Dimensional Data Using t-SNE by Laurens van der Maaten and Geoffrey Hinton (https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf). Compared to other techniques, t-SNE often has “superior” performance.
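Running t-SNE itself is a one-liner with scikit-learn. This sketch uses two synthetic 50-dimensional blobs as a small stand-in for MNIST:

```python
import numpy as np
from sklearn.manifold import TSNE

# A tiny high-dimensional stand-in for MNIST: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(8, 1, (30, 50))])

# perplexity must be smaller than the number of samples
X_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(X_2d.shape)  # (60, 2)
```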
Dynamic Workflows in Flyte

Dynamic workflows in Flyte intend to solve this problem for the users. IntroductionBefore we get an essence of how a dynamic workflow works, let’s understand the two essential components of Flyte, namely task and workflow. Vaguely, it provides the flexibility to mold workflows according to the project’s needs, which may not be possible with static workflows. When to Use Dynamic Workflows? How Flyte Handles Dynamic WorkflowsFlyte can combine the typically static nature of DAGs with the dynamic nature of workflows.
Causal ML for Data Science: Deep Learning with Instrumental Variables

Causal ML for Data Science: Deep Learning with Instrumental VariablesMap of the West Bank, Palestine, showing small peripheral neighbourhoods in red and larger more central neighbourhoods in blue. A short introduction to deep learning follows, where I situate this subset of ML within econometrics to explain the necessity of benchmarking deep learning results. Importantly, assuming a constant causal effect obviates the possibility that the average treatment effect is conditional on covariates. Nonetheless, the use of deep learning in economics is a controversial subject, primarily because of the “black box” nature of deep neural networks. For an introduction, I highly recommend “Deep Learning with Python”, which is written by François Chollet, who is the creator of the popular deep learning API, Keras.
5 Tools to Maintain Your Machine Learning Projects Efficiently

One of the most challenging types of projects to test and maintain is any project containing machine learning algorithms. Luckily, existing tools can help us test, debug, and maintain our machine learning projects in less time and with minimal effort. TensorWatch is a visual debugging tool designed by Microsoft Research to aid data scientists in debugging machine learning, artificial intelligence, and deep learning applications. DVC will track the evolution of your machine learning model to ensure reproducibility and allow you to switch between different experiments. Luckily for us, there are different tools we can use to help us test, debug, and maintain machine learning applications.
Exploratory Data Analysis of Text data Including Visualization and Sentiment Analysis

Exploratory Data Analysis of Text Data, Including Visualization and Sentiment Analysis. Text data analysis is becoming easier every day. Prominent programming languages like Python and R have great libraries for text data analysis. With the more developed and improved versions of these libraries, it is possible to perform text data analysis with just simple, beginner-level coding knowledge. Preprocessing and data cleaning are a big part of data analysis. Frequency charts: it is common practice in text data analysis to chart the frequency of words.
Thirty-One Checks For Different Levels of Machine Learning Operation Maturity

Thirty-One Checks For Different Levels of Machine Learning Operation Maturity: create a checklist for your MLOps rollout. Note: The AI 2.0 movement started before 2012, but I use 2012 because that is when I started applying R to machine learning applications. Over the last five years, I have been training and deploying machine learning applications (ML applications) into production. All the MLOps checklist items are numbered sequentially and divided across five MLOps maturity levels. The maturity levels I use roughly follow the Microsoft Machine Learning Operations maturity model.
Generative Adversarial Networks (GANs) — From Intuition to Implementation

Generative Adversarial Networks (GANs): formally, GANs are defined as two neural networks contesting with each other in a game (in the sense of game theory, often but not always in the form of a zero-sum game). We then use this critique as feedback so the generator produces images close to real ones. Training the discriminator is the same as training a normal classifier: when the input is a fake image it should classify it as fake, and when the input is real (i.e., from the training data) it should classify it as real. Training the discriminator (image by author). Loss of the discriminator (image from the paper). The loss formula above pushes the discriminator to classify real output as 1 and fake/generated output as 0.
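That loss is just binary cross-entropy applied twice, with targets of 1 for real images and 0 for fakes. A numpy sketch with made-up discriminator outputs:

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, the loss the discriminator minimizes."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Discriminator outputs: probability that each input image is real
d_real = np.array([0.9, 0.8])   # predictions on real images (target 1)
d_fake = np.array([0.2, 0.1])   # predictions on generated images (target 0)

loss = bce(np.ones_like(d_real), d_real) + bce(np.zeros_like(d_fake), d_fake)
print(round(float(loss), 3))
```

Confident correct predictions drive both terms toward zero; the generator is trained with the opposite objective, pushing `d_fake` toward 1.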
A Better Comparison of TensorBoard Experiments

There Will be a Shortage Of Data Science Jobs in the Next 5 Years?

There Will be a Shortage Of Data Science Jobs in the Next 5 Years? Photo by Andrea Piacquadio from Pexels. I have been in the data science field for the last half-decade, since Python programming came into trend. People are also getting good raises in their data science jobs. On the other hand, we were trying different machine learning methods like logistic regression, random forest, gradient boosting machines, naive Bayes, and other data science techniques to build a better model. The ray of hope: with all these things about to be automated, you might be wondering about the future of data science enthusiasts.
How Artificial Intelligence Learns Regardless of Day or Night

How Artificial Intelligence Learns Regardless of Day or NightPhoto by Possessed Photography on UnsplashHave you ever wondered how you can identify something at night when you have only ever seen it in bright daylight? This is where the gods of Artificial Intelligence gave us Domain Adaptation. It does what it says: it helps understand the data of one domain (day) in another domain (night). This is where domain adaptation comes in and helps use the knowledge of the day to apply on the night database. Domain adaptation can also be used in many fields, even if there is no camera involved.
Voice classification using Deep Learning, with Python

Voice classification using Deep Learning, with Python. Sometimes humans are able to do certain kinds of things very easily, but are not able to properly describe how they do them. The setup: as the title says, I’ve used Python. In particular, I’ve used these libraries, which we will use throughout the process. Deep Learning: now that we have everything we need, let’s talk about the deep learning part. Let’s use a train-test split and shuffle the data, define the model, train the model, and evaluate the model. The model ends with a softmax.

Concepts about Positional Encoding You Might Not Know About

Photo by corina ardeleanu on Unsplash. In RNNs and LSTMs the words are fed in sequence, and hence the model understands the order of words. The only loophole here is that when we compare two sentences of different lengths, the positional embedding values for a particular index would be different. So we discard this method for our natural language processing task and go for the frequency-based method for positional encoding, as described in the original paper “Attention Is All You Need”. With this method, even as the sequence length increases, the positional encoding values remain the same. However, in the smooth sine curve below (where i=4), we see that the distance on the y-axis between word position 0 and word position 5 is very small.
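The frequency-based encoding from “Attention Is All You Need” is easy to reproduce in numpy; its values depend only on the position and dimension indices, never on the sequence length:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal encoding: PE[pos, 2i]   = sin(pos / 10000**(2i/d_model)),
                            PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))."""
    pos = np.arange(max_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe

pe = positional_encoding(max_len=6, d_model=8)
print(pe.shape)   # (6, 8)
print(pe[0, :4])  # position 0: sine terms are 0, cosine terms are 1
```

Low dimension indices give fast-oscillating curves that separate nearby positions; high indices give the smooth, slowly varying curves the post discusses.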

Build a scalable machine learning pipeline for ultra-high resolution medical images using Amazon SageMaker

It utilizes an allreduce algorithm for fast distributed training (compared with a parameter server approach) and includes multiple optimization methods to make distributed training faster. For more examples of distributed training with Horovod on SageMaker, see Multi-GPU and distributed training using Horovod in Amazon SageMaker Pipe mode and Reducing training time with Apache MXNet and Horovod on Amazon SageMaker. SageMaker Pipe modeYou can provide input to SageMaker in either File mode or Pipe mode. ConclusionIn this post, we introduced a scalable machine learning pipeline for ultra high-resolution images that uses SageMaker Processing, SageMaker Pipe mode, and Horovod. For more information about SageMaker, see Build, train, and deploy a machine learning model with Amazon SageMaker.
Writing ML code that scales

When you update your local code and push it, you overwrite all previous code: if you manually changed the directory paths last time, you have to do so again. By separating the setup from the training code, you avoid going through the deployed code over and over. You then import this method in your main training code; locally the paths point to your local folders, and on the remote machines they point to their appropriate directories. This way you won’t have to remember to change the default parameters after going from a local single-GPU machine to a multi-worker setup. Separate model creation script: separating your model creation routines from the main training code results in lean code.
What Problem Is Your Data Solving?

The evergreen popularity of careers in data science is the result of many factors, from shifts in the labor market to advances in cloud computing. Sometimes the issues data scientists face are organizational: how do you build an analytics stack and hire a data team from near-scratch? Having a seasoned data team in place doesn’t always guarantee success. For Elena Stamatelou, a major obstacle in product development is that even when huge amounts of data are available, “if we start immediately by looking at the data, we will probably get lost while trying to understand the data columns and fields and forget about the initial problem.” Read her introduction to design thinking to see how that approach informs her data science work. Let’s round out this week’s Variable with some of the best recent additions:What problems are you solving with data science?
AutoML will not replace your data science profession

Collecting data: the data scientists or data engineers should decide what type of data, and how much of it, needs to be collected. Data scientists are still needed for this step and AutoML cannot fully replace them. Data cleaning: data scientists and data engineers spend 60–70% of their time on data cleaning. Key responsibilities of data scientists: now we can figure out some of the key responsibilities of data scientists in a machine learning process. We’ve walked through the steps of a machine learning process and found the reasons why AutoML will NOT replace your data science profession.
Synthetic Data — key benefits, types, generation methods, and challenges

The AI business world depends heavily on synthetic data. In the medical and healthcare sector, synthetic data is used for testing certain conditions and cases for which real data does not exist. Synthetic data enables data professionals to use centrally recorded data while still maintaining its confidentiality. Synthetic data is broadly classified into three categories. Fully Synthetic Data — purely synthetic, containing nothing from the original data. Partially Synthetic Data — replaces only the values of some selected sensitive features with synthetic values. Hybrid Synthetic Data — generated using both real and synthetic data.
How Computers Play the Imitation Game: From Autoencoders to StyleGAN2s in Less Than 10 Minutes

How Computers Play the Imitation Game: From Autoencoders to StyleGAN2s in Less Than 10 MinutesPhoto by Wilhelm Gunkel on UnsplashIt all started in 1950 with a game played by three subjects. This post has the ambition of summarising in less than 10 minutes the development of the most recent computer techniques in playing the Imitation Game. Phase 1: get yourself a good discriminator. The combination of all those techniques gave extremely good results for high-quality images. StyleGAN’s generator transfers the features encoded by the mapping network into the generated image at each resolution level (i.e. for low-level and high-level image features).
How enterprise app user interfaces need to change to accommodate machine learning probabilities

How enterprise app user interfaces need to change to accommodate machine learning probabilitiesMy plan for how to make AI simple, not complicated, for the millions of business software users who are learning to co-exist with thinking machines and software robots. Currently, there is no way to communicate the confidence of the predicted value to the user in the majority of enterprise app interfaces. The trouble arises from the fact that the right threshold value can be wildly different from one field to another. First of all, it is now safe to write even low-confidence values into fields to assist us humans. There are moments when more information than the mere field value would make sense.
A very Bayesian interpretation of decision trees and other machine learning algorithms

A very Bayesian interpretation of decision trees and other machine learning algorithmsI remember enrolling for a course where my professor spent two lectures chewing over the math behind decision trees before disclaiming, “Class, decision tree algorithms do not use any of this.” Well, the usual ways of growing decision trees are an approximation of that Bayesian model. Consider a boolean classification problem you are required to solve using decision trees. Because I’m a good egg, I drudge on your behalf and build a garden of all possible decision trees using the training data. The aim was to understand decision trees from a Bayesian perspective and highlight how Bayesian statistics is always stealthily bustling in the background of any ML algorithm.
4 Must-Know Python Pandas Functions for Time Series Analysis

4 Must-Know Python Pandas Functions for Time Series AnalysisPhoto by Luke Chesser on UnsplashTime series data consists of data points attached to sequential time stamps. Daily sales, hourly temperature values, and second-level measurements in a chemical process are some examples of time series data. Time series data has different characteristics than ordinary tabular data. In this article, we will go over 4 Pandas functions that can be used for time series analysis. The temperature values are generated randomly using Numpy functions.
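Functions along these lines look like this (resample, shift, and rolling are assumptions about which functions the article covers; the temperature values here are fixed for reproducibility rather than generated with NumPy):

```python
import pandas as pd

# Hourly temperature readings attached to sequential time stamps.
idx = pd.date_range("2021-05-01", periods=6, freq="H")
temps = pd.Series([20.0, 21.0, 23.0, 22.0, 24.0, 25.0], index=idx)

daily_mean = temps.resample("D").mean()    # resample: aggregate hours into daily means
lagged = temps.shift(1)                    # shift: previous hour's value per row
smoothed = temps.rolling(window=3).mean()  # rolling: 3-hour moving average

print(daily_mean.iloc[0])  # 22.5
```

Each of these operates on the DatetimeIndex, which is what makes time series support in Pandas different from ordinary tabular operations.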
Intuitive Maths and Code behind Self-Attention Mechanism of Transformers

This blog post will get into the nitty-gritty details of the Attention mechanism and create an attention mechanism from scratch using Python. Attention Mechanism concept; steps involved in the Self-Attention Mechanism (intuitive mathematical theory and code): Input Pre-Processing, Role of the Query, Key, and Value matrices, Concept of Scaled Attention Scores. As discussed in the previous post, we saw what happens when a sentence passes through an attention mechanism. Next, we will look into the multi-head attention mechanism, which has its underlying principle coming from the Self-Attention Mechanism. If we denote each self-attention flow/process as one head, then we get a multi-head attention mechanism by concatenating all self-attention mechanisms together.
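A from-scratch sketch of the scaled dot-product step in plain Python (one head, toy 2-dimensional vectors; real implementations use tensor libraries and learned projection matrices for Q, K, and V):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Q, K, V: lists of row vectors, one per token."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled attention scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output token = attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
print(self_attention(Q, K, V))
```

Running several such heads in parallel and concatenating their outputs gives the multi-head attention described above.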
RetinaNet: The beauty of Focal Loss

In 2017, researchers at Facebook AI Research released a paper, “Focal Loss for Dense Object Detection”, which introduced a detector called RetinaNet. Before diving into the nitty-gritty of RetinaNet, I will discuss the concept of Focal Loss. 3 — Focal Loss (Image by author)If you notice, the negation and the log term make up the Cross-Entropy Loss, and γ represents the tunable parameter. The Focal Loss formula now becomes:Fig.4 — Modified Focal Loss (Image by author)The authors have noted (through experiments) that the Focal Loss form doesn’t need to be exact. The multi-task loss function in RetinaNet is made up of the modified focal loss for classification and a smooth L1 loss calculated upon the 4×A channelled vector yielded by the Regression Subnet.
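A minimal sketch of the focal loss for a single binary prediction (the α weighting is omitted for brevity; γ is the tunable focusing parameter mentioned above):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """p: predicted probability of the positive class, y: true label (0 or 1).

    FL(p_t) = -(1 - p_t)**gamma * log(p_t); with gamma = 0 this reduces
    to plain cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A well-classified example (p_t = 0.9) is down-weighted far more
# than a hard one (p_t = 0.1), which is the whole point of the loss.
print(focal_loss(0.9, 1))  # small loss
print(focal_loss(0.1, 1))  # large loss
```

This down-weighting of easy negatives is what lets the dense one-stage detector train without being swamped by background examples.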
Convolutional neural networks

Examples of convolutional neural networks: AlexNet is a classic medium-depth convolutional neural network. The reason why it’s drawn this way is that the net was implemented to run on two GPUs and was manually split into two parts. The input is a 224x224x3 image. The first convolutional layer is 11x11x96, and these filters are applied with a stride of 4. Pop quiz: how many parameters are there in the first convolutional layer? Remember the size of the W matrices in each of those convolutional layers: you end up with 11x11x3x96, which amounts to 34,848 parameters. As long as the weights in the convolutional layers are not too big, you might hope that this dF/dx would not be too big either.
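The pop-quiz arithmetic, spelled out (the bias count is an addition of mine; the article's 34,848 counts weights only):

```python
# AlexNet's first convolutional layer: 96 filters of size 11x11
# applied over 3 input channels.
weights = 11 * 11 * 3 * 96
biases = 96                      # one bias per filter

print(weights)           # 34848 weight parameters
print(weights + biases)  # 34944 including biases
```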
Samuele Bolotta – Medium

Machine learning algorithms try to imitate the pattern between two datasets in such a way that they can use one dataset to predict the other. Specifically, supervised machine learning is useful for taking what you know as input and quickly transforming it into what you want to know. In order to explain the basics of supervised learning, I will first start with some basic probability theory. Afterwards, I will describe the “machine learning model”. Some probability theoryLet’s define supervised learning a little bit more precisely.
Understanding Pandas Melt — pd.melt()

Example 1 — no parameters used: if we do not specify any parameters while using the pd.melt() method, it will melt all the columns with their corresponding values. The critical thing to notice here is that not specifying which columns to melt will melt all the leftover columns after specifying the column names in id_vars.
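A minimal example of the id_vars behaviour described above (the column names are made up for illustration):

```python
import pandas as pd

wide = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "math": [90, 80],
    "physics": [85, 70],
})

# Keeping "name" as the identifier melts the two leftover columns into
# variable/value pairs; calling pd.melt(wide) with no parameters would
# melt every column instead.
long = pd.melt(wide, id_vars=["name"], var_name="subject", value_name="score")
print(long)
```

The result has one row per (name, subject) pair, which is the long format most plotting and groupby operations expect.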
A Gentle Introduction to Multiple-Model Machine Learning

In this tutorial, you will discover multiple-model techniques for machine learning and their relationship to ensemble learning. Mixture of experts might be considered a true ensemble method, although hybrid machine learning models are probably not ensemble learning methods. Multiple-Model Techniques: machine learning algorithms that are composed of multiple models and combine the techniques, but might not be considered ensemble learning. Summary: in this tutorial, you discovered multiple-model techniques for machine learning and their relationship to ensemble learning.
Google AI Blog: ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Learning good visual and vision-language representations is critical to solving computer vision problems — image retrieval, image classification, video understanding — and can enable the development of tools and products that change people’s daily lives. These examples demonstrate that the ALIGN model can align images and texts with similar semantics, and that ALIGN can generalize to novel complex concepts. Image retrieval with image text queries. For instance, considerations should be made towards the potential for the use of harmful text data in alt-texts to reinforce such harms. ConclusionWe have presented a simple method of leveraging large-scale noisy image-text data to scale up visual and vision-language representation learning.
The Washington Post Launches Audio Articles Voiced by Amazon Polly

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Post subscribers live busy lives with limited time to read the news. The Post started testing other options and ended up choosing Amazon Polly because of its high-quality automated voices. For more information, see What Is Amazon Polly? and log in to the Amazon Polly console to try it out for free.
Build a cognitive search and a health knowledge graph using AWS AI services

The steps to implement the solution are as follows:Create and export Amazon HealthLake data. Create and export Amazon HealthLake dataAs a first step, create a data store using Amazon HealthLake either via the Amazon HealthLake console or the AWS Command Line Interface (AWS CLI). Connecting the output of Amazon HealthLake to Amazon Kendra and Neptune gives you the ability to build a cognitive search and a health knowledge graph to power your intelligent application. Deploy this solution using Amazon HealthLake in your AWS account by deploying the example on GitHub. Dr. Taha Kass-Hout is Director of Machine Learning and Chief Medical Officer at Amazon Web Services, and leads our Health AI strategy and efforts, including Amazon Comprehend Medical and Amazon HealthLake.
Improve the streaming transcription experience with Amazon Transcribe partial results stabilization

We’re happy to announce that Amazon Transcribe now allows you to enable and configure partial results stabilization for streaming audio transcriptions. On the other hand, a low stability level leads to more accurate transcription results, but the partial transcription results are more likely to change. Access partial results stabilization via the Amazon Transcribe console: to start using partial results stabilization on the Amazon Transcribe console, make sure you’re in a Region that supports Amazon Transcribe Streaming. Medium – provides partial transcription results that balance stability and accuracy. Low – provides relatively less stable partial transcription results with higher accuracy compared to the High and Medium settings. To learn more about the Amazon Transcribe Streaming Transcription API, check out Using Amazon Transcribe streaming With HTTP/2 and Using Amazon Transcribe streaming with WebSockets.
Comprehensive Study of Least Square Estimation (Part 3)

Comprehensive Study of Least Square Estimation (Part 3)Photo by Wil Stewart on UnsplashThis is the third part of the series of articles written in “Comprehensive Study of Least Square Estimation”. Constrained Least Square: the constrained LS problem tries to find the solution to the LS problem with linear constraints. A is an m x n matrix and C is a p x n matrix. Let’s take a look at the dimensions: A is m x n and C is p x n matrices, d is a p x 1 vector, and b is an m x 1 vector. Therefore the KKT matrix is a square (p + n) x (p + n) matrix.
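For reference, the constrained problem and its KKT system can be written as follows (this reconstruction assumes the usual formulation, consistent with the dimensions above: minimize the squared residual subject to the linear constraint):

```latex
\min_x \; \|Ax - b\|^2 \quad \text{subject to} \quad Cx = d
```

The optimality conditions then form the square $(p+n) \times (p+n)$ KKT system in the unknowns $x$ and the Lagrange multipliers $\lambda$:

```latex
\begin{bmatrix} 2A^{\top}A & C^{\top} \\ C & 0 \end{bmatrix}
\begin{bmatrix} x \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 2A^{\top}b \\ d \end{bmatrix}
```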
Google Releases MLP-Mixer: An All-MLP Architecture for Vision

Google Releases MLP-Mixer: An All-MLP Architecture for VisionPhoto by AbsolutVision on UnsplashImage processing is one of the most interesting subareas in machine learning. The proposed MLP-Mixer doesn’t use any convolutions or any self-attention layers, and yet achieves almost SOTA results, which is quite thought-provoking. MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. After this is done, a “table” is constructed with the values of the image patches against the hidden dimension values. One important point to note here is that the size of the hidden representation of the non-overlapping patches is independent of the number of input patches.
How To Train Keras Models Using the Genetic Algorithm with PyGAD

This tutorial discusses how to train Keras models using PyGAD.

import pygad.kerasga
keras_ga = pygad.kerasga.KerasGA(model=model, num_solutions=10)
initial_population = keras_ga.population_weights

The next section summarizes the steps to train a Keras model using PyGAD. Create a Keras Model: there are 3 ways to build a Keras model; PyGAD supports building a Keras model using both the Sequential Model and the Functional API.

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Fitness = {fitness}".format(fitness=ga_instance.best_solution()[1]))

Create an Instance of the pygad.GA Class: the next step towards training a Keras model using PyGAD is to create an instance of the pygad.GA class.

mae = tensorflow.keras.losses.MeanAbsoluteError()
abs_error = mae(data_outputs, predictions).numpy()
print("Absolute Error : ", abs_error)

The next sections list the complete code to build and train Keras models using PyGAD.
The Four Policy Classes of Reinforcement Learning

Policy approximationIn policy approximation solutions, we directly modify the policy itself. Policy function approximation (PFA)A policy function approximation (PFA) is essentially a parameterized function of the policy. Cost function approximation (CFA)Like the PFA, a Cost Function Approximation (CFA) also directly searches the policy space. Value function approximation (VFA)A value function approximation (VFA) represents downstream values as a function. However, there are many more combinations, such as embedding a VFA as a downstream policy into a direct lookahead algorithm.
Route Your Experiments to Files

Route Your Experiments to FilesPhoto by Tom Hermans on UnsplashRunning deep learning experiments can be a daunting task. Overall, after calling mirror_console_to_file(file), any message posted to the stdout and stderr streams will be intercepted, routed to a file, and then printed as always. Going the extra mile, a timestamp along each message could be added by manually inspecting messages for the newline character. Instead, the log will contain each message in sequence instead of each message overwriting the previous. Using it, you can fetch the current commit summary/hash and print it at the start of your log.
LIME for auditing black-box models

LIME for auditing black-box modelsImage created by authorWe like to believe that this is an era of ML and AI, but what we are forgetting is that this is also an era of black-box models, which provide no justification or interpretation for the decisions they make. One way to make a change is to audit black-box models. Wait… what are black-box models? A black-box model refers to something that is completely opaque: one can only observe the input and output variables, not what is going on inside. The output of LIME shows the contribution of each feature to the prediction of a particular data sample.
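The core idea — perturb a sample, query the black box, and fit a local surrogate whose coefficients show each feature's contribution — can be sketched in a few lines. This is a toy version of the idea, not the lime library's API; black_box and the one-feature-at-a-time slopes are illustrative simplifications of LIME's weighted linear surrogate:

```python
import random

def black_box(x):
    """Stand-in for any opaque model: positive class if 2*x0 - x1 > 0."""
    return 1.0 if 2 * x[0] - x[1] > 0 else 0.0

def explain(sample, n=500, eps=0.5, seed=0):
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # Perturb the sample locally and record the black box's answer.
        z = [v + rng.uniform(-eps, eps) for v in sample]
        xs.append(z)
        ys.append(black_box(z))
    # Least-squares slope per feature as a crude local coefficient.
    coefs = []
    for j in range(len(sample)):
        mx = sum(x[j] for x in xs) / n
        my = sum(ys) / n
        cov = sum((x[j] - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x[j] - mx) ** 2 for x in xs)
        coefs.append(cov / var)
    return coefs

coefs = explain([0.1, 0.1])
# Feature 0 pushes the prediction up, feature 1 pushes it down,
# matching the hidden rule inside black_box.
print(coefs)
```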
Measuring “Fairness” When Ages Differ

For example, you might split your population by gender, then measure accuracy and false positive rates for females vs. males. In my experience at least, most populations used for modeling show age differences across groups of interest for fairness metrics. Fairness Metric Results: I look at three common fairness metrics: false positive rates, false negative rates, and model accuracy. It’s hard to imagine curves where we’d see equal false positive rates with different overall rates. Including vs. Omitting Age: one misconception is that if a model “adjusts for” (incorporates) age, fairness metrics will also be corrected.
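Computing two of those per-group metrics is straightforward; the groups and labels below are made up for illustration:

```python
def rates(y_true, y_pred):
    """Return (false positive rate, accuracy) for one group."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, acc

groups = {
    "A": ([0, 0, 1, 1], [0, 1, 1, 1]),   # (true labels, predictions)
    "B": ([0, 0, 1, 1], [0, 0, 1, 0]),
}
for name, (y_true, y_pred) in groups.items():
    fpr, acc = rates(y_true, y_pred)
    print(name, fpr, acc)
```

Comparing fpr across groups (here A errs toward false positives, B toward false negatives) is exactly the kind of gap the article's fairness analysis surfaces.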
Dataloader for sequential data using PyTorch deep learning framework — Part 2

PyTorch is one of the most commonly used deep learning frameworks for implementing various deep learning algorithms. The class data is initialized with two arguments, path and transform, which are passed as arguments to __init__. The function __getitem__ is the most crucial: it loads the image, resizes it, and then converts it into a tensor. After initializing the class data, we use a DataLoader, which automatically batches the whole dataset into a defined batch size. Moving on to writing a dataset class for the sequential dataset.
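The Dataset/DataLoader pattern the post describes can be sketched without PyTorch installed; the class below mirrors the __len__/__getitem__ interface, and batch_loader stands in for torch's DataLoader (all names here are illustrative, and a real version would load and resize images in __getitem__):

```python
class SequenceDataset:
    def __init__(self, paths, transform=None):
        self.paths = paths
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        item = self.paths[idx]   # a real version would load an image here
        return self.transform(item) if self.transform else item

def batch_loader(dataset, batch_size):
    """Group dataset items into consecutive batches, like a DataLoader."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]

ds = SequenceDataset(["a.png", "b.png", "c.png"], transform=str.upper)
print(list(batch_loader(ds, batch_size=2)))  # [['A.PNG', 'B.PNG'], ['C.PNG']]
```

For sequential data the same skeleton applies; __getitem__ simply returns a window of consecutive items instead of a single one.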
Generating Cool Storylines Using a T5 Transformer and Having Fun

the Text-To-Text Transformer (T5). This transformer model was pre-trained on a much cleaner version of the Common Crawl Corpus and Google named it the Colossal Clean Crawled Corpus (C4). Now you might be thinking, how is T5 different from every other Transformer model out there? Doing this helps us see the expressivity of the proposed Sequence2Sequence (“Text-To-Text”) Transformer. In SummaryIn this blog we saw the approach behind the Sequence2Sequence Transformer or as Google likes to say it “Text-To-Text” Transformer (T5).
Make Your Dashboard Stand Out — Tile Map

Implementation: as a data visualizer, you are supposed to have a data table of state locations first. For instance, I am going to visualize the population in each state, and the census data is already accessible. Now you are seeing an upside-down USA map; this is because my provided location data table ordered the states from north to south, which requires you to reverse the vertical axis. Image by Author. Tip: it is also always popular to use a hexagon and make it look like a hive.
Fastai Course Chapter 3 Q&A on Linux

What are the six types of bias in machine learning, according to Suresh and Guttag? Measurement Bias is a bias that occurs in machine learning when the wrong features and labels are measured and used. Aggregation Bias is a bias that occurs in machine learning when the model can’t distinguish between the groups in a heterogeneous population. Deployment Bias is a bias that occurs in machine learning when the model is used or interpreted in inappropriate ways. In the paper “Does Machine Learning Automate Moral Hazard and Error?” why is sinusitis found to be predictive of a stroke?
7 Awesome Jupyter Utilities That You Should Be Aware Of

Jupyter Notebooks are considered to be the backbone of any data science experiment, and for good reason. Merging multiple notebooks: there’s a lightweight library that makes it possible for you to merge two or more notebooks into one. Sometimes, this has been the reason I wasn’t able to export my notebooks properly as a PDF file. And lastly, increase your productivity via a custom theme. Lately, I’ve been using dark themes in Jupyter notebooks a lot. pip install jupyterthemes. Then, get the list of themes: !jt -l. Choose one like this: !jt -t <theme-name>. Check out the official repo of the library.
Understand Bayes’ Theorem Through Visualization

In this article, I will explain the Bayes’ theorem concept using visualization and why it is not as difficult as some may think. Bayes’ Theorem as a Visualization: let’s start with the example of a company with 30 engineers, 25 of whom are male, as per the visualization below. Bayes’ Theorem formula: to add one final notion, P(H|E) is called the “posterior” in Bayes’ Theorem, which represents the belief in the hypothesis after seeing the evidence. As we know, Bayes’ theorem branches from Bayesian statistics, which relies on subjective probabilities and uses Bayes’ theorem to update knowledge and beliefs regarding the events and quantities of interest based on data. Conclusion: in this article, I have explained Bayes’ Theorem using visualization, and hopefully you have gained an understanding of what Bayes’ Theorem is and what it is saying.
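In symbols, the update the article describes is Bayes' theorem, with each term named as in the post:

```latex
% Posterior belief in hypothesis H after seeing evidence E:
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
% i.e. posterior = likelihood \times prior / evidence, where the
% evidence expands as
P(E) = P(E \mid H)\, P(H) + P(E \mid \neg H)\, P(\neg H)
```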
Data Versioning for Efficient Workflows with MLFlow and LakeFS

Data Versioning for Efficient Workflows with MLFlow and LakeFSPhoto by Hannes Egler on UnsplashIntroductionVersion Control Systems, such as Git, are essential tools for versioning and archiving source code. Version Control helps you keep track of the changes in the code. Apart from code, data changes too. Usually, Data Scientists need to access a range of datasets to complete a specific task. From feature engineering to model training or selection and hyper-parameter optimization, data gets processed and changes, too.
Understand Time Series Components with Python

Understand Time Series Components with PythonBunch of time series terms. Components of Time Series: there are some basic definitions and concepts to cover before we start modeling our time series data. Mainly, there are four types of components commonly seen in time series, as discussed below. Trends: a trend is a movement of data values that decreases or increases with time; there can be an upward trend or a downward trend in any data series. For example:

import numpy
import matplotlib.pyplot as plt

mean_value = 0
std_dev = 1
no_of_samples = 500
time_data = numpy.random.normal(mean_value, std_dev, size=no_of_samples)

plt.plot(time_data)
plt.show()

The white noise in time series.
DeepMind Combines Logic and Neural Networks to Extract Rules from Noisy Data

DeepMind Combines Logic and Neural Networks to Extract Rules from Noisy DataThe new model combines two different paradigms to solve a major problem in data science. The goal is to keep you up to date with machine learning projects, research papers and concepts. Last year, Google published a research paper under the catchy title of “One Model to Learn Them All” that combines heterogeneous learning techniques under a single machine learning model. A few years ago, Alphabet’s subsidiary DeepMind took another step towards multi-model algorithms by introducing a new technique called Differentiable Inductive Logic Programming(∂ILP) that combines logic and neural networks into a single model to extract rules from noisy data. The main limitation of ILP systems is their struggle with noisy or ambiguous data which is so common in deep learning scenarios.
How to Use Analytical Geometry to Find the Shortest Route/Loop for Any Euclidean Travelling Salesman Problem efficiently

Approach: the approach here is that the shortest loop from the starting point is a non-intersecting polygon to the furthest point and back. So far, none of this resolves how to get to the shortest path, and there is no pattern yet. Step 1: since you are given all distances, identify the farthest point. This is easier to visualize if point C was on the axis itself, like so. If we went to point C from point B and then went to the farthest point, we would then have to come back from the farthest point to point C again, thereby doubling the distance of the salesman between point C and the farthest point.
Complete List of Feature Engineering Methods: 40 Techniques, 10 Categories

With feature engineering, you have an idea of how to tune your model or edit your data before any attempt. Feature Split: by feature split I am transforming a feature into two or more features, increasing the number of features in the model. Wrapper Methods: wrapper methods group several techniques that use an R Squared value to measure whether a feature should be conserved or not. These techniques work by iteratively using features while monitoring a change in score, hence they work recursively: Exhaustive Feature Selection, Forward Regression, Backward Regression/RFE (Recursive Feature Elimination), Stepwise Regression, and Bi-directional Elimination. Numerical Imputation: a list of Numerical Imputation techniques. Hybrid Methods: hybrid methods are a combination of different techniques (that you can pick from the list above) to perform feature engineering.
Plotly Express: Interpret data through interactive visualization

It’s called Plotly Express. Plotly ExpressPlotly is a company that builds interactive and high-quality data visualization tools using Python, R, and JavaScript. In 2019, they released a new high-level Python visualization library: Plotly Express. Plotly Express offers other tools that combine more plots together. You can eventually look for other examples of code on the official website of Plotly Express.
My VS Code Setup To Prototype Algorithmic Trading Strategies Locally Using LEAN

Will Transformers Replace CNNs in Computer Vision?

A classical type of input is a sentence in NLP and an image in computer vision. Transformers in computer vision: now that we know transformers are very interesting, there is still a problem in computer vision applications. Vision transformers’ complexity. Of course, transformers are still highly data-dependent, and nobody can say whether or not they will be the future of either NLP or computer vision. I hope this article could give you a great introduction to transformers and how they can be applied to computer vision applications.
State of the Art Models in Every Machine Learning Field 2021

State of the Art Models in Every Machine Learning Field 2021: a collection of the best SOTA models in every ML field you can think of. Mostafa Ibrahim, Apr 12. Photo by Jon Tyson on Unsplash. State-of-the-art models keep changing all the time. Model training speed has always been a huge bottleneck for machine learning because it makes debugging network issues very slow. It’s one of the few places where classical machine learning algorithms seem to outperform deep learning networks (or at least perform the same). I am a strong believer in the power of training machine learning models without labels. And to be honest, one of the most enjoyable projects that I worked on was an unsupervised machine learning one.
GNNs to Data Augmentation to Building Distributed Applications at Scale with Open-source

Most commonly used implementations of GNNs in practice use a node-wise thresholding strategy to protect sensitive information (e.g., emails, phone numbers, street addresses, etc.). In the field of Natural Language Processing (NLP), data augmentation comes at a premium. Check out this paper led by researchers at Google, Carnegie Mellon, and Mila — Quebec AI exploring state-of-the-art techniques used for data augmentation approaches (DAA). In the field of machine learning, this is particularly true. That is why we created our AI community on Discord — to connect and learn with other data experts and enthusiasts.
Google AI Blog: Accelerating Eye Movement Research for Wellness and Accessibility

We also discuss the potential use of this technology as a digital biomarker of mental fatigue, which can be useful for improved wellness. Model OverviewThe core of our gaze model was a multilayer feed-forward convolutional neural network (ConvNet) trained on the MIT GazeCapture dataset. The unpersonalized gaze model accuracy was improved by fine-tuning and per-participant personalization. Smartphone gaze could provide a powerful way to make daily tasks easier by using gaze for interaction, as recently demonstrated with Look to Speak. ConclusionOur findings of accurate and affordable ML-powered smartphone eye tracking offer the potential for orders-of-magnitude scaling of eye movement research across disciplines (e.g., neuroscience, psychology and human-computer interaction).
OpenAI Scholars 2021: Final Projects

My advice to someone starting in deep learning research is to take your time to understand insights from fundamental papers and remember that the field is still relatively new. Blogplaycircle Feedback Loops in Opinion ModelingDanielle Ensign OpenAI Mentor: Jeff WuPrevious Roles: Software Engineer at ITHAKA, Brighten AI, and Phylliida I have a background in Software Development, AI Fairness, and VR Game Development. My project is exploratory, investigating prior work on opinion modeling from the context of deep learning. Blogplaycircle Characterizing Test Time Compute on Graph Structured ProblemsKudzo Ahegbebu OpenAI Mentor: William GussPrevious Roles: Software Engineer at Facebook and Genentech I am a software engineer with an applied physics and aerospace background. Having the dedicated time to explore deep learning with great mentorship has been transformative in my ability to understand and contribute to the field!
Build an anomaly detection model from scratch with Amazon Lookout for Vision

In this post, I go through the steps of creating an end-to-end machine vision solution that identifies visual anomalies in products using Amazon Lookout for Vision. When a brick on the belt breaks a light beam, the device takes a photo and sends it to Amazon Lookout for Vision for anomaly detection. The following diagram illustrates the architecture of our anomaly detection solution, which uses Amazon Lookout for Vision, Amazon Simple Storage Service (Amazon S3), and a Raspberry Pi. Amazon Lookout for Vision is a machine learning (ML) service that uses machine vision to help you identify visual defects in products without needing any ML experience. ConclusionNow you know how to use Amazon Lookout for Vision to train, run, update, and monitor an anomaly detection application.
PyOD: a Unified Python Library for Anomaly Detection

This task is commonly referred to as Outlier Detection or Anomaly Detection. My favorite definition: An anomaly is something that arouses suspicion that it was generated by different data generating mechanismCommon applications of outlier detection include fraud detection, data error detection, intrusion detection in network security, and fault detection in mechanics. Types of Outlier Detection AlgorithmsThere are several classes of outlier detection algorithms. If you are interested in learning more about outlier detection, see the Anomaly Detection Resources page of the PyOD Github repository. Referencesyzhao062/pyod: (JMLR’19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection) (github.com)Welcome to PyOD documentation!
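As a toy illustration of the distance-based family of detectors PyOD implements (this sketches the idea behind a kNN-style detector, not PyOD's actual API, which provides fitted detectors with attributes like decision_scores_):

```python
def knn_outlier_scores(points, k=2):
    """Score each 1-D point by its distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, sorted ascending.
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

data = [1.0, 1.1, 0.9, 1.05, 10.0]   # one obvious outlier
scores = knn_outlier_scores(data)
print(scores.index(max(scores)))      # 4 — the outlier gets the largest score
```

Points generated by a different mechanism sit far from their neighbours, so they receive the largest scores, which is the intuition behind the definition quoted above.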
Bayesian A/B Testing Explained

Bayesian A/B Testing ExplainedImage by Alessandro Crosato from SplashThere are many applications of A/B testing across various industries. There are two common ways to approach A/B testing, the frequentist approach and the Bayesian approach, both stemming from the foundations of hypothesis testing. In this article, I’ll cover the explanation and implementation of the Bayesian approach to A/B testing. Table of Contents: The Bayesian Approach (Bayesian Machine Learning); Bayesian A/B Testing (Explore Exploit Dilemma); Problem Statement; Bayesian Bandit / Thompson Sampling (Bayes Theorem, Beta Distribution); Implementation; Concluding Remarks; Resources. The Bayesian Approach: the Bayesian approach stems from one main rule, that everything is a random variable. Generally the Bayesian approach to A/B testing converges more quickly than other traditional A/B tests.
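The Bayesian-bandit idea the article covers can be sketched with Beta posteriors and Thompson sampling. This is a toy loop rather than the article's implementation, and the conversion rates below are made up:

```python
import random

random.seed(0)
true_rates = {"A": 0.05, "B": 0.15}   # hidden conversion rates (illustrative)
wins = {"A": 1, "B": 1}               # Beta(1, 1) uniform priors
losses = {"A": 1, "B": 1}
pulls = {"A": 0, "B": 0}

for _ in range(2000):
    # Thompson sampling: draw a rate from each Beta posterior,
    # serve whichever variant drew higher.
    draws = {v: random.betavariate(wins[v], losses[v]) for v in true_rates}
    chosen = max(draws, key=draws.get)
    pulls[chosen] += 1
    # Observe a (simulated) conversion and update that variant's posterior.
    if random.random() < true_rates[chosen]:
        wins[chosen] += 1
    else:
        losses[chosen] += 1

print(pulls)  # the better variant "B" should receive most of the traffic
```

This is the explore/exploit trade-off in action: traffic shifts toward the better variant as its posterior sharpens, which is why the Bayesian approach tends to converge faster than a fixed-split frequentist test.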
Interactive Geospatial AI Visualization in Jupyter Notebook

Introduction to Layers, Raster, and Vector: now, this will be where most of your EDA takes place. Raster data is stored as a grid of values (think of an image with pixels) while vector data represents a specific feature such as a point, line, or polygon. The figure below provides a handy illustration of what layers are, and the difference between raster and vector data (source: SBCounty.gov). If you want a more detailed explanation of this topic, do check out my other blog post below. Conclusion: that's it for an introduction to interactive geospatial visualization in Jupyter Notebook.
What To Look For In A Quantum Machine Learning Framework

What To Look For In A Quantum Machine Learning Framework: do you want to get started with Quantum Machine Learning? Image by author, Frank Zickert. In quantum machine learning, we aim to harness the phenomena of quantum mechanics to deliver a huge leap forward in the computation of machine learning algorithms. So the obvious question is: which one should you use if you want to apply quantum machine learning? For quantum machine learning, quantum computing is the means to achieve a goal. To summarize, these are the things you may want to look for in a quantum machine learning framework: who is the developer?
What You Need to Know about Julia in 2021

Python is not going to be phased out by Julia, but it is likely that Python and Julia will form some sort of synergy inside the Data Science industry. That being said, while Julia might not come into play in most Data Science jobs professionally for the time being, education is a very important aspect of Data Science. Secondly, there is always a looming, constant need for education in the Data Science industry because it is moving so quickly at times. However, Julia has proved this theory wrong by creating the multiple dispatch paradigm. There are conversations about why corporate involvement might be the next step for the Julia language, but in a lot of ways it is a mixed bag.
Machine Learning: An Initial Approach to Predict Patient Length-of-Stay

Picture by Anna Shvets on Pexels. Some observations: patient length-of-stay (LOS) defines the time a patient spends admitted at a healthcare facility. Here, I describe an initial approach to predicting LOS for hospitals to leverage. Training and Testing the Data: I'll use Python's scikit-learn library to predict LOS using 16 feature variables corresponding to 318,438 patients. Then I'll take the remaining data for testing and use my features to predict LOS with my model. It's a "black box" model that creates thousands of different trees to best predict LOS.
AI Can Explain Anything — Except Why No One Listens to Me

As AI/ML systems have rolled out further into organizations, they have increased the complexity of recommendations: fundamentally, it is hard to explain what AI algorithms are doing. Business stakeholders need to become more involved during the process of building an answer. If we start building a ML model, non-technical stakeholders should not sit idle and imagine the ball is out of their court. Working through variables removes late discovery of infeasibility: redesigning the stakeholder-analyst interaction model does not create new operational capabilities, but it can save a whole lot of time by avoiding the wrong recommendations. But as the complexity of analysis approaches increases, it is imperative that data science does not become harder to understand.
Data Wrangling Solutions — Working With Dates — Part 3

Data Wrangling Solutions — Working With Dates — Part 3. Photo by Waldemar Brandt on Unsplash. In the last two tutorials, we went from importing data containing date columns to converting them from a non-DateTime datatype to the DateTime datatype. The data dictionary of this dummy dataset is as follows: release_date — the actual date column, with the first date value deleted; release_date_int — another column containing the date information but in an integer format, for example, the date 2020-02-12 is present as 20200212 in YYYYMMDD format; release_date_text — a column containing dates in text format, with # as the separator; and a column containing only part of the date data.
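As a brief sketch of the kind of conversion this series works toward, an integer YYYYMMDD column like release_date_int can be parsed with pandas (the tiny dataframe here is made up for illustration):

```python
import pandas as pd

# toy stand-in for the tutorial's dummy dataset
df = pd.DataFrame({"release_date_int": [20200212, 20210101]})
# cast to string first so the %Y%m%d format string applies cleanly
df["release_date"] = pd.to_datetime(df["release_date_int"].astype(str), format="%Y%m%d")
print(df["release_date"].dt.year.tolist())
```

The same pd.to_datetime call with a different format string (for example "%Y#%m#%d") would handle a text column that uses # as the separator.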
Serverless GPU-Powered Hosting of Machine Learning Models

Serverless GPU-Powered Hosting of Machine Learning Models. Photo by Caspar Camille Rubin on Unsplash. Motivation: with the rise of MLOps in recent years, running machine learning models for inference tasks has become much easier. Meanwhile, Google with AI Platform Prediction and AWS with SageMaker offer solutions including inference accelerators for deep learning models. Alongside the major vendors, Algorithmia is touting itself as filling the gap in the market for truly serverless GPU model hosting. Currently a CPU version of a service is charged at $0.0001/sec and a GPU service at $0.0003/sec. Otherwise, Algorithmia is a super easy-to-use serverless machine learning platform, so I definitely recommend trying it out if your use case fits.
Semi-Supervised Learning Demystified with PyTorch and SESEMI

Semi-Supervised Learning Demystified with PyTorch and SESEMI. Artistic render of the “three spirals” synthetic dataset used in “Exploring Self-Supervised Regularization for Supervised and Semi-Supervised Learning”. How can we use the world’s seemingly endless supply of unlabeled data to help us solve supervised learning problems? Self-supervised learning in essence is the practice of extracting supervisory information from completely unlabeled data to create a supervised learning task. For an excellent in-depth intro to the self-supervised learning world, see this blog post by Amit Chaudhary. By leveraging data augmentation, you can turn practically any supervised learning task into a semi-supervised task via self-supervision.
Multilayer Perceptron for Image Classification

Introduction: Convolutional Neural Networks (CNNs) have helped us solve computer vision tasks like image classification. The transformer architecture, which is mostly used for Natural Language Processing tasks, can perform the image classification task with great performance on the ImageNet dataset, and it is comparable with the CNN model [1]. (2021) has proposed an architecture for image classification that uses only fully connected layers. Also, I will show you the implementation of this model using PyTorch.
Essence of Boosting Ensembles for Machine Learning

In this tutorial, you will discover the essence of boosting in machine learning ensembles. Tutorial Overview: this tutorial is divided into four parts; they are: Boosting Ensembles; Essence of Boosting Ensembles; Family of Boosting Ensemble Algorithms (AdaBoost Ensembles, Classic Gradient Boosting Ensembles, Modern Gradient Boosting Ensembles); and Customized Boosting Ensembles. Boosting is a powerful ensemble learning technique. Family of Boosting Ensemble Algorithms: there are a large number of boosting ensemble learning algorithms, although all work in generally the same way. We might consider three main families of boosting methods; they are: AdaBoost, Classic Gradient Boosting, and Modern Gradient Boosting. Notable examples include both the Extreme Gradient Boosting (XGBoost) and the Light Gradient Boosting Machine (LightGBM) projects.
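Two of the families above are available directly in scikit-learn. As a minimal sketch (the dataset is synthetic, not from the tutorial), both fit weak learners sequentially, each one correcting its predecessors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
results = {}
for model in (AdaBoostClassifier(random_state=0), GradientBoostingClassifier(random_state=0)):
    # both add weak learners sequentially, each correcting its predecessors
    results[type(model).__name__] = cross_val_score(model, X, y, cv=5).mean()
print(results)
```

The "modern" family (XGBoost, LightGBM) follows the same fit/predict interface but adds regularization and engineering optimizations on top of classic gradient boosting.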
What On Earth Is Data Engineering?

What’s great about Data Engineering: now that we know what Data Engineering is, and the role it plays in Data Science in a larger sense, what is so great about it? I would argue that Data Engineering skills are even more important than machine-learning skills, because Data Engineering skills are foundational, and machine-learning skills do not really amount to much without data that is well engineered. Probably the best thing about Data Engineering and getting a Data Engineering job is that it can serve as an entry to a Data Science career path. That being said, Data Engineering is a much less accredited field — as employers do not believe that Data Engineering requires scholarly math (who would have thought?!). As I touched on before, Data Engineering skills are foundational skills for Data Science.
How many neurons for a neural network?

A neural network is a particular model that tries to capture the correlation between the features and the target by transforming the dataset according to a layer of neurons. Let me just say that a neural network is made of some layers of neurons. In its simplest form, a neural network has only one hidden layer, as we can see from the figure below. In real-life examples, you would probably use Keras to build your neural network, but the concept is exactly the same. Don’t forget to scale your features before giving a dataset to a neural network.
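A single-hidden-layer network with scaled features, as described above, can be sketched with scikit-learn (the dataset and the choice of 10 neurons are illustrative, not from the article):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# scale the features first (as the article advises), then one hidden layer of 10 neurons
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)
clf.fit(X, y)
print(round(clf.score(X, y), 3))
```

Changing hidden_layer_sizes lets you experiment with how the neuron count affects the fit, which is exactly the question the article explores.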
I Finally Got a Data Science Job

Stage 1: Online application. Like most job applications, the recruitment process began with an application via an online job portal. I want to work in data science because of its potential to solve many problems that we face in society today, as well as its ability to disrupt industries at scale. This includes my ability to manipulate and interpret data using languages like Python and R, my experience in consulting and, most of all, my passion for data science as demonstrated by my continuous effort and initiative in doing data science projects and publishing articles online. What is most important here is making sure that your cover letter is coherent, professional and compelling. Your cover letter represents the recruiter’s first impression of you, so make it count!
Perceptual Losses for Deep Image Restoration

One of the components influencing the performance of image restoration methods is the loss function, which defines the optimization objective. In the case of image restoration, the goal is to recover the impaired image to visually match the pristine undistorted counterpart. Hand-crafted losses: intuitively, a perceptual loss should decrease as the perceptual quality increases. This has been a known issue for a long time, with L1 used as a better alternative for image restoration. In our experiments, we compared loss functions on four image restoration applications: single image super-resolution with SR-ResNet, single image super-resolution with EDSR, denoising, and JPEG artefact removal.
Introduction to Gaussian Process Programming in Plain English

For this study, please download the temperature dataset here. Install the required packages:
pip install GPy
pip install shapely
pip install numpy
pip install pandas
pip install navpy
pip install geopandas
Complete this step by importing the necessary packages. Data Pre-processing: let's now load our temperature dataset into a dataframe object.
X = np.array([df_train['lat'], df_train['lon']]).T
y = np.atleast_2d(np.array(df_train["tmax"])).T
kernel = GPy.kern.RBF(input_dim=2)
m = GPy.models.GPRegression(X, y, kernel)
# Plot the trained GP model
fig = m.plot()
Gaussian Process Prediction: in the figure above, you have the radial level curves representing the uncertainty value. The less training data is available, the more uncertain the prediction becomes, and the lighter the color turns out to be.
Customer Churn Accuracy: Increased 4.6% With Feature Engineering

Therefore, in today's blog I will give you a walk-through of how I increased accuracy by 4% on a small customer-churn dataset with the extra customer service notes. The company also provided the comments left by its customer service, indicating the customers' problems and how they helped the customers. There are 115 unique customer service notes. Secondly, I applied sentiment analysis, sentence embedding, and TF-IDF to the customer service notes. XGBoost (image by author): I started with a basic XGBoost classifier model and gradually added the sentiment feature, sentence embedding, and TF-IDF into the model step by step.
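The TF-IDF step can be sketched as follows: vectorize the notes, then stack the text features next to the existing numeric columns before fitting the classifier. The notes and the numeric column here are hypothetical stand-ins, not the article's data:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# hypothetical stand-ins for the customer service notes and numeric features
notes = ["billing issue resolved", "angry about outage", "asked for upgrade"]
numeric = np.array([[12.0], [3.0], [40.0]])  # e.g. tenure in months (made up)

tfidf = TfidfVectorizer().fit_transform(notes)
X = hstack([csr_matrix(numeric), tfidf])  # combined feature matrix for the classifier
print(X.shape)
```

The resulting sparse matrix can be passed straight to a gradient-boosted classifier such as XGBoost, which is how text-derived features get added "by steps" alongside the original tabular ones.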
Swarm Robotics: Projects, New Business Models & Technical Challenges

For instance, construction managers might decide to rent swarm robots to analyze a construction site and look for issues. Technical Challenges: several characteristics that make swarm robotics strategic for many future use cases (autonomy, decentralization, etc.) also raise technical challenges. Adaptation: based on my experience, to develop scalable swarm robotics, we need to find a solution to the limits of off-line resources. The following elements can explain this situation: a lack of standard definitions for swarm robotics systems and application problems. Despite promising business applications and models, several limiting factors are still preventing the development of scalable real-world swarm robotics systems.
Weekly review of Reinforcement Learning papers #8

Weekly review of Reinforcement Learning papers #8. Image by the author. Paper 1: Reinforcement Learning with Random Delays — Ramstedt, S., Bouteiller, Y., Beltrame, G., Pal, C., & Binas, J. Paper 2: Reinforced Attention for Few-Shot Learning and Beyond — Hong, J., Fang, P., Li, W., Zhang, T., Simon, C., Harandi, M., & Petersson, L. (2021). In this article, the authors make the link between RL and few-shot learning, by training an attention mechanism with a reinforcement learning algorithm. Reinforcement learning could therefore help a lot in this domain.
Simplifying Reinforcement Learning Workflow in MATLAB

Simplifying Reinforcement Learning Workflow in MATLAB: imagine you were interested in solving a certain problem using Reinforcement Learning. You have coded up your environment and you compile a laundry list of Reinforcement Learning (RL) algorithms to try. However, the Reinforcement Learning Designer app released with MATLAB R2021a is a strong contender in this category as well, and this article is about that. Typical RL loop (image from mathworks.com). The RL Designer app is part of the Reinforcement Learning Toolbox. I have created a YouTube series that delves into the details of Reinforcement Learning in MATLAB.
Foundations of NLP Explained — Bleu Score and WER Metrics

Intuitive NLP Series: Foundations of NLP Explained — Bleu Score and WER Metrics. Photo by engin akyurt on Unsplash. Most NLP applications such as machine translation, chatbots, text summarization, and language models generate some text as their output. N-gram: an ‘n-gram’ is actually a widely used concept from regular text processing and is not specific to NLP or Bleu Score. Bleu Score can be computed for different values of N; typically, we use N = 4. Implementing Bleu Score in Python: in practice, you will rarely have to implement the Bleu Score algorithm on your own. The nltk library, which is a very useful library for NLP functionality, provides an implementation of Bleu Score.
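As the article says, in practice you would use nltk's implementation. Purely to illustrate the clipped n-gram precisions and brevity penalty at the heart of the score (this is a simplified single-reference sketch, not the full smoothed algorithm):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    # clipped n-gram precisions combined by a geometric mean, times a brevity penalty
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat is on the mat".split(), "the cat sat on the mat".split()), 3))
```

A perfect match scores 1.0, and the score drops sharply as higher-order n-grams stop matching, which is why N = 4 gives a much stricter signal than unigram overlap alone.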
Transformers — You just need Attention

The transformers have an encoder-decoder structure and an attention mechanism that gives State of the Art results in many tasks. In Natural Language Processing, each number in a word embedding carries information related to linguistic features of the word. Note: the size of the positional embedding should be the same as our existing word embedding; that's how they add up. The embeddings that we discussed earlier will pass through four units: multi-head attention, an add-and-norm layer, a feed-forward layer, and another add-and-norm layer. In the paper, 8 attention heads are run in parallel, hence multi-head attention.
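The core operation inside each attention head can be sketched in NumPy. This is the standard scaled dot-product attention, softmax(QKᵀ/√d_k)V, on random toy matrices (shapes chosen for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V, the core operation inside each attention head
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.random((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)
```

Multi-head attention simply runs several of these in parallel (8 in the paper) on different learned projections of Q, K and V, then concatenates the results.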
A checklist to track your Machine Learning progress

To sum this up: while the focus of this checklist is on Deep Learning — as opposed to classic machine learning — it’s useful to learn the time-proven basics as well. Learning about classic machine learning techniques like regression and clustering also includes learning their background. The intermediate level is where you’ll spend considerable time; there’s much to explore here: language models, large datasets, custom training loops, hyperparameter search, and many, many more. In other words: make your network perform well on the training data, without making it perform badly on the separate test data. Coming from supervised data — data where (human) annotations are available — you now extend to unsupervised data.
The New Lathe Router And How To Use It

Router Basics: just by analyzing the idea behind the Lathe Router, it is easy to see how a solution to this problem is quite hard to pin down. The Lathe Router type performs this act of directing data to its respective model by utilizing returns and a passed function. Just as a pipeline or Lathe model is a LatheObject type, so is a Router. Modeling: in order to utilize our Lathe Router, we are going to first need to build some Lathe models. We will now make a new router, using the fn keyword argument to denote our filter function, and passing our Lathe objects.
How to Predict and Visualize Data in one Chart

Tutorial — Prediction — R: How to Predict and Visualize Data in one Chart. In one of my last projects, I was asked to perform a simple linear regression to foresee possible price developments. To compare the actual price development, we used the consumer price index as a baseline. This article will show you how I tried to achieve this with a different data set — using ggplot2 for plotting and linear regression for prediction. Image by the author. Oktoberfest Beer Price and Inflation Index Data: as the leading price index in focus, I use available data from the Oktoberfest, the world’s largest beer festival. The data contains not only beer price information but also numbers of visitors, chicken prices, and beer sold.
Golang for Machine Learning?

Data Manipulation: now let's explore how to perform data manipulation in Go. Subsetting: one of the easiest subsetting operations in Python is the df.head() operation. Filtering: suppose now you want to explore the attributes for the Iris-versicolor species only. You can filter the rows using the Filter() function as follows:
versicolorOnly := df.Filter(dataframe.F{Colname: "Species", Comparator: "==", Comparando: "Iris-versicolor"})
fmt.Println(versicolorOnly)
You will retrieve rows with Iris-versicolor species only!

Back propagation in a Neural Network | Image by Author. Introduction: understanding the mathematical operations behind Neural Networks (NNs) is highly important for a data scientist's ability to design an efficient deep model. It is done by the gradient descent algorithm. Next, a backward pass is conducted to compute the weights' gradients. To find the best parameters that minimize the error, we use the gradient descent algorithm. The learning rate is a hyperparameter that has to be tuned.
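The update rule the paragraph describes, w ← w − η · dL/dw, can be sketched on a toy one-parameter loss (the quadratic here is illustrative, not a neural network loss):

```python
# Gradient descent on a 1-D quadratic loss L(w) = (w - 3)^2, illustrative only
w, learning_rate = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)          # dL/dw, the analogue of a backward pass
    w -= learning_rate * grad   # step against the gradient
print(round(w, 4))
```

In a real network the gradient is not a single derivative but the full vector of weight gradients produced by back propagation; the learning_rate hyperparameter plays exactly the same role.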
What is it with NLP models and biblical names?

What is it with NLP models and biblical names? GIF via giphy — source: http://www.amc.com/shows/breaking-bad. This blog post highlights an issue with spaCy and other Named Entity Recognition models not being able to accurately detect person names, especially if they are biblical names. The detection differences between regular names and biblical names are quite overwhelming. You’ll see that for the simplest example we can think of, “My name is X”, biblical names are almost never detected as names of persons. Recall: 0.94. OntoNotes based model results — Model name: ner-english-ontonotes; Name set: Biblical, Template: "My name is {}", Recall: 0.50; Name set: Other, Template: "My name is {}", Recall: 1.00; Name set: Biblical, Template: "And {} said, Why hast thou troubled us?"
3 Key Concepts in Machine Learning

3 Key Concepts in Machine Learning. Photo by RetroSupply on Unsplash. When I first started to learn about data science, the support vector machine was my favorite algorithm. I see the algorithms as the shining surface of the machine learning box. In order to solve problems with machine learning, we need to have a comprehensive understanding of what lies in the box. They are the basic principles and concepts that are essential to implementing machine learning algorithms successfully. However, they are of crucial importance for the performance and accuracy of machine learning algorithms.
Simplify Polylines with the Douglas Peucker Algorithm

Simplify Polylines with the Douglas Peucker Algorithm. Image by Author. Although the two apps were written in two different frameworks (Flutter & Android Native), in both instances I ended up using implementations of the Douglas Peucker algorithm. One of these has been widely implemented and is often referred to as the Douglas Peucker algorithm. In their paper, Douglas & Peucker (1972) refer to these two points as the anchor point and the floating point, respectively. The Code (Figure 10: Output of sample code): the code below is my implementation of the simplest form of the Douglas Peucker algorithm.
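The simplest form of the algorithm, as described, recursively keeps whichever point lies farthest from the segment joining the anchor and floating points. A minimal Python sketch on a made-up polyline (not the article's exact code):

```python
import math

def perpendicular_distance(pt, start, end):
    # distance from pt to the infinite line through start and end
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    # find the point farthest from the anchor-to-floating-point segment
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:  # keep that point and simplify both halves recursively
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

points = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(points, 1.0)
print(simplified)
```

The epsilon tolerance controls how aggressively near-collinear points are dropped: a larger epsilon gives a coarser, shorter polyline.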
Dissecting ML Models With NoPdb

Dissecting a Vision Transformer: to see NoPdb in action, we will apply it to a Vision Transformer (ViT). Here are some of them: average attention weights in the 5th Transformer block (contrast increased for better viewing). The brighter each patch, the higher the attention weight. I picked a somewhat silly example to demonstrate this: we are going to take the pre-softmax attention weights in all layers and multiply them by 3. Let's see how this changes the predictions:
balloon 0.2919192612171173
alp 0.12357209622859955
valley 0.049703165888786316
parachute 0.0346514955163002
airship 0.019190486520528793
And the attention patterns we captured: the same plots as above, but after tweaking the attention weights.
Comparative Analysis of Bins Method and Convolutional Neural Network for Malaria Detection

Fig. 2: Workflow of the system. Approach 1: Using the Bins Method. Image Preprocessing: training a model on raw images may lead to bad classification performance. Extracting color features using the Bins Method — Step 1: split the RGB cell image into 3 planes: the R, G and B planes. The Bins Method also reduces the complexity to a great extent as the feature vector has only 8 components. Table 1: Results obtained for the CNN and Bins methods. The malaria classification system proved to be faster than the traditional techniques. [2] H. B. Kekre, Kavita Sonawane, “Image Retrieval Using Histogram Based Bins of Pixel Counts and Average of Intensities” (pp.
How to Fine-Tune GPT-2 for Text Generation

Performance Evaluation: there are many ways to evaluate the quality of generated text. The algorithm outputs a score between 0 and 1, depending on how similar a generated text is to reality. In comparison, the BLEU score for the GPT-2 model without any fine-tuning was 0.288. It was originally created for machine translation and only looks at the vocabulary used to determine the quality of a generated text. In that regard, punctuation in the input text is absolutely essential when generating lyrics.
Build an intelligent search solution with automated content enrichment

We combine this content with metadata automatically generated using Amazon Comprehend Medical into a unified Amazon Kendra index to make it searchable. Solution overview: we take a two-step approach to custom content enrichment during the content ingestion process: identify the metadata for each document using Amazon Comprehend Medical, then ingest the document along with the metadata into the search solution based on an Amazon Kendra index. These are defined as custom attributes in the CloudFormation template that we used to create the Amazon Kendra index. This example used the entities detected by Amazon Comprehend Medical to generate the Amazon Kendra metadata.
Choosing a Baseline Accuracy for a Classification Model

Choosing a Baseline Accuracy for a Classification Model. Photo by Robert Anasch on Unsplash. When you evaluate a machine learning model and end up with an accuracy number or other metric, you need to know if it is meaningful. This article simply shows the simplistic baseline accuracies for your model that I find useful. For highly imbalanced problems (like the voting classification problem), a model accuracy even a little higher than ZeroR could be significant. Unlike the ZeroR baseline, the Random Rate Classifier uses the class weights as part of the classification. There are other baselines that you might find useful (uniform guessing, random guessing, and One Rule are a few).
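Both baselines can be computed with scikit-learn's DummyClassifier on a made-up imbalanced dataset (the "stratified" strategy is a plausible stand-in for the article's Random Rate Classifier, which predicts classes at their observed rates):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# imbalanced toy labels: 90% negative, 10% positive (made-up data)
X = np.zeros((100, 1))
y = np.array([0] * 90 + [1] * 10)

zero_r = DummyClassifier(strategy="most_frequent").fit(X, y)  # ZeroR baseline
random_rate = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)
print(zero_r.score(X, y))  # 0.9 -- a real model must beat this to be meaningful
```

On this data ZeroR already scores 0.9 by always predicting the majority class, which is exactly why a reported accuracy only means something relative to such a baseline.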
Book Review: Hands-on Machine Learning with Scikit-learn and TensorFlow

Book Review: Hands-on Machine Learning with Scikit-learn and TensorFlow. Photo by Kourosh Qaffari on Unsplash. Hands-on Machine Learning with Scikit-learn, Tensorflow & Keras is probably one of the most popular ML books (if not the most popular). I have just finished it recently and I enjoyed it so much that I thought it would be worth writing a book review about it. If you aren't familiar with this book, it's an O'Reilly production (which means the content quality is awesome), and it's meant to target beginners in machine learning. The book content: one of the most impressive things about the book is that it genuinely covers almost everything in machine learning. Final thoughts: I am not sponsored by O'Reilly and I will not benefit in any way if you buy this book; this is simply a book review from someone learning about machine learning who is trying to share their experiences.
6 Best Python IDEs and Text Editors for Data Science Applications

Although many IDEs and text editors offer many properties and options to customize your development environment, there is no absolute best IDE. This article will go through the top 6 Python IDEs and text editors often used by developers to make their workflow as smooth and as efficient as possible. But hundreds of IDEs and text editors are out there, and trying them all is not the most feasible or time-efficient solution. So, in this article, I have presented 6 Python IDEs and text editors to choose from if you're in the market for one. Give these IDEs and text editors a try and see which one of them suits your personality and requirements best.
3 signs that your AI project is doomed

3 signs that your AI project is doomed. Adapted from Wikipedia. Special case: No data (and other basic requirements). Since machine learning is magic and everyone’s doing it, you can too! Special case: Toxic snobbery. The AI industry is rife with a special kind of bad apple: the bully who splits AI workers into two categories, legitimate AI participants like themselves and barely-necessary appendages. There are many different kinds of legitimate participants in the applied AI space and — depending on its difficulty — your project might need them all. They’re hoping to sprinkle some machine learning magic pixie dust over their work because all their friends are doing it.
Genetic Algorithm Robot: Evolving Altitude, using Python, C++ and an Arduino.

Genetic Algorithm Robot: Evolving Altitude, using Python, C++ and an Arduino. In the process of conducting the research, we found that programming a genetic algorithm was cumbersome for novice users to implement. The Arduino then executes those instructions while also reading the heights from the USRM and transmits them back to Python using serial communication. Python was used as the main language to handle the genetic algorithm while also sending directions to the Arduino. Robot Fitness Function: the fitness function determines how well a chromosome does at a specific task.
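The selection/crossover/mutation loop around such a fitness function can be sketched in plain Python. Everything here is illustrative: the target value, chromosome encoding, and fitness function are made up, standing in for the project's altitude-based fitness:

```python
import random

random.seed(0)
TARGET = 40  # hypothetical target reading; all numbers are illustrative

def fitness(chromosome):
    # the closer the summed instructions get to the target, the fitter
    return -abs(sum(chromosome) - TARGET)

def evolve(pop, generations=50):
    for _ in range(generations):
        pop = sorted(pop, key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]            # selection: keep the fittest half
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))     # single-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:             # occasional mutation
                child[random.randrange(len(child))] = random.randint(0, 9)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

population = [[random.randint(0, 9) for _ in range(8)] for _ in range(20)]
best = evolve(population)
print(sum(best))
```

In the robot setting, evaluating fitness means sending the chromosome's instructions to the Arduino and reading the resulting heights back over serial, rather than computing a sum.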
Where Does the Mean Squared Error Come from?

The mean squared error. Where Does the Mean Squared Error Come from? The simplest and one of the most commonly used loss functions in machine learning is the mean squared error, defined by MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², which is just the mean of the squared distances between ground truth and prediction. Taking the average of norms: the mean squared error is the mean of squared errors. If you are interested in more details about the mean squared error, check out the following post by Tirthajyoti Sarkar!
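The definition above translates directly into NumPy (the ground-truth and prediction vectors here are made up):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5])  # made-up ground truth
y_pred = np.array([2.5, 5.0, 4.0])  # made-up predictions
mse = np.mean((y_true - y_pred) ** 2)  # (1/n) * sum of squared errors
print(mse)
```

Each term (yᵢ − ŷᵢ)² is one squared error; taking np.mean of them is exactly the "mean of squared errors" reading of the formula.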
Predictive Maintenance: Machine Learning vs Rule-Based Algorithms

Predictive Maintenance: Machine Learning vs Rule-Based Algorithms. Is it better to go with machine learning, or should you get started with a rule-based algorithm first in your predictive maintenance project? Predictive Maintenance — ML vs Rule-Based (Photo by Isis França on Unsplash). What is Predictive Maintenance? To implement a predictive machine learning model, the domain knowledge of your team is still indispensable, especially when it comes to feature engineering, but it is not necessary to define specific rules, contrary to what we saw in the rule-based predictive model. You then train the machine learning model with the help of the created training set to predict upcoming errors. Please note that there are a ton of different machine learning models and we will explore them in future articles; for now the main focus is differentiating rule-based vs. machine learning models.
Clustering Algorithm for data with mixed Categorical and Numerical features

The k-Means algorithm is not applicable to categorical data, as categorical variables are discrete and do not have any natural origin. k-Prototypes is an extension of the k-Modes algorithm that works for mixed categorical and numerical features. What are the k-Modes and k-Prototypes algorithms? k-Modes is an algorithm based on the k-Means paradigm that is used for clustering categorical data. During training, the KPrototypes algorithm clusters the categorical features using k-Modes while the remaining numerical features are handled by the standard k-Means algorithm. The KPrototypes algorithm combines the implementations of the k-Modes and k-Means algorithms to create clusters for data with mixed data types.
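The dissimilarity that makes this combination work can be sketched directly: a squared Euclidean term over the numerical features plus a weighted count of categorical mismatches. The records and the gamma weight below are illustrative (gamma balances the two parts and is chosen per dataset):

```python
def kprototypes_distance(a, b, num_idx, cat_idx, gamma=1.0):
    # k-Prototypes cost: squared Euclidean distance over the numerical features
    # plus gamma times the number of categorical mismatches (the k-Modes part)
    numeric = sum((a[i] - b[i]) ** 2 for i in num_idx)
    categorical = sum(a[i] != b[i] for i in cat_idx)
    return numeric + gamma * categorical

x = [1.80, 70.0, "red", "yes"]  # two numeric and two categorical attributes (made up)
y = [1.75, 80.0, "red", "no"]
print(kprototypes_distance(x, y, num_idx=[0, 1], cat_idx=[2, 3]))
```

In practice you would use the kmodes package's KPrototypes class rather than hand-rolling this, but the distance above is the idea that lets one algorithm serve both feature types.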
Trivially Scale PyTorch on AWS

Running on GPUs: running on a single CPU machine… meh, not a big deal. This GAN script can actually run on 2 GPUs… to enable that, let's add the script flags (these arguments here). Now we need to tell grid to run that model on 2 GPUs (I'm choosing a machine with 4 V100s). Now my uptight-anaconda is running 4 versions of this GAN, each on 2 GPUs! Under the hood, grid has spun up the 2 machines and automagically distributed the models like this:
Quality > Quantity: Cleaning Noisy Datasets Using Training Dynamics

Quality > Quantity: Cleaning Noisy Datasets Using Training Dynamics. Photo by Najib Kalil on Unsplash. With deep learning at the helm, we have seen a huge increase in the amount of data being used in the field of NLP. Finally, plot all the data samples on the data map: an x-y plot with variability on the x-axis and confidence on the y-axis, optionally with correctness as the hue. Let's see how these data maps can be used to find wrongly labeled samples in the data. Moreover, introducing a new medium (of training dynamics) opens doors for further research in the same direction. References: paper discussed: Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics, Swayamdipta et al., 2020; example dataset used: Hateful Symbols or Hateful People?
Ensemble Machine Learning With Python (7-Day Mini-Course)

Ensemble learning refers to machine learning models that combine the predictions from two or more models. Modern machine learning libraries like Python's scikit-learn provide a suite of advanced ensemble learning methods that are easy to configure and use correctly without data leakage, a common concern when using ensemble algorithms. Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples. Lesson 02: Bagging Ensembles; Lesson 03: Random Forest Ensemble; Lesson 04: AdaBoost Ensemble; Lesson 05: Gradient Boosting Ensemble; Lesson 06: Voting Ensemble; Lesson 07: Stacking Ensemble. Each lesson could take you 60 seconds or up to 30 minutes. This is called an ensemble machine learning model, or simply an ensemble, and the process of finding a well-performing ensemble model is referred to as "ensemble learning." Although there is nearly an unlimited number of ways that this can be achieved, there are perhaps three classes of ensemble learning techniques that are most commonly discussed and used in practice.
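The simplest of the listed methods, a voting ensemble, combines the predictions of several fitted models; a minimal scikit-learn sketch on synthetic data (not the course's examples):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=1)
# a hard-voting ensemble: each base model votes and the majority class wins
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=1)),
], voting="hard")
ensemble.fit(X, y)
print(round(ensemble.score(X, y), 3))
```

Bagging, boosting, and stacking from the other lessons follow the same fit/predict interface via BaggingClassifier, the boosting classes, and StackingClassifier respectively.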
Google AI Blog: Crisscrossed Captions: Semantic Similarity for Images and Text

This undermines research into how inter-modality learning (connecting captions to images, for example) impacts intra-modality tasks (connecting captions to captions or images to images). The Crisscrossed Captions (CxC) dataset extends the development and test splits of MS-COCO with semantic similarity ratings for image-text, text-text and image-image pairs. Two different text encoding methods were used, but only one text similarity matrix has been shown for simplicity. Bottom: Image similarity matrix for each image in the dataset, resulting in a 5k x 5k matrix. Last, we use these new intramodal pairs and their human ratings to select new intermodal pairs for human rating.
How Genworth built a serverless ML pipeline on AWS using Amazon SageMaker and AWS Glue

Genworth’s Advanced Analytics team engaged in an AWS Data Lab program led by Data Lab engineers and solutions architects. For Component 2 (ML batch inference), Genworth’s Advanced Analytics team had already been using ML on premises. SageMaker batch transform manages the compute resources, installs the ML model, handles data transfer between Amazon S3 and the ML model, and easily scales out to perform inference on the entire dataset. In this post, we showed you how easy it is to build a serverless ML pipeline at scale with AWS Data Analytics and ML services. Genworth, Genworth Financial, and the Genworth logo are registered service marks of Genworth Financial, Inc. and used pursuant to license.
Perform batch fraud predictions with Amazon Fraud Detector without writing code or integrating an API

Unlike general-purpose machine learning (ML) packages, Amazon Fraud Detector is designed specifically to detect fraud. Now, you can generate batch predictions in Amazon Fraud Detector to quickly and easily evaluate a large number of events for fraud. Perform a batch prediction job through the Amazon Fraud Detector console. You can create and publish a detector version using the Amazon Fraud Detector console or via the APIs. For more information about Amazon Fraud Detector, including links to additional blog posts, sample notebooks, user guide, and API documentation, see Amazon Fraud Detector.
Create a serverless pipeline to translate large documents with Amazon Translate

In our previous post, we described how to translate documents using the real-time translation API from Amazon Translate and AWS Lambda. This event-driven architecture shows the flow of actions when a new document lands in the input Amazon Simple Storage Service (Amazon S3) bucket. In addition to Amazon S3 costs, the solution incurs usage costs from Amazon Translate, Lambda, and Step Functions. For more information, see Amazon Translate pricing, Amazon S3 pricing, AWS Lambda pricing, and AWS Step Functions pricing. For more information, see the Amazon Translate Developer Guide and Amazon Translate resources.
How to Schedule Python Scripts With Cron — The Only Guide You’ll Ever Need

As discussed before, you must follow a specific syntax to schedule cron jobs. We want get_users.py to run on every even minute (e.g., 0, 2, 4) and get_posts.py to run on every odd minute (e.g., 1, 3, 5). Here’s the correct pattern to run a job on every even minute:Image 1 — Cron job pattern for every even minute execution (image by author)And here’s for the odd minutes:Image 2 — Cron job pattern for every odd minute execution (image by author)Neat — let’s use these patterns to schedule executions. You’ll have to specify the scheduling pattern, full path to the Python executable, and full path to the script to make scheduling work. Use the following image as a reference:Image 5 — Editing a crontab file (image by author)Once done, press the ESC key to exit the insert mode, immediately followed by :wq and ENTER keys.
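The even/odd-minute patterns described above can be written as crontab entries like the following; the interpreter and script paths are placeholders you would adjust to your own environment:

```shell
# Hypothetical paths: adjust the Python executable and script locations.
# */2 in the minute field matches 0, 2, 4, ...; 1-59/2 matches 1, 3, 5, ...
crontab_lines='*/2 * * * * /usr/bin/python3 /home/user/get_users.py
1-59/2 * * * * /usr/bin/python3 /home/user/get_posts.py'

# Preview the entries; install them with:  printf '%s\n' "$crontab_lines" | crontab -
printf '%s\n' "$crontab_lines"
```

Piping into `crontab -` replaces the current user's crontab, so preview first.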
4 Ways to Visualize Individual Decision Trees in a Random Forest

Today, we'll discuss 4 different ways to visualize individual decision trees in a Random Forest: plot decision trees using the sklearn.tree.plot_tree() function; plot decision trees using the sklearn.tree.export_graphviz() function; plot decision trees using the dtreeviz Python package; and print decision tree details using the sklearn.tree.export_text() function. The first three methods build the decision tree in the form of a graph. Before discussing the above 4 methods, we first build a random forest model on the "wine data". The number of trees in a random forest is defined by the n_estimators parameter in the RandomForestClassifier() or RandomForestRegressor() class. Using sklearn.tree.plot_tree() is the simplest and easiest way to visualize a decision tree.
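A minimal sketch of the text-based method on the wine data, assuming scikit-learn is available (the n_estimators value is illustrative): individual trees live in the fitted model's `estimators_` list, and `export_text` prints one of them as rules.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

data = load_wine()
rf = RandomForestClassifier(n_estimators=10, random_state=42).fit(data.data, data.target)

# each element of estimators_ is an ordinary DecisionTreeClassifier
tree = rf.estimators_[0]
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules[:300])
```

The same `tree` object can be passed to `plot_tree()` or `export_graphviz()` for graphical output.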
Decision Tree — Implemented from scratch

In this article, we will present one of the most basic machine-learning algorithms, known as the Decision Tree. Training a decision tree is a process of identifying the most optimal structure of nodes. To find the best split (k, x[t]), we need to compare various options and choose the one that maximizes the gain. The decision tree is composed of nodes. Our decision tree is a binary tree (every node branches out to two sub-nodes).
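A sketch of that split search in plain Python, using Gini impurity as the gain criterion (the article may use a different criterion; the function names are mine):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Scan candidate thresholds on one feature; return (threshold, gain) maximizing gain."""
    best_t, best_gain = None, -1.0
    parent, n = gini(ys), len(ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # degenerate split, skip
        child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain
```

Repeating this scan over every feature k gives the (k, x[t]) pair mentioned in the text; a full tree just applies it recursively to each resulting subset.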
How (Gaussian) Naive Bayes works

Naive Bayes makes the assumption that the features are independent. This means that we are still assuming class-specific covariance matrices (as in QDA), but the covariance matrices are diagonal matrices. The code underneath is a simple implementation of (Gaussian) Naive Bayes that we just went over. Underneath is a chart with the data points (color coded to match their respective classes), the class distributions that our (Gaussian) Naive Bayes model finds, and the decision boundaries generated by the respective class distributions.
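As a compact sketch of the idea (not the article's exact code), a Gaussian Naive Bayes classifier only needs per-class means, per-feature variances, and priors; the diagonal-covariance assumption is what makes the per-feature log densities simply add up:

```python
import numpy as np

class GaussianNBScratch:
    """Minimal (Gaussian) Naive Bayes: class-specific diagonal covariances."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log p(c) + sum_j log N(x_j; mean_cj, var_cj) -- the independence assumption
        log_likes = np.array([
            -0.5 * np.sum(np.log(2 * np.pi * self.vars[i])
                          + (X - self.means[i]) ** 2 / self.vars[i], axis=1)
            for i in range(len(self.classes))
        ]).T
        return self.classes[np.argmax(np.log(self.priors) + log_likes, axis=1)]

# two well-separated toy clusters
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
pred = GaussianNBScratch().fit(X, y).predict(np.array([[0.05, 0.0], [5.05, 5.0]]))
```

The small variance floor (1e-9) only guards against division by zero on constant features.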
How Grouping Works with Python Pandas vs R Data Table

What are the average house prices in different cities of the US? All these questions can be answered by using a grouping operation, given that we have proper data. Most data analysis libraries and frameworks implement a function to perform such operations. In this article, we will compare two of the most popular data analysis libraries with regard to tasks that involve grouping. The first one is Python Pandas and the other is R data.table.
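On the Pandas side, the house-price question maps to a one-line groupby; the toy data below is a hypothetical stand-in for a real prices table:

```python
import pandas as pd

# hypothetical toy data standing in for US house prices
df = pd.DataFrame({
    "city":  ["Austin", "Austin", "Boston", "Boston"],
    "price": [300_000, 340_000, 550_000, 610_000],
})

# split by city, apply the mean, combine into one Series per group
avg_price = df.groupby("city")["price"].mean()
```

The R data.table equivalent would be `dt[, mean(price), by = city]`, which is the comparison the article draws.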
Don’t Make This Mistake with Scaling Data

MinMaxScaler is one of the most commonly used scaling techniques in Machine Learning (right after StandardScaler). From sklearn's documentation: "Transform features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set." Now, let's try to input values greater than the max: scaler.transform(np.array([[2, 20]])) # array([[1.5, 1.125]]). The scaler returns a value greater than 1. Follow me on Twitter, where I regularly tweet about Data Science and Machine Learning.
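The out-of-range behaviour can be reproduced by hand with the min-max formula; the fit data below is the example from sklearn's MinMaxScaler documentation, which yields exactly the [[1.5, 1.125]] result quoted above:

```python
import numpy as np

# fit data from sklearn's MinMaxScaler docs example
X_fit = np.array([[-1.0, 2.0], [-0.5, 6.0], [0.0, 10.0], [1.0, 18.0]])
mins, maxs = X_fit.min(axis=0), X_fit.max(axis=0)

def minmax_transform(X):
    # (x - min) / (max - min): nothing clips values outside the fitted range
    return (X - mins) / (maxs - mins)

out = minmax_transform(np.array([[2.0, 20.0]]))  # both inputs exceed the fitted max
```

If out-of-range inputs are possible at inference time, sklearn's `MinMaxScaler(clip=True)` option caps the output at the feature range instead.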
From DevOps to MLOPS: Integrate Machine Learning Models using Jenkins and Docker

The objective of this article is to integrate machine learning models with DevOps using Jenkins and Docker. One example is when we train a machine learning model: it is necessary to continuously test the models for accuracy. This task can be fully automated using Jenkins. There are several advantages to using Jenkins. In this article, we will see how to integrate a machine learning model (Linear Discriminant Analysis and Multi-layer Perceptron Neural Network) trained on EEG data using Jenkins and Docker.
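As a hedged sketch of such a pipeline (stage names, script names, and the image tag below are hypothetical, not from the article), a declarative Jenkinsfile that retrains, re-tests accuracy, and packages the model might look like:

```groovy
// Hypothetical declarative pipeline; train.py / evaluate.py are placeholder scripts.
pipeline {
    agent any
    stages {
        stage('Train') {
            steps { sh 'python train.py' }        // fit the LDA / MLP models on EEG data
        }
        stage('Test accuracy') {
            steps { sh 'python evaluate.py' }     // non-zero exit fails the build if accuracy drops
        }
        stage('Package') {
            steps { sh 'docker build -t eeg-model .' }  // bake the model into an image
        }
    }
}
```

The accuracy gate is the key design point: Jenkins reruns it on every commit, which is the continuous model testing the article describes.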
5 Unique Skills Every Data Scientist Should Know

Keep reading if you would like to learn more about these five unique data science skills. This skill is incredibly important if you decide to be employed by a company where data science is more customer-facing. These are the people you will have to explain your complex data science model to in a way that is easy to understand. These areas are beneficial in some aspects, but where data scientists can often struggle is during the complex coding process around machine-learning algorithms and data science model deployment. All in all, these are just some of the skills that every data scientist should know that could be unique and new to you.
Penalty Generation for non-masked Vehicle drivers using Number Plate

We apply the image processing technique on the number plate to reduce the image size and track the number plate by drawing a rectangular box around it. Below is an example of a license number plate detected using the rectangular box around the plate in a given image using OpenCV. We can extract the license number using OpenCV and Pytesseract. Below is the vehicle image upload page, which receives input from the user and processes the vehicle image to obtain the text of the number plate. Also, we need to work on improving the OpenCV-based number plate extraction accuracy, as a wrong Region of Interest may result in extracting empty number plate text.
Google AI Blog: Introducing FELIX: Flexible Text Editing Through Tagging and Insertion

However, these models appear to be a suboptimal choice for many monolingual tasks, as the desired output text often represents a minor rewrite of the input text. The tagging model employs a novel pointer mechanism, which supports structural transformations, while the insertion model is based on a Masked Language Model. The first step in FELIX is the tagging model, which consists of two components. The output of the tagging model is the reordered input text with deleted words and MASK tokens predicted by the insertion tag. Example of the insertion model, where the tagger predicts two words will be inserted and the insertion model predicts the content of the MASK tokens.
Segment paragraphs and detect insights with Amazon Textract and Amazon Comprehend

To overcome these manual processes, we have AWS AI services such as Amazon Textract and Amazon Comprehend. Amazon Simple Notification Service (Amazon SNS) is a fully managed messaging service that Amazon Textract uses to notify upon completion of the extraction process. You deploy an AWS CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) roles, services, and components of the solution, including Amazon S3, Lambda, Amazon Textract, and Amazon Comprehend. When the document is submitted to Amazon Textract for text detection, we get pages, lines, words, or tables as a response. With managed ML services like Amazon Textract and Amazon Comprehend, you can gain insights into your previously undiscovered data.
Automate multi-modality, parallel data labeling workflows with Amazon SageMaker Ground Truth and AWS Step Functions

This is the first in a two-part series on the Amazon SageMaker Ground Truth hierarchical labeling workflow and dashboards. This solution creates and manages Ground Truth labeling jobs to label video frames using multiple types of annotations. The first post covers the Step Functions workflow that automates advanced ML data labeling workflows using Ground Truth for chaining and hierarchical label taxonomies. The CheckForFirstLevelComplete step waits for the FIRST_LEVEL Ground Truth labeling jobs triggered from TriggerLabelingFirstStep. For more information about the data lake for Ground Truth dataset annotations and worker metrics, check back for the second post in this series (coming soon).
Automatically scale Amazon Kendra query capacity units with Amazon EventBridge and AWS Lambda

In this post we’ll demonstrate how you can automatically scale your Amazon Kendra index based on a time schedule using Amazon EventBridge and AWS Lambda. DEFAULT_UNITS – The number of query processing units that your Amazon Kendra Enterprise Edition requires to operate at minimum capacity. ADDITIONAL_UNITS – The number of query capacity units you require at those times where additional capacity is required. This allows your index to scale up with the additional query capacity units at 7 AM and scale down at 8 PM. ConclusionIn this post, you deployed a mechanism to automatically scale additional query processing units for your Amazon Kendra Enterprise Edition index.
The hidden world of GPU inefficiency

Like a sleeping lion, the GPUs we already use have way more power than meets the eye. In the last post, we explored how near-future business transformation is threatened by a GPU supply pinch. But despite the rising criticality of accelerators in the modern data center, the GPU sticks out like a sore thumb, remaining stubbornly static in its configurations. When an organization asks "how do our GPUs perform?", they might run a GPU benchmark. But benchmarks are designed to push the GPU to maximum utilization — all they can tell you is the best-case performance scenario. This is the hidden world of GPU inefficiency.
What Kind of Data Science Learner Are You?

People come to data science from diverse professional, academic, and cultural backgrounds, and what works for some might not do the job for others. In this week’s Author Spotlight, for example, prolific contributor Khuyen Tran shared her tried-and-true method for digesting complex topics: teaching them. It’s only when Khuyen translates concepts into something others can understand and use themselves that they click for her, too. For every learn-by-teaching data scientist there must be at least one (probably more?) Choose your adventure: If you have a fresh take on studying data science or have accumulated some memorable experiences (good or bad) along your learning journey, consider sharing them with our community.
Online Learning: Recursive Least Squares and Online PCA

Online Learning is a subset of Machine Learning which emphasizes the fact that data generated from environments can change over time. Online Learning can then be used in order to provide a definitive answer to this problem. Recursive Least Squares (RLS) is a common technique used in order to study real-time data. The smaller the value of λ, the lower the importance of past input values.
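A compact RLS sketch in NumPy, with the forgetting factor λ from the text (the initialization constant `delta` and variable names are my own choices):

```python
import numpy as np

def rls(X, y, lam=0.99, delta=100.0):
    """Recursive Least Squares with forgetting factor lam; delta seeds the P matrix."""
    n, d = X.shape
    w = np.zeros(d)
    P = delta * np.eye(d)              # running inverse-correlation estimate
    for i in range(n):                 # one update per incoming sample
        x = X[i]
        Px = P @ x
        k = Px / (lam + x @ Px)        # gain vector
        w = w + k * (y[i] - w @ x)     # correct weights with the a-priori error
        P = (P - np.outer(k, Px)) / lam
    return w

# noiseless toy stream: targets follow y = 2*x1 - 3*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
w_hat = rls(X, X @ np.array([2.0, -3.0]))
```

Because each step reuses the previous `w` and `P`, the cost per new sample is constant, which is what makes RLS suitable for real-time data.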
7 Steps to Design a Basic Neural Network (part 2 of 2)

In Part 1 of 2 of this segment, we saw the limitation of using a traditional prediction model like logistic regression to correctly classify two colors in a noisy dataset. Then, we built our own neural network structure, initialized parameters, and computed the forward propagation activation functions. In this Part 2 of 2, we will complete the build of our neural network model to better classify the color dots in our original dataset. Step 4: Estimate Cost. We discussed earlier the importance of minimizing cost in our neural network model. Converting the cost equation to Python, we can use np.dot to compute the product and np.log to compute the logarithms.
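The binary cross-entropy cost described above, written with np.dot and np.log as the text suggests (the variable names are mine; A holds predicted probabilities, Y the 0/1 labels):

```python
import numpy as np

def cost(A, Y):
    """Binary cross-entropy over m examples: -(1/m) * sum(Y*log(A) + (1-Y)*log(1-A))."""
    m = Y.shape[0]
    return float(-(np.dot(Y, np.log(A)) + np.dot(1 - Y, np.log(1 - A))) / m)

# two examples, both predicted with 0.9 confidence in the correct class
c = cost(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```

Minimizing this quantity with gradient descent is the "estimate cost" step the article refers to.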
7 Steps to Design a Basic Neural Network (part 1 of 2)

This two-part article takes a more holistic, overarching (and yes, less math-y) approach to building a neural network from scratch. With that visual in mind, we first have to structure our neural network. Hidden Layer: In a process known as forward propagation, input layer data is transmitted to each of 4 nodes in the hidden layer. Let’s unpack details from the 4 functions shown above. The Z function computes the product of all input layer data (a way to represent all our X feature data) and input layer weights, and adds all input layer bias terms.
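The Z and activation computations for a 1-hidden-layer network can be sketched in NumPy like this (layer sizes and the random weights are illustrative, not the article's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))                          # 2 features, 5 examples

W1, b1 = rng.normal(size=(4, 2)), np.zeros((4, 1))   # hidden layer: 4 nodes
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))   # output layer: 1 node

# forward propagation: Z = weights @ inputs + bias, then a nonlinearity
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)                                     # probabilities in (0, 1)
```

Each column of `A2` is the network's predicted probability for one example, which is what the cost step in Part 2 compares against the labels.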
Think outside the ‘black’ box

Artificial Intelligence plays a big role in our daily lives. These models are often referred to as ‘black box’ models. Figure 1: Black box representation [Slimmer AI]. One thing that can help humans understand AI is explainable AI (X-AI). In contrast to black box models, glass box models offer increased interpretability. In a glass box model all parameters are known to us and we know exactly how the model comes to its conclusion, giving us full transparency.
Convert a Tensorflow2 model to OpenVINO®

One of the common usages of this technology is using OpenVINO to make fast inference. We will go through the process of converting a model, loading it into an Inference Engine plugin, and performing inference. We will consider a model trained using TensorFlow, even though OpenVINO supports many other frameworks. Using the OpenVINO Model Optimizer, we converted it into the new representation required to load the model into the Inference Engine module. Once the trained model was converted, using the Inference Engine to run inference on input data is almost trivial.
Learning Pathways For Data-Science — What Should You Learn?

In most if not all applications of Data Science, the key word is going to be data. Data is a vital part of nearly every application of Data Science. I would also like to say that this is probably one of the most accessible forms of Data Science, right beside Data Engineering. The last path I wanted to review in the world of Data Science is that of the Data Engineer. Similarly to the Data Scientist that creates and deploys endpoints, Data Engineer roles are incredibly accessible.
Machine Learning In The World Of Blockchain and Cryptocurrency

Mining cryptocurrency involves employing computing resources to guess a set of values used to solve a function on a blockchain. The paper's authors devised a multidimensional RL algorithm that uses Q Learning (model-free algorithm) to optimise cryptocurrency mining. The authors proved that through machine learning techniques, the development of performant mining strategies could be solved. Several mining companies such as Argo Blockchain, Riot Blockchain and Hive Blockchain have reportedly mined several millions worth of Bitcoins. TakeawayResearchers have created reinforcement learning systems that provide optimisation on cryptocurrency mining strategies.
Fake News Detection with Machine Learning, using Python

One of the most challenging areas of Machine Learning is the one that deals with language, known as Natural Language Processing (NLP). It is true that every area of Machine Learning can be complex and challenging at some level, but NLP is particularly difficult as it requires exploring human communication and, somehow, human consciousness. Plus, we will use a traditional Machine Learning tool that is becoming more and more popular for its ease of use and its interesting features: PyCaret. Ok, so now that we have the data, we can start with the Machine Learning part. Now let's use PyCaret and its Machine Learning models: the best model is saved as best_model, and it is the Random Forest Classifier.
Enhancing Autoencoders with memory modules for Anomaly Detection.

Detecting anomalies in video streams is a hard task. These keys are used to update the memory items and fetch relevant memory items during the training phase. MNAD introduces a test-time memory update scheme which allows for a finely tuned memory module at all times of inference. MNAD introduces a feature separateness loss and a compactness loss, which incentivise populating the memory module with diverse memory items. Together, these losses help increase the diversity of the memory items stored in the memory module and increase its discriminative power.
Bias in Your Datasets: COVID-19 Case Study

For this, the brightness distribution between NORMAL images and COVID images was compared. Brightness distribution for Covid/Normal images (Image by Author)A significant difference in the distribution of brightness was observed. The t-SNE algorithm seems to extract important information from 28x28 non-detail images that allows it to make a good classification. The brightness of the edges therefore makes it possible to correctly separate the covid images from the non-covid images in our dataset, which represents a real bias in our study! This required the use of a U-Net neural network, pre-trained on CXR images and developed specially for lung segmentation.
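The brightness comparison can be illustrated with synthetic arrays standing in for the CXR images (the value ranges below are hypothetical; real COVID/NORMAL pixel statistics differ):

```python
import numpy as np

rng = np.random.default_rng(42)
# stand-ins: one class systematically brighter than the other, 100 28x28 images each
normal = rng.uniform(0.0, 0.6, size=(100, 28, 28))
covid = rng.uniform(0.2, 0.8, size=(100, 28, 28))

# per-image mean brightness, the quantity whose distributions are compared
normal_brightness = normal.mean(axis=(1, 2))
covid_brightness = covid.mean(axis=(1, 2))
gap = covid_brightness.mean() - normal_brightness.mean()
```

A non-trivial `gap` like this is exactly the kind of confound the article warns about: a classifier can exploit brightness instead of pathology, which is why lung segmentation was needed.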
How To Set Up An ML Data Labeling System

Conceptually, a labeling system is simple: a user should be able to send data to a labeling system and get back labels for that data. To help you choose what’s right for you, here’s some lessons we’ve learned on how to set up a great ML data labeling system. Even as you switch labeling workforces or tools, you can use the same labeling instructions to quickly get a new labeling system up and running. For examples of great labeling instructions, Landing AI provides a great set of labeling instructions for industrial defect detection and some examples of instructions for difficult / ambiguous scenarios. After that, you can send the same golden set of data (without labels) and identical labeling instructions to your candidate labeling systems.
How to Develop a Weighted Average Ensemble With Python

This tutorial is divided into four parts: Weighted Average Ensemble; Develop a Weighted Average Ensemble; Weighted Average Ensemble for Classification; Weighted Average Ensemble for Regression. A weighted average or weighted sum ensemble is an ensemble machine learning approach that combines the predictions from multiple models, where the contribution of each model is weighted proportionally to its capability or skill. In this section, we will develop, evaluate, and use weighted average or weighted sum ensemble models. For example, we can define a weighted average ensemble for classification with two ensemble members as follows: models = [('lr', LogisticRegression()), ('svm', SVC())]; weights = [0.7, 0.9]; ensemble = VotingClassifier(estimators=models, weights=weights). We then look at using the Weighted Average Ensemble for a classification problem and for a regression problem.
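Under the hood, a weighted soft-voting ensemble just averages the members' class probabilities with normalized weights; here is a NumPy sketch of that mechanism (function and variable names are mine, not from the tutorial):

```python
import numpy as np

def weighted_vote(probas, weights):
    """Weighted average of per-model class-probability arrays, then argmax."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize so rows still sum to 1
    stacked = np.stack(probas)               # (n_models, n_samples, n_classes)
    avg = np.tensordot(w, stacked, axes=1)   # (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# model A favors class 0, model B favors class 1; weights favor model B
p_a = np.array([[0.8, 0.2]])
p_b = np.array([[0.3, 0.7]])
preds, avg = weighted_vote([p_a, p_b], weights=[0.2, 0.8])
```

This is the same computation `VotingClassifier(..., weights=...)` performs with `voting="soft"`; the weights let a more skillful model dominate the vote.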
Google AI Blog: Do Wide and Deep Networks Learn the Same Things?

In “Do Wide and Deep Networks Learn the Same Things?”, we find a characteristic block structure in the internal representations of very wide or very deep models, and establish a connection between this phenomenon and model overparameterization. Having explored the properties of the learned representations of wide and deep models, we next turn to understanding how they influence the diversity of the output predictions. On both CIFAR-10 and ImageNet datasets, wide and deep models that have the same average accuracy still demonstrate statistically significant differences in example-level predictions. We also show that wide and deep models exhibit systematic output differences at class and example levels.
Achieve 12x higher throughput and lowest latency for PyTorch Natural Language Processing applications out-of-the-box on AWS Inferentia

With AWS Inferentia you can also achieve the highest out-of-the-box performance and lowest cost on open-source NLP models, without the need for customizations. To maximize inference performance of Hugging Face models on AWS Inferentia, you use the AWS Neuron PyTorch framework integration. This performance boost comes with minimal impact on latency, because AWS Inferentia is optimized to maximize throughput at small batch sizes. Learn more about the AWS Inferentia chip and the Amazon EC2 Inf1 instances to get started running your own custom ML pipelines on AWS Inferentia using the Neuron SDK.
How to Use The Progressive Overload Method to Aggressively Improve Your Data Science Skills in 6 Weeks

The progressive overload method was first developed by Milo of Croton, a six-time Olympic champion who lived in the 6th century B.C. The progressive overload method is best used by individuals who already have a grounding in the basic foundations of data science. The same principle applies to using the progressive overload method for improving your data science skills. The progressive overload method benefits the data science learning experience by providing a definite framework that ensures consistent improvements in your skills. How to use the progressive overload method to improve your data science skills.
How to adapt a multilingual T5 model for a single language

How to adapt a multilingual T5 model for a single languageT5 is an encoder-decoder transformer from Google that once was SOTA on several NLU and NLG problems and is still very useful as a base for seq2seq tasks such as text summarization. The first T5 model was for English only, and then the massively multilingual version followed. We also preserve a small number of English tokens in the model, in order to make it bilingual. We start by loading the existing multilingual model. The Russian T5 model is available in the Huggingface repository.
WiDS Datathon 2021: My First Kaggle Datathon

The Women in Data Science (WiDS) is a global movement to support data scientists around the world. The best part of this competition is the local workshops and online resources that help beginners advance their data science skills. The most important part of a data science project is to clean the data properly. But this approach won't work for this dataset, because all the missing values of ‘bmi’ have either missing values of ‘height’ or ‘weight’ associated with them. I recommend that all beginners participate in datathons so that they can improve their data science skills.
Who’s Who and What’s What: Advances in Biomedical Named Entity Recognition (BioNER)

[3] This rich ontology, containing millions of concepts, is used in several studies that focus on creating BioNER systems. The MedMentions dataset is ‘only’ annotated with 352k mentions of UMLS concepts, but it is still one of the most challenging datasets out there. learned name embeddings by using a kind of ‘Siamese’ network: They used pre-trained word embeddings and character embeddings as an input to a BiLSTM. The authors utilize both semantic and syntactic information in an unsupervised way, by using pre-trained word embeddings with syntactic parsing information. They trained several one-versus-all classifiers using pre-trained word embeddings and a BiLSTM classifier on top, one classifier for each type of entity.
What Is Embedding and What Can You Do with It

They pioneered the concept of word embedding as the foundation of the technique. Many concepts in machine learning or deep learning are built on top of one another, and the concept of embedding is no exception. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. Another example, from the Google Machine Learning Crash Course (“Embeddings: Categorical Input Data”), shows how to use embeddings for a movie recommender system. In summary: in the context of machine learning, an embedding functions as a task-specific dictionary.
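"Close together in the embedding space" is usually measured with cosine similarity; a toy sketch with made-up 3-d vectors (real embeddings are learned and much higher-dimensional):

```python
import numpy as np

# hypothetical 3-d embeddings, for illustration only
emb = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

With these vectors, `cosine(emb["king"], emb["queen"])` exceeds `cosine(emb["king"], emb["apple"])`, which is precisely the "semantically similar inputs sit close together" property the text describes.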
Feature selection in machine learning using Lasso regression

I have already talked about Lasso regression in a previous blog post. So, the idea of Lasso regression is to optimize the cost function while reducing the absolute values of the coefficients. So, the idea of using Lasso regression for feature selection purposes is very simple: we fit a Lasso regression on a scaled version of our dataset and we consider only those features that have a coefficient different from 0. Obviously, we first need to tune the α hyperparameter in order to have the right kind of Lasso regression. If you want to learn more about Lasso regression, join my Supervised Machine Learning in Python online course.
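A minimal sketch of that recipe with scikit-learn, on synthetic data where only the first two of ten features actually matter (alpha=0.1 is an illustrative value, not a tuned one):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1]          # only features 0 and 1 drive the target

# scale first, then fit Lasso; the L1 penalty zeroes out irrelevant coefficients
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
selected = np.flatnonzero(lasso.coef_)  # indices of features kept by the Lasso
```

In practice you would tune alpha (e.g., with LassoCV) before trusting the selected set.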
Neural Networks for Real-Time Audio: Stateless LSTM

The second neural network architecture we will be analyzing for real-time audio is the Stateless LSTM. We can do this by adding 1-D convolutional layers prior to the stateless LSTM layer. Keras/Tensorflow was chosen for implementing the Stateless LSTM model. In Keras, the LSTM layer is stateless by default, so the only parameter needed is the number of hidden_units. The input_size defines how many previous audio samples will be used to predict the current sample.
What Makes Graph Convolutional Networks Work?

By now, if you’ve been following this series, you may have learned a bit about graph theory, why we care about graph-structured data in data science, and what the heck a “Graph Convolutional Network” is. But we’re focused on Graph Convolutional Networks, for classification on graphs. Let’s take a quick second to look at the forward propagation formula for regular neural networks. Similar to any neural network, Graph Convolutional Networks need a way to propagate values forward through layers to achieve their goals.
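A common form of the GCN forward step augments regular forward propagation with a normalized adjacency matrix, H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W); a NumPy sketch (the graph, feature, and weight values are illustrative):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: relu(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)    # symmetric degree normalization
    return np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

# 3-node path graph, 2 input features per node, 4 output features per node
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.arange(6, dtype=float).reshape(3, 2)
W = np.ones((2, 4))
H_next = gcn_layer(A, H, W)
```

Compared with the regular forward-propagation formula, the only new ingredient is the normalized adjacency term, which mixes each node's features with its neighbors' before the usual weight multiplication and nonlinearity.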
Google AI Blog: Google at ICLR 2021

The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more. As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).
Will Hurd Joins OpenAI’s Board of Directors

OpenAI is committed to developing general-purpose artificial intelligence that benefits all humanity, and we believe that achieving our goal requires expertise in public policy as well as technology. So, we're delighted to announce that Congressman Will Hurd has joined our board of directors. Will served three terms in the U.S. House of Representatives, has been a leading voice on technology policy, and coauthored bipartisan legislation outlining a national strategy for artificial intelligence. "Will brings a rare combination of expertise—he deeply understands both artificial intelligence as well as public policy, both of which are critical to a successful future for AI," said Sam Altman, OpenAI's CEO. "I'm excited to join this thoughtful, values-driven company at the forefront of artificial intelligence research and deployment." After two decades of service in government and U.S. national security, Will is currently a managing director at Allen & Company.
Learning What To Do by Simulating the Past

Preferences Implicit in the State of the World develops an algorithm, Reward Learning by Simulating the Past (RLSP), that does this sort of reasoning, allowing an agent to infer human preferences without explicit feedback. In our latest paper presented at ICLR 2021, we introduce Deep Reward Learning by Simulating the Past (Deep RLSP), an extension of the RLSP algorithm that can be scaled up to tasks like the balancing Cheetah task. To address this, we sample likely past trajectories, instead of enumerating all possible past trajectories. Finally, while we focused on imitation learning in this project, Deep RLSP is also very promising for learning safety constraints, such as “don’t break the vase”. This post is based on the paper “Learning What To Do by Simulating the Past”, presented at ICLR 2021.
Creating an end-to-end application for orchestrating custom deep learning HPO, training, and inference using AWS Step Functions

User requests go through Amazon API Gateway to Step Functions, which is responsible for orchestrating the training or HPO (Step 3). Create a step for HPO and training in Step Functions: training a model for inference using Step Functions requires multiple steps, starting with creating a training job. We orchestrate both model training and HPO using Step Functions. Finally, we have states for updating the status to ERROR (one for HPO and another for model training). We learned how to orchestrate training, HPO, and endpoint creation using Step Functions.
Shapash 1.3.2, announcing new features for more auditable AI

Shapash 1.3.2, announcing new features for more auditable AI. Shapash is a Python library released by the MAIF data team in January 2021 to make machine learning models understandable by everyone. Version 1.3.2 is now available, and Shapash allows Data Scientists to document each model they release into production. The Shapash Report: a "standalone" HTML document generated with a line of code by the Data Scientist when deploying a model to production. An open tool: each organization can adapt the content of its Shapash Report. Also, to facilitate navigation, the Shapash Report offers a "Table of contents" section as well as different menu buttons.
Industrial Motor Fault Classification using Deep Learning with IoT Implications

Methodology: The scope of this experiment is to assess the feasibility of developing a machine learning model capable of classifying different motor operating states. This function would ensure that the machine learning model does not skew the final results. This conversion is achieved by transforming the data using an FFT. Building a Deep Learning Model: To begin this section, I will reference a useful infographic on the relationship between artificial intelligence, machine learning, and deep learning. I made very few adjustments to the learning model; the most significant gains came from data processing.
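As an illustration of that frequency-domain conversion (with made-up signal parameters, not the article's motor data), an FFT of a noisy vibration signal recovers the dominant frequency:

```python
import numpy as np

# Hypothetical vibration snippet: a 60 Hz component plus noise,
# sampled at 1 kHz (values chosen purely for illustration).
fs = 1000
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 60 * t) + 0.1 * rng.standard_normal(t.size)

# FFT: move from the time domain to the frequency domain.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # close to 60.0
```

The magnitudes of the frequency bins (rather than raw time-domain samples) are what a classifier would then consume.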
Why Should I Even Bother With Quantum Machine Learning?

Why Should I Even Bother With Quantum Machine Learning? This post is part of the book: Hands-On Quantum Machine Learning With Python. But machine learning algorithms have become increasingly hard to train. So when we look at the representation, current machine learning algorithms, such as the Generative Pre-trained Transformer 3 (GPT-3) network, published in 2020, come to mind. Yet, the complexity of current machine learning models becomes a problem.
EmbedRank: Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

Background: Keyword/keyphrase extraction is the task of extracting important words that are relevant to the underlying document. Now, these could be either abstractive (relevant keywords from outside of the written text) or extractive (relevant keywords present in the written text) in nature. Most current keyword extraction systems have limitations in terms of speed and in generating somewhat irrelevant and redundant keywords for the document. Sent2Vec defines sentence embeddings as the average of the context word embeddings present in the sentence, wherein context word embeddings are not just restricted to unigrams but extended to n-grams present in each sentence as well. Intuitively, w represents word vectors for a word concept and D, the document vector, represents the concept of a document.
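The ranking step EmbedRank relies on can be sketched with toy vectors standing in for the Sent2Vec embeddings (the phrases and vector values below are invented for illustration):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for Sent2Vec outputs (illustrative values).
doc_vec = np.array([1.0, 0.2, 0.0])
candidates = {
    "graph neural network": np.array([0.9, 0.1, 0.1]),
    "pizza recipe": np.array([0.0, 0.1, 1.0]),
}

# EmbedRank's core idea: rank candidate phrases by how close their
# embedding is to the document embedding.
ranked = sorted(candidates, key=lambda p: cosine(candidates[p], doc_vec), reverse=True)
print(ranked[0])
```

In the real method the candidate phrases come from a part-of-speech filter over the document, and both phrase and document vectors come from the same Sent2Vec model.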
Could An AI Ever Have Emotions?

Could an AI Ever Have Emotions? Emotions, Feelings and AGI. First up, what would emotions look like in a machine? Thus, throwing it back to earlier topics, feelings = emotions + qualia. Emotions can be thought of as the physiological responses to stimuli, while feelings are your perception of these emotions.
Neural Networks for Real-Time Audio: WaveNet

Image by author. This is the second of a four-part series on using neural networks for real-time audio. For training the neural network, we will examine the code from the "model.py" Python file, which defines the WaveNet class implementation. The base WaveNet class, which inherits PyTorch's nn.Module, is shown below. The forward() function is where the audio processing occurs. The WaveNet class is initialized with several parameters defining the size and structure of the neural net. The basic goal is to recreate the forward() function from the PyTorch WaveNet class in high-performance C++ code.
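WaveNet's convolutions are causal and dilated; a minimal NumPy sketch of one such layer (my own illustration, not the article's model.py implementation) shows what dilation does:

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation):
    """One causal dilated 1-D convolution:
    y[t] = sum_k weights[k] * x[t - k * dilation], with zero padding
    on the left so the output never peeks at future samples."""
    k = len(weights)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return sum(w * xp[pad - i * dilation : pad - i * dilation + len(x)]
               for i, w in enumerate(weights))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = causal_dilated_conv(x, weights=[1.0, 1.0], dilation=2)
print(y)  # [1. 2. 4. 6. 8.]: each output adds the sample 2 steps back
```

Stacking such layers with dilations 1, 2, 4, 8, ... is what gives WaveNet a large receptive field over past audio samples with few parameters.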
The Euclidean Distance Is Just the Pythagorean Theorem

The Euclidean Distance Is Just the Pythagorean Theorem. Have you ever wondered why the Euclidean distance is defined the way it is? To see why the magnitude is defined this way, we shall go back to the basics: the Pythagorean theorem. If we apply this to a two-dimensional vector x = (x₁, x₂), we can see that the Pythagorean theorem gives its magnitude! Applying the Pythagorean theorem once again yields the magnitude. This is precisely what is going on in the general n-dimensional case.
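The argument above can be written out explicitly:

```latex
% Pythagorean theorem for a two-dimensional vector x = (x_1, x_2):
\|x\| = \sqrt{x_1^2 + x_2^2}

% Applying the theorem repeatedly gives the n-dimensional magnitude:
\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}

% The Euclidean distance is just the magnitude of the difference vector:
d(x, y) = \|x - y\| = \sqrt{\textstyle\sum_{i=1}^{n} (x_i - y_i)^2}
```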
You Don’t Really Need Another MOOC

You Don't Really Need Another MOOC. Don't get me wrong, I love MOOCs. But most of the time, we don't really need them. We don't need to know Python generators before developing our first Python program. We don't need to know Spark before processing 2 million rows of data. Just like we don't need to add functionality until it's required, we don't need to do every new MOOC or read every new textbook.
Modelling Physiological Signal Estimation as a Deep Learning Problem

Modelling Physiological Signal Estimation as a Deep Learning Problem. Photo by Ryan Stone on Unsplash. Let's start with a small exercise: model the light reflected from the surface of an object (which is received at the sensor) as a linear combination of two major components. Modelling the reflected light C(t) as a combination of its constituent terms. It helps to think of each such pixel as an independent sensor for the task of extracting the pulsatile signal, p(t). Quantifying the colour signal C(t) received at the kth pixel as a simplification.
Strong Learners vs. Weak Learners in Ensemble Learning

Tutorial Overview: This tutorial is divided into three parts; they are: Weak Learners; Strong Learners; Weak vs. Strong Learners and Boosting. A weak classifier is a model for binary classification that performs slightly better than random guessing. It is used as a weak learner so often that decision stump and weak learner are practically synonyms. Strong learning is what we seek, and we can contrast their capability with weak learners, although we can also construct strong learners from weak learners. We have established that weak learners perform slightly better than random, and that strong learners are good or even near-optimal; it is the latter that we seek for a predictive modeling project. Since strong learners are desirable yet difficult to get, while weak learners are easy to obtain in practice, this result opens a promising direction of generating strong learners by ensemble methods.
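A quick way to see the weak-to-strong jump is boosting decision stumps with scikit-learn (AdaBoost's default base estimator is a depth-1 stump); the dataset here is synthetic, chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single decision stump (depth-1 tree) is a weak learner on its own...
stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)

# ...but boosting many stumps yields a much stronger ensemble.
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(stump.score(X_te, y_te), boosted.score(X_te, y_te))
```

The stump's accuracy sits modestly above chance, while the boosted ensemble of the same stumps approaches a strong learner's accuracy.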
What is MLOps and Why We Should Care

He was calling for people to watch one of his latest presentation videos on the topic of MLOps and asking viewers to share their opinions and spread the word. Andrew also stated that big data problems where there's a long tail of rare events in the input (web search, self-driving cars, recommender systems) are also small data problems. He clarified that doing research on boosting modeling technologies is a great thing, but we should recognize and contribute to the importance of data quality as well. Data is food for AI. Hence, he promotes the idea of MLOps, which helps ensure consistently high-quality data. Image by Author. Basically, the MLOps team will keep inspecting the process and analyzing possibilities for improving label/data consistency and quality during the training and deploying phases.
The 26 Most Important Data Terms You Need To Know

Gartner says that data fabric enables frictionless access and sharing of data in a distributed data environment. Business Intelligence Terminology. 5. Business Intelligence is the discipline of analyzing and transforming data to extract valuable business insights to enable decision-making. Data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data Science is the discipline of applying advanced analytics techniques to extract valuable information from data for business decision-making and strategic planning. It brings together fields such as data mining, statistics, mathematics, machine learning, data visualization, and software programming.
Beginner guide to Variational Autoencoders (VAE) with PyTorch Lightning (Part 2)

In this section, we will be discussing PyTorch Lightning (PL), why it is useful, and how we can use it to build our VAE. Put simply, PyTorch Lightning is an add-on to PyTorch which makes training models much simpler. In addition to streamlining code, PyTorch Lightning also contains many useful features such as automatic learning rate finding. Let's look at how to translate PyTorch code into PyTorch Lightning. PyTorch Lightning will then automatically obtain the data from the respective DataLoaders and use it for model training.
Practical Classification Metrics

A test-set with a 50:50 proportion will have higher scores in general as compared to a test-set with a 10:90 proportion for the primary class. But, if in a practical setting you expect the imbalance to be present, PRC metrics might be more interpretable. In general, it is better to treat the secondary class as an alternate class and adopt the precision-coverage metrics, as shown below instead. Precision-Coverage for both Primary and Secondary Class (PPV, NPV)Precision-Coverage for both the primary and secondary class is also a useful metric. The same metrics can be computed in an alternate way which will help to expand per-class precision-coverage metrics to a multi-class setting.
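With hypothetical confusion counts, the per-class precision-coverage numbers described above work out as follows (the counts are invented for illustration):

```python
# Confusion counts for a binary classifier (illustrative numbers):
tp, fp, fn, tn = 80, 20, 10, 890

# Precision for the primary (positive) class: PPV = TP / (TP + FP)
ppv = tp / (tp + fp)
# Precision for the secondary (negative) class: NPV = TN / (TN + FN)
npv = tn / (tn + fn)

# Coverage (recall) for each class:
primary_coverage = tp / (tp + fn)
secondary_coverage = tn / (tn + fp)

print(ppv, npv)  # 0.8 and roughly 0.989
```

Because each class gets its own precision and coverage, this framing extends naturally to the multi-class setting the article mentions: compute PPV-style precision and coverage per class.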
Weekly review of Reinforcement Learning papers #7

A learning gap between neuroscience and reinforcement learning. The model-based reinforcement learning framework no longer needs to be proven. It is a machine learning library for model-based reinforcement learning in continuous state-action spaces (we can regret that we cannot use discrete spaces). Paper 4: Evolving Reinforcement Learning Algorithms. Co-Reyes, J. D., Miao, Y., Peng, D., Real, E., Levine, S., Le, Q. V., … & Faust, A. The starting point is that all reinforcement learning algorithms can be represented as a graph.
Customer Satisfaction Prediction Using Machine Learning

I did a univariate analysis on the timestamps after extracting attributes like month, year, day, and day of week. The data given covers 699 days, and the timestamps between which data was collected are 2016-10-04 09:43:32 to 2018-09-03 17:40:06. Image by Paritosh Mahto. The evolution of the total orders received is shown above; the maximum number of orders was received in 201711 (November 2017). Image by Paritosh Mahto. 6.f RFM Analysis: For the given data of customers, I did an RFM analysis. RFM analysis is basically a data-driven customer behaviour segmentation technique. RFM stands for recency, frequency, and monetary value. After merging, data cleaning, and data analysis, we get the final data, which can be used further for preprocessing and feature extraction.
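An RFM table of the kind described can be sketched in pandas on a toy order log (the column names and values here are invented, not the article's dataset):

```python
import pandas as pd

# Toy order log (hypothetical values, mirroring the article's fields).
orders = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
    "order_ts": pd.to_datetime([
        "2018-08-01", "2018-09-01", "2018-05-15",
        "2018-08-20", "2018-08-25", "2018-09-02",
    ]),
    "payment": [50.0, 70.0, 20.0, 10.0, 15.0, 30.0],
})

now = orders["order_ts"].max() + pd.Timedelta(days=1)
rfm = orders.groupby("customer_id").agg(
    recency=("order_ts", lambda ts: (now - ts.max()).days),  # days since last order
    frequency=("order_ts", "count"),                         # number of orders
    monetary=("payment", "sum"),                             # total spend
)
print(rfm)
```

Segmentation then proceeds by binning or scoring each of the three columns per customer.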
Similarity Encoding for Dirty Categories Using dirty_cat

This is somewhat similar to one-hot encoding, but similarity encoding takes the values between 0 and 1 instead of taking binary values of 0 or 1. One-hot Encoding vs Similarity EncodingNow let’s compare the performance of the one-hot encoding and the similarity encoding in predicting the employees’ salaries. To do that, we will apply two different encoding methods, one-hot encoding and similarity encoding, to the employee_position_title column. Now we will create two pipelines, one using one-hot encoding and another one using similarity encoding to encode the employee_position_title column. The R² score of the pipeline using similarity encoding is 0.059 higher than the pipeline using one-hot encoding.
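The string similarity behind similarity encoding can be sketched with character n-grams (a Jaccard-style score for illustration; dirty_cat's exact scoring may differ):

```python
def ngrams(s, n=3):
    s = f" {s} "  # pad so word boundaries contribute n-grams
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard similarity over character n-grams: one common way to
    score how alike two 'dirty' category strings are."""
    A, B = ngrams(a, n), ngrams(b, n)
    return len(A & B) / len(A | B)

# Dirty categories that one-hot encoding would treat as totally unrelated:
print(ngram_similarity("Senior Engineer", "Senior Engineer II"))  # high
print(ngram_similarity("Senior Engineer", "Accountant"))          # low
```

Each category is then encoded as its vector of similarities to a set of reference categories, yielding continuous values between 0 and 1 instead of the binary 0/1 of one-hot encoding.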
Stop using SMOTE to handle all your Imbalanced Data

In the case of imbalanced class problems, the model is trained mainly on the majority class and becomes biased towards the majority class prediction. Class balancing techniques can be broadly classified into two categories. Over-sampling techniques: oversampling refers to creating artificial minority class points. Under-sampling techniques: undersampling refers to removing majority class points. SMOTE, which stands for Synthetic Minority Oversampling Technique, is an oversampling technique that creates synthetic minority class data points to balance the dataset. Repeat step 3 for all minority data points and their k neighbors, until the data is balanced.
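The interpolation step described above can be sketched in NumPy (a toy SMOTE for illustration, not the imblearn implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

# A handful of minority-class points (toy 2-D data).
minority = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]])

def smote_sample(points, k=2, n_new=4, rng=rng):
    """For each synthetic point: pick a minority point, pick one of its
    k nearest minority neighbours, and interpolate a random fraction
    of the way along the line between them."""
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(points))
        d = np.linalg.norm(points - points[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                   # fraction in [0, 1)
        new_points.append(points[i] + lam * (points[j] - points[i]))
    return np.array(new_points)

synthetic = smote_sample(minority)
print(synthetic.shape)  # (4, 2)
```

Because every synthetic point lies on a segment between real minority points, SMOTE can only fill in the region the minority class already spans, which is exactly why it struggles on some datasets.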
Building a Convolutional VAE in PyTorch

Despite the capability and success of Generative Adversarial Networks (GANs) in content generation, we often overlook another type of generative network: the variational autoencoder (VAE). The MNIST dataset contains 60,000 training images and 10,000 testing images, showing handwritten numerical characters from 0 to 9. Our VAE structure is shown in the figure above, and comprises an encoder and a decoder, with the latent representation reparameterized in between. We can build the aforementioned components of the VAE structure with PyTorch as follows. Training: Loss Function. One of the core concepts of the VAE is its loss function design. Thus, the VAE loss is the combination of: Binary Cross Entropy (BCE) Loss, which calculates the pixel-to-pixel difference between the reconstructed image and the original image to maximize the similarity of the reconstruction.
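The Gaussian KL term that conventionally accompanies the BCE term in the VAE loss can be written out in NumPy (a formula sketch with toy values, not the article's PyTorch code):

```python
import numpy as np

def kl_divergence(mu, logvar):
    """KL between the encoder's Gaussian N(mu, sigma^2) and N(0, I):
    -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))

def bce(x, x_hat, eps=1e-7):
    """Pixel-wise binary cross-entropy between original and reconstruction."""
    x_hat = np.clip(x_hat, eps, 1 - eps)
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

x = np.array([0.0, 1.0, 1.0, 0.0])      # toy "image"
x_hat = np.array([0.1, 0.9, 0.8, 0.2])  # toy reconstruction
mu = np.zeros(2)                        # latent mean
logvar = np.zeros(2)                    # latent log-variance

loss = bce(x, x_hat) + kl_divergence(mu, logvar)
```

The BCE term rewards faithful reconstruction; the KL term regularizes the latent distribution towards a standard normal, which is what makes sampling from the decoder meaningful.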
Tune your deep learning models to the highest accuracy on Amazon SageMaker with complete visibility and control

If you tune your model and select accuracy, you could end up with 99% accuracy, but still catch zero cases of fraud. When constructing your hyperparameter tuning jobs, make sure to pick a tuning metric that makes the most sense for your use case. Training job as an AWS construct: on AWS we love providing customers with elastic compute. In machine learning, a big way this becomes valuable is for running training jobs. Note — you can really pass in any hyperparameters you want; just make sure your training script can read them from the argparser.
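Reading hyperparameters from the argparser, as that note suggests, might look like this minimal stdlib sketch (the parameter names are illustrative, not prescribed by SageMaker):

```python
import argparse

def parse_hyperparameters(argv=None):
    # Names here (learning_rate, epochs, batch_size) are illustrative;
    # use whatever keys you pass to your training job.
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser.parse_args(argv)

args = parse_hyperparameters(["--learning_rate", "0.001", "--epochs", "5"])
print(args.learning_rate, args.epochs, args.batch_size)  # 0.001 5 32
```

The tuning service then simply varies these command-line values from trial to trial, and the script needs no other changes.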
May Edition: Questions on Explainable AI

May Edition: Questions on Explainable AI. As machine learning models penetrate almost every area of knowledge, actually understanding what ML systems do starts to seem problematic. TDS Editors. Many papers, blogs, and software tools present explainability and interpretability in a quasi-mathematical way, but… is there a canonical definition of what interpretability and explainability mean? Machine learning algorithms definitely can't be left to themselves, running in the wild. The question is, how can we, as human beings, understand algorithms that surpass human performance? So, for this Monthly Edition, we decided to highlight some of the best blogs and podcasts that TDS authors want you to know about.
Predicting Fake News using NLP and Machine Learning | Scikit-Learn | GloVe | Keras | LSTM

Next, I looked at the count of sentences in each target category as follows: evidently, fake articles have a lot of outliers, but 75% of fake articles have fewer sentences than the median genuine news article. It is seen that, on average, fake articles are wordier than genuine ones. In the box plot, it is evident that the average word length is higher in the fake articles. POS Tag Counts: Next, I tried to look at the part-of-speech (POS) combinations in fake vs genuine articles. Apart from that, all other POS types are almost equal in fake and genuine articles.
CitySpire

You could spend hours googling different websites, or you can check out CitySpire. Solution: CitySpire's goal is to be a one-stop resource for users to receive the most accurate city information. Crime Rate per 1,000 inhabitants: To summarize crime data, I used the FBI crime reporting standard of reporting crime per 1,000 residents. Team Overview: I worked for 8 weeks (part time) as a data scientist and machine learning engineer on a remote interdisciplinary team to create CitySpire. CitySpire — Team G. I worked with an amazing team.
3x times faster Pandas with PyPolars

In this article, we will discuss the implementation and usage of the PyPolars library and compare its performance with the Pandas library. PyPolars is an open-source Python data frame library similar to Pandas. Ideally, PyPolars is used when the data is too big for Pandas and too small for Spark. How does PyPolars work? The best part of the PyPolars library is its API similarity to Pandas, which makes it easier for developers. (Image by Author) Benchmark time numbers for Pandas and PyPolars basic operations. From the above benchmark times for some basic operations using the Pandas and PyPolars libraries, we can observe that PyPolars is almost 2x to 3x faster than Pandas.
How Does It Feel to Make a Career Change to Become a Data Scientist?

However, the number of aspiring data scientists is also increasing rapidly. There is a ton of resources, articles, and blog posts that explain the required and on-demand skills expected from a data scientist. If you are a data scientist, I don’t think there will ever be a time when you don’t have anything new to learn. I find this a fantastic reason to become a data scientist. The job satisfaction of working as a data scientist was so attractive that I did not care about such small issues.
These Trusted Mental Models Will Make You a More Intelligent Data Scientist Immediately

Without mental models, data scientists would be unable to draw meaningful conclusions from data. When it comes to learning data science, a study of mental models must be part of the foundational study because these models will dictate the impact of the future data scientist. All data scientists can agree on one thing: learning and doing data science is hard. Using one mental model is good — developing a latticework of mental models is better.
Discover Hidden Trip Themes from GPS Data with Topic Modeling

Discover Hidden Trip Themes from GPS Data with Topic Modeling. Visualizing LDA-extracted trip topics, figure by author. Every text is produced by an author whose utterances, units of discourse exerted with intent, are preserved in writing. Unsupervised machine learning methods like topic modeling allow us to extract underlying cultural themes from large volumes of text data. Beyond text data, topic modeling has been used to find patterns in image data, social network data, and even genetic data. Preparing GPS trip data for LDA: in order to perform topic modeling, the e-scooter trip data must be prepared the same way text data is prepared as an input to LDA. Document-term matrix in terms of GPS data: a trip-point matrix (figure by author). To implement topic modeling on the e-scooter trip data, we leverage scikit-learn, a popular machine learning library.
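Fitting LDA to a trip-point matrix with scikit-learn can be sketched on random counts (toy data standing in for the e-scooter dataset):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Toy trip-point matrix: rows are trips ("documents"), columns are
# discretized GPS grid cells ("terms"); entries count visits per cell.
trips = rng.integers(0, 5, size=(20, 12))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
trip_topics = lda.fit_transform(trips)  # per-trip topic mixture

print(trip_topics.shape)  # (20, 3); each row sums to 1
```

Each "topic" is then a distribution over grid cells, which can be plotted on a map to reveal recurring trip themes.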
Machine Learning in KNIME with PyCaret

Machine Learning in KNIME with PyCaret. PyCaret is an open-source Python library and KNIME is an open-source data analytics platform. PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows. PyCaret — an open-source, low-code machine learning library in Python. KNIME Analytics Platform is open-source software for creating data science. The first one is the KNIME Analytics Platform, which is desktop software that you can download from here. Open the Anaconda prompt and run the following commands:

# create a conda environment
conda create --name knimeenv python=3.6

# activate environment
conda activate knimeenv

# install pycaret
pip install pycaret

Now open the KNIME Analytics Platform and go to File → Install KNIME Extensions → KNIME & Extensions → and select KNIME Python Extension and install it. Python setup in KNIME Analytics Platform. We are ready now. Click on "New KNIME Workflow" and a blank canvas will open.

How to train your deep learning models in a distributed fashion.

SGD on two data batches in distributed training. Figure 1 shows an example of performing gradient descent on two batches of data on a single node sequentially. Figure 2 shows an example of performing gradient descent with the same batch size but on two training nodes concurrently. The authors proposed a method called AdaSum to add the gradient vectors when they are orthogonal and to take their arithmetic average when the gradient vectors are parallel. The AdaSum operator is implemented and ready to use in the Horovod distributed training library.
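The AdaSum combination rule (as given in the AdaSum paper; the NumPy sketch is mine, not Horovod's implementation) interpolates between summing and averaging two gradients:

```python
import numpy as np

def adasum(g1, g2):
    """AdaSum combination of two gradient vectors: orthogonal gradients
    are summed, parallel gradients are averaged, with a smooth
    transition in between based on their dot product."""
    dot = g1 @ g2
    return (1 - dot / (2 * g1 @ g1)) * g1 + (1 - dot / (2 * g2 @ g2)) * g2

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(adasum(a, b))  # orthogonal, so plain sum: [1. 1.]
print(adasum(a, a))  # identical, so average:    [1. 0.]
```

The intuition: orthogonal gradients carry independent information and can be fully combined, while parallel gradients are redundant and adding them would just inflate the effective step size.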
Is My Cat Really a Dog?

So, Is my Cat a Cat? That left 15 images incorrectly classified as dogs. Taking a closer look at some of the images that were incorrectly classified, I found it difficult to draw any conclusions. Even if my cat was mistaken for a dog more than I expected, she was still correctly classified in the vast majority of cases. By manually cropping some of the incorrectly classified images, I was able to filter the results for further accuracy.
Deep Learning vs GBDT model on tabular data — with code snippet

Also, regarding model explainability, whereas the XGBoost model used all 115 features, the TabNet model is only using 16 features (the pre-trained model used only 4 features). Considering the above 2 points, we would consider the XGBoost model superior to the other deep learning models in this case. Furthermore, we compared their explainability by looking at the feature importance lists of the XGBoost model and the TabNet model. The XGBoost model's feature importance list was somewhat more understandable and expected; the TabNet model's, on the other hand, was less intuitive. From this simple experiment, we confirm that although the improvement of deep learning models in recent years is impressive and definitely state-of-the-art, on tabular data GBDT models are still as good as those deep learning models, and sometimes even better, especially when we would like to deploy a machine learning model in a real-life business.
This AI ‘Knows’ What a Painting Feels Like — Meet ArtEmis’ Neural Speakers

ARTIFICIAL INTELLIGENCE. This AI 'Knows' What a Painting Feels Like — Meet ArtEmis' Neural Speakers. The AI explained why Van Gogh's Starry Night may evoke awe: "The blue and white colors of this paintings make me feel like I am looking at a dream." — WikiArt (Public Domain). AI has accomplished incredible feats in recent years, from beating world-class champions at Go and Dota, to driving cars autonomously, to finding new exoplanets. Everything seems to be within reach for AI, even those areas that better express what it means to be human, such as art and creativity. As they explain in their paper, published on arXiv, they've created an AI system that can recognize and explain what emotions a painting evokes. The artist expresses in forms and colors what words can't express, only for an observer to be aroused by the message.
DINO and PAWS: Advancing the state of the art in computer vision

But our work with DINO shows highly accurate segmentation may actually be solvable with nothing more than self-supervised learning and a suitable architecture. By using self-supervised learning with Transformers, DINO opens the door to building machines that understand images and video much more deeply. Self-supervised learning with Vision TransformersTransformers have produced state-of-the-art results in many areas of artificial intelligence, including NLP and speech. Learn MoreThese self-attention maps for selected heads were generated using DINO with videos of a horse, a BMX rider, a puppy, and a fishing boat. The benefits of high-performance self-supervised computer vision modelsThe need for human annotation is usually a bottleneck in the development of computer vision systems.
Introducing hierarchical deletion to easily clean up unused resources in Amazon Forecast

Amazon Forecast just launched the ability to hierarchically delete resources at a parent level without having to locate the child resources. Previously, it was difficult to delete resources while building your forecasting system because you had to delete the child resources first, and then delete the parent resources. Delete a forecast resource: for a forecast resource without child resources, the following dialog is displayed. When a forecast resource has underlying child resources such as forecast export jobs, the following dialog is displayed. Delete dataset import job, predictor backtest export job, or forecast export job resources: the dataset import job, predictor backtest export job, and forecast export job resources don't have any child resources.
9 Discord Servers for Math, Python, and Data Science You Need to Join Today

9 Discord Servers for Math, Python, and Data Science You Need to Join Today. Photo by Alexander Shatov on Unsplash. Earlier this month, I wrote an article about the importance of having a community to support and inspire you during your data science learning journey. №1: Mathematics. One of the fundamental steps when you begin your data science learning journey is gaining a solid understanding of mathematics. Math is also the reason some may decide not to go into data science in the first place. №7: Artificial Intelligence. Data science is a broad field; the term data science is used to describe any technology that includes interacting with data. №9: Data Science. Last but not least is the Data Science Discord Server — another great option for a place to discuss data science topics, recent trends, and research advancements.
3 Python Pandas Tricks for Efficient Data Analysis

3 Python Pandas Tricks for Efficient Data Analysis. Photo by Nick Fewings on Unsplash. Pandas is one of the predominant data analysis tools, and it is highly appreciated among data scientists. In this article, we will go over 3 pandas tricks that I think will make you a happier pandas user. Thus, we start by creating a data frame to work on. Then, we create a data frame with 3 columns for each store.

A = pd.DataFrame({"date": days, "store": "A", "sales": np.random.randint(100, 200, size=10)})
B = pd.DataFrame({"date": days, "store": "B", "sales": np.random.randint(100, 200, size=10)})
C = pd.DataFrame({"date": days, "store": "C", "sales": np.random.randint(100, 200, size=10)})

We now combine these 3 data frames with the concat function.
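Assuming days is a date range defined earlier in the article (the range below is my stand-in), the concat step might look like:

```python
import numpy as np
import pandas as pd

# Stand-in for the `days` variable defined earlier in the article.
days = pd.date_range("2021-05-01", periods=10, freq="D")

A = pd.DataFrame({"date": days, "store": "A", "sales": np.random.randint(100, 200, size=10)})
B = pd.DataFrame({"date": days, "store": "B", "sales": np.random.randint(100, 200, size=10)})
C = pd.DataFrame({"date": days, "store": "C", "sales": np.random.randint(100, 200, size=10)})

# Stack the three frames into one long frame, with a fresh index.
df = pd.concat([A, B, C], ignore_index=True)
print(df.shape)  # (30, 3)
```

ignore_index=True avoids duplicated index labels from the three source frames, which matters for later positional lookups.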
Deep In Singular Value Decomposition

More on Singular Value Decomposition. For minimizing the dimensions of features, SVD is easily the most popular method of choice for data scientists. This is because SVD is easily the most versatile and venerable method of decomposition at our disposal. Why not use Random Projection? With that in mind, there really is not a good reason to use Random Projection most of the time, and a risk of using Random Projection is that your data may not be high-dimensional enough. Finding the SVD of a Matrix: in order to figure out the singular value decomposition of a matrix, we must first figure out what exactly we are trying to calculate.
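A minimal NumPy sketch of computing an SVD and using it for dimension reduction (toy matrix, truncation rank chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))

# SVD factors the matrix as M = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the top-2 singular values for a rank-2 approximation:
# the dimensionality-reduction use the article describes.
k = 2
M_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.allclose(U @ np.diag(s) @ Vt, M))  # exact reconstruction: True
```

Because the singular values in s are sorted in decreasing order, truncating to the top k keeps the directions that explain the most variance.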
Saturn Cloud vs Google Colab for Data Science and Machine Learning

In this example, I will replicate an interactive dashboard from the Voila Gallery using both Google Colab and Saturn Cloud. On the other hand, the deployment of a dashboard on Saturn Cloud is relatively straightforward, requiring only five clicks, because Saturn Cloud has already taken care of the heavy lifting related to deployment. (Figure captions: a code block written with NumPy arrays on Google Colab Free; a code block written with Dask arrays on Google Colab Free; a code block written with Dask arrays on Saturn Cloud with a Dask cluster.) For intermediate to advanced data science practitioners who are looking for a complete solution to deploy data science solutions efficiently, it is worth considering Saturn Cloud. In conclusion, Google Colab is great for personal small-scale data science projects, while Saturn Cloud is the winner for scalable data science.
Modelling Binary Logistic Regression using Tidymodels Library in R (Part-1)

Modelling Binary Logistic Regression using Tidymodels Library in R (Part-1). Photo by Sharon McCutcheon on Unsplash. In the supervised machine learning world, there are two types of algorithmic tasks often performed. The Pima Indian Diabetes 2 data set is the refined version (all missing values were assigned as NA) of the Pima Indian diabetes data. Loading Libraries and Datasets. Step 1: at first we need to install the following packages using the install.packages() function and load them using the library() function. Diabetes$diabetes <- relevel(Diabetes$diabetes, ref = "pos"); levels(Diabetes$diabetes). Train and Test Split: the whole data set is generally split into a 75% train and 25% test data set (general rule of thumb). Step 1: call the model function; here we called logistic_reg() as we want to fit a logistic regression model.

Build Your Own Movie Recommender System Using BERT4Rec

In this post, we will implement a simple but powerful recommendation system called BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. We will apply this model to movie recommendations on a database of around 60,000 movies. The first step is to construct the user's history in the form of a time-sorted list of movies. Most of them are from the Marvel universe, just like the user's history. Conclusion: in this project, we built a powerful movie recommendation system called BERT4Rec.

Deploy a Fake News Generator to the Web With Anvil

Deploy a Fake News Generator to the Web With Anvil. You've probably heard of Streamlit and Dash, two Python-centric ways of making web apps. If you click on the Code button at the top of your form, you will see the client-side form code. Web servers are usually limited on RAM and CPU, and you don't want your web app slowing down because of your model. This shows you how we can install and connect to your Anvil app from our Colab notebook. Through this article, you saw how to deploy a machine learning model to the web with Anvil.
Google AI Blog: Flexible, Scalable, Differentiable Simulation of Recommender Systems with RecSim NG

For data-driven simulation, RecSim NG makes it easy to implement various model-learning algorithms, such as expectation-maximization (EM), generative adversarial training, etc. This allows a simulation model to be integrated directly into the full data-science and model-development workflow. A user’s utility for any selected item is simply their affinity for the item, perturbed with Gaussian noise. This gradual concentration of available items around “mainstream” content providers has a negative impact on overall user utility over time. More sophisticated algorithms that compute policies that explicitly maximize long-term user utility are discussed in this ICML-20 paper.

A Gentle Introduction to Mixture of Experts Ensembles

In this tutorial, you will discover the mixture of experts approach to ensemble learning. Tutorial overview: this tutorial is divided into three parts: (1) Subtasks and Experts; (2) Mixture of Experts (Subtasks, Expert Models, Gating Model, Pooling Method); (3) Relationship With Other Techniques (Mixture of Experts and Decision Trees, Mixture of Experts and Stacking). Some predictive modeling tasks are remarkably complex, although they may be suited to a natural division into subtasks. This is called the gating model, or the gating network, given that it is traditionally a neural network model. We can also see a relationship between a mixture of experts and Classification And Regression Trees, often referred to as CART. Summary: in this tutorial, you discovered the mixture of experts approach to ensemble learning.

Annotate dense point cloud data using SageMaker Ground Truth

As in the signal processing domain, point cloud downsampling approaches attempt to remove points while preserving the fidelity of the original point cloud.
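A common way to downsample a point cloud while preserving its shape is voxel-grid downsampling: keep one representative point per occupied cube of space. This plain-Python sketch (a hypothetical helper, not the AWS Ground Truth API) illustrates the idea:

```python
def voxel_downsample(points, voxel_size):
    """Keep one representative point per occupied voxel.

    points: iterable of (x, y, z) tuples; voxel_size: edge length of each cubic voxel.
    """
    voxels = {}
    for p in points:
        # quantize each coordinate to a voxel index
        key = tuple(int(c // voxel_size) for c in p)
        voxels.setdefault(key, p)  # keep the first point seen in each voxel
    return list(voxels.values())

cloud = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (1.5, 0.0, 0.0)]
print(len(voxel_downsample(cloud, 1.0)))  # 2
```

Real tooling (e.g. Open3D's voxel_down_sample) averages the points within each voxel rather than keeping the first, but the quantize-and-group structure is the same.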
In this case, the point cloud data is in xyzrgb format, an accepted format for a Ground Truth point cloud. For more information about the data types allowed in a Ground Truth point cloud, see Accepted Raw 3D Data Formats. Labels in the downsampled point cloud (like cuboids) are directly applicable to the larger point cloud because they’re defined in a world coordinate space shared by the full-size point cloud (x, y, z, height, width, length). To learn more about the input format of Ground Truth as it relates to point cloud data, see Input Data and Accepted Raw 3D Data Formats.

Translate All: Automating multiple file type batch translation with AWS CloudFormation

You can learn more about and connect with AWS Machine Learning Heroes at the community page. On July 29, 2020, AWS announced that Amazon Translate now supports Microsoft Office documents, including .docx, .xlsx, and .pptx. The new support for Office documents in Amazon Translate is really great news for teachers like me. Still, we have to sort the documents by their file types and call Amazon Translate separately for each file type. However, when I start the Amazon Translate job on the console, I have to choose the file content type.

AutoNLP: Automatic Text Classification with SOTA Models

Developing an end-to-end Natural Language Processing model is not an easy task. In the next section, we will see what the experience was like, from start to finish, when creating a text classification model using AutoNLP. We are also going to need to install Git Large File Storage (Git LFS). Each of the models launched finished satisfactorily (remember that we only launched 5 models).
Introducing Azure Databricks for Data Science

The core of the Azure Databricks architecture is the Databricks runtime engine, which has an optimized Spark offering, Delta Lake, and Databricks I/O for an optimized data access layer. It also provides native integration capabilities with different Azure data services, such as Azure Data Factory and Synapse Analytics. Azure Databricks in action: this assumes you have the Azure Databricks workspace created in your subscription. In subsequent posts, I will touch upon other aspects of the Azure Databricks service.

ICLR 2021 — A selection of 10 papers you shouldn’t miss

❓Why → A new straightforward approach to entity retrieval that quite surprisingly shatters some existing benchmarks. Key insights → Entity retrieval is the task of finding the exact entity that natural language refers to (which can be ambiguous at times). Authors’ TL;DR → We propose adaptive federated optimization techniques, and highlight their improved performance over popular methods such as FedAvg. ❓Why → To make federated learning widespread, federated optimizers must become boring, just like ADAM¹¹ is in 2021. Key insights → Federated learning is an ML paradigm where a central model, hosted by the server, is trained by multiple clients in a distributed fashion.

Understanding Google’s Switch Transformer

At the core of the Switch Transformer is a switching feed-forward network (FFN) layer that replaces the standard FFN layer in the transformer architecture. They found the following: after 100,000 steps, the Switch Transformer model has a better negative log perplexity than the FLOP-matched T5 equivalent. A 7x speedup was observed between the T5-Base model and the 64-expert Switch Transformer model.
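The "switch" in Switch Transformer refers to top-1 routing: each token is sent to the single expert with the highest router score, and the expert's output is scaled by the router probability. A minimal plain-Python sketch of that routing step (illustrative only, with toy scalar "experts" instead of FFNs):

```python
import math

def switch_route(router_scores, experts, token):
    """Send a token to the single highest-scoring expert and scale the
    expert's output by the router's softmax probability (top-1 routing)."""
    exps = [math.exp(s) for s in router_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return probs[best] * experts[best](token)

# toy experts: each is a tiny function standing in for an expert FFN
experts = [lambda x: x + 1.0, lambda x: x * 2.0]
out = switch_route([0.1, 2.0], experts, 3.0)  # routed to expert 1 (x * 2)
```

Because only one expert runs per token, adding more experts grows the parameter count without growing the per-token compute, which is the source of the speedups described above.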
These scaling results show that, for any available increase in computation, a larger Switch Transformer model will outperform a larger dense model. The Switch Transformer models outperform the FLOP-matched equivalent T5 models on all tasks, with the exception of ARC.

Four Deep Learning Papers to Read in May 2021

Welcome to the end-of-April edition of the ‘Machine-Learning-Collage’ series, where I provide an overview of the different Deep Learning research streams. Thereby, I hope to give you a visual and intuitive deep dive into some of the coolest trends. So without further ado: here are my four favourite papers that I read in April 2021 and why I believe them to be important for the future of Deep Learning. Unlike many other curriculum learning approaches, we are always assured that the goal state must be reachable. Additionally, in order to stabilise the learning dynamics, OpenAI recommends a PPO-style clipping of the policy update.

Breadth vs Depth

I’ve seen pretty conflicting advice about whether it’s better for a data scientist to specialize or be a generalist. What’s better for a data science career? First, let’s be clear on what I mean by specializing. They list pretty similar skills (the standard data science stack). Conclusion: there is no one-size-fits-all answer to the question of whether it’s better to specialize or be a generalist in data science.
AWS Elastic Beanstalk App

In a nutshell: I worked for two months in a team of web developers and data scientists on an inherited project, collecting and cleaning new data, performing data analysis and feature engineering, creating and adding new features to the project using machine learning modeling techniques, delivering them through FastAPI endpoints, and eventually deploying the app to AWS Elastic Beanstalk with AWS Relational Database Service. The app has to present the important city data in a very intuitive and easy-to-understand interface. The inherited app had some API endpoints such as crime rates, walk score, pollution, population, rental prices, air quality, city recommendations, and livability score. We had to get housing data, weather data, job listings data, and school data per city and state and start cleaning and feature-engineering them. Data modeling: I decided to make two features, one with the weather temperature forecasting and another one with the weather conditions (sunny, rainy, cloudy, snowy days) per year.

Creating, editing, and merging ONNX pipelines

We are happy to share sclblonnx, a Python package that enables easy editing and augmenting of ONNX graphs. Thus, internally we are developing (and continuously trying to improve) a higher-level API for the manipulation of ONNX graphs. Basic usage: in its bare essence, the sclblonnx package provides a number of high-level utility functions to deal with ONNX graphs. g = so.add_output(g, 'sum', "FLOAT", [1]) — that’s it really, we have just created our first functioning ONNX graph. Finally, we can store the model: so.graph_to_file(g, "filename.onnx"). Another example: merging two existing graphs. Perhaps more useful than creating an ONNX graph to add two numbers from scratch is merging two existing — potentially complex — ONNX graphs; the merging of two or more graphs is how one creates a pipeline.
Image Captions with Attention in Tensorflow, Step-by-step

We then input these encoded image features, rather than the raw images themselves, to our Image Caption model. For our Image Caption model, we need only the image feature maps, and do not need the Classifier. The Image Caption model with Attention consists of four logical components. Encoder: since the image encoding has already been done by the pre-trained Inception model, the Encoder here is very simple. How does Attention enhance Image Caption performance? The earliest Image Caption architectures included the other three components without Attention. We walked through an end-to-end example of Image Captions using the Encoder-Decoder architecture with Attention.

Gentle introduction to 2D Hand Pose Estimation: Let’s Code It!

Train images are used primarily for training, validation images to control validation loss and decide when to stop model training, and test images to do a final model evaluation. A Dataset class is not hard to write when you follow the rules: your Dataset class inherits (subclasses) torch.utils.data.Dataset, and you need to override the function __len__(), which returns the length of the dataset. So, your Dataset class for the FreiHAND dataset should look something like this.

Stepping into the magical world of GANs

A generative model can potentially do magic; if trained properly it may write poetry, generate music, or draw images like an expert. Once the training is done, the discriminator is removed and the generator is used to produce the samples. Class ‘0’, the fake class, comes from the generator, and class ‘1’ comes from the real images, in this case MNIST. The discriminator wants the real class to be classified as real and fake to be classified as fake, while the generator wants the fake class to be classified as real.
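The opposing GAN objectives just described can be written down directly as binary-cross-entropy losses. A toy numeric sketch (plain Python, not a trainable model; the discriminator outputs are made-up numbers):

```python
import math

def bce(label, p):
    """Binary cross-entropy for a single prediction p of a 0/1 label."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

d_real = 0.9  # discriminator's output on a real image (it wants this near 1)
d_fake = 0.2  # discriminator's output on a generated image (it wants this near 0)

d_loss = bce(1, d_real) + bce(0, d_fake)  # discriminator: real -> 1, fake -> 0
g_loss = bce(1, d_fake)                   # generator: wants fakes classified as real

print(round(d_loss, 3), round(g_loss, 3))  # 0.329 1.609
```

Note the tension: the same quantity d_fake appears in both losses with opposite targets, which is exactly the adversarial game described above.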
Conclusion: GAN is one of the coolest additions with a lot of potential and active development; this primer is just to get you started.

Autoencoder For Denoising Images

In this post, you will learn how autoencoders work and why they are used for denoising medical images. The so-called autoencoder technique has proven to be very useful for denoising images. How autoencoders work: the network is provided with original images x, as well as their noisy version x~. How to denoise with autoencoders: now we can use the trained autoencoder to clean unseen noisy input images and plot them against their cleaned version. The white dots which were introduced artificially on the input images have disappeared from the cleaned images.

Handling Missing Values in Pandas

Example 1: we can use the value parameter to specify by which value we want to fill the missing elements. In the following example, we specify value=0, so it will fill all the missing elements with 0. Example 2: we can also specify different values to fill the missing elements for different columns by using the value parameter. With forward fill, if there are multiple consecutive missing elements, they will get filled by the last valid observation; with backward fill, they will get filled by the next valid observation.

Google AI Blog: Model-Based RL for Decentralized Multi-agent Navigation

In “Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous”, presented at CoRL 2020, we propose a holistic approach to address the challenges of the decentralized rendezvous task, which we call hierarchical predictive planning (HPP). This is a decentralized, model-based reinforcement learning (RL) system that enables agents to align their goals on the fly in the real world.
Putting together prediction, planning, and control: akin to a standard navigation pipeline, our learning-based system consists of three modules: prediction, planning, and control. Finally, the agent’s planner selects a goal for itself from this new belief distribution and passes this goal to the agent’s control module. The evaluation environments are each independent of the training environment for the agent’s control policy and prediction modules.

How Artificial Intelligence Got Real

For a very long time, artificial intelligence was a thought experiment, a sci-fi trope. A good place to start is Mike Ferguson’s new series on artificial general intelligence, where he patiently walks us through decades of philosophy, neuroscience, and adjacent disciplines that have theorized about humans’ potential to create AGI. We may think of AI as a developer-centered field, but it’s through design, Amanda argues, that people experience its benefits and power. Meanwhile, Cassie Kozyrkov reminds us that none of AI’s rewards come risk-free, especially when you think you’ve nailed it and that the system you’ve created is as good as it’s ever going to get. Data cleaning is another area where AI might make data analysts’ lives a lot easier in the near future, but we’re still not quite there.

3 Beginner Mistakes I’ve Made in My Data Science Career

Believing complex algorithms always result in better solutions: “So what are the characteristics of these clustered residents?” my manager asked. We had used the most advanced, recently released model to segment the residents of a smart city. The whole model was a black box, so we had no idea how it did the segmentation, but it gave the most accurate clusters. Whenever I have been presented with a problem to solve, my brain is used to thinking of neural networks and complex algorithms. If not, level up to a slightly more complex one while accounting for the trade-off in interpretability and operational costs.
End-to-end Computer Vision Pipeline in 5 Minutes

Writing a complete yet quick Computer Vision pipeline for prototyping a product, or as a building block of a more complex system, is becoming increasingly important. Here, we are going to discuss how you can do so in 5 minutes or less using the popular TorchVision library. Go to the PyTorch installation page, then copy and paste the generated command line into your terminal, for example: conda install pytorch torchvision torchaudio -c pytorch. First, we apply a transformation to the MNIST dataset so that the images fit well into our upcoming AlexNet model:

import torch
from torchvision import datasets, transforms, models

transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])

Then, we download both the training and testing sets by passing the above transform object.

How to Build Better Machine Learning Models

We already talked about Random Search, but in case you want to see an example of pruning you could take a look at the TensorFlow Model Optimization Pruning Guide. In this case, Early Stopping would just stop training when it reaches the red box (for demonstration) and would straight up prevent overfitting. “It (Early Stopping) is such a simple and efficient regularization technique that Geoffrey Hinton called it a ‘beautiful free lunch’.” — Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurelien Geron. However, for some cases, you would not end up with such straightforward choices for identifying the criterion or knowing when Early Stopping should stop training the model. We also specify that it should stop training if it does not see noticeable improvements in loss values in 3 epochs.
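In Keras that "3 epochs" rule is a one-liner (tf.keras.callbacks.EarlyStopping with patience=3); the underlying logic is simple enough to sketch framework-free:

```python
def early_stopping_epochs(losses, patience=3, min_delta=0.0):
    """Return the number of epochs actually trained: stop once `patience`
    consecutive epochs show no improvement over the best loss seen so far."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch  # ran out of patience: stop here
    return len(losses)

# loss plateaus after epoch 4 -> training stops at epoch 7 (3 flat epochs)
print(early_stopping_epochs([1.0, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5]))  # 7
```

Real implementations typically also restore the weights from the best epoch, which this sketch omits.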
The Top AI Trends and Challenges Impacting the Media, Advertising, and Entertainment Industries

I recently interviewed some of the top data science leaders from Comcast/FreeWheel, Condé Nast, ViacomCBS, Audoir, USA Today Network, and Samba TV on the biggest trends, challenges, and opportunities they see for ML & AI in media, advertising, and entertainment — and what the future may hold. What are some of the biggest trends you’ll see being adopted by the entertainment and media industries? “Also solutions like federated machine learning that are more compliant in terms of user privacy.” What will be the biggest challenges for those that already have dedicated data science teams? “I expect consumers will want the ability to provide instant feedback to creators and to get instant, interactive recommendations based on what they feel like watching,” says Christopher Whitely. That was a snippet of the insights that our speakers will discuss at our upcoming DSS Virtual: Media, Advertising & Entertainment next week.

How to write the perfect Data Science CV

Writing a good CV can be one of the toughest challenges of job searching. I always thought a nicely designed CV is not that important — we aren’t designers, so no one expects a polished CV from a Data Scientist, right? Awesome-CV is a beautiful CV template in which you only need to change the content. Less is more: many applicants write a CV that spans multiple pages. Important keywords for a Big Data Engineer position are Big Data, Hadoop, Spark, Redshift, etc.

Spotify Artist Recommender

This shows us how similar the artist is to the other artists using “cosine similarity”. I opened up Spotify and listened to the top 3, which are over 93% cosine similar, and I think it did a pretty good job. As long as you can structure your data into a pivot table this will work.
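The cosine-similarity computation this recommender relies on compares two rows of the pivot table as vectors. A plain-Python sketch with made-up play counts (in practice you would call sklearn.metrics.pairwise.cosine_similarity on the whole table):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (e.g. pivot-table rows)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# hypothetical play counts for two artists across four users
artist_a = [3, 0, 5, 1]
artist_b = [2, 0, 4, 1]
print(round(cosine_similarity(artist_a, artist_b), 3))  # 0.996
```

Because cosine similarity only looks at the angle between the vectors, a casual listener and a heavy listener with the same taste profile still come out as similar.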
Let’s say you have a website and you want to recommend either pages or documents to your users. You could create a pivot table with the page name or document title as the index (rows) and perhaps a list of “tags” as the columns.

Seismic Fault Prediction with Deep Learning

So whose fault is it? Fault mapping and prediction: with the advancement of deep neural networks, it might be possible to train on seismic images to create a model that may be able to identify faults in the seismic data. In this article, I would like to walk you through a Deep Learning framework that can predict the faults from the seismic data. Data exploration: the data for this study comes in SEG-Y format, which is the industry standard for seismic data (SEG stands for Society of Exploration Geophysicists). Furthermore, the padding needs to be set to 1 to make sure the final output shape matches the input shape.

Transformer Networks: A mathematical explanation why scaling the dot products leads to more stable gradients

The main purpose of the self-attention mechanism used in transformer networks is to generate word embeddings which take the context of the surrounding words into account. But why are the dot products scaled with √64 before they’re fed into the softmax function? Assume, using backpropagation, we have computed the gradients at the output of the softmax function. Next, we want to backpropagate through the softmax function and obtain the gradients at its input. In the transformer network the inputs to the softmax function consist of dot products between key vectors and query vectors. The larger the dimension d of the key vectors and query vectors, the larger the dot products will tend to be.
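A quick numeric illustration of why this matters: unscaled dot products push the softmax into a near-one-hot regime where gradients vanish, while dividing by √d keeps the distribution soft. The dot-product values below are made up but of realistic magnitude:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

d = 64
dots = [24.0, 16.0, 8.0]  # raw query-key dot products; magnitudes grow with d
unscaled = softmax(dots)
scaled = softmax([x / math.sqrt(d) for x in dots])  # divide by sqrt(64) = 8

print([round(p, 4) for p in unscaled])  # nearly one-hot distribution
print([round(p, 4) for p in scaled])    # much softer distribution
```

In the near-one-hot case the softmax Jacobian is close to zero everywhere, so almost no gradient flows back to the queries and keys; scaling by √d avoids that saturation.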
A Comparison of Synthetic vs Human Labeled Dataset to Train a UNet Segmentation Model

Experiments and measuring results: the models created in each experiment are evaluated against an independent benchmark dataset designed to accurately represent the real-world context of the model. In contrast with that, the training dataset is designed to make the model as good as it can be. We believe that this may be an important distinction, which warrants the extra overhead of maintaining the benchmark dataset separately.

learn.fit_one_cycle(10, slice(1e-2, 1e-3))
learn.save('model1-a-0')
learn.fit_one_cycle(10, slice(1e-2, 1e-3))
learn.fit_one_cycle(10, slice(1e-5, 1e-7))
learn.fit_one_cycle(10, slice(1e-6, 1e-7))
learn.save('model1-a-1')

The synthetic dataset takes longer for each epoch than the traditionally labeled dataset. On the left, the images were run through the model created with synthetic data only, while on the right is the model from our manually labeled dataset.

Machine Learning Model Interpretation

Interpreting a machine learning model is a difficult task because we need to understand how a model works in the backend, what parameters the model uses, and how the model generates the prediction. There are different Python libraries that we can use to create machine learning model visualizations and analyze how a model is working. Skater is an open-source Python library that enables machine learning model interpretation for different types of black-box models. It helps us create different types of visualizations, making it easier to understand how a model is working. To interpret the model using Skater, we first need to create a model.

Geometric foundations of Deep Learning

Geometric Deep Learning is an umbrella term we introduced in [5] referring to recent attempts to come up with a geometric unification of ML similar to Klein’s Erlangen Programme.
The Geometric Deep Learning blueprint. The “5G” of Geometric Deep Learning: Grids, Groups (homogeneous spaces with global symmetries), Graphs (and sets as a particular case), and Manifolds, where geometric priors are manifested through global isometry invariance (which can be expressed using geodesics) and local gauge symmetries. In future posts, we will be exploring in further detail the instances of the Geometric Deep Learning blueprint on the “5G” [15]. This was one of the primary motivations for studying multi-layer architectures [19–20], which ultimately led to deep learning.

Growing and Pruning Ensembles in Python

This may involve growing an ensemble from available models or pruning members from a fully defined ensemble. Now that we are familiar with ensemble selection methods, let’s explore how we might implement ensemble pruning and ensemble growing in scikit-learn. Baseline models and voting: before we dive into developing growing and pruning ensembles, let’s first establish a dataset and baseline.

# get a list of models to evaluate
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('tree', DecisionTreeClassifier()))
    models.append(('nb', GaussianNB()))
    models.append(('svm', SVC(probability=True)))
    return models

We can then set the type of voting to perform via the “voting” argument, which in this case is set to “soft”:

# create the ensemble
ensemble = VotingClassifier(estimators=models, voting='soft')

Google AI Blog: Holistic Video Scene Understanding with ViP-DeepLab

Driven by the potential value of a model that predicts depth and video panoptic segmentation at the same time, we present “ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation”, accepted to CVPR 2021.
In this work, we propose a new task, depth-aware video panoptic segmentation, that aims to simultaneously tackle monocular depth estimation and video panoptic segmentation. For the new task, we present two derived datasets accompanied by a new evaluation metric called depth-aware video panoptic quality (DVPQ), and we test ViP-DeepLab on them. Conclusion: with a simple architecture, ViP-DeepLab achieves state-of-the-art performance on video panoptic segmentation, monocular depth estimation, and multi-object tracking and segmentation.

Announcing the AWS DeepComposer Chartbusters challenges 2021 season launch

We’re back with two new challenges for the AWS DeepComposer Chartbusters 2021 season! You can compete in the AWS DeepComposer Melody-Go-Round challenge in just a few simple steps: in the AWS DeepComposer Music Studio, record a track, import a track, or pick any of the available input tracks. For more information on judging criteria, visit the AWS DeepComposer Melody-Go-Round page. For more information on datasets and judging criteria, visit the AWS DeepComposer Melody Harvest page. To learn more about the different generative AI techniques supported by AWS DeepComposer, check out the learning capsules available on the AWS DeepComposer console.

AWS DeepRacer device software now open source

Today, we’re expanding AWS DeepRacer’s ability to provide fun, hands-on learning by open-sourcing the AWS DeepRacer device software. Now that the AWS DeepRacer device software is openly available, anyone with the car and an idea can make new uses for their device a reality. We’ve compiled 6 sample projects from the AWS DeepRacer team and members of the global AWS DeepRacer community to help you get started exploring the possibilities that open source provides.
Purchase an AWS DeepRacer car today to start experimenting with your first AWS DeepRacer robotics project! We are offering a 25% discount on the AWS DeepRacer ($100 off) and AWS DeepRacer Evo ($150 off) until May 27th, 2021.

Intelligent governance of document processing pipelines for regulated industries

It consists of several components, including data quality, data catalog, data ownership, data lineage, operation, and compliance. In this post, we discuss data catalog, data ownership, and data lineage, and how they tie together with building document processing pipelines for regulated industries. It’s imperative for a document processing pipeline to have a well-defined data lineage framework. An operational query on a document ID can determine where in the pipeline the current document processing is. After the document metadata is registered with Metadata Services, the DynamoDB document registration stream is invoked to start the Document Classification Lambda function.

Data Science 101

Table of contents: Introduction, Problem Statement, Data Collection, Exploratory Data Analysis, Feature Engineering, Model Comparison, Results, Discussion, Summary, References. Introduction: there is a certain trend in all technical processes, and data science is no exception. Keep reading below if you would like to know more about the six steps of the data science process. To build a data science model or utilize a machine learning algorithm, you will need to understand what the problem is. Regarding the holistic data science process described in this article, the data collection process is perhaps the step furthest removed from academia in professional environments. There is much to discuss when incorporating a data science model into a company’s ecosystem.
Easy MLOps with PyCaret + MLflow

PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows. To learn more about PyCaret, you can check out their GitHub. Data preparation: now it’s time to prepare the data for model training and experimentation (model training, hyperparameter tuning, model ensembling, model selection, etc.). Common to all modules in PyCaret, the setup is the first and only mandatory step in any machine learning experiment using PyCaret.

Add Prediction Intervals to your Forecasting Model

Prediction intervals are always associated with a percentage of tolerance which quantifies the uncertainty magnitude of our estimations. In this post, we introduce a simple yet effective methodology to make our model produce prediction intervals. The main advantage is that we can retrieve prediction intervals for every regression model completely for free. Only in this case do we have the guarantee of approximating unknown behaviors and granting a reliable uncertainty interpretation for our forecasting intervals. This last aspect is analyzed more deeply in the artificial examples below, where we try to build forecasting intervals in the presence of two different regimes.

Recent Advances in Graph Convolutional Network (GCN)

Graph Convolution Network (GCN) has risen in popularity due to its versatility in solving deeply interconnected real-world problems.
In this post, we are going to highlight some of the advances in GCN architecture in simpler terms. Table of contents: GCN Building Blocks, SAGEConv, GINConv, Graph Attention (GAT) Network. GCN building blocks: GCN combines the convolutional principle of the more traditional Convolutional Neural Network (CNN) with a graph data structure. The aggregate function is key in many of the GCN variants. The initial formulation of GCN above uses normalization as its aggregate function, defined as follows. Here comes GINConv, which feeds the aggregated feature matrices to a learnable Artificial Neural Network (ANN).

Automate Hyperparameter Tuning for Multiple Models with Hyperopts

Any ML optimization function will try to minimize the loss function defined on certain model parameters. The first step is to define a search space. For instance, I have defined the search space for 6 different algorithms I will be considering. Note: there is a specific notation to define the search space. For instance, our model pipeline (see below) is tagged as ‘model’, making our notation ‘model__param_name’. Likewise, if you name your pipeline ‘clf’, it will be ‘clf__param_name’.

The SageMaker Saga

Looking for a quick start on the SageMaker console? Check out the video Amazon SageMaker in 11 minutes | AWS by Anuj Syal on YouTube. Exploring the full potential, SageMaker’s features and capabilities. Prepare: even if you don’t have a labelled dataset, AWS SageMaker allows you to take the help of Mechanical Turk to label your dataset correctly. Amazon SageMaker includes hosted Jupyter notebooks that make it easy to explore and visualise your training data stored on Amazon S3. The next important characteristic, which also accounts as a formidable aspect in Amazon ML vs. SageMaker, deals with evaluation. Amazon SageMaker Model Monitor: this is used to detect quality deviations for deployed ML models.
Amazon SageMaker Autopilot: It is used to build ML models automatically with full visibility and control. Introduction to Inductive Learning in Artificial Intelligence Introduction to Inductive Learning in Artificial IntelligenceMachine learning is one of the most important subfields of artificial intelligence. In the paper “Inductive learning for risk classification,” the authors discuss the application of inductive learning to credit risk analysis, a similar domain application. Another example, found in the paper “On the Application of Inductive Machine Learning Tools to Geographical Analysis,” discusses the role of inductive machine learning in geographical analysis. The study “Interactive Inductive Learning: Application in Domain of Education” presents a technique for using interactive inductive learning supported by enterprise modeling as a way to support mechanisms that can help save time and effort in study course comparison. Inductive learning algorithms are domain-agnostic, and they can be applied in any task that requires classification or pattern recognition. LambdaNetworks: Efficient & accurate, but also accessible? A reproducibility project with CIFAR-10 Therefore, we started this reproducibility project wondering: Could lambda layers be scaled to mainstream computers while keeping their attractive properties? However, attention has some shortcomings that lambda layers aim to fix while, at the same time, slightly increasing the accuracy. Therefore, reproducing the lambda layers can contribute to the early future adoption of this potentially superior algorithm by the community. Eager to learn more about attention and thrilled by the potential of lambda layers, we found this reproducibility project an obvious choice for us. Furthermore, as can be seen from the lambda layer architecture, the output of the lambda layer has the same dimensions as its input.
Deep Learning with Keras Cheat Sheet (2021), Python for Data Science Keras is a powerful and easy-to-use deep learning library for TensorFlow that provides high-level neural network APIs to develop and evaluate deep learning models. Check out the sections below to learn how to optimize Keras to create various deep learning models. Basic ExampleThe code below demonstrates the basic steps of using Keras to create and run a deep learning model on a set of data. The steps in the code include: loading the data, preprocessing the data, creating the model, adding layers to the model, fitting the model on the data, using the trained model to make predictions on the test set, and finally evaluating the performance of the model. A Summary of Active Learning Frameworks A Summary of Active Learning FrameworksPhoto by Sayan Nath on UnsplashTL;DRIf you are dealing with a classification task, I recommend modAL. AL frameworks comparison (Image by author)Active Learning ToolsActive learning can decrease the number of labels needed, saving annotation budget. In order to find a suitable active learning tool for the sequence labeling task, I have done a little survey of the OSS active learning tools. ALiPy provides a module-based implementation of an active learning framework, which allows users to conveniently evaluate, compare and analyze the performance of active learning methods. ALiPy has a simple usage interface and supports more active learning algorithms than other tools. Local implementation (Image by author)Expected implementation (Image by danny911kr from AlpacaTag)Other Active Learning Tools What I learnt from my Data Science job What I learnt from my Data Science jobCredits — Instagram andrewtneelI recently started working as a Data Scientist and here are my thoughts after the first three months on the job. Machine Learning versus economic value- In school we take courses on different learning algorithms but not on their economic value.
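The query strategy at the heart of tools like modAL and ALiPy is typically uncertainty sampling: send the examples the model is least confident about to the annotators. A conceptual plain-Python sketch (illustrative code, not any framework’s actual API):

```python
def least_confident(probs, k):
    """Pick the k unlabeled examples whose top class probability is lowest.

    probs: list of per-class probability lists for the unlabeled pool.
    Returns the indices to send to annotators.
    """
    confidence = [max(p) for p in probs]                      # model's top-class confidence
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return ranked[:k]                                         # least confident first
```

Labeling only these informative examples, retraining, and repeating is what lets active learning cut the annotation budget.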
Active learning- Labelling data is expensive, while labelling only the samples from which we will benefit the most is less expensive. There is no standard fairness/ethics system which every team has to comply with. Everyone working in Machine Learning might already know this. But it is high time the community comes together to propose some basic fairness checks before deploying ML models. The Rise of Cognitive AI In a similar fashion, Francois Chollet describes an emergent new phase in the progression of AI capabilities based on broad generalization (“Flexible AI”), capable of adaptation to unknown unknowns within a broad domain. I will use the term “Cognitive AI” to refer to this new phase of AI. Cognitive AI will bring an additional level of more sophisticated capabilities. We have established Cognitive Computing Research at Intel Labs to drive Intel’s innovation at the intersection of machine intelligence and cognition and address these emerging cognitive AI competencies. The nascent technologies for the next wave of machine learning and AI will create a new class of AI solutions with higher understanding and cognition. Understanding and Implementing Graph Neural Network Understanding and Implementing Graph Neural NetworkIntroduction to graph data structure, how to define your own simple graphs, and how to build a simple graph neural network Juan Nathaniel 1 day ago·4 min readSource: Image from UnsplashGraph Neural Networks (GNNs) are becoming increasingly popular for many prediction tasks where items are interrelated (e.g. However, graph data structures may be more difficult to grasp compared to other commonly known deep learning data sources, such as images, text, and/or tables. At the end of this post, hopefully you will be familiar with graph structures and how to implement your own GNNs using the PyTorch Geometric (PyG) library. ConclusionThis post briefly summarizes a simple graph structure and how it is constructed using PyG.
Next we implement a simple GNN where we use the popular GNN’s variant called Graph Convolutional Network (GCN). Many AI systems can’t recognize both of these examples. SEER can. Training AI systems with curated and labeled data sets has produced specialized AI models that excel at tasks like object recognition. Preliminary evaluations show that SEER can outperform conventional computer vision systems in recognizing objects that, while representative of life for billions of people, are less represented in conventional image data sets used to train AI systems. We hope our work with SEER will help make AI work better for everyone, not just those who have typically benefitted the most. Testing AI with images from different regions across the globeWe tested SEER on images from the Dollar Street data set that we used in our 2019 study on biases in computer vision systems. The SEER results show exciting signs of how self-supervised learning could make AI work better for people across the world. Monitor and Manage Anomaly Detection Models on a fleet of Wind Turbines with Amazon SageMaker Edge Manager With such scale, energy companies need an efficient platform to manage and maintain their wind turbine fleets, and the ML models running on the devices. The mini wind turbineThe mini wind turbine of this project is a mechanical device integrated with a microcontroller (Arduino) and some sensors. If you want to know more about these technical details and learn how to build your own mini wind turbine, see the GitHub repository. Configure your edge deviceThe Edge Manager agent uses certificates provided by AWS IoT Core to authenticate and call other AWS services. Create the edge fleetTo create your edge fleet, complete the following steps:On the SageMaker console, under Edge Inference, choose Edge device fleets. 
Analyzing Music Taste Analyzing Music TastePhoto by Alexey Ruban on UnsplashI’m a big fan of music, and for about 8 years now, I’ve exclusively used Spotify’s platform for listening to music and podcasts, discovering new music, and collaborating on playlists with friends. After removing a few returned columns that weren’t as useful, we’re left with this:Dataframe with audio features. So I reached out to some friends, and had one agree to send me his version of the streaming data. This strategy will attempt to train a model to tell the difference between my music taste and the music taste of my friend, rather than the difference between “good” and “bad.”I followed the same steps above to convert the json dictionary into a dataframe that includes the audio features. I then re-fit the models comparing my streaming history to the new streaming history dataset. Trying to Solve CartPole without Googling Anything Trying to Solve CartPole without Googling AnythingWhat is CartPole? I did not hold myself to this standard since I tried to solve the whole thing without using any help. First I took random actions across 1000 simulations of CartPole (each simulation is called an episode). Run another episode where the neural net predicts how much longer it stays up, occasionally sampling a random action. Continue to update the neural net at the end of each episode, eventually taking away the random action. Understanding PCA The Principal Component Analysis (PCA) is one of the most basic and useful methods for this purpose. First, we will get an intuitive understanding of PCA, and then we proceed with a more detailed mathematical treatment. PCA can be thought of as an iterative process that finds directions along which the variance of projected data is maximal. To see how it performs on real data, we will see how PCA performs on the famous Iris dataset!
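The PCA procedure just described (center the data, diagonalize its covariance, project onto the highest-variance directions) can be written directly in NumPy; a minimal sketch, not the article’s code:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via eigendecomposition."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]        # sort directions by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                   # coordinates in the principal axes
```

After projection, the covariance of the transformed data is diagonal: each principal component carries its own variance, and the components are uncorrelated with each other.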
[[ 1.57502004e+02 -3.33667355e-15 2.52281147e-15 5.42949940e-16][-3.33667355e-15 9.03948536e+00 1.46458764e-15 1.37986097e-16][ 2.52281147e-15 1.46458764e-15 2.91330388e+00 1.97218052e-17][ 5.42949940e-16 1.37986097e-16 1.97218052e-17 8.87857213e-01]]What the PCA doesn’t doAlthough PCA is frequently used for feature engineering, there are limits on what it can do. The Secret Guide To Human-Like Text Summarization Photo by NeONBRAND on UnsplashThe Secret Guide To Human-Like Text SummarizationSummarization has become a very helpful way of tackling the issue of data overburden. T5 Text SummarizerYou can build a simple yet incredibly powerful abstractive text summarizer using Google’s T5 pre-trained model. BONUS….T5 Text Summarizer PipelinesI have built text summarizer pipelines that can extract text from PDF documents, summarize the text and store both the original text and the summary into a SQLite database and output the summary to a text file.
Image by author: run summarization pipeline (BERT & T5) to summarize text data, save the summary to a text file and store the summary to a databaseNote: key in a ratio below ‘1.0’ (e.g. ‘0.5’) if you wish to shorten the text with BERT extractive summarization before running it through T5 summarization. Build better time-series models: introducing ByteHub Build better time-series models: introducing ByteHubData plumbing is essential to machine learning. This article introduces ByteHub: an open-source feature store, designed to help data scientists build better models of time-series data, ready to be used for tasks like forecasting and anomaly detection. We’ll explore three ways in which the data science workflow can be improved when building time-series models, with examples to demonstrate how this works with ByteHub. An important task for any data scientist working with time-series models is to identify which data sources to use. ByteHub introduces a concept called feature transforms: snippets of Python code that can be applied to raw time-series data. 10 Quick Python Snippets That Can Help You Work with Dates Photo by Brad Neathery on UnsplashIf you use Python in your work, or simply for scripting, chances are that there will come a point in time when you are going to need to work with dates. Find the difference between 2 dates in monthsWe can also find the difference between the 2 dates in months quite quickly. For example, between October 2018 and November 2021, there isn’t just a month difference. If you want to convert those dates into the format that the majority of the world uses, here is how:10. Find the first day of the monthSome companies pay their employees on the first day of every month. Different Ways To Master Quantum Machine Learning And these did not cover quantum machine learning but quantum computing in general.
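Two of the date snippets described above, the month difference and the first day of the month, reduce to standard-library `datetime` one-liners (illustrative versions; the article’s exact code may differ):

```python
from datetime import date

def month_diff(d1, d2):
    """Whole-month difference between two dates, ignoring the day of month."""
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

def first_of_month(d):
    """First day of the month a given date falls in."""
    return d.replace(day=1)
```

For example, October 2018 to November 2021 spans 37 months, not just the one-month difference the month numbers alone suggest.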
I really believe that painstakingly working everything out in small pieces made an impact on how I understand quantum machine learning. But quantum machine learning is taught the wrong wayWhen I started studying the quantum part of quantum machine learning, I took a deep dive into the theory and the math. We will implement different quantum machine learning algorithms, such as Quantum Naïve Bayes and Quantum Bayesian Networks. This book is your comprehensive guide to get started with “Quantum Machine Learning” – the use of quantum computing for machine learning tasks. Implementation Differences in LSTM Layers: TensorFlow vs PyTorch But if you are trying to understand the implementation differences when using LSTM layers, then I hope you already have a background in deep learning and know the fundamentals of LSTMs. Long Short-Term Memory (LSTM) networks are a specialized version of RNN that was introduced with the intention of preserving long-term ‘context’ information in sequences. Figure 2 shows a typical LSTM layer. hᵗ = oᵗ ∗ tanh(cᵗ) is given as the output of the LSTM layer at timestep t. Please note that both cᵗ⁻¹ and hᵗ⁻¹ are given as lateral inputs in an LSTM layer, compared to the vanilla RNN layer where only aᵗ⁻¹ is given. Transfer learning applied on the Unsplash data using the Alexnet pre-trained network from MATLAB Transfer learning applied on the Unsplash data using the Alexnet pre-trained network from MATLABTransfer learning using the pre-trained deep learning networks from MATLAB can be easily implemented to achieve fast and impressive results Utpal Kumar 2 days ago·5 min readI obtained the image data from Unsplash. I downloaded 42 cat images, 46 dog images, and 35 horse images for the input into the pre-trained Alexnet model in MATLAB. One can take a pre-trained network and use it as a starting point to learn a new task.
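The LSTM update described above, where the new cell state cᵗ and the hidden output are computed from gated combinations of the previous state and the current input, can be sketched as a single NumPy timestep. The gate ordering and parameter packing here are illustrative assumptions; TensorFlow and PyTorch each pack the gate parameters in their own order, which is exactly where implementation differences show up:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM timestep. W, U, b hold stacked parameters for 4 gates (f, i, o, g)."""
    z = W @ x_t + U @ h_prev + b      # all gate pre-activations at once
    n = h_prev.shape[0]
    f = sigmoid(z[:n])                # forget gate
    i = sigmoid(z[n:2*n])             # input gate
    o = sigmoid(z[2*n:3*n])           # output gate
    g = np.tanh(z[3*n:])              # candidate cell update
    c_t = f * c_prev + i * g          # new cell state
    h_t = o * np.tanh(c_t)            # layer output at timestep t
    return h_t, c_t
```

Both c and h are carried forward as lateral inputs to the next timestep, in contrast to a vanilla RNN cell, which carries only one hidden state.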
We follow the standard operations to augment the training images — randomly flip the training images along the vertical axis, randomly translate them up to 30 pixels horizontally and vertically. YValidation = imdsValidation.Labels;accuracy = mean(YPred == YValidation) % accuracy = 0.9189ConclusionsI used the pre-trained Alexnet network from MATLAB to fine-tune it with the Unsplash data. Dynamic Ensemble Selection (DES) for Classification in Python Perhaps the canonical approach to dynamic ensemble selection is the k-Nearest Neighbor Oracle, or KNORA, algorithm, as it is a natural extension of the canonical dynamic classifier selection algorithm “Dynamic Classifier Selection Local Accuracy,” or DCS-LA. — From Dynamic Classifier Selection To Dynamic Ensemble Selection, 2008. k-Nearest Neighbor Oracle (KNORA) With Scikit-LearnThe Dynamic Ensemble Library, or DESlib for short, is a Python machine learning library that provides an implementation of many different dynamic classifiers and dynamic ensemble selection algorithms. fit(X_train, y_train) # define the KNORA-U model model = KNORAU(pool_classifiers=pool) # fit the model model.fit(X_train, y_train) Geopandas Hands-on: Geospatial Relations and Operations Geopandas Hands-on: Geospatial Relations and OperationsPart 1: Introduction to geospatial concepts (follow here)Part 2: Geospatial visualization and geometry creation (follow here)Part 3: Geospatial operations (this post)Part 4: Building geospatial machine learning pipeline (coming soon)In the two previous posts we have discussed the basics of geopandas and how to visualize geospatial datasets. Here, we are going to introduce the basic concepts of geospatial operations including those of relations and spatial joins. Table of Content:Recap Geospatial relations Spatial joinsRecapBefore we proceed with the tutorial, ensure that we restore the previous posts’ geodataframes. A.
IntersectsSuppose we ask the following question: does the buffer region intersect with Brooklyn borough in New York? Spatial joinsAnother important geospatial operation that we can perform with geopandas is the spatial join. Shogun: An Underrated Python Machine-learning Package Using ShogunIn order to use the Shogun package, we of course will need to install it with PIP. A really cool alternative to this in the Shogun package is the CHAID, or Chi-squared Automatic Interaction Detector, tree. from sklearn.preprocessing import OrdinalEncoderfrom sklearn.model_selection import train_test_splitNow it was time to prepare my data, put it into new Shogun types, and then pass it into my model. ft = [1]Now we can initialize our new CHAID tree:classifier = CHAIDTree(0, ft, 10)classifier.set_labels(labels_train)And then train and predict:classifier.train(features_train)labels_predict = classifier.apply_multiclass(features_test)ConclusionShogun is a pretty awesome package for machine-learning in Python. Regardless, I am excited to take this knowledge back into C++ and have a little fun with this package. The Intuitive Explanation of Logistic Regression As such I’ve put together a very intuitive explanation of the why, what, and how of logistic regression. You can visit this post to learn about Simple Linear Regression & this one for Multiple Linear Regression. Now knowing a bit about linear regression, you’d know that the linear regression output is equivalent to the equation of a line. Building Our First Logistic Regression ModelLet’s go ahead and jump into building our own logistic regression model. ConclusionIf you’ve made it this far then hopefully you’ve learned a thing or two about logistic regression and will feel comfortable building, interpreting, & communicating your own logistic regression models.
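The connection drawn above, a linear-regression-style line squashed into a probability, is just the sigmoid applied to the linear output. A bare-bones sketch (the weights here are made up for illustration, not a fitted model):

```python
import math

def predict_proba(x, weights, bias):
    """Logistic regression prediction: sigmoid of the linear combination."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias   # the "equation of a line"
    return 1.0 / (1.0 + math.exp(-z))                     # squashed into (0, 1)
```

When the linear part z is 0 the model is maximally unsure (probability 0.5); large positive or negative z pushes the probability toward 1 or 0.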
The power of democracy in Feature Selection We weight each directed edge between a candidate x and a candidate y by the difference k(x)-k(y) in a weighted majority graph. In this example, a and b go to the second round, and b wins; let’s look at the majority graph. Many Condorcet methods may require different levels of knowledge of the majority graph. Image by authorAnd we have this ranking A ≥ D ≥ C ≥ B, which has 2 disagreements because B beats C in the majority and A beats C in the majority. A candidate is said to be proper if, for each other candidate b, b is better than him for at least 3/4 of the electors or b is worse than him for at least 3/4 of the electors. Data Science Survival Guide for Non-Technical Colleagues EvaluationAfter you manage to gather a suitable dataset, the data scientist can start training and evaluating the system. The way this works is that the dataset is split into a training and a testing set. As expected, the training set is used to train the system, and the testing set to evaluate it. Do not hesitate to ask the data scientist to explain any given metric; he or she will be happy to do so. This can all be reflected in the used metrics, and therefore, it is a good practice to agree on the used metrics in the beginning, the same way you agreed on the inputs and outputs. When “TOPS” are Misleading When “TOPS” are MisleadingNeural accelerators are often characterized with the performance feature “TOPS” — Trillion operations per second. To be able to compare the different NPU architectures in a simple way, the metric “Trillion Operations per Second” (TOPS) was created. One with a broad focus on industrial use and “low TOPS”, and the other on high-speed image analysis with “high TOPS”. 2 Inference with MobileNetV1 (224x224) on the i.MX 8M Plus NPU [image by author]There are other neural accelerators marked with nominally higher TOPS.
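Returning to the weighted majority graph described earlier: it can be built from ranked ballots with plain Python. In this illustrative sketch (the ballot format and the interpretation of k(x) as the number of voters ranking x above y are assumptions based on the description), each directed edge x → y is weighted by the margin of voters preferring x over y:

```python
from itertools import combinations

def majority_graph(ballots):
    """Build a weighted majority graph from ballots (each a ranked list of candidates).

    Edge (x, y) with weight k(x) - k(y) means x beats y pairwise by that margin.
    """
    candidates = ballots[0]
    edges = {}
    for x, y in combinations(candidates, 2):
        kx = sum(1 for b in ballots if b.index(x) < b.index(y))  # voters ranking x above y
        ky = len(ballots) - kx                                   # voters ranking y above x
        if kx > ky:
            edges[(x, y)] = kx - ky
        elif ky > kx:
            edges[(y, x)] = ky - kx
    return edges
```

Different Condorcet methods then consume this graph at different levels of detail, some needing only who beats whom, others the full margins.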
ComparisonComparing NXP’s i.MX 8M Plus with Gyrfalcon’s Lightspeeur 2803s, the Lightspeeur seems clearly superior to the 8M Plus in TOPS. Prepare data for your ML models in the fastest and easiest way with Amazon SageMaker Data Wrangler — a visualisation and data preparation tool Prepare data for your ML models in the fastest and easiest way with Amazon SageMaker Data Wrangler — a visualisation and data preparation toolSummaryAmazon SageMaker Data Wrangler is a new service announced back in December 2020 aiming to simplify the process of data preparation and feature engineering for machine learning. We will import our data in the Data Wrangler environment and will explore its capabilities of data processing and visualisation. To get started, open SageMaker Studio, since this is how we will be accessing the graphical user interface (GUI) for Data Wrangler. Step 3 — Transforming dataOne of the benefits of Data Wrangler is the collection of available transformations that you can do to your data out of the box. First make sure you save your Data Wrangler flow, if you want to keep it, and then shut down the instance as in the screenshot below. Build a Jina neural search with Streamlit I’m going to walk through how to use Jina’s new Streamlit component to search text or images to build a neural search front end. Check out our text search app or image search app, and here’s the component’s repo. Why use Jina to build a neural search? Building a Streamlit component helps the data scientists, machine learning enthusiasts, and all the other developers in the Streamlit community build cool stuff powered by neural search. For image search, simply swap out the text code above for our image example code and run a Jina image (like our Pokemon example).
Gentle introduction to 2D Hand Pose Estimation: Approach Explained Gentle introduction to 2D Hand Pose Estimation: Approach ExplainedImage by AuthorIn 2018 I spent 6 months working on my master’s thesis on Hand Pose Estimation. Image by AuthorA typical 2D hand pose estimator looks something like this:Input: hand image. Image by AuthorIn this tutorial, we will learn how to estimate 2D hand pose from a single RGB image. Estimating pose with heatmaps is a widely used approach in 2D Hand (and Human) Pose Estimation, and you’ll see it in literally any paper (with slight modifications). Hopefully, 2D hand pose estimation is now much clearer to you. 5 Deep Learning Trends Leading Artificial Intelligence to the Next Stage I’ve already subtly mentioned some differences between DL systems and the human brain. The original goal of AI was to build an electronic brain that could simulate ours, an artificial general intelligence (some call it strong AI). Artificial neurons are the exact opposite. Another shortcoming of artificial neurons is their simplicity. Planning requires us to decompose complex tasks into sub-tasks, but this ability is beyond what DL systems can do today. Deep Learning For Audio With The Speech Commands Dataset While most people know of applications of deep learning to images or text, deep learning can also be useful for a variety of tasks on audio data! However, here at Aquarium, we have some customers working on some pretty interesting applications with audio data. We recently held a 3-day hackathon, so I decided to try doing deep learning on audio for myself. In this blog post, we’re going to train a very simple model on the Speech Commands audio dataset. I’ve found some things to be fairly helpful while doing deep learning that weren’t included in the sample code and I would generally recommend for most people getting started with a deep learning project.
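Returning to heatmap-based pose estimation: each keypoint is usually rendered as a small 2D Gaussian centred on its (x, y) location, and the network learns to regress these maps instead of raw coordinates. A minimal NumPy sketch (the map size and sigma are illustrative choices):

```python
import numpy as np

def keypoint_heatmap(x, y, size, sigma=1.5):
    """Render a keypoint (x, y) as a 2D Gaussian heatmap of shape (size, size)."""
    xs, ys = np.meshgrid(np.arange(size), np.arange(size))   # pixel coordinate grids
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
```

At inference time the predicted keypoint is recovered as the argmax of each predicted heatmap.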
Manipulation of Information Space for Continual Learning Manipulation of Information Space for Continual LearningLet’s start with some questions:Let’s assume that I have two concepts such as chocolate chip cookies and brownies. This post is about the representation of information and the implementation of the manipulation of information. We need to be able to manipulate this representative information space. As we can optimize the proxies and manipulate the information space, we can also add new vectors to the information space so that we can introduce new classes for the model to predict. After we have enlarged our proxy space properly, we can still enhance it by optimizing this new and bigger set of proxies to be as decomposed as possible from each other. Correlation Vs Causation Correlation Vs CausationPhoto by Benjamin Behre on UnsplashCorrelation is also known as association. We need to understand that correlation is a numerical quantity that quantifies the strength of the relationship between two different things. It is therefore important, when predicting, to understand the difference between causation and correlation between a feature and the target variable. I hope that after reading this you get a clear understanding of causation and correlation. You can view my Github profile for different data science projects and package tutorials. K-Means Clustering — A Comprehensive Guide to Its Successful Use in Python K-Means Clustering in action. For this, you run K-Means clustering multiple times, trying a different number of clusters each time, and record the value of WCSS (Within-Cluster Sum of Squares). 3 clusters:Australian city clustering: 3 clusters. 4 clusters:Australian city clustering: 4 clusters. In the end, whatever your goal is, I hope this story helped you to get a better understanding of K-Means clustering and that you will feel confident to try it out yourself.
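The elbow-method loop described above, run K-Means for several values of k and record the WCSS each time, can be sketched with a small pure-NumPy K-Means (an illustrative implementation, not the article’s code, which likely uses scikit-learn):

```python
import numpy as np

def kmeans_wcss(X, k, n_iter=50, seed=0):
    """Run a basic K-Means and return the WCSS (within-cluster sum of squares)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]        # initial centers from data
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                                 # assign points to nearest center
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(1).sum()
```

Plotting WCSS against k and looking for the "elbow" where the curve stops dropping sharply suggests a reasonable number of clusters.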
7 Tools Used By Data Scientists to Increase Efficiency 7 Tools Used By Data Scientists to Increase EfficiencyPhoto by Alvaro Reyes on UnsplashDuring the progress of any data science project, most data scientists tend to utilize tools and gadgets that would help them reach their goals faster and more efficiently. This article will take a look through the data science tools catalog and talk about 7 of the most used tools by data scientists today. One of the main steps in any data science project is data analysis. Data visualization is an essential part of any data science project; it can make or break your project. That’s why many tools have been developed to help data scientists efficiently complete their projects without wasting any time or brain-power on routine tasks. How to Run 40 Regression Models with a Few Lines of Code MACHINE LEARNINGHow to Run 40 Regression Models with a Few Lines of CodeImage by Malte Helmhold. Most of you probably don’t even know that there are ten regression models out there. Don’t worry if you don’t know because, by the end of this article, you will be able to run not only ten machine learning regression models but over 40! Lazy Predict helps build dozens of models without much code and helps understand which models work better without any parameter tuning. Regression Project with Lazy PredictFirst of all, to install Lazy Predict, you can copy and paste pip install lazypredict to your terminal. Hands-on PostgreSQL: Basic Queries Hands-on PostgreSQL: Basic QueriesPhoto by Nam Anh on UnsplashSQL provides numerous functions and methods to manage data stored in tabular form. I previously wrote an article as an introduction to PostgreSQL that explains how to set up a PostgreSQL database and create tables. A more practical way is to copy data from a csv file. I have a csv file that contains some of the columns from the Melbourne housing dataset available on Kaggle. The last two lines are about the characteristics of the csv file.
Using Machine Learning to Generate Image Captions For the images, we need to convert them into a fixed-size vector using the Inception V3 model as described earlier. Lines 56–63: Save the extracted features to diskNow, we won’t predict our caption all at once; that is, we won’t just give the computer the image and ask it to generate a caption for it. What we would do is give it the image’s feature vector and also the first word of the caption and let it predict the second word. Then we give it the first two words and let it predict the third word. Let us consider the image given in the dataset section and the caption ‘A girl going into a wooden building’. PyTest for Machine Learning — a simple example-based tutorial IntroductionWhy do you need software testing for data science and machine learning? A good software testing strategy can help offset this trade-off. An introductory-level tutorial is here, and a more comprehensive, slightly more advanced one (dealing with design patterns) is the following. However, I felt a lack of dedicated tutorials highly focused on using PyTest for machine learning modules with clear examples. This is a scenario that any Python ML practitioner is highly likely to encounter sooner or later. The GitHub repo for this example is here: PyTest ML repo. Role of AI in Achieving Sustainable Development Goals Role of AI in Achieving Sustainable Development GoalsTable of contentsWhat are Sustainable Development Goals (SDGs) AI and environmental outcomes AI and societal outcomes AI and economic outcomes ConclusionPhoto by Joao Vitor Marcilio on UnsplashWhat are Sustainable Development Goals (SDGs)SDGs are a collection of 17 interlinked global goals designed as a “blueprint to achieve a better and more sustainable future for all”. Let us consider the potential upsides and risks of AI to environmental, societal, and economic outcomes.
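The word-by-word training scheme described above, give the model the image features plus the caption so far and have it predict the next word, amounts to expanding each caption into (prefix, next-word) pairs; a plain-Python sketch (in the real pipeline each pair would also be paired with the image’s feature vector):

```python
def caption_pairs(caption):
    """Expand a caption into (partial caption, next word) training pairs."""
    words = caption.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]
```

So the caption ‘A girl going into a wooden building’ yields pairs like (["A"], "girl"), (["A", "girl"], "going"), and so on up to the final word.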
AI and environmental outcomesThere is a growing number of AI applications in the environmental sector, including those within the energy (e.g. However, some of these AI technologies can be computationally expensive. An introduction to Information Theory for Data Science Information theory and set theoryThere is a link between Shannon’s measure of information and set theory, which in a very practical way allows us to reason using set theory and make use of Venn diagrams to visually represent formulas. Theoretical justificationI do not present here the theoretical justification of the association between information theory and set theory. The interested reader can refer to chapter 6 of A First Course in Information Theory by Raymond W. Yeung. Visualization of formulasFrom examining the link between information theory and set theory, we can come to the conclusion that it is possible to represent information theory formulas visually by Venn diagrams. Mutual information between two random variables (image by author)When the variables X and Y are perfectly « correlated », then the two disks X and Y are entirely overlapping, and so H(X,Y)=H(X)=H(Y). Google AI Blog: HDR+ with Bracketing on Pixel Phones One such improvement (launched on Pixel 5 and Pixel 4a 5G in October) is a feature that operates “under the hood”, HDR+ with Bracketing. This approach, known as exposure bracketing, can deliver the best of both worlds, but it is time-consuming to do by hand. It is also challenging in computational photography because it requires:Capturing additional long exposure frames while maintaining the fast, predictable capture experience of the Pixel camera. Taking advantage of long exposure frames while avoiding ghosting artifacts caused by motion between frames. Bottom: HDR+ with Bracketing captures five short exposures before the shutter press and one long exposure after the shutter press.
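The Venn-diagram identity above, where perfect correlation makes the two disks coincide so that H(X,Y)=H(X)=H(Y) and the mutual information I(X;Y)=H(X)+H(Y)-H(X,Y) equals the full entropy, can be checked numerically with a few lines of Python (the helper names are illustrative):

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in bits) of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y): the overlap of the two entropy 'disks'."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

For perfectly correlated samples the mutual information equals H(X); for independent samples it is zero, i.e. the disks do not overlap at all.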
Build a medical sentence matching application using BERT and Amazon SageMaker For detecting medical entities such as medical conditions, medications, and other medical information in medical text, consider using Amazon Comprehend Medical, a HIPAA-eligible service built to extract medical information from unstructured medical text. Then we deploy the model using Amazon Elastic Container Service (Amazon ECS). You can solve such use cases by using Amazon Comprehend custom models. For more information, see Comprehend Custom and Building a custom classifier using Amazon Comprehend. AWS services such as Amazon Comprehend and Amazon Comprehend Medical use deep learning models to handle negation. Cognitive document processing for automated mortgage processing Quantiphi’s cognitive document processing solution combines state-of-the-art AI and ML services from AWS with Quantiphi’s custom document processing models to digitize a wide variety of mortgage documents. The solution works across all types of structured and unstructured mortgage documents, is capable of achieving over 90% accuracy, provides substantial cost reductions, and facilitates better visibility of the mortgage processing workflow while ensuring faster processing. Summary: Traditional methods of mortgage loan processing are manual in nature and highly time-consuming. Mortgage companies can use Quantiphi’s solution to increase their operational efficiency and significantly reduce their mortgage processing time. Are you sure you’re building predictive models? If you Google “how to build a predictive model”, you get tons of articles with themes like “Perfect way to build a Predictive Model in less than 10 minutes”. Predictive models attempt to capture associations precisely; explanatory models tend to be hypothesis-driven.
If you’re still interested in building a predictive model, read on! You can certainly learn how to build predictive models by grabbing a dataset from somewhere (e.g. …). A Primer on the EM Algorithm Example EM model used for mixture models [1]. The Expectation-Maximization (EM) algorithm is one of the main algorithms in machine learning for estimation of model parameters [2][3][4]. Unfortunately, the EM algorithm may not be easy for a beginner to understand. The first step in the EM algorithm is to assume there are hidden variables Z that help generate the data. In the EM algorithm, Z is estimated using X and θ, so the conditional probability p(Z | X, θ) is of interest. Jensen’s Inequality is the second result needed to obtain the EM algorithm. Introducing OpenHAC — an open-source toolkit for digital biomarker analysis and machine learning Human Activity Classification: Through OpenHAC it is also possible to create, analyze, and extract new digital biomarker features with its machine learning classification tools. OpenHAC uses the PyCaret library, powered by Scikit-Learn, to compare, create, save, load, and deploy machine learning models. Setup data and compare multiple models: OpenHAC uses PyCaret’s functions to initialize a training environment that creates a pipeline. It can then train and evaluate the performance of all estimators in the PyCaret library using cross-validation. Choose a classifier and assess performance: OpenHAC can train and evaluate any of the models in PyCaret’s library. The Filter Function in Python The filter function takes in two arguments: a function that returns True or False (checks for a specific condition) and the iterable object we want to apply it to (such as a list in this case). filter(function, iterable) The filter function takes each element from our list (or whatever iterable we pass in) and passes it in to the function we give it.
If the function with that specific element as an argument returns True, the filter function will add that value to the filter object (from which we can then create a list, just as we did with the map object returned by the map function). In other words, we can think of the filter function as filtering our list or sequence based on some condition. [x, y, z] → filter → [x (if f(x) returns True), y (if f(y) returns True), z (if f(z) returns True)] Baby’s First Algorithmic Sampling from a Distribution: Methods Available and How to Use Them TL;DR: This writeup includes descriptions from a recent paper on algorithmic sampling, to describe in simpler terms the motivation and approach for sampling using simple or Markov Chain Monte Carlo methods. Sampling as a concept: Here we refer to samples xᵢ from a distribution p(x) as single realizations whose probability distribution is p(x). If we use Monte Carlo methods for sampling (i.e. …). Monte Carlo (MC) methods are iterative methods of exploring (sampling with good coverage of) the domain space. There are two overarching categories for these methods: simple and Markov chain. The AI reliability paradox On a high-stakes task, the answer could be Ronnie Reliable… but perhaps not for the first reason that comes to mind. Supposing that the project is a wonderful idea that will make the world a better place if it’s done right, is Ronnie Reliable still the better choice? You trust Ronnie Reliable. Whether the task is handled by humans or machines, never underestimate the importance of safety nets. So, if you’re wise, you’ll opt for the best system but build safety nets as if it’s the worst system. What Can We Learn from Elon Musk’s Twitter Graph? Graph analysis is both an incredibly exciting and fast-developing field of data analytics.
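The filtering behavior described above is easiest to see with a concrete list. A minimal sketch:

```python
nums = [1, 2, 3, 4, 5, 6]

# Keep only the elements for which the predicate returns True.
evens = list(filter(lambda x: x % 2 == 0, nums))

# filter() returns a lazy filter object, so we wrap it in list(),
# just as we would with the map object returned by map().
```

The equivalent list comprehension would be `[x for x in nums if x % 2 == 0]`.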
“Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. The functions shown below start from the “RootNode” Elon Musk and iterate over his follows, the follows of his follows, etc. The “Elon Musk Graph” can be visually explored (link) by simply clicking on the nodes — drag them around to organize them in a better way. Data Wrangling Solutions — Working With Dates — Part 2 Photo by Steinar Engeland on Unsplash. In the last tutorial, we looked at the various approaches to read the data files containing the date-time variables. The data dictionary of this dummy dataset is as follows: release_date — actual date column, with the first date value deleted. release_date_int — column containing the same date information but in integer format; for example, the date 2020-02-12 is present as 20200212 in YYYYMMDD format. release_date_text — column containing dates in text format, with # as the separator. Image Captions with Deep Learning: State-of-the-Art Architectures INTUITIVE IMAGE CAPTIONS SERIES. Photo by Brett Jordan on Unsplash. Image Captioning is a fascinating application of deep learning that has made tremendous progress in recent years. Almost all Image Captioning architectures make use of this approach with the three components we’ve just seen. (Image by Author) Architecture: Multi-Modal. The Inject architecture was the original architecture for Image Captioning and is still very popular. It is therefore not surprising to find that Attention has also been applied to Image Captioning, resulting in state-of-the-art results. (Image by Author) A few different variants of the Transformer architecture have been proposed to address the Image Captioning problem.
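Converting an integer column like 20200212 (YYYYMMDD) back into a proper date is a one-liner once the value is cast to a string. A minimal standard-library sketch; with pandas, `pd.to_datetime(df['release_date_int'].astype(str), format='%Y%m%d')` does the same for a whole column:

```python
from datetime import datetime

raw = 20200212  # a date stored as an integer in YYYYMMDD form
parsed = datetime.strptime(str(raw), "%Y%m%d").date()
# parsed is now a real date object, e.g. datetime.date(2020, 2, 12)
```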
A Learning Theoretic Perspective on Local Explainability In our work, we focus on a notion of interpretability that is based on the quality of local approximation explanations. In what follows, we’ll first provide a quick introduction to local explanations. Here, the “complexity” of the local explanations corresponds to how large a local neighborhood the explanations span (the larger the neighborhood, the lower the complexity — see Fig 1 for a visualization). References: Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, and Ameet Talwalkar, 2021, “A Learning Theoretic Perspective on Local Explainability”, ICLR 2021. Gregory Plumb, Denali Molitor and Ameet S. Talwalkar, 2018, “Model Agnostic Supervised Local Explanations”, NeurIPS 2018. Google AI Blog: Evolving Reinforcement Learning Algorithms However, because the RL algorithm taxonomy is quite large, and designing new RL algorithms requires extensive tuning and validation, this goal is a daunting one. This results in increasingly better RL algorithms, and the discovered algorithms generalize to more complex environments, even those with visual observations like Atari games. Evolving RL Algorithms: We use an evolution-based approach to optimize the RL algorithms of interest. The computational graph formulation allows researchers to both build upon human-designed algorithms and study the learned algorithms using the same mathematical toolset as the existing algorithms. We analyzed a few of the learned algorithms and can interpret them as a form of entropy regularization to prevent value overestimation. Essence of Stacking Ensembles for Machine Learning In this tutorial, you will discover the essence of the stacked generalization approach to machine learning ensembles.
Tutorial Overview: This tutorial is divided into four parts; they are: Stacked Generalization; Essence of Stacking Ensembles; Stacking Ensemble Family (Voting Ensembles, Weighted Average, Blending Ensemble, Super Learner Ensemble); Customized Stacking Ensembles. Stacked Generalization: Stacked Generalization, or stacking for short, is an ensemble machine learning algorithm. Combine With Model: machine learning model to combine predictions. Stacking Ensemble Family: Many ensemble machine learning techniques may be considered precursors or descendants of stacking. Related Tutorials. Books. Summary: In this tutorial, you discovered the essence of the stacked generalization approach to machine learning ensembles. Build an event-based tracking solution using Amazon Lookout for Vision Amazon Lookout for Vision is a machine learning (ML) service that spots defects and anomalies in visual representations using computer vision (CV). In minutes, you can begin using Amazon Lookout for Vision to automate inspection of images and objects — with no ML expertise required. Setting up Amazon Connect and the associated contact flow: To configure Amazon Connect and the contact flow, you complete the following high-level steps: Create an Amazon Connect instance. Create an Amazon Connect instance: The first step is to create an Amazon Connect instance. Build, train, and deploy the Amazon Lookout for Vision model: In this section, we see how to build, train, and deploy the Amazon Lookout for Vision model using the open-source Python SDK. Quality Assessment for SageMaker Ground Truth Video Object Tracking Annotations using Statistical Analysis Computer vision annotation tools, like those available in Amazon SageMaker Ground Truth (Ground Truth), simplify the process of creating labels for computer vision algorithms and encourage best practices, resulting in high-quality labels. Ground Truth accomplishes this through annotation consolidation to get agreement on what the ground truth is based on multiple responses.
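The "combine with model" idea above is the defining feature of stacking: a meta-model is fitted on the base models' held-out predictions. A minimal from-scratch sketch with hypothetical numbers (scikit-learn's StackingClassifier/StackingRegressor do this properly for real projects):

```python
# Held-out predictions of two hypothetical base models, and the true targets.
base1 = [1.0, 2.0, 3.0, 4.0]
base2 = [0.8, 2.2, 2.9, 4.1]
y     = [1.1, 2.1, 3.0, 3.9]

# Meta-model: a single weight w for base1 (and 1-w for base2),
# chosen to minimize squared error on the held-out predictions.
def loss(w):
    return sum((w * a + (1 - w) * b - t) ** 2
               for a, b, t in zip(base1, base2, y))

w_best = min((i / 100 for i in range(101)), key=loss)
stacked = [w_best * a + (1 - w_best) * b for a, b in zip(base1, base2)]
```

The fitted combination is at least as good (on the held-out predictions) as either base model alone, which is the whole point of learning the combiner rather than hand-picking it.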
SageMaker Ground Truth: Ground Truth handles the scheduling of various annotation tasks and collecting the results. Frame 217 was flagged because there was a large difference between frame 216 and the subsequent frame, frame 217. For more information about labeling with Ground Truth, see Easily perform bulk label quality assurance using Amazon SageMaker Ground Truth. Amazon Forecast now provides estimated run time for forecast creation jobs, enabling you to manage your time efficiently Amazon Forecast now displays the estimated time it takes to complete an in-progress workflow for importing your data, training the predictor, and generating the forecast. You can now manage your time more efficiently and better plan for your next workflow around the estimated time remaining for your in-progress workflow. Additionally, the displayed estimated time to complete a workflow refreshes automatically, which sets better expectations and removes further frustration. After the import job is complete and the status becomes Active, the actual import time shows the total time of the import. Likewise, after predictor training is complete and the status becomes Active, the corresponding field shows the total time for the predictor creation. Perform medical transcription analysis in real-time with AWS AI services and Twilio Media Streams In this post, we show you how to integrate Twilio Media Streams with Amazon Transcribe Medical and Amazon Comprehend Medical to transcribe and analyze data from phone calls. For non-healthcare industries, you can use this same solution with Amazon Transcribe and Amazon Comprehend. Amazon Transcribe Medical is an ML service that makes it easy to quickly create accurate transcriptions of conversations between patients and physicians. Amazon Transcribe Medical, Amazon Comprehend Medical, and Twilio Media Streams are all managed platforms.
This application uses Amazon Transcribe Medical to transcribe media content in real time, and stores the output in Amazon S3 for further analysis. Securing Amazon SageMaker Studio internet traffic using AWS Network Firewall Network Firewall subnet – contains a Network Firewall endpoint. The route tables are configured so that all inbound and outbound external network traffic is routed via Network Firewall. For details and reference network architectures with Network Firewall and NAT gateway, see Architecture with an internet gateway and a NAT gateway, Deployment models for AWS Network Firewall, and Enforce your AWS Network Firewall protections at scale with AWS Firewall Manager. Network Firewall logging: In this section, you configure Network Firewall logging for your firewall’s stateful engine. For more information about creating and managing Network Firewall rule groups, see Rule groups in AWS Network Firewall. Conv1D and Conv2D: Did you realize that Conv1D Is a Subclass of Conv2D? However, while writing the report, I came up with the following questions: Is it possible to create a 2D CNN similar to a 1D CNN? If I define a two-layer 1D CNN with 32 and 64 filters, the parameters and shape would be as shown. The implementation is as follows: Conv2D. In conclusion, can we affirm that 1D CNNs are a subclass of 2D CNNs? As an illustrative example, we have this last 2D CNN whose number of parameters is similar to the 1D CNN implemented above. Nevertheless, this conclusion does not imply that we should always use 2D CNNs instead of 1D CNNs, since the latter are much simpler to implement when dealing, for example, with multi-channel time series. 17 Clustering Algorithms Used In Data Science and Mining For instance, based on the area of overlap, there exist two types of clustering. Hard clustering: clusters don’t overlap (k-means, k-means++).
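The Conv1D/Conv2D equivalence discussed above can be checked by counting parameters: a Conv1D layer with F filters and kernel size K over C input channels has K·C·F weights plus F biases, exactly the count for a Conv2D layer with a (1, K) kernel. A quick arithmetic sketch (the filter counts are illustrative):

```python
def conv1d_params(c_in, filters, k):
    # weights (k * c_in per filter) plus one bias per filter
    return k * c_in * filters + filters

def conv2d_params(c_in, filters, kh, kw):
    return kh * kw * c_in * filters + filters

# A Conv1D layer with kernel size 3 matches a Conv2D layer with a 1x3 kernel.
p1 = conv1d_params(c_in=1, filters=32, k=3)
p2 = conv2d_params(c_in=1, filters=32, kh=1, kw=3)
```

Because the parameter tensors have identical shapes, the 2D layer can represent exactly the same function as the 1D layer on height-1 inputs.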
k-means clustering is adopted by various real-world businesses such as search engines (e.g., document clustering, clustering similar articles), customer segmentation, spam/ham detection systems, academic performance, fault diagnostic systems, wireless communications, and many more. After that, it computes the probability for each data point by simply dividing the distance by the total distances. Estimate the overall kernel density function of the data space by adding the density functions of all data points. As always, you can use any illustration and other information from this post by citing it as: Mahmoud Harmouch, 17 clustering algorithms used in data science & mining, Towards Data Science, April 23, 2021. Getting Started with Albumentation: Deep Learning Image Augmentation Technique in PyTorch I have referred to the following notebook: Mounting Google Drive in Google Colab. I persistently use Google Colab for easy, shareable notebook prototypes. A Street in Venice, Italy. Original TorchVision Data Pipeline: I normally create a Dataloader to process image data pipelines using PyTorch and Torchvision.

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
start_t = time.time()
if self.transform:
    augmented = self.transform(image=image)
    image = augmented['image']
total_time = time.time() - start_t
return image, label, total_time

Now create a transform in Albumentation. Please refer to the official Albumentation website or Albumentation GitHub to apply the most suitable augmentation to your own needs! GitHub: Bengali.AI Handwritten Grapheme Classification Competition. Conclusion: To summarize, in this tutorial I gave an introduction to the Image Augmentation technique and the Albumentation library in Python, with example code.
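The distance-to-probability step described above is the heart of k-means++ seeding: in the standard scheme, each point is sampled as the next center with probability proportional to its squared distance to the nearest already-chosen center. A minimal 1-D sketch (names are illustrative):

```python
def kmeanspp_weights(points, centers):
    """Sampling weights for the next center under standard k-means++ seeding
    (probability proportional to squared distance to the nearest center)."""
    d2 = [min((p - c) ** 2 for c in centers) for p in points]
    total = sum(d2)
    return [d / total for d in d2]

w = kmeanspp_weights([0.0, 1.0, 10.0], centers=[0.0])
# the far point (10.0) dominates the sampling weights,
# so new centers tend to be spread out across the data
```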
Geopandas Hands-on: Geospatial Data Visualization and Intro to Geometry Part 1: Introduction to geospatial concepts (link to post). Part 2: Geospatial visualization and geometry creation (this post). Part 3: Geospatial operations (coming soon). Part 4: Building geospatial machine learning pipeline (coming soon). In this post we are going to cover aspects of geospatial visualization and introduce you to creating your own geospatial data using Geopandas. Table of Contents: Recap; Geospatial visualization; Introduction to geometry and creating your own geospatial data. Recap: In the previous post here, we discussed the basics of geopandas such as its underlying datatypes, the geometry column, and the built-in attributes that we can easily manipulate for our use. And now, we are going to dig deeper into geopandas: how to generate pretty geospatial visualizations and create your own geospatial data for subsequent use in a machine learning pipeline. Geospatial visualization: We are still going to use the New York Boroughs in-built dataset provided by geopandas. In the next post, we will dig deeper into how you can perform geospatial operations such as merge, aggregate, spatial joins, etc. Overview Of 4 Model Validation Approaches to Mitigate Overfitting Problem Image by Jacqueline Macou from Pixabay. Why model validation is important: A model trained without validation might overfit the test data. This is likely to occur if we are working with only two sets of data, training and testing data. (Image by author) To mitigate overfitting the test data, you will need validation. It involves the use of three sets of data: the training data, the validation data, and the test data. In total, we have N training processes, N validation processes, and only 1 test process.
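The N-fold scheme above (N training runs, N validation runs, one final test) comes down to simple index bookkeeping; scikit-learn's KFold does the same thing for real projects. A minimal sketch:

```python
def kfold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

splits = list(kfold_splits(10, 5))
# 5 folds: each sample appears in validation exactly once,
# and every split's train + val covers the whole dataset
```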
Reframing — Representing problems in Machine Learning The Reframing design pattern solves the challenge of posing an intuitive machine learning problem with a changed contextual output. For example, an intuitive regression problem can be reframed into a classification problem and vice versa. Alternatively, this can also be defined as a regression problem, as the label (amount of rainfall) is a real number (e.g. …). Instead of narrowing down our predictions to a single real number, we relax our prediction target to be a discrete probability distribution. (By Author) A sharper PDF points to a smaller standard deviation of the output distribution, whereas a wider PDF indicates a larger standard deviation. Changing Dynamics of Market Crises: A Review of Country, Sector, and Equity Behaviours Over the Past Two Decades In this paper, we study the behaviours and similarity profile of country financial indices, sector financial indices and equities over the past 20 years. Next, we investigate the collective similarity of equity trajectories over time and demonstrate greater distance between trajectories during times of crisis. We wish to study how country, sector and equity behaviours have evolved over the past 20 years — especially during times of crisis. Sector similarity: the sector erratic behaviour dendrogram consists of 3 clearly separated clusters. The country erratic distance matrix norm is 12.2 and the sector erratic distance matrix norm is 17.0. Calculating the Business Value of a Data Science Project Photo by israel palacio on Unsplash. There is a big focus in data science on various performance metrics. But none of these tell you what stakeholders actually want to know: What business value does this have? At the end of the day, for a for-profit business, business value is monetary value.
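Reframing a regression target as a discrete distribution, as described above, starts by bucketing the real-valued label. A minimal sketch with hypothetical rainfall bin edges (in mm; the edges and function names are illustrative):

```python
BIN_EDGES = [0.0, 1.0, 5.0, 10.0]  # hypothetical bucket boundaries in mm

def rainfall_bucket(amount_mm):
    """Map a real-valued rainfall amount to a discrete class index."""
    bucket = 0
    for edge in BIN_EDGES:
        if amount_mm > edge:
            bucket += 1
    return bucket

# A classifier over these buckets then outputs a probability per bucket,
# i.e. a discrete probability distribution instead of a single real number.
```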
This is great news for data scientists: we love numbers. For example, in this scenario, false positives cost more than false negatives (100 for false positives vs 50 for false negatives). Phasic Policy Gradient (PPG) Part 1 This is Part 1 of a two-part series discussing the theory behind the idea. Policy Phase: The goal of the policy phase is to optimize the training of the policy network itself. Specifically, the policy network is trained using the PPO policy objective function [1], which involves the ratio between the current policy and the old policy before the update [1], with A representing the advantage function. We don’t want to destroy the progress made during the policy phase during the auxiliary phase! Figure 2: PPG Algorithm [1]. Figure 3: PPG default hyperparameters [1]. We run the auxiliary phase every N runs of the policy phase. How to Efficiently Load Image Datasets into Colab from Github, Kaggle and Local Machine Google Colab is a free Jupyter notebook environment from Google whose runtime is hosted on virtual machines in the cloud. Understanding Colab’s file system: The Colab notebooks you create are saved in your Google Drive folder. Image by author. Open a new Google Colab notebook and follow the same steps described with the Github link above. From the drop-down list, click on ‘File Upload’ and browse to the zip file on your system, then ‘Open’ it. Copy other datasets API (by author). Step 5: Run this API command on Colab to download the data. Back to our Colab notebook, paste this API command into a new block. Time Series Anomaly Detection Figure 1: Anomaly Detection LSTM-VAE Model Architecture. SR-CNN is a novel algorithm that borrows the SR model from the visual saliency detection domain and applies it to time-series anomaly detection [3].
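Assigning monetary costs to error types, as in the scenario above, turns a confusion matrix directly into a business-value number. A minimal sketch with hypothetical unit costs and hypothetical error counts:

```python
# Hypothetical per-error costs (currency units), not taken from the article.
COST_FP = 100
COST_FN = 50

def total_error_cost(false_positives, false_negatives):
    """Monetary cost of a model's mistakes on an evaluation set."""
    return false_positives * COST_FP + false_negatives * COST_FN

cost_a = total_error_cost(false_positives=10, false_negatives=40)
cost_b = total_error_cost(false_positives=30, false_negatives=5)
# Model A makes more mistakes in total (50 vs 35) yet is cheaper,
# which is why accuracy alone can rank models the wrong way.
```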
In the online compute module, the anomaly detection processor calculates the anomaly status for the incoming time-series signal online, while the alert processor sends out notifications if an anomaly occurs. We also looked at anomaly detection from a computer vision point of view via a clever combination of salience maps and convolutional neural networks. et al., “Time-Series Anomaly Detection Service at Microsoft”, KDD 2019: https://arxiv.org/abs/1906.03821 Machine Learning Model Dashboard Explainer Dashboard (Source: By Author). Nowadays, creating a machine learning model is easy because of the different Python libraries on the market, like sklearn, lazypredict, etc. Explainer dashboard is an open-source Python library that creates machine learning model dashboards that can be used to easily understand and analyze the important factors on which the model is working, like feature importance, model performance, visualizations, etc. In this article, we will use an explainer dashboard to create machine learning dashboards and understand how the model is working. pip install explainerdashboard Importing required libraries: In this step, we will import the required libraries and functions to create a machine learning model and dashboard.

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names

Creating the Model & Dashboard: This is the final step, in which we will create the machine learning model and then interpret that model by creating a dashboard. Google AI Blog: MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation Panoptic segmentation is a computer vision task that unifies semantic segmentation (assigning a class label to each pixel) and instance segmentation (detecting and segmenting each object instance). An example image and its panoptic segmentation masks from the Cityscapes dataset.
However, the training process still relies heavily on box detection, which does not align with the mask-based definition of panoptic segmentation. In “MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers”, to be presented at CVPR 2021, we propose the first fully end-to-end approach for the panoptic segmentation pipeline, directly predicting class-labeled masks by extending the Transformer architecture to this computer vision task. MaX-DeepLab is fully end-to-end: it predicts panoptic segmentation masks directly from images. Metrics, Metrics, and More Metrics, Plus: Our On-Demand Beginners’ Guide Is Here This edition of the Variable comes to you loaded, as always, with some of the best TDS reads of the past week. Before we get to them, though, we wanted to share a new resource we’ve just recently launched — our free, on-demand, email-based beginners’ guide. If you’re currently taking your first few steps in data science, you can sign up to receive a daily dose of beginner-friendly articles and practical tips. Ismael Kherroubi Garcia goes in a similar direction in his exploration of the meaning of data in data science. We’re as grateful as ever for your support, and for choosing to join us on your data science adventures. Supercharge Your Machine Learning Experiments with PyCaret and Gradio Photo by Hunter Harritt on Unsplash. Introduction: This tutorial is a step-by-step, beginner-friendly explanation of how you can integrate PyCaret and Gradio, two powerful open-source libraries in Python, and supercharge your machine learning experimentation within minutes. PyCaret: PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows. Gradio: Gradio is an open-source Python library for creating customizable UI components around your machine learning models.
# install slim version (default)
pip install pycaret
# install the full version
pip install pycaret[full]

When you install the full version of pycaret, all the optional dependencies listed here are also installed. Supercharge your Machine Learning Experiments with PyCaret and Gradio. I hope that you will appreciate the ease of use and simplicity of PyCaret and Gradio. Forecasting Long-Term Daily Municipal Water Demand Summary: The municipal government of the City of London, Canada completed an applied machine learning project focused on obtaining daily forecasts of long-term citywide water demand (see the source code). The Municipal Artificial Intelligence Applications Lab and the Water Demand team at the City of London set out to investigate whether a more accurate forecasting model could be developed for the task of long-term water demand forecasting. Problem: In the literature, water demand forecasting typically falls into one of two categories: short-term and long-term. As mentioned earlier, climate is often used as a feature for obtaining long-term water demand forecasts. A future study could include climate features and characterize the effect of climate change on municipal water demand and its implications for water demand in the far future. PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating ML workflows Time Series 101 — For beginners Photo by Chris Liverani on Unsplash. What is Time Series Data? As an example of time series data, the chart below shows the daily stock price of Tesla Inc. (ticker symbol: TSLA) over the last year. Example of Time Series Data — Tesla Inc. (ticker symbol: TSLA) daily stock price, 1Y interval. It is not very hard to distinguish between cross-sectional and time-series data, as the objectives of analysis for the two datasets are widely different.
I will soon be writing a tutorial on unsupervised anomaly detection on time-series data using the PyCaret Anomaly Detection Module. Build and Run a Docker Container for your Machine Learning Model The idea of this article is to do a quick and easy build of a Docker container with a simple machine learning model and run it. Before reading this article, do not hesitate to read Why use Docker for Machine Learning and Quick Install and First Use of Docker. In order to start building a Docker container for a machine learning model, let’s consider three files: Dockerfile, train.py, inference.py. In order to build the image, we run the following command in our terminal:

docker build -t docker-ml-model -f Dockerfile .

The goal was to produce quick and easy steps to build a Docker container with a simple machine learning model. Text Files Processing, Cleaning, and Classification of Documents in R With the increasing number of text documents, text document classification has become an important task in data science. Both the Python and R programming languages have excellent functionality for text data cleaning and classification. It takes the training data, the test data, the tags for the training data, and the ‘k’ value. This classifier first uses the training data and the tags for the training data to learn trends in the data.
You need to find a suitable value of K. Here I am using K = 4.

library(class)
set.seed(245)
prob.test = knn(dtm.train, dtm.test, tags, k=4, prob = TRUE)
prob.test

Here is part of the output:

[1] rauto rauto rauto rauto rauto smed smed smed smed smed smed
[12] rauto rauto smed rauto rauto smed rauto rauto rauto rauto rauto
[23] smed rauto rauto smed rauto rauto rauto smed smed rauto smed
[34] rauto rauto smed rauto smed smed rauto smed smed rauto rauto
[45] rauto rauto rauto smed rauto smed smed smed smed smed smed
[56] smed smed rauto rauto smed smed rauto rauto rauto smed rauto
[67] smed rauto rauto smed smed rauto rauto rauto rauto smed rauto
[78] smed rauto rauto smed rauto rauto rauto smed smed rauto rauto
[89] rauto rauto smed rauto rauto smed rauto rauto smed smed smed

It has a total of 400 predictions. Why Parallelized Training Might Not be Working for You Advantages of Parallelized Training: The most obvious advantage of parallelized training is speed. In the case of a hyperparameter search, simultaneously evaluating multiple configurations allows us to quickly narrow down the most promising options. With distributed data parallel (DDP) training, a copy of the model’s parameters is placed on each available GPU and each copy is fed a different subset of the entire dataset. After every batch evaluation, the gradients of the copies are synced and averaged. As the size of training datasets increases, DDP serves as a way to keep training times reasonable. The Highest Data Science Salaries Table of Contents: Introduction; Employment Level; State Salary Breakdown; Summary; References. Introduction: Data Science salaries vary from state to state, as well as industry to industry. Whether you are in the industry of Computer Systems Design and Related Services, or Management of Companies and Enterprises, you can expect a data science salary to be one of the most competitive salaries in the workforce.
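The gradient sync step described above for distributed data parallel training is, at its core, an element-wise average across replicas. A minimal sketch (plain Python lists stand in for per-GPU gradient tensors):

```python
def allreduce_mean(grads_per_replica):
    """Average corresponding gradient entries across all model replicas."""
    n = len(grads_per_replica)
    return [sum(vals) / n for vals in zip(*grads_per_replica)]

# Gradients computed on two replicas from different data shards.
synced = allreduce_mean([[0.25, -1.0, 4.0],
                         [0.75,  1.0, 2.0]])
```

Every replica then applies the same averaged gradient, so all copies of the model stay in lockstep after each step.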
Commute or remote work environment: Some other caveats are that this data technically covers both “Data Scientists and Mathematical Science Occupations”, so for data science only, the values may be different. Summary: As you can see, data science salaries are high compared to other occupations, and with a strong job outlook over the course of the next 9 or so years, it looks like this position is here to stay. To summarize, we went over two main facets of the data science occupation: Employment Level and State Salary Breakdown. Thank you for reading! Development of a Benchmark Dataset Integrating the fast.ai DataBlock API with Label Studio The aim was to evaluate Label Studio as a labeling toolkit that would enable improved labeling speed and accuracy. Before implementing Label Studio, I used a combination of draw.io and Mac’s Photo Viewer. The Label Studio GUI made it possible for me to greatly improve on the draw.io + Mac Photo Viewer process. My favorite features of Label Studio were: the setup and editing of the configuration was easy. The relevant parts of my development server architecture (Image by Author). We use the Label Studio API, which is implemented as a REST API. A Few Words on Representation Learning Image Source: Deep Learning. From this perspective, deep neural networks are representation learning models. However, instead of solving a task by mapping representations to targets (which a classical classifier would do), representation learning aims to map representations to other representations. Deep unsupervised representation learning seeks to learn a rich set of useful features from unlabeled data. Specifically, in Computer Vision (CV), unsupervised representation learning is one of the foremost, long-standing classes of problems that pose significant challenges to many researchers worldwide.
In recent years, deep unsupervised representation learning has gained much attention because of the latest breakthroughs in Natural Language Processing (NLP). How biological and artificial AI neurons compare: Explore how understanding the difference between artificial and biological neurons may give us clues about how to move towards a more flexible kind of artificial intelligence. And yet, our biological neurons are way more complex than our artificial ones and hold so much rich detail and so many mysteries within. Most artificial neurons in deep learning systems produce an active output (some may output a 0). As we will emphasize later on, around 0.5 to 2% of our biological neurons are active at any one time, versus around 50% in typical artificial deep learning systems. But understanding in depth our biological neurons could give us some ideas that may enrich our experiments and strategies when working with artificial neurons. What Happened at the 2021 NVIDIA GTC: A review of what was unveiled at this year’s NVIDIA GPU Technology Conference, and what was said about Deep Learning and AI during its sessions. Marcelo Ortega, 5 days ago · 8 min read. This year’s GTC started with NVIDIA CEO and Founder Jensen Huang’s Keynote announcing what will be released this year. Then GTC proper begins, and over a thousand virtual sessions, speakers talk about ways to use Nvidia as an instrument. Photo from Nvidia session. Where ML Frameworks are going: If Nvidia is itself a tool, then ML frameworks like TensorFlow and PyTorch are the tools that let us use that other tool. To give you an example of what’s coming next, you can refer to the BMW use case, where they have built a 3D Digital Twin of the factory in Nvidia Omniverse to train new robots & people. Classical ML models, on the other hand, are good for abstraction, but bad for learning.
How to Combine Predictions for Ensemble Learning. In this post, you will discover common techniques for combining predictions for ensemble learning. Tutorial Overview: This tutorial is divided into three parts; they are: Combining Predictions for Ensemble Learning; Combining Classification Predictions (Combining Predicted Class Labels, Combining Predicted Class Probabilities); and Combining Regression Predictions. Combining Predictions for Ensemble Learning: A key part of an ensemble learning method involves combining the predictions from multiple models. Standard ensemble machine learning algorithms do prescribe how to combine predictions; nevertheless, it is important to consider the topic in isolation for a number of reasons, such as: interpreting the predictions made by standard ensemble algorithms. Combining Predicted Class Labels: A predicted class label is often mapped to something meaningful to the problem domain. Summary: In this post, you discovered common techniques for combining predictions for ensemble learning. An EPIC way to evaluate reward functions. Our method, Equivalent-Policy Invariant Comparison (EPIC), allows one to evaluate a reward function by computing how similar it is to other reward functions. EPIC can be used to benchmark reward learning algorithms by comparing learned reward functions to a ground-truth reward. It can also be used to validate learned reward functions prior to deployment, by comparing them against reward functions learned via different techniques or data sources. EPIC is a new way to evaluate reward functions and reward learning algorithms by comparing how similar reward functions are to one another. Most significantly, EPIC can only compare reward functions to one another, and cannot tell you what a particular reward function values. It’s here!
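The two combination schemes the tutorial names, voting on predicted class labels and averaging predicted class probabilities, can be sketched in a few lines; the three "model" outputs below are hard-coded for illustration:

```python
from collections import Counter

import numpy as np

# Hard-coded outputs from three hypothetical classifiers on one sample.
labels = ["cat", "dog", "cat"]           # predicted class labels
probas = np.array([[0.9, 0.1],           # predicted [cat, dog] probabilities
                   [0.4, 0.6],
                   [0.8, 0.2]])

# Combining predicted class labels: majority (plurality) vote.
vote = Counter(labels).most_common(1)[0][0]
print(vote)  # 'cat'

# Combining predicted class probabilities: average, then take the argmax.
mean_proba = probas.mean(axis=0)         # [0.7, 0.3]
print(["cat", "dog"][int(mean_proba.argmax())])  # 'cat'
```

For regression predictions, the analogous combination is simply the mean (or median) of the member outputs.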
Join us for Amazon SageMaker Month, 30 days of content, discussion, and news. Join us for 30 days of new Amazon SageMaker content designed to help you build, train, and deploy ML models faster. The SageMaker Savings Plans offer a flexible, usage-based pricing model for SageMaker. The SageMaker Savings Plans are on top of the productivity and cost-optimizing capabilities already available in SageMaker Studio. We would love for you to join the thousands of customers who are seeing success with Amazon SageMaker. Her goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker. How the Dot Product Measures Similarity: The dot product is one of the most fundamental concepts in machine learning, making appearances almost everywhere. In this post, our goal is to unravel the dot product and provide a simple geometric explanation! The fundamental properties of the dot product: To see what the dot product has to do with similarity, we have three key observations. So, the dot product of x and y equals that of xᵧ (the projection of x onto y) and y. If we assume that both x and y have a magnitude of one, the dot product equals the scaling factor! The Cognitive Science of AGI. So what is the difference between neuroscience, cognitive science, and psychology? For instance, MIT’s BCS program offers grad degrees in cognitive science, systems neuroscience, cellular and molecular neuroscience, and computation — but not psychology[2]. So, why all this talk about neuroscience, cognitive science and all the other cognitive-sounding sciences? Young, Doubt and Certainty in Science: A Biologist’s Reflections on the Brain, 1960. Pretend, if you will, that you are a visitor from the lands of computer science on a tour of the field of cognitive science. He will graduate from the University of Virginia in Spring 2021 with a BS in computer science and applied math, as well as a BA in cognitive science and philosophy.
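The geometric claim above, that for unit-magnitude vectors the dot product acts as a similarity score equal to the cosine of the angle between them, is easy to check numerically (the vectors are arbitrary):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])

# Normalise to unit magnitude, matching the article's assumption.
x_unit = x / np.linalg.norm(x)
y_unit = y / np.linalg.norm(y)

dot = x_unit @ y_unit        # 24/25 = 0.96: similar directions score near 1

# The dot product of unit vectors is the cosine of the angle between them.
angle = np.arctan2(4, 3) - np.arctan2(3, 4)
print(np.isclose(dot, np.cos(angle)))  # True
```

Orthogonal vectors would score 0, and opposite ones -1, which is exactly why the dot product works as a similarity measure.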
Train on Cloud GPUs with Azure Machine Learning SDK for Python. To save you that pain, here is a getting-started guide to running your machine learning models in the Azure cloud with GPU compute instances. Select how many CPU cores you would like to be able to run concurrently for this compute type. In this compute type there are 6 CPU cores per GPU, so for one GPU you would enter 6, for two GPUs enter 12, etc. vm_size is the Azure-specific name of the compute type you want to use in your compute cluster. Geopandas Hands-on: Introduction to Geospatial Machine Learning. Part 1: Introduction to geospatial concepts (this post); Part 2: Geospatial visualization and geometry creation (coming soon); Part 3: Geospatial operations (coming soon); Part 4: Building a geospatial machine learning pipeline (coming soon). In this post we are going to cover the preliminary ground of basic geospatial datatypes and attributes, and how to use geopandas to work with them. Table of Contents: What is Geopandas; Installation; Geospatial concepts; Introduction to basic geometric attributes. What is Geopandas: Geopandas is an open-source library that enables the use and manipulation of geospatial data in Python. Geopandas is also built on top of shapely for its geometric operations; its underlying datatype allows Geopandas to run blazingly fast and is appropriate for many machine learning pipelines that require large geospatial datasets.

import geopandas
path_to_data = geopandas.datasets.get_path("nybb")
gdf = geopandas.read_file(path_to_data)
gdf

Introduction to basic geometric attributes: Now that we have some idea of geospatial data and how to import our very first dataset using Geopandas, let’s perform some basic methods to further cement our understanding. In the next post, we will dig deeper into how these geospatial data can be visualized and created from scratch.
PyPy Is Faster than Python, but at What Cost? (Learning Machine) This will replace the default CPython interpreter with PyPy, which is supposedly significantly faster than Python “without changing a thing.” The author then gives an example of measuring the time taken to add integers between 0 and 100,000,000 inside a loop with Python and PyPy. Here, Python needs 9.28 seconds while PyPy needs 0.22 seconds. The reason for this is that most machine learning libraries use the CPython C API, which is not supported by PyPy. Here’s a GitHub issue about PyPy support for PyTorch, opened in 2019. When and how to use power transform in machine learning. Power transform is a family of functions that transform data using power laws. Some power transformations: The most common power transformations are the Box-Cox and the Yeo-Johnson transformations. For this first example, we are going to avoid the use of power transformations:

model = Pipeline([
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier())
])
model.fit(X_train, y_train)
roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

Without power transform, we get an AUROC value equal to 0.976. Now, let’s try to use the power transformation. If we apply power transform to the pipeline (before the scaler), the code is:

model = Pipeline([
    ('power', PowerTransformer()),
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier())
])
model.fit(X_train, y_train)
roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

Using the power transformation, the AUROC value increases to 0.986. If you want to learn more about Power Transformations, join my Data pre-processing for machine learning in Python online course. How to import CSV files using Pandas DataFrame error-free. Then stick with me for some tips to avoid any form of error when loading your CSV files using Pandas DataFrame.
The files are of different delimited types, like tab-separated, comma-separated, and multi-character delimited files. Therefore, I have laid out some steps to avoid any error when importing and loading your data file. Check the file is on the path: now check whether your file is present in the described path using the below code. Print the file data to cross-check: now, we can check whether our data file has loaded correctly using the below code. Visual Introduction to Singular Value Decomposition (SVD) (image by author). In this article, you’ll learn about Singular value decomposition (SVD), which is a major topic of linear algebra, data science, and machine learning. As with eigendecomposition, the goal of singular value decomposition (SVD) is to decompose a matrix into simpler components: orthogonal and diagonal matrices.
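That decomposition into orthogonal and diagonal components can be verified directly with NumPy on an arbitrary matrix:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vt = np.linalg.svd(A)   # A = U @ diag(s) @ Vt
Sigma = np.diag(s)

# U and Vt are orthogonal: their transposes are their inverses.
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(Vt @ Vt.T, np.eye(2))

# The three simpler matrices multiply back to the original A.
print(np.allclose(U @ Sigma @ Vt, A))  # True
```

Geometrically, Vᵀ rotates/reflects the unit circle, Σ stretches it along the axes, and U rotates/reflects the result, which is exactly what the article's figures illustrate.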
To represent the unit circle and the basis vectors before the transformation, let’s use this function with the identity matrix. Figure 1: The unit circle and the basis vectors. It will plot the unit circle and the basis vectors transformed by the matrix. Figure 2: Effect of the matrix A on the unit circle and the basis vectors. Let’s see the effect of each matrix successively. Figure 3: Effect of the matrix V^T on the unit circle and the basis vectors. What is “Artificial General Intelligence”? The course was entitled CS 1501: Artificial General Intelligence, and was taught for three semesters at UVA to almost 150 students, even receiving a teaching award from the UVA CS faculty. This new web series will introduce these questions by looking at a specific superset of AI: Artificial General Intelligence — or general-purpose AI. Other scientifically backed “breeds” of intelligence are Fluid Intelligence and Crystallized Intelligence. We will do what all great people in modern history do when asked a question: we look at Wikipedia. Wikipedia: Artificial general intelligence (AGI) is the intelligence of a machine that can understand or learn any intellectual task that a human being can. Google AI Blog: Multi-Task Robotic Reinforcement Learning at Scale. Multi-task data collection across multiple robots, where different robots collect data for different tasks. Large-Scale Multi-Task Data Collection System: The cornerstone for both MT-Opt and Actionable Models is the volume and quality of training data. To that end, we create a scalable and intuitive multi-task success detector using data from all of the chosen tasks. To further improve the performance, we focus data collection on underperforming tasks, rather than collecting data uniformly across tasks.
This post is based on two papers, "MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale" and "Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills," with additional information and videos on the project websites for MT-Opt and Actionable Models. The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning: Model-based reinforcement learning (MBRL) is a variant of the iterative learning framework, reinforcement learning, that includes a structured component of the system that is solely optimized to model the environment dynamics. MBRL: Model-based reinforcement learning (MBRL) is an iterative framework for solving tasks in a partially understood environment. Automated Machine Learning (AutoML) is a field dedicated to the study of using machine learning algorithms to tune our machine learning tools. This is partly due to the harder problem of dynamic hyperparameter tuning (where hyperparameters can change within a run), but more on that later. By design, dynamic configurations make many more choices about the parameter settings than static configurations, thus making it very challenging to tune dynamic configurations by hand and without automatic HPO methods. Enforce VPC rules for Amazon Comprehend jobs and CMK encryption for custom models: You can now control the Amazon Virtual Private Cloud (Amazon VPC) and encryption settings for your Amazon Comprehend APIs using AWS Identity and Access Management (IAM) condition keys, and encrypt your Amazon Comprehend custom models using customer managed keys (CMK) via AWS Key Management Service (AWS KMS). This policy applies these rules for all Amazon Comprehend APIs that start new asynchronous jobs, create custom classifiers, and create custom entity recognizers.
When you enable classifier encryption, Amazon Comprehend encrypts the data in the storage volume while your job is being processed. Model encryption with a CMK: Along with encrypting your training data, you can now encrypt your custom models in Amazon Comprehend using a CMK. In the following example, we use an AWS KMS key (1234abcd-12ab-34cd-56ef-1234567890ab) to encrypt an Amazon Comprehend custom model. Why Is the Maintainability and Reproducibility of Machine Learning Models Hard? Imagine wanting to revert to a previous model artifact if the current model isn’t performing as effectively as the previous model. All these demanding jobs fall under the maintainability and reproducibility of machine learning models. Unearthing the Necessity: Maintainability and Reproducibility of machine learning models isn’t a walk in the park. A multi-tenant service is required to host your machine learning models to serve them to multiple users. These points highlight the significance of having a sophisticated platform that could seamlessly integrate with and enhance your bare machine learning models. Explainable AI (XAI) with a Decision Tree: Let us visualize the first three levels of the decision tree, max_depth=3. Decision tree visualization with max_depth=8 (Image by Author). In the class line we can see the classification result of the node. Decision Tree surrogate model: One popular way to explain the global behavior of a “black box” model is to apply the global surrogate model. Random Forest Classifier is a commonly used model that solves the overfit problem Decision Tree models tend to have. Using decision tree visualization can help us assess the correctness of the model intuitively and perhaps even to improve it. The simplest way to train a Neural Network in Python (Photo by Uriel SC on Unsplash). scikit-learn is my first choice when it comes to classic Machine Learning algorithms in Python.
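To illustrate the scikit-learn route the last excerpt praises, here is a minimal sketch of training a small neural network on an invented two-class toy problem (all data and parameter choices are made up for the example):

```python
from sklearn.neural_network import MLPClassifier

# Tiny, linearly separable toy problem: two clusters of points.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = MLPClassifier(hidden_layer_sizes=(8,),  # one hidden layer of 8 units
                    max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # class for each query point
```

This conciseness is the appeal; the trade-offs the article raises (no per-layer customization, no GPU support) still apply.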
While MLPClassifier and MLPRegressor have a rich set of arguments, there’s no option to customize layers of a Neural Network (beyond setting the number of hidden units for each layer) and there’s no GPU support. From the developers of scikit-neuralnetwork: scikit-neuralnetwork is a deep neural network implementation without the learning cliff! Note, GPU support requires an NVIDIA GPU with CUDA support. scikit-neuralnetwork is also useful when we need a Neural Network that works as a drop-in replacement for a sklearn algorithm. Machine Learning doesn’t occur in a vacuum, so why develop it in one? This slows down development drastically, and limits machine learning at scale to just the large tech companies with the budgets to afford an infrastructure team. If your deployment workflow takes 5 minutes, then you can do 5x fewer tests than someone whose workflow takes 1 minute, genius! Sticking to this requirement will also mean that development code doesn’t have to be rewritten to deploy it into production. A fast and repeatable deployment process for testing: While I touched on this earlier, let’s revisit what this means for machine learning. Machine Learning Service for Real-Time Prediction. Prediction endpoint: Now we come to the main part — the prediction endpoint. 4.2 Load a new version of the model: For a production application, we will need to retrain the ML pipeline at some point. Then the REST service loads the latest ML pipeline as soon as it’s ready. 4.3 Storing predictions: In terms of the production ML application, it is not enough just to make a prediction. The final version of the prediction endpoint will look like this: Stop Using All Your Features for Modeling (Image by Arek Socha from Pixabay). A real-world dataset contains a lot of relevant and redundant features.
As the dimensionality of the data or the number of features increases, the fraction of feature configurations covered by the data decreases. In this article, we will discuss how to select the best set of features using the Recursive Feature Selection algorithm and its implementation. Conclusion: In this article, we have discussed how to select the best set of k features using the Recursive feature selection technique. There are various other feature selection techniques that every data scientist should know; read the below-mentioned article to learn about 7 such feature selection techniques. Machine Learning for Store Delivery Scheduling. You can find the first article, where the XGBoost model implementation is explained, in this link: Article 1. Constraint: Inventory after delivery must be lower than Maximum Capacity. Demand planning equation (Image by Author); Constraint Equation (Image by Author). 2. Average Inventory Level (%): by day, 100 × Storage Qty (Pcs) / Storage Maximum Capacity (Pcs). Inventory Optimization: reach the lowest value possible without getting Inventory Shortage Incidents. Reducing Replenishment Frequency: Using the sales forecast, a replenishment strategy can reduce the number of replenishments. Example of Replenishment Frequency Reduction (Image by Author): the example above shows a replenishment day being skipped; day n inventory is higher than minimum inventory and enough to meet day n+1 demand. Remark: This KPI is highly dependent on Inventory Level (%); with a low inventory level it is hard to skip a replenishment without risking a shortage incident. Four Skills to Start Your Data Science Learning Path. At least twice a week, I’m approached by technical and non-technical folks alike asking my thoughts on where to begin learning about data science and machine learning. I added the emphasis on the word “data” there because, as you might guess, data science falls flat if you don’t have any data to work with!
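The recursive feature selection described earlier, repeatedly dropping the weakest feature until k remain, is a few lines with scikit-learn's RFE; the dataset below is synthetic, with one informative column hidden among noise:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_noise = rng.normal(size=(200, 4))       # four irrelevant features
signal = rng.normal(size=(200, 1))        # one informative feature
y = (signal[:, 0] > 0).astype(int)        # target depends only on the signal
X = np.hstack([X_noise, signal])          # informative column is index 4

# Recursively eliminate features, ranking them by the model's coefficients.
selector = RFE(LogisticRegression(), n_features_to_select=1)
selector.fit(X, y)
print(selector.support_)  # expect True only at the informative column
```

Swapping in a different estimator (e.g. a tree-based model with feature importances) changes the ranking criterion but not the elimination loop.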
Given that data science is all about working with data, the first two skills we’ll cover in this post — Python and SQL — revolve around working with data in general. Python: Although used across many computer science fields, Python is easily one of the most popular coding languages used in the data science field today. Git: Though this isn’t exactly a data analysis or data science skill, it’s still extremely important for collaborating with others on any codebase (including data science codebases) and building a portfolio of work on something like GitHub. Autoencoder network optimization for dimensionality reduction. In the previous article we saw how we could mimic a PCA dimensionality reduction with an autoencoder network (AE) by using a linear activation function and “mse” as the loss metric. On a synthetic dataset we compared the dimensionality reduction performances based on the classification scores with the latent variables. We saw how it was possible to improve the results by modifying the network (adding more layers, stacked autoencoder) or eventually allowing the activation function to be non-linear. In this article I would like to perform the same basic analysis, but this time using a RandomizedSearchCV approach to find the best network structure. In the first part we will use the performance of the shallow network as a starting point, as we did in the previous article. Do Artificial Neural Networks Really Learn? Sometimes while I am waiting for my artificial neural network (ANN) to finish its training I think about this. It might look like all these stories about philosophers and their ideas have nothing to do with artificial neural network algorithms and Python libraries. Artificial Neural Networks: An Artificial Neural Network (ANN) is the key to understanding Deep Learning. [2] In reality, although ANNs and real neural networks have things in common, they are not that similar.
It is fair to say that our learning process is not the same as an ANN’s learning process. Generating text with Recurrent Neural Networks based on the work of F. Pessoa. One particular type of generative model often used to tackle problems with sequences of discrete tokens is Recurrent Neural Networks (RNN). “Não é porque isto aconteça aos consagrados; é porque é o maior tributo (...)” (“It is not because this happens to the consecrated; it is because it is the greatest tribute (...)”) — “Ser lúcido é estar indisposto consigo próprio.” (“To be lucid is to be at odds with oneself.”)

def plot_history(history_dict):
    plt.figure(figsize=(15, 5))
    plt.subplot(121)
    plt.plot(history_dict['sparse_categorical_accuracy'])
    plt.plot(history_dict['val_sparse_categorical_accuracy'])
    plt.title('Accuracy vs. epochs')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.xticks(np.arange(len(history_dict['sparse_categorical_accuracy'])))
    ax = plt.gca()
    ax.set_xticklabels(1 + np.arange(len(history_dict['sparse_categorical_accuracy'])))
    plt.legend(['Training', 'Validation'], loc='lower right')
    plt.subplot(122)
    plt.plot(history_dict['loss'])
    plt.plot(history_dict['val_loss'])
    plt.title('Loss vs. epochs')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.xticks(np.arange(len(history_dict['sparse_categorical_accuracy'])))
    ax = plt.gca()
    ax.set_xticklabels(1 + np.arange(len(history_dict['sparse_categorical_accuracy'])))
    plt.legend(['Training', 'Validation'], loc='upper right')
    plt.show()

plot_history(history_dict)

Figure 3: Accuracy and Loss evolution over several epochs of the RNN model. Character-level recurrent neural networks in practice: comparing training and sampling schemes. Neural Computing and Applications, 31(8):4001–4017. Generating text with recurrent neural networks. PUBG Winner Ranking Prediction using R Interface ‘h2o’ Scalable Machine Learning Platform. Deep Feed Forward Neural Networks: The deep neural network is the same as the non-deep neural network; the difference occurs in the number of hidden layers. Deep neural networks have between 2 and 8 hidden layers; the more hidden layers, the more complex the neural network is.
The structure of the deep neural network is the same as the non-deep one, with an input layer, hidden layers & an output layer. Multi-Layer Feed Forward Neural Network (Image By Author). LITERATURE REVIEW: The experimental section of this research paper consists of various tests and results obtained by performing iterations. The best algorithm among all was the Deep Neural Network, with the lowest error value for testing data. A Gentle Introduction to Ensemble Learning Algorithms. In this tutorial, you will discover the three standard ensemble learning techniques for machine learning. Tutorial Overview: This tutorial is divided into four parts; they are: Standard Ensemble Learning Strategies; Bagging Ensemble Learning; Stacking Ensemble Learning; Boosting Ensemble Learning. Standard Ensemble Learning Strategies: Ensemble learning refers to algorithms that combine the predictions from two or more models. Bagging Ensemble Learning: Bootstrap aggregation, or bagging for short, is an ensemble learning method that seeks a diverse group of ensemble members by varying the training data. Replacement means that if a row is selected, it is returned to the training dataset for potential re-selection in the same training dataset. Summary: In this tutorial, you discovered the three standard ensemble learning techniques for machine learning. Training Your Own Message Suggestions Model Using Deep Learning. Every message is assigned a conversation ID, which is useful as it helps in training only relevant responses against an input text. For training our model, we will only be using the first 2 columns in the dataset, i.e. conversation_id and message. Filtering Input and Target Text: Once the input and target text arrays are populated, we tokenize and pad every item of both arrays. Post training, text prediction can be done “greedily” or by using beam search. Using such techniques, the input text could be normalized and then fed to the model, hopefully leading to better predictions.
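The bootstrap sampling with replacement that bagging relies on, as described in the ensemble excerpt above, can be sketched in plain Python (a 10-row dataset is simulated by its row indices):

```python
import random

random.seed(42)
rows = list(range(10))  # indices of a 10-row training dataset

# A bootstrap sample: draw 10 rows, returning each row to the pool after
# selection, so the same row can be picked more than once.
bootstrap = [random.choice(rows) for _ in rows]
print(sorted(bootstrap))  # typically contains duplicates

# Rows never drawn are "out-of-bag" and can be used to evaluate that member.
out_of_bag = set(rows) - set(bootstrap)
print(out_of_bag)
```

Each ensemble member is trained on its own bootstrap sample, which is what makes the members diverse.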
The Big Issue With Softmax. Softmax is one of the most commonly used activation functions in machine learning. It is usually the last layer of a neural network, transforming the final high-level feature vector into a probability distribution. The Softmax function. The issue with Softmax: Suppose that our two-dimensional feature vector contains the scores regarding the input’s class for the labels “cat” and “dog”. However, because the exponential function turns addition into multiplication, the Softmax is insensitive to translating the feature vector by the same value in every component! Soft-launching an AI/ML Product as a Solo Founder. Object Detection API: TensorFlow 2 Detection Model Zoo is a collection of pre-trained models (COCO2017 dataset). The Object Detection API uses protobuf files to configure training and evaluation (pictured below). This is what my experiment tracking looked like when I ported AnyNet from Facebook Research’s model zoo to TensorFlow 2 / Keras. Comment below if you’d like to read a deep dive into my workflow for experiment tracking and training computer vision models! My hope is that by explaining my decision-making process, I demonstrate where these foundational skills support a successful AI/ML product strategy. 6 Useful Metrics to Evaluate Binary Classification Models. Therefore, people often summarise the confusion matrix into the below metrics: accuracy, recall, precision and F1 score. Luckily, precision and recall are two metrics that consider False Positives and False Negatives. Well, Newt would have to ask himself whether reducing False Negatives is more or less important than minimising False Positives. Remember I said earlier that False Positives and False Negatives have different impacts? This metric is often useful for evaluating classification models when neither precision nor recall is clearly more important.
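The softmax translation insensitivity noted above follows from exp(x + c) = exp(x)·exp(c): the constant factor cancels in the ratio. A quick numerical check (the scores are invented):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0])    # "cat" vs "dog" scores
shifted = scores + 100.0         # every component translated by the same value

print(softmax(scores))                                 # ≈ [0.731, 0.269]
print(np.allclose(softmax(scores), softmax(shifted)))  # True
```

So only the differences between scores matter to softmax, not their absolute magnitudes.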
Alexey Grigorev on His Career, Data Science, and Writing. Then, in 2018, you joined OLX as a senior data scientist, and now you’re a lead data scientist. If someone can do all these things, they can 100% call themselves a senior data scientist. A mid-level data scientist will also be able to train a model, or work with data engineers on pipelines. Eugene: To summarize, a senior data scientist is someone you can trust. Also, for the last 2–3 years, I’ve been trying to learn things outside of data science. The exploration-exploitation dilemma. In very simple terms, this is how we could summarise the exploration-exploitation dilemma. Exploitation and exploration are two possible behaviours, each with pros and cons, when facing a decision-making problem. On one hand, exploitation consists of taking the decision assumed to be optimal with respect to the data observed so far. In this post, we will give some key notions related to the exploration-exploitation dilemma and present four of the most popular methods to handle this problem: e-greedy methods, optimistic initialisation, upper confidence bounds and Thompson sampling. Outline: In the first section, we will discuss the exploration-exploitation dilemma a bit more and introduce the well-known multi-armed bandit framework. Weather forecasting with Machine Learning, using Python. Physicists define climate as a “complex system”. With the computational developments of recent years, Machine Learning algorithms are certainly part of them. The challenge I want to discuss is based on forecasting the average temperature using traditional machine learning algorithms: Auto-Regressive Integrated Moving Average (ARIMA) models. Machine Learning Algorithms: Let’s consider the 1992–2013 period and plot it. After performing and plotting the train/test split, we fit the Machine Learning algorithms: the ARIMA models.
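Of the four bandit methods listed in the exploration-exploitation excerpt, e-greedy is the simplest to sketch: exploit the best-looking arm most of the time and explore a random arm with probability ε. The arm reward probabilities below are invented:

```python
import random

random.seed(0)
true_means = [0.2, 0.5, 0.8]   # hidden reward probability of each arm
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]       # running mean reward per arm
epsilon = 0.1

for _ in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)                     # explore a random arm
    else:
        arm = max(range(3), key=lambda a: values[a])  # exploit the best so far
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(values.index(max(values)))  # should settle on arm 2, the best one
```

The ε parameter is the knob for the dilemma itself: higher ε means more exploration, lower ε means more exploitation.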
Conclusions: These methods are extremely easy to adopt as they don’t require any specific computational power, unlike Deep Learning methods (RNN, CNN…). Neural Networks for Survival Analysis in R. survivalmodels: The package {survivalmodels} currently contains the neural networks:
* CoxTime⁷
* DeepHit⁸
* DeepSurv⁹
* Logistic-Hazard¹⁰ ¹¹
* PCHazard¹¹
* DNNSurv¹²
The first five of these use {reticulate}¹³ to connect to the great Python {pycox}¹⁴ package, written by Håvard Kvamme, which means you can use neural networks in R with the speed of Python. We are going to train and tune the Pycox neural networks in {survivalmodels} (all but DNNSurv). Hyper-parameter configurations: Training and tuning neural networks is an art, but for this article we are keeping it simple. Pre-processing: All neural networks require some data pre-processing. Summary: In this demonstration we used neural networks implemented in Python and interfaced through {survivalmodels}. An Introduction to Reinforcement Learning. Introduction: Reinforcement learning (RL) is an area of machine learning. You might have already worked with supervised learning (where the data is labelled), and unsupervised learning (unlabeled data, e.g., for generative techniques). These two broad fields are complemented by the domain of reinforcement learning. In one informal sentence, Reinforcement learning learns to achieve a goal through interaction. To come up with a more precise definition of Reinforcement Learning we need a mathematical approach. Design patterns in machine learning. According to its definition, a design pattern is a reusable solution to a commonly occurring problem. By the 2000s, design patterns — especially the SOLID design principles for OOP — were considered common knowledge to programmers. Design patterns, however, have not yet been extended to deal with the challenges of this new era.
One such design pattern is continuous evaluation of performance: you expect drift to happen and, hence, design the system to notice it as soon as possible. The SOLID design principles of machine learning The reason I’m writing about design patterns is that this field has reached the level of maturity where we should not only share our best practices but also abstract them into real design patterns. How You Should Read a Machine Learning Paper Introduction Throughout this post, we will review the most important principles you should take into account when reading a machine learning paper, and whether you actually need to read papers to advance in your path as a Machine Learning Engineer/Practitioner. This last point is crucial: for a scientific paper to be considered as such, it must have been reviewed and verified to assess its quality. Don’t (just) Read Papers Don’t just read papers; be multimodal and go to different sources when learning about a topic that interests you. Leveraging Geolocation Data for Machine Learning: Essential Techniques Location data is an important category of data that you frequently have to deal with in many machine learning applications. Location data typically provides a lot of extra context to your application’s data. Geospatial data (used to augment location information) We could augment our dataset by adding external location-based data (either publicly available or from a third party). The plot of location data (Image by Author) By themselves, the data points do not carry enough context.
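One common way to give raw coordinate points more context, in the spirit of the augmentation discussed above, is to derive distance features from them. A minimal sketch using the standard haversine great-circle formula; the Berlin reference point and the sample coordinate are hypothetical, not from the article.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical engineered feature: distance from a data point to a city centre.
berlin = (52.52, 13.405)
point = (52.40, 13.05)
dist = haversine_km(*point, *berlin)
```

A feature like `dist` (distance to a point of interest) often carries far more signal for a model than raw latitude/longitude columns.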
This means that dealing with location data is mostly about data preparation rather than about building any location-specific machine learning or deep learning models. An Introduction to Apache Airflow What is Apache Airflow? Airflow DAGs Page From the DAGs view, you are able to see all DAGs that are currently registered in Airflow. Install and Set Up Airflow Install Airflow in a new airflow directory: (venv) % mkdir airflow && cd airflow, then (venv) % pip install apache-airflow. Set up the proper directory structure and create a new airflow folder. Airflow uses a SQLite database to keep track of metadata for all of the Airflow DAGs. Run (venv) % airflow db init, and now you should see a bunch of files inside the airflow directory. How do machines plan? — An introduction States, actions and rewards (costs) It is easiest to visualise what a typical planning problem would look like in a navigation setting. In a more general case, we can model a probabilistic state transition as a Markov Decision Process (MDP), which we will discuss next. In many cases, the state transition and reward dynamics of the environment are unknown until you try things out and see what happens, so it is difficult to plan ahead without some trial and error. States s ∈ S (we denote the set of possible states as S, and each state as s), as described before, are a unique encoding of a specific situation. Usually, the concepts of states, actions and rewards are introduced as part of the MDP formulation, used in reinforcement learning. To ROUGE or not to ROUGE? … what the ROUGE score is. A large overlap of n-grams results in a high ROUGE score and a low overlap — in a low ROUGE score. For text summarization we want to look at ROUGE longest common subsequence (ROUGE-L) as this will give us the longest overlap.
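ROUGE-L, mentioned above, scores a candidate summary by the longest common subsequence (LCS) it shares with a reference. A minimal sketch of the standard LCS-based F-measure; the two example sentences are made up, and note how swapping a single word barely moves the score, which is exactly the factual-correctness blind spot the article raises.

```python
def lcs_len(a, b):
    # Dynamic-programming longest-common-subsequence length over token lists.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(reference, candidate):
    # ROUGE-L F1: harmonic mean of LCS-based recall and precision.
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return 2 * precision * recall / (precision + recall)

# "lay" vs "sat" changes the meaning, yet the score stays high (5/6).
score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
```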
We see that here ROUGE failed to give us a good indication: it still shows a high ROUGE score (77%) for the machine-written summary, but this summary is actually factually incorrect. Extract Tables from PDF file in a single line of Python Code Data can be present in any format; data collection and data preparation are important components of a model development pipeline. Camelot uses two table parsing techniques, i.e., Stream and Lattice, to extract tables from PDF documents. The Lattice algorithm steps to find the tables in the PDF documents are: convert the PDF document into an image using Ghostscript. Spanning cells or merged cells are detected by using the line intersections and line segments. tables = camelot.read_pdf('table.pdf', password='*******') — camelot.read_pdf is the single line of Python code required to extract all tables from the PDF file. How to Build an Impressive Data Science Resume? This article will explore some simple strategies to significantly improve the presentation as well as the content of data science resumes. The same logic would apply to any data science job position; hence the resume plays a critical role in getting shortlisted. I personally advocate removing the career objective from the resume and instead using that space for a better profile summary. Use bullet points Make sure the details you include in your resume are in bullet points, be it the profile summary or professional/project experience. Feature Selection: How To Throw Away 95% of Your Data and Get 95% Accuracy This test is computed as the ratio: between-group variability / within-group variability, where the group is the target class.
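The F-test ratio just described can be computed directly. Below is a minimal sketch on made-up feature values for two target classes; scikit-learn's f_classif computes this same one-way ANOVA statistic per feature, so the hand-rolled version is only for illustration.

```python
from statistics import mean

def f_ratio(values, labels):
    # One-way ANOVA F statistic: between-group variability / within-group variability,
    # where the groups are the target classes.
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values()) / (n - k)
    return between / within

# Hypothetical feature values: the two classes are well separated,
# so the F ratio is large and the feature would rank highly.
x = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8]
y = [0, 0, 0, 1, 1, 1]
f = f_ratio(x, y)
```

A high `f` means the feature separates the target classes far more than it varies within each class, which is exactly why the test is useful for ranking features.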
from boruta import BorutaPy; from sklearn.ensemble import RandomForestClassifier; boruta = BorutaPy(estimator=RandomForestClassifier(max_depth=5), n_estimators='auto', max_iter=100).fit(X_train, y_train). 3.6 MRMR MRMR (which stands for “Maximum Relevance Minimum Redundancy”) is an algorithm designed in 2005 for feature selection. from mrmr import mrmr_classif; mrmr = mrmr_classif(X_train, y_train). All these algorithms provide a “ranking” of the features (except for Boruta, which has a yes/no outcome). Thus, in the case of MNIST, we could throw away 95% of our data and still get more than 95% accuracy (which corresponds to an area under ROC of 99.85%!). An effective feature selection allows you to build data pipelines that are more efficient in terms of memory, time, accuracy, interpretability and ease of debugging. The Map Function in Python The iterable we pass in to the map function is the message, msg. The function we pass in to the map function will be a lambda function, which takes each element from the msg string and, if the element is a letter in the alphabet, replaces it with the shifted letter depending on the n value we pass in. Thus the letter z wraps around: with n = 2, z is replaced with the letter at index 1 in the alphabet, which is b: map(lambda x: abc[(abc.index(x)+n)%26] if x in abc else x, msg). Remember that the map function will return a map object. The function then returns this string. To decrypt a message, we can use the following decrypt function (notice how we subtract n from abc.index(x) instead of adding it). How to start being an active learner? Active learning methods aim to choose a subset of the most informative and representative images from our unlabeled set of images. In my experience, active learning methods can also be divided into task-specific or task-agnostic. Cons: might be harder to understand, requires a deeper theoretical understanding. Personal insight: I like eliminating the coupling between active learning and the specific task.
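As a concrete flavour of the task-agnostic selection mentioned above, here is a minimal uncertainty-sampling sketch: rank unlabeled images by the entropy of the current model's predicted class probabilities and pick the top k. This is one classic acquisition strategy, not necessarily the specific methods of the cited papers, and the image IDs and probabilities are made up.

```python
import math

def entropy(probs):
    # Shannon entropy of a predicted class distribution; higher = more uncertain.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(predictions, k):
    # predictions: {image_id: list of class probabilities} from the current model.
    # Rank by uncertainty and return the k samples most worth labeling.
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]

# Hypothetical model outputs on three unlabeled images.
preds = {
    "img_a": [0.98, 0.02],  # confident
    "img_b": [0.55, 0.45],  # uncertain -> most informative
    "img_c": [0.80, 0.20],
}
chosen = select_most_informative(preds, 1)
```

Because the strategy only consumes predicted probabilities, it is decoupled from the downstream task, which is precisely the appeal of task-agnostic methods.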
For now, you can take a look at [3] and [6] and start being an active learner! References: [1] Semi-supervised Active Learning for Instance Segmentation via Scoring Predictions [2] Learning Loss for Active Learning [3] Active Learning for Segmentation Based on Bayesian Sample Queries [4] Learning to Sample: An Active Learning Framework [5] Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation [6] Cost-Effective Active Learning for Melanoma Segmentation [7] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning [8] The Relevance of Bayesian Layer Positioning to Model Uncertainty in Deep Bayesian Active Learning [9] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning Flashlight: Fast and flexible machine learning in C++ Deep learning and ML frameworks are good at what they do — but altering the internals of these frameworks has traditionally proved difficult. We wrote Flashlight from the ground up in modern C++ because the language is a powerful tool for doing research in high-performance computing environments. Flashlight’s modular internals make it a powerful research framework for research frameworks. By making it easier to rapidly iterate on custom low-level code, Flashlight opens the door to research that pushes the limits of performance. Get it on GitHub: Flashlight: A C++ standalone library for machine learning Implementing the Perceptron Algorithm in Python Perceptron Let us try to understand the Perceptron algorithm using the following data as a motivating example. Then, for binary classification in Logistic Regression, we needed to output probabilities between 0 and 1, so we modified the hypothesis as sigmoid(theta.X).
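The modified hypothesis mentioned above, sigmoid(theta.X), can be sketched in a few lines; the theta and x values below are illustrative, not from the article's dataset.

```python
import math

def sigmoid(z):
    # Squashes theta.X into (0, 1) so the output can be read as a probability.
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    # Logistic-regression hypothesis: sigmoid(theta . x).
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Illustrative weights and input: theta.X = 0.5*2 - 0.25*4 = 0, so p = 0.5.
p = hypothesis([0.5, -0.25], [2.0, 4.0])
```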
For the Perceptron algorithm, we apply a different function over theta.X: the Unit Step Function (definition and figures from Andrew Ng’s course). Unlike Logistic Regression, which outputs a probability between 0 and 1, the Perceptron outputs values that are exactly 0 or 1. Looking at the Unit Step Function graphically (Unit Step Function; source), we can see that for z ≥ 0, g(z) = 1 and for z < 0, g(z) = 0. Let’s code the step function. Data Visualization using Grammar Graphics Data Visualization is visually representing the data in order to find out certain patterns or outliers. Creating plots using a grammar is easy with plotnine, because it makes custom plots straightforward and can also create simple plots. In this article, we will learn how to use plotnine to create different bars and charts. Let’s get started… Installing required libraries We will start by installing plotnine using pip. BSTS and CausalImpact: Analysing Dell XPS Laptop Popularity by Web Page Views Disclaimer: This article is written on an “as is” basis and without warranty. One way of determining consumer interest in a particular product range is through analysing web page views. Apart from looking at sales figures, one possibility would be to look at web page views for the search term “Dell XPS”. Had CausalImpact been used to compare Dell XPS page views to page views for another product, then the results might have been different. Finally, web page views from just one source have been used as the benchmark.
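Picking up the perceptron article's invitation above to "code the step function": a minimal sketch of the unit step g(z) and the resulting hard 0/1 prediction. The theta.X naming follows the article; the example weights are made up.

```python
def unit_step(z):
    # g(z) = 1 for z >= 0, else 0 -- the perceptron's activation function.
    return 1 if z >= 0 else 0

def perceptron_predict(theta, x):
    # Unlike logistic regression's sigmoid, the output is exactly 0 or 1.
    z = sum(t * xi for t, xi in zip(theta, x))
    return unit_step(z)
```

For example, with illustrative weights `[1.0, -1.0]` and input `[2.0, 1.0]`, z = 1 ≥ 0 and the perceptron outputs 1.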
Causal Machine Learning for Econometrics: Causal Forests An introduction to causal machine learning for econometrics, including a Python tutorial on estimating the CATE with a causal forest using EconML. Equity is not the same principle as equality. In this article, I will focus on a specific technique, causal forests, a causal machine learning method developed by the economists Susan Athey and Stefan Wager. The code snippet below shows how to reset the parameters of the causal forest and subsequently fit the causal forest model again, in order to calculate SHAP values. Currently, traditional econometrics is being used to ask traditional economic questions, and the potential of causal machine learning is being overlooked. My hope is that the gap can be bridged; I optimistically believe that causal machine learning advancements in theoretical econometrics can be applied to social research. Tasks & Tools List for Building Scalable ML Pipelines Building an ML pipeline is an overwhelming piece of business that requires many different components to be integrated in a seamless manner. Simply put, an ML pipeline is a sequence of tasks that are performed to move ML model(s) from an experimental Jupyter Notebook (or Google Colab) to a robust application in production. Versioning datasets using tools like DVC. Getting detailed performance metrics using TensorFlow Model Analysis (TFMA) and checking the model fairness indicators. Main Tasks: Automate the ML pipeline by setting up the pipeline orchestrator that underpins all the components above.
A Statistical Analysis of Social Factors That Affect Citizen Happiness Abstract This article aims to explore various social, urban and national factors that may affect the happiness of citizens around the globe. Since our primary goal was to predict happiness scores and the factors affecting them, we decided to have a deeper look at the correlations of each possible contributing factor with the happiness score (the dependent variable), to get an idea of which factors are more likely to affect the happiness score strongly. The five-number summary of the happiness score shows that it ranges between 2.693 and 7.769. The mean happiness score is 5.379 and the median happiness score is 5.322. Similarly, the distribution of happiness scores below also shows that there is not a single country with a perfect happiness score. Creating a simple machine learning demo with GradioML Deploying a machine learning model is a sometimes overlooked aspect of data science. Traditionally, experts such as machine learning engineers and DevOps specialists collaborate with data scientists to put these models into production. Instead of going all-out on this deployment effort, there are times when a simple user interface demo may get the job done of communicating the contents of the machine learning model. Using these tools to receive various types of data as input, machine learning tasks such as classification and regression can easily be demoed. Conclusion In this exercise, an image classification model using a CNN was built and deployed as a simple demo using a tool called GradioML. Multi-model deployment in AWS SageMaker | MLOps | PyTorch If you are reading through this article, I assume that you are aware of AWS SageMaker and can deploy models on the platform.
Things you need for the setup: an AWS account with SageMaker access, ECR docker container access, and a SageMaker notebook or local Jupyter notebook environment. How does multi-model deployment work? As a next step, multiple models are uploaded into the S3 bucket, and endpoints are created as in the regular process. The process is a little messy when we are trying to deploy multiple models. Below are the steps I have followed to make it easier to deploy multiple models. How to Implement Gradient Descent Optimization from Scratch Tutorial Overview This tutorial is divided into three parts; they are: Gradient Descent, Gradient Descent Algorithm, and Gradient Descent Worked Example. Gradient Descent Optimization Gradient descent is an optimization algorithm. There are many extensions to the main approach that are typically named for the feature added to the algorithm, such as gradient descent with momentum, gradient descent with adaptive gradients, and so on. Gradient descent is also the basis for the optimization algorithm used to train deep learning neural networks, referred to as stochastic gradient descent, or SGD. Gradient Descent Algorithm In this section, we will take a closer look at the gradient descent algorithm. Gradient Descent Worked Example In this section, we will work through an example of applying gradient descent to a simple test optimization function. AI predicts effective drug combinations to fight complex diseases faster Today, Facebook AI and the Helmholtz Zentrum München are introducing a new method that will help accelerate discovery of effective new drug combinations. We’ve built the first single AI model that predicts the effects of drug combinations, dosages, timing, and even other types of interventions, such as gene knockout or deletion. CPA uses a novel self-supervision technique to observe cells treated with a finite number of drug combinations and predicts the effect of unseen combinations.
So far, there hasn’t been an effective approach to predict the effects of unseen drug combinations and other perturbations. Then, it independently recombines the attributes to predict their effects on the cell’s gene expression. Predicting the Future: Learn to Forecast with ARIMA Models XTS Objects If you’re not using xts objects to perform your forecasting in R, then you are likely missing out! The first thing you’ll need to do is create your date index. train <- sales_xts[index(sales_xts) <= "2015-07-01"]; validation <- sales_xts[index(sales_xts) > "2015-07-01"]. Time to Build a Model The auto.arima function incorporates the ideas we just spoke about to approximate the best ARIMA model. From here you will plot the validation data and then overlay the forecast on the plot. forecast <- forecast(model, h = 121); forecast_dates <- seq(as.Date("2015-09-01"), length = 121, by = "day"); forecast_xts <- xts(forecast$mean, order.by = forecast_dates); plot(validation, main = 'Forecast Comparison'); lines(forecast_xts, col = "blue"). Conclusion I hope this was a helpful introduction to ARIMA forecasting. What You Need To Get Started With Quantum Machine Learning The one thing you need to get started with quantum machine learning is not a degree in physics. Quantum machine learning is the use of quantum computing to solve machine learning problems. You wouldn’t have decided to look into this post if you hadn’t considered learning quantum computing, machine learning, or quantum machine learning. What you need to get started with quantum machine learning is not a degree in physics or math.
Whether you’re just getting started with quantum computing and machine learning or you’re already a senior machine learning engineer, Hands-On Quantum Machine Learning With Python is your comprehensive guide to getting started with quantum machine learning — the use of quantum computing for the computation of machine learning algorithms. Use Gaussian Mixture Models to Transform User-Item Embedding and Generate Better User Clusters I. In short, the method is based on learning a Gaussian Mixture model from item embedding data and then using it to generate new user vectors based on the probability of each user being part of each item cluster distribution. Cluster item vectors by learning a mixture model based on the assumption that each item was sampled from a Gaussian distribution. In step 2, we iterate over our user vectors and estimate the probability of each user vector being part of each item cluster. The method first learns a Gaussian mixture model based on item data, which essentially assumes that the data was generated from a mix of Gaussian distributions. 18 Valuable Things to Learn Next as a Data Scientist №1: Unix-like operating systems One aspect of Data Science work that I think is very often overlooked is the use of Unix-like operating systems. While some Data Scientists might include the functions of distributions as vital information for Data Science, I disagree. №4: Data collection A vital portion of the Data Science process is always going to be actually getting data. №17: Data Ethics Another important concept that I think a lot of Data Scientists should get into is Data Ethics. Data has become something of a currency; data is sold, data is exchanged, and data is valued.
Train multiple Time Series Forecasting Models in one line of Python Code Automated Machine Learning (AutoML) refers to automating some of the components of the machine learning pipeline. It can train multiple time series forecasting models, including ARIMA, SARIMAX, FB Prophet, VAR, etc., in just one line of Python code, and then choose the best one for predictions. Some of the features of the Auto-TS library are: finds the optimal time series forecasting model using genetic programming optimization; trains naive, statistical, machine learning, and deep learning models, with all possible hyperparameter configurations, and cross-validation. Installation: Auto-TS can be installed from PyPI using the command pip install autots. Usage: This library only trains time series forecasting models. How To Train an LSTM Model ~30x Faster Using PyTorch with GPU Table of Contents: Introduction, Getting Started with PyTorch on Saturn Cloud, Setting up the LSTM Model Training, Model Training and GPU Comparison, Model Inference, Final Thoughts, References. Introduction Disclaimer: I worked with Saturn Cloud to make this example. To follow this tutorial, follow the link to Saturn Cloud [3], and click on the button on the homepage “Try Saturn Cloud For Free”. Model Training and GPU Comparison The default setting in the code is set to GPU. Spoiler alert: GPU is considerably faster, so I recommend using the default train() function if you want to skip the comparison and just use that. Using %%time, we can see that training with a GPU in PyTorch is nearly 30 times faster — 26.88 times, to be more specific. PyTorch TabNet: integration with MLflow TabNet is a modern neural network architecture dedicated to structured data, in tabular form.
The books introducing the development of neural networks in Python are full of examples dealing with tabular data. In the kinds of examples mentioned above, the network architecture is almost always a fully connected network. TabNet is an interesting solution, and there is a nice open-source implementation based on PyTorch. Counterfactual predictions under runtime confounding Predictions used to inform medical treatment decisions may not have access to all confounding factors. How can we make counterfactual predictions using only a subset of confounding factors? What is “runtime confounding”? Runtime confounding occurs when all confounding factors are recorded in the training data, but the prediction model cannot use all confounding factors as features due to sensitivity, interpretability, or feasibility requirements. These assumptions enable us to identify our target estimand as $$u(v) = \mathbb{E}[\mathbb{E}[Y \mid A = a, V = v, Z = z] \mid V = v].$$ This suggests that we can estimate an outcome model $$\mu(v,z) := \mathbb{E}[Y \mid A = a, V = v, Z = z]$$ and then regress the outcome model estimates on $$V$$. The PL approach requires us to efficiently estimate a more challenging high-dimensional target $$\mathbb{E}[Y \mid A = a, V = v, Z = z]$$ when our target is a lower-dimensional quantity $$u(V)$$. Google AI Blog: Presenting the iGibson Challenge on Interactive and Social Navigation This year, Stanford and Google are proud to announce a new version of the iGibson Challenge on Interactive and Social Navigation, one of the 10 active visual challenges affiliated with the Second Embodied AI Workshop at CVPR 2021. In addition, this year’s interactive and social iGibson challenge explores interactive navigation and social navigation — how robots can learn to interact with people and objects in their environments — by combining the iGibson simulator, the Google Scanned Objects Dataset, and simulated pedestrians within realistic human environments.
New Features of the iGibson 2021 Dataset To facilitate research into techniques that address these problems, the iGibson Challenge 2021 dataset provides simulated interactive scenes for training. The challenge is implemented in Stanford’s open-source iGibson simulation platform, a fast, interactive, photorealistic robotic simulator with physics based on Bullet. For more details on participating, please check out the iGibson Challenge Page. Estimating 3D pose for athlete tracking using 2D videos and Amazon SageMaker Studio The team surpassed our expectations, developing a 3D pose estimation pipeline using 2D videos captured with mobile phones in just two weeks. Recent advances in computer vision and deep learning have enabled scientists to explore pose estimation in a 3D space, where the Z-axis provides additional insights compared to 2D pose estimation. However, building a 3D pose estimation model from scratch is challenging because it requires imaging data along with 3D labels. 3D pose estimation We employed a state-of-the-art 3D pose estimation algorithm encompassing a camera-distance-aware top-down method for multi-person pose estimation per RGB frame, referred to as 3DMPPE (Moon et al.). We used two metrics commonly used for both 2D and 3D pose estimation, as described in the next section on evaluation. Implement checkpointing with TensorFlow for Amazon SageMaker Managed Spot Training Finally, we see the savings that we achieved by running our training job on Spot Instances using Managed Spot Training in SageMaker. Managed Spot Training uses EC2 Spot Instances to run training jobs instead of On-Demand Instances. Managed Spot Training is available in all training configurations: all instance types supported by SageMaker; all models (built-in algorithms, built-in frameworks, and custom models); all configurations (single instance training and distributed training). Interruptions and checkpointing There’s an important difference when working with Managed Spot Training.
TensorFlow image classification model with Managed Spot Training To demonstrate Managed Spot Training and checkpointing, I guide you through the steps needed to train a TensorFlow image classification model. To run a Managed Spot Training job, you need to specify a few additional options in your standard SageMaker Estimator function call: use_spot_instances specifies whether to use SageMaker Managed Spot Training for training. AWS launches free digital training courses to empower business leaders with ML knowledge Today, we’re pleased to launch Machine Learning Essentials for Business and Technical Decision Makers — a series of three free, on-demand, digital-training courses from AWS Training and Certification. These courses are intended to empower business leaders and technical decision makers with the foundational knowledge needed to begin shaping a machine learning (ML) strategy for their organization, even if they have no prior ML experience. With the new Machine Learning Essentials for Business and Technical Decision Makers course, we’re making a portion of the AWS Machine Learning Embark curriculum available globally as free, self-paced, digital-training courses. The AWS Machine Learning Embark program has already helped many organizations harness the power of ML at scale. The Met Office partnered with the Amazon ML Solutions Lab through the AWS Machine Learning Embark program to explore novel approaches to solving this. Math Animations, Irreproducible Research, and Telling Stories with Data “Story” is a word that sometimes feels overused — including in the context of data science. Not every slide deck with a clear structure and useful takeaways is a story, and that’s ok. But as Marie Lefevre argues in her post about compelling data storytelling, there are time-tested ways to make any analysis memorable and engaging, so why settle for dry and perfunctory?
When you hear “reproducible research,” do you run straight to your kitchen to throw some popcorn into the microwave? Maybe you should, if the next item on your agenda is Vincent Vanhoucke’s call to recognize the value of irreproducible research, based on his robotics research work at Google. Bias-variance tradeoff in machine learning: an intuition A bad model performs badly everywhere, an average model performs on par in most circumstances, whereas a fine-tuned model performs great in one situation but not so well in others. Bias-variance tradeoff Let’s now connect this intuition with the formal concept of the bias-variance tradeoff. In machine learning, each model is specified with a number of parameters that determine model performance. An example Let’s now extend the intuition and the concept of the bias-variance tradeoff with an example in Python. 2) Regularization: a technique to optimize model performance by adding a small bias to the cost function. Transformer-Based Real-Time Recommendation at Scribd Sequential deep learning models have proven benefits in the recommendation industry for generating more relevant and dynamic recommendations based on users’ past sequences of actions. Serving using User and Item Embeddings: As explained above, the last latent output vector from the encoder is called the user embedding. These user embeddings are then passed through a feed-forward network to compute prediction scores for all items. The dot product between user embeddings and item embeddings then gives the actual recommendation score. Features: For each interaction in a user’s sequence of interactions, we basically have 3 types of features: interaction features, user features and item features. 5 New Data Science Books That You Should Consider Reading Over the past few years, many data science books have been published in almost any language you’re comfortable reading.
№1: Data Science: The Complete Guide To Data Science For Beginners By Sabra Deal It’s undeniable that data science continues to attract new and intelligent people every day. Rather, you can think of this book as a detailed, high-level data science roadmap for anyone confused about what data science is and what it takes to become a data scientist. №4: Data Analytics Guide For Beginners By Hosea Droski As I always say, data science is all about the data, and one of the essential steps in any data science project is data collection and analytics. Today, I listed 5 new data science books that I believe are very promising and that everyone should give a read. Why You Are *Not* a (Data) Scientist Is Bill Gates a scientist? Image created from screenshots of Wikipedia articles. By now, you already know that I have a fundamental problem with the term “Data Scientist” and how it obscures the nuances and intricacies required to fulfill this job. Data science is related to data mining, machine learning and big data. — Wikipedia. In my view, very broadly, data science deals with the “what” while computer science deals with the “how” in terms of computation. I do understand that data science needs the rigor of a scientist: it needs to be fact-based with rigid process, coupled with extreme skepticism of one’s own results. Efficient Machine Learning — Why You Should Think About the “Customer” in Your Algorithm Looking at traditional metrics alone will leave you high and dry when users find your system unusable. It’s easy to get caught up in building a lovely new machine learning algorithm. If you genuinely believe that your approach is 100% accurate, either you don’t need machine learning or you need to rethink what you’re working on!
Consider how your audience is using the model Who is going to be using your machine learning algorithm? In the online setting, your users need results and they need them now. Learning from Audio: Pitch and Chromagrams Introduction Now that the idea of a spectrogram is fully understood, we want to delve deeper into various structures beyond frequency over time. The higher the sound, the higher the pitch; the lower the sound, the lower the pitch. To fully understand pitch, we need to understand pitch classes and octaves. If we start at C and go up one black key, we hit what’s called C# (pronounced “C sharp”). About Chromagrams Now that we understand pitch in music, we can dive into the chroma filters which act as the basis of our chromagrams. Time Series Forecasting with PyCaret Regression Module PyCaret Regression Module The PyCaret Regression Module is a supervised machine learning module used for estimating the relationships between a dependent variable (often called the ‘outcome variable’, or ‘target’) and one or more independent variables (often called ‘features’, or ‘predictors’). PyCaret Regression Module’s default settings are not ideal for time series data because they involve a few data preparatory steps that are not valid for ordered data (data with a sequence, such as time series data). The PyCaret regression module by default uses k-fold random cross-validation when evaluating models. The following section in this tutorial will demonstrate how you can easily change the default settings in the PyCaret Regression Module to make it work for time series data. Initialize Setup Now it’s time to initialize the setup function, where we will explicitly pass the training data, test data, and cross-validation strategy using the fold_strategy parameter. Similarity Metrics in NLP The first of those is the dot product. So, vectors’ orientation is often seen as being just as important (if not more so) as distance.
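A tiny numeric illustration of why orientation matters (the vectors are made up): the raw dot product rewards magnitude, while normalising by vector length (cosine similarity) ranks the same-direction vector highest.

```python
import math

def dot(a, b):
    # Plain dot product: sensitive to both angle and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product normalised by the vectors' lengths: depends only on orientation.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

u = [1.0, 2.0]
v = [2.0, 4.0]   # same direction as u, larger magnitude
w = [10.0, 5.0]  # different direction, much larger magnitude
```

Here `dot(u, w)` beats `dot(u, v)` purely because `w` is long, while `cosine(u, v)` is 1.0 (identical orientation) and ranks `v` first.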
The dot product is calculated as a·b = |a||b|cosθ. The dot product considers the angle between vectors: where the angle is ~0, the cosθ component of the formula equals ~1. So, a higher dot product correlates with closer orientation. It is not normalized — meaning larger vectors will tend to score higher dot products, despite being less similar. Build a Transformer in JAX from scratch Build a Transformer in JAX from scratch. In this tutorial, we will explore how to develop a Neural Network (NN) with JAX. So are you ready to build a Transformer with JAX and Haiku? Down the road, we will need to transform the forward_fn function into a pure function using hk.transform so that we can take advantage of automatic differentiation, parallelization etc. The model will be a pure forward_fn function transformed by hk.transform: forward_fn = build_forward_fn(vocab_size, d_model, num_heads, num_layers, dropout_rate); forward_fn = hk.transform(forward_fn). Conclusion: In this article, we saw how one can develop and train a vanilla Transformer in JAX using Haiku. What Is a Gradient in Machine Learning? For example, deep learning neural networks are fit using stochastic gradient descent, and many standard optimization algorithms used to fit machine learning algorithms use gradient information. In this tutorial, you will discover a gentle introduction to the derivative and the gradient in machine learning. This idea of gradient from algebra is related, but not directly useful to the idea of a gradient as used in optimization and machine learning. We can use gradient and derivative interchangeably, although in the fields of optimization and machine learning, we typically use “gradient” as we are typically concerned with multivariate functions. This is the basis for the gradient descent (and gradient ascent) class of optimization algorithms that have access to function gradient information. Q&A with Abhinav Gupta, winner of the J.K.
Aggarwal Prize We’re pleased to congratulate Facebook AI’s Abhinav Gupta on receiving the International Association for Pattern Recognition’s J.K. Aggarwal Prize for his work in unsupervised and self-supervised learning. Gupta is a research manager at Facebook AI Research and an associate professor at Carnegie Mellon University. Abhinav Gupta: The visual world is rich yet structured. So we came up with the idea of leveraging the redundancy in visual data to act as supervision for training convolutional neural networks. Some recent Facebook papers like MoCo, PIRL, and SwAV have demonstrated that self-supervised learning can even outperform supervised learning in a few cases. Protecting people from hazardous areas through virtual boundaries with Computer Vision Choose Create notebook instance. For Notebook instance name, enter a name for your notebook instance. client.publish(topic=iot_topic, payload='Loading model...'); model = awscam.Model(model_path, {'GPU': 1}). Then you run the model frame by frame over the images from the camera. For more details about connecting an AWS DeepLens device to a Raspberry Pi device, see Building a trash sorter with AWS DeepLens. For a more detailed walkthrough of this tutorial and other tutorials, samples, and project ideas with AWS DeepLens, see AWS DeepLens Recipes. HawkEye 360 uses Amazon SageMaker Autopilot to streamline machine learning model development for maritime vessel risk assessment HawkEye 360 partnered with the Amazon ML Solutions Lab to build machine learning (ML) capabilities into our analytics. Knowing which characteristics are indicative of a suspicious vessel isn’t immediately clear. The following image demonstrates some of the existing pattern-finding behavior that has been built into Mission Space. We can see that a Mission Analyst identified a specific rendezvous (highlighted in magenta) and Mission Space automatically identified other related rendezvous (in purple).
He is responsible for the conception, creation, and productization of all HawkEye space innovations. Set up an Airflow Environment on AWS in Minutes Set up an Airflow Environment on AWS in Minutes. Apache Airflow is a powerful platform for scheduling and monitoring data pipelines, machine learning workflows, and DevOps deployments. In this post, we’ll cover how to set up an Airflow environment on AWS and start scheduling workflows in the cloud. Teams can easily share an Airflow environment, so you’re not always required to be on call for your production jobs. Up until recently, creating and maintaining an Airflow environment was fairly complex, even for experienced engineers. Navigate to Managed Apache Airflow in the AWS console and click Create environment. How I Created a Fake News Detector with Python How I Created a Fake News Detector with Python. The proliferation of fake news is a significant challenge for modern democratic societies. The Greek Fake News Dataset: The success of every machine learning project depends on having a proper and reliable dataset. After that process was completed, the resulting dataset was used to train the text classification model of the Greek Fake News Detector application. This library was used to create the text classification model of the Greek Fake News Detector application. Developing the Web Application: I decided to develop the Greek Fake News Detector for a number of reasons. Sampling Techniques in Statistics In this article, we will try to understand what sampling is and then get into the details of different sampling techniques. There are a lot of sampling techniques out there, but we will just talk about a few common sampling techniques in statistics. Cluster Sampling: Cluster sampling is often confused with stratified sampling, but both these sampling techniques are different from each other.
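The difference between the two can be sketched in plain Python (a toy population with hypothetical group labels): stratified sampling draws a few units from every stratum, while cluster sampling keeps every unit from a few randomly chosen clusters.

```python
import random

random.seed(0)
# Toy population: (unit_id, group), where group plays the role of a
# stratum (stratified sampling) or a cluster (cluster sampling).
population = [(i, i % 4) for i in range(40)]  # 4 groups of 10 units

# Stratified sampling: draw 2 units from EVERY group.
strata = {g: [u for u in population if u[1] == g] for g in range(4)}
stratified = [u for g in range(4) for u in random.sample(strata[g], 2)]

# Cluster sampling: randomly pick 2 whole groups and keep ALL their units.
chosen = random.sample(range(4), 2)
clustered = [u for u in population if u[1] in chosen]

print(len(stratified))                     # 8 units, every group represented
print(sorted({g for _, g in stratified}))  # [0, 1, 2, 3]
print(sorted({g for _, g in clustered}))   # only the 2 chosen groups survive
```

Every stratum appears in the stratified sample, while the cluster sample contains complete groups but misses the others entirely, which is exactly why the two are easy to confuse yet behave differently.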
Convenience Sampling: It is one of the easiest sampling techniques, but it is one of the most dangerous, as the samples are selected based on availability. The sampling techniques — simple, cluster, stratified and systematic — are all probability sampling techniques and involve randomization. Machine Comprehension with BERT Today, I will show you how to c̶h̶e̶a̶t̶ ̶o̶n̶ ̶y̶o̶u̶r̶ ̶g̶r̶a̶d̶e̶ ̶3̶ ̶h̶o̶m̶e̶w̶o̶r̶k set up your own reading comprehension system using BERT. To get started, you will need Docker. We will be using Docker to make this work more usable and the results more reproducible. For the purpose of this tutorial, you will not need to pull any images as the config file already does that. The config file will also mount our local bert_QA folder as /workspace in the container. Decision Tree Algorithm in Python From Scratch The aim of this article is to make all the parts of a decision tree classifier clear by walking through the code that implements the algorithm. The code uses only NumPy, Pandas and the standard Python libraries. The full code can be accessed via https://github.com/Eligijus112/decision-tree-python. As of now, the code creates a decision tree when the target variable is binary and the features are numeric. This is completely sufficient to understand the algorithm. The gold standard for building decision trees in Python is the scikit-learn implementation. When I tested my code, I wanted to make sure that the results were identical to the scikit-learn implementation. Optimizing the Price-Performance Ratio of a Serverless Inference Service with Amazon SageMaker Step 1: Creating and deleting inference services from SageMaker Training jobs using Boto3. First, we will use Boto3 to create simple inference services based on API Gateway and AWS Lambda. Each SageMaker Training job will start by creating a service, and then delete it before ending.
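Returning to the decision-tree walkthrough above: the core of such a from-scratch classifier is an impurity measure for scoring candidate splits. A minimal Gini sketch (the function names are mine, not the repository's) might look like this:

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - (p1 ** 2 + (1 - p1) ** 2)

def split_gini(feature, labels, threshold):
    """Weighted Gini impurity after splitting on feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) \
         + (len(right) / n) * gini_impurity(right)

# A perfectly separating threshold drives the weighted impurity to zero.
feature = [1.0, 2.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
print(gini_impurity(labels))             # 0.5 (maximally mixed)
print(split_gini(feature, labels, 5.0))  # 0.0 (pure children)
```

A tree builder simply evaluates split_gini over candidate thresholds and recurses on the split with the lowest weighted impurity.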
Step 2: Load testing the inference service and giving it a price-performance score. When an inference service is created, our entry_point.py performs a load test to get its latency response performance. It makes it very easy to run a load tester from within a SageMaker Training job. Below is the code snippet for load testing from our entry point. With line 11, we calculate a basic score aggregate for the inference service. Exploratory Data Analysis, Visualization, and Prediction Model in Python Exploratory Data Analysis, Visualization, and Prediction Model in Python. This article focuses on a data storytelling project. As a data scientist or a data analyst, you may have to deal with data whose subject matter is not well known to you. To get an understanding of data, it is helpful to find some commonly used descriptive statistics such as mean, median, max, min, std, and quartiles. Prediction of Death: Using all the variables in the dataset, we can train a machine learning model to predict death. That way, after training the model, you can check the model with some data that is unseen by the model. Run machine-learning workflows to transform data and build AI-powered text indices with txtai Run machine-learning workflows to transform data and build AI-powered text indices with txtai. txtai executes machine-learning workflows to transform data and build AI-powered text indices to perform similarity search. txtai supports indexing text snippets, documents, audio and images. Pipelines and workflows enable transforming data with machine-learning models. In addition to building embedding indices, txtai now supports transformations to prepare data for indexing through pipelines, workflows to join pipelines together, API bindings for JavaScript/Java/Rust/Go and the ability to scale out processing. This article will cover methods to vectorize data, machine-learning pipelines and workflows.
Stratified normalization: Using additional information to improve the neural network’s performance Therefore, we have studied a new participant-based normalization method, named stratified normalization, for training deep neural networks. To present our paper briefly, I will focus on the proposed method and contrast it with the well-known batch normalization method. Stratified normalization is the proposed method and consists of a feature normalization per participant and session. Mastering the shifts with variational autoencoders The interest in this area started with the introduction of the band excitation (BE) method in scanning probe microscopy (SPM). In these, the cantilever is excited by an oscillatory signal applied either to the piezo element driving the cantilever (intermittent-contact topographic imaging, magnetic force microscopy) or to the cantilever directly (Kelvin Probe Force Microscopy and Electrostatic Force Microscopy). Our encoder maps the inputs (spectra) into the offset latent variable and two (or more) conventional latent variables. However, once we plot our latent variables versus the ground truth, we see that while there is a definite relationship between the ground truth variables and latent variables, they are not equal. Classically in VAEs, this collapse of latent space is perceived as a problem calling for the adjustment of the “loss” function. AWS and NVIDIA to bring Arm-based Graviton2 instances with GPUs to the cloud We’re working with NVIDIA to bring an Arm processor-based, NVIDIA GPU accelerated Amazon Elastic Compute Cloud (Amazon EC2) instance to the cloud in the second half of 2021. In 2018, AWS was the first major cloud provider to offer Arm-based instances in the cloud with EC2 A1 instances powered by AWS Graviton processors.
In 2020, AWS released AWS-designed, Arm-based Graviton2 processors, delivering a major leap in performance and capabilities over first-generation AWS Graviton processors. AWS Graviton2 processors deliver seven times more performance, four times more compute cores, five times faster memory, and caches twice as large compared to first-generation AWS Graviton processors. To learn more about how AWS and NVIDIA work together to bring innovative technology to customers, visit AWS at NVIDIA GTC 21. Nine Emerging Python Libraries You Should Add to Your Data Science Toolkit in 2021 As Data Science continues to grow and develop, it’s only natural for new tools to emerge, especially considering the fact that data science had some significant barriers to entry in the past. In this article, I wanted to go over nine libraries that I’ve come across in the past year that are game changers. These libraries have been incredibly useful in my data science journey and I wanted to share them with you in hopes that it’ll help you with your journey too! The following libraries are broken down into three categories. Putting Your Models Into Production Putting Your Models Into Production. You’ve been slaving away for countless hours trying to get your model just right. An often overlooked step is the actual deployment of these models into production. In this article, we’ll look at TensorFlow Serving, a system for deploying TensorFlow models into production. Since we will not be focusing on the model building process, we’ll just create a simple sequential model using Keras that we’ll train on the MNIST dataset. We’ve successfully deployed our model using TensorFlow Serving! How to Build A First-Time Machine Learning Project (with Full Code) How to Build A First-Time Machine Learning Project (with Full Code). While Machine Learning can seem overwhelming, knowing where to begin is key when looking for potential ways to get started.
When trying to identify data of your own, I find that there are 5 good questions to ask when kicking off a Machine Learning project. 1 = Work Order was Past Due, and 0 = Work Order completed on time.) In summary, the HVAC department has a very high likelihood of having late Work Orders, whereas Plumbing has a very low likelihood of late Work Orders. I’m genuinely curious to hear about how you’ve been able to use Machine Learning to solve a problem. How to get a Free Server for a Machine Learning model How to get a Free Server for a Machine Learning model. A 10-step tutorial on how to start and configure a free server anywhere in the world. Roman Orac. Having an always-on server is a great way to show your work to your future employers or to test your Machine Learning model in the real world. When successfully connected to the AWS instance, you should see an ASCII art banner. 7. Deploy a Machine Learning model to the server. The Your 1st Machine Learning Model in the Cloud course will teach you how to develop and deploy a Machine Learning model to a server in the Cloud. Before you go: Follow me on Twitter, where I regularly tweet about Data Science and Machine Learning. NumPy Basics Cheat Sheet (2021), Python for Data Science The NumPy library is the core library for scientific computation in Python. It provides a high-performance multidimensional array object and tools for working with arrays. Check out the different sections below to learn the various array functions and tools NumPy offers. Creating Arrays: A NumPy array is a grid of values, all of the same type. Array of evenly spaced values (step value): >>> np.arange(10,25,5) gives array([10, 15, 20]). Array of evenly spaced values (number of samples): >>> np.linspace(0,2,9) gives array([0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]). 2x2 identity matrix: >>> np.eye(2) gives array([[1., 0.], [0., 1.]]).
Why do naive Bayes classifiers work? The assumption of independence underlying naive Bayes classification may be unrealistic, but it doesn’t always hinder performance — this article explains why. Naive Bayes classifiers are based on Bayes’ theorem and work surprisingly well despite their naive assumption of conditional independence amongst input variables. Why do naive Bayes classifiers work well even when dependencies exist? A classic application of naive Bayes — Email spam filtering: Let’s look at a simple example to illustrate how naive Bayes works and how the independence assumption plays a role. As we’ve discussed, naive Bayes works well in lots of situations despite its naive independence assumption. How to Use Pairwise Correlation For Robust Feature Selection In my last article on the topic of Feature Selection, we focused on a technique to remove features based on their individual properties. It ranges from -1 to 1, -1 being a perfect negative correlation and +1 being a perfect positive correlation. For an in-depth guide on how to use, interpret and understand the correlation coefficient, refer to my separate article. But what does the correlation coefficient have to do with Machine Learning or feature selection? That’s why there is no point in keeping feature 2, since it only adds complexity when training a model. CenterNet Explained CenterNet, Explained. Each box prediction is encoded as x- and y-offsets relative to the cell center, and width- and height-offsets relative to the corresponding anchor. This generates a singular point at the elementwise comparison output, which is then scaled back to confidence values via the elementwise multiplication. This detection framework has since been used for more detection tasks, both by the original authors and other researchers, such as pose estimation and 3D object detection.
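The correlation-based selection idea above (dropping one feature of any highly correlated pair) can be sketched with NumPy; the height/weight/age columns and the 0.95 threshold are illustrative choices, not from the article:

```python
import numpy as np

np.random.seed(42)
n = 200
height = np.random.normal(170, 10, n)
weight = 0.9 * height + np.random.normal(0, 2, n)  # nearly duplicates height
age = np.random.normal(40, 12, n)                  # unrelated feature
X = np.column_stack([height, weight, age])
names = ["height", "weight", "age"]

# Pairwise (Pearson) correlation matrix of the columns.
corr = np.abs(np.corrcoef(X, rowvar=False))

# For any pair correlated above the threshold, drop the later feature:
# it adds complexity without adding much information.
threshold = 0.95
to_drop = set()
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if corr[i, j] > threshold:
            to_drop.add(names[j])

print(to_drop)  # {'weight'}
```

Here weight is almost a linear function of height, so the correlation test flags it, while the uncorrelated age column survives.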
The Perceptron. Training Neural Networks Explained Simply. Object Detection With Deep Learning — RCNN, Anchors, Non-Maximum-Suppression. YOLOv3 Explained. Deep Learning Techniques for Text Classification Deep Learning Techniques for Text Classification. A. Objectives: The experiment will evaluate the performance of some popular deep learning models, such as feedforward, recurrent, convolutional, and ensemble-based neural networks, on five text classification datasets. [9] introduced a novel deep learning technique for classification called Random Multimodel Deep Learning (RMDL). The Models Summary with their Feature Extractions: To sum up, we will build deep learning models using two different feature extractions on five text classification datasets as follows (Table 7). Conclusions: This project has demonstrated a comprehensive experiment focusing on building deep learning models using two different feature extractions on five text classification datasets. Five Simple Image Data Augmentation Techniques to Mitigate Overfitting In Computer Vision Five Simple Image Data Augmentation Techniques to Mitigate Overfitting In Computer Vision. Research has demonstrated the efficiency of Deep Convolutional Neural Networks for many Computer Vision tasks in many domains such as autonomous vehicles, medical imaging, etc. One thing is clear about data: the more high-quality data we have, the better it is. To avoid any doubt, we are going to increase the number of our training images with a technique called Image Augmentation so that our model avoids overfitting the training data. The more high-quality data we have, the better, but getting such data can be very expensive and time-consuming. Basic image augmentation techniques and illustrations: These five main techniques of image transformation can be used to increase the size of the data.
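A couple of the basic transformations mentioned above (flips and rotations) can be sketched with NumPy alone; real pipelines typically use a library, but the idea is just applying cheap array operations to multiply the training set:

```python
import numpy as np

def augment(image):
    """Generate simple augmented variants of an image array (H, W)."""
    return {
        "original": image,
        "horizontal_flip": np.fliplr(image),  # mirror left-right
        "vertical_flip": np.flipud(image),    # mirror top-bottom
        "rotate_90": np.rot90(image),         # 90-degree rotation
        "rotate_180": np.rot90(image, 2),     # 180-degree rotation
    }

image = np.arange(12).reshape(3, 4)       # stand-in for a grayscale image
variants = augment(image)

print(len(variants))                      # 5 training samples from 1 image
print(variants["horizontal_flip"][0, 0])  # 3: first row reversed
print(variants["rotate_90"].shape)        # (4, 3): axes swapped
```

One labeled image becomes five training samples; the label is unchanged because flips and rotations preserve the object class.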
Implications of Information Theory in Machine Learning The probability distribution for x will be p(x) = Pr{X = x}, x ∈ 𝒳. Mutual Information: Mutual Information is a measure of the amount of information that one random variable contains about another random variable. H(Y), in this case, can be computed from the marginal distribution of Y. Mutual Information is then defined by I(X; Y) = H(Y) − H(Y|X). Alternatively, I can also use H(X) and H(X|Y) to calculate Mutual Information, and it will yield the same result. Mutual Information between a feature and the target can be a great precursor to check how useful the feature will be for predictions. Let us discuss the implications of Information Theory in Machine Learning. Survival Models for Histopathology Survival Models for Histopathology. Machine learning algorithms for histopathology images are becoming increasingly complex. Others train a CNN model to distinguish tumor from non-tumor and then use only the tumor regions for the survival model. They then randomly sampled one large patch from each ROI and trained a CNN survival model [Zhu2016]. Selecting the highest and lowest scoring tiles for survival model training [Courtiol2019]. Examining the highest and lowest survival patches was particularly insightful. Another survival model then turned the aggregated features into a risk prediction using a linear survival model. Remove Body Tattoo Using Deep Learning Remove Body Tattoo Using Deep Learning. Deep Learning is interesting and it is my favorite area of research. I really like to play with new research developments that deep learning practitioners are doing. I just came across an amazing GitHub repo from one of my fellow mates in the computer vision group. It can remove any kind of tattoo from body parts. I will walk you through the step-by-step process of how you can utilize the same repo using your own image.
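The mutual information identities discussed above can be checked numerically on a toy joint distribution (the probabilities below are made up for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A toy joint distribution p(x, y) over two binary variables.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

px = joint.sum(axis=1)   # marginal p(x)
py = joint.sum(axis=0)   # marginal p(y)

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy(joint.flatten())

# Mutual information: I(X;Y) = H(X) + H(Y) - H(X,Y)
mi = h_x + h_y - h_xy
# Equivalently I(X;Y) = H(Y) - H(Y|X), using H(Y|X) = H(X,Y) - H(X)
mi_alt = h_y - (h_xy - h_x)

print(round(mi, 4))  # 0.2781 bits
```

Both routes, via the joint entropy and via the conditional entropy, yield the same value, which is the "same result" the excerpt refers to; a strictly positive value says X carries information about Y.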
Face Mask Detection using darknet’s YOLOv3 Face Mask Detection using darknet’s YOLOv3. This article aims to offer complete guidelines (step-by-step) for someone who wants to train an object detector from the YOLO family on custom data. For this tutorial, I am going to use YOLOv3, one of the most frequently used versions of the YOLO family, which comprises a state-of-the-art object detection system for real-time scenarios, and it is amazingly accurate and fast. I chose the Face Mask Detection dataset from Kaggle and I downloaded it directly to my Google Drive (you can check out how to do so here). To create a .txt file we need 5 things from each .xml file. For each <object> in an .xml file, fetch the class (namely the <name> field) and the coordinates of the bounding box (namely the 4 attributes in <bndbox>). Gradient Descent With Adadelta from Scratch Tutorial Overview: This tutorial is divided into three parts; they are: Gradient Descent; Adadelta Algorithm; Gradient Descent With Adadelta; Two-Dimensional Test Problem; Gradient Descent Optimization With Adadelta; Visualization of Adadelta. Gradient Descent: Gradient descent is an optimization algorithm. The objective() function below implements this function: def objective(x, y): return x**2.0 + y**2.0. We can create a three-dimensional plot of the dataset to get a feeling for the curvature of the response surface. Within the descent loop, we calculate the gradient with gradient = derivative(solution[0], solution[1]) and then update the average of the squared partial derivatives for each element of the gradient.
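Putting those pieces together, a minimal Adadelta descent on the same objective(x, y) = x² + y² bowl might look like the following sketch (the start point, rho, and ep values are illustrative choices, not necessarily the tutorial's):

```python
import numpy as np

def objective(x, y):
    return x ** 2.0 + y ** 2.0

def derivative(x, y):
    # Gradient of the bowl: (df/dx, df/dy)
    return np.array([2.0 * x, 2.0 * y])

def adadelta(n_iter=200, rho=0.9, ep=1e-3):
    solution = np.array([0.8, 1.4])   # arbitrary start point
    sq_grad_avg = np.zeros(2)         # decaying average of squared gradients
    sq_para_avg = np.zeros(2)         # decaying average of squared updates
    for _ in range(n_iter):
        gradient = derivative(solution[0], solution[1])
        # Update the average of the squared partial derivatives.
        sq_grad_avg = rho * sq_grad_avg + (1.0 - rho) * gradient ** 2
        # Step size is a ratio of running RMS values: no learning rate needed.
        alpha = np.sqrt(sq_para_avg + ep) / np.sqrt(sq_grad_avg + ep)
        change = alpha * gradient
        sq_para_avg = rho * sq_para_avg + (1.0 - rho) * change ** 2
        solution = solution - change
    return solution

final = adadelta()
print(objective(final[0], final[1]))  # far below the starting value of 2.6
```

The defining feature of Adadelta is visible in the alpha line: the per-parameter step size is derived entirely from the two running averages, so no global learning rate has to be tuned.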
How Microlearning Can Help You Improve Your Data Science Skills in Less Than 10 Minutes Per Day How does the use of microlearning benefit the data science learning experience? Microlearning is a beneficial learning tool for those who already have a basic foundation in data science skills. Therefore, only spending 10 minutes per day learning concepts will leave you knowing only the bare bones of data science after 6 months of study. How to use microlearning to improve data science skills: Using microlearning to learn data science concepts is no exception to this rule. Build Multiple Machine Learning Models Easily Build Multiple Machine Learning Models Easily. Creating machine learning models and finding the best model is a tiresome task because it takes a lot of time and effort. Lazy Predict helps in building multiple machine learning models in just 2 lines of code. It not only creates multiple models but also helps in understanding which models work for the given data. In this article, we will learn how to use Lazy Predict and create multiple machine learning models using it. This is how we can use Lazy Predict to create multiple machine learning models easily and effortlessly. Causal Inference in Data Science: A/B Testing & Randomized Trials with Covariate Adjustment Causal Inference in Data Science: A/B Testing & Randomized Trials with Covariate Adjustment. 1: Background and Motivation: Causal Inference is a field that touches several domains. This piece concerns A/B Tests (aka randomized trials) and specification of statistically efficient conditional sampling estimators via covariate adjustment. In the null Causal DAG above, we have binary intervention A and continuous outcome Y. Note however that we have a set of additional variables in set L with directed arrows into outcome Y.
The marginal Causal DAG (under the null) corresponding to the idealized A/B test described above is shown in Figure 3. This 21 Step Guide Will Help Implement Your Machine Learning Project Idea Frame the machine learning problem: Since we have already identified the business problem, here we reframe it into a machine learning problem. Broadly speaking, most business problems fall into one of these 3 types of machine learning problems. Transfer Learning — When you take the information an existing machine learning model has learned and adjust it to your own problem. A good tool to start using is MLflow, which is an open-source end-to-end machine learning workflow library. We have done everything to build the best machine learning model for the problem at hand. How To Analyze Survey Data In Python Therefore, I won't waste any of your time (or mine) and I will stick to highlighting methods and tools that are specifically useful for survey data. Groupby Crosstabs and Heatmaps: Looking at subgroups of the data can be extremely important, especially in survey data. However, often with survey data, people are allowed to choose more than one answer to a question. A heatmap that shows fake survey data in percentages. Summary: This, of course, isn’t a fully exhaustive process on how to analyse survey data. As mentioned, I have saved all the code in my GitHub account here and I really hope that this will make your life easier when dealing with tedious survey data in the future. How to Split a Dataset Into Training and Testing Sets Creating different data samples for training and testing the model is the most common approach that can be used to identify these sorts of issues. In this way, we can use the training set for training our model and then treat the testing set as a collection of data points that will help us evaluate whether the model can generalise well to new, unseen data.
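That train/test workflow can be sketched in plain Python (toy data; the two-thirds/one-third ratio is a common default, not a requirement):

```python
import random

def train_test_split(data, train_fraction=2/3, seed=42):
    """Shuffle, then assign the first train_fraction of points to training."""
    shuffled = data[:]                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(30))
train, test = train_test_split(data)

print(len(train), len(test))         # 20 10
print(sorted(train + test) == data)  # every point used exactly once: True
```

The test set stays disjoint from the training set, which is what lets it stand in for "new, unseen data" when evaluating the model.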
The simplest way to split the modelling dataset into training and testing sets is to assign two-thirds of the data points to the former and the remaining one-third to the latter. Therefore, we train the model using the training set and then apply the model to the test set. Note that splitting the dataset into training and testing sets is not the only action that could be required in order to avoid phenomena such as overfitting. Fuzzy C-Means Clustering — Is it Better than K-Means Clustering? Fuzzy C-Means Clustering — Is it Better than K-Means Clustering? One of the widely used soft clustering algorithms is the Fuzzy C-Means clustering (FCM) algorithm. Fuzzy C-Means Clustering: Fuzzy C-Means clustering is a soft clustering approach, where each data point is assigned a likelihood or probability score of belonging to that cluster. Installation and Usage: To implement the fuzzy c-means algorithm, we have an open-sourced Python package that can be installed from PyPI: pip install fuzzy-c-means. Fuzzy c-means is a Python module that can implement the fuzzy c-means algorithm. References: [1] Fuzzy Clustering, Wikipedia (18 Jan 2021): https://en.wikipedia.org/wiki/Fuzzy_clustering [2] Fuzzy c-means Documentation: https://pypi.org/project/fuzzy-c-means/ Dimensionality reduction with Autoencoders versus PCA Dimensionality reduction with Autoencoders versus PCA. Example of a dimensionality reduction with PCA (left) and Autoencoder (right). Introduction: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction algorithms. The steps to perform PCA are: standardize the data; obtain the eigenvectors and eigenvalues from the covariance matrix or correlation matrix, or perform Singular Value Decomposition. We will perform PCA with the implementation of sklearn: it uses Singular Value Decomposition (SVD) from scipy (scipy.linalg).
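The PCA steps listed above (standardize, then eigen-decomposition or SVD) can be sketched directly with NumPy; sklearn's PCA takes the same SVD route internally. The toy data here is illustrative:

```python
import numpy as np

np.random.seed(0)
# Toy data: 2 informative directions embedded in 3 columns.
X = np.random.randn(100, 2) @ np.array([[2.0, 0.5, 0.0],
                                        [0.0, 1.0, 0.3]])

# 1. Standardize (here: center; full standardization also divides by std).
Xc = X - X.mean(axis=0)

# 2. Singular Value Decomposition of the centered data.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Project onto the top-k principal directions (the rows of Vt).
k = 2
X_reduced = Xc @ Vt[:k].T

print(X_reduced.shape)       # (100, 2)
print(S[0] >= S[1] >= S[2])  # singular values come sorted: True
```

The singular values S are returned in descending order, so keeping the first k rows of Vt keeps the directions of greatest variance, which is all PCA does.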
A checklist to track your Data Science progress The second commonly used network type is the Convolutional Neural Network, which uses the convolution operation at its core. What’s also common is to store data about the data, the metadata, in separate files. In summary, this category has you learn to handle the tasks around running neural networks on small datasets. In case you miss some feature, this is the point where you begin to write custom training loops and custom callbacks. Use a library of your choice and simply re-implement it (and tick advanced architectures or convolutional neural networks or both). 4 Must-Know Special Methods for Python 4 Must-Know Special Methods for Python. Everything in Python is an object and we define objects through classes. In addition to the user-defined functions, it is possible to use built-in Python functions within user-defined classes. Special methods allow for using the built-in Python functions to enrich user-defined classes. Consider the print function, which is one of the most commonly used Python functions. In this article, we will go over 4 special methods that you will be likely to use in your classes. NLP With RAPIDS, Yes It’s Possible! Raw Embeddings from Torch Sentence Transformer: For sentence embeddings, we want to map a variable-length input text to a fixed-size dense vector. Why did I choose the paraphrase model? First, for each input sentence, you have to map words to their ids in the vocabulary used by the model, adding [CLS] and [SEP] tokens. Eventually, we have to add [PAD] tokens at the end in order for our input to fit the model max input sequence length. For all 768 dimensions of my normalized sentence embeddings, I have computed the coordinate’s variance among my 90,000 text samples. Quantitative evaluation of a pre-trained BERT model Quantitative evaluation of a pre-trained BERT model.
While most, if not all, downstream NLP tasks are performed, to date, with subsequent fine-tuning of a pre-trained transformer model, it is possible to use a pre-trained model as is, without subsequent fine-tuning. For instance, the utility of a pre-trained BERT model as is, without any fine-tuning, for a wide variety of NLP tasks is largely overlooked. Examples of direct use of a pre-trained BERT model without fine-tuning are Unsupervised NER. Such direct use-cases of a pre-trained model, however, highlight the need for a quantitative (as opposed to a qualitative) evaluation method of the pre-trained model on a task, beyond just the model training loss (or training for a fixed number of steps). How to Use Variance Thresholding For Robust Feature Selection Intro to Feature Selection With Variance Thresholding: Today, it is common for datasets to have hundreds if not thousands of features. That’s why there is an entire skill to be learned in the ML field — feature selection. Feature selection is the process of choosing a subset of the most important features while trying to retain as much information as possible. Basic feature selection techniques should be able to drop BMI by finding out that BMI can be represented by weight and height. In this article, we will explore one such feature selection technique called Variance Thresholding. 5 Essential Skills To Develop As A Data Scientist! To overcome these situations and thrive with perseverance is paramount for every successful data scientist. Data Science is not always a simple subject, as there are challenges and obstacles that could hinder even the best data scientists. Hence, every data scientist must continue to persevere and destroy all the obstacles they come across with sheer will and passion! One of the prominent aspects of Data Science is the amount of content and quality of matter that you can learn through your passing days as a data scientist or a Data Science enthusiast.
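The variance-thresholding idea described above is simple enough to sketch with NumPy (sklearn packages the same logic as sklearn.feature_selection.VarianceThreshold; the toy columns below are illustrative):

```python
import numpy as np

def variance_threshold(X, threshold=0.0):
    """Keep only columns whose variance exceeds the threshold."""
    variances = X.var(axis=0)
    mask = variances > threshold
    return X[:, mask], mask

rng = np.random.default_rng(7)
informative = rng.normal(size=(50, 2))  # features that actually vary
constant = np.ones((50, 1))             # zero-variance feature: useless
X = np.hstack([informative, constant])

X_selected, kept = variance_threshold(X, threshold=0.0)
print(X.shape, "->", X_selected.shape)  # (50, 3) -> (50, 2)
print(kept)                             # [ True  True False]
```

A feature that barely varies cannot help a model discriminate between samples, so dropping it loses almost no information while shrinking the feature space.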
The fire and will to learn and master Data Science will be the driving force behind accomplishing a successful career as a data scientist. Google AI Blog: Monster Mash: A Sketch-Based Tool for Casual 3D Modeling and Animation Because of its complexity, 3D animation is generally practiced by teams of skilled specialists and is inaccessible to almost everyone else, despite decades of advances in technology and tools. Creating a walk cycle using Monster Mash. Here you can see a few of the animated characters that have been created using Monster Mash. The original hand-drawn outline used to create each 3D model is visible as an inset above each character. All of the code for Monster Mash is available as open source, and you can watch our presentation and read our paper from SIGGRAPH Asia 2020 to learn more. Acoustic anomaly detection using Amazon Lookout for Equipment The ML Solutions Lab team used the existing data collected by KAES equipment in the field for an in-depth acoustic data exploration. Anomaly detection with Amazon Lookout for Equipment. To implement these solutions, the ML Solutions Lab team used Amazon Lookout for Equipment, a new service that helps to enable predictive maintenance. Amazon Lookout for Equipment uses AI to learn the normal operating patterns of industrial equipment and alert users to abnormal equipment behavior. Amazon Lookout for Equipment analyzes the data from industrial equipment sensors to automatically train a specific ML model for that equipment with no ML expertise required. After sufficient data is ingested into the Amazon Lookout for Equipment platform, inference can begin and anomalies can be detected. Detect abnormal equipment behavior and review predictions using Amazon Lookout for Equipment and Amazon A2I Set up an Amazon A2I private human loop and review the predictions from Amazon Lookout for Equipment.
Create the Amazon Lookout for Equipment dataset. We use the Amazon Lookout for Equipment Create Dataset APIs to create a dataset and provide the component map we created in the previous step as an input. For more details, refer to the section Set up Amazon A2I to review predictions from Amazon Lookout for Equipment in this post. With Amazon Lookout for Equipment and Amazon A2I, you can set up a continuous prediction, review, train, and feedback loop to audit predictions and improve the accuracy of your models. Visit the webpages to learn more about Amazon Lookout for Equipment and Amazon Augmented AI. How to Start Implementing Machine Learning to Music Music Source Separation is a task in which Machine Learning models learn the structure of a specific sound, extract it, and isolate it from the other sounds it is mixed into. Spleeter is a bi-directional LSTM that is trained on the MUSDB dataset, a dataset specifically for music source separation. Real-time Accompaniment is one of the more interesting problems tackling the idea of creativity in Music Generation. Let’s say you decide to teach the network to play music through the piano in the genre of Jazz (side note: it is highly unlikely to be able to teach a network to play piano across many genres, as it is far too broad). As you feed in the music without a piano, it will play alongside the other instruments in real time, creating its own generated performance! A journey of building an Advanced Object Detection Pipeline — Doubling YoloV5’s performance Several tricks I used during a Kaggle object detection competition which boosted my score to roughly the top 10% during most of the competition, by Mostafa Ibrahim. I have spent the last 3 months diving deep into object detection.
I have tried tons of stuff, ranging from implementing state-of-the-art models like YoloV5, VFNets, and DETR to fusing object detection models with image classification models to boost performance. The official competition metric was (mean) Average Precision, which is one of the most commonly used object detection metrics. Weighted boxes fusion is a technique to fuse and filter down the boxes that object detection models produce so that the results are more accurate. DETR is an amazing object detection transformer that I have written an article about before. 11 Times Faster Hyperparameter Tuning with HalvingGridSearch While both GridSearch and RandomizedSearch train the candidates on all of the training data, HalvingGridSearch and HalvingRandomSearch take a different approach called successive halving. In the first iteration, HGS trains all candidates on a small proportion of the training data. So, with each passing iteration, the ‘surviving’ candidates will be given more and more resources (training samples) until the best set of hyperparameters is left standing. min_samples takes an integer to specify the number of samples of the training data to use in the first iteration. All candidates are trained on this data, and in the next iteration min_samples grows by factor while the number of candidates decreases by factor. 10 Tips To Land Your First Data Science Job As a New Grad My Job Search Journey: I recently started a new job as a Data Scientist at my dream company. Although I was very disappointed, the fact that I had the chance of getting the job interview did help boost my confidence. There are a lot of resources to help you brush up on the basic data science interview. Here are a few common behavioral interview questions suggested by Akshay Sachdeva: Describe a time when you disagreed with a team member.
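The successive-halving mechanics described above can be sketched in a few lines (a toy reimplementation, not sklearn's HalvingGridSearchCV; the score function and names are illustrative):

```python
def successive_halving(candidates, score, n_samples, min_samples=20, factor=3):
    """Each iteration trains the survivors on `factor` times more samples
    and keeps only the best 1/`factor` of them."""
    budget = min_samples
    survivors = list(candidates)
    while len(survivors) > 1 and budget <= n_samples:
        # Rank every surviving candidate on the current training budget
        ranked = sorted(survivors, key=lambda c: score(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // factor)]
        budget *= factor  # give the survivors more resources next round
    return survivors[0]


# Toy search: the best "hyperparameter" is the one closest to 7.
best = successive_halving(range(10), lambda c, budget: -(c - 7) ** 2, n_samples=1000)
```

Because weak candidates are discarded on cheap, small budgets, the expensive full-data training only ever happens for the last few survivors.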
I know it’s very difficult to land your first data science job without prior experience, but it’s not impossible! How to Create an Answer From a Question With DPR These two values are then multiplied to give the TF-IDF score. The word ‘the’ is found in both the query and the context (high score). But because its IDF is a low number, due to how common ‘the’ is, the TF-IDF score is low too. So, the TF-IDF score is great for finding sequences that contain the same uncommon words. The TF-IDF score is normalized so that short documents will score better than long documents, given they both have the same number of word matches. A checklist to track your Data Science progress In summary, this category emphasizes handling small audio/time-series, image, and text datasets, and applying simple operations to pre-process the data. General: The intention behind this category is to learn the general handling of Data Science related tasks. What Is Semi-Supervised Learning Tutorial Overview: This tutorial is divided into three parts; they are: Semi-Supervised Learning, Books on Semi-Supervised Learning, and Additional Resources. Semi-supervised learning is a type of machine learning. Semi-Supervised Learning, 2006: The book “Semi-Supervised Learning” was published in 2006 and was edited by Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien.
Table of Contents:
Chapter 01: Introduction to Semi-Supervised Learning
Part I: Generative Models
Chapter 02: A Taxonomy for Semi-Supervised Learning Methods
Chapter 03: Semi-Supervised Text Classification Using EM
Chapter 04: Risks of Semi-Supervised Learning
Chapter 05: Probabilistic Semi-Supervised Clustering with Constraints
Part II: Low-Density Separation
Chapter 06: Transductive Support Vector Machines
Chapter 07: Semi-Supervised Learning Using Semi-Definite Programming
Chapter 08: Gaussian Processes and the Null-Category Noise Model
Chapter 09: Entropy Regularization
Chapter 10: Data-Dependent Regularization
Part III: Graph-Based Methods
Chapter 11: Label Propagation and Quadratic Criterion
Chapter 12: The Geometric Basis of Semi-Supervised Learning
Chapter 13: Discrete Regularization
Chapter 14: Semi-Supervised Learning with Conditional Harmonic Mixing
Part IV: Change of Representation
Chapter 15: Graph Kernels by Spectral Transforms
Chapter 16: Spectral Methods for Dimensionality Reduction
Chapter 17: Modifying Distances
Part V: Semi-Supervised Learning in Practice
Chapter 18: Large-Scale Algorithms
Chapter 19: Semi-Supervised Protein Classification Using Cluster Kernels
Chapter 20: Prediction of Protein Function from Networks
Chapter 21: Analysis of Benchmarks
Part VI: Perspectives
Chapter 22: An Augmented PAC Model for Semi-Supervised Learning
Chapter 23: Metric-Based Approaches for Semi-Supervised Regression and Classification
Chapter 24: Transductive Inference and Semi-Supervised Learning
Chapter 25: A Discussion of Semi-Supervised Learning and Transduction
I highly recommend this book and reading it cover to cover if you are interested in the topic. Object detection with Detectron2 on Amazon SageMaker Object detection, which is one type of CV task, has many applications in various fields like medicine, retail, or agriculture. Object detection models allow you to implement these diverse use cases and automate your in-store operations.
In this post, we discuss Detectron2, an object detection and segmentation framework released by Facebook AI Research (FAIR), and its implementation on Amazon SageMaker to solve a dense object detection task for retail. Researchers use this dataset to test object detection algorithms on dense scenes. You can reuse the code associated with this post on your own data labeled for object detection with Ground Truth. Win a digital car and personalize your racer profile on the AWS DeepRacer console With the 2021 AWS DeepRacer League Virtual Circuit now underway, developers have five times more opportunities to win physical prizes, such as exclusive AWS DeepRacer merchandise, AWS DeepRacer Evo devices, and even an expenses-paid trip to AWS re:Invent 2021 to compete in the AWS DeepRacer Championship Cup. Digital rewards: Collect them all and showcase your collection. Digital rewards are unique cars, paint jobs, and body kits that are stored in a new section of the AWS DeepRacer console: your racer profile. The next time you log in and access your racer profile, you’ll see the celebration to commemorate your achievement. Customize your racer profile and avatar. While new digital rewards allow you to customize your car on the track, the new Your racer profile page allows you to customize your personal appearance across the AWS DeepRacer console. Head over to the AWS DeepRacer League today to get rolling, or sign in to the AWS DeepRacer console to start customizing your avatar and collecting digital rewards! Improve operational efficiency with integrated equipment monitoring with TensorIoT powered by AWS Detecting industrial equipment issues at an early stage and using that data to inform proper maintenance can give your company a significant increase in operational efficiency. Customers see value in detecting abnormal behavior in industrial equipment to improve maintenance lifecycles.
Amazon Lookout for Equipment automates these traditional data science steps to open up more opportunities for a broader set of equipment than ever before. Combining TensorIoT and Amazon Lookout for Equipment has never been easier. To delve deeper into how to visualize near real-time insights gained from Amazon Lookout for Equipment, let’s explore the process. With Amazon Lookout for Equipment and TensorIoT solutions, TensorIoT helps make your assets even smarter. Medical NER with AWS Comprehend AWS Comprehend is a high-level service AWS offers that automates many different NLP tasks such as Sentiment Analysis, Topic Modeling, and NER. Comprehend branched out to create a sub-service called Comprehend Medical that is specifically geared for Medical NER. In this article, we will cover how to build a web application with Streamlit that can call Comprehend Medical and return the medical entities detected. This REST API will serve as an interface for the backend Lambda function, which uses the Boto3 SDK to access Comprehend Medical for Medical NER. Integrate the Lambda Function with AWS Comprehend Medical. Now that the general flow of the architecture has been established, we can focus on the back-end work to integrate Comprehend Medical for NER. Understanding Evaluation Metrics in Classification Modeling To find out how a model performs in classification modeling, we use the classification result table that we usually call the confusion matrix. In this article, I will tell you how to properly use the evaluation metrics in classification modeling. In classification modeling, evaluation metrics are calculated from True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). Here I will explain some of the evaluation metrics in classification modeling and when you should properly use them. A review of evaluation metrics for data classification evaluations.
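The four confusion-matrix cells mentioned above are all you need to compute the standard metrics; a minimal sketch (the helper name is illustrative):

```python
def classification_metrics(tp, fp, tn, fn):
    """Common evaluation metrics derived from the confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything actually positive, how much did we find?
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Which metric to favor depends on the relative cost of false positives versus false negatives, which is exactly the judgment call the article above discusses.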
An Introduction to Kernel Methods Thus, what one can do is to express the probability distribution as a superposition of “mini-densities”, a so-called kernel density (normalized by the number of observations). Code 1: Kernel density function. Figure 2: Gaussian kernel with different values for γ. One possible kernel is the Gaussian kernel, which is parametrized by a hyperparameter γ (Figure 2). While there are empirical formulas to choose γ, the easiest way is to perform cross-validation to find its optimal value. Super-fast Machine Learning to Production with BigQuery ML But if your company is at the beginning of its Machine Learning journey, it’s probably not the most important part of the process. BigQuery ML comes into play: BigQuery ML is a tool to create, execute, and serve machine learning models directly in BigQuery, by using standard SQL queries. BigQuery ML is evolving so fast that what I found to be cons could be greatly improved in the coming months. Once again with BigQuery ML, you simply need one command:
SELECT *
FROM ML.PREDICT(
  MODEL medium.trial_to_paid_prediction_model,
  (SELECT * FROM medium.trial_to_paid_prediction_new_data),
  STRUCT(0.55 AS threshold)  # Optional - defaults to 0.5
)
And that’s it! Since we used a logistic regression model, BigQuery ML gives us the predicted values based on the threshold we choose. Getting Started with Weaviate Python Library In this example we are going to use news articles to construct Weaviate data. Now let’s create the Author class schema in the same manner, but with the properties name and wroteArticles. This can be done by creating an appropriate batch request object and submitting it using the batch object attribute of the client. We cannot see the Author objects’ data by getting the objects from Weaviate.
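The kernel density idea described above is a few lines of code: one Gaussian "mini-density" per observation, normalized by the number of observations (the function names are illustrative, and the per-bump normalizing constants are omitted for brevity):

```python
import math


def gaussian_kernel(x, xi, gamma=1.0):
    """Gaussian kernel centred on the observation xi; gamma controls its width."""
    return math.exp(-gamma * (x - xi) ** 2)


def kernel_density(x, observations, gamma=1.0):
    """Superposition of one kernel bump per observation, normalized by
    the number of observations."""
    return sum(gaussian_kernel(x, xi, gamma)
               for xi in observations) / len(observations)


# The density at 0 is dominated by nearby observations; the far-away
# point at 10 contributes almost nothing.
density = kernel_density(0.0, [0.0, 0.1, -0.1, 10.0])
```

As the excerpt says, γ itself is best chosen by cross-validation rather than by an empirical formula.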
Data Science and AI Books for 2021 Here are some of the notable data science publications that are currently available in print form, listed in order of publication. I selected books that had a larger number of reviews on Amazon, were published since the beginning of 2020, and covered interesting subjects related to data science, engineering, and AI. This includes AI models as well as the more traditional inventory forecasting that suddenly became mission-critical in the age of Covid-19. The book is a follow-up to The One Hundred Page Machine Learning Book. Good luck, and I hope you enjoy reading one (or more) of these books as you continue your journey in data science and engineering. 6 Data Science Slack Workspaces You Need to Join One of the most difficult aspects of self-learning any new skill or technology is the feeling of loneliness. These 6 Slack workspaces will provide you the sense of community and belonging you need to succeed in your data science learning journey. №1: Open Data Science Community. The Open Data Science Community (ODSC) is more than just a Slack workspace; it’s an organization for everything in data science. They hold data science conferences all over the world, publish posts and videos about different data science topics, and keep you up to date with the latest in data science research. As data scientists, an essential part of our role is to stay up to date with recent research in our particular area of data science. How to Apply Your Hard Earned Data Science Skillset Taking the leap from learning to application can be tough. I think this is especially true of data science — for both books and many of the courses available today.
My first steps into machine learning and data science were taking the original Machine Learning course by Andrew Ng on Coursera. Both books and courses serve a purpose and can provide a solid foundation in data science. Many of the foundational components that make up the data science skillset are vast areas of research in their own right. Writing a music album with Deep Learning Motivation: Having experimented with the WaveNet, PerformanceRNN and Piano Transformer, I was ready to make music. Examples of the PerformanceRNN network trained with an increasing number of epochs. Chosen sample for the first track generated by the PerformanceRNN. The piano track was great, but not enough to create a song. I copied the lowest notes of the piano melody to generate a cello accompaniment for certain parts of the song. Thus, I used the Piano Transformer model to generate a continuation of the piano track used in the first song. Neural Network Optimizers Made Simple: Core algorithms and why they are needed Review of Optimization with Gradient Descent. Let’s start with a typical 3D picture of the gradient descent algorithm at work: a loss curve for a network with two weight parameters, where the horizontal plane has two axes, for weights w1 and w2 respectively. Although they continue to use gradient descent at the core, optimization algorithms have developed a series of improvements on vanilla gradient descent to tackle these challenges. First Improvement to Gradient Descent — Stochastic Gradient Descent (SGD): Gradient Descent usually means “full-batch gradient descent”, where the loss and gradient are calculated using all the items in the dataset. Second Improvement to Gradient Descent — Momentum: adjust the update amount dynamically. One of the tricky aspects of Gradient Descent is dealing with steep slopes.
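The momentum improvement above can be sketched on a one-parameter toy problem (the function and names below are illustrative; real optimizers operate on full gradient vectors):

```python
def momentum_descent(grad, w, lr=0.1, beta=0.9, steps=200):
    """Gradient descent with momentum: each step uses an exponentially
    decayed running average of past gradients, which damps oscillation
    on steep slopes."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)  # accumulate velocity
        w = w - lr * velocity                 # move against the averaged gradient
    return w


# Minimise the bowl-shaped loss (w - 3)^2, whose gradient is 2 * (w - 3).
w_opt = momentum_descent(lambda w: 2 * (w - 3), w=0.0)
```

With beta set to 0.0 the loop reduces exactly to vanilla gradient descent, which makes the role of the velocity term easy to see.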
Building LSTM-Based Model for Solar Energy Forecasting Solar energy is one of the most important constituents of alternative sources of clean and renewable energy. Forecasting solar energy generation is critical for downstream applications and integration with conventional power grids. The approaches to forecasting can be broadly classified in the following ways (Fig 1: Types of Forecasting Models: (a) approach-based, (b) forecasting-window-based, (c) based on the number of variables). Based on the forecasting window, if it is less than thirty minutes it is very-short-term forecasting, and if it is more, it is short-term. We wanted to use an LSTM based on its capability to capture complex and non-linear patterns. Adding the Informer Model to Flow Forecast The decoder can effectively forecast long sequences in a single forward pass. The Informer model employs a probabilistic attention mechanism to forecast long sequences. They test the model forecasting several different time intervals of data, and also test the model on a weather forecasting dataset. Porting the model to Flow Forecast: Despite similarities between our full transformer model and the Informer, moving the model to our framework was challenging for several reasons. Using the Informer in Flow Forecast: We now have several tutorials on how to use the Informer for time series forecasting in Flow Forecast. Google AI Blog: Announcing the 2021 Research Scholar Program Recipients In March 2020 we introduced the Research Scholar Program, an effort focused on developing collaborations with new professors and encouraging the formation of long-term relationships with the academic community.
In November we opened the inaugural call for proposals for this program, which was received with enthusiastic interest from faculty who are working on cutting edge research across many research areas in computer science, including machine learning, human-computer interaction, health research, systems and more. Of the 86 award recipients, 43% identify as an historically marginalized group within technology. Please see the full list of 2021 recipients on our web page, as well as in the list below. We offer our congratulations to this year’s recipients, and look forward to seeing what they achieve! Shedding light on fairness in AI with a new data set Facebook AI has built and open-sourced a new, unique data set called Casual Conversations, consisting of 45,186 videos of participants having nonscripted conversations. By making fairness research more transparent and normalizing subgroup measurement, we hope this data set brings the field one step closer to building fairer, more inclusive technology. The AI research community can use Casual Conversations as one important stepping stone toward normalizing subgroup measurement and fairness research. Our new Casual Conversations data set should be used as a supplementary tool for measuring the fairness of computer vision and audio models, in addition to accuracy tests, for communities represented in the data set. We applied the Casual Conversations data set to measure performance by subgroup for the top five winners of the Deepfake Detection Challenge on roughly 5,000 videos that overlap with the Casual Conversations data set in our paper. Save the date for the AWS Machine Learning Summit: June 2, 2021 On June 2, 2021, don’t miss the opportunity to hear from some of the brightest minds in machine learning (ML) at the free virtual AWS Machine Learning Summit. Machine learning is one of the most disruptive technologies we will encounter in our generation. 
This Summit, which is open to all and free to attend, brings together industry luminaries, AWS customers, and leading ML experts to share the latest in machine learning. Hear from ML leaders from across AWS, Amazon, and the industry, including Swami Sivasubramanian, VP of AI and Machine Learning, AWS; Bratin Saha, VP of Machine Learning, AWS; and Yoelle Maarek, VP of Research, Alexa Shopping, who will share a keynote on how we’re applying customer-obsessed science to advance ML. About the Author: Laura Jones is a product marketing lead for AWS AI/ML, where she focuses on sharing the stories of AWS’s customers and educating organizations on the impact of machine learning. How to Avoid a Rainy Wedding Day with Bayesian Statistics Do you have 162 minutes to spare? In an ideal world, all of us would shout “Yes!” in unison and proceed to consume every single item in this week’s Variable. Just as data science at its best helps people take the right action at the right time, we love guiding your reading decisions. Here’s a selection of this week’s must-reads, and why you should consider reading each. If you’re new to statistics, you’ll want to read Ines Lee’s explainer on regression to the mean because it’s crisp, clear, and well-illustrated. Python loops: Some beginner-friendly looping challenges Looping through lists: The list is a built-in data type in Python. It is the most versatile data type used in Python loops. So our first set of looping challenges is based on lists. Below is an example of a Python dictionary where the fruits are “keys” and the prices are “values”. Strings are immutable — once created they cannot be changed — but each element can be accessed with loops. Creating Scale Independent SVG Charts Data Visualization plays an important role in data analysis because as soon as the human eyes see some charts or graphs they try to find the patterns in that graph.
Data Visualization is visually representing the data using different plots/graphs/charts to find out the patterns, outliers, and relations between different attributes of a dataset. In this article, we will be exploring how to create different charts and plots using leather. pip install leather. Importing required libraries: In this step, we will import the libraries required for creating charts and plots using leather: import leather. Creating charts using leather: Next, we will start creating bars and graphs. Bar chart:
data = [(3, 'Hello'), (5, 'How'), (9, 'Are'), (4, 'You')]
chart = leather.Chart('Bars')
chart.add_bars(data)
chart.to_svg('bars.svg')
Generating “aviation regulations” with RNNs Introduction: Recurrent Neural Networks are great! Because of their ability to estimate the next point in a sequence of data, RNNs can also be used to generate original data points. In this article, I’ll showcase some of the results of applying a Recurrent Neural Network to the full set of US civil aviation regulations. Model description: This article will not go into detail on the hows and whys of RNNs. Even though the generated text reads very familiar to someone accustomed to aviation regulations, it fails to generate statements that really make sense. Introduction to Classification Using K Nearest Neighbours As machine learning practitioners, we come across a wide array of machine learning algorithms that we may use to build a predictive model. This is analogous to how the KNN classification algorithm works. However, whenever the algorithm is put into action, it needs to search the entire dataset to find the K nearest neighbours. Now that we have discussed the theoretical concepts of the KNN classification algorithm, we will apply our learning to build a classifier using the K Nearest Neighbour algorithm.
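The KNN mechanism described above fits in a few lines of from-scratch Python (a sketch; in practice one would use a library classifier and an efficient index rather than scanning the whole dataset):

```python
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points, scanning the entire dataset as described above."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(X_train, y_train)
    )
    votes = [label for _, label in distances[:k]]
    return Counter(votes).most_common(1)[0][0]


# Two well-separated clusters with labels "a" and "b".
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
```

Note the full scan over X_train on every prediction, which is exactly the cost the excerpt warns about for large datasets.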
I hope this article has helped you get a feel for how the KNN algorithm works. Logistic Regression From Scratch in Python Let’s use the following randomly generated data as a motivating example to understand Logistic Regression. For a binary classification problem, we naturally want our hypothesis ( y_hat ) function to output values between 0 and 1, meaning all real numbers from 0 to 1. One such function is the Sigmoid or Logistic function. The Sigmoid function squishes all its inputs (values on the x-axis) between 0 and 1, as we can see on the y-axis of its graph. Why did we choose the Logistic function, and why not any other? How to Create a Complex Data Science Project With 2 Lines of Code PyCaret allows you to run a whole data science project, from data cleaning and dealing with class imbalance to hyper-tuned machine learning models, with two lines of code. It goes from preparing your data to deploying supervised and unsupervised models with 2 or 3 lines of code. Since we will test the credit card default dataset, we need to type data = get_data('credit') and run the cell. You can add it as part of the process and get quick insights without having to type dozens of lines of code. Spec2Vec: The Next Step in Mass Spectral Similarity Metrics Calculating the similarity between two MS/MS mass spectra is an essential step in most untargeted metabolomics workflows. Why mass spectral similarity metrics matter: A datapoint in an untargeted metabolomics dataset is, at first, just a measured entity. This is how mass spectral similarity metrics are mainly used: Spectral library matching: matching a compound against a spectral library (such as HMDB). Mass spectral networks: clustering related compounds in a mass spectral network.
This is a well-known problem, but no fundamentally different mass spectral similarity metrics had been proposed as solutions, until Spec2Vec. Variational Inference with Normalizing Flows on MNIST Introduction: In this post, I will explain what normalizing flows are and how they can be used in variational inference and in designing generative models. Before getting into normalizing flows, it is helpful to review what variational inference is and how normalizing flows relate to it. We usually like to parameterize distributions with some parameters θ, where most of the time θ are the parameters of a neural network. In other words, variational parameters enable us to learn the model’s parameters too. Acknowledgements: [1] Rezende and Mohamed, 2015, Variational Inference with Normalizing Flows. Detecting objects in urban scenes using YOLOv5 This week, we have released the code, trained models, and all annotated images (the dataset). At the time of evaluating our options, YOLOv5 was one of the fastest and most accurate object detection models available. This strategy was adopted for both YOLOv5 (pre-trained on the MS COCO object detection dataset) and the SSD (pre-trained on the ImageNet image classification dataset).
The City of Montreal dataset’s largest image resolution is 704x480; thus, with an input image size of 704x704, no pixel information is lost. This can have a big impact on the detection performance of small objects, like pedestrians and construction cones, because they tend to be very narrow. If Data Science Feels Like a Struggle, You Might Just Be on the Right Path I decided to do a Data Science MSc in Barcelona and afterwards another Computer Science MSc at Imperial. Since data science is so broad, it attracts a group of diverse people who all know something really well. What pushed you to start writing about data science for a wider audience? But with time I started gamifying most aspects of the writing process: many of my blog posts start out as weekend side projects. Data science and machine learning are dynamic fields; are there any developments you’d be especially excited to see in the short term? Facebook AI Similarity Search Accurate, fast, and memory-efficient similarity search is a hard thing to do, but something that, if done well, lends itself very well to our huge repositories of endless (and exponentially growing) data. I will use the example of image similarity search. When we then compare the vectors of two similar images, we will find that they are very similar. Imagine trying to do that with a Google Image search: you could be waiting some time. Instead, we want to find a more efficient approach, and that is exactly what Facebook AI Similarity Search (FAISS) offers. Stacking machine learning models for speech sentiment analysis How to build a model that recognizes human sentiment from audio and text recordings. In the context of the final project of Le Wagon’s bootcamp, my team and I decided to take on a fascinating task: speech sentiment recognition.
Our data: we found a dataset from Carnegie Mellon University called CMU-MOSEI¹, which is the largest dataset of sentence-level sentiment analysis and emotion recognition in online videos. We decided to analyse both audio recordings and text transcripts to predict the sentiment behind a person’s sentence. Our intuition was that combining two models with two different sources, using multi-modal learning, could improve our performance. How Microsoft Icebreaker Addresses the Cold-Start Challenge in Machine Learning Models The new technique allows the deployment of machine learning models that operate with minimal training data. Within the machine learning research community, several efforts such as weakly supervised learning or one-shot learning have been created to address this issue. Knowing What You Don’t Know: The Ice-Start Challenge in Machine Learning. The ice-start problem refers to the amount of training data required to make machine learning models effective. Conceptually, Icebreaker relies on a deep generative model that minimizes the amount and cost of data required to train a machine learning model. Microsoft Icebreaker is an innovative model that enables the deployment of machine learning models that operate with little or no data. Data Scientists Must Embrace Mathematics The field of data science has several subdivisions such as data mining, data transformation, data visualization, machine learning, deep learning, etc. As a scientific discipline, a data science task can be broken into three main stages. Data science requires a solid foundation in mathematics and programming. YouTube contains several educational videos and tutorials that can teach you the essential math and programming skills required in data science, as well as several data science tutorials for beginners. 
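The multi-modal combination the team describes can be sketched as simple late fusion: average the class probabilities from the audio model and the text model, then take the arg-max. The probability values below are hypothetical stand-ins for real model outputs.

```python
import numpy as np

# Hypothetical per-class probabilities from an audio model and a text
# model for the same three utterances (classes: negative, neutral, positive).
audio_probs = np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.5, 0.3],
                        [0.1, 0.2, 0.7]])
text_probs = np.array([[0.5, 0.2, 0.3],
                       [0.1, 0.3, 0.6],
                       [0.2, 0.2, 0.6]])

# Late fusion: average the two probability estimates, then arg-max.
fused = (audio_probs + text_probs) / 2
preds = fused.argmax(axis=1)
```

Averaging is the simplest fusion rule; a stacking approach (as in the article’s title) would instead train a meta-model on the two sets of probabilities.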
As a data scientist, it’s important to keep in mind that the theoretical foundations of data science are crucial for building efficient and reliable models. A Complete Guide to Confidence Interval, t-test, and z-test in R for Data Scientists The confidence interval, t-test, and z-test are very popular and widely used methods in inferential statistics. But as mentioned in the title, this article will focus on using R to construct the confidence interval and perform the t-test or z-test. We will cover: 1. what the confidence interval is and a basic manual calculation; 2. the z-test of one sample mean in R; 3. the t-test of one sample mean in R; 4. … The null hypothesis can be expressed as an equality of the population mean and the hypothesized mean. If we do not find enough evidence supporting the null hypothesis, we will reject it and say that the alternative hypothesis is true. It is common to use a z.test or a t.test function to find a confidence interval in R, but remember that if you use these functions to find a confidence interval, the ‘alternative’ parameter must always be set to ‘two-sided’. Build A Web App To Interpret Your ML Predictions in Seconds With Shapash The web app contains four types of visualizations: feature contributions, feature importances, local interpretations, and test data. There are two types of explanations: global explanations and local explanations. Feature importance and feature contributions give global explanations (overall model behaviour), while local explanations give interpretations for individual predictions. Compare plot: using xpl.plot.compare_plot(), you get a compare plot which helps you understand where the differences between the predictions of several instances come from. In the meantime, you can get local explanations using the code below. 300 NLP Notebooks and Freedom This is probably a good time to take you down the rabbit hole on the state of the Super Duper NLP Repo (SDNR). 
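The article works in R, but the z-based confidence interval for a sample mean can be sketched with nothing but the Python standard library. The sample statistics below (mean 50, s.d. 10, n = 100) are made up for illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, conf=0.95):
    """Two-sided z confidence interval for a sample mean,
    assuming the population standard deviation is known."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # e.g. ~1.96 for 95%
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

lo, hi = z_confidence_interval(mean=50.0, sd=10.0, n=100)
```

For small samples with an unknown population s.d., the article’s t-test replaces the normal quantile with a t quantile on n − 1 degrees of freedom.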
If this is your first time hearing about the SDNR, it’s a handy repository of more than 300 Colab notebooks (and counting) focusing on natural language processing (NLP). What’s Under the Hood: SDNR’s eclectic collection of code varies across NLP tasks. Check out the tail. Tasks: so what kind of NLP tasks does the SDNR cover? It’s a multi-lingual CLIP. Anyway, now you know the ins and outs of the Super Duper NLP Repo! Why Python Is The Perfect Language For A Machine Learning Project Why is it so good for machine learning, and why aren’t other languages like C, C++, and Java as good a match, you ask? Why Do Those Developers Love Python in Their Machine Learning And AI Projects? Python Is Platform-Independent: Python runs on platforms such as Windows, Linux, and a whole host of others. God bless anyone who shifts from Python to a more structure-oriented language such as Java or even C and C++. All in all, Python seems to go well with almost any machine learning and artificial intelligence project you choose, at least until something better comes our way. Step-by-Step Basic Understanding of Neural Networks with Keras in Python In this article, we will discuss the simple neural network and illustrate its definition with a Keras example. Basic types of neural networks: artificial neural network, convolutional neural network, recurrent neural network. Neural networks are very good at handling non-linear data. Hidden layer: this layer sits between the input layer and the output layer. Gradient descent and stochastic gradient descent: the main aim of gradient descent is to minimize the cost (loss). GAMs and Smoothing Splines (Part-1) But in general, to approximate f(x), GAMs use smoothing splines. 
While there are many different types of splines (natural splines, cubic splines, P-splines, B-splines, regression splines), in this article I am going to discuss natural cubic splines and smoothing splines and their applications in GAMs. As GAMs use smoothing splines, let’s construct a smoothing spline based on the data. # Linear regression model on top of the smoothing-spline basis. In the part-2 article, I will try to explain 2-D interaction splines using tensor splines and how to use them in GAMs. Ensemble Methods Explained in Plain English: Bagging In this article, I will go over a popular homogeneous model ensemble method: bagging. How Bagging Works: in bagging, a large number of independent weak models are combined to learn the same task with the same goal. For regression problems, the final prediction is an average (soft voting) of the predictions from the base estimators. Implementing Bagging Algorithms with Scikit-Learn: you can build your own bagging algorithm using BaggingRegressor or BaggingClassifier in the Python package Scikit-Learn. Difficult to interpret: the final predictions of a bagging algorithm are based on the mean predictions of the base estimators. We Don’t Need To Worry About Overfitting Anymore Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. 
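The bagging mechanics described above (bootstrap resamples, independent weak models, soft-vote average) can be shown in a stdlib-only toy: each “weak model” here simply predicts the mean of its bootstrap sample, which is an invented stand-in for a real base estimator like a decision tree.

```python
import random

def bagged_predict(train, n_estimators=25, seed=0):
    """Toy bagging for regression: draw bootstrap resamples, let each
    weak model predict its resample's mean, then soft-vote (average)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        boot = [rng.choice(train) for _ in train]   # bootstrap resample (with replacement)
        preds.append(sum(boot) / len(boot))         # this weak model's prediction
    return sum(preds) / len(preds)                  # soft-voting average

y_hat = bagged_predict([1.0, 2.0, 3.0, 4.0, 5.0])
```

Scikit-learn’s BaggingRegressor does the same thing with real estimators, plus out-of-bag scoring and feature subsampling.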
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets [1]. (Source: the Sharpness-Aware Minimization paper [1].) In deep learning we use optimization algorithms such as SGD or Adam to achieve convergence in our model, which leads to finding a minimum, i.e., a point where the loss on the training dataset is low. But several lines of research, such as Zhang et al., have shown that many networks can easily memorize the training data and have the capacity to readily overfit. To prevent this problem and add more generalization, researchers at Google have published a new paper called Sharpness-Aware Minimization which provides state-of-the-art results on CIFAR10 and other datasets. In this article, we will look at why SAM can achieve better generalization and how we can implement SAM in PyTorch. Hands-on Distributed Training with Determined AI, a Breakthrough Algorithm, Coded Bias… and More! The survey analysis shows that vaccine hesitancy is persistent; data science and statistics professor Alex Reinhart discusses potential strategies to tackle it. MIT researcher Joy Buolamwini tackles this paradigm, as she discovers how facial recognition does not see dark-skinned faces accurately in the documentary Coded Bias, just launched on Netflix. If you are interested in reinforcement learning, check out this project by Google AI, showcasing how a novel algorithm teaches agents to solve tasks by providing only examples of success. Machine learning datasets are filled with labeling errors. Researchers from MIT and Amazon dive into this problem in their research paper, highlighting how 10 widely cited datasets, including ImageNet, contain an error rate of 3.4%. Develop a Neural Network for Cancer Survival The excerpt’s code determines the number of input features with X.shape[1], then defines the model with model = Sequential(). 
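SAM’s min-max idea from the excerpt above (first ascend to the worst-case point inside a ρ-ball, then descend using the gradient taken there) can be sketched without any deep learning framework. The toy quadratic loss and the hyperparameter values are illustrative, not from the paper.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization update:
    (1) perturb w by eps = rho * g / ||g|| (the inner ascent step),
    (2) descend using the gradient evaluated at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # worst-case direction
    g_sharp = grad_fn(w + eps)                    # gradient at w + eps
    return w - lr * g_sharp

# Toy loss L(w) = ||w||^2, so grad L(w) = 2w.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda w: 2 * w)
```

In PyTorch the same two-step update is implemented by taking two forward/backward passes per batch, restoring the original weights between them.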
Use computer vision to detect crop disease through image analysis with Amazon Rekognition Custom Labels In this post, we showcase how you can build an end-to-end disease detection, identification, and resolution recommendation solution using Amazon Rekognition Custom Labels. Amazon Rekognition Custom Labels, an automated ML feature of Amazon Rekognition, lets you quickly train custom CV models specific to your business needs, simply by bringing labeled images. To create our custom model, we follow these steps: create a project in Amazon Rekognition Custom Labels. Amazon Rekognition Custom Labels lets you manage the ML model training process on the Amazon Rekognition console, which simplifies the end-to-end model development and inference process. For more information about using custom labels, see What Is Amazon Rekognition Custom Labels? Rendering Responsive Text on Video using Python Introduction: as I mentioned earlier, we will be using just the OpenCV package in this project. When working on a machine learning or computer vision project like this, we need to install the package first. OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. Feel free to search “how to install OpenCV on Windows or macOS” if you want to know more about the installation. I import the OpenCV package that we just installed into the program with an import statement. A Comprehensive Python Implementation of GloVe As an NLP data scientist, I frequently read papers on topics varying from word vectors and RNNs to transformers. So I carried out a comprehensive Python implementation of the model, which aligns with the goal of training a huge vocabulary with only a single machine. 
The key is (word i’s index, word j’s index), where word j appears in the context of word i. Index 0 corresponds to the most frequent token, index 1 to the second most frequent token, and so on. The most obvious way is to write the (word i’s index, word j’s index, count) triplets into a shared text file between scans. How to Set up a Machine Learning Model for Legal Contract Review This makes sense because we want the model to identify certain parts of a contract by asking it a question (e.g. …). With this knowledge we can load the model and its tokenizer (which prepares the inputs for the model) like so: … Note that we set the use_fast parameter to False. The reason is that the QuestionAnswering (Q&A) model is not (yet) compatible with fast tokenizers, which have smarter overflow handling. This is totally fine for our purpose; we just have to remember to set the parameter accordingly. Now that we have loaded the model, we can quickly test it with a sample. Pandas MultiIndexing And Working With Time Series Data Before we can do any predictive modeling or analytics, we first need to clean and format our data. It’s much easier to analyze time series data such as stock prices in tabular format where each row is a date and each column is a stock. If you run the following line of code on our data above (stored in the dataframe called data), it creates a multi-index for data: data = data.set_index(['ticker','date']). We’ve chosen to index by both stock ticker and date, hence multi-indexing, because we are indexing by more than one column. Pandas really does make analyzing time series data a lot easier — no wonder it’s such a staple in data analysis. Deploying a basic Streamlit app to Heroku Streamlit is an app framework to deploy machine learning apps built using Python. 
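The set_index call from the multi-indexing excerpt above can be run end to end on a tiny frame; the tickers, dates, and prices here are invented for illustration.

```python
import pandas as pd

# Hypothetical stock data in long (row-per-observation) format.
data = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "date": ["2021-05-14", "2021-05-17", "2021-05-14", "2021-05-17"],
    "close": [127.45, 126.27, 248.15, 246.23],
})

# Index by both ticker and date, exactly as in the article.
data = data.set_index(["ticker", "date"])

# With a MultiIndex, .loc on the outer level selects one ticker's rows.
aapl = data.loc["AAPL"]
```

Once the MultiIndex is in place, unstacking the ticker level gives the date-by-stock table the article describes as the easier format to analyze.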
About the app: this Streamlit app simulates the Central Limit Theorem as the characteristics of the sample and population change. Varying these characteristics of the population and sample affects the population parameters, sample statistics, population distribution and the distribution of sample means. central_limit_theorem.py: this Python script (a .py file, not a notebook) contains the code of the Streamlit app. In our ‘Procfile’, we first run ‘setup.sh’, which creates the required config files, and then run the app using the ‘streamlit run’ command. Beginner’s Guide to XGBoost for Classification Problems How to preprocess your datasets for XGBoost: apart from basic data cleaning operations, there are some requirements for XGBoost to achieve top performance. The dataset contains weather measures from 10 years across multiple weather stations in Australia. Next, let’s deal with missing values, starting by looking at their proportions in each column; if the proportion is higher than 40%, we will drop the column. Three columns contain more than 40% missing values. If you are not familiar with them, check out my separate article for the complete guide on them. The only thing missing is the XGBoost classifier, which we will add in the next section. Make Pandas Run Blazingly Fast INFO: Pandarallel will run on 2 workers. When you run this on your machine, the number of workers will likely be higher, and so will the speedup. Now, let’s create a function to apply to this data frame. Conclusion: with just a simple change to the function call, we get a great speedup for our Pandas code. This is great, since data processing takes a lot of time and keeps us from analyzing the data. Linear Tree: the perfect mix of Linear Model and Decision Tree When fitting a Decision Tree, the goal is to create a model that predicts the value of a target by learning simple decision rules based on several input variables. 
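The Central Limit Theorem behaviour the Streamlit app visualizes can be reproduced in a few stdlib-only lines: draw repeated samples from a deliberately skewed population and watch the distribution of sample means tighten to σ/√n. The exponential population and the sample sizes are choices made here for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)

# A skewed (exponential) population with mean and s.d. both ~1.
population = [random.expovariate(1.0) for _ in range(100_000)]

def sample_means(n, draws=2_000):
    """Distribution of means of `draws` samples of size n."""
    return [mean(random.sample(population, n)) for _ in range(draws)]

means_30 = sample_means(30)
# CLT prediction: mean ~ 1.0, spread ~ sigma / sqrt(30) ~ 0.183,
# and the shape is approximately normal despite the skewed population.
```

This is exactly the effect the app exposes interactively as you vary the sample size and population shape.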
They are wrappers that build a Decision Tree on the data, fitting a linear estimator from sklearn.linear_model. In other words, we only have to choose a linear model to build our Linear Tree. Linear Tree for regression: in this section, we use a Linear Tree to model a regression task. Linear Tree Regressor at various depths (image by the author). It’s clearly visible that the Linear Tree performs a linear approximation within the splits. Simple Implementation of OpenAI CLIP model: A Tutorial But there is no limitation, and we can use it to train a CLIP model as well. Dataset: as you can see in the title image of this article, we need to encode both the images and the texts describing them. In the forward function, we first encode the images and texts separately into fixed-size vectors (with different dimensionalities). Finding matches: this function does the final task we wished our model to be capable of: it takes the model, the image embeddings, and a text query. The model knows the meaning of “two” and brings back images that have two dogs in them, in contrast to the previous query! iOS Computer Vision Object Detection with Turi Create An overview of an iOS computer vision project that will take less than a day to build and deploy, by Kevin McGovern. Computer vision projects are everywhere. Apple has done an amazing job making computer vision for iOS more approachable with its Create ML tool. Based on the title of this post you can probably assume this is where Turi Create enters the fold. Once your data is in an SFrame, you’re ready to start building your model using Turi Create. Once that setting has been changed, you still need to get Turi Create to recognize your GPU. 
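The matching step in the CLIP excerpt above boils down to normalizing both sets of embeddings and taking scaled dot products. The tiny 4-d embeddings below are invented stand-ins for the encoders’ real outputs, and the temperature value is illustrative.

```python
import numpy as np

def clip_style_logits(image_emb, text_emb, temperature=0.07):
    """CLIP-style contrastive logits: L2-normalize both sets of
    embeddings, then take scaled cosine similarities."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return img @ txt.T / temperature

# Hypothetical embeddings for two image/caption pairs.
images = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
texts = np.array([[0.9, 0.1, 0.0, 0.0],
                  [0.1, 0.9, 0.0, 0.0]])

logits = clip_style_logits(images, texts)
matches = logits.argmax(axis=1)   # best-matching caption for each image
```

During training, the same logits matrix is pushed toward the identity via a symmetric cross-entropy loss; at query time, the arg-max over text-to-image similarities is what “finding matches” does.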
Large-scale forecasting: Self-supervised learning framework for hyperparameter tuning What the research is:A new self-supervised learning framework for model selection (SSL-MS) and hyperparameter tuning (SSL-HPT), which provides accurate forecasts with less computational time and resources. This SSL-HPT algorithm estimates hyperparameters 6-20x faster when compared with baseline search-based algorithms, while producing comparably accurate forecasting results in various applications. Most existing hyperparameter tuning methods — such as grid search, random search, and Bayesian optimal search — are based on one key component: search. Because of this, they are computationally expensive and cannot be applied to fast, scalable time series hyperparameter tuning. How it works:We developed the self-supervised learning framework with two main tasks in mind in the forecasting domain: SSL-MS and SSL-HPT. Join AWS at NVIDIA GTC 21, April 12–16 Starting Monday, April 12, 2021, the NVIDIA GPU Technology Conference (GTC) is offering online sessions for you to learn AWS best practices to accomplish your machine learning (ML), virtual workstations, high performance computing (HPC), and Internet of Things (IoT) goals faster and more easily. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training, cost-effective ML inference, flexible remote virtual workstations, and powerful HPC computations. At the edge, you can use AWS IoT Greengrass and Amazon SageMaker Neo to extend a wide range of AWS Cloud services and ML inference to NVIDIA-based edge devices so the devices can act locally on the data they generate. AWS is a Global Diamond Sponsor of the conference. 
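For contrast with the SSL-HPT approach above, here is a minimal sketch of the random-search baseline it is compared against: sample configurations, evaluate each, keep the best. The search space and toy objective are invented for illustration.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Plain random search over a discrete hyperparameter space:
    the search-based baseline that scales poorly for time series HPT."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)          # one full train/evaluate cycle per trial
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

space = {"lr": [0.001, 0.01, 0.1], "window": [7, 14, 28]}
# Toy objective: pretend lr=0.01, window=14 is the sweet spot.
obj = lambda c: abs(c["lr"] - 0.01) + abs(c["window"] - 14) / 28
best, score = random_search(obj, space)
```

Every trial here costs a full model fit, which is exactly the expense SSL-HPT avoids by predicting good hyperparameters directly from time series features.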
Available sessions: ML infrastructure; ML with Amazon SageMaker; ML deep dive; high performance computing; Internet of Things; edge computing with AWS Wavelength; automotive; computer vision with AWS Panorama; game tech. Visit AWS at NVIDIA GTC 21 for more details and register for free for access to this content during the week of April 12, 2021. Text data representation with one-hot encoding, Tf-Idf, Count Vectors, Co-occurrence Vectors and Word2Vec To bridge this gap, a lot of research has gone into creating numerical representations for text data. In this article, we will explore some of them: one-hot encoding, count vectors, Tf-Idf, co-occurrence vectors and Word2Vec. At the end of the two steps, we can finally get the one-hot encoding representation of all three of our reviews (R1 to R3). As you can notice, the drawbacks of count vectors are similar to one-hot encoding in terms of the size of the vector representing each document. (Figure: the skip-gram architecture applied to the example.) Yes, I got the general idea of Word2Vec, but how is the semantic relationship captured from the vectors? Creating A Data Science And Machine Learning Portfolio Using Notion The introductory section of this portfolio provides visitors with an insight into my professional background, motivations and speciality. I’ve written two paragraphs that briefly detail my expertise within machine learning, including specific techniques I’ve worked on. Adding these links provides visitors with other avenues that present my professional self. To close this section, I’ve included links to navigate to other areas within the portfolio. Some visitors might be interested in projects I've worked on, while others might want to view my work experience. 
BBN: Bayesian Belief Networks — How to Build Them Effectively in Python Contents: introduction to Bayesian Belief Networks (BBN) and Directed Acyclic Graphs (DAG); a Bayesian Belief Network Python example using real-life data (Directed Acyclic Graph for weather prediction; data and Python library setup; BBN setup; using the BBN for predictions); conclusions. A Bayesian Belief Network (BBN) is a Probabilistic Graphical Model (PGM) that represents a set of variables and their conditional dependencies via a Directed Acyclic Graph (DAG). Directed Acyclic Graph for weather prediction: let’s use Australian weather data to build a BBN. You will see how we calculate these using our weather data in the next few sections. Directed Acyclic Graph (DAG) for a Bayesian Belief Network (BBN) to forecast whether it will rain tomorrow. Here is a snapshot of the data: a snippet of Kaggle’s Australian weather data with some modifications. How to make an effective coreference resolution model Written by Marta Maślankowska and Paweł Mielniczuk. Introduction: in this article, we present how to improve AllenNLP’s coreference resolution model in order to achieve a more coherent output. Ready-to-use yet incomplete: both Huggingface and AllenNLP coreference resolution models would be a great addition to many projects. Solving the cataphora problem: many coreference resolution models, such as Huggingface’s, have serious problems with detecting cataphora, as it is a rather rare occurrence. Hopefully, by now you are familiar with coreference resolution and can easily adapt our proposed solutions to your project! 
Different Metrics to Evaluate Binary Classification Models and Some Strategies to Choose the Right One This article is a comprehensive overview of the different metrics for evaluating binary classification models and some strategies for choosing the right one for your use case. We will try to answer this question after discussing the different evaluation metrics for a binary classification model. You want to look at each patient's health information and determine whether the corresponding patient has heart disease or not. Since the True Positive and False Positive rates are both between 0 and 1, the AUC is also between 0 and 1. Strategies for choosing the right metric for your model: you know how to evaluate your classification models using different metrics, but which one should you choose for your use case? A simple way to understand Association Rule from the Customer Basket Analysis Use Case Introduction: the goal of this article is to explain the association rule applied to the customer basket analysis use case. We will go through the steps of defining what an association rule is, identify the main ways of measuring it, and also mention its advantages and drawbacks. Application of the association rule to our use case: first, it is important to note that an association rule has two parts: an antecedent (if) and a consequent (then). By applying the association rule to our previous use case, we can form the following expression with the example of coffee and milk. If the lift score equals 1, it means that there is no association between C and S. Coffee is the antecedent. Pandas Basics Cheat Sheet (2021), Python for Data Science The Pandas library is one of the most powerful libraries in Python. It is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language. 
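The support, confidence, and lift measures behind the coffee-and-milk rule above can be computed directly from transactions; the five toy baskets are invented for illustration.

```python
# Hypothetical baskets for the "if coffee then milk" rule.
baskets = [
    {"coffee", "milk"},
    {"coffee", "milk"},
    {"coffee"},
    {"milk"},
    {"bread"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

# Confidence of "coffee -> milk", and its lift against milk's base rate.
conf = support({"coffee", "milk"}) / support({"coffee"})
lift = conf / support({"milk"})
```

Here lift is above 1, indicating that buying coffee raises the chance of buying milk relative to its base rate; lift equal to 1 would mean no association, as the article notes.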
Pandas Data Structures: there are two main types of data structures that the Pandas library is centered around. The first is a one-dimensional array called a Series, and the second is a two-dimensional table called a DataFrame. Series — a one-dimensional labeled array: s = pd.Series([3, -5, 7, 4], index=['a','b','c','d']) produces a Series with the values 3, -5, 7 and 4 labeled a through d. DataFrame — a two-dimensional labeled data structure. Math You Need to Succeed In ML Interviews I should preface this blog with the intended audience: it is for those who are interested in machine learning engineering interviews, not research scientist (or similar roles like data science). If you are like me when studying for your MLE interview, you start to read books and blogs and take notes. However, there are certain equations that we need to be aware of and should use when answering machine learning questions. I have compiled a (non-exhaustive) list of equations and explanations that you must know when studying for MLE interviews. It is written with the general ML engineer in mind; for specific fields you would need to know other equations. 
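The cheat sheet’s Series example can be paired with a DataFrame counterpart to show both core structures side by side; the country/capital values are illustrative.

```python
import pandas as pd

# Series: one-dimensional labeled array (as in the cheat sheet).
s = pd.Series([3, -5, 7, 4], index=["a", "b", "c", "d"])

# DataFrame: two-dimensional labeled table, built from a dict of columns.
df = pd.DataFrame({
    "country": ["Belgium", "India", "Brazil"],
    "capital": ["Brussels", "New Delhi", "Brasilia"],
})

value = s["c"]                 # label-based lookup on a Series
capitals = df["capital"]       # column selection returns a Series
```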
KERAS: Under The Hood Introduction: getting started with deep learning has become very simple and convenient, all thanks to the wonderful duo of Keras and TensorFlow. Keras has made such an amazing abstraction that even a total stranger to the topic can start training their own deep learning models. input_layer = Input(shape=(5,)); dense_layer = MyDenseLayer(nodes=10); dense_layer_op = dense_layer(input_layer); model = Model(inputs=input_layer, outputs=dense_layer_op). Now let us look at the architecture and try to understand it line by line. The second line creates an object of our custom layer class; this line basically defines the number of nodes in our custom layer. These methods are actually already defined in the base Layer class, and they have the special roles to play that we discussed above. Neural Network Models for Combined Classification and Regression Tutorial overview: this tutorial is divided into three parts: (1) a single model for regression and classification; (2) separate regression and classification models (the Abalone dataset, a regression model, a classification model); (3) combined regression and classification models. Single model for regression and classification: it is common to develop a deep learning neural network model for a regression or classification problem, but on some predictive modeling tasks we may want to develop a single model that can make both regression and classification predictions. The excerpt’s repeated code splits the data into input and output variables with X, y = dataset[:, 1:-1], dataset[:, -1]. 
Use Google Colab for Better Machine Learning Why use Google Colab? Google Colab improves on the Jupyter Notebook in many ways. You load, edit, and save any .ipynb file to the Google Drive associated with the Colab login. It is helpful to have a separate Google account, and thus a different Google Drive, for each project. A Colab notebook has many useful extensions of a Jupyter Notebook. Machine Learning Model Visualization Creating machine learning models is a day-to-day task for an ML engineer, who can easily understand and interpret them to derive useful information; but it is difficult for a person outside the data science field to understand a machine learning model and what it is trying to say. In this article, we will explore Shapash and its features by creating a dashboard for a given dataset. Let’s get started. Installing required libraries: we will start by installing Shapash using pip; you can run the command given below to install it. Here you can see how easily we created a dashboard using Shapash, with different visualizations we can use to clearly interpret and understand the machine learning model. This is how we can use Shapash to create dashboards for machine learning models. Everything to Know About Convolutional Neural Networks All these things are made possible by “computer vision”, which is nothing but machine learning using convolutional neural networks. What is a convolutional neural network: a convolutional neural network, or CNN, is a kind of neural network used to process data whose input has a 2D matrix shape, like images. Before building the model, we need to understand and learn a few important concepts of convolutional neural networks. There are three main significant units in convolutional neural networks, i.e. … 
Conclusion: I hope this article helps you understand and grasp the concepts of convolutional neural networks. Extending Julia’s Operators With Amazing Results The following is the code that already exists in the latest version of Lathe: pipelines can contain a predictable Lathe model with preprocessing that occurs automatically. This is done by putting X array processing methods into the iterable steps, and then putting your Lathe model in. export LatheObject. Now we will add using to both my preprocess.jl file and the models.jl file. This is because the operators from Julia’s base are going to be used to mutate the Pipeline object that we pass with it. Data/Applied/Research Scientist, ML Engineer: What’s the Difference? Now in 2020, this catch-all role is more often split into multiple roles such as data scientist, applied scientist, research scientist, and machine learning engineer. I used to get questions like “What does a data scientist do?” Now, I get questions such as “What does a data/applied/research scientist do?” Then in April 2018, Lyft rebranded their data analysts as data scientists, and data scientists as research scientists. Will this escalate, with companies losing machine learning candidates if they offer the data scientist title while competitors offer the research scientist or machine learning engineer title? If you’re thinking of splitting the data scientist role into distinct specializations, please also consider the benefits of having data scientists be more end-to-end. Data Science in a world of radical uncertainty Book cover for Radical Uncertainty by John Kay and Mervyn King, sourced from Waterstones. The authors John Kay and Mervyn King partition all uncertainty into “resolvable uncertainty” and “radical uncertainty”. The book refers to this as a small world, and in this world, radical uncertainty doesn’t exist. 
Many of the positive traits we assign to humans, from creativity to love, create and require radical uncertainty. Innovation creates radical uncertainty, and radical uncertainty creates space for innovation. The book "Radical Uncertainty" is a guide to understanding how radical uncertainty impacts every aspect of our lives, and why it is a force for good.

Progressively approaching Kaggle

Photo by Brett Jordan on Unsplash. The Titanic Competition is most people's first attempt at getting started on Kaggle. It has a wonderful archive of resources, but if you're looking for something newer, quicker, and progressive that gets you acquainted with Kaggle competitions, then the Tabular Playground Series is a fantastic place to start. It has a beginner-friendly setup to help Kagglers get comfortable with Kaggle competitions. It gives an end-to-end experience of how competitions work and quickly builds your confidence to explore the mainstream competitions. 2: Explore models. Here's your chance to experiment and build a plethora of models to figure out which ones work best.

Julia's Fantastic CUDA Implementation

We will touch more on what this dispatch means later. Consider the addition of these two arrays:

x = [5, 10, 15, 20]
y = [5, 10, 15, 20]
x .+ y

Using the CUDA package, we can simply cast the CuArray type over our two arrays:

using CUDA
x = CuArray([5, 10, 15, 20])
y = CuArray([5, 10, 15, 20])
x .+ y

CUDA Dispatch: as with most packages available to Julia programmers, this package takes advantage of Julia's multiple dispatch. This means that in a lot of cases, the arithmetic can be performed exactly as it would be without CUDA. I think this is really cool, because I have genuinely never worked with such an easy-to-use implementation of such a great parallel processing platform.
5 Reasons Why I Left the AI Industry

We may never achieve artificial general intelligence. I've already mentioned the term artificial general intelligence (AGI). They say machines need to learn without labels, as kids do, using self-supervised learning (also called unsupervised learning). However, there's too much we don't understand about the brain yet to try to build AGI. We fear what we don't understand. And, by definition, we won't understand AGI.

Questions to ask your interviewer for Engineering/applied scientist roles

Preparing for your interviews is one aspect of a job search; getting the job that you want is another. I know the question "So, do you have any questions for me now?" after an hour-long interview can be daunting. Most questions are generic, but there are some specific questions for data scientist, ML engineer, applied scientist, data engineer, and research engineer roles. Are these very applied, or are there some fundamental research questions to be answered? This helps in maintaining an external portfolio and gives an idea about the type of work your team is involved in.

Is F1 the appropriate criterion to use? What about F2, F3, …, F beta?

In this post, we will review the F measures. By Barak Or. Intro: according to many data scientists, the most reliable model performance measure is accuracy. Preliminary: confusion matrix, precision, and recall. Confusion matrix (Image by author). The confusion matrix summarizes the performance of a supervised learning algorithm in ML. Summary: in this post, I reviewed the F measures. I hope that the provided material will help those dealing with classification tasks and motivate them to use the F measures along with accuracy.
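The F-beta family discussed above reduces to a one-line formula over precision and recall. A minimal sketch with made-up precision/recall values, not the article's numbers:

```python
# F-beta from precision and recall: beta > 1 weights recall more heavily,
# beta < 1 weights precision more heavily, and beta = 1 gives the familiar F1.
def f_beta(precision, recall, beta=1.0):
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.5, 1.0
print(round(f_beta(p, r, beta=1), 3))  # 0.667 -- the balanced F1
print(round(f_beta(p, r, beta=2), 3))  # 0.833 -- F2 rewards the high recall
```

This is why F2 suits tasks where missing positives is costly (recall-critical), while F0.5 suits tasks where false alarms are costly.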
You are underutilizing shap values — feature groups and correlations

Also, it is easy to see that variables related directly to rain or humidity are more important than those related to wind, pressure, and especially temperature. It makes a lot of sense that the features related to the accident itself are the most impactful for our question ("was the accident fatal?"). I have separated the accident variables into three groups: before, during, and after. This has to be done with caution, since it is not a causal model, but I do not want to go too deep into the data itself, since my purpose is just to give some ideas about different shap analyses. The shap correlation analysis has the very useful property of being undisturbed by such differences.

Python Bites: Manipulating Text Files

Photo by Patrick Fore on Unsplash. Text files are commonly used to store data. Python provides versatile functions and methods to handle text files. There are many options to create a text file.

%%writefile myfile.txt
Python is awesome
This is the second line
This is the last line

We now have a text file named "myfile.txt" in the current working directory. In order to read the content of a text file, we first need to open it.

Neural network from TENET exploiting time inversion

(3) Simple recurrent neural network for an open dynamical system. It's quite a complicated model which requires taking into account both the internal autonomous system state and external inputs. Weight matrix A: (9) Weight matrix update in BPTT. The recurrent neural network with teacher forcing described above is called an Error Correction Neural Network (ECNN) or Historical Consistent Neural Network (HCNN) [2]. [TENET] Superposition of Causal and Retro-Causal neural networks: finally, we symmetrically combine the normal-time and inverted-time networks into a single network [3]. (11) Causal & Retro-Causal neural networks combined. Training this network again requires BPTT with teacher forcing.
(12) Teacher forcing for Causal & Retro-Causal networks combined. My implementation of this part in PyTorch: https://github.com/uselessskills/hcnn/blob/master/tutorials/crc_hcnn.ipynb. Conclusion: we began from the time inversion idea presented by TENET and then mapped it to a model fueled by recurrent neural networks.

Killer Data Processing Tricks For Python Programmers

This is because, in my personal opinion, mapping is something that comes in handy quite frequently in Python. This is especially true in scenarios where there is a lot of data at play, as the map method can be quite efficient when put to use against virtually any data problem. Consider the following list:

data = [5, 10, 15, 20]

Our goal with this list is to map a mathematical change to it. In other words, we can index a data frame with a condition in order to separate data based on attributes.

import pandas as pd

Consider the following data frame:

df = pd.DataFrame({"A": [5, 10, 15, 20],
                   "B": ["tall", "short", "tall", "short"]})

We could index this data frame with any conditional statement.
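The mapping idea above, as a minimal runnable sketch; the list-comprehension filter mirrors the conditional-indexing trick without requiring pandas:

```python
# Mapping a mathematical change over a list with map(), as in the example above.
data = [5, 10, 15, 20]
doubled = list(map(lambda x: x * 2, data))
print(doubled)  # [10, 20, 30, 40]

# The same idea as a filter: keep only elements meeting a condition,
# which mirrors indexing a DataFrame with a conditional statement.
large = [x for x in data if x > 10]
print(large)  # [15, 20]
```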
Machine Learning Projects on the Cloud — Key Steps in the Process

The Data Science Process Using Cloud Based Systems. Image by Author, inspired by https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview. The image above illustrates the cloud-based data science process, and at a high level it looks similar to the good old data science process. The data science application may be looking to develop an automation or application which can be used as part of the business process. It is important to have this in place, as data science projects are highly evolutionary in nature and documentation on agreement and scope is critical. The con of the cloud-based approach is that machine learning projects cannot simply be piloted quickly. References: this article is based on Microsoft documentation on the Azure platform for machine learning, data science, AutoML, deployment, etc.

Google releases EfficientNetV2 — a smaller, faster, and better EfficientNet

Photo by Fab Lentz on Unsplash. "With progressive learning, our EfficientNetV2 significantly outperforms previous models on ImageNet and CIFAR/Cars/Flowers datasets." The issue is that when it was previously used, the same regularisation effect was applied to different image sizes. This is why they use training-aware NAS to dynamically search for the best combination of fused and regular MBConv layers [1]. It also shows that a smaller expansion ratio for the MBConv layers (along the network) is more optimal. They also add a scaling rule to restrict maximum image sizes, as EfficientNets tend to scale up image sizes aggressively [1], leading to memory issues.

A.I. Law

Did you ever realise that your next lawyer might be an Artificial Intelligence (AI) system, debating at the speed of light with an AI judge?
Photo by Michael Dziedzic on Unsplash. Finally, AI can also help law firms peep into the crystal ball of future court cases. As can be seen from these few examples, AI lawyers will become instrumental and offer a competitive advantage to today's law firms. They will be crucial to making mundane jobs more efficient through AI augmentation (where an AI supports the lawyer's work). Of course, this change won't happen overnight, but to gain a competitive advantage, now is the right time to invite AI legal partners to join law firms.

TensorFlow Model Deployment using FastAPI & Docker

Preprocessing: we use TensorFlow's TextVectorization layer, which tidies things up and outputs a layer that we will use in the process of creating a graph on a Sequential or Functional model. Our model cannot be saved in '.h5' format since we are using the TextVectorization layer. FastAPI: before we start creating APIs, we need a particular directory structure that will be used for creating a Docker image (tf_keras_imdb/ is the SavedModel from TensorFlow; main.py is the Python file for creating REST APIs using the FastAPI framework):

|--- model
|      |--- tf_keras_imdb/
|--- app
|      |--- main.py
|--- Dockerfile

Whenever we build an API using FastAPI, we use pydantic to set the type of input our API expects. Uvicorn is a lightning-fast ASGI server implementation, which creates a server on our host machine and lets our API serve the model. To create a Docker image and deploy it, we run the following commands, and voila!

FLORES researchers kick off multilingual translation challenge at WMT and call for compute grants

But unfortunately these benefits are currently limited to languages where enough data exists to create modern translation systems. These are commonly referred to as low-resource languages, and we've previously released the FLORES data sets and open-source models to enable further research into these under-resourced languages.
To spur progress on the challenge of low-resource translation, we are launching the Large-Scale Multilingual MT Track at the Workshop for Machine Translation (WMT), the premier academic translation competition. Researchers will be asked to submit a short statement on how they plan to use the GPU compute to advance research in low-resource machine translation with the FLORES data set in WMT 2021. With FLORES and the associated WMT multilingual track and compute grants, we hope that the machine translation community will be able to make more rapid progress on low-resource translation and help create a more connected and communicative world.

Build a CI/CD pipeline for deploying custom machine learning models using AWS services

CodePipeline invokes Step Functions and passes the container image URI and the unique container image tag as parameters to Step Functions. If the recipient accepts the changes, an Amazon API Gateway endpoint invokes a Lambda function with an embedded token that references the waiting Step Functions step. This link passes the embedded token and type to the API Gateway or Lambda endpoint as request parameters to progress to the next Step Functions step. Deploy the model with a CI/CD pipeline: in this section, you use the CI/CD pipeline to deploy a custom ML model. CodePipeline moves to the next stage to trigger a Step Functions step (which you created earlier).

Underrated Metrics for Statistical Analysis

Sign Test: the first metric I would like to touch on is the sign test, a statistical test that uses the binomial distribution to return a probability. Although the sign test has its shortcomings in terms of performance, it is certainly a great test to have in your arsenal. The normal distribution is a distribution for which the PDF is incredibly easy to calculate. Wilcoxon Rank-Sum: another underestimated statistical test is the Wilcoxon rank-sum test.
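The sign test mentioned above can be computed directly from the binomial distribution. A minimal sketch (toy counts, not the article's data): with n non-tied pairs of which k have positive differences, the two-sided p-value comes from the Binomial(n, 0.5) tails:

```python
from math import comb

# Two-sided sign test: under the null, positive and negative differences are
# equally likely, so the count of positives is Binomial(n, 0.5).
def sign_test_p(k, n):
    tail = min(k, n - k)
    # Probability of a result at least this extreme in one tail.
    p_one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)

# 9 positive differences out of 10 pairs:
print(round(sign_test_p(9, 10), 4))  # 0.0215
```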
5 Tools to Speed Up Your Data Science Project Progress

Photo by Hunter Haley on Unsplash. When you first get into the realm of data science, you will probably be all by yourself. After all, looking for data science tools is like a never-ending spiral; once you get in, it may take hours — sometimes days — to get out! №2: DataRobot. Whether you're new to data science or experienced, this next tool is for you. In this article, I recommended 5 tools that offer great help to teams working on data science projects. These tools will help you with data cleaning, data analysis, and even building, training, and testing machine learning models.

Compositional AI: The Future of Enterprise AI

Abstract. In this work, we will present the emerging paradigm of Compositional AI, also known as Compositional Learning. Compositional AI envisions the seamless composition of existing AI/ML services to provide a new (composite) AI/ML service, capable of addressing complex multi-domain use-cases. AI Service: with this background, let us go back to the drawing board and try to define an 'AI Service'. Unfortunately, similar integration/fusion tools for AI services are lacking today — a critical requirement for Compositional AI.

Terraform + SageMaker Part 1: Terraform Initialization

You won't necessarily be able to use your AWS Terraform scripts on those other platforms, but because Terraform is its own scripting language, that knowledge carries over very well to the other platforms. Terraform State Management: when we create resources on our platform of choice — AWS in our case — Terraform manages what it has provisioned in the form of a Terraform state file.
The Terraform state file is generally a file that ends in the .tfstate suffix and maintains the information about everything that you've provisioned to date with your Terraform scripts. By running the terraform validate command, Terraform will make sure that everything is squared away with your Terraform files without actually provisioning anything.

Improve Warehouse Productivity using Spatial Clustering with Python Scipy

(2) Order Lines DataFrame. Function: calculating the number of single-line orders per storage location (%). (1) Distribution of single-line order lines per storage location — 5,000 order lines (%). Insights: let us take the example of the distribution above. Scope: 5,000 order lines for 23 aisles. Single-line orders: 49% of orders located in aisles A11, A10, and A09. Optimization: picking-location clustering using Scipy. (3) Order lines processing for order wave picking using clustering by picking location. Idea: picking-location clusters — group picking locations into clusters to reduce the walking distance for each picking route. Example: location clustering within a 25 m distance (5,000 order lines). (6) Left [Clustering using Walking Distance] / Right [Clustering using Euclidian Distance]. The left example, using walking distance, groups locations within the same aisle, reducing picking-route distance, while the right example can group locations covering several aisles. Optimization: clustering picking locations (x_i, y_i) using Scipy for multi-line orders. 1 | Function: centroid for every multi-line order. Unlike single-line orders, multi-line orders can cover several picking locations. Method 1: clustering for mono-line orders reduces the walking distance by 34%.
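The centroid step for multi-line orders described above can be sketched in plain Python; the (x, y) picking locations here are made up for illustration and are not the article's data:

```python
# For a multi-line order covering several picking locations, summarize the
# order by the centroid (mean x, mean y) of its locations; orders with
# nearby centroids can then be clustered into the same picking wave.
def order_centroid(locations):
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# A hypothetical 3-line order with locations in two aisles:
order_lines = [(0.0, 2.0), (4.0, 2.0), (2.0, 5.0)]
print(order_centroid(order_lines))  # (2.0, 3.0)
```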
Ridge, LASSO, and ElasticNet Regression

LASSO, Ridge, and ElasticNet regression | Photo by Daniele Levis Pulusi. This article is a continuation of last week's intro to regularization with linear regression. For ridge regression, we penalize the size of the regression coefficients. The additional penalty used to regularize is either ridge regression, which uses the so-called L2 norm, or LASSO (least absolute shrinkage and selection operator) regression, which uses the so-called L1 norm. The function that does this uses a method called 'Elasticnet'; note that ridge regression is a specific case of elastic net, and I will talk more about this later. As you can probably see, the same function is used for LASSO and ridge regression, with only the L1_wt argument changing.

Wrangling our brains: can we use data science to strengthen intuition?

How to activate intuition: when you search "how to activate intuition" online, countless blogs and articles pop up. From business magazines to relationship blogs, there is endless literature on how one might go about activating that sixth sense. The fine line between intuition and data science: there is a bit of a gray area around where intuition ends and data science begins. Which brings us to another question: what is data science?

How Do I Become a Data Scientist? The Four Basic Strategies to Learn Data Science

3) People in general learn data science best by doing data science. Even though data scientists will sometimes treat data analytics as a "diet" or "basic" version of data science, data analytics is a different field requiring different skills. Data analytics and data science also generally emphasize different fields of math: data analytics tends to rely on statistics while data science relies on linear algebra, for example. The best way to refine your data science skills is by doing data science: finding or creating contexts that push you as you practice data science.
Thus, not all universities have literal data science degrees or departments; some instead require that you enroll in a related program like computer science, statistics, or engineering to learn data science.

Fusing EfficientNet & YoloV5 — Advanced Object Detection 2 Stage Pipeline Tutorial

Photo by Greta Farnedi on Unsplash. In this article, I am going to explain a concept which I call the "2 class filter". This is an ensembling technique for object detection and classification models that was heavily used during a Kaggle competition that I have been doing for the last few weeks. Classification — EfficientNet: the next thing to do is to train a classification network on the dataset. Then we are going to check each classification prediction. I think if you want to apply it to your custom scenario, you will need to think about the cases in which the classification network's prediction can help your object detection model.

MMDetection Tutorial — An End2End State-of-the-art Object Detection Library

To download the pretrained weights, simply run these commands: wget will download the weights, and the -O flag specifies where they will be saved. Then we need to point to the dataset and provide some other attributes. Now that we are almost done with the dataset, let's do the model and then build both. But first, let's create that test annotation file that we ignored earlier. One of the best things about MMDetection is that if you want to change models, you can just point to a different configuration file, download a different checkpoint, and run the same code!

Foundations of NLP Explained Visually: Beam Search, How It Works

In this article, I will explore Beam Search and explain why it is used and how it works. We will briefly touch upon Greedy Search as a comparison so that we can understand how Beam Search improves upon it.
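As a concrete sketch of the mechanics: where greedy search keeps only the single best character at each position, beam search keeps the top-k partial sequences. The per-position character probabilities below are toy values, not the article's example:

```python
import heapq
from math import log

# A minimal beam search over per-position character probabilities.
# Each beam is a (cumulative log-probability, sequence) pair; at every step
# we expand all beams by all characters and keep only the k best.
def beam_search(prob_table, k=2):
    beams = [(0.0, "")]
    for dist in prob_table:
        candidates = [
            (score + log(p), seq + ch)
            for score, seq in beams
            for ch, p in dist.items()
        ]
        beams = heapq.nlargest(k, candidates)
    best_score, best_seq = max(beams)
    return best_seq

probs = [
    {"a": 0.6, "b": 0.4},  # character probabilities for position 1
    {"a": 0.1, "b": 0.9},  # character probabilities for position 2
]
print(beam_search(probs, k=2))  # ab
```

In a real decoder the distributions are conditioned on the sequence so far (which is where beam search genuinely beats greedy); here they are fixed per position purely to keep the sketch short.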
Beam Search — How it works: we now understand Beam Search at a conceptual level. Character probabilities for the second position (Image by Author): it generates character probabilities for the second position. Conclusion: this gives us a sense of what Beam Search does, how it works, and why it gives us better results.

Tilted Empirical Risk Minimization

A toy linear regression example illustrating Tilted Empirical Risk Minimization (TERM) as a function of the tilt hyperparameter $$t$$. In machine learning, models are commonly estimated via empirical risk minimization (ERM), a principle that minimizes the average empirical loss on observed data. In contrast, in this post, we describe our work on tilted empirical risk minimization (TERM), which provides a unified view of the deficiencies of ERM (Figure 1). In our work (ICLR 2021), we aim to address the deficiencies of ERM through a simple, unified framework — tilted empirical risk minimization (TERM). Discussion: our work explores tilted empirical risk minimization (TERM), a simple and general alternative to ERM, which is ubiquitous throughout machine learning.

Iterated Local Search From Scratch in Python

Tutorial overview: this tutorial is divided into five parts; they are: What Is Iterated Local Search; Ackley Objective Function; Stochastic Hill Climbing Algorithm; Stochastic Hill Climbing With Random Restarts; Iterated Local Search Algorithm. What is Iterated Local Search? Iterated Local Search, or ILS for short, is a stochastic global search optimization algorithm. It is related to, or an extension of, stochastic hill climbing and stochastic hill climbing with random restarts. We will use this as the basis for implementing and comparing a simple stochastic hill climbing algorithm, stochastic hill climbing with random restarts, and finally iterated local search.
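The stochastic hill climbing loop at the heart of ILS can be sketched in a few lines. This is an illustrative sketch on the 1-D Ackley function, not the tutorial's code, with made-up step size and iteration count:

```python
import math
import random

# 1-D Ackley function: many local optima, global minimum f(0) = 0.
def ackley(x):
    return (-20.0 * math.exp(-0.2 * math.sqrt(x * x))
            - math.exp(math.cos(2 * math.pi * x)) + 20.0 + math.e)

# Stochastic hill climbing: propose a Gaussian perturbation of the current
# point and accept it whenever it does not worsen the objective.
def hill_climb(start, step=0.1, iters=2000, seed=1):
    rng = random.Random(seed)
    best_x, best_f = start, ackley(start)
    for _ in range(iters):
        cand = best_x + rng.gauss(0, step)
        f = ackley(cand)
        if f <= best_f:
            best_x, best_f = cand, f
    return best_x, best_f

x, fx = hill_climb(start=0.3)
print(round(fx, 3))  # a small value near the global minimum of 0
```

Restarting this climber from random points, or (as ILS does) perturbing the best-found point before climbing again, is what lets it escape Ackley's many local optima.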
Stochastic Hill Climbing Algorithm: core to the Iterated Local Search algorithm is a local search, and in this tutorial we will use the Stochastic Hill Climbing algorithm for this purpose. Iterated Local Search Algorithm: the Iterated Local Search algorithm is a modified version of the stochastic hill climbing with random restarts algorithm.

Featurizing text with Google's T5 Text to Text Transformer

In this article we will demonstrate how to featurize text in tabular data using Google's state-of-the-art T5 Text to Text Transformer. Featuretools aims to automatically create features for different types of data, including text, which can then be consumed by tabular machine learning models. A Machine Learning Demo Featurizing Text using Hugging Face T5: in order to extend the NLP primitives library for use with T5, we will build two custom TransformPrimitive classes. We can easily fine-tune the T5 model for this task as follows:

from sklearn.model_selection import train_test_split

train_df, eval_df = train_test_split(dft5)
model.train_model(train_df, eval_data=eval_df)

Next, we load the pre-tuned Hugging Face model.

import numpy as np
from featuretools.primitives.base import TransformPrimitive
from featuretools.variable_types import Numeric, Text

class T5Encoder(TransformPrimitive):
    name = "t5_encoder"
    input_types = [Text]
    return_type = Numeric
    default_value = 0

    def __init__(self, model=model):
        self.model = model

    def get_function(self):
        def t5_encoder(x):
            model.args.use_multiprocessing = True
            return list(np.array(model.predict(x.tolist())).astype(float))
        return t5_encoder

The above code creates a new class called T5Encoder which will use the fine-tuned T5 model, and the below code creates a new class called T5SentimentEncoder which will use the pre-tuned T5 model.

Self-Supervised Voice Emotion Recognition Using Transfer Learning

Therefore, the first preprocessing step was extracting audio clips for each sentence from the full audio files.
Splitting audio files into sentences using transcript timestamps: the modeling techniques that I used required fixed-length features, in my case audio clips of the same length. I cropped audio clips longer than seven seconds with random offsets. Cropping or padding audio files to a fixed length: after splitting and padding my audio files, I used the Librosa library to convert the audio to Mel-scale spectrograms. Playback of recorded utterances. Conclusion: in summary, I have created a self-supervised machine learning classifier that predicts emotion from voice using transfer learning, trained on labels generated from audio transcripts.

Understanding One Way ANOVA

An ANOVA example with financial data: here's some tab-delimited data that you can copy and paste directly into Excel. (It needs to be organized this way in order to properly run the ANOVA analysis.) We should expect to see a low SS (low variation) across season buckets and a high SS (high variation) within each bucket. And at a basic level, that's what ANOVA is: a way to test whether the means of different groups are significantly different. Interpreting our ANOVA results: what I like about ANOVA is the way it breaks out the sources of variation (explaining variation is basically explaining variance).
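The variance decomposition that ANOVA performs can be computed by hand: total variation splits into a between-group and a within-group sum of squares, whose ratio of mean squares is the F statistic. The numbers below are toy data, not the article's financial data:

```python
# One-way ANOVA from scratch: SS_between measures variation of group means
# around the grand mean; SS_within measures variation inside each group.
def one_way_anova(groups):
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
ssb, ssw, f = one_way_anova(groups)
print(ssb, ssw, round(f, 2))  # 54.0 6.0 27.0
```

A large F (here, group means 2, 5, 8 are far apart relative to the spread within each group) is evidence that the group means differ.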
Transforming Organizational Decision Making with Collective Reasoning

The barriers to organizational transformation: confidence, accuracy, and speed in making decisions are the holy grail of organizational transformation. Advances in collective intelligence and artificial intelligence offer a breakthrough opportunity to overcome these three barriers and provide a transformative new approach to making well-informed and predictively accurate organizational decisions. Collective reasoning goes a step beyond collective intelligence. Such scoring methods can be trained using ground-truth data, integrating human and collective intelligence, to provide a framework for a whole new approach to collective decision making by organizations. Predictive model with relevant reasons and themes: each decision process produces a collective reasoning model of a decision: a collective cognitive model of what the group believes will be the outcome of a decision, such as a decision to invest or pass.

4 Cool Tips for Python Beginners

We can sort a list using the sort method. One common mistake is to try to assign the sorted list to a new variable:

a = [3, 1, 6, 2]
a_sorted = a.sort()

You may think that "a_sorted" is a list that contains the sorted elements of "a". The sort method works in place, which means it sorts the list "a" but does not return anything. Printing "a" now gives [1, 2, 3, 6]. If you want to sort a list and assign the result to a new variable, you can use the sorted function.

Why Data Integrity is key to ML Monitoring

Machine Learning (ML) applications are built on data: large amounts of data are streamed into ML systems in order to train models on historical examples and make high-quality predictions in real time. Yet many machine learning projects degrade or fail, in large part because data integrity is difficult to maintain. For example, ML models often need to combine data from both historical batch sources and real-time streams.
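The earlier tip about list.sort versus sorted, as a short runnable snippet:

```python
# list.sort() sorts in place and returns None; sorted() returns a new list.
a = [3, 1, 6, 2]
result = a.sort()
print(result)  # None -- the common gotcha
print(a)       # [1, 2, 3, 6] -- the original list was sorted in place

b = [3, 1, 6, 2]
b_sorted = sorted(b)
print(b_sorted)  # [1, 2, 3, 6]
print(b)         # [3, 1, 6, 2] -- the original is untouched
```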
With so many moving parts, including data and model versions, it's common for ML models in production to see data inconsistencies and errors. Unlike other types of software, ML applications lack a comprehensive solution that puts the right processes and monitoring in place.

Installing CVAT (Intel's Computer Vision Annotation Tool) on the Cloud

CVAT on Scaleway — gathering the resources: as we established in the previous section, there are two cloud resources we need to take our data annotation to the next level: an object storage bucket and an instance. Here's a step-by-step guide to procuring them. If you have not already, you will need to create an account and log into console.scaleway.com. In order to SSH to your Scaleway instances, you'll need to create an SSH key. Now it is time to create your object storage bucket! This can be done from the Storage / Object Storage tab in the Scaleway console.

Sentiment Analysis and Emotion Recognition in Italian (using BERT)

A screenshot of the sentiment and emotion classification library we have built. We created a new data set for Italian sentiment and emotion prediction and fine-tuned a BERT model: FEEL-IT, Emotion and Sentiment Classification for the Italian Language. Sentiment classification: to evaluate our data set and model for sentiment analysis, we compared our FEEL-IT UmBERTo to the same model on another data set, SentiPolc16.

Running multiple GPU ImageNet experiments using Slurm with Pytorch Lightning

Each GPU then calculates the forward pass, and the output predictions are aggregated. The backward pass is a bit more tricky: each GPU is responsible for sending the model weight gradients — calculated using its sub-mini-batch — to each of the other GPUs. If you don't, your accuracy will be GPU-dependent, based only on the subset of data that GPU sees.
Setting GPU device and DDP backend: now we need to update our trainer to match the number of GPUs we're using. As mentioned earlier, I'm using DDP as my distributed backend, so I set my accelerator accordingly.

A Guide to (Highly) Distributed DNN Training

Photo by Laura Ockel on Unsplash. These days, data-distributed training is all the rage. Training time performance refers to the speed at which we are able to train. Take a look at TensorFlow's distributed training guide for an overview of the additional distributed strategies that are supported. Chapter 4 — Training Data Input: there are two main strategies for managing the input data in a data-distributed training scenario. In such a case, you might want to reconsider running distributed training so as to halve the training cost.

OpenAI Scholars 2020: Final Projects

The OpenAI Scholars program provides stipends and mentorship to individuals from underrepresented groups to study deep learning and open-source a project. Demo Day introductions by Sam Altman and Greg Brockman. Learn more about our Scholars program. I wanted to have a say in how AI is shaped—the Scholars program has been a great opportunity to learn and participate. I joined the Scholars program in order to learn from the brilliant folks at OpenAI and to immerse myself in AI research.
The OpenAI Scholars program was this magical opportunity to get started by learning from the very best minds in the field. OpenAI Scholars 2020: Applications Open. We are now accepting applications for our third class of OpenAI Scholars, a 4-month full-time program where we provide stipends and mentorship to 8 individuals from underrepresented groups to study deep learning and open-source a project. The second class of Scholars recently released their projects and presented their work at the 2019 Scholars Demo Day. While we hope that some of the Scholars will join OpenAI, we want this program to improve diversity in the field at large. For Bay Area participants, we offer an optional desk at the OpenAI office (which our past Scholars have found very valuable). We ask all Scholars to document their experiences studying deep learning to hopefully inspire others to join the field too. OpenAI Scholars Spring 2019: Final Projects. The OpenAI Scholars program allowed me to build a solid foundation in deep learning and gain a thorough understanding of Natural Language Processing and Understanding. Before joining the Scholars program I had already undertaken a plan to self-study robotics. The OpenAI Scholars program gave me the opportunity to greatly enhance my self-study with a curriculum focused exclusively on Deep Reinforcement Learning. The OpenAI Scholars program provided me with the guidance and resources to learn core deep learning methods in a short amount of time. Over the first two months of self-designed study, I learned about the theory of reinforcement learning and became acquainted with how to implement deep reinforcement learning algorithms from scratch. Logistic Regression, Clearly Explained (Plus Nine Handy Cheat Sheets). Is an engaging, energizing post something you could use today?
Read Carolina Bento’s practical introduction to logistic regression, where she lays out with great clarity the model’s real-life use cases. Kendric Ng approached this topic through a specific—and timely—angle, asking whether or not a data science master’s degree is necessary for people who want to land a job at tech companies. Taking a cue from Kendric, the TDS team collected several other related posts from our archives, presenting a wide range of opinions on the importance of degrees and credentials in data science and adjacent fields. As always, we thank all of you for pushing us to publish the best work on data science and to bring new and exciting voices into this community. Stay updated with Neuroscience: March 2021 Must-Reads. The implemented k-tree model, with dendritic constraints, and a “control” fully connected neural network (FCNN). In spite of these encouraging results, the current k-tree dendritic model has some limitations. x(t) is the stochastic signal and r(t) is the retina firing rate, namely the electric spikes retrieved from the stimulated retina. In particular, r(t) may contain the anticipatory information created by the retina as a result of the signal x(t) and its time derivative ẋ(t). In particular, it seems there is a synergy between the input signal x(t) and its time derivative ẋ(t) that gives rise to the anticipation of the retina. Boston Airbnb Business Analysis and Listing Price Prediction. I offer here an analysis of how the business has been growing in Boston and propose a model to predict the price of a given listing. My analysis will focus on Airbnb data collected from 2008 to 2016. I will try to answer the questions below: 1. How has the Airbnb business in Boston evolved? 4. What features influence the price of a Boston Airbnb listing? Part I: How has the Airbnb business in Boston evolved?
GANscapes: Using AI to Create New Impressionist Paintings. The discriminator tries to discern real images from the training set from the fake images from the generator. Since then, Nvidia has released a new version of their AI model, StyleGAN2 ADA, designed to yield better results when generating images from a limited dataset. One of the significant improvements in StyleGAN2 ADA is dynamically changing the amount of image augmentation during training. Training StyleGAN2 ADA: I trained the GAN on Google Colab Pro for about three weeks.

!python stylegan2-ada/train.py --aug=ada --p 0.186 --mirror=1 \
  --metrics=none --snap=1 --gpus=1 \
  --data=/content/drive/MyDrive/GANscapes/dataset_1024 \
  --outdir=/content/drive/MyDrive/GANscapes/models_1024 \
  --resume=/content/drive/MyDrive/GANscapes/models_1024/00020-dataset_1024-mirror-auto1-ada-p0.183-resumecustom/network-snapshot-000396.pkl

I replaced 0.186 with the last used p value and the resume path with the previously saved model. Understand Support Vector Machines. There are numerous libraries available that can help me use SVM without having to worry about the underlying concepts. You don’t need to understand SVM to be able to use it. You can even apply SVM to solve classification problems without understanding the fundamental concepts behind it.
Similarly, having access to SVM as a tool will not necessarily make you a useful data scientist. SVM is an ideal algorithm to understand the geometric viewpoint of supervised machine learning and consolidate the use of vector algebra in machine learning. Image classification using Machine Learning, made simple. Note: in order to make it even simpler to understand, some technical steps are skipped. Even if you have little experience with Machine Learning, I’m sure you can follow me. The Machine Learning part: The Machine Learning algorithm that is extremely good at classifying things (and many other tasks involving images) is known as a Convolutional Neural Network. That means that it is super easy to build a binary Machine Learning classification algorithm, and it is highly effective too! I really hope you enjoyed this, and I would love to know if the algorithm works on your image. Simple Logistic Regression using Python scikit-learn. Logistic regression Python cheat sheet (image by author, from www.visual-design.net). What is Logistic Regression? Don’t let the name trick you: logistic regression usually falls under the category of classification algorithms rather than regression algorithms. Simply put, the prediction generated by a classification model is a categorical value, e.g. cat or dog, yes or no, true or false; a regression model, on the contrary, predicts a continuous numeric value. Logistic regression makes predictions based on the sigmoid function, an S-shaped curve. Machine Learning: Process for solving any Machine Learning problem. When we solve a machine learning problem, the algorithm that we may end up using depends on the type of data that we have and the problem itself.
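The sigmoid mapping just described can be sketched in a few lines of pure Python. The weights, bias, and feature values below are invented for illustration; in practice scikit-learn fits them for you:

```python
import math

def sigmoid(z):
    # Squashes any real number into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    # Linear combination, then sigmoid, then threshold -> class label.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return ("yes" if probability >= threshold else "no"), probability

# Invented weights for a two-feature toy classifier:
label, p = predict([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
print(label, round(p, 3))  # yes 0.818
```

The threshold of 0.5 is what turns the continuous sigmoid output into the categorical prediction the excerpt contrasts with regression.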
After reading this article, you will: understand the basics of Machine Learning; see the various steps involved in the Machine Learning process; and get to know other popular Machine Learning frameworks. Machine Learning is a field of study concerned with building systems or programs which have the ability to learn without being explicitly programmed. Now, if you would like a general introduction to Machine Learning, check out this post. Now that we understand what Machine Learning is, let us see how it is applied to solve interesting business problems. Machine Learning Process: A process is defined as a series of actions or steps taken in order to achieve a particular end. Other Machine Learning Frameworks: There are several other machine learning processes as well, but they all tend to be more or less similar. Image-to-Text Generation for New Yorker Cartoons. No computer has ever won the New Yorker Cartoon Caption Contest, and for good reason: it’s hard. People have created New Yorker caption generators, but none actually use the image of the cartoon to generate the caption. I tackled this task as a learning exercise for image-to-text algorithms, and after a lot of trial and error, I created an image-to-text model that produced decent captions for New Yorker cartoons based on the image alone. Here is the model’s submission for this week’s New Yorker caption contest (contest #749). LSTM Framework For Univariate Time-Series Prediction. Code & Data Walkthrough — Data Prep: The data is the US/EU exchange rate from 2010 to present, not seasonally adjusted. So the first data point will be the first 60 days of data. The second data point is days 2 through 61, and the third is days 3 through 62: each window drops the earliest day and adds the next one. Our sequence length is 60 days for this part of the code.
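The sliding-window construction described in the LSTM excerpt above can be sketched generically: each training sample is a window of `seq_len` consecutive values, and the target is the value that immediately follows. The function name and the toy series are ours, not the article's:

```python
# Sketch of sliding-window data prep for univariate time-series models:
# sample i is series[i : i + seq_len], its target is series[i + seq_len].

def make_windows(series, seq_len):
    samples, targets = [], []
    for start in range(len(series) - seq_len):
        samples.append(series[start:start + seq_len])
        targets.append(series[start + seq_len])
    return samples, targets

rates = [1.30, 1.31, 1.29, 1.28, 1.32, 1.33]  # toy exchange-rate series
X, y = make_windows(rates, seq_len=3)
print(X[0], y[0])  # [1.3, 1.31, 1.29] 1.28
print(len(X))      # 3 windows from 6 observations
```

With the article's 60-day windows, `seq_len` would simply be 60 and `series` the full exchange-rate history.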
Transfer Learning and Data Augmentation applied to the Simpsons Image Dataset. Another method to work with limited data is Data Augmentation (DA). Traditional ML algorithms rely significantly on feature engineering, while Deep Learning (DL) focuses on learning from data by unsupervised or semi-supervised feature learning methods and hierarchical feature extraction. We start by implementing a Convolutional Neural Network (CNN) model from scratch to be used as the benchmark model. Data Augmentation: As we saw above, DA is a set of methods used to inflate a dataset while reducing overfitting. An in-depth EfficientNet tutorial using TensorFlow — How to use EfficientNet on a custom dataset. The dataset we are going to be using here is a chest X-ray dataset from the Kaggle competition VinBigData. We will be using a resized version of 512x512 images, since the original images are quite huge (2k+). Anyway, the main aim of the tutorial is for you to use it on a custom dataset.

    target_size=(height, width),
    batch_size=batch_size,
    # Since we use categorical_crossentropy loss, we need categorical labels
    class_mode="categorical",
)
validation_generator = test_datagen.flow_from_directory(
    VAL_IMAGES_PATH,
    target_size=(height, width),
    batch_size=batch_size,
    class_mode="categorical",
)
model.compile(
    loss="categorical_crossentropy",
    optimizer=optimizers.RMSprop(lr=2e-5),
    metrics=["acc"],
)

If all goes according to plan, you should get a similar message to this: Found X images belonging to x classes. State-of-the-Art Data Labeling With a True AI-Powered Data Management Platform. How can Superb AI’s platform help in data labeling? Superb AI has a unique approach to data labeling. Steps to Labeling Data in Superb AI’s Platform: Superb AI’s platform is straightforward and user-friendly for performing data labeling.
After creating the class groups, Superb AI gives us a complete project overview (screenshot from Superb AI’s platform). To start the data labeling process, follow these steps: open the project, then click on Start Labeling. By continuing to innovate, we are confident that Superb AI will become the best global SaaS platform in the space of machine learning data management. Develop a Neural Network for the Woods Mammography Dataset.

n_features = X.shape[1]
# define model
model = Sequential()

How we’re using Fairness Flow to help build AI that works better for everyone. To help do that, Facebook AI developed a tool called Fairness Flow, and we’re sharing more details here. But Fairness Flow can provide necessary insight to help us understand how some systems in our products perform across user groups. We are working to understand and potentially expand the ways Fairness Flow can be used for more AI models. Assessing model fairness using Fairness Flow: Some AI models are designed to predict whether certain outcomes are true or false, likely or unlikely, or positive or negative. Fairness Flow provides metrics that speak to multiple dimensions of fairness, but product teams ultimately determine how to perform the measurements that fit their context. Rust detection using machine learning on AWS. Conventionally, corrosion detection is done using visual inspection of structures and facilities by subject matter experts. In this post, we describe how to build a serverless pipeline to create ML models for corrosion detection using Amazon SageMaker and other AWS services. Solution overview: The corrosion detection solution comprises a React-based web application that lets you pick one or more images of metal corrosion to perform detection.
Deep learning approach: In recent years, deep learning has been used for automatic corrosion detection. The most challenging aspect of this problem when using deep learning is that corroded parts of structures don’t have predictable shapes, which makes it difficult to train a comprehensive deep learning model using object detection or semantic segmentation. How Digitata provides intelligent pricing on mobile data and voice with Amazon Lookout for Metrics. History of anomaly detection at Digitata: We have been through four phases of anomaly detection at Digitata in the last 13 years, starting with manually monitoring our KPIs in reports on a routine basis. We can use traditional anomaly detection methods to detect anomalies on a measure, such as revenue or purchase count. How we used Amazon Lookout for Metrics: Inside Amazon Lookout for Metrics, you need to describe your data in terms of measures and dimensions. The data is then stored in Amazon S3 and in Amazon CloudWatch, from where we can use services such as Amazon Lookout for Metrics. He has a special interest in launching AI services and helped grow and build Amazon Personalize and Amazon Forecast before focusing on Amazon Lookout for Metrics. Feature Engineering Examples: Binning Categorical Features. Working with categorical data for machine learning (ML) purposes can sometimes present tricky issues. For example, your model performance may benefit from binning categorical features. In this post, we’ll briefly cover why binning categorical features can be beneficial. Then we’ll walk through three different methods for binning categorical features, with specific examples using NumPy and Pandas. High cardinality can also exacerbate the curse of dimensionality if you choose to one-hot encode your categorical features. Python loops: A comprehensive guide. You are given a list of values and asked to do something with each item.
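One common way to bin categorical features, in the spirit of the excerpt above, is to group infrequent categories into a single "Other" bucket to reduce cardinality before one-hot encoding. This sketch uses only the standard library; the threshold, function name, and category values are invented for illustration and are not necessarily the article's exact method:

```python
from collections import Counter

# Group rare categories into an "Other" bucket to reduce cardinality.
def bin_rare_categories(values, min_count=2, other_label="Other"):
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

browsers = ["Chrome", "Chrome", "Firefox", "Firefox", "Safari", "Opera"]
print(bin_rare_categories(browsers))
# ['Chrome', 'Chrome', 'Firefox', 'Firefox', 'Other', 'Other']
```

After binning, one-hot encoding produces three columns instead of four, which is exactly the curse-of-dimensionality relief the excerpt mentions.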
Even if you are not interested in the names, with i and j you are referring to both of these items and asking to append item j (the age) to a new list. Looping over dictionaries: Dictionaries in Python are a collection of key-value pairs, meaning every item in the dictionary has a key and an associated value.

sent = 'the sky is blue'
# split the sentence into words
sent_split = sent.split()
# extract each word with a loop
for i in sent_split:
    print(i)

Out:
the
sky
is
blue

While loops: Like for loops, while loops repeatedly execute a block of code, as long as a condition is true. Hypothesis Testing. In this article, we seek to explain the usefulness of the most common hypothesis tests and when to use them. The null hypothesis vs the alternative hypothesis: The null hypothesis is the default claim, premise, value or parameter that is generally accepted. The alternative hypothesis is a research or investigative hypothesis that includes the claim to be tested, which could replace the null hypothesis in case it is proved to be true. Still, let’s run our test to see if our intuition is right:

[IN]:
shapiro_results_men = shapiro(men['Purchase'])
print('Shapiro-Wilk test results for men: {}'.format(
    test_result(shapiro_results_men[1])))

shapiro_results_women = shapiro(women['Purchase'])
print('Shapiro-Wilk test results for women: {}'.format(
    test_result(shapiro_results_women[1])))

[OUT]:
Shapiro-Wilk test results for men: Reject null hypothesis
Shapiro-Wilk test results for women: Reject null hypothesis

Here, rejecting the null hypothesis means rejecting the hypothesis that the observations are normally distributed. Going further: There are many hypothesis tests in statistics, and this list is far from comprising them all. The Dying ReLU Problem, Clearly Explained. What’s the Dying ReLU problem? The dying ReLU problem refers to the scenario when a large number of ReLU neurons only output zero values.
This happens when the inputs are in the negative range. When most of these neurons return output zero, the gradients fail to flow during backpropagation and the weights do not get updated. However, the dying ReLU problem does not happen all the time, since the optimizer (e.g. Building a Mask R-CNN from scratch in TensorFlow and Keras. The last convolutional layer’s filter number is equal to the number of classes. When checking the predicted mask, we need to use the filter corresponding to the class label. (As in the case of predictions, you don’t have the ground-truth class labels, so you will need the predicted class label to choose the proper mask.) Hopefully this article helped in understanding the basics, so that you can implement your own, even improved, Mask R-CNN. You can find the whole implementation, and the toy dataset generation files, at https://github.com/rajkifranciska/maskrcnn-from-scratch. If this article was helpful, please cite me as:

@misc{rajki_2021,
  title={Building a Mask R-CNN from scratch in TensorFlow and Keras},
  journal={Medium},
  author={Rajki, Franciska},
  year={2021},
  month={Mar}}

4 Ways to Effectively Debug Data Pipelines in Apache Beam. 4) Using labels is recommended, but each label MUST be unique. Beam can use labels in order to keep track of transformations. As you can see in the Beam pipeline on Google Cloud below, labels make it VERY easy for you to identify different stages of processing. This means that there is less configuration involved in order to get your unit test coded, and less configuration typically means time saved. Parentheses are helpful: since PCollections can do multiple transformations all at once (a ‘composite transform’), it is quite likely that transformations will span multiple lines. Using labels is recommended but each label MUST be unique: using labels for steps within your pipeline is critical.
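The dying-ReLU behavior described above can be seen numerically: for negative inputs ReLU outputs zero and its gradient is zero, so the weights feeding a stuck neuron stop updating. Leaky ReLU, a common remedy shown here as our own illustration, keeps a small gradient for negative inputs:

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Gradient of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU keeps a small slope for negative inputs,
    # so gradients can still flow and the neuron can recover.
    return 1.0 if x > 0 else alpha

inputs = [-2.0, -0.5, 1.5]
print([relu(x) for x in inputs])             # [0.0, 0.0, 1.5]
print([relu_grad(x) for x in inputs])        # [0.0, 0.0, 1.0]  zero gradient for negatives
print([leaky_relu_grad(x) for x in inputs])  # [0.01, 0.01, 1.0]
```

A neuron whose pre-activation stays negative for all training inputs receives only those zero gradients, which is exactly the "dead" state the article names.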
CIFAR 100: Transfer Learning using EfficientNet. Convolutional Neural Network (CNN) is a class of deep neural networks commonly used to analyze images. Transfer Learning: As stated in the Handbook of Research on Machine Learning Applications, transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. In simple terms, transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second, related task. Model training using transfer learning: To train a machine learning model, we need a training set.

train_data_generator = DataGenerator(X_train_data, y_train_data, augment=True)
valid_data_generator = DataGenerator(X_val_data, y_val_data, augment=False)

The EfficientNet class is available in Keras to help in transfer learning with ease. Gentle Introduction to Classification Models. Discriminative classifiers model the conditional probability distribution of the target given an input variable, Pr(t|x). The general consensus is that discriminative models outperform generative models in most cases. Yet it cannot be stressed enough that this is not always the case, and you should not disregard generative models. As an example, generative adversarial networks (GANs) are generative models that have proved extremely useful in a variety of tasks. There are a few other reasons why you shouldn’t disregard generative models, e.g. Building a Python Code Generator. Can NLP techniques be used to generate actual code? In this blog, I make an attempt to build a Python code generator that can convert simple English problem statements into their corresponding Python code. Here, our English sentence will be our Input or SRC sequence, and our Python code will be our Output or TRG sequence.
This dataset contains about 5,000 data points, where each data point comprises an English problem statement and its corresponding Python code. We have successfully trained a model that is capable of converting simple problem statements (in English) into corresponding Python code. Build A Machine Learning App In Less Than an Hour. Introduction: Machine learning has achieved a lot of success in the past few years and is growing super fast. However, it also depends on machine learning experts to develop a machine learning model: doing the pre-processing, feature engineering, building the model, hyperparameter tuning, etc. Also, we create a machine learning app using the Streamlit framework with bare-minimum code. The aim of the article is to attract people who are interested in machine learning but don’t have coding experience to try out machine learning. Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. Four Deep Learning Papers to Read in April 2021. Welcome to the April edition of the ‘Machine-Learning-Collage’ series, where I provide an overview of the different Deep Learning research streams. Thereby, I hope to give you a visual and intuitive deep dive into some of the coolest trends. So without further ado: here are my four favourite papers that I read in March 2021 and why I believe them to be important for the future of Deep Learning. Options (Sutton et al., 1999) are one specific type of such a temporal abstraction. Veeriah et al. Aerobotics improves training speed by 24 times per sample with Amazon SageMaker and TensorFlow. Our predominant data source is aerial drone imagery: capturing visual and multispectral images of trees and fruit in an orchard.
We aim to continue making training faster by using more devices, and making it more efficient by leveraging SageMaker managed Spot Instances. We also aim to make the training loop tighter by serving SageMaker models that are capable of online learning, so that improved models are available in near-real time. To learn more about Amazon SageMaker, visit the product page. About the Author: Michael Malahe is the Head of Data at Aerobotics, a South African startup that builds AI-driven tools for agriculture. How to start with machine learning for free. The field offers competitive salaries, and it’s challenging and fun. The question I get asked most often is: how to start with Machine Learning? By studying these resources you’re going to learn if Machine Learning is right for you. Don’t start spending money on expensive courses right away. Reading the book is recommended for machine learning practitioners, data scientists, statisticians, and anyone else interested in making machine learning models interpretable. Adversarial search will be explored through the creation of a game, and an introduction to Machine Learning includes work on linear regression. Not all pixels matter for classification. Sparse sensor placement for classification: Our baseline model has to observe all 4096 pixels of the image before making a decision. Finally, the most important pixels are inferred from the projection basis and the low-dimensional decision vector. We can nonetheless exploit this low-dimensional embedding to identify the most relevant pixels for classification. The equality constraint Ψᵀs = w enforces that the selected pixels indeed project the image onto the appropriate decision line. As such, s can be understood as a mask selecting the most informative pixels for classification purposes. What Are Explainable AI Principles?
Explainable AI (XAI) principles are a set of guidelines for the fundamental properties that explainable AI systems should adopt. Explainable AI seeks to explain the way that AI systems work. The four explainable AI principles are designed to help us use AI safely, effectively and for its intended purpose. To learn more about explainable AI, here’s a helpful introduction to what explainable AI is, how it works and why it matters. In summary: Explainable AI principles are guidelines for the properties that AI systems should adopt. There are four principles, developed by the NIST. The principles focus on the ability of an AI system to provide an explanation that is meaningful and accurate, while operating within the limits for which it was designed. The four principles help to promote the safe and effective use of AI systems as AI becomes more important in our everyday lives. Read more about AI, NLP and analytics at: https://highdemandskills.com/ 4 Useful clustering methods you should know in 2021. Step 2: Get cluster labels: We get 3 cluster labels (0, 1 or 2) for each observation in the “Iris” data.
Non-hierarchical methods. K-Means Clustering: In k-means clustering, the algorithm attempts to partition observations into k groups (clusters) by assigning each observation to the cluster with the nearest centroid. To learn more about how k-means clustering works (step-by-step implementation, objectives and assumptions of k-means clustering, and how to find the optimal number of clusters, i.e. hyperparameter tuning for k), read my “Hands-On K-Means Clustering” post. Today, we have discussed 4 different clustering methods and implemented them with the “Iris” data. To learn more about hyperparameter tuning in clustering, I invite you to read my “Hands-On K-Means Clustering” post. Python for Data Science Cheat Sheet (2021). Python has become the most popular computing language for performing data science in 2021. But before you can build astounding deep learning and machine learning models, you need to know the basics of Python and its different types of objects first. Check out the different sections below to learn the various types of objects and their capabilities. Variables and Data Types: A variable in Python is used to store values; here you’ll learn how to assign a variable a specific value, how to change that value, and how to convert between data types in Python.

Variables to strings
>>> str(x)
'5'
Variables to integers
>>> int(x)
5
Variables to floats
>>> float(x)
5.0
Variables to booleans

Real Face or AI-Generated Fake? Can you tell the difference between the… Applications of AI trickery: In the past few years, an application of AI called “deepfake” has taken the Internet by storm. The second network would learn to identify the difference between real and fake images. I created an application to quiz you on whether you can tell a real person’s face from a fake one. I used the 1-million fake face dataset to get GAN-generated images for this project, and Kaggle’s UTKFace dataset for the real images. The discriminator in a GAN learns the difference between fake and real images.
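The k-means idea from the clustering excerpt above (assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat) can be sketched in pure Python on toy one-dimensional data. The data and starting centroids are invented; in practice you would use scikit-learn's KMeans as the article does:

```python
def kmeans_1d(points, centroids, n_iters=10):
    # Classic k-means loop: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned points.
    for _ in range(n_iters):
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in clusters.items()
        ]
    return centroids

data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]      # two obvious groups
print(kmeans_1d(data, centroids=[0.0, 5.0]))  # approx. [1.0, 10.0]
```

On real multi-dimensional data the same loop applies with Euclidean distance in place of the absolute difference.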
Teaching Machines to Read Movie Reviews: Thinking About Interpretability. What makes a movie review negative or positive? Now, accurately classifying movie reviews as thumbs up/thumbs down is pretty simple (existing methods are already about 99% accurate). Still, teaching machines to read more like humans could mean classifying movie reviews with more granularity, or perhaps move us towards machines that can make sense of complex, higher-stakes things (like corporate ethics statements or online conspiracy theories). Words like “stinker” are a first clue to how we read reviews: words that are not context-dependent, but consistently have a good/bad meaning. Using Words to Understand Movie Reviews: We can start off by using words (the lexical level of language) for our RF classifier. Feedback Transformers from Facebook AI. Introducing Transformers with Feedback Memory: In order to overcome the limitations of the Transformer architecture, the concept of feedback memory is introduced. Feedback memory mashes all the information for a particular time step into a single memory representation. With fewer layers in the architecture, standard Transformers show a much larger drop in performance than the Feedback Transformer. Feedback Transformers converge to a higher average reward in reinforcement learning than standard Transformers (Figure 7: Maze Navigation in Gridworld). [3] Feedback Transformers by Yannic Kilcher. Overparameterized but generalized: Neural Networks. Deep learning has evolved from just being parameterized non-linear functions to being used in major computer vision and natural language processing tasks. These models reach zero training error, thus heavily overfitting the training data, but are still able to give good test performance.
They had the following view for their approach: if a deep neural network assigns the same label to two images, they have to converge to a similar representation at some stage of the network. Training with unlimited training data (the ideal world) and with limited data (the real world) gives similar test error until the real-world model converges. Thus, one can study models in the real world by studying their corresponding behavior in the ideal world. Upgrade Your Beginner NLP Project with BERT. Introduction: When I first started learning Data Science and looking at projects, I thought you could either do a Deep Learning project or a regular one. This tutorial is ideal for someone who already has an NLP project and is looking to upgrade it and get a taste of Deep Learning. Embedding Words as Vectors: Bag-of-Words models have 3 key issues; for one, similar words are not related to each other. Yes, we can leverage state-of-the-art deep learning models using just a few lines of code. BERT is a deep learning model with a transformer architecture. Implementing Single Shot Detector (SSD) in Keras: Part V — Predictions Decoding. Step 1: Bounding Boxes Decoding (Figure 1: formula to decode SSD bounding boxes, centroid-encoded with standard deviations). Therefore, we can sort those predictions by their confidence score and select the k with the highest confidence scores. Keras Layer for Decoding SSD Predictions: After understanding each of the steps for decoding SSD predictions above, we can put them together into one Keras layer. The benefit of creating a Keras layer for the decoding process is that we can create a model file that has the decoding process built in.
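The "sort by confidence and keep the top k" step described for SSD decoding above can be sketched generically. The detection format (a dict with a box and a score) and all values below are invented for illustration, not the article's actual layer code:

```python
# Sketch of the top-k selection step in SSD prediction decoding:
# sort candidate detections by confidence and keep the k best.

def top_k_detections(detections, k):
    return sorted(detections, key=lambda d: d["score"], reverse=True)[:k]

candidates = [
    {"box": (10, 10, 50, 50), "score": 0.42},
    {"box": (12, 11, 52, 49), "score": 0.91},
    {"box": (200, 80, 240, 120), "score": 0.77},
    {"box": (5, 5, 45, 40), "score": 0.10},
]
best = top_k_detections(candidates, k=2)
print([d["score"] for d in best])  # [0.91, 0.77]
```

In a full decoder, non-maximum suppression would then remove near-duplicate boxes among the survivors before returning final detections.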
Tune XGBoost Performance With Learning Curves. This tutorial is divided into four parts: Extreme Gradient Boosting; Learning Curves; Plot XGBoost Learning Curve; and Tune XGBoost Model Using Learning Curves. Gradient boosting refers to a class of ensemble machine learning algorithms that can be used for classification or regression predictive modeling problems. Learning curves are widely used in machine learning for algorithms that learn (optimize their internal parameters) incrementally over time, such as deep learning neural networks. Now that we are familiar with learning curves, let’s look at how we might plot the learning curve for an XGBoost model and use it as a diagnostic tool for tuning. 7 Traits of Incredibly Efficient Data Scientists. Data science is built on repetitive tasks, including the fundamentals of obtaining, preparing, and cleaning data. By choosing the right algorithms and tools from the start, a data science project becomes much more efficient. The most efficient data scientists make time to learn new things and better themselves. Data science is one of those professions where you’re expected to do a lot of different tasks perfectly on any given day. Boost your Data Viz Productivity. Through data visualization, our stakeholders understand the impact of our analysis. Despite its importance, I have been receiving questions on how aspiring data scientists can start exploring data analytics. Simple: with data visualization. Altair supports declarative programming, which allows you to build Altair visualizations based on the input data and output properties.
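The learning-curve diagnostic from “Tune XGBoost Performance With Learning Curves” can be sketched in a few lines. As a minimal, self-contained stand-in for XGBoost (whose `eval_set`/`evals_result` API serves the same purpose), this uses scikit-learn’s gradient boosting with `staged_predict_proba`; the dataset and hyperparameters are illustrative only.

```python
# Learning-curve sketch for gradient boosting: evaluate train and
# validation loss after every boosting round, then compare the curves.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X_tr, y_tr)

# staged_predict_proba yields predictions after each boosting round,
# giving one point on the learning curve per round.
train_curve = [log_loss(y_tr, p) for p in model.staged_predict_proba(X_tr)]
val_curve = [log_loss(y_val, p) for p in model.staged_predict_proba(X_val)]

# A widening gap between the curves suggests overfitting; tuning the
# number of rounds, learning rate, or tree depth can narrow it.
print(len(train_curve), len(val_curve))
```

With XGBoost itself, the same plot falls out of passing both splits in `eval_set` and reading back `evals_result()`.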
Both are important tools for Data Scientists to optimize their data retrieval and DevOps processes. Three Model Compression Methods You Need To Know in 2021. With the advent of convolutional neural networks and transformers to handle complex image recognition and natural language processing tasks, deep learning models have skyrocketed in size. Even in the absence of such applications, the costs imposed by a large model size are well worth navigating around, so the problem of model compression is important. Model compression aims to alleviate the costs of large model sizes (like the ones mentioned above) by representing the model in a more efficient format with minimal impact on its performance. In recent research, three methods have emerged as especially important (and interesting) strategies for model compression. 3 lesser-known pipe operators in Tidyverse. The tee pipe operator %T>% works almost like the %>% operator, except in situations when one of the operations in a sequence does not return a value. Consider an example where we write a sequence of operations using the main pipe operator %>%; to tackle this problem, we use the tee pipe operator before the plot() function. Redoing the example with the tee pipe operator: rnorm(100) %>% matrix(ncol=2) %>% sin() %T>% plot() %>% colSums() returns [1] 2.372528 -4.902566. We can see from the above example that with the tee pipe operator, the complete sequence of operations is executed. Saving Money Using Data Science. In this article, I am going to show you how you can use the same technique to save money when renting an apartment. Some open-source data and a few lines of R code can save you $1,300 per year if you live in Boston; choosing the right month to sign a lease in Miami, FL will save you about $360 per year.
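One of the compression strategies alluded to in “Three Model Compression Methods You Need To Know in 2021” is magnitude pruning; a minimal sketch of the idea, with an invented weight matrix and an illustrative sparsity target, looks like this:

```python
# Magnitude pruning sketch: zero out the weights with the smallest
# absolute values, keeping only the largest 10%. The pruned matrix can
# then be stored in a sparse format to shrink the model.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256))  # stand-in for one layer's weights

sparsity = 0.9  # drop 90% of weights
threshold = np.quantile(np.abs(weights), sparsity)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

kept_fraction = np.count_nonzero(pruned) / pruned.size
print(f"kept {kept_fraction:.1%} of weights")
```

In practice, pruning is usually followed by a short fine-tuning pass to recover any lost accuracy.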
The dataset from Apartment List contains complete data for 479 cities. Finally, if you want a very fast 1–2 minute read, try this article on eight tips for improving communication between data science and business users. Survival Analysis: censoring, survival functions and hazard functions. Often neglected in the implementations of the most popular machine learning and statistical analysis frameworks is survival analysis. As you may have guessed from the name, survival analysis has historically been employed by the medical research community to measure the survival rate of certain drugs or treatments for various conditions. You can use survival analysis to predict when one of your current customers will stop using your service (churn), or when a machine you made will break (failure-time analysis). Sociologists use survival analysis to predict the occurrence and timing of events (event-history analysis). In the future, I plan to dig a little deeper and implement some survival analysis models with pysurvival. Polynomial Regression in Python. Machine Learning from Scratch: Part 4. Let’s take the following dataset as a motivating example to understand polynomial regression, where the x-axis represents the input data X and the y-axis represents the true/target values y, with 1000 examples (m) and 1 feature (n). So, one question you have to answer while fitting models to data is: what features do you want to use? One important thing to note here is that our hypothesis is still linear, because X² or X^0.5 are just features; the model remains linear in its coefficients. The first 10 examples show the new feature X² added to our input data. We can add as many features as we want, each some exponentiation of a feature we already have.
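The feature-expansion idea behind the polynomial regression article can be shown concretely: adding an X² column keeps the hypothesis linear in its coefficients, so ordinary least squares still fits it. The data below is synthetic, generated from known coefficients so the fit can be checked.

```python
# Polynomial regression as linear regression on expanded features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * X + 0.5 * X**2 + rng.normal(scale=0.1, size=200)

# Design matrix with columns [1, X, X^2]; "polynomial" regression is
# just ordinary least squares on these expanded features.
A = np.column_stack([np.ones_like(X), X, X**2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [1.0, 2.0, 0.5]
```

The same trick extends to X^0.5 or higher powers; only the design matrix changes, not the fitting procedure.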
The Counter-Intuitiveness of Fairness in Machine Learning. The idea that what happened in the past can serve as a good predictor of the future is the central tenet behind much of the incredible success of Machine Learning (ML). In addition, there is a burgeoning research community examining how we can safeguard fairness in predictive algorithms under these laws. As a number of scholars point out, we are still left with correlated variables that are tantamount to the use of protected variables; for example, in Yang and Dobbie’s study, they found that all their input variables were correlated with the protected variables. Step 2: Make predictions using coefficient estimates from step 1 and the average values of the protected variables. Deep Text Corrector using Monotonic Attention (with dataset creation). For this project, along with the movie corpus dataset (Cornell Movie Dialogue Corpus), I created a dataset from scratch; the text preprocessing was done using regular expressions. Repairing the data: a novel contains a lot of quoted text said by different characters. Escaped curly quotes were detected and replaced with their original form, i.e. \\xe2\\x80\\x9c → “ and \\xe2\\x80\\x9d → ”, via xx_v4 = re.sub(r"\\xe2\\x80\\x9c", '"', xx_v3, flags=re.IGNORECASE) and xx_v4 = re.sub(r"\\xe2\\x80\\x9d", '"', xx_v4, flags=re.IGNORECASE). We then removed half-sentences by identifying words like ‘said’, ‘whispered’, ‘asked’, etc. Lastly, after this, some sentences are still questions. Predicting 2020–21 NBA’s Most Valuable Player using Machine Learning. At the end of every season, media members across the National Basketball Association (NBA) are asked to decide on the winner of the league’s most sought-after individual regular season award: the Most Valuable Player (MVP).
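The two-step procedure quoted from the fairness article (fit with the protected variable, then predict with it averaged out) can be sketched as follows. This is an illustration only: the data, column layout, and coefficients are invented, not drawn from Yang and Dobbie’s study.

```python
# Step 1: fit a model that includes the protected variable.
# Step 2: predict with the protected column replaced by its average,
# so individual predictions no longer vary with it.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
protected = rng.integers(0, 2, size=n).astype(float)  # e.g. a binary group
other = rng.normal(size=n)
y = 3.0 * other + 1.5 * protected + rng.normal(scale=0.1, size=n)

X = np.column_stack([other, protected])
model = LinearRegression().fit(X, y)

# Step 2: same coefficient estimates, protected column set to its mean.
X_neutral = X.copy()
X_neutral[:, 1] = protected.mean()
preds = model.predict(X_neutral)
# Predictions now differ only through the non-protected feature.
```

As the article notes, this does not resolve the deeper problem of non-protected features that remain correlated with the protected ones.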
Created in the 1955–56 season, the award aims to reward the best-performing and most consistent player of the regular season. With more than half of the season’s games in the books, it is becoming clear who the real MVP candidates are. The target value we are trying to predict is the share of the total MVP votes each player gets. BPM stands for Box Plus/Minus, a metric that estimates a basketball player’s contribution to the team while that player is on the court. A Beginner’s Guide to Image Augmentations in Machine Learning. Data augmentation is one of the most important yet underrated aspects of a machine learning system and has a significant impact on the model’s performance. In this article, we will go over some prevalent image augmentation techniques and also discuss why such methods are required in the first place. It becomes increasingly important to generate more training data artificially using data augmentation techniques in all such cases. In the next section, we discuss various image augmentation techniques with Python code. One common misconception about Random Forest and overfitting. Does 100% train accuracy indicate overfitting? There are numerous suggestions to tune the depth of trees in Random Forest to prevent that from happening. The post explains why 100% train accuracy with Random Forest has nothing to do with overfitting. With these 3 concepts in mind, it is easy to see how Random Forest can produce 100% training accuracy: train a fully grown simple decision tree and a Random Forest on the train set and make predictions on the two test sets. Creative Collaboration with AI. Artificial Intelligence c̶a̶n̶ ̶b̶e̶ is creative.
Using various ML models and approaches, you can do more and more; there are ever more models, solutions, and applications with ML/DL-powered generative force. It allows you to augment your abilities and skills, realize your visions and create your dreams with the help of various AI models and apps. There is an unleashed creative potential waiting to be recognized. So, like in the case of GPT-3, I stopped using AI… 3 Key Pieces of Information About Logistic Regression Algorithm. If the probability of an email being spam is 0.9, then the probability of this email not being spam is 0.1. Log odds is the logarithm of the odds. Please take a look at the following table to see why we care about the log odds for the logistic regression algorithm. The log odds of the probability value 0.5 is 0; this is an important relationship for the logistic regression algorithm, as we will see later in the article. How I achieved 90% accuracy on a text classification problem with ZERO preprocessing. I retrieved the training and test sets from John Snow Labs (a must-see reference for all things NLP). In the first chunk you can see that the text in the description column is converted to a document using the DocumentAssembler. This document column is then used as the input for BERT sentence embeddings. Without any preprocessing at all, which can be quite time-consuming, BERT sentence embeddings were used to obtain an excellent classification of our 4 categories. I kept this one short and sweet just to introduce you to the power of sentence embeddings. Text-to-Speech: One Small Step by Mankind to Create Lifelike Robots. We combined the four individual sounds using an audio editor and you can play the recordings to hear the output. However, although we have technically solved text-to-speech, the speech output is far from natural-sounding.
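The probability/odds/log-odds relationship at the heart of the logistic regression piece is small enough to verify directly; logistic regression models the log odds as a linear function of the features, and the table’s key fact is that a probability of 0.5 maps to a log odds of 0.

```python
# Probability -> odds -> log odds, the chain logistic regression builds on.
import math

def odds(p):
    return p / (1 - p)

def log_odds(p):
    return math.log(odds(p))

print(odds(0.9))      # the spam example above: odds of 9 to 1
print(log_odds(0.5))  # 0.0, the pivotal point of the sigmoid
```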
Using our approach described earlier, the word “right” will sound the same in the sentences “Are you all right?” and “You are right.” Currently, Google’s text-to-speech API is powered by WaveNet, and you can experiment with it yourself on Google Cloud’s website. The speech model therefore has to output one of these 65,536 values. 4 Machine learning techniques for outlier detection in Python. Based on reader feedback after publishing “Two outlier detection techniques you should know in 2021”, I decided to write this post, which includes four different machine learning techniques (algorithms) for outlier detection in Python. You will also learn the exact definitions of and differences between technical terms such as outlier detection, novelty detection and anomaly detection. Anomaly detection: when we combine both outlier detection and novelty detection, it is called anomaly detection. Note: in non-technical usage, there is no difference between outlier detection and novelty detection. Local Outlier Factor (LOF) is an unsupervised machine learning algorithm that was originally created for outlier detection, but it can now also be used for novelty detection. AWS ML Community showcase: March 2021 edition. In our Community Showcase, Amazon Web Services (AWS) highlights projects created by AWS Heroes and AWS Community Builders. Here are a few highlights of externally published getting-started guides and tutorials curated by our AWS ML Evangelist team led by Julien Simon. AWS ML Heroes and AWS ML Community Builder Projects: Making My Toddler’s Dream of Flying Come True with AI Tech (with code samples). Choose from community-created and ML-focused blogs, videos, eLearning guides, and much more from the AWS ML community. Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community.
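A minimal sketch of the Local Outlier Factor technique named in the outlier-detection article, using scikit-learn and an invented dataset with two injected outliers; in `fit_predict`, -1 marks outliers and 1 marks inliers.

```python
# Local Outlier Factor for outlier detection on a toy 2-D dataset.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
inliers = rng.normal(0, 0.5, size=(100, 2))          # tight cluster
outliers = np.array([[5.0, 5.0], [-6.0, 4.0]])       # two far-away points
X = np.vstack([inliers, outliers])

# Default usage is outlier detection; pass novelty=True for novelty
# detection on unseen data, as the article distinguishes.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)
print(labels[-2:])  # the two injected points are flagged as -1
```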
9 Comprehensive Cheat Sheets For Data Science. Data science is one of those tech fields that has exploded in popularity and resources in recent years. This comprehensive 10-page cheat sheet covers all the core basics of probability theory, containing a semester’s worth of material. №3: SQL. You can’t spell data science without “data”, after all; data scientists try to figure out the story their data is trying to tell and then use this story to make predictions on new data. Stanford University has created a comprehensive machine learning cheat sheet that contains sub-cheat sheets for supervised learning, unsupervised learning, model metrics, and deep learning. That’s why the last cheat sheet on this list is a Jupyter Notebook cheat sheet. Fine-tuning pretrained NLP models with Huggingface’s Trainer. There are many pretrained models we can use to train our sentiment analysis model; let us use pretrained BERT as an example. You can search for more pretrained models on the Huggingface Models page. model_name = "bert-base-uncased"; tokenizer = BertTokenizer.from_pretrained(model_name); model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2). Since we are using a pretrained model, we need to ensure that the input data is in the same form as what the pretrained model was trained on. Step 2: preprocess text using the pretrained tokenizer initialised earlier: X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=512); X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=512). Next, we specify some training parameters and set the pretrained model, train data and evaluation data in the TrainingArguments and Trainer classes. Using TFRecords to Train a CNN on MNIST. When I started with TFRecords, it took me a while to understand the concept behind them.
The first option is of use when we create our TFRecord dataset; the second option allows slightly more comfortable iterating. Afterwards, we shuffle our data, set a batch size, and call repeat with no argument, which means repeating endlessly. To recap: we created two TFRecord files, one for the training data and one for the test data. How Zebra Medical Vision Developed Clinical AI Solutions. That’s why, for the first two years, Zebra Medical Vision hardly did any machine learning at all. They dubbed this role the “Clinical Information Manager”, and this person usually has a PhD in biomedical engineering or clinical research. With all this infrastructure and this team in place, Zebra Medical Vision can move at a dazzling pace. The more practical approach: Zebra Medical Vision realized that patients get CT scans for many other diseases, so they built an algorithm that locates and identifies these compression fractures. Two-layered Recommender System Methodology: A Prize-Winning Solution. A Cinema Challenge hackathon was held from 14 to 22 November 2020. Participants could decide on one of three projects: Challenge 1, a film recommender system; Challenge 2, a TV-program recommender system; or the Projects Contest. A TV program was considered watched if the user watched over 80% of it and did not change the channel. Solution accuracy was assessed using Kaggle. Top features: user watch time, user features from LightFM, schedule change. iPad Pro + Raspberry Pi for Data Science Part 4: Installing Kubernetes for Learning Purposes. Hello there friends!
We’re back again with a fourth part in our series for enabling a Raspberry Pi to work directly with an iPad Pro. Minikube has been awesome, but it unfortunately doesn’t work with Raspberry Pi: the Raspberry Pi uses an ARM-based CPU architecture, and there currently isn’t a flavor of Minikube that supports it. Unique to this deployment of Kubernetes, K3s makes use of a specific k3s.yaml file for configuration settings. Three More Ways To Make Fictional Data. Since writing about this topic earlier, a handful of folks throughout the community have shared with me their own picks for tools that generate fictional data. Evaluation: as a reminder, I’m evaluating each tool on its ability to replicate the results from the earlier article “How To Make Fictional Data”; Faker gets very close. To get a longitude and latitude in the desired range (location), I used the generic number-range data type, which I also used for weight and wing-span. “Mockaroo lets you generate up to 1,000 rows of realistic test data in CSV, JSON, SQL, and Excel formats.” Mockaroo boasts 145 data types. Object Extraction From Images. Object Extraction Using Skimage: let’s say you have an image exactly like the one above, except that I manually added a “white stain” in the middle. Smart guys may already notice that, with a “level” value of 150, we get more pixels contained within a contour. Geniuses may already discover that the contours essentially separate the pixel values that are smaller than the “level” value from the pixel values that are larger. A side note: sometimes your image might be too sharp, meaning that the pixel values have large variations within the object; if the pixel values vary across the “level” value, multiple contours can be detected inside the object.
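The “level” intuition from the object-extraction article can be demonstrated in miniature without any image library: a contour at a given level separates pixels below it from pixels above it, which a simple threshold mask approximates. The synthetic image below is an invented stand-in for the article’s example.

```python
# The "level" idea behind contour extraction: pixels above the level
# value fall inside the contour, pixels below it fall outside.
import numpy as np

# Synthetic 100x100 image: dark background (~50) with a bright square (~200).
image = np.full((100, 100), 50.0)
image[30:70, 30:70] = 200.0

level = 150.0
mask = image > level   # pixels "inside the contour" at this level
print(mask.sum())      # 40 * 40 = 1600 pixels inside
```

A library such as skimage traces the actual contour polygons (e.g. marching squares at a given level), but the separation it produces is exactly this above/below split.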
Deep Learning for Semantic Text Matching. Candidate generation, inverted-index based: traditionally, a token-based inverted index is used for candidate generation. The union of the individual posting lists, with BM25 scoring, can be used as the candidate documents for the next step of reranking. But such token-based retrieval has limitations, as it does not capture semantic relations between words. Embedding-based candidate generation: recently, candidate generation using DNN embeddings has become popular, as embeddings can better capture query and document semantics. Word2vec CBoW and Skip-gram are the two early word embedding models that generated a lot of interest around dense word embeddings. How I Taught My Air Conditioner Some Hebrew. So I was thinking to myself: “What if I just taught that air conditioner to recognize my speech?” Sensibo IoT sensors let you toggle anything to do with your air conditioner from your phone. All of the above meant that I only needed to write my own “button clicking” app, which instead of waiting for my finger tap would listen to my voice to toggle the air conditioner. It turns out that a really big percentage of speech recognition models listen to the user speaking and try to classify the speech as whole words. If the user waits for too long before the air conditioner does anything, he’ll just park his car and click the normal button. Two-Dimensional (2D) Test Functions for Function Optimization. Two-dimensional functions take two input values (x and y) and output a single evaluation of the input. In this tutorial, you will discover standard two-dimensional functions you can use when studying function optimization. A two-dimensional function is a function that takes two input variables and computes the objective value. Nevertheless, there are standard test functions that are commonly used in the field of function optimization.
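The token-based candidate generation described in “Deep Learning for Semantic Text Matching” reduces to a few lines of pure Python: build posting lists per token, then take the union of the lists for the query tokens. The toy corpus is invented, and scoring (e.g. BM25) is omitted.

```python
# A toy inverted index for candidate generation before reranking.
from collections import defaultdict

docs = {
    0: "deep learning for text matching",
    1: "classical information retrieval with inverted index",
    2: "semantic matching with embeddings",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def candidates(query):
    # Union of the posting lists for the query tokens.
    result = set()
    for token in query.split():
        result |= index.get(token, set())
    return result

print(candidates("semantic text matching"))  # {0, 2}
```

The article’s point is visible even here: a query like "semantic text matching" never retrieves doc 1, however related it may be, because retrieval is purely lexical.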
We will explore a small number of simple two-dimensional test functions in this tutorial, organized by their properties into two groups: unimodal functions (1–3) and multimodal functions (1–3). Each function is presented using Python code, with an implementation of the target objective function and a sampling of the function shown as a surface plot. Google AI Blog: Constructing Transformers For Longer Sequences with Sparse Attention Methods. Moreover, we show theoretically that our proposed sparse attention mechanism preserves the expressivity and flexibility of the quadratic full Transformer. To achieve structured sparsification of self-attention, we developed the global-local attention mechanism. In the BigBird paper, we explain why sparse attention is sufficient to approximate quadratic attention, partially explaining why ETC was successful. Behind both ETC and BigBird, one of our key innovations is an efficient implementation of the sparse attention mechanism. Conclusion: we show that carefully designed sparse attention can be as expressive and flexible as the original full attention model. GPT-3 Powers the Next Generation of Apps. Given any text prompt, like a phrase or a sentence, GPT-3 returns a text completion in natural language. To date, over 300 apps are using GPT-3 across varying categories and industries, from productivity and education to creativity and games. Using GPT-3, Viable identifies themes, emotions, and sentiment from surveys, help desk tickets, live chat logs, reviews, and more. Algolia Answers helps publishers and customer support help desks query in natural language and surface nontrivial answers. With natural language processing, technical experience is no longer a barrier, and we can truly keep our focus on solving real-world problems.
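In the spirit of the two-dimensional test-function tutorial, here is one standard unimodal example, the sphere function f(x, y) = x² + y², sampled on a grid; its single global minimum of 0.0 sits at (0, 0), which makes it a convenient sanity check for optimizers.

```python
# Sphere function: a classic unimodal 2-D test function for optimization.
import numpy as np

def objective(x, y):
    return x**2 + y**2

# Sample the input domain [-5, 5) at 0.1 increments in both dimensions.
r = np.arange(-5.0, 5.0, 0.1)
xx, yy = np.meshgrid(r, r)
zz = objective(xx, yy)

# Locate the grid point with the lowest objective value.
i, j = np.unravel_index(np.argmin(zz), zz.shape)
print(xx[i, j], yy[i, j], zz[i, j])  # near (0, 0) with value near 0
```

The tutorial renders `zz` as a surface plot; multimodal functions follow the same pattern but with several local minima.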
Introducing Amazon Lookout for Metrics: an anomaly detection service to proactively monitor the health of your business. You can connect Lookout for Metrics to 19 popular data sources, including Amazon Simple Storage Service (Amazon S3), Amazon CloudWatch, Amazon Relational Database Service (Amazon RDS), and Amazon Redshift, as well as software-as-a-service (SaaS) applications like Salesforce, Marketo, and Zendesk, to continuously monitor the metrics important to your business. This post demonstrates how you can set up anomaly detection on a sample ecommerce dataset using Lookout for Metrics; for Datasource, choose Amazon S3. Ankita Verma is the Product Lead for Amazon Lookout for Metrics. He has a special interest in launching AI services and helped grow and build Amazon Personalize and Amazon Forecast before focusing on Amazon Lookout for Metrics. Configure Amazon Forecast for a multi-tenant SaaS application. Forecast imports data from the tenant’s Amazon Simple Storage Service (Amazon S3) bucket to the Forecast-managed S3 bucket, for example: s3://tenant_a [ Tag tenant = tenant_a ]; s3://tenant_b [ Tag tenant = tenant_b ]. There is a hard limit on the number of S3 buckets per account. The tenant tag validation condition in the following code makes sure that the tenant tag value matches the principal’s tenant tag. Evaluation Bias: Are you inadvertently training on your entire dataset? A good first step is to start using a third validation split for evaluating your training runs. You only use this holdout split for evaluation purposes once you feel that you already have a model that will generalize well based on how it has performed on the validation data.
Remember, the underlying reason we use a third validation split is not to hide the samples from the algorithm. If you have a lot of data, then you can afford to let your validation and test splits eat into your training set. There are latent, unseen features that our model is trying to tease out during training. Vital Signs: Assessing Data Health and Dealing with Outliers. At the doctor’s office, you and the medical assistant go through a familiar routine before the doctor arrives. The Data Health Tool, now included in the Alteryx Intelligence Suite, does something similar for your data. The Data Health Tool gathers “vital signs” for your dataset that reveal whether it’s ready to yield robust, accurate insights, or whether it would benefit from some special treatment first. The tool uses a method established in peer-reviewed research in 2008, and once your outliers are identified, it’s up to you how to proceed with handling them. Forecasting Climate Change in Italy with Long Short Term Memory Networks. We are living in a time full of challenges for humanity, and one of the biggest is climate change. So, in this time characterized by many changes and difficulties, how important is the role of Data Science in climate change? From my personal point of view, the role of Data Science in climate change will get bigger and bigger in the near future. With algorithms that take into consideration local weather, climate patterns or household behaviour, data scientists can predict how much energy we need in real time and over the long term. In agriculture, with IoT devices that sense soil moisture and nutrients, in conjunction with weather data, farmers can have better control of their irrigation and fertilizer systems.
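The train/validation/test discipline described in the evaluation-bias piece can be set up in two calls: carve off the holdout test split first, then split the remainder into train and validation. The 60/20/20 ratio below is illustrative, not prescribed by the article.

```python
# Three-way split: the test set is removed first and only touched once,
# at the very end, after model selection on the validation set.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

# First carve off the 20% holdout test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Then split the remaining 80% into train (60%) and validation (20%).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```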
Building a Naive Bayes Machine Learning Model to Classify Text: a quick-start guide to get you up and running with an easy yet highly relevant NLP project in Python. Natural Language Processing (NLP) is an extremely exciting field. One of the simpler supervised Bayesian network models, the Naive Bayes algorithm is a probabilistic classifier based on Bayes’ Theorem (which you might remember from high school statistics). Alternatively, you might consider a deep learning approach based on neural networks, but this would require far more training data. The solution is to extract features from the text and turn them into vectors that can be understood by the model. I hope you gained a good high-level understanding of how Naive Bayes works, and how to implement it for classifying text specifically. Demystify Deep Learning Terminologies and Build Your First Neural Network. Parts of a neural network: the image below is a basic representation of a neural network. Layer — a collection of neurons operating together at a specific depth in a neural network. Deep neural network (DNN) — a neural network containing a deep stack of hidden layers (several of the middle columns). Optimizer — a technique for modifying the attributes of the neural network, such as the weights and the learning rate, so as to reduce the loss. We will feed our neural network with the data and have it determine the relationship between the two sets. Feature Store: Data Platform for Machine Learning. Feature data (or simply, features) are critical to the accurate predictions made by Machine Learning (ML) models.
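The Naive Bayes pipeline described above (extract features from the text, turn them into vectors, fit a probabilistic classifier) fits in a dozen lines with scikit-learn; the tiny spam/ham corpus is invented for illustration.

```python
# Text classification with count vectors and Multinomial Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "win a free prize now", "free money offer now",
    "meeting agenda for monday", "project status and agenda",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each text into a vector of word counts the model can understand.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["free prize offer"])))  # ['spam']
```

Swapping CountVectorizer for TfidfVectorizer is a common next step; the classifier code does not change.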
In the following, I will briefly survey the leading feature stores at two tech companies, Uber and Airbnb, and an open-source feature store, Feast. Airbnb built their feature store, called Zipline, at least four years ago. (4) Feature quality monitoring: it is common to see feature pipeline breakages, missing feature data, drift and inconsistency. For a generic ML data platform, here are my three personal thoughts: (1) one of the most valuable and challenging problems is the transformation of raw data into high-quality, ML-friendly features. Mouse-free data science. Detect your cat’s prey with a Raspberry Pi. For months our two outdoor cats carried dead, living and partially living mice into our home at night. The plan to create a smarter cat flap that prevents cats’ unwanted gifts turned into an “idée fixe” that received a subdued smile from my wife. However, 8 out of 10 detections resulted in the cat flap locking after the cat had already passed through. I tried to shorten the delay with a bypass to the API service of the cat flap, and then tried to connect my Raspberry Pi to the cat flap via ZigBee. Predicting political orientation with Machine Learning. Note: this is not a political post, and the scientific analysis has been done without any bias. :) In this blog, we will use (traditional) Machine Learning techniques to predict the political orientation of Twitter users, using Python. I’m a physicist, and I don’t like when Machine Learning is applied as a black box. Vectorization can be performed with more sophisticated Machine Learning techniques, usually involving Deep Learning. We have our 5000 vectors, thus we are just considering a basic Machine Learning classification task.
Audio Deep Learning Made Simple: Automatic Speech Recognition (ASR), How it Works. Load audio files: start with input data consisting of audio files of spoken speech in a format such as “.wav” or “.mp3”. Read the audio data from the file and load it into a 2D Numpy array. Convert to uniform dimensions: sample rate, channels, and duration. We might have a lot of variation in our audio data items; since our deep learning models expect all input items to have a similar size, we now perform some data cleaning steps to standardize the dimensions of our audio data. However, as we’ve just seen with deep learning, we required hardly any feature engineering involving knowledge of audio and speech. Feeding the Beast: The Data Loading Path for Deep Learning Training. Optimize your deep learning training process by understanding and tuning data loading from disk to GPU memory. Deep learning experimentation speed is important for delivering high-quality solutions on time. For the data loading path, use smaller values (e.g. 16/32) if you have a small amount of training data, and larger values if you have a lot of training data. In the simple case, transforming the raw input example into a training example is as simple as decoding a .jpg into pixels.
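The “uniform dimensions” cleanup described in the ASR article — one sample rate, one channel count, one duration — can be sketched with plain NumPy. The synthetic clip and the naive linear-interpolation resampler below are stand-ins for what a library such as librosa or torchaudio would normally do (a real pipeline would low-pass filter before downsampling).

```python
# Standardize audio dimensions: channels -> mono, resample, fix duration.
import numpy as np

sr_in, sr_out, target_secs = 44100, 16000, 2.0

# Fake stereo input: 1.5 s of a 440 Hz tone in two channels.
t = np.arange(int(sr_in * 1.5)) / sr_in
stereo = np.stack([np.sin(2 * np.pi * 440 * t)] * 2)  # shape (2, n)

mono = stereo.mean(axis=0)  # mix the channels down to one

# Naive resampling to the target rate by linear interpolation.
n_out = int(len(mono) * sr_out / sr_in)
resampled = np.interp(np.linspace(0, len(mono) - 1, n_out),
                      np.arange(len(mono)), mono)

# Zero-pad (or trim) to exactly target_secs at the output rate.
target_len = int(sr_out * target_secs)
fixed = np.pad(resampled, (0, max(0, target_len - len(resampled))))[:target_len]
print(fixed.shape)  # (32000,)
```

Every clip processed this way comes out with the same shape, which is what the downstream deep learning model requires.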
Stage 2 — Loading examples from storage. In the majority of cases, I/O is the largest cost in data loading. Multi-Class Classification With Transformers. Building the Model: First, we need to initialize our pre-trained BERT model. We will be building a frame around BERT using the typical tf.keras layers. There are a few parts to this frame: two input layers (one for the input IDs, and another for the attention mask). Once this is done, we can freeze the BERT layers to speed up training (at the cost of a likely performance decrease). The reason we freeze BERT parameters is that there are a lot of them, and updating these weights would significantly increase training time. Google AI Blog: Recursive Classification: Replacing Rewards with Examples in RL. Top: To teach a robot to hammer a nail into a wall, most reinforcement learning algorithms require that the user define a reward function. Doing so avoids potential bugs and bypasses the process of defining the hyperparameters associated with learning a reward function (such as how often to update the reward function or how to regularize it) and, when debugging, removes the need to examine code related to learning the reward function. Right: In the example-based control approach, the model is provided only with unlabeled experience (grey circles) and success examples (green circles), so one cannot apply standard supervised learning. Instead, the model uses the success examples to automatically label the unlabeled experience. The key difference is that the approach described here does not require a reward function. AI names colors much as humans do. What the research is: Across the thousands of different languages spoken by humans, the way we use words to represent different colors is remarkably consistent. Facebook AI has now shown that cutting-edge AI systems behave similarly.
The images on the left show two color-naming systems created entirely by neural networks. How it works: We built two neural networks, a Speaker and a Listener, and tasked them with playing the “communication game” illustrated below. This chart shows color-naming systems created by human languages (shown in blue) and by neural networks (shown in orange). Amazon Kendra adds new search connectors from AWS Partner, Perficient, to help customers search enterprise content faster. Today, Amazon Kendra is making nine new search connectors available in the Amazon Kendra connector library, developed by Perficient, an AWS Partner. Improving the Enterprise Search Experience: These days, employees and customers expect an intuitive search experience. Perficient Connectors for Amazon Kendra: Perficient has years of experience developing data source connectors for a wide range of enterprise data sources. To get started with Amazon Kendra, visit the Amazon Kendra Essentials+ workshop for an interactive walkthrough. To learn more about other Amazon Kendra data source connectors, visit the Amazon Kendra connector library. Careers in Machine Learning, Python Music, and AI’s Brain Connection. Head straight to Eugene Yan’s invigorating Q&A with Chip Huyen, where Chip shares too many valuable insights to count (about machine learning and getting into Stanford, yes—but also about setting goals and finding community through writing). Photo by Camille Vandoorsselaere on Unsplash. This week’s must-reads: Many a data scientist has started out thinking that a machine learning career revolves around mastery and expertise. Of course, networking and business acumen will only get you so far if you can’t produce valuable work, which itself often relies on highly specialized knowledge. Sit back with your snack of choice and treat yourself to Mark Saroufim’s thought-provoking polemic on the current state of machine learning, including an unflinching look at the parts of the field that no longer feel vibrant.
She stresses the importance of finding the right learning rhythm, and balancing ambitious goals with realistic expectations. What is MLOps — Everything You Must Know to Get Started. This new requirement of building ML systems adds to and reforms some principles of the SDLC, giving rise to a new engineering discipline called MLOps. In order to understand MLOps, we must first understand the ML systems lifecycle. Model training and experimentation — data science: As soon as your data is prepared, you move on to the next step of training your ML model. You can add version control to all the components of your ML systems (mainly data and models) along with the parameters. Other tasks include: testing a model by writing unit tests for model training. Analyzing and Interpreting Data From Rating Scales. Note: The code for this post can be found here. Improve Customer Rating (image by author). Rating Scales are an effective and popular way to gauge attitudes and opinions. The goal of this two-part series is to demonstrate basic concepts needed to effectively utilize Rating Scales data, as well as to warn about common pitfalls. Feedback Form Questions. Understanding The Rating Scale: Each Rating Scale is implemented as a closed-ended question to elicit information. Like instruments that measure physical properties (thermometer for temperature, ruler for length), Rating Scales can be used to measure properties that are cognitive in nature. One common pitfall with Rating Scales analytics is the assumption that the distance between choices is equal. How To Use Data (and Psychology) To Get More Data. Photo by Joshua Sortino on Unsplash. Getting people to fill out a survey is an unfortunately complicated business. As we all know, good data, and a good amount of data, is essential to building any working model. There are also attributes that all individuals value, which will almost always have a positive effect, but I’ll come to that in the next section.
You may get just as many, if not more, responses due to individuals’ good nature. Conclusion: To sum it all up, here’s a summary of my process when looking to survey individuals. Removing the “Adversarial” in Generative Adversarial Networks. The generator’s goal is to maximize the discriminator’s loss by generating convincing pictures that are indistinguishable from the dataset images. The discriminator’s goal is to minimize its loss by classifying real or fake images with high performance. A commonly used algorithm for GANs is gradient descent-ascent, which alternates between two steps. The discriminator performs a gradient ascent step towards maximizing the discriminator loss. Other variants of gradient descent-ascent have proposed solutions that address the issue of nonconvergence. Understand MapReduce Intuitively. There are numerous methodologies to increase performance, but the most commonly used technique is known as MapReduce. Worker nodes are assigned numerous jobs to be performed ahead of time, and all the nodes complete their jobs simultaneously. However, it may not prove to be any faster than implementing the function itself without MapReduce. Simply put, MapReduce is a procedure utilized to its maximum potential in parallel computing. Conclusion: As the intuition of MapReduce starts to form, it is quite simple to see its utility in Data Science and Machine Learning. Curious about Variational Autoencoders (VAEs)? Start Here. In recent years, GANs (generative adversarial networks) have been all the rage in the field of deep-learning generative models, leaving VAEs in relative obscurity. But there’s much to gain from a solid footing in variational autoencoders, which tackle similar challenges but use a different architectural foundation. If you were looking for an engaging, accessible way to learn more about VAEs, Joseph and Baptiste Rocca’s introduction hits the spot.
They define terms, walk us through the various elements that make up VAEs and how they relate to each other, and add beautiful illustrations for all the visual learners out there. CNNs for Audio Classification. Image by Author. Convolutional Neural Nets: CNNs, or convolutional neural nets, are a type of deep learning algorithm that does really well at learning images. These properties make CNNs formidable learners for images, because the real world doesn’t always look exactly like the training data. The data for this example are bird and frog recordings from the Kaggle competition Rainforest Connection Species Audio Detection. Scale and pad the audio features so that every “channel” is the same size. Lastly, stop iterating when you note a decrease in performance on the validation data in comparison to the training data. Super Resolution: Adobe Photoshop versus Leading Deep Neural Networks. Super Resolution of image from Unsplash by Adobe’s Super Resolution algorithm. How effective is Adobe’s Super Resolution compared to the leading super resolution deep neural network models? There are many positive comments describing how good Adobe Photoshop’s Super Resolution is, such as “Made My Jaw Hit the Floor”. Adobe Camera Raw’s Super Resolution: The Adobe Camera Raw Super Resolution, or the equivalent Photoshop Camera Raw filter, is a recent Super Resolution method that is very fast and easy to use, literally a matter of clicking “Enhance” in Adobe’s products using Camera Raw. In each example the left image is bicubic interpolation upscaling, the centre image is Adobe’s Super Resolution and the right image is the IDN deep neural network’s Super Resolution. The visual improvement in resolution and quality with most of the images is very noticeable from Adobe’s Super Resolution, although artifacts are introduced or exaggerated that are not there in the IDN Deep Neural Network Super Resolution.
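The map/reduce split from the “Understand MapReduce Intuitively” excerpt above can be illustrated with a toy word count, where each “worker” maps its chunk to partial counts and a reduce step merges them; the chunk data here is made up for illustration:

```python
from collections import Counter
from functools import reduce

# toy input split into chunks, one per hypothetical worker node
chunks = [["the", "cat"], ["the", "dog"], ["cat", "cat"]]

# map phase: each worker independently counts words in its own chunk
partials = [Counter(chunk) for chunk in chunks]

# reduce phase: merge the partial counts into a single result
total = reduce(lambda a, b: a + b, partials)
print(total)
```

In a real cluster the map phase runs simultaneously across nodes, which is where the speedup over a single-machine count comes from.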
How to Manually Optimize Machine Learning Model Hyperparameters. In this tutorial, you will discover how to manually optimize the hyperparameters of machine learning algorithms. In this section, we will explore how to manually optimize the hyperparameters of the Perceptron model. For example:

# define model
model = XGBClassifier()

Before we tune the hyperparameters of XGBoost, we can establish a baseline in performance using the default hyperparameters. Summary: In this tutorial, you discovered how to manually optimize the hyperparameters of machine learning algorithms. Google AI Blog: Progress and Challenges in Long-Form Open-Domain Question Answering. Open-domain long-form question answering (LFQA) is a fundamental challenge in natural language processing (NLP) that involves retrieving documents relevant to a given question and using them to generate an elaborate paragraph-length answer. While there has been remarkable recent progress in factoid open-domain question answering (QA), where a short phrase or entity is enough to answer a question, much less work has been done in the area of long-form question answering. It achieves a new state of the art on ELI5, the only large-scale publicly available dataset for long-form question answering. Our submission tops the KILT leaderboard for long-form question answering on ELI5 with a combined KILT R-L score of 2.36. The follow-up work on open-domain long-form question answering has been a collaboration involving Kalpesh Krishna, Aurko Roy and Mohit Iyyer. Announcing AWS Media Intelligence Solutions. Today, we’re pleased to announce the availability of AWS Media Intelligence (AWS MI) solutions, a combination of services that empower you to easily integrate AI into your media content workflows. AWS MI allows you to analyze your media, improve content engagement rates, reduce operational costs, and increase the lifetime value of media content.
With AWS MI, you can choose turnkey solutions from participating AWS Partners or use AWS Solutions to enable rapid prototyping. TripleLift is an AWS Technology Partner that provides a programmatic advertising platform powered by AWS MI. Customers can dramatically reduce the time and cost requirements to produce, distribute and monetize media content at scale with AWS Media Intelligence and its underlying AI services. AWS and Hugging Face Collaborate to Simplify and Accelerate Adoption of Natural Language Processing Models. Thanks to its managed infrastructure and its advanced machine learning capabilities, customers can build and run their machine learning workloads quicker than ever, at any scale. As NLP adoption grows, so does the adoption of Hugging Face models, and customers have asked us for a simpler way to train and optimize them on AWS. Working with Hugging Face Models on Amazon SageMaker: Today, we’re happy to announce that you can now work with Hugging Face models on Amazon SageMaker. In our business, we use machine learning models to help customers contextualize conversations, remove time-consuming tasks, and deflect repetitive questions. Getting Started: You can start using Hugging Face models on Amazon SageMaker today, in all AWS Regions where SageMaker is available. Pytorch Training Tricks and Tips. Photo by ActionVance on Unsplash. In this article, I will describe and show the code for 4 different Pytorch training tricks that I personally have found to improve the training of my deep learning models. Converting all calculations to 16-bit precision in Pytorch is very simple to do and only requires a few lines of code. The most direct way to fix an out-of-memory problem is to reduce your batch size; but if you don’t want to reduce your batch size, you can use gradient accumulation to simulate your desired batch size.
Suppose that your machine/model can only support a batch size of 16 and increasing it results in a CUDA out of memory error, and you want to have a batch size of 32. Medical Cost Prediction. Photo by Bermix Studio on Unsplash. A health insurance company can only make money if it collects more than it spends on the medical care of its beneficiaries.

colSums(is.na(train))
#> age sex bmi children smoker region charges
#>   0   0   0        0      0      0       0
colSums(is.na(test))
#> age sex bmi children smoker region charges
#>   0   0   0        0      0      0       0

Awesome!

formula <- as.formula(paste(' ~ .^2 + ', paste('poly(', colnames(X_train), ', 2, raw=TRUE)[, 2]', collapse = ' + ')))
formula
#> ~.^2 + poly(age, 2, raw = TRUE)[, 2] + poly(bmi, 2, raw = TRUE)[, 2] +
#>   poly(children, 2, raw = TRUE)[, 2] + poly(smoker, 2, raw = TRUE)[, 2]

Then, insert y_train and y_test back into the new datasets. age x bmi, age x children, age x smoker, bmi x children, bmi x smoker, children x smoker are the six interactions between pairs of the four features.

summary(lm_all)
#> Call:
#> lm(formula = charges ~ age + bmi + children + smoker, data = train)
#>
#> Residuals:
#>    Min     1Q Median     3Q    Max
#> -11734  -2983  -1004   1356  29708
#>
#> Coefficients:
#> Estimate Std.

How to Fine-Tune BERT Transformer with spaCy 3. Since the seminal paper “Attention is all you need” by Vaswani et al., Transformer models have become by far the state of the art in NLP technology. BERT Architecture: In this tutorial, I will show you how to fine-tune a BERT model to predict entities such as skills, diploma, diploma major and experience in software job descriptions. Below is a step-by-step guide on how to fine-tune the BERT model with spaCy 3. Data Labeling: To fine-tune BERT using spaCy 3, we need to provide training and dev data in the spaCy 3 JSON format (see here), which will then be converted to a .spacy binary file.
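The gradient-accumulation trick from the PyTorch training tips above (accumulate gradients over micro-batches of 16 to simulate a batch of 32, dividing each loss by the number of accumulation steps so the result matches the full-batch average) can be sketched framework-agnostically in NumPy with a toy linear model; the data, learning rate and step counts here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))          # one "full" batch of 32 examples
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                        # noiseless targets so true_w is recoverable

w = np.zeros(3)
lr = 0.1
accum_steps = 2                       # two micro-batches of 16 simulate a batch of 32
grad_accum = np.zeros(3)

for epoch in range(200):
    for i in range(accum_steps):
        xb = X[i * 16:(i + 1) * 16]
        yb = y[i * 16:(i + 1) * 16]
        # gradient of the MSE loss, divided by accum_steps so the
        # accumulated sum equals the full-batch average gradient
        grad = 2 * xb.T @ (xb @ w - yb) / len(yb)
        grad_accum += grad / accum_steps
        if (i + 1) % accum_steps == 0:
            w -= lr * grad_accum      # one optimizer step per accum_steps micro-batches
            grad_accum[:] = 0.0
```

The update applied every `accum_steps` micro-batches is numerically identical to one step on the full batch of 32, which is exactly why the trick works for memory-constrained GPU training.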
We were able to extract most of the skills, diploma, diploma major, and experience correctly. Scikit-Learn Cheat Sheet (2021), Python for Data Science. Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, and efficient tools for data mining and data analysis. It’s built on NumPy, SciPy, and Matplotlib. Basic Example: The code below demonstrates the basic steps of using scikit-learn to create and run a model on a set of data. The steps in the code include: loading the data, splitting it into train and test sets, scaling the sets, creating the model, fitting the model on the data, using the trained model to make predictions on the test set, and finally evaluating the performance of the model. Framework for a successful Continuous Training Strategy. Continuous training seeks to automatically and continuously retrain the model to adapt to changes that might occur in the data. Yet, regardless of the use case, three main questions need to be addressed when designing a continuous training strategy: 1 — When should the model be retrained? The three most common strategies are: periodic retraining, performance-based, or based on data changes. For future ones, a different window may be selected, according to the comparison with the test set. The disadvantage is that it requires a more complex training pipeline (see the next question, ‘What to train’) to test the different window sizes and select the optimal one, and it is much more computing-intensive. How I passed the AWS Certified Machine Learning Specialty. Photo: David Kolb. Why did I start a machine learning education? This article is specifically about the AWS Machine Learning Specialty, the study path I took and what the exam involves. The AWS Machine Learning Specialty was one part of my machine learning platform-specific goals.
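The cheat-sheet workflow listed above (load, split, scale, fit, predict, evaluate) might look like this with scikit-learn’s built-in iris data; the k-NN model and its settings are placeholder choices, not the cheat sheet’s specific example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# load the data and split into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# scale both sets using statistics learned from the training set only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# create and fit the model, then evaluate on the held-out test set
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)
```

Fitting the scaler on the training set alone, then applying it to both sets, avoids leaking test-set statistics into training.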
In Amazon’s words, the AWS Certified Machine Learning — Specialty certification “validates a candidate’s ability to design, implement, deploy, and maintain machine learning (ML) solutions for given business problems.” To cover all that, the exam is split into four domains. Summary: I enjoyed the journey to the AWS Machine Learning Specialty. A practical guide to TFRecords. Images are a common domain in deep learning, with MNIST [1] and ImageNet [2] being two well-known datasets. To make loading and parsing image data efficient, we can resort to TFRecords as the underlying file format. For each image and corresponding label, we then use the function above to create such an object. This is also possible, and goes the other way: earlier, we defined a dictionary that we used to write our content to disk. In the last step, we have to parse our image back from a serialized form to the (height, width, channels) layout. Boost basic Dataset and simple CNN to answer real environment problem. We modified our dataset and used the Plant Village dataset from Kaggle, which is similar to our first dataset (leaf images on a uniform background), but without any data augmentation. For instance, the same tomato-plant background is used for all tomato classes. Summary of iteration 2: Dataset “Plant Village” with data augmentation and new classes added from the dataset “Image-Net”. This should help the model to not focus on the background image. Summary of iteration 2-bis: Dataset “Plant Village” with data augmentation and new classes added from the dataset “Image-Net”. Keywords to know before you start reading papers on GANs. Over the past few weeks, I have probably read a dozen papers on GANs (and their variants) and tinkered around with their code on custom images (courtesy of open-source GitHub repos). While most of these papers are brilliantly written, I wish there were a few keywords that I had known before I plunged into these academically written manuscripts.
Below I will discuss a few of them and hope it saves you some time (and frustration) when you encounter them in papers. As for the pre-requisites, I am assuming most of you already know what Discriminator and Generator networks are with regard to GANs. For those of you who might need a recap: a Generator network’s aim is to produce fake images that look real. Optimise Deep Learning Workflow with Fast S3. I know most data scientists do not care about storage, and they shouldn’t. However, having a fast S3 object storage in the system would definitely help optimise our deep learning workflow. Optimising DL Workflow with Fast S3: Prior to machine learning and deep learning, I spent 10+ years on big data (Hadoop & Spark) and DevOps (cloud, platform-as-a-service). How does a fast S3 object storage help optimise our DL workflow? By using a fast S3 like FlashBlade S3, and tuning the number of parallel reads and the buffer size, it is possible to reach performance comparable to that of reading from fast NFS. Pretrained Transformers as Universal Computation Engines. Transformers have been successfully applied to a wide variety of modalities: natural language, vision, protein modeling, music, robotics, and more. This enables the models to utilize generalizable high-level embeddings trained on a large dataset to avoid overfitting to a small task-relevant dataset. To illustrate this, we take a pretrained transformer language model and finetune it on various classification tasks: numerical computation, vision, and protein fold prediction. We refer to this as a “Frozen Pretrained Transformer”. Furthermore, we find that the language-pretrained frozen transformers converge faster than the randomly initialized frozen transformers, typically by a factor of 1-4x, indicating that language might be a good starting point for other tasks.
Create forecasting systems faster with automated workflows and notifications in Amazon Forecast. Forecast enables notifications by onboarding to Amazon EventBridge, which lets you activate these notifications either directly through the Forecast console or through APIs. Create rules for Forecast notifications through EventBridge: To create your rules for notifications, complete the following steps. On the Forecast console, choose your dataset. For this post, we choose Forecast Dataset Import Job State Change because we’re interested in knowing when the dataset import is complete. Ranjith Kumar Bodla is an SDE on the Amazon Forecast team. Shannon Killingsworth is a UX Designer for Amazon Forecast and Amazon Personalize. The Evolution of Facial Recognition — A Case Study in the Transformation of Deep Learning. Machine learning has often been described as the study of “algorithms that create algorithms”. In reality, humans have a heavy role in ensuring that machine learning algorithms work with the given data. Although the No Free Lunch Theorem gives a theoretical appeal for the impossibility of a “universal learner”, deep learning is a huge step in that direction. Neural networks are — if you will — the “algorithms that create machine learning algorithms.” Deep learning is differentiated from machine learning by the massive parametrization of its models. As such, deep learning models are not only more powerful than traditional machine learning models, but much more generalizable across different contexts. Logistic Regression in real life: building a daily productivity classification model. Scatter plot of the target values for training (circle) and values predicted by the Linear Regression model (triangle). Output of the Linear Regression Model.
But even with seemingly encouraging results, a Linear Regression model has a few limitations when it comes to classification tasks: it implies that outcome values have a specific order. For a model with only one feature, or predictor, the link function g can be described as shown in the equation (caption: definition of the link function of a Logistic Regression model). Mathematically speaking, what you achieved is the inverse of the logit function, a function called the logistic function. How To Deploy Machine Learning Models. Image created by author. Jupyter notebooks are where machine learning models go to die. In general, companies don’t care about state-of-the-art models; they care about machine learning models that actually create value for their customers. For machine learning code you should also describe and/or link to experiments that were run, so people can view the process of creating your models. If things have gone well, you have a front-end web app running on your machine that allows you to access your machine learning model’s predictions. How to Learn More: I hope this overview on how to deploy machine learning models helped you understand the basic steps of deploying your models. Cervical Cancer Prediction and Boruta analysis (R). Then we apply the two functions to the columns, and we can establish an attribute that represents cervical cancer later on. A positive result does not mean that the patient suffers from cervical cancer, but the likelihood increases the more positives a patient receives. Conclusion: To sum up, cervical cancer is one of the most life-threatening diseases out there, responsible for thousands of deaths per year. Cervical cancer screening for individuals at average risk: 2020 guideline update from the American Cancer Society.
Retrieved February 22, 2021, from https://medium.com/opex-analytics/why-you-need-to-understand-the-trade-off-between-precision-and-recall-525a33919942. Dataset: UCI Machine Learning Repository: Cervical cancer (Risk Factors) Data Set. Perform K-Means Clustering in R. One of the common questions regarding the K-means algorithm is whether it can handle non-numeric data.

data("iris")
?iris
Edgar Anderson's Iris Data

Description: This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. Preview of the data. Before I performed the K-means algorithm, I first checked the labels to see how many clusters were present in this dataset. Then, I fitted a K-means model with k = 3 and plotted the clusters with the “fpc” package. In this blog, I’ve discussed fitting a K-means model in R, finding the best K, and evaluating the model. Normal Equation in Python: The Closed-Form Solution for Linear Regression. It works only for Linear Regression and not for any other algorithm. The Normal Equation is the closed-form solution for the Linear Regression algorithm, which means that we can obtain the optimal parameters by just using a formula that involves a few matrix multiplications and inversions. To calculate theta, we take the partial derivative of the MSE loss function (equation 2) with respect to theta and set it equal to zero. This gives the Normal Equation (figure: The Normal Equation; source: Andrew Ng). If you know about matrix derivatives along with a few properties of matrices, you should be able to derive the Normal Equation yourself. The Algorithm: Calculate theta using the Normal Equation. Xgboost Regression Training on CPU and GPU in Python. Now that we have software that is able to use the GPU, let's train some models!
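The closed-form solution from the Normal Equation excerpt above, theta = (XᵀX)⁻¹Xᵀy, can be checked in a few lines of NumPy on noiseless toy data, where the parameters to be recovered are known in advance (the data below is made up for illustration):

```python
import numpy as np

# toy data generated from y = 2 + 3x, with a bias column prepended to X
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])

# Normal Equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # recovers intercept 2 and slope 3
```

The single matrix inversion replaces the iterative updates of gradient descent, which is why this works only for linear regression and becomes expensive when the number of features is large.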
For the modeling task, I will load data containing 1,017,209 rows and the following columns:

Store — store identification number
Date — date of the store sales record
DayOfWeek — weekday of the Date
Sales — revenue from goods sold during that Date (only available in train_data.csv)
ShopOpen — boolean flag indicating whether a shop was open during that Date (if not open, Sales should be 0)
Promotion — boolean flag indicating whether any promotions were run during that Date
StateHoliday — factor variable indicating whether the Date is a state holiday
SchoolHoliday — factor variable indicating whether the Date is a school holiday
StoreType — factor variable describing the type of a store
AssortmentType — factor variable describing the assortment type of a store

Snippet of data; image by author. The task is to model the Sales (Y) variable using all the other features. Note that all the other features are categorical. After adding two additional features for the day of the month and the month of the year, we can inspect how many unique categorical values there are in the dataset (unique categorical levels; image by author). Distribution of the Y variable (Y variable histogram; image by author). The final dimensions of the full Y and X matrices (image by author): the X matrix has 1150 features and more than a million rows. An ideal real-life dataset to test out computation speeds! Ethics in Data Science or: How I Learned to Start Worrying and Question the Process. So, you’ve created a model. Is your model good, or is it doing good? Photo by Gabriele Lasser on Unsplash. As part of working through the immersive data science program at Metis, I completed a short side presentation on the concerns of ethics in technology, particularly in relation to our work in data science. I focused on the two books above, Weapons of Math Destruction by Cathy O’Neil and Algorithms of Oppression by Safiya Umoja Noble. Is your model good or is it doing good?
Optimise Deep Learning Workflow with Fast S3: Building a reproducible and scalable deep learning system with fast S3 as the central data and model repository. Since FlashBlade S3 is very fast, it is feasible to read the S3 data directly in the training iteration. Implementing Transfer Learning from RGB to Multi-channel Imagery. In this article, we shall be exploring two distinct concepts implemented within the Semantic Segmentation part of the project: Transfer Learning for Multi-Channel Input. What is Transfer Learning? Transfer learning is a machine learning technique for the re-use of a pre-trained model on a new problem. Given the small number of images, transfer learning seemed like a good path to explore. Typically with transfer learning, we exclude the final layer and replace it with layers more specific to the new task. This should be set to true if we’re making inference with the pre-trained model, as opposed to implementing transfer learning. Tomorrow’s car silicon brain, how is it made? The bandwidth and power consumption of such external memory are a bottleneck to the high system performance required. Thus more computation units and higher peak performance can be achieved with a smaller computation unit design. [3] describes three main techniques to improve performance by optimizing computation unit designs. Low Bit-width Computation Unit: The bit-width of input arrays directly impacts the size of computation units.
CAVBench [8] currently is a good starting point for autonomous driving computing system performance evaluation. Object Detection Explained: R-CNN. Object detection consists of two separate tasks: classification and localization. The key concept behind the R-CNN series is region proposals. In the following blogs, I decided to write about different approaches and architectures used in Object Detection. Extract region proposals: Selective Search is a region proposal algorithm used for object localization that groups regions together based on their pixel intensities. Paper: Rich feature hierarchies for accurate object detection and semantic segmentation. Compute cost and environmental impact of Deep Learning. Compute used to train state-of-the-art deep learning models continues to grow exponentially, exceeding the rate of Moore’s Law by a wide margin. Estimate the cloud compute cost and carbon emissions of the full R&D required for a new state-of-the-art DL model. NAS can approximate the compute cost of the full cycle of research and development to find a new state-of-the-art model. Image from [2]: CO2 emissions and monetary cost of training some well-known deep learning models. Financial analysts such as ARK Invest predict the Deep Learning market cap to grow from $2 trillion in 2020 to $30 trillion in 2037 [3].
A Gentle Introduction to XGBoost Loss Functions

Tutorial overview: this tutorial is divided into three parts: XGBoost and Loss Functions, XGBoost Loss for Classification, and XGBoost Loss for Regression. Extreme Gradient Boosting, or XGBoost for short, is an efficient open-source implementation of the gradient boosting algorithm. XGBoost can be installed as a standalone library, and an XGBoost model can be developed using the scikit-learn API; you can check the installation with import xgboost; print(xgboost.__version__). You can see a full list here. Next, let’s take a look at XGBoost loss functions for regression. The XGBoost objective function used when predicting numerical values is the “reg:squarederror” loss function.
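Internally, gradient boosting fits each new tree to the first and second derivatives of the loss with respect to the current predictions. For squared error those derivatives are especially simple, which is part of why “reg:squarederror” is so cheap to optimize. A plain-Python sketch mirroring the (gradient, hessian) pair a custom XGBoost objective returns; the function name and toy values are illustrative, not part of the xgboost API:

```python
# Sketch of the squared-error objective behind "reg:squarederror".
# XGBoost fits each new tree to the gradient and hessian of the loss
# with respect to the current predictions.

def squared_error_objective(preds, labels):
    """Return per-sample (gradient, hessian) for L(y, p) = 0.5 * (p - y)^2."""
    grads = [p - y for p, y in zip(preds, labels)]
    hessians = [1.0 for _ in preds]  # second derivative of 0.5*(p-y)^2 is 1
    return grads, hessians

grads, hessians = squared_error_objective([2.5, 0.0], [3.0, 1.0])
```

A constant hessian of 1 means every leaf value is just the mean residual of the samples in it, which keeps each boosting round fast.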
The Most In-Demand Skills for Data Scientists in 2021

An in-depth analysis of the most in-demand skills from web scraping over 15,000 Data Scientist job postings. Introduction: I just wanted to start off by saying that this is heavily inspired by the articles Jeff Hale wrote back in 2018/2019. I’m writing this simply because I wanted a more up-to-date analysis of which skills are in demand today, and I’m sharing it because I assume there are people out there who also want an updated view of the most in-demand skills for data scientists in 2021. Take what you want from this analysis: it’s obvious that insights gathered from web scraping job postings do not correspond perfectly to which data science skills are actually most in demand. However, I think this gives a good indication of which general skills you should focus on and, likewise, which to stray away from.
MLOps for Research Teams

To make MLOps more concrete, we’ll look at what problems it solves for research teams. MLOps solves these problems and allows research teams to achieve their goals despite the complexity that comes from dealing with large datasets, code, and machine learning models. Most machine learning teams need an architecture of several integrated components; our MLOps architecture combines such components to address the difficulties most teams face. While some research teams still operate without MLOps tools or best practices, we believe MLOps has become an essential ingredient for nearly all teams. We love finding the right MLOps architecture for machine learning research teams.
How to Avoid Burnout as an Ambitious New Data Scientist

When you’re done work, you’re done work. Therefore, it’s imperative that when you’ve completed all of your tasks for the day, you call it a day. To ensure that I don’t get sucked back into work, I’ll take time to work out, work on an article for Medium, clean, or get dinner started. As a new data scientist, it’s important to have some healthy habits in place so that when work gets crazy, you have some constants that will help you avoid burnout and keep you healthy and productive. A common complaint of people suffering burnout is that they feel stuck in a rut, doing the same thing day in, day out.
Three ways to run Linear Mixed Effects Models in Python Jupyter Notebooks

Accessing LMER in R using rpy2 and %Rmagic: the second option is to directly access the original LMER packages in R through the rpy2 interface. The rpy2 interface allows users to toss data and results back and forth between your Python Jupyter Notebook environment and your R environment. The next set of lines installs rpy2, then uses rpy2 to install the lme4 and lmerTest packages. Next, you’ll need to activate the Rmagic by running %load_ext rpy2.ipython in a Jupyter Notebook cell. After this, any Jupyter Notebook cell starting with %%R will allow you to run R commands from your notebook.
Building a data dashboard for housing prices using Plotly-Dash in Python

For this project, I’ll be using Plotly-Dash, which is a Python library for creating analytical data web apps. Lastly, I called the Geonames API to download latitude, longitude, and population data for each city. Note that you’ll need to register on Geonames.org to use their Python API, and the key is your username. I ran some quick statistics on the success rate of the API for pulling population data, and discovered that it successfully downloaded population data for ~94% of the cities. Dash uses dictionaries and lists extensively in its keyword arguments, so getting familiar with these Python concepts is definitely helpful when building a Dash app.
Building a full-stack spam catching app — 3. Frontend & Deployment

In the last post, we built out the backend for our app by creating the spam classifier and a small Flask app to serve the model. At this endpoint, we simply call render_template on our HTML file. Finally, we have the event listeners, which are what actually connect these JavaScript functions to our HTML page. Knowing a little bit of HTML, CSS, and JavaScript goes a long way in giving life to a data science project!
Data Augmentation for Brain-Computer Interface

Brain-computer interfaces have always faced severe data-related issues such as lack of sufficient data, lengthy calibration time, and data corruption. In this article, I’ll explain the issue of creating enough training data in the context of non-invasive BCIs and present a non-exhaustive list of data augmentation techniques for EEG datasets. Brain-computer interface (BCI) systems are designed to connect the brain and external devices for several use cases. Two approaches exist to generate augmented data. Data augmentation helps increase the available training data and facilitates the use of more complex DL models.
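One of the simplest techniques on such lists is noise injection: adding small Gaussian noise to a recorded EEG window yields a new, slightly different training example. A minimal stdlib sketch; the signal values and sigma are made up for illustration:

```python
import random

def augment_with_noise(signal, sigma=0.05, seed=0):
    """Return a noisy copy of a 1-D EEG window (Gaussian noise injection)."""
    rng = random.Random(seed)  # seeded so augmentations are reproducible
    return [x + rng.gauss(0.0, sigma) for x in signal]

eeg = [0.0, 0.5, 1.0, 0.5, 0.0]   # toy single-channel window
augmented = augment_with_noise(eeg)
```

Real pipelines apply this per channel and pick sigma relative to the signal’s amplitude so the augmented trials stay physiologically plausible.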
Chip Huyen on Her Career, Writing, and Machine Learning

You share a lot about machine learning in production, such as through your writing and tweets. Maybe people in data and machine learning are unaware of those solutions, or too lazy to learn how to use them. I wonder how we can help smaller companies also benefit from machine learning. What’s your advice for small to medium size companies starting to work on deploying their first machine learning models? The class caught the attention of various machine learning teams and led to her role at NVIDIA.
Why you should monitor your pictures’ sharpness when deploying Computer Vision models

As you probably know, pictures are encoded into n-dimensional arrays (1 layer for grayscale pictures and 3 for RGB ones). If you are not at ease with this concept, I recommend this article. When it comes, for example, to tabular datasets, we can monitor the statistical characteristics of each feature (min, max, mean, standard deviation, etc.). So we need to find a way to calculate the variations within the picture, from one pixel to another. In the gradient calculation of the [1, 3, 0] vector, from 1 to 3 the function is “y = 2x”, its derivative being “2”. The workshop team can monitor the camera sharpness on their control screen, so they decided to clean the camera at some point during this period.
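The pixel-to-pixel variation idea can be sketched with plain Python: take forward differences along a row of pixels and use the variance of those differences as a crude sharpness score. This is an illustrative toy, not the article’s exact metric:

```python
from statistics import pvariance

def forward_differences(row):
    """Pixel-to-pixel intensity changes along one image row."""
    return [b - a for a, b in zip(row, row[1:])]

def sharpness_score(row):
    """Variance of the gradient: larger swings between neighbouring
    pixels mean more edges, i.e. a sharper picture."""
    return pvariance(forward_differences(row))

blurry = [10, 11, 12, 11, 10]      # smooth intensities, small gradients
sharp = [10, 200, 15, 190, 10]     # strong edges, large gradients
```

A dirty or defocused lens smooths the intensities, so a drop in this score over time is exactly the kind of drift the monitoring screen would surface.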
Image Feature Extraction Using PyTorch

When we want to cluster data like an image, we have to change its representation into a one-dimensional vector. Convolutional neural networks are mostly used for image data; therefore, this type of neural network is perfect for processing images, especially for feature extraction [1][2]. After we extract the feature vector using the CNN, we can use it according to our purpose, for example with the K-Means algorithm. At first, K-Means will initialize several points called centroids.
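The clustering step can be sketched in plain Python: assign each extracted feature vector to its nearest centroid, then move each centroid to the mean of its members (Lloyd’s algorithm). A toy sketch with hand-made 2-D “feature vectors” standing in for CNN embeddings:

```python
def assign(points, centroids):
    """Give each feature vector the index of its nearest centroid."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            for p in points]

def update(points, labels, k):
    """Move each centroid to the mean of its assigned vectors
    (assumes no cluster ends up empty, fine for this toy data)."""
    cents = []
    for i in range(k):
        members = [p for p, l in zip(points, labels) if l == i]
        cents.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
    return cents

# Toy 2-D "feature vectors" forming two obvious groups.
feats = [(0.0, 0.1), (0.2, 0.0), (9.0, 9.1), (9.2, 9.0)]
centroids = [feats[0], feats[2]]     # simple initialisation
for _ in range(5):                   # a few Lloyd iterations
    labels = assign(feats, centroids)
    centroids = update(feats, labels, 2)
```

In practice you would run this (via scikit-learn’s KMeans) on the CNN feature vectors rather than on raw pixels, since the embedding space groups semantically similar images together.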
Enter the j(r)VAE: divide, (rotate), and order… the cards

Introduction to joint (rotationally-invariant) VAEs that can perform unsupervised classification and disentangle relevant (continuous) factors of variation at the same time. In this case, each encoded object corresponds to a single latent vector, and we can simply cluster the points in the latent space. Rather than working with the standard MNIST data set, we are going to make our own data set of playing card suits, with monochrome clubs, spades, diamonds, and hearts. From left to right: (a = 12, s = 1), (a = 12, s = 10), (a = 120, s = 1), and (a = 120, s = 10).
Advanced YoloV5 tutorial — Enhancing YoloV5 with Weighted Boxes Fusion

There are tons of YoloV5 tutorials out there; the aim of this article is not to duplicate their content but to extend it. Most popular object detection models like YoloV5 and EfficientDet use a command-line interface to train and evaluate rather than a coding approach. Weighted Boxes Fusion is a method to dynamically fuse bounding boxes either before training (which cleans up the data set) or after training (making the predictions more accurate). You can also try using it after predicting the bounding boxes with YoloV5 in the same way. Those are most of the aspects that you can easily control and use to boost your performance with YoloV5.
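The core idea of Weighted Boxes Fusion, unlike NMS which discards boxes, is to average the coordinates of overlapping boxes weighted by their confidence scores. A minimal sketch of that averaging step for boxes already known to overlap; full WBF (e.g. the ensemble-boxes package) also clusters boxes by IoU first:

```python
def fuse_boxes(boxes, scores):
    """Confidence-weighted average of overlapping boxes
    ([x1, y1, x2, y2] each), the core averaging step of WBF."""
    total = sum(scores)
    fused = [sum(s * b[i] for b, s in zip(boxes, scores)) / total
             for i in range(4)]
    return fused, total / len(scores)   # fused score: mean confidence

boxes = [[0.0, 0.0, 10.0, 10.0], [2.0, 2.0, 12.0, 12.0]]
scores = [0.9, 0.6]
box, conf = fuse_boxes(boxes, scores)
```

Because every box contributes in proportion to its confidence, the fused box leans toward the more certain prediction instead of throwing the other detection away.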

data = pd.concat(objs=[df, df_test], axis=0).reset_index(drop=True). Our target variable will be the Survived column; let's keep it aside: target = ['Survived']. First we check for null values in the columns of the training data with df.isnull().sum(). Right away we can observe that three columns seem quite unnecessary for modelling: data = data.drop(['PassengerId', 'Ticket', 'Cabin'], axis=1). Now we move on to other columns that have null values: data.Age.fillna(data.Age.median(), inplace=True); data.Fare.fillna(data.Fare.median(), inplace=True); data.Embarked.fillna(data.Embarked.mode()[0], inplace=True). Doing that, we now have no null values in our data.
Deeper Neural Networks Lead to Simpler Embeddings

Recent research is increasingly investigating how neural networks, hyper-parametrized as they are, generalize. Perhaps one of the most intriguing proposals is that deeper neural networks lead to simpler embeddings. This makes neural networks more likely, by chance, to find simple solutions rather than complex ones. Huh et al. begin by analyzing the rank of linear networks, that is, networks without any nonlinearities such as activation functions. This paper's fascinating contribution argues that simpler solutions are in fact better, and that more successful, highly parameterized neural networks arrive at those simpler solutions because of, not despite, their parametrization.
Sequence Dreaming with Depth Estimation in PyTorch

While Big Sleep is still the big hype on reddit, I decided to take another look at open questions in the context of deep dreaming on consecutive frames, i.e. video sequences. Inspired by preceding work, such as this Caffe implementation, I wanted to include more recent insights on single-class dreaming (see my previous post) and depth estimation, in addition to integrating everything into an up-to-date PyTorch framework. Sequence dreaming with the tricycle class. I approach this problem by warping the previous dream pattern onto the next frame and parametrizing the strength of the update with the flow vector field. In each step, the vector field is computed with the Farneback method (provided by OpenCV) or alternatively the Spatial Pyramid Network (SPyNet).
Speed-up your Pandas Workflow by changing a single line of code

Pandas is one of the most popular Python libraries used for data exploration and visualization. Pandas does not take advantage of all the available CPU cores to scale up computations. In this article, you can read how to scale up the performance of Pandas computations using Modin, just by changing one line of code. Unlike other distributed libraries, Modin is easily integrated with and compatible with the Pandas library, and has similar APIs. For a large data science workstation or a cluster with many CPU cores, Modin's performance increases significantly, as it fully utilizes all available CPU cores.
4 Easy Steps for Implementing CatBoost

Now is the time to learn this powerful library, and below is how you can implement it in four easy steps. Here are the main installation commands: !pip install catboost; !pip install ipywidgets; !jupyter nbextension enable --py widgetsnbextension. Here are the main import commands: from catboost import CatBoostRegressor; from sklearn.model_selection import train_test_split; import numpy as np; import pandas as pd. As you can see, only a few lines of code are needed for installing and importing. Here are the main dataset-defining commands: dataframe = pd.read_csv('file_path_to_your_dataset.csv'); X = dataframe[['X_feature_1', 'X_feature_2', etc.]]; y = dataframe['target_variable']; X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=42). When applying the CatBoost model, it works similarly to other sklearn approaches. However, the most important part is to designate your categorical variables, so that you can get the most out of your CatBoost model.
Two outlier detection techniques you should know in 2021

An outlier is an unusual data point that differs significantly from other data points. Elliptic Envelope and IQR are commonly used outlier detection techniques. The Elliptic Envelope method considers all observations as a whole, not individual features. An IQR-based detection is a statistical approach. The outlier indices of each feature are very useful.
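The IQR rule can be stated in a few lines: flag any point outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], known as Tukey's fences. A stdlib sketch with made-up data:

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles of the sample
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Mostly values around 11-14, plus one obvious anomaly.
data = [12, 13, 11, 14, 12, 13, 12, 11, 14, 120, 13, 12, 14, 13, 11, 12]
outliers = iqr_outliers(data)
```

Unlike the Elliptic Envelope, this runs per feature, which is exactly why the per-feature outlier indices mentioned above are the natural output.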
What does the h-index tell us that we could not know without it?

A more complex way is to calculate the h-index (h), which is supposed to also take into account how these N citations are distributed across a researcher's papers. If a researcher's h-index is h, then h is the greatest number for which the statement "he or she has h articles with at least h citations each" holds true. Note that this bound is tight: if the N citations are distributed equally among the first √N papers, then the bound is reached and we have h = √N. The 1st and 2nd rows show the h-index as a function of the number of citations (N) in linear and log-log scales, respectively.
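The definition is easy to operationalize: sort citation counts in descending order and find the largest h for which there are at least h papers with at least h citations each. A stdlib sketch that also confirms the √N bound is reached when citations are spread equally:

```python
def h_index(citations):
    """Largest h such that the researcher has h papers
    with at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

# N = 9 citations spread equally over sqrt(N) = 3 papers hits the bound:
equal_spread = [3, 3, 3]
# The same 9 citations concentrated in one paper give a much lower h:
concentrated = [9]
```

This makes the article's point concrete: N alone cannot distinguish these two researchers, while the h-index separates them sharply.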
Mixture Density Networks: Probabilistic Regression for Uncertainty Estimation

Types of uncertainty: there are two major kinds of uncertainty, epistemic and aleatoric (phew, that was quite a mouthful). The mixture has another learned parameter (a latent representation) which decides how to mix the Gaussian components. Mixture Density Networks are built from two components: a neural network and a mixture model. Weight regularization: applying L1 or L2 regularization to the weights of the neurons which compute the means, variances, and mixing components. Summary: we have seen how important uncertainty is to business decisions, and explored one way of estimating it using Mixture Density Networks.
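The quantity an MDN's output head parameterises is a mixture density: softmax-ed mixing weights plus a mean and variance per Gaussian component. Evaluating that density is short enough to sketch with the stdlib; the parameter values here are made up for illustration:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Density of a single Gaussian component."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Density of a Gaussian mixture: the quantity an MDN's output
    head (mixing weights, means, variances) parameterises."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Two components; the weights must sum to 1 (a softmax output in an MDN).
p = mixture_pdf(0.0, weights=[0.5, 0.5], mus=[-1.0, 1.0], sigmas=[1.0, 1.0])
```

Training an MDN then amounts to maximising this density (minimising its negative log) over the observed targets, which is how the network learns to output distributions rather than point estimates.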
Comparing Keras and PyTorch on sentiment classification

After part one, which covered an overview of Keras and PyTorch syntaxes, this is part two of our comparison of Keras and PyTorch! We will use the IMDB dataset, a popular toy dataset in machine learning, which consists of movie reviews from the IMDB website annotated as positive or negative sentiment. Both Keras and PyTorch have helper functions to download and load the IMDB dataset. We set embedding_dim = 128 and hidden_dim = 64. Let's start with the implementation in Keras (credits to the official Keras documentation), then implement the same in PyTorch. With samples.transpose(0, 1) we effectively permute the first and second dimensions to fit PyTorch's data model.
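The transpose step is easy to see on a toy example: PyTorch's recurrent layers default to sequence-first input (batch_first=False), so a (batch, seq) array must become (seq, batch). A pure-Python analogue of samples.transpose(0, 1):

```python
def to_seq_first(batch):
    """Swap (batch, seq) -> (seq, batch), the pure-Python analogue of
    samples.transpose(0, 1) on a 2-D tensor."""
    return [list(step) for step in zip(*batch)]

batch_first = [[1, 2, 3],    # sample 0: a sequence of 3 token ids
               [4, 5, 6]]    # sample 1
seq_first = to_seq_first(batch_first)
```

After the swap, row t holds token t of every sample, which is the layout an nn.LSTM consumes one time step at a time.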
Deep learning-based cancer patient stratification

Patient Stratification Using Deep Learning: molecular patterns or latent factors can stratify patients based on prognosis, response to drugs, or any other clinical variable. Can we predict subtypes using latent factors? On the right side of Figure 3, we color-coded the 2D projection of latent factors based on the CMS status. Figure 3: predicting subtypes using latent factors obtained via deep learning is more accurate. As you can see, in many cancers, using latent factors pushes this accuracy metric to a higher level.
Reinforcement Learning For Mice

Reinforcement learning (RL) is a type of machine learning that has been receiving a lot of attention in the past few years. He makes different types of mazes and observes the mice while they explore them. The mouse learns intelligent behavior in complex, dynamic environments. Value function: almost all reinforcement learning algorithms are based on estimating value functions, functions of states that estimate how good it is for the agent to be in a given state. Markov Decision Process: a Markov Decision Process (MDP) is a mathematical framework to describe an environment in reinforcement learning.
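The value-function idea can be made concrete with a tiny deterministic chain MDP, a one-corridor maze where the mouse can only move right and finds cheese at the end: repeated Bellman backups converge to how good each state is. A stdlib sketch; the number of states, reward, and discount are made up for illustration:

```python
def value_iteration(n_states, reward, gamma=0.9, iters=50):
    """Estimate V(s) for a deterministic chain where the agent moves
    right and earns `reward` on entering the final (terminal) state."""
    V = [0.0] * n_states
    for _ in range(iters):
        for s in range(n_states - 1):
            r = reward if s + 1 == n_states - 1 else 0.0
            V[s] = r + gamma * V[s + 1]   # Bellman backup for one action
    return V

# A 4-state corridor with cheese (reward 1) at the far end.
V = value_iteration(n_states=4, reward=1.0)
```

The discount gamma makes states further from the cheese worth geometrically less, which is exactly the "how good is it to be here" signal a value function encodes.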
Google AI Blog: Leveraging Machine Learning for Game Development

ChimeraWe developed Chimera as a game prototype that would heavily lean on machine learning during its development process. For the game itself, we purposefully designed the rules to expand the possibility space, making it difficult to build a traditional hand-crafted AI to play the game. With each iteration, the quality of the training data improved, as did the agent’s ability to play the game. We found that a relatively simple neural network was sufficient to reach high level performance against humans and traditional game AI. We hope this work will inspire more exploration in the possibilities of machine learning for game development.
RAPIDS and Amazon SageMaker: Scale up and scale out to tackle ML challenges

In this post, we combine the powers of NVIDIA RAPIDS and Amazon SageMaker to accelerate hyperparameter optimization (HPO). This RAPIDS with SageMaker HPO example is part of the amazon-sagemaker-examples GitHub repository, which is integrated into the SageMaker UX, making it very simple to launch. The key ingredients for cloud HPO are a dataset, a RAPIDS ML workflow containerized as a SageMaker estimator, and a SageMaker HPO tuner definition. SageMaker estimatorNow that we have our dataset, we build a RAPIDS ML workflow and package it using the SageMaker Training API into an interface called an estimator. Search strategyIn terms of HPO search strategy, SageMaker offers Bayesian and random search.
How to load and store MLFlow models in R on DataBricks

Databricks has become an important building block in Cloud Computing, especially now, after Google announced the launch of Databricks on Google Cloud. It is true that Databricks supports R, Python, and Scala code, but several weaknesses appear when working with MLFlow and R, specifically when trying to register an ML model. This is a pain in the neck if we want to load MLFlow models in our R notebooks, but there is a solution. The supported magic commands are: %python, %r, %scala, and %sql. Now, after this "trick", the model has been correctly registered in the Model repository and it is ready to be used.
Double Descent Behavior Exists in Semi-Supervised Learning — Part 1

Recently, in a graduate deep learning class, our project group decided to read an interesting paper about generalization: Reconciling modern machine learning practice and the bias-variance trade-off [1]. Then we came up with a research question: "Can we empirically observe similar double descent behaviors when we train those models in a semi-supervised learning setting?" More interestingly, [1] shows the existence of double descent across a wide spectrum of models and datasets. Double descent risk curve for the RFF model on MNIST [1]. Conclusion: to wrap it up, this paper introduced the double descent risk curve, reconciling the U-shaped bias-variance trade-off with modern practice.
How to Run 30 Machine Learning Models with a Few Lines of Code

Although the scikit-learn library makes our lives easier by making it possible to run models with a few lines of code, it can also be time-consuming when you need to test multiple models. lazypredict runs 30 machine learning models in just a few seconds and gives us a grasp of how models will perform on our dataset. First the setup imports: import pyforest; import warnings; warnings.filterwarnings("ignore"); from sklearn import metrics; from sklearn.metrics import accuracy_score. Now, let's import the dataset we will be using from Kaggle. Then: import lazypredict; from lazypredict.Supervised import LazyClassifier. Finally, let's run the models and see how it goes.
Machine Learning: The Great Stagnation

This blog post generated a lot of discussion on Hacker News; many people have reached out to me giving more examples of the stagnation and more examples of projects avoiding it. However, this risk-free approach is growing in popularity and has specifically permeated my field, Machine Learning. “Useful” Machine Le