May 30, 2022

Developing a Python Program Using Inspection Tools
       
We can load the saved models as follows:

    >>> import torch
    >>> import tensorflow as tf
    >>> torch_model = torch.load("lenet5.pt")
    >>> keras_model = tf.keras.models.load_model("lenet5.h5")

Nothing is printed by the statements above. We can further confirm that by comparing the keys of the state_dict() output with the layer names of the Keras model. Similarly, in the Keras model we saw a member named set_weights, which is exactly the counterpart of get_weights.
Gato, the latest from Deepmind. Towards true AI?
       
Gato can play games, generate text, process images, and control robotic arms. The deep learning field is progressing rapidly, and the latest work from Deepmind is a good example of this. We have seen in the last decade surprising applications of neural networks specialized for playing games, translating text, captioning images, etc. Moreover, and impressively, Gato is not even close to the largest neural networks we have seen! After all, our brains are somehow very intricate neural networks connecting and integrating sensory information to output our actions.
Making sense of bias and variance!
       
A beginner-friendly primer on two fundamental data science concepts
What is bias? Depends where you’re hearing the word. I’ve made a tongue-in-cheek laundry list of various bias usages for your amusement, but in this article, we’ll focus on one specific species of bias, statistical bias, which I’ll explain in a moment. If you’re an excellent shot and you’re aiming…
Understanding The Hyperplane Of scikit-learn’s SVC Model
       
How to interpret the coef_ attribute of the linear SVC from scikit-learn for a binary classification problem
This post will teach you how to interpret the coef_ and intercept_ attributes of scikit-learn’s SVC, and how they can be used to make predictions for new data points. After an SVC is fitted to the data and stored in the variable clf, the data points and the hyperplane with support vectors are plotted.
[Figure: the binary classification problem’s data points and hyperplane. Created by author.]
NOTE: sklearn.inspection.DecisionBoundaryDisplay is pretty cool and can be used to draw the hyperplane and support vectors for a binary classification problem (two labels).
[Figure: the binary classification problem’s data points, hyperplane, and two new points. Created by author.]
Using the fitted model, we can classify these points by calling the predict function. This was not a deep look into how the SVC model works, but just enough to get an essential understanding of what is going on when making a classification.
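The role of coef_ and intercept_ in a prediction can be sketched in plain Python. The weights and intercept below are hypothetical stand-ins for clf.coef_[0] and clf.intercept_[0], not values from the article. The sketch assumes a linear kernel and the usual binary rule, where a positive score maps to the second class.

```python
# Hypothetical stand-ins for clf.coef_[0] and clf.intercept_[0]
# of a fitted linear SVC; these numbers are not from the article.
weights = [0.5, -1.0]
intercept = 0.25


def decision_value(point):
    """Signed score w . x + b; its sign tells which side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, point)) + intercept


def predict_label(point, classes=(0, 1)):
    """Binary rule: positive score -> second class, otherwise first class."""
    return classes[1] if decision_value(point) > 0 else classes[0]


if __name__ == "__main__":
    for new_point in ([1.0, 0.0], [0.0, 1.0]):
        print(new_point, decision_value(new_point), predict_label(new_point))
```

Points whose score is exactly zero lie on the hyperplane itself, and the magnitude of the score grows with the distance from it.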
How I Helped A Retail Company Using My Data Science Skills
       
The Power Of Critical Thinking
Data scientists are like Swiss Army knives: we can use our knowledge nearly everywhere if we try to understand and think about a need or a problem. Many of us hesitate to deal with projects that seem to be out of the data science scope, and personally I think that by doing so we are losing a great…
Image Classification with Python: CNN vs Transformers
       
Computer Vision & Explainability with Convolutional Neural Network, Transfer Learning, ViT, TensorFlow & HuggingFace
Summary
In this article, using Computer Vision and Python, I will explain 3 different strategies for image classification: build a CNN from scratch, leverage a pre-trained model, and apply the cutting-edge Vision Transformers (ViT).
How Data Science Depends on Pandas and Numpy?
       
Pandas and Numpy play a vital role in data science and machine learning
Introduction
When you extract data from the web, the data you obtain is not always clean or structured, right? Well, Python lets us structure this messy data using data frames from Pandas, a Python package.
4 Elegant Ways to Deal With Missing Data
       
Imputation of missing data with 4 techniques
If you have some experience in real-world data analysis or data science, you know that 80% of the work is about data collection and data preparation. In the world of data analysis, it is very uncommon to find a database without missing values. If your data is large enough and missing values do not make up a significant part of it, you can simply drop observations that are not complete. This article will discuss four techniques for missing values imputation and provide examples with Python code. We will insert 10% missing values while making sure that no row or column is filled entirely with NaN values.
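The excerpt does not name the four techniques, so the sketch below swaps in mean imputation as a common baseline. It is an assumption and not necessarily one of the article's four. Missing entries are represented by None, and the function name is hypothetical.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    if not observed:
        # A column that is entirely missing has no mean to fall back on.
        raise ValueError("column has no observed values to average")
    fill_value = mean(observed)
    return [fill_value if value is None else value for value in column]


if __name__ == "__main__":
    print(impute_mean([1.0, None, 3.0]))
```

The explicit error for an all-missing column mirrors the excerpt's precaution of never leaving a row or column filled entirely with missing values.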
How To Visually Inspect The Quality Of Your Chatbot’s NLU Model
       
Using PCA and interactive charts
Introduction
In this article, I will share an idea to visually evaluate the quality of a chatbot’s NLU model. We can take the vector representation of the training set and project it to 2D space to visualize it as a scatter plot.
Results
Overview
We will visualize how moodbot’s training set looks in 2D feature space using 4 DIET configurations to help visualize the impact of hyperparameter tuning.
Visualizations
Let’s train a model under each config 3 times and visualize the results:
Figure 6: Training set in 2D feature space using the “basic” config
Figure 7: Training set in 2D feature space using the “big” config
Figure 8: Training set in 2D feature space using the “bigger” config
Figure 9: Training set in 2D feature space using the “biggest” config
Analysis
Moodbot has 7 intents.
Conclusion
This article has shown how interactive charts and dimensionality reduction methods can be used to assist in visually evaluating the quality of a training set and an ML model’s performance.
When is a Machine Learning Model ready for Product
       
What to look for in your machine learning model when building a product
When you build a machine learning model, you need to make sure that the model is fit for its purpose. How do you know when a machine learning model is ready for the product?
Why knowing when your machine learning model is ready for the product is important
Almost every business wants a system driven by AI and machine learning.
When is the Machine Learning model ready for production?
I will focus on model evaluation as an end-to-end process that includes model development, model testing, and model improvement. I will try to provide a checklist of things you should look for when you are evaluating the readiness of your machine learning model for the product.
A Student Researcher’s view on Conversational AI
       
(Image source: Google, except for images whose source is specifically mentioned below the image.)
What Exactly is Conversational AI?
Conversational AI is the set of technologies behind automated messaging and speech-enabled applications that offer human-like interactions between computers and humans. The main component in Conversational AI is the use of Natural Language Processing (NLP).
Components of Conversational AI
There are mainly 2 components of Conversational AI: Machine Learning and Natural Language Processing. Under NLP there are four steps: Input Generation, Input Analysis, Output Generation, and Reinforcement Learning.
Scalability
Conversational AI is also very scalable, as adding infrastructure to support conversational AI is cheaper and faster than the hiring and onboarding process for new employees.
Where we use Conversational AI
We use Conversational AI across various fields.
Making sense of bias and variance!
       
That’s what we mean by high variance, not high bias. High variance without bias is a jolly sort of incompetence that is spread out fairly. As you can see from this diagram, the worst possible results are those with high variance and high bias, while the best combo is the one that keeps them both low.
Further reading: Going sideways
If you’ve had your fill of variance, I suggest stepping over to my parallel article that uses a coronavirus case study to teach you about other kinds of bias: sampling bias, selection bias, information bias, reporting bias, and confirmation bias.
Bird Species Classification with Machine Learning
       
Scientists have determined that a known species of bird should be divided into 3 distinct and separate species.
Predict what species a bird is based on genetics and location
Load Libraries
Next, we load up some essential libraries for visualizations and machine learning.
Missing data helper function
Load the data
First, we load the train and test data using the read_csv function. We see location and species seemingly for their respective locations and species (loc2 & species C, loc3 & species A). Based on the species plot, it appears we have an imbalanced class on our hands, as species B is considerably less frequent than species A and C.
Why is this a problem?
Why do we minimize the mean squared error?
       
Have you ever wondered why we minimize the squared error? The quantity that we want to minimize, aka the loss function, is the mean squared error (MSE). The intuition behind this loss is that we want to penalize big errors more than small errors, and that’s why we’re squaring the error term. The answer is that the choice of this loss function is not that arbitrary, and it can be derived from more fundamental principles. Therefore the function that we want to solve is (equation omitted in this excerpt), which is the same as minimizing the squared error loss!
Conclusions
We just saw that minimizing the squared error is not an arbitrary choice but has a theoretical foundation.
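The excerpt does not preserve the derivation itself. One standard route to this conclusion, given here as an assumption about what the article does, is maximum likelihood under Gaussian noise. The symbols f, θ, σ, and n are introduced only for this sketch.

```latex
% Assumption: y_i = f(x_i;\theta) + \epsilon_i with
% \epsilon_i \sim \mathcal{N}(0, \sigma^2), independent across i.
\begin{align*}
\log L(\theta)
  &= \sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{\bigl(y_i - f(x_i;\theta)\bigr)^2}{2\sigma^2}\right)\right] \\
  &= -\frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2, \\
\hat{\theta}
  &= \arg\max_{\theta}\,\log L(\theta)
   = \arg\min_{\theta}\,\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2 .
\end{align*}
```

The first term does not depend on θ and the factor 1/(2σ²) is a positive constant, so maximizing the log-likelihood and minimizing the mean squared error select the same θ.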
REST APIs on Industrial PLCs
       
Expand the capabilities of a PLC to perform virtually any task
Introduction
APIs allow for easy interfacing between two different applications, and in this article a basic implementation of a Python REST API¹ will be used to manage various SQL commands from a PLC². In addition to distributing the computational load to another device, the REST API is useful for performing actions not typical to controllers. Using a REST API allows you to do so, along with any additional custom operations required by the process.
REST API
The API will run on a server located within the local network associated with all of the controllers.
Conclusion
Using a REST API is an easy way to expand the capabilities of a PLC.
Why GPT Won’t Tell You the Truth
       
In ancient China, the talent for summarizing was ranked just below the talents of astrology and medicine. (GPT-3 claim about the task of summarization)
When writing my article about text summarization, an admittedly dull topic, I wanted to spruce it up somehow. Here’s part of it:
In short, text summarization is the process of creating a shortened version of a text document, while preserving its most important information. Automated text summarization is often used to generate summaries of news articles, or to provide brief overviews of lengthy documents.
Getting a generative text model to say something memorable is a bit like playing the lottery: the more tickets you buy, the more likely you are to win.
Fundamentals of Matrix Algebra with Python | Part 2
       
Understanding and implementing basic matrix algebra concepts and operations with Python
1) Trace
The trace of a matrix is simply the sum of its diagonal elements, as highlighted below in Figure 1.
Figure 1: Trace of a Matrix (Image By Author)
Gist 1 provides the Python code to calculate the trace of a matrix, which is trivial using Numpy.
Figure 2: Example 2×2 matrix (Image By Author)
Figure 3 presents the formula for evaluating a 2×2 determinant.
Figure 6: Identity Matrix (Image By Author)
Multiplying a matrix by its inverse gives the Identity Matrix, I, i.e.
4) Orthogonality
Orthogonality is synonymous with orthonormality.
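The article's gists are not included in this excerpt. The sketch below therefore uses plain Python lists instead of Numpy to show the trace, the 2×2 determinant, and the fact that a matrix times its inverse gives the identity. The function names and the example matrix are hypothetical.

```python
def trace(matrix):
    """Sum of the diagonal elements of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def det2(matrix):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: a*d - b*c."""
    (a, b), (c, d) = matrix
    return a * d - b * c


def inverse2(matrix):
    """Inverse of a 2x2 matrix via the adjugate divided by the determinant."""
    (a, b), (c, d) = matrix
    det = det2(matrix)
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(left, right):
    """Plain row-by-column matrix product."""
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


if __name__ == "__main__":
    example = [[1, 2], [3, 4]]
    print(trace(example), det2(example))
    print(matmul(example, inverse2(example)))
```

With Numpy available, numpy.trace, numpy.linalg.det, and numpy.linalg.inv cover the same operations for matrices of any size.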
Data Safety is Personal Safety
       
Loving America is never easy work, but it feels particularly hard this week. It feels particularly hard as I write today, after the mass shooting at Robb Elementary in Texas and in advance of the Supreme Court ruling on Roe vs Wade any day now. How do we love a place that does not seem to want to love us back?
Why We Misjudge People All the Time
       
On the psychology of self-presentation and impression management
My mom doesn’t know me. Not really. For a long time, she only knew the role I played in her presence. That’s why she was so surprised when she started reading me online.
The Grief of the Powerless
       
We must not give up our anger, nor accept that there is no better way
[Photo: A picture of Alexandria Rubio, one of the victims of the Robb Elementary school shooting, is left at a memorial in Town Square in front of the county courthouse, three days after a gunman killed nineteen children and two adults, in Uvalde, Texas, U.S. May 27, 2022. REUTERS/Marco Bello]
Journalists began calling me two days ago, as I knew they would. It happens after almost every tragedy: from the New York Times and NPR when the novel Coronavirus struck down hundreds of thousands; from WIRED editors when the Supreme Court overturned Roe vs Wade; from various…
3 Free Machine Learning Courses You Should Take Right Now
       
Get started with your machine learning journey for free
There are many ways to get started with studying machine learning. I am an advocate for sharing free resources for learning and fortunately, there are several free machine learning courses available.
Machine Learning Crash Course, from Google (developers.google.com)
Length: 15 hours
Best suited for: Learners who can code with Python and already have a good understanding of linear algebra and statistics.
Core subject: Practical machine learning
This relatively short course covers an extensive breadth of machine learning topics.
Practical Deep Learning for Coders, from FastAI
Don’t Listen to Anyone Who Claims to Have the One Neat Trick
       
Wisdom is quiet and knows that life is complicated
A few years ago when I was a full-blown gym rat, I was horribly obnoxious. (I still am, but for other reasons.) I used to try to encourage people during classes by reminding them “the way to make progress is to keep going when you start to feel tired!” I’d rattle off research that I’d read on how to learn physical skills (pay attention to external cues, not your own body movements) and why to fight through fatigue…
Led by Freddie Freeman, Dodgers near LA record with 24 hits in 14–1 win
       
by Rowan Kavner
Freddie Freeman thought his swing was getting too big after a couple hitless performances. The rest of the Dodger lineup did, too. Thirteen more runs would follow as a loaded Dodger offense exploded for 24 hits, the most by any team this season, in a 14–1 win. After getting a night off, Mookie Betts went right back to tormenting the opposition atop the Dodger lineup. He catalyzed the Dodger barrage, doubling two pitches into the game and finishing with three hits on the night.
The Earth Was Silent For 4 Billion Years
       
Animals made no noise for 90% of the planet’s life. Now industrial noise threatens the “biophony”
We often think of the natural world as noisy, and beautifully so. When I used to camp in Ontario as a Boy Scout, the night was alive with crickets, and when we were up further north, the occasional howl of a…
Why Are More Black Americans Committing Suicide?
       
Many in the mental health field are beginning to question what’s behind the alarming increase in suicide among Black Americans, and research has pointed to five contributing factors.
Examples of Information Retrieval Application on Image and Text
       
Information Retrieval in Image and Text use-case
Hello, welcome to my second blog post. They are “Automatic Order Extraction” and “Image Retrieval for Image Website.” I will also provide the background of the usefulness of each example.
Information Retrieval: elasticsearch
One popular Information Retrieval tool for text is elasticsearch. It is built on top of Apache Lucene and is optimized for retrieval jobs. I wrote 2 implementations using Information Retrieval on text and images.
Tesla’s Self Driving Algorithm Explained
       
On Tesla AI Day, Andrej Karpathy, the director of AI and Autopilot Vision at Tesla, enlightened us with a presentation about their self-driving neural network. The first feature output has a high resolution (160 x 120) and focuses on all the details in the image.
[Figure: Feature Queue and Video Module (from Tesla AI Day)]
When do we push to the feature queue?
Video Neural Network
The feature queue is consumed by a spatial recurrent neural network (RNN).
[Figure: Final Architecture Overview (from Tesla AI Day)]
Elon Musk announced that Tesla is going to hold a second AI Day on August 19, 2022.
Manage ML Automation Workflow with DagsHub, GitHub Action, and CML
       
Connect is different from Migration as it allows your GitHub repository to be associated with the DagsHub repository.
CML
CML, or Continuous Machine Learning, is an open-source tool provided by iterative.ai to implement CI/CD in machine learning projects. GitHub action provides the capability to automate the ML workflow, and CML supports GitHub action by giving commands for ML workflow output. To understand more about GitHub Action and CML, please refer to the following documentation: the CML usage guide (cml.dev) and the DVC GitHub action (GitHub, iterative/setup-dvc). Using DagsHub connect, I would mirror the sample GitHub repository onto my DagsHub repository.
Deep Attentive Variational Inference
       
Figure 1: Overview of a local variational layer (left) and an attentive variational layer (right) proposed in this post. Attention blocks in the variational layer are responsible for capturing long-range statistical dependencies in the latent space of the hierarchy.
A quick review of Deep Variational AutoEncoders
Latent variable models augment the set of observed variables with auxiliary latent variables. VAEs are trained by maximizing the Evidence Lower BOund (ELBO), which is a tractable lower bound of the marginal log-likelihood:
\[\log p(x) \ge \mathbb{E}_{q(z\mid x)}\left[\log p(x\mid z)\right] - D_{KL}\left(q(z\mid x) \,\|\, p(z)\right).\]
\( -\log p(x) \) in bits per dimension and relative decrease for varying number of variational layers \(L\).
Explainable AI: Unfold the Blackbox
       
Explainable AI is more important in high-stakes AI domains such as financial investments, medical diagnoses, autonomous vehicles, and legal or defense-related decision-making.
Benefits
Building confidence in AI-driven business decisions: Explainable AI will help in creating trust and confidence in business decisions. Keeping AI models explainable and transparent can greatly reduce the impact of erroneous results, and organizations can mitigate the risks from regulatory and compliance bodies. However, despite the growing interest in Explainable AI, there is a big gap between the Explainable AI vision and practice. There can be multiple explanations required from the same AI algorithm, so how Explainable AI will help in this context is still an open question.
Causal AI — Enabling Data-Driven Decisions
       
Understand how Causal AI frameworks and algorithms support decision-making tasks like estimating the impact of interventions, counterfactual reasoning, and repurposing previously gained knowledge on other domains.
Decision Making through existing methods
Let’s explore whether we can get answers to the above questions from Supervised Machine Learning and other traditional approaches.
2.1 Does Supervised Machine Learning help?
2.3 How Causal AI helps address the challenges
The above questions are all causal questions and, unlike many conventional machine learning tasks, cannot be answered using only passively observed data and traditional machine learning algorithms.
Causal AI: Quick Overview
3.1 Basic Concepts
Causal inference enables estimating the causal effect of an intervention on some outcome from real-world observational data, holding all other variables constant.
Promoting Ethical and Socially Responsible AI: Causal learning is one of the key approaches researchers are using to develop Socially Responsible AI.
How to Perform Better on Machine Learning Projects
       
Ask three questions to be better the next time
Comedian Chris Rock comes up with novel ideas in small steps. However, I would like to single out three questions that we can use after concluding a (machine learning) project. Because I tried to be clever, I thought letting the learning rate vary throughout training would be brilliant. Also, letting the learning rate vary introduced additional hyperparameters and increased the project’s complexity. I meticulously tweaked the learning rate and refactored much of the underlying code in the example.
p-values: A Legacy of “Scientific Racism”
       
Legacies of Statistics & AI
A deeper look at the untold history of p-values and its legacy
Disclaimer: I’ll be putting quotes around certain words, like “race”, “racial measurements”, etc. Also, the reader should have a bit of familiarity with p-values, hypothesis testing, and Bayes’ theorem.
What is Scientific Racism?
“Scientific racism”, or more accurately pseudo-scientific racism (because racism is not scientific), was a way in which European colonial governments (and the statisticians they hired to do government surveys and data collection) justified their racist policies by using statistical measurements, often in an extremely biased and incorrect way. Now that I’ve given a brief context to what scientific racism is, we’ll see more specific examples later in the piece related to significance testing and p-values.
Find the order of ARIMA models
       
Understand and find the best parameters for your time-series basic modeling
ARIMA is one of the best models to start a univariate time series experiment.
1) Auto-Regressive models
An AR model of order p regresses on its own p past values. We use the autocorrelation function to assess the degree of dependence in the time series and select an appropriate model (MA, AR, or ARIMA). The auto-ARIMA process seeks to identify the optimal parameters for an ARIMA model, settling on a single fitted ARIMA model. This article gave you the technique to tune your model order with sufficient confidence.
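As a rough illustration of the autocorrelation function mentioned above, here is a minimal pure-Python sketch of the sample autocorrelation at a given lag. The function name and the toy series are hypothetical. In practice a library routine such as the ACF tools in statsmodels would normally be used instead.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    series_mean = sum(series) / n
    # Total variation around the mean, shared by every lag.
    denominator = sum((x - series_mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    # Co-variation between the series and its copy shifted by `lag`.
    numerator = sum(
        (series[t] - series_mean) * (series[t + lag] - series_mean)
        for t in range(n - lag)
    )
    return numerator / denominator


if __name__ == "__main__":
    toy_series = [1, 2, 3, 4, 5]
    print([autocorrelation(toy_series, k) for k in range(3)])
```

Values that stay large over many lags suggest strong dependence on the past, which is the kind of evidence used when choosing between MA, AR, and ARIMA orders.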
Broadcasting in Numpy: A Powerful Technique You Should Know
       
The ins and outs of how broadcasting works under the hood
Intro
What is broadcasting? Broadcasting is a mechanism that allows Numpy to handle arrays of different shapes during arithmetic operations.
Broadcasting Rules
If broadcasting can’t be used universally, when does broadcasting work and when does it fail? Broadcasting is used very frequently in Numpy, so it’s critical to understand how it works under the hood.
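The excerpt does not spell out the rules. As documented by Numpy, shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below implements only that shape check in plain Python, and the function name is hypothetical.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    longest = max(len(shape_a), len(shape_b))
    # Walk both shapes from the last dimension; a missing dimension counts as 1.
    for offset in range(1, longest + 1):
        dim_a = shape_a[-offset] if offset <= len(shape_a) else 1
        dim_b = shape_b[-offset] if offset <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(
                f"shapes {shape_a} and {shape_b} cannot be broadcast together"
            )
        # A dimension of size 1 is stretched to match the other one.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))


if __name__ == "__main__":
    print(broadcast_shape((3, 1), (1, 4)))
    print(broadcast_shape((5,), (2, 5)))
```

A pair such as (2, 3) and (4,) fails because the trailing dimensions 3 and 4 differ and neither is 1.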
How Neural Networks Actually Work — Python Implementation (Simplified)
       
Neural Network (NN) is a black box for so many people. In fact, the number of features in the dataset is equal to the number of neurons in the input layer. In our example, shown in the figure below, we have 3 features and therefore the input layer of the architecture must have 3 neurons. We have 3 neurons in the input/first layer, 4 neurons in the hidden layer, and 1 neuron in the output: a 3–4–1 Neural Network. gˡ is the activation function at layer l, and gˡ(zˡⱼ + bˡⱼ) is the output of unit j in layer l. This becomes the input to the units in the next layer, layer l+1.
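A forward pass through the 3–4–1 network described above can be sketched as follows. The weights, biases, input values, and the choice of a sigmoid activation are all assumptions made for illustration, since the excerpt does not give them.

```python
import math


def sigmoid(value):
    """Assumed activation g; the excerpt does not say which one is used."""
    return 1.0 / (1.0 + math.exp(-value))


def layer_forward(inputs, weights, biases):
    """Output of each unit j: g(z_j + b_j), with z_j the weighted input sum."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs))
        outputs.append(sigmoid(z + bias))
    return outputs


# Hypothetical parameters for a 3-4-1 network (not from the article).
hidden_weights = [[0.0, 0.0, 0.0] for _ in range(4)]  # 4 units, 3 inputs each
hidden_biases = [0.0, 0.0, 0.0, 0.0]
output_weights = [[1.0, 1.0, 1.0, 1.0]]  # 1 unit, 4 inputs
output_biases = [-2.0]

if __name__ == "__main__":
    features = [0.2, 0.5, 0.9]  # one sample with 3 features
    hidden = layer_forward(features, hidden_weights, hidden_biases)
    prediction = layer_forward(hidden, output_weights, output_biases)
    print(hidden, prediction)
```

Each call to layer_forward feeds its outputs to the next layer, matching the description of layer l passing its outputs on to layer l+1.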
Random Forest or XGBoost? It is Time to Explore LCE
       
Local Cascade Ensemble (LCE) [Fauvel et al., 2022] is a new machine learning method which proposes to answer this question. Thus, LCE further enhances the prediction performance of both Random Forest and XGBoost. There are two complementary ways to generate diverse predictors: (i) by changing the training data distribution and (ii) by learning different parts of the training data. Similar to XGBoost, LCE excludes missing values for the split and uses block propagation. Results show that LCE obtains on average a better prediction performance than the state-of-the-art classifiers, including Random Forest and XGBoost.
Beginner’s Guide to Reinforcement Learning
       
A high-level overview of Reinforcement Learning models
Reinforcement learning is the fourth major learning method in machine learning, along with supervised, unsupervised, and semi-supervised learning.
How Reinforcement Learning Works
Reinforcement learning models should be trained to make a series of decisions independently. As we have already seen, reinforcement learning is completely contrary to supervised and unsupervised learning. In many areas, machine learning and deep learning models will continue to be sufficient to achieve good results. However, there are also applications, such as stock trading, where Reinforcement Learning will replace deep learning models as it provides better results.
Reinforcement Learning in Minecraft: Create a Bot to Find Diamonds
       
Reinforcement Learning and Behavior Cloning in Python with MineRL
Minecraft is the next frontier for Artificial Intelligence. We’ll design a bot and try to achieve one of the most difficult challenges in Minecraft: finding diamonds from scratch.
[Figure: Sequence of actions to find diamonds, image by author (Mojang license)]
What we’re gonna talk about is not limited to Minecraft.
Step 4000 | Training loss = 0.878
Step 8000 | Training loss = 0.826
Step 12000 | Training loss = 0.805
Step 16000 | Training loss = 0.773
Step 20000 | Training loss = 0.789
Step 24000 | Training loss = 0.816
Step 28000 | Training loss = 0.769
Step 32000 | Training loss = 0.777
Step 36000 | Training loss = 0.738
Step 40000 | Training loss = 0.751
Step 44000 | Training loss = 0.764
Step 48000 | Training loss = 0.732
Step 52000 | Training loss = 0.748
Step 56000 | Training loss = 0.765
Step 60000 | Training loss = 0.735
Step 64000 | Training loss = 0.716
Step 68000 | Training loss = 0.710
Step 72000 | Training loss = 0.693
Step 76000 | Training loss = 0.695
Our model is trained.
Conclusion
I hope you enjoyed this little guide to reinforcement learning in Minecraft.
Toward a Journalistic Ethic of Citation
       
The need to share credit and show our work
After The New York Times published its extensive report on the history of Haiti’s impoverishment at the hands of its overthrown colonial overlords, a robust debate broke out between academic and journalistic Twitter about inadequate citation and sourcing. Journalism must do better.
The Scars Are All That’s Left
       
School Shootings
I went to Columbine long after the tragedy there. My time in the school was still marked by the trauma of the shooting.
All my love to teachers. Content Warning: This post discusses school shootings, bullying, suicide, death threats, and other difficult…
Can’t Afford Kids? Put That Baby in a Box
       
The GOP’s policy solution for post-Roe social chaos: create infant drop-off locations all over the country
Beyond the occasional mass shooting, now as unremarkable as a traffic jam or a tornado, there isn’t much Big News out of central Indiana. So when a baby was deposited in a box affixed to the Carmel Fire Department…
Australia Doesn’t Have Mass Shootings
       
It often feels like the United States is the only place that does
I’m going to tell you a terrible story, that for most people living in the United States is probably all too familiar. A young man, less than 30 years old, felt unwelcome and unrewarded by society. Up until that day, he’d tried and failed at a number of pursuits, and every time he got…
Why is the sky blue? Why is the ocean blue? The answers aren’t the same.
       
The combination of a blue sky, dark overhead, lighter near the horizon, along with a reddened Sun at either sunrise or sunset, can all be explained scientifically, along with the blue color of the oceans as an independent phenomenon. (Credit: ssxss/pixabay)
Why is the sky blue? The sky is blue. The oceans are blue. If you’ve ever been curious about the world you live in, you’ve probably wondered why the sky is blue.
The Last Movie Star — Tom Cruise. Forget the other three wives, Tom…
       
Source: Paramount Pictures
The Last Movie Star — Tom Cruise
Forget the other three wives, Tom married Tom
Trigger alert: Aggressive Language
Ned Tanen, the president of Paramount Pictures movie group, was screaming again. In 1985, most of the execs who worked with him and adored him called those days “the green-vomit days” in honor of the possessed girl in the hit movie The Exorcist.
What Can We Tell the Kids?
       
What Can We Tell the Kids?
There’s no good way to tell children about another school shooting or much reason to believe our policymakers will help prevent the next one
Photo: Colin Lloyd / Unsplash
This year will mark the decade anniversary of the Sandy Hook Elementary mass shooting.
This Is How It Is
       
Photo by Terrance Barksdale
This Is How It Is
America is a man with a gun
It doesn’t have to be this way but it is. A grandmother at a grocery store. A teacher reading a book to a classroom of little kids. A morgue truck filled with dead bodies. A television crew reporting the same news on a loop: “a man with a gun went…
Saying Goodbye (For Now) To My Morbid Hobby Turned Obsession
       
Saying Goodbye (For Now) To My Morbid Hobby Turned Obsession
A decision I should have made much sooner
Photo by Immo Wegmann on Unsplash
In recent years, I’ve taken up a, more often than not, rather morbid hobby; one that millions of Americans indulge in every day. As a nation and as a global society, our obsession with this particular genre of information has inspired TV shows, movies, books, and entire societies/communities…
5 Advanced JavaScript concepts that will make you a better developer
       
5 Advanced JavaScript concepts that will make you a better developer
Photo by Arnold Francisca on Unsplash
Currying
Currying means transforming a function that takes multiple arguments into a sequence of functions that each take a single argument.
Simple curry example
In the above example, we created our own simple implementation for currying a function with exactly three parameters. As a general solution, I suggest using Ramda or a similar library, which supports currying functions with any number of arguments and also supports changing the order of arguments using placeholders. Or simply when you treat the falsy values as valid ones
Reflect
Reflect is a global object that provides some useful methods for metaprogramming. Its methods fall into two groups: introspection methods, which are non-destructive, and modification methods, which are destructive since they mutate the object or its behavior.
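The currying idea described above is not specific to JavaScript. As a minimal sketch in Python (the helper `curry` and the sample function `volume` are hypothetical names chosen for illustration, not the article's implementation), a function with a fixed number of positional parameters could be curried like this:

```python
from inspect import signature


def curry(func):
    """Return a curried version of func.

    Assumption: func takes a fixed number of positional parameters
    (no *args, **kwargs, or defaults).
    """
    arity = len(signature(func).parameters)

    def curried(*args):
        # Once enough arguments have been collected, call the original function.
        if len(args) >= arity:
            return func(*args)
        # Otherwise return a function that waits for the remaining arguments.
        return lambda *more: curried(*args, *more)

    return curried


def volume(length, width, height):
    """Hypothetical three-parameter function used only for the demo."""
    return length * width * height


curried_volume = curry(volume)
print(curried_volume(2)(3)(4))   # one argument at a time
print(curried_volume(2, 3)(4))   # arguments can also be supplied in groups
```

Unlike a hand-written version for exactly three parameters, this sketch reads the arity from the function signature, which is roughly what general-purpose helpers do.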
Google AI Blog: Deep Learning with Label Differential Privacy
       
Over the last several years, there has been an increased focus on developing differentially private (DP) machine learning (ML) algorithms. DP-SGD protects the privacy of each example pair [input, label] by adding noise to the stochastic gradient descent (SGD) training algorithm. DP algorithms include a privacy budget, ε, which quantifies the worst-case privacy loss for each user. In “Deep Learning with Label Differential Privacy”, presented at NeurIPS 2021, we consider a more relaxed, but important, special case called label differential privacy (LabelDP), where we assume the inputs (input_1, …, input_n) are public, and only the privacy of the training labels (label_1, …, label_n) needs to be protected. We hope that the release of the multi-stage training algorithm code provides researchers with a useful resource for DP research.
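To illustrate what protecting only the labels can look like, here is a minimal sketch of classic K-ary randomized response, a standard baseline for label differential privacy. It is not the multi-stage algorithm from the paper, and the function name and parameters are assumptions made for illustration.

```python
import math
import random


def randomized_response(label, num_classes, epsilon, rng=random):
    """Return a noisy copy of `label` (an int in [0, num_classes)).

    The true label is kept with probability e^eps / (e^eps + K - 1);
    otherwise one of the other K - 1 labels is chosen uniformly.
    The ratio between the two probabilities is e^eps, which is what
    makes the mechanism eps-label-DP.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    others = [c for c in range(num_classes) if c != label]
    return rng.choice(others)


# Usage: the inputs stay public, only the labels are randomized before training.
rng = random.Random(0)
true_labels = [3, 1, 4, 1, 5]
noisy_labels = [randomized_response(y, num_classes=10, epsilon=1.0, rng=rng)
                for y in true_labels]
print(noisy_labels)
```

A smaller ε flips labels more often (stronger privacy, noisier training signal), while a large ε leaves almost every label unchanged.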
Another Look at Exploratory Data Analysis with Excel’s Ideas Feature
       
Another Look at Exploratory Data Analysis with Excel’s Ideas Feature
Will Excel replace data scientists? In its simplest form, it helps with exploratory data analysis by quickly producing a data visualization dashboard. Exploratory data analysis (EDA) is a big topic in data science and machine learning.
For The Data Scientist
People who work with data, from researchers and analysts to business intelligence professionals and more, spend a lot of time on exploratory data analysis tasks. To make Excel smart enough to do data analysis (or at least exploratory data analysis, that is, automating the process of finding patterns in data), Microsoft is enhancing its Office products with cloud infrastructure.
Want to Succeed Doing ML in the Real World?
       
Want to Succeed Doing ML in the Real World? Because there are more things that can go wrong doing ML in the real world, compared to an online course or a Kaggle competition. Even if your job title is “ML engineer”, do not think you need to train an ML model for every problem. This way you gain some time until the startup grows a bit and the real ML work enters the picture. Do you love reading and learning about ML in the real world, AI, and data science?
AWS SageMaker X HuggingFace X AWS QuickSight
       
The significance of using Hugging Face with SageMaker is that it simplifies the training of transformer-based models on SageMaker and makes them easy to deploy for production. Also, using a QuickSight dashboard brings data and model analysis under one roof, making things easier to monitor and configure.
Source: Author
Similarly, within the preprocess S3 bucket directory we store the training and validation datasets for further usage. The pre-processed dataset is then uploaded to the S3 bucket to create an EDA dashboard in AWS QuickSight. In a SageMaker training environment, this estimator executes a Hugging Face training script.
Super Resolution — A Basic study
       
Super Resolution — A Basic study
A study of super-resolution, its roots, and the different types of loss functions that have generally been used for model training
What is super-resolution? The goal of super-resolution techniques is to interpolate in a manner that retains the sharpness of the edges, so that the image does not look pixelated. It uses a convolutional neural network (CNN) as a mapping function from a low-resolution (LR) input to a high-resolution (HR) output. An LR image is first super-resolved using bicubic interpolation and then passed through the network, which outputs another super-resolved (SR) image.
Adversarial Loss
Generative Adversarial Networks have been used heavily in recent times for the super-resolution of images.
How to Explore a Dataset of Images with Graph Theory
       
Wasserstein Distance
The Wasserstein metric, also known as the earth mover’s distance, is a distance metric between two probability distributions.
Example of Wasserstein Distance (Image by author)
There are many reasons to use the Wasserstein metric. For this task, the k-nearest neighbor graph is a natural choice for its simplicity and explainability.
K-Nearest Neighbor Graph (K-NNG)
In simple terms, the K-nearest neighbor graph is a graph where each node is connected to its k nearest neighbors.
30-nearest neighbor graph of the cup dataset (image by author)
We can easily separate different types of data with this method, but we can also explore the underlying structure of the dataset.
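The k-nearest neighbor graph described above can be sketched in a few lines. This is a brute-force illustration under stated assumptions: the function name `knn_graph` is hypothetical, the points are small coordinate tuples rather than images, and Euclidean distance stands in for the Wasserstein metric. Any other distance can be passed through the `distance` parameter.

```python
import math


def knn_graph(points, k, distance=math.dist):
    """Map each point index to the indices of its k nearest neighbors.

    Brute force: every pair of points is compared, so this is only
    suitable for small datasets.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        candidates = [(distance(p, q), j) for j, q in enumerate(points) if j != i]
        candidates.sort()
        graph[i] = [j for _, j in candidates[:k]]
    return graph


# Usage: two well-separated groups of 2-D points (made-up data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
for node, neighbors in knn_graph(points, k=2).items():
    print(node, "->", neighbors)
```

Note that the graph is directed: a point can list a neighbor that does not list it back, which is worth keeping in mind when looking for clusters.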
Create a Bot to Find Diamonds in Minecraft
       
Create a Bot to Find Diamonds in Minecraft
Reinforcement Learning and Behavior Cloning in Python with MineRL
Image by author (Mojang license)
Minecraft is the next frontier for Artificial Intelligence. We’ll design a bot and try to achieve one of the most difficult challenges in Minecraft: finding diamonds from scratch.
Sequence of actions to find diamonds, image by author (Mojang license)
What we’re gonna talk about is not limited to Minecraft.
Step 4000 | Training loss = 0.878
Step 8000 | Training loss = 0.826
Step 12000 | Training loss = 0.805
Step 16000 | Training loss = 0.773
Step 20000 | Training loss = 0.789
Step 24000 | Training loss = 0.816
Step 28000 | Training loss = 0.769
Step 32000 | Training loss = 0.777
Step 36000 | Training loss = 0.738
Step 40000 | Training loss = 0.751
Step 44000 | Training loss = 0.764
Step 48000 | Training loss = 0.732
Step 52000 | Training loss = 0.748
Step 56000 | Training loss = 0.765
Step 60000 | Training loss = 0.735
Step 64000 | Training loss = 0.716
Step 68000 | Training loss = 0.710
Step 72000 | Training loss = 0.693
Step 76000 | Training loss = 0.695
Our model is trained. We could train another agent to find diamonds, and even a third one to create the iron pickaxe.
MLOps: How to Operationalise E-Commerce Product Recommendation System
       
Photo by JJ Ying on Unsplash
MLOps: How to Operationalise E-Commerce Product Recommendation System
Introduction
One of the most common challenges in an e-commerce business is to build a well-performing product recommender and categorisation model. In the first part, we will talk about how to build an e-commerce product recommendation system and will do product categorisation with some hands-on coding exercises. A product will be treated as a single word and a sequence of product views (sessions) will be treated as a sentence. In the next step, those product vectors are fed into a K-Means algorithm as inputs to create an arbitrary number of product clusters. It is very likely that you will have many consecutive duplicate product views in your data, which might distort the algorithm.
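One simple way to handle the consecutive duplicate product views mentioned above is to collapse each run of repeats before the sessions are used as "sentences". The sketch below is an assumption about how that cleaning step could look, not the article's code; the function name and the product IDs are made up.

```python
from itertools import groupby


def drop_consecutive_duplicates(session):
    """Collapse runs of the same product ID into a single view.

    Only adjacent repeats are removed; a product that is viewed again
    later in the session is kept, because the order of views is the
    signal the embedding model learns from.
    """
    return [product for product, _ in groupby(session)]


# Usage with a hypothetical session of product IDs.
session = ["shoe-1", "shoe-1", "shoe-1", "sock-7", "shoe-1", "hat-3", "hat-3"]
print(drop_consecutive_duplicates(session))
# -> ['shoe-1', 'sock-7', 'shoe-1', 'hat-3']
```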
A Complete Guide to Decision Trees
       
A Complete Guide to Decision Trees
Learn everything you need to know about decision trees, including a Python example
Photo by Simon Wilkes on Unsplash
The Decision Tree is a machine learning algorithm that takes its name from its tree-like structure and is used to represent multiple decision stages and the possible response paths.
Advantages and Disadvantages of Decision Trees
The simple and understandable structure makes the decision tree a popular choice in many use cases.
Advantages and Disadvantages of a Decision Tree | Photo by Author
Decision trees as part of Random Forests
Random Forest is a supervised machine learning algorithm that is composed of individual decision trees.
Train a Decision Tree in Python
The Scikit-Learn Python module provides a variety of tools needed for data analysis, including the decision tree. With the help of Scikit-Learn, a decision tree can be trained in just a few lines of code:
    # Import modules
    from sklearn.datasets import load_iris
    from sklearn import tree
    # Load the Iris dataset
    iris = load_iris()
    # Define the X and y variables
    X, y = iris.data, iris.target
    # Set up the decision tree classifier
    clf = tree.DecisionTreeClassifier()
    # Train it on the Iris data
    clf = clf.fit(X, y)
So we can train a decision tree relatively easily by defining the input variable X and the classes y to be predicted, and training the decision tree from Scikit-Learn on them.
The inherent Train-Inference mismatch of SourceCodeAI
       
Finally, we train a model using that dataset, only to find a dramatic performance drop on our internal repositories. How likely is it to face AWS integrations in database-related snippets on random GitHub repositories?
How train-inference populations mismatch affects autocomplete applications
Code autocomplete is a trendy topic in the source code AI world. While big companies can theoretically leverage their internal code base for that need (with the risk of overfitting their internal code practices and styling), small and medium-sized companies don’t have such a luxury. Relevant examples could be (public) repositories of private users which include relevant terms within them (like ‘credit card =’.
Leukopy — blood cell classification with neural networks
       
Leukopy — blood cell classification with neural networks
How deep learning could revolutionise the identification of leukocytes on blood smears
Image by Colin Behrens from Pixabay
This article was written in collaboration with Mathieu Sarrat and Laleh Ravanbod. The diagnosis of many pathologies, such as infectious diseases, leukaemia or other haematological disorders, relies on the classification of subtypes of white blood cells, a.k.a. Classically, circulating blood cells are split into 5 major subtypes: platelets, red blood cells, granulocytes (basophils, neutrophils, eosinophils), monocytes, and lymphocytes.
VI — Conclusion
Our models do well in classifying 11 classes of blood cell pictures from 3 different datasets, but many improvements are possible. A possible extension of this work could involve object (= blood cell) detection (e.g.
Forging Trust: Predicting Behavior in an Unpredictable World
       
Forging Trust: Predicting Behavior in an Unpredictable World
We all fear the unknown and the unpredictability of those around us. In a world filled with uncertainty, fears, and challenges, we all seek individuals we can trust and connect with, and who provide the psychological safety all humans crave. Therefore, we need a more objective method for assessing who to trust, and actionable techniques to forge the trust, relationships, and connections we need to thrive in every aspect of our lives. Forging trust, building healthy connections and relationships, and letting go of our inconsistent assessment of others is easily achieved when we apply objective observations of the world and people around us. In a world filled with never-ending challenges and uncertainty, letting go of our fear by understanding the behaviors of others and forging healthy relationships is a great solution.
Australia Doesn’t Have Mass Shootings
       
Australia Doesn’t Have Mass Shootings
It often feels like the United States of America is the only place that does
Pictured: Better when they’re not inside people. Photo by Jay Rembert on Unsplash
I’m going to tell you a terrible story that, for most people living in the United States, is probably all too familiar. A young man, less than 30 years old, felt unwelcome and unrewarded by society. Up until that day, he’d tried and failed at a number of pursuits, and every…
Adam Neumann Is Here to Elevate The World’s Blockchains
       
Adam Neumann Is Here to Elevate The World’s Blockchains
The comeback tour starts with “Goddess Nature Tokens”
Photo: Created by author
The prodigal son of entrepreneurship has returned. Fresh off the back of convincing investors that his co-working real estate company (sorry, tech company) was worth $47 billion, before a botched IPO, a torrent of yogababble, and the erratic founder…
To Live and Grieve in America
       
To Live and Grieve in America
A republic, if we can keep it.
Photo by the author, South Carolina Lowcountry, April 2022
“I prefer someone who burns the flag and then wraps themselves up in the Constitution over someone who burns the Constitution and then wraps themselves up in the flag.” — Molly Ivins
“So what, my dear…
Children Die and Parents Grieve, but Nothing Will Change
       
Children Die and Parents Grieve, but Nothing Will Change
Gun deaths are one of many grave illnesses of the United States of America
Grieving woman. Photo by Milada Vigerova on Unsplash.
A List of Suggestions That Won’t Stop Active Shooters From Killing Kids
       
A List of Suggestions That Won’t Stop Active Shooters From Killing Kids
It’s a horrific event. The lives of tens of people were cut short for no reason other than prejudice, bigotry, and access to a gun. I’m writing about the Uvalde school shooting, but I could be talking about any of the other 27 school shootings that have happened this year. I could be talking about any of the dozens of school shootings that, statistically, will happen in the coming months. I could be talking about any of the shootings of people at…
Dear Conservatives, I Don’t Want to Take Away Your Guns
       
Dear Conservatives, I Don’t Want to Take Away Your Guns
Dear conservatives,
I’m a liberal, and I’m not interested in a debate about the Second Amendment. I already know how it plays out. You’ll say it’s your right enshrined by the Constitution, and I’ll say a “well-regulated militia” was intended to fend off a tyrannical government armed with muskets, not drones with long-range missile capabilities.
The forgotten benefits of “low tech” user interfaces
       
The forgotten benefits of “low tech” user interfaces
Seemingly outmoded technologies sometimes hold the key to better user experiences.
The benefits of low tech user interfaces
In the process of exploring both low and high-tech options for this device’s user interface, I was reminded of some occasionally overlooked benefits that low-tech user interfaces provide, as well as some of the shortcomings they embody. Low tech user interfaces are usually cheaper to manufacture than high tech ones. Notably, lower tech user interfaces often cost less to develop because they require simpler software tooling and logic than higher tech user interfaces.
More constraining
The final benefit of lower tech user interfaces doesn’t directly help the product manufacturer or the end user.
CSS: Absolutely positioning things relatively
       
CSS
CSS: Absolutely positioning things relatively
Using CSS grid to render complex webpages responsively
By Benjamin Morris
Responsiveness is hard
As software engineers, we have a plethora of tools available to control the rendering of a webpage (Introduction to CSS layout) and can easily create bespoke user interfaces for different devices (Using media queries). However, Canva is a design platform where users can create designs by freely dragging and dropping elements using our fixed dimensions editor.
Our friend: the CSS grid
You can read the basic concepts of grid layout if the CSS grid is new to you, but it does what it says on the tin.
grid-template-columns: 31vw 14vw 10vw 31vw
And we can do the same for rows, using the screen width as our constant. Our backend algorithms can reconfigure the content and produce a different grid layout for each screen size, allowing everything to move around.
No Training Data? No Problem! Weak Supervision to the Rescue!
       
Image by Author using images from Scientist icons created by Nikita Golubev — Flaticon, People icons created by Freepik — Flaticon, Cross icons created by Freepik — Flaticon, Drake Meme by Imgflip.
Image by Author using images from Scientist icons created by Nikita Golubev — Flaticon, Boolean icons created by Flat Icons — Flaticon, Hierarchy icons created by surang — Flaticon, Knowledge icons created by Freepik — Flaticon.
We do this with the help of some fancy math, and it doesn't need any ground-truth data [Data Programming Paper][MeTaL Paper][Flying Squid paper]!
Weak Supervision Frameworks
In the Weak Supervision Benchmark [WRENCH Paper][GitHub], the authors benchmark various weak supervision frameworks and compare them to fully supervised golden benchmarks, as shown below.
How I Helped A Retail Company Using My Data Science Skills
       
Finally, I created a GET API that the ERP providers can use to get the XML data.
Second Problem: Courier Status Update
Andreas told me that his company had a hard time with the courier providers. After some research, I was able to find the API from the courier provider’s website. After some analysis, he decided to skip the courier providers and hire two drivers to deliver the orders in Athens. Most data scientists have a huge toolbox of skills that can solve nearly anything by just using critical thinking and trying to adapt.
Everything You Always Wanted to Know About Synthetic Data
       
Despite several credits on synthetic data, I’ve had many conversations similar to the one above, where people aren’t aware of synthetic data. According to the senior VP of AI at Unity, synthetic data is actually better than real data, and we could enrich the real data through synthetic data. After interviewing hundreds of data scientists, YData has concluded that the unavailability of high-quality data is one of the biggest issues data scientists face. Utility indicates the performance of the synthetic data in downstream applications compared to the original dataset, fidelity measures how well the synthetic data statistically matches the original data, and privacy indicates the level of confidentiality of the synthetic data. The synthetic data generated can be differentially private and is best suited when data scientists require data at the same granularity as the original data for common data science problems.
HydraSum: Disentangling Stylistic Features in Text Summarization… (Paper Review/Described)
       
This paper focuses on the text summarization task and tries to give the model’s end user a sense of control over the predictions. The model uses a weighted average (called gate) of the decoders’ output to generate summaries with a certain style. There is also a gating mechanism (g) that is basically a weighted sum of the k decoders' output. (Image from [1])
Results
The authors did an excellent analysis to show that different decoders will learn different styles.
[1] HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models.
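The gating idea can be sketched as a plain weighted sum over the outputs of the k decoders. The snippet below is an illustration under assumptions, not the paper's code: the function name is hypothetical, each decoder output is taken to be a next-token probability distribution over the same vocabulary, and the gate is a fixed list of weights that sum to 1.

```python
def mix_decoders(decoder_probs, gate):
    """Combine k next-token distributions into one with gate weights.

    decoder_probs: list of k lists, each a distribution over the vocabulary.
    gate: list of k non-negative weights that sum to 1.
    """
    if len(decoder_probs) != len(gate):
        raise ValueError("need exactly one gate weight per decoder")
    vocab_size = len(decoder_probs[0])
    # Weighted sum per vocabulary entry: sum_i g_i * p_i(token).
    return [
        sum(g * probs[token] for g, probs in zip(gate, decoder_probs))
        for token in range(vocab_size)
    ]


# Usage with two hypothetical decoders over a 3-word vocabulary.
formal_decoder = [0.7, 0.2, 0.1]
casual_decoder = [0.1, 0.3, 0.6]
print(mix_decoders([formal_decoder, casual_decoder], gate=[1.0, 0.0]))  # first style only
print(mix_decoders([formal_decoder, casual_decoder], gate=[0.5, 0.5]))  # an even blend
```

Moving the gate weights toward one decoder is what gives the user a knob for steering the style of the generated summary.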
Complete Detailed Tutorial on Linear Regression in Python for Beginners
       
Photo by Gábor Szűts on Unsplash
Complete Detailed Tutorial on Linear Regression in Python for Beginners
Linear Regression Basic, Simple, and Multiple Linear Regression Implementation in Scikit-Learn
Linear regression is the most basic type of machine learning, and a lot of other popular machine learning and deep learning algorithms are built on it.
Linear Correlation: For linear regression to work, the dependent variable and the independent variable must also be linearly correlated, as shown in the picture above. For simple linear regression we need only two variables, so I will keep only petal_length and petal_width from here. This is the video version of the Simple Linear Regression tutorial:
Multiple Linear Regression Example
In the last example, we had only one variable to predict petal width.
Google AI Blog: Image-Text Pre-training with Contrastive Captioners
       
More recently, contrastive dual-encoder (CLIP, ALIGN, Florence) and generative encoder-decoder (SimVLM) approaches trained using web-scale noisy image-text pairs have been explored. Dual-encoder models exhibit remarkable zero-shot image classification capabilities but are less effective for joint vision-language understanding. In “CoCa: Contrastive Captioners are Image-Text Foundation Models”, we present a unified vision backbone model called Contrastive Captioner (CoCa). We feed sampled video frames into the CoCa frozen image encoder individually, and fuse output features by attentional pooling before applying a learned classifier.
Conclusion
We present Contrastive Captioner (CoCa), a novel pre-training paradigm for image-text backbone models.
Powering Next Generation Applications with OpenAI Codex
       
OpenAI Codex, a natural language-to-code system based on GPT-3, helps turn simple English instructions into over a dozen popular coding languages. We’re already seeing new applications of Azure OpenAI Service across many industry verticals, from healthcare to financial services.
Applications and Industries
Since its release via our API, we’ve been working closely with developers to build on top of Codex. Through tight integration with Codex, GitHub Copilot can convert comments to code, autofill repetitive code, suggest tests and show alternatives. Developers search for entire commands using natural language rather than trying to remember them or assemble them piecemeal.
Introducing two new datasets to help measure fairness and mitigate AI bias
       
We also trained an AI model for reducing demographic biases in text, which can help break stereotypical associations present in NLP data. This method, known as a demographic text perturber, introduces demographically diverse variations that are otherwise similar to the original text. We hope that these datasets will be used to help further research around fairness in the AI community. This extensive set of demographic terms can be used to better measure and then mitigate model biases. For more information on the comprehensive set of demographic terms and evidence of its broad utility, download the research paper.
Two Methods for Performing Graphical Residuals Analysis
       
In this article, we will see two different graphical methods for analyzing the residuals in a regression problem. These are just two of the methods that are useful for understanding whether our data are linearly distributed. Thus, the residuals can be:
- Positive if they are above the regression line
- Negative if they are below the regression line
- Zero if the regression line actually passes through the point
The residuals visualized: they are the green vertical lines (the red line is the regression line).
Residuals vs Predicted values
One of the graphs related to the residuals we may be interested in is the “Residuals VS Predicted values” plot. Is there a way by which the residuals can warn us that the linear model we are applying is not a good choice? our initial intuition; then, we have to use other methods to finally decide if we can apply a linear model to our problem or not (but we’ll see these methods in another article).
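As a small worked example of the residuals discussed above, the sketch below fits a least-squares line with plain Python and returns the pairs that a "Residuals vs Predicted values" plot would show. The helper names and the data are made up for illustration; the article itself may well rely on a library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals_vs_predicted(xs, ys):
    """Return (predicted, residual) lists; residual = observed - predicted."""
    slope, intercept = fit_line(xs, ys)
    predicted = [slope * x + intercept for x in xs]
    resid = [y - p for y, p in zip(ys, predicted)]
    return predicted, resid


# Usage: data generated by y = x^2, which a straight line cannot capture.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
predicted, resid = residuals_vs_predicted(xs, ys)
for p, r in zip(predicted, resid):
    sign = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"predicted={p:7.3f}  residual={r:7.3f}  ({sign} the line)")
```

For this curved data the residuals are positive at both ends and negative in the middle. A systematic pattern like that, rather than a random scatter around zero, is the kind of warning sign the plot is meant to reveal.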
Vectorization: Must-know Technique to Speed Up Operations 100x Faster
       
Why is Vectorization Faster
A major reason why vectorization is faster than its for loop counterpart is the underlying implementation of NumPy operations. ndarray is optimized because the underlying operations are done using efficient C operations, which allows for vectorization. However, before registering your own vectorized function, think about whether there is an existing NumPy way of performing the same operation.
Vectorization Thinking Process
Now that you know what vectorization is, why to use it, and when to use it, I’d like to send you off with a bonus tip regarding the thinking process for vectorization. Major advantages of vectorization are (way) faster code execution and cleaner code.
XGBoost: Cardinality, the crucial HyperParameter that is always under-considered
       
The code and the plot below illustrate that:
Show the discrete nature of Gradient Boosted Trees.
Cardinality is important for two reasons:
It’s directly linked to the cardinality of your predictions set. If your model cardinality is much higher than the cardinality of your predictions set, the odds are that your model is overfitting.
Computing the cardinality of Gradient Boosted trees
Computing the cardinality of the predictions that a Gradient Boosted Tree model can generate is not easy.
Driving Gradient Boosted Tree cardinality
The hyperparameters to tune in order to control model cardinality are the same as the ones that drive overfitting.
9 Actionable Ways to Improve Your Data Visualization Game
       
In this article, I summarize some of the main insights I acquired over the years on how to improve my data visualization skills. While he didn’t specifically say data visualization, I humbly believe he would agree that our ability to plot data falls within the writing category. Go the Extra Mile: The cherry on the cake is the numbers on top of each bar. Go the Extra Mile: Here are two great articles showing how malicious actors lie using data visualization. Go the Extra Mile: On a more extreme note, you can also force everything to black to simulate a faulty printer.
JAX vs PyTorch: Automatic Differentiation for XGBoost
       
Then, we will dive into the implementation of automatic differentiation with PyTorch and JAX and integrate it with XGBoost.
First, we implement our loss function:
Next, our automatic differentiation:
Putting them together:
Figure 1: PyTorch — Demonstration of automatic differentiation on mockup data.
Figure 3: PyTorch — Run-time performance of automatic differentiation on real-world data (loaded in Figure 2).
Run-Time Performance Benchmark
Let’s present a more thorough comparison of run-time performance. Now, let’s compare automatic differentiation to manual differentiation:
Figure 6: Run-time benchmark results: Manual differentiation is faster than JAX.
6 Dimensionality Reduction Techniques
       
To overcome the issues of overfitting, training time, and storage caused by high dimensionality, a popular approach consists of applying a dimensionality reduction technique to the original dataset. In this post I will describe six dimensionality reduction methods that you have to know when doing a data science project.
Multi-dimensional Scaling (MDS)
The next three methods I will present are non-linear dimensionality reduction techniques.
t-Distributed Stochastic Neighbor Embedding (t-SNE)
The last algorithm I will present is a manifold-based dimensionality reduction technique.
Conclusion
These six dimensionality reduction techniques represent the variety of methods that can be used for this purpose.
8 Cool Dplyr Functions to Learn in R
       
8 Cool Dplyr Functions to Learn in R
In this post, we will check some important functions available in one of the coolest R data wrangling libraries: dplyr
Photo by Ales Nesetril @unsplash.com
Dplyr is a really handy library you can use in R. Dplyr is a data manipulation package that is part of the tidyverse universe, a collection of libraries that has the goal of making R faster, simpler and easier. Other than the cool functions you can access by installing the package, Dplyr leverages the pipe (%>%) structure, a better way to encapsulate functions. We’ll perform our examples using the starwars dataframe, a built-in data set that you can use immediately after running library(dplyr).
Filter
The first dplyr function that we will learn is filter.
starwars_df <- starwars %>% mutate(height_x_mass = height * mass, franchise = 'Star Wars')
New Columns Created with Mutate on the StarWars DataFrame (Image by Author)
Nice!
Even Google does not get it right
       
Even Google does not get it right
How to get deep translation quality evaluation right: from metrics to common caveats
Image by the author
Ten years ago learning a new language was a struggle. Looking through those examples it seems like we are miles away from a perfect machine translation (MT) tool. We would use MT quality evaluation models.
Machine Translation Quality Evaluation
The most straightforward way to evaluate how well a translation model works is to ask experienced translators to judge. ROUGE measures recall: how many of the words (and/or n-grams) in the human reference summaries appear in the machine-generated summaries.
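The recall idea behind ROUGE can be made concrete with a short sketch. This is a simplified ROUGE-N recall, assuming lowercase whitespace tokenization and a single reference; production implementations add stemming, multiple references, and other variants, and the function name here is made up for illustration.

```python
from collections import Counter


def rouge_n_recall(reference, candidate, n=1):
    """Share of the reference n-grams that also appear in the candidate.

    Counts are clipped, so an n-gram repeated in the candidate cannot be
    matched more often than it occurs in the reference.
    """
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(reference)
    cand_counts = ngrams(candidate)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_counts & cand_counts).values())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print(rouge_n_recall(reference, candidate, n=1))  # 5 of 6 reference words are matched
print(rouge_n_recall(reference, candidate, n=2))  # 3 of 5 reference bigrams are matched
```

Because it only measures recall, a candidate that pads its output with extra words is not penalized, which is one reason recall is usually reported together with precision or an F-score.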
The Email Scam That Nearly Worked On Me
       
The Email Scam That Nearly Worked On Me
It’s easy for scammers to impersonate real companies, because real companies often behave like scam artists
via Ayesha Raheem at Pixabay
Two days ago I received an email that appeared to be from Norton, the makers of antivirus software. The email said they’d just charged me a ton of money on autopay. A screenshot …
Three Paintings That Changed The Way I Look At Art and Life
       
Three Paintings That Changed The Way I Look At Art and Life
This is how philosophy, life, and beauty get a voice
ART!? ‘one of the most elusive of the traditional problems of human culture’ — Richard Wollheim
Art in its entirety is as elusive as a fully satisfying description of the idea of God. For me, it is so elusive that I have…
Our Balance Keeps Us Learning
       
Our Balance Keeps Us Learning
A closer look at the magical vestibular system
Photo: Loic Leray / Unsplash
Some years ago, I headed out for an early morning downhill ski, and one run later ended up with nausea and a headache that lasted for 24 hours. It was one of those not-infrequent days when the mountain was covered in fog so thick that visibility was near zero. I got off the lift, skied a few feet in the gray…
A Few Words I Have Never, Ever Spelled Correctly
       
I hate these words so much. I’ve been a professional writer for over 25 years, and a blogger for almost twenty years. I’ve written millions and millions of words. So you’d think I’d be very good at spelling, yes? Mostly, I am! Generally, when I’m writing I blast along at 100 WPM without giving much thought to proper spelling.
How Protected Are You From Covid?
       
How Protected Are You From Covid? This new wave, driven by the new BA.2.12.1 sub-lineage of Omicron, is reaching some who have been vaccinated before and some who have had Covid before. How protected are those who’ve already had Covid? Data continue to show that unvaccinated and under-vaccinated people are much more likely to get infected and die from Covid. There is also emerging evidence that vaccination reduces the risk of long Covid.
Einstein was right. Flying clocks around the world in opposite directions proved it.
       
This artful illustration of Einstein, some of his equations, and a rendering of a surreal clock helps us conceptualize the differing passage of time experienced by people in different locations and moving at different rates. Although time dilation had been measured for subatomic particles previously, it wasn’t until ~50 years ago that it was measured for an actual clock. Flying clocks around the world in opposite directions proved it. Flying around the world gave Einstein the ultimate test. In 1905, our conception of the Universe changed forever when Einstein put forth his…
After Uvalde: Beyond Easy Answers
       
We need to focus on the shooter and not on the gun. When the news alert about today’s shooting in Uvalde, Texas ran across my smartphone, I wondered how long it would be until social media would be filled with angry people. There was a lot of metaphorical rending of garments about the state of our nation that seems to worship the gun. With every mass…
The Greatest Recorded SCREAM in Rock History
       
Now that’s rock and roll. In 1963, The Beatles made a splash by singing “Yeah, Yeah, Yeah,” upsetting the British establishment who hated hearing Americanisms in place of the Queen’s English. Less than a decade later, The Who made a pretty big splash of their own with just one loud “Yeah.”
Regular Expression (RegEx) in Python: The Basics
       
Master the fundamentals of RegEx in Python. Suppose you have a lot of text data, and you want to extract meaningful information. Raw string in Python: before delving deep into regex, it is crucial to understand what a raw string is. Using raw strings for regex patterns is recommended so that Python does not interpret backslash sequences such as \n or \b as escape characters before the regex engine sees them. Summary of typical regex metacharacters: metacharacters are characters with special meaning in the regex pattern.
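A small sketch of why raw strings matter for regex patterns. The sample text and variable names are made up for illustration; the only library used is Python's built-in re module.

```python
import re

# In a normal string literal, "\b" is the backspace character (one character).
normal_pattern = "\bcat\b"
# In a raw string, the backslash is kept, so the regex engine sees \b (word boundary).
raw_pattern = r"\bcat\b"

text = "a cat sat; concatenate"  # hypothetical sample text

normal_matches = re.findall(normal_pattern, text)  # looks for backspace + "cat" + backspace
raw_matches = re.findall(raw_pattern, text)        # looks for the whole word "cat"

print(len(normal_pattern), len(raw_pattern))  # the two patterns differ in length
print(normal_matches, raw_matches)
```

The normal-string pattern silently matches nothing here, which is the kind of unexpected behavior raw strings are meant to prevent.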
Google Brain’s New Model Imagen Is Even More Impressive Than Dall-E 2!
       
Google Brain’s New Model Imagen Is Even More Impressive Than Dall-E 2! Still, what’s even more amazing is how it works using something I never discussed on the channel; a diffusion model. They used a huge text model, similar to GPT-3, to understand the text as best as an AI system can. So instead of training a text model along with the image generation model, they simply use a big pre-trained model and freeze it so that it doesn’t change during the training of the image generation model. Instead, we first generate a photorealistic image using the diffusion model we just discussed and then use other diffusion models to improve the quality of the image iteratively.
NLP Using Deepleaning Tutorials: A Sentiment Classifier Based on Perceptron (Part 2/4)
       
So, I will present a complete solution for Sentiment Analysis based on the simplest neural network, the “Perceptron”, using a real-world task and dataset: classifying whether restaurant reviews on Yelp are positive or negative. Part #2: The Vocabulary and The Vectorizer. Each text is a collection of words or characters, which are called tokens. Mapping tokens to numbers makes it possible to use a text as input for a neural network, which is built on mathematical equations. The content of the Vocabulary class is in the vocabulary.py file. Vectorizing a text: the Vectorizer class encapsulates the Vocabulary features. In addition to creating a vocabulary, the Vectorize() function returns a vectorized representation of a text input (a vector of numerical values).
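The vocabulary and vectorizer idea can be sketched as follows. This is a simplified stand-in, not the tutorial's actual vocabulary.py: the method names (add_token, lookup_token, from_texts, vectorize), the unknown-token marker, and the one-hot style encoding are all assumptions chosen for illustration.

```python
class Vocabulary:
    """Maps tokens to integer indices (hypothetical, simplified version)."""

    def __init__(self, unk_token="<UNK>"):
        self.token_to_idx = {}
        self.idx_to_token = {}
        # Index 0 is reserved for tokens never seen while building the vocabulary.
        self.unk_index = self.add_token(unk_token)

    def add_token(self, token):
        if token not in self.token_to_idx:
            index = len(self.token_to_idx)
            self.token_to_idx[token] = index
            self.idx_to_token[index] = token
        return self.token_to_idx[token]

    def lookup_token(self, token):
        return self.token_to_idx.get(token, self.unk_index)

    def __len__(self):
        return len(self.token_to_idx)


class Vectorizer:
    """Turns a text into a fixed-length numeric vector using a Vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    @classmethod
    def from_texts(cls, texts):
        vocabulary = Vocabulary()
        for text in texts:
            for token in text.lower().split():
                vocabulary.add_token(token)
        return cls(vocabulary)

    def vectorize(self, text):
        # One position per vocabulary entry; 1.0 marks tokens present in the text.
        vector = [0.0] * len(self.vocabulary)
        for token in text.lower().split():
            vector[self.vocabulary.lookup_token(token)] = 1.0
        return vector


# Hypothetical mini-corpus of review snippets.
vectorizer = Vectorizer.from_texts(["great food", "terrible service"])
vector = vectorizer.vectorize("great service today")
print(vector)
```

A vector like this has the same length for every review, which is what lets a perceptron take it as input regardless of how long the original text was.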
How Data Science Depends on Pandas and Numpy?
       
Well, Python lets us structure this messy data using data frames from Pandas, a Python package. Before discussing the Pandas data frame, let us first understand what a data frame is. In general, a Pandas data frame consists of three main components: the index, the columns, and the data. Note: make sure you import the following libraries when you use Pandas data frames: import numpy as np and import pandas as pd . In Pandas, a data frame is created by calling the pd.DataFrame() constructor and passing the data, the columns, and the index.
Weak Supervision with Snorkel for Multilabel Classification Tasks
       
We build 3 keyword labeling functions for each tag, yielding 12 labeling functions in total. Combining Labeling Function Outputs: we now have 14 labeling functions, which are expected to overlap or conflict with each other. Don’t forget to also apply the labeling functions to test data, since we can only evaluate the performance of labeling functions on test data. Wrapping Up: we’ve been introduced to Weak Supervision, a data labeling method without actually labeling any data (manually). The idea of weak supervision is to combine the outputs of many labeling functions which are used to programmatically label data.
What Are Vision Transformers And How Are They Important For General Purpose Learning?
       
What is a Vision Transformer (ViT)? Vision Transformers (ViT): the concept of the Vision Transformer (ViT) is an extension of the original Transformer concept, which is described earlier in this article as the text transformer. Image Classification (Image -> Label): the task of image classification is the most common problem in vision. Final Remarks: in this article, we have explained the concept of text transformers and image transformers.
BERTScore: Evaluating Text Generation with BERT
       
Machine Learning Research Paper Summary. BERTScore is an automatic evaluation metric used for testing the goodness of text generation systems. Unlike existing popular methods that compute token-level syntactical similarity, BERTScore focuses on computing semantic similarity between tokens of reference and hypothesis. The authors of this paper also introduced the notion of weights for each word similarity calculation. They stick with IDF weights which they derive based on a large amount of offline text data. I would encourage you to also read through the paper, details of which are mentioned below. ⏩ Paper Title: BERTScore: Evaluating Text Generation with BERT ⏩ Paper: https://arxiv.org/abs/1904.09675 ⏩ Authors: Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi ⏩ Organisation: Cornell University, ASAPP Inc. I hope you enjoyed reading this.
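The token-level semantic matching can be sketched with toy vectors. This is only an illustration of the greedy cosine-similarity matching: real BERTScore takes contextual embeddings from a BERT model, and the optional IDF weighting mentioned above is left out. The 2-dimensional vectors below are invented stand-ins for token embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(reference_vecs, candidate_vecs):
    """Precision, recall and F1 from greedy token matching (no IDF weights)."""
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy "embeddings" (assumed values, one vector per token).
reference_vecs = [[1.0, 0.0], [0.0, 1.0]]
candidate_vecs = [[1.0, 0.0], [1.0, 1.0]]

precision, recall, f1 = greedy_match_scores(reference_vecs, candidate_vecs)
print(precision, recall, f1)
```

Because similarity is computed between embeddings rather than surface strings, a paraphrase can score well even when it shares few exact tokens with the reference.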
Key Learning Points from MLOps Specialization — Course 4
       
MLOps Specialization Series. Final insights (with lecture notes) from the Machine Learning Engineering for Production (MLOps) Course by DeepLearning.AI & Andrew Ng. Realizing the potential of machine learning (ML) in the real world goes beyond model training. By leveraging the best practices of MLOps, teams can better operationalize and manage the end-to-end lifecycles of ML models in a sustainable manner. In this final article of the 4-part MLOps Specialization series, I summarize the lessons of Course 4 so that you can skip the hours of online videos while still gleaning the key insights. Contents: this article covers Course 4 of the 4-course MLOps specialization.
Various Challenges for Applying Machine Learning in Healthcare
       
There can be challenges when trying to implement machine learning models for healthcare data. One of the challenges when dealing with healthcare data is that the data might be causal for machine learning models. Shortage of Data Scientists and Machine Learning Engineers: so far, we’ve talked about the data and the machine learning algorithms as limitations of using data science in healthcare. Bias would be present in machine learning: when performing machine learning tasks, bias can be present, which can lead to models not performing well on unseen data or the test set. The bias present in machine learning models is due to the type of data that is fed to them.
Should you get a Master in Data Science? Advice from a Stanford MS Student
       
Personal experience and what you get from a Master: first, I will give some thoughts on what you can gain from a Master. You have to take 15 units of Mathematical and Statistical Foundations, 3 of Experimentation, 6–12 in Scientific Computing and 6–12 in Machine Learning. I also took two courses in Machine Learning, STATS 315A and STATS 315B, which focus on the Mathematics behind Machine Learning and Deep Learning. A case I often see is Data Science / Machine Learning jobs that require Python, R, SQL and C++ while others do not require C++ knowledge. This one is for a Research Engineer in Computer Vision and Machine Learning.
Natural Language Processing: PDF Processing Function for Obtaining a General Overview
       
Many of the documents used for Natural Language Processing (NLP) today are in .pdf format. Python Library: PyPDF2. The main Python library discussed here is PyPDF2, which allows for the analysis and manipulation of .pdf files through Python. The pdf_to_list_series_and_df() function accepts a dictionary and returns a series and a data frame of that dictionary. This article provided a quick way to get the metadata of a PDF and convert it into a dictionary, series, and data frame.
The Art of Explaining Predictions
       
How to explain your model in a human-friendly way. An important part of a data scientist’s role is to explain model predictions. In the end, we will apply some of this knowledge by explaining a model using SHAP values. Characteristics of a Good Explanation: when we talk about a good explanation, we mean one that will be readily accepted. Figure 1: overview of characteristics of a good feature (source: author). True: it may seem obvious, but a good explanation should be true. That is, we are giving explanations for model predictions.
What Are Transposed Convolutions?
       
You will often find layers of transposed convolutions in the decoder part of AutoEncoders, or in the generator part of GANs. Using Stride (Transposed) of One: now, let’s try a 2D transposed convolution ( F.conv_transpose2d , in PyTorch's functional API), using a stride (transposed) of one and a padding (transposed) of zero: import torch.nn.functional as F; stride_transp = 1; padding_transp = 0; F.conv_transpose2d(input_image, weight=kernel, stride=stride_transp, padding=padding_transp) . Output: tensor([[[[1., 2., 1. Using Stride (Transposed) of Two: we’ll keep the same input image and kernel/filter, the same padding (transposed) of zero, but we’ll be using a stride (transposed) of two now. The figure below illustrates the whole thing: transposed convolution using stride (transposed) of two.
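To make the mechanics concrete without PyTorch, here is a pure-Python sketch of a single-channel 2D transposed convolution with zero padding. It is an illustration only: the function name, the 2x2 input image, and the all-ones kernel are assumptions and are not the article's data.

```python
def conv_transpose2d_naive(image, kernel, stride=1):
    """Single-channel transposed convolution, padding 0 (illustrative sketch)."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Output size for padding 0: (input - 1) * stride + kernel size.
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            # Each input pixel "stamps" a scaled copy of the kernel onto the output,
            # with stamps placed `stride` positions apart; overlaps are summed.
            for a in range(k_h):
                for b in range(k_w):
                    output[i * stride + a][j * stride + b] += image[i][j] * kernel[a][b]
    return output


# Hypothetical input and kernel.
image = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0, 1.0], [1.0, 1.0]]

out_stride1 = conv_transpose2d_naive(image, kernel, stride=1)  # 3x3, stamps overlap
out_stride2 = conv_transpose2d_naive(image, kernel, stride=2)  # 4x4, no overlap
for row in out_stride1:
    print(row)
for row in out_stride2:
    print(row)
```

With a stride of one the kernel copies overlap and add up in the middle of the output, while with a stride of two (and a 2x2 kernel) they tile the output without overlapping, which is why the larger stride upsamples more.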
What’s Your Midterms Plan? It’s Time to Take a Voluncation.
       
What’s Your Midterms Plan? It’s Time to Take a Voluncation. Let’s all consider taking time off of work to volunteer. Compound words seem sufficient enough without inventing all sorts of…what are they called? There’s smog (smoke and fog), motel (motor hotel), and Brangelina, Bennifer, and Kimye.
ICYMI: The Great Highway Is (Still) Engulfed in Sand
       
We love to see it, ‘it’ being the ‘Great Sandway’. The Great Highway, which stretches 3.5 miles along San Francisco’s western neighborhoods, has been a subject of debate since more than two miles of it was closed to vehicular traffic in 2020. On April 6, San Francisco Public Works (DPW) first closed the southbound lanes of the Upper Great Highway for standard sand removal maintenance. Moreover, San Francisco doesn’t have work crews solely dedicated to sand removal, according to District 4 Supervisor Gordon Mar. The Great Highway has been a part of San Francisco’s infrastructure since it was opened in 1929. The southernmost portion of the highway, the Great Highway Extension from Sloat Boulevard to California State Route 35, is expected to be permanently closed to vehicle traffic starting in 2023 as part of the Ocean Beach Climate Change Adaptation Project.
It’s Time to Start Traveling Differently
       
“There are some who have crossed deserts, floated on ice caps, and cut their way through jungles but whose souls we would search in vain for evidence of what they have witnessed.” (Alain de Botton) Aunty Kehau’s dark eyes lit up as she ushered my 10-year-old son Nikko and me through the chain-linked enclosure. She threw back her long salt and pepper hair, pulled Nikko close, thanking him for…
How I Ended Decades of Chronic Pain
       
Without meds, my back and hip are, to my surprise and delight, nearly pain-free. Low back pain has dogged me since my mid-20s, moderate and annoying most days with the occasional must-lie-down-now episode that made me grumpy and ornery but got me out of yard work for a few days.
Image Classification with Python: CNN vs Transformers
       
Image classification is the process of categorizing and labeling groups of pixels within an image based on specific rules or models. In this article, using Computer Vision and Python, I will explain 3 different strategies for image classification: build a CNN from scratch, leverage a pre-trained model, and apply the cutting-edge Vision Transformers (ViT). So, if there is a pattern, image data points will appear in clusters. I compared 3 approaches: CNN from scratch, Transfer Learning from a pre-trained CNN (VGG16), and the cutting-edge Transformers for images (ViT). I went through data analysis, preprocessing, model training, data augmentation, evaluation, and explainability.
Melting ML Requests by Using SQS and Multiprocessing
       
Let’s consider a site that has a lot of users and predicts whether there are dogs in the pictures posted. With SQS, you can transmit, store, and receive messages across software components at any volume without losing messages or requiring other services to be available. Durability: to ensure the safety of your messages, Amazon SQS stores them on multiple servers. Availability: Amazon SQS uses redundant infrastructure to provide highly-concurrent access to messages and high availability for producing and consuming messages. Scalability: Amazon SQS can process each buffered request independently, scaling transparently to handle any load increases or spikes without any provisioning instructions.
How Uber uses AI to serve you better
       
How can Uber deliver food and always arrive on time or a few minutes before? For several years Uber used XGBoost, a well-known gradient-boosted decision tree machine learning library. Here is the complete DeepETA model to answer this question. Illustration of the DeepETA model structure. References: Read the full article: https://www.louisbouchard.ai/uber-deepeta/ ; Uber blog post: https://eng.uber.com/deepeta-how-uber-predicts-arrival-times/ ; What are transformers: https://youtu.be/sMCHC7XFynM ; Linear Transformers: https://arxiv.org/pdf/2006.16236.pdf
After brief scare, Mookie Betts stays in and does it all in a seventh straight win
       
By Rowan Kavner. Mookie Betts stuck his hands out and braced himself for the impact. The Dodgers let out their collective breath as Betts remained in the game after what he assumes was a stinger. Then he went from catalyzing the Dodgers’ seventh straight win with his offense to cementing it with his defense. Betts finished the night 3-for-4 with three RBI and the crucial outfield assist in a 7–4 win. In an inauspicious start to the bottom of the seventh inning, Dodger reliever Justin Bruihl walked the first batter he faced on four pitches then hit J.T.
The Age of Extinction Is Here — Some of Us Just Don’t Know It Yet
       
This is when things start to get really, really bad — really, really fast. My Western friends still don’t really grasp this at all. My Western friends don’t think these days. My Western friends don’t understand that we are part of systems. Imagine how much worse inflation’s going to get when Extinction really begins to bite.
Top 5 Machine Learning Fields: A Blog for Beginners
       
Discussing the Top 5 Machine Learning Fields for Beginners. In this blog, we will be covering some of the hottest machine learning algorithms used by organizations. Supervised learning: this is where the algorithms are trained using labeled data. Supervised learning is one of the most common approaches used to train machine learning models. The labeled data it relies on is called “supervised data”, and this data is used to train the machine learning algorithm. Unsupervised learning is a machine learning task where the output is not directly a label or a value, but a set of clusters.
5 Data Science Applications in the Life Insurance Industry
       
How data science offers opportunities to the life insurance value chain. Background: after a bit of a search, I was surprised in the first instance by how little has been written on the topic of data science applications in the life insurance industry on the Medium platform. … low drive from ‘top-down’). With respect to 1), it needs to be established that the term Data Science should be defined as encompassing Data Transformation, Data Visualisation, Predictive Analytics, Machine Learning, and other AI disciplines. This article is written to do exactly this by setting out 5 data science applications in the life insurance industry. Data science techniques enable life insurers to quantify as well as visualize the socio-economic status of a particular geographic area. In addition, it is the writer’s view that data science applications such as Data Transformation and Visualisation are often overlooked.
A comprehensive cheat sheet on Tableau Charts: A Road to Tableau Desktop Specialist Certification
       
Bar Chart: a bar chart is the most effective and easiest way to visualize data in Tableau. Minimum Requirements: 0 or more Dimensions, 1 or more Measures. Stacked Bar Chart: Stacked Bar Charts are an extended version of Horizontal Bar Charts. … bar chart in a bar chart. Dual Axis Chart vs Combined Axis Chart: in the Combined Axis chart, both measures share the same axis. The Combined Axis chart is also known as the Blended Axis Chart or Shared Axis Chart.
Monkey Patching Python Code
       
In this tutorial, we are going to see how we can apply this technique to some Keras and TensorFlow code. After finishing this tutorial, you will learn: what monkey patching is; how to change an object or a module in Python at runtime. Let’s get started. Tutorial Overview: this tutorial is in three parts: one model, two interfaces; extending an object with monkey patching; monkey patching to revive legacy code. One Model, Two Interfaces: TensorFlow is a huge library. But since we’re using Python, it is possible for us to add it using the monkey patching technique. Therefore, monkey patching is unwelcome in production code.
The Berkeley Crossword Solver
       
We recently built the Berkeley Crossword Solver (BCS), the first computer program to beat every human competitor in the world’s top crossword tournament. … in Berkeley (3). Domain ender that UC Berkeley was one of the first schools to adopt (3). Angeleno at Berkeley, say (8). Our Approach: the BCS uses a two-step process to solve crossword puzzles. Compared to the previous state-of-the-art method for answering crossword clues, this approach obtained a 13.4% absolute improvement in top-1000 QA accuracy. Winning The American Crossword Puzzle Tournament: the American Crossword Puzzle Tournament (ACPT) is the largest and longest-running crossword tournament and is organized by Will Shortz, the New York Times crossword editor. To encourage future work, we are releasing a dataset of 6.4M question-answer clues, a demo of the Berkeley Crossword Solver, and our code at http://berkeleycrosswordsolver.com.
Introducing CommerceMM: A new approach to multimodal understanding for online shopping
       
We’ve built CommerceMM, a new approach to multimodal understanding for online shopping. To address this need, we’ve developed a powerful new approach to pretraining and a versatile new model for Commerce MultiModal Representation (CommerceMM). CommerceMM achieves a richer understanding of multimodal data by integrating its characterizations of a post’s text and image. But online shopping is conducive to more diverse text and image data, which we can use to teach AI systems to find new relationships between modalities. This step refines the relationships between all the embedding modalities: image and image, image and text, image and multimodal, text and multimodal, etc.
How Data Scientists Can Reduce Data Wrangling Time with a Data Mart
       
Eventually, I put these frequent queries into ETL pipelines and created an analytics data mart that helped reduce my data cleaning and preparation time by more than 50%. The users of the data mart can be data scientists or data analysts with product stakeholders. Create a list of questions the data mart will be used to answer: this will determine the type of data you’ll have in the data mart. Following along in the product data mart example from step 2, we’ll need to use data sources related to signups, product behavior, and user experiments. I’ve had times when data engineers used test data and all I could do was validate the table schema.
Store Data Efficiently With These CSV Alternatives
       
In this post, we’ll discuss: issues with CSV for data scientists; my thoughts on using Pickle files instead; better alternatives for CSVs and Pickle; benchmarking different file formats to store datasets. You can access the Colab notebook I used in this post for benchmarking. Especially if you have no information on what software the client is using, CSVs are great. The size of a Pickle file on the disk may vary. One drawback is that you can read pickle files only from a Python program. Feather file format: the feather file format is a fast, language-agnostic data frame storage for Python (pandas) and R. Feather is optimized for low storage space and high performance.
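One commonly cited difference between the two formats is type handling, which the sketch below illustrates with the standard library only. The single example row is invented, and the data is kept in memory instead of being written to files; it is not the post's benchmark.

```python
import csv
import io
import pickle

# Hypothetical record with an int, a float and a list.
rows = [{"id": 1, "score": 0.5, "tags": ["a", "b"]}]

# CSV round trip (in memory): every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "score", "tags"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
csv_rows = list(csv.DictReader(buffer))

# Pickle round trip: Python types are preserved, but only Python can read the bytes.
pickle_rows = pickle.loads(pickle.dumps(rows))

print(csv_rows[0])
print(pickle_rows[0])
```

The CSV text stays readable by almost any tool, at the cost of re-parsing types on load; the pickle bytes restore the objects exactly but tie the data to Python, and unpickling data from an untrusted source is unsafe.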
Matplotlib vs. Plotly: Let’s Decide Once and for All
       
Deep and rapid comparison in terms of 7 key aspects. There is an annoying habit of soccer fans. Switching from Plotly Express to Plotly Graph Objects (GO) involves a steep learning curve. In the timing measurements, Matplotlib is almost 80 times faster than Plotly, with lower SD errors. Not many know this, but outside Jupyter Notebooks, Matplotlib plots render in an interactive mode by default. For example, Plotly uses uniform colors rather than a palette when creating bar charts or boxplots.
Monkeying with DALL-E
       
Generative Art Storytelling. Can there be a movie or a comic book with AI-generated characters, sets & plots? Let us try with DALL-E. Wikipedia: DALL-E is an artificial intelligence program that creates images from textual descriptions. Let us pursue the idea of giving a series of text prompts that flow like a story. You can see the whole process: give your input, and DALL-E processes it and outputs the image. This is improving as we speak (check DALL-E 2 from OpenAI). Can we generate text prompts programmatically and autogenerate coherent narratives?
How To Become a Better Data Science Team
       
That is, the data science team. While this isn't set in stone, here is an example of a data science team: a few data scientists, a data engineer, a business/data analyst, a data science manager, etc. With that being said, let’s look at the tools and methods to better your data science team, whether you are a data scientist yourself, a manager, or possibly a technical recruiter. Summary: while improving yourself will improve the team, there are other, more team-centric things to focus on that ultimately make your data science team even better overall. What other tools and methods do you think are important to point out in regards to data science team improvement?
A Tale of Two Architectures
       
Deploying Machine Learning Models as Microservices. “It was the easiest to start with, it was the hardest to evolve from,” one could say when paraphrasing the first lines of Dickens’s masterpiece (one of them) A Tale of Two Cities. The classification model was needed to make sure the document was of the type the models were expecting. But isn’t that too much for an application whose main purpose is to use machine learning models to analyse documents? Clients send documents’ locations plus an identifier (request id) to an input Kafka topic and listen to an output one. This new gateway would interact with Kafka and expose two methods to clients: send(requestId, doc location) and receive(requestId, status, result).
Git in 4 Minutes
       
A succinct introduction to Git. A lot of introductions to Git are very convoluted and overcomplicated. You don’t need to spend much time on it at all, so here’s a very basic introduction in 4 minutes, with absolutely no waffle. HEAD in Git normally points to the most recent commit on the current branch, so we can count back any number from the head to access that version: git checkout HEAD~2 . After that command, HEAD itself points at the older commit, so to return to the latest commit we check out the branch again, for example git checkout main (assuming the branch is named main) or git switch - . This should be all you need for a basic understanding and personal usage of Git. Of course there is a lot more to Git, but not that I can get to in 4 minutes.
F-Distribution Simply Explained
       
The F-Distribution is closely related to the Chi-Square Distribution. If you are unfamiliar with this, refer to my ‘Chi-Square Distribution Simply Explained’ article that is linked above! However, due to the complexity and length of the derivation for the F-Distribution, I have omitted it in this article. Notice all the Gamma functions: this is because the Chi-Square distribution is a special case of the Gamma Distribution. Conclusion: in this article we have explained where the F-Distribution originates from and its relation to the Chi-Square Distribution.
5 Ways Abortion Rights Protect Women’s Health
       
The healthcare advances we’d abandon if women lose access to abortion care. It’s only been six years since the Supreme Court struck down one of the most restrictive abortion laws in the nation at that time. In June 2016, the Court ruled 5–3 in favor of abortion clinics in the case Whole Woman’s Health v. Hellerstedt. Yet today, when…
5 Lessons Learned in Pandemic Year 2
       
The U.S. hit a grim milestone of 1 million Covid deaths. One million Americans have died from a virus that was unknown to humans just a couple of years ago. Here are 5 things we’ve learned in year two of the COVID-19 pandemic. Four, public health shifted from the collective “us” to the individual “me.” That is not the ethos of public health but pandemic fatigue. We have come a long way since early 2020; unfortunately, with the heavy human toll of 1 million deaths in the U.S. from Covid-19. We hope year 3 of the Covid pandemic is the last, but we know there is still more work to be done.
Prominent Tech Leaders and VCs Have Nothing to Say on Buffalo and “Great Replacement Theory”
       
From self-described free speech advocates like Elon Musk to more conscientious execs like Reid Hoffman, the tech industry’s silence is deafening.
7 Scikit-learn Utilities to Generate Artificial (Synthetic) Data
       
Explained with graphical visualizations. Artificial or synthetic data is a type of data that is generated artificially through computer algorithms. One use is to generate data that satisfies specific conditions that are usually not available in real-world data. The methods described in this article include 7 Scikit-learn utilities (data generator algorithms) that are used to generate different types of synthetic data. For both the make_blobs and make_circles functions, an important hyperparameter is n_samples: the total number of data points (observations/samples) to generate.
Trading With AI, a Dream Or Reality
       
I am sure you see multiple stories about how AI can trade in the market without emotion and with rational decision-making. Some of us look at the AI model as a portfolio manager for holding positions that are swapped between assets. Is there a human who can always win the trading game based on his strategy? We like to be masters in the market, but the master can fail, so your AI model will fail in some situations. So you cannot treat it like a classic AI problem.
New advances in speech recognition to power AR experiences and more
       
Tomorrow’s speech recognition systems will need to be far more efficient so they can run on-device on ultralight, compact, and stylish glasses. Speech recognition systems are already an increasingly important part of our products and services. Improving speech recognition for real-world needs: Speech recognition researchers across the industry and academia are continually publishing ever-improving results on widely used published benchmarks. Recognizing these rare or previously unseen words is particularly challenging for modern “end-to-end” speech recognition systems, such as the widely used RNN-T models. Such speech is extremely challenging for speech recognition systems to accurately transcribe.
Optimize F1 aerodynamic geometries via Design of Experiments and machine learning
       
With CFD, F1 aerodynamicists test different geometry concepts, assess their aerodynamic impact, and iteratively optimize their designs. Problem statement: When exploring new aerodynamic concepts, F1 aerodynamicists sometimes employ a process called Design of Experiments (DoE). Generating the next design candidate to test in CFD: Selecting which candidate to test next requires careful consideration. By doing so, we’re sampling in the region of the design space where the regressor is least confident about its prediction. In the following figure, we present the permutation importance for a Gaussian Process Regressor (GP) predicting aerodynamic downforce (Cz).
Build a risk management machine learning workflow on Amazon SageMaker with no code
       
Data engineers can use Amazon SageMaker Data Wrangler to quickly aggregate and prepare data for model building without writing code. Amazon Simple Storage Service (Amazon S3) acts as our data repository for raw data, engineered data, and model artifacts. Therefore, we give this responsibility to data engineers so they can transform data without writing code with Data Wrangler. Import the data: Create a new Data Wrangler data flow from the Amazon SageMaker Studio UI. A data engineer can easily prepare data using Data Wrangler without writing any code and pass the prepared dataset to a business analyst.
Detect social media fake news using graph machine learning with Amazon Neptune ML
       
In this post, we demonstrate how to use Amazon Neptune ML to detect fake news based on the content and social context of the news on social media. Neptune ML is a new capability of Amazon Neptune that uses graph neural networks (GNNs), a machine learning (ML) technique purpose-built for graphs, to make easy, fast, and accurate predictions using graph data. The DGL makes it easy to apply deep learning to graph data, and Neptune ML automates the heavy lifting of selecting and training the best ML model for graph data. Neptune ML uses the DGL to automatically choose and train the best ML model for your workload. A model training strategy is a configuration set that specifies what type of model and model hyperparameter ranges are used for the model training.
The Perceptron Algorithm
       
Understand and code your own Perceptron Algorithm in R or Python. (Photo by Andres Haro on Unsplash.) The Perceptron algorithm is possibly the simplest of binary classification algorithms. As a result, a keen understanding of this algorithm is critical to any aspiring data scientist. This article is meant to serve as a high-level overview of the mathematics behind the algorithm, as well as an example of how one might go about programming their own Perceptron algorithm. Perceptron Algorithm: Now assuming you have some basic knowledge of vectors and we have reviewed the meaning of linear separability and hyperplanes, we can build on this knowledge to understand how the Perceptron algorithm works. Below is a full R script with all of the steps in the algorithm defined as functions.
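As a rough Python counterpart (this is not the author's R script), the classic Perceptron update rule might be sketched as follows. The function names, the toy data, and the choice of labels in {-1, +1} are assumptions made for illustration.

```python
def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    activation = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if activation > 0 else -1


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Learn weights w and bias b; labels in y must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified (or on the boundary)
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # converged: every point is on the correct side
            break
    return w, b


# Hypothetical, linearly separable toy data.
X = [(2, 1), (3, 4), (-1, -2), (-3, -1)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

The loop only changes the weights when a point is misclassified, which is why the algorithm is guaranteed to stop only when the data are linearly separable.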
Molecules in augmented reality from ML-powered recognition of everyday objects and hand-drawn molecular structures
       
New paper presents a smartphone app that recognizes hand-drawn chemical structures and molecules in everyday stuff. A new paper by Todd Martinez’s group at Stanford just came out presenting an impressive new smartphone app that recognizes chemicals in photographs or parses hand-drawn molecular structures, to then launch 3D views of them in augmented reality on your phone. An overview of MolAR and how it works. AR and VR for molecular visualization: Viewing 3D molecular structures is critical to understanding how the world works at the atomic level. But molecules are intrinsically 3D objects, and for a long time we’ve been stuck with 2D views on flat screens. However, Martinez’s group identified another important problem: how to input molecular structures into the app in a more interactive way? MolAR uses deep learning technologies to input molecules in two ways: By recognizing and parsing molecules hand-drawn on a piece of paper, board, etc.
Comparing things: The Bayesian approach
       
How to make probabilistic comparisons that embrace uncertainty. Comparing things is hard. Rating teams: A popular way to rate sports teams or individual players is the so-called Elo rating system, developed by Dr. Arpad Elo. In a nutshell, the winning team gets awarded points while the losing team loses points. We can sample many values from the two teams’ rating distribution and plot their densities to see how they compare. But the best thing about the Bayesian approach is yet to come: how we can evolve our prior beliefs based on data!
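To make the "winner gains, loser loses" rule concrete, here is a minimal sketch of the standard Elo update. The K-factor of 32 and the starting rating of 1500 are conventional choices assumed here, not values taken from the article.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated ratings; score_a is 1 for an A win, 0.5 draw, 0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


new_a, new_b = elo_update(1500, 1500, score_a=1)
print(new_a, new_b)
```

Note that this classic update yields a single point estimate per team, whereas the Bayesian approach described above works with a full rating distribution.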
Understanding min_child_weight in Gradient Boosting Decision Trees
       
Do you really know how this hyperparameter works? At every point in the development of the trees, every observation in the data will have a so-called hessian value. Hessian Value of an Observation. Theory: This document from XGBoost gives a great high-level summary of the math background, even if it skips some concepts like the learning rate. The loss function that we want to minimise: loss = - y * log(p(y)) - (1-y) * log(1-p(y)). This is the so-called log loss function. From the above formula, p(y) = exp(raw_score * sigma) / (1 + exp(raw_score * sigma)). We need to insert this expression in the original loss function to get the format the gradient boosting algorithm will work with.
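A small sketch of where the hessian comes from, assuming sigma = 1 in the formula above (with a general sigma the hessian picks up a factor of sigma squared). Differentiating the log loss twice with respect to raw_score gives p * (1 - p), and a candidate child node is kept only if the sum of its hessians reaches min_child_weight. The helper names are hypothetical.

```python
import math


def sigmoid(raw_score):
    return 1.0 / (1.0 + math.exp(-raw_score))


def log_loss_grad_hess(raw_score, y):
    """First and second derivative of the log loss w.r.t. raw_score (sigma = 1)."""
    p = sigmoid(raw_score)
    grad = p - y            # d loss / d raw_score
    hess = p * (1.0 - p)    # d^2 loss / d raw_score^2, always in (0, 0.25]
    return grad, hess


def child_allowed(raw_scores, min_child_weight):
    """A child is allowed if the summed hessians are at least min_child_weight."""
    total_hess = sum(sigmoid(s) * (1.0 - sigmoid(s)) for s in raw_scores)
    return total_hess >= min_child_weight


print(log_loss_grad_hess(0.0, 1))
print(child_allowed([0.0, 0.0, 0.0, 0.0], min_child_weight=1.0))
print(child_allowed([6.0, 6.0, 6.0, 6.0], min_child_weight=1.0))
```

Because the hessian shrinks as predictions become confident, a node full of well-classified observations can fall below the threshold even when it holds many rows.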
A Comprehensive Guide to Microsoft’s Swin Transformer
       
Unlike the Vision Transformer (ViT) (Dosovitskiy et al., 2020) which precedes it, Swin Transformer is highly efficient and has greater accuracy. Swin Transformer Architecture & Key Concepts: The Swin Transformer introduced two key concepts to address the issues faced by the original ViT: hierarchical feature maps and shifted window attention. In fact, the name of Swin Transformer comes from “Shifted window Transformer”. As we can see, the ‘Patch Merging’ block and the ‘Swin Transformer Block’ are the two key building blocks in Swin Transformer. Swin Transformer Block: The transformer block used in Swin Transformer replaces the standard multi-head self-attention (MSA) module used in ViT with a Window MSA (W-MSA) and a Shifted Window MSA (SW-MSA) module.
A Brief Overview of Machine Learning
       
Learn the fundamentals of machine learning and artificial intelligence and their potential challenges and caveats. In this article, I will explain machine learning and the different categories of machine learning. We will then see how well the machine learning models perform on the test set (it is really important that machine learning models perform well on the test set). Many companies are investing huge sums of money in machine learning research to enhance their productivity and increase the revenue from the products using machine learning. Conclusion: We have covered machine learning and the types of machine learning.
Can Hybrid-ML Approaches Help When Supervised Data Isn’t Enough for LLMs?
       
Notably, this process requires a large amount of labeled data (supervised), reflecting the characteristics and requirements of the task to replicate. The research around “few-shot learning” [11] techniques has focused mostly on studying and comparing approaches that learn from small supervised datasets and large sets of unlabeled data. In these situations, using hybrid approaches leveraging symbolic representations of text seemed to be more effective and produced the best results. When using hybrid approaches, leveraging enriched symbolic representations of text compensates for the scarcity of supervised data, outperforming even the classical methods based on BoW representation.
Outlier Detection Techniques in Python
       
A guide to outlier detection methods with examples in Python. (Image by Sebastian Voortman on Pexels.) Outlier detection, which is the process of identifying extreme values in data, has many applications across a wide variety of industries including finance, insurance, cybersecurity and healthcare. Finally, outlier detection has been used for rare disease detection in a healthcare context. A commonly used clustering method for outlier detection is DBSCAN, which is an unsupervised clustering method that addresses many of the limitations of IQR. For this reason, any data science team should be familiar with the available methods for outlier detection and removal. If you are interested in learning about the basics of Python programming, data manipulation with Pandas, and machine learning in Python, check out Python for Data Science and Machine Learning: Python Programming, Pandas and Scikit-learn Tutorials for Beginners.
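For reference, the IQR rule mentioned above can be sketched with the standard library alone. The 1.5 × IQR fences are the usual convention and the sample values are made up for illustration; the article's own code may differ.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical one-dimensional sample with a single extreme value.
data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 100]
print(iqr_outliers(data))
```

The rule looks at one variable at a time and assumes a single central bulk of data, which is the kind of limitation that density-based methods such as DBSCAN are meant to relax.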
Build an Image Duplicate Finder System: A Guide
       
Image Duplicate Finder vs. Content-Based Image Retrieval Systems: The main difference between both systems is that an image duplicate/near-duplicate finder detects only identical and near-identical images (Image 2). [Image 2: An Example of the Ins and Outs of the Image Duplicate Finder System (image by author)] [Image 3: An Example of the Ins and Outs of the Content-Based Image Retrieval System (image by author)] Notice how the content-based image retrieval system recognizes the apple and outputs images of different scenes. [Image 4: Euclidean Distance (image by author)] We can represent images as vectors. [Image 5: The General Formula of Euclidean Distance (image by author)] [Image 6: An Example of applying the Euclidean Distance Formula (image by author)] The method formula is straightforward. [Image 8: Cosine Similarity Equation (image by author)] As the angle between two vectors gets small, the similarity gets stronger [9].
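Since the formula images are not available here, the two measures can be sketched directly on plain vectors. This is a minimal illustration using short lists in place of real image feature vectors; the function names are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(euclidean_distance([0, 0], [3, 4]))   # identical images would give 0
print(cosine_similarity([1, 0], [1, 0]))    # same direction
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal vectors
```

A smaller Euclidean distance and a larger cosine similarity both indicate more similar images; cosine similarity ignores vector length and depends only on the angle.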
Transformers and Multimodal: The Same Key for all Data Types
       
A first approach to the analysis of different types of data took place between text and audio thanks to LSTMs. Multimodal Machine Learning: Having now a single architecture capable of working with different types of data represents a major advance in the so-called Multimodal Machine Learning field. (Image by the author.) VATT: Transformers for Multimodal Self-Supervised Learning: One of the most important applications of Transformers in the field of Multimodal Machine Learning is certainly VATT [3]. Internally, once again, there is a Transformer that takes as input data of different types transformed into a sequence of tokens. References: “Multimodal Machine Learning: A Survey and Taxonomy”. [2] Ashish Vaswani et al., “Attention Is All You Need”. [3] Hassan Akbari et al.
Inference Optimization for Convolutional Neural Networks
       
In this blog post, we will look at quantization and fusion methods for convolutional neural networks. Model parameters are stored in floating point numbers, and model operations are calculated using these floating point numbers. Quantization performs some or all of the operations on 8-bit integers, which can reduce the model size and memory requirements by a factor of 4. n = Net().cuda(); summary(n, (3, 224, 224)). The model summary shows that we have about 70 million parameters and an estimated model size of 294MB. To run the quantized model for the eval() setting we need to define the configuration.
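To illustrate why 8-bit storage gives roughly a 4x reduction over 32-bit floats, here is a framework-free sketch of affine (scale and zero-point) quantization. It is a simplified illustration with made-up weights, not the PyTorch quantization workflow the post uses.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # make sure 0.0 is inside the range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (x - zero_point) for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical float32 weights
codes, scale, zero_point = quantize(weights)
restored = dequantize(codes, scale, zero_point)

print(codes, scale, zero_point)
print(restored)
print("bytes per weight: float32 = 4, int8 = 1")
```

Each weight is stored in one byte instead of four, at the cost of a rounding error on the order of `scale`.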
The What, Why and How of Generative Flow Networks
       
Let’s call it the surrogate reward function, because it’s the current best representation of the relationship between chemical structure and antibiotic “reward”. P_F is the forward policy and P_B is the backward policy. Intuitively, the inner part is the ratio of the forward flow (from origin to termination) over the backward flow (from reward to origin). The heart of the reward environment is a function that calculates reward based on input coordinates: """Calculate reward function according to section 4.1 in https://arxiv.org/abs/2106.04399. We’ll see how that works in code in the next section, but just know that the backward policy is necessary!
Replicate a Logistic Regression Model as an Artificial Neural Network in Keras
       
Neural Networks and Deep Learning Course: Part 11. (Image by Gerd Altmann from Pixabay.) Logistic regression is a very simple neural network model with no hidden layers as I explained in Part 7 of my neural network and deep learning course. Here, we will build the same logistic regression model with Scikit-learn and Keras packages. However, we can build the same model in Keras with a neural network mindset because a logistic regression model can be technically considered an ANN. Making train and test parts (Code by author). Build the logistic regression model with Scikit-learn: Now, we use the Scikit-learn LogisticRegression() class to build the logistic regression model on the breast cancer dataset. Building a logistic regression model in Scikit-learn (Code by author). The output of the above model (Image by author). Build the same logistic regression model with a neural network mindset in Keras: This is the main topic that we’ll give more emphasis on today.
Exploring the Dow-Jones Industrial Average using Linear Regression
       
Dow-Jones Industrial Average in 2020 (figure by author). A simple example of Feature Selection and Model Drift. The Dow-Jones Industrial Average (DJIA) was first introduced by Charles Dow in 1896 and has since become one of the main references for stock market performance on the New York Stock Exchange. The Dow is what is known as a price-weighted index, the average of the price of 30 well-known stocks. In order to be able to better evaluate our model, we’ll split our dataset into Training and Testing periods. We use the wonderful statsmodels Python package to perform an Ordinary Least Squares (OLS) Linear Regression fit. Despite all this, we were able to get excellent results, which is a testament to the power of linear models.
Slang Terms for Cryptocurrencies Everyone Should Know
       
A cryptocurrency is a digital currency that may be traded without the involvement of a government or bank. On the other hand, cryptocurrencies are generated using cryptographic processes that allow users to purchase, sell, and exchange them safely. 7. Paper Hands: An investor with paper hands (also known as weak hands) sells their stock at the first hint of difficulty. Keep in mind that having paper hands isn’t always a terrible thing. 8. Satoshi (SAT): The smallest specified unit of Bitcoin cryptocurrency is the Satoshi.
What To Do When You Feel You Can’t Handle Your Life Anymore
       
When I feel like that, I just don’t want to do anything, and nothing can make me change my state. Serotonin: This one is a mood stabilizer that improves sleep, reduces anxiety, and increases happiness. The key is to find what helps you feel comfortable and better and start doing it when you feel that something is missing in your life. Something that has worked for me with this tip is that although I feel that it will not help me feel better, at least it entertains me. And this is the point; you don’t need anyone to help you see life more positively or to solve your problems.
The Great Resignation Is Coming To An End
       
There’s no government magic to save it. (Licensed from Shutterstock // Trismegist san.) In March, 4.5 million people quit their jobs, amounting to an annualized total of 54 million, exceeding 2021’s total of 47 million people who gave their bosses the middle finger in the midst of the most significant labor crunch in generations.
Does The Buffalo Mass Shooting Indicate A Change In White Supremacy?
       
Does The Buffalo Mass Shooting Indicate A Change In White Supremacy? The face of extreme acts of hate and White supremacy has traded White, middle-aged men wearing sheets and pointed hats for armed, younger White males wearing camouflage.
There is a Major Problem With Modern Book Cover Design
       
Sometimes, maybe you *should* judge a book by its cover. Traditionally, book covers have been the predominant form of marketing a book. In the 1950s, book cover design focused on illustrations done in a vintage style. Ultimately, a book cover should do two things: tell you what kind of book you’re going to be reading, and look good. Modern book design is experiencing another big issue, however.
How ADHD Contributes to Car Crashes, and Why it Matters
       
Driving while distracted. (Image by hoffmann-tipsntrips from Pixabay.) Dozens of studies confirm that those with ADHD are more likely to be involved in car accidents, including lethal ones. Any number of attributes of ADHD: impulsivity (“I need to get off at this exit now! So what if I’m in the left lane?”), poor sustained attention (“Highway driving is soooo boring!”)…
Tell a Story Instead of Just Showing the Data
       
Be a storyteller, not a number cruncher. Take a look at the following graph of the number of homicides and homicide rate per year in Baltimore, Maryland: What story do you see? (Data via the FBI Uniform Crime Report.) If you are not from Baltimore, you might wonder what is going on and why the number and rate of homicides jump in 2015. You might not know of the daily shootings and almost daily…
Analytical Excellence Is All about Speed
       
Career advice to guide data analysts to the top. In Data Science’s Most Misunderstood Hero, I described the excellences of each of the three areas in data science. Data pro vs amateur differences #4–#6: Understanding the career; refusing to be a data charlatan; resistance to confirmation bias. Data pro vs amateur differences #7: Realistic expectations of data. Data pro vs amateur differences #9: Thinking differently about time. Data pro vs amateur differences #10: Scroll to the top if you missed it.
The impossible scam of US drug plans
       
Even the guy who invented them can’t figure them out. US health insurance is a dismal swamp of scams and opacity, a system whose patient outcomes are in freefall and whose patient costs are screaming upwards on a line that is asymptotic to infinity. As bad as the whole health insurance system is, drug plans are worse.
Overcome the Pressure to Be Happy
       
(Photo by Hybrid on Unsplash.) Happiness is hard work. You might find it in a satisfying career, keeping your body healthy, or connecting with other people, but all of these things require effort and energy. It’s important to pay attention to how your behavior is impacting your happiness so that you can adjust your efforts accordingly. But there’s a point at which this attention can go too far and turn into an unhealthy preoccupation.
rajini++: The Superstar Programming Language
       
Introducing rajini++, the superstar programming language! (Source) Introducing rajini++, an esoteric programming language based on the dialogues of superstar Rajinikanth. returning ix = 100.0 to main; Value returned from myfunc_one: 100.0. Running rajini++ programs: The rajini++ programs are stored in .rpp files. Learn more about rajini++. Learn the rajini++ Language: The rajini++ language documentation with syntax and examples can be found at the rajiniPP Wiki. The rajini++ Language Spec: The rajini++ commands and their equivalents in python3 can be found here.
Google AI Blog: Vector-Quantized Image Modeling with Improved VQGAN
       
In “Vector-Quantized Image Modeling with Improved VQGAN”, we propose a two-stage model that reconceives traditional image quantization techniques to yield improved performance on image generation and image understanding tasks. This approach, which we call Vector-quantized Image Modeling (VIM), can be used for both image generation and unsupervised image representation learning. We describe multiple improvements to the image quantizer and show that training a stronger image quantizer is a key component for improving both image generation and image understanding. To test the image understanding capabilities of VIM, we also fine-tune a linear projection layer to perform ImageNet classification, a standard benchmark for measuring image understanding abilities. Conclusion: We propose Vector-quantized Image Modeling (VIM), which pretrains a Transformer to predict image tokens autoregressively, where discrete image tokens are produced from improved ViT-VQGAN image quantizers.
DALL·E 2 Research Preview Update
       
Last month, we started previewing DALL·E 2 to a limited number of trusted users to learn about the technology’s capabilities and limitations. As of today: Our users have collectively created over 3 million images with DALL·E. We’ve enhanced our safety system, improving the text filters and tuning the automated detection & response system for content policy violations. Less than 0.05% of downloaded or publicly shared images were flagged as potentially violating our content policy. We’re inspired by what our users have created with DALL·E so far, and excited to see what new users will create.
Use Amazon Lex to capture street addresses
       
Solution overview: For this example, we’ll use an Amazon Lex bot that provides self-service capabilities as part of an Amazon Connect contact flow. Solution architecture: We’ll use an Amazon Lex bot integrated with Amazon Connect in this solution. The following AWS Regions support Amazon Lex, Amazon Connect, and Amazon Location Service: US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo). Enter the ARN (Amazon Resource Name) for the Amazon Connect instance that you’ll use for testing the solution. However, you can easily integrate Amazon Lex with the Amazon Location Service to look up the correct address, based on the customer’s input.
20 Open-Source Single Speaker Speech Datasets
       
A comprehensive collection of open-source multi-lingual speech data. (Photo by Jason Rosewell on Unsplash.) Speech synthesis, also known as text-to-speech (TTS), is one of the new key technologies in the artificial intelligence domain. I have consolidated 20 open-source single speaker multi-lingual speech datasets which are available publicly. Jejueo Single Speaker Speech Datasets is part of the initiative by the Center for Jeju Studies. Nevertheless, this data provides a good start since you can barely find a publicly available single speaker speech dataset for the Greek language.
╔═══════════════╦═══════════╦═══════════╦═══════════════════════╗
║ Sample rate   ║ Format    ║ File size ║ License               ║
╠═══════════════╬═══════════╬═══════════╬═══════════════════════╣
║ 22050         ║ wav       ║ 1.11 GB   ║ CC BY 4.0             ║
╚═══════════════╩═══════════╩═══════════╩═══════════════════════╝
Causal Inference with Linear Regression: Endogeneity
       
In my previous article, we discussed some common issues when designing a linear regression: Omitting Important Variables and Including Irrelevant Variables. In this article, we’ll discuss Endogeneity in a linear regression model, especially in the context of Causal Inference. Endogeneity refers to situations in which a predictor (e.g., treatment variable) in a linear regression model is correlated with the error term. If the confounding variable Z is added in the linear regression model, then the affected predictor (e.g., treatment variable) would no longer be endogenous. Then in the linear regression with measurement error, the OLS estimator, β_hat, is no longer unbiased.
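A small simulation can make the omitted-confounder case tangible. Everything below is an illustrative assumption (the coefficients 2 and 3, the sample size, the helper names): X is driven by a confounder Z, Y depends on both, and regressing Y on X alone gives a biased slope, while controlling for Z recovers the true effect.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    slope = ols_slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]            # confounder
x = [zi + random.gauss(0, 1) for zi in z]             # treatment depends on Z
y = [2.0 * xi + 3.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

naive = ols_slope(x, y)  # Z is left in the error term, so X is endogenous
# Controlling for Z by partialling it out of both X and Y
# (Frisch-Waugh-Lovell); equivalent to including Z in the regression.
adjusted = ols_slope(residuals(x, z), residuals(y, z))

print(round(naive, 2), round(adjusted, 2))
```

In this setup the naive slope converges to 2 + 3 · Cov(X, Z) / Var(X) = 3.5 rather than the true effect of 2, which is the bias that adding Z to the model removes.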
Can AI and ML Save the Clean Transition Acceleration, or Just Keep the Light on During the Storms?
       
What we do provide, though, is the ability to examine examples, and create machine learning (ML) models based on the inputs and desired outputs. Renewable Energy’s Strong Fundamentals: Since the 2019 Renewable Energy Industry Outlook, there are “Strong fundamentals bolstered by three enabling trends”. The integration of AI can help renewable energy suppliers expand the marketplace by introducing new service models and encouraging higher participation. In short, AI and ML can both save the Clean Transition acceleration AND keep the light on during the storms ahead.
Generalizing Your Model: An Example With EfficientNetV2 and Cats & Dogs
       
Let’s insert an image augmentation pipeline into the greater data pipeline prior to model training. We are able to decrease validation loss by 45% when increasing model complexity from 5.92E+6 params in the b0 baseline model to 1.18E+8 params in the L model. It makes intuitive sense that differentiating cats & dogs would benefit from more model complexity. With the help of cats & dogs we’ve explored an image augmentation pipeline to reduce overfitting. Even with the best model and augmentation pipeline, our model can still get some samples wrong.
SBERT vs. Data2vec on Text Classification
       
Hugging Face is the home for thousands of pre-trained models which have made great contributions to democratizing artificial intelligence through open source and open science. Today, I want to give you an end-to-end code demo to compare two of the most popular pre-trained models by conducting a multi-label text classification analysis. The second model is Data2vec, a powerful pre-trained model offered by the AI team from Meta (Facebook). Overall, after 5 tries, I can conclude that SBERT has a bit better performance in terms of best f1 score while Data2vec used way less memory. Text classification.
Train XGBoost Models in Amazon SageMaker in 4 Simple Steps
       
How to train & deploy XGBoost models as endpoints using SageMaker. (Photo by Lala Azizli on Unsplash.) Getting started with Amazon SageMaker can be challenging as there are many tricks that AWS just expects you to know… In return, once you get a handle on them, you can significantly speed up the deployment of your ML models without having to worry about Docker and setting up compute resources. Most tutorials are direct recitations of AWS documentation and not very applicable if you want to tailor your models to a realistic problem. Let’s build a simple XGBoost model that tells people whether they should get a Beagle or a German Shepherd based on how big their home is. For simplicity, we’ve set Beagle to be most suitable for homes smaller than 500 sq.ft and German Shepherd for those that are larger than 500 sq.ft. (Sources: Left [1] & Right [2].) Before we dive in, you might be wondering: how much will this SageMaker learning cost me?
NeuralProphet: Forecasting Energy Demand
       
The gap between classical forecasting techniques and deep learning models. (Network image by JJying.) Introduction: In this article, we use NeuralProphet (by Meta AI) to forecast energy demand. Forecasting energy demand is extremely important as the demand for electricity increases. The energy demand generally increases every year until June, when it then decreases for the rest of the year. We can see that energy demand is at its lowest in April and October, and energy demand is at its highest in July. (Best Model Predictions, by author.) Model Performance Comparison: In the next cell, I am going to compare the NeuralProphet model with other common forecasting strategies.
Word2Vec with Time Series: A Transfer Learning Approach
       
Learn Meaningful Embedding Representations for Time Series. (Photo by Shyam on Unsplash.) The vector representation is a crucial concept in the machine learning ecosystem. With the rise of deep learning, obtaining a meaningful data representation with few assumptions and less effort is becoming a reality. The adoption of deep learning embedding representations in the NLP field was revolutionary. 2D embedding visualization for all the observations in every time series (image by the author). These visualizations prove the goodness of our approach. With few assumptions and few parameters to set, we can generate meaningful time series embeddings.
How to do fast multiplication using the FFT
       
Current Deep Learning workflows rely on thousands of integer multiplications, so efficient multiplication is critical. This article shows how to perform integer multiplications using the most important signal-processing discovery of the 20th century, the Fast Fourier Transform. Deep Learning convolutions are not the only workloads that depend on integer multiplication; other scientific and computing applications, such as rendering fractal images at high magnification and public-key cryptography, rely on it as well. Hence, the idea is to change from the coefficient representation to the value representation, perform the multiplication in a pairwise fashion, and transform the value representation back to the coefficient representation. Thus, the c vector can also be obtained as the Inverse Discrete Fourier Transform (IDFT) of this pairwise multiplication, c = IDFT(DFT(a)DFT(b)).
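The identity c = IDFT(DFT(a)DFT(b)) can be sketched in plain Python for non-negative integers. The recursive radix-2 FFT and the base-10 digit representation below are illustrative choices, not necessarily those of the article, and floating-point rounding limits this sketch to moderately sized inputs.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft_multiply(x, y):
    """Multiply two non-negative integers via c = IDFT(DFT(a) * DFT(b))."""
    a = [int(d) for d in reversed(str(x))]  # coefficient form, lowest digit first
    b = [int(d) for d in reversed(str(y))]
    n = 1
    while n < len(a) + len(b):  # room for every coefficient of the product
        n *= 2
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    values = [p * q for p, q in zip(fft(a), fft(b))]  # pairwise product
    coeffs = [round(v.real / n) for v in fft(values, invert=True)]
    # Evaluating the product polynomial at 10 performs the carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))


print(fft_multiply(1234, 5678))
```

The pairwise product costs O(n) and each transform O(n log n), compared with O(n²) for schoolbook multiplication of the digit vectors.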
Intriguing Properties of Neural Networks
       
How do Neural Nets Work? Two such properties are covered in ‘Intriguing properties of neural networks,’ which we’ll be discussing in this article. Consider a neural network classifier f that maps an input image x ∈ ℝᵐ to a set of labels {1 . Thus, we can frame a box-constrained optimization⁶ problem: Formal definition of the adversarial example generation problem (Szegedy et al.). These examples are fed to different models (mentioned across columns), and the corresponding error induced is listed in the above table (Szegedy et al.).
Make Your Matplotlib Plots Stand Out Using This Cheat Sheet
       
Cheat sheet for editing the background, ticks, and annotations in Matplotlib. (Image by Hunter Harriet on Unsplash.) Matplotlib is the most extensive plotting library in Python, and arguably one of the most frequently used. If you’re like me and you often forget the precise code to format plots, this piece is written specifically for you. Background: One of the easiest and simplest ways to make your graphs stand out is to change the default background. Smaller numbers should be rounded to the nearest decimal; nothing beyond 3 decimal places is ever really needed. Result of above code (image by the author).
A Quiet Morning in America
       
A Quiet Morning in America. It gets under your skin. I pour myself another cup of coffee: two scoops into the Aeropress, a gentle pour of boiling water, a quick stir. I leave the plastic stirrer in the tube like a tombstone while the water percolates through the grounds. Quiet mornings are hard to come by.
The Unbearable Fraudulence of J.D. Vance
       
The Unbearable Fraudulence of J.D. Vance. Even in a party with no shortage of heels, the Senate nominee’s slide from sympathetic storyteller to shameless sycophant is especially repugnant. There’s an old Washington saying that whenever a senator looks in the mirror, they see a president. If it’s true, then a good chunk of Senate…
A Note About the Attack on My Hometown
       
A Note About the Attack on My Hometown. Buffalo, 05.14.22. Photo by Jonathan Rivera on Unsplash. [*deep breath*] This isn’t an essay. This won’t be edited or even read before I post. I’m not offering the definitive take, nor a particularly novel one. I’m ruminating from a perspective I specifically have because of where specifically this happened.
Elon Musk, the Poop Emoji, and the Embarrassing Twitter Sale Pantomime
       
Elon Musk, the Poop Emoji, and the Embarrassing Twitter Sale Pantomime. Forget bot control; the world needs troll control. Image created by author. In a strange turn of events, I find myself writing about the poop emoji for the second time this year. But this is less about tongue-in-cheek product advertising and more about a multi-billion dollar maverick making everyone (himself included) look like absolute…
A Final Farewell to the iPod
       
A Final Farewell to the iPod. The device that saved Apple and made the company what it is today is now officially part of the company’s past. One of the most memorable advertising campaigns that I can remember for a tech product came back in the mid-2000s. It was a commercial filled with colors and silhouettes of people dancing to music that was coming from their white headphones. The message was so simple but drove home the point that this…
The three-body problem in software development
       
The three-body problem. To be able to explain the three-body problem and how it could relate to software development, let me start by explaining the one-body problem. But if we add another massive object, turning the whole thing into the three-body problem, things become unpredictable and chaotic. The three-body problem, for most initial conditions, does not have a general closed-form solution like the one-body or two-body problem. Building up the n-body problem in software development: How does the three-body problem relate to software development? On the other hand, the three-body problem seems to be an unsolvable problem.
Why We Should Read Fewer Books
       
Why We Should Read Fewer Books. On re-reading those authors we’ve loved. In September of 1914, the 25-year-old Ludwig Wittgenstein entered a bookshop in Turnov during his active duty in WWI and found only one book there. The book was Tolstoy’s Gospels in Brief. He bought it, read it, and re-read it during the war and the dreadful period of his imprisonment in Italy. He noted in his war journal: “I always carry [it]…
The 10-Step Guide to Better Sleep
       
The 10-Step Guide to Better Sleep. Any one of these can help you fall asleep faster and sleep more soundly. Together they create a virtuous 24-hour cycle. Image: Pixabay. Sleep is not an isolated act of unconsciousness, but part of a 24-hour cycle programmed by evolution into all the systems of the brain and body. In writing dozens of articles on sleep science and healthy…
Google AI Blog: Contextual Rephrasing in Google Assistant
       
Conversation on a smart display device, where Assistant understands multiple contextual follow-up queries, allowing the user to have a more natural conversation. We demonstrate how Assistant is now able to rephrase follow-up queries, adding contextual information before providing an answer. High-level architecture of the Google Assistant contextual rephraser. Candidate Scoring: We extract a number of signals for each rephrasing candidate and use an ML model to select the most promising candidate. Example conversation on a phone where Assistant understands a sequence of contextual queries.
Customize pronunciation using lexicons in Amazon Polly
       
However, in some situations you may want to customize the way Amazon Polly pronounces a word. For such scenarios, Amazon Polly supports phonetic pronunciation, which you can use to achieve a pronunciation that is close to the correct pronunciation in the foreign language. Now, let’s look at how in such scenarios we can use phonetic pronunciation with an SSML tag to customize the speech produced by Amazon Polly. A good practice is that after you test the custom pronunciation on the Amazon Polly console using the tag, you create a library of customized pronunciations using lexicons. Upload and apply the lexicon file: Upload your lexicon file to Amazon Polly using the following instructions. On the Amazon Polly console, choose Lexicons in the navigation pane.
Why is statistics important in Data Science, Machine learning, and Analytics
       
It also encompasses predictive analytics, in which data scientists employ a variety of machine learning or statistical algorithms. The Data Science Lifecycle: To comprehend the role that statistics plays in data science, you must first have a thorough understanding of the data science lifecycle. Advanced machine learning algorithms in data science utilize statistics to identify and convert data patterns into usable evidence. Problem-solving: In addition to pure computations and fundamental data analysis, data scientists use applied statistics to relate abstract discoveries to real-world problems. Photo by charlesdeluvio on Unsplash. A few important use cases: Statistics in data science, at its root, finds structure and relations in unstructured data.
An Introduction to Word2Vec in NLP
       
An Introduction to Word2Vec in NLP. An intuitive mathematical explanation of Word2Vec. Photo by Sven Brandsma on Unsplash. The tailor showed her how to sew a button onto her jacket. The effectiveness of Word2Vec is due to two reasons. One is the use of fixed-size vectors, which means the vector size does not depend on the number of unique words in the corpus. The main idea of Word2Vec revolves around predicting the context (outside) words based on a center word, or vice versa, in a fixed-size window. CBOW predicts the center word come given the context words dream and true, and Skip-Gram predicts the probability of the context words dream and true given the center word come. Obtaining vector representations of words using Word2Vec is highly efficient because the vectors so formed are dense and carry semantic information, which is crucial to any NLP application.
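To make the two prediction directions concrete, the sketch below builds the training examples each variant would see from a fixed-size window. It is an illustration only: the function name `training_pairs` and the three-token sentence are assumptions, and no embedding training is performed.

```python
def training_pairs(tokens, window=1):
    """Build Skip-Gram and CBOW training examples from a token list.

    Skip-Gram: (center, context) pairs, predicting each context word
    from the center word.
    CBOW: (context_words, center) pairs, predicting the center word
    from all context words in the window.
    """
    skipgram, cbow = [], []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        # Words to the left and right of the center, inside the window.
        context = tokens[start:i] + tokens[i + 1:i + 1 + window]
        skipgram.extend((center, word) for word in context)
        cbow.append((context, center))
    return skipgram, cbow


skipgram_pairs, cbow_pairs = training_pairs(["dream", "come", "true"], window=1)
```

For the center word come, Skip-Gram yields the pairs (come, dream) and (come, true), while CBOW yields the single example ([dream, true], come).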
Why You Should STOP Using ‘OR’ in SQL Joins Right Now
       
Why You Should STOP Using ‘OR’ in SQL Joins Right Now. Taking a “much-needed” step towards query run-time optimization. Photo by Jose Aragones on Unsplash. In my experience, after Python, Structured Query Language (SQL) is the most sought-after tool leveraged by numerous Data Scientists in their Machine Learning/Data Science projects. Having said that, let’s move to why I brought you here and, of course, why I said that you should immediately stop using the conditional “OR” specifically in SQL joins. SQL query involving inner join (image created by author using snappify.io). At first glance, I am sure you would think, what is wrong with this query, right? I conducted a similar experiment using a conditional “AND” instead of a conditional “OR”, and it was executed within a few minutes. Question 3: If joins are usually optimized in SQL, why did SQL still adopt the nested operation in Experiment 1?
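The excerpt does not show the article's query or its fix, so the following is only a hedged sketch of one common rewrite: replacing an OR join condition with a UNION of two single-condition joins. The table names, columns, and rows are hypothetical, and SQLite (through Python's built-in sqlite3 module) is used purely to show that both forms return the same rows here; it says nothing about run time on the author's database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT, phone TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, phone TEXT);
    INSERT INTO orders VALUES
        (1, 'a@x.com', '111'), (2, 'unknown', '222'), (3, 'nobody', '000');
    INSERT INTO customers VALUES
        (10, 'a@x.com', '111'), (20, 'b@x.com', '222');
    """
)

# Join with a conditional OR: a single join with two alternative predicates.
or_join = """
    SELECT o.id, c.id
    FROM orders o
    JOIN customers c ON o.email = c.email OR o.phone = c.phone
    ORDER BY 1, 2
"""

# Rewrite: one equality join per predicate, combined with UNION.
# UNION (not UNION ALL) drops pairs that match on both columns.
union_join = """
    SELECT o.id, c.id FROM orders o JOIN customers c ON o.email = c.email
    UNION
    SELECT o.id, c.id FROM orders o JOIN customers c ON o.phone = c.phone
    ORDER BY 1, 2
"""

or_rows = conn.execute(or_join).fetchall()
union_rows = conn.execute(union_join).fetchall()
```

Note that UNION also removes genuinely duplicated output rows, so the two forms are only interchangeable when the selected columns identify each matched pair uniquely, as the primary keys do in this toy data.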
Building simple Business Intelligence Project using Azure Synapse and Power BI
       
Building simple Business Intelligence Project using Azure Synapse and Power BI. Spotify Data Analysis using Synapse and Power BI. Photo by Mike Kononov on Unsplash. In this article, I created a very simple end-to-end Business Intelligence (BI) project using Azure Synapse and Power BI. Image by Author. Below are the steps followed in this project (please refer to the image above): extracted data from the Spotify API through an Azure Synapse notebook (powered by Apache Spark); transformed the data, again using a Synapse notebook; loaded the data into Azure Data Lake from the Synapse notebook; analysed the data using a Synapse notebook; connected the data to Power BI from the data lake and built the dashboard. Here, I copied the code from my Synapse notebook and pasted it into this article, as there is no option for embedding a Synapse notebook into Medium. Getting track features: Spotipy provides a sp.search() method for querying the track features. The API provides three artist features: popularity, genres, and followers.
Data Quality Comparison on AWS Glue and Great Expectations
       
Challenges: The Provectus Data Engineering team was working on a data pipeline enabled by the Pandas engine. Initially, Pandas was enough, but as the amount of data grew, the team was confronted with the limitations of Pandas for handling Big Data. We ruled out changing our basic solution too much, because Pandas Profiling works only with Pandas, and we still had not tried using Great Expectations with Apache Spark. Instead, we decided to use spark_df_profiling, which is based on Pandas Profiling but supports Spark. Great Expectations also supports Spark, so the Data Test part was easy.
A Small Python Library For Marketing Mix Modeling: MaMiMo
       
I noticed that people are really interested in my articles about marketing mix modeling, and that’s why I have created a small present for you: a small library that helps you create simple marketing mix models yourself! In case you don’t know what marketing mix modeling is: imagine that you are in a company that sells stuff. Marketing mix modeling is a simple way to do that. We then defined some saturation and carryover transformations to conduct marketing mix modeling.

    from scipy.stats import uniform, randint
    from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

    # `model` is the pipeline built earlier in the article (not shown in this excerpt).
    tuned_model = RandomizedSearchCV(
        model,
        param_distributions={
            'adstock__tv_pipe__carryover__window': randint(1, 10),
            'adstock__tv_pipe__carryover__strength': uniform(0, 1),
            'adstock__tv_pipe__saturation__exponent': uniform(0, 1),
            'adstock__radio_pipe__carryover__window': randint(1, 10),
            'adstock__radio_pipe__carryover__strength': uniform(0, 1),
            'adstock__radio_pipe__saturation__exponent': uniform(0, 1),
            'adstock__banners_pipe__carryover__window': randint(1, 10),
            'adstock__banners_pipe__carryover__strength': uniform(0, 1),
            'adstock__banners_pipe__saturation__exponent': uniform(0, 1),
        },
        cv=TimeSeriesSplit(),
        random_state=0,
        n_iter=100,
    )
ML & Neuroscience: April 2022 must-reads
       
ML & Neuroscience: April 2022 must-reads. This month: Microsoft, INRI and IIIT tackle the brain-computer interface and Visio-linguistic Transformers to solve the brain encoding problem, and the very first public graph neural network framework to explore brain structural and functional networks. We can conclude that multi-modal Visio-linguistic Transformers outperform current ML methods employed to decipher the brain encoding problem. In particular, the authors present BrainGB, a unified, modular, scalable, and reproducible framework for brain network analysis with GNNs. Several studies have tried to predict brain disease by learning the brain network graph structure. BrainGB: a unified, modular, scalable, and reproducible framework for brain network analysis with graph neural networks.
A Deep Dive into Curve Fitting for ML
       
A Deep Dive into Curve Fitting for ML. Curve fitting is the problem that underlies all of machine learning. Photo by Osman Rana on Unsplash. Curve fitting is one of the most theoretically challenging parts of machine learning, primarily due to how important it is to the end result. Add the Curse of Dimensionality into the mix, and curve fitting goes from possibly intuitive to impossibly inaccessible. There’s little content about how to think about curve fitting. As is often the case, these are easiest to visualize in two dimensions, but curve fitting often has to be done in more. There is no fitting problem to be had as, if f(x) is known, then it can be applied without any guessing.
Data Augmentations in Torchvision
       
Data Augmentations in Torchvision. This blog aims to compare and familiarise readers with different data transformation techniques used by the research community. Image by author. On the other hand, automated policies are optimized to get the highest validation accuracy for specific tasks without human interference. Manual augmentations: There are over 30 different augmentations available in the torchvision.transforms module. Automatic augmentation: AutoAugment. AutoAugment policy with 5 sub-policies. Essentially, we train the network on different combinations of N and M and pick the one with the best validation accuracy.
Resistance is the road to freedom
       
[This week I begin a tour of Austria and Germany to discuss my book Fascismus: Und Wie Mann Ihn Stoppt. Like my generation, I may have failed, but we have been protagonists: we have fought for ideals against overwhelming odds. This is not just a war of resistance by one country against aggression by another; and not just a war for ethnic and linguistic survival by Ukrainians against fascist-inspired Russian ethno-nationalism. Faced with the rise of two totalitarian nuclear states, are we prepared to cease actively supporting democratic oppositions within them? Resistance to Putin does not only mean manufacturing and supplying arms and ammunition to Ukraine for as long as its people want to resist.
Not All Stablecoins are Created Equal
       
Not All Stablecoins are Created Equal. Assessing risk in the wake of the UST implosion. Image created by the author. I’ve been in the crypto space since 2014 and there have only been a handful of times where we had to post the suicide hotlines. Bitconnect and the 2018 collapse come to mind. And here we are once again.
How Often Should You Pee?
       
How Often Should You Pee? What ‘having to go’ says about your health. Image: Pixabay. Waking up once or twice during the night to pee is annoying, especially for me if it happens around dawn, when the odds of falling back asleep are near zero. But one or two nocturnal bathroom trips are normal. It’s also natural to urinate a lot during the day. Exactly how often is a wee bit unclear, however.
An overachiever’s guide to rest
       
And so here are five statements about overachievers, why we might be more prone to burnout, and why it’s incredibly hard for us to simply rest. One phrase that strikes a certain amount of fear and loathing in an overachiever’s heart is: group work. Doing quality work is central to a vibrant professional life; people know that your A-game is going to be amazing. It’s hard to justify rest when there’s just so much to be done to improve the quality of your work. But it can also become integrated into an overachiever’s core identity, to the individual’s own harm.
How to Take a Break from Yourself
       
How to Take a Break from Yourself. Escaping from your routines can help you escape from repetitive, unhelpful thoughts. Photo: Usman Omar / Unsplash. I don’t always find myself to be good company. Like a friend who only talks about his own hang-ups, my inner monologue can get pretty tiresome. Honestly, that again? Can’t we give it a rest?
We Won’t Fix the Baby Formula Shortage Until We Fix Capitalism
       
We Won’t Fix the Baby Formula Shortage Until We Fix Capitalism. Photo credit: JVG. The baby-formula shortage is just the latest in a string of economic troubles that are showing us all just how unjust an unregulated free market can be. We don’t actually want a free-for-all system, and we’re just waking up to that, crisis-by-crisis. I took this photo at my local grocery store in March 2020, when the entire nation was freaking out about…
The Marketplace of Ideas Requires Regulation Like Any Other Market
       
The Marketplace of Ideas Requires Regulation Like Any Other Market. And anyone who denies this hasn’t thought it through. Image: Just dance, Shutterstock, standard license, purchased by author. When I was a renter, I had a decidedly mixed experience with landlords. Some were excellent. Some, however, were awful, regularly failing to fix heating or AC units when they went out.
7 Dimensions to Evaluate an AI Environment
       
7 Dimensions to Evaluate an AI Environment. When evaluating an AI problem, there are several characteristics that can help evaluate an AI environment. Partially Observable: A fully observable AI environment has access to all required information to complete the target task. 3. Competitive vs. Collaborative: Competitive AI environments pit AI agents against each other in order to optimize a specific outcome. Dynamic AI environments often need to enable faster and more regular training of AI agents. Understanding an AI environment is one of the most challenging steps in any AI problem.
PyScript Is Ok-ish To Make Your Pages Interactive, but Only as a Last Resource if You Don’t Know Any Javascript
       
PyScript Is Ok-ish To Make Your Pages Interactive, but Only as a Last Resource if You Don’t Know Any Javascript. PyScript is too slow and heavy, and it doesn’t support all the features and libraries you may want to use. There’s been a lot of hype these days about the possibility of running Python code inside web pages thanks to PyScript, a web component that injects into your web page a series of HTML tags where you can… well… run Python code. A concrete example comparing PyScript vs. JavaScript hands-on: Let’s study this concrete example, and let’s compare PyScript vs. a piece of code in JavaScript, the browser’s core programming language. So you go to an alternative, such as Bokeh, and you write a piece of Python code inside the PyScript tags injected into the HTML. As Thuwarakesh states in his article, the possibility of running Python code inside the web browser is useful for various reasons.
Things Data Scientists Say, Clever Or Not So Clever?
       
Things Data Scientists Say, Clever Or Not So Clever? And other silly thoughts about our work. Saying a data scientist is not like a data analyst is beside the point. Read below for more silly things Data Scientists sometimes say. “A Data Scientist Is Different From A Data Analyst”: Well, yes… and no. “I’m a scientist”: Okay, a data scientist is a scientist, usually.
How are AI Projects Different
       
How are AI Projects Different. Guide to AI software development. Michael Dziedzic on Unsplash. I am often asked by prospective clients to explain the artificial intelligence (AI) software process, and I have recently been asked by managers with extensive software development and data science experience who wanted to implement MLOps. Therefore, I thought the AI software process would be a good topic for discussion in an article. This article is intended as an outline of the key differences rather than a comprehensive discussion on the topic of the AI software process. Data quality: ensuring the data received in production is processed in the same way as the training data. [3] How to write better scientific code in Python. [4] Considerations for Deploying Machine Learning Models in Production.
Personalize your machine translation results by using fuzzy matching with Amazon Translate
       
Translators who enhance their workflow with machine translation capabilities such as Amazon Translate often expect fuzzy matching data to be used as part of the automated translation solution. In this post, you learn how to customize output from Amazon Translate according to translation memory fuzzy match quality scores. The first segment with 99% match quality isn’t machine translated, whereas the second segment is, because its match quality is below the defined threshold. Amazon Translate supports customization of machine translation using translation memory thanks to the parallel data feature. Conclusion: In this post, you learned how to customize your Amazon Translate translation jobs based on standard XLIFF fuzzy matching quality metrics.
Graph Convolutional Network for Time Series — An Intro
       
Graph Convolutional Network for Time Series: An Intro. The graph convolutional network (GCN) is an absolute game-changer in the deep learning domain. The solution treats each road segment’s traffic speed as a separate time series. The traffic network was defined as a graph and traffic speed as a signal on this graph. Keras.io time-series traffic forecasting tutorial. Summary: The graph convolutional network is a unique and revolutionary concept. [2] A paper named “T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction” by Ling Zhao et al. was published in IEEE, 2020.
Building Memory-Efficient Meta-Hybrid Recommender Engine: Back to Front (part 2)
       
Building Memory-Efficient Meta-Hybrid Recommender Engine: Back to Front (part 2). Image by author. Series overview: In the previous part, we reviewed the mechanics of a memory-based recommender system and built a custom collaborative filtering recommender. At RBC Group we are open to sharing our experience in designing so-called meta-hybrid recommender engines for real-business problem-solving. This suggests that the accuracy improvement of a collaborative filtering recommender involves: either adjusting “users’ neighbourhoods” i.e. This is why we describe such a meta-hybrid recommender as memory-efficient. Each time we will measure the distribution of precision and recall to compare results. The meta-hybrid recommender clearly outperforms to some degree.
How to Perform Feature Selection in a Data Science Project
       
How to Perform Feature Selection in a Data Science Project. Four methods and a whole process for Feature Selection, with examples in Python. Photo by Vladislav Babienko on Unsplash. Feature selection is an essential part of a Data Science project. The answer to the second question leads us to feature selection; in fact, you don’t want meaningless features in your data frame: they are a waste of computation. In this article, we will see four methods for feature selection, and I will also describe a process to follow (generally speaking, it is unlikely that applying just one of the following methods will do the job). Lasso Regression: If you are working on a linear regression problem, another way to perform feature selection is to apply the Lasso regularized model. This means that the Lasso Regression model also performs feature selection.
Design Considerations of Model Deployment System
       
As the engineer tasked with building the model deployment pipeline, it’s critical to understand that this is a cross-functional effort and requires collaborating with multiple organizational stakeholders. In this article, we will dive into the key requirements of model deployment systems, coming from three stakeholders: the Data Science team, the DevOps team and the Product team. A mature model deployment system must be able to easily integrate with multiple ML frameworks and model development environments, giving data science teams the flexibility to choose any tools they find suitable for their model training needs. Depending on the use case, there may be different latency SLAs (Service Level Agreements) that the model serving system needs to take into consideration. A bit of background: my name is Chaoyu Yang, and I’m the creator of the open source model serving framework BentoML.
Implementing Various NLP Text Representation in Python
       
Implementing Various NLP Text Representation in Python. One-Hot Encoding, Bag-of-Words, N-Grams, and TF-IDF. Image by Amador Loureiro on Unsplash. Natural language processing (NLP) is a subset of machine learning that deals with language and semantics. Bag-of-Words (BoW): The BoW text representation is similar to the OHE representation. For the document “blue sky blue sea”, the OHE representation for the word “blue” will return 1 and the BoW representation for the word “blue” will return 2. Similar to previous representations, each column of the TF-IDF representation corresponds to a word in the bag of words of the corpus. Term Frequency. Inverse Document Frequency: The document frequency counts how many documents contain the word t. The inverse document frequency divides N (total number of documents) by the document frequency.
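The “blue sky blue sea” example can be reproduced with a short standard-library sketch. The function names and the two-document corpus are assumptions made for illustration, and the logarithm applied to N divided by the document frequency is a common convention that the excerpt itself does not state.

```python
import math
from collections import Counter


def one_hot(document, vocabulary):
    """1 if the word occurs in the document at all, else 0."""
    words = set(document.split())
    return [1 if word in words else 0 for word in vocabulary]


def bag_of_words(document, vocabulary):
    """Number of times each vocabulary word occurs in the document."""
    counts = Counter(document.split())
    return [counts[word] for word in vocabulary]


def inverse_document_frequency(word, corpus):
    """log(N / df); assumes the word occurs in at least one document."""
    document_frequency = sum(1 for doc in corpus if word in doc.split())
    return math.log(len(corpus) / document_frequency)


corpus = ["blue sky blue sea", "green grass"]  # hypothetical corpus
vocabulary = ["blue", "sea", "sky"]

ohe_vector = one_hot(corpus[0], vocabulary)       # "blue" maps to 1
bow_vector = bag_of_words(corpus[0], vocabulary)  # "blue" maps to 2
idf_blue = inverse_document_frequency("blue", corpus)
```

Multiplying each BoW count (the term frequency) by the corresponding inverse document frequency would give a basic TF-IDF vector; libraries typically add smoothing and normalization on top of this.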
6 Techniques to Power Feature Engineering in Your Next Machine Learning Project
       
6 Techniques to Power Feature Engineering in Your Next Machine Learning Project. Techniques for creating new features from data. Photo by Vishnu Mohanan on Unsplash. What is Feature Engineering? Feature Engineering in machine learning is creating new features from existing data. Of all these steps, arguably the most important is the Feature Engineering step. Whether predicting the behavior of a user or a machine, Feature Engineering is crucial to the success of your project. While numeric scaling of data isn’t feature engineering, it is important, since many algorithms perform poorly on data that isn’t scaled.
Model Prediction and Distribution with Spark
       
Model Prediction and Distribution with Spark. How to Implement and Distribute Machine Learning Models with Spark: A PySpark Implementation. Apache Spark is a system that provides a cluster-based distributed computing environment with the help of its packages, including SQL querying, streaming data processing, and machine learning.

    pip install mlflow

You may import `spark` after installing it, as in the following script.

    from mlflow import spark

Execution of MLflow: You may run the `start_run()` function after importing MLflow to activate MLflow in a Spark session.

    import mlflow
    from mlflow import spark

    with mlflow.start_run():
        mlflow.spark.log_model(model, "sparkML-model")

The corresponding model inferences can be obtained by using the `mlflow.pyfunc` module. Then, a Spark UDF can be generated by using the model path.
Towards Data Science
       
Instead of trying to find a linear approximation of a data set, it aims to find a linear classification to assign the data set to output classes. It basically finds the optimal line that separates two classes in N-dimensional space. A data point is classified correctly if it is on the side of the line corresponding to its class. A better error function option when using the sigmoid function is the log loss error. With linear regression and logistic regression, we can now build models for all supervised machine learning problems! Keep an eye out for my next article which will step through an example of logistic regression for multi-class output.
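As a rough illustration of the log loss mentioned above, the sketch below pairs a sigmoid with the binary log loss (cross-entropy). The function names and the clipping constant `eps` are assumptions chosen for this example; the article's own implementation is not shown in the excerpt.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary log loss: -[y*log(p) + (1 - y)*log(1 - p)]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# A confident correct prediction costs little; a confident wrong one costs a lot.
loss_confident_right = log_loss([1], [sigmoid(4.0)])
loss_confident_wrong = log_loss([1], [sigmoid(-4.0)])
```

The penalty grows without bound as a wrong prediction becomes more confident, which is what makes this loss a natural companion to the sigmoid output.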
Training in PyTorch from Amazon S3
       
Training in PyTorch from Amazon S3. How to Maximize Data Throughput and Save Money. Photo by Guillaume Jaillet on Unsplash. In previous posts (such as here and here) we discussed different options for streaming data from Amazon S3 into a TensorFlow training session. In this post we revisit the topic of training from S3, this time with an emphasis on PyTorch training. Streaming Data from Amazon S3: While streaming the data from Amazon S3 directly into the training loop may sound simple, if not designed well, it could become a bottleneck in your training pipeline. File Object Download: A number of solutions for training from Amazon S3 involve explicitly downloading the data into the local training environment. Amazon S3 PyTorch Plug-in: Last year AWS announced the release of a dedicated library for pulling data from S3 into a PyTorch training environment.
Neural Sheaf Diffusion for deep learning on graphs
       
Any node classification task can be posed as finding the right sheaf on which the limit of sheaf diffusion is able to linearly separate node features. Example of node classification by sheaf diffusion on a synthetic heterophilic dataset with four node classes (colour-coded). Consistent with our theory, the sheaf diffusion is able to linearly separate the nodes in the limit. The SCN model can be seen as a discrete, parametric and non-linear version of the sheaf diffusion equation. A key question is whether and when SCNs tend to behave like sheaf diffusion to a certain extent.
How to make a Transformer for time series forecasting with PyTorch
       
How to make a Transformer for time series forecasting with PyTorch. This post will show you how to transform a time series Transformer architecture diagram into PyTorch code step by step. A transformer station. The encoder input layer (image by Wu, Green, Ben & O’Banion, 2020 [2], my emphasis): The encoder input layer is simply implemented as an nn.Linear() layer. The encoder layers used by [2] are identical to those used by [4] on which the PyTorch Transformer library is based, so we can simply use PyTorch to create the encoder layers. The decoder input layer (image by Wu, Green, Ben & O’Banion, 2020 [2], my emphasis): The decoder input layer is simply a linear layer, just like the encoder input layer. How to create src and trg for a time series transformer model: Let’s first take a closer look at how src and trg are made for a time series transformer model.
How Much is an Astronaut’s Time Worth?
       
How Much is an Astronaut’s Time Worth? Much of the International Space Station is available to rent. Photo by Brian McGowan on Unsplash. The Pay Transparency Movement is gaining momentum, leading millions of workers to wonder what their time is worth. Most pay transparency seekers have down-to-earth jobs. But what if your 9–5 involves soaring around the earth at 17,000 MPH on a spacecraft like the International Space Station (ISS)…
Black Dads, Black Daughters: Burdens and Bravery
       
Me and my dad. Black Dads, Black Daughters: Burdens and Bravery. When I publish something (anything) that talks about race, I brace myself. I know it’s coming. Not the thoughtful critiques, which I respect. But the ones that are so familiar to women and people of color, the expletive-laden missives calling me the n-word, the b-word, the c-word (or a combination), wishing I’d die, wishing I’d never been born, etc., the ones rank with misogyny and racism, air-horning a…
Stop Acting Surprised When Anti-Black Domestic Terrorism Happens in America
       
Stop Acting Surprised When Anti-Black Domestic Terrorism Happens in America. Image via Wikipedia. Yesterday, an 18-year-old white supremacist terrorist, Payton Gendron, drove hours from his home in Conklin, New York to Buffalo, New York. He targeted Buffalo because the city had a population that was 85% Black. He arrived at the Tops Friendly Market at around 2:30 pm, heavily armed and dressed in tactical gear, and began opening fire at shoppers in the…
The TerraUSD Stablecoin Lunacy Explained
       
Photo by Tezos on Unsplash. The TerraUSD Stablecoin Lunacy Explained. If it sounds too good to be true, it probably is too good to be true. One of the more ironic parts of the crypto space over the past few years has been stablecoins. Crypto is a bet on dollar (and other fiat currency) devaluation, yet stablecoins choose to peg and therefore derive their value from the dollar.
Say Their Names: Here are the Victims of the Buffalo Hate Crime Shooting
       
Say Their Names: Here are the Victims of the Buffalo Hate Crime Shooting. In yet another hate-filled, anti-Black, racist mass shooting, Payton S. Gendron shot and killed ten Black New York residents who were shopping at a supermarket on a Saturday.
Dealing with Spring Allergy Symptoms
       
Tips and treatments you can do on your own. Photo by Brittany Colette on Unsplash. Allergy season is getting worse every year. Climate change, increased carbon dioxide emissions, and urban planning are major culprits. While it will take time (and policy change) to impact these causes, you can do more than just sit back and suffer. There are many ways to lessen your exposure to allergens and reduce your allergy symptoms on your own.
Help With Despair Over the State of the World
       
25 Practical Tips From a Buddhist Monk. Photo by Rachel Krantz. Nearly everyone I talk to feels despair over the state of the world right now. If you’re a person who cares, you likely feel pervasive anxiety about the many, many, many important problems you can’t control or entirely solve. That leads to an understandable sense of overwhelm, existential dread, resignation, cynicism, and…
Why Cooperation Might Have Shrunk Our Brains
       
It’s a good thing, too! Collective intelligence FTW. Image: Portal 2. Back in 2010, the game designer Matt Wood was working on a sequel to the hit title Portal when he ran into a really interesting problem: the awesome power of two brains working as one.
Reckoning With Our Social Media Brain Drain
       
Imagine all the inventive things Elon Musk (and you) could be doing instead of tweeting. Photo: visuals / Unsplash. When I first heard that Elon Musk actually bought Twitter (irrespective of what comes of the deal, which is now undergoing further diligence), my immediate thought was: What a shame, I’d much rather he focus on taking carbon out of the…
Creating a Movie Rating Model Part 4: Creating a Full Model Training Pipeline
       
Formalizing our feature engineering and model training code into a single pipeline. Title card created by the author. Hello there friends! We are back with part 4 in our series on creating a movie rating model. Now we are ready to craft a formal model training pipeline. If this is the case, we can make use of a special Pipeline object offered by the Scikit-Learn library. Okay, with our ColumnTransformer all good to go, we are now ready to instantiate and make use of the Scikit-Learn Pipeline object.
Control the training of your neural network in Tensorflow with callbacks
       
How to use a callback to stop training at adequate performance. Photo by Erwan Hesry on Unsplash. In this article I will explain how to control the training of a neural network in Tensorflow through the use of callbacks. This is especially useful to log performance or to stop training if our performance metric reaches a certain threshold. Classification task with a deep neural network: We will use a neural network with several layers to classify the clothing items in the dataset. The best approach would be to use a convolutional neural network, but for this example a deep neural network will do just fine. Here’s how to set up a callback to control the training of a neural network!
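The callback idea described above can be sketched without TensorFlow. The snippet below is a framework-free illustration, not the article's code: `ToyTrainer` and `StopAtThreshold` are hypothetical names, and the per-epoch accuracies are made-up inputs. It mirrors the convention Keras uses, where a callback's `on_epoch_end` hook inspects the logged metrics and sets a `stop_training` flag that the training loop checks.

```python
class StopAtThreshold:
    """Hypothetical callback: asks the trainer to stop once accuracy is high enough."""

    def __init__(self, threshold):
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs, trainer):
        # Request a stop as soon as the monitored metric reaches the threshold.
        if logs["accuracy"] >= self.threshold:
            trainer.stop_training = True


class ToyTrainer:
    """Stand-in for a model: replays a fixed list of per-epoch accuracies."""

    def __init__(self, accuracy_per_epoch):
        self.accuracy_per_epoch = accuracy_per_epoch
        self.stop_training = False

    def fit(self, callbacks):
        epochs_run = 0
        for epoch, accuracy in enumerate(self.accuracy_per_epoch):
            epochs_run += 1
            # Give every callback a chance to react to this epoch's metrics.
            for callback in callbacks:
                callback.on_epoch_end(epoch, {"accuracy": accuracy}, self)
            if self.stop_training:
                break
        return epochs_run


trainer = ToyTrainer([0.70, 0.85, 0.92, 0.95, 0.97])  # made-up accuracies
epochs_run = trainer.fit([StopAtThreshold(0.90)])
print(epochs_run)
```

In a real Keras setup the same logic would live in a subclass of the framework's callback base class and be passed to the model's fit call; the toy loop here only shows the control flow.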
History most certainly did not end
       
Photo caption: Author in Ukraine in 1991 (courtesy Dan Perry). When should a country exist? Euphoria was so palpable that one was left in no doubt: the politest way to put it was that people wanted independence from the Soviet Union. They were a tad suspicious of Ukrainian nationalism; but the Soviet Union had become so corrupt, decrepit and dysfunctional that they also were happy to be free. The end of history, in the artful words of U.S. historian Francis Fukuyama. The Soviet Union was in effect a formalization of such a thing.
A Data-Driven Exploration of UST’s Collapse
       
Introduction: TerraUSD’s spectacular rise ended in chaos and confusion this week following a dramatic depegging. UST’s mechanism has allowed it to grow extremely quickly because it does not need to be overcollateralized like Maker’s DAI stablecoin. While the Curve UST+3pool was in a vulnerable state with low liquidity, large UST sell orders began rolling in on Binance, including a $10mn sell order. Stablecoin order books all have a similar shape, typically in the form of two giant buy and sell walls. Far from being just Binance, UST liquidity evaporated on all major pairs on centralized exchanges.
About those kill-switched Ukrainian tractors
       
What John Deere did to Russian looters, anyone can do to farmers, anywhere. Update: An earlier draft of this story describes the John Deere practice of requiring farmers to buy seed from Monsanto in order to access the data generated by their own plowing. While this was once the case (I had this arrangement described in detail to me by a…
Calculations in Tableau: A Road to Tableau Desktop Specialist Certification
       
Welcome to the eleventh chapter. In this piece, we are going to learn about calculations in Tableau. If we create a calculated field on dimensions, the new field represents a dimension, and the same goes for a measure. Right-click on the measure and choose “Quick Table Calculation” or “Add Table Calculation”. Total table calculations supported by Tableau are: The scope of a table calculation could be: Table (Across), Table (Down), Table (Across then Down), Table (Down then Across), Pane (Down), Pane (Across then Down), Pane (Down then Across), or Cell (illustrations source: Tableau Documentation). Editing & Removing a Table Calculation: To edit a calculation, simply right-click and choose “Edit”. To remove a calculation, right-click on the measure in the view (on which the calculation is applied) and choose “Clear Table Calculation”.
To Really Get to Know a Place, You Have to Walk It
       
Escaping our cars is necessary to developing an intimate knowledge of our world. Photo by Tyler Nix on Unsplash. We live in a car culture. I was reminded of this recently when a potential landlord in the small town of Moab, Utah raised doubts about my ability to make it here without an automobile. I pointed out that I had walked from one end of the town to…
Estimating Model Performance without Ground Truth
       
But how to estimate model performance in the absence of ground truth? Monitoring model performance: It is crucial to have proper monitoring in place to be able to detect early signs of slumping model performance. We can estimate model performance without ground truth labels when the model is properly calibrated. Estimating model performance in the absence of ground truth data is tricky, but possible to accomplish. Confidence-Based Performance Estimation: The algorithm that allows us to estimate model performance in the absence of ground truth, developed by NannyML, an open-source library for post-deployment data science, is called Confidence-Based Performance Estimation, or CBPE.
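The intuition behind confidence-based estimation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not NannyML's implementation: it assumes a binary classifier whose scores are perfectly calibrated probabilities of the positive class, and `estimate_accuracy` and the example scores are hypothetical. If a score of 0.9 really means "90% chance of being positive", then a positive prediction on that sample is expected to be correct with probability 0.9, and averaging those expectations gives an accuracy estimate without any labels.

```python
def estimate_accuracy(calibrated_probs, threshold=0.5):
    """Expected accuracy from calibrated positive-class probabilities (no labels needed)."""
    if not calibrated_probs:
        raise ValueError("need at least one prediction")
    expected_correct = 0.0
    for p in calibrated_probs:
        predicted_positive = p >= threshold
        # Probability that this particular prediction is right, assuming calibration.
        expected_correct += p if predicted_positive else 1.0 - p
    return expected_correct / len(calibrated_probs)


probs = [0.9, 0.8, 0.3, 0.1]  # made-up model outputs
estimated = estimate_accuracy(probs)
print(round(estimated, 3))
```

The estimate is only as good as the calibration: if the scores drift away from true probabilities, the estimated accuracy drifts with them.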
Fake Reviews: Maybe You Should Be Worried About AI’s Writing (and Reading) Skills
       
Fake reviews have become a steady crutch for many companies relying on misleading information to rake in sales. In a recent, rather troubling study, humans could detect fake reviews with a measly 55.36% success rate. At a Quick Glance: Introductory Pointers. Unable to tell fact from fiction, consumers are already rightfully full of doubts. Key Takeaways: ➊ The practice of fake reviews can result in a decline of overall faith in even authentic reviews. ➍ Fake review detection models and tools to address these issues come in many flavors of strategies.
Deepmind’s New Model Gato Is Amazing!
       
Image from Deepmind’s paper. Gato is the first generalist model that performs so well on so many different tasks, and it’s extremely promising for the field. Tokenization is when you prepare your inputs for the model, as models do not understand text or images by themselves.
Detecting Road Damages From Image And Video
       
From training a yolov5 model for object recognition to hosting on streamlit. Image caption: Model detecting road defects from video. Motivation: Imagine that you work in an industry where a lot of image data is generated and there are certain objects in the image that you are interested in. Image caption: Files arranged to images and labels folder. The annotations file: Let us now take a closer look at the annotations files. It can be applied to images as well as videos, so we download a test image and a test video. The detection command is: python yolov5/detect.py --source data/test/ --weights yolov5/runs/train/RoadTrainModel4/weights/best.pt --conf 0.25 --name RoadTestModel (where --source is the location where test images and videos are located). You just upload an image of any road and it gives the areas where the road is damaged.
Gentle Introduction to Statistics for Machine Learning
       
Photo by Edge2Edge Media on Unsplash. WHAT IS STATISTICS? Statistics is a field that has existed for a long time, and it’s also a must-know field for every data scientist. Machine learning is the ability of computers to learn patterns from data and make predictions. It uses data science techniques to analyze the data, and the field of statistics is one of the subsets of data science. A lot of techniques used in machine learning are made possible through the field of statistics.
Enhance the caller experience with hints in Amazon Lex
       
Amazon Lex now supports a hints capability to enhance the recognition of relevant phrases in a conversation. Solution overview: Let’s review the overall architecture for the solution. We use an Amazon Lex bot integrated with an Amazon Connect contact flow to deliver the conversational experience. This creates an Amazon Lex bot called BankingBot, and one slot type (accountNumber). In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flow. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.
How to Make Animated and Racing Bar Plots in Python
       
Photo by Cullan Smith on Unsplash. Complete Working Code. The bar plot is pretty basic and very common. All the plotting libraries have bar plot options for sure. This article will focus on the animated bar plot. I will share the code for some animated bar plots. I should warn you that it takes a lot longer to render these plots than regular bar plots.
Fine-Tuning for Domain Adaptation in NLP
       
Domain adaptation is when we fine-tune a pre-trained model on a new dataset so that it gives predictions that are better adapted to that dataset. Fine-tuning in NLP refers to the procedure of re-training a pre-trained language model using your own custom data. Image By Author. In our case we will fine-tune using a masked language model task (MLM). Train and save your custom model. Perplexity Evaluation: Is the custom model you created really better than the source model? First of all create a personal account on Hugging Face, and then run the following commands.
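To make the perplexity comparison concrete, here is a minimal sketch of how perplexity is conventionally computed from per-token log-probabilities. It is an illustration under assumptions, not the article's evaluation code: the `perplexity` helper and the probability values are hypothetical, and for a masked language model the inputs would be the log-probabilities assigned to the masked tokens. Lower perplexity on in-domain text suggests the fine-tuned model fits that domain better.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    average_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(average_nll)


# Made-up example: the source model gives each true token probability 0.25,
# the fine-tuned model gives each true token probability 0.5.
source_model = perplexity([math.log(0.25)] * 8)
custom_model = perplexity([math.log(0.5)] * 8)
print(source_model, custom_model)
```

A model that assigns probability 1/k to every correct token has perplexity k, which is why perplexity is often read as an "effective number of choices" per token.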
Learnt Harmonic Mean Estimator for Bayesian Model Selection
       
Re-targeted harmonic mean estimator: A re-targeted harmonic mean estimator was introduced by Gelfand & Dey in 1994 [8] to address this issue. In the table below we present the model evidence values computed for this problem by the original harmonic mean estimator and our learnt harmonic mean estimator. Table caption: Bayesian model evidence values computed for the Normal-Gamma benchmark problem by the original harmonic mean estimator and our learnt harmonic mean estimator. Harmonic code: The learnt harmonic mean estimator is implemented in the harmonic software package, which is open source and publicly available. In this article we review harmonic mean estimators for computing the model evidence, including our recently proposed learnt harmonic mean estimator.
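For reference, the two estimators named above can be written out as follows. This is a sketch from the standard textbook definitions rather than a quotation from the article, and the notation is assumed here: $z$ is the model evidence, $\mathcal{L}$ the likelihood, $\pi$ the prior, $\varphi$ a normalized target density, and $\theta_i$ are $N$ samples drawn from the posterior.

```latex
% Original harmonic mean estimator of the reciprocal evidence \rho = 1/z
\hat{\rho}_{\text{original}} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathcal{L}(\theta_i)},
\qquad \theta_i \sim P(\theta \mid y)

% Re-targeted harmonic mean estimator with a normalized target density \varphi
\hat{\rho}_{\text{re-targeted}} = \frac{1}{N} \sum_{i=1}^{N}
\frac{\varphi(\theta_i)}{\mathcal{L}(\theta_i)\,\pi(\theta_i)},
\qquad \theta_i \sim P(\theta \mid y)
```

Both are estimates of $1/z$; the re-targeted form reduces to the original one when the target $\varphi$ is chosen to be the prior $\pi$.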
7 Lessons I’ve Learnt From Deploying Machine Learning Models Using ONNX
       
The ONNX Runtime is a simple, cross-platform API that provides optimal performance for running inference on an ONNX model exactly where you need it: the cloud, mobile, an IoT device, you name it! The serverless-plugin-optimize plugin significantly decreases the Serverless package size. Lesson 7: Scaling Serverless apps efficiently. Finding the best balance between performance and cost is a crucial aspect of running Serverless apps at scale. In our use case, we are generating predictions for tens of thousands of products per AWS Lambda invocation. At Bazaarvoice, we are championing these technologies by delivering artificial intelligence solutions using ONNX on a global scale.
How To Detect Outliers in a Data Science Project
       
Three methods to detect outliers, with examples in Python. Photo by Will Myers on Unsplash. At the beginning of a Data Science project, one important part is outlier detection. So, an outlier is data that has a value too high or too low with respect to the other data we are analyzing. Do not consider what “latitude” and “mean production” refer to, just look at the data: which points would you consider outliers? Isolation Forest: Let’s understand what Isolation Forest is, quoting from Wikipedia: “Isolation forest is an anomaly detection algorithm.” This means that in an Isolation Forest we have randomly sub-sampled data that are processed in a tree structure, based on randomly selected features.
What is Average Precision in Object Detection & Localization Algorithms and how to calculate it?
       
A step-by-step visual guide to understanding the mean average precision for object detection and localization algorithms. What is Object Detection and Localization? Evaluation Metrics: The performance of an object detection and localization algorithm is evaluated by a metric called Average Precision (AP) (and mean average precision). Image caption: Mean Average Precision, the mean of Average Precision (AP) across all the k classes (Image by Author). Summary: Mean average precision (mAP) quantifies the performance of object detection and localization algorithms. In order to understand mAP, we need to understand what IoU, True Positive, False Positive, False Negative, Recall, Precision, and the precision-recall curve are.
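Two of the building blocks listed above are easy to show in code. The sketch below is illustrative only: the box format `(x_min, y_min, x_max, y_max)`, the function names, and the per-class AP values are assumptions made for this example. `iou` measures how much a predicted box overlaps a ground-truth box, and `mean_average_precision` takes the mean of already-computed AP values across classes.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """Mean of per-class Average Precision values (the AP values are assumed given)."""
    if not ap_per_class:
        raise ValueError("need AP for at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
map_score = mean_average_precision({"car": 0.5, "person": 0.7})  # made-up AP values
print(overlap, map_score)
```

A detection is usually counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold; that threshold is a design choice and is not fixed by the summary above.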
Fundamentals of Matrix Algebra with Python | Part 1
       
Understanding and implementing basic matrix algebra concepts and operations with Python. Introduction: Matrix algebra is fundamental to many complex and prominent areas of research and development in engineering and computer science. Photo by Aron Van de Pol on Unsplash. This article introduces some basic concepts of matrix algebra with some Python code to illustrate the results. Figure 1: Example m×n Matrices (Image By Author). Use the Python code in Gist 1 to create these arrays using Numpy. Figure 5: Transpose of a Matrix (Image By Author). Gist 5 is a naive Python implementation to transpose a matrix. Gist 5: Naive Python Code to Transpose an m×n Matrix. This algorithm runs on a moderately sized matrix, m = 6000, n = 2500, in ≈8.49 seconds, which is millennia in computer time.
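The gist itself is not reproduced in this summary. As a rough idea of what a naive, loop-based transpose looks like, here is a minimal sketch on plain nested lists; the function name and the example matrix are assumptions for illustration, not the author's code.

```python
def naive_transpose(matrix):
    """Transpose an m x n matrix (list of rows) with explicit loops."""
    if not matrix:
        return []
    n_rows, n_cols = len(matrix), len(matrix[0])
    # The result has n rows and m columns.
    result = [[0] * n_rows for _ in range(n_cols)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Element (i, j) of the input becomes element (j, i) of the output.
            result[j][i] = matrix[i][j]
    return result


transposed = naive_transpose([[1, 2, 3], [4, 5, 6]])
print(transposed)
```

The double loop visits every element in interpreted Python, which is why this style is slow on large matrices compared with a vectorized array library.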
Open Pretrained Transformer (OPT) Is a Milestone for Addressing Accessibility
       
OPT in, GPT-3 out. Image by Gerd Altman from Pixabay. On May 3rd 2022, Meta AI announced a new large language model (LLM), Open Pretrained Transformer (OPT-175B). (Note: Joelle Pineau is a Co-Managing Director at Facebook AI Research and an Associate Professor at McGill University. She had a role in making OPT accessible.) The Meta AI team has taken care to make the OPT model publicly accessible. In this post, we wanted to share our first impression on the accessibility aspect of the OPT language model.
Text Summarization with GPT2 and Layer AI
       
This project focuses on fine-tuning a GPT2 model to perform text summarization on the public Amazon reviews dataset… I have included code in this article where it is most instructive. Then we will use Layer to fetch the pre-trained version of GPT2 to fine-tune it for summarization purposes. Using Hugging Face’s transformers library and Layer AI to fine-tune GPT2 for text summarization. GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input. Customers create a text review and a title for it when they write a review on Amazon. The training part includes building and uploading the GPT2 model to Layer.
The Senate’s Message on Abortion: We Won’t Do Anything to Save It
       
Abortion rights need congressional protection. The Senate just told those in fear of forced births to fend for themselves. Photo by Chad Stembridge on Unsplash. Chuck Schumer made clear that with Wednesday’s vote on The Women’s Health Protection Act, which would create federal protections for providing and…
Use These Key-Commands To Whip Through Gmail
       
Experience the zen-like flow-state of almost never touching your trackpad. Hey, all you Gmail users! I’m about to make your lives ever so slightly more delightful. (Impatient? Just wanna see the key commands? Skip to the cheat sheet halfway down this…
Period Tracking, Abortions, and Privacy
       
Our reproductive freedoms and our devices are at risk. Photo: Josefin / Unsplash. The last few weeks have been incredibly demoralizing, draining, and upsetting, particularly if you care at all about reproductive health and reproductive justice. The tumult kicked off with Politico publishing a leaked Supreme Court majority opinion draft earlier this month to dismantle Roe vs Wade.
Peloton Is Spinning Out of Control
       
The company’s stock price just hit an all-time low. Image: Peloton. “Motivation That Moves You.” On May 9, Peloton unveiled a brand refresh, including a new campaign featuring its most popular instructors and a first-ever tagline. While the tagline will do little to motivate users into fitness, a constant human struggle that takes more…
How Literacy Created Civilisation, Part I: In the Beginning Was the Word
       
Image caption: This couple from Pompeii were keen to show off their education. The critical factor that powered the leap from our hunting-gathering state of nature to civilisation is generally considered to be agriculture. Cuneiform, the first writing system that arose in Mesopotamia around 3200 B.C., began basically as a system of tally marks pressed into clay tablets. In Northern Europe, the cycle of literacy and economic and cultural advancement started to whir in a virtuous spiral. Literacy enabled a growing middle class of managers, accountants, and lawyers to distribute goods and services in an exponentially more sophisticated and profitable market.
The Art of Avatar Creation
       
What drives players of games like “Elden Ring” to share screenshots of their avatars? My respondents were split on the question of whether they considered a user-designed game avatar as being “them” or a separate character. Same with something like Elden Ring.” Cole said something similar: “I think of my avatar as a separate character in almost every game I play. (This speaks to a broader issue with game character creation systems, particularly as they pertain to representations of nonwhite people.) As character creation engines become even more robust, inclusive, and imaginative, the potential for customization will only expand.
Why Congress is Out of Touch
       
And how politicians from Abraham Lincoln to Katie Porter manage(d) to be different. Rep. Katie Porter (D-CA) said something important this week. “Too often, Congress recognizes issues too late,” she told Sarah Ferris of Politico after giving an emotional speech to her fellow Democratic House members about how inflation was affecting her family. Porter is a…
This Uninhabitable Bay Area Fixer-Upper Is $661,500
       
My 50s, as well, are already looking like another decade spent renting, barring any given family inheritance I’ve yet to grow aware of. When I see listings like this $661,500 ad on the real estate listing website Redfin for a three-bedroom, one-bathroom home that isn’t even remotely habitable, it further cements my premeditated goal into my working-class subconscious. Located at 911 Mclaughlin Street in Richmond, California, the single-story domiciles look like absolute shit from the outside. Photo: Courtesy of MLS But inside the house built in 1942 exists the most perplexing, mind-boggling dichotomous. Photo: Courtesy of MLS Also: an attached one-car garage has been converted — but it’s not written in the listing what it’s been changed into, which reads a bit strange.
Google AI Blog: Challenges in Multi-objective Optimization for Automatic Wireless Network Planning
       
Multi-objective Optimization via Local Search: Combinatorial optimization remains a difficult task, so we created a domain-specific local search algorithm to optimize network quality. The local search algorithmic paradigm is widely applied to address computationally-hard optimization problems. To evaluate the quality of a candidate network, we combine the different objective functions into a single one, as described in the following section. Three candidate local search moves. Using combinatorial optimization in concert with geospatial and radio propagation modeling, we built a scalable auto-planner for wireless telecommunication networks.
Moderate, classify, and process documents using Amazon Rekognition and Amazon Textract
       
We show how you can use Amazon Rekognition and Amazon Textract to optimize and reduce human effort in processing documents. Amazon Rekognition identifies moderation labels in your document and classifies them using Amazon Rekognition Custom Labels. Classify documents into different categories such as W-2s, invoices, bank statements, and pay stubs using Rekognition Custom Labels. Training pipeline: Before we deploy this architecture, we train a custom model to classify documents into different categories using Rekognition Custom Labels. For more information, see the Amazon Rekognition Custom Labels guide, Amazon Rekognition developer guide and Amazon Textract developer guide.
Intelligently search your Jira projects with Amazon Kendra Jira cloud connector
       
You can now use the Amazon Kendra Jira cloud connector to index issues, comments, and attachments in your Jira projects, and search this content using Amazon Kendra intelligent search, powered by machine learning (ML). This post shows how to use the Amazon Kendra Jira cloud connector to configure a Jira cloud instance as a data source for an Amazon Kendra index, and intelligently search the contents of the projects in it. In our solution, we configure a Jira cloud instance as a data source to an Amazon Kendra search index using the Amazon Kendra Jira connector. Conclusion: With the Amazon Kendra Jira connector, your organization can make invaluable knowledge in your Jira projects available to your users securely using intelligent search powered by Amazon Kendra. To learn more about the Amazon Kendra Jira connector, refer to the Amazon Kendra Jira connector section of the Amazon Kendra Developer Guide.
The Intel®3D Athlete Tracking (3DAT) scalable architecture deploys pose estimation models using Amazon Kinesis Data Streams and Amazon EKS
       
The creation of a user group requires a project ID, pipeline parameter set ID, user group name, and user group description. The creation of a video requires a job ID, video path, video results path, video progress percentage, and video status. This API requires a user ID, project ID, pipeline ID, pipeline parameter set ID, job parameters, and job status. POST create_job: Inserts a new job record with user ID, project ID, pipeline ID, pipeline parameter set ID, job results path, job parameters, and job status. The user ID, project ID, pipeline ID, pipeline parameter set ID, job results path, job parameters, and job status are required for job creation.
Image classification and object detection using Amazon Rekognition Custom Labels and Amazon SageMaker JumpStart
       
Rekognition Custom Labels abstracts away the complexity involved in building a custom model. In the search bar, enter Rekognition Custom Labels and choose the Rekognition Custom Labels for Vision notebook. We encourage you to learn more about Rekognition Custom Labels and try it out with your business-specific datasets. To get started, you can navigate to the Rekognition Custom Labels example notebook in SageMaker JumpStart. About the Authors: Pashmeen Mistry is the Senior Product Manager for Amazon Rekognition Custom Labels.
Run automatic model tuning with Amazon SageMaker JumpStart
       
In this post, we demonstrate how to run automatic model tuning with JumpStart. With SageMaker automatic model tuning, ML engineers and data scientists can offload the time-consuming task of optimizing their model and let SageMaker run the experimentation. In this post, we showed the value of running automatic model tuning on a JumpStart pre-trained model using SageMaker APIs. For more details on how to optimize a JumpStart model with automatic model tuning, refer to our example notebook. Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms and helps develop machine learning algorithms.
Open Pretrained Transformer (OPT) Is a Milestone for Addressing Accessibility
       
OPT in, GPT-3 out. Image by Gerd Altman from Pixabay. On May 3rd 2022, Meta AI announced a new large language model (LLM), Open Pretrained Transformer (OPT-175B). (Note: Joelle Pineau is a Co-Managing Director at Facebook AI Research and an Associate Professor at McGill University.) The Meta AI team has taken care to make the OPT model publicly accessible. The OPT team (including the OPT paper authors) is active and replies quickly on GitHub issues. In this post, we wanted to share our first impression on the accessibility aspect of the OPT language model.
Managing Molecules for Computational Chemistry with Python
       
From “SMILES” to 3D. Credits: Maria Gulyaeva, royalty free from pexels.com. Smiles in Chemistry… no, we are not going to talk about the one in the pictures above. Briefly, computational chemistry is a branch of chemistry that uses computer tools to assist in complex chemical investigations like drug design. Most molecule editors can import SMILES strings and convert them back into two-dimensional drawings (even though molecules are three-dimensional in reality). Given its versatility with Python libraries like RDKit and free tools like PyMol, we describe here the .mol file format. If you want to know more about the details of this structure, check this “anatomy of a MOL file” by Robert Belford.
How to install Miniconda x86_64 & Apple M1 side by side on Mac Book M1
       
Miniconda x86_64 & Miniconda Apple M1 side by side on Mac Book M1. Photo credit: Author. If you are a Python Developer/ML Engineer/Data Scientist who uses Apple’s Mac M1 for your work, you might know the pain of not having an arm64 distribution of your project dependencies. But luckily, now that Apple M1 is officially supported by Anaconda, you can download and install Anaconda3/Miniconda3 for your Mac with Apple’s silicon chip. How to have Miniconda3 x86_64 & Miniconda3 Apple M1 side by side, is it even possible? Next, we need to install Miniconda Apple M1 in the same fashion except for a few changes. "${conda_path_intel}/etc/profile.d/conda.sh" else export PATH="${conda_path_intel}/bin:$PATH" fi fi unset __conda_setup # <<< conda initialize <<< } Here, conda_path_m1 is where I installed Miniconda for Apple M1 and conda_path_intel is where I installed Miniconda for x86_64; replace the paths accordingly, based on where you have installed Miniconda Apple M1 & x86_64.
How to make a Transformer for time series forecasting with PyTorch
       
This post will show you how to transform a time series Transformer architecture diagram into PyTorch code step by step. Image caption: A transformer station. In this post, you will learn how to code a transformer architecture for time series forecasting in PyTorch. The encoder input layer (image by Wu, Green, Ben & O’Banion, 2020 [2], my emphasis): The encoder input layer is simply implemented as an nn.Linear() layer. The decoder input layer (image by Wu, Green, Ben & O’Banion, 2020 [2], my emphasis): The decoder input layer is simply a linear layer, just like the encoder input layer. How to create src and trg for a time series transformer model: Let’s first take a closer look at how src and trg are made for a time series transformer model.
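The summary does not show how src and trg are actually built, so the following is only a sketch of one common teacher-forcing convention, written with plain Python lists instead of tensors. The function name, the argument names `enc_seq_len` and `target_seq_len`, and the exact slicing are assumptions for illustration: the encoder input `src` is the first part of the window, the decoder input `trg` starts at the last encoder value and is shifted one step behind the values the model should predict, `trg_y`.

```python
def make_src_trg(sequence, enc_seq_len, target_seq_len):
    """Split one window into encoder input, decoder input, and prediction target."""
    if len(sequence) != enc_seq_len + target_seq_len:
        raise ValueError("window length must equal enc_seq_len + target_seq_len")
    # Encoder sees the first enc_seq_len values.
    src = sequence[:enc_seq_len]
    # Decoder input: last encoder value plus all but the final target value.
    trg = sequence[enc_seq_len - 1 : len(sequence) - 1]
    # What the model should output: the final target_seq_len values.
    trg_y = sequence[-target_seq_len:]
    return src, trg, trg_y


window = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up time series window
src, trg, trg_y = make_src_trg(window, enc_seq_len=5, target_seq_len=3)
print(src, trg, trg_y)
```

With this layout each decoder input position holds the value immediately preceding the one it is asked to predict, which is what lets the decoder be trained with a causal mask.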
3 Not-So-Common Pandas Tricks You Should Know
       
Making the most out of Pandas. Photo by Joshua Chun on Unsplash. If you are reading this article, you must have heard of or used Pandas. to_period: We use dates with many different intervals or periods such as day, week, month, quarter, and so on. df["class_cum_sum"] = df.groupby("class")["amount"].cumsum() Let’s confirm the results on class A: df[df["class"]=="A"].head() (image by author). The class cumulative sum column contains the cumulative sum values calculated separately for each class. df.dtypes outputs: date datetime64[ns]; class object; amount int64; month period[M]; quarter period[Q-DEC]; cumulative_sum int64; class_cum_sum int64. Pandas also has a “category” data type which consumes much less memory than the object data type. df.memory_usage() outputs: Index 128; date 800; class 800; amount 800; month 800; quarter 800; cumulative_sum 800; class_cum_sum 800; class_category 304 (dtype: int64). The class_category column uses less than half the memory of the class column.
Beginner’s Guide to Gradient Descent
       
Beginner’s Guide to Gradient Descent. Everything you need to know about the Gradient Descent Method. (Photo by Ales Krivec on Unsplash.) The gradient descent method is a solution method for optimization problems with which one can find the minimum or maximum of a function. It is used in the field of machine learning for training models, where it is known as the gradient descent method. In the field of artificial intelligence, the so-called gradient descent method is most frequently used. Determination of the Starting Point: If we want to use the gradient descent method, we need a starting point. Inserting the Starting Point: Now we insert our starting point into the gradient: 4.
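The steps named above (pick a starting point, insert it into the gradient, move against the gradient) can be sketched in a few lines. The function f(x) = (x - 3)^2, the learning rate and the number of steps are illustrative assumptions, not values from the article.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # move downhill
    return x


# Example function (an assumption for illustration): f(x) = (x - 3)^2,
# whose gradient is f'(x) = 2 * (x - 3) and whose minimum is at x = 3.
def grad_f(x):
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)
```

If the learning rate is too large, the iterates can overshoot and diverge; if it is too small, many more steps are needed.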
Challenges Data Scientists face everyday
       
(Photo by Boitumelo Phetla on Unsplash.) Challenges Data Scientists face everyday. Data science and machine learning are popular terms right now on the internet and the trend is growing. Furthermore, the salaries of data scientists and ML engineers are also increasing, with good compensation and stock benefits. Highlighted below are some of the challenges that data scientists face during their work along with a few tips and strategies for tackling them. Conclusion: All in all, we’ve seen how machine learning could be used and the challenges that are associated with the machine learning workflow. Taking a look at these challenges, data scientists can ensure that they have the right tools and resources to tackle them and give valuable insights to the companies.
New preprint describes a novel parameter-free geometric transformer of atomic coordinates to predict biological interfaces in proteins
       
AI after AlphaFold. New preprint describes a novel parameter-free geometric transformer of atomic coordinates to predict biological interfaces in proteins. And it runs so fast that it can even scan large ensembles of protein structures to search interaction-prone amino acids. But protein structures have several levels of complexity. complexes between multiple proteins or between proteins and other biological macromolecules such as nucleic acids (DNA and RNA) or with membranes, ions, small molecules, etc. And if we consider the other kinds of interactions that proteins can establish, AlphaFold is out of the game. Modeling these other interactions is the next step on the road to modeling biological structures, interactions, and functions at atomic level, and there are many groups who have been working on this for years.
Generation of a synthetic microbial dataset with deep learning style transfer
       
Generation of a synthetic microbial dataset with deep learning style transfer. Efficient strategy to generate annotated synthetic dataset for training deep learning detectors. Written by Jarosław Pawłowski and Sylwia Majchrowska. A generated dataset is then used to train deep learning object detectors in a fully supervised fashion. The goal is to generate synthetic images with microbial colonies that will be later used to train deep learning detection and segmentation models. We transfer the style to a given raw patch from one of the selected real images that serve as style carriers. References: [1] https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data [2] J. Pawłowski, S. Majchrowska, and T. Golan, Generation of microbial colonies dataset with deep learning style transfer, Scientific Reports 12, 5212 (2022).
Build Your First Recommendation Engine
       
Build Your First Recommendation Engine. What are the different types of recommendation algorithms? We will also build our first recommendation engine together using the Singapore second-hand car market data. Classification model-based recommendation: Based on user activity or available data we have, we can predict whether or not the user will like/purchase the item. Case Study on sgCarMart: Let us build a recommendation engine together on content-based filtering and more specifically on pairwise item similarity. In this case, I will fill in missing numerical data by the median of the column and categorical data simply by ‘NA’.
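Pairwise item similarity can be illustrated with cosine similarity over numeric feature vectors. This is only a sketch: the excerpt does not say which similarity measure the article uses, so cosine similarity is an assumption, and the three car vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(items, query, top_n=2):
    """Rank all other items by similarity to the query item."""
    scores = [
        (name, cosine_similarity(items[query], vector))
        for name, vector in items.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical, already scaled feature vectors for three listings.
cars = {
    "car_a": [1.0, 0.2, 0.0],
    "car_b": [0.9, 0.3, 0.1],
    "car_c": [0.0, 0.1, 1.0],
}
recommendations = most_similar(cars, "car_a")
print(recommendations)
```

In practice the numeric columns would be scaled and the categorical columns encoded before computing similarities, otherwise features with large ranges dominate the score.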
Why Is Your Data Science Resume Not Getting You Interviews?
       
Why Is Your Data Science Resume Not Getting You Interviews? So many companies and organizations want to hire data scientists, and so many people are coming onto the job market claiming to have data science skills. Having screened thousands of data science resumés, I can tell you that if you are out there looking for a new data science role, there are a few things you can do to make life easier for yourself. This of course means that you need to tend to your data science garden regularly. It’s also a good idea to do this because the data science community loves open-source activities and sharing of work.
I Can’t Wait for Meta’s Next Virtual (And Augmented) Reality Headset and How It Might Change the Future of Work and Education
       
Meta positions its upcoming headset as THE new tool for work, which will “replace your laptop” as they said. Notably, Meta’s new device is intended mainly for work use cases, as opposed to the gaming and entertainment we are used to in the field of VR and AR. In fact, whispers say that apps for Oculus Quest 2 won’t even run on the new device. Having augmented reality is a must, I think, for the headsets to be useful for work, and probably more so for education so that students don’t get “alienated” from a purely virtual world and disconnected from the class. I hope I can try it out soon and that it meets my expectations, and perhaps surprises me with more.
Time Series Forecasting with ARIMA Models In Python [Part 2]
       
Time Series Forecasting with ARIMA Models In Python [Part 2]. A practical guide for time series forecasting using ARIMA models in Python. Time series data is one of the most common data types in the industry and you will probably be working with it in your career. This series will consist of 9 articles: Manipulating Time Series Data In Python Pandas [A Practical Guide]; Time Series Analysis in Python Pandas [A Practical Guide]; Visualizing Time Series Data in Python [A practical Guide]; Time Series Forecasting with ARIMA Models In Python [Part 1]; Time Series Forecasting with ARIMA Models In Python [Part 2] (You are here!); Machine Learning for Time Series Data [A practical Guide]; Deep Learning for Time Series Data [A practical Guide]; Time Series Forecasting project using statistical analysis, machine learning & deep learning. Seasonal ARIMA Models: Introduction to seasonal time series; Seasonal ARIMA model; Process automation and model saving; SARIMA and Box-Jenkins for seasonal time series. 3. Seasonal ARIMA Models: In the final section, we will discuss how to use seasonal ARIMA models to fit more complex data.
Manipulate Images with Blobs! BlobGAN Explained
       
Manipulate Images with Blobs! BlobGAN Explained. A GAN model that uses simple blobs to manipulate objects in images… Originally published on louisbouchard.ai, read it 2 days before on my blog! BlobGAN allows for unreal manipulation of images, made super easy by controlling simple blobs. As the authors shared in their results, you can even create novel images by duplicating blobs, creating images unseen in the dataset, like a room with two ceiling fans! The title says it all: BlobGAN uses blobs to disentangle objects in a scene.
Apple Is Discontinuing The iPod After 20 Years Of Service
       
APPLE IPOD. Apple Is Discontinuing The iPod After 20 Years Of Service. The current iPod Touch will be the last iPod that Apple will ever make, for now at least. (Photo by insung yoon on Unsplash.) Well, it’s finally happening. Apple is officially going to discontinue the production of the iPod after 20 years of providing easily accessible music on-the-go to millions and…
Finland, NATO and the left
       
Finnish PM and President announce historic move. Finland, NATO and the left. Left Alliance leader confronts hard choices facing socialists. Finland’s decision to begin the process of joining NATO is a huge moment of change for Europe, and a moment of maturity for the left. The move is not only led by the social-democratic prime minister, Sanna Marin, but has seen Li Andersson, the leader of the radical left, overcome opposition within her own party.
On Beauty: White Hot, White Noise, And Are Beauty Standards Really Better Now?
       
On Beauty: White Hot, White Noise, And Are Beauty Standards Really Better Now? (Photo by Jessica Felicio on Unsplash.) So, I recently watched the Netflix documentary, White Hot. It’s about the mall retail chain Abercrombie & Fitch and its deplorable hiring practices from the early 2000s to around 2013. In the end, the store had to atone for its hiring practices, of only hiring the young, white, and attractive, and the provocative images it used…
What Science Will Lose to an Abortion Ban
       
What Science Will Lose to an Abortion Ban. Consequences will impact IVF procedures, emergency room medicine, and advances in cell-cloning technology. Late one night, a young patient was admitted to the ER in terrible pain. She felt it in her abdomen and chest, and soon had difficulty breathing. She grew pale, then paler. Luckily, an ER doctor sounded the alarm.
Why “Defund the Police” Got Bad Press But “Defunding Public Schools” Didn’t
       
EDUCATION + POLICING. Why “Defund the Police” Got Bad Press But “Defunding Public Schools” Didn’t. Unpacking the hypocrisy one bad-faith talking point at a time. (Photo by MChe Lee on Unsplash.) The phrase “defund the police” has become a lasting reminder of Congress’s colossal failure to address criminal justice reform. Even though the majority of Americans agreed…
Why Siri Sounds A Little Off…
       
In this sense, Siri picks up speech much like a human, by hearing sounds and recognizing the differences between sounds. How Siri Speaks: As discussed, much of the consumer appeal of Siri comes from the fact that Siri is able to speak back to us. Afterwards, Siri selects certain sounds from its database and stitches them together to form the speech which is produced. We know that this speech sounds weird to us because it doesn’t reflect the manner in which humans speak. “How Apple Finally Made Siri Sound More Human.” Wired, Conde Nast, 7 Sept. 2017, https://www.wired.com/story/how-apple-finally-made-siri-sound-more-human/.
No, You Don’t Actually Want Every Fetus to be Born
       
No, You Don’t Actually Want Every Fetus to be Born. The hypocrisy of enforcing pregnancy in a country where many people don’t even qualify to be egg donors. (Photo by Gayatri Malhotra on Unsplash.) You don’t actually want every embryo or fetus to have a chance at life. Just look at the requirements we set for egg donors.
Miracle in the Air: Air Traffic Controllers Guide Passenger to Land Plane Safely
       
Miracle in the Air: Air Traffic Controllers Guide Passenger to Land Plane Safely. Clockwise from top left: Photo 1: Controller Robert Morgan, left, with the passenger he helped land a single-engine Cessna safely after an unusual in-flight emergency. The passengers had no flying experience, and what unfolded thereafter was truly remarkable thanks to a team of air traffic controllers. Joshua Somers, operations supervisor at Palm Beach air traffic control facility, rushed to provide help in tracking it. Flores advised the passenger to change his radio frequency to Palm Beach air traffic control, but the passenger did not know how to change frequencies. Flores reassured the passenger that a controller at the Palm Beach air traffic facility would help him.
Google AI Blog: Language Models Perform Reasoning via Chain of Thought
       
In “Chain of Thought Prompting Elicits Reasoning in Large Language Models,” we explore a prompting method for improving the reasoning abilities of language models. With chain of thought prompting, language models of sufficient scale (~100B parameters) can solve complex reasoning problems that are not solvable with standard prompting methods. We evaluate both the LaMDA collection of language models ranging from 422M to 137B parameters, as well as the PaLM collection of language models ranging from 8B to 540B parameters. Conclusions: Chain of thought prompting is a simple and broadly applicable method for improving the ability of language models to perform various reasoning tasks. Broadening the range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.
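The difference between standard prompting and chain of thought prompting lies only in how the few-shot exemplar is written. The sketch below builds both prompt strings; it does not call any model. The exemplar is a widely quoted tennis-ball example, and the "Q:"/"A:" layout and function names are assumptions made for illustration.

```python
EXEMPLAR = {
    "question": (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    ),
    # Intermediate reasoning steps written out in natural language.
    "reasoning": (
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    ),
    "answer": "The answer is 11.",
}


def standard_prompt(question):
    """Few-shot prompt whose exemplar shows only the final answer."""
    return (
        f"Q: {EXEMPLAR['question']}\n"
        f"A: {EXEMPLAR['answer']}\n\n"
        f"Q: {question}\nA:"
    )


def chain_of_thought_prompt(question):
    """Few-shot prompt whose exemplar shows the reasoning before the answer."""
    return (
        f"Q: {EXEMPLAR['question']}\n"
        f"A: {EXEMPLAR['reasoning']} {EXEMPLAR['answer']}\n\n"
        f"Q: {question}\nA:"
    )


new_question = "A shop has 23 apples. It uses 20 and buys 6 more. How many apples are left?"
print(chain_of_thought_prompt(new_question))
```

Either string would be sent to a language model as-is; the hope with the second form is that the model imitates the exemplar and writes out its own intermediate steps before answering.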
Google AI Blog: Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate
       
There are two key bottlenecks towards building functioning translation models for the long tail of languages. Both of these challenges need to be addressed for translation models to reach sufficient quality. The amount of monolingual data per language versus the amount of parallel (translated) data per language. A small number of languages have large amounts of parallel data, but there is a long tail of languages with only monolingual data. Our additional innovation is to use the same special tokens for both the monolingual MASS task and the translation task.
Achieve in-vehicle comfort using personalized machine learning and Amazon SageMaker
       
Once again, the personalized model beat the baselines (see the following table), reinforcing the conclusion that the personalized model is best.
Model                        MSE (lower is better)
Non-personalized baseline    60.885
Personalized baseline        69.902
Non-personalized model       24.823
Personalized model           18.059
Conclusion: In this post, we demonstrated how to apply machine learning to achieve personalized in-vehicle thermal comfort. Yifu Hu is an Applied Scientist in the Amazon Machine Learning Solutions lab, where he helps design creative ML solutions to address customers’ business problems in various industries. Jennifer Zhu is an Applied Scientist from the Amazon AI Machine Learning Solutions Lab. Ivan Sosnovik is an Applied Scientist in the Amazon Machine Learning Solutions Lab.
Estimating the Performance of an ML Model in the Absence of Ground Truth
       
What makes detecting such failures especially challenging is the fact that the ground truth information can be absent or delayed. Predicting the performance of an ML model without ground truth: Now we will explain how we can estimate the performance of a model, even when we do not have the ground truth (targets). In contrast to, for example, Kaggle competitions, we do not always have the ground truth to calculate the performance of our model. How to do it: Confidence-based Performance Estimation (CBPE) is an algorithm allowing us to estimate the model’s performance in the absence of ground truth. Using Confidence-based Performance Estimation we can estimate the performance of the model even when we do not have access to (reliable) ground truth.
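The intuition behind confidence-based estimation can be shown for accuracy on a binary classifier. This is a simplified sketch of the idea, not the implementation of any particular library, and it assumes the predicted probabilities are well calibrated. The function name and the 0.5 threshold are illustrative choices.

```python
def estimate_accuracy(calibrated_probs, threshold=0.5):
    """Estimate accuracy from predicted probabilities alone (no labels).

    If a probability p is calibrated, a positive prediction is correct with
    probability p and a negative prediction with probability 1 - p, so the
    expected accuracy is the average of those per-row chances.
    """
    total = 0.0
    for p in calibrated_probs:
        predicted_positive = p >= threshold
        total += p if predicted_positive else 1.0 - p
    return total / len(calibrated_probs)


# Hypothetical scores for four unlabeled production rows.
scores = [0.9, 0.8, 0.3, 0.1]
print(estimate_accuracy(scores))
```

The estimate is only as good as the calibration: if the model is overconfident on new data, the estimated accuracy will be too high.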
Now You Can Use Python to Build Client-Side Applications
       
You still have to import it inside the py-script tag that follows. The py-script tag is where your Python script lives. An excellent alternative is to link our Python file to the py-script tag instead of writing it inside the HTML block. In the following file, we’ve added a separate div to render the output of our script file. Also, instead of writing the Python script directly in HTML, we write it in a separate regular Python file.
Imputing Missing Data with Simple and Advanced Techniques
       
In this article, we will see how to impute (replace) missing data with simple and advanced techniques. Later, we will explore advanced multivariate techniques and learn how to impute missing values using machine learning with KNN and MICE. Depending on its volume, missing data can harm the findings of any data analysis or the robustness of machine learning models. In the matrix view below, we can see the missing values with blank lines and not missing values with black lines. As we know AvgSpeed column doesn’t have missing values, and we replaced missing values in the MaxSpeed column with column mean.
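Mean imputation, the simple technique mentioned for the MaxSpeed column, can be sketched with the standard library. The values below are invented, and missing entries are represented as None by assumption.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in values]


# Hypothetical MaxSpeed readings with two missing values.
max_speed = [30.0, None, 50.0, None, 40.0]
imputed = impute_mean(max_speed)
print(imputed)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason multivariate methods such as KNN or MICE are often preferred when a lot of data is missing.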
Time Series Forecasting with ARIMA , SARIMA and SARIMAX
       
Autoregressive Component, AR(p): The autoregressive component of the ARIMA model is represented by AR(p), with the p parameter determining the number of lagged series that we use. ARIMA (ARIMA formula, by author): The ARIMA model is an ARMA model yet with a preprocessing step included in the model that we represent using I(d). So, an ARIMA model is simply an ARMA model on the differenced time series. SARIMA, ARIMAX, SARIMAX Models: The ARIMA model is great, but to include seasonality and exogenous variables in the model can be extremely powerful. Since the ARIMA model assumes that the time series is stationary, we need to use a different model.
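The I(d) preprocessing step is just repeated differencing of the series. Here is a minimal sketch in plain Python; the function name and the example series are illustrative.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'[t] = y[t] - y[t-1]."""
    values = list(series)
    for _ in range(d):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# A series with a quadratic trend: one difference leaves a linear trend,
# a second difference leaves a constant.
trend = [1, 4, 9, 16, 25]
print(difference(trend, d=1))
print(difference(trend, d=2))
```

Each round of differencing shortens the series by one point, and an ARMA model would then be fitted to the differenced values.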
Exploring the ML Tooling Landscape (Part 2 of 3)
       
Exploring the ML Tooling Landscape (Part 2 of 3). Current ML tooling and adoption. (Photo by Possessed Photography on Unsplash.) In the previous blog post in this series, we examined overall machine learning (ML) maturity in industry with a specific focus on machine learning operations (MLOps). In this blog post, we will consider the implications for tooling adoption in industry and the wider ML tooling market. The Lay of the Land: It is by no means an overstatement to talk of a crowded ML tooling landscape. However, we appear to be entering a new phase of ML tooling and ML/AI more generally. Wrap-up: In this blog post, we continued from our previous discussion of ML maturity in industry showing the link between the generally low level of sophisticated ML adoption and both the number of and incompleteness of ML tooling offerings.
What Are the Most Important Preprocessing Steps in Machine Learning and Data Science?
       
What Are the Most Important Preprocessing Steps in Machine Learning and Data Science? Data science and machine learning are much talked about right now, and companies are looking for data scientists and machine learning engineers to handle their data and make significant contributions to them. Outliers: Datasets often contain outliers, or data points whose values are quite far from what was actually expected in our data. Binning: It is important to address the outliers present in the data that impact the performance of machine learning models. While clustering can be used for unsupervised machine learning, it can also be used for supervised machine learning.
Spotlighting: A Visual Approach to Precise Clustering Interpretation
       
Spotlighting: A Visual Approach to Precise Clustering Interpretation. On spotlights, radar charts, and how to make sense of your clusters. Cluster interpretation (image by author). Understanding the meaning of clusters is maybe more important than making the clusters. The visual approach described here uses two visual techniques: radar chart and spotlight. The color of the groups corresponds to the clusters: red, green, and blue. Till now, we have a meaning associated with each cluster, such as the red cluster is for a small-sized car. Spotlighting the red cluster. Spotlight for red cluster (image by.
Using Python to Find Outliers With IQR: A How-To Guide
       
Using Python to Find Outliers With IQR: A How-To Guide. Here’s how to find (and remove) outliers in your data set using IQR. (Image Source: Author.) Every data set has issues, or points that don’t make sense. Including outliers in your data analysis skews your data set and negatively impacts the results of your analysis. Let’s talk about how to do that using IQR (interquartile ranges). You can see they’re quite close to 95 percent and five percent of the upper range of the data set which, in a non-normal data set, is what we expect. When using the IQR to remove outliers you remove all points that lie outside the range defined by the quartiles +/- 1.5 * IQR.
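The rule stated above (drop points outside the quartiles +/- 1.5 * IQR) can be sketched with the standard library. The sample data and the choice of the "inclusive" quartile method are assumptions for illustration; other quartile definitions give slightly different fences.

```python
from statistics import quantiles


def remove_outliers_iqr(data, k=1.5):
    """Keep only points inside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if lower <= x <= upper]


# Hypothetical measurements with one obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 100]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

The multiplier k = 1.5 is the conventional default; a larger value removes fewer points.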
Build an Event-Driven Neural Style Transfer Application Using AWS Lambda
       
In this blog post, we’ll see how, by building a neural style transfer application using Flyte and AWS Lambda. Our neural style transfer application will leverage the “event-driven feature” of AWS Lambda. Let’s look at how we could stitch the pipeline automation and event-driven service together using Flyte and AWS Lambda. An overview of the application (Image by Author). Application Code: Neural style transfer is applying the style of the style image onto the content image. The preprocess_img task downloads the content and style image files, and resizes them using the load_img function.
Image Processing: Trivialised. Deep learning, for everything, WHY?
       
(Photo by Jem Sahagun on Unsplash.) The significance of image processing is diminishing, and it is replaced by deep learning for many tasks such as image classification and object detection. For some, I would agree that deep learning provides far better results than image processing. A similar scenario occurs when deep learning is employed in place of image processing. This raised self-doubt about my decision to implement image processing as the majority of them implemented the convolutional neural networks for image classification. (Images: Positive Images from Concrete Crack Images for Classification; Negative Images from Concrete Crack Images for Classification.) I have created an image processing algorithm using OpenCV in the next step.
Labeling And Visualizing Images For Object Detection
       
Labeling And Visualizing Images For Object Detection. The classic deep learning for computer vision example project starts out with a dataset containing images, and labels. However, when computer vision is needed for solving business problems, data is usually unlabeled and labeling the data is itself a challenge. This article walks through labeling images at scale and associated challenges. (Image from Google Maps with Annotated Bounding Boxes | Skanda Vivek.) The classic example of a deep learning computer vision project starts out with a dataset containing images, and labels. It turns out that there are multiple image labeling service providers. Look out for more blogs that talk about other key aspects of business focused end-to-end deep learning!
Time Series Forecasting with ARIMA Models In Python [Part 1]
       
Time Series Forecasting with ARIMA Models In Python [Part 1]. A practical guide for time series forecasting using ARIMA models in Python. Time series data is one of the most common data types in the industry and you will probably be working with it in your career. This series will consist of 9 articles: Manipulating Time Series Data In Python Pandas [A Practical Guide]; Time Series Analysis in Python Pandas [A Practical Guide]; Visualizing Time Series Data in Python [A practical Guide]; Time Series Forecasting with ARIMA Models In Python [Part 1] (You are here!); Time Series Forecasting with ARIMA Models In Python [Part 2]; Machine Learning for Time Series Data [A practical Guide]; Deep Learning for Time Series Data [A practical Guide]; Time Series Forecasting project using statistical analysis, machine learning & deep learning. ARMA Models: We will start with a small introduction to stationarity and how this is important for ARMA models. Fitting time series models: We had a quick look at fitting time series models in the last section but let’s have a closer look.
How To Approximate the Results of Your Sample Set (Empirical Rule vs. Chebyshev’s Formula)
       
How To Approximate the Results of Your Sample Set (Empirical Rule vs. Chebyshev’s Formula). The empirical rule is a powerful instrument to capture the distribution of your observations within the dataset. Empirical Rule Says: 68% of the observations lie within 1 standard deviation range from the mean. 95% of the observations lie within a 2 standard deviation range from the mean. 99.7% of the observations lie within a 3 standard deviation range from the mean. We cannot be confident with the empirical rule.
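The two rules compared in the title can be computed side by side. The sketch below uses the standard normal coverage erf(k / sqrt(2)), which is where the 68/95/99.7 figures come from, and Chebyshev's bound 1 - 1/k^2, which holds for any distribution with finite variance. The function names are illustrative.

```python
import math


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k):
    """Minimum share within k standard deviations for ANY distribution (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, round(normal_coverage(k), 4), round(chebyshev_lower_bound(k), 4))
```

Chebyshev's bound is much looser (at least 75% within 2 standard deviations rather than about 95%), which is the price of making no assumption about the shape of the data.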
My Amazon Data Scientist Interview Questions — and Answers!
       
My Amazon Data Scientist Interview Questions — and Answers! I have been at Amazon as a Data Scientist for over 6 months now and this is what my interview process was like. Screening interview with HR: The first interview I had was over the phone and was nearly 30 minutes long. Think of advantages and disadvantages of your model - Remember hyper-parameter optimization - Final variable you'll be tracking (basically result) 3. When I was preparing for it myself, I honestly couldn’t find a single article that encompassed everything I needed to know regarding the interview process.
An In-Depth Tutorial on the F-Score For NER
       
The F-Score is a very useful metric for classification algorithms, balancing between False Positives (through Precision) and False Negatives (through Recall). NER as a Classification Task: In NER, the goal is to predict which elements in a piece of text are ‘Named Entities’, e.g. A good NER model would be able to identify the named entities as shown below: Note that the non-NER words are represented as ‘Other’. Now in a standard classification task, we would simply take the argmax of the last dimension, and thus get the predicted label for each word. This is because in NER we are not concerned with how well each individual token is labeled, but instead, we are interested in how well each sequence of NER labels is labeled.
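Scoring whole entity sequences rather than individual tokens can be sketched as follows. The BIO tagging scheme (B-/I- prefixes, "O" for other) and the exact-match criterion are assumptions for illustration; the excerpt does not say which scheme or matching rule the article uses.

```python
def extract_entities(tags):
    """Return the set of (label, start, end) spans in a BIO tag sequence."""
    entities = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if continues_span:
            continue
        if label is not None:
            entities.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def entity_f1(true_tags, pred_tags):
    """F1 over exactly matching entity spans (not over tokens)."""
    true_entities = extract_entities(true_tags)
    pred_entities = extract_entities(pred_tags)
    true_positives = len(true_entities & pred_entities)
    precision = true_positives / len(pred_entities) if pred_entities else 0.0
    recall = true_positives / len(true_entities) if true_entities else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]
print(entity_f1(gold, pred))
```

In this example three of four tokens are labeled correctly, yet only one of the two entities is recovered exactly, so the entity-level F1 is lower than the token accuracy.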
What? Your Assets at Coinbase are Not Safe?
       
Your Assets at Coinbase are Not Safe? The coming crypto crash may take down more than the tokens, but this is actually good for the blockchain. (Photo by PiggyBank on Unsplash.) Retail crypto investors, already reeling from recent losses, were additionally shocked this week to learn that should Coinbase or one of the other exchanges go bankrupt, the assets in their accounts may not be safe…
The Trouble With Being Ageless
       
The Trouble With Being Ageless. (Photo by Rod Long on Unsplash.) A student was perusing my classroom “wall of fame” the other day, her eyes skipping rapidly over the photos of the many students I have taught throughout my career. My younger self appears in many of the images, standing beside proud graduates, a museum of the many versions of my own past selves. Hair long, then short, then long again. Face a bit fuller and then slightly narrowed. Eyes progressively more crinkled at the edges.
What it’s like to live in Kyiv during the war
       
What it’s like to live in Kyiv during the war. It’s been over two months since Russia started its brutal military invasion of Ukraine and, for a few weeks now, I’ve been living in Kyiv. I’ve written before on this war’s broader context, but in this post I’d like to give a sense of the daily life during the war. Also, the city of Kyiv itself hasn’t suffered from Russian occupation, unlike some of the towns around Kyiv like Bucha. 3) Although daily life in Kyiv is now relatively stable, there are still visible signs of the war raging and the martial law being in place. On some irrational level, getting rid of masks was a way for me to bring some joy into otherwise pretty grim daily life.
Eliminating Federal Abortion Protections Could Render the U.S. Government Illegitimate
       
Eliminating Federal Abortion Protections Could Render the U.S. Government Illegitimate. If the Supreme Court overturns Roe v. Wade and Planned Parenthood v. Casey, it could lead to a legitimacy crisis unlike anything we’ve seen. This isn’t politics; it’s math.
What To Do If You Get Covid
       
Dr. Robin’s Covid-19 Updates. What To Do If You Get Covid. Version Three Point Uh-Oh! (Photo by Medakit Ltd on Unsplash.) Omicron case numbers from the very contagious BA.2 and BA.3 variants (and soon BA.4 and BA.5) continue to move upwards, now climbing in over forty states. The good news: Vaccines/boosters no longer are perfect at stopping us from getting Covid but they are doing a great job at…
Walter Tull: From Footballer to Soldier
       
Walter Tull: From Footballer to Soldier. One hundred years ago today pioneering black footballer and British Army officer Walter Tull died. In recent years the story of Walter Tull has thankfully become much more widely known. On both the fields of football and battle Walter Tull was a pioneer. It was this that caused his eye to fall on the struggling Walter Tull and see an opportunity for both Tull and Northampton Town to benefit. Walter Tull was a soldier, a leader, a hero and a bloody good footballer.
The kids are not ok
       
Today I went to give a climate talk at my old high school in Geneva — and was given a masterclass in our failings. I went, with a friend, racing on our bikes from school to school to school, as many as we could reach during the morning. My old high school does not look anything like this. The grown-ups (and their grown-ups) know they are hurting and harming the youth and they are still doing it. No wonder the high school students were muttering while I was pontificating to them about emissions and degrees of warming and impacts.
The Assassination of Amber Heard
       
The Assassination of Amber Heard. (Image: Amber Heard in Fairfax County Circuit Court, April 2022.) When I saw the first news article about Johnny Depp’s abuse allegations, I quickly and intentionally scrolled past it. (Image: Texts from Stephen Deuters to Amber Heard in 2014.) Amber Heard never claimed to be a perfect victim. (Image: Amber Heard after being granted a restraining order, May 2016.) Despite now being a court-proven victim, Amber continued to be harassed, this time on a larger scale and with more ammo to attack her with.
Create video subtitles with Amazon Transcribe using this no-code workflow
       
This post walks you through setting up a no-code workflow for creating video subtitles using Amazon Transcribe within your Amazon Web Services account. Solution overview: This post walks through a no-code workflow for generating subtitles using Amazon Simple Storage Service (Amazon S3) and Amazon Transcribe. If you prefer a video walkthrough, refer to the Amazon Transcribe video snacks episode Creating video subtitles without writing any code. Before you get started, review the Amazon Transcribe and Amazon S3 pricing pages for service pricing. Create a transcription job: With the input file ready in Amazon S3, we now create a transcription job in Amazon Transcribe.
Understand Bias and Variance in Causal Inference with Linear Regression
       
Understand Bias and Variance in Causal Inference with Linear Regression. Discussion of omitted variables, confounding variables, irrelevant variables, and multicollinearity. (Image by Author.) In my previous article, Causal Inference: Econometric Models vs. A/B Testing, we discuss how to use an econometric model, namely, linear regression to investigate the causal relationship between the treatment variable and the response variable while controlling other covariates. In this article, we’ll discuss some common issues when designing a linear regression: Omitting Important Variables and Including Irrelevant Variables. If a treatment effect from linear regression is biased, it means we have an inaccurate causal effect. The normality assumption of the error term is optional for a linear regression model, but is recommended for the task of causal inference.
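Omitted-variable bias is easy to see in a small simulation. The data-generating process below (a true treatment effect of 2.0, a confounder z that drives both treatment and outcome, and the sample size) is entirely made up for illustration, and the controlled estimate is obtained by residualizing the treatment on z, which for a single coefficient gives the same result as including z in the regression.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept), i.e. cov(x, y) / var(x)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]                 # confounder
t = [zi + random.gauss(0, 1) for zi in z]                  # treatment depends on z
y = [2.0 * ti + 3.0 * zi + random.gauss(0, 1) for ti, zi in zip(t, z)]

# Regression that omits z: the treatment coefficient absorbs part of z's effect.
biased_effect = slope(t, y)

# Controlling for z: remove the part of t explained by z, then regress y on the rest.
mean_z = sum(z) / n
b_tz = slope(z, t)
t_residual = [ti - b_tz * (zi - mean_z) for ti, zi in zip(t, z)]
controlled_effect = slope(t_residual, y)

print(round(biased_effect, 2), round(controlled_effect, 2))
```

Under these assumptions the short regression should land near 3.5 rather than the true 2.0, while the controlled estimate should be close to 2.0.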
Understanding l1 and l2 Regularization
       
Understanding l1 and l2 Regularization. An overview of regularization in the Linear Regression model. The linear regression formula is: (image: the linear regression formula). When overfitting occurs in linear regression, we can try to regularize our linear model. Regularization is the most used technique to penalize complex models in machine learning: it avoids overfitting by penalizing the regression coefficients that have high values. L1 Regularization, or Lasso Regularization: Lasso (Least Absolute Shrinkage and Selection Operator) regression performs an L1 regularization, which adds a penalty equal to the absolute value of the magnitude of the coefficients, as we can see in the image above in the blue rectangle (lambda is the regularization parameter). Conclusions: As we have seen, regularization has to be performed when we have problems with the overfitting of our model.
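The two penalties differ only in how they measure coefficient size. A minimal sketch, with illustrative function names and made-up numbers, where lam stands for the regularization parameter lambda:

```python
def l1_penalty(coefficients, lam):
    """Lasso penalty: lambda times the sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in coefficients)


def l2_penalty(coefficients, lam):
    """Ridge penalty: lambda times the sum of squared coefficient values."""
    return lam * sum(w * w for w in coefficients)


def penalized_loss(y_true, y_pred, coefficients, lam, kind="l1"):
    """Sum of squared errors plus the chosen penalty term."""
    squared_error = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    penalty = l1_penalty if kind == "l1" else l2_penalty
    return squared_error + penalty(coefficients, lam)


weights = [1.0, -2.0, 3.0]
print(l1_penalty(weights, lam=0.5))  # 0.5 * (1 + 2 + 3)
print(l2_penalty(weights, lam=0.5))  # 0.5 * (1 + 4 + 9)
```

Because the L2 penalty squares each coefficient, it punishes large coefficients more heavily, while the L1 penalty can push small coefficients all the way to zero, which is why Lasso is associated with feature selection.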
Using AI to make my own Smart Assistant App
       
Applied Machine LearningUsing AI to make my own Smart Assistant AppSmart Assistant App made using Python and GPT-3Image by maxuser in ShutterboxSmart assistants are becoming more and more popular, with products like Apple’s Siri, Amazon’s Alexa, and Google Home. When making a smart assistant, I want to make a web app with a Graphical User Interface where I can interact with the smart assistant using my voice. MethodIn order to make the smart assistant, I use OpenAI’s GPT-3. The smart assistant can take more complicated queries:Image by AuthorThe smart assistant has no issues working with basic geometry or with acronyms. The bot is also contextually aware of modern companies:Image by AuthorConclusionsIn this article, I walk through how I made a smart assistant app.
Unhappy about your time series anomalies? Synthesize them!
       
Unhappy about your time series anomalies? Generate your own multivariate time series datasets with realistic anomalies. Photo by Maxim Berg on Unsplash. If you’ve worked with anomaly detection problems on time series data, you may have searched for annotated datasets that include relevant anomalies. And you may have struggled in this search, especially when looking for multivariate time series data suitable for researching industrial IoT use cases. Here is the function I use to generate random walks: my time series starts with an initial value (start) and then I randomly add a quantity (step). Here is the associated code that produced this plot: Now I will use this function to add two random level shifts to three signals selected at random.
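The author's own function is not reproduced in this summary, so the following is only a sketch of the idea under stated assumptions: each increment is drawn uniformly from [-step, step], and a level shift adds a constant offset from a given position onward. The names `random_walk` and `add_level_shift` are hypothetical.

```python
import random


def random_walk(n_points, start=0.0, step=1.0, seed=None):
    """Generate a random walk: begin at `start`, then add a random increment
    drawn uniformly from [-step, step] at every time step (an assumption)."""
    rng = random.Random(seed)
    values = [start]
    for _ in range(n_points - 1):
        values.append(values[-1] + rng.uniform(-step, step))
    return values


def add_level_shift(series, position, magnitude):
    """Return a copy of `series` with `magnitude` added from `position` onward."""
    return [v + magnitude if i >= position else v for i, v in enumerate(series)]


if __name__ == "__main__":
    signal = random_walk(200, start=10.0, step=0.5, seed=42)
    shifted = add_level_shift(signal, position=120, magnitude=5.0)
    print(signal[119], shifted[119])  # unchanged before the shift
    print(signal[120], shifted[120])  # offset by the shift magnitude
```

Because the shift position and magnitude are known, the synthetic anomaly comes with its label for free, which is the point of synthesizing anomalies instead of searching for annotated data.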
Advancing Machine Intelligence: Why Context Is Everything
       
Advancing Machine Intelligence: Why Context Is EverythingImage credit: REDPIXEL via Adobe Stock. This blog will discuss the significance of context in ML, and how late binding context could raise the bar on machine enlightenment. Building Context Into Machine LearningSo how could it be possible to incorporate and leverage late binding context in machine learning, at scale? However, the next level of machine intelligence will require significant advances in incorporating the ability to dynamically comprehend and apply the multiple facets of late-binding context. When considered within the scope of highly-aware, in-the moment interactive AI, context is everything.
The Curse of Dimensionality; More is not always better!
       
The Curse of DimensionalityWith the advent of technology, from Alexa to Autonomous cars, everything is fueled by Data-driven technology. Intuitively, we assume that more data means the models can learn better and more insights can be driven. This is generally mentioned as the Curse of Dimensionality! The curse of dimensionality is a curse in life as well. Credits: The Technological Aspects of the Curse of Dimensionality I learned from the lectures by Dr Iain Styles & Dr Kashif Rajpoot, faculty of Computer Science, University of Birmingham.
A Super-Fast Way to Loop in Python
       
A Super-Fast Way to Loop in Python. Do you think Python is slow? Although it’s a fact that Python is slower than other languages, there are some ways to speed up our Python code. If we write code that consumes little memory and storage, we’ll not only get the job done but also make our Python code run faster. Here’s a fast and also a super-fast way to loop in Python that I learned in one of the Python courses I took (we never stop learning!). A faster way to loop using built-in functions: A faster way to loop in Python is using built-in functions.
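As an illustration of the built-in approach, the sketch below sums the integers below n once with an explicit `for` loop and once with the built-in `sum`. The function names are hypothetical and the article's own benchmark may differ; timings depend on the machine, so none are quoted here.

```python
import timeit


def total_with_loop(n):
    """Sum the integers 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def total_with_builtin(n):
    """Same result, but the iteration happens inside the built-in sum()."""
    return sum(range(n))


if __name__ == "__main__":
    n = 100_000
    loop_time = timeit.timeit(lambda: total_with_loop(n), number=20)
    builtin_time = timeit.timeit(lambda: total_with_builtin(n), number=20)
    print(f"explicit loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

The built-in version typically runs faster because the loop body executes in C rather than in interpreted bytecode.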
A new tool for explainable AI
       
A new tool for explainable AI. Explaining models trained in Julia, Python and R through counterfactuals. Turning a 9 (nine) into a 4 (four). Explainable AI typically involves models that are not inherently interpretable but require additional tools to be explainable to humans. Counterfactuals for image data? To introduce counterfactual explanations I used a simple binary classification problem in my previous post. This time we are going to step it up a notch: we will generate counterfactual explanations for MNIST data. “Generating Interpretable Counterfactual Explanations by Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64.
Escape Fantasies of the Tech Billionaires
       
Escape Fantasies of the Tech BillionairesMy Medium piece Survival of the Richest has grown into a whole bookWe always knew but now we know. The tech elite mean to leave us all behind. I learned this when I traveled to a remote resort to deliver what was supposed to be a talk for a group of tech investors. It turned out to be something of a “consult” to five…
It’s time we fix the unethical design of cookie consent windows
       
DESIGN PRINCIPLESIt’s time we fix the unethical design of cookie consent windowsHow can we design an ethical and transparent cookie consent window instead of forcing users to accept all cookies? Now, let’s examine the cookie consent window designs of two companies (N26 and Revolut) and learn about the design techniques they use. As soon as we enter N26’s site, the cookie consent window opens and we cannot use the site. Likewise, Revolut shows the cookie consent window when we enter the site and we cannot use the site. However, if you are going to use it, you must ethically design the cookie consent window, and be transparent to the users.
Why Every Main Street Looks The Same
       
Why Every Main Street Looks The SameHow Municipal Policy & Financial Incentives turned Main Street Into Chain StreetA retail strip with three well known chains that could be anywhere in the US. Source: Marcus & MillichapWalk down the Main Street of any city or town today and you might notice something off. It’s not that the roads are too wide (though they are), or that the sidewalks are too narrow (though they are, too), or even that there’s not much to walk towards (though this is…
Your Git Commit History Should Read Like a History Book. Here’s How.
       
Your Git Commit History Should Read Like a History Book. We must answer two questions to improve our commit history: What makes a good commit message? Use git hooks to enforce Conventional Commits. You can change the behavior of git with git hooks. For example, this hook checks if the commit message is at least ten characters long: A simple commit message hook. > git commit -am "abc" [Commit message] abc The commit message is too short!
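The hook itself is not included in this summary. A minimal sketch of a comparable check in Python follows, assuming the script is saved as `.git/hooks/commit-msg` and made executable. Git passes the path of the file holding the commit message as the first argument, and a non-zero exit status aborts the commit. The helper names and the `MIN_LENGTH` constant are hypothetical.

```python
#!/usr/bin/env python3
import sys

MIN_LENGTH = 10  # assumed threshold, matching the ten characters in the text


def is_long_enough(message, min_length=MIN_LENGTH):
    """Return True if the stripped commit message has at least min_length characters."""
    return len(message.strip()) >= min_length


def run_hook(message_path):
    """Read the commit message file and return the exit status for git."""
    with open(message_path, encoding="utf-8") as handle:
        message = handle.read()
    print(f"[Commit message] {message.strip()}")
    if not is_long_enough(message):
        print("The commit message is too short!")
        return 1  # non-zero status makes git abort the commit
    return 0


# Git invokes the hook with exactly one argument: the message file path.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(run_hook(sys.argv[1]))
```

The same structure extends to a Conventional Commits check by replacing `is_long_enough` with a pattern match on the message's first line.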
Four Steps to Organizational Change Without the Drama
       
As a leader of an engineering organization in a rapidly growing scale-up, I’ve been through and led my fair share of organizational change. When things change around us, we engage in sensemaking to orient ourselves to the new situation. We try to understand what this change means to us and how we should act. But, when things make sense…When organizational change makes sense, it’s because it’s expected. If you let too much time pass between any of the steps then information will leak and rumors will spread.
Standing Tall in the Era of iPosture
       
Standing Tall in the Era of iPostureHow technology is reshaping our bodies and what we can do about itPhoto by KAL VISUALS on UnsplashAccording to the Pew Research Center, it is estimated there are over 5 billion people who own mobile phones. With over half of the world’s population engaging with electronic devices, the spotlight has been on what technology can do for us, rather than what it is doing to us. Consequently…
When Your Body Saves Your Life
       
The Body. When Your Body Saves Your Life. I was seventeen years old the second time someone pointed a loaded gun at my face. Of course, it was the police, of course I hadn’t broken any laws or even committed a misdemeanor, of course it was because my skin is black. Of course, it was because I was in the wrong place at the wrong time and that wrong place was the alley behind my apartment building and that wrong time was Los Angeles, California April 29, 1992 4:48 in the afternoon.
As Roe Goes, Why Organized Minorities Beat Disorganized Majorities
       
As Roe Goes, Why Organized Minorities Beat Disorganized MajoritiesUntil the Left invests in local orgs that matter in people’s daily lives and scaling that up, it’s likely to keep losing to the RightRally against Supreme Court nominee Brett Kavanaugh outside the Supreme Court, Washington DC, 2018Amid the deluge of reporting and commentary on the leak of the Supreme Court’s pending ruling overturning the right to abortion…
4 Ways to Find Meaning in Life
       
4 Ways to Find Meaning in LifePhoto by Greg Rakozy on Unsplash“What’s the meaning of life?” is a cliche philosophical question, but it touches on something fundamental about how humans relate to the world around them. People want to know that there’s significance to their lives, but not necessarily in any grandiose sense. Most of us just want to feel that there’s value in getting up and being active each day. We search for signs that our existence is a net good in the world, even if only on a…
We’re In For A Decade Of Generational Strife. Here’s How To Navigate It.
       
We’re In For A Decade Of Generational Strife. Here’s How To Navigate It. The physicist Max Planck made many historic breakthroughs, including a discovery that led to quantum theory. Still, he lamented that “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”
Ukraine War, 9 May 2022
       
Ukraine War, 9 May 2022. Good morning everybody! Further east, the 92nd Mech is pushing on Lyptsi and Bairak, defended by a BTG of the 61st Naval Infantry Brigade. Behind it, the 227th Battalion TD and the Sheikh Mansur Battalion (Chechens fighting for Ukraine) are mopping up and securing villages like Vekhnii Saltiv, Zamulivka etc., and collecting lots of prisoners. This was destroyed, but that distracted the Ukrainians away from two other pontoon bridges, constructed further downstream, marked with 1. If I’m to ask, this is now the decisive battle of this, second phase of this war (or, if you prefer: the Russian ‘Plan G’).
Web Scraping Top Movies With Python and Selenium
       
Web Scraping Top Movies With Python and Selenium. How to scrape a website and create a dataset. Is Web Scraping Legal? Some websites allow web scraping and some prohibit access to their websites.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd
Install the Web Driver: # Install the chrome web driver from selenium. Selenium web driver offers a variety of locator functions to locate elements on the web page.
Understanding K-means Clustering: Hands-On with SciKit-Learn
       
Understanding K-means Clustering: Hands-On with SciKit-Learn. Using Python and Google Colab. K-means is an unsupervised machine learning algorithm that tries to find cluster centers that can aggregate certain data points that are closer to any feature or group of features. How does the K-Means algorithm work? Define the centroids (or the initial means): Now the algorithm chooses k random points as centroids and assigns the remaining data points to the closest centroid. Update centroids mean and re-assign observations: When the new mean is computed, some data points may be re-assigned to a different centroid. Apart from building the K-means algorithm, it is important to choose the ideal K-value.
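The two steps described here (assign each point to its closest centroid, then recompute each centroid as the mean of its points) can be sketched without any library. The article itself uses scikit-learn; this standard-library version, with a hypothetical `kmeans` function, is only meant to make the loop visible.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, clusters), where clusters[i] holds the points
    assigned to centroids[i] in the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # choose k random points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # no centroid moved, so assignments are stable
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, groups = kmeans(data, k=2)
    print(centers)
    print(groups)
```

The number of clusters `k` is an input here, which is why choosing a good K-value is a separate problem from running the algorithm.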
Content moderation design patterns with AWS managed AI services
       
The ever-increasing volume, variety, and complexity of UGC make traditional human moderation workflows challenging to scale to protect users. The solution is scalable content moderation workflows that rely on artificial intelligence (AI), machine learning (ML), deep learning (DL), and natural language processing (NLP) technologies. This post reviews how to build content moderation workflows using AWS AI services. To learn more about business needs, impact, and cost reductions that automated content moderation brings to social media, gaming, e-commerce, and advertising industries, see Utilize AWS AI services to automate content moderation and compliance. For additional information, resources, and to get started for free today, visit the AWS content moderation homepage.
Utilize AWS AI services to automate content moderation and compliance
       
You can significantly reduce complexity by using AWS AI capabilities to automate tasks, update prediction models, and integrate human review stages. The following diagram illustrates the architecture of AWS AI services in a content moderation solution. You can use the following AWS AI services for moderation, contextual insights, and human-in-the-loop moderation:Amazon Augmented AI (Amazon A2I) makes it easy to build the workflows required for human review, whether moderation runs on AWS or not. Check out Content Moderation Design Patterns to learn more about how to combine AWS AI services into a multi-modal solution. For additional information about how to contact our sales and specialist teams, find an AWS Partner with content moderation expertise, or to get started for free, please visit our AWS content moderation page.
Let’s Talk about Graph Machine Learning in Biomedical Networks
       
Let’s Talk about Graph Machine Learning in Biomedical NetworksA quick overview of the application of machine learning techniques on biomedical graphsPhoto by Sangharsh Lohakare on UnsplashGraph Machine Learning is already popular, especially, in the field of social networks but it is relatively lesser-known in Biomedicine or more specifically, in the field of Bioinformatics. Graph Machine LearningWe cannot use a standard machine learning algorithm directly on a graph as the information stored in it is high-dimensional and non-euclidean. We have to use dynamic graph embedding methods to map the dynamic graph into low-dimensional space and then perform the tasks. For instance, if we have a dynamic graph with 12 time points, we could have a dynamic graph embedding for all 12 time points (calculated independently for each time point). Final ThoughtsThe research output on Graph Machine Learning especially, graph neural networks is very high these days and we get to see impressive results obtained by these methods.
Collaborative Denoising Autoencoders on PyTorch Lightning
       
Collaborative Denoising Autoencoders on PyTorch LightningAutoencoders are a simple neural network approach to recommendationRecommendation systems are ubiquitous in our digital lives. This is very related to matrix factorization, which learns a latent representation of users and items from the ratings matrix. It resulted in the following ratings matrix. The test loader contains the training matrix and the target matrix. This is arguably one of the simplest algorithms in this space, but you should also try out Variational Autoencoders and Sequence Autoencoders.
Predict Your Model’s Performance (Without Waiting for the Control Group)
       
Predict Your Model’s Performance (Without Waiting for the Control Group)A novel algorithm by NannyML allows estimating the performance of an ML model before the ground truth is available. NannyML’s algorithm allows getting the expected model performance at serving time. Of course, one approach does not exclude the other: you can estimate the expected model performance, then wait one month, observe the actual performance, and finally compare them. Expected model performance vs Observed model performance. Summing upIn this article, we have seen how to reliably predict the expected performance (e.g.
Support Vector Machines (SVMs): Important Derivations
       
Kernel methods use kernels to map the input data space to a higher dimensional space where the data is assumed to be linearly separable. Gif of SVM finding optimal decision boundary in binary classification (GIF by Author). SVMs are all about defining a decision boundary. The closest points to the decision boundary are called support vectors, and they are the only points that affect the decision boundary. In the third, the input data space (x) can be mapped to the feature space using the kernel ϕ(x). The maximization finds weights and biases (w, b) whose decision boundary maximizes the distance between the decision boundary and the closest points.
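The quantity being maximized can be made concrete with a small sketch. For a linear boundary w·x + b = 0, the signed distance of a point x is (w·x + b) / ||w||, and the margin is the smallest label-adjusted distance over the data set. The function names below are hypothetical, labels are assumed to be +1 or -1, and the article's own derivation may use different notation.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def geometric_margin(w, b, points, labels):
    """Smallest label-adjusted distance to the boundary (labels are +1 or -1).

    A positive value means every point lies on its correct side.
    """
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


if __name__ == "__main__":
    w, b = (3.0, 4.0), 0.0
    points = [(1.0, 0.0), (-2.0, -1.0)]
    labels = [1, -1]
    print(geometric_margin(w, b, points, labels))
```

The points that attain this minimum are the support vectors, and an SVM searches for the (w, b) that makes the minimum as large as possible.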
Comprehensive Guide to Python Virtual Environments using Conda for Data Scientists
       
Comprehensive Guide to Python Virtual Environments using Conda for Data Scientists. Guide to Virtual Environments with Conda via Terminal. Image taken by Robert Zunikoff from Unsplash. This article will be a comprehensive guide for Data Scientists towards using Conda to create, export and use virtual environments for your projects. Table of Contents: What are Conda Virtual Environments?; Conda Installation; Create a Virtual Environment (Through Command Line, Through Environment File, Active Environment, Deactivate Environment); Environment List (List of Installed Packages in Environment); Export Virtual Environment; Cloning a Virtual Environment; Deleting Virtual Environment(s); Concluding Remarks; Resources. What are Conda Virtual Environments? Many tools let you create virtual environments, and they vary from one programming language to another. For Python in particular, the two most common ways of creating and using virtual environments are through the package managers pip and conda.
Confidence Calibration for Deep Networks: Why and How?
       
Within this blog post, I will explore the topic of confidence calibration within deep learning, beginning with the motivation behind confidence calibration, why it is important, and how it can be used. I will then overview common methods of measuring confidence calibration, including Brier score, expected calibration error, maximum calibration error, and others. Finally, I will overview existing confidence calibration methodologies within deep learning, focusing upon the methodologies that are most effective and efficient in large-scale applications. Calibration Methodologies: Within this section, I will explore numerous methodologies that have been proposed for improving neural network confidence calibration. Bayesian Neural Networks [3]: Bayesian neural networks, motivated by the fact that infinitely-wide neural networks with distributions over their weights converge to Gaussian Processes (and thus have closed-form uncertainty estimates) [4], can be simply defined as finite neural networks with distributions placed over their weights.
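Of the metrics listed, expected calibration error (ECE) is easy to sketch: predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy in each bin is averaged with weights proportional to bin size. The version below is a minimal illustration with a hypothetical function name. It assigns a confidence on a bin edge to the upper bin, which is one of several conventions and may differ from the post's.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width confidence bins.

    confidences: predicted confidence in [0, 1] for each example
    correct:     truthy if the corresponding prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((conf, is_correct))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions made with 90% confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A well-calibrated model has an ECE near zero; maximum calibration error replaces the weighted average with the largest per-bin gap.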
Teaching a LightGBM How to Count to 10
       
LightGBM is a strong machine learning algorithm. LightGBM also delivers for time series data. The winner of the latest iteration of the M forecasting competitions (M5) used a LightGBM as the learning algorithm. We start by creating a trainable data set, which we call sequence_df (lines 1–10). After building the data set, we fit the LightGBM with it (lines 11–18).
Active Learning: A Practical Approach to Improve Your Data Labeling Experience
       
Hands-on TutorialsActive Learning: A Practical Approach to Improve Your Data Labeling ExperienceA smarter way of human labeling with modALPhoto by Michał Turkiewicz on UnsplashOkay, let’s talk about the one thing which doesn’t gain that much traction in the data science realm: labeling your data. We will see that it is a laborious activity and hence later we seek an alternative such as active learning. As an example of active learning, please take a look at the image below. The final decision boundary by active learning is also pretty decent in this case. You will see that active learning performance will approach that of human labeling using fewer data.
Large Language Model Morality
       
Recently, OpenAI has announced InstructGPT, which seems to be better at moral reasoning tasks. How large is a large language model? The dataset of 180 generated sentences across the 3 LLMs was saved to CSV here. I annotated the dataset, rather quickly, deciding for each row if the generated sentence was Debatable, Nonsensical, TRUE, or FALSE. These results might change on a larger set of generated samples, or by having many people give their opinion of the morality of each generated sentence.
Generating digital signatures with the gait of people
       
We believe such types of applications will take center stage with the continuous advancements in human landmark detection and faster processing. Generation of digital signatures through the manner of one’s walking style could be a promising start. The work done in recognizing the various landmarks in the human body through deep learning models will form a fundamental component of our architecture. Slow walking (GIF by Author)c) Stagnant hand(s):Many times, while walking, one may be holding a bag in one or both hands. Nevertheless, gait analysis combined with facial recognition will be a good mechanism to generate the digital signature of a person.
8 Tips To Build Powerful Deep Learning Models for Visual Similarity
       
The purpose of this post is to share with you my tips on building strong embedding models for visual similarity tasks. Triplet Loss: Triplet loss was introduced in the FaceNet paper by Google in 2015. d(a, n) is high. The triplet loss can be formalized as follows: L = max(d(a, p) - d(a, n) + margin, 0). This loss is by definition lower-bounded by 0. A hard triplet (a, p, n) satisfies this inequality: d(a, n) < d(a, p). I used hard triplets only to optimize the loss. According to the writers of the paper, this method outperformed triplet loss, intra-loss, and inter-loss on the most common face identification benchmarks.
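The formula and the hard-triplet condition translate directly into code. The sketch below assumes d is the Euclidean distance between embedding vectors and uses a margin of 0.2 as an arbitrary default; neither choice is confirmed by this summary, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(d(a, p) - d(a, n) + margin, 0)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def is_hard_triplet(anchor, positive, negative):
    """A hard triplet has the negative closer to the anchor than the positive."""
    return euclidean(anchor, negative) < euclidean(anchor, positive)


if __name__ == "__main__":
    a = (0.0, 0.0)
    easy = triplet_loss(a, positive=(1.0, 0.0), negative=(3.0, 4.0))
    hard = triplet_loss(a, positive=(3.0, 4.0), negative=(1.0, 0.0))
    print(easy, hard)
```

An easy triplet already satisfies the margin and contributes zero loss, which is why training on hard triplets gives a stronger gradient signal.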
UNIKUD: Adding Vowels to Hebrew Text with Deep Learning
       
UNIKUD: Adding Vowels to Hebrew Text with Deep Learning Introducing an open-source Hebrew nakdan (נקדן) tool which uses no rule-based logic. Contents: The Hebrew Writing System Introduction to UNIKUD UNIKUD Datasets Methods Results Limitations and Further Directions Conclusion References1. In order to train UNIKUD to learn how to vocalize Hebrew text, we needed to collect a dataset of Hebrew text with vowel points. As a large deep learning model, UNIKUD is substantially slower at performing inference (adding vowels to text) when it is run on CPU rather than GPU. Conclusion In this project, we presented the UNIKUD model which adds nikud to Hebrew text, an open-source nakdan built with deep learning and using no hand-written rules specific to Hebrew.
The College Kids Are Back to Normal
       
The College Kids Are Back to NormalAt least campus bars are healing …When I was in college at the University of Illinois in the late 1990s, there was nothing quite as pathetic as a townie. Sure, I was here for four years, I had to be, but who were these people who were here all the time? Why would they choose to be in this … waystation? This was my party place, my town for self-discovery. The nerve of them to…
How to Grow Your Child’s Social Skills
       
How to Grow Your Child’s Social SkillsStrategies to help your child with their emotional intelligenceDaniela Dimitrova / PixabayWhether you have a highly extraverted child who constantly talks over others, or an introverted child who has trouble joining in group activities, most children benefit from honing their social skills. Just like walking and talking, interacting with others is something that is learned and…
The Return of 1980s-Era Nuclear-Strike Maps
       
The Return of 1980s-Era Nuclear-Strike MapsI grew up with maps showing how cities would be obliterated by a nuke. They’re backA Nukemap visualization of New York getting hit with a Topol Russian nuclear weaponIt’s never a great time to be a teenager, but the 1980s had their own particular challenges. One of the main ones? Wondering — on a daily basis — if you were going to die, without much warning, in a nuclear strike.
Is Allergy Season Getting Worse Every Year?
       
Is Allergy Season Getting Worse Every Year? Photo: Allef Vinicius / UnsplashEvery spring, the news outlets seem to recycle the same message: This is the worst allergy season on record. But, according to the Asthma and Allergy Foundation, they are actually true. In the last year, 19.2 million adults and 5.2 million children in the United States suffered from hay fever. Allergy season is getting worse every year.
How Lying Affects Linguistic Expression
       
Due to the cognitive load it causes, various patterns of speech tend to appear when a person lies. With this definition in mind, it is now possible to delve into the patterns that arise when a person lies. Lying Pattern #1: ProsodyGiven the cognitive demand of lying, a number of prosodic vocal cues associated with challenging thought processes appear when a person lies. It is thought that this added layer of complexity is limited by the cognitive load associated with lying (Newman, 2003). Lying Words: Predicting Deception from Linguistic Styles.
Can Linkin Park Ever Replace Chester Bennington?
       
Can Linkin Park Ever Replace Chester Bennington? For now, the answer is no. Photo of Chester Bennington from Drew de F Fawkes — Wikipedia CommonsCW: Suicide. If you or anyone you know needs help, the National Suicide Prevention Lifeline is available 24/7 at 1–800–273–8255. You can reach the Crisis Text Line by texting HOME to 741741.
18 Little Stories That Will Have Massive Impact On Your Life
       
18 Little Stories That Will Have Massive Impact On Your LifeWhen I was 18 years old, I was a research assistant to Robert Greene. My job was to find stories he could use in his writing. Nearly seventeen years later, I still use so much of what Robert taught me about finding great stories in researching for my own writing. But the gift has been less in how it has helped me professionally, and more in how it has helped me personally.
11 Stupidly Simple Side Hustles to Earn Extra Money as a Programmer
       
11 Stupidly Simple Side Hustles to Earn Extra Money as a Programmer No BS and please don’t expect blogging, YouTube, freelancing, or Medium. Selling Templates on Notion To be honest, I haven’t sold any Notion templates yet. Yes, I’ve sold Canva templates, digital items on Gumroad and Etsy, and code on Codecanyon. And the bulk of them are unaware that we may use Canva templates and customize them to meet our own demands. There is a page called Canva Creators where you may apply and, if accepted, make money by selling templates on Canva.
About those kill-switched Ukrainian tractors
       
About those kill-switched Ukrainian tractorsWhat John Deere did to Russian looters, anyone can do to farmers, anywhere. Here’s a delicious story: CNN reports that Russian looters, collaborating with the Russian military, stole 27 pieces of John Deere farm equipment from a dealership in Melitopol, Ukraine, collectively valued at $5,000,000. The equipment was shipped to Chechnya…
Democratizing access to large-scale language models with OPT-175B
       
Large language models — natural language processing (NLP) systems with more than 100 billion parameters — have transformed NLP and AI research over the last few years. The parameter count for these smaller-scale models includes 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion, and 30 billion (66 billion to be released soon). While there are many exciting developments in the space of large language models, the limitations and risks these models pose are still not well understood. Without direct access to these models, researchers are also limited in their ability to design detection and mitigation strategies for possible harm, which leaves detection and mitigation in the hands of only those with sufficient capital to access models of this scale. Access the open source code and small-scale pretrained models here, request access to OPT-175B here, and read the paper here.
Predicting Diabetes with Machine Learning — Part II
       
This is the second part of an overview of different Machine Learning models I made to compare them in predicting diabetes, using the famous ‘diabetes dataset’ provided by the scikit-learn library. Finally, I do a graphical analysis of the residuals:
import matplotlib.pyplot as plt
import seaborn as sns
# figure size
plt.figure(figsize=(10, 7))
# residual plot
sns.residplot(x=y_test, y=y_test_pred)
# labeling
plt.title('RESIDUALS VS PREDICTED VALUES')
plt.xlabel('PREDICTED VALUES (DIABETES PROGRESSION)')
plt.ylabel('RESIDUALS')
Residuals VS predicted (by Lasso model) values. I make a scatter chart of the actual values in comparison with the predicted: Actual VS predicted (by polynomial model) values. Finally, also here I want to make a visualization with KDE: KDE plot for actual VS predicted (by polynomial model) values. Ultimately, therefore, the polynomial regression model, with a third-degree polynomial, turns out to be a good model to solve this regression problem.
One Line of Code to Accelerate Your Sklearn Algorithms on Big Data
       
One Line of Code to Accelerate Your Sklearn Algorithms on Big DataThe introduction of the intel sklearn extension. However, training a big dataset with sklearn algorithms sometimes can be costly. This extension has the potential to accelerate your sklearn code 10X-100X. RandomForest vs. XGBoostFor many data science projects, we have seen people pick XGBoost over Random Forest. Smirnov, E. (2021) Save Time and Money with Intel Extension for Scikit-learn.
The Abercrombie & Fitch Documentary Reveals How Power Decides What’s Cool
       
Playing the Game: Both the American Left and Right are Shortsighted and It’s Going to Take Us Down
My Mom Was for the Birds
       
My Mom Was for the BirdsI loved her and nature equally. The mourning dove was in my sights, exposed by a gust of autumn that had taken away all its cover. My mother loved mourning doves. She was the one who had told me that the dove was in fact “mourning” — not a denizen of the “morning” — the naming inspired by the sadness of the bird’s plaintive five-note call. And now, at the…
It’s Not The Size Of Your Sabbatical
       
HUMORIt’s Not The Size Of Your SabbaticalNo matter the size, it makes people weird. by authorI’ve heard from women who experienced pregnancy that a baby in the womb does strange things to strangers:They get handsy. They get talky, saying whatever comes into their weird and wild minds.
What About the “Woke” Right?
       
What About the “Woke” Right? It’s time to call out which side is the real “woke” side. Photo courtesy of Naomi McKinney on Unsplash. According to our conservative friends, being “woke” refers to those who are ridiculously politically correct and who speak out too much on social injustice. It’s commonly used as a pejorative term when referring to those on the left, increasingly pairing it with phrases like “woke mob” or as a sledgehammer, using…
Why the “Improve by 1% Every Day” Mantra Is Bullsh*t
       
Why the “Improve by 1% Every Day” Mantra Is Bullsh*tSuccess isn’t that easyAll images by the author. If you’ve spent any time among the “Hey! Let’s all get on our knees and worship success” crowd, you’ve probably heard about the 1% rule of improvement. It goes something like this:
I Replaced My Native iOS App with a Cross-Platform Web App and No One Noticed
       
I Replaced My Native iOS App with a Cross-Platform Web App and No One NoticedHow the performance trade-off in my cross-platform web app went unnoticed by users. Choosing a mobile app technology (aka pick your poison)Now, the problem with starting a mobile app in 2022 is that there are a lot of totally different technical directions you can take: native, cross-platform web app, React Native, Flutter, Progressive Web App, Xamarin, etc. Cross-Platform Web AppsWith cross-platform web apps, you write code once using common web technologies and deploy it to multiple platforms. With 3 commands I can deploy to an iOS app, an Android app, or deploy to my website on AWS! That flat line is when the cross platform web app was releasedSomehow my cross-platform web app is actually more stable!
A Peak at How the Brain can Perform Principal Component Analysis.
       
A Peak at How the Brain can Perform Principal Component Analysis. The brain and the modern world have one thing in common: outputs arise from the analysis of huge information datasets. One of the most well-known data analysis methods is called Principal component analysis or PCA. The Hebb rule in neuroscience can be classified as an unsupervised learning rule. Thus, each weight vector learns from a different input ensemble with more structure subtracted out as they are outputs.
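As a rough illustration of how a Hebbian-style rule can recover a principal component, here is a minimal sketch of Oja's rule for a single linear unit on 2-D data. Oja's rule is a standard normalized variant of the Hebb rule and is used here as a stand-in; the article's exact learning rule, the synthetic data, the learning rate, and the function name `oja_first_component` are all assumptions made for the example.

```python
import math
import random


def oja_first_component(samples, lr=0.005, epochs=20, seed=0):
    """Estimate the leading principal direction of zero-mean 2-D samples.

    Uses Oja's rule: a Hebbian update (y * x) plus a decay term (-y^2 * w)
    that keeps the weight vector from growing without bound.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # random initial weights
    for _ in range(epochs):
        for x in samples:
            y = w[0] * x[0] + w[1] * x[1]  # output of the linear unit
            w[0] += lr * y * (x[0] - y * w[0])
            w[1] += lr * y * (x[1] - y * w[1])
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]


# Hypothetical zero-mean data with most of its variance along the (1, 1) diagonal.
rng = random.Random(42)
data = []
for _ in range(500):
    a = rng.gauss(0, 2.0)  # large spread along the diagonal
    b = rng.gauss(0, 0.3)  # small spread across it
    data.append(((a - b) / math.sqrt(2), (a + b) / math.sqrt(2)))

w = oja_first_component(data)
# Absolute cosine between the learned weights and the (1, 1) direction.
alignment = abs(w[0] + w[1]) / math.sqrt(2)
print(f"alignment with the true principal direction: {alignment:.3f}")
```

Extracting further components would need each additional unit to learn from inputs with the earlier components subtracted out, which is the idea the summary's last sentence points at.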
Sorting & Analytics Pane in Tableau: A Road to Tableau Desktop Specialist Certification
       
Welcome to the tenth chapter. In this piece, we are going to learn about Sorting and the Analytics Pane in Tableau. Chapter 10: A comprehensive guide on Sorting & Analytics Pane in Tableau with sample certification questions and free Udemy dumps. Analytics Pane: We can add Analytics objects to the view from the Analytics Pane. The Analytics Pane offers drag-and-drop functionality to add analytics objects such as box plots, constant lines, and average lines to the view. Reference Line, Reference Band, Distribution Band, Box Plot: We can create custom Reference Lines, Reference Bands, Distribution Bands, and Box Plots.
The Basics of Neural Networks (Neural Network Series) — Part 1
       
The Basics of Neural Networks (Neural Network Series) — Part 1 Neural Networks An Artificial Neural Network (ANN) or simply a Neural Network(NN) is interconnected layers of small units called nodes that perform mathematical operations to detect patterns in data. Artificial Neuron — Mathematical Operation on one Neuron An artificial neuron takes input values (it can be several) with weights assigned to them. Neural Network Design A Neural Network(NN) is made of several neurons stacked into layers. First, the input values are weighted by multiplying the input values with corresponding weights. Artificial neuron An artificial neuron (also called a unit or a node) mimics the biological neuron in structure and function (in a loose sense — see the next note).
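The weighted-input step described above can be sketched in a few lines. This is a minimal illustration, not the article's code: the bias term, the sigmoid activation, and the name `neuron` are assumptions added for the example.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Single artificial neuron: weighted sum of inputs, then an activation."""
    # Multiply each input by its corresponding weight and add everything up.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation (an assumed choice) squashes z into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Two inputs whose weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25])
print(output)  # sigmoid(0) is 0.5
```

Stacking several such units side by side gives a layer, and chaining layers gives the network design the summary refers to.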
What Does the Perfect Work Day Look Like?
       
What Does the Perfect Work Day Look Like? My 20-year experiment with work/life balance is paying off. (Source: Canva.com.) The meditation teacher in me wishes this article was about manifesting the perfect day, without the work modifier. So whether you are working from home, commuting to work, or even looking for work, here are the things I wish I had known sooner. The promised (work)land: So what can the perfect work day actually look like? Once I figured out what a perfect workday was for me, I calibrated the perfect week, then the perfect year.
Fine-tune transformer language models for linguistic diversity with Hugging Face on Amazon SageMaker
       
A language model is an NLP model that learns to predict the next word (or any masked word) in a sequence. Multilingual masked language models – The other approach is to pre-train large transformer models on many languages. Two examples are: Multilingual BERT – The multilingual BERT model was trained in 104 different languages using the Wikipedia corpus. Multilingual language model with multilingual BERT – The pre-trained model is called bert-base-multilingual-uncased. To learn how Amazon SageMaker Training Compiler can accelerate the training of deep learning models by up to 50%, see New – Introducing SageMaker Training Compiler.
Build a custom Q&A dataset using Amazon SageMaker Ground Truth to train a Hugging Face Q&A NLU model
       
In this post, we demonstrate how to build a custom question answering dataset using Amazon SageMaker Ground Truth to train a Hugging Face question answering NLU model. For more information, see the following pricing pages: Amazon S3 Pricing; AWS Lambda Pricing; Amazon SageMaker Pricing; Amazon SageMaker Data Labeling Pricing (this fee depends on the type of workforce that you use). To start operating the notebook, complete the following steps: On the Amazon SageMaker console, navigate to the notebook instance page. Download and inspect the data: The SQuAD dataset contains a training dataset as well as test and development datasets. Using an active learning model, data is labeled and only routed to humans if the model cannot confidently label it.
Process larger and wider datasets with Amazon SageMaker Data Wrangler
       
Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. In this post, we share our findings from two benchmark tests to demonstrate how you can process larger and wider datasets with Data Wrangler. Sampling is enabled by default, and Data Wrangler only processes the first 100 rows when enabled. As we increased the Data Wrangler instance size, we observed a roughly linear speedup of Data Wrangler built-in transforms and custom Spark SQL. To learn more about using data flows with Data Wrangler, refer to Create and Use a Data Wrangler Flow and Amazon SageMaker Pricing. To get started with Data Wrangler, see Prepare ML Data with Amazon SageMaker Data Wrangler.
How to Add an Escape Hatch to Your Python Run in Two Steps
       
You can find the escape hatch in my_app/api_utils.py. Prerequisite Knowledge: Before going into the details, I just want to mention that I use decorator functions and a decorator factory to do this. Steps. Step 1: Defining the decorator. Create a new module in the repository; I call this module api_utils.py. I define the decorator function there and call it escape hatch. function_that_runs_forever() is a function that runs forever, used to test the escape hatch. The escape hatch decorator starts, prints out some information, and listens to the keyboard to see if we press escape. Functioning escape hatch.
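The decorator-factory pattern described above can be sketched as follows. This is a simplified stand-in rather than the article's implementation: instead of listening for the escape key, it polls a caller-supplied `should_stop` callable, and the names `escape_hatch`, `should_stop`, and `poll_interval` are hypothetical.

```python
import functools
import threading
import time


def escape_hatch(should_stop, poll_interval=0.01):
    """Decorator factory: stop waiting on the wrapped call once should_stop() is true."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = {}

            def target():
                result["value"] = func(*args, **kwargs)

            # Run the function in a daemon thread so the caller can walk away from it.
            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            while worker.is_alive():
                if should_stop():
                    print("Escape hatch triggered, abandoning the run.")
                    return None
                worker.join(poll_interval)
            return result.get("value")

        return wrapper

    return decorator


# Stand-in for a key press: a flag that a timer sets after 50 ms.
stop_flag = threading.Event()


@escape_hatch(stop_flag.is_set)
def function_that_runs_forever():
    while True:
        time.sleep(0.01)


threading.Timer(0.05, stop_flag.set).start()
outcome = function_that_runs_forever()
print(outcome)
```

Note that this sketch only returns control to the caller; the abandoned daemon thread is not killed and ends when the interpreter exits. A real keyboard listener would replace `stop_flag.is_set` with a check for the escape key.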
Remote View your Computer Vision Models Running on AWS Panorama
       
Monitoring a Panorama app. Remote View your Computer Vision Models Running on AWS Panorama. Introducing SpyGlass: A library to view the output of a Panorama application from your workstation. (Photo by Elisa Schmidt on Unsplash.) Real-time smart video analytics application development and edge device deployment is a tricky task. This post introduces SpyGlass, the first in a series of open-source tools to make developing AWS Panorama applications simpler. AWS Panorama is a machine learning appliance and software framework that allows you to deploy video analytics applications on the edge. For a thorough introduction and a step-by-step tutorial on deploying a Panorama application, refer to Deploy an Object-Detector Model at the Edge on AWS Panorama. Moreover, you are responsible for keeping private all AWS credentials used by SpyGlass, as required by the AWS Shared Responsibility Model.
A Complete Guide for Detecting and Dealing with Outliers
       
Photo by Will Myers on Unsplash. A Complete Guide for Detecting and Dealing with Outliers. 6 Methods to Detect the Outliers and 4 Different Methods to Deal with Them. Outliers can be a big problem in data analysis or machine learning. So, it is important to detect outliers and deal with them carefully. Detection of Outliers: There are quite a few different ways to detect outliers. In this example, we will consider the lower limit as the 10th percentile and the upper limit as the 90th percentile. These are all the ways to detect outliers I wanted to share today.
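The percentile-based detection mentioned above can be sketched with the standard library alone. The sample values and the function name `percentile_outliers` are made up for illustration, and the inclusive interpolation method is an assumption; the article's own code may compute percentiles differently.

```python
import statistics


def percentile_outliers(values, lower_pct=10, upper_pct=90):
    """Return the values below the lower or above the upper percentile limit."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one very low and one very high value.
values = [15, 2, 14, 16, 95, 12, 18, 17, 19, 15]
outliers = percentile_outliers(values)
print(outliers)
```

Because the limits are percentiles of the data itself, this method always flags roughly the most extreme 20% of values, so it is better suited to capping than to deciding whether outliers exist at all.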
How Policy Gradients in RL can get you to the Moon
       
In today’s lesson, we will implement vanilla policy gradients from scratch and land on the Moon. Contents: The LunarLander environment; Baseline agent; Welcome policy gradients; Policy gradients agent; Key take-aways; Homework; What’s next? Fail time: As it turns out, landing on the Moon is not such a piece of cake (what a surprise). Let’s see how policy gradients can help us land on the Moon. Welcome policy gradients: The goal of any reinforcement learning problem is to find an optimal policy that maximizes cumulative rewards. Policy gradients agent (notebooks/03_vanilla_policy_gradient_with_rewards_to_go.ipynb): Let’s implement the simplest policy gradient agent out there, where the weights in the policy gradient formula are the episodic rewards.
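To make the weighting concrete, here is a small sketch of two common ways to weight each step's log-probability gradient: the total episodic reward, as described above, and the rewards-to-go variant named in the notebook title. The function names and the toy reward list are assumptions for illustration, not code from the lesson.

```python
def episodic_return_weights(rewards):
    """Weight every step of the episode by the total reward of that episode."""
    total = sum(rewards)
    return [total] * len(rewards)


def rewards_to_go(rewards):
    """Weight each step only by the rewards collected from that step onwards."""
    weights = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        weights.append(running)
    return weights[::-1]


# Hypothetical three-step episode.
rewards = [1.0, 0.0, 2.0]
print(episodic_return_weights(rewards))  # [3.0, 3.0, 3.0]
print(rewards_to_go(rewards))            # [3.0, 2.0, 2.0]
```

Rewards-to-go does not credit an action for rewards that arrived before it was taken, which is the usual motivation for preferring it.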
How to Easily Run Python Visualizations On a Web Browser with PyScript
       
How to Easily Run Python Visualizations On a Web Browser with PyScript. A step-by-step guide to run matplotlib and bokeh visualizations on your web browser using PyScript. (Photo by Firmbee on Unsplash.) At PyCon US 2022, Anaconda’s CEO announced a new technology called PyScript that allows users to write Python code in the browser. One of the coolest and easiest things you can build on your web browser are Python visualizations and, in this guide, I’m going to show you how to display matplotlib and bokeh visualizations on your web browser using PyScript. Matplotlib Plots On Your Web Browser with PyScript: The steps to plot a visualization with matplotlib and bokeh on our web browser are a bit different. # Python Code Goes Here ... Here’s the Python code to make the line plot (you should put it inside the py-script tag). Great! Now you know how to run visualizations on the web browser with Python and HTML.
As NFT Sales Continue to Plummet, Is the Bubble About To Burst?
       
As NFT Sales Continue to Plummet, Is the Bubble About To Burst? A report by Non Fungible lays bare the state of the market in 2022. (Image edited by author.) Hold onto your digital pixelated hardhats: the NFT market is beginning to collapse. Our Lord, savior and staunch defender of “free speech” (read: the right to talk shit about people online) Elon Musk…
Men Cause 100% of Unwanted Pregnancies
       
Men Cause 100% of Unwanted Pregnancies. Our conversation about abortion places the burden of responsibility on women. I argue men are the root cause. As a mother of six and a Mormon, I have a good understanding of arguments surrounding abortion, religious and otherwise. When I hear men discussing women’s reproductive rights, I’m often left with the thought that…
Pregnancy is Dangerous, Debilitating, and Costly Unpaid Work
       
Pregnancy is Dangerous, Debilitating, and Costly Unpaid Work. How dare the state insist that women perform it against their will. (Photo by Toro Tseleng on Unsplash.) I’m not sure why I never thought of it this way before, but pregnancy is work. It’s dangerous, debilitating, and costly unpaid work. I first got that idea when reading this piece by marlene rosette in Fourth Wave:
The Abercrombie & Fitch Documentary Reveals How Power Decides What’s Cool
       
The Abercrombie & Fitch Documentary Reveals How Power Decides What’s Cool. ‘White Hot’ documents how the fashion retailer used exclusivity, racism, and warped beauty standards to shape ’90s teen culture. (The Abercrombie & Fitch Fall/Winter 2000–2001 campaign shot by Bruce Weber.) Early in Allison Klayman’s new documentary White Hot: The Rise and Fall of Abercrombie & Fitch, former A&F model Bobby Blanksi…
Weapons of Mass Distraction. Global war and reproductive rights…
       
Weapons of Mass Distraction. Global war and reproductive rights should not be at the mercy of our ever-shortening attention spans. The element fueling economic growth is not a rare earth metal, processing power, or NFTs: It’s attention. The most successful players in the Attention Economy are WMDs: weapons of mass distraction. One of the problems with the Attention Economy is the sclerotic lurching from single topic to single topic. War: As Putin loses an information war, he continues to kill thousands in a real war we’re losing interest in. Unstoppable Force: It’s still the Attention Economy, however.
A Data Scientist Is More Than Just a Data Scientist
       
So, to return to my original point, why is it so difficult to determine the abilities necessary for a data scientist? Indeed, the term “data scientist” has become diluted in recent years; now, a data scientist can play any role, ranging from business problem formulation to model deployment and monitoring. Second, the role of a data scientist can be influenced by the company’s culture and data maturity. Because of this division of labor, many data scientists wind up doing a lot of data modeling, which fosters the impression that data scientists only need data-related skills and nothing else. A good data scientist is much more than a data scientist.
7 Data Pre-Processing Methods With SciKit-Learn
       
7 Data Pre-Processing Methods With SciKit-LearnUsing Python and Google ColabPhoto by James Harrison on UnsplashData pre-processing is an important part of preparing, organizing, and structuring data for further analysis or Machine Learning model engineering. #Define X and y:y = df['RainTomorrow']X = df.drop('RainTomorrow', axis=1)If we further inspect y, we will find that it is coded as a string with values ‘Yes’ and ‘No’. The RobustScaler function tries to solve this problem by applying data transformation that removes the median and scales the data according to the quantile range. Remember that even though this change in scale may seem counter-productive for data visualization, our focus here is on data preparation to building Machine Learning models and not Data Visualization. Let’s try with our sample dataset:#Import and read DataFrame:df = pd.read_csv('/content/weather.csv')dfNow we will inspect our variables to check if any of them have binomial distribution:#Plot histograms for numerical variables:fig, axs = plt.subplots(4, 4, figsize=(14, 14)) sns.histplot(data=df, x="MinTemp", kde=True, color="skyblue", ax=axs[0, 0])sns.histplot(data=df, x="MaxTemp", kde=True, color="skyblue", ax=axs[0, 1])sns.histplot(data=df, x="Rainfall", kde=True, color="skyblue", ax=axs[0, 2])sns.histplot(data=df, x="Evaporation", kde=True, color="skyblue", ax=axs[0, 3]) sns.histplot(data=df, x="WindGustSpeed", kde=True, color="skyblue", ax=axs[1, 0])sns.histplot(data=df, x="WindSpeed9am", kde=True, color="skyblue", ax=axs[1, 1])sns.histplot(data=df, x="WindSpeed3pm", kde=True, color="skyblue", ax=axs[1, 2])sns.histplot(data=df, x="Humidity9am", kde=True, color="skyblue", ax=axs[1, 3]) sns.histplot(data=df, x="Humidity3pm", kde=True, color="skyblue", ax=axs[2, 0])sns.histplot(data=df, x="Pressure9am", kde=True, color="skyblue", ax=axs[2, 1])sns.histplot(data=df, x="Pressure3pm", kde=True, color="skyblue", ax=axs[2, 2])sns.histplot(data=df, x="Cloud9am", kde=True, color="skyblue", 
ax=axs[2, 3]) sns.histplot(data=df, x="Cloud3pm", kd
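The RobustScaler transformation described above (remove the median, then scale by a quantile range) can be sketched without scikit-learn to show the arithmetic. This is a hand-rolled illustration assuming the 25th to 75th percentile range; the function name `robust_scale` and the sample values are hypothetical, and in practice you would call scikit-learn's `RobustScaler` instead.

```python
import statistics


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


# Hypothetical feature with one extreme value.
values = [1, 2, 3, 4, 100]
scaled = robust_scale(values)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The extreme value stays extreme after scaling, but it no longer distorts the center and spread used to scale the other points, which is what a mean and standard deviation based scaler would suffer from.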
Data Engineering Using Julia Lang
       
Data Engineering Using Julia Lang. The objective of this blog is to understand how to build a Data Engineering pipeline using Julia Lang. Group & Aggregations:
# create a separate small subset for groupby ops
group_df = select(emp_df, "DEPTNO", "SAL")
first(group_df, 7)
# groupby creates a grouped dataframe other than normal dataframe, somewhat similar to pandas groupby
typeof(gd)
# access group information using array indexes
println(gd[1])
println(gd[2])
println(gd[3])
Aggregations:
using Statistics
println("sum of sal deptwise ", combine(gd, :SAL => sum))
println("----------------------------------------------")
println("avg of sal deptwise ", combine(gd, :SAL => mean))
println("max of sal deptwise ", combine(gd, :SAL => maximum))
println("----------------------------------------------")
println("min of sal deptwise ", combine(gd, :SAL => minimum))
4. Julia UDFs: #WAF to calculate tax of employees and add Tax as derived column in existing dataframe. · UDFs and derived column. · Write Julia data frames to CSV files. Thanks to all for reading my blog. If you like my content and explanation, please follow me on Medium and share your feedback; that will always help all of us to enhance our knowledge.
An Experimental Design Perspective on Model-Based Reinforcement Learning
       
In our recent ICLR paper, “An Experimental Design Perspective on Model-Based Reinforcement Learning“, we derive an acquisition function that guides an agent in choosing data for the most successful learning. In doing this, we draw a connection between model-based reinforcement learning and Bayesian optimal experimental design (BOED) and evaluate data prospectively in the context of the task reward function and the current uncertainty about the dynamics. Our approach can be efficiently implemented under a conventional assumption of a Gaussian Process (GP) prior on the dynamics function. Thus, we only need to approximate the optimal policy in the regions of the state space that are visited by the optimal policy. Therefore, we choose to learn about \(\tau^*\)—the optimal trajectory governed by the optimal policy \(\pi^*\).
Google AI Blog: Learning Locomotion Skills Safely in the Real World
       
The safe learning framework switches between the safe recovery policy and the learner policy to enable robots to safely acquire novel and agile motor skills. (2) If the learner policy cannot ensure safety in the near future after switching to the safe recovery policy, we keep using the safe recovery policy. For the two-leg balance task, the percentage drops from near 82.5% to 67.5%, suggesting that the two-leg balance is substantially harder than the previous two tasks. The reward learning curve (blue) and the percentage of safe recovery policy activations (red) using our safe RL algorithm in the real world. Our results suggest that learning legged locomotion skills autonomously and safely is possible in the real world, which could unlock new opportunities including offline dataset collection for robot learning.
OpenAI Leadership Team Update
       
Brad Lightcap has been pivotal in OpenAI's growth, scaling our structure, team, and capital base through his oversight of our Finance, Legal, People, and Operations organizations. Mira is taking on the role of Chief Technology Officer, reflecting her leadership across these critical areas within OpenAI. He will lead the operations of OpenAI’s nonprofit parent and key strategic projects including our relationships with mission-aligned partners. These executives are supported by world-class teams who are the lifeblood of OpenAI, constantly advancing the state of the art in artificial intelligence research and deployment. It’s a pleasure to work alongside such incredible talent and leadership across our company.
Predict customer churn with no-code machine learning using Amazon SageMaker Canvas
       
In this post, we show you how business analysts can build a customer churn ML model with Amazon SageMaker Canvas, no code required. We use Canvas to perform the following steps: Import the churn dataset from Amazon Simple Storage Service (Amazon S3). This information can help the marketing team gain insights that lead to taking actions to reduce customer churn. Conclusion: In this post, we showed how a business analyst can create a customer churn model with SageMaker Canvas using sample data. To learn more about using Canvas, see Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas.
Use custom vocabulary in Amazon Lex to enhance speech recognition
       
Starting today, you can give Amazon Lex additional information about how to process speech input by creating a custom vocabulary. Overview of the custom vocabulary capability: You define the custom vocabulary for a language in the bot. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows. You can easily define the custom vocabulary for your Amazon Lex bot and add it to the bot definition. You can configure custom vocabulary using the Amazon Lex V2 console or via the API.
One Line of Code to Accelerate Your Sklearn Algorithms on Big Data
       
One Line of Code to Accelerate Your Sklearn Algorithms on Big Data. The introduction of the Intel sklearn extension. However, training a big dataset with sklearn algorithms sometimes can be costly. This extension has the potential to accelerate your sklearn code 10X-100X. RandomForest vs. XGBoost: For many data science projects, we have seen people pick XGBoost over Random Forest. Smirnov, E. (2021) Save Time and Money with Intel Extension for Scikit-learn.
How to Use GPT-J for (Almost) Any NLP Task
       
In a previous blog post we had a look at how we can set up our very own GPT-J Playground using Streamlit, Hugging Face, and Amazon SageMaker. In this blog post we will look at how we can achieve that using different parameters and particular prompts for the GPT-J model. This blog post builds on the previous blog post and this GitHub repo, and it is assumed that you have already built your own GPT-J playground. The code for this blog post can be found in a separate branch in the same GitHub repo. Classification: Let’s start with a relatively “simple” task, text classification. (Image by author.) In this example we state the task explicitly (match food to countries) and also provide a few examples.
Getting Started with NLTK in Python
       
Getting Started with NLTK in Python Exploring some of the most common functions and techniques we can use to develop basic NLP pipelines. Photo by Aaron Burden @unsplash.com NLTK (Natural Language Toolkit) is one of the first implementations of Natural Language Processing techniques in Python. Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. from nltk import word_tokenizetoken_list = word_tokenize(python_wiki)print(token_list[0:10]) Let’s see our first ten tokens: ['Python', 'is', 'a', 'high-level', ',', 'interpreted', ',', 'general-purpose', 'programming', 'language'] Cool! Let’s see the first 10: [('Python', 'is', 'a'),('is', 'a', 'high-level'),('a', 'high-level', ','),('high-level', ',', 'interpreted'),(',', 'interpreted', ','),('interpreted', ',', 'general-purpose'),(',', 'general-purpose', 'programming'),('general-purpose', 'programming', 'language'),('programming', 'language', '.
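The trigram output shown above comes from sliding a window of three over the token list. The same idea can be sketched in plain Python; this helper is a stand-in written for illustration (the name `make_ngrams` is hypothetical) and is not NLTK's own implementation.

```python
def make_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # zip over n shifted copies of the list; it stops at the shortest one.
    return list(zip(*(tokens[i:] for i in range(n))))


# First six tokens from the tokenized example in the summary.
tokens = ['Python', 'is', 'a', 'high-level', ',', 'interpreted']
trigrams = make_ngrams(tokens, 3)
for trigram in trigrams:
    print(trigram)
```

A list of k tokens yields k - n + 1 n-grams, so the six tokens here produce four trigrams, starting with ('Python', 'is', 'a').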
Hands on Climate Time Series Clustering using Machine Learning, with Python
       
Hands on Climate Time Series Clustering using Machine Learning, with Python. Here’s how to use Machine Learning to classify unlabeled time series with a few lines of code. (Photo by Jonathan Bowers on Unsplash.) The first Machine Learning lesson starts with something like this: There are two kinds of tasks in a Machine Learning scenario: classification and regression. This is a very simple introduction, and it is an efficient and insightful way to understand what Machine Learning is about. What is hidden is that this distinction holds only when we consider so-called supervised Machine Learning. Now, we have multiple kinds of Machine Learning algorithms to do a clustering job.
Dynamically Add Arguments to Argparse | Python Patterns
       
Dynamically Add Arguments to Argparse | Python Patterns. How to specify different arguments according to the user input using argparse.ArgumentParser. We have our example application, with the commands train and infer, each with different arguments that can be mandatory or optional. Let’s test our CLI:
$ python .\src\sub_commands.py train -m transformer --save_model_path $HOME/my_model_path/
Training model with:
model=transformer
save_model_path=/home/me/my_model_path/
dropout=0.1
batch_size=None
func=
And it won’t accept the arguments that are specific for infer:
$ python .\src\correct.py train -m transformer --save_model_path $HOME/my_model_path/ --model_path .
# src/model_loader.py
import argparse
from commands import train, infer
# src/model_loader.py
from commands import train, infer
from models.loader import load_model_args.
Conclusion: In this article we have seen how to build CLIs of increasing complexity using argparse from the Python standard library.
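A minimal sketch of the sub-command setup described above, using argparse sub-parsers so that train and infer each accept only their own arguments. The program name, the long option `--model`, and the helper `build_parser` are assumptions for illustration; the option names `-m`, `--save_model_path`, `--dropout`, `--batch_size`, and `--model_path` follow the CLI output quoted in the summary.

```python
import argparse


def build_parser():
    """Build a CLI with separate train and infer sub-commands."""
    parser = argparse.ArgumentParser(prog="model_cli")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Arguments that only make sense when training.
    train = subparsers.add_parser("train", help="train a model")
    train.add_argument("-m", "--model", required=True)
    train.add_argument("--save_model_path")
    train.add_argument("--dropout", type=float, default=0.1)
    train.add_argument("--batch_size", type=int, default=None)

    # Arguments that only make sense for inference.
    infer = subparsers.add_parser("infer", help="run inference")
    infer.add_argument("--model_path", required=True)
    return parser


args = build_parser().parse_args(["train", "-m", "transformer"])
print(args)
```

Passing an infer-only option such as `--model_path` to the train sub-command makes argparse print a usage error and exit, which matches the behavior the summary describes.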
Parallels in AI and Medicine
       
One thing I have been thinking about a lot is how medicine can inform the future of AI. Also, I’ll transition to using the term “machine learning” instead, since AI may be too big of a term. Machine learning (ML), as a subset of AI, is the body of work related to learning useful programs from large datasets. (Andrew Ng, in an interview with VentureBeat.) In medicine, human experience is encapsulated in large medical ontologies. Like medicine was in ages past, AI will have a long road ahead of it.
3 Most Valuable Data Science Skills That Increased My Salary by 60%
       
3 Most Valuable Data Science Skills That Increased My Salary by 60%. Hint: Machine Learning is not one of them. (Photo by Jason Hogan on Unsplash.) When I first started learning data science, there were so many topics and techniques to learn that it often felt overwhelming to decide what to learn and in what order to learn it. Looking back, I can surely say that certain skills that I learned have been much more practical and useful than others. In this article, I want to share with you three skills that have ultimately accelerated my career and increased my salary by 60% in the past year. The reason I attribute most of my credit to these three skills is that these three skills have allowed me to work completely autonomously, helped me discover insights and ideas with incredible business value, and have given me the ability to ship results faster. With that said, let’s dive into it!
Understanding MixNMatch: Creating A More Realistic Synthetic Image
       
Understanding MixNMatch: Creating A More Realistic Synthetic Image. Combine different factors from multiple real images to a single synthetic image. (Figure 1: An overview of what is possible with the MixNMatch generative model.) I recently stumbled upon this paper called MixNMatch that aims to combine different factors from multiple real images to a single synthetic image, with minimal supervision. MixNMatch disentangles and encodes multiple factors from different real images to a single synthetic image. Specifically, it combines image background, pose, shape and texture from different real images to a single synthetic image with minimal supervision. More specifically, for each real image x, the authors propose four separate encoders to extract its z, b, p, c codes. The latent code dimension for pose/shape is not big enough to capture those fine-grained features.
How to Reduce the Training Time of Your Neural Network from Hours to Minutes
       
How to Reduce the Training Time of Your Neural Network from Hours to Minutes. Part 2 of the articles on AI with HPC: parallelising a CNN with Horovod and GPUs to obtain a 75x-150x speed-up. For the remainder of this post, we will talk about how one can data parallelise their TensorFlow code using Horovod. We will trick the fitting call in TensorFlow to assign 1/n batches to each of the GPUs using the ‘steps_per_epoch’ keyword. At this point, with 2 GPUs, the data training strategy will be put into execution as shown below. Batch size & learning rate (Option 2 in data distribution strategy): If you have used a neural network (NN), you must be familiar with the concept of learning rate and batch size.
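The 1/n split via ‘steps_per_epoch’ comes down to simple arithmetic, sketched below in plain Python. The helper name `steps_per_epoch_per_gpu` and the dataset numbers are hypothetical; in a real Horovod script the number of GPUs would typically come from `hvd.size()`.

```python
def steps_per_epoch_per_gpu(num_samples, batch_size, num_gpus):
    """Number of batches each GPU should process per epoch in data-parallel training."""
    total_batches = num_samples // batch_size  # batches in one pass over the data
    return total_batches // num_gpus           # each GPU takes 1/n of them


# Hypothetical setup: 60,000 samples, batch size 100, 2 GPUs.
steps = steps_per_epoch_per_gpu(num_samples=60_000, batch_size=100, num_gpus=2)
print(steps)  # 300 steps per GPU instead of 600 on a single GPU
```

The resulting value is what would be passed as `steps_per_epoch` to the fitting call on each worker, so that together the workers cover the dataset once per epoch.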
UNIKUD: Adding Vowels to Hebrew Text with Deep Learning
       
UNIKUD: Adding Vowels to Hebrew Text with Deep Learning. Introducing an open-source Hebrew nakdan (נקדן) tool which uses no rule-based logic. Contents: The Hebrew Writing System; Introduction to UNIKUD; UNIKUD Datasets; Methods; Results; Limitations and Further Directions; Conclusion; References. In order to train UNIKUD to learn how to vocalize Hebrew text, we needed to collect a dataset of Hebrew text with vowel points. As a large deep learning model, UNIKUD is substantially slower at performing inference (adding vowels to text) when it is run on CPU rather than GPU. Conclusion: In this project, we presented the UNIKUD model which adds nikud to Hebrew text, an open-source nakdan built with deep learning and using no hand-written rules specific to Hebrew.
In our Digital Future, Will We lose our History?
       
In our Digital Future, Will We lose our History? One concern I have these days, as do many others, is how we’ll be able to understand the human cultures of today in the distant future. But the formats that we have for storage today are very different from a decade ago. Our digital culture and much of the clues about today’s cultures would be lost. Preserving our now for the future is critical for cultural understanding and navigating our world in the future.
Fake News and the Growing Power of Asian American Voters: What this Means for 2022 Midterm Elections
       
Fake News and the Growing Power of Asian American Voters: What this Means for 2022 Midterm Elections. When people think of those angrily taking to social media after being misguided by false claims of election fraud, they likely do not immediately think of middle-aged Chinese American Twitter users. As the political influence of Asian Americans increases, bad actors have worked to build sprawling misinformation networks, including a vast media empire bankrolled by Steve Bannon and Guo Wengui, targeting members of the Asian American diaspora. In the video below, a Chinese American volunteer for a political candidate was interviewed to speak about how opposition to CRT, which the individual incorrectly stated “caused discrimination,” helped galvanize Chinese American support for the candidate. The Asian American community, particularly first-generation immigrants who are English-language learners, are among the most vulnerable to false narratives often due to limited language access to available resources and information. Combatting Mis/Disinformation in the Asian American Community: Broadly, misinformation has become an increasingly popular subject.
How to Add Value as a Data Analyst
       
How to Add Value as a Data Analyst. The journey to becoming a “real” data analyst. Let’s start with a quick summary of three common misconceptions about analytics: Analytics is statistics. Part of the confusion comes from misunderstanding the difference between a decision-maker’s job and an analyst’s job (see above). Data pro vs amateur differences #1-#3: Software skills; handling lots of data with ease; immunity to data science bias. Data pro vs amateur differences #4-#6: Understanding the career; refusing to be a data charlatan; resistance to confirmation bias. Data pro vs amateur difference #7: Realistic expectations of data.
Lo-fi Hip Hop and POV Playlists are symptoms of late capitalism
       
By the way, this is also the demographic for those who watch the videos and streamings of lo-fi hip hop playlists on YouTube. In other cases, the playlists themselves communicate their function, such as in the case of the popular streaming lofi hip hop radio — beats to relax/study to. What Winston and Saywood argue, however, is that the lo-fi hip hop genre is in itself a contradiction. During the pandemic, lo-fi hip hop streamings became even more popular as they were being used as a resource to alleviate anxiety. In this context, lo-fi hip hop and POV playlists are contradictorily born in the mechanisms of late (and algorithmic) capitalism at the same time they attempt to comfort listeners affected by these same actors.
Netflix’s “Old Enough” as Viewed by a Mother in Japan
       
LIVING IN JAPAN. Netflix’s “Old Enough” as Viewed by a Mother in Japan. Are those little ones really “old enough” to run errands? Adorable little girl shopping in Bangladesh. (sumanamul15. No attribution required.) Perhaps you have been charmed, like me, by Netflix’s recent show, “Old Enough,” called “My First Errand” in Japanese, a show already popular for 30 years in Japan.
Just the Gods Hurling Javelins of Lightning Above Mesa Verde
       
Just the Gods Hurling Javelins of Lightning Above Mesa Verde. It was better than any light show from Pink Floyd or Genesis. Wally Chapstick looks for his falcon while Brother Dave and Sister Noelle hug by required signage “We Are Here.” Photo by The Mom. Day 16: July 8, 1988. Grand Canyon, Arizona to Mesa Verde, Colorado. No cheerful morning sun greeted us.
Covid-19 Vaccines Are Still Effective
       
Covid-19 Vaccines Are Still Effective. Debunking the “12% efficacy” viral myth. Pictured: great job all round. Vaccines are amazing! Photo by CDC on Unsplash. This piece is based on a fantastic investigation by Dr. Jeffery Morris that is worth reading and linked here. Recently, the internet has been abuzz with a new and shocking claim. Apparently, the Pfizer vaccine was not as effective as the 95% we were sold on; in fact, people are claiming that it was barely…
Learn More from What You Read with a Lexicon
       
We often came to class having just read at least one complete book. Because I had to read them rather quickly to keep up, I didn’t have time for the kinds of notes I took in other literature classes, at least not on the first read. My current read: Our Country Friends, by Gary Shteyngart, and lexicon in the back of the book; photographs by Kathleen Waller, PhD. I tend to start in the book itself. A strategy might be to find related quotes to highlight or write the page numbers on the lexicon as you read. By reducing a book to a lexicon, we may paradoxically expand to great truths like this.
What Happens When the GOP Catches the Car?
       
What Happens When the GOP Catches the Car? When minority rule and unpopular policies collide. My dog hates UPS trucks (and Amazon trucks, and motorcycles, and mail trucks). We’ll be walking on a sidewalk when the hated delivery guy drives by, and she’ll mightily lunge and strain to get at him. But, of course, she’s leashed, so (hopefully) we’ll never find out what would happen if my dog had a…
Elon Musk: Why He Now Fires Employees Almost Every Day
       
Elon Musk: Why He Now Fires Employees Almost Every Day. Twitter is f*cked. Elon Musk on Joe Rogan. There are so many questions now that Elon bought Twitter. Like is there any way that Elon actually makes money off this or is it just an impulse buy like how my Aunt bought a Peloton for Christmas? A better question is what’s next for the platform and for the people that work there, as it…
These Are the Two Main Types of Adversarial Attacks in Neural Networks
       
These Are the Two Main Types of Adversarial Attacks in Neural Networks. Black-box and white-box attacks are the two types of adversarial attacks that ML engineers should understand. The deep learning space includes a subdiscipline known as adversarial networks that focuses on creating neural networks that can disrupt the functionality of other models. Using that criterion, deep learning researchers typically classify adversarial attacks in two main groups: black-box vs. white-box. White-Box Adversarial Attacks. White-box adversarial attacks describe scenarios in which the attacker has access to the underlying training policy network of the target model. Black-Box Adversarial Attacks. Black-box adversarial attacks describe scenarios in which the attacker does not have complete access to the policy network.
Trends in AI — May 2022. A monthly selection of news and…
       
Trends in AI — May 2022. A monthly selection of news and research papers: open-source DALLE·2, Meta openly shares a 175B GPT-3 clone, Video Diffusion Models, Autoregressive Search Engines, Adversarial backdoor attacks, and more. Key insights → This paper is quite theoretical in nature but the gist of it is quite simple. Key insights → Representation learning can be understood from the lens of Information Theory. ❓ Why → Language Models + Information Retrieval has been a hot topic from what we’ve seen in 2022. Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality → Like the Winograd schema challenge but for image-language pairs: ambiguous questions that require world knowledge to disambiguate.
Guide to Using Descriptive Statistics in Data Science
       
Guide to Using Descriptive Statistics in Data Science Understand the key concepts to summarize data Photo by Cathryn Lavery on Unsplash Statistics are at the heart of data science and data analysis. Data visualization Visualizations allow us to get a quick overview of the data to better understand the type of data we will be working with. It gives a better idea of how each data point is distributed, especially if the data follows a normal distribution. Normal Distribution The Normal Distribution, or Gaussian distribution, is the most common type of distribution. Standard Normal Distribution Finally, we can normalize any normal distribution to obtain a “standard normal distribution” by subtracting the mean at each data point and dividing the result by the standard deviation.
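The standardization step described in the excerpt above (subtract the mean from each data point, then divide by the standard deviation) can be sketched with Python's standard library alone. This is a minimal illustration, not code from the article: the helper name `standardize` and the sample values are assumptions made for the example.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation (population form)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(x - mean) / std for x in values]

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(data)
print(z_scores)
```

After this transformation the values have mean 0 and standard deviation 1. Whether the result actually follows a standard normal distribution depends on the original data being normally distributed in the first place.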
Machine Learning Advice for College Students
       
Machine Learning Advice for College Students. For Aspiring Data Scientists and Machine Learning Engineers. Photo by Vadim Sherbakov on Unsplash. Today we are going to look into Machine Learning and Data Science advice for university students or college students (whatever the difference is; I am European, so no clue about that). If you get something along the lines of a Data Science or Machine Learning degree, obviously this is even better. With aspiring Data Scientists focusing more on statistics and Machine Learning Engineers focusing more on Software Engineering. We all know that vicious circle: you need work experience to get a job, and you need a job to get work experience. Now, if you are looking for more information on how to be a Machine Learning Engineer specifically, I can recommend my video on how to be a machine learning engineer.
Google AI Blog: GraphWorld: Advances in Graph Benchmarking
       
The recently introduced Open Graph Benchmark (OGB) is an open-source package for benchmarking GNNs on a handful of massive-scale graph datasets across a variety of tasks, facilitating consistent GNN experimental design. However, the OGB datasets are sourced from many of the same domains as existing datasets, such as citation and molecular networks. The animation below visualizes GNN performance data from the GraphWorld node classification pipeline (using the SBM as the dataset generator). First, GraphWorld generates regions of graph datasets that extend well beyond the regions covered by the standard datasets. With GraphWorld, researchers can also investigate novel random/generative graph models for more nuanced GNN experimentation, and potentially use GraphWorld datasets for GNN pre-training.
Deploy and manage machine learning pipelines with Terraform using Amazon SageMaker
       
This lets you route requests and incoming traffic to different Amazon SageMaker endpoints. In this post, we show you how to deploy and manage ML pipelines using Terraform and Amazon SageMaker. Solution overview. This post provides code and walks you through the steps necessary to deploy AWS infrastructure for ML pipelines with Terraform for model training and inference using Amazon SageMaker. Run the ML pipeline. To train and run the ML pipeline, go to the Step Functions console and start the implementation. You can also check the SageMaker training job progress and the status of your SageMaker endpoint.
PyCaret 3.0 Is Coming Soon - What’s New?
       
PyCaret 3.0 Is Coming Soon - What’s New? The first release candidate will be available as early as May 2022. Photo by Andy Hermawan on Unsplash. Introduction. We have been working on PyCaret 3.0 for quite some time. PyCaret 3.0 will be fully compatible with the latest version of scikit-learn. It is now finally coming together and will be generally available in PyCaret 3.0. requirements.txt comparison of PyCaret 2.X vs. 3.X. Automated data type handling. No more data type confirmations.
CatBoost vs. LightGBM vs. XGBoost
       
CatBoost vs. LightGBM vs. XGBoost. Characteristics. The table below summarizes the differences between the three algorithms; read on for the elaboration of the characteristics. LightGBM and XGBoost, on the other hand, result in asymmetric trees, meaning the splitting condition for each node across the same depth can differ. Even though LightGBM and XGBoost both build asymmetric trees, LightGBM grows leaf-wise (horizontally) while XGBoost grows level-wise (vertically). Fig 2: LightGBM (left) vs. XGBoost (right). Image by author. Splitting Method. Splitting method refers to how the splitting condition is determined. However, generally, from the literature, XGBoost and LightGBM yield similar performance, with CatBoost and LightGBM performing much faster than XGBoost, especially for larger datasets.
Solving Differential Equations with Neural Networks
       
Solving Differential Equations with Neural Networks. A hands-on introduction to physics-informed neural networks with PyTorch. Photo by Dawid Małecki on Unsplash. Over the last decades, artificial neural networks have been used to solve problems in varied applied domains such as computer vision, natural language processing and many more. Recently, another very promising application has emerged in the scientific machine learning (ML) community: the solution of partial differential equations (PDEs) using artificial neural networks, an approach normally referred to as physics-informed neural networks (PINNs). It is easy to see that if the NN output respects the equation above, one is actually solving the logistic equation. Let’s now see how to construct such a loss function with a simple neural network built with PyTorch. Build the loss function. Now that we have defined our universal function approximator, let’s build the loss function.
Predicting Diabetes with Machine Learning — Part I
       
Predicting Diabetes with Machine Learning — Part I An overview of different ML models to predict diabetes Photo by Towfiqu barbhuiya on Unsplash This article is the first of a series of two articles in which I’m going to analyze the ‘diabetes dataset’ provided by scikit-learn with different Machine Learning models. Before analyzing the data, it is important to understand what we are going to do, to give a ‘practical sense’ of what Machine Learning does. This can be helpful for people because we could predict a certain progression of diabetes before having clinical trials. There does not seem to be a big correlation between the progression of diabetes and the various features. Let’s try applying the linear regression model, to start easily, and see what we get.
Expand your Time Series Arsenal with These Models
       
Expand your Time Series Arsenal with These Models. Regularizing, Bagging, Stacking, and More. Image by author. Time series data typically has four components: Autoregression, Seasonality, Trend, Residual. Predict these components, and you can forecast almost any time series. Prepare Models. All models are run using the scalecast package, which contains results and wraps Scikit-learn and other models around time-series data.
f.add_seasonal_regressors('month','quarter','week','dayofyear',raw=False,sincos=True) # fourier transformation
f.add_seasonal_regressors('dayofweek','is_leap_year','week',raw=False,dummy=True,drop_first=True) # dummy vars
Finally, we can model the series’ trend by adding the year variable:
f.add_seasonal_regressors('year')
For all these models, you typically want to feed them stationary time series data. MLR assumes that the series’ errors are uncorrelated, which is spurious in time series. We can tune the Ridge model with the same grid we created for the Lasso model.
Squeezing More out of LIME with Python
       
Squeezing More out of LIME with Python. How to create global aggregations of LIME weights. Photo by Laure Noverraz on Unsplash. LIME is a popular method for explaining how machine learning models work. We will walk you through the process of collecting LIME weights for multiple predictions. In Figure 3, we can see that as whole weight increases the LIME weight increases. Figure 7: beeswarm of LIME weights (source: author). The above charts also allow us to compare the LIME weights and SHAP values. For example, notice how in Figure 7 some of the LIME weights are clustered together.
Implementing an Enterprise Recommendation System
       
Implementing an Enterprise Recommendation System. An end-to-end look at implementing a “real-world” content-based recommendation system. Photo by Ammentrop on Dreamstime. I recently completed a recommendation system that will be released as part of a newsfeed for a high traffic global website. This article is intended to assist practitioners, as they wade through the various decision points of an enterprise recommendation system project. To ensure that only the highest quality recommendations are provided, API recommendations were capped to three of the highest scoring for each document. New visitors that have not interacted with newsfeed content are served a mix of the most popular content in the newsfeed. Serverless Lambda. A serverless architecture was quickly becoming the best choice for deploying the recommendation API.
5 Things I Have Learned Working in an MIT AI Research Lab for a Year
       
5 Things I Have Learned Working in an MIT AI Research Lab for a Year. How it has changed my views on life, knowledge, truth, and what it means to be human. Me, getting absolutely blasted by wind chill in front of MIT’s dome, January 2022. Below are 5 things that I have learned in a year of working in an MIT AI lab: some things I hope you find amusing or useful for your own journey, and some things that have profoundly impacted the way that I view life, success, knowledge, and humanity itself. — Albert Einstein. I am going to be honest with you: before I started at MIT, I thought I was hot shit. Not just in my lab, but seemingly across MIT itself; it is just the culture. About the Author. Mike Ferguson is a computational research developer (ML/AI) in Jim DiCarlo’s Lab at MIT.
Diffusion Models Made Easy
       
Diffusion Models Made Easy. Understanding the Basics of Denoising Diffusion Probabilistic Models. Figure 1: Process of Denoising Diffusion Probabilistic Model (Image by author). 1. Denoising Diffusion Model. The idea of the denoising diffusion model has been around for a long time. Denoising diffusion modeling is a two-step process: the forward diffusion process and the reverse process, or reconstruction. Figure 2: Results of a forward diffusion process on a synthetic S-curve dataset (Image by author). The results for the reverse diffusion process can be seen in the following figure. Although diffusion models are computationally more expensive than other deep network architectures, they perform much better in certain applications.
The Right To Abortion Is Critical As A Human Right
       
Abortion and Healthcare. The Right To Abortion Is Critical As A Human Right. If one human being cannot demand the use of another’s body for their own benefit, access to abortion must be protected by the same principle. Roughly once a month for the last few years I get a call from the local blood bank.
Junk Yard — the untold story. The story of the pinball machine Junk…
       
Junk Yard — the untold story. The story of the pinball machine Junk Yard is the story about a turning point in pinball and what the game could have been. - Dwight Sullivan on his blog. Now, it was time for their next collaboration, which turned out to be Junk Yard. Dwight Sullivan at his office with a prototype Junk Yard machine. To escape the Junk Yard you have to collect junk and build an escape vehicle (a “jalopy” that is later developed into a flying machine). Junk Yard is the only pinball game Kurt worked on (excluding the redemption machine “Ticket Tac Toe”).
I Don’t Care If a Fetus Is a Person — Abortion Should Remain Safe and Legal
       
I Don’t Care If a Fetus Is a Person — Abortion Should Remain Safe and Legal Why people arguing for the personhood of a fetus are completely missing the point. What was Roe v Wade anyway? Now they’ve made laws that got challenged all the way up to the Supreme Court, giving them and their conservative majority there the opportunity to overturn the landmark decision of Roe v. Wade. See, we can debate at what point a fetus is a person until we’re blue in the face. First of all, I want to stress that this is not a final decision — Roe v. Wade hasn’t officially been overturned.
Why The Supreme Court Is Starting to Remind You Of The Dred Scott Case
       
RACISM. Why The Supreme Court Is Starting to Remind You Of The Dred Scott Case. Overturning Roe v. Wade sounds a lot like the Dred Scott v. Sandford case. Illustration of Dred Scott and Wife Harriet Scott in 1857 | Photo Credit | Frank Leslie via Library of Congress. The Supreme Court’s leaked draft decision would shatter Roe v. Wade, disenfranchising millions of women. And this decision echoes another landmark case: Dred Scott v. Sandford. On March 6, 1857…
Why a Garden is Better than a Phone
       
Why a Garden is Better than a Phone. Pick up your spade, put down your device. “Garden of Surprises” by It’s No Game is marked with CC BY 2.0. To view the terms, visit https://creativecommons.org/licenses/by/2.0/?ref=openverse “We must cultivate our garden.” When Voltaire wrote that famous concluding line to Candide he could never have foreseen the degree to which humans two-and-a-half centuries later would fail to heed his advice. Instead of consistently re-tilling our belief system, planting questions about the nature of truth, and field-testing gambles on…
The Insecurities of Being an Older Mom
       
The Insecurities of Being an Older Mom. Midlife Motherhood. Photo by Alexey Shikov on Unsplash. Having a baby post 40 is challenging enough without people thinking you’re a grandma. I’m at a deli counter in a grocery store when the woman slicing my roast beef asks: “Is that your baby? Or yours for the day?” “I know,” I say, “she looks nothing like me.” To which she replies: “Oh, I guess that’s it…”, in a way…
Want to Share a Rental with Roommates? In Some Places, That’s Illegal
       
Want to Share a Rental with Roommates? In Some Places, That’s Illegal. A ban on co-living in Kansas sparked outrage; it is not just happening there. Photo by Kinga Cichewicz on Unsplash. When I was researching my book How We Live Now: Redefining Home and Family in the 21st Century, I asked many people how they would like to live if they could choose any way at all. One popular fantasy was the Golden…
Austria buys JavaScript from Brendan Eich.
       
Austria buys JavaScript from Brendan Eich. Brendan Eich (by Darcy Padilla, CC BY-SA 3.0). Brendan Eich, the creator of JavaScript, has announced that he has sold the rights to the programming language to the Austrian government. The move comes as a surprise to the tech industry, as JavaScript is one of the most widely used languages on the web. In a statement, Eich said that he was “tired of the politics and the haters” and wanted to “focus on other…
PyScript — unleash the power of Python in your browser
       
PyScript — unleash the power of Python in your browser. A sneak peek at how to run Python from HTML code. During a keynote speech at PyCon US 2022, Anaconda’s CEO Peter Wang unveiled quite a surprising project: PyScript. It is a JavaScript framework that allows users to create Python applications in the browser using a mix of Python and standard HTML. For the time being, PyScript supports writing and running Python code in a browser. Currently, when using PyScript we can only use the libraries that are supported by Pyodide. The most basic example provided on PyScript’s website is the following: As we can see, Python code is embedded in the block.
Create Indonesian Recipe Generator by Fine-tuning T5, BART, and GPT-2
       
Create Indonesian Recipe Generator by Fine-tuning T5, BART, and GPT-2. An Indonesian recipe generator Deep Learning model trained by fine-tuning pre-trained models such as T5, BART, and GPT-2. Preview image is from Unsplash by Brooke Lark. Hello Everyone! Since the data is in the Indonesian language, I need to use models pre-trained with Indonesian data. In this post, I experimented with an Indonesian recipe generator using pre-trained models. We can also conclude that fine-tuning a pre-trained model is generally better than training a non-pre-trained one. For example, it is interesting to see the effect of pre-trained versus non-pre-trained versions of BART, T5, and GPT.
AI Researchers Are Constantly Trying To Recreate These Cognitive Functions of the Human Brain
       
AI Researchers Are Constantly Trying To Recreate These Cognitive Functions of the Human Brain. Attention, memory, imagination, inference, and continual learning are cognitive functions of the human brain that are at the forefront of AI research. For many AI researchers, the ultimate goal of AI is to emulate the capabilities of the brain. Everyone knows that most foundational concepts in AI such as neural networks have been inspired by the architecture of the human brain. Attention. Attention is one of those magical capabilities of the human brain that we don’t understand very well. Attentional mechanisms have become a recent source of inspiration in deep learning models such as convolutional neural networks (CNNs) or deep generative models.
7 Data Pre-Processing Methods With SciKit-Learn
       
7 Data Pre-Processing Methods With SciKit-Learn. Using Python and Google Colab. Photo by James Harrison on Unsplash. Data pre-processing is an important part of preparing, organizing, and structuring data for further analysis or Machine Learning model engineering.
#Define X and y:
y = df['RainTomorrow']
X = df.drop('RainTomorrow', axis=1)
If we further inspect y, we will find that it is coded as a string with values ‘Yes’ and ‘No’. The RobustScaler function tries to solve this problem by applying data transformation that removes the median and scales the data according to the quantile range. Remember that even though this change in scale may seem counter-productive for data visualization, our focus here is on data preparation for building Machine Learning models and not Data Visualization. Let’s try with our sample dataset:
#Import and read DataFrame:
df = pd.read_csv('/content/weather.csv')
df
Now we will inspect our variables to check if any of them have binomial distribution:
#Plot histograms for numerical variables:
fig, axs = plt.subplots(4, 4, figsize=(14, 14))
sns.histplot(data=df, x="MinTemp", kde=True, color="skyblue", ax=axs[0, 0])
sns.histplot(data=df, x="MaxTemp", kde=True, color="skyblue", ax=axs[0, 1])
sns.histplot(data=df, x="Rainfall", kde=True, color="skyblue", ax=axs[0, 2])
sns.histplot(data=df, x="Evaporation", kde=True, color="skyblue", ax=axs[0, 3])
sns.histplot(data=df, x="WindGustSpeed", kde=True, color="skyblue", ax=axs[1, 0])
sns.histplot(data=df, x="WindSpeed9am", kde=True, color="skyblue", ax=axs[1, 1])
sns.histplot(data=df, x="WindSpeed3pm", kde=True, color="skyblue", ax=axs[1, 2])
sns.histplot(data=df, x="Humidity9am", kde=True, color="skyblue", ax=axs[1, 3])
sns.histplot(data=df, x="Humidity3pm", kde=True, color="skyblue", ax=axs[2, 0])
sns.histplot(data=df, x="Pressure9am", kde=True, color="skyblue", ax=axs[2, 1])
sns.histplot(data=df, x="Pressure3pm", kde=True, color="skyblue", ax=axs[2, 2])
sns.histplot(data=df, x="Cloud9am", kde=True, color="skyblue", ax=axs[2, 3])
sns.histplot(data=df, x="Cloud3pm", kd
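The excerpt above describes RobustScaler as removing the median and scaling by the quantile range. A rough plain-Python sketch of that idea follows, using the interquartile range (25th to 75th percentile). This is an illustration of the arithmetic, not scikit-learn's implementation; the helper name `robust_scale` and the sample column are assumptions, and the sketch does not guard against a zero interquartile range.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    # Quartiles with linear interpolation between sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - median) / iqr for x in values]

# Hypothetical column with one extreme outlier (100).
column = [1, 2, 3, 4, 100]
scaled = robust_scale(column)
print(scaled)
```

Because the median and the quartiles barely move when a single extreme value is present, the typical values keep a sensible scale while the outlier stays visibly far away, which is the motivation the excerpt gives for this scaler.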
Google AI Blog: Alpa: Automated Model-Parallel Deep Learning
       
Model parallelism often requires significant effort from system experts to identify an optimal parallelism plan for a specific model. In “Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning”, published at OSDI 2022, we describe a method for automating the complex model parallelism process. Alpa Design. We begin by grouping existing ML parallelization strategies into two categories, inter-operator parallelism and intra-operator parallelism. Alpa is a new framework that leverages intra- and inter-operator parallelism for automated model-parallel distributed training. We believe that Alpa will democratize distributed model-parallel learning and accelerate the development of large deep learning models.
Rethinking Human-in-the-Loop for Artificial Augmented Intelligence
       
Rethinking Human-in-the-Loop for Artificial Augmented Intelligence. Figure 1: In real-world applications, we think there exists a human-machine loop where humans and machines are mutually augmenting each other. For demonstration, we designed a recognition framework that was a combination of active learning, semi-supervised learning, and human-in-the-loop (Figure 3). As a result, the percentage of high-confidence predictions on second step validation was 72.2%, the accuracy of high-confidence predictions was 90.2%, and the percentage of novel classes detected as low-confidence was 82.6%. For example, when AI models cannot recognize novel classes, human intervention can provide information to expand the model’s recognition capacity. However, this goal of replacing human effort is intrinsically building up opposition or a mutually exclusive relationship between humans and machines.
Comprehensive Guide to Principal Component Analysis
       
Comprehensive Guide to Principal Component Analysis. Theoretical Explanation of the Principal Component Analysis. Principal Component Analysis (short: PCA) is used when you want to reduce the number of variables in a large data set. When do we use Principal Component Analysis? The core idea of Principal Component Analysis is that several variables in a data set may measure the same thing, i.e. Compared to similar statistical analyses, Principal Component Analysis has only a few requirements that must be met in order to obtain meaningful results. Not all data sets can be used for Principal Component Analysis without further ado.
Why Text Summarization Is Still Hard
       
Why Text Summarization Is Still Hard. And how this poses an opportunity for startups. Out of all Natural Language Processing (NLP) tasks, summarization is arguably one of the least headline-worthy. However, despite its lower-key profile, text summarization is far from being solved, especially in industry. This article discusses the reasons why text summarization remains a challenge. (Excerpt from Google AI blogpost) Consistent with its own advice above, Google offers different end points for its experimental abstractive summarization API, with each one focusing on a rather narrow application: Separate summarization models offered by Google’s experimental abstractive summarization API (currently under private access). Beyond text: audio and video summaries. Compared to audio and video, text is arguably the simpler modality to summarize.
4 Apps That Will Make You More Productive as a Data Scientist
       
4 Apps That Will Make You More Productive as a Data Scientist. Become more productive when writing code, taking notes, organizing tasks, projects, and more! Fortunately, there are apps that can help you become more productive and focus on what matters most. In this article, I listed some apps that helped me become more productive when writing code and taking notes and also used to organize my tasks, calendars, emails, projects, and even Github cards. As a data scientist, you might feel overwhelmed with so many things to do. Source: Sunsama (Keep Productive). As you can see, this daily planner helps you organize tasks, emails, and more in one place.
A.I. Talks with Animals
       
A.I. Talks with Animals. Can machine learning algorithms eavesdrop on animal language? The coyote cutout received calls similar to those the prairie dogs used for real coyotes, but in response to the oval shape, the prairie dogs produced an entirely novel call¹². Some scientists, including Slobodchikoff, call animal communication “animal language” exactly because, just as humans use language to communicate, animals’ verbal and gestural signals are the way in which animals communicate. After all, animals’ communication systems serve them in their environment just as human language serves us in ours. In his work with prairie dogs, for example, Con Slobodchikoff carefully recorded which predator was approaching at the time of each prairie dog’s alarm call.
Your Anomaly Detection Model Is Smarter than You Think
       
Your Anomaly Detection Model Is Smarter than You Think. Multivariate time series anomaly detection models can provide rich insights if you invest some time in post-processing their results… Photo by Markus Spiske on Unsplash. While dealing with industrial sensor data, I often tackle anomaly detection use cases. I need sound root cause analysis before I adjust my manufacturing process.” “Anomaly detection is not enough: when a model detects an anomaly, it’s already too late. Feel free to update it to prepare the data to a format suitable for your own anomaly detection model. Training an anomaly detection model. I will use Amazon Lookout for Equipment to train an anomaly detection model on the previous dataset. My anomaly detection model outputs a dataframe with a status for each time stamp (0 if nothing is detected and 1 if an anomaly is found).
What Happens When You Include Irrelevant Variables in Your Regression Model?
       
k rows and 1 column, assuming there are k regression variables in the model including the intercept and also including any irrelevant variables. Since X is of size (n x k) and X’ is of size (k x n), X’X is of size (k x k). The superscript of (-1) indicates that we have taken the inverse of this (k x k) matrix, which is another matrix of size (k x k). And thus, we have another important result: Addition of irrelevant variables to a regression model will make the coefficient estimates of all regression variables less precise.
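The dimension bookkeeping in the excerpt above (X is n x k, X’ is k x n, so X’X is k x k) can be checked with a small plain-Python sketch. The matrix values and the helper names `transpose`, `matmul`, and `shape` are illustrative assumptions, not taken from the article.

```python
# Hypothetical design matrix: n = 4 observations, k = 3 columns
# (an intercept column of ones plus two regression variables).
X = [
    [1.0, 2.0, 0.5],
    [1.0, 3.0, 1.5],
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 4.0],
]

def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def shape(m):
    """Return (number of rows, number of columns)."""
    return (len(m), len(m[0]))

Xt = transpose(X)     # X' has size (k x n)
XtX = matmul(Xt, X)   # X'X has size (k x k)
print(shape(X), shape(Xt), shape(XtX))
```

Each extra column in X, relevant or not, adds one row and one column to X’X and therefore to its inverse, which is the matrix the excerpt connects to the precision of the coefficient estimates.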
Efficient Generalized Spherical CNNs
       
Efficient Generalized Spherical CNNs. Hybrid rotationally equivariant spherical CNNs. Notions of spherical convolution offer a promising route to unlocking the potential of deep learning for the variety of problems in which spherical data are prevalent. (Further details can be found in our related ICLR paper on Efficient Generalized Spherical CNNs.) Generalized Spherical CNNs. Armed with ways to linearly and non-linearly transform generalized signals in a rotationally equivariant manner, generalized spherical CNNs can be constructed (Kondor et al. By using channel-wise tensor products, efficient generalized spherical CNN layers reduce the computational footprint of the fragment computation. Experiments. Making these modifications within the layers of generalized spherical CNNs allows us to train much more expressive models than would otherwise be possible.
Named Entity Recognition with BERT in PyTorch
       
I wrote about how we can leverage BERT for text classification before, and in this article, we’re going to focus more on how to use BERT for named entity recognition (NER) tasks. This is exactly what we want since we want our BERT model to predict the entity of each token. Tokenization. Tokenization can be easily implemented with BERT, as we can use the BertTokenizerFast class from a pretrained BERT base model with HuggingFace. The BertForTokenClassification class is a model that wraps the BERT model and adds linear layers on top of it that will act as token-level classifiers. Below is the example of the training output after we train our BERT model for 5 epochs: Of course, the output that you’ll see may vary when you train your own BERT model as there is stochasticity in the training process.
UNIKUD: Adding Vowels to Hebrew Text with Deep Learning
       
Introducing the first open-source Hebrew nakdan (נקדן) tool which uses no rule-based logic. Contents: The Hebrew Writing System; Introduction to UNIKUD; UNIKUD Datasets; Methods; Results; Limitations and Further Directions; Conclusion; References. In order to train UNIKUD to vocalize Hebrew text, we needed to collect a dataset of Hebrew text with vowel points. As a large deep learning model, UNIKUD is substantially slower at performing inference (adding vowels to text) when it is run on CPU rather than GPU. Conclusion: In this project, we presented the UNIKUD model, which adds nikud to Hebrew text, the first open-source nakdan built with deep learning and using no hand-written rules specific to Hebrew.
Implementing SegFormer in PyTorch
       
A fast, efficient, and lightweight model for image segmentation. Image by the author. Today we’ll see how to implement SegFormer in PyTorch, as proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. This model has two main advantages. First, SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. In SegFormer, the conv layer is followed by a layer norm. Conclusions: In this article we have seen, step by step, how to create SegFormer, a fast and efficient model for image segmentation.
4 Habits of Emotionally Intelligent People
       
#4: Choosing values over feelings. Photo by Andrea Piacquadio from Pexels. As a psychologist, I work with a lot of people who want to improve their emotional intelligence. But despite all the inspiring podcasts and YouTube videos they consume, they still struggle with it:
The Slow Burn of an Empty Nest
       
When there’s more than one. Photo by Mohan Moolepetlu on Unsplash. “I don’t want them to go. I love them to death. But once they are gone, won’t it be kind of nice to have fewer complications, fewer interruptions?” These are the things you say before they go, before they’re gone.
Understanding the Important Role of Anger In Complex Trauma Recovery
       
Why anger is a necessary part of healing from childhood trauma. Image: polga/shutterstock. Elisabeth Kübler-Ross was a psychiatrist who extensively studied death and dying (grief). She discovered that each of these people went through the same stages of grieving, even if they experienced them differently. This began her career…
The Great Resignation of Pastors
       
Almost half of all pastors on the verge of quitting. There is a social and cultural phenomenon taking place in many western nations in the wake of the COVID-19 pandemic. Forced lockdowns, working from home, and not working at all have caused many people to reassess their employment situations and ask the question, “Is it time for a change?”
Is a Mental Health App a Substitute for Therapy?
       
Image: Pixabay (no attribution required). If you have been looking for a therapist or psychiatrist lately, you have probably found yourself on the back end of a long waiting list. The pandemic has taxed not only those who take care of our physical health but also those who care for our mental health. Many people who are in need of mental health services are simply unable to find a provider who is taking new patients, and some are turning to…
My Statement with Michelle on the Draft Supreme Court Decision to Overturn Roe v. Wade
       
Image: Getty Images. Today, millions of Americans woke up fearing that their essential freedoms under the Constitution were at risk. If the Supreme Court ultimately decides to overturn the landmark case of Roe v. Wade, then it will not only reverse nearly 50 years of precedent; it will relegate the most intensely personal decision someone can make to the whims of politicians and ideologues. Few, if any, women make the decision to terminate a pregnancy casually, and people of goodwill, across the political spectrum, can hold different views on the subject. But this draft decision doesn’t seek to balance these interests. Instead, it simply forces folks to give up any constitutionally recognized interest in what happens to their body once they get pregnant.
Pink vs Black — A Rude Awakening. A racial journey in fits and starts
       
A racial journey in fits and starts. There’s a picture of my dad holding me in the hospital on the day I was born, his jerry curl dangling in shiny coils above my face. If you look closely, which I’m sure I did, you’ll find a Mercedes Benz with diamond windows nestled in his chest hair. My dad’s beaming down at me through heavy square-framed Cazal’s. My mom isn’t pictured, but…
Nobody Cares Who Leaks Things
       
And they shouldn’t. It has been roughly 20 hours since Politico’s incredible scoop of the Supreme Court draft opinion striking down Roe v. Wade hit, and I, like millions of Americans, am still stunned. The number of people whose lives are going to be devastated by the ruling (if it ends up standing as the final ruling, as it is expected to) is impossible to quantify; it’s unfathomable and tragic…
Fake Meat Won’t Save the Planet
       
Hype around meat substitutes and the problems they supposedly solve is questioned by experts in many fields. Image: Unsplash/Maude Frédérique Lavoie. Trying to eat a healthy diet with an eye toward sustainability is bewildering these days. The recipe for perplexity includes long-running arguments over the nutritional and environmental merits and demerits of meat vs. plant-based…
PyScript — unleash the power of Python in your browser
       
A sneak peek at how to run Python from HTML code. During a keynote speech at PyCon US 2022, Anaconda’s CEO Peter Wang unveiled quite a surprising project: PyScript. It is a JavaScript framework that allows users to create Python applications in the browser using a mix of Python and standard HTML. For the time being, PyScript supports writing and running Python code in a browser. Currently, when using PyScript we can only use the libraries that are supported by Pyodide. The most basic example provided on PyScript’s website is the following: As we can see, Python code is embedded in the <py-script> block.
Uber Uses This Framework for Rapid ML Development
       
PyML is a framework that allows the rapid development of ML models compatible with Uber’s infrastructure. The goal is to keep you up to date with machine learning projects, research papers and concepts. Just this year, Uber has introduced technologies like Michelangelo, Pyro.ai, and Horovod that focus on key building blocks of machine learning solutions in the real world. To accomplish that, PyML focuses on three main aspects: 1) Provide a standard contract for machine learning prediction models. Image Credit: Uber. A Standard Machine Learning Contract: PyML models can be authored by different machine learning frameworks such as TensorFlow, PyTorch, or Scikit-Learn.
Applications of Generative Adversarial Networks (GANs)
       
What are Generative Adversarial Networks? Fig 1: Examples of GAN real-world implementation [23]. GANs (Generative Adversarial Networks) are generative deep learning models. The notion was initially introduced in a 2014 study titled “Generative Adversarial Networks” by Ian Goodfellow and colleagues. Generative modeling is carried out with generative adversarial networks, or GANs, which employ deep learning approaches like convolutional neural networks. Fig 7: Generator and Discriminator. Benefits of GAN: The need for Generative Adversarial Networks (GANs) has risen dramatically in recent years.
[6] Generative Algorithms in DL, by Darshan Dilipbhai Patel. https://medium.com/@16bit040/generative-algorithms-in-dl-fd93c15808d7
[7] On Discriminative vs. Generative classifiers: A comparison of logistic regression and Naive Bayes, by Andrew Ng and Michael I. Jordan. https://proceedings.neurips.cc/paper/2001/file/7b7a53e239400a13bd6be6c91c4f6c4e-Paper.pdf
[8] The Math Behind Generative Adversarial Networks. https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html
[9] A Beginner’s Guide to Generative Adversarial Networks (GANs), by Data Science. https://datascience.eu/machine-learning/a-beginners-guide-to-generative-adversarial-networks-gans/
[10] Advantages and disadvantages of generative adversarial networks (GAN), by Junaid Rehman. https://www.itrelease.com/2020/06/advantages-and-disadvantages-of-generative-adversarial-networks-gan/
[12] List of Papers published on GANs. https://github.com/zhangqianhui/AdversarialNetsPapers
[13] Deep Learning CNN for Fashion-MNIST Clothing Classification, by Jason Brownlee, May 10, 2019, in Deep Learning for Computer Vision. https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/
[14] A Gentle Introduction to Generative Adversarial Networks (GANs), by Jason Brownlee, June 17, 2019, in Generative Adversarial Networks. https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/
[15] A Beginner’s Guide to Gener
Using Kaggle in Machine Learning Projects
       
Contents: Setting up Kaggle Notebooks; Using Kaggle Notebooks with GPUs/TPUs; Using Kaggle Datasets with Kaggle Notebooks; Using Kaggle Datasets with Kaggle CLI tool. What Is Kaggle? In addition to that, Kaggle also offers some courses and a discussions page for you to learn more about machine learning and talk with other machine learning practitioners! Setting up Kaggle Notebooks: To get started with Kaggle Notebooks, you’ll need to create a Kaggle account either using an existing Google account or creating one using your email. Using Kaggle Datasets with Kaggle Notebooks: Machine learning projects are data-hungry monsters, and finding datasets for our current projects or looking for datasets to start new projects is always a chore. Specifically, you learnt: what Kaggle is; how to use Kaggle notebooks along with their GPU/TPU accelerators; how to use Kaggle datasets in Kaggle notebooks or download them using Kaggle’s CLI tool.
Increase your content reach with automated document-to-speech conversion using Amazon AI services
       
We extract text from scanned documents using Amazon Textract, and then convert the text to speech using Amazon Polly. Architecture and code: As described in the previous section, we use two key AI services, Amazon Textract and Amazon Polly, to build a document-to-speech conversion solution. When the image-to-text conversion is complete, Amazon Textract sends a notification to Amazon Simple Notification Service (Amazon SNS). Similar to Amazon Textract, Amazon Polly sends a notification to Amazon SNS when the job is done. The application uses an Amazon DynamoDB table to track job information such as Amazon Textract job ID, Amazon Polly job ID, and more.
Achieve hyperscale performance for model serving using NVIDIA Triton Inference Server on Amazon SageMaker
       
NVIDIA Triton Inference Server is open-source inference serving software with features to maximize throughput and hardware utilization with ultra-low (single-digit milliseconds) inference latency. In this post, we look at best practices for deploying transformer models at scale on GPUs using Triton Inference Server on SageMaker. SageMaker Inference Recommender for benchmarking test results: We use SageMaker Inference Recommender to run our experiments. SageMaker Inference Recommender uses this information to pull an inference Docker image from Amazon Elastic Container Registry (Amazon ECR) and register the model with the SageMaker model registry. He helps customers achieve high performance model inference on SageMaker.
Build a corporate credit ratings classifier using graph machine learning in Amazon SageMaker JumpStart
       
The graph data and tabular data are used to fit a rating classifier using GNNs. You’re not restricted to the feature set in this example; you can change both the graph data and tabular data for your own use case. Data used in the solution: The dataset has synthetic tabular data such as various accounting ratios (numerical) and industry codes (categorical). The graph information is passed in to the Deep Graph Library and combined with the tabular data to undertake graph ML. Read in graph data from Amazon Simple Storage Service (Amazon S3) and create the source and destination node lists for CorpNet.
10 Must-know Seaborn Functions for Multivariate Data Analysis in Python
       
Functions to use: sns.scatterplot() (axes-level plot); sns.relplot(kind='line') (figure-level). Functions with regression line: sns.regplot() (axes-level); sns.lmplot() (figure-level). Two numeric columns (bivariate): sns.scatterplot(x='num_col1', y='num_col2', data=df). Let us visualize the engine size with the mileage (efficiency) of the vehicle. Functions to use: sns.lineplot() (axes-level plot); sns.relplot(kind='line') (figure-level plot). Two columns (bivariate): numeric and time series. sns.relplot(x, y, data, kind='line', col='cat_col'): as mentioned earlier, a rel plot’s kind='line' parameter plots a line graph. Functions to use: sns.barplot() (axes-level plot); sns.catplot(kind='bar') (figure-level plot). Two columns (bivariate): numeric and categorical: sns.barplot(x='cat_col', y='num_col', data=df), for example sns.barplot(x='fuel', y='selling_price', data=cars, color='blue'), optionally with estimator=sum or estimator=np.median. Barplot by author. Three columns (multivariate): two categorical and one numeric. Image from source. Functions to use: sns.violinplot() (axes-level plot); sns.catplot(kind='violin') (figure-level plot). Two columns (bivariate): numeric and categorical.
The Basics of Neural Networks (Neural Network Series) — Part 1
       
Neural Networks: An Artificial Neural Network (ANN), or simply a Neural Network (NN), consists of interconnected layers of small units called nodes that perform mathematical operations to detect patterns in data. Artificial Neuron (Mathematical Operation on one Neuron): An artificial neuron takes input values (it can be several) with weights assigned to them. Neural Network Design: A Neural Network (NN) is made of several neurons stacked into layers. First, the input values are weighted by multiplying them by the corresponding weights. Artificial neuron: An artificial neuron (also called a unit or a node) mimics the biological neuron in structure and function (in a loose sense; see the next note).
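As a minimal sketch of the operation on one neuron: each input is multiplied by its weight, the products are summed, and the sum goes through an activation function. The bias term and the sigmoid activation are assumptions for illustration; the snippet itself only mentions inputs and weights.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Two inputs with illustrative weights: 1.0 * 0.5 + 2.0 * (-0.25) = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25])
print(output)
```

A layer is then just several such neurons fed the same inputs, each with its own weights.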
Diffusion Models Made Easy
       
Understanding the Basics of Denoising Diffusion Probabilistic Models. Figure 1: Process of Denoising Diffusion Probabilistic Model (Image by author). 1. Denoising Diffusion Model: The idea of the denoising diffusion model has been around for a long time. Denoising diffusion modeling is a two-step process: the forward diffusion process and the reverse process, or reconstruction. Figure 2: Results of a forward diffusion process on a synthetic S-curve dataset (Image by author). The results for the reverse diffusion process can be seen in the following figure. Although diffusion models are computationally more expensive than other deep network architectures, they perform much better in certain applications.
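The forward half of that two-step process can be sketched in a few lines. This assumes the commonly used DDPM closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, with a linear beta schedule; the article's own schedule, data, and function names are not shown in the snippet, so everything named below is an assumption.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 by mixing the signal with Gaussian noise."""
    keep = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [keep * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                                 # a toy data point
x_early = forward_diffuse(x0, 10, alpha_bars, rng)    # still close to x0
x_late = forward_diffuse(x0, 999, alpha_bars, rng)    # almost pure noise
```

The reverse process, which needs a trained network to predict and remove the noise, is not sketched here.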
Are your training and test sets comparable?
       
Data scientists usually split a dataset into training and test sets. A short guide on how to create comparable training and test datasets. Should training and test sets be similar? From the beginning of my career, everybody used to split a dataset into training and test sets randomly and uniformly. So, before training our model, we must make sure that the training and test datasets are statistically similar. We then compare such a function of a feature in the training dataset with the same function of the same feature in the test dataset.
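One way to make that comparison concrete is sketched below. It assumes the "function of a feature" is the empirical cumulative distribution function and measures the largest gap between the two curves (the two-sample Kolmogorov-Smirnov statistic). The snippet does not say which function the article uses, so treat this choice and the helper names as assumptions.

```python
def ecdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def ks_statistic(train_values, test_values):
    """Largest vertical distance between the two empirical CDFs."""
    points = sorted(set(train_values) | set(test_values))
    return max(abs(ecdf(train_values, p) - ecdf(test_values, p)) for p in points)


# Toy feature values: the first pair matches, the second pair does not overlap.
print(ks_statistic([1, 2, 3], [1, 2, 3]))
print(ks_statistic([1, 2, 3], [10, 11, 12]))
```

A value near 0 suggests the feature is distributed similarly in both sets; a value near 1 suggests the split is not comparable for that feature.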
Generate distractors for MCQs using Word Vectors, Sentence Transformers and MMR algorithm
       
Use NLP to generate wrong answers for MCQs in Edtech. Image from Pixabay. If you are working at the crossroads of NLP and Edtech, you will sooner or later encounter the problem of automatically generating distractors (wrong answer choices) for a given question and answer. A: Barack Obama. Now the goal is to find wrong answer choices (distractors) for the answer “Barack Obama”. You also encounter near-duplicates like George Bush, George W Bush, etc., of which you need to keep only one as a distractor. And you can see that the output is: Barack Obama -> John McCain, George W Bush, Sarah Palin, Bill Clinton. You can clearly see that near-duplicates like ‘Barrack Obama’, ‘George W. Bush’, ‘George Bush’, ‘President Obama’, ‘Obama’, etc. are filtered and are not present in the final output. Conclusion: Word vector algorithms can be used to generate distractors (wrong choices) for Multiple Choice Questions given a correct answer.
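The near-duplicate filtering can be illustrated with a small Maximal Marginal Relevance (MMR) sketch. The two-dimensional vectors are toy stand-ins for real word or sentence embeddings, and the function names and the `diversity` weight are hypothetical; the article's actual embeddings and parameters are not shown in the snippet.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(answer_vec, candidates, top_n, diversity=0.7):
    """Greedily pick candidates close to the answer but unlike earlier picks."""
    selected = []
    remaining = list(candidates)

    def score(name):
        relevance = cosine(candidates[name], answer_vec)
        redundancy = max(
            (cosine(candidates[name], candidates[s]) for s in selected),
            default=0.0,
        )
        return (1 - diversity) * relevance - diversity * redundancy

    while remaining and len(selected) < top_n:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings (assumed values): the two "Bush" entries are near-duplicates.
answer = [1.0, 0.0]
candidates = {
    "George W Bush": [0.9, 0.1],
    "George Bush": [0.9, 0.11],
    "John McCain": [0.6, 0.8],
}
distractors = mmr(answer, candidates, top_n=2)
print(distractors)
```

With these toy vectors the second pick skips the near-duplicate and takes the less similar candidate, which is the behaviour the article relies on to keep only one of each duplicate group.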
Optimal Undersampling using Machine Learning, with Python
       
Here’s how to smartly undersample your signal using a few lines of code. Photo by Prateek Katyal on Unsplash. In the era of Big Data, undersampling is a key part of data processing. Basically, you are just connecting the points using straight lines. Oversample the low-quality area. Connect these points using an interpolation technique. 1. Interpolate the signal and compute, from your n starting points, an interpolated signal with values at the original N data points. 2. Compute the absolute normalized difference between the interpolated signal at the N data points and the original N data point values.
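The two numbered steps can be sketched as follows. Straight-line interpolation is used because the snippet mentions connecting points with straight lines; normalizing by the largest absolute value of the original signal is an assumption, since the article's exact normalization is not shown. The function names are hypothetical.

```python
def linear_interpolate(xs, ys, x):
    """Piecewise-linear interpolation through (xs, ys); xs must be ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def normalized_abs_error(full_x, full_y, keep_idx):
    """Step 1: rebuild all N points from the kept ones. Step 2: compare."""
    xs = [full_x[i] for i in keep_idx]
    ys = [full_y[i] for i in keep_idx]
    scale = max(abs(v) for v in full_y) or 1.0  # assumed normalization
    return [
        abs(linear_interpolate(xs, ys, x) - y) / scale
        for x, y in zip(full_x, full_y)
    ]


# Toy signal y = x^2 on N = 5 points, undersampled to its n = 2 end points.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]
errors = normalized_abs_error(x, y, keep_idx=[0, 4])
print(errors)
```

Regions where the error is large are the "low-quality" areas where extra points would be added back.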
Demystify Machine Learning Model Selection
       
Leverage cross-validation, performance metrics, and total runtime to determine the best model for your data. Photo by Vladislav Babienko on Unsplash. What is Model Selection? Model selection in Machine Learning is selecting the best model for your data. The idea with model selection is to pick the best performing model, not to tune the model for its best performance. The final selection is up to you, but these methods should give you a strong baseline for selecting the best model for your use case! Choosing the right model can greatly impact the performance of your machine learning project, and choosing the wrong model can leave you with unacceptable results.
Grid Search VS Random Search VS Bayesian Optimization
       
Which hyperparameter tuning method is best? Random search: The random search is also an uninformed search method that treats iterations independently. Bayesian Optimization: Unlike the grid search and random search, which treat hyperparameter sets independently, the Bayesian optimization is an informed search method, meaning that it learns from previous iterations. The goal is to fine-tune a random forest model with the grid search, random search, and Bayesian optimization. Code Output (Created By Author). The grid search registered the highest score (joint with the Bayesian optimization method).
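The difference between the two uninformed methods can be sketched without any ML library. The `objective` function below is a toy stand-in for a cross-validated score, and the grid values are made up; neither comes from the article. Bayesian optimization is not sketched, since it needs a surrogate model.

```python
import itertools
import random


def objective(params):
    """Toy score that peaks at n_estimators=200, max_depth=6 (assumed)."""
    return -((params["n_estimators"] - 200) ** 2) / 1e4 - (params["max_depth"] - 6) ** 2


GRID = {"n_estimators": [50, 100, 200, 400], "max_depth": [2, 4, 6, 8]}


def grid_search(score_fn, grid):
    """Evaluate every combination in the grid; return (best_score, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def random_search(score_fn, grid, n_iter, seed=0):
    """Evaluate n_iter independently sampled combinations."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in grid.items()}
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


grid_best = grid_search(objective, GRID)            # 16 evaluations
random_best = random_search(objective, GRID, 5)     # 5 evaluations
print(grid_best, random_best)
```

Grid search is exhaustive and therefore never scores below random search on the same grid, but it pays for that with many more evaluations.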
Get a 10% jump in Your Machine Learning Model Performance
       
A step-by-step guide for improving your model performance beyond hyperparameter tuning. Suppose you have identified a machine learning model and the corresponding hyperparameters that give you the best performance, but the accuracy of the model is still below the baseline/expected accuracy. In this blog, I will take you through a 3-step process for improving your model performance beyond hyperparameter tuning. Does it mean that we should focus on this tag for improving our model performance? Gap between the model error and human error (Image by Author). 3. Conclusion: In order to improve the model performance beyond hyperparameter tuning, we can use the error analysis technique to identify the categories/tags for which the model underperforms as compared to the baseline.
Do Pistol Shrimps Hold The Key To Nuclear Fusion?
       
Image: Shrimp, WikiCC. Biomimicry of a deadly sea creature could soon provide us with near limitless clean energy. In theory, nuclear fusion is the perfect energy source: simply put hydrogen in and get copious amounts of energy and helium out. There is no vast habitat loss like solar and no horrific carbon emissions like coal or gas.
Americans Are Set to Lose the Right to Abortion. Why Aren’t Democrats Raising Hell?
       
The Supreme Court is expected to overturn Roe v. Wade next month, but Democrats are asleep at the wheel. Reproductive rights activists protest as oral arguments in the Dobbs v. Jackson Women’s Health Organization case are held on Wednesday, December 1, 2021. The case considers the constitutionality of Mississippi’s restrictive ban on abortion after 15 weeks.
Tucker Carlson’s Data-Driven Demagoguery
       
The Fox Network star is a ratings crack-head, pandering to minute-by-minute data on his audience’s basest fears. In April 2017, Tucker Carlson’s star at Fox News rose to the top slot, the coveted 8:00pm prime time hour that had just been vacated by Bill O’Reilly. He had one challenge: how to build a loyal audience while…
Bottled water monopolist admits recycling is bullshit
       
“Personal responsibility” and “caveat emptor.” CORRECTION: The original version of this article identified Exxon as the creator of the recycling symbol. They did not create the symbol, but they did pressure 40 US state legislatures to mandate the use of the logo, though they knew that the plastics that bore it couldn’t be recycled.
5 Common Money Mistakes to Avoid
       
These things prevent you from getting rich. I came across a study the other day that asked people where they burned most of their hard-earned money. The answer of over 64 million Americans? “Having fun.” I don’t think there’s much wrong with having fun. But when we spend too much on having fun, we often end up regretting it when we suffer financially.
Roe v. Wade Was Set Up To Fail
       
We did this to ourselves. Photo by EKATERINA BOLOVTSOVA via Pexels. The push notification materialized in deafening silence. News website Politico had leaked an initial draft of the Supreme Court majority opinion overturning Roe v. Wade. The day that pro-choice America had been dreading (or had uneasily tried to convince themselves would never actually come) had dawned.
The WHCD Shows How You’re Supposed to Handle Jokes
       
Humor is, you know, good. This clip, of host Trevor Noah imploring the journalists in his audience at the White House Correspondents’ Dinner this weekend to understand their power and their obligation in democracy (to do better), rattled around the Internet this weekend, and for good reason: It’s a vital, urgent sentiment that is often forgotten by the…
How Comfort and Conformity Are Dream Killers
       
Why changing your environment can improve your life. Cartoon by John P. Weiss. It’s hard for people to rise above their opinions of themselves. This curse of self-limiting beliefs can often be traced back to childhood. Either an unsupportive parent or careless teacher did or said something devastating, and we carry the wound forward.
6 Reasons Not to Move to Ireland
       
Exactly how the Emerald Isle disappoints. I’ve lived in Ireland for quite some time now. Although moving during the pandemic wasn’t easy regardless of where you moved from or where you were moving to, Ireland has not lived up to my expectations. I’ve lived in the US, Spain, and Germany, and I’ve spent a good amount of time traveling around mainland Europe. All that to say, this isn’t my first rodeo. It wasn’t my first international move, and it won’t be my last.
Johnny Depp Will Likely Lose His $50M Defamation Case
       
The legalities won’t protect him from the reality of the case. Editorial rights purchased via iStock Photos. The trial hasn’t gone well for Amber thus far. Candidly, it’s been the mother of all PR nightmares. Yet if you assume it’s game over for Amber’s case, you are woefully wrong.
I replaced my native iOS app with a cross platform web app and no-one noticed
       
Choosing a mobile app technology (aka pick your poison): Now, the problem with starting a mobile app in 2022 is that there are a lot of totally different technical directions you can take: native, cross platform web app, React Native, Flutter, Progressive Web App, Xamarin, etc. Cross Platform Web Apps: With cross platform web apps, you write code once using common web technologies and deploy it to multiple platforms. With 3 commands I can deploy to an iOS app, an Android app, or my website on AWS! That flat line is when the cross platform web app was released. Somehow my cross platform web app is actually more stable! Today, for a lot of apps, cross platform web apps are indistinguishable from native apps to human beings.
What Is the Effect of Batch Size on Model Learning?
       
Let’s start with the simplest method and examine the performance of models where the batch size is the sole variable. Orange: size 64. Blue: size 256. Purple: size 1024. This clearly shows that increasing batch size reduces performance. The study “Train longer, generalize better: bridging the generalization gap in large batch training of neural networks” aims to close the generalization gap between batch sizes. The authors employed an altered training schedule to get the big batch size learners to catch up to the smaller batch size learners.
Easy Explanation of C++ and Python
       
C++ was developed by extending the C programming language with object-oriented concepts. Compilation: Python is an interpreted programming language and thus requires an interpreter that processes it at runtime. Variable Declaration: Python is a dynamically typed programming language where a variable need not be declared before using it. Implementation of Bubble Sort in C++: Implementation of Bubble Sort in Python: The following table summarizes the differences between C++ and Python.
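The article's bubble sort listings did not survive in this snippet. As a stand-in, here is a minimal Python sketch of the algorithm; it is not the author's code, and the function name is an assumption.

```python
def bubble_sort(values):
    """Return a sorted copy by repeatedly swapping adjacent out-of-order items."""
    items = list(values)  # work on a copy so the caller's list is untouched
    n = len(items)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # no swaps means the list is already sorted
            break
    return items


print(bubble_sort([5, 1, 4, 2, 8]))
```

Note that no variable types are declared anywhere, which is the dynamic typing point made above; a C++ version would need a typed container and typed loop counters.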
Top 10 AI Articles for April 2022
       
Image source: Unsplash. Artificial intelligence (AI) newsletter by Towards AI #18. News: If you haven’t heard, we recently announced an exciting investment in Towards AI to expand the AI platform for the AI community. We have lots of exciting new projects in the pipeline at Towards AI and we are looking forward to making Towards AI an essential platform for the AI community. Our goal is to make AI more accessible and play a role in ensuring that AI benefits everyone. We aim to build a community that will democratize access to AI by making it easier to learn AI, build AI tools and benefit from AI as a non-professional.
Duplicate Column Names In Pandas: Updated
       
Pandas still permits duplicate column names; here is what you can do about it. Overview: This article shows how easy it is to inadvertently generate a data frame with duplicate column names in Pandas without throwing an error. Then consider how you might decide to change the column names while inadvertently creating duplicate column names. You can detect duplicate column names with df.columns.is_unique and df.index.is_unique. You can locate duplicate column names (or index entries) with df.index.duplicated() and df.columns.duplicated(). Conclusion: This article showed how unexpectedly easy it is to create a Pandas data frame with duplicate column names.
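To show what those checks report, here is a pandas-free sketch that mimics them on a plain list of column names. The two helpers are hypothetical stand-ins written for illustration: `duplicated` flags the second and later occurrences of a label, which is how pandas' `duplicated()` behaves by default, and `is_unique` is true only when nothing is flagged.

```python
def duplicated(labels):
    """Flag every label that has already appeared earlier in the list."""
    seen = set()
    flags = []
    for label in labels:
        flags.append(label in seen)
        seen.add(label)
    return flags


def is_unique(labels):
    """True when no label occurs more than once."""
    return not any(duplicated(labels))


# Made-up column names with two accidental duplicates.
columns = ["id", "price", "price", "qty", "id"]
print(is_unique(columns))
print(duplicated(columns))
print([name for name, dup in zip(columns, duplicated(columns)) if dup])
```

On a real data frame the equivalent calls are the `df.columns.is_unique` and `df.columns.duplicated()` members named in the text.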
How To Automate and Simplify Your Machine Learning Experiment Workflow
       
So, over the course of years, I’ve picked up several things from the machine learning community which I transformed into a neat workflow. With this, we have 12 different experiments (2 x 2 x 3) to perform to get results for all different combinations. As you can see, the purpose of running 12 different experiments is to select the techniques that best solve the problem. Those are the user-defined arguments, or the ones we are interested in during the experimentation process. This way the results of different experiments and artifacts can be viewed in a single place without much mess.
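The 2 x 2 x 3 count is simply the Cartesian product of the options. The option names below are hypothetical placeholders, since the snippet does not list the article's actual choices.

```python
import itertools

# Hypothetical experiment options: 2 scalers x 2 feature sets x 3 models.
options = {
    "scaler": ["standard", "minmax"],
    "features": ["raw", "engineered"],
    "model": ["logistic_regression", "random_forest", "gradient_boosting"],
}

# One dictionary of arguments per experiment.
experiments = [
    dict(zip(options, combo))
    for combo in itertools.product(*options.values())
]

print(len(experiments))
print(experiments[0])
```

Each dictionary can then be passed to a training script as its user-defined arguments, so every combination is run and logged the same way.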
A Comprehensive Guide to Image Augmentation using Pytorch
       
A way to increase the amount of data and make the model more robust. Photo by Dan Gold on Unsplash. Lately, while working on my research project, I began to understand the importance of image augmentation techniques. Moreover, each dataset image is acquired at a resolution of 227 by 227 pixels. So, it can be useful to convert an image to greyscale: gray_img = T.Grayscale()(orig_img); plot([gray_img], cmap='gray', col_title=["Gray"]). Original image vs. grayscale image. It consists of injecting a Gaussian noise matrix, which is a matrix of random values drawn from a Gaussian distribution. Gaussian Noise.
The Rise of Synthetic Audio in Documentary Films
       
Examining the use of voice models from “Val” to “The Andy Warhol Diaries”. Still image from The Andy Warhol Diaries. In the summer of 2021, mass audiences got an abrupt introduction to audio deepfaking thanks to the controversy over the documentary Roadrunner: A Film About Anthony Bourdain (2021). Such conversations will only continue as more media incorporating artificially generated audio comes out. Several other recent documentary projects made similar use of synthetic voices; while they didn’t raise as many eyebrows as Roadrunner, they deserve just as much scrutiny. The character of Luke Skywalker, originally portrayed by Mark Hamill in the films, appears entirely via a mixture of different computer-generation tools. Case study: The Andy Warhol Diaries. This March saw the release of The Andy Warhol Diaries (2022), a Netflix docuseries narrated by a Warhol voice generated by Resemble AI, which read excerpts from the pop artist’s voluminous memoirs.
Ukraine: Biden ups the stakes
       
US airman loads missile en route for Ukraine. Source: US Department of Defense. Does America’s arms commitment change the character of the war? It convened the Ukraine Defense Consultative Group of 40 countries, to co-ordinate the continuous supply of heavy weapons and ammunition to Ukraine. He also opposes the left demanding arms for Ukraine because of the danger of nuclear rhetoric. The outcome will depend on the political struggle; whether, as the Ukrainian left group puts it, the war becomes a “people’s war”. There are signs that Putin will respond to these frustrations by declaring all-out war on Ukraine on 9 May.
Messaging Won’t Save Democrats; Community Might
       
As the midterms approach and voters sour on the party in power, here’s an alternative path worth trying. A month ago in this space, I offered some arguments for optimism about the upcoming mid-term elections for the majority of Americans who don’t want to see Trump Republicans return to power. Voter turnout has…
Can We Even Be Real?
       
Can We Even Be Real? A look at emerging social media platforms and their fundamental flaw. Photo by camilo jimenez on Unsplash. Being a glass-is-half-full kind of guy, I’ve tried to look on the bright side of billionaire Elon Musk scooping up Twitter for $44 billion and turning it into his private company plaything. But I’m faltering. Maybe it’s his constant needling tweets that intentionally attempt to pit the…
A Lunch Date with Dementia
       
A Lunch Date with Dementia. My mother is a new person now. Photo: Eduardo Barrios / Unsplash. My mother is gone, except really she’s not. What relief to see some old behaviors still sharp and intact even as these newest days skew dull. As usual, I decided Mom needs to eat. Sundowning … language deficits … unsteadiness … yes, yes, yes. Mom’s days came to life in their jargon.
I Got An AI To Autocomplete Famous Novels
       
I write three times a week about tech, science, culture — and how those collide. Writer for NYT mag/Wired; author of “Coders” and “Smarter Than You Think”
Reinforcement Learning: Monte-Carlo Learning
       
This is where a different class of reinforcement learning known as model-free learning comes in. Incremental Means: As described before, to calculate the value function of the state (s) we average over the rewards obtained after visiting that state (s). Ordinary importance sampling and weighted importance sampling. Weighted Importance Sampling: Ordinary importance sampling is unbiased whereas weighted importance sampling is biased. On the flip side, the variance of ordinary importance sampling is unbounded whereas this is not the case for weighted importance sampling.
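A minimal Python sketch of the two ideas above. The function names and the sample numbers are illustrative assumptions, not taken from the article. The incremental mean updates a running average one return at a time, and the two importance-sampling estimators differ only in their denominator.

```python
def incremental_mean(returns):
    """Running average via mean += (g - mean) / n; no list of past returns is stored."""
    mean, n = 0.0, 0
    for g in returns:
        n += 1
        mean += (g - mean) / n
    return mean


def ordinary_is(returns, ratios):
    """Ordinary importance sampling: divide the weighted returns by the episode count."""
    return sum(r * g for r, g in zip(ratios, returns)) / len(returns)


def weighted_is(returns, ratios):
    """Weighted importance sampling: divide by the sum of the ratios instead."""
    total = sum(ratios)
    return sum(r * g for r, g in zip(ratios, returns)) / total if total else 0.0


returns = [1.0, 0.0, 2.0, 1.0]  # made-up episode returns
ratios = [0.5, 2.0, 4.0, 0.1]   # made-up importance ratios (target / behaviour policy)

running_mean = incremental_mean(returns)
ordinary = ordinary_is(returns, ratios)
weighted = weighted_is(returns, ratios)
print(running_mean, ordinary, weighted)
```

With these made-up numbers the ordinary estimate lands above the largest observed return, while the weighted estimate, being a weighted average, stays within the observed range. That is one way to see why the weighted variant has bounded variance.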
Sets & Parameters in Tableau: A Road to Tableau Desktop Specialist Certification
       
Welcome to the ninth chapter. In this piece, we are going to learn about sets and parameters in Tableau. Chapter 9: A comprehensive guide on Sets & Parameters in Tableau with Sample Certification Questions and free Udemy Dumps. Set Actions: We know that when we use dynamic sets, the view changes when the underlying data is changed. To create a set action, choose Worksheet > Actions. Choose “Change Set Values” to update the sets based on user preferences.
Can Elon Musk Run A Business Without Government Subsidies?
       
Can Elon Musk Run A Business Without Government Subsidies? With Twitter, we’ll find out! Tesla, SpaceX and SolarCity certainly relied on tons of taxpayer dough. Via Pixabay. Assuming all goes well with his impending bid, Elon Musk will soon own Twitter. And what happens next?
Techniques to Write Better Python Code
       
": if not seen_integer: return False # e.g., ".3456" elif not seen_dot: seen_dot = True else: return False # e.g., "1..23" else: return False # e.g. "foo" if not seen_integer: return False # e.g., "" if seen_dot and not seen_decimal: return False # e.g., "2." : if not seen_integer : return False # e.g., ".3456" elif not seen_dot : seen_dot = True else : return False # e.g., "1..23" else : return False # e.g. return True print ( isfloat ( "foo" ) ) # False print ( isfloat ( ".3456" ) ) # False print ( isfloat ( "1.23" ) ) # True print ( isfloat ( "1..23" ) ) # False print ( isfloat ( "2" ) ) # True print ( isfloat ( "2." isdigit ( ) : return False # bad transition, can't continue if state in [ "integer" , "decimal" ] : return True else : return False print ( isfloat ( "foo" ) ) # False print ( isfloat ( ".3456" ) ) # False print ( isfloat ( "1.23" ) ) # True print ( isfloat ( "1..23" ) ) # False print ( isfloat ( "2" ) ) # True print ( isfloat ( "2."
Google AI Blog: Extracting Skill-Centric State Abstractions from Value Functions
       
In “Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning”, presented at ICLR 2022, we address the task of learning suitable state and action abstractions for long-range problems. Given a decision process with a finite set of k skills trained with sparse outcome rewards and their corresponding value functions, we construct an embedding space by stacking these skill value functions. Value functions corresponding to each skill (top-right; aggregated in bottom) capture functional information about the scene (top-left) and aid decision-making. Through the trajectory, when the robot picks up the blue cube, the corresponding skill value peaks. The learned VFS representation can ignore task-irrelevant factors such as arm pose, distractor objects (green cube) and background appearance (brown desk).
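As a rough illustration of "stacking" skill value functions into an embedding, here is a toy Python sketch. The state fields and the two hand-written value functions are hypothetical stand-ins for learned value functions and are not from the paper.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
ValueFn = Callable[[State], float]


def vfs_embedding(state: State, skill_values: List[ValueFn]) -> List[float]:
    """Stack the k skill value estimates for `state` into one k-dimensional vector."""
    return [v(state) for v in skill_values]


# Hypothetical value functions: each reads only what its skill depends on.
def v_pick_blue_cube(state: State) -> float:
    return 1.0 - state["gripper_to_blue_cube"]


def v_open_drawer(state: State) -> float:
    return state["drawer_reachable"]


skills = [v_pick_blue_cube, v_open_drawer]
state_a = {"gripper_to_blue_cube": 0.2, "drawer_reachable": 1.0, "green_cube_x": 0.1}
state_b = {"gripper_to_blue_cube": 0.2, "drawer_reachable": 1.0, "green_cube_x": 0.9}

embedding_a = vfs_embedding(state_a, skills)
embedding_b = vfs_embedding(state_b, skills)
print(embedding_a, embedding_b)
```

The two states differ only in the position of a distractor object, so they map to the same embedding. This is the sense in which such a representation can ignore task-irrelevant factors.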
Designing Societally Beneficial Reinforcement Learning Systems
       
Designing Societally Beneficial Reinforcement Learning Systems. Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. At the same time as the emergence of powerful RL systems in the real world, the public and researchers are expressing an increased appetite for fair, aligned, and safe machine learning systems. A Taxonomy of Feedback: Reinforcement learning systems are often spotlighted for their ability to act in an environment, rather than passively make predictions. Other supervised machine learning systems, such as computer vision, consume data and return a prediction that can be used by some decision making rule. Control Feedback: First is control feedback - in the control systems engineering sense - where the action taken depends on the current measurements of the state of the system.
Bias-Variance Decomposition for Model Assessment
       
Bias-Variance Decomposition for Model Assessment. Bias-variance decomposition of machine learning algorithms with a hands-on example in Python. Photo by Lukas on Pexels. Bias and variance are two key concepts in model assessment for machine learning as they are closely linked to the performance of the model on unseen data. One of the main difficulties data scientists face when implementing a new model is what is known as the bias-variance dilemma or bias-variance problem. This consists of the conflict of minimizing both sources of error in supervised learning algorithms, which can be assessed with the bias-variance decomposition method. As indicated in Figure 2, when the model complexity exceeds the sweet spot, our model is overfitting the training data, whereas if the model complexity falls short, the model is underfitting the data. Bias-variance decomposition: The bias-variance decomposition is a useful method to understand the performance of an algorithm.
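A small sketch of the decomposition at a single test point. It assumes a noise-free target, so the irreducible-error term is omitted, and the prediction values are made up. The predictions stand for the outputs of the same model trained on different resampled training sets.

```python
from statistics import fmean


def decompose(predictions, y_true):
    """Split the mean squared error at one test point into bias^2 and variance."""
    mean_pred = fmean(predictions)
    bias_sq = (mean_pred - y_true) ** 2
    variance = fmean([(p - mean_pred) ** 2 for p in predictions])
    mse = fmean([(p - y_true) ** 2 for p in predictions])
    return mse, bias_sq, variance


# Made-up predictions from four models fit on different training samples.
mse, bias_sq, variance = decompose([2.5, 3.5, 3.0, 4.0], y_true=3.0)
print(mse, bias_sq, variance)
```

For this example the mean squared error equals the squared bias plus the variance, which is the identity the decomposition relies on when there is no label noise.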
The Benefits of Using Boomerang Plots
       
The aiqc boomerang plot visualizes various performance metrics across each split (train, validation, test) for every model in an experiment. How to evaluate many tuned models: Imagine that you’ve just trained a large batch of models that all seem to be performing relatively well. At this point, you could calculate aggregate metrics for each model. However, with only 2 or 3 splits to learn from, aggregate metrics aren’t very useful. Image by Author. As you may have guessed, it’s named the boomerang plot because of the curves it makes for each model.
Building a Credit Card Fraud Detection Online Training Pipeline with River ML and Apache Flink
       
Benefits of building an online training pipeline with Apache Flink. Usually we have at least two separate processes when dealing with an ML pipeline. Our ML pipeline will have two components: the real-time ingestion part, done using Apache Flink, and the ML serving part using Flask and RiverML, which is responsible for online training. Initially developed for JVM languages, Apache Flink now has good support for Python, and that’s what we will use in this tutorial. We create an Apache Flink environment first, which is the entry point for our ingestion app. The first iterations will have ROCAUC -0.0:
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": true}
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": true}
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": false}
{"performance": {"ROCAUC": -0.0}, "result": false}
But as we feed more and more data into the logistic regression algorithms, this will improve:
{"performance": {"ROCAUC": 0.4992462311557789}, "result": false}
{"performance": {"ROCAUC": 0.4992466097438473}, "result": false}
{"performance": {"ROCAUC": 0.4992469879518072}, "result": false}
{"performance": {"ROCAUC": 0.4992473657802308}, "result": false}
{"performance": {"ROCAUC": 0.49924774322968907}, "result": false}
{"performance": {"ROCAUC": 0.4992481203007519}, "result": false}
{"performance": {"ROCAUC": 0.499248496993988}, "result": false}
{"performance": {"ROCAUC": 0.4992488733099649}, "result": false}
{"performance": {"ROCAUC": 0.49924924924924924}, "result": false}
That’s it!
How To Easily And Confidently Implement Unit Tests In Python
       
How To Easily And Confidently Implement Unit Tests In Python. Do you want to sleep better at night knowing that your code isn’t going to break? Photo by Brett Jordan on Unsplash. What are unit tests, why is unit testing important, what are the best practices for unit tests, and how do I implement unit tests in Python? You write tests parallel to your primary code to ensure that it works the way you expect it to. As you write code, you write tests alongside it. The main test type that you’ll come across is the unit test.
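To make "writing tests alongside your code" concrete, here is a minimal sketch using the standard-library `unittest` module. The function under test, `apply_discount`, is a hypothetical example and not from the article.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Primary code: reduce `price` by `percent`, rejecting nonsensical percentages."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


class TestApplyDiscount(unittest.TestCase):
    """Each unit test checks one small, isolated behaviour of the function."""

    def test_regular_discount(self):
        self.assertAlmostEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_invalid_percent_raises(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 120)


# Run the tests programmatically so the script keeps going afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestApplyDiscount)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the test class would usually live in its own file and be discovered by a test runner; running it inline here just keeps the sketch self-contained.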
From Jupyter Notebooks to Real-life: MLOps
       
From Jupyter Notebooks to Real-life: MLOps. Why is it a must-have? This is, I think, the most challenging part of machine learning in real life. The Motivation for MLOps: Although building an ML model in a Jupyter notebook is good for learning, it is far from creating any business value. Neither of these can be achieved with a machine learning model that was created and trained once in a Jupyter notebook. Challenges for MLOps: I think we all agree that MLOps is a fundamental requirement for making machine learning a rewarding and beneficial tool.
Reducing Pipeline Debt With Great Expectations
       
Reducing Pipeline Debt With Great Expectations. Always know what to expect from your data. This article was first published on Neptune AI’s blog. Automated testing: expectations to the rescue. Automated testing tailored for data pipelines is the premise of Great Expectations, a widely used open-source Python package for data validation. Key features of Great Expectations: Great Expectations offers three very useful features: Automated data profiling, to create the expectations suite from the data at hand. This brings us to the second great feature of Great Expectations. While by no means does it provide an ultimate answer, Great Expectations can at least help us in detecting dangerous biases.
All My Apes Gone — So Too Your Dreams of Blockchain Revolution
       
All My Apes Gone — So Too Your Dreams of Blockchain Revolution. Photo by CHUTTERSNAP on Unsplash. It’s always the human factor. It’s a tiny thing, an ‘immutable’ force destroying utopian dreams. Whenever we are told about revolutionary technology, where security is built into the tech itself, the ‘human factor’ comes along and teaches us the same ol’ lesson: Technology can only be as secure as the people using it.
The Automobile’s Reign of Terror
       
Society. The Automobile’s Reign of Terror. How we made our world for cars. The disastrous consequences and a path to the future. We humans tend to believe that we rule over this planet and that we are the pinnacle of evolution. But imagine an advanced extraterrestrial civilization catching a glimpse of the Earth. Who is the…
Why Elon Musk Can’t Fix Twitter
       
Why Elon Musk Can’t Fix Twitter. He is a flawed human like the rest of us, and the problems of social media are not technological problems but profoundly human. Illustration by R Fresson. I have to admit that I’m growing tired of the ‘dopamine hit’ critique of social media (disseminated in the documentary ‘The Social Dilemma’), which gives the impression that human beings are merely…
The Future of Cinema is Video Games, Just Not the Way You Think
       
The Future of Cinema is Video Games, Just Not the Way You Think. Up close with a new technology at the National Association of Broadcasters Convention. The Batman, filmed with responsive LED monitor walls instead of green screen to produce a realistic Gotham City (source: WarnerBros). “In the next decade, the future of cinema will incorporate video game technology,” said Paul Graff, visual effects supervisor of Boardwalk Empire, Stranger Things and The Wolf of Wall Street. Graff said this to my…
How ‘Funny or Die’ Changed Hollywood’s Relationship to the Internet
       
How ‘Funny or Die’ Changed Hollywood’s Relationship to the Internet. 15 years ago, the web video channel showed TV stars the potential of YouTube. When the landlord knocked on the door, Will Ferrell became nervous and his friend Adam McKay threatened to leave. Ferrell’s rent was late and he knew the landlord would be pissed. Reluctantly, he goes to the door and opens…
The Key to the 2022 Midterms Is in My Backyard
       
The Key to the 2022 Midterms Is in My Backyard. When Amazon announced in the fall of 2017 that it was opening a $100 million, 855k-square-foot warehouse on Staten Island, New York’s fifth and most suburban borough, I will admit I cringed. Did it mean jobs? Did it breathe new life into re-development efforts for Staten Island’s west shore, a section of swampy expanse that has long needed commerce and economic activity? But as a native of the…
The Hottest Thing You’ll Read All Day: A History of Chilli Peppers
       
Photo by Shaun Meintjes on Unsplash. The Hottest Thing You’ll Read All Day: A History of Chilli Peppers. It’s easy. You need a can of tomatoes, some onion and garlic, beans (I like black), a bell pepper (hated by Mexicans), and a chilli pepper of your choice. And this, basically, was the argument made by my wife when I brought home a bottle of SCORPION CHILLI SAUCE. My (unused at the time of writing) chilli sauce is made from the Trinidad moruga scorpion chilli, which scores 1.2 million SHU. In this large population-based prospective study, the consumption of hot red chilli pepper was associated with reduced mortality.
Why blockchain and Web 3 user interfaces will suck for a while
       
Why blockchain and Web 3 user interfaces will suck for a while. How Web 2.0 grumpiness + Web 3.0 hubris are contributing to terrible user experiences on the cutting edge of tech. From that perspective, it makes no sense for Web 3 founders to look to Web 2 practices for…well…anything. — The World According to Garp (which is a great movie and more proof that I’m old). The Pre-Disastered founders are typically Web 3 founders with Web 2 experience. Not yet in the mix: Web 2 Grumps. Good user experience and product designers don’t need Web 3, so not many of them are exploring it (yet). Product and UX pros who are thinking about Web 3: Start learning about the actual potential of Web 3 beyond today’s hype-storm.
A Stupid-Simple Way to Calm and Focus Your Mind
       
A Stupid-Simple Way to Calm and Focus Your Mind. A skeptic’s journey into mindfulness meditation leads to more introspection, less stress and anxiety, and even a little more happiness. Photo: Unsplash/Dingzeyu Li. When a remedy for well-being or a blueprint for happiness seems too good to be true, I’m naturally skeptical. Take mindfulness meditation, said to alleviate…
Can a White Billionaire Really Destroy Black Twitter? Only Time Will Tell
       
RACISM ON SOCIAL MEDIA. Can a White Billionaire Really Destroy Black Twitter? Only Time Will Tell. It’s a little early to write a eulogy for Black Twitter. Photo by Edgar Moran on Unsplash. Elon Musk, the wealthiest man in the world, is attempting to secure a deal to buy Twitter for 44 billion dollars. When the news hit, many users felt dread in the pit of their stomachs…
Study Tips for Data Science Interview Preparation
       
Study Tips for Data Science Interview Preparation. How to keep your momentum and make your interview prep efficient. Photo by Siora Photography on Unsplash. It won’t take you long in your data science job search to learn that there are many subjects to study. The tips I want to discuss here are not only for interview preparation but also for studying or learning just about anything. I’m going to go over some tips to help you answer these three questions and use your study time wisely when preparing for data science interviews. Plan Cycles: Finally, when making a study plan, you should plan cycles. Photo by Glenn Carstens-Peters on Unsplash. Stay Motivated: So, you have been able to make a study plan, but now, how do you execute it?
How Moovit turns data into insights to help passengers avoid delays using Apache Airflow and Amazon SageMaker
       
Prior to using SageMaker, we used to take the trained ML models and manually integrate them into our backend environment. This step fetches the dataset from the previous step and trains the model using SageMaker. This step triggers the deploy function for SageMaker (using Boto3) to update the existing endpoint or create a new one. Conclusion: In this post, we shared how Moovit used SageMaker with Airflow to improve the number of classified service alerts by 200% (x3).
Identify paraphrased text with Hugging Face on Amazon SageMaker
       
In this post, we fine-tune a Hugging Face transformer on Amazon SageMaker to identify paraphrased sentence pairs in a few steps. Hugging Face and AWS announced a partnership earlier in 2022 that makes it even easier to train Hugging Face models on SageMaker. You can find many examples of how to train Hugging Face models with these DLCs and the Hugging Face Python SDK in the following GitHub repo. To use the Hugging Face dataset, we first need to install and import the Hugging Face library:
!pip --quiet install "sagemaker" "transformers==4.17.0" "datasets==1.18.4" --upgrade
!pip --quiet install sentence-transformers
import sagemaker.huggingface
import sagemaker
from datasets import load_dataset
Next, let’s establish a SageMaker session. We can use the SageMaker Hugging Face Estimator class to initiate the fine-tuning process in two steps.
How Searchmetrics uses Amazon SageMaker to automatically find relevant keywords and make their human analysts 20% faster
       
To do this, Searchmetrics has a team of analysts assessing the potential relevance of certain keywords given a specific seed word. SageMaker provides a direct integration with Hugging Face through a dedicated Hugging Face estimator in the SageMaker SDK. This makes it easy to run Hugging Face models on the fully managed SageMaker infrastructure. In addition, Hugging Face and AWS announced a partnership earlier in 2022 that makes it even easier to train Hugging Face models on SageMaker. You can find many examples of how to train Hugging Face models with these DLCs and the Hugging Face Python SDK in the following GitHub repo.
Pandas user-defined functions are now available in Amazon SageMaker Data Wrangler
       
Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can select and query data with just a few clicks, quickly transform data with over 300 built-in data transformations, and understand your data with built-in visualizations without writing any code. Solution overview: At the time of this writing, you can import datasets into Data Wrangler from Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, Databricks, and Snowflake. Create a custom Pandas UDF transform: Let’s walk through the process of creating two Data Wrangler custom Pandas UDF transforms using Pandas and Python modes. To learn more about Data Wrangler, refer to Create and Use a Data Wrangler Flow.
Abode uses Amazon Rekognition Streaming Video Events to provide real-time notifications to their smart home customers
       
After weighing alternatives, Abode leaned on their relationship with AWS to pilot Amazon Rekognition Streaming Video Events. For more information, refer to the Amazon Rekognition Streaming Video Events Developer Guide. Amazon Rekognition Streaming Video Events APIs are accurate, scalable, and easy to incorporate into our systems. The proliferation of camera and streaming video technology is just beginning, and managed computer vision services like Amazon Rekognition Streaming Video Events are paving the way for new smart video streaming capabilities in the home automation market. To learn more, check out Amazon Rekognition Streaming Video Events and the developer guide.
3xLOGIC uses Amazon Rekognition Streaming Video Events to provide intelligent video analytics on live video streams to monitoring agents
       
3xLOGIC wanted to improve their managed video monitoring product VIGIL CLOUD with intelligent video analytics and provide monitoring center operators with real-time smart notifications. To do this, 3xLOGIC used Amazon Rekognition Streaming Video Events, a low-latency, low-cost, scalable, managed computer vision service from AWS. To learn more about Amazon Rekognition Streaming Video Events, refer to the Amazon Rekognition Developer guide. With Amazon Rekognition Streaming Video Events, we simply call the API and surface the results to our users. To get started with Amazon Rekognition Streaming Video Events, visit Amazon Rekognition Streaming Video Events.
Amazon Rekognition introduces Streaming Video Events to provide real-time alerts on live video streams
       
Amazon Rekognition Streaming Video Events sends them a notification as soon as the desired object is detected in the live video stream. To learn more about 3xLOGIC’s case study, see 3xLOGIC uses Amazon Rekognition Streaming Video Events to provide intelligent video analytics on live video streams to monitoring agents. How it works: Amazon Rekognition Streaming Video Events works with Amazon Kinesis Video Streams to detect objects from live video streams. Choose relevant objects: Amazon Rekognition Streaming Video Events provides the capability to choose one or more objects for detection in live video streams. To get started with Amazon Rekognition Streaming Video Events, visit Amazon Rekognition Streaming Video Events.
4 Techniques to Handle Missing values in Time Series Data
       
4 Techniques to Handle Missing values in Time Series Data. Essential guide to time series analysis. Image by Willi Heidelbach from Pixabay. Real-world data often contain missing values. The cause of missing values can be data corruption or failure to record data at any given time. In one of my previous articles, I discussed 7 different techniques to handle missing values for a non-time series dataset. Time series models work with complete data, and therefore they require missing values to be imputed prior to the modeling or actual time series analysis. Estimating or imputing the missing values can be an excellent approach to dealing with the missing values. Conclusion: In this article, we have discussed various techniques to handle and impute missing values in a time series dataset.
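The excerpt does not name the four techniques, so the following sketch shows two common ways of imputing gaps in an ordered series as an assumption: carrying the last observation forward, and linear interpolation between neighbouring observations. Missing values are represented as `None`, and the series values are made up.

```python
def forward_fill(series):
    """Replace each gap with the most recent observed value (leading gaps stay None)."""
    filled, last = [], None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def linear_interpolate(series):
    """Fill gaps between two observations on a straight line; edge gaps stay None."""
    out = list(series)
    known = [i for i, x in enumerate(out) if x is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


readings = [10.0, None, None, 16.0, None, 20.0]  # made-up hourly readings
ffilled = forward_fill(readings)
interpolated = linear_interpolate(readings)
print(ffilled)
print(interpolated)
```

Both methods rely on the ordering of the observations, which is why they suit time series but not unordered tabular data.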
Enhancing the Performance in Training Tiny Neural Networks
       
Enhancing the Performance in Training Tiny Neural Networks. Be aware of the differences in training large and tiny neural networks. Photo by Craige McGonigle on Unsplash. Training deep neural networks (NN) is difficult, sometimes tricky even for veteran practitioners. Since big models are prone to over-fitting, eliminating over-fitting using various regularization methods is one of the most important topics when training big models. However, that is not the case for tiny models, since tiny models are prone to under-fitting. If regularization methods are used when training tiny models, the performance may get worse. In fact, if you train a tiny model in the manner of training a big model, you may miss the point.
Smart Paraphrasing Using Constrained Beam Search in NLP
       
Smart Paraphrasing Using Constrained Beam Search in NLP. Paraphrase to retain particular keywords for SEO and copywriting. Copyright-free image from Pixabay. Hugging Face recently introduced guiding text generation with constrained beam search in the Transformers library. You can give guidance about which words need to be included in the decoded output text with constrained beam search. This is a perfect use case for constrained beam search, where you want to paraphrase while keeping a phrase or keyword intact in the paraphrased version. Here we will use the copywriting example we discussed above and paraphrase our original sentence using constrained beam search. With the introduction of constrained beam search in Hugging Face, we have moved a step closer to that goal.
When Not to Use Neural Networks
       
At the time, most of his AI education was centered around neural networks and their many variants. It should already be apparent that, while neural networks are great, they lack some critical properties: interpretability and explainability. Therefore, neural networks might be a poor fit for the task whenever these two properties are needed. are all the heavily-numeric methods, such as SVMs and Neural Networks, or, more broadly, nearly all kernel and gradient-based methods. Using Add-Ons: while I said neural networks are neither explainable nor interpretable, some literature is dedicated to fixing such issues.
Analysing Fairness in Machine Learning (with Python)
       
Analysing Fairness in Machine Learning (with Python). Doing an exploratory fairness analysis and measuring fairness using equal opportunity, equalized odds and disparate impact. It is no longer enough to build models that make accurate predictions. Table 2: prevalence by protected features (source: author). We can go further by calculating the prevalence at the intersection of the protected features. In Figure 2, you can see the mutual information values between each of the 6 features and protected features. Table 5: TPR by protected features (source: author). Like with prevalence, we can go further by finding the TPR at the intersection of the protected features. Table 8: FPR by protected features (source: author). This leads us to the second definition of fairness, equalized odds.
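A rough sketch of how the three fairness measures named above can be computed from labels and predictions per group. The labels and predictions are made up, and expressing each measure as a ratio of the unprivileged group to the privileged group is an assumption here (differences are also commonly used).

```python
def group_rates(y_true, y_pred):
    """Return the true positive rate, false positive rate and share of positive predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "positive_rate": (tp + fp) / len(y_true),
    }


# Made-up data: same true labels for both groups, different model predictions.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
privileged = group_rates(labels, [1, 1, 1, 0, 1, 0, 0, 0])
unprivileged = group_rates(labels, [1, 1, 0, 0, 0, 0, 0, 0])

# Equal opportunity compares TPRs; equalized odds also compares FPRs.
equal_opportunity = unprivileged["tpr"] / privileged["tpr"]
fpr_gap = privileged["fpr"] - unprivileged["fpr"]
# Disparate impact compares how often each group receives a positive prediction.
disparate_impact = unprivileged["positive_rate"] / privileged["positive_rate"]
print(equal_opportunity, fpr_gap, disparate_impact)
```

A ratio close to 1 means the two groups are treated similarly under that definition; how far from 1 is acceptable is a policy choice rather than something the code decides.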
Feature Engineering — Unraveling the Mystery
       
Data Science | Machine Learning | Feature Engineering. Feature Engineering — Unraveling the Mystery. What is feature engineering, the problem it solves, and why it really matters. Theoretically or practically, we have all come across the term ‘Feature Engineering’ in the field of Machine Learning and Artificial Intelligence. Feature engineering is agreed to be key to success in applied machine learning. In this article, you will discover what feature engineering is, what problem it solves, and why it really matters. Feature Engineering helps you to get the most out of the data and the best results from an algorithm. Feature engineering is an art.
Lasso and Ridge regression: An intuitive comparison
       
Lasso and Ridge regression: An intuitive comparison. And how they can help you understand regularisation. Lasso and Ridge (The Elements of Statistical Learning). Introduction: When people begin their Machine Learning journey, they often start with Linear Regression, one of the simplest algorithms out there. Use Ridge and Lasso regression. Lasso and Ridge are both Linear Regression models but with a penalty (also called a regularization). l1-norm of a vector (Image by author). This makes Lasso zero out some coefficients in your Beta vector. At λ=0, both Lasso and Ridge become Linear Regression models (we simply do not put any penalties).
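To see why Lasso zeroes out coefficients while Ridge only shrinks them, consider the special case of an orthonormal design matrix, where both have closed-form solutions in terms of the ordinary least-squares coefficient b. This is a simplifying assumption, not the article's setup. The sketch uses the loss scaling 1/2·‖y − Xβ‖² with penalties λ‖β‖₁ for Lasso and (λ/2)‖β‖² for Ridge.

```python
def lasso_coef(b: float, lam: float) -> float:
    """Soft-thresholding: move b toward zero by lam, clipping at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_coef(b: float, lam: float) -> float:
    """Proportional shrinkage: every coefficient is scaled down but none reaches zero."""
    return b / (1.0 + lam)


ols = [3.0, -0.4, 0.8, -2.0]  # made-up least-squares coefficients
lasso = [lasso_coef(b, 1.0) for b in ols]
ridge = [ridge_coef(b, 1.0) for b in ols]
print(lasso)
print(ridge)
```

The two small coefficients become exactly zero under the Lasso rule, while Ridge keeps all four non-zero. Setting `lam` to 0 returns the least-squares coefficients unchanged in both cases.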
Convert PASCAL VOC XML to YOLO for Object Detection
       
Convert PASCAL VOC XML to YOLO for Object Detection. Tips and tricks to preprocess image datasets. Image by the author. This tutorial covers the following step-by-step guides: convert XML annotations to YOLO annotations; visualize the bounding boxes in an image using the newly created YOLO annotations; split the datasets into train, validation and test sets. Overview: PASCAL VOC XML. The PASCAL Visual Object Classes (VOC) project is one of the earliest computer vision projects that aims to standardize dataset and annotation formats. One of the major problems with PASCAL VOC XML annotations is that we cannot use them directly for training, especially on object detection tasks. Convert PASCAL VOC XML to YOLO: Create a new script called xml2yolo.py in the working directory. Logically, a label file should have a corresponding image file. An image file may contain no object, and we call it a background image.
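The core of the conversion is a change of box representation: PASCAL VOC stores absolute corner coordinates (xmin, ymin, xmax, ymax), while YOLO labels store the box centre and size normalized by the image dimensions. A minimal sketch of that step follows. The function name and sample values are illustrative, and this is not the article's `xml2yolo.py`.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute corner coordinates to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


# Made-up example: one box of class 0 in a 640x480 image.
box = voc_to_yolo(100, 200, 300, 400, img_w=640, img_h=480)
label_line = "0 " + " ".join(f"{value:.6f}" for value in box)
print(label_line)
```

Each object in an image becomes one such line in the image's label file; a background image with no objects would simply have an empty label file.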
Create Image Classification Models with TensorFlow in 10 Minutes
       
Figure 1: A multilayer perceptron with one hidden layer [2]. Next, let’s define our model using the Keras API from TensorFlow. EarlyStopping monitors validation loss during training. If validation loss stops decreasing for a specified number of epochs (called patience), the training immediately halts. Had the model been trained for more epochs, training loss would continue to decrease, while validation loss would remain constant (or even worse, increase).
1782/1782 [==============================] - 9s 5ms/step - loss: 0.0849 - accuracy: 0.9686 - val_loss: 0.2866 - val_accuracy: 0.9187
Epoch 10: early stopping
Again, we initialised our model with 100 epochs.
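The patience rule described above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not Keras's implementation, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    waited = 0  # epochs since the validation loss last improved
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


# Validation loss improves until epoch 3, then drifts upward.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)
```

With a patience of 3, training halts three epochs after the best validation loss, even if the configured maximum number of epochs is much larger.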
A Comprehensive Guide to Image Augmentation using Pytorch
       
A Comprehensive Guide to Image Augmentation using Pytorch. A way to increase the amount of data and make the model more robust. Photo by Dan Gold on Unsplash. Lately, while working on my research project, I began to understand the importance of image augmentation techniques. Moreover, each dataset image is acquired at a resolution of 227 by 227 pixels. So, it can be useful to convert an image to greyscale:
gray_img = T.Grayscale()(orig_img)
plot([gray_img], cmap='gray', col_title=["Gray"])
Original image vs Grayscale image. We can display the original image together with its normalized version: Original Image vs Normalized Image. Gaussian Noise.
MLP Mixer in a Nutshell
       
Furthermore, I’d like to provide some further context based on my experience to help readers quickly understand the key characteristics of the MLP Mixer. Then let’s summarize the contribution of the MLP Mixer paper and finally shift gears to review the MLP Mixer. The MLP Mixer, on the other hand, replaces the self-attention mechanism with an MLP block encapsulated between two matrix transposition operations to capture the global context. 3: Mixer layer of the MLP mixer [1]. The intention of the MLP Mixer is to clearly separate (1) and (2), which the authors refer to as channel-mixing and token-mixing respectively.
Art Fundamentals: How Illumination & Shadow Add Meaning to Artworks
       
Art Fundamentals: How Illumination & Shadow Add Meaning to Artworks. The use of light and dark in paintings. Detail of ‘Saint John the Baptist’ (c. 1513-1516) by Leonardo da Vinci. Oil on walnut wood. Image source: Wikimedia Commons. In art, light and shadow are fundamental to expressing three-dimensional form. Let’s start with the basics. Here is a circle.
What My Grandmother Was Trying to Tell Me.
       
What My Grandmother Was Trying to Tell Me. My grandmother, my Baba, was a painter who spent her days in her back bedroom studio, working viscous, multi-colored puddles of acrylic paints into flowers. Before getting back to work, my grandmother would usually follow up lunch with a nap on the couch…at least she would try. The way my grandmother told the story, she hadn’t seen her brother since before they fled Ukraine. That’s the story my Grandmother was trying to tell.
Feminine Mystiktok — the “that girl” phenomenon, Tiktok, and the allure of “Pretty Work”
       
Feminine Mystiktok — the “that girl” phenomenon, Tiktok, and the allure of “Pretty Work”Just who is “That Girl”? Depending on your Tiktok algorithm, you may be familiar with “That Girl” already — clean, aesthetic vlogs, usually showcasing a thin white woman, draped in athleisure or expensive sweatpants, posing towards the mirror for a “fit check”, or applying goop from an assortment of plastic jars promising eternal youth. She takes us with her to Whole Foods. She cleans her house, elegantly. Her boyfriend works in tech, presumably in the other room…
What is a “Bug” Anyways?
       
Ladies and gentlemen, we have arrived at the thesis statement: We need a more specific classification of software bugs to handle them appropriately. I wrote a bug yesterday that stopped an entire class of devices from communicating with the server. They are defined thusly: “A bug that prevents you from shipping the next release of your software.” If you find this bug deployed in production, it usually means a special release to get it fixed. If they are easy to fix, fix them when you are working in that area of the code for other reasons. I am attempting to create a new taxonomy of software bugs.
What Marine Le Pen’s Success Could Teach Us About Politics
       
What Marine Le Pen’s Success Could Teach Us About Politics. If politics is no longer a linear line, the far-right is not very far. French Presidential candidate Marine Le Pen. Photo: VOX España and WikiData. I interviewed Marine Le Pen twice. The first time was in 2015, shortly after the horrific terror attacks on the “Charlie Hebdo” newspaper, and the Jewish supermarket “HyperCacher”. Back then she was a European Parliament member and we met at her very…
Why Did We “Save the Whales”?
       
Why Did We “Save the Whales”? It’s more complicated than you think. “Ban Whaling: people sign Japanese flag to stop whaling” by John Englart (Takver) is marked with CC BY-SA 2.0. To view the terms, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse
“Whale Carpaccio - 130 Kroner.” Thus read an appetizer on a menu at a restaurant in Bergen, Norway, when I dined there a few years back. I wanted to sample this odd dish. Would the meat be chewy like pork, or flaky like fish?
Extreme Social and Political Ideals Reinforced at the ‘Great Homeschool Convention’
       
Extreme Social and Political Ideals Reinforced at the ‘Great Homeschool Convention’. From young-earth creationism to Tucker Carlson, the gathering of homeschool families offered a view of the world, and of homeschooling, that is filled with anxiety over the left.
Should Job Applicants Stop Lying about Themselves?
       
Should Job Applicants Stop Lying about Themselves? The real question is: when will the world stop listening? Photo courtesy of author. The HR department sent me a batch of job applications to review. “Could you let us know your top-5?” It was a thankless task. Recommendations.
The New Abortion Restriction No One is Talking About
       
The New Abortion Restriction No One is Talking About. Anti-abortion laws have traditionally allowed an exception to protect the “life of the mother.” Not anymore. Performers participate in ACT FOR ABORTION in front of the Supreme Court of the United States on Jan. 22, 2022 in Washington, D.C. | Leigh Vogel/Getty Images for Act For Abortion. Originally published in POLITICO. In 1942, my grandmother lay in a hospital bed in center city Philadelphia waiting to die. She was 26 years old, happily married, and…
6 Reasons Not to Move to Ireland
       
6 Reasons Not to Move to Ireland. Exactly how the Emerald Isle disappoints. I’ve lived in Ireland for quite some time now. I moved to Ireland for work in the middle of the pandemic, and though my employer determined the location, I was cautiously optimistic. While some of that held true, the realities of living in Dublin, Ireland were far from ideal. Let me fill you in on the drawbacks of living in Ireland that you should consider before moving here. Why Ireland shouldn’t be your #1 country to move to. The fact that it is English-speaking makes Ireland quite accessible to non-Europeans.
Gödel and Some Controversial Ideas About Machine Consciousness
       
Gödel and Some Controversial Ideas About Machine Consciousness. A couple of unorthodox perspectives about consciousness in AI. While most experts agree that weak AI is definitely possible, there is still tremendous skepticism when it comes to strong AI. The Consciousness Argument. My favorite argument in the strong AI debate is about consciousness. By AI consciousness, we are referring to the ability of an AI agent to be self-aware of its “mental state”. Applying Dr. Kaku’s space-time theory of consciousness to AI systems, it is obvious that AI agents can exhibit some basic forms of consciousness.
Top 10 Open-Source Data Science Tools in 2022
       
There is nothing wrong with these libraries; they’re already the bare minimum essential for data science using Python. These libraries help you collect and synthesize data. Indeed, if we don’t have the data, there’s no further AI, machine learning, or data science. YData Synthetic is an open-source synthetic data engine. Having used synthetic data for several use-cases during my full-time work, I have personally contributed to this open-source project and believe synthetic data is the way to achieve high-quality data at scale while protecting the user’s privacy. You need to get your hands on PyCaret to understand how easy it is to start modeling the data in today’s world of data science.
Monitor Machine Learning Experiments With MLFlow on Azure Cloud
       
Monitor Machine Learning Experiments With MLFlow on Azure Cloud. How to set up and log machine learning experiments on a remote MLFlow tracking server with Azure Machine Learning. Photo by Oskar Kadaksoo on Unsplash.
1. Azure Machine Learning is part of Microsoft’s Azure cloud computing platform, which helps data scientists and engineers manage their machine learning workflow [2].
Create Azure ML Workspace. The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.
#command line
conda activate general
Python Packages. Ensure that the following packages are installed in the Conda environment:
azureml-core
azureml-mlflow
pandas
numpy
scikit-learn
mlflow
Azure Machine Learning Workspace Configs. Download the Azure Machine Learning workspace configurations.
Summary. In this article, we discussed the motivation for tracking machine learning experiments using MLFlow on Azure Machine Learning and examined how to:
Set up a remote MLFlow tracking server on Azure Machine Learning
Train a model locally using scikit-learn
Log the model parameters, results, and artifacts onto MLFlow
Retrieve trained model from MLFlow tracking server for offline batch scoring
Reference
[1] MLFlow
[2] Azure Machine Learning
[3] Resource Group
[4] Azure Machine Learning Workspace
Your Personal AI-Powered Photoshop Designer
       
Your Personal AI-Powered Photoshop Designer. This AI can reconstruct, enhance and edit your images! This is both amazing and scary if you ask me, especially when you look at the results. Quickly, StyleGAN takes an image, encodes it using convolutional neural networks, and is trained to re-generate the same image. If this sounds like another language to you, just take two minutes to watch this video I made covering StyleGAN. It is also much cheaper than hiring a Photoshop professional and asking them to edit all your future pictures.
Take Your Machine Learning Skills Global
       
When small changes have big effects, it is unsurprising that companies and governments are turning to machine learning and AI to accurately predict risk. How the Global Community is Applying Machine Learning. “Machine learning and AI is being used extensively in the financial services and cyber security industry. In the cybersecurity world, AI can pick up patterns of behavior and help an analyst process large amounts of information.” Why Choose a Global Risk Degree? A global risk program teaches quantitative analysis and modeling while emphasizing problem-solving, decision-making, and communication. Upskill or Career Change in Just Two Years. Johns Hopkins University offers a part-time Master of Arts in Global Risk (online) designed to help professionals make forward-looking decisions and contribute to risk management.
Google Colab for Machine Learning Projects
       
Google Colab quick start guide. Exploring your Colab environment. Useful Google Colab extensions. Example: Saving model progress on Google Drive. What is Google Colab? Google Colab Quick Start Guide. To create your Google Colab file and get started with Google Colab, you can go to Google Drive and create a Google Drive account if you do not have one.
load_data()
input_layer = Input(shape=(28, 28, 1))
model = LeNet5()(input_layer)
model = Model(inputs=input_layer, outputs=model)
model.
ModelCheckpoint(filepath=checkpoint_path, save_weights_only=True, verbose=1)
Studying the brain to build AI that processes language as people do
       
Today, Meta AI is announcing a long-term research initiative to better understand how the human brain processes language. In collaboration with neuroimaging center Neurospin (CEA) and INRIA, we’re comparing how AI language models and the brain respond to the same spoken or written sentences. Unlocking this long-range forecasting capability could help improve modern AI language models. In several studies, we’ve discovered the brain is systematically organized in a hierarchy that’s strikingly similar to AI language models (here, here, and here). Toward human-level AI. Overall, these studies support an exciting possibility: there are, in fact, quantifiable similarities between brains and AI models.
How to Set Up a Development Environment for Machine Learning
       
How to Set Up a Development Environment for Machine Learning. How to install, activate, and use a virtual environment for machine learning and data science-related tasks. Photo by Bradley Lembach on Unsplash. Before we start coding, it is essential to set up our machine with a new development environment. A virtual environment is a development environment that acts as a container for our current project. Dedicating a virtual environment to a project is common practice, and should always be done for the reasons mentioned above. It is a lightweight installer of conda, an open-source data science-oriented development environment management system available for Linux, OSX and Windows. How to create a virtual environment with Miniconda. Once Anaconda or Miniconda has been installed and their correct functioning has been validated using the conda command, we can create a new development environment as follows:
$ conda create -n name_of_my_environment
This command will create a virtual development environment called name_of_my_environment in the installation directory.
What Does a Data Scientist Read?. Data science does not tend to stand…
       
I share my reading for Jan-March 2022. Photo by Fang-Wei Lin on Unsplash. Data science does not tend to stand still. These days, despite the prevalence of data science courses and university courses, I would say that the pressure to learn is still there. My first tip is that if you are feeling exhausted and tired about data science, DO NOT DO IT ALL THE TIME! But what I wanted to say was: do not feel that everything you read to progress in data science must be about data science. The cause was the training data being labelled in a way which turned out to be biased.
Multi-Seasonal Time Series Decomposition Using MSTL in Python
       
Multi-Seasonal Time Series Decomposition Using MSTL in Python. Find out how to decompose multi-seasonal time series using MSTL, discover how MSTL works, and see MSTL in action on real world data. Image by author. Introduction. Time series decomposition is about breaking up a time series into components, most notably: a trend component, a seasonal component, and a residual component. In July 2021 Bandara, Hyndman, and Bergmeir proposed a new algorithm for multi-seasonal decomposition called Multiple Seasonal-Trend decomposition using Loess (MSTL) [1]. Time series decomposition. As you know by now, time series decomposition is about breaking up a time series into trend, seasonality, and residuals (Fig. Refine each seasonal component by adding it back to the de-seasonalised time series and re-extracting the seasonal component using STL.
NeRF From Nothing: A Tutorial with PyTorch
       
It’s NeRF From Nothing: Build A Complete NeRF with PyTorch. A tutorial for how to build your own NeRF model in PyTorch, with step-by-step explanations of each component. 3D model from Matthew Tancik. Introduction. NeRF Explosion. The Neural Radiance Field, or NeRF, is a fairly new paradigm in the world of deep learning and computer vision. In this tutorial, we will walk through the essential components of a NeRF and how to put them all together to train our own NeRF model. The NeRF model, implemented as a PyTorch module. The NeRF model, and more broadly differentiable rendering, are quickly bridging the gap between creation of images and creation of volumetric scenes.
How to Track Machine Learning Experiments using DagsHub
       
Tutorial on using DagsHub for enhancing the machine learning model training pipeline using experiment tracking. The end benefit of a system that can track machine learning experiments is improved productivity. It starts with reading the data using Pandas, followed by splitting the dataset into their respective features and labels.
|   model.ipynb
|   metrics.csv
|   params.yml
|
+---data
|       iris.csv
|
+---models
|       RandomForestClassifier_model.sav
3.3 Experiment Tracking using DagsHub Logger. Before we start pushing our files to DagsHub, we need to establish some baseline understanding of how files are tracked on DagsHub. By using DagsHub, we can store both .dvc files, and the associated data and models in one place by using DagsHub as the remote storage. DagsHub operates quite similarly to GitHub, in the sense that you can manage the DagsHub repo from the command line.
Solving IBM’s Quantum Open Science Prize
       
Solving IBM’s Quantum Open Science Prize A practical way to find out whether to pursue a career in quantum computing Do you want to get started with Quantum Machine Learning? This is the challenge in IBM’s Quantum Open Science Prize. Essentially, the IBM challenge says: Here’s a pretty cool quantum algorithm that simulates particles. Given the noisy counts, we can use the modifiers to calculate the noise-free counts. So, we can calculate the noise-free counts, measure the noisy counts, and compute the modifiers.
How to form realistic expectations about data
       
How to form realistic expectations about data. The journey to becoming a “real” data analyst. There are some big differences between an amateur and a professional analyst. Data pro vs amateur difference #7: Realistic expectations of data. If you’re a professional analyst, you know that data doesn’t owe you anything. It’s quite rare (and usually ill-advised) to design your data collection before your team has explored some related data. Data pro vs amateur differences #1-#3: Software skills; handling lots of data with ease; immunity to data science bias. Data pro vs amateur differences #4–#6: Understanding the career; refusing to be a data charlatan; resistance to confirmation bias.
DALL·E: an AI Treasure Chest in Action
       
DALL·E: an AI Treasure Chest in Action. Creative and comprehensive capacities of Artificial Intelligence. Image created with DALL·E by OpenAI // Copyright: OpenAI // generated by Author. The year 2021 began with several AI milestones. But DALL·E can do more. DALL·E is aware of this fact. Creative Glitches. Sometimes DALL·E doesn’t deliver precisely what you demand. Instead, it gave me what is probably my favorite image created by DALL·E. Everything is in this image: the idea itself, the perfect visualization, the atmosphere.
How to easily install private Python packages in Google Colab
       
How to easily install private Python packages in Google Colab. An elegant alternative to zipping projects. TLDR: ensure that the .json file has keys “username” → name of repository account holder, “access_token” → your GitHub access token. This means that you need to install all the packages you need every single time. However, all this fails if you are working with private packages. That is, a package that you are developing on a private repository hosted on a versioning service (e.g. This article details a reliable and quick way to install private packages using a package I’ve developed, colab-dev-tools.
Focal Loss: A better alternative for Cross-Entropy
       
Focal Loss: A better alternative for Cross-Entropy. Focal loss is said to perform better than Cross-Entropy loss in many cases. But why does Cross-Entropy loss fail, and how does Focal loss address those problems? Let us find out in this article. Gradient Descent, Photo by Rostyslav Savchyn on Unsplash. Loss functions are mathematical equations that calculate how far the predictions deviate from the actual values. Now that we’ve defined the loss function, let’s go over the issues that Categorical Cross-Entropy loss causes and how Focal loss solves them. Categorical Cross-Entropy Loss. Categorical Cross-Entropy loss is traditionally used in classification tasks. Balanced Cross-Entropy Loss. Balanced Cross-Entropy loss adds a weighting factor to each class, which is represented by the Greek letter alpha and takes values in [0, 1].
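To make the relationship between these losses concrete, here is a minimal sketch for the binary case, following the usual focal loss formulation FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name focal_loss and the default values alpha=0.25 and gamma=2.0 are assumptions chosen for illustration, not taken from the article. Setting gamma to 0 recovers the alpha-weighted (balanced) cross-entropy.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one example (illustrative sketch).

    p: predicted probability of the positive class, strictly in (0, 1)
    y: true label, 0 or 1
    alpha: weight for the positive class (1 - alpha for the negative class)
    gamma: focusing parameter; gamma = 0 gives balanced cross-entropy
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction is down-weighted far more than a hard one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.6, 1)
balanced_ce = focal_loss(0.9, 1, alpha=0.5, gamma=0.0)
```

The (1 - p_t)^gamma factor is what shrinks the contribution of well-classified examples, so the total loss is dominated by the hard ones.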
Autoencoder For Anomaly Detection Using Tensorflow Keras
       
In this article, we will use the Python Tensorflow Keras library to illustrate the process of identifying outliers using an autoencoder. The train test split gives us 80,000 records for the training dataset and 20,000 for the validation dataset. Step 4: Autoencoder Algorithm For Anomaly Detection. The autoencoder model for anomaly detection has six steps. Step 5: Autoencoder Model Training. The autoencoder model trains on the normal dataset, so we must first separate the expected data from the anomaly data. Step 6: Autoencoder Anomaly Detection Threshold. Now that we have an autoencoder model, let’s use it to predict the outliers.
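The thresholding step can be illustrated without any model at all. The sketch below assumes we already have one reconstruction error per record (for example, the mean absolute difference between an input and the autoencoder's reconstruction of it) and that the threshold is set at a percentile of those errors; both the percentile rule and the helper names are assumptions for the example, and the error values are invented.

```python
def percentile_threshold(errors, pct):
    """Nearest-rank percentile of the errors (pct is an integer 1..100)."""
    ordered = sorted(errors)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def flag_anomalies(errors, threshold):
    """Mark a record as an outlier when its error exceeds the threshold."""
    return [error > threshold for error in errors]


# Invented reconstruction errors: one record is reconstructed very badly.
errors = [0.10, 0.20, 0.15, 0.12, 0.90, 0.11, 0.13, 0.14, 0.16, 0.18]
threshold = percentile_threshold(errors, 90)
flags = flag_anomalies(errors, threshold)
```

Because the autoencoder was trained only on normal records, records it reconstructs poorly are the ones flagged.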
Advanced Concepts in Python — I. A detailed look into Iterators…
       
Advanced Concepts in Python — I A detailed look into Iterators, Generators, Coroutines, and Iterator Protocol Photo by Chris Ried on Unsplash Python has been my go-to language for over two years now. Topics to be Covered Iterators Iterables Generators Coroutines Iterator ProtocolIterators Iterators in Python are objects that emit streams of values one at a time. Generator function generating values In the above gif, you can think of that machine as a generator and those mails are values. Code for a generator function In cell 1, I create a generator function “custom_generator” which squares each value in a range and returns it one by one. Coroutines Coroutines are functions that can be paused mid-execution and then their execution can be resumed from that point at a later time.
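A small sketch of the two ideas described above. The name custom_generator comes from the article's description (a generator that squares each value in a range); its exact body here is a guess, and running_total is a hypothetical coroutine added only to show pausing and resuming with send().

```python
def custom_generator(n):
    """Yield the square of each value in range(n), one at a time."""
    for value in range(n):
        yield value ** 2


def running_total():
    """Coroutine sketch: pauses at each yield and resumes on send()."""
    total = 0
    while True:
        value = yield total  # execution pauses here until send() is called
        total += value


gen = custom_generator(4)
is_own_iterator = iter(gen) is gen  # generators implement the iterator protocol
first = next(gen)                   # runs the body up to the first yield
rest = list(gen)                    # consumes the remaining values

co = running_total()
next(co)                            # prime the coroutine up to its first yield
totals = [co.send(5), co.send(7)]
```

Once a generator is exhausted, further next() calls raise StopIteration, which is how for loops know when to stop.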
How to Filter Pandas DataFrame By Time
       
How to Filter Pandas DataFrame By Time. Simple Pandas methods to filter DataFrame by time. Photo by Thomas Bormans on Unsplash. Introduction. Datasets with timestamps are common and we might be required to filter the DataFrame by time. In this article, we examine how to filter a Pandas DataFrame by time using the .between_time(), .at_time() and .loc methods in Pandas version 1.4.1.
import numpy as np
import pandas as pd
ts = pd.date_range('2022-03-04', periods=10, freq='12h20min')
df_row = pd.DataFrame({'ts': ts, 'qty': [np.random.randint(10, 100) for i in range(10)]})
df_col = pd.DataFrame(np.random.randint(0,100,size=(5, 10)), columns = ts)
df_row has a timestamp column named ts (image by author), while df_col has column headers as timestamps (image by author).
Between Time. .between_time() is a Pandas DataFrame method that filters for rows in a Pandas DataFrame between a start and end time.
# swap the start_time and end_time
df_row.set_index('ts').between_time('16:00', '14:20').reset_index()
Image by author. .between_time() also allows us to filter a DataFrame by time across columns.
df_col.between_time('14:20', '16:00', axis = 1)
Image by author. At Time. .at_time() is a Pandas DataFrame method that selects rows with the exact time instead of a range of time.
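As a self-contained sketch of the two methods, the example below rebuilds a frame like df_row but replaces the random qty values with the row position, so the result is easy to follow by hand. That substitution is made only for this example.

```python
import pandas as pd

# Ten timestamps, 12 hours 20 minutes apart, starting at midnight.
ts = pd.date_range("2022-03-04", periods=10, freq="12h20min")
df_row = pd.DataFrame({"ts": ts, "qty": range(10)})

# between_time and at_time work on a DatetimeIndex, so index by ts first.
indexed = df_row.set_index("ts")

# Rows whose time of day falls between 14:20 and 16:00 (both ends included).
between = indexed.between_time("14:20", "16:00").reset_index()

# Rows whose time of day is exactly 12:20.
exact = indexed.at_time("12:20").reset_index()
```

Only the time of day is compared, so matching rows can come from different dates.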
Deploy HuggingFace NLP Models in Java With Deep Java Library
       
Deploy HuggingFace NLP Models in Java With Deep Java Library. A step-by-step demonstration with HuggingFace question answering model. Equipped with these features, HuggingFace users can bring their own question answering model using the HuggingFace toolkit in 10 minutes. In this blog post, we walk through deploying your own HuggingFace question answering model step-by-step.
// tokenA: [bbc, japan, was, a, general, entertainment, channel, ., which, operated, between, december, 2004, and, april, 2006, ., it, ceased, operations, after, its, japanese, distributor, folded, .]
Equipped with this knowledge, you should be able to deploy your own transformer-based model from HuggingFace on Java applications, including SpringBoot and Apache Spark.
Top 5 Reasons Not to Become a Machine Learning Engineer
       
Top 5 Reasons Not to Become a Machine Learning Engineer. Understand what is not for you. Image created by Author, Person is the Author. Here are my top 5 reasons not to become a Machine Learning Engineer. I hope this list of reasons when not to become a Machine Learning Engineer is helpful! Little Machine Learning. This brings me to my last point: you only want to do Machine Learning. Look, I know the job is called Machine Learning Engineer, but it’s still only around 10–20% of your job, depending on how you count, of course: is building an ML Pipeline MLOps, or is it Machine Learning? If you are even more convinced than before that you want to become an ML Engineer, make sure to also check out my video on How to become an ML Engineer.
The Homelessness Crisis: A Monster of Our Own Making
       
The Homelessness Crisis: A Monster of Our Own Making. By Heidi Marston. Homelessness is a scar on the face of our nation. I lead the agency charged with ending homelessness in Los Angeles. The Homelessness Crisis: The Path Forward. “I have little patience with scientists who take a board of wood, look for its thinnest part, and drill a great number of holes where drilling is easy,” Albert Einstein said. In Los Angeles, we have been housing our homeless population at record numbers, even as the crisis continues to expand. Leaders at the helm of the homelessness crisis are quick to state they want to end homelessness.
Why Success Is Often Elusive at the Highest Echelons
       
Why Success Is Often Elusive at the Highest Echelons. During recent discussions with friends, one common theme that crops up very frequently is how success has remained rather elusive to recent higher level hires at their companies. It’s extremely rare to hear about case studies of engineering leaders who’ve been wildly successful in an organization replicate the same success every time they change jobs. At The Highest Levels, You Need To Innovate And Execute. I recently came across an interesting tweet, which I think is applicable here. This is especially true at the highest echelons of engineering, where it’s a sine qua non to be able to dream and execute. Engineering leaders brought into embattled organizations tasked with stabilizing the chaos are often heavily incentivized to do this.
Anniversary of The Chornobyl Disaster: The Complexity of ‘Never’
       
Anniversary of The Chornobyl Disaster: The Complexity of ‘Never’. The cruelty and ignorance of Russia can cause another Chornobyl. She was a child of war, born in 1937 and witnessing a World War where an estimated 8 million Ukrainians died. A memorial, dedicated to firefighters and workers who died after the Chornobyl disaster, 2016. They don’t teach about the reasons and consequences of the Chornobyl disaster in Russian schools. If you deny the responsibility of the Russian government for the Chornobyl disaster, you allow it to happen again.
Elon Musk May Turn the Digital Town Square into a Colosseum
       
Elon Musk May Turn the Digital Town Square into a Colosseum. The world’s richest man could use Twitter to radically disrupt politics. Ten years ago, Google did something unprecedented for a giant tech company. It blacked out the landing page for search and replaced it with a call to action, urging people to email their elected representatives in Congress to stop legislation…
How Getting Covid Became My Reset Button
       
How Getting Covid Became My Reset Button. Why a full stop can push pause, in a good way. Photo by Simon Wijers on Unsplash. Last week I could sense the headless horseman coming for me. It was just like all those scary movies. Me, hiding behind a tree in the gully. Up above on the ledge are men on horses, dressed in long capes, with tall black hoods made of stiff linen. They are galloping along the trail, looking for me, down below, to make…
It Took Me 10 Years to Understand Entropy, Here is What I Learned.
       
It Took Me 10 Years to Understand Entropy, Here is What I Learned. Still, some systems are in a higher entropy state in their crystal form than fluid phases under the same thermodynamic conditions [1]. By going backward to the early time of the universe, one can conclude that it started at an extraordinarily and surprisingly low entropy state [4]. Formally, this is known as the Poincaré recurrence theorem, which states that certain dynamical systems will always return to their initial (low entropy) state after a finite time. “Solutions to the cosmic initial entropy problem without equilibrium initial conditions.” Entropy 19.8 (2017): 411.
A Textbook Editor’s Plea for Citizen Engagement
       
A Textbook Editor’s Plea for Citizen Engagement. K-12 textbooks aren’t subject to critical race theory, but they are full of mistakes. Photo by Siora Photography on Unsplash. When I tell people I’m a freelance editor, they get interested. When I specify that I’m a freelance copy editor for K-12 textbook publishers, their eyes glaze over.
An Aging Brain Isn’t a Subpar Brain
       
All About Aging. An Aging Brain Isn’t a Subpar Brain. Mental lapses people attribute to aging aren’t inevitably dementia and regardless shouldn’t be targets of ridicule. Photo: Alexander Schimmeck / Unsplash. One of the easiest-access targets in U.S. culture, especially in political and comedic commentary, is any hint in someone over age 60 that their brain is…
Why Do Some White People Enjoy Using Black People as Political Props?
       
POLITICS + RACISM. Why Do Some White People Enjoy Using Black People as Political Props? They think tokenism is poetic justice, and it’s not. Photo by Rolands Zilvinskis on Unsplash. Throughout American history, White people have often taken great pleasure in using Black people as political props to promote racist policies. In Fredrick Douglass’s memoir, The Myth of the Happy Slave, he…
A Coherent Compass for Charting Toward a Salutogenic Culture
       
A Coherent Compass for Charting Toward a Salutogenic Culture. To navigate the compass, go inward, outward, and upward. “The most powerful tool in economics is not money, nor even algebra. And if education and economics are the cornerstones of culture, Economics Education has a pretty central role in that transition, says the economics teacher. Doughnut Economics then “unrolls the doughnut” to explore social and environmental factors and solutions at the local and global scale. Salutogenic Economics is an endless Design Challenge. Navigating this compass is admittedly complex at a glance, but in practice: this is the true work of economic understanding and creating coherent, compassionate cultures.
The Worst Bug Ever—Randomly Losing Your Best Players
       
The Worst Bug Ever—Randomly Losing Your Best PlayersBy Ron LittleImage: NetflixImagine discovering a serious bug in production immediately after releasing your game. This is a story of such a bug, the worst bug I have ever dealt with in 30 years of programming. What could be causing Unity IAP 4.1.1 to take more than two seconds just to parse a chunk of text in memory? The default in the Package Manager would be Unity IAP 3.2.3, and our builder would choose Unity IAP 4.1.1 for Android. Here is the summary in the Unity IAP 4.1.3 changelog; a lot of work and stress were behind this innocent-sounding sentence!
Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler
       
Today, we’re excited to announce the new Data Quality and Insights Report feature within Data Wrangler. Target Column Insights – This section provides statistics on the target column including % valid, % missing, % outliers, univariate statistics such as min/median/max, and also presents examples of observations with outlier or invalid target values. Feature Importance – This section provides a ranking of features by feature importance which are automatically calculated when preparing the data insights and data quality report. In this post, we use the insights and recommendations of the Data Quality and Insights Report to process data by applying the suggested transformation steps without writing any code. Address additional warnings on transformed data. After we address the initial issues found in the data, we run the Data Quality and Insights Report on the transformed data.
Part 1: How NatWest Group built a scalable, secure, and sustainable MLOps platform
       
This is the first post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS to build a scalable, secure, and sustainable machine learning operations (MLOps) platform. Strategic collaboration between NatWest Group and AWS. NatWest Group is the largest business and commercial bank in the UK, with a leading retail business. AWS and NatWest Group data scientists and engineers co-created the baseline environment templates and SageMaker pipelines based upon these use cases. Cloud-first: The solution for sustainable ML model development and deployment. Training ML models using large datasets requires a lot of computational resources. Part 3 provides an overview of how NatWest Group uses SageMaker services to build auditable, reproducible, and explainable ML models.
Part 2: How NatWest Group built a secure, compliant, self-service MLOps platform using AWS Service Catalog and Amazon SageMaker
       
In this post, we share how the NatWest Group utilized AWS to enable the self-service deployment of their standardized, secure, and compliant MLOps platform using AWS Service Catalog and Amazon SageMaker. Why AWS Service Catalog? The team chose AWS Service Catalog to build a catalog of secure, compliant, and preapproved infrastructure templates. Iterative changes to products are made with the help of AWS Service Catalog product versioning. However, the AWS Service Catalog products themselves are built using standard CloudFormation templates.
Part 3: How NatWest Group built auditable, reproducible, and explainable ML models with Amazon SageMaker
       
This post is intended for data scientists, MLOps engineers, and data engineers who are interested in building ML pipeline templates with Amazon SageMaker. The following screenshot shows what is displayed in Amazon SageMaker Studio when this pipeline is successfully run. Amazon SageMaker Model Monitor and Amazon SageMaker Clarify suit these purposes, and both operate in a scalable and repeatable manner. Conclusion: NatWest’s ML templates align our MLOps solution to our need for auditable, reusable, and explainable MLOps assets across the organization. In addition to the quick-start ML templates and account infrastructure, several existing NatWest ML use cases were developed using these capabilities.
Part 4: How NatWest Group migrated ML models to Amazon SageMaker architectures
       
The CLV model consists of a series of separate ML models that are brought together into a single pipeline. To help improve this, AWS and NatWest collaborated to develop a series of ML project and environment templates using AWS services. They also include a set of self-service, secure, multi-account infrastructure deployments for AWS ML services and data services via Amazon Simple Storage Service (Amazon S3). All processing, feature engineering, model training, and inference tasks on premises were done using PySpark or Python. Benefits: The AWS-NatWest collaboration has brought innovation to the implementation of ML models via SageMaker pipelines and MLOps best practices.
Create random and stratified samples of data with Amazon SageMaker Data Wrangler
       
Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. Data Wrangler supports two of the most common strategies: random sampling and stratified sampling. Data Wrangler provides random sampling so you can efficiently process and visualize your data. As the sample size increases, the error decreases as the inverse of the square root of the sample size. Ajai Sharma is a Principal Product Manager for Amazon SageMaker where he focuses on Data Wrangler, a visual data preparation tool for data scientists.
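The two sampling strategies named above, and the inverse-square-root behaviour of the error, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the toy rows, and the use of the standard error of a mean as the "error" are hypothetical choices, not Data Wrangler's implementation.

```python
import math
import random
from collections import defaultdict


def random_sample(rows, n, seed=0):
    """Draw n rows uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, n)


def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(group, k))
    return sample


def standard_error(stdev, n):
    """Standard error of a sample mean: shrinks as 1 / sqrt(n)."""
    return stdev / math.sqrt(n)


# Hypothetical data: 900 rows labelled "a" and 100 rows labelled "b".
rows = [("a", i) for i in range(900)] + [("b", i) for i in range(100)]
strat = stratified_sample(rows, key=lambda r: r[0], fraction=0.1)
counts = {label: sum(1 for r in strat if r[0] == label) for label in ("a", "b")}
print(counts)  # per-label counts in the stratified sample
print(standard_error(2.0, 100), standard_error(2.0, 400))
```

Stratifying keeps the rare label represented in proportion, whereas a small purely random sample may under- or over-represent it by chance. Quadrupling the sample size halves the standard error in this sketch.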
Build and deploy a scalable machine learning system on Kubernetes with Kubeflow on AWS
       
Pipeline artifacts in Amazon S3 – Amazon S3 offers industry-leading scalability, data availability, security, and performance, and could be used to meet your compliance requirements. Set up Amazon RDS, Amazon S3, and Secrets Manager: You create Amazon RDS and Amazon S3 resources before you deploy the Kubeflow manifests. Access the artifacts in Amazon S3: While deploying Kubeflow, we specified Kubeflow Pipelines should use Amazon S3 to store its artifacts. The use case in this post demonstrated Kubeflow integration with Amazon Cognito, Secrets Manager, Amazon RDS, and Amazon S3. To get started with Kubeflow on AWS, refer to the available AWS-integrated deployment options in Kubeflow on AWS.
How to Catch Multiple Exceptions in Python
       
Handling multiple exceptions in Python. Introduction: A well developed application must always be capable of handling unexpected events, such as exceptions, in a proper way. In today’s short tutorial we will showcase how one can handle multiple exceptions in Python. Handling multiple exceptions with Python ≥ 3.11: As of Python 3.11, a new standard exception type was introduced, namely ExceptionGroup. Final Thoughts: In today’s short tutorial we showcased various approaches to handling multiple exceptions in Python. We’ve seen how to catch multiple exceptions using the traditional except clause, but we also showcased how to do so using the new except* clause that will be introduced in Python 3.11.
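The traditional except clause mentioned above can be sketched as follows. The function name and inputs are hypothetical and chosen only for illustration. The except* form is left out because it needs Python 3.11 or later.

```python
def parse_ratio(numerator, denominator):
    """Return numerator / denominator as a float, or None if the inputs are unusable."""
    try:
        return float(numerator) / float(denominator)
    except (ValueError, TypeError) as exc:
        # One handler for several exception types: list them in a tuple.
        print(f"bad input: {exc!r}")
        return None
    except ZeroDivisionError:
        # A separate clause when a different reaction is needed.
        print("denominator is zero")
        return None


print(parse_ratio("3", "4"))
print(parse_ratio("abc", 1))   # float("abc") raises ValueError
print(parse_ratio(None, 1))    # float(None) raises TypeError
print(parse_ratio(1, 0))       # division raises ZeroDivisionError
```

Grouping types in a tuple suits cases where the recovery is identical. Separate clauses suit cases where each failure needs its own handling.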
An Agile Framework for AI Projects — Development, QA, Deployment and Maintenance
       
Some will be faster to annotate but might be less effective or flexible for algorithm development (e.g. Streamline experiment reproducibility: Seeding. Snapshot: code version, data version and splits, script, hyperparameters, configurations, seed value, experiment results, experiment artefacts etc. The inference used for evaluation should run through the “production pipeline”, the same pipeline used for inference in production. There are two types of dependencies in that aspect: dependencies that are allowed to affect the algorithm performance (e.g.
Understanding Train Test Split (Scikit-Learn + Python)
       
What a train test split is, how to use it to tune models using Python, and the bias-variance tradeoff. This tutorial goes over the train test split procedure and how to apply it in Python. This tutorial includes: What is the Train Test Split Procedure; Using Train Test Split to Tune Models using Python; The Bias-variance Tradeoff. If you would like to follow along, the code and images used in this tutorial are available on GitHub. What is the Train Test Split Procedure? Consequences of NOT using Train Test Split: You could try not using train test split and train and test the model on the same data.
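The idea behind the procedure can be shown with the standard library alone. This is a minimal sketch, not scikit-learn's own function: the helper name `split_train_test`, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)       # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(20), test_fraction=0.25)
print(len(train), len(test))
print(set(train) & set(test))  # no row appears in both parts
```

A model is fitted on `train` only and scored on `test`, so the score reflects data the model has not seen.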
What’s Up after AlphaFold on ML for Structural Biology?
       
If you are specifically interested in the rolling evaluation by CAMEO and knowing the current state of protein structure prediction, click here. What’s going on right now in the field of protein structure prediction? There will also be increased emphasis on assessment of accuracy estimates, a key feature of AlphaFold predictions that we anticipated in our CASP13 assessment. But it’s always there to see and open for everybody to explore the most updated information about methods for protein structure prediction.
How to Design the Most Powerful Graph Neural Network
       
Graph classification with Graph Isomorphism Networks. Graph Neural Networks are not limited to classifying nodes. A. Weisfeiler-Lehman test: A way to characterize the “power” of a GNN is to use the Weisfeiler-Lehman (WL) graph isomorphism test. This is what inspired Xu et al.² to design a new aggregator that they proved to be as good as the WL test. GCN test accuracy = 59.38%; GIN test accuracy = 73.70%. This time, there’s no competition! GCN test accuracy = 59.38%; GIN test accuracy = 73.70%; GCN+GIN test accuracy = 75.00%. This time, we’re lucky enough to see the accuracy improved.
Model Tests Are Critical for Building Domain Knowledge
       
Testing protects against regressions. But viewing tests as building and systematizing domain knowledge shows the power of testing beyond a ratchet to prevent backsliding. Testing is Knowledge-Building: A model test is an assertion that a model should behave a certain way in some scenario. Implementing that process means the platform is backed by a data model tying metadata around testing, labeling, and model prediction together. Platforms for Knowledge Development: Building knowledge across different development activities requires a platform with a unified data model.
Common Mistakes During A/B Testing
       
During the last couple of years, I have witnessed a lot of mistakes that people make during A/B testing design and post-analysis. In reality, we can use the Mann-Whitney test only to check if there is a shift in our distribution. When we apply the Mann-Whitney test, we set our hypotheses as follows: The null hypothesis. Why can’t we just choose 0% for the type I error rate and 0% for the type II error rate? For Google it’s an absolute breeze to involve even tens of millions of people in an experiment, so it’s better to set your type error rate as 0.1% and to be more confident of your result.
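To make the Mann-Whitney test mentioned above more concrete, here is a small sketch of the U statistic it is built on. It assumes the usual pairwise-count definition, with ties counted as one half. The function name and sample values are hypothetical, and the p-value step is deliberately left out.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


control = [1.0, 2.0, 3.0]    # hypothetical metric values for group A
treatment = [4.0, 5.0, 6.0]  # hypothetical metric values for group B
print(mann_whitney_u(control, treatment), mann_whitney_u(treatment, control))
```

The two U values always sum to `len(sample_a) * len(sample_b)`. A value close to either extreme suggests one distribution is shifted relative to the other, which matches the use of the test as a check for a shift.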
An introduction to transformers and Hugging Face
       
This means we can actually produce useful language models with minimal data and a regular CPU. In order to standardise all the steps involved in training and using a language model, Hugging Face was founded. (This example closely mirrors the introduction to transformers in Natural Language Processing with Transformers, which is a great reference manual for the field.) Learning Hugging Face is about moving down levels of abstraction until we get into the depths of the code. It’s pretty remarkable that it probably genuinely takes less time to generate some text in Hugging Face than to type a reply.
Focal Loss: A better alternative for Cross-Entropy
       
Focal loss is said to perform better than Cross-Entropy loss in many cases. Let us find out in this article why Cross-Entropy loss fails and how Focal loss addresses those problems. Loss functions are mathematical equations that calculate how far the predictions deviate from the actual values. Now that we’ve defined the loss function, let’s go over the issues that Categorical Cross-Entropy loss causes and how Focal loss solves them. Categorical Cross-Entropy Loss: Categorical Cross-Entropy loss is traditionally used in classification tasks. Balanced Cross-Entropy Loss: Balanced Cross-Entropy loss adds a weighting factor to each class, which is represented by the Greek letter alpha and takes values in [0, 1].
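For a single binary prediction, the relationship between balanced Cross-Entropy and Focal loss can be sketched as below. The sketch assumes the commonly cited form FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name and the default values of `alpha` and `gamma` are illustrative assumptions, not values taken from the article.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one example; p is the predicted probability of class 1, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weighting factor in [0, 1]
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples;
    # gamma = 0 reduces this to balanced cross-entropy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct
hard = focal_loss(0.1, 1)   # confident and wrong
print(easy, hard)
```

With `gamma=0` the modulating factor disappears and only the alpha weighting remains. Larger `gamma` values push the total loss towards hard, misclassified examples.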
Comprehend Dropout: Deep Learning by doing toy examples
       
Dropout is one of the main regularization techniques in deep neural networks. In Deep Learning, especially in Object Detection, overfitting can easily happen. The Dropout Regularization Scheme: The Dropout technique creates a sub-neural network from the original one by selecting some neurons in the hidden layers. The first hidden layer: where g⁽¹⁾ is the ReLU activation function, then. Note: The first dropout layer, μ⁰, is one for all nodes. I plan to add more such toy examples in Machine Learning and Deep Learning.
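A toy version of the masking step may help. The sketch below uses "inverted" dropout, a common variant in which the kept activations are rescaled by 1 / keep_prob. Whether the article scales in the same way is an assumption, and the function and variable names are hypothetical.

```python
import random


def dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    dropped = [a * m / keep_prob for a, m in zip(activations, mask)]
    return dropped, mask


rng = random.Random(0)
hidden = [0.5, 1.5, 2.0, 0.0, 3.0]            # hypothetical ReLU outputs of a hidden layer
dropped, mask = dropout(hidden, keep_prob=0.5, rng=rng)
print(mask)     # the units with mask 1 form the sampled sub-network
print(dropped)
```

Setting `keep_prob=1.0` yields a mask of all ones, which corresponds to the note above that the first dropout layer is one for all nodes.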
Pick Your Deep Learning Tool
       
Domain-Specific DL Tools: The way most practitioners work with ML is to use a domain-specific tool for their branch of deep learning. Unfortunately, the only solution to the fast-changing-frameworks problem is to write all the deep learning operations by yourself. They are designed differently, since Returnn is meant to be a standalone software while Keras is a framework for developing deep learning tools. Conclusions: The current applicative deep learning landscape is ruled by tools built around deep learning frameworks and specialized for a single task or a few tasks. In this article I wanted to share another option to build deep learning tools, using the example of Returnn, the only deep learning engine I am aware of.
In France, Like America, Pandering to White Grievance is a Winning Political Formula
       
It’s marginalized communities who always step up and fight for countries that don’t love them back. France’s far-right party ‘Rassemblement National’ (RN) leader, Marine Le Pen, candidate for the 2022 presidential election makes a statement after the results of the votes in the second round at ‘Pavillon d’Armenonville’ on April 24, 2022 in Paris, France. (Photo by Thierry Chesnot/Getty Images) Those of us around the world who are opposed to fascism, white supremacy, and politicians beholden to Vladimir Putin briefly exhaled over the weekend…
It’s OK to Ignore Twitter
       
After all, the vast majority of the country does. About nine months ago, I stopped spending much time on Twitter. Part of this was accidental: I’d just signed a new book contract and had to transfer much of my doomscrolling time into book-writing time. But I also was exhausted. I’d been shut inside like you for most of 2020 and 2021, spending that time terrified by whatever fresh horror…
AI Will Soon Put Creators Out of Work. How Should We Adapt?
       
A review of Daniel Susskind’s ‘A World Without Work’. Cover Design by Nicolette Seeback. It wasn’t long ago that “creatives” were told automation would finally validate our life choices. Those of us who chose to major in the liberal or fine arts would have the last laugh, as our less remunerative careers would still be around while the accountants and lawyers of…
Why Mariupol Will Resist and Revive
       
When the rocket hit their apartment, my grandparents moved out of the city. (With my husband Fabien at my grandparents’ apartment in 2019.) But even before, Mariupol has been a cool and edgy city. When I lived in Mariupol, I was doing ballroom dancing and made a lot of friends through that. Like, one of my friends is a writer and she started writing a book about Mariupol before the war, “A City of Dust”. And I believe Mariupol will be rebuilt from scratch and will revive.
The Gen X work problem
       
First, let’s be clear: I have never been a big fan of generalizing about generations. I am making a small exception for parts of the argument in this post. Who is Gen X? Molly Ringwald, baby! And how have they been described?
Russian Occupants Didn’t Understand Why They Captured Chornobyl Nuclear Power Plant
       
I’ve worked at Chornobyl for almost 25 years and watched its occupation through on-line surveillance cameras. Thousands of tubes with used nuclear fuel will be deprived of cooling capacity and melt along with a few meters long constructions containing this fuel. However, only 10% of used nuclear fuel has been transferred there. Besides CNPP personnel, nuclear waste is under the control of the International Atomic Energy Agency, IAEA. They organize regular checkups and control the state of nuclear fuel on-line with special sensors.
Russia reminds us that journalism is too big to fail
       
The Russia-Ukraine tragedy offers a useful reminder of the vital importance of the news media in any country and for the world. But for the moment, they get their information from media beholden in one way or another to Russian President Vladimir Putin. What they hear is that the West is threatening Russia, the “operation” protects Russian speakers from neo-Nazi attacks, and Ukrainians hate their fascist government. The Palestinian media are mostly not free and lack any tradition of impartiality, introspection or the challenging of dogma. The challenge is to expand the paying audience beyond the highly educated, and to educate more people, because when the free media is simplistic it is responding to market demand.
A Clash of Two Systems
       
The war in Ukraine is a confrontation between two systems, one modern, legalistic, decentralized and multicephalous; the other archaic, nationalistic, centralized and monocephalous. (This is a copyright compatible version of my side of a conversation with Laetitia Strauch-Bonart published in the French periodical l’Express.) This war not only pits Ukraine and Russia against it, it is a confrontation between two systems, one modern, decentralized and multicephalous, the other archaic, centralized and autocephalous. Putin and the “realists” are the wrong century, they do not think in terms of systems or in terms of individuals. This model tends to “antifragility”, a concept present in my books that refers to a property of systems that strengthen when exposed to stressors, shocks or volatility. This also confirms, if it were still needed, the absurdity of Samuel Huntington’s ideas in The Clash of Civilizations.
Transformers: What Are They and How Can I Make One?
       
A simple summary of Transformers for NLP tasks, and a guide to making a Transformer for text generation with PyTorch. Transformers then can be made to autocomplete your sentence or your code or simply write you a story. The output of the decoder’s self-attention will be a square matrix, holding the score for every combination of target words. Finally, just for completeness, note that some other models instead dump the decoder in favor of an encoder-only architecture, starting with Google’s BERT (Bidirectional Encoder Representations from Transformers).
K-Fold Cross Validation Explained
       
Using SciKit-Learn and Yellowbrick libraries. The overfitting problem: One of the biggest problems in Machine Learning is the risk of Overfitting. In the image above we can see that the score in the training set was 85%, but in the test set, it was only 81%. The test set is specially created to check if the model created by our machine suffers from overfitting. From the moment the model “sees” the test set data, it starts to know them, and we run the risk of overfitting again. The K-Fold solution: Cross-validation is a technique used when data is limited, that reuses data for training and test sets.
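How K-Fold reuses the same data for training and testing can be sketched with plain index arithmetic. This is a standard-library illustration of the idea, not the SciKit-Learn or Yellowbrick API. The function name is hypothetical and the folds are contiguous and unshuffled for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print(test_idx, "held out;", len(train_idx), "rows used for training")
```

Each row is held out exactly once, so every observation contributes to both training and evaluation across the k rounds.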
Practical Lessons About Debugging Neural Networks
       
Even though debugging stacks have come a long way, they still remain one of the fundamental challenges in machine learning applications. Let’s take the example of a deep neural network that has been regularly achieving a 3.5% error rate. Practical Tips for Deep Learning Debugging: The complex structure of deep neural networks and the lack of sophisticated tools make the debugging of deep learning applications nothing short of a nightmare. Debugging and understanding deep learning programs feels unnatural to many mainstream software engineers. As deep learning research evolves, the architecture of deep neural networks should become more interpretable and, consequently, easier to debug.
Six Months Later: What Data Science (Hopefully) Learned From Facebook’s Whistleblower
       
This is important: as data scientists, we have the obligation to avoid letting others use our work in harmful ways. Despite no data-science-specific code of professional conduct, in practice data scientists need to observe this obligation. I am not the first to call for a sense of responsibility among data scientists. To the extent that Haugen also called for a sense of responsibility among data scientists through her actions, she is also not the first. The analyst will have already established that he or she recognizes a sense of ethical obligation and that the team should recognize a sense of ethical obligation as well.
Multiprocessing in Python
       
import multiprocessing p1 = multiprocessing . Here’s the output with the join statements added:
Sleeping for 0.5 seconds
Sleeping for 0.5 seconds
Finished sleeping
Finished sleeping
Program finished in 0.5688213340181392 seconds
With similar reasoning, we can make more processes run. perf_counter() print(f"Program finished in {finish_time-start_time} seconds") Multiprocessing for real use: Starting a new process and then joining it back to the main process is how multiprocessing works in Python (as in many other languages). Books: High Performance Python, 2nd edition, by Micha Gorelick and Ian Ozsvald. APIs: joblib; multiprocessing in the Python standard library; concurrent.futures in the Python standard library. Summary: In this tutorial, you learned how we run Python functions in parallel for speed. In particular, you learned: how to use the multiprocessing module in Python to create new processes that run a function; the mechanism of launching and completing a process; the use of a process pool in multiprocessing for controlled multiprocessing, and the counterpart syntax in concurrent.futures; how to use the third-party library joblib for multiprocessing.
Google AI Blog: Google at ICLR 2022
       
The 10th International Conference on Learning Representations (ICLR 2022) kicks off this week, bringing together researchers, entrepreneurs, engineers and students alike to discuss and explore the rapidly advancing field of deep learning. Entirely virtual this year, ICLR 2022 offers conference and workshop tracks that present some of the latest research in deep learning and its applications to areas ranging from computer vision, speech recognition and text understanding to robotics, computational biology, and more. As a Platinum Sponsor of ICLR 2022 and Champion DEI Action Fund contributor, Google will have a robust presence with nearly 100 accepted publications and extensive participation on organizing committees and in workshops. If you have registered for ICLR 2022, we hope you’ll watch our talks and learn about the work done at Google to address complex problems that affect billions of people. Here you can learn more about the research we will be presenting as well as our general involvement at ICLR 2022 (those with Google affiliations in bold).
Interactive Course on Optimizing Search Engines With Ricardo Baeza-Yates Starting May 10
       
To successfully evaluate, build, deploy and scale information retrieval systems, engineers working with search systems must understand the frameworks and algorithms that underpin this technology. Professor Ricardo Baeza-Yates (Northeastern University) has done research on information retrieval and web search for more than 25 years in academia and industry, leading Yahoo Labs in Europe. He’ll be joined by web search expert Professor Fabrizio Silvestri to host a live, interactive course that will explore the main skeleton of a search system. The course consists of four live, online sessions that combine theory, case studies and personal experience. Ricardo and Fabrizio will interact with you live, and you’ll get the chance to engage with a small group of like-minded peers across industries.
Should I Use Offline RL or Imitation Learning?
       
Are there fundamental limitations to methods that rely on some form of imitation (BC, conditional BC, filtered BC) that offline RL addresses? While it might be clear that offline RL should enjoy a large advantage over imitation learning when learning from diverse datasets that contain a lot of suboptimal behavior, we will also discuss how even cases that might seem BC-friendly can still allow offline RL to attain significantly better results. Empirical Results Comparing Offline RL and BC: In our discussion so far, we have already studied settings such as the antmazes, where offline RL methods can significantly outperform imitation-style methods due to stitching. This highlights the need for understanding how offline RL methods must be tuned and, at least in part, explains the poor performance of offline RL when learning from demonstration data in prior works.
How Nordic Aviation Capital uses Amazon Rekognition to streamline operations and save up to EUR200,000 annually
       
In this post, we share how NAC uses Amazon Rekognition to streamline their operations. “Amazon Rekognition Custom Labels has given us superpowers when it comes to improving our aircraft maintenance reviews. Rekognition Custom Labels allows you to build custom computer vision models for image classification and object detection tasks. NAC chose Amazon Rekognition because it significantly reduced the undifferentiated heavy lifting of training and deploying a custom computer vision model. Learn more about how you can build custom computer vision models tailored to your specific use case by visiting Getting Started with Amazon Rekognition Custom Labels or reviewing the Amazon Rekognition Custom Labels Guide.
Host Hugging Face transformer models using Amazon SageMaker Serverless Inference
       
Amazon SageMaker and Hugging Face have been collaborating to simplify and accelerate adoption of transformer models with Hugging Face DLCs, integration with SageMaker Training Compiler, and SageMaker distributed libraries. In this post, we explore how to use SageMaker Serverless Inference to deploy Hugging Face transformer models and discuss the inference performance and cost-effectiveness in different scenarios. Let’s walk through how to deploy Hugging Face models on SageMaker Serverless Inference. In this post, we introduced how you can use the newly announced SageMaker Serverless Inference to deploy Hugging Face models. We provided a detailed code snippet using the SageMaker Python SDK to deploy Hugging Face models with SageMaker Serverless Inference.
Probably The Easiest Way To Animate Your Python Plots
       
Generate frames of plots and combine them as GIF. Visualisation is always an important usage of Python for Data Science and Data Analytics. In this article, I will introduce a relatively less scalable but much easier way to animate our Python plots, which is using the ImageIO library. An Example of Line Chart: Let’s start with a basic line chart for this demo. Let’s say that we want to animate the line chart by plotting the points one by one. The idea is to plot the line chart with 2 points, 3 points, … and 50 points.
Sparse Matrices: Why They Matter for Machine Learning and Data Science
       
And why you should care. Introduction: What is sparse data? A matrix (or dataset) that mostly contains 0s is called a sparse matrix. But how do we do that in practice? The memory usage is now around 990MB compared to 700MB before one-hot encoding. Making sure to use sparse matrices when running your Machine Learning models can greatly help speed up the run time.
SageMaker Batch Transform
       
Generate large offline predictions with an Sklearn example. In my last article I talked about the latest SageMaker Inference option in Serverless Inference. An older, yet equally important option is SageMaker Batch Transform. There will be costs incurred through the deployment process for the Batch Transform Job. With Batch Inference we do not work with endpoints as the other three SageMaker Inference options do. If you’re more into video tutorials here’s a great Batch Transform Section in the following course.
Introduction to Random Forest Algorithm
       
How the algorithm works and what we can use it for. Random Forest is a supervised machine learning algorithm that is composed of individual decision trees. The basis for the Random Forest is formed by many individual decision trees. How Random Forest works: The Random Forest consists of a large number of these decision trees, which work together as a so-called ensemble. The secret behind the Random Forest is the so-called principle of the wisdom of crowds. Application areas of the Random Forest algorithm: Random Forest models are used for classification tasks and regression analyses, similar to decision trees.
Postgres Fuzzy Search With pg_trgm: Smart Database Guesses What You Want and Returns “Cat Food” When “Pet Food” Not Found
       
Intelligent search with pg_trgm and Levenshtein distance in Postgres. There are three common scenarios in database searches that go beyond the capability of traditional wildcard querying. First, when users search for “pet food” but there are no products called “pet food” found in your database, would the database be smart enough to return “cat food” or “dog food” instead? We can see that the pg_trgm search is able to return “cutting boards” as the closest match and it is exactly what the user wants. Levenshtein distance in Postgres: Different from pg_trgm, Levenshtein distance measures similarity by looking into how different two strings are. What are the limitations of fuzzy search in Postgres? There are two main limitations in Postgres’s fuzzy search methods.
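The two similarity measures can be approximated outside the database to see why “pet food” lands near “cat food”. This is a Python sketch of the ideas, not Postgres code. The trigram function assumes each lowercased word is padded with two leading spaces and one trailing space, and that similarity is the share of trigrams common to both strings. Real pg_trgm results may differ in detail.

```python
import re


def trigrams(text):
    """Set of 3-character windows over each padded, lowercased word."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ga, gb = trigrams(a), trigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


print(trigram_similarity("pet food", "cat food"))
print(levenshtein("pet food", "cat food"))
```

Trigram similarity rewards shared fragments (here the whole word “food”), while Levenshtein distance counts the edits needed to turn one string into the other.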
Feature Selection for the Lazy Data Scientist
       
A comprehensive literature review and code on Filter-based methods for Feature Selection. If you’re a data scientist and the curse of dimensionality has struck you, this post is for you. If you’re facing the problem of high dimensionality, you probably have heard the terms “dimension reduction (or PCA/Auto Encoders)” and “feature selection.” Before we delve into feature selection, here is a short description of dimension reduction that will help you decide whether you should pursue this approach. First, some terminology: Feature selection methods can be divided into three families: (1) Filter methods: Feature selection is made as part of the pre-processing, that is, before we train a model. In Laplacian feature selection, we embed data in the nearest neighbor’s graph by measuring an arbitrary distance and calculating the weight matrix. We haven’t covered unsupervised algorithms for feature selection [17] and [18], spectral methods for feature selection [19], and graph-based methods [20], [21] and [22].
What Is an MLP, and Why Should You Care?
       
An MLP, or multi-layer perceptron, is a type of neural network composed of many different interconnected perceptrons, or mathematical equations. An artificial neural network attempts to replicate the way the human brain works, albeit not quite in the same way. We run this training data through our MLP, and if the answer is wrong, the weights and bias are adjusted. The pixels of an image can be reduced down to one long row of data and fed into an MLP.
Structure Prediction and Learning
       
Combining predictive modeling with structure inference. Supervised machine learning involves predicting the value of an outcome variable from some input. Protein Structure Prediction: Predict the secondary or tertiary structure of a protein from its sequence. This is simply the number of transitions from state Det to state Noun divided by the number of occurrences of state Det in the training set. Finally, we brought in discriminative learning for training the weights of the joint feature functions. This culminated in the Structured Perceptron: a blend of efficient inference and discriminative learning in the more general setting of joint feature functions.
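The counting rule for the Det to Noun transition can be written out directly. The tagged sentences below are made-up training data and the function name is hypothetical. The estimate itself follows the description above: transitions from one state to the next, divided by occurrences of the first state.

```python
from collections import Counter


def transition_probability(tag_sequences, prev_tag, next_tag):
    """Estimate P(next_tag | prev_tag) by counting over tagged training sequences."""
    transitions = Counter()
    occurrences = Counter()
    for tags in tag_sequences:
        occurrences.update(tags)                  # how often each state appears
        transitions.update(zip(tags, tags[1:]))   # adjacent (state, next state) pairs
    if occurrences[prev_tag] == 0:
        return 0.0
    return transitions[(prev_tag, next_tag)] / occurrences[prev_tag]


training = [
    ["Det", "Noun", "Verb"],          # e.g. "the dog barks"
    ["Det", "Adj", "Noun", "Verb"],   # e.g. "the old dog sleeps"
]
print(transition_probability(training, "Det", "Noun"))
```

Here Det occurs twice and is followed by Noun once, so the estimate is one half.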
Elon Musk is a Poster Child for How Power Corrupts
       
And with great success comes great power — the power to stop listening. I believe Elon will be a poster child for how power corrupts potential. We should be wary of billionaires promising free speech. Supporters claim that any objection to Musk’s ownership is an attempt to throttle free speech. This is about power, a lack of counterweights, and a history that confirms that when the ratio of power to guardrails becomes this imbalanced … bad things happen.
What a $400 Million Pizza Order Teaches Us About Crypto
       
What a $400 Million Pizza Order Teaches Us About Crypto. Crypto people think differently. Illustration via photos by Gado Images, Photo by André François McKenzie on Unsplash. On May 22, 2010, Florida resident Laszlo Hanyecz bought two pizzas that would later be valued at over $400 million. That means the 10,000 Bitcoins that Hanyecz spent on his two pizzas would today be worth over $400 million. Hanyecz’s $400 million pizza can also teach us three important lessons about cryptocurrencies and the crypto world. It’s Hard to Value Crypto Coins. Another truism about cryptocurrencies is that they’re hard to value. It’s Not About the Money. When Hanyecz says that he has no regrets about losing out on $400 million in Bitcoin gains, I believe him.
The Big Tech Reckoning is Here
       
The Big Tech Reckoning is Here. The European Union’s DSA could fundamentally change the Internet. Photo by NASA on Unsplash. Over the course of 48 hours the ephemeral discussion of how to manage online content solidified into something concrete and formidable. It seemed to start with a landmark speech on the dangers of disinformation from former President Barack Obama at Stanford University’s Cyber…
The Intersection of Disability and Gender with Online Harassment
       
The Intersection of Disability and Gender with Online Harassment, by Dr. Erin Pritchard. I am a lecturer in Disability studies. Disability studies is characterized by a personal interest in disability and evolved from disability activism in the 1960s. In 2019, I had a chapter published in an academic book arguing that the word midget is a form of disability hate speech. However, it was only after the story came out that I began to receive online abuse. This is all to try and reclaim their power over my objection to disability hate speech.
Being a Queer Parent is Terrifying Right Now
       
Being a Queer Parent is Terrifying Right Now. On trying to get one tiny child through a culture war unscathed. “Rainbow umbrella” is a cheesy photo choice, but the other result for “protection” was “barbed wire prison camp,” which sends a less-than-nurturing message. Photo by Jason Blackeye on Unsplash. I’ve run out of things to say about myself during therapy. I now spend my sessions talking about one and only one thing: My kid, who is preparing to enter kindergarten. Specifically, I talk about my fear that I have fucked up my kid’s life…
Becoming a “real” data analyst
       
Becoming a “real” data analyst. 10 differences between amateurs and professional analysts. Previously, I introduced you to a few analytics tasks disguised as everyday activities to prove that you’re already a data analyst. Data pro vs amateur difference #1 — Software skills. Unlike most amateurs, the pro knows how to use software (e.g. One of my favorite pioneers of statistics, W. Edwards Deming, famously said that “without data, you’re just another person with an opinion.” That is true, but unfortunately so is this: “With data, you’re still just another person with an opinion.” Expert analysts understand this in their very bones. Photo by Alexander Sinn on Unsplash. In addition to more practice with professional tools, the professional analyst understands the, ahem, professional aspects of the profession, which we’ll cover in the next article in this series. For a sneak preview, here are the upcoming section headings: Data pro vs amateur difference #4 — Understanding the career; Data pro vs amateur difference #5 — Refusing to be a data charlatan; Data pro vs amateur difference #6 — Resistance to confirmation bias; Data pro vs amateur difference #7 — Realistic expectations of data; Data pro vs amateur difference #8 — Knowing how to add value; Data pro vs amateur difference #9 — Thinking differently about time; Data pro vs amateur difference #10 — Nuanced view of excellence. If you’ve thought of any other differences that might not fall under these headings, let me know in the comments!
I Grew Up In Portland, And I’m Done Hating On It
       
I Grew Up In Portland, And I’m Done Hating On It. It’s easier to dislike something than to really miss it. Illustration by the author. (A note: this piece was originally written in January of 2020, pre-pandemic. The sentiment remains.) First, two brief anecdotes, separated in time by twelve years.
We Need Feminist Cultural Criticism — But There’s Nowhere Left To Go
       
We Need Feminist Cultural Criticism — But There’s Nowhere Left To Go. As Bitch Media shuts down, generations of feminists ask: where will feminist cultural criticism go? Photo by Aron Visuals on Unsplash. When I first watched Anitta’s “Vai Malandra” video, I desperately wanted to write about it, but I found it difficult to find anywhere to publish me. The video was controversial in…
NASA’s New AI Will Terrify Putin
       
Photo by Maciej Ruminkiewicz on Unsplash. NASA’s New AI Will Terrify Putin. Hypersonic missiles designed by AI. Unless you have been living under a rock you will have noticed international relations have not been so peachy recently. Particularly with Russia, as they continue to use brutal and criminal acts in Ukraine, despite universal outcry. One of the most shocking revelations of the war in Ukraine has been Russia’s liberal use of hypersonic missiles…
Famous Modern Math Problems: The Erdos-Turan Conjecture
       
Famous Modern Math Problems: The Erdos-Turan Conjecture. One of the most elegant problems in number theory that remains unproven. In another installment of our series about famous mathematical problems, today we would like to cover one of the most famous conjectures about the properties of large number series. The problem itself states that, in a large, dense-enough series of numbers, there are arbitrarily long series of evenly spaced numbers. The first significant progress towards proving the Erdős-Turan conjecture came in 1953 from German-British mathematician Klaus Roth. The Erdős-Turan conjecture remains unproven for all other sizes of arithmetic progressions, making it one of the most enigmatic modern mathematical problems.
Data Mesh Operability Pattern
       
Data Mesh Operability Pattern. The Data Mesh Operability Pattern helps us understand the operating characteristics of an enterprise Data Mesh. Key activities in the Data Mesh Operability Pattern are grouped in three sections: Steps 1–7 describe operability flows within a data product (Figure 1); Steps 8–10 describe operability flows between data products in a Data Mesh (Figure 2); Steps 10–13 describe usage of the Data Mesh Operability pattern (Figure 3). Figure 1, Data Mesh Operability Pattern, Steps 1–7. Operability flows within a data Product (Steps 1–7) include: Application interactions with a Data Product cause an error. Figure 2, Data Mesh Operability Pattern, Steps 8–11. All data products emit operability events, alerts, performance, and operability characteristics between Data Products across the enterprise Data Mesh (Steps 8–11): 8. Figure 3, Data Mesh Operability Pattern, Steps 12–14. An Enterprise Data Product Catalog surfaces Data Product and enterprise Data Mesh operating metrics and characteristics (Steps 12–14): 12. Data Mesh Operability Pattern captures downtime of key components in a data Product and enterprise Data Mesh.
Pick Your Deep Learning Tool
       
Domain-Specific DL Tools. The way most practitioners work with ML is to use a domain-specific tool for their branch of deep learning. Unfortunately, the only solution to the fast-changing-frameworks problem is to write all the deep learning operations by yourself. They are designed differently, since Returnn is meant to be a standalone software while Keras is a framework for developing deep learning tools. Conclusions. The current applicative deep learning landscape is ruled by tools built around deep learning frameworks and specialized for a single, or a few tasks. In this article I wanted to share another option to build deep learning tools, using the example of Returnn, the only deep learning engine I am aware of.
Optimize PyTorch Performance for Speed and Memory Efficiency (2022)
       
Set the sizes of all architecture designs to multiples of 8 (for FP16 mixed precision). Training: 10. Set the batch size to a multiple of 8 and maximize GPU memory usage. Besides setting the batch size to a multiple of 8, we also increase the batch size until it hits the GPU’s memory limit. It’s quite easy to leverage mixed precision in PyTorch with the automatic mixed precision (AMP) package.
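The multiple-of-8 advice can be applied with a small helper that rounds any size (a batch size, a hidden dimension, a vocabulary size) up to the next multiple. This is a plain-Python sketch; the helper name and the example numbers are illustrative assumptions, not part of the article or of PyTorch.

```python
def round_up_to_multiple(value, multiple=8):
    """Round a positive size up to the nearest multiple (8 by default)."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


# Hypothetical sizes taken from a model configuration.
batch_size = round_up_to_multiple(100)      # padded up to the next multiple of 8
vocab_size = round_up_to_multiple(50257)    # e.g. padding an embedding table
print(batch_size, vocab_size)
```

Rounding up rather than down keeps every original element addressable, at the cost of a few padded entries.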
Don’t believe Obama’s Big Tech criti-hype
       
Don’t believe Obama’s Big Tech criti-hype. They’re not evil geniuses (they’re not geniuses, period). Obama’s Stanford University speech this Thursday (correctly) raised the alarm about conspiratorial thinking, and (correctly) identified that Big Tech was at the center of that rise — and then (wildly incorrectly) blamed “the algorithm” for it.
4 Huge Wastes of Money That People Still Indulge
       
Self | Money. 4 Huge Wastes of Money That People Still Indulge. Cut down on your spending to expand your opportunities. Editorial rights purchased via iStock Photos. I was the most wasteful spender alive. I had the mindset of a toddler. If I wanted something, I had to have it that moment. If it felt good, looked good, tasted good, I handed over my cash.
Filtering Data in Tableau: A Road to Tableau Desktop Specialist Certification
       
We can simply filter the data by selecting required data points from the view. Using this filter, Tableau will create an extract of filtered data that could be further used for visualizations. Data Source Filters. These filters are used to filter out data at the data source level. Filters that were previously applied independently will now be applied to the data already filtered by the context filter. A context filter is always applied first, and the rest of the filters are applied to the data filtered by the context filter.
BRIO: Bringing Order to Abstractive Summarization (Paper Review/Described)
       
A new state-of-the-art (SOTA) result for the abstractive text summarization task was published [1] shortly after I wrote about the SimCLS [2] (previous SOTA) paper. The Contrastive Loss (ctr) is responsible for guiding the model to learn how it should rank multiple candidates for a given article. (Image from [1]) The variable gamma (γ) controls the contribution of contrastive loss to the final loss. Results. The BRIO approach set the new SOTA result for three abstractive summarization datasets: CNN/DailyMail, XSum, and NYT. BRIO: Bringing Order to Abstractive Summarization.
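As a rough illustration of how a ranking-style contrastive loss and the γ weight might fit together, the sketch below uses a pairwise margin loss over candidates ordered from best to worst. The exact formulation is not given in this excerpt, so the margin scheme, the function names, and the numbers are all assumptions rather than the paper's implementation.

```python
def contrastive_ranking_loss(scores, margin=0.01):
    """Pairwise margin loss over candidate scores.

    Assumes `scores` are model scores for candidates already sorted from the
    best candidate (index 0) to the worst. A pair contributes to the loss when
    a worse candidate is not scored lower than a better one by at least a
    margin that grows with their rank distance.
    """
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss


def final_loss(cross_entropy, contrastive, gamma):
    """Combine the two terms; gamma scales the contrastive contribution."""
    return cross_entropy + gamma * contrastive


well_ranked = contrastive_ranking_loss([0.9, 0.5, 0.1], margin=0.1)
mis_ranked = contrastive_ranking_loss([0.1, 0.5], margin=0.1)
print(well_ranked, mis_ranked, final_loss(2.0, mis_ranked, gamma=10.0))
```

With γ set to zero the combined objective reduces to the usual cross-entropy term, which is one way to see what the weight controls.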
How to write a Custom Keras model so that it can be deployed for Serving
       
How to write a Custom Keras model so that it can be deployed for Serving. How to adapt custom Layers, Model, loss, preprocessing, postprocessing into a servable API. If the only Keras models you write are sequential or functional models with pre-built layers like Dense and Conv2D, you can ignore this article. Just that it involves custom Keras layers and a custom Keras model (i.e. Here's the issue: When you write a custom Keras layer or Keras loss or Keras model, you are defining code. Sending in:
model = tf.keras.models.load_model(EXPORT_PATH)
sample_input = ["Justin Trudeau went to New Delhi India", "Vladimir Putin was chased out of Kyiv Ukraine"]
model.predict(sample_input)
gives back:
array([[b'B-NAME', b'I-NAME', b'OUT', b'OUT', b'B-LOCATION', b'I-LOCATION', b'I-LOCATION', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]'],
       [b'B-NAME', b'I-NAME', b'OUT', b'OUT', b'OUT', b'OUT', b'B-LOCATION', b'I-LOCATION', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]']], dtype=object)
i.e. Once you write your custom layers, you have to do custom object registration.
In America, We Were All “Russian,” but Soviet Jewish Identity Has Always Been Complex
       
(Per our internal passports, our nationality wasn’t marked as “Russian,” or “Ukrainian,” but as “Jewish.” That’s how our compatriots saw us, too.) I had thought that the severing of national allegiance to our former homes was permanent, for many of us Soviet Jews. In the past few weeks, I’ve seen Ukrainian Jewish friends and celebrities who came to the U.S. during the same late 1980s wave speak emotionally of their Ukrainian pride. Still, Ukrainian voters overwhelmingly chose a Jewish president, someone who said he came from a nonreligious “ordinary Soviet Jewish family,” a family much like mine. I’d argue that Ukrainian Jews were — and still are — much more rooted to a sense of place than Russian Jews, because of an enduring Jewish presence in that country.
Why Snapchat’s Product Is Booming
       
Why Snapchat’s Product Is Booming. The company withstood Facebook’s best swing, now it’s growing fast again. Here’s why. In August 2016, Snapchat seemed destined for decline. Facebook that month cloned Stories — Snapchat’s most beloved feature — and rolled it out on Instagram, en route to placing it on Messenger, Whatsapp, and Facebook itself. At the time, Snapchat was a plucky…
A Clash of Two Systems
       
A Clash of Two Systems. The war in Ukraine is a confrontation between two systems, one modern, legalistic, decentralized and multicephalous; the other archaic, nationalistic, centralized and monocephalous. (This is a copyright compatible version of my side of a conversation with Laetitia Strauch-Bonart published in the French periodical l’Express.) This war not only pits Ukraine and Russia against each other, it is a confrontation between two systems, one modern, decentralized and multicephalous, the other archaic, centralized and autocephalous. Putin and the “realists” are in the wrong century, they do not think in terms of systems or in terms of individuals. This model tends to “antifragility” — a concept present in my books that refers to a property of systems that strengthen when exposed to stressors, shocks or volatility. This also confirms, if it were still needed, the absurdity of Samuel Huntington’s ideas in The Clash of Civilizations.
Top 3 Data Science Courses Every Data Scientist Should Consider
       
Top 3 Data Science Courses Every Data Scientist Should Consider. Online courses for data scientists. Photo by Avel Chuklanov on Unsplash. As someone interested in data science, you might be wondering what the top online courses are to break into the data science field, or, if you are already in the industry, you might be looking for courses to complement your knowledge and expertise. These are online courses that cover material that is essential for approaching data science challenges. Machine Learning by Andrew Ng. One of the students’ all-time favourites is the Machine Learning course by Andrew Ng. Anyone who aspires to become a true expert in the data science field should know about deep learning approaches at a deeper level. AWS or GCP or MSFT Azure Data Science Cloud specialisations. Being a data scientist is more than having knowledge about machine learning models.
Web Frameworks for Your Python Projects
       
update({"activation": activation, "optimizer": optimizer, "epoch": epoch, "batchsize": batchsize})
model, history = train()
model_data["model"] = model  # keep the trained model
history = pd.
callback(Output("activationdisplay", "children"), Input("activation", "value"))
def update_activation(value):
    model_data["activation"] = value
    return f"Activation: {value}"
@app.
Google AI Blog: Pix2Seq: A New Language Interface for Object Detection
       
In “Pix2Seq: A Language Modeling Framework for Object Detection”, published at ICLR 2022, we present a simple and generic method that tackles object detection from a completely different perspective. Unlike existing approaches that are task-specific, we cast object detection as a language modeling task conditioned on the observed pixel inputs. We demonstrate that Pix2Seq achieves competitive results on the large-scale object detection COCO dataset compared to existing highly-specialized and well-optimized detection algorithms, and its performance can be further improved by pre-training the model on a larger object detection dataset. Pix2Seq framework for object detection. Since our approach incorporates minimal inductive bias or prior knowledge of the object detection task into the model design, we further explore how pre-training the model using the large-scale object detection COCO dataset can impact its performance.
Secure AWS CodeArtifact access for isolated Amazon SageMaker notebook instances
       
In this post, we demonstrate how to securely connect to AWS CodeArtifact from an Internet-disabled SageMaker Notebook Instance. The architecture allows our Internet-disabled SageMaker notebook instance to access CodeArtifact repositories without traversing the public internet. We also need a CodeArtifact domain in the AWS Region where you created your Internet-disabled SageMaker notebook instance. Choose the AWS account you are working in and the AWS CodeArtifact domain you use for this account. Choose Apply a repository policy under Repository policy.
Shifting your mindset from amateur to professional analyst
       
Data pro vs amateur difference #2 — Handling lots of data with ease. Covered in the previous article. Data pro vs amateur difference #3 — Immunity to data science bias. Covered in the previous article. Data pro vs amateur difference #4 — Understanding the career. Unlike amateurs, the professional analyst is an analyst by choice, not by misfortune. In the next one, we’ll tackle the last four differences between amateurs and professional analysts: Data pro vs amateur difference #7 — Realistic expectations of data; Data pro vs amateur difference #8 — Knowing how to add value; Data pro vs amateur difference #9 — Thinking differently about time; Data pro vs amateur difference #10 — Nuanced view of excellence. Let me know if you’re enjoying this topic and don’t forget to share your favorite insights with your community!
How to Interpret Any Machine Learning Prediction
       
ARTIFICIAL INTELLIGENCE | EXPLAINABILITY | DATA SCIENCE. How to Interpret Any Machine Learning Prediction. Transforming black-box models into glass boxes. Photo by Wilhelm Gunkel on Unsplash. Local Interpretable Model-agnostic Explanations (LIME) is a Python project developed by Ribeiro et al. [1] to interpret the predictions of any supervised Machine Learning (ML) model. Therefore, LIME is able to explain a specific prediction by understanding which features had the most contribution to the prediction. A surrogate model g is any model which is used to interpret the results of another predictive algorithm.
# imports
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
We can also import LIME as follows:
from lime.lime_tabular import LimeTabularExplainer
Our problem has a supervised tabular structure.
AI Applied to Mask Detection: A Transfer Learning Application
       
Applied Machine Learning. AI Applied to Mask Detection: A Transfer Learning Application. Applying Pre-trained Models to a Classification Problem. Image adapted from Face Mask Detection | Kaggle. Transfer Learning is the best way to adapt extremely complex models in a matter of minutes for extremely good results. In this article, I apply Transfer Learning to mask detection. The cropped faces are classified into 3 classes: “Mask”, “No Mask”, “Mask worn incorrectly”. Image adapted from Face Mask Detection | Kaggle. In the image above the woman on the left has her mask under her nose, but the model still classifies it as Mask. One should consider ethical concerns when choosing a suitable application for a mask detection model.
Getting Started with Recurrent Neural Network (RNNs)
       
Getting Started with Recurrent Neural Network (RNNs). Using RNNs for Sentiment Analysis. Photo by Nishaan Ahmed from Unsplash. This article will discuss a separate set of networks known as Recurrent Neural Networks (RNNs) built to solve sequence or time series problems. What is a Recurrent Neural Network? A Recurrent Neural Network is a special category of neural networks that allows information to flow in both directions. How does a Recurrent Neural Network work? Final thoughts. In this article, we have talked about the Recurrent Neural Network and its variations.
Guide to Data Labeling for Search Relevance Evaluation
       
Many e-businesses use it to gauge search quality relevance on their platforms to provide better services to users. One of the most effective ways to evaluate search relevance is through human-in-the-loop data labeling, of which crowdsourcing is our methodology of choice. Naturally, we want to obtain as fair an estimation of the platform’s search relevance quality as possible. Human judgements are much more robust when it comes to overfitting, allowing us to measure search relevance without being influenced by any factors built into the system. Generally speaking, crowdsourcing allows for large dataset volumes and tends to provide deeper insights into IR and search query relevance than managed/in-house crowds.
5 Must-Have Machine Learning Books in 2022
       
Beginning with an introduction to Python, progressing towards hands-on machine learning and finally delving deeper into tackling machine learning problems (a top-down approach to learning). With this in mind, I wanted to recommend several books that helped me land my first job in machine learning. To become an expert in machine learning, you first need to develop a strong foundation in three areas: coding, machine learning theory and maths. Géron does an excellent job with his concise and detailed introductions to a broad range of machine learning topics including supervised learning, unsupervised learning, deep learning and even reinforcement learning. In his short book Machine Learning Yearning, Andrew Ng passes on his wisdom on how to structure machine learning projects.
How to Make Artistic Images with Neural Style Transfer
       
This article explains Neural Style Transfer, which refers to the transfer of an image’s style while preserving the content of an image using the pre-trained VGG-19 model. Style Loss: Style Loss expresses the similarity of the correlations of layer activations between the style image and the generated image. Total Loss: Total Loss is calculated as style loss + content loss obtained as above. Style Transfer Algorithm, source. Figure 3 shows the style transfer algorithm (calculating losses) from the original paper. The following code block includes the implementation of Neural Style Transfer using Tensorflow in Python: Conclusion. The most enjoyable aspect of Neural Style Transfer is that, rather than uniformity of results, various visual results are obtained with hyperparameter tuning.
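The losses described above can be sketched in plain Python, without TensorFlow, to show the arithmetic only. Using the Gram matrix as the measure of correlations between activation channels, mean squared differences for both losses, and optional weights on each term are assumptions for illustration; the feature values are made up rather than taken from VGG-19.

```python
def gram_matrix(features):
    """Correlations between channels; `features` is a list of flat channels."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
            for ci in features]


def mean_squared_difference(a, b):
    """Mean squared difference between two equally shaped nested lists."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def content_loss(content_features, generated_features):
    return mean_squared_difference(content_features, generated_features)


def style_loss(style_features, generated_features):
    return mean_squared_difference(gram_matrix(style_features),
                                   gram_matrix(generated_features))


def total_loss(content_features, style_features, generated_features,
               content_weight=1.0, style_weight=1.0):
    """Style loss + content loss; the weights are a hypothetical extension."""
    return (content_weight * content_loss(content_features, generated_features)
            + style_weight * style_loss(style_features, generated_features))


# Made-up activations: two channels with two spatial positions each.
example = [[1.0, 2.0], [3.0, 4.0]]
print(gram_matrix(example))
print(total_loss(example, example, example))
```

Because the Gram matrix sums over spatial positions, the style term compares which channels fire together rather than where they fire, which is why style can transfer while the layout comes from the content image.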
Get Uncertainty Estimates in Neural Networks for Free
       
Get Uncertainty Estimates in Neural Networks for Free. Given the right loss function, a standard neural network can output uncertainty as well. Photo by Christina Deravedisian on Unsplash. Whenever we build a machine learning model, we usually design it in such a way that it outputs a single number as the prediction. However, if you feel like going into the realm of fully Bayesian neural networks at some point, try out libraries like Tensorflow Probability or Pyro for PyTorch. So, for a small recap, the following is the mean squared error (MSE) loss function: Image by the author. Let’s start with a simple example. All it takes is an additional output neuron and a loss function that is only slightly more complicated than the MSE.
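The excerpt does not show the loss itself, so the sketch below assumes a common choice for this setup: the network outputs a mean and a standard deviation per example, and the loss is the Gaussian negative log-likelihood. It is written in plain Python to show the formula; the function names and inputs are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def gaussian_nll(y_true, mu, sigma):
    """Average negative log-likelihood of y under N(mu, sigma^2).

    `mu` plays the role of the usual prediction; `sigma` is the extra
    output that expresses the model's uncertainty (must be positive).
    """
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)
    return total / len(y_true)


y = [1.0, 2.0]
prediction = [0.0, 0.0]
# With sigma fixed at 1, the loss is half the MSE plus a constant.
fixed_sigma_loss = gaussian_nll(y, prediction, [1.0, 1.0])
print(fixed_sigma_loss, 0.5 * mse(y, prediction) + 0.5 * math.log(2 * math.pi))
```

When sigma is held constant the loss ranks predictions exactly as the MSE does; letting the network predict sigma allows it to report larger uncertainty where its errors are larger.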
A Mushroom Farm in Every Closet?
       
A Mushroom Farm in Every Closet? A different kind of urban garden may be the key to feeding ourselves locally. “Oyster Mushroom (Pleurotus ostreatus)” by Martin Cooper Ipswich is marked with CC BY 2.0. To view the terms, visit https://creativecommons.org/licenses/by/2.0/?ref=openverse Let’s talk about mushrooms. I’ve been comparing their underground stealthiness to Russian democracy, describing how they grow out of sight, keeping their peace until the right moment. I’ve pointed out that you never do know when they’re going to burst into…
Is This a Crisis or Just Midlife?
       
Is This a Crisis or Just Midlife? Rethinking the notion of the midlife crisis. Photo by Ryan Moreno on Unsplash. Which comes first, midlife or the crisis? Is it that finding oneself in midlife causes the crisis or do a series of crises make us acknowledge our midlife status? What I find most shocking about being past the fifty threshold is not my age or how much of my life is…
Disinformation Is a Threat to Our Democracy
       
Disinformation Is a Threat to Our Democracy. Tech platforms need to recognize that their decisions have an impact on every aspect of society. Social media platforms have been similarly implicated in fanning ethnic violence in Ethiopia, far-right extremism in Europe. Number one, media companies, tech companies, social media platforms did not create the divisions in our society, here or in other parts of the world. So, the social media platforms called themselves neutral platforms with no editorial role in what their users saw. In Russia, Putin has weaponized ethnonationalism through disinformation, waging hate campaigns against domestic opponents, delegitimizing democracy itself.
Why the smartest people embrace being wrong
       
Why the smartest people embrace being wrong. A mental model used by Jeff Bezos, Adam Grant, Ben Franklin, Ralph Waldo Emerson, and other great thinkers. “Let me demonstrate why I’m right and you’re wrong.” The preacher. The preacher evangelizes their sacred beliefs to protect and promote their ideals. You’re wrong. He’s observed that the smartest people are constantly revising their understanding, reconsidering a problem they thought they’d already solved. Individuals with high levels of intellectual humility are less certain about their personal religious beliefs and less likely to judge others based on theirs.
Do You Still Need to Wear a Mask?
       
Do You Still Need to Wear a Mask? So if I’m up to date on my vaccines and not feeling sick, should I continue to mask up? The more people who wear masks when Covid is spreading, as it is once again starting to, the better protected we all are. Likewise, we need public health tools to protect us if a new variant or another health threat emerges. Remember: Even if you don’t think you’re at risk from Covid, people around you might be.
I Would’ve Been a Great Mother — and a Terrible One
       
This Is Us. I Would’ve Been a Great Mother — and a Terrible One. Looking back on what might have been, and what never was. Image by Leah Kelley via Pexels. I would have made a great mom. My heart aches for helpless creatures. Pass me a newborn. I will never put her down.
Michael Saylor is Secretly Selling His Bitcoin and Doesn’t Want You To Know.
       
Michael Saylor is Secretly Selling His Bitcoin and Doesn’t Want You To Know. Michael Saylor has been one of the largest proponents and shillers of Bitcoin after his company announced they would be adding BTC to their balance sheet in 2020. Since then, the Bitcoin community has rallied around him, providing him with millions of followers, and giving him a massive public spotlight.
Have You Ever Wanted to Know What Kind of Bird is Singing that Song?
       
Bennett Manor Farms. Have You Ever Wanted to Know What Kind of Bird is Singing that Song? Let The Cornell Lab of Ornithology tell you with their handy App. Photo by Author. Have you ever heard a bird singing and wondered what it was? The Cornell Lab of Ornithology and the Chemnitz University of Technology have teamed up and…
Google AI Blog: Hidden Interfaces for Ambient Computing
       
Illustration of how hidden interfaces can appear and disappear in everyday surfaces, such as a mirror or the wood paneling of a home appliance. Parallel Rendering: Boosting PMOLED Brightness for Ambient Computing. While many of today’s consumer devices employ active-matrix organic light-emitting diode (AMOLED) displays, their cost and manufacturing complexity is prohibitive for ambient computing. Rendering User Interfaces and Text. We show that hidden interfaces can be used to create dynamic and expressive interactions. Realizing Hidden Interfaces with Interactive Hardware. To implement proof-of-concept hidden interfaces, we use a PMOLED display with 128×96 resolution that has all row and column drivers routed to a connector for direct access. Finally, longitudinal deployment would enable us to go deeper into understanding user adoption and behavior with hidden interfaces.
Inside Meta's AI optimization platform for engineers across the company
       
To address these needs at Meta, we’ve built an end-to-end AI platform called Looper, with easy-to-use APIs for optimization, personalization, and feedback collection. Rather than rebuild our existing products around AI models, Looper enables us to upgrade them to use AI for personalized optimizations. The Looper platform currently hosts 700 AI models and generates 4 million AI outputs per second. The spectrum of AI expertise varied across product teams from beginners to experienced AI engineers, and only 15 percent of teams using the Looper platform include AI engineers. For teams without production AI experience, an easy-to-use AI platform is often the deciding factor for adoption, and AI investment continues upon evidence of utility.
Boost Performance of Text Classification tasks with Easy Data Augmentation
       
Text data augmentation for NLP tasks. Training on a small sample of data increases the chances of overfitting. But when it comes to NLP tasks, data augmentation of text data is not that easy. We will now discuss how each of the above-mentioned text augmentation techniques works under the hood and how it improves text classification tasks. Conclusion: In this article, we have discussed 4 text data augmentation techniques that boost the performance of text classification tasks performed on a small sample dataset. References: [1] Jason Wei & Kai Zou, Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks: https://arxiv.org/abs/1901.11196
Embedding Interactive Python Plots on the Web
       
A guide on how to use Plotly Chart Studio and Datapane to share Python plots on the web. Introduction: One of the most important steps in the Data Science pipeline is Data Visualization. Using Plotly, you can create interactive graphs and dashboards using programming languages such as Python, R, Julia and JavaScript. Some examples of plots created using Plotly Chart Studio are available at this link. Additionally, using Datapane, it is possible to share not just individual plots but also collections of plots and LaTeX/Markdown text. Conclusion: In this article, we explored Plotly Chart Studio and Datapane as two possible options to embed interactive charts on the web.
A Quick and Simple Introduction to Object-Oriented Programming with Python
       
From objects to behaviors with OOP using Python. How do you manage complex programs with a bunch of data to manipulate and a bunch of things to do? Object-Oriented Programming (OOP) is nothing more than a common programming paradigm, meaning a way to design software. Python Classes: To get into the topic of objects, properties, and behaviors, let’s use Python classes as our guide. In the context of Python classes, the self.name = name line is an example of an instance attribute, which refers to the specific attributes of a unique instantiation of that class. Let’s say I wanted my Person() class to relate somehow to the Motorcycle() class: how about we modify the turn_on() method of the Motorcycle() class so that it requires an instance of the Person() class?
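The relationship described above can be sketched as follows. This is a minimal illustration, not the article's own code: only Person, Motorcycle, turn_on() and self.name come from the summary; the model attribute, the running flag and the returned message are assumptions made for the example.

```python
class Person:
    """A person identified by a name."""

    def __init__(self, name):
        # Instance attribute: each Person object stores its own name.
        self.name = name


class Motorcycle:
    """A motorcycle that can only be turned on by a Person."""

    def __init__(self, model):
        self.model = model      # hypothetical attribute, for illustration only
        self.running = False    # hypothetical state flag

    def turn_on(self, rider):
        # Require an instance of Person, as the text proposes.
        if not isinstance(rider, Person):
            raise TypeError("turn_on() requires a Person instance")
        self.running = True
        return f"{rider.name} turned on the {self.model}"


bike = Motorcycle("Scrambler")
message = bike.turn_on(Person("Ana"))
print(message)
```

Passing the Person in as an argument keeps the two classes loosely coupled: Motorcycle only needs to know that its rider has a name.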
Exploring SageMaker Canvas
       
SageMaker Canvas completely removes the code aspect from ML, as you’ll see in this article. SageMaker Canvas Example: To set up SageMaker Canvas you need to create a SageMaker Domain. This application should launch within a few minutes and you’ll be able to see the UI. If we return to the SageMaker Canvas UI we can perform prediction either using Batch Inference or a single data point. I hope this article was a good primer for SageMaker Canvas.
itertools and functools : Two Python Lone Soldiers
       
One cool thing about Python is its support for functional programming, which states that processing steps are done through functions. In this article, I will talk about two functional programming modules that Python offers us to perform a variety of function-based computations: itertools and functools. Simple and effective:
>>> from itertools import cycle
>>> for el in cycle(range(10)):
...     print(el, end="")
...
0123456789012345 ...
This is in contrast to SQL’s GROUP BY, which groups similar data irrespective of the order. Let us work on a small example:
>>> from itertools import starmap
>>> v = starmap(sum, [[range(5)], [range(5, 10)], [range(10, 15)]])
>>> list(v)
[10, 35, 60]
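The contrast with SQL’s GROUP BY can be made concrete with itertools.groupby, which only merges consecutive items that share a key. The sketch below is an illustration under assumptions: the word list and the first_letter helper are made up for the example, and functools.reduce is included only to show one function from the second module.

```python
from functools import reduce
from itertools import groupby

words = ["apple", "avocado", "banana", "apricot", "blueberry"]


def first_letter(word):
    """Grouping key: the first character of the word."""
    return word[0]


# Without sorting, groupby starts a new group every time the key changes,
# so the letter "a" shows up in two separate groups.
unsorted_groups = [(key, list(group)) for key, group in groupby(words, key=first_letter)]

# Sorting by the same key first gives GROUP BY-like behaviour.
sorted_words = sorted(words, key=first_letter)
sorted_groups = [(key, list(group)) for key, group in groupby(sorted_words, key=first_letter)]

# functools.reduce folds a sequence into a single value, here the total length.
total_length = reduce(lambda acc, word: acc + len(word), words, 0)

print(unsorted_groups)
print(sorted_groups)
print(total_length)
```

Because sorted() is stable, items inside each group keep their original relative order.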
Best Practices for Visualizing Your Cluster Results
       
Proven techniques for cluster visualization and interpretation. You will learn best practices for analyzing and diagnosing your clustering output, visualizing your clusters properly with PaCMAP dimension reduction, and presenting your cluster’s characteristics. To create the following plots (figure 5) we will use the data-science-utils package, which can be installed with pip install data-science-utils. After running the code you should get the following plot (figure 6). The following code plots the differences per cluster for each feature.
Stop Using SMOTE to Treat Class Imbalance
       
There are many methods to treat class imbalance, with undersampling and SMOTE being the most popular. This hyperparameter was designed to control the trade-off between precision and recall, which, in other terms, is the same as handling class imbalance. Class weights have the best F1 score and similar precision. As you can observe, class weights perform better than other approaches to class imbalance. In a case study, we showed how carefully choosing class weights can result in the best modelling performance. We also demonstrated that this method maintains its superiority at different levels of class imbalance, from 25/75 all the way to 1/99.
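As a rough illustration of what choosing class weights can look like, the sketch below computes weights inversely proportional to class frequency, using the common "balanced" heuristic n_samples / (n_classes * class_count). This is an assumption for the example, not necessarily how the article tuned its weights, and the function name and the 99/1 label split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class, inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# A 99/1 imbalance, the most extreme level mentioned in the text.
labels = [0] * 99 + [1] * 1
weights = balanced_class_weights(labels)
print(weights)
```

The rare class receives a much larger weight, so each of its errors counts for more in a weighted loss; the resulting dictionary is the kind of value typically passed to a classifier's class-weight hyperparameter.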
Bayesian Customer Lifetime Values Modeling using PyMC3
       
Implementing BG-NBD, a probabilistic hierarchical model, using PyMC3 to analyze customer purchase behavior. Customer lifetime value (CLV) is the total worth of a customer to a company over the length of their relationship. Some remarks on the above blueprint: The Gamma distribution represents the distribution of transaction rate λ in the customer population. Here are some examples of the Truncated Normal. Gamma and Beta Priors: With these considerations, let's now construct our Gamma priors. We then select a specific combination of the Gamma parameters r and α that leads to a Gamma distribution with the desired mean of 4. The plot above displays a relatively desirable Gamma distribution with most of the λ found near 2.
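The prior-selection step above can be sketched numerically. The example assumes the shape/rate parameterization, in which a Gamma distribution with shape r and rate α has mean r/α and, for r ≥ 1, mode (r − 1)/α. The candidate pairs and helper names are illustrative and are not taken from the article.

```python
import random


def gamma_mean(r, alpha):
    """Mean of a Gamma distribution with shape r and rate alpha."""
    return r / alpha


def gamma_mode(r, alpha):
    """Mode of the same distribution (valid for r >= 1)."""
    return (r - 1) / alpha


# Hypothetical candidates: every pair has mean r / alpha == 4, but a different shape.
candidates = [(2.0, 0.5), (4.0, 1.0), (8.0, 2.0)]
for r, alpha in candidates:
    print(f"r={r}, alpha={alpha}: mean={gamma_mean(r, alpha)}, mode={gamma_mode(r, alpha)}")

# Sanity check by simulation. random.gammavariate takes (shape, scale),
# so the rate alpha is converted to scale = 1 / alpha.
rng = random.Random(0)
r, alpha = candidates[0]
samples = [rng.gammavariate(r, 1.0 / alpha) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 2))
```

Among these candidates, the pair r = 2, α = 0.5 keeps the mean at 4 while placing the mode at 2, which shows how a prior can have the desired mean even though most of its mass sits lower.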
Implementing ConvNext in PyTorch
       
Today we are going to implement the famous ConvNext in PyTorch, proposed in A ConvNet for the 2020s. Starting point: ResNet. As you know (if you don’t, I have an article about implementing ResNet in PyTorch), ResNet uses a residual BottleNeck block; this will be our starting point. ResNeXt-ify: ResNeXt employs grouped convolution for the 3x3 conv layer in the BottleNeck to reduce FLOPs. The authors removed the stride=2 and added a downsampling block before the three convs using a 2x2 stride=2 conv. Conclusions: In this article we have seen, step by step, all the changes the authors made to create ConvNext from ResNet.
Fool me twice, I’ll find a solution
       
I was craving time in nature and so I booked an Airbnb in Winston-Salem, North Carolina. This was one step in getting this wrong because the receipt email does reinforce the number of nights very clearly. If that’s the case, then other communication channels should reinforce the number of nights, like notifications and in-product. Expedia depends on the check-in and check-out dates to communicate length of stay during the booking process. 3-night stay. No length of stay communicated. Vrbo does a great job of communicating length of stay from the moment you start your booking process.
Embracing Possibilities: A key to contingency thinking
       
Just because something is possible doesn’t mean it will happen. Here is a story of how contingency thinking and embracing possibilities can result in something even more incredible than you thought was possible. A tiny plane should be my nightmare fuel: everything about it goes against how I define comfort. And that is the value of contingency thinking: it helps you define what is possible, so that you might guide that journey a bit. Keep your threads of thinking reasonable and actionable: contingency thinking has diminishing returns after a certain point, and results in rumination.
Yes, We Are Living in Weird Times, Which is Exactly Why We Should All Become Artists.
       
Self Portrait in Shapes (2022), aquatint etching. We are social beings living through weird, stressful, and isolated times, to say the least. Getting art education into more spaces, and especially non-traditional ones, can radically change how and why people become artists on and offline. Without my own art practice, first alone and then in hybrid digital art communities, I simply would not have survived. Lindsey Frances Jones is an artist and graduate student in art and art education at Teachers College, Columbia University.
What Do You Say to Someone Who Gets Covid Now?
       
Our views have changed a lot in the last two years. The first person I knew who had Covid-19 was an old friend who had been traveling in Spain in mid-March 2020 and, upon returning to New York City, came down with a nasty, debilitating fever. She was “as sick as I’ve ever been,” and my friend, who is rarely ill, was bed-ridden for several days. “It was really scary there for a few…
What They Don’t Tell You About Research
       
Overseas travel, mystery instructions, disappearing SD cards, and hidden elevators: is it espionage? Or just what it takes to research a troubled and turbulent past? I am a historian. That means I belong to a group of secret super-heroes responsible for upturning the soil on past…
The Trouble With Trigger Warnings
       
There’s a better way to protect the vulnerable. Trigger warnings, like so much in life, began with the best of intentions but quickly devolved into the absurd. First used in online discussions of sexual violence, they expanded exponentially and now include warnings about everything from racism to classism, as well as books such as The Great Gatsby and The Adventures of…
A New Way to Celebrate Earth Day
       
How taking care of your land might be enough today. Nikko tucked the watermelon seedling into the dirt and announced, “We’ll call it the People’s Garden.” I thought of all the people who pass our front lawn every day: the young woman with the straight back leading her Doberman pinscher around the block on a short leash, the…
The Devil, the Indigenous God and the Colonizer in American Place Names
       
How English linguistic colonialism stains the legacy of sacred native spaces. Devils Tower, Crook County, WY. The prevalence of names associated with Hell or the Devil in the natural wonders of the American landscape can come as a shock.
Is Crypto Just a Religion of Online Gambling?
       
It might be time to give up on the financial “revolution” of cryptocurrency. Last night, I was scrolling through Twitter in a hot bath after getting back from a long run when I saw the following quote:
Data salaries at FAANG companies in 2022
       
Salary is a sensitive topic. I’ve previously worked at Google and know many people at FAANG companies (Facebook, Apple, Amazon, Netflix, Google). Data salaries in the US, Europe and elsewhere: It’s no surprise that US tech and data salaries are high. The median US data salary for the companies I looked at is $187,000, compared to $108,000 in Europe and $87,000 in the rest of the world. Data salaries by seniority: If money is cold, it’s cold at the top.
When AI Sparks into Metaverse
       
But the real question is: how can various sectors expand their abilities with the blend of AI and the Metaverse? Incorporating artificial intelligence (AI) into the Metaverse for smarter business is unavoidable, and it will transform as well as revolutionize the business world. The Metaverse with AI allows students to walk through a virtual environment with life-like avatars instead of just staring at a computer screen and speaking through microphones. Today, most Metaverse games have a decentralized economic model, with developers and publishers using AI to take the gaming experience to the next level. In the fourth revolution, the metaverse, virtual reality, and augmented reality (AR/VR) will play a significant role.
Cross-Validation Types and When to Use It
       
A better way to test your models. Overview: Building machine learning models is a great process that includes several steps: collection of data; data preparation and preprocessing; exploratory data analysis; feature engineering and selection; model building and evaluation.
Backtesting Machine Learning Models the Uber Way
       
The architecture is used to regularly back-test hundreds of forecasting models at Uber. Backtesting is an incredibly important aspect of the lifecycle of machine learning models. The relevance of backtesting scales exponentially with the number of machine learning models used in a given environment. Recently, Uber unveiled a new service completely built from the ground up to backtest machine learning models at scale. Uber’s Backtesting Service: Over the years, Uber has built different proprietary technologies that help to simplify the lifecycle management of machine learning models.
Using AI to Analyze Speech
       
Applying NLP to analyze a podcast and making a web app. Natural language processing models have become increasingly powerful in the past few years. In this article, I analyze a complex speech using NLP to make this little web app! Categories of video using AssemblyAI: The video is about self-improvement, but the tool classifies it as bodybuilding or weight lifting. Conclusions: In this article, I explain how AI can be applied to analyze speech. It takes a clip from a podcast, breaks it down into chapters, summarizes the chapters, and even provides sentiment analysis of the speech.
How Does Google Generate Summaries?
       
A new model for automatically generating summaries using machine learning, released in Google Docs that you can already use! Indeed, Google recently announced a new model for automatically generating summaries using machine learning, released in Google Docs that you can already use. A blue summary icon appears in the top left corner when a document summary suggestion is available. They trained their model to replicate our thought process for generating summaries using way too many documents with manually-generated summaries.
ICLR 2022 — A Selection of 10 Papers You Shouldn’t Miss
       
Hopefully the last big virtual-only AI conference of the year? This is yet another weakness of using uncurated data for training models that should be considered when developing and deploying models. Authors’ TL;DR → We introduce a language model that implicitly plans via a latent stochastic process. This work proposes to model language at the coarser level of sentences as a stochastic process that guides the LM generation to be globally coherent. Other relevant works on language models at ICLR are (FLAN) Fine-tuned Language Models are Zero-Shot Learners, Multitask Prompted Training Enables Zero-Shot Task Generalization, Charformer: Fast Character Transformers via Gradient-based Subword Tokenization, GreaseLM: Graph REASoning Enhanced Language Models, HTLM: Hyper-Text Pre-Training and Prompting of Language Models or Fine-Tuning Distorts Pretrained Features and Underperforms Out-of-Distribution.
A First Course on Deploying Python Projects
       
But as your Python project gets larger, it is not as simple as sending your friend a small script. A module in Python is usually a folder of Python scripts, usually with a clear entry point. It is just a text file, usually placed in a directory with a Python module or some Python scripts. To activate a virtual environment, we execute the activation shell script with the following command (e.g., under bash or zsh in Linux and macOS):
$ source myproject/bin/activate
Afterwards, you’re under the Python virtual environment. The python command will be the one you used to create the virtual environment (in case you have multiple Python versions installed in your OS).
Google AI Blog: FormNet: Beyond Sequential Modeling for Form-Based Document Understanding
       
These unique challenges in form document structural modeling have been largely underexplored in literature. An illustration of the form document information extraction task using an example from the FUNSD dataset. In “FormNet: Structural Encoding Beyond Sequential Modeling in Form Document Information Extraction”, presented at ACL 2022, we propose a structure-aware sequence model, called FormNet, to mitigate the sub-optimal serialization of forms for document information extraction. The global tokens attend to and are attended by all tokens, but the long tokens attend only locally to other long tokens within a specified local radius, reducing the complexity so that it is more manageable for long sequences. Unlike the ETC model, the FormNet model makes tokens attend to other tokens within the same visual blocks, along with tokens aligned horizontally, thus strongly leveraging structural cues.
Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
       
A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. These algorithms learn conditional policies by conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et al., 2021), or language descriptions of the task (Lynch and Sermanet, 2021). The video above shows the complex behavior we learn using just supervised learning with a depth-two MLP – no TD learning, data reweighting, or Transformers! While much prior work (Kumar et al., 2019; Ghosh et al., 2021; and Chen et al., 2021) shares the same core algorithm, it lacks a common name.
Introduction to GraphSAGE in Python
       
Scaling Graph Neural Networks to billions of connections. What do UberEats and Pinterest have in common? GraphSAGE in theory: The GraphSAGE algorithm can be divided into two steps: neighbor sampling and aggregation. The sampler looks at the list of neighbors, of neighbors of neighbors, etc. GraphSAGE in PyTorch Geometric: We can easily implement a GraphSAGE architecture in PyTorch Geometric with the SAGEConv layer. With GraphSAGE, we loop through batches (our 4 subgraphs) created by the neighbor sampling process.
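The two steps named above, neighbor sampling and aggregation, can be sketched in plain Python. This is a simplified illustration under assumptions: the toy graph, the feature vectors and both function names are made up, and the learned weight matrix and nonlinearity of a real GraphSAGE layer are deliberately left out.

```python
import random


def sample_neighbors(graph, node, k, rng):
    """Step 1: keep at most k randomly chosen neighbors of a node."""
    neighbors = graph[node]
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def mean_aggregate(features, node, sampled):
    """Step 2: average the node's features with those of its sampled neighbors."""
    vectors = [features[node]] + [features[n] for n in sampled]
    dim = len(features[node])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Toy graph as an adjacency list, with a 2-dimensional feature per node.
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [4.0, 0.0]}

rng = random.Random(42)
sampled_for_0 = sample_neighbors(graph, 0, k=2, rng=rng)
embedding_0 = mean_aggregate(features, 0, sampled_for_0)
embedding_1 = mean_aggregate(features, 1, sample_neighbors(graph, 1, k=2, rng=rng))
print(sampled_for_0, embedding_0, embedding_1)
```

Capping the number of sampled neighbors at k is what keeps the cost per node bounded, regardless of how many connections a node has.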
A Physicist’s View: The Thermodynamics of Machine Learning
       
Complex systems are ubiquitous in nature, and physicists have found great success using thermodynamics to study these systems. Well, ML models are typically constructed from layers of simple mathematical operations: multiplications, additions, or basic logical operations. Similar to machine learning models, dynamical systems are made up of a large number of simple physical interactions. However, remember that the training data (and testing data) is always just a subset of the full data. We cannot really understand ML models using just a handful of metrics.
Machine Learning in 5 Minutes
       
The emerging tech that has the media buzzing. Machine learning. In order to learn the values of the weights that give a machine learning algorithm its unique connection-based value, a neural network needs data. MNIST dataset: The machine learning algorithm takes this massive dataset and learns patterns it can use to determine how to get to that output. Training data allows the computer to look at thousands of examples to perfect its technique. The real magic of a neural network happens after the prediction is calculated, contrary to what you may believe.
IBM Data Science Certification Review
       
The IBM Data Science Certification (Specialization) Course on Coursera provides a great introduction to Data Science for those wishing to expand their analytical skills. While completing a Masters or PhD in Data Science is an admirable undertaking, applying for and accomplishing a Data Science Certification is another outstanding option that will not force you to break the bank while honing your data science skills. While completing my Master of Science (MS) in Operations Research, I decided to enroll in the IBM Data Science Certification Course offered on Coursera to use those skills in my educational assignments. During week 2, the student learns about some of the main areas of data science such as Big Data, Data Mining, Deep Learning, and Machine Learning. Conclusion: Whether you plan to cross over into the field of Data Science from another labor sector, or just want to learn new skills for conducting a more in-depth analysis, I highly recommend completing the IBM Data Science Specialization Certification for skill development.
How to Develop and Test Your Google Cloud Function Locally
       
So, you have written your serverless cloud function, but don’t want to waste time deploying it and hoping it works. Let me show you how to develop and test your cloud functions locally. Quick Intro: Cloud Functions. If you are reading this article, I am sure you already know what Google Cloud Functions is. For a conda virtual environment, use conda create --name YOUR_ENV_NAME python=3.8 to create a virtual environment with Python version 3.8. You now know everything you need to develop and test your Google Cloud Function locally.
Algorithmic Behavior Modification by Big Tech is Crippling Academic Data Science Research
       
How major platforms are using persuasive tech to manipulate our behavior and increasingly stifle socially-meaningful academic data science research. A diverse community of data science academics does applied and methodological research using behavioral big data (BBD). While a lack of access to human behavior data is a serious concern, the lack of data on machine behavior is increasingly a barrier to progress in data science research as well. We define algorithmic BMOD as any algorithmic action, manipulation or intervention on digital platforms intended to impact user behavior. Less reproducible research: Research using BMOD data by platform researchers or with academic collaborators cannot be reproduced by the scientific community. Corporate scrutiny of research findings: Platform research boards may prevent publication of research critical of platform and shareholder interests.
Demystifying Modules and Packages in Python
       
Modules and packages are the core of any large project. How about demystifying some common reflexes of handling packages and modules? Preprocessor holds all sorts of preprocessor modules with Pandas, NumPy and Scikit-Learn. Suppose you actually add them. Watch closely what happens when you try to merge the modules’ directories:
>>> import sys
>>> sys.path.extend(['backend_connectors','custom_functions'])
>>> from modules import cloud_connectors, optimizers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'optimizers' from 'modules' (path_to_project_directory/superlibrary/utils/backend_connectors/modules/__init__.py)
...
As expected, Python is unable to load your functions when you treat ordinary packages as namespace packages. There is so much to learn about how Python handles packages and modules.
The More You Write, the Better You Are at Explaining Your Work
       
So, for me, DSSG (Data Science for Social Good) projects are some of the most interesting ones to read. With a very busy work life, why did you decide to start writing publicly about data-related topics? Does your work at the Turing Institute inform the kind of posts you write for a broader audience on TDS? And what advice would you give to someone who wants to write about their work, but isn’t sure where to start? Soon you’ll realize: the more you write, the better you are at explaining your work.
Transformers: The bigger, the better?
       
This latest model from Google, called Pathways Language Model (PaLM), outperforms all existing ones so far. Figure 1: Trend of the state-of-the-art main large language model sizes with time (image by author). So, large language models based on Transformers require a huge amount of computational resources to be trained. Figure 2: Training compute in FLOPs (exponential axis) of the main large language model (image by author).
Dynamic Siting Posture Recognition and Correction
       
To identify people’s sitting posture in real time and rebuild awareness of the muscle and spatial position for back pain sufferers. Sitting posture skeleton recognition by OpenPose. 1 Abstract: Lower back pain (LBP) recently became a severe and common problem for most office workers. The aim of this project is to develop a novel way of designing a dynamic sitting posture recognition and correction system. 3 Algorithm Structure: The model is built on recognising human-object interaction, human head pose, human body orientation, and human pose. If only one spine-leg angle is detected, the final spine-leg angle is defined by this angle. If both of the angles are detected, the final spine-leg angle θ is defined as the mean of the two angles:
Use the previous spine-leg angle
if left spine-leg angle and right spine-leg angle both exist then
    spine-leg angle ← the mean of (left spine-leg angle and right spine-leg angle)
else
    if left spine-leg angle > right spine-leg angle then
        Use the measurements of left spine-leg angle
    else
        Use the measurements of right spine-leg angle
    end if
end if
This procedure contains two phases: Phase 1: Spine-leg angle model learning 1.
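The angle-selection rule described in the prose above can be sketched as a small function. This follows the prose reading (one angle detected: use it; both detected: use their mean) and assumes that a missing detection is represented by None and that the previous angle is the fallback when nothing is detected; the function and parameter names are hypothetical.

```python
def final_spine_leg_angle(left, right, previous):
    """Pick the spine-leg angle for the current frame, in degrees.

    left, right: detected angles, or None when a side was not detected (assumption).
    previous: the angle from the previous frame, used as a fallback (assumption).
    """
    if left is not None and right is not None:
        # Both sides detected: use the mean of the two angles.
        return (left + right) / 2
    if left is not None:
        return left
    if right is not None:
        return right
    # Nothing detected in this frame: keep the previous estimate.
    return previous


print(final_spine_leg_angle(100.0, 110.0, previous=95.0))
print(final_spine_leg_angle(None, 110.0, previous=95.0))
print(final_spine_leg_angle(None, None, previous=95.0))
```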
Why The UK Is Exporting Refugees To Rwanda
       
Although more complex than the media is suggesting, it’s still not an adequate solution. In the past week, the UK government has incited a firestorm of criticism for a new scheme in which some asylum seekers will be exported to Rwanda.
Why a New True Crime Podcast Contains No Crime At All
       
“Tiffany Dover is Dead*” is a gripping forensic investigation into how conspiracy theories spread. One day in April of 2020 it hit me: “Where did John go?” About a month into the pandemic, I’d realized that one of my friends on Instagram just suddenly stopped posting.
Why Are We Erasing Trans Elders?
       
It’s time we discuss the generations that we let fall through the cracks. Recently, a social media post made the rounds from an academic looking to do research on the relationships between trans youth and “Elders.” The criteria as implied in the graphic post asked for trans youth who are between the ages of 18–25 and have been out for less than three years, as well as trans Elders; Those in their mid to…
What Do You Have To Teach?
       
The act of teaching, the art of teaching, could not be more crucial for the maintenance of our culture. A few years ago, not long before the pandemic began, I made the decision to stop teaching and become a full-time writer. I was a little hesitant to give up my affiliation with the university and its…
Melissa Lucio Is Not Alone
       
A jury did believe it, and sentenced Melissa Lucio to death. Melissa Lucio was vulnerable, too. And even if they do, merely refraining from killing Melissa Lucio will not be justice. This past weekend, I met several new exonerees who had survived the same devastating trauma Melissa Lucio is still going through. I wish I could do the same for Melissa Lucio.
Don’t Confuse A Moment With A Movement
       
On September 17th, 2011, protesters began to flood into Zuccotti Park in lower Manhattan. Declaring “We are the 99%,” they planned to #Occupy Wall Street for as long as it took to make their voice heard. Similar protests soon spread like wildfire across 951 cities in 82 countries. It seemed to be a massive global movement of historic proportions.
When the canary stops singing…
       
At Amazon, there was an entirely different thing we called “canary.” A canary was a test continuously running against a production environment, validating that a critical user scenario was still operational. That code needed to execute forever, at some desired throughput (such as once per minute). I never envisioned it to be a canary execution platform, but once you thought abstractly about what functionality you needed in a canary platform, it was surprisingly similar. I needed a canary to run forever, whereas the load test platform had been optimized to run for a finite amount of time. When you were trying to run a load test at a million TPS, plus or minus a few was ok.
I Didn’t Make Good Choices, I Had Good Choices
       
The sheer, stupid luck of growing up in the right family. There is a book, which I haven’t read, called Little Fires Everywhere by Celeste Ng. A miniseries on Hulu, which I haven’t watched, is based on the book. It deals with a lot of heavy topics (race, class, and people from different circumstances), although I cannot reasonably speak to the…
I Did Not Have Children to Bestow Unto Them a Dying Planet
       
Why I actively seek climate solutions. Kai at the Great Barrier Reef in 2010. Václav Havel says hope is the “ability to work on something because it is good, not because it has a chance to succeed.” When I took three-year-old Kai to the Great Barrier Reef in 2010, I didn’t know I had brought him to his first graveyard. He won’t remember us standing on…
Disinformation and Democracy Reading List
       
Below is some of what I’ve read that offers useful context, solutions we can learn from, and interesting perspectives. Tomorrow, I’m heading to Stanford to deliver a speech about changes in the way we create and consume information, and the very real threat it poses to democracy. This report from the Aspen Institute offers an in-depth investigation into the chain reaction of harm caused by bad information. How to Stop Misinformation Before It Gets Shared, via Wired. A look back at the events over the past ten years to help us understand how we got here. This op-ed offers ideas on how we can become a more “disinformation resistant public.” Fighting Disinformation Can Feel Like a Lost Cause.
A Clash of Two Systems
       
The war in Ukraine is a confrontation between two systems, one modern, legalistic, decentralized and multicephalous; the other archaic, nationalistic, centralized and monocephalous. (This is a copyright-compatible version of my side of a conversation with Laetitia Strauch-Bonart published in the French periodical l’Express.) This war not only pits Ukraine and Russia against each other, it is a confrontation between two systems, one modern, decentralized and multicephalous, the other archaic, centralized and autocephalous. Putin and the “realists” are in the wrong century; they do not think in terms of systems or in terms of individuals. This model tends to “antifragility,” a concept present in my books that refers to a property of systems that strengthen when exposed to stressors, shocks or volatility. This also confirms, if it were still needed, the absurdity of Samuel Huntington’s ideas in The Clash of Civilizations.
When AI Sparks into Metaverse
       
But the real question is, how can various sectors expand their abilities with the blend of AI and the Metaverse? Incorporating artificial intelligence (AI) into the Metaverse for smarter business is unavoidable, and it will transform the business world. The Metaverse with AI allows students to walk through a virtual environment with life-like avatars instead of just staring at a computer screen and speaking through microphones. Today, most Metaverse games have a decentralized economic model, with developers and publishers using AI to take the gaming experience to the next level. In the fourth revolution, the Metaverse, virtual reality (VR), and augmented reality (AR) will play a significant role.
Customer Segmentation as A Strategy
       
The simplest explanation is that customer segmentation operates at the strategy level, while segment-of-one (or personalization) operates at the tactical level. These are some marketing examples; however, the use cases of customer segmentation extend beyond the marketing department to logistics, strategy, and others. The ultimate goal of customer segmentation within an organization is to have a strategic level of understanding of its customer base. In the data-driven approach to customer segmentation, the segmentation method relies mainly on the data. Characteristics of good customer segmentation: Once you have a segmentation, what are the characteristics or criteria of a good one?
How to Create and Use Multi-Index DataFrame to Scale Up Your Data Analysis
       
Details of Multi-Index DataFrame: Create, Slice and Index, and Analyze. In most DataFrames, we see one index that works as a row identifier. To really improve and scale up your data analysis skills, it is important to learn multi-index DataFrames well. In this article we will see how to create a multi-index DataFrame and how to use a multi-index DataFrame for efficient data analysis. How to create a multi-index DataFrame? To create a multi-index DataFrame, I will first create the indices.
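A minimal sketch of "create the indices first, then the DataFrame" might look like the following. The level names (`year`, `quarter`) and the `sales` values are invented for illustration and are not taken from the article.

```python
import pandas as pd

# Build the two-level index first (illustrative labels, not from the article).
index = pd.MultiIndex.from_product(
    [["2021", "2022"], ["Q1", "Q2"]], names=["year", "quarter"]
)
df = pd.DataFrame({"sales": [10, 20, 30, 40]}, index=index)

# Slice on the outer level: all quarters of 2022.
sales_2022 = df.loc["2022"]

# Cross-section on the inner level: Q1 across all years.
q1_all_years = df.xs("Q1", level="quarter")

# Aggregate per outer-level value.
yearly_total = df.groupby(level="year")["sales"].sum()
```

`loc` selects on the outer level, `xs` reaches an inner level by name, and `groupby(level=...)` aggregates along either one.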
10 seats remaining | A series of live ML strategy workshops
       
Building successful machine learning products requires mastering ML Strategy, including problem formulation, evaluation, and tactics for dealing with stakeholders and project uncertainties. Many ML projects fail not for technical reasons, but because of a poor understanding of the difference between doing ML well technically and actually realizing business impact with an ML solution. Professor Foster Provost, a leading ML practitioner, entrepreneur, and scholar, is hosting a live interactive course that will help you to master these skills. The course consists of five live online sessions that dive into actionable frameworks and industry case studies. The course is also accredited by CPD, meaning you can expense the full cost to your employee learning budget.
Google AI Blog: Learning to Prompt for Continual Learning
       
In “Learning to Prompt for Continual Learning”, presented at CVPR 2022, we attempt to answer these questions. Drawing inspiration from prompting techniques in natural language processing, we propose a novel continual learning framework called Learning to Prompt (L2P). L2P is applicable to various challenging continual learning settings and outperforms previous state-of-the-art methods consistently on all benchmarks. In the continual learning scenario, L2P maintains a learnable prompt pool, where prompts can be flexibly grouped as subsets to work jointly. Further, it can handle various complex continual learning scenarios, including the challenging task-agnostic setting.
Accelerating renewable energy with new data set for green hydrogen fuel
       
As part of this project, we’ve already made progress by open-sourcing OC20, the world’s largest training data set of materials for renewable energy storage. The OER data set contains ~8M data points from 40K unique simulations. The data set and baseline models will be open-sourced in the coming months to help the global scientific community advance renewable energy technologies. Why it matters: Scalable solutions to renewable energy storage are essential to addressing the world’s rising energy needs while slowing climate change. Improved catalysts for OER will advance several renewable energy technologies, such as solar and wind fuel production, as well as rechargeable metal-air batteries, a renewable energy storage device that is useful for electric cars.
Integrate ServiceNow with Amazon Lex chatbot for ticket processing
       
In this post, we show you how to integrate an Amazon Lex chatbot with ServiceNow. Amazon Lex invokes fulfillment Lambda function: Amazon Lex sends the event to the fulfillment AWS Lambda function. Fulfillment Lambda function returns the response to Amazon Lex bot based on Sentiment. Create the Amazon Lex chatbot: Now that you have created the Lambda function, you create the conversational interface (the chatbot) using Amazon Lex. Conclusion: This post showed how you can integrate Amazon Lex bot with ServiceNow incident management and a Slack app.
Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra
       
We’re excited to announce that you can now use the Amazon Kendra connector for Quip to search messages and documents in your Quip repository. For our solution, we demonstrate how to configure a Quip repository as a data source of a search index using the Amazon Kendra connector for Quip. Prerequisites: To get started using the Quip connector for Amazon Kendra, you must have a Quip repository. Create an Amazon Kendra index: To set up your Amazon Kendra index, complete the following steps: Sign in to the AWS Management Console and open the Amazon Kendra console. Conclusion: The Amazon Kendra connector for Quip enables organizations to make the invaluable information stored in Quip documents available to their users securely using intelligent search powered by Amazon Kendra.
The Easiest Way to Deploy Your ML/DL Models in 2022: Streamlit + BentoML + DagsHub
       
Deploy models as lightweight APIs with a user-friendly interface. Introduction: You have a ready machine learning model. Since we build the Streamlit UI on top of an API, the web app will be even more lightweight. You won't have dependency issues as you will only need the requests library to process requests to the BentoML API through the Streamlit app. Storage: has dedicated storage for data & model storage managed by DVC. Experiment tracking: has support for MLflow Tracking and Git Tracking. Next, we create a function to save the model to BentoML local store: The keras.save function saves Keras models in a format suitable for other BentoML operations.
Kiss Your Bias Goodbye: Is the Fundamental Theory of Supervised Learning Incomplete?
       
Deep Learning’s Apparent Violation of the Bias-Variance Tradeoff Sparks Questions for All Predictive Modeling. The bias-variance tradeoff is a touchstone for all supervised learning. When I first saw a deep learning study that showed a strange bias-variance tradeoff, I immediately dismissed it. The bias-variance tradeoff is a central tenet of supervised learning, but all these results appeared to violate it. Or, to speak for myself, my previous understanding of the bias-variance tradeoff was incomplete.
Neural Feature Importance
       
Feature Importance based on ANN Weights (plus Hands-On Code). What’s in it for You? Let’s get started. ANN: The above image represents an ANN with a single hidden layer with a single neuron in that hidden layer. Neural Feature Importance is all about deriving the feature importance from the weights of the neuron(s), so let’s dive into the matrix multiplication form of the above super simple neural network to see the actual weights. print([coef.shape for coef in model.coefs_]) # Output: [(835, 512), (512, 2)] As you can see there are 2 elements in there, why? Feature Importance Function: Finally, we call our functions related to computing feature importance and convert the result into a dictionary.
Stop Hardcoding Values in Python Apps — Use ConfigParser Instead
       
Hardcoded values are a terrible idea. Use configparser to read .ini configuration files instead. Hardcoding configuration values in Python apps is fun and games until something breaks. Up next, you’ll see how to install and use the configparser Python library with .ini configuration files in Python. How to Install ConfigParser Python Library: You’ll need the configparser module to work with .ini configuration files. Summary of .ini Configuration Files with ConfigParser in Python: Hardcoding values in Python apps is always a terrible idea.
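As a rough illustration of reading .ini values with `configparser`, the sketch below parses an in-memory string so it is self-contained. The section and key names (`database`, `host`, `port`, `debug`) are made up for the example and do not come from the article.

```python
import configparser

# Hypothetical .ini content; in a real app this would live in a file.
INI_TEXT = """
[database]
host = localhost
port = 5432
debug = true
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)  # use config.read("config.ini") for a file on disk

host = config["database"]["host"]               # values are strings by default
port = config.getint("database", "port")        # typed accessor for integers
debug = config.getboolean("database", "debug")  # accepts true/false, yes/no, on/off, 1/0
```

The typed accessors (`getint`, `getboolean`) avoid hand-written string conversion at every call site.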
4 Tips for Using Python Pandas More Efficiently
       
A simple yet practical guide. Pandas is an extremely practical and functional library for data analysis and manipulation tasks. I have been using Pandas since 2019 and it has always been able to provide a solution for my tasks. What I realized after using Pandas for about 3 years is that I wasn’t using it very efficiently at the beginning. Filling missing values by using other columns: Real-life datasets usually contain missing values which cannot always be ignored. 4 Python Pandas Functions That Serve Better With Dictionaries: Make more use of Pandas.
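One common way to fill missing values from another column is to pass a Series to `fillna`, which aligns on the index. The column names and numbers below are hypothetical, and this is not necessarily the exact approach the article takes.

```python
import pandas as pd

# Hypothetical data: list_price has gaps, fallback_price is always present.
df = pd.DataFrame({
    "list_price": [100.0, None, 250.0, None],
    "fallback_price": [95.0, 120.0, 240.0, 80.0],
})

# Passing a Series to fillna fills each missing row with the value
# from the same row of the other column.
df["list_price"] = df["list_price"].fillna(df["fallback_price"])
```

Rows that already had a value keep it; only the gaps take the fallback.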
4 Methods to Power Feature Engineering for Your Next ML Model
       
Plus, tips for dealing with categoric, numeric, and mixed data. What is Feature Selection? The four methods are: the Chi-Squared Test for categorical data, Pearson’s Correlation Coefficient for numeric data, Principal Component Analysis for numeric data, and Feature Importance with Random Forests for both categorical and numeric data. Let’s get started! You apply the Chi-Squared test when both your feature data and your target data are categorical, e.g., in classification problems. Note how the above functions only fit the encoders to the training data and transform both train and test data. Feature Importance from Random Forests: Finally, my preferred method for feature selection is to utilize Random Forest and its ability to calculate Feature Importance.
On Writing Clean Data Pipelines
       
Software Engineering for Data Science. Best practices for organizing your data analysis logic. When helping members of our community with architectural decisions regarding their data pipelines, a recurring question is how to group the analysis logic into tasks. This blog post summarizes the advice we’ve given to our community members for writing clean data pipelines. Before diving into the details, let’s get on the same page and clearly define the concept of data pipeline. Writing Clean Data Pipelines: The pipeline and task concepts are simple, but it might be hard to decide what constitutes a task when applying the idea to a real-world problem. If you think your use case does not fit into these definitions, ping us, and we’ll happily help you design a clean data pipeline.
5 Data Distributions for Data Scientists
       
Data distributions help us understand our random variables a bit better. Normal Distribution (Continuous Distribution): Arguably, the most famous data distribution is the normal one. Let’s plot 10,000 samples taken from a normal distribution; we change this by changing the size argument in the plot: np.random.seed(42) (sns.kdeplot(np.random.normal(loc = 100, scale = 3, size = 10000)).set(title='Density Plot of Random Normal Distribution')) Random “Normal Distribution” generated. Cool! If you want to explore the math behind the normal distribution, the Normal Distribution Wikipedia page is very good. Bernoulli Distribution (Discrete Distribution): The Bernoulli Distribution is one of the simplest.
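To make the Bernoulli case concrete, here is a small standard-library sketch that draws 0/1 outcomes with success probability `p`. The function name `bernoulli_samples`, the seed, and `p = 0.3` are choices made for this illustration only.

```python
import random


def bernoulli_samples(p, n, seed=42):
    """Draw n Bernoulli(p) outcomes: 1 with probability p, else 0."""
    rng = random.Random(seed)  # local generator so the global state is untouched
    return [1 if rng.random() < p else 0 for _ in range(n)]


samples = bernoulli_samples(p=0.3, n=10_000)

# The sample mean of Bernoulli draws estimates p.
sample_mean = sum(samples) / len(samples)
```

With this many draws the sample mean should land close to `p`, which is the only parameter the distribution has.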
ML & Neuroscience: March 2022 must-reads
       
This month: PDE and deep learning to build up the brain’s connectome, a brain-computer interface for hand prostheses, and finally a new look at Hidden Markov Models for understanding brain dynamics. In this series, I will cover 3 main papers under review on arxiv.org which deal with machine learning and neuroscience. In particular, I will cover the following aspects: Can ML research help neuroscience in getting a deeper understanding of the brain’s dynamics and activities? How can neuroscience help enhance ML with new biologically inspired models? How can ML and models give us new clinical neuroscience, with new imaging and signal techniques?
I donated both money and new socks to help the Ukrainian refugees, although I know I shouldn’t have
       
Here is what humanitarian aid organizations should learn about the desire of small donors to help. The humanitarian crisis in Ukraine is heartbreaking. Many humanitarian aid organizations are trying to help and support the refugees. I wanted to do my part and donate money via The UN Refugee Agency. I say seemingly because despite how it feels, it is undeniable that small donations add up to a huge amount that can make a critical change. Organizations should cultivate their outreach to small donors during crises and help them feel a part of the community of aid givers.
Are Easter and Christmas “Pagan” Holidays?
       
Digging in to a Common Idea about how Religions Interact. Every year, especially around Easter and Christmas, a wide variety of memes, social media posts, and cheeky discussions at holiday gatherings assert one simple fact over and over: that Christmas and Easter, the two most important Christian holidays, are not really Christian at all, but repackaged…
Five Koans of Software Architecture
       
Random advice I find myself repeating a lot… “Those who should decide on the architecture are those that will be on call for it.” Software architecture is fun. So much so that there’s never any shortage of smart people eager to jump in with their opinions. In my various engineering leadership roles throughout my career so…
Why is modern software so bad?
       
No chance for a day-one patch or to ship a sub-optimal experience with the promise of making it better down the line. This skyrockets that day-one patch to an eye-watering four and a half hours to download! “Sure, we’ll send out the disks, but we will resolve the issues in the day-one patch; we don’t even need to finalise it until the day before release.” But what happens when that day-one patch slips? You can run Windows 11 on a 15-year-old Pentium 4 processor, with the ability to run software written for Windows 98!
I Would’ve Been a Great Mother — and a Terrible One
       
Looking back on what might have been, and what never was. I would have made a great mom. My heart aches for helpless creatures. Pass me a newborn. I will never put her down. I know from experience that I can hold a screaming, flailing baby all night long and manage to hide my frazzled…
Did You Enjoy Spending Time, Money to Give the IRS Info It Already Had?
       
Imagine how much stress and hassle you’d avoid if you were able to log into an IRS website and see what information the tax agency already has about your income.
How Ethical Dilemmas Overwhelm Your Brain
       
Everyday life has become so loaded with moral considerations that your well-being may suffer for it. A package of cheese stopped me cold in the grocery store last week. I was buying ingredients for dinner and I needed some cheddar. Cheddar isn’t popular in…
Shonda Rimes: Angry Black Woman
       
Why do you think I need her so much? My mother is the darkest-skinned member of my extended family. She refuses to leave the house without curling her hair into waves more palatable for the white-washed suburbs in which I was raised. She piles on layers of makeup as though it will defend her from my father’s casual emotional abuse and psychological…
Ron DeSantis’s Extremism is Setting Up His Presidential Run
       
The Florida governor is concocting a successful political recipe that can take him and the GOP’s hateful vision back to the White House. If you want a sneak preview of how the GOP will use the culture wars to establish minority rule for their radicalized and weaponized base…
Why I Don’t Care if My Students Respect Me
       
Respect your elders. Respect your teachers. My understanding of respect was, and continues to be, that it involves a feeling of esteem or admiration for a person, a feeling that could not develop on command. This proved especially true when it came to students who needed higher levels of support. Some of my students will like me, and some may even feel that they respect me.
The rise and fall of crypto culture
       
Crypto is dead. I have long been vocal about my disdain for crypto culture and my love for crypto ethos. That might sound weird, but crypto ethos is concepts like self-sovereign rights, self-custody, and self-empowerment. Crypto culture is concepts like wealth, entitlement, enrichment, and ego. Crypto culture has strangled crypto ethos.
Serving Python Machine Learning Models With Ease
       
Ever trained a new model and just wanted to use it through an API straight away? Training the Scikit-learn Model: First up, we’re going to train a support vector machine (SVM) model using the scikit-learn framework. It is heavily coupled to the machine learning framework used to train your model. In our case, we trained the model using scikit-learn so we're going to use the scikit-learn implementation for MLServer. That’s it, two small config files and we’re ready to serve our model using the command: mlserver start .
Top 3 Data Science Courses Every Data Scientist Should Consider
       
Online courses for data scientists. As someone interested in data science, you might be wondering what the top online courses are to break into the data science field, or, if you are already in the industry, you might be looking for courses to complement your knowledge and expertise. These are online courses that cover material that is essential for approaching data science challenges. Machine Learning by Andrew Ng: One of the students’ all-time favourites is the Machine Learning course by Andrew Ng. Anyone who aspires to become a true expert in the data science field should know about deep learning approaches at a deeper level. AWS or GCP or MSFT Azure Data Science Cloud specialisations: Being a data scientist is more than having knowledge about machine learning models.
Guide to Iteratively Tuning GNNs
       
Analyze mini-batch vs full graph training behavior across HPO iterations. We define quality by running an unconstrained performance tuning loop, and use the results to set thresholds in a constrained tuning loop that optimizes for training efficiency. For GraphSAGE and RGCN we implemented both a mini batch and a full graph approach. This phase finds the best performance by tuning GraphSAGE and RGCN. [Figures: RGCN Mini Batch Tuning Experiment – Parameter Space; RGCN Full Graph Tuning Experiment – Parameter Space] Tuning GraphSAGE with a mini-batch approach, we found that, of the parameters we introduced, fanout_slope was important in predicting accuracy scores and max_batch_num_nodes was relatively unimportant.
How AI is helping address the climate crisis
       
As a force multiplier for scientific research, AI is helping accelerate the rate of progress across many domains, including those most important to solving the climate crisis. One promising attempt at this is the Open Catalyst Project, run by a partnership between Meta AI and Carnegie Mellon University’s Department of Chemical Engineering. We believe this work will enable AI systems to grow sustainably and with lower infrastructure needs. In agriculture, AI systems are helping optimize water and fertilizer usage and increase the productivity of farm equipment and systems. We’re incredibly optimistic about the impact AI is going to have on climate and sustainability, and the role that our researchers and engineers can play in helping build it.
Hands On Integer (Binary) Linear Optimization using Python
       
A step-by-step introduction to Binary Linear Optimization with a few lines of code. You are in that supermarket, with that specific price, that specific number of available packs of razor blades. Nonetheless, there might be situations where Linear Optimization can come in handy. Theoretical Background: In this article we will talk about Binary Linear Optimization. Now, why do we have to talk about binary linear optimization?
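A binary linear problem asks for a 0/1 choice per item that maximizes a linear objective under linear constraints. The toy instance below is invented for illustration and is solved by brute-force enumeration, which only works for a handful of variables; the article presumably uses a proper solver rather than this approach.

```python
from itertools import product

# Hypothetical instance: maximize sum(values[i] * x[i])
# subject to sum(costs[i] * x[i]) <= budget, with every x[i] in {0, 1}.
values = [6, 5, 4]
costs = [4, 3, 2]
budget = 5

best_x, best_value = None, float("-inf")
for x in product((0, 1), repeat=len(values)):  # all 2**n binary assignments
    cost = sum(c * xi for c, xi in zip(costs, x))
    if cost > budget:
        continue  # infeasible: violates the linear constraint
    value = sum(v * xi for v, xi in zip(values, x))
    if value > best_value:
        best_x, best_value = x, value
```

Enumeration grows as 2^n, which is why real problems are handed to an integer programming solver instead.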
Linear and Logistic Regressions as Degenerate Neural Networks in Keras
       
Neural networks are supersets of linear and logistic regressions. Use Keras to quickly and efficiently build regression models and switch between them and neural networks as needed. Logistic regression: Sometimes, though, we need to predict not a continuous value, but a true-or-false one, which is where logistic regression enters the picture. We again expect the linear model to be between the two linear expressions. Conclusion: We have seen how neural networks are supersets of linear and logistic regressions, and how with existing software components used to build neural networks we can very easily implement regression models.
How does Twitter feel about Pokemon Legends: Arceus?
       
On January 28, 2022, the latest Pokémon game, titled Pokémon Legends: Arceus, was released. Or, in simple words, we could say there are many neutral and positive tweets but not many negatives. Figure 4: Top 15 mentioned Pokémon. Figure 5: Top 15 mentioned Pokémon (excluding Arceus).
Understand Machine Learning through More Design Patterns
       
The Gang of Four’s book on design patterns seems to be the place where it all started. In this article, we tackle four additional design patterns that are translated into several scenarios and put to use in many areas of software design. Next, we write down a Data class that inherits Observable properties with the addition of a drift flag in the set_data function. Visitor: The Visitor pattern represents an operation to be performed on the elements of an object structure. The object structure supplies a single classification and a single regression model for the example’s sake.
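The Observer idea mentioned above, a `Data` class that inherits from `Observable` and carries a drift flag in `set_data`, could be sketched roughly as follows. Only the names `Data`, `Observable` and `set_data` come from the text; `attach`, `notify`, `update` and the `RetrainOnDrift` observer are assumptions made for this illustration, not the article's code.

```python
class Observable:
    """Keeps a list of observers and notifies them on change."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self):
        for observer in self._observers:
            observer.update(self)


class Data(Observable):
    """Observable data holder with a drift flag set through set_data."""

    def __init__(self):
        super().__init__()
        self.values = None
        self.drift = False

    def set_data(self, values, drift=False):
        self.values = values
        self.drift = drift
        self.notify()  # every change is pushed to the observers


class RetrainOnDrift:
    """Hypothetical observer that counts how often retraining is triggered."""

    def __init__(self):
        self.retrain_count = 0

    def update(self, data):
        if data.drift:
            self.retrain_count += 1
```

Usage would be along the lines of `data = Data(); data.attach(RetrainOnDrift())`, after which each `set_data(..., drift=True)` call triggers the observer.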
Multiclass Text Classification Using Keras to Predict Emotions: A Comparison with and without Word Embeddings
       
Do word embeddings add value to text classification models? Let’s find out in this multiclass prediction task for detecting emotions. There are multiple ways to obtain word embeddings. In other words, in word embeddings, words are represented as vectors (i.e. Because each word embedding is stored using a key that uniquely identifies the word for which that embedding is. This concludes that these embeddings, or rather word vectors, do not borrow from the idea of word embeddings which takes a distributional semantics approach to encode texts to numbers.
Gamma Distribution Simply Explained
       
An explanation of the Gamma Distribution and its origins. Introduction: In my previous post we discussed and derived the Exponential Distribution. In a nutshell, the Exponential Distribution calculates the probability of waiting times between events in a Poisson Process. In this post we will derive the Gamma Distribution and gain some intuition behind it. Derivation: The derivation for the Gamma Distribution is similar to the Exponential Distribution as you may expect. Gamma Function: The reason the distribution is named the Gamma Distribution is that it contains the Gamma Function. Conclusion: In this blog we have generalised the Exponential Distribution to derive the Gamma Distribution, giving us the probability of a waiting time until the nth event.
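For reference, the standard density for the waiting time t until the nth event of a Poisson process with rate λ is f(t) = λ^n t^(n-1) e^(-λt) / Γ(n). The sketch below evaluates it with the standard library and checks that n = 1 reduces to the exponential density; the function names and the symbols `n` and `lam` are chosen for this example and may differ from the article's notation.

```python
import math


def gamma_pdf(t, n, lam):
    """Density of the waiting time until the n-th event at rate lam."""
    if t < 0:
        return 0.0
    return (lam ** n) * (t ** (n - 1)) * math.exp(-lam * t) / math.gamma(n)


def exponential_pdf(t, lam):
    """Density of the waiting time until the first event at rate lam."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


# With n = 1 the Gamma density collapses to the Exponential density.
same_as_exponential = math.isclose(gamma_pdf(2.0, 1, 0.5), exponential_pdf(2.0, 0.5))

# Crude Riemann sum over [0, 20) as a sanity check that the density integrates to about 1.
step = 0.001
total_area = sum(gamma_pdf(i * step, 3, 2.0) * step for i in range(20_000))
```

The `math.gamma(n)` term is the Gamma Function that gives the distribution its name; for integer n it equals (n-1)!.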
Auto ML in Python — An Overview of the MLBox Package
       
Learn about MLBox to quickly and efficiently train an automated machine learning pipeline for a classification problem in Python. Today’s post is very special. It’s written in collaboration with Axel De Romblay, the author of the MLBox Auto-ML package that has gained a lot of popularity these last years. More information about the algorithm: https://github.com/AxeldeRomblay/MLBox/blob/master/docs/webinars/features.pdf and about the MLBox implementation: https://mlbox.readthedocs.io/en/latest/features.html#mlbox.preprocessing.Drift_thresholder How does MLBox compute drifts for individual variables? MLBox builds a classifier that separates train from test data. Drift_thresholder then deletes the variables that have a drift score higher than a threshold (default 0.6). More details here: https://mlbox.readthedocs.io/en/latest/features.html#mlbox.prediction.Predictor 6. Conclusion: Running an automated AutoML pipeline has never been easier.
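The thresholding step described above can be illustrated without MLBox itself. The sketch below is plain Python, not the MLBox API: it assumes drift scores have already been computed per variable and only shows the "drop anything above 0.6" rule. The function name, variable names and scores are hypothetical.

```python
DRIFT_THRESHOLD = 0.6  # default threshold quoted in the text


def drop_drifting_columns(drift_scores, threshold=DRIFT_THRESHOLD):
    """Split variable names into kept and dropped by their drift score."""
    kept = [name for name, score in drift_scores.items() if score <= threshold]
    dropped = [name for name, score in drift_scores.items() if score > threshold]
    return kept, dropped


# Hypothetical scores: an ID-like column separates train from test almost perfectly.
scores = {"age": 0.05, "signup_id": 0.98, "country": 0.31}
kept, dropped = drop_drifting_columns(scores)
```

A variable that lets a classifier tell train rows from test rows easily gets a high score and is removed before modelling.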
VAE: Variational Autoencoders — How to Employ Neural Networks to Generate New Images
       
Neural Networks. An overview of VAEs with a complete Python example that teaches you how to build one yourself. Intro: This article will take you through Variational Autoencoders (VAE), which fall into a broader group of Deep Generative Models alongside the famous GANs (Generative Adversarial Networks). Despite Variational Autoencoders (VAE) having similar objectives as GANs, their architecture is closer to other types of Autoencoders such as Undercomplete Autoencoders. VAE model training: With the Variational Autoencoder model assembled, let’s train it over 25 epochs and plot the loss chart. Visualising latent space and generating new digits: Since our latent space is two-dimensional, we can visualise the neighbourhoods of different digits on the latent 2D plane. [Figure: 2D visualisation of the MNIST digits encoded into a latent space.]
Feature Engineering and deep learning for solar energy prediction
       
Does feature engineering help DL models? The classic ML model suffers when it is challenged by unstructured data. A quick recap of the motivation is that an accurate prediction of solar energy is imperative for sustainable use. Now, there is a popular variant of LSTMs and GRUs called bi-directional models. Such a setup is shown in the following figure; we call this a bi-directional model of forecasting with truly bi-directional features. The apparent one is that the bi-directional model with the bi-directional feature is indeed kind of cool, but that is not what we wanted to get at.
How I secured an internship at NVIDIA as a Data Scientist
       
Getting an internship at NVIDIA can be tough, especially to become a data scientist intern during the graduation days. Below are some of the steps that I’ve taken to get an internship at NVIDIA as a data scientist. There is a lot of competition to get into a data science position. People who are pursuing masters in various fields such as statistics, data mining, data analytics, data science and computer science compete for these roles. Hence, I suggest that you could revisit those courses that you have already gone through in the field of machine learning and data science.
Telegram Chronicles: Donbas and its War (a.k.a. “the last 8 years”)
       
What follows is based on the materials I’ve been reading lately, mostly peer-reviewed scholarly articles covering the period between 2013 and 2020 and focusing specifically on the Donbas war. Some have described the war in Donbas as a civil war, others as a confrontation between Russia and Ukraine. If you are a proponent of the other two models, you believe that all the discontent aside, if it weren’t for the Russian involvement, the Donbas war wouldn’t have happened. However, under this agreement, Ukraine would have to change its constitution to decentralize its control of Donbas.
What Do Alcohol and Sleep Have In Common? A Hidden Brain System
       
Research adds to the complex picture of alcohol and health. A lot goes wrong with us, and particularly with our brains, when we don’t get enough sleep. A single rough night leaves us feeling unfocused and groggy, while chronic poor sleep is a major…
You Can’t Just Solve the “Fake News” Problem by Getting Rid of Social Media
       
First, think about why people are so angry in the first place. Photo: Philippe Donn, Pexels. Last week, The Atlantic published “Why the Past 10 Years of American Life Have Been Uniquely Stupid” by Jon Haidt, the NYU professor and author of The Coddling of the American Mind.
I Work as a Technical Director at Pixar
       
How I got here, what I do, my thoughts, and some advice. My first film credit, Turning Red (2022). When I tell people I work as a Technical Director at Pixar Animation Studios, I typically get one of two responses.
Back to basics: What is the point of decentralization?
       
Back to basics: What is the point of decentralization? I start the conversation here because it sets the context for all decentralization: we have mastered individualist operation and collectivist standardization but have failed at collectivist operation. Individualist resources require standards, which are inherently collectivist, and collectivist resources cannot remove the power of implementation. The answer is that networks composed of individualist resources are woven into a variety of collectivist resources. The point is that when you do need those collectivist resources, your solution to coordinating them ends up defining the entire network.
Deaf Culture, Sign Language and History in the Making
       
It was great to see sign language with voice-over talent, and to be able to work with both Deaf and hearing actors. We can chat in sign language through a window, and at a restaurant where there’s a lot of background noise. Some think sign language is a language for people who are disabled and limited because people are so reliant on their hearing. Many people don’t think about the benefits and advantages of ASL (American Sign Language) — for young children, especially. Hopefully, the parents are involved and supportive in their life, and learn sign language, as well.
What’s So Bad About Elon Musk’s Vision for Twitter?
       
Photo by Brett Jordan on Unsplash. When you have a thought, opinion, or idea to share, all you have to do is open your mouth and speak it into existence. If there’s an audience for your ideas, you might even be fortunate enough to secure a platform that extends your reach farther out into the world than you could have ever imagined. You could write a book, send it off to a few respected publishing houses, and one day discover a copy of…
Did the Internet Break Democracy?
       
Did the Internet Break Democracy? No, and Jonathan Haidt’s big new Atlantic essay about social media’s damaging effects couldn’t be more wrong about what ails us. Members of the Missouri Student Association registering students to vote, Columbia, Mo., on Tuesday, Sept. 25, 2012, National Voter Registration Day (KOMU News). Jonathan Haidt, the New York University social psychologist and author of the bestselling book The Righteous Mind: Why Good People are Divided by Politics and Religion, has been on a tear for a…
Folders, Groups & Hierarchies: A Road to Tableau Desktop Specialist Certification
       
Welcome to the seventh chapter. In this piece, we are going to learn about folders, groups, and hierarchies in Tableau. Chapter 7: A comprehensive guide to creating folders, groups, and hierarchies in Tableau, with sample questions from the exam. Creating Folders: We can group our data either by Data Source or by Folders. To create a folder, choose ‘Group by Folder’, then right-click on any dimension or measure to create a folder. When you create a group within the view, the new group field is used in the view immediately.
My Researcher’s Portfolio: 8 Years of Studying AI Applied to Natural Language Processing
       
In 2013, I decided to start a Ph.D. thesis in AI applied to NLP. Image from www.aismartz.com. I chose to enter the field of scientific research for two clear reasons. As a graduate of a more practice-oriented engineering program, the research field was an unknown area for me. We therefore worked on 5 sub-projects and sub-components: “database generation”, “generation of SQL queries”, “architecture generation”, “Java code or Python generation”, and “user interface generation”. My next scientific work aims to explore the possibilities of explaining machine learning models on natural language issues.
Managing Data for Machine Learning Project
       
Big data, labeled data, noisy data. Data is a critical aspect of machine learning projects and how we handle that data is an important consideration for our project. Overview: This tutorial is divided into seven parts: Managing data in SQLite; SQLite in action; Managing data in dbm; Using dbm database in machine learning pipeline; Managing data in Excel; Managing data in Google Sheet; Other use of the database. Managing data in SQLite: When we mention database, very often it means a relational database that stores data in a tabular format. fetchall() X = [row[:-1] for row in data] y = [1 if row[-1] == "tested_positive" else 0 for row in data] yield np. value. Writing data into Excel cell by cell is tedious and indeed we can add data row by row.
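The fragment above reads rows from a database and splits them into features and a binary label. A minimal sketch of that pattern with Python's built-in sqlite3 module is given here; the table name `samples`, its columns, and the two example rows are assumptions made for illustration and do not come from the tutorial.

```python
import sqlite3


def load_rows(conn):
    """Read every row and split it into features X and binary labels y."""
    cur = conn.cursor()
    # ORDER BY rowid keeps the result order deterministic
    cur.execute("SELECT glucose, bmi, label FROM samples ORDER BY rowid")
    data = cur.fetchall()
    # All columns except the last are features
    X = [row[:-1] for row in data]
    # The last column is a text label, mapped to 1 or 0
    y = [1 if row[-1] == "tested_positive" else 0 for row in data]
    return X, y


# Hypothetical in-memory database standing in for a file on disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (glucose REAL, bmi REAL, label TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(148.0, 33.6, "tested_positive"), (85.0, 26.6, "tested_negative")],
)
conn.commit()

X, y = load_rows(conn)
conn.close()
```

Replacing `":memory:"` with a file path would make the same code work against a database stored on disk.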
Pixel-level Dense Contrastive Learning
       
Dense contrastive learning with active sampling strategy. Photo by Tom Winckels on Unsplash. Contrastive learning (CL) is a self-supervised learning process without labels. The semi-supervised learning framework: As shown above from left to right, the learning framework contains three parts: supervised semantic segmentation learning for labeled data; unsupervised semantic segmentation learning for unlabeled data; and pixel-level contrastive learning ReCo. The importance of RCL: With RCL, we can do dense contrastive learning for model pre-training, which is favorable for downstream dense tasks. ... would be considered in the contrastive learning literature, which is important for general AI development. References: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, 2018; A Simple Framework for Contrastive Learning of Visual Representations, 2020; Understanding Contrastive Learning and MoCo, 2021; Bootstrapping Semantic Segmentation with Regional Contrast, 2022.
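To make the idea of a contrastive objective concrete, here is a minimal sketch of the generic InfoNCE-style loss that is commonly used in contrastive learning. It is not claimed to be the exact loss used by ReCo; the vectors and the temperature value are assumptions chosen for illustration. The loss is small when the query is close to its positive and far from the negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.5):
    """Negative log of the positive pair's share of all exponentiated similarities."""
    pos = math.exp(cosine(query, positive) / temperature)
    neg = sum(math.exp(cosine(query, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings (hypothetical values)
query = [1.0, 0.0]
loss_aligned = info_nce(query, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_misaligned = info_nce(query, [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

In a pixel-level setting, each query would be the embedding of one pixel, with positives and negatives drawn from other pixels rather than from whole images.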
Announcing the Learning on Graphs Conference
       
Graph Machine Learning has become a large enough field to deserve its own standalone event: the Learning on Graphs Conference (LoG). Examples of such focused events include workshops on algorithmic reasoning, molecule discovery, learning on knowledge graphs, graph ML in industry, and graph learning benchmarks. This year, with the help of an international team of some of the world’s top graph learning researchers, we are organising the inaugural Conference for Learning on Graphs (the LoG Conference). These cover topics like graph learning benchmarks, graph learning in industry, algorithmic reasoning, learning with molecules, and more. Acknowledgments: We would like to thank the organisers of several workshops and conferences who have helped us with this process: the WWW Workshop on Graph Learning Benchmarks (GLB), WSDM Machine Learning on Graphs Workshop (MLoG), ICLR Workshop on Geometric and Topological Representation Learning (GTRL), KDD/AAAI Workshop on Deep Learning on Graphs (DLG), and the Machine Learning for Health symposium (ML4H).
Graph Attention Networks in Python
       
Introduction to Graph Neural Networks with PyTorch Geometric. Image by author, file icon by OpenMoji (CC BY-SA 4.0). Graph Attention Networks are one of the most popular types of Graph Neural Networks. Graph Attention Networks: Let’s implement a GAT in PyTorch Geometric. This library has two different graph attention layers: GATConv and GATv2Conv. Note that we use graph attention layers in two configurations: the first layer concatenates 8 outputs (multi-head attention); the second layer only has 1 head, which produces our final embeddings.
GCN((gcn1): GCNConv(3703, 16) (gcn2): GCNConv(16, 6))
Epoch 0 | Train Loss: 1.782 | Train Acc: 20.83% | Val Loss: 1.79
Epoch 20 | Train Loss: 0.165 | Train Acc: 95.00% | Val Loss: 1.30
Epoch 40 | Train Loss: 0.069 | Train Acc: 99.17% | Val Loss: 1.66
Epoch 60 | Train Loss: 0.053 | Train Acc: 99.17% | Val Loss: 1.50
Epoch 80 | Train Loss: 0.054 | Train Acc: 100.00% | Val Loss: 1.67
Epoch 100 | Train Loss: 0.062 | Train Acc: 99.17% | Val Loss: 1.62
Epoch 120 | Train Loss: 0.043 | Train Acc: 100.00% | Val Loss: 1.66
Epoch 140 | Train Loss: 0.058 | Train Acc: 98.33% | Val Loss: 1.68
Epoch 160 | Train Loss: 0.037 | Train Acc: 100.00% | Val Loss: 1.44
Epoch 180 | Train Loss: 0.036 | Train Acc: 99.17% | Val Loss: 1.65
Epoch 200 | Train Loss: 0.093 | Train Acc: 95.83% | Val Loss: 1.73
GCN test accuracy: 67.70%
CPU times: user 25.1 s, sys: 847 ms, total: 25.9 s
Wall time: 32.4 s
GAT((gat1): GATv2Conv(3703, 8, heads=8) (gat2): GATv2Conv(64, 6, heads=1))
Epoch 0 | Train Loss: 1.790 | Val Loss: 1.81 | Val Acc: 12.80%
Epoch 20 | Train Loss: 0.040 | Val Loss: 1.21 | Val Acc: 64.80%
Epoch 40 | Train Loss: 0.027 | Val Loss: 1.20 | Val Acc: 67.20%
Epoch 60 | Train Loss: 0.009 | Val Loss: 1.11 | Val Acc: 67.00%
Epoch 80 | Train Loss: 0.013 | Val Loss: 1.16 | Val Acc: 66.80%
Epoch 100 | Train Loss: 0.013 | Val Loss: 1.07 | Val Acc: 67.20%
Epoch 120 | Train Loss: 0.014 | Val Loss: 1.12 | Val Acc: 66.40%
Epoch 140 | Train Loss: 0.007 | Val Loss: 1.19 | Val Acc: 65.40%
E
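The multi-head configuration described above (8 heads of 8 features each, concatenated into the 64 inputs of the second layer) can be illustrated without any graph library. The sketch below is a simplified, dependency-free illustration rather than PyTorch Geometric code: it assumes the raw attention scores are already given as fixed numbers, whereas a real GAT layer learns them from node features and also applies a learned linear transform per head.

```python
import math


def softmax(scores):
    """Normalize raw attention scores so they sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(neighbor_feats, raw_scores):
    """Weighted sum of neighbor feature vectors for one attention head."""
    weights = softmax(raw_scores)
    dim = len(neighbor_feats[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, neighbor_feats))
        for d in range(dim)
    ]


def multi_head(neighbor_feats, scores_per_head):
    """Concatenate the outputs of several heads into one embedding."""
    out = []
    for raw_scores in scores_per_head:
        out.extend(attention_head(neighbor_feats, raw_scores))
    return out


# Two neighbors with 2-D features and two heads (hypothetical values)
neighbors = [[1.0, 0.0], [0.0, 1.0]]
embedding = multi_head(neighbors, [[0.0, 0.0], [2.0, 0.0]])
```

With two heads of two features each, the concatenated embedding has four entries, in the same way that 8 heads of 8 features give 64.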
Technology Taught Us to Never Wait
       
Are we too impatient for our own good? Photo by NordWood Themes on Unsplash. I think it started with microwaves. A typical microwave can cook an entire chicken in 16 minutes. We started using microwaves to reheat and cook virtually everything back in the late 1970s. It's also around that time that we started standing in…
It’s Time to Release Our Own Kraken!
       
There’s No Room in American Jurisprudence for Activist Judges. “It’s Time to Release Our Own Kraken!”, © 2022 Jeff Gates and the Chamomile Tea Party. In 2000, my wife and I sat on the edge of our bed watching CNN. The Supreme Court had just announced that time had run out on the weeks-long series of recounts for Florida’s 25 electoral votes. George W. Bush had beaten Al Gore and was…
Teaching AI to be more collaborative with humans without learning directly from them
       
We are sharing a paper on our work, open-sourcing the code, and releasing a public demo where everyone can play with our model trained using off-belief learning. This is where off-belief learning comes in. The goal of off-belief learning is to find the most efficient way to communicate without assuming any prior conventions. In more complex scenarios where grounded play may not be the best solution, we can use the outcome of off-belief learning as the new common-knowledge policy and apply off-belief learning again. Off-belief learning will help address this.
Beyond chat-bots: the power of prompt-based GPT models for downstream NLP tasks
       
Image via Vecteezy under license to Ties de Kok. However, the immense potential of prompt-based machine learning using GPT models for other tasks is often less intuitive as it embodies a paradigm shift of sorts. In this article, I will discuss how you can use creative prompt engineering and GPT models to help solve the downstream NLP tasks that you care about. The family of GPT models consists of generative language models that can predict the next token in a sequence of tokens. Wrap-up: I hope this post gave you a clearer idea of how you can use creative prompt engineering to use GPT models for downstream NLP tasks!
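Because a GPT model only predicts the next token, a downstream task such as classification has to be phrased as text that the model can continue. A minimal sketch of a few-shot prompt builder is shown here; the function name `build_prompt`, the "Text:/Label:" layout, and the example sentences are assumptions for illustration, and no model or API is called.

```python
def build_prompt(task, examples, query):
    """Assemble a few-shot prompt that ends where the model should continue."""
    lines = [task, ""]
    for text, label in examples:
        # Each solved example shows the model the expected input/output format
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query is left unlabeled so the next predicted tokens are the answer
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


prompt = build_prompt(
    "Classify the sentiment of each text as positive or negative.",
    [
        ("I loved this product.", "positive"),
        ("It broke after a day.", "negative"),
    ],
    "The battery lasts forever.",
)
```

The resulting string would be sent to a generative model, and the first tokens it produces after the final "Label:" would be read as the prediction.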
When Accuracy Isn’t Enough: Visualization and Game Design
       
Modeling typically starts with defining a metric to optimize and, often, this definition of “good” leaves something out. However, consider two sometimes uncommon tools that might be able to help: information and game design. Game design: we don’t know everything. Visualization helps understand goal performance, but what about wrong goals? The discipline of MLUX addresses these questions, and game design also offers perspective on building competency in complex systems, crafting accessible experiences for gaining understanding, and experimenting [12][13][14][15][16][17]. With this framing in mind, visualization and game design move ML from a competitor to traditional approaches (or human expertise) to a decision-making tool.
Modern Recommendation Systems with Neural Networks
       
However, with the increasing popularity of Neural Networks, companies have started experimenting with new hybrid Recommendation Systems that combine them all. In this article, I will show how to build modern Recommendation Systems with Neural Networks, using Python and TensorFlow. # See predictions details: test.merge(dtf_products[["name","old","genres"]], left_on="product", right_index=True).sort_values("yhat", ascending=False). Image by author. Collaborative Filtering: Collaborative Filtering is based on the assumption that similar users like similar products. Conclusion: This article has been a tutorial to demonstrate how to design and build Recommendation Systems with Neural Networks. More importantly, we understood how to use Neural Networks to improve traditional techniques and build modern hybrid Recommendation Systems that can include context and any other additional information.
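The assumption that similar users like similar products can be sketched with a classic user-based approach. This is a plain memory-based illustration, not the neural model built in the article; the user names, products, and ratings are hypothetical. The predicted rating is a similarity-weighted average of the ratings given by other users.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    dot = sum(u[p] * v[p] for p in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, product):
    """Similarity-weighted average of other users' ratings for a product."""
    num, den = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or product not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[product]
        den += sim
    # None signals that no similar user has rated the product
    return num / den if den > 0 else None


# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "cat": {"A": 1, "C": 1},
}
pred = predict(ratings, "ann", "C")
```

Here `bob` rates like `ann` and therefore pulls the prediction for product C toward his own high rating, while `cat` contributes less.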
Five myths that Narcos propagates about the drugs war
       
You can’t escape signs of the narco economy in Latin America; it affects everything. As I watched series after series, I wondered: how true is the version of history that Narcos and Narcos: Mexico present to us? Ben spent years in Mexican archives, interviewed over 30 DEA agents, and pieced together the history of the Mexican drugs trade going back a century. But the Mexican drugs trade wasn’t always as violent as it is today. There are two aspects to the Mexican drugs market: trafficking to the US, and selling at home.
War of the Words: How Language Has Sparked Almost Every European War Since 1870
       
Maps like this one, claiming that territories outside Germany were German “racial soil” (Volksboden) because German speakers lived there, led the world to disaster. In 1871, the leaders of all the German states proclaimed Wilhelm I their one Kaiser. Ethnic German minorities in Czechoslovakia and other countries were generally enthusiastic to be united with the Reich. In 1939, Hitler demanded that the ethnic Germans of the region be allowed to vote on their future. For all Europe has changed, this war started for the same reason that has sparked every other war there for the past 150 years.
Am I a Millennial or Gen Z?
       
What does it mean to be part of a generation? Photo of the author as a child circa 2000, playing with Pokémon and Polly Pockets in her closet. In late 2019, I started teaching in an Irish high school. At 24, I didn’t feel all that different from the 13-year-olds I was mentoring at first, but it soon became clear that we were very different. My references went over their heads, even to what I thought were current, mainstream…
Why Playing Pointless Guessing Games Is Good for the Soul
       
You learn it’s about the journey, not the destination. Image: Le Penseur by Jean-David & Anne-Laure via Wikimedia Commons | Edited by Eve Peyser. My boyfriend Hudson and I were walking along a thin creek, a streak of water cutting through the barren desert landscape. Immersed in deep conversation, we paused for a moment to admire a white egret, dinosaur-like, walking in the water, and then got back to business.
Preparing for Easter with Ukrainians
       
River of tears, river of light. Photo by Vijayaji. My youth was spent running down the long straight roads of the Oregon plains. One mile straight; turn right; one mile and a half straight, turn right; two miles with a few lazy turns, turn right; one mile straight home. Checkerboard theme, with variations.
Web Crawling in Python
       
read_html ( "https://www.federalreserve.gov/releases/h15/" ) print ( tables )[ Instruments 2022Apr7 2022Apr8 2022Apr11 2022Apr12 2022Apr13 0 Federal funds (effective) 1 2 3 0.33 0.33 0.33 0.33 0.33 1 Commercial Paper 3 4