Introducing Triton: Open-Source GPU Programming for Neural Networks
Triton is an open-source Python-like programming language that enables researchers with no CUDA experience to write highly efficient GPU code. It can be used to write FP16 matrix multiplication kernels that match the performance of cuBLAS in under 25 lines of code. Researchers have already used it to produce kernels that are up to 2x more efficient than equivalent Torch implementations.
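The matmul kernel from the announcement is too long to reproduce here, but the flavor of Triton comes through in the classic vector-addition example from its tutorials. A minimal sketch (tensor sizes and block size are arbitrary illustrative choices, not from the article):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # each program handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

The kernel is written in Python, but Triton compiles it down to efficient GPU code, handling the memory coalescing and scheduling details that CUDA would leave to the programmer.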
Learning to Think Like a Data Scientist
Earlier this year I made the transition from academic neuroscientist to applied data scientist. I participated in the Insight Data Science fellowship where I used a business-minded approach to develop a data product. The overarching purpose of any data product is to provide value by filling an unmet need. It is important to define this need right at the start of the project planning phase because it will become the guide by which all subsequent decisions are made.
Introduction to Binary Classification with PyCaret
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive. The design and simplicity of the library are inspired by the emerging role of citizen data scientists, a term first used by Gartner.
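A minimal sketch of that low-code workflow, using one of PyCaret's bundled datasets (the dataset and target column follow PyCaret's own binary-classification tutorial and are illustrative, not a prescription):

```python
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models, predict_model, finalize_model

data = get_data("diabetes")                        # bundled binary-classification dataset
s = setup(data, target="Class variable", session_id=123)  # preprocessing pipeline
best = compare_models()                            # train and rank a library of classifiers
predict_model(best)                                # score on the hold-out set
model = finalize_model(best)                       # retrain the winner on the full dataset
```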
A Brief Overview of Methods to Explain AI (XAI)
How to design an interpretable machine learning process. I know this topic has been discussed many times, but I recently gave some talks on interpretability (for SCAI and France Innovation) and thought it would be good to include some of that work in this article. The importance of explainability for the decision-making process in machine learning no longer needs to be proved.
Why Neural Networks Forget, and Lessons from the Brain
Artificial neural networks struggle to learn continually and suffer from catastrophic forgetting. Each task that a neural network learns to perform has a single error function that the network is trying to minimize, irrespective of the error of any other task. For the network to change its behavior or learn how to perform a task, it must change its parameters. This is exactly what happens during training.
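A toy sketch makes the point concrete: train a small network on one task, then on a second task, and the error on the first task climbs back up, because only the second task's error function is being minimized. The tasks and architecture below are made up purely for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two "tasks": fit sin(x) on disjoint input ranges.
xa = torch.linspace(-3.14, 0.0, 200).unsqueeze(1)
xb = torch.linspace(0.0, 3.14, 200).unsqueeze(1)
ya, yb = torch.sin(xa), torch.sin(xb)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

def train(x, y, steps=2000):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

train(xa, ya)                                   # learn task A
print("task A loss after A:", loss_fn(model(xa), ya).item())
train(xb, yb)                                   # learn task B; only B's error is minimized
print("task A loss after B:", loss_fn(model(xa), ya).item())  # typically far worse
```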
The challenges in teaching machines to see like humans
It takes up to thousands of data and algorithms for the machine to identify it is an apple. To kick start the development phase, proper planning and consideration should be given to the data collection phase. We decided to build an engine based on JavaScript and Python, with Electron to automate the task. By ticking off these items from the checklist, the quality of the data provided for machine learning is then assured.
Exploring Model Hyperparameter Search Space with PyCaret & MLflow
We will use the classic airline dataset to see what impact these hyperparameters have on model performance. The results match our intuition that including at least 12 points in the window gives the best performance. Shorter windows tend to degrade performance, since they no longer reach back to the data point from one season ago. Anything more than 12 data points gives a fairly stable result close to the optimal metric value.
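The article runs this sweep through PyCaret's tuning with MLflow logging; as a rough, framework-agnostic sketch of the same experiment, the loop below sweeps window lengths for a naive rolling-mean forecaster on the airline data and logs each run to MLflow (the forecaster and metric are simplifications, not the article's models):

```python
import mlflow
import pandas as pd
from pycaret.datasets import get_data

y = get_data("airline")                 # monthly airline passenger counts (pandas Series)
train, test = y.iloc[:-12], y.iloc[-12:]

mlflow.set_experiment("window-length-sweep")

for window in range(2, 25):
    # Toy forecaster: predict the mean of the last `window` observations.
    history = list(train)
    preds = []
    for actual in test:
        preds.append(sum(history[-window:]) / window)
        history.append(actual)
    mae = (pd.Series(preds, index=test.index) - test).abs().mean()

    with mlflow.start_run():
        mlflow.log_param("window_length", window)
        mlflow.log_metric("mae", mae)
```

Inspecting the runs in the MLflow UI then shows the same pattern the article describes: error dropping sharply around a window of 12 and staying flat beyond it.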
Improving Language Model Behavior by Training on a Curated Dataset
We’ve found we can improve language model behavior with respect to specific behavioral values by fine-tuning on a curated dataset of <100 examples of those values. We also found that this process becomes more effective as models get larger. While the technique is still nascent, we’re looking for OpenAI API users who would like to try it out and are excited to find ways to use these and other techniques in production use cases.
OpenAI Codex
OpenAI Codex is the model that powers GitHub Copilot, which we built and launched in partnership with GitHub a month ago. Codex can interpret simple commands in natural language and execute them on the user’s behalf. It has a memory of 14KB for Python code, compared to GPT-3 which has only 4KB. Codex is most capable in Python, but it is also proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift and TypeScript.
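Access is through the OpenAI API; a hedged sketch of what a call might have looked like during the Codex beta (the engine name, prompt, and availability are assumptions based on the beta documentation, not details from this announcement):

```python
import openai  # assumes access to the Codex private beta

openai.api_key = "sk-..."  # placeholder

# Describe the task in natural language and let the model complete the code.
response = openai.Completion.create(
    engine="davinci-codex",
    prompt='"""\nReturn the n-th Fibonacci number.\n"""\ndef fibonacci(n):',
    max_tokens=64,
    temperature=0,
)
print(response["choices"][0]["text"])
```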
Refactoring a Jupyter notebook into a maintainable pipeline: A step-by-step guide (Part I)
This blog post series provides a step-by-step guide to convert monolithic Jupyter notebooks into maintainable pipelines. At first the pipeline will likely crash due to missing dependencies (i.e., you'll see an error). The exact steps for creating a virtual environment depend on which package manager and virtual environment manager you're using; here's some sample code.
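The sample code itself isn't reproduced in this excerpt; a minimal Python-only sketch, assuming a pip-based project with a requirements.txt and a hypothetical pipeline.py entry point, would look like this:

```python
# Create an isolated environment for the refactored pipeline (Unix-style paths;
# on Windows the executables live under env\Scripts instead of env/bin).
import subprocess
import venv

venv.create("env", with_pip=True)                   # create ./env with its own pip
subprocess.run(["env/bin/pip", "install", "-r", "requirements.txt"], check=True)
subprocess.run(["env/bin/python", "pipeline.py"], check=True)  # hypothetical entry point
```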
Data Lineage Explained To My Grandmother
Data lineage is a technology that retraces the relationships between data assets; it helps you rebuild the family tree of your data and underlies a lot of data products. New interfaces for data lineage tend to be simpler and to address one use case per visualization, to make them easier for business experts to understand. The further you go in the article, the more technical it gets.
Summarizing Books with Human Feedback
To safely deploy powerful, general-purpose artificial intelligence in the future, we need to ensure that machine learning models act in accordance with human intentions. This challenge has become known as the alignment problem. Our best model is fine-tuned from GPT-3 and generates sensible summaries of entire books. It achieves a 6/7 rating (similar to the average human-written summary) from humans who have read the book.
Difference between distributed learning versus federated learning algorithms
A distributed machine learning algorithm is a multi-node system that builds a training model by training independently on different nodes. A distributed training system accelerates training on huge amounts of data: with big data, training time grows quickly, and spreading the work across nodes makes scaling and online re-training practical. Federated learning, in effect, yields a centralized model through decentralized training, with each node's data staying where it is.
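The contrast is easiest to see in code. Below is a toy federated-averaging loop on synthetic data: each client trains locally on data it never shares, and the server only averages the resulting weights. Everything here is made up for illustration; real federated systems add client sampling, secure aggregation, and much more machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression task split across five clients, each holding its own data.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + 0.1 * rng.normal(size=100)
    clients.append((X, y))

def local_update(w, X, y, lr=0.05, epochs=5):
    for _ in range(epochs):                     # plain gradient descent on local data
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

global_w = np.zeros(2)
for _ in range(20):                             # FedAvg: average locally trained weights
    local_ws = [local_update(global_w.copy(), X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)

print(global_w)                                 # converges toward true_w
```

A conventional distributed trainer would instead shard a single dataset across workers it controls; federated learning keeps the data on the clients and moves only the model.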
EmTech MIT, Research Highlights, and More | October 2021
Numenta co-founder Jeff Hawkins recently kicked off Day 2 of MIT Technology Review’s flagship event, the EmTech Virtual Conference. He had a fascinating conversation with Will Douglas Heaven about how to build a better AI with a neuroscience-based approach. Their live conversation marked the second time this year the two had a chance to discuss the topic. In this month’s newsletter, I’m pleased to share some event highlights, new research resources and podcasts.
On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite
PyTorch and TensorFlow are the two leading AI/ML frameworks. In this article, we examine them more deeply from the perspective of someone who wishes to develop and deploy models for use on mobile platforms. We look at the features and capabilities that each provides along key dimensions such as developer productivity, extensibility, ease of use, and hardware support.
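On the PyTorch side, the deployment path discussed boils down to tracing or scripting the model and running it through the mobile optimizer; a minimal sketch (the model choice and file name are arbitrary):

```python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torchvision.models.mobilenet_v2(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)                   # convert to TorchScript
optimized = optimize_for_mobile(traced)                    # mobile-specific graph passes
optimized._save_for_lite_interpreter("mobilenet_v2.ptl")   # bundle for PyTorch Mobile
```

TensorFlow Lite has an analogous flow, converting a saved Keras model with tf.lite.TFLiteConverter into a .tflite file that the on-device interpreter loads.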
Two Simple Things You Need to Steal from Agile for Data and Analytics Work
Jon Loyens: Applying various aspects of the software development lifecycle to data science, engineering, and analytics is very on trend right now. If you want to win this game, you need to find ways for players to interact and collaborate, capture knowledge, and make it easier for more people to play. He says agile practices that are inclusive and truly involve stakeholders throughout the project have the potential to bring data producers (data engineers and stewards) and consumers (data scientists and analysts) together in really meaningful ways.
Artificial intelligence for lung disease detection using chest CT scan images
Two datasets were gathered from Kaggle and GitHub for training convolutional neural networks (CNNs). First, a two-class classification model was trained on balanced data (COVID-19 vs. normal) to differentiate healthy cases from COVID-19 cases. Second, a neural network was trained to separate four classes: pneumocystis, COVID-19, streptococcus, and normal.
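The training code isn't included in this excerpt; a minimal Keras sketch of the first, two-class model, assuming a hypothetical chest_ct/{covid,normal} directory of images (the architecture is illustrative, not the one the authors trained):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical directory layout: chest_ct/covid/*.png and chest_ct/normal/*.png
train_ds = tf.keras.utils.image_dataset_from_directory(
    "chest_ct", image_size=(224, 224), batch_size=32, label_mode="binary")

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # COVID-19 vs. normal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```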
Unsupervised learning can detect unknown adversarial attacks
Researchers at Carnegie Mellon University and the KAIST Cybersecurity Research Center have developed a new machine learning technique for detecting adversarial attacks, including previously unknown ones. The technique takes advantage of machine learning explainability methods to find out which input data might have gone through adversarial perturbation. It was presented at the Adversarial Machine Learning Workshop (AdvML) of the ACM Conference on Knowledge Discovery and Data Mining (KDD 2021).
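The researchers' exact pipeline isn't spelled out in this excerpt, but the general idea can be sketched: compute an explanation (here, a simple input gradient) for inputs known to be clean, fit an unsupervised outlier detector on those explanations, and flag incoming inputs whose explanations look anomalous. Everything below (model, data, detector choice) is a stand-in:

```python
import torch
import torch.nn as nn
from sklearn.ensemble import IsolationForest

torch.manual_seed(0)

# Stand-ins for a trained classifier and batches of inputs.
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
clean = torch.randn(256, 20)
incoming = torch.randn(32, 20)

def explanation(x):
    """Simple input-gradient 'explanation' of the predicted class score."""
    x = x.clone().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    return x.grad.abs().detach().numpy()

# Fit an unsupervised outlier detector on explanations of known-clean inputs,
# then flag incoming inputs whose explanations look anomalous (-1 = outlier).
detector = IsolationForest(contamination=0.05, random_state=0).fit(explanation(clean))
flags = detector.predict(explanation(incoming))
print(flags)
```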
Watching a Language Model Learning to Play Chess
In a previous article (Chess2Vec), I analyzed which moves in a game of chess are close, in the sense that they often occur in similar situations. If two or more moves follow or precede one another, they are considered to be related in some way. The source for my analyses was a collection of files of games played on the internet chess server Lichess.
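That notion of relatedness is essentially the word2vec intuition applied to moves; a toy sketch of the idea (the games below are stand-ins for the Lichess PGN data, and this is not the author's actual pipeline):

```python
from gensim.models import Word2Vec

# Toy stand-in for games parsed from Lichess PGN files: each game is a
# sequence of moves in algebraic notation.
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"],
    ["d4", "d5", "c4", "e6", "Nc3", "Nf6"],
    ["e4", "c5", "Nf3", "d6", "d4", "cxd4"],
]

# Treat moves as "words" and games as "sentences": moves that occur in
# similar contexts end up with similar vectors.
model = Word2Vec(games, vector_size=32, window=2, min_count=1, sg=1)
print(model.wv.most_similar("e4"))
```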