Are you a sourcer or recruiter tasked with filling a Machine Learning (ML) engineering role but unsure of all the terms? Here is a quick glossary from your friends at Rocket that you'll find useful, based on our experience recruiting in this field for top-tier clients like Cruise, Lyft, Amazon, and more.
Here are some of the general terms and concepts you might encounter:
- Machine learning: A subfield of artificial intelligence that involves the use of algorithms to automatically learn and improve from data without being explicitly programmed.
- Deep learning: A subfield of machine learning that involves the use of multi-layered ("deep") neural networks to learn from data.
- Neural network: A type of machine learning model that is inspired by the structure and function of the human brain, and is made up of interconnected processing units called "neurons."
- Supervised learning: A type of machine learning in which the algorithm is trained on labeled data, where the correct output is provided for each example in the training set.
- Unsupervised learning: A type of machine learning in which the algorithm is not provided with labeled training examples, and must discover the underlying structure of the data through techniques such as clustering.
- Reinforcement learning: A type of machine learning in which an agent learns to interact with its environment in order to maximize a reward.
- Classification: A type of machine learning task in which the goal is to predict a class label for each example in the dataset.
- Regression: A type of machine learning task in which the goal is to predict a continuous value for each example in the dataset.
- Clustering: A type of unsupervised machine learning task in which the goal is to group examples in the dataset into distinct clusters based on their similarity.
- Feature engineering: The process of creating new features from raw data that can be used to train machine learning models.
- Feature selection: The process of selecting a subset of the most relevant features to use in a machine learning model.
- Hyperparameter tuning: The process of adjusting the hyperparameters of a machine learning model to improve its performance.
- Model selection: The process of choosing the most appropriate machine learning model for a given task.
- Overfitting: A condition in which a machine learning model performs well on the training data, but poorly on new data. It occurs when the model is too complex and has learned the noise in the training data rather than the underlying relationships.
- Underfitting: A condition in which a machine learning model performs poorly on both the training data and new data. It occurs when the model is too simple and is unable to capture the underlying relationships in the data.
- Bias-variance tradeoff: A fundamental tradeoff in machine learning, whereby increasing the complexity of a model can reduce bias but increase variance, and vice versa.
- Cross-validation: A method for evaluating the performance of a machine learning model by repeatedly training it on one portion of the data and evaluating it on the held-out remainder, rotating which portion is held out.
- Confusion matrix: A matrix that is used to evaluate the performance of a classification model, where the rows represent the actual class labels and the columns represent the predicted class labels.
- Precision: A metric that measures the proportion of a model's positive predictions that are actually positive.
- Recall: A metric that measures the proportion of actual positive examples that were correctly predicted by a classification model.
- F1 score: A metric that combines precision and recall, and is calculated as the harmonic mean of the two.
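For the technically curious, the three metrics above are simple enough to compute by hand. Here is a toy sketch in plain Python (the labels are made up for illustration) showing how precision, recall, and F1 fall out of counting true positives, false positives, and false negatives:

```python
# Hypothetical ground-truth labels and model predictions (1 = positive, 0 = negative).
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# → precision=0.75 recall=0.75 f1=0.75
```

You don't need to be able to write this yourself to source well, but recognizing these three metrics on a resume or in a hiring manager's wish list goes a long way.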
In addition to these broad concepts, here are various technologies, tools, and libraries often encountered on the resumes of ML engineers:
- TensorFlow: an open-source software library for machine learning developed by Google, widely used for training and deploying ML models.
- PyTorch: an open-source machine learning library developed by Meta (formerly Facebook), widely used for training and deploying ML models.
- scikit-learn: an open-source machine learning library for Python, widely used for training and evaluating machine learning models.
- Keras: an open-source library for building and training neural networks, often used as a high-level interface to other ML libraries such as TensorFlow.
- NumPy: an open-source library for numerical computing in Python, often used for scientific computing and data manipulation tasks in machine learning.
- Pandas: an open-source library for data analysis in Python, often used for reading, manipulating, and cleaning data in machine learning.
- SciPy: an open-source library for scientific computing in Python, often used for numerical optimization, signal processing, and statistical testing in machine learning.
- Matplotlib: an open-source library for data visualization in Python, often used to visualize and explore data in machine learning.
- Seaborn: an open-source library for data visualization in Python, often used to create more advanced and aesthetically pleasing plots than those available in Matplotlib.
- XGBoost: an open-source library for gradient boosting on decision trees, known for its speed and strong performance on tabular data.
- LightGBM: an open-source gradient boosting library developed by Microsoft, designed for speed and efficiency on large datasets.
- CatBoost: an open-source gradient boosting library developed by Yandex, notable for its built-in handling of categorical features.
- Random forest: an ensemble machine learning model that consists of many decision trees, often used for classification and regression tasks.
- Support vector machine (SVM): a type of machine learning model used for classification and regression tasks, which finds the decision boundary that maximizes the margin between classes; kernel functions extend it to data that is not linearly separable.
- Naive Bayes: a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions, often used for classification tasks.
- K-nearest neighbors (KNN): a simple non-parametric method for classification and regression tasks, where the model makes predictions based on the majority class of the nearest neighbors.
- Decision tree: a tree-like model used for classification and regression tasks, where the model makes predictions based on a series of decisions made by following the branches of the tree.
- Principal component analysis (PCA): a dimensionality reduction technique that projects the data onto a lower-dimensional space while retaining as much of the variance as possible.
We recommend printing out this cheat sheet or bookmarking it for when you're sourcing. Good luck!
About Rocket
Rocket pairs talented recruiters with advanced AI to help companies hit their hiring goals, and knows technology recruiting inside out. Headquartered in the heart of Silicon Valley, Rocket has recruiters all over the US & Canada serving the needs of our growing client base across engineering, product management, data science, and more.