Here’s an interesting article on how to represent a categorical feature, with 100’s of levels, in a model in R.

In this post, we will discuss using an embedding matrix as an alternative to using one-hot encoded categorical features for in modeling. We usually find references to embedding matrices in natural language processing applications but they may also be used on tabular data. An embedding matrix replaces the spares one-hot encoded matrix with an array of vectors where each vector represents some level of the feature. Using an embedding matrix can greatly reduce the memory needed to handle the categorical features.