In recent years, the application of deep learning techniques to formal domains such as programming languages, theorem proving, and formal logics has seen remarkable advances. However, the capability of modern deep neural network (NN) architectures to recognize formal languages has not been thoroughly studied. The literature has established connections between specific NN architectures and levels of the Chomsky hierarchy, which categorizes formal languages by the type of formal grammar that generates them and the computational model required to recognize them. For example, Long Short-Term Memory networks have been associated with counter languages, while Stack Recurrent NNs have been linked to deterministic context-free languages. The widely used transformer architecture, however, does not align neatly with any level of the Chomsky hierarchy: it can recognize star-free regular languages and some counter languages, yet fails to reliably recognize non-star-free regular languages. This is surprising because regular languages form the lowest level of the Chomsky hierarchy. This thesis contributes to the understanding of how the transformer architecture recognizes regular languages, focusing on the challenges posed by non-star-free regular languages. We propose introducing recurrence into the transformer architecture by using a recurrent neural network to generate the positional encoding, which yields a substantial improvement in performance. We also leverage the universal transformer's shared-parameter architecture as an alternative way to introduce recurrence, and establish that this architecture is not suitable for handling languages characterized by periodic patterns. Additionally, we modify the attention mechanism to incorporate positional information using methods such as ALiBi and RoPE, complemented by a review of the positional encodings documented in the literature.
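To make the central idea of an RNN-generated positional encoding concrete, the following is a minimal illustrative sketch, not the thesis's actual implementation: a simple Elman-style RNN is unrolled over a constant input, and its hidden state at step t is used as the positional encoding for position t, added to the token embeddings exactly as a fixed sinusoidal encoding would be. All dimensions and parameter names here are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical dimensions; chosen only for this sketch.
d_model, d_hidden, seq_len = 8, 8, 6
rng = np.random.default_rng(0)

# Elman-style RNN parameters (one possible choice of recurrent module).
W_h = rng.normal(scale=0.5, size=(d_hidden, d_hidden))
W_x = rng.normal(scale=0.5, size=(d_hidden, 1))
b = np.zeros(d_hidden)

def recurrent_positional_encoding(length):
    """Unroll the RNN on a constant input; the hidden state at step t
    serves as the positional encoding for position t."""
    h = np.zeros(d_hidden)
    encodings = []
    for _ in range(length):
        h = np.tanh(W_h @ h + W_x @ np.ones(1) + b)
        encodings.append(h.copy())
    return np.stack(encodings)  # shape: (length, d_hidden)

pe = recurrent_positional_encoding(seq_len)
token_embeddings = rng.normal(size=(seq_len, d_model))
# As with sinusoidal encodings, the positional vectors are added elementwise.
transformer_input = token_embeddings + pe
```

Because the hidden state is computed sequentially, position t's encoding depends on all earlier steps, which is the recurrent inductive bias the fixed, position-wise sinusoidal encoding lacks.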