Polar codes were introduced by Arıkan in 2009 and have since received considerable attention in the communications field. Particularly noteworthy is their adoption as the control-channel coding scheme in the 5G standard in 2016. Their strong error-correction performance under Successive Cancellation (SC) decoding, together with moderate encoding and decoding complexity, makes polar codes suitable for a wide and still growing range of communication scenarios. Polar codes are decoded mainly with SC-based algorithms, but SC decoding suffers from long latency because of its bit-sequential decoding schedule. To address this latency, Simplified SC (SSC) and Fast SSC (FSSC) decoding, which are based on multi-bit decisions, have been introduced. Furthermore, the FSSC-based Fast-SC-Flip (FSCF) and Fast-SC-List (FSCL) algorithms have been proposed and are being actively studied to achieve better error-correction performance. This thesis presents algorithms and hardware architectures for fast polar decoders, that is, low-latency polar code decoders based on FSSC, FSCL, and FSCF. First, this thesis proposes a simplified control unit based on one-hot encoding for the FSSC decoder, along with a decoder architecture built on it. The proposed control unit requires approximately 71% less normalized area than the existing finite state machine (FSM)-based control unit, and the proposed decoder improves area efficiency by 63% compared with the latest FSSC-based decoder. Second, low-latency FSCF decoding and its hardware architecture are described. In particular, this thesis proposes a history memory that stores the intermediate results of FSCF decoding, along with an FSCF decoder architecture based on this memory. The proposed History-based FSCF (HFSCF) decoder doubles the worst-case throughput relative to an FSCF decoder that shares the same decoder architecture.
Third, this thesis describes an optimized sorting network for the FSCL decoder. In particular, it proposes a sorting network that performs the sorting operation of the FSCL decoder in a single clock cycle and compares the design results against existing sorting networks. The proposed sorting network uses up to 90% fewer compare-and-swap units (CASUs) and achieves up to 137% higher operating frequency than existing sorting networks.