KT trains smart speakers and customer call centres with NVIDIA AI


South Korean telecom provider KT uses the NVIDIA DGX SuperPOD platform and the NeMo Megatron framework to train large language models with billions of parameters.

In a statement, NVIDIA said GiGA Genie, an AI-powered speaker from KT, can operate TVs, provide real-time traffic reports, and carry out various other home assistance activities via voice commands. Thanks to large language models (LLMs), which are machine learning algorithms that can detect, analyse, forecast, and synthesise human languages based on enormous text datasets, GiGA Genie has perfected its conversational abilities in the highly complex Korean language.

KT uses the NVIDIA DGX SuperPOD data centre infrastructure and the NeMo Megatron platform to train and deploy LLMs with billions of parameters. 

The Korean language, written in the Hangul alphabet, frequently appears on lists of the most challenging languages to learn. Words are often made up of two or more roots, and the language has four forms of compound verbs.

According to NVIDIA, by building LLMs with roughly 40 billion parameters, KT enhanced the smart speaker’s interpretation of such phrases. GiGA Genie can also converse with users in English thanks to its integration with Amazon Alexa.

“With transformer-based models, we’ve achieved significant quality improvements for the GiGA Genie smart speaker, as well as our customer service platform AI Contact Center, or AICC,” KT LLM development team lead Hwijung Ryu said. 

AICC is a cloud-based platform that provides artificial intelligence (AI) voice agents and other customer service-related apps.

“LLMs enable GiGA Genie to gain better language understanding and generate more human-like sentences, and AICC to reduce consultation times by 15 seconds as it summarizes and classifies inquiry types more quickly,” Ryu stated. 

According to NVIDIA, creating LLMs can be time-consuming, expensive, and require full-stack technology investment. The company added that for KT, the process was sped up and simplified by the NVIDIA AI platform. 

“We trained our LLM models more effectively with NVIDIA DGX SuperPOD’s powerful performance — as well as NeMo Megatron’s optimized algorithms and 3D parallelism techniques. NeMo Megatron is continuously adopting new features, which is the biggest advantage we think it offers in improving our model accuracy,” Ryu said. 

Training KT’s LLMs required the use of 3D parallelism, a distributed training technique that partitions a very large-scale deep learning model across several devices. According to Ryu, the team was able to complete this assignment quickly and with the highest throughput, thanks to NeMo Megatron.
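The idea behind 3D parallelism can be sketched in a few lines of plain Python: devices are arranged on a grid with data-, tensor-, and pipeline-parallel axes, and the pipeline axis assigns each stage a contiguous slice of the model's layers. This is a conceptual illustration only, with invented function names; it is not NeMo Megatron's actual API.

```python
# Conceptual sketch of 3D parallelism (illustrative, NOT NeMo Megatron's API).
# Devices are arranged on a (data, tensor, pipeline) grid; each axis splits a
# different dimension of the training problem.
from itertools import product

def build_device_grid(n_devices, dp, tp, pp):
    """Assign device ranks to (data, tensor, pipeline) grid coordinates."""
    assert dp * tp * pp == n_devices, "grid must cover all devices exactly"
    return {coord: rank for rank, coord in
            enumerate(product(range(dp), range(tp), range(pp)))}

def shard_layers(n_layers, pp):
    """Pipeline parallelism: give each stage a contiguous slice of layers."""
    per_stage = n_layers // pp
    return {stage: list(range(stage * per_stage, (stage + 1) * per_stage))
            for stage in range(pp)}

# Example: 8 devices as 2-way data x 2-way tensor x 2-way pipeline parallelism
grid = build_device_grid(8, dp=2, tp=2, pp=2)
stages = shard_layers(n_layers=48, pp=2)
```

In a real framework the data axis replicates the model over input batches, the tensor axis splits individual weight matrices, and the pipeline axis splits the stack of layers, which is why a model too large for any single device can still be trained at high throughput.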

“We considered using other platforms, but it was difficult to find an alternative that provides full-stack environments — from the hardware level to the inference level. NVIDIA also provides exceptional expertise from product, engineering teams and more, so we easily solved several technical issues,” he stated. 

According to Ryu, the hyperparameter optimisation features in NeMo Megatron let KT train its LLMs twice as quickly as other frameworks allowed. NVIDIA said that these tools facilitate and accelerate development and deployment by automatically identifying the optimal configurations for LLM training and inference. 
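At its core, this kind of automated configuration search enumerates candidate training setups and keeps the one a cost model predicts will run fastest. The sketch below illustrates that idea only; the toy cost model and its numbers are made up, and NeMo's actual tooling is far more sophisticated.

```python
# Toy sketch of automated training-configuration search (illustrative only;
# the cost model and all numbers are invented, not NeMo Megatron's logic).

def estimated_throughput(cfg):
    """Made-up cost model: predicted samples/sec for a candidate config.
    Larger micro-batches help until parallelism overhead eats the gains."""
    mbs = cfg["micro_batch"]
    tp = cfg["tensor_parallel"]
    pp = cfg["pipeline_parallel"]
    return mbs * 100 / (1 + 0.1 * tp + 0.2 * pp)

# Enumerate a small grid of candidate configurations
candidates = [
    {"micro_batch": m, "tensor_parallel": t, "pipeline_parallel": p}
    for m in (1, 2, 4) for t in (1, 2) for p in (1, 2)
]

# Keep the configuration the cost model scores highest
best = max(candidates, key=estimated_throughput)
```

A production tool would measure real throughput on short benchmark runs rather than trust a static formula, but the search loop itself is this simple.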

Additionally, KT intends to employ NVIDIA Base Command Manager to conveniently monitor and control the hundreds of nodes in its AI cluster, as well as NVIDIA Triton Inference Server to provide an optimised real-time inference solution.

“Thanks to LLMs, KT can release competitive products faster than ever. We also believe that our technology can drive innovation from other companies, as it can be used to improve their value and create innovative products,” Ryu said.

According to NVIDIA, KT expects to offer developers more than 20 natural language understanding and generation APIs in November. The APIs can be used for tasks such as classifying and summarising documents, identifying emotions, and screening potentially offensive content.
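A call to a document-classification API of this kind might be assembled as below. This is purely hypothetical: KT has not published its interface, so the endpoint URL, payload fields, and task names here are all invented for illustration.

```python
# Hypothetical sketch of a request to a document-classification API.
# The endpoint, payload shape, and field names are invented, NOT KT's
# published interface.
import json

def build_classify_request(document,
                           endpoint="https://api.example.com/v1/classify"):
    """Assemble the pieces an HTTP classification call might need."""
    payload = {"document": document,
               "tasks": ["classification", "summary"]}  # hypothetical fields
    return {
        "method": "POST",
        "url": endpoint,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload, ensure_ascii=False),
    }

req = build_classify_request("배송이 지연되고 있어요.")  # "my delivery is delayed"
```

The real APIs would presumably also cover the emotion-identification and content-screening tasks the article mentions, each with its own request shape.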

