Scaling Artificial Intelligence using the Chapel Programming Language

A major challenge in using machine learning models such as LLMs or convolutional networks is scaling them to large or numerous inputs to accommodate business demand. While a single computer may be sufficient to process a handful of images or textual inputs, it will struggle to keep up with thousands of requests from a service's users. As a result, scaling machine learning models – using more computational resources to process larger or more numerous inputs – is crucial for their use in production.

The Chapel programming language is built from the ground up for productive parallel computing. This includes using multiple CPU cores, multiple computers ("nodes"), and, more recently, other devices such as GPUs to perform computations. The Chapel team at HPE Cray has been applying Chapel to the problem of scaling machine learning models. This led to the development of a PyTorch-like interface named ChAI (Chapel AI), which uses Chapel's parallel constructs to enable distributed training and inference of machine learning models.
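To give a flavor of what these constructs look like, the following is a minimal sketch in plain Chapel (not ChAI code; the array and loops are purely illustrative). A forall loop divides work among the cores of a single node, while coforall combined with an on clause runs a task on every node ("locale") of a job:

  config const n = 1000000;
  var a: [1..n] real;

  // Multicore: iterations of this loop are divided among the current node's cores.
  forall i in 1..n do
    a[i] = i * 2.0;

  // Multi-node: launch one task per locale, each running on its own node.
  coforall loc in Locales do on loc {
    writeln("Hello from locale ", here.id, " with ", here.numPUs(), " cores");
  }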

So far, this interface has only been applied to relatively simple models performing simple tasks (e.g., an MNIST handwritten digit classifier). The goal of this senior capstone project is to apply Chapel and ChAI to larger-scale models, such as convolutional models performing classification or segmentation, and language models such as BERT or, as a stretch goal, a LLaMa-like model. Language models can be used for a wide range of tasks, including personal assistants, writing and editing text, and summarization. Specifically, the team is interested in scaling models like these from a single computer (e.g., a personal laptop or desktop) to large-scale multi-node High Performance Computing (HPC) machines, and in examining various ways in which they can be distributed (replicating a model across many machines, or spreading a single model across the memories of many nodes). The team is also interested in the various applications of these models at scale. Students participating in the capstone project will gain experience with the inner workings of these models, including the layer types and mathematical operations that underpin them; they will also learn the fundamentals of distributed computing and the use of high-performance machines (clusters and supercomputers).
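As a rough illustration of the two distribution strategies mentioned above, here is another hedged sketch in plain Chapel (not ChAI's actual API; the weights array is hypothetical). A single set of model weights can be block-distributed so that each node's memory holds only a chunk of it, or every node can hold its own complete copy:

  use BlockDist;

  config const n = 8;

  // One model spread across nodes: a block-distributed array places
  // different chunks of the weights in different locales' memories.
  const D = blockDist.createDomain({1..n, 1..n});
  var weights: [D] real;
  forall (i, j) in D do
    weights[i, j] = here.id;  // each element is written on the locale that owns it

  // Replication: every locale allocates and uses its own full copy.
  coforall loc in Locales do on loc {
    var localWeights: [1..n, 1..n] real;
    // ... run this node's share of the inputs against localWeights ...
  }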

Objectives


  • Implement larger-scale language models using ChAI to perform tasks such as summarization, text generation, or conversation
  • Investigate and document various ways in which such models can be scaled to improve their speed or throughput, allowing more data to be processed more quickly
  • Demonstrate the scaling of the models by measuring performance on large systems such as the OSU HPC cluster

Motivations


Machine learning is a hugely popular area today; though language and diffusion models have gained the most public attention, a wide variety of machine learning algorithms are deployed in production. As mentioned in the project description, a key challenge in using machine learning models in production is scaling them: going from a limited number of inputs to the rates required for a popular user-facing application. The Chapel language's goal is to enable productive, scalable computing; improving the state of scalable AI falls squarely under that umbrella.

Qualifications


Minimum Qualifications:
  • Some knowledge of Python or C++

Preferred Qualifications:
  • Strong mathematical background in linear algebra or multivariable calculus
  • Experience with machine learning frameworks such as PyTorch
  • Familiarity with parallel computing concepts


Details


Project Partner:

Daniel Fedorin

NDA/IPA:

No Agreement Required

Number of Groups:

1

Project Status:

Accepting Applicants

Keywords:
C, Python, Artificial Intelligence (AI), Research, HPC, Machine Learning (ML)