Helping AI to learn about Indigenous cultures

By Alex V. Cipolle, The New York Times Company

In September 2021, Native American technology students in high school and college gathered at a conference in Phoenix and were asked to create photo tags — word associations, essentially — for a series of images.

One image showed ceremonial sage in a seashell; another, a black-and-white photograph circa 1884, showed hundreds of Native American children lined up in uniform outside the Carlisle Indian Industrial School, one of the most prominent boarding schools run by the U.S. government during the 19th and 20th centuries.

For the ceremonial sage, the students chose the words “sweetgrass,” “sage,” “sacred,” “medicine,” “protection” and “prayers.” They gave the photo of the boarding school tags with a different tone: “genocide,” “tragedy,” “cultural elimination,” “resiliency” and “Native children.”

The exercise was part of a workshop, “Teaching Heritage to Artificial Intelligence Through Storytelling,” at the annual conference of the American Indian Science and Engineering Society. The students were creating metadata that could train a photo recognition algorithm to understand the cultural meaning of an image.
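Captured as structured metadata, those annotations might look something like the minimal sketch below. The file paths, field names and output format are hypothetical illustrations, not the workshop’s actual tooling, but they show how culturally grounded tags become machine-readable training records.

```python
# A minimal sketch of culturally informed image annotations stored as
# labeled training records. Paths, field names and the JSON Lines
# format are assumptions for illustration, not the workshop's tooling.
import json

records = [
    {
        "image": "images/ceremonial_sage.jpg",       # hypothetical path
        "tags": ["sweetgrass", "sage", "sacred",
                 "medicine", "protection", "prayers"],
        "source": "AISES 2021 workshop participants",
    },
    {
        "image": "images/carlisle_school_1884.jpg",  # hypothetical path
        "tags": ["genocide", "tragedy", "cultural elimination",
                 "resiliency", "Native children"],
        "source": "AISES 2021 workshop participants",
    },
]

# One JSON object per line: a common interchange format for the
# labeled data that image recognition models are trained on.
with open("annotations.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```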

The workshop presenters — Chamisa Edmo, a technologist and citizen of the Navajo Nation, who is also Blackfeet and Shoshone-Bannock; Tracy Monteith, a senior Microsoft engineer and member of the Eastern Band of Cherokee Indians; and journalist Davar Ardalan — then compared these answers with those produced by a major image recognition app.

For the ceremonial sage, the app’s top tag was “plant,” but other tags included “ice cream” and “dessert.” The app tagged the school image with “human,” “crowd,” “audience” and “smile” — the last a particularly odd descriptor, given that few of the children are smiling.

The image recognition app botched its task, Monteith said, because it did not have proper training data. Edmo explained that tagging results are often “outlandish” and “offensive,” recalling how one app identified a Native American person wearing regalia as a bird. And yet, Ardalan noted, similar image recognition apps have easily identified a St. Patrick’s Day celebration because of the abundance of data on that topic.

As Monteith put it, AI is only as good as the data it is fed. And data on cultures that have long been marginalized, like Native ones, are simply not at the levels they need to be.

“Clearly, there’s a bias represented,” he said.

The workshop was the initiative of Intelligent Voices of Wisdom, or IVOW, a tech startup that Ardalan, an executive producer of audio at National Geographic, founded to preserve culture through AI and to counter those biases.

“The internet is not representative of the entire population, and when people are represented, it may not be accurate because of stereotypes and hate speech,” said Percy Liang, an associate professor of computer science at Stanford University and director of the school’s Center for Research on Foundation Models.

To counter this tendency, Ardalan, who is an Iranian American of Bakhtiari and Kurdish descent, wants IVOW to develop tools to create “cultural engines” for underrepresented groups so they can generate, and take ownership of, their data.

“The cultural engine cannot be a data scientist in Philadelphia trying to create data sets for a tribe in Arizona,” she said.

More representative, accurate data is beneficial not only to the groups it represents, but also to AI systems at large, said W. Victor H. Yarlott, an AI researcher at Florida International University, a member of the Crow Tribe of Montana and an IVOW collaborator.

“Lacking this knowledge just makes your system worse,” he said. “You’re not really representing human intelligence or human knowledge unless your system can handle it from a broad range of cultures.”

The participation of Indigenous people in the project was critical. Monteith, who led the effort to enter the Cherokee writing system into Microsoft Windows and Office, said he has worked on building trust for technology, and more recently AI, in his Native communities for decades.

“I knew without me doing this that we would be in a worse spot in terms of literacy, and our culture,” he said.

The team at IVOW, along with a group of volunteer collaborators and advisers, has been developing proofs of concept for these cultural engines — smart data sets that can feed more inclusive AI tools, including chatbots and image recognition apps.

One such tool is IVOW’s Indigenous Knowledge Graph, or IKG, a cultural engine in early development that is focused on storytelling about Indigenous recipes and culinary practices. After meeting the IVOW team in 2018, Yarlott pitched the IKG, a sort of visualization of a data set, to capture Indigenous knowledge.

“You know in dramas, you see the person trying to unravel a mystery and they have the corkboard and the little notes and the string between them?” Yarlott said. “That’s basically what the IKG is, but for cultural knowledge.”

The first step was to gather the data. The team chose a culinary focus because it is a part of life that all people share. They collected recipes and related stories from both the public domain and team members.

Monteith chose to enter the story of the Three Sisters stew, a recipe created from symbiotic crops (corn, beans and squash) that he said is known among Indigenous peoples wherever those ingredients grow. The story of the Three Sisters, he said, is not only a recipe but a way to teach sustainability practices, such as the preservation of water.

“It’s just a great metaphor for what we need to do as a society and as a people across the world,” Monteith said.

Using Neo4j, a graph database management system, the team broke the recipes down into components (title, ingredients, instructions and related stories) and tagged them with information such as the tribe of origin and whether the recipe was contemporary or historical, or had roots in folklore. This data set was then entered into Dialogflow, a natural language processing platform, so it could be fed into a chatbot: in this case, Sina Storyteller, the Siri-like conversational agent designed by IVOW. Anyone can interact with the early version through Google Assistant.
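As a concrete sketch of what that pipeline might look like, the short Python program below writes one recipe into Neo4j as a small graph. The node labels, relationship names, connection details and schema are assumptions made for illustration; they are not IVOW’s published code.

```python
# A minimal sketch of storing a recipe as a graph in Neo4j, in the
# spirit of the IKG. Labels, relationship names, credentials and the
# schema are illustrative assumptions, not IVOW's actual data model.
# Assumes the official `neo4j` Python driver (5.x) and a local server.
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # placeholder connection details
AUTH = ("neo4j", "password")

def add_recipe(tx, title, tribe, era, ingredients, story):
    # Break the recipe into linked nodes: the recipe itself, its
    # tribe of origin, each ingredient, and an associated story.
    tx.run(
        """
        MERGE (r:Recipe {title: $title, era: $era})
        MERGE (t:Tribe {name: $tribe})
        MERGE (r)-[:ORIGINATES_FROM]->(t)
        MERGE (s:Story {summary: $story})
        MERGE (r)-[:TOLD_THROUGH]->(s)
        WITH r
        UNWIND $ingredients AS ingredient
        MERGE (i:Ingredient {name: ingredient})
        MERGE (r)-[:USES]->(i)
        """,
        title=title, tribe=tribe, era=era,
        ingredients=ingredients, story=story,
    )

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        session.execute_write(
            add_recipe,
            title="Three Sisters stew",
            tribe="Cherokee",               # hypothetical tagging
            era="historical",
            ingredients=["corn", "beans", "squash"],
            story="The Three Sisters grow symbiotically and teach "
                  "sustainability practices such as preserving water.",
        )
```

A chatbot’s fulfillment layer, such as a Dialogflow webhook, could then answer a question like “tell me a recipe that uses corn” by running a Cypher MATCH over this graph, though that wiring, too, is an assumption about how the pieces might fit together.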

The tools and techniques used to create the IKG were designed to be basic enough that anyone, not just those with a background in computer science, could use them. And the IKG uses only information that is widely available or that team members had permission from their own tribes, bands and nations to use.

There are challenges, though. The process is labor intensive and expensive. IVOW is a self-funded enterprise, and the work of the collaborators is voluntary.

“It’s a little bit of a chicken and an egg problem because you need the data to really build a big system that demonstrates value,” Yarlott said. “But to get all the data, you need money, which only really starts to come when people realize that there’s substantial value here.”

Liang said that while this kind of “artisanal” data is important, it is difficult to scale, and that more emphasis should be placed on improving foundation models, which are trained on large-scale data sets.

For years, computer scientists have warned Ardalan that cultivating this sort of data is a tedious process. She does not disagree, which is why she says the time to start is now.

“The future is going to be these cultural engines that communities create that are relevant to their heritage,” she said, adding that the notion that AI will be all-encompassing is wrong. “Machines cannot replace humans. They can only be there with us around the campfire and inform us.”

This article originally appeared in The New York Times.
