Privacy fears as report reveals MILLIONS of online photos are being used to train facial recognition AI without users’ consent
- Researchers are scraping millions of online photos without users’ knowledge
- IBM in particular took photos from Flickr to build its ‘Diversity in Faces’ database
- Users were shocked to discover their photos were scraped, NBC News found
- The dubious practice has raised privacy and ethical concerns among experts
Many facial recognition systems are being trained using millions of online photos uploaded by everyday people and, more often than not, the photos are being taken without users’ consent, an NBC News investigation has found.
In one worrying case, IBM scraped almost a million photos from unsuspecting users on Flickr to build its facial recognition database.
The practice not only raises privacy concerns, but also fuels fears that the systems could one day be used to disproportionately target minorities.
IBM’s database, called ‘Diversity in Faces,’ was released in January as part of the company’s efforts to ‘advance the study of fairness and accuracy in facial recognition technology.’
The database was released following a study from MIT Media Lab researcher Joy Buolamwini, which found that popular facial recognition services from Microsoft, IBM and Face++ vary in accuracy based on gender and race.
The Diversity in Faces dataset is based on 100 million images published with Creative Commons licenses, which allow anyone to reuse the photos without paying a licensing fee.
However, only academic or corporate research groups can request access to the Diversity in Faces database, according to NBC News.
The website was only able to view the contents of IBM’s database after obtaining it from a source.
Once the photos are collected, they’re then tagged by age, measurements of facial attributes, skin tone, gender and other characteristics.
Many photographers were surprised to find their photos had been used to train IBM’s algorithms.
‘None of the people I photographed had any idea their images were being used in this way,’ Greg Peverill-Conti, who had 700 of his photos used in the dataset, told NBC News.
‘It seems a little sketchy that IBM can use these pictures without saying anything to anybody.’
IBM defended the database, saying that it helps ensure fairness in facial recognition technology and promised to protect ‘the privacy of individuals.’
‘For the facial recognition systems to perform as desired, and the outcomes to become increasingly accurate, training data must be diverse and offer a breadth of coverage,’ John Smith, manager of AI Tech for IBM Research, wrote in the blog post announcing Diversity in Faces’ launch.
The firm has also argued that the dataset wouldn’t be used for its commercial products; instead, it will only be used for research purposes.
Mail Online has reached out to IBM representatives for comment.
IBM told NBC News it would assist anyone who wanted their photos removed from the training dataset.
HOW DO RESEARCHERS DETERMINE IF AN AI IS ‘RACIST’?
In a new study titled Gender Shades, a team of researchers discovered that popular facial recognition services from Microsoft, IBM and Face++ can discriminate based on gender and race
The data set was made up of 1,270 photos of parliamentarians from three African nations and three Nordic countries where women held a large share of parliamentary seats
The faces were selected to represent a broad range of human skin tones, using a labeling system developed by dermatologists, called the Fitzpatrick scale
All three services worked better on white, male faces and had the highest error rates on dark-skinned males and females
Microsoft was unable to detect darker-skinned females 21% of the time, while IBM and Face++ wouldn’t work on darker-skinned females in roughly 35% of cases
The study tried to find out whether Microsoft, IBM and Face++’s facial recognition systems were discriminating based on gender and race. Researchers found that Microsoft’s systems were unable to correctly identify darker-skinned females 21% of the time, while IBM and Face++ had an error rate of about 35%
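The comparison described above comes down to computing an error rate separately for each demographic group. A minimal sketch of that arithmetic follows, assuming each test photo carries a group label, a known correct answer and the answer a service returned. The function name and the toy records are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return the share of wrong answers per group.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples. This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy records, not real study data.
sample = [
    ("lighter-skinned male", "male", "male"),
    ("lighter-skinned male", "male", "male"),
    ("darker-skinned female", "female", "male"),
    ("darker-skinned female", "female", "female"),
]

rates = error_rates_by_group(sample)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} error rate")
```

Reporting the rate per group, rather than one overall figure, is what exposes a gap like the one the researchers described: a service can look accurate on average while failing far more often on one group.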
Despite this, NBC News found that it was almost impossible for users to prevent their photos from being used.
To request removal, photographers have to email IBM with links to each photo they want taken down.
But the contents of the database aren’t publicly available, so it’s extremely difficult for photographers to know which of their photos have been swept up in the database.
Flickr users whose photos were scraped voiced concerns about the database.
‘Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy,’ Georg Holzer, whose photos were used, told NBC News.
‘I can never approve or accept the widespread use of such a technology.
‘Since I assume that IBM is not a charitable organization and at the end of the day wants to make money with this technology, this is clearly a commercial use,’ he added.
Additionally, experts pointed out that IBM isn’t the only organization potentially using users’ photos without their permission.
The proliferation of content supplied to social networking sites like Facebook, Google, YouTube and others has made it that much easier for researchers to find data for their studies.
‘This is the dirty little secret of AI training sets,’ Jason Schultz, a professor at the NYU School of Law, told NBC News.
‘Researchers often just grab whatever images are available in the wild.’
It comes as tech giants ranging from Amazon to Microsoft have faced growing scrutiny from human rights and privacy advocates over their facial recognition software.
Amazon, in particular, has dealt with pushback over its decision to sell its ‘Rekognition’ software to government agencies.
HOW DOES FACIAL RECOGNITION TECHNOLOGY WORK?
Facial recognition software works by matching real time images to a previous photograph of a person.
Each face has approximately 80 unique nodal points across the eyes, nose, cheeks and mouth which distinguish one person from another.
A digital video camera measures the distance between various points on the human face, such as the width of the nose, depth of the eye sockets, distance between the eyes and shape of the jawline.
A different smart surveillance system (pictured), which can scan 2 billion faces within seconds, has been revealed in China. The system connects to millions of CCTV cameras and uses artificial intelligence to pick out targets. The military is working on applying a similar version of this with AI to track people across the country.
This produces a unique numerical code that can then be linked with a matching code gleaned from a previous photograph.
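The idea of turning facial measurements into a numerical code and comparing it with a stored one can be sketched in a few lines. This is a toy illustration only: the five landmark names, the pixel coordinates and the 0.1 threshold are assumptions, and production systems typically rely on learned models rather than hand-measured distances.

```python
import math
from itertools import combinations


def face_code(landmarks):
    """Build a numerical code from pairwise distances between landmarks.

    `landmarks` maps a point name to (x, y) pixel coordinates. Distances
    are divided by the distance between the eyes so the code does not
    change when the same face appears larger or smaller in the frame.
    """
    names = sorted(landmarks)
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    return [
        math.dist(landmarks[a], landmarks[b]) / scale
        for a, b in combinations(names, 2)
    ]


def is_match(code_a, code_b, threshold=0.1):
    """Treat two codes as the same face if they are close enough.

    The threshold is a hypothetical value chosen for this example.
    """
    return math.dist(code_a, code_b) < threshold


# Hypothetical landmark coordinates, not real measurements.
stored = {
    "left_eye": (30, 40), "right_eye": (70, 40),
    "nose_tip": (50, 60), "mouth_center": (50, 80), "chin": (50, 100),
}
# The same face photographed at twice the size.
live = {name: (2 * x, 2 * y) for name, (x, y) in stored.items()}
# A face with different proportions.
other = {
    "left_eye": (30, 40), "right_eye": (70, 40),
    "nose_tip": (50, 70), "mouth_center": (50, 95), "chin": (50, 125),
}

print(is_match(face_code(stored), face_code(live)))
print(is_match(face_code(stored), face_code(other)))
```

Dividing by the distance between the eyes is one simple way to make the code independent of how close the person stands to the camera. It does not handle head rotation or changes in expression.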
A facial recognition system used by officials in China connects to millions of CCTV cameras and uses artificial intelligence to pick out targets.
Experts believe that facial recognition technology will soon overtake fingerprint technology as the most effective way to identify people.