We are excited to announce the launch of our Vision Language Demo, developed in collaboration with our long-time partners at the University of Massachusetts Amherst, the University of Edinburgh, University College London and MIT, with generous support from Microsoft AI for Earth. This demo lets you search a snapshot of 10 million iNaturalist photos using text queries. For instance, typing in "a bird eating fruit" will return matching photos ranked by their relevance to your query.
By clicking the “View these observations in Identify” button at the bottom, you can open these photos in the iNaturalist Identify tool, where you can add the observations to projects or add observation fields and annotations. We are excited to learn whether you find this tool useful for finding observations of different life stages (“a caterpillar”), plant phenology (“a cluster of red berries on a leafy green branch”), captive/cultivated organisms (“a houseplant in a pot”), and so on, and for organizing them into projects and with annotations.
Unlike the iNaturalist Computer Vision Model and Geomodel, which we train ourselves on iNaturalist observations, we did not train this model, and it was not trained on iNaturalist data. This demo is built on a freely available Vision Language Model that was trained on millions of captioned images not necessarily related to the natural world. This means that it knows about other things in addition to living organisms (e.g. "a bird perched on a car"), but it also means that it currently has biases and may return inappropriate or offensive results that we don’t fully understand. Please keep that in mind when using it.
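For readers curious how this kind of text search can work in general: many openly available Vision Language Models (CLIP-style models, for example) map both images and text into a shared embedding space, so a query can be answered by ranking photos by how similar their embeddings are to the query's embedding. The exact setup behind this demo isn't covered here, so treat the Python below as a minimal sketch under that assumption. The toy three-number vectors stand in for real model embeddings, and `rank_photos` is a hypothetical helper, not part of any iNaturalist API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_photos(query_embedding, photo_embeddings, top_k=3):
    """Return photo IDs sorted by similarity to the query embedding, best match first.

    Assumes (hypothetically) that text and images were embedded into the same space.
    """
    ranked = sorted(
        photo_embeddings,
        key=lambda photo_id: cosine(query_embedding, photo_embeddings[photo_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up embeddings: imagine `query` came from the text "a bird eating fruit".
query = [1.0, 0.0, 0.0]
photos = {
    "obs_1": [0.9, 0.1, 0.0],  # very close to the query
    "obs_2": [0.0, 1.0, 0.0],  # unrelated
    "obs_3": [0.7, 0.0, 0.7],  # partially related
}
print(rank_photos(query, photos, top_k=2))  # ['obs_1', 'obs_3']
```

In a real system the photo embeddings would be computed once ahead of time, so each search only needs to embed the short text query and compare it against the stored vectors.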
You can help us and our research collaborators understand how this model (or other Vision Language Models we may explore or build) performs by clicking the “Help us Improve” button. By marking the photos on the page as relevant or not relevant to your search (e.g. "Mating dragonflies") and clicking Submit, you will help us compare the performance of different Vision Language Models at this image retrieval task.
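As a rough illustration of how relevant/not-relevant marks can be turned into a comparison, one standard retrieval score is precision at k: the share of the top k results that were marked relevant. Which metric the researchers will actually use isn't specified here, so this is only a sketch; the observation IDs, the two model rankings, and the `precision_at_k` helper are all hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k returned photos that a user marked as relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    hits = sum(1 for photo_id in top if photo_id in relevant_ids)
    return hits / len(top)


# Toy feedback for one query, e.g. "mating dragonflies" (made-up IDs).
marked_relevant = {"obs_a", "obs_c"}

# Two hypothetical models return the same four photos in different orders.
model_a_ranking = ["obs_a", "obs_b", "obs_c", "obs_d"]
model_b_ranking = ["obs_c", "obs_a", "obs_b", "obs_d"]

print(precision_at_k(model_a_ranking, marked_relevant, k=2))  # 0.5
print(precision_at_k(model_b_ranking, marked_relevant, k=2))  # 1.0
```

Averaging a score like this over many searches gives a simple way to see which model tends to put relevant photos nearer the top.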
We built this demo to better understand the potential of Vision Language Models to help the community organize, explore, and explain the information contained within iNaturalist images. Building it has helped us understand the opportunities and challenges associated with this new technology. For example, while these models sometimes show a surprising ability to describe what is happening in an image at a coarse level, they fail to grasp finer-grained concepts such as species names.
Two exciting possible future avenues are:
1. Helping to explore and organize iNaturalist images
iNaturalist data have been used in more than 4,900 scientific publications. While many scientific applications rely on qualities of the data that are already easy to filter (species, location, date, etc.), an increasing number of studies are leveraging “secondary data” contained within the images themselves, ranging from species interactions, to animal behavior, to phenotypic patterns such as color. Here are some examples of recently published studies that resulted from pulling patterns out of iNaturalist images:
Conservation: Recovery plans for the endangered Red-bellied Macaw were premised on the belief that the parrot relies on fruits from a single species of palm for food. Silva and colleagues examined iNaturalist images of this parrot eating fruit and found that it has a much more diverse diet than previously thought.
Climate Change: To reveal how plants are adapting to climate change, Fukano and colleagues examined iNaturalist images of wood sorrels from around the world and found that leaf color is evolving to become redder in urban heat islands.
Animal Behavior: Jagiello and colleagues examined thousands of iNaturalist images of hermit crabs and found that they are increasingly using lighter-weight plastic trash in lieu of shells. This study reveals how some animals are able to alter their behavior to take advantage of the conditions of the Anthropocene, with resulting impacts on their ecosystems.
Evolution: Most mammals are thought to have brown eyes. Tabin and Chiasson examined iNaturalist images to test this. They found an exception in the cat branch of the family tree, where eye color is extremely variable, and explored the role that sexual selection plays. This paper was covered by Science magazine.
Mimicry: Muñoz-Amezcua and colleagues used computer vision models to examine iNaturalist images and found that many more insects mimic spiders than previously thought. This study shows that, in addition to surfacing patterns the human eye can detect more efficiently (e.g. cats with blue eyes), vision models can also reveal patterns that have so far gone unnoticed (e.g. moths that resemble jumping spiders).
We’re very excited to explore whether Vision Language Models can make it easier to explore and organize the rich data contained within iNaturalist images.
2. Explaining Computer Vision species identifications
As anyone using tools such as ChatGPT knows, multimodal Vision Language Models can help explain images in a way that complements more traditional Computer Vision systems. The iNaturalist Computer Vision AI does a great job of telling us what species is in a photo, but it doesn’t do a great job of explaining why that species is suggested.
Offering explanations is something the identifier community does quite well by sharing expertise in text remarks (e.g. “This is Striped Rocket Frog and not Rainforest Rocket Frog because the white stripe extends from the eye to above the leg rather than to the groin.”). We’re interested in building Vision Language Models trained on iNaturalist images and remarks that will help iNaturalist users understand why the Computer Vision AI is suggesting certain species and how to distinguish between them.
Deeply integrating Vision Language Models into iNaturalist is still far off and will require new funding opportunities and lots of product and engineering work. But we are very excited to share this small milestone on that journey. Please share your feedback on this exciting new demo!