Using the Geomodel to Highlight Unusual Observations

When we first introduced the Geomodel, we mentioned its potential to help surface unusual observations. Today, we’re thrilled to share the first step in realizing that vision. Internally, we've been calling this the "Anomaly Detector."

For the more than 90,000 species included in the Geomodel, like the Joro Spider shown below, we’ve now ranked observations by their “unusualness” using relative Geo scores. The Joro Spider, native to Asia, is rapidly spreading across the southeastern US.

On the unthresholded Geomodel map below, observations (orange points) in lighter blue areas, where Geo scores are lower, are considered more unusual. For instance, the most unusual Joro Spider sighting is in Oklahoma, followed by two observations in Boston—where they’ve recently made headlines.

You can now use new filters in the Identify tool to search for these unusual observations. Here’s how:

  1. Enable Research Grade: The Identify tool defaults to showing Needs ID observations only, so toggle on Research Grade observations first.
  2. Sort by Geo Score: Set the Sort By option to the new Geo score (Ascending) filter.
  3. Exclude Private Locations: Use the Hide observations with private locations option to exclude those records.
  4. Not Expected Nearby: To focus on truly unusual sightings, select the Not expected nearby option, which displays observations falling outside the blue “Expected Nearby” area on the thresholded Geomodel map.
  5. Refine Accuracy: To exclude observations with imprecise locations, enter a number in meters in the Maximum positional accuracy filter.
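The same filters can also be expressed as URL query parameters. The Python sketch below builds such a URL. The parameter names (`quality_grade`, `order_by`, `order`, `expected_nearby`, `acc_below_or_unknown`, `place_id`) are taken from URLs that readers shared in the comments on this post, not from official documentation, so treat them as assumptions. The mapping of `acc_below_or_unknown` to the Maximum positional accuracy filter is likewise inferred. The private-locations filter is left out because its parameter name does not appear anywhere in this post. The function name `unusual_observations_url` is hypothetical.

```python
from urllib.parse import urlencode

# Identify page path as it appears in URLs shared in the comments.
BASE_URL = "https://www.inaturalist.org/observations/identify"


def unusual_observations_url(max_accuracy_m=None, place_id=None):
    """Build an Identify URL that lists the lowest Geo scores first."""
    params = {
        "quality_grade": "needs_id,research",  # step 1: include Research Grade
        "order_by": "geo_score",               # step 2: sort by Geo score...
        "order": "asc",                        # ...ascending (most unusual first)
        "expected_nearby": "false",            # step 4: Not expected nearby
    }
    if max_accuracy_m is not None:
        # Step 5 (assumed mapping): maximum positional accuracy in meters.
        params["acc_below_or_unknown"] = str(max_accuracy_m)
    if place_id is not None:
        params["place_id"] = str(place_id)
    # Keep commas unescaped so the URL matches the form seen in shared links.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(unusual_observations_url(max_accuracy_m=5000, place_id=94045))
```

Calling the function with no arguments gives the broadest version of the search; the optional arguments narrow it to a place or to precisely located observations.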

Understanding Unusual Observations

Unusual observations tend to fall into two categories: errors (e.g., Misidentifications or Inaccurate locations) and interesting discoveries (e.g., Joro Spiders arriving in Boston). Right now, most of the surfaced observations fall into the error category.

Please use this tool to correct these mistakes!

For Inaccurate Locations: Kindly ask the observer to double-check the location by leaving a polite comment and voting "No" on "Location is Accurate."

For Misidentifications: Submit a correcting identification or a disagreeing ancestor ID.


However, not all unusual observations are mistakes. As with the Joro Spider in Boston, some may represent important findings, so it’s crucial to carefully evaluate each case. When in doubt, politely engage the observer and the community for clarification.

The Limitations of the Geomodel

It’s important to note that the Geomodel isn’t perfect. For example, even though there’s a well-documented introduced population of Italian Cave Salamanders in Germany (with over 60 observations), the Geomodel hasn’t yet learned about this range and flags these observations as “Not Expected Nearby” with low Geo scores. We’re actively working to improve the Geomodel’s predictive accuracy, but keep in mind that its current limitations can affect how observations are surfaced.

Looking Ahead

We hope this tool will shine a spotlight on unusual observations in iNaturalist. In the short term, it may mostly help flag inaccurate locations and fix misidentifications—critical work that enhances the quality of the iNaturalist dataset. But in the long run, we believe this feature could evolve into an exciting and powerful Early Detection System for iNaturalist, allowing us to more rapidly surface important discoveries, like invasive Joro Spiders in new areas, which could ultimately help conservationists and invasive species managers respond more effectively.

Thank you to everyone who helps improve the data quality and surface exciting discoveries!

Posted on October 2, 2024 11:56 PM by loarie

Comments

The inaccuracies of this new feature worry me. They will actively decrease the geo score when it’s supposed to be high.

Posted by thescientificbeast 3 days ago

Thanks for adding this! I'm interested to see what anomalies it shows up.

Posted by rupertclayton 3 days ago

Very cool idea, I think this will be useful.

Posted by operculum_ben 3 days ago

Thank you, I've been waiting for this feature ever since you teased it in the geomodel release. I just played with it for a few minutes using Hawaiian plants and it's correctly identifying a bunch of the plants I found for the first time in Hawai'i as anomalous, but then also a bunch of Stapelia for some reason, despite that having hundreds of observations in Hawai'i.

Posted by kevinfaccenda 3 days ago

Only now did I realize there are several filters only available at Identify and not at Observations!

Posted by pladacryptus_wand... 3 days ago

awesome, nice work!

Posted by alecc 3 days ago

fantastic tool, makes it so much easier to find the low-hanging fruit errors and easily clean them up

Posted by thebeachcomber 3 days ago

I can already tell this will be a very useful tool for finding misidentified Asilidae. Great work!

Posted by myelaphus 3 days ago

Nice! Might be less time consuming than typing in species and zooming out on the map.

Posted by jhousephotos 3 days ago

well done, iNat!

Posted by diegoalmendras 3 days ago

Very nice!

Posted by pnwcoasthiker 3 days ago

Fan-damn-tastic!

Posted by gcwarbler 3 days ago

@gcwarbler https://www.inaturalist.org/taxa/904334-Trichonephila-clavata --> About tab --> 'Learn more about the Geomodel here.' right hand side

Posted by thebeachcomber 3 days ago

On the "About" tab on the lower right where it says "The "Expected Nearby" label is derived from the Geomodel. Learn more about the Geomodel here."

Posted by loarie 3 days ago

Got it, thank you. Answered my own question and then deleted the comment. BTW, that link reads like it points to a general discussion of the Geomodel, not the species specific output. Hmm...

Posted by gcwarbler 3 days ago

Awesome idea! We should be cautious about using past data for future predictions in a changing world, but it seems like this was taken into account and in fact will help highlight what those changes are. Will be interesting to see how usage evolves—so far it’s successfully helped me catch a few buried errors in Found Feathers 👌

Posted by featherenthusiast 3 days ago

Brilliant!! In my quick experience, it appears that the "error" category might also include not-wild organisms that haven't been marked as such yet.

I would love to see a crossover analysis with the accuracy experiments. For example, what % of the observations that were rated as incorrect are also "not expected nearby"? When I just looked at the top 20 most unusual observations in Alaska, I believe that 90% were likely errors with 10% (n=2) possibly interesting discoveries.

Posted by muir 3 days ago

This'll be handy, alright... thanks for adding it! :)

Posted by pinefrog 3 days ago

That's a great tool! Here is the translated Taiwan's traditional Chinese version: https://taiwan.inaturalist.org/blog/99763-

Posted by mutolisp 3 days ago

"Exclude Private Locations: Use the Hide observations with private locations option to exclude those records."

Does that option only apply here in this context?
Or can we now choose to hide private location obs across our iNat searches?

(PS I can apply the filter, but it didn't make a difference to that particular search)

Posted by dianastuder 3 days ago

This looks to be an extremely useful tool - thanks for adding it to the tool box!

Posted by jakob 3 days ago

Wonderful tool - thank you!

Posted by lynnharper 3 days ago

This is great. It is much more efficient than my previous method for finding out-of-range IDs that need correcting (which involved using the species view in Explore and scrolling all the way down to the bottom to check for outliers)

Posted by spiphany 3 days ago

Great tool! Thanks!

Posted by texas_nature_family 3 days ago

Hmm... Did a No Man's Sky player name this new feature?

Posted by roblengacher 3 days ago

This is extremely useful, thank you!

Posted by cigazze 3 days ago

I agree -- this tool is supremely helpful! I've been scrolling through some clearly mis-ID'ed observations in my region of interest. In several cases, it's simply a cultivated plant that hasn't been marked as such.

Spectacular job on this, iNat team!!!

Posted by sambiology 3 days ago

YES !!!!!!! THIS IS SO COOL

Posted by humanbyweight 3 days ago

I spent some time with it last night and find it very useful. I generally used a maximum positional accuracy of 10,000. That seems to be the easiest resolution to work with generally for Euphorbia sect. Anisophyllum. I still get plenty of less interesting observations, but it provides a workable ratio of false positives to true positives. The optimal maximum positional accuracy differs a lot by species, though. Some initial observations:

It's best at finding misidentifications of species with distributions that don't fluctuate much. I.e., the less the ranges fluctuate, the smaller the maximum positional accuracy that can usefully be applied.

The accuracy of this model is based heavily on the accuracy of the geomodel that underlies it. I.e., the more accurate the geomodel, the smaller maximum positional accuracy can usefully be applied.

As explained above, the geomodels struggle a bit with disjunctions, especially those that are undersampled compared to other parts of the range. This causes the model to overpredict in locations between the disjunct populations and underpredict in the areas with fewer observations (see example here). In theory, this means that curating species with disjunct populations might require a different, more complex strategy.

Except for the most obvious cases, it really struggles with widespread weedy species. This is almost certainly due to fluctuating ranges. If urban areas could be added to the geomodels, perhaps this could be accounted for? Ultimately though, the more weedy the species, the larger the maximum positional accuracy should be for filtering.

Posted by nathantaylor 3 days ago

What about known invasive or naturalised populations?
https://www.inaturalist.org/observations/10786735

Why is this an 'anomaly' ? Vulnerable sp from the Cape Peninsula.
https://www.inaturalist.org/observations/18259187

Posted by dianastuder 3 days ago

very excited to make use of this tool!

Posted by pirarucu 3 days ago

Can this search be added to Explore too, and not just Identify?

Posted by raymie 3 days ago

I'd love to use this tool to highlight unusual records that could be invasive species that could be a threat to Alberta and other regions.

Posted by mothmaniac 3 days ago

Amazing tool!!! Tried it out and the first page was almost entirely mis-IDs and a few unmarked captives; maybe just one or two both correct and wild. I bet this will do a ton to help clean up the inat dataset!

Posted by wildskyflower 3 days ago

@dianastuder: There seems to be a problem with the geomodel data for Serruria cyanoides. Maybe that's causing some unexceptional observations to show up as anomalies.

Posted by rupertclayton 3 days ago

@dianastuder: - it is simple. There is a maximum grain size used in the model. A hexagon approximately 50km across.
Well, most of the Serruria cyanoides records occur in one hexagon (110 out of 120), and the other 10 are in another hexagon.
So the model predicts those in the higher hexagon, but not those in the lower hexagon.
You cannot expect any real predictions from the model based on only two points.

Unfortunately we can expect at least 10-20% of our Cape Flora species to only fit in a single point, and thus produce anomalous results. You chose Serruria cyanoides (https://www.inaturalist.org/geo_model/594454/explain) but the same will apply to Serruria hirsuta (https://www.inaturalist.org/geo_model/594464/explain) and compare this with Serruria florida (https://www.inaturalist.org/geo_model/503953/explain) which also occurs in a single cell, but is widely planted (I reviewed and marked several planted as captive...)

If we want to meaningfully map and predict plant species in the Cape Flora, we will need a grain size of a minimum of 15km, and ideally 5km. Unfortunately, getting people to map at a resolution of less than 2.5km radius is not easy.

Posted by tonyrebelo 2 days ago

Thanks @rupertclayton. We fixed the issue causing the Geomodel page not to load for that taxon.

Posted by pleary 2 days ago

"Only now did I realize there are several filters only available at Identify and not at Observations!"

"Can this search be added to Explore too, and not just Identify?"

@pladacryptus_wandatus @raymie you can copy the query string parameters from an /identify page into an /observations page, e.g. https://www.inaturalist.org/observations?subview=map&user_id=sessilefielder&place_id=any&quality_grade=needs_id,research&order=asc&order_by=geo_score&expected_nearby=false
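A minimal Python sketch of that copy step, assuming the Identify page lives at `/observations/identify` (as in other URLs shared in this thread) and that the same query parameters are accepted on `/observations`. The function name `identify_to_observations` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

IDENTIFY_SUFFIX = "/observations/identify"


def identify_to_observations(url):
    """Return the same URL with the Identify path swapped for /observations."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(IDENTIFY_SUFFIX):
        raise ValueError("not an Identify URL: " + url)
    # Drop the trailing "/identify"; the query string is carried over untouched.
    new_path = path[: -len("/identify")]
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


print(identify_to_observations(
    "https://www.inaturalist.org/observations/identify"
    "?order=asc&order_by=geo_score&expected_nearby=false"
))
```

Only the path changes; every filter in the query string is preserved exactly as typed.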

On that note, the tool is cool but the results (at least for my own observations, in the link above) seem unintuitive. My "most unusual" observation is this California barnacle, because the geomodel excludes, by a matter of meters, the hexagon including the beach where it was found. I realize the geomodel is a work in progress, but I find it unexpected that a one-tile miss would be ranked so highly.

In contrast, this white wagtail vagrant ranks in the lower half of my most unusual observations (least most unusual?), despite actually being, well, unusual; the nearest tile for the species in the geomodel is hundreds of miles north. Is it weighted by number of nearby observations even when all of those observations are "unexpected", i.e. the tile has not yet been added to the geomodel?

It seems like there would be a bias introduced, especially for charismatic and popular taxonomic categories like birds, when a single vagrant individual pops up and prompts a flurry of observations. How do you distinguish these cases from those of the salamanders mentioned above, i.e. "there is a highly observed transient individual" vs. "there is a small, infrequently observed established population"? Introduce some temporal clustering quality?

Posted by sessilefielder 2 days ago

Is it possible to use the map view as depicted yet? There's no map view option that I'm seeing anywhere on the page with these options selected as of yet

Posted by xaxy13 2 days ago

Interesting!

Posted by krechmer 2 days ago

This is huge, not just for improving data quality but also for detecting potential rarities or odd out of range observations. Can't wait to try it out.

Posted by radrat 2 days ago

Agree with the comments above that adding the geomodel to the filters in Explore would be super useful. Here is a fun list of mega rarities and uncommon exotic birds reported over the years in the Greater San Francisco / Monterey Bay Area https://www.inaturalist.org/observations?acc_below_or_unknown=5000&expected_nearby=false&iconic_taxa=Aves&order=asc&order_by=geo_score&place_id=94045&quality_grade=needs_id,research&view=species

Posted by radrat 2 days ago

This is excellent, thanks! It should help greatly in reducing errors. My method for doing this is whenever I have cause to look at a species map, check the map for outliers and open them individually. This will be much more efficient

Posted by deboas 2 days ago

This is really nice.. tried it out on some super commonly misIDed species and it works great

Posted by ajott 2 days ago

Cool

Posted by ck2az 1 day ago

I'm very interested in using this tool to more easily find and remedy erroneous Research Grade observations.

Posted by hannadv 1 day ago

Regarding cultivated observations: if I remember correctly, the geomodel is agnostic as to whether an observation is cultivated or not. So cultivated observations should not necessarily be more likely to show up as anomalous, as long as their presence in the area is widespread and well-documented. However, singular plants in botanical gardens, isolated house plants, and similar would be labeled anomalous. Another thing to keep in mind is that cultivated observations are subject to far less curation, and are more likely to be misidentified. So observations that are showing as highly anomalous may still be worth extra scrutiny, even if they appear to be or are marked cultivated. There is a widespread problem of people who mark observations as cultivated based on an incorrect CV ID without first questioning whether that ID is even correct. I hope this tool will give us another way to find and correct such lost observations.

Posted by alexbinck 1 day ago

This is a great feature that is extremely useful. However, if an anomalous observation with a species-level ID gets bumped to family level (for example) by identifiers who are eager to correct the obviously erroneous initial species-level ID, it would just get lost in the huge pile of observations that are at family-level (assuming the family or coarser taxon has a wide distribution).

So yes, this feature is really useful for quickly detecting and removing erroneous observations, but not so much for actually nailing the finest possible ID, unless the right identifier(s) with the requisite knowledge happen to see it first.

Posted by atronox 1 day ago

Is that any different from someone uploading an observation with an initial broad taxon label applied?

Posted by pladacryptus_wand... 1 day ago

"However, if an anomalous observation with a species-level ID gets bumped to family level (for example) by identifiers who are eager to correct the obviously erroneous initial species-level ID"

Generally people checking for out-of-range IDs should have enough knowledge to assess whether it is plausible (just because it is out of range doesn't mean it is wrong).

People already push back observations with obviously wrong or out-of-range IDs to whatever level they are confident of; the new feature does not change this. If they are pushing it back to family level and not something broader, there is a reasonably good chance that it will be seen by an identifier with relevant knowledge.

Posted by spiphany about 23 hours ago

It makes a difference if the observation in question is one that is readily identifiable by users who possess the requisite knowledge and experience (I'm thinking of taxa that are fairly common or distinctive to those users). If that gets bumped to a coarser ID, its identification would be delayed, although that is not necessarily a bad thing.

Posted by atronox about 23 hours ago

"However, if an anomalous observation with a species-level ID gets bumped to family level"

Family level or lower is fine. Any higher rank and it would probably be better to leave the ID incorrect than to deprecate it. Just move it out of RG level to some rank below family. Sixty percent of the time, a specialist should be able to correctly assign it to a similar family, where it will be useful. But higher ranks are likely to be lost for much of the tropics and less resourced regions. Although, thankfully the AI is rescuing lots of these (thanks @jeanphilippeb).

Posted by tonyrebelo about 22 hours ago

First 2 for the Cape Peninsula were Sonoran then Texan. So I can use it to rescue - out of range, please check distribution in future.

Posted by dianastuder about 21 hours ago

Why do these discussions always turn into claims that people who are taking the time to provide IDs and clean up wrong information are somehow hindering the ID process?

If an observation is mis-ID'd it needs to be corrected. A broader ID is an improvement over a wrong out-of-range ID. Ideally people would provide correct, specific IDs from the beginning, but that doesn't always happen. When it doesn't happen, the important thing is that the ID gets moved in the right direction. I fail to see how an ID that moves the process forward is detrimental, even if this may not always be as fast as we would like.

Common and distinctive taxa are more likely to be correctly recognized by the CV and therefore less likely to be ID'd incorrectly in the first place, so suggesting that cleaning up out-of-range observations could be counterproductive because there might be some expert who could ID it correctly from the start seems to me to be something of a red herring.

As I said, people reviewing out-of-range observations should have enough knowledge to assess whether the ID is plausible; disagreeing with an ID merely because the geomodel says it is unlikely is no more responsible than uncritically using CV suggestions. It follows that people looking at out-of-range observations will likely have some expertise and therefore be able to suggest a suitable alternative -- at least if the wrong ID is for something that is a closely related taxon.

If the ID is so wildly inaccurate that not even the family is correct, it should get ID'd at whatever level is appropriate. Specialists cannot be expected to provide precise IDs for taxa outside their area of expertise, so it is possible they may sometimes provide a broader ID than is necessary -- but leaving it with an egregiously wrong ID will not get the observation seen by the relevant people either. The fact that the observation ends up at a high level in such cases is not the fault of the IDers. It is a result of the initial wrong ID.

It seems to me that the only reason observations that end up at a broader rank could get "lost" is if people are so convinced that nobody looks at observations with broad IDs that this becomes a self-fulfilling prophecy. (I have seen no solid evidence that this is in fact the case -- on the contrary, anecdotally I know that there are numerous users who do look at observations stuck at order or kingdom, including older observations and observations from the Global South.)

Posted by spiphany about 19 hours ago

Because @spiphany
Plants above family, only in Africa, about 80K - where your anecdotal evidence is a teacup in the ocean (appreciated, but)
https://www.inaturalist.org/observations/identify?per_page=8&iconic_taxa=unknown%2CPlantae&order_by=observed_on&place_id=97392&project_id=123926&hrank=kingdom&lrank=epifamily

Posted by dianastuder about 19 hours ago

So? The fact that there are a lot of observations above family doesn't prove anything.

I've ID'd plenty of perfectly good observations to family or even genus or species that didn't get any further response for months or years. I have also added broad IDs that were then taken to genus or species within a matter of days (and no, these were not for popular taxa like birds or mammals). The specificity of an ID is no guarantee that an observation will get seen. There are a lot of factors at play, including non-controllable factors like who happens to be online and looking at a particular set of observations at a particular time.

I agree that we all should be striving to provide as specific IDs as we can. But it seems to me that the time that gets spent chastising users for adding IDs that are not felt to be specific enough -- and on at least a few occasions driving them away from IDing in a particular region or at all -- could be spent more productively helping novice IDers improve their skills (so that they can add the family-level IDs that are being demanded) or tackling some of the observations that are stuck at a high level oneself.

Posted by spiphany about 18 hours ago
