- DATE:
- AUTHOR: The GEDmatch Team
GEDmatch February Improvements & Updates (massive catch-up since our last!)
February Roundup: AutoKinship Update, One to One updates, and much more
It's been too long since our last product update! We'll do our best to publish these each month in the future. This update covers some important improvements and fixes, so please make sure you read this top to bottom!
AutoKinship update
Almost one year ago, the DNA tree-building tool AutoKinship was integrated into the GEDmatch website for Tier 1 users!
We recently discovered that not all shared matches were being used in the clustering analysis. This considerably affected the size of the clusters, and therefore also the trees that AutoKinship produced. This has now been fixed, so if you ran any AutoKinship analyses, it may be worth rerunning them to see how much the results improve.
To refresh your memory about the purpose of AutoKinship, here is our write-up from last year:
AutoKinship automatically predicts family trees based on the amount of DNA your DNA matches share with you and with each other. Note that AutoKinship does not require any known genealogical trees from your DNA matches. Instead, AutoKinship looks at the predicted relationships between your DNA matches and calculates many different ways you could all be related to each other.
The trees from our analysis are ranked and represent the most likely trees out of all the possibilities we calculated.
The probabilities used by AutoKinship are based on simulated data, kindly provided by Brit Nicholson (methodology described here: https://dna-sci.com/2021/04/06/a-new-probability-calculator-for-genetic-genealogy/).
In addition to the new AutoKinship feature, it also includes the AutoCluster, AutoTree, and AutoSegment tools.
Some improvements were made to the AutoKinship tool. These include the integration of FIR data (for improved prediction of full and half-siblings), visualization of triangulated segments in the reconstructed trees, modified cluster settings that produce denser clusters for improved AutoKinship trees, and, lastly, the integration of common ancestors from AutoTree into the AutoKinship trees.
Roberta from the blog DNAeXplained has written a very in-depth blog post about this new tool. You can find it here: https://dna-explained.com/2022/02/21/autokinship-at-gedmatch-by-genetic-affairs
Updates to the One-To-Many tools
We made several improvements to our One-To-Many tools. These include:
Updated offset and limit options
The "limit" option defines how many matches are shown on the screen at a time. This number used to be higher, but it caused performance issues on the site more frequently, so we now show at most 7500 matches at a time.
To see more than 7500 matches (if you really want to do that!), you just need to change the "offset" value to a new starting match number. Leaving the offset value at zero means you start at the beginning of your match list. Changing the offset to "1000" means that the match result list will start at match #1000.
So that you always know where you are in the match list, we have made this improvement:
Added a match number column
You will always see the match number (#) of each of your matches, so that you can easily adjust the "offset" value to page through your match list.
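For anyone who likes to plan the paging ahead of time, the offset values can be worked out as in the minimal Python sketch below. The function name page_offsets is ours, and the sketch assumes that the offset counts how many matches are skipped before the list starts. Only the 7500 limit comes from the update above.

```python
MAX_LIMIT = 7500  # the most matches shown at a time, per the update above


def page_offsets(total_matches: int, limit: int = MAX_LIMIT) -> list[int]:
    """Return the "offset" values needed to page through a match list.

    Hypothetical helper: assumes each page shows up to `limit` matches and
    that the offset is the number of matches skipped before the page starts.
    """
    if limit < 1 or limit > MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return list(range(0, max(total_matches, 0), limit))


if __name__ == "__main__":
    # A match list of 20000 entries needs three pages.
    print(page_offsets(20000))
```

Under these assumptions, a list of 20000 matches would be viewed with offsets 0, 7500, and 15000. The match number column shows where each page actually begins.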
Added the kit #, display name, and email address of the kit being searched
Another small but important improvement we made is to include these details when you search the One-To-Many tools.
Updates to One-to-One Autosomal and One-to-One Q tools
One-to-One Autosomal Update
The one-to-one update includes three parts: an updated algorithm, additional reported segment information on dynamic SNP threshold and bunch limit, and additional reported segment information on template threshold.
The updated algorithm uses a dynamic SNP threshold to determine the probability of a segment being Identical By Chance (IBC). It uses an 800-SNP window average to determine the random probability of a match. The average SNP threshold is set to correspond to the default Q precision of 7 for random matching. For autosomal comparisons, the dynamic SNP threshold is around 200, with one standard deviation spanning 185 to 214 SNPs.
The segment's dynamic SNP threshold and bunch limit are now reported in the numerical table of the segments. For segments close to the threshold, this information indicates how close the segment was to not being called.
The template SNP density ratio compares the number of aligned SNPs in a segment between two kits with the number of SNPs in the SNP template. A good SNP ratio is around 0.50; two kits that use the same set of SNPs are likely to be close to that ratio. A SNP ratio of 0.10 is of questionable validity, since it means only 10% of the template SNPs were used in that segment. The SNP ratio is reported numerically and through the richness of the blue color under the segment, as explained in the legend.
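As a rough illustration of the ratio described above, here is a small Python sketch. The function names and the use of 0.10 as a hard cutoff are assumptions made for this example. The update only states that a ratio of 0.10 is of questionable validity and that around 0.50 is good.

```python
def snp_density_ratio(aligned_snps: int, template_snps: int) -> float:
    """Aligned SNPs in a segment divided by template SNPs (illustrative)."""
    if template_snps <= 0:
        raise ValueError("template_snps must be positive")
    return aligned_snps / template_snps


def is_questionable(ratio: float, cutoff: float = 0.10) -> bool:
    """Flag ratios at or below the cutoff.

    Assumption: treating 0.10 as a cutoff is our simplification for the
    example, not a rule stated by GEDmatch.
    """
    return ratio <= cutoff


if __name__ == "__main__":
    # Made-up counts, chosen only to reproduce the two ratios in the text.
    for aligned, template in [(450, 900), (90, 900)]:
        ratio = snp_density_ratio(aligned, template)
        print(aligned, template, round(ratio, 2), is_questionable(ratio))
```

The SNP counts in the usage lines are invented to show a ratio near 0.50 and a ratio of 0.10. They are not taken from real kits.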
One-to-One Q Update
The Q update involves two main areas: an update to the algorithm and a choice of which probabilities to use.
Update to the Algorithm
The values of precision and Q are related to the probability of a segment being identical by chance (IBC).
The calculation of precision and Q has been improved. The new values for precision and Q will differ from those of the previous version, and in general they will be smaller.
With the new algorithm, an increase in Q of 1 means that the chance of a given segment being IBC decreases by a factor of 10.
The default precision value has been adjusted so that, on average, Q gives results similar to one-to-one.
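The factor-of-10 relationship above can be written as a one-line calculation. The Python sketch below is illustrative only. The function name is ours, and it expresses just the relative change in IBC chance between two Q values, not an absolute probability for any particular Q.

```python
def ibc_chance_multiplier(q_increase: float) -> float:
    """Relative change in IBC chance when Q goes up by `q_increase`.

    Based on the statement that each increase of 1 in Q lowers the chance
    of a segment being IBC by a factor of 10.
    """
    return 10.0 ** (-q_increase)


if __name__ == "__main__":
    # Raising Q by 2 multiplies the IBC chance by about 0.01.
    print(ibc_chance_multiplier(2))
```

For example, a segment with a Q that is 2 higher than another would be about 100 times less likely to be IBC.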
Choice of Probabilities
There is now a choice of two different sets of probabilities for calling a segment. Each set computes the probability of IBC by asking a different question.
If “Random probability” is checked, then for the probability of a particular SNP being aligned and matching, the question asked is:
“What is the probability that this would happen randomly?”
If “Random probability” is not checked, then for the probability of a particular SNP being aligned and matching, the question asked is:
“What is the probability that this would happen, given kit 1's allele values?”
The previous version of Q used the non-random probabilities.
Non-Random Probabilities
Heterozygous (HTZ) SNP alleles on one kit always half match the other kit’s alleles. As a result, any SNP on the segment that is HTZ for kit 1 does not change the total probability of the whole segment. If you reverse the order of the kits, a different set of SNPs is HTZ and leaves the total probability unchanged. This produces different Q values depending on the order in which the kits are entered, so the results are not symmetric.
If you check “Homozygous only”, the HTZ SNPs are removed and you will get the same Q values regardless of the order of the kits. In other words, the results are symmetric.
The use of non-random probabilities is, however, believed to be more sensitive than the use of random probabilities.
Random Probabilities
If “Random probability” is checked, the probabilities used do not depend on the values of kit 1's alleles, and the order in which the kits are entered does not matter. The results are symmetric.
The random probabilities are also used in the updated one-to-one tool, so that one-to-one and Q give similar results on average.
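To make the symmetry point concrete, here is a toy Python model of the non-random case. It is not the GEDmatch algorithm. The per-SNP value 0.6, the function names, and the reading of “Homozygous only” as dropping SNPs that are HTZ in either kit are all assumptions made purely for illustration.

```python
from math import prod

# Made-up per-SNP factor used when kit 1 is homozygous at a SNP.
# This is NOT a GEDmatch value; it only makes the arithmetic visible.
HOMOZYGOUS_FACTOR = 0.6


def is_heterozygous(genotype: str) -> bool:
    """A two-letter genotype is heterozygous if its alleles differ."""
    return genotype[0] != genotype[1]


def toy_segment_probability(kit1, kit2, homozygous_only=False):
    """Toy product of per-SNP factors, conditioned on kit 1's alleles.

    A SNP that is HTZ in kit 1 contributes a factor of 1.0, so it leaves
    the total unchanged. With homozygous_only=True, SNPs that are HTZ in
    either kit are skipped (our assumption about that option).
    """
    factors = []
    for g1, g2 in zip(kit1, kit2):
        if homozygous_only and (is_heterozygous(g1) or is_heterozygous(g2)):
            continue
        factors.append(1.0 if is_heterozygous(g1) else HOMOZYGOUS_FACTOR)
    return prod(factors)


if __name__ == "__main__":
    kit_a = ["AG", "AA", "CT", "GG"]  # invented genotypes
    kit_b = ["AA", "AG", "CC", "GG"]
    print(toy_segment_probability(kit_a, kit_b))  # kit A entered first
    print(toy_segment_probability(kit_b, kit_a))  # kit B entered first
    print(toy_segment_probability(kit_a, kit_b, homozygous_only=True))
    print(toy_segment_probability(kit_b, kit_a, homozygous_only=True))
```

In this toy model the first two results differ because each kit is HTZ at different SNPs, while the two “homozygous only” results are equal. That mirrors the behavior described above without reproducing the real probabilities.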
GEDmatch Classic Access Ends March 31, 2023
At the end of March 2023, we will be sunsetting GEDmatch Classic, which is the older, outdated user interface for the GEDmatch tools. Maintaining and securing two different code bases for our tools is very challenging, so doing this will help us focus on continuing to improve the experience for our GEDmatch users. The newest version of the GEDmatch interface is where you can find all of our latest tools and improvements.
Additional improvements and bug fixes made
Improvements:
Created a new GEDCOM error check tool that allows users to check whether their GEDCOMs have any issues.
Updated reference data and algorithm to optimize one to one comparison between Autosomal DNA kits.
Updated Q-matching to allow choosing between Random probability and Asymmetric probability.
Users can now click and drag the AutoCluster & AutoTree results, and the results can be scaled on large screens.
Improved messaging to users when they’ve uploaded a kit that has a status that would not be batched.
Improved file naming convention to avoid duplicate file names.
Updated offset and limit options and added a match number column to One-To-Many tools.
Improved error handling in A-Matrix, One-To-One and Q-Matching tools.
Improved error handling for MRCA tool.
Error code improvements for the SegmentSearch, Parents Related, and AutoCluster tools.
Improved error handling when max daily uploads reached.
Updated kit number text on Triangulation form.
Added a "Show/Hide Chat" button to allow users to hide Tidio chat bar.
Added a “Partner Benefits” tab on the Profile page to support benefits provided by our partners to our Tier 1 users.
Added live chat feature to the Manage Subscriptions page for Tier 1 users.
Improved communication to user during DNA upload process.
Several improvements and fixes made to the GEDCOM compare tool.
Bug Fixes:
Fixed bug where the previously searched kit was being sent to Visualization Option in Tier 1 One-To-Many Tools.
Fixed bug related to loss of segments in 3D Chromosome Browser.
Fixed button text for MKA's segment search.
Fixed issue where Triangulated Group column was not in triangulation download.
Fixed missing individual issue with surname search.
Fixed character encoding issue that caused AutoCluster result to not show.
Fixed issue where invalid kits were not being filtered out in Multi Kit Analysis if the user edited the kit list.
Have any questions?
Questions or concerns about anything we’ve rolled out recently? As always, our support team is available through our support ticketing system or by sending an email to support@gedmatch.com.
You can also join the community-run Facebook GEDmatch User Group to engage in the community and ask questions! There are many knowledgeable and helpful individuals there that are actively involved in discussions on using GEDmatch.