Pitfalls of Family Research

ColinEssex · Jul 5, 2018

I thought all Americans were derived from immigrants, except Red Indians of course.

So doing family ancestry is a bit of a waste of time I would think, as being immigrants, you're bound to hit a brick wall.

Col

Frothingslosh · Jul 5, 2018

Every family ancestry will hit a brick wall at some point. Does that mean no one should ever do it?

The_Doc_Man · Jul 7, 2018

ColinEssex - the "Red Indians" prefer "Native Americans" these days. And they aren't natives, either. Based on DNA studies, including mitochondrial DNA and other more advanced methods, they are descendants of Asians. More specifically, descended from Siberians who walked across a then-existing land bridge that included the Aleutian Islands, at least 15,000 years ago.

http://www.ucl.ac.uk/news/news-articles/1207/12072012-native-american-migration

Now if you could trace your ancestry back THAT far, you'd have a pretty serious family tree!

On the other hand, Col - and anyone else who wonders why we do this sort of thing ...

I think we do it to prove our own diversity to ourselves. Our diversity makes us who we are. We do it to show that we are related to many people in many situations. Call it a form of tribal identification if you want to put primitive terms on it, but that is what we do. Based on this kind of research, I have found my roots in Sussex, England and in a part of France near Tours. I found my wife's roots in Nova Scotia, in the Acadian region. It is perhaps just a mental exercise, but if so, it is a generally harmless way of seeking knowledge - and that is enough to make it worthwhile.

The_Doc_Man · Jun 12, 2020

It has been 2 full years since my last post on this topic, so I thought I would offer a progress report.

I've been working on presentation programming using the facilities of Office and that is working well. I have been filling in where I can with more research on ancestors for me, my wife, and even for the biological father of my step-children. My 900-person tree from two years ago has blossomed to over 1800 people and I have traced back another branch of my wife's family to the Canary Islands and Andalucia, Spain, which explains some of the names in that branch, names that don't AT ALL look French in their origins.

I am getting to the point that when I finish a couple more trees back to where the records give out, I won't push so hard and will drop my subscription to Ancestry.COM - but I can't say I didn't learn a lot from them and I can't say that the project was useless. If for no other reason, I can say that my family and my wife's family have enjoyed learning about their ancestors and origins.

NauticalGent · Jun 12, 2020

A hobby I may take up once I have fully retired. I don't know though, this micro farming hobby I have now is looking like it's full-time!

The_Doc_Man · Oct 26, 2021

And yet ANOTHER turn of the wheel. Over a year after my last post, I am ready to scream at Ancestry again because they have no concept of backwards compatibility. I got my list up to 2200 people in the family tree counting my parents, my wife's parents, and her first husband's parents, because my target is to have the family tree for my grandsons. So.... there I am, getting ready to do the next analysis while I was in Alabama, evacuated because of Hurricane Ida. Had plenty of time to clean up some links. Got some REALLY great family info from a cousin I didn't even know I had, who helped me with a "lost" branch of the family tree. Downloaded the file... and could not process it because at least three different things have happened.

First, they changed to version 5.5.1 of the GEDCOM file standard. (Genealogy databases actually DO have a standard.) Then, the UTF8 standard ALSO changed with regard to legal line delimiters. Finally, they added so many people that their "person ID" became MUCH bigger.

I finally used my own ParserObj module to help me convert their .GED file to a .TXT file in ANSI format, including catching cases of having only a CR for line ending characters instead of CR/LF, or the *NIX style of LF only. That was the newer UTF8 standard that did that for me. So... got that done. The new converter routine stripped out the extraneous extended characters AND normalized line termination to always use the CR/LF standard. Worked great!
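The ParserObj module itself is not shown in this thread. As a rough illustration of the conversion described above, here is a minimal sketch in Python rather than VBA. The function name `normalize_gedcom_text` is hypothetical, and dropping every non-ASCII character is an assumption about what "stripped out the extraneous extended characters" means.

```python
import re


def normalize_gedcom_text(raw: bytes) -> str:
    """Return text with CR/LF line endings and only 7-bit ASCII characters."""
    # Decode as UTF-8; "utf-8-sig" also drops a leading byte-order mark.
    text = raw.decode("utf-8-sig", errors="replace")
    # Treat CR/LF pairs, lone CR, and lone LF all as line terminators and
    # rewrite each one as CR/LF. The pair is listed first so it is not
    # split into two separate terminators.
    text = re.sub(r"\r\n|\r|\n", "\r\n", text)
    # Assumption: "extended characters" means anything outside 7-bit ASCII,
    # so those characters are simply dropped here.
    return text.encode("ascii", errors="ignore").decode("ascii")
```

Called on the bytes of a downloaded .GED file, this would return text that a line-at-a-time reader expecting CR/LF can consume. Note that dropping accented letters loses information in names, so a real converter might map them to a code page instead.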

Got past that point, realized that Ancestry.COM added new data elements that I had to update in my system because they track stuff I'd never even thought about. They are using entity/attribute/value definitions so I had to account for the new attributes so that I could reject them if they were something I didn't want to know about. I think I have fixed that problem well enough that I'm not getting barfed on by the semantics scanner.

But the thing that is currently frosting my cookies is that Ancestry uses an internal person ID number that gets assigned automatically by their system. In Version 5.5 of the GEDCOM standard, this was a 9-digit integer. In version 5.5.1, it is a 12-digit number. Which means I can't use a LONG for holding this key. Fortunately my 'puter is 64-bit and I'm running 64-bit Windows. I believe my copy of Access will allow me to use a QUAD integer, so I might just have to convert my tables for a QUAD PK and some QUAD integer relationships.

To say that this journey has been interesting just doesn't quite cover it.

But at least I have learned how to process UTF8 files using the ParserObj that I put in the sample code section. I no longer have to use NOTEPAD to do it, because NOTEPAD didn't fix the problem of non-standard line termination.

Uncle Gizmo · Oct 26, 2021

The_Doc_Man said:
QUAD integer

New one on Me! Never heard of that before!

isladogs · Oct 26, 2021

QUAD Integer

Nor me!

Do you have A2016 or later? If so, consider using Large Number (bigint) or perhaps that's what you meant!

jdraw · Oct 27, 2021

Nor me, but via Google: Quad Integer

The_Doc_Man · Oct 27, 2021

Unfortunately, though the machine and O/S might support it, data type QUAD is not supported on my old home copy of Access, which is 2010. I ran into it on an AC2016 with the Navy, so I knew it was legal. Just didn't know how far back it was legal. So I might have to cheat slightly.

It is possible to still get number translation accuracy and minimal roundoff if I use a DOUBLE instead of a LONG. I can go up to 15 digits with a DOUBLE and only need 12. Might not be the most elegant PK, but in theory it should work with what I have. OR I could upgrade to a version of Office later than 2016. Either way, I have to rework everything that uses that particular function to pull an ID number out of the file. I'll be able to find the references to the routine that extracted the number (it's always in a specific context with VERY specific syntax) so I will know what fields need to be updated. It's just a pain in the toches to have to do it.
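The digit limits mentioned above can be checked with a little arithmetic. This is a minimal sketch in Python, not VBA; it assumes a VBA LONG is a signed 32-bit integer and a DOUBLE is an IEEE 754 double, which is also what a Python `float` is. The helper names are made up for illustration.

```python
LONG_MAX = 2**31 - 1        # 2,147,483,647: largest signed 32-bit value
DOUBLE_EXACT_LIMIT = 2**53  # every integer up to this is exact in a double


def fits_in_long(n: int) -> bool:
    """True if n fits in a signed 32-bit integer."""
    return -2**31 <= n <= LONG_MAX


def survives_double(n: int) -> bool:
    """True if n converts to a double and back without changing."""
    return int(float(n)) == n


largest_9_digit = 999_999_999
largest_12_digit = 999_999_999_999
largest_15_digit = 10**15 - 1
```

Here `fits_in_long(largest_9_digit)` holds but `fits_in_long(largest_12_digit)` does not, while `survives_double` holds for both the 12-digit and 15-digit values. Exactness is only guaranteed up to `DOUBLE_EXACT_LIMIT`; the first integer above it, `2**53 + 1`, does not survive the round trip.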

The_Doc_Man · Nov 8, 2021

Solved the long-number problem another way. I don't need THEIR number, all I need is a unique number. So when they come up with one of their internal ID numbers, regardless of the length, I dump it into a table and use a DMax+1 to generate a new unique number associated with the identifier. So when the person ID comes around again during the family portion of the file, I can just look up that unique ID that is shorter and fits into a LONG even if the other ID won't. After a couple of months of revamping that ID number, I'm back on the air again. A little slower because the ID number translation stuff takes a little extra time - but not bad otherwise. Up to 2280 family members and still growing.

NauticalGent · Nov 8, 2021

The_Doc_Man said:
Up to 2280 family members and still growing.

Thanksgiving dinner is going to be quite the event...

The_Doc_Man · Nov 8, 2021

Well, I have often said my dear wife cooks for a small army...

But seriously, most of the 2280 have been dead long enough that they won't eat much. The family tree goes back to 1600s Spain, France, and England. I have two or three branches left to explore, a couple of tangles to resolve, and then I'm calling it quits.

kevlray · Nov 8, 2021

Doc: Lots of luck on your search.

Pat Hartman · Nov 8, 2021

Use text to hold their long ID field.

Have you looked into the Mormons? They are serious about ancestry and offer help to the public. My husband's best friend used to donate time in a library helping with searches. If you have a Mormon library in town, check them out.

The_Doc_Man · Nov 9, 2021

Ancestry IS a Mormon project. If you delve into the GEDCOM format, it includes special notation for Mormon marriages and births. As I may have mentioned earlier, they put special emphasis on such marriages, sort of analogous to the rule that you can't really be Jewish unless your mother was Jewish. Otherwise, you are just a convert.

After looking into it more carefully, I didn't need to use the long ID field literally. What I really needed was a unique ID that would be reliably translated and still fit into a LONG.

To accomplish this, I have a table with the actual 13 byte alphanumeric ID and a LONG field that is generated as a DMAX+1 number. I use the generated number as the internal ID in my DB, but when translating the GEDCOM file, if they reference the person's long ID number I can look it up to get the number I used.
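The lookup-table idea can be sketched as follows, in Python rather than VBA and Access. A dictionary stands in for the translation table, and taking the current maximum plus one stands in for the DMax+1 step. The class name `IdTranslator` and the sample IDs are hypothetical.

```python
class IdTranslator:
    """Map long external IDs (kept as text) to small sequential integers."""

    def __init__(self) -> None:
        # Stand-in for the translation table: long ID text -> short number.
        self._short_by_long: dict[str, int] = {}

    def short_id(self, long_id: str) -> int:
        """Return the short ID for long_id, assigning a new one if needed."""
        existing = self._short_by_long.get(long_id)
        if existing is not None:
            return existing
        # Equivalent of DMax + 1: one more than the largest number issued.
        new_id = max(self._short_by_long.values(), default=0) + 1
        self._short_by_long[long_id] = new_id
        return new_id


translator = IdTranslator()
first = translator.short_id("I382000000001")   # new ID, gets the next number
second = translator.short_id("I382000000002")  # another new ID
again = translator.short_id("I382000000001")   # seen before, looked up
```

Because the long ID is stored as text, its length no longer matters, and the short numbers stay small enough for a 32-bit field as long as the tree has fewer than about two billion people.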

When I have a long-ID person number, that person occurs two or more times: once as an individual (code INDI), once as a child of a family (code FAMC), and sometimes as the parent in a family (code FAMS). Other relationships also crop up that repeat the individual number, like spousal relations. The long code ID number is defined for the INDI case but referenced for the other cases. There was already a subroutine to parse and extract the GEDCOM ID, so now it stores the long ID and computes (or looks up) the ID that I will use for the rest of the session. I have tested this and it works without me having to change the other tables, relationships, or code. It is a few percent slower because it takes time for VBA to fully parse the GEDCOM file. But it does it.
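The define-once, reference-later pattern can be sketched in a few lines. This is Python, not the author's VBA subroutine. It assumes the usual GEDCOM line layout (level, optional `@xref@`, tag, optional value) and that family records point to individuals with HUSB, WIFE, and CHIL lines. The function name `classify` is made up for illustration.

```python
import re

# level, optional @xref@, tag, optional value
GEDCOM_LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")
POINTER = re.compile(r"@[^@]+@")
PERSON_POINTER_TAGS = {"HUSB", "WIFE", "CHIL"}


def classify(line: str):
    """Return ("define", id), ("reference", id), or None for other lines."""
    match = GEDCOM_LINE.match(line.strip())
    if match is None:
        return None
    _level, xref, tag, value = match.groups()
    # An INDI record line defines the person's ID.
    if xref is not None and tag == "INDI":
        return ("define", xref.strip("@"))
    # A pointer value on a family-member line refers back to a person.
    if value and tag in PERSON_POINTER_TAGS and POINTER.fullmatch(value):
        return ("reference", value.strip("@"))
    return None
```

A parser would assign a short ID on each `"define"` result and look it up on each `"reference"` result, so the long ID never needs to be stored in the other tables.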

Also for the record in case someone runs into this. I mentioned it earlier but it bears repeating.

UTF-8 formatted files recently changed due to updated standards probably related to web-oriented languages. In a text file output via VBA's "PRINT" verb, lines WILL end with CRLF. In a text file output by *NIX variants, text lines end with LF. But in a UTF-8 file, the line can ALSO end with CR.

This became a problem because VBA's LINE INPUT does not recognize the CR as a line delimiter. My text parser can handle the necessary tests to break up the line with careful parsing, which is how I managed to fix this problem. Without the conversion, what I got was a single input line full of CR characters but treated as a single line over 1.2 MB long (out of 1.3 MB in the whole file). Takes me 7 1/2 minutes to convert the 1.3 MB file but the result is perfect.
