MTM - Multiple junction tables? (1 Viewer)

mactreb

Registered User.
Local time
Today, 08:50
Joined
Jan 15, 2017
Messages
13
All,

I currently have a database which manages files for different projects. For various reasons we start with a fresh back-end for each new project.

This is a slight over-simplification, but the gist is that information about different types of documents (by which I mean "what that document does" rather than "filetype") are stored in different tables. In theory, every type of document can be linked to every other type of document - these MTM relationships are done through junction tables -- at the moment, one for each conceivable type-type relationship.

That's all fine. Until you want to add a new document type, because you need to create a new junction table for every type of document.

I'd be interested in your thoughts on the wisdom of just having a single junction table for the entire database.

It seems to me it would massively simplify the architecture (in this case by reducing the number of junction tables by ~120), which would also simplify the code which adds records to the junction table (because there's no longer a need to determine which table to add the record to) and the number of queries required to find the joined records.

But it would mean either adding some unique table identifier to the Primary Key in each table (which is fine, though would mean you had alpha-numeric PKs), or adding two extra columns to the junction table to identify the origin of a given FK in that table.

The current way works fine, it's just a total pain to maintain...
 

Galaxiom

Super Moderator
Staff member
Local time
Today, 19:50
Joined
Jan 20, 2009
Messages
12,849
I can't quite picture your situation but I suggest you consider the pros and cons of an Entity-Attribute-Value data model.

This will eliminate the problem of adding new types of information fields for new document types which you currently have to do by adding new fields.
 

mactreb

Registered User.
Local time
Today, 08:50
Joined
Jan 15, 2017
Messages
13
Thanks -
I wasn’t hugely clear (for which, sorry!). The issue isn’t so much with fields, but with tables.

Say I have 4 “types” of record: invoices, transcripts, reports and issues. Because each of these has different information that I want to store (and that information is not limited to a finite set), it’s most practical to have a table for each type of record (let’s call those tbl_[record type]). I want to be able to link each record to each other record. At the moment that’s achieved through junction tables, so a table - call it jun_invoices-transcripts - which has 3 columns - an ID, the PK from tbl_transcripts, and the PK from tbl_invoices. That all works absolutely fine, the problem is just one of scalability.

In the example above, for any record to be able to link to any other record, you need 10 junction tables. If you add another “type” of record, you need to add 5 junction tables. A 6th unwieldy (granted, you can programmatically generate the tables, but say in the future you want to know which user created the relationship - you need to add a field to huge number of tables).

The junction table (single or multiple) is, effectively, an EAV model - you have one incredibly long table that stores the relationships, rather than effectively a column within each table to represent each record in all tables.

My question is just one of wisdom: the above scalability issue can be resolved through one table with two additional columns (which would store the origin of each the two PKs). But I just query whether someone else would open that and think “what is this madness”, or lead to some other issue which isn’t apparent now, but when it became apparent would be a nightmare to fix.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 04:50
Joined
May 21, 2018
Messages
8,463
I think it would depend on how many fields uniquely define a relationship and how common they are across the many to many tables. Obviously if you made one table it would not be Normal, but it may be worth it if the majority of the fields are common. If well thought out and planned for, sometimes a non normal design is just easier, as long as you think forward. If your single many to many table has less than say 20 fields (in addition to your two foreign keys) with a good percentage filled in for each type of relationship that would be doable. However, if you end up with 100 fields because the different relationships do not have a lot of related characteristics this is probably not worth it. Can you describe a couple of the relationships and what additional fields you track for these relationships?
 

mactreb

Registered User.
Local time
Today, 08:50
Joined
Jan 15, 2017
Messages
13
MajP, I think that answers it.

The fact of the relationship is the only thing that is tracked. I can see potential to track who created it and he date of creation, but can’t really see a use for that in the implementation.

Given that the single junction is, I expect, many many times more efficient (even if it requires going back and replacing the existing queries!)
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 04:50
Joined
May 21, 2018
Messages
8,463
Ok that is different from what is posted. You do not just have Documents. You have invoices, transcripts, reports and issues. As currently designed they are different entities.
 

mactreb

Registered User.
Local time
Today, 08:50
Joined
Jan 15, 2017
Messages
13
Yes and no - they are all physical documents, and the access record is just information about that document. (But yes, within the current architecture they are all separate entities).
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 04:50
Joined
May 21, 2018
Messages
8,463
Yes they are documents but they are not maintained in a document table with related fields. They are uniquely described by different properties. However, this is different from what I thought. I envisioned a single document table and different relations would have lots of different characteristics. This is multiple entities but the relations are nothing but date, person created, who created. I know it would not be normal, but I personally would have a single junction

InvoiceID_FK
TranscriptID_FK
ReportID_FK
....
RelationType ' transcript-report, invoice-report, invoice-transcript
DateCreated
etce

So you fill in only two foreign keys and define the type of relationship. You will have to role your own referential integrity, cascade deletes, etc.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 04:50
Joined
Feb 19, 2002
Messages
42,981
For various reasons we start with a fresh back-end for each new project.
Generally poor practice since it means that you also either have to duplicate the FE's or have the users relink the FE to the BE they want to work with at that moment.

are stored in different tables.
Also poor practice and the proximate cause of your problem. Put all the documents into a single table with a type field so you can isolate them by type if you need to. (120 types - egad!!!!)


The document table needs a unique identifier. Autonumber is the best choice. The Junction table uses only the two autonumbers to relate the two tables.
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 04:50
Joined
May 21, 2018
Messages
8,463
Put all the documents into a single table with a type field so you can isolate them by type if you need to. (120 types - egad!!!!)

If I interpreted the OP correctly they are not really all generic "documents." They are in fact separate entities with lots of different properties. Simply providing a type field is not sufficient.

Say I have 4 “types” of record: invoices, transcripts, reports and issues. Because each of these has different information that I want to store (and that information is not limited to a finite set), it’s most practical to have a table for each type of record

So yes if they had a few unique fields and a lot of common fields then you may be better off combining, but it would not be Normal but practical. If most of the fields are unique then would make less sense. Could not enforce a required field that is limited to certain values for a contract that does not apply for a transcript.

I interpreted the 120 link tables to be the combination of possible relations. I assume there are only 16 entities and each table is a combination of 2 foreign keys. Therefore 16 choose 2 combinations without replacement would equal 120.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 03:50
Joined
Feb 28, 2001
Messages
27,001
I'm trying to find the big picture.

If you have four generic document functions (thus four classes of document), the fact of their being related for different purposes doesn't stop them from being related to SOMETHING they have in common, not to each other but to something common among all of them.

There must be SOME commonality among them. Invoices, transcripts, reports and issues do not occur in a vacuum. They relate to a project. If you have four tables, one each to describe invoices, transcripts, reports, and issues SEPARATELY, what ties them together?

You mentioned that you make a new file for each project. What is the superstructure that provides the context for the documents? It is that superstructure that should give you some ideas about how to tie things together based on being part of that over-arching raison de etre' for those documents.

We are still missing some of the structure as evidenced by the example name that was given - jun_invoices-transcripts - which implies breaking things apart by the month, and further suggests an aug_invoices-transcripts would be possible. If so, this information is not nearly normalized correctly.

There is also the discussion about starting over with a new BE every so often. It is not clear to me whether the documents in question are internally stored or externally stored (i.e. stored using OLE or referenced by a UNC path as a data field). The answer to that question would help us understand WHY you want/need to start over again from an empty database.
 

mactreb

Registered User.
Local time
Today, 08:50
Joined
Jan 15, 2017
Messages
13
Thanks, all. I think we've strayed a bit beyond the scope of the original question (one junction table or many - I've now implemented one, and it works perfectly), and I don't propose to change the overall structure. But in the interests of making this a useful resource for others who may stumble across it:

Superstructure.
I'm sure to the horror of many, I have one FE and two (plus) BEs. Let's call them FE-User, BE-Admin, and BE-[Project].

Generally poor practice since it means that you also either have to duplicate the FE's or have the users relink the FE to the BE they want to work with at that moment.

True, but the re-linking is programmatic, so it's not an issue. The user (through FE-User) selects the project they are working on at the time they login (and indeed, can do so with two clicks once they've logged in). The details of each project location are in the BE-Admin, and the tables re-linked to the appropriate BE-[Project]. It takes (quite literally) about 2 seconds.

Why do it this way? A product of necessity:

- some clients demand that their data is stored in specific jurisdictions, and so the BEs (and the files to which they refer) must sit on particular servers within the company;

- there are information barriers, so some users can't have access to certain projects. This can be controlled to a certain extent through user access within the database, but even then, if the data is all in one place (i.e., a single BE), a sufficiently mal-intented user can access it. If you put the entire BE on a server to which they do not have access, then it makes it significantly harder to access the data.

You can design things on the assumption that everyone uses things as intended, but sadly that's rarely a good idea. You have to design with stupidity and malice in mind.

It also has the benefit of making archiving easy: at the end of a project, you can just take the entire project folder (including the BE, and all of the associated files) and archive it. You can give that entire archive to a client with virtually no risk of there being other client data in the archive (because it was never intermingled to begin with).

There is also the discussion about starting over with a new BE every so often. It is not clear to me whether the documents in question are internally stored or externally stored (i.e. stored using OLE or referenced by a UNC path as a data field). The answer to that question would help us understand WHY you want/need to start over again from an empty database.

I think this is largely answered in the above (at least as it relates to the why). As to the document storage: this is all external. The entire aim was to effectively replicate the way you might work with hard copies of documents, but with an interface that's vastly more convenient than what you get through explorer: users really don't care all that much for how file storage is structured, they just want to be able to easily access underlying files and achieve that through a file structure. So in this case you need to think of the DB as, effectively, a gloss applied to what would otherwise be a series of folders a user might create (e.g., for a given project, you might otherwise have '\\June 2018\Project Name\Invoices\Subproject\Service Provider\April 2017), but unless there is some forced naming convention and everyone uses it properly, that becomes very difficult to maintain. The DB dispenses with that - users can drag and drop files, just as they would in explorer, but unlike a user-created folder structure, it's filename-agnostic, and there's no need for any kind of agreed structure that a single user can screw up (at a cost to all other users).

Of course there are plenty of DM systems - Hummingbird, filesite/worksite etc - that exist to do this same thing, but they have their own downsides.

Entities.
We are still missing some of the structure as evidenced by the example name that was given - jun_invoices-transcripts - which implies breaking things apart by the month, and further suggests an aug_invoices-transcripts would be possible.

'Jun' refers to the 'Junction', not the month. Often the thing that links two records will be something completely abstract (an issue, for example - e.g., I have an engineer's report about concrete quality, an invoice for concrete, and some meeting notes about cement mix - on the face of those documents you wouldn't know they all related to the same issue: the cement content of a particular concrete batch.)

If I interpreted the OP correctly they are not really all generic "documents." They are in fact separate entities with lots of different properties. Simply providing a type field is not sufficient.

That's exactly correct. There is some overlap in fields between some entities, but not sufficient to collapse them all down into a single table. In circumstances where a single table is unworkable, I would argue it's better practice to be consistent and have separate tables for each entity than some entities in a single table and separate tables for other entities.

I interpreted the 120 link tables to be the combination of possible relations. I assume there are only 16 entities and each table is a combination of 2 foreign keys. Therefore 16 choose 2 combinations without replacement would equal 120.

Again, that's exactly right.

The Single / Multiple Junction Table.
I've now implemented just the single junction table, which works perfectly.

To achieve that, there are now five columns: an autonumber ID (the PK), the two FKs, and code for each of the origin tables of the FKs. For a given record, you can then find all records in the junction table where that record's ID (nominally) appears, and then filter out those records which do not have the correct origin code(s) (but which would appear in your original result because all entities use the same IDs). (Of course, the user doesn't see this happening).

Thanks!
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 04:50
Joined
May 21, 2018
Messages
8,463
If I am understanding everything correctly, this is one of those cases where you may want to violate traditional database Normalcy for ease and maintainability. How you had it originally was probably the theoretically correct design, but cumbersome. I think if I was doing this I would have a single main document table. This would have only those fields that are common for all your documents. Something like:

Code:
tblDocuments
  documentID
  documentType ' Identifies the specific type: Invoice, Transcript, Invoice
  Author
  CreationDate
 ........

Then I would have details tables for each type of document since they have additional unique fields

Code:
tblInvoiceDetails
  DocumentID_FK  ' A one to one relationship to the document table
  ... Other fields unique to invoice

tblTranscriptDetails
  DocumentID_FK  ' A one to one to the document table
  .... other fields unique to transcripts

Then the junction table is simply

Code:
tblJunDocuments
  DocumentID_1_FK
  DocumentID_2_FK
  AssociationType
  AssociationDate
  ...Other fields you track when you associate two document

This is still not a normal design, because I have to go to a child table to uniquely identify an entity, but I think this would be easy to implement, efficient, and clean. It would be pretty easy to modify your current design to this. But if it is working for you I would go with what you have.
 

Users who are viewing this thread

Top Bottom