How to assign unique IDs to rows in the product of a table join
This might be more of a math problem than a Caché question.
I have a SQL query that joins two tables. I want to assign a unique ID to each row of the product table.
- I could append the GUIDs of the rows in the two tables, but there are a number of clients that expect a maximum length of 50 on this unique ID. Two GUIDs appended make 72 characters.
- I could append the two GUIDs and then truncate the result, but now I'm worried about collision.
What's the chance of collision if I append the GUIDs and truncate the result to 50 characters? Is there a good way to solve this without updating the clients?
Comments
How about adding an SQL method to one of the classes being joined? The method would call %SYSTEM.Util's "CreateGUID" method and the field associated with that method would be included in the join.
Good thought, but I should have mentioned that the GUID for a specific row needs to stay the same over multiple calls of the query.
We have a table for holidays and for people. Both of these tables have a country column. Each country has a list of holidays and all the people in that country have all those holidays off. The result of the join means semantically: which people have which days off. I could create a third table for this but it would have to be updated any time a holiday or a person gets added.
Your suggestion of using a hash function is good, and I think I'll do that.
Ah, ok thanks for the feedback Pravin. Does the new GUID need to be different than the GUIDs for the records being joined? If the new GUID is not to be persisted, could you pick the GUID from one of the records being joined?
Currently we're using the GUID from one of the tables, but the problem is it's not unique anymore after the join.
We're uniting this person+holiday table with a different table of personal time off to create a general absences table. Client applications access this table through a web service in order to sync a schedule. They need a GUID on each absence entry so they know what needs to be updated. For example, if a holiday changes there's an absence for each person in that country, and the client needs to update each of those entries.
We're only sending across the absences that have been updated since the last sync, so the client can't just rebuild the whole schedule every time.
Thanks, that might work. Is there any danger of the ID from CachéStorage getting reused if an entry gets deleted?
NO NEVER!
ID in CacheStorage is an "autoincrement" positive Integer (1...2**64-1) ~19 digits decimal.
AND is never reused.
And you can split this construct in the original pieces if required
Okay, what does GUID means in your case? I get how people have GUIDs and holidays have GUIDs, but the purpose for the GUID of a person+holiday combo escapes me.
if this is something like
FROM people JOIN holiday
ON people.country = holiday.country
Then the bolded expression should not be longer than your 50 Char.
IF you use CachéStrorage
Can you elaborate on your data model? What are your two tables, and what information joining them generates.
Consider the following database: it has clients and products -and each client and each product has a guid.
The join between clients and products would mean semantically - what client bought which products.
But it's probably be better to store this information in another table - orders and just add properties/fk/relationships to clients and products.
You want GUIDs - a mark of persistency, but you want them in a transient query. I think it would be better to create another table and populate it with the relevant data and new GUIDs and return that new GUIDs.
Another approach would be exposing hash function as an sql procedure and passing both GUIDs into it and returning a hash to a client.
rewrite of comment:
if this is something like
SELECT people.name, holiday.date, people.ID||'^'||holiday.ID as UNIQ
FROM people JOIN holiday
ON people.country = holiday.country
Then the bolded expression should not be longer than your 50 Char.
IF you use CachéStorage
ID is always a unique positive Integer (1.... 19digits) and you can always disassemble it by the separator.