Map unique number to a unique string of 6 characters

Tag: string, algorithm, numbers, unique-key Author: sheeyr Date: 2012-11-26

I have a database table where every row has its unique ID (RowID).

Is there a good way to convert this RowID to a unique key that is always 6 characters long? The key characters can be {A-Za-z0-9}. One example of a unique key would be: a5Fg3A.

Of course, I realize there's only a certain number of keys I can generate with this method, but that doesn't matter in my case.

I've thought about this a lot, but I can't come up with an algorithm that does this properly.

One idea I had was: unique key = RowID, and if RowID is lower than 100000, pad it with leading zeros. For example, 123 becomes 000123 and 1 becomes 000001.

Then, for numbers in the range 100000 to 900000, I would replace the first digit with a letter, e.g. 0 = a, 1 = b, 2 = c, ..., 9 = j.

Then I could do the same with capital letters, etc.

My problem is that my algorithm is very limited and generates a low number of keys, because it doesn't use all the possible characters.

So basically I should be able to generate 56800235584 unique keys assuming every key is of length 6 and utilizes these characters: {A-Za-z0-9}.

A-Z = 26 characters
a-z = 26 characters
0-9 = 10 characters

That is 26 + 26 + 10 = 62 characters, so there are 62^6 unique keys.
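A quick sanity check of that arithmetic, as a minimal Python sketch: plain zero-padded decimal keys (the first step of the idea above) only ever use the ten digits, while the full alphabet gives 62 symbols per position.

```python
# Size of the key space for 6-character keys over two alphabets.
KEY_LENGTH = 6

digits_only = 10 ** KEY_LENGTH                 # zero-padded decimal RowIDs
full_alphabet = (26 + 26 + 10) ** KEY_LENGTH   # A-Z, a-z, 0-9

print(f"digits only:   {digits_only:,}")
print(f"A-Z, a-z, 0-9: {full_alphabet:,}")
```

This prints 1,000,000 for the digits-only case and 56,800,235,584 for the full alphabet.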

Any feedback on how this could be done properly (or even optimally) would be appreciated :-)

Thanks!

Aren't you just looking for a way to convert a number into its base-62 representation?
That's right, that did the trick. Thanks!

Best Answer

If you use A-Z, a-z and 0-9 as the alphabet then, as you noticed, you have a base-62 number system. So encode the unique RowID in base 62; there is a standard algorithm for this (repeatedly divide by 62 and map each remainder to a character). If your application allows (or needs) it, you can add a few more printable characters like '+', '/', '!', '@', ... to get more unique keys. The ready-made answer is base64 encoding, which is widely used.
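A minimal sketch of the standard base-62 conversion in Python. The alphabet order (digits, then upper case, then lower case) is an assumption; any fixed ordering of the 62 characters works as long as encoding and decoding use the same one. The names `encode_base62` and `decode_base62` are illustrative, not from any library.

```python
import string

# Assumed alphabet order: 0-9, A-Z, a-z (62 symbols). ALPHABET[0] acts as the zero digit.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Convert a non-negative integer to its base-62 representation."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])  # least significant digit first
    return "".join(reversed(digits))


def decode_base62(s: str) -> int:
    """Convert a base-62 string back to the integer it encodes."""
    n = 0
    for ch in s:
        n = n * BASE + INDEX[ch]
    return n


print(encode_base62(123))                  # 123 = 1*62 + 61, so "1z"
print(decode_base62(encode_base62(123)))   # round trip back to 123
```

Small IDs give short strings (123 becomes "1z"), so the result is not yet a fixed 6 characters; the largest value that fits in 6 characters is 62^6 - 1, which encodes as "zzzzzz".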

comments:

Good point. What if the encoded value of the number is shorter than 6 characters? Would it be safe to prepend some character to it?
Whichever number base you choose, it will have a zero digit, and you can pad with zeros on the left if you need a fixed length. If you are using a typical database, there is no real saving in using variable-length columns (varchar), so you may as well pad.
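Left-padding with the zero digit can be folded into the conversion itself by always emitting exactly six base-62 digits. A minimal Python sketch, assuming an alphabet ordered 0-9, A-Z, a-z (an arbitrary choice); `to_fixed_key` is a hypothetical name. IDs that do not fit in six characters are rejected rather than silently truncated.

```python
import string

# Assumed alphabet order; ALPHABET[0] ("0") is the zero digit used for padding.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
WIDTH = 6


def to_fixed_key(row_id: int, width: int = WIDTH) -> str:
    """Encode row_id as exactly `width` base-62 digits, left-padded with the zero digit."""
    if not 0 <= row_id < BASE ** width:
        raise ValueError(f"row_id must be in [0, {BASE ** width})")
    chars = []
    for _ in range(width):
        row_id, remainder = divmod(row_id, BASE)
        chars.append(ALPHABET[remainder])  # once row_id reaches 0 this emits "0" padding
    return "".join(reversed(chars))


print(to_fixed_key(1))    # "000001"
print(to_fixed_key(123))  # "00001z"
```

Equivalently, one could produce a variable-length base-62 string first and pad it with `str.rjust(6, "0")`; the loop above just does both steps at once.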

Other Answer1

You can sort your IDs and then attach a lexicographically increasing string to each one.

A simple example, where the alphabet is only {a,b} (for simplicity) and IDs = [20,1,7,90]:

sort: IDs = [1,7,20,90]
Attach increasing strings:
1 =  aaaaaa
7 =  aaaaab
20 = aaaaba
90 = aaaabb
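The sort-and-rank idea can be sketched in Python as follows: each ID's position in sorted order is written as a fixed-width number in the chosen alphabet. The two-letter alphabet mirrors the example; `rank_keys` is a hypothetical name, and passing the full 62-character alphabet gives room for up to 62^6 IDs.

```python
def rank_keys(ids, alphabet="ab", width=6):
    """Map each (unique) ID to a fixed-width string based on its rank in sorted order."""
    base = len(alphabet)
    if len(ids) > base ** width:
        raise ValueError("too many IDs for this alphabet and width")
    keys = {}
    for rank, row_id in enumerate(sorted(ids)):
        chars = []
        n = rank
        for _ in range(width):
            n, remainder = divmod(n, base)
            chars.append(alphabet[remainder])  # alphabet[0] plays the role of zero
        keys[row_id] = "".join(reversed(chars))
    return keys


print(rank_keys([20, 1, 7, 90]))
```

Note that the keys depend on the whole set of IDs: inserting a new ID smaller than existing ones shifts the key of every larger ID.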

If you want it as a hash function of some sort that does not depend on the data, you can encode the number itself the same way, treating the alphabet as the digits of a number system, and convert it similarly (i.e. 1 = aaaaaa, 2 = aaaaab, 3 = aaaaac, ...).
[Edit: this is basically the base-62 approach suggested by @HighPerformanceMark in the comments]


The advantage of the first approach: it can handle up to 62^6 IDs regardless of how large their values are, while the second approach cannot.

The second approach, however, gives you a consistent conversion from number to string, regardless of the specific data.

comments:

Thanks for the great feedback. Indeed, converting the Row ID to base62 did the trick :-)

Other Answer2

There are many ways to do this - the challenge is picking the one that's "best" for whatever your criteria are. Some examples, but far from exhaustive (some already suggested elsewhere):

  • pad with an increasing sequence
  • base-62 representation (note: base-64 is in common use and might even already have code available for it in whatever libraries you have at hand)
  • truncated cryptographic hash (slow, but has some other properties that might be useful, depending on exactly why you need to do this; if you only have to do it once, the performance hit may be worth it)
  • other not-necessarily-cryptographic hash functions that might be considerably faster
  • ......
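Two of these options sketched in Python with the standard library's `base64` and `hashlib` modules. Packing the ID into 4 bytes (an assumption that limits IDs to below 2^32) and URL-safe base64-encoding them gives exactly 6 significant characters, but the alphabet then also includes '-' and '_', not just A-Z, a-z, 0-9. The truncated-hash version is not guaranteed to be unique: distinct IDs can collide, so collisions would have to be detected and handled. Both function names are hypothetical.

```python
import base64
import hashlib


def base64_key(row_id: int) -> str:
    """Unique 6-char key for 0 <= row_id < 2**32 (alphabet: A-Z a-z 0-9 - _)."""
    raw = row_id.to_bytes(4, "big")  # raises OverflowError if row_id doesn't fit in 4 bytes
    # 4 bytes -> 6 base64 characters plus "==" padding, which is stripped.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def hashed_key(row_id: int) -> str:
    """6-char key from a truncated SHA-256 digest; NOT guaranteed to be unique."""
    digest = hashlib.sha256(str(row_id).encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest)[:6].decode("ascii")


for row_id in (1, 123, 4_000_000_000):
    print(row_id, base64_key(row_id), hashed_key(row_id))
```

The base64 variant is reversible (decode the 6 characters plus "==" back to 4 bytes), while the hash variant is one-way, which may or may not be desirable depending on why the keys are needed.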