Understanding different types of encoding

First of all, why is character encoding required?

Character encoding tells a computer how to interpret raw zeros and ones as real characters: letters, digits and symbols. It is usually done by pairing each character with a number. This is also what lets software work for people who use a different language than you do.

There are different kinds of encoding, such as ASCII, UNICODE, Base64 and many more.

We generally know how to get the binary form of any decimal number. For example, 55 in binary is 110111. But what about characters like A to Z, a to z, and symbols like *, (, ), $ and % that we use daily? How does a computer understand all of these? The answer is encoding. Encoding makes sure that each character has a numeric representation in the computer.
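As a quick refresher, here is a minimal sketch of the decimal-to-binary conversion mentioned above, using repeated division by 2. The helper name `to_binary` is just an illustration; Python's built-in `format` does the same job.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # each remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(bits))  # reverse so the most significant bit comes first


print(to_binary(55))    # 110111
print(format(55, "b"))  # 110111, the built-in equivalent
```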

Each character we type, including all the special characters mentioned above, has been assigned a decimal number. For example, A is assigned the number 65 and lowercase a is assigned 97, and so on for every other character in the alphabet.
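You can check these assignments yourself. The short sketch below uses Python's built-in `ord` and `chr`; the variable name `codes` is only for illustration.

```python
# Look up the number assigned to a few characters.
codes = {ch: ord(ch) for ch in ["A", "a", "*", "$", "%"]}

for ch, number in codes.items():
    print(ch, number)

# chr goes the other way: from the number back to the character.
print(chr(65))  # A
print(chr(97))  # a
```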

This assignment follows one of the earliest agreed formats, created in the 1960s and called ASCII. Although a byte has 8 bits, ASCII uses only 7 of them. That gives 128 possible values (0 to 127) that can be encoded using ASCII. The place values of those 7 bits are:
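The size of the ASCII range follows directly from the number of bits. Seven bits give 2^7 = 128 distinct patterns, numbered 0 to 127. A tiny sketch (the constant names are mine):

```python
ASCII_BITS = 7

ascii_size = 2 ** ASCII_BITS   # number of distinct 7-bit patterns
highest_code = ascii_size - 1  # codes start at 0, so the last one is size - 1

print(ascii_size, highest_code)     # 128 127
print(format(highest_code, "07b"))  # 1111111, all seven bits set
```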

64 32 16 8 4 2 1

So when you want to show capital A, which is assigned 65, how do you represent it in binary? Simple:

1000001

and C? 1000011
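To see how those bit patterns map back to 65 and 67, you can add up the place values wherever a 1 appears. This is a small sketch; `PLACE_VALUES` and `from_bits` are names chosen for this example.

```python
PLACE_VALUES = [64, 32, 16, 8, 4, 2, 1]  # the seven ASCII bit positions


def from_bits(bits: str) -> int:
    """Sum the place values of every position that holds a 1."""
    if len(bits) != len(PLACE_VALUES):
        raise ValueError("expected exactly 7 bits")
    return sum(value for value, bit in zip(PLACE_VALUES, bits) if bit == "1")


print(from_bits("1000001"), chr(from_bits("1000001")))  # 65 A
print(from_bits("1000011"), chr(from_bits("1000011")))  # 67 C
```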

You can see the full list of assignments in an ASCII table.

This makes it easy to encode English letters and common special characters, because every one of them can now be converted into binary code that the computer understands, using its ASCII number.

As more people around the world started using computers and wanted to communicate, this was no longer enough, because everyone needed to represent the characters of their own language. ASCII covered only English characters.

The problem really started when you wanted to encode, say, Arabic, Japanese, Chinese and many other languages.
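You can reproduce the problem in a few lines. The sketch below tries to encode some sample greetings as ASCII; the helper `fits_in_ascii` and the sample strings are my own illustration.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


samples = ["hello", "مرحبا", "こんにちは", "你好"]
for sample in samples:
    print(sample, fits_in_ascii(sample))
# Only the English sample fits; the others have no ASCII number at all.
```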

UNICODE

Then UNICODE came into the picture. It can represent far more characters, covering practically every writing system in use around the world. UNICODE can use more bits per character than ASCII, which has just 7. Its encoding forms UTF-8, UTF-16 and UTF-32 work in units of 8, 16 and 32 bits, and the standard leaves room for more than a million characters (1,114,112 code points). UNICODE is backward compatible with ASCII: the first 128 characters, 0 to 127, are the same. If you are using UNICODE and type the letter A, it is still number 65 in decimal, or 1000001 in binary.
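Here is a rough sketch of how this looks in practice, using Python's built-in codecs. The helper `encoded_sizes` and the sample characters are assumptions made for this example. It shows how many bytes a character takes in UTF-8, UTF-16 and UTF-32, and that A keeps its ASCII value.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32 (no byte order mark)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


for ch in ["A", "é", "你", "😀"]:
    print(ch, hex(ord(ch)), encoded_sizes(ch))

# The letter A is still 65, and its UTF-8 bytes are identical to its ASCII bytes.
print(ord("A"))                                   # 65
print("A".encode("utf-8") == "A".encode("ascii")) # True
```

UTF-8 is the variable-length one: it spends a single byte on ASCII characters and up to four bytes on everything else.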

Base64

Base64 is a way to encode binary data using a set of ASCII characters known to pretty much every computer system. For example, mail systems traditionally cannot deal with raw binary data because they expect ASCII text. So if you try to transfer an image or another file as-is, it may get corrupted because of the way the system handles the data.

Most computer systems store data in bytes of 8 bits each, but ASCII uses only 7 bits, so a channel built for ASCII text is unsuitable for transferring that kind of data. A system might wipe out the 8th bit of every byte, which can corrupt the entire payload you want to send. Base64 encoding was introduced to solve these problems. It lets you encode arbitrary bytes as characters that are known to be safe to send without getting corrupted.
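To make the corruption concrete, here is a small simulation. The function `strip_high_bit` is a made-up stand-in for a 7-bit channel that clears the 8th bit of every byte, and the sample bytes are the first four bytes of a PNG file signature. Raw bytes get damaged, while the Base64 form passes through untouched.

```python
import base64


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


raw = bytes([0x89, 0x50, 0x4E, 0x47])  # start of a PNG signature; 0x89 uses the 8th bit

damaged = strip_high_bit(raw)
print(damaged == raw)  # False, the first byte was changed

encoded = base64.b64encode(raw)
survived = strip_high_bit(encoded)
print(survived == encoded)                # True, Base64 output is plain ASCII
print(base64.b64decode(survived) == raw)  # True, the original bytes come back
```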

If we want to encode this example text in Base64:

String: “This is encoding test in base64”

Base64 encoding: VGhpcyBpcyBlbmNvZGluZyB0ZXN0IGluIGJhc2U2NA==
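You can reproduce this result with Python's standard `base64` module. A minimal sketch:

```python
import base64

text = "This is encoding test in base64"

# b64encode works on bytes, so the string is turned into ASCII bytes first.
encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
print(encoded)  # VGhpcyBpcyBlbmNvZGluZyB0ZXN0IGluIGJhc2U2NA==

# Decoding reverses the process and returns the original text.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded == text)  # True
```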

How is Base64 encoding done?

Base64 encoding takes the binary data stream 3 full bytes (24 bits) at a time, breaks those into four 6-bit groups, and represents each group as a printable character from the ASCII standard.

The Base64 encoding table consists of A-Z, a-z, 0–9, + and /, plus =, which is used only as padding and is not one of the 64 characters.

26 + 26 + 10 + 2 = 64 characters, that’s why it is called base64 encoding.
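The same count can be checked by building the alphabet in code. This sketch assembles it from Python's `string` constants; the name `BASE64_ALPHABET` is mine.

```python
import string

# A-Z, then a-z, then 0-9, then + and /, in that order.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

print(len(BASE64_ALPHABET))  # 64

# The position of a character in this string is its 6-bit value.
for index in (0, 26, 52, 62, 63):
    print(index, BASE64_ALPHABET[index])
```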

Assume that we have the string “Dog” and we need to convert it into a Base64 encoded string.

  • Convert each character of the ASCII string “Dog” into its ASCII number: D = 68, o = 111, g = 103.
  • Convert these numbers into 8-bit binary, which becomes 01000100 01101111 01100111.
  • Divide this binary stream of 1s and 0s into groups of 6 bits each: 010001 000110 111101 100111.
  • Convert each 6-bit group from binary (base 2) to decimal (base 10) by adding up the place value of every position that holds a 1. It is basically the same as converting 8-bit binary into decimal, but here the scale is 6 bits, so we only consider the place values 32 16 8 4 2 1. This gives 17, 6, 61 and 39, which map to R, G, 9 and n in the Base64 encoding table.
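The steps above can be followed one by one in code. This is a hand-rolled sketch for illustration only, with variable names of my choosing; it handles just the case where the input is exactly 3 bytes, so no padding is needed.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = "Dog"

# Step 1: characters to ASCII numbers.
codes = [ord(ch) for ch in text]
print(codes)  # [68, 111, 103]

# Step 2: numbers to 8-bit binary, joined into one 24-bit stream.
bits = "".join(format(code, "08b") for code in codes)
print(bits)  # 010001000110111101100111

# Step 3: split the stream into groups of 6 bits.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
print(groups)  # ['010001', '000110', '111101', '100111']

# Step 4: each group to decimal, then look it up in the Base64 alphabet.
indexes = [int(group, 2) for group in groups]
print(indexes)  # [17, 6, 61, 39]

encoded = "".join(ALPHABET[i] for i in indexes)
print(encoded)  # RG9n
```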

So the Base64 representation of the ASCII string “Dog” is RG9n.

Remember that Base64 encodes 24 bits at a time, in chunks of 6 bits, giving 4 Base64 characters. When the last group does not fill all 4 characters, = is used to pad it out to a set of 4.
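Padding is easiest to understand with inputs that are not a multiple of 3 bytes. Below is a minimal hand-written encoder, assuming the standard alphabet; `b64_encode` is a hypothetical helper written for this example, not a library function. It is compared against Python's own `base64` module as a sanity check.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64, 3 bytes (24 bits) at a time, padding with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        bits = "".join(format(byte, "08b") for byte in chunk)
        # Fill the last 6-bit group with zero bits if the chunk is short.
        bits += "0" * (-len(bits) % 6)
        chars = "".join(ALPHABET[int(bits[j:j + 6], 2)] for j in range(0, len(bits), 6))
        # Every chunk must produce exactly 4 characters, so pad with '='.
        out.append(chars.ljust(4, "="))
    return "".join(out)


for sample in [b"Dog", b"Do", b"D"]:
    print(sample, b64_encode(sample), base64.b64encode(sample).decode("ascii"))
# b'Dog' RG9n RG9n
# b'Do' RG8= RG8=
# b'D' RA== RA==
```

One leftover byte gives two characters plus ==, and two leftover bytes give three characters plus a single =.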

Base64 encoding table: values 0 to 25 map to A-Z, 26 to 51 map to a-z, 52 to 61 map to 0–9, 62 is + and 63 is /.

Gaurav Gupta
