
Unicode | What and Why?

We will discuss the following questions in this article:

  • Why were character sets (ASCII, Unicode, etc.) needed?
  • How did ASCII emerge?
  • What is Unicode, and why does it exist?
  • Why is 1 Byte = 8 Bits?
  • Why does a Java character take two bytes?

In the early days of computing, machines understood only binary (1s and 0s). We humans were not comfortable with binary: writing even a name in it was slow and error-prone. People needed to work in a human language instead, and since this technology was emerging in the US, that language was English. To a computer, a language is just a set of characters, so the solution was to assign every character a fixed binary number. The English letters, digits, punctuation and a few control codes came to 128 characters, each fitting in 7 bits. Later 8-bit extensions such as Latin-1 (ISO-8859-1) added the accented letters of German, French and other Western European languages, for 256 characters in total. For example:
a => 01100001
b => 01100010
c => 01100011
A => 01000001, and so on.

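An 8-bit byte can hold 2^8 = 256 different values, enough for all 128 ASCII characters with room to spare. This is one reason the 8-bit byte became the standard unit for storing a character, and why the char data type in C, C++ and many other technologies takes one byte.
This is how ASCII (American Standard Code for Information Interchange) was created.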
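The mapping above can be checked with a short Java sketch (the class AsciiBinary and its toBinary helper are our own illustrative names). It works because the first 128 Unicode code points are identical to ASCII, so a Java char's numeric value for these letters is its ASCII code. The sketch prints each code padded to 8 binary digits, plus the number of values one byte can hold:

```java
public class AsciiBinary {
    // Returns the character's numeric code as 8 binary digits, padded with leading zeros.
    static String toBinary(char c) {
        String bits = Integer.toBinaryString(c);           // e.g. 'a' (97) -> "1100001"
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', 'b', 'c', 'A'}) {
            // (int) c is the character's numeric code; ASCII codes run from 0 to 127
            System.out.println(c + " => " + toBinary(c) + " (" + (int) c + ")");
        }
        System.out.println("Values in one 8-bit byte: " + (1 << 8));
    }
}
```

Running it prints a => 01100001 (97), b => 01100010 (98), c => 01100011 (99), A => 01000001 (65), and finally 256. Note that every ASCII code starts with a 0 bit when written in 8 bits, which is exactly the spare bit later extensions used.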

Following this approach, every country went on to create its own character set: KOI8-R for Russian, Shift_JIS for Japanese, Big5 for Traditional Chinese, the ISO-8859 family for European languages, and so on.


But this created a problem. Consider the following definition:
"The character set is the fundamental raw material of any language and they are used to represent information. Like natural languages, computer language will also have a well-defined character set, which is useful to build the programs."
This means that choosing a programming language also means choosing a character set.
So if we create software in C, whose one-byte char was designed around ASCII-style character sets, that software will understand only English, not Chinese or French. Software was language-specific.
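To see the problem concretely, here is a minimal Java sketch (the Mojibake class and misread method are illustrative names) in which one program writes text with one character table and another reads the same bytes with a different table. It uses UTF-8 and ISO-8859-1 only because Java guarantees both are available; the national sets named above would show the same effect. The accented word comes back garbled, while plain English text survives because both tables agree on the ASCII range:

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // The writer encodes with UTF-8; the reader wrongly decodes the same bytes with ISO-8859-1.
    static String misread(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // the letters c, a, f and e-acute (U+00E9)
        // e-acute needs two bytes in UTF-8, so the word takes 5 bytes
        System.out.println(accented.getBytes(StandardCharsets.UTF_8).length + " bytes");
        // Garbled: the two bytes of e-acute are read as two separate Latin-1 characters
        System.out.println(misread(accented));
        // Unchanged: both tables give the same codes to ASCII letters
        System.out.println(misread("cafe"));
    }
}
```

The second line of output is "cafÃ©": the same bytes, interpreted with a different table, mean different characters.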

We humans do not share one language; every country has its own, and every country wanted software to understand its language too. That required a single character set containing the characters of all the world's languages, which programming languages could then use to build software.

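So the characters of the world's languages were collected into one new character set. There were far more than 256 of them, so the first version of Unicode (1991) gave each character 2 bytes, room for 2^16 = 65,536 characters. But the world's scripts, together with historic scripts and symbols, turned out to need more than 65,536. Unicode therefore separates two ideas. First, each character gets a unique number called a code point, conventionally written in hexadecimal, such as U+0041 for "A"; the code space was extended up to U+10FFFF, which allows 1,114,112 code points. Second, encodings such as UTF-8, UTF-16 and UTF-32 define how a code point is turned into bytes. This is how Unicode arose.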
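The split between code points and encodings can be illustrated with a minimal Java sketch (the CodePoints class and its helpers are illustrative names, not from any library). It prints a few code points in U+ notation and how many bytes each takes in UTF-8 and UTF-16. UTF-16BE is used so that Java does not add a byte-order mark to the count:

```java
import java.nio.charset.StandardCharsets;

public class CodePoints {
    // Formats a code point in the conventional U+XXXX hexadecimal notation.
    static String notation(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Bytes needed for one code point in UTF-8.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed for one code point in UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x4E2D, 0x1F600}; // A, e-acute, a CJK ideograph, an emoji
        for (int cp : samples) {
            System.out.println(notation(cp) + "  UTF-8: " + utf8Length(cp)
                    + " bytes, UTF-16: " + utf16Length(cp) + " bytes");
        }
        // The highest code point Unicode allows
        System.out.println("Max: " + notation(Character.MAX_CODE_POINT));
    }
}
```

The byte counts come out as 1, 2, 3 and 4 in UTF-8 and 2, 2, 2 and 4 in UTF-16, and the last line prints U+10FFFF. The code point never changes; only its encoded size does.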

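Languages designed when Unicode still fit in 16 bits therefore made their character data type 2 bytes,
for example Java and C#.
That's why Java reserves 2 bytes for the char data type: a Java char is one UTF-16 code unit, and a character beyond U+FFFF (such as many emoji) is stored as two chars, called a surrogate pair.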
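A short Java sketch (the JavaChar class and charsFor helper are illustrative names) shows the 2-byte char directly, and what happens with a character above U+FFFF. String.length() counts chars, not characters, so the emoji U+1F600 reports a length of 2 while codePointCount reports 1:

```java
public class JavaChar {
    // How many Java chars (UTF-16 code units) one code point needs.
    static int charsFor(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    public static void main(String[] args) {
        System.out.println("char size: " + Character.BYTES + " bytes (" + Character.SIZE + " bits)");
        System.out.println("Largest char value: " + (int) Character.MAX_VALUE); // 0xFFFF

        String letter = "A";                                    // U+0041 fits in one char
        String smiley = new String(Character.toChars(0x1F600)); // U+1F600 is above U+FFFF
        System.out.println("A -> " + letter.length() + " char");
        System.out.println("U+1F600 -> " + smiley.length() + " chars, "
                + smiley.codePointCount(0, smiley.length()) + " code point");
        // The first half of a surrogate pair is a "high surrogate"
        System.out.println("First char is a high surrogate: "
                + Character.isHighSurrogate(smiley.charAt(0)));
    }
}
```

It prints a char size of 2 bytes (16 bits), a largest char value of 65535, 1 char for "A", 2 chars but 1 code point for U+1F600, and true for the surrogate check.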
