MNP5 Data Compression

In MNP5 (Microcom Network Protocol 5) compression, character frequency statistics are used to compress arbitrary binary messages. Codewords from 4 to 10 bits long are used to represent 8-bit characters. The available codewords are assigned dynamically to use the shortest codewords for the most frequently occuring message characters.

There is a fixed selection of codewords ranging from 4 to 10 bits in length. For most codewords, the first 3 bits (highlighted) indicate how many bits remain in the codeword. Exceptions are the first two 4-bit words (beginning in 000), and the last two words where an additional bit is added to allow a 'reset' codeword (number 256). While this approach limits the number of codewords available with a given length, it ensures that codewords can be extracted from a binary stream by first examining three bits to determine how many more bits to take.


For example, consider this MNP5 compressed binary message:

0100010110111000110001100011...
These 28 bits represent five MNP5 codewords, as:
01000 10110111 0001 1000110 0011...
This corresponds to exactly five 8-bit characters (40 bits) in the original uncompressed message.

MNP5 Codeword Distribution

The 257 available codewords are distributed as follows:

Distribution of Available MNP5 Codewords
LENGTH (bits) 4 5 6 7 8 9 10 11
AVAILABLE 4 4 8 16 32 64 127 2

It is significant that most MNP5 codewords are more than 8 bits long. This implies that for random data, application of the MNP5 compression algorithm will lead to a file which is longer than the original file. To achieve effective compression, the most frequently occuring characters must be assigned to codewords having less than 8 bits.

The initial (default) assignments for codewords are shown in the table below. Note that in this default assignment, printable ASCII characters align with 9-bit codewords. Again, this suggests that without some adjustment, this table would be useless for compressing text messages. In practice, the table assignments change after the first character is transmitted, such that on the second occurence of a given character, a shorter codeword would be used. (See Examples)

A full description of the MNP5 algorithm may be found in The Complete Modem Reference by Gilbert Held


MNP5 Codeword Table (Initial State)

DECIMALCODEWORDOCTET ASCII
0 0000 00
1 0001 01
2 0010 02
3 0011 03
4 01000 04
5 01001 05
6 01010 06
7 01011 07
8 011000 08
9 011001 09
10 011010 0A
11 011011 0B
12 011100 0C
13 011101 0D
14 011110 0E
15 011111 0F
16 1000000 10
17 1000001 11
18 1000010 12
19 1000011 13
20 1000100 14
21 1000101 15
22 1000110 16
23 1000111 17
24 1001000 18
25 1001001 19
26 1001010 1A
27 1001011 1B
28 1001100 1C
29 1001101 1D
30 1001110 1E
31 1001111 1F
32 10100000 20 spc
33 10100001 21 !
34 10100010 22 "
35 10100011 23 #
36 10100100 24 $
37 10100101 25 %
38 10100110 26 &
39 10100111 27 '
40 10101000 28 (
41 10101001 29 )
42 10101010 2A *
43 10101011 2B +
44 10101100 2C ,
45 10101101 2D -
46 10101110 2E .
47 10101111 2F /
48 10110000 30 0
49 10110001 31 1
50 10110010 32 2
51 10110011 33 3
52 10110100 34 4
53 10110101 35 5
54 10110110 36 6
55 10110111 37 7
56 10111000 38 8
57 10111001 39 9
58 10111010 3A :
59 10111011 3B ;
60 10111100 3C <
61 10111101 3D =
62 10111110 3E >
63 10111111 3F ?
64 110000000 40 @
65 110000001 41 A
66 110000010 42 B
67 110000011 43 C
68 110000100 44 D
69 110000101 45 E
70 110000110 46 F
71 110000111 47 G
72 110001000 48 H
73 110001001 49 I
74 110001010 4A J
75 110001011 4B K
76 110001100 4C L
77 110001101 4D M
78 110001110 4E N
79 110001111 4F O
80 110010000 50 P
81 110010001 51 Q
82 110010010 52 R
83 110010011 53 S
84 110010100 54 T
85 110010101 55 U
86 110010110 56 V
87 110010111 57 W
88 110011000 58 X
89 110011001 59 Y
90 110011010 5A Z
91 110011011 5B [
92 110011100 5C \
93 110011101 5D ]
94 110011110 5E ^
95 110011111 5F _
96 110100000 60 `
97 110100001 61 a
98 110100010 62 b
99 110100011 63 c
100 110100100 64 d
101 110100101 65 e
102 110100110 66 f
103 110100111 67 g
104 110101000 68 h
105 110101001 69 i
106 110101010 6A j
107 110101011 6B k
108 110101100 6C l
109 110101101 6D m
110 110101110 6E n
111 110101111 6F o
112 110110000 70 p
113 110110001 71 q
114 110110010 72 r
115 110110011 73 s
116 110110100 74 t
117 110110101 75 u
118 110110110 76 v
119 110110111 77 w
120 110111000 78 x
121 110111001 79 y
122 110111010 7A z
123 110111011 7B {
124 110111100 7C |
125 110111101 7D }
126 110111110 7E ~
127 110111111 7F
128 1110000000 80
129 1110000001 81
130 1110000010 82
131 1110000011 83
132 1110000100 84
133 1110000101 85
134 1110000110 86
135 1110000111 87
136 1110001000 88
137 1110001001 89
138 1110001010 8A
139 1110001011 8B
140 1110001100 8C
141 1110001101 8D
142 1110001110 8E
143 1110001111 8F
144 1110010000 90
145 1110010001 91
146 1110010010 92
147 1110010011 93
148 1110010100 94
149 1110010101 95
150 1110010110 96
151 1110010111 97
152 1110011000 98
153 1110011001 99
154 1110011010 9A
155 1110011011 9B
156 1110011100 9C
157 1110011101 9D
158 1110011110 9E
159 1110011111 9F
160 1110100000 A0
161 1110100001 A1
162 1110100010 A2
163 1110100011 A3
164 1110100100 A4
165 1110100101 A5
166 1110100110 A6
167 1110100111 A7
168 1110101000 A8
169 1110101001 A9
170 1110101010 AA
171 1110101011 AB
172 1110101100 AC
173 1110101101 AD
174 1110101110 AE
175 1110101111 AF
176 1110110000 B0
177 1110110001 B1
178 1110110010 B2
179 1110110011 B3
180 1110110100 B4
181 1110110101 B5
182 1110110110 B6
183 1110110111 B7
184 1110111000 B8
185 1110111001 B9
186 1110111010 BA
187 1110111011 BB
188 1110111100 BC
189 1110111101 BD
190 1110111110 BE
191 1110111111 BF
192 1111000000 C0
193 1111000001 C1
194 1111000010 C2
195 1111000011 C3
196 1111000100 C4
197 1111000101 C5
198 1111000110 C6
199 1111000111 C7
200 1111001000 C8
201 1111001001 C9
202 1111001010 CA
203 1111001011 CB
204 1111001100 CC
205 1111001101 CD
206 1111001110 CE
207 1111001111 CF
208 1111010000 D0
209 1111010001 D1
210 1111010010 D2
211 1111010011 D3
212 1111010100 D4
213 1111010101 D5
214 1111010110 D6
215 1111010111 D7
216 1111011000 D8
217 1111011001 D9
218 1111011010 DA
219 1111011011 DB
220 1111011100 DC
221 1111011101 DD
222 1111011110 DE
223 1111011111 DF
224 1111100000 E0
225 1111100001 E1
226 1111100010 E2
227 1111100011 E3
228 1111100100 E4
229 1111100101 E5
230 1111100110 E6
231 1111100111 E7
232 1111101000 E8
233 1111101001 E9
234 1111101010 EA
235 1111101011 EB
236 1111101100 EC
237 1111101101 ED
238 1111101110 EE
239 1111101111 EF
240 1111110000 F0
241 1111110001 F1
242 1111110010 F2
243 1111110011 F3
244 1111110100 F4
245 1111110101 F5
246 1111110110 F6
247 1111110111 F7
248 1111111000 F8
249 1111111001 F9
250 1111111010 FA
251 1111111011 FB
252 1111111100 FC
253 1111111101 FD
254 1111111110 FE
255 11111111110 FF
256 11111111111 RESET