I am refactoring my site builder from a Python implementation to Haskell. For signature generation, I implemented SHA-256 in Haskell. The implementation is correct, but it is about 30 times slower than the Linux sha256sum tool, even after accounting for process invocation overhead.
As a result, I switched to the system tool and wrote this post to preserve my non-production SHA-256 implementation.
SHA-256 Algorithm
Key components:
- 8 initial hash values: the first 32 bits of the fractional parts of the square roots of the first eight primes
- 64 constants: the first 32 bits of the fractional parts of the cube roots of the first 64 primes
- Chunk size: 512 bits
- Scheduled words per chunk: 64 × 32-bit words
- Rotate-right function:
RTOR - Functions for expanding a chunk to words:
S0,S1 - Functions for updating the result hash values:
EP0,EP1,CH,MAJ
Procedure
- Preprocessing the message.
- Append a
1bit after the message. - Append
0bits after the message until the length mod the chunk size (512) is 448. - Append the length of the original message, in bits, as a 64-bit unsigned big-endian integer. So that the length of the message mod the chunk size is exactly 0. (448 + 64 = 512)
- Append a
- Split the preprocessed message into chunks of size 512 bits.
- Let
h[]be the 8 initial hash values; letk[]be the 64 constants. - Process each chunk iteratively to update the hash state. For a single chunk:
- Create a list of 32-bit words containing 64 elements:
w[64]. - Split the chunk into 16 words (512 = 32 * 16), and put them into the first 16 words.
- Generate the remaining words using the following recurrence relation:
w[i] = w[i - 16] + S0(w[i - 15]) + w[i - 7] + S1(w[i - 2])S0(x) = RTOR(x, 7) ^ RTOR(x, 18) ^ (x >> 3)S1(x) = RTOR(x, 17) ^ RTOR(x, 19) ^ (x >> 10)
- Make a copy of
h[]and call ith'[] - Iterate over
w[]to update values inh'[]t1 = h'[7] + EP1(h'[4]) + CH(h'[4], h'[5], h'[6]) + k[i] + w[i]EP1(x) = RTOR(x, 6) ^ RTOR(x, 11) ^ RTOR(x, 25)CH(x, y, z) = (x & y) ^ ((~x) & z)
t2 = EP0(h'[0]) + MAJ(h'[0], h'[1], h'[2])EP0(x) = RTOR(x, 2) ^ RTOR(x, 13) ^ RTOR(x, 22)MAJ(x, y, z) = ((x & y) ^ (x & z) ^ (y & z))
- Update
h' {a, b, c, d, e, f, g, h}to{t1 + t2, a, b, c, d + t1, e, f, g}
- Update
h[]by adding the corresponding values fromh'[].
- Create a list of 32-bit words containing 64 elements:
- Concatenate the final hash words
h[]to the final result. (32 * 8 = 256 bit)
Complete Source Code
module Modules.Utils.Sha256 (sha256Hex) where
import Data.Word
import Data.Bits
import Data.List
import Numeric (showHex)
import qualified Data.Text as Text
import qualified Data.Text.Encoding as TextEncoding
import qualified Data.ByteString as ByteString
-- ---[ Overview ]------------------------------------------------------------
-- | Pure SHA-256 implementation used by the builder.
--
-- Design notes:
-- - public API takes 'String', encodes as UTF-8, and returns lowercase hex
-- - core implementation follows SHA-256 standard flow:
-- padding -> 512-bit chunking -> word schedule expansion -> compression
-- - this module is self-contained (no external crypto dependency)
-- ---[ Public API ]------------------------------------------------------------
-- | Computes SHA-256 digest as a lowercase hexadecimal string.
--
-- Input text is encoded as UTF-8 bytes before hashing.
sha256Hex :: String -> String
sha256Hex = (concatMap byteToHex) . sha256 . utf8Encode
where
byteToHex :: Word8 -> String
byteToHex b =
let s = showHex b ""
in if length s == 1 then '0' : s else s
utf8Encode :: String -> [Word8]
utf8Encode = ByteString.unpack . TextEncoding.encodeUtf8 . Text.pack
-- ---[ Implementation Details ]-----------------------------------------------
-- 512-bit chunk size in bytes.
chunkSize :: Int
chunkSize = 64
-- Initial hash state constants (H0..H7).
h :: [Word32]
h = [ 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a
, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
-- Round constants (K0..K63).
k :: [Word32]
k = [ 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b
, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, 0xd807aa98, 0x12835b01
, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7
, 0xc19bf174, 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, 0x983e5152
, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147
, 0x06ca6351, 0x14292967, 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc
, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
, 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819
, 0xd6990624, 0xf40e3585, 0x106aa070, 0x19a4c116, 0x1e376c08
, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f
, 0x682e6ff3, 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]
messageSizeFieldBytes :: Int
messageSizeFieldBytes = 8
-- Number of words directly parsed from one 512-bit chunk.
initialWordsCount :: Int
initialWordsCount = 16
-- Number of scheduled words per chunk after expansion.
wordsCount :: Int
wordsCount = 64
-- Small sigma 0 for message schedule.
s0 :: Word32 -> Word32
s0 x = rotateR x 7 `xor` rotateR x 18 `xor` shiftR x 3
-- Small sigma 1 for message schedule.
s1 :: Word32 -> Word32
s1 x = rotateR x 17 `xor` rotateR x 19 `xor` shiftR x 10
-- Big sigma 0 for compression rounds.
ep0 :: Word32 -> Word32
ep0 x = rotateR x 2 `xor` rotateR x 13 `xor` rotateR x 22
-- Big sigma 1 for compression rounds.
ep1 :: Word32 -> Word32
ep1 x = rotateR x 6 `xor` rotateR x 11 `xor` rotateR x 25
-- Choice function.
ch :: Word32 -> Word32 -> Word32 -> Word32
ch x y z = (x .&. y) `xor` (complement x .&. z)
-- Majority function.
maj :: Word32 -> Word32 -> Word32 -> Word32
maj x y z = (x .&. y) `xor` (x .&. z) `xor` (y .&. z)
-- Hashes raw bytes and returns digest bytes.
sha256 :: [Word8] -> [Word8]
sha256 msg = concatMap w32ToW8s finalH
where
initialWords :: [[Word32]]
initialWords = (splitChunksToWords . splitToChunks . padding) msg
expandedWords :: [[Word32]]
expandedWords = map expandWords initialWords
processChunk :: [Word32] -> [Word32] -> [Word32]
processChunk hs ws
| length hs /= 8 = error "Wrong initial H for processChunk."
| length ws /= wordsCount = error "Wrong words for processChunk."
| otherwise =
-- Start from previous hash state and run 64 compression rounds.
let finalWorking = foldl' (refreshH ws) hs [0 .. wordsCount - 1]
in zipWith (+) hs finalWorking
finalH :: [Word32]
finalH = foldl' processChunk h expandedWords
w32ToW8s :: Word32 -> [Word8]
w32ToW8s x =
[ fromIntegral (shiftR x 24)
, fromIntegral (shiftR x 16)
, fromIntegral (shiftR x 8)
, fromIntegral x
]
-- Applies SHA-256 message padding:
-- append 0x80, then zero bytes, then original message bit length in BE64.
padding :: [Word8] -> [Word8]
padding msg = initBytes ++ replicate paddingSize continueByte ++ msgSizeBytes
where
firstByte :: Word8
firstByte = 0x80
continueByte :: Word8
continueByte = 0x00
initBytes :: [Word8]
initBytes = msg ++ [firstByte]
modSize :: Int
modSize = length initBytes `mod` chunkSize
targetModSize :: Int
targetModSize = chunkSize - messageSizeFieldBytes
paddingSize :: Int
paddingSize = (targetModSize - modSize) `mod` chunkSize
msgSizeBytes :: [Word8]
msgSizeBytes = intToBE64 (fromIntegral (length msg) * 8)
intToBE64 :: Integer -> [Word8]
intToBE64 x =
let w = fromIntegral x :: Word64
in
[ fromIntegral (shiftR w 56)
, fromIntegral (shiftR w 48)
, fromIntegral (shiftR w 40)
, fromIntegral (shiftR w 32)
, fromIntegral (shiftR w 24)
, fromIntegral (shiftR w 16)
, fromIntegral (shiftR w 8)
, fromIntegral w
]
-- Splits padded bytes into 512-bit chunks.
splitToChunks :: [Word8] -> [[Word8]]
splitToChunks msg
| length msg `mod` chunkSize /= 0 = error "Message length incorrect!"
| otherwise = splitToChunks' msg
where
splitToChunks' :: [Word8] -> [[Word8]]
splitToChunks' [] = []
splitToChunks' msg' =
let (chunk, rest) = splitAt chunkSize msg'
in chunk : splitToChunks' rest
-- Converts each 512-bit chunk into sixteen big-endian Word32 values.
splitChunksToWords :: [[Word8]] -> [[Word32]]
splitChunksToWords = map splitChunkToWords
where
splitChunkToWords :: [Word8] -> [Word32]
splitChunkToWords chunk
| length chunk `mod` 4 /= 0 = error "Chunk size incorrect!"
| otherwise = splitChunkToWords' chunk
splitChunkToWords' :: [Word8] -> [Word32]
splitChunkToWords' [] = []
splitChunkToWords' word8s =
let (wordGroup, rest) = splitAt 4 word8s
in bytesToWord32 wordGroup : splitChunkToWords' rest
bytesToWord32 :: [Word8] -> Word32
bytesToWord32 [w0,w1,w2,w3] =
shiftL (fromIntegral w0) 24 .|.
shiftL (fromIntegral w1) 16 .|.
shiftL (fromIntegral w2) 8 .|.
fromIntegral w3
-- Expands 16 initial words into 64 scheduled words.
expandWords :: [Word32] -> [Word32]
expandWords initialWords
| length initialWords /= initialWordsCount = error "Count of initial words must be 16"
| otherwise = expandWords' initialWords initialWordsCount
where
expandWords' :: [Word32] -> Int -> [Word32]
expandWords' ws i
| i == 64 = ws
| otherwise = expandWords' (ws ++ [nextWord ws i]) (i + 1)
nextWord :: [Word32] -> Int -> Word32
nextWord ws i = (ws !! (i - 16)) + s0(ws !! (i - 15)) + ws !! (i - 7) + s1(ws !! (i - 2))
-- Executes one compression round and returns the next working state.
refreshH :: [Word32] -> [Word32] -> Int -> [Word32]
refreshH ws h' i
| i < 0 || i >= wordsCount = error "Invalid number for refresh H."
| otherwise =
case h' of
[h0, h1, h2, h3, h4, h5, h6, h7] ->
let t1 = h7 + ep1 h4 + (ch h4 h5 h6) + (k !! i) + (ws !! i)
t2 = ep0 h0 + maj h0 h1 h2
in [t1 + t2, h0, h1, h2, h3 + t1, h4, h5, h6]
_ -> error "Wrong initial H for refresh H."
Why It Is Slow
The implementation is intentionally simple and list-oriented, which makes it easy to follow but extremely inefficient.
Some major performance costs include:
- linked-list indexing via
(!!) - repeated list concatenation using
(++) - boxed list representation for bytes and words
- conversion from ByteString to
[Word8]
Still, I like the current structure, and I am not interested in sacrificing readability for performance.