The final implementation is simpler, at the cost of doing work in chacha20_x_final.
Both will only allocate and free the internal *_ctx structs. Get rid of the void * argument for new, and pass only arg to *_free instead of the whole lc_*_ctx struct.
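
A minimal sketch of the resulting calling convention, for illustration only. The names example_ctx, lc_example_ctx, example_ctx_new and example_ctx_free are hypothetical stand-ins, and the struct layouts are assumptions, not the real definitions:

```c
#include <stdlib.h>

/* Hypothetical internal state; the real *_ctx structs are not shown here. */
struct example_ctx {
	unsigned char buf[64];
	size_t used;
};

/* Hypothetical stand-in for an lc_*_ctx wrapper that keeps the state in arg. */
struct lc_example_ctx {
	void *arg;
};

/* new: takes no void * argument and only allocates the internal state. */
void *
example_ctx_new(void)
{
	return calloc(1, sizeof(struct example_ctx));
}

/* free: receives only arg, not the whole wrapper struct. */
void
example_ctx_free(void *arg)
{
	free(arg);
}
```

Under these assumptions a caller would store the result of example_ctx_new() in ctx.arg and later hand ctx.arg, rather than &ctx, to example_ctx_free().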