EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning • Paper 2401.17690 • Published Jan 31