Key AI Papers for February 2

최술사 2026. 2. 6. 15:23

1. ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
   - An automated framework that trains tool-augmented language models on synthetic data to improve multi-step decision-making.
   - [arXiv link](https://arxiv.org/abs/2601.21558)

2. THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
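   - Improves the safety of large reasoning models through lightweight refusal steering and fine-tuning on self-generated responses, preserving performance while reducing computational cost.
   - [arXiv link](https://arxiv.org/abs/2601.23143)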

3. TTCS: Test-Time Curriculum Synthesis for Self-Evolving
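   - Proposes a co-evolving test-time training framework that improves LLM reasoning by iteratively generating challenging question variants and updating the reasoning solver with self-consistency rewards.
   - [arXiv link](https://arxiv.org/abs/2601.22628)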
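As a rough illustration of what a self-consistency reward can look like, the sketch below scores each sampled answer by whether it agrees with the majority answer among the samples. This is an assumption made for illustration only: the summary above does not say how TTCS computes its reward, and the function name `self_consistency_rewards` is hypothetical.

```python
from collections import Counter


def self_consistency_rewards(answers):
    """Return 1.0 for each answer that matches the majority answer, else 0.0.

    Assumption: agreement with the majority vote stands in for correctness,
    since no ground-truth label is available at test time.
    """
    if not answers:
        return []
    # most_common(1) yields the most frequent answer; on ties it returns
    # the answer that was seen first.
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if answer == majority else 0.0 for answer in answers]
```

For example, with sampled answers `["42", "42", "41"]` the two samples that agree with the majority receive a reward of 1.0 and the outlier receives 0.0.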

4. SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
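   - Introduces a new reinforcement learning framework that uses tiered rewards to guide agent optimization toward the optimal region of the solution space, improving sample efficiency and cross-task transferability.
   - [arXiv link](https://arxiv.org/abs/2601.22491)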

5. MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
   - Proposes a multimodal memory agent that compresses interaction histories into visual layouts to improve long-horizon reasoning, enabling efficient use of context.
   - [arXiv link](https://arxiv.org/abs/2601.21468)

6. DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning
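   - Presents a universal character animation framework that uses in-context learning and self-bootstrapped data synthesis to address motion-injection trade-offs and the limitations of pose priors, improving generalization across diverse characters.
   - [arXiv link](https://arxiv.org/abs/2601.21716)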

7. DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
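   - Introduces dense rewards for intermediate denoising steps, together with adaptive exploration calibration, to address the sparse-reward problem in flow matching models.
   - [arXiv link](https://arxiv.org/abs/2601.20218)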

8. Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification
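   - A framework guided by formal-logic verification that dynamically interleaves natural language generation with symbolic verification, raising LLM reasoning accuracy and reducing errors.
   - [arXiv link](https://arxiv.org/abs/2601.22642)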

9. Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data
   - Discovers multiple specialized subnetworks tailored to different data conditions, outperforming traditional pruning methods.
   - [arXiv link](https://arxiv.org/abs/2601.22141)